Sysdig Platform CLI - Alerts

This section explains the concepts and notation used by the Monitor Alerts commands.

Usage

The alert command lets you perform basic CRUD actions on Monitor alerts:

$ sdc-cli alert --help
Usage: sdc-cli alert [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  add       Add an alert
  add-json  Add an alert from a json file
  del       Delete alerts
  get       Get alert
  list      List all alerts
  update    Update an alert

List Alerts

To list all alerts, use the list subcommand:

$ sdc-cli alert list
id             name                                                   type          enabled        severityLabel        
2119842        [APM] HTTP Status 500                                  MANUAL        False          LOW                  
2119819        [APM] Service with HTTP error count warning            MANUAL        False          LOW                  
2119839        [Cloud] AWS CPU credits low                            MANUAL        True           LOW                  
2119844        [Cloud] LoadBalancer connection errors                 MANUAL        True           LOW                  
2119843        [Cloud] LoadBalancer response times                    MANUAL        True           LOW                  
2119831        [Kubernetes] Deployment degraded                       MANUAL        False          LOW                  
2119837        [Kubernetes] Deployment might be stuck                 MANUAL        False          LOW                  
2119830        [Kubernetes] Deployment with no Pods running           MANUAL        False          MEDIUM               
2119827        [Kubernetes] Failed to pull image                      EVENT         False          MEDIUM               
2119828        [Kubernetes] Liveness probe failed                     EVENT         False          LOW                  
2119835        [Kubernetes] Node network unavailable                  MANUAL        False          LOW                  
2119834        [Kubernetes] Node out of disk                          MANUAL        False          LOW                  
2119833        [Kubernetes] Node under disk pressure                  MANUAL        False          LOW                  
2119836        [Kubernetes] Node under memory pressure                MANUAL        False          LOW                  
2119829        [Kubernetes] Readiness probe failed                    EVENT         False          LOW                  
2119832        [Kubernetes] Service with no running containers        MANUAL        False          LOW                  
2119820        [System] Filesystem device full warning                MANUAL        False          LOW                  
2119823        [System] Host has been restarted                       MANUAL        True           LOW                  
2119826        [System] Host is down                                  MANUAL        True           LOW                  
2119822        [System] Lack of free swap space warning               MANUAL        False          LOW                  
2119818        [System] Network error warning                         MANUAL        False          LOW                  
2119840        [System] Node Load Alert AWS                           MANUAL        True           LOW                  
2119841        [System] Node Load Alert GKE                           MANUAL        True           LOW                  
2119825        [System] Out of free inodes warning                    MANUAL        True           LOW                  
2119821        [System] Out of memory warning                         MANUAL        True           LOW                  
2119838        [System] Rate of change of disk usage                  MANUAL        True           LOW                  
2119824        [System] Root volume disk usage warning                MANUAL        True           LOW                  
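
Since the list output is a plain table, it can be filtered with standard tools. The following is a minimal sketch, not part of sdc-cli: the enabled_alert_ids helper is a hypothetical name, and it assumes the column layout shown above (id first, then a name that may contain spaces, then type, enabled and severityLabel as the last three fields).

```shell
# Hypothetical helper (not part of sdc-cli): reads `sdc-cli alert list`
# output on stdin and prints the IDs of the alerts that are enabled.
# Assumes the layout shown above: the ID is the first field and "enabled"
# is the second-to-last field. Names may contain spaces, so fields are
# counted from the end of the line rather than from the start.
enabled_alert_ids() {
    # NR > 1 skips the header row.
    awk 'NR > 1 && $(NF-1) == "True" { print $1 }'
}
```

With the CLI installed, this would be used as: sdc-cli alert list | enabled_alert_ids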

Retrieve more info about an alert

You can retrieve more information about an alert with the get subcommand, specifying either the alert's ID or its name:

$ sdc-cli alert get 2119834 
id:                       2119834
name:                     [Kubernetes] Node out of disk
type:                     MANUAL
enabled:                  False
severityLabel:            LOW
segmentBy:                ['kubernetes.node.name']
segmentCondition:
    {
      "type": "ANY"
    }
condition:                avg(avg(kubernetes.node.outOfDisk)) != 0
timespan:                 600000000

$ sdc-cli alert get '[Kubernetes] Deployment degraded'
id:                       2119831
name:                     [Kubernetes] Deployment degraded
type:                     MANUAL
enabled:                  False
severityLabel:            LOW
segmentBy:                ['kubernetes.deployment.name', 'kubernetes.namespace.name']
segmentCondition:
    {
      "type": "ANY"
    }
condition:                avg(timeAvg(kubernetes.deployment.replicas.available)) < avg(timeAvg(kubernetes.deployment.replicas.desired))
timespan:                 600000000
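
Because the get output is a series of "key: value" lines, a single field can be extracted with a small filter. The following is a rough sketch under that assumption: alert_field is a hypothetical helper, not an sdc-cli feature, and it only handles single-line, top-level fields such as name, condition or timespan.

```shell
# Hypothetical helper (not part of sdc-cli): reads `sdc-cli alert get`
# output on stdin and prints the value of one top-level field.
# $1 is the field name exactly as printed, e.g. "condition".
# Assumes the "key:   value" layout shown above, with the key starting
# at the beginning of the line. Multi-line fields are not supported.
alert_field() {
    awk -v key="$1" 'index($0, key ":") == 1 {
        # Strip the key, the first colon and the padding after it.
        sub(/^[^:]*:[ \t]*/, "")
        print
    }'
}
```

With the CLI installed, this would be used as: sdc-cli alert get 2119834 | alert_field condition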

Add Alert

You can add alerts either by passing parameters to the add subcommand, or from a JSON file (such as the output of the get subcommand) with the add-json subcommand.

$ sdc-cli alert add --help
Usage: sdc-cli alert add [OPTIONS] NAME

Options:
  --description TEXT        The alert description. This will appear in the
                            Sysdig Monitor UI and in notification emails.
  --severity INTEGER        syslog-encoded alert severity. This is a number
                            from 0 to 7 where 0 means 'emergency' and 7 is
                            'debug'. (by default is 4)
  --atleast INTEGER         the number of consecutive seconds the condition
                            must be satisfied for the alert to fire. (by
                            default is 600)
  --condition TEXT          the alert condition, as described here https://app
                            .sysdigcloud.com/apidocs/#!/Alerts/post_api_alerts
  --disable                 The alert will be disabled when created. (by
                            default is enabled)
  --segment TEXT            A segmentation criteria that can be used to apply
                            the alert to multiple entities.
  --segment-condition TEXT  When segment is specified (and therefore the alert
                            will cover multiple entities) this field is used
                            to determine when it will fire. In particular, you
                            have two options for segment-condition: 'ANY' (the
                            alert will fire when at least one of the monitored
                            entities satisfies the condition) and 'ALL' (the
                            alert will fire when all of the monitored entities
                            satisfy the condition).
  --user-filter TEXT        a boolean expression combining Sysdig Monitor
                            segmentation criteria that makes it possible to
                            reduce the scope of the alert. For example:
                            kubernetes.namespace.name='production' and
                            container.image='nginx'
  --notify TEXT             the type of notification you want this alert to
                            generate. Options are 'EMAIL', 'SNS',
                            'PAGER_DUTY', 'SYSDIG_DUMP'
  --annotation TEXT         a pair 'key=value' custom property that you can
                            associate to this alert for automation or
                            management reasons
  --promql                  Define if the alert to be created follows the
                            PromQL syntax
  --help                    Show this message and exit.
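
The --severity option takes a syslog-encoded number. As a quick reference, the sketch below maps each number to its standard syslog severity name; syslog_severity_name is a hypothetical helper for illustration, not part of sdc-cli, and the names are the standard syslog ones rather than the severityLabel values that the CLI prints.

```shell
# Hypothetical helper (not part of sdc-cli): prints the standard syslog
# severity name for a number from 0 to 7, as accepted by --severity.
# Returns a non-zero status for any other input.
syslog_severity_name() {
    case "$1" in
        0) echo emergency ;;
        1) echo alert ;;
        2) echo critical ;;
        3) echo error ;;
        4) echo warning ;;       # the default according to the help text
        5) echo notice ;;
        6) echo informational ;;
        7) echo debug ;;
        *) echo "invalid severity: $1" >&2; return 1 ;;
    esac
}
```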

For example, let’s add an alert that detects a CrashLoopBackOff by firing when the pod restart count condition holds for 5 minutes (300 seconds):

$ sdc-cli alert add \
        --description "CrashLoopBackOff detected" \
        --severity 2 \
        --atleast 300 \
        --condition 'sum(avg(kubernetes.pod.restart.count)) > 1' \
        '[Kubernetes] Pod crash/restart loop'

id:                       2124028
name:                     [Kubernetes] Pod crash/restart loop
description:              CrashLoopBackOff detected
type:                     MANUAL
enabled:                  True
severityLabel:            MEDIUM
condition:                sum(avg(kubernetes.pod.restart.count)) > 1
timespan:                 300000000
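
Note that --atleast is given in seconds, while the timespan field in the output is a much larger number. Judging from the examples here (--atleast 300 yields timespan 300000000), timespan appears to be expressed in microseconds; this is an inference from the output, not something the help text states. Under that assumption, the conversion can be sketched with two hypothetical helpers:

```shell
# Hypothetical helpers (not part of sdc-cli). They assume that timespan
# is expressed in microseconds, as inferred from the example output.

# Convert an --atleast value (seconds) to the timespan value.
atleast_to_timespan() {
    echo $(( $1 * 1000000 ))
}

# Convert a timespan value back to seconds.
timespan_to_atleast() {
    echo $(( $1 / 1000000 ))
}
```

For instance, atleast_to_timespan 300 gives 300000000, matching the output above, and timespan_to_atleast 600000000 gives 600, the documented default for --atleast.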

Update alert

The update subcommand only accepts a JSON file, such as one retrieved with the get subcommand and the --json option:

$ sdc-cli --json alert get '[System] Out of memory warning' > alert.json
$ vim alert.json # Edit the alert in the file
$ sdc-cli alert update alert.json 
id:                       2119821
name:                     [System] Out of memory warning
description:              Out of memory warning
type:                     MANUAL
enabled:                  False
severityLabel:            LOW
segmentBy:                ['host.hostName']
segmentCondition:
    {
      "type": "ANY"
    }
condition:                avg(avg(memory.bytes.available)) < 419430400
timespan:                 300000000

Remove an alert

You can also remove an alert by specifying its ID:

$ sdc-cli alert del 2124028
Success
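
To remove several alerts, one option is to call del once per ID. The sketch below is a cautious illustration: delete_alerts and the DRY_RUN variable are hypothetical names, not sdc-cli features. By default it only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
# Hypothetical helper (not part of sdc-cli): runs `sdc-cli alert del`
# once for each alert ID passed as an argument.
# DRY_RUN defaults to 1, in which case the commands are only printed.
delete_alerts() {
    for id in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "sdc-cli alert del $id"
        else
            sdc-cli alert del "$id"
        fi
    done
}
```

For example, delete_alerts 2124028 2124029 prints the two del commands without deleting anything.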