Sysdig Platform CLI - Alerts
This section explains the concepts and notation used by the Monitor Alerts commands.
Usage
The alert command supports basic CRUD (create, read, update, delete) operations on Sysdig Monitor alerts:
$ sdc-cli alert --help
Usage: sdc-cli alert [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
add Add an alert
add-json Add an alert from a json file
del Delete alerts
get Get alert
list List all alerts
update Update an alert
List Alerts
To list all alerts, use the list subcommand:
$ sdc-cli alert list
id name type enabled severityLabel
2119842 [APM] HTTP Status 500 MANUAL False LOW
2119819 [APM] Service with HTTP error count warning MANUAL False LOW
2119839 [Cloud] AWS CPU credits low MANUAL True LOW
2119844 [Cloud] LoadBalancer connection errors MANUAL True LOW
2119843 [Cloud] LoadBalancer response times MANUAL True LOW
2119831 [Kubernetes] Deployment degraded MANUAL False LOW
2119837 [Kubernetes] Deployment might be stuck MANUAL False LOW
2119830 [Kubernetes] Deployment with no Pods running MANUAL False MEDIUM
2119827 [Kubernetes] Failed to pull image EVENT False MEDIUM
2119828 [Kubernetes] Liveness probe failed EVENT False LOW
2119835 [Kubernetes] Node network unavailable MANUAL False LOW
2119834 [Kubernetes] Node out of disk MANUAL False LOW
2119833 [Kubernetes] Node under disk pressure MANUAL False LOW
2119836 [Kubernetes] Node under memory pressure MANUAL False LOW
2119829 [Kubernetes] Readiness probe failed EVENT False LOW
2119832 [Kubernetes] Service with no running containers MANUAL False LOW
2119820 [System] Filesystem device full warning MANUAL False LOW
2119823 [System] Host has been restarted MANUAL True LOW
2119826 [System] Host is down MANUAL True LOW
2119822 [System] Lack of free swap space warning MANUAL False LOW
2119818 [System] Network error warning MANUAL False LOW
2119840 [System] Node Load Alert AWS MANUAL True LOW
2119841 [System] Node Load Alert GKE MANUAL True LOW
2119825 [System] Out of free inodes warning MANUAL True LOW
2119821 [System] Out of memory warning MANUAL True LOW
2119838 [System] Rate of change of disk usage MANUAL True LOW
2119824 [System] Root volume disk usage warning MANUAL True LOW
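Alert names can contain spaces, so filtering this output by column takes a little care. The following is a minimal sketch, assuming the five-column layout shown above (id, name, type, enabled, severityLabel). The enabled_alerts helper is hypothetical and not part of sdc-cli. It counts columns from the right to find the enabled flag, then prints the ID and name of each enabled alert, separated by a tab.

```shell
# Hypothetical helper: read `sdc-cli alert list` output on stdin and print
# "<id><TAB><name>" for every alert whose enabled column is "True".
# Assumes the columns are: id, name, type, enabled, severityLabel.
enabled_alerts() {
  awk '$(NF-1) == "True" {
    # The name may contain spaces, so rebuild it from fields 2..NF-3.
    name = $2
    for (i = 3; i <= NF - 3; i++) name = name " " $i
    print $1 "\t" name
  }'
}

# Usage (requires a configured sdc-cli):
#   sdc-cli alert list | enabled_alerts
```

If the global --json option is available for the list subcommand as well, parsing structured output would likely be more robust than splitting text columns.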
Retrieve more info from an alert
You can retrieve more information about an alert with the get subcommand, specifying either the alert's ID or its name:
$ sdc-cli alert get 2119834
id: 2119834
name: [Kubernetes] Node out of disk
type: MANUAL
enabled: False
severityLabel: LOW
segmentBy: ['kubernetes.node.name']
segmentCondition:
{
"type": "ANY"
}
condition: avg(avg(kubernetes.node.outOfDisk)) != 0
timespan: 600000000
$ sdc-cli alert get '[Kubernetes] Deployment degraded'
id: 2119831
name: [Kubernetes] Deployment degraded
type: MANUAL
enabled: False
severityLabel: LOW
segmentBy: ['kubernetes.deployment.name', 'kubernetes.namespace.name']
segmentCondition:
{
"type": "ANY"
}
condition: avg(timeAvg(kubernetes.deployment.replicas.available)) < avg(timeAvg(kubernetes.deployment.replicas.desired))
timespan: 600000000
Add Alert
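You can add an alert either by passing parameters to the add subcommand, or from a JSON file (such as one produced by the get subcommand) with the add-json subcommand. The add subcommand accepts the following options: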
$ sdc-cli alert add --help
Usage: sdc-cli alert add [OPTIONS] NAME
Options:
--description TEXT The alert description. This will appear in the
Sysdig Monitor UI and in notification emails.
--severity INTEGER syslog-encoded alert severity. This is a number
from 0 to 7 where 0 means 'emergency' and 7 is
'debug'. (by default is 4)
--atleast INTEGER the number of consecutive seconds the condition
must be satisfied for the alert to fire. (by
default is 600)
--condition TEXT the alert condition, as described here https://app
.sysdigcloud.com/apidocs/#!/Alerts/post_api_alerts
--disable The alert will be disabled when created. (by
default is enabled)
--segment TEXT A segmentation criteria that can be used to apply
the alert to multiple entities.
--segment-condition TEXT When segment is specified (and therefore the alert
will cover multiple entities) this field is used
to determine when it will fire. In particular, you
have two options for segment-condition: 'ANY' (the
alert will fire when at least one of the monitored
entities satisfies the condition) and 'ALL' (the
alert will fire when all of the monitored entities
satisfy the condition).
--user-filter TEXT a boolean expression combining Sysdig Monitor
segmentation criteria that makes it possible to
reduce the scope of the alert. For example:
kubernetes.namespace.name='production' and
container.image='nginx'
--notify TEXT the type of notification you want this alert to
generate. Options are 'EMAIL', 'SNS',
'PAGER_DUTY', 'SYSDIG_DUMP'
--annotation TEXT a pair 'key=value' custom property that you can
associate to this alert for automation or
management reasons
--promql Define if the alert to be created follows the
PromQL syntax
--help Show this message and exit.
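The --severity option takes a syslog severity number rather than a name. As a convenience, a small shell helper can translate the standard syslog severity names (as defined in RFC 5424) into the corresponding integers. This is only a sketch: severity_number is a hypothetical function, not part of sdc-cli.

```shell
# Hypothetical helper: map a syslog severity name (RFC 5424) to the
# integer expected by `sdc-cli alert add --severity`.
severity_number() {
  case "$1" in
    emergency)     echo 0 ;;
    alert)         echo 1 ;;
    critical)      echo 2 ;;
    error)         echo 3 ;;
    warning)       echo 4 ;;  # the default, according to the help text above
    notice)        echo 5 ;;
    informational) echo 6 ;;
    debug)         echo 7 ;;
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Usage (requires a configured sdc-cli):
#   sdc-cli alert add --severity "$(severity_number critical)" \
#     --condition '<condition>' '<alert name>'
```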
For example, the following command adds an alert that fires when pods keep restarting (as in a CrashLoopBackOff) for at least 5 minutes:
$ sdc-cli alert add \
--description "CrashLoopBackOff detected" \
--severity 2 \
--atleast 300 \
--condition 'sum(avg(kubernetes.pod.restart.count)) > 1' \
'[Kubernetes] Pod crash/restart loop'
id: 2124028
name: [Kubernetes] Pod crash/restart loop
description: CrashLoopBackOff detected
type: MANUAL
enabled: True
severityLabel: MEDIUM
condition: sum(avg(kubernetes.pod.restart.count)) > 1
timespan: 300000000
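In the output above, --atleast 300 is reported as timespan: 300000000, which suggests that timespan is expressed in microseconds. Under that assumption (inferred from the sample output rather than from documented behavior), the conversion between the two can be sketched as follows. Both function names are hypothetical.

```shell
# Hypothetical helpers, assuming timespan = atleast (seconds) * 1,000,000.

# Convert an --atleast value in seconds to the timespan shown by `alert get`.
atleast_to_timespan() {
  echo $(( $1 * 1000000 ))
}

# Convert a timespan value back to whole seconds.
timespan_to_seconds() {
  echo $(( $1 / 1000000 ))
}

atleast_to_timespan 300        # prints 300000000
timespan_to_seconds 600000000  # prints 600
```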
Update alert
The update subcommand only accepts a JSON file, which you can obtain from the get subcommand by using the global --json option:
$ sdc-cli --json alert get '[System] Out of memory warning' > alert.json
$ vim alert.json # Edit the alert in the file
$ sdc-cli alert update alert.json
id: 2119821
name: [System] Out of memory warning
description: Out of memory warning
type: MANUAL
enabled: False
severityLabel: LOW
segmentBy: ['host.hostName']
segmentCondition:
{
"type": "ANY"
}
condition: avg(avg(memory.bytes.available)) < 419430400
timespan: 300000000
Remove an alert
You can also remove an alert by specifying its ID:
$ sdc-cli alert del 2124028
Success
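The example above removes a single alert. To remove several alerts from a script, one option is to loop over the IDs and call del once per alert. The following is a minimal sketch: the delete_alerts function and the SDC_CLI variable are hypothetical names introduced here, and the numeric check is only a guard against accidentally passing something other than an ID.

```shell
# Command to invoke; override SDC_CLI (for example with `echo`) for a dry run.
SDC_CLI="${SDC_CLI:-sdc-cli}"

# Hypothetical helper: delete each alert ID given as an argument,
# one `alert del` call per ID. Non-numeric arguments are skipped.
delete_alerts() {
  for alert_id in "$@"; do
    case "$alert_id" in
      ''|*[!0-9]*)
        echo "skipping non-numeric id: $alert_id" >&2
        continue
        ;;
    esac
    "$SDC_CLI" alert del "$alert_id"
  done
}

# Usage (requires a configured sdc-cli):
#   delete_alerts 2124028 2124029
# Dry run that only prints the arguments that would be passed:
#   SDC_CLI=echo; delete_alerts 2124028 2124029
```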