annotate rules/rules_main.yaml @ 9:17db5e8e7a2f

big rules and scrape config updates
author drewp@bigasterisk.com
date Sun, 04 Dec 2022 02:08:08 -0800
parents config/rules_main.yaml@1eb6e6a2b9b6
children b6720e379d5b
rev   line source
9
17db5e8e7a2f big rules and scrape config updates
drewp@bigasterisk.com
parents: 4
diff changeset
1 groups:
2 # docs: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
3 # "Whenever the alert expression results in one or more vector
4 # elements at a given point in time, the alert counts as active for
5 # these elements' label sets."
4
1eb6e6a2b9b6 version control configs finally; use configmaps to present them to VM
drewp@bigasterisk.com
parents:
diff changeset
6
9
7 # also https://www.metricfire.com/blog/top-5-prometheus-alertmanager-gotchas/#Missing-metrics
8 #
9 # any presence of starlette_request_duration_seconds_created{app_name="starlette",method="GET",path="/",status_code="200"} 1.6460176156784086e+09 means someone forgot to set app name
4
10
9
11 # - name: webcam
12 # rules:
4
13 # waiting for twinscam revival
14 # - alert: twinscam_not_reporting
15 # expr: absent(cam_pipeline_state{job="webcam-record-twinscam"})
16 # for: 2m
17 # labels:
18 # severity: losingData
19 # annotations:
20 # summary: "webcam-record-twinscam is not reporting metrics {{ $labels }}"
21
22 # - alert: cam_garagehall_not_reporting
23 # expr: absent(cam_pipeline_state{job="webcam-record-garagehall"})
24 # for: 2m
25 # labels:
26 # severity: losingData
27 # annotations:
9
28 # # summary: "webcam-record-garagehall is not reporting metrics {{ $labels }}"
4
29
9
30 # - alert: cam_pipeline_stopped
31 # expr: sum without (instance) (cam_pipeline_state{cam_pipeline_state="playing"}) < 1
32 # for: 10m
33 # labels:
34 # severity: losingData
35 # annotations:
36 # summary: "webcam-record gst pipeline is not state=playing {{ $labels }}"
4
37
9
38 # - alert: cam_not_advancing
39 # expr: rate(cam_stream_bytes{element="splitmux"}[3m]) < 0.2
40 # for: 10m
41 # labels:
42 # severity: losingData
43 # annotations:
44 # summary: "cam output bytes is advancing too slowly. {{ $labels }}"
4
45
9
46 # - alert: webcam_indexer_stalled
47 # expr: rate(webcam_indexer_update_count{job="webcam-indexer"}[5m]) < .01
48 # for: 10m
49 # labels:
50 # severity: webcamUsersAffected
51 # annotations:
52 # summary: "webcam indexer update loop is stalled"
4
53
54 - name: Outages
55 rules:
56 - alert: powereagleStalled
57 expr: rate(house_power_w[100m]) == 0
58 for: 0m
59 labels:
60 severity: losingData
61 annotations:
62 summary: "power eagle data stalled"
63 description: "logs at https://bigasterisk.com/k/clusters/local/namespaces/default/deployments/power-eagle/logs"
64
65 - alert: powereagleAbsent
66 expr: absent_over_time(house_power_w[5m])
67 for: 2m
68 labels:
69 severity: losingData
70 annotations:
71 summary: "power eagle data missing"
72 description: "logs at https://bigasterisk.com/k/clusters/local/namespaces/default/deployments/power-eagle/logs"
73
9
74 # - alert: wifi_scrape_errors
75 # expr: rate(poll_errors_total{job="wifi"}[2m]) > .1
76 # labels:
77 # severity: houseUsersAffected
78 # annotations:
79 # summary: "errors getting wifi users list"
4
80
9
81 # - alert: absent_mitmproxy
82 # expr: absent(process_resident_memory_bytes{job="mitmproxy"})
83 # labels:
84 # severity: houseUsersAffected
85 # annotations:
86 # summary: "mitmproxy metrics not responding. See https://bigasterisk.com/grafana/d/ix3hMAdMk/webfilter?orgId=1&from=now-12h&to=now and https://bigasterisk.com/k/clusters/local/namespaces/default/deployments/mitmproxy (metrics actually come from webfilter.py plugin)"
4
87
88 - alert: absent_zigbee_dash
89 expr: absent(container_last_seen{container="zigbee2mqtt-dash"})
90
91 - alert: net_routes_sync
9
92 expr: rate(starlette_request_duration_seconds_count{app_name="net_routes",path="/routes"}[5m]) < 1/70
93 for: 10m
4
94 labels:
95 severity: houseUsersAffected
96 annotations:
9
97 summary: "net_routes is not getting regular updates"
4
98
99
100 - name: alerts
101 rules:
9
102 - {alert: housePower, for: 24h, labels: {severity: waste}, expr: "house_power_w > 4000", annotations: {summary: "house power usage over 4KW {{ $labels }}"}}
103 - {alert: disk1, for: 20m, labels: {severity: warning}, expr: 'disk_free{path=~"/(d[1-9])?"} < 20G', annotations: {summary: "low disk_free {{ $labels }}"}}
104 - {alert: disk2, for: 20m, labels: {severity: warning}, expr: 'disk_free{path="/stor6/my"} < 100G', annotations: {summary: "low disk_free {{ $labels }}"}}
105 - {alert: disk3, for: 20m, labels: {severity: warning}, expr: 'round(increase(disk_used{fstype="zfs",path=~"^/stor6.*"}[1w]) / 1M) > 500', annotations: {summary: "high mb/week on zfs dir {{ $labels }}"}}
106 - {alert: oom, for: 1m, labels: {severity: warning}, expr: 'predict_linear(mem_free[5m], 5m) / 1M < 100', annotations: {summary: "host about to run OOM {{ $labels }}"}}
107 - {alert: high_logging, for: 20m, labels: {severity: waste}, expr: 'sum by (container) (rate(kubelet_container_log_filesystem_used_bytes[30m])) > 30000', annotations: {summary: "high log output rate {{ $labels }}"}}
108 - {alert: stale_process, for: 1d, labels: {severity: dataRisk}, expr: 'round((time() - filestat_modification_time/1e9) / 86400) > 14', annotations: {summary: "process time is old {{ $labels }}"}}
109 - {alert: starlette, for: 1m, labels: {severity: fix}, expr: 'starlette_request_duration_seconds_created{app_name="starlette"}', annotations: {summary: "set starlette app name {{ $labels }}"}}
4
110 - alert: ssl_certs_expiring_soon
111 expr: min((min_over_time(probe_ssl_earliest_cert_expiry[1d])-time())/86400) < 10
112 labels:
9
113 severity: warning
4
114 annotations:
115 summary: "cert expiring soon. See https://bigasterisk.com/grafana/d/z1YtDa3Gz/certs?orgId=1\nVALUE = {{ $value }}\n LABELS = {{ $labels }}"