Rules

mail (last evaluation: 57.427s ago, evaluation time: 427.3us)

alert: PostfixDown
expr: node_systemd_unit_state{name="postfix.service",state="active"} == 0
for: 2m
labels:
  service: postfix
  severity: critical
annotations:
  description: Postfix mail service has been inactive for more than 2 minutes
  summary: Postfix is down on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 57.427s ago; Evaluation Time: 257.7us

alert: DovecotDown
expr: node_systemd_unit_state{name="dovecot.service",state="active"} == 0
for: 2m
labels:
  service: dovecot
  severity: critical
annotations:
  description: Dovecot IMAP service has been inactive for more than 2 minutes
  summary: Dovecot is down on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 57.427s ago; Evaluation Time: 67.63us

alert: PostfixMailQueueGrowing
expr: node_postfix_queue_size > 50
for: 15m
labels:
  service: postfix
  severity: warning
annotations:
  description: 'Mail queue has {{ $value }} messages (threshold: 50)'
  summary: Postfix mail queue growing on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 57.427s ago; Evaluation Time: 45.89us

alert: PostfixMailQueueCritical
expr: node_postfix_queue_size > 200
for: 5m
labels:
  service: postfix
  severity: critical
annotations:
  description: Mail queue has {{ $value }} messages — possible delivery failure
  summary: Postfix mail queue critical on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 57.427s ago; Evaluation Time: 40.18us
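
The two queue rules differ only in threshold and `for:` duration, so a unit test is a convenient way to confirm which one fires for a given queue size. The following is a minimal sketch for `promtool test rules`, not part of the listed configuration. The file names `mail_test.yml` and `mail.rules.yml` and the instance label `mail01:9100` are assumptions made for illustration; the rule file is assumed to contain the mail group exactly as listed.

```yaml
# mail_test.yml (hypothetical name); run with: promtool test rules mail_test.yml
rule_files:
  - mail.rules.yml          # assumed to hold the "mail" group shown above

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Queue size stays at 60 for 21 samples (0m through 20m).
      - series: 'node_postfix_queue_size{instance="mail01:9100"}'
        values: '60+0x20'
    alert_rule_test:
      # After 10 minutes the alert is still pending (for: 15m), so nothing fires.
      - eval_time: 10m
        alertname: PostfixMailQueueGrowing
        exp_alerts: []
      # After 16 minutes the 15m hold period has elapsed and the warning fires.
      - eval_time: 16m
        alertname: PostfixMailQueueGrowing
        exp_alerts:
          - exp_labels:
              instance: mail01:9100
              service: postfix
              severity: warning
            exp_annotations:
              description: 'Mail queue has 60 messages (threshold: 50)'
              summary: Postfix mail queue growing on mail01:9100
      # 60 is below the critical threshold of 200, so the critical rule stays quiet.
      - eval_time: 16m
        alertname: PostfixMailQueueCritical
        exp_alerts: []
```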

system (last evaluation: 15.343s ago, evaluation time: 7.632ms)

alert: HighCPULoad
expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
for: 5m
labels:
  severity: warning
annotations:
  description: 'CPU usage is {{ printf "%.1f" $value }}% (threshold: 85%)'
  summary: High CPU load on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 15.343s ago; Evaluation Time: 663.4us

alert: CriticalCPULoad
expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 95
for: 2m
labels:
  severity: critical
annotations:
  description: 'CPU usage is {{ printf "%.1f" $value }}% (threshold: 95%)'
  summary: Critical CPU load on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 15.342s ago; Evaluation Time: 383.6us

alert: HighMemoryUsage
expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
for: 5m
labels:
  severity: warning
annotations:
  description: 'Memory usage is {{ printf "%.1f" $value }}% (threshold: 85%)'
  summary: High memory usage on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 15.342s ago; Evaluation Time: 314.9us

alert: CriticalMemoryUsage
expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 95
for: 2m
labels:
  severity: critical
annotations:
  description: Memory usage is {{ printf "%.1f" $value }}%
  summary: Critical memory usage on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 15.342s ago; Evaluation Time: 295.7us

alert: DiskSpaceWarning
expr: (1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs"} / node_filesystem_size_bytes{fstype!~"tmpfs|fuse.lxcfs"})) * 100 > 80
for: 5m
labels:
  severity: warning
annotations:
  description: Disk {{ $labels.mountpoint }} is {{ printf "%.1f" $value }}% full
  summary: Disk space warning on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 15.342s ago; Evaluation Time: 522us

alert: DiskSpaceCritical
expr: (1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs"} / node_filesystem_size_bytes{fstype!~"tmpfs|fuse.lxcfs"})) * 100 > 90
for: 2m
labels:
  severity: critical
annotations:
  description: Disk {{ $labels.mountpoint }} is {{ printf "%.1f" $value }}% full
  summary: Critical disk space on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 15.341s ago; Evaluation Time: 514.1us

alert: DiskWillFillIn24h
expr: predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs"}[6h], 24 * 3600) < 0
for: 30m
labels:
  severity: warning
annotations:
  description: '{{ $labels.mountpoint }} is predicted to be full within 24 hours'
  summary: Disk will fill within 24h on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 15.341s ago; Evaluation Time: 1.134ms

alert: HighLoadAverage
expr: node_load15 / on (instance) count by (instance) (node_cpu_seconds_total{mode="idle"}) > 0.8
for: 10m
labels:
  severity: warning
annotations:
  description: 15-minute load average per CPU is {{ printf "%.2f" $value }}
  summary: High load average on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 15.34s ago; Evaluation Time: 275.3us

alert: ServerDown
expr: up == 0
for: 1m
labels:
  severity: critical
annotations:
  description: '{{ $labels.instance }} ({{ $labels.job }}) has been unreachable for more than 1 minute'
  summary: Instance {{ $labels.instance }} is down
State: ok; Error: none; Last Evaluation: 15.34s ago; Evaluation Time: 409.9us

alert: UnexpectedReboot
expr: node_time_seconds - node_boot_time_seconds < 300
labels:
  severity: warning
annotations:
  description: Server has been up for less than 5 minutes — possible unexpected reboot
  summary: 'Server rebooted: {{ $labels.instance }}'
State: ok; Error: none; Last Evaluation: 15.339s ago; Evaluation Time: 229us

alert: SystemdServiceFailed
expr: node_systemd_unit_state{state="failed"} == 1
for: 2m
labels:
  severity: warning
annotations:
  description: Service {{ $labels.name }} is in failed state
  summary: Systemd service failed on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 15.339s ago; Evaluation Time: 2.85ms
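
The memory rules compute a usage percentage from two gauges, which is easy to check with fixed inputs. The sketch below is a hedged example for `promtool test rules`, assuming a rule file named `system.rules.yml` that contains the system group as listed. The file names, the `instance` and `job` label values, and the byte counts are all made up for illustration. With 1 GiB available out of 8 GiB, the expression evaluates to (1 - 0.125) * 100 = 87.5, which is above the warning threshold of 85 and below the critical threshold of 95.

```yaml
# system_test.yml (hypothetical name); run with: promtool test rules system_test.yml
rule_files:
  - system.rules.yml        # assumed to hold the "system" group shown above

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # 1 GiB available of 8 GiB total, constant for 11 samples (0m through 10m).
      - series: 'node_memory_MemAvailable_bytes{instance="web01:9100",job="node"}'
        values: '1073741824+0x10'
      - series: 'node_memory_MemTotal_bytes{instance="web01:9100",job="node"}'
        values: '8589934592+0x10'
    alert_rule_test:
      # The warning rule has for: 5m, so it is firing by the 6 minute mark.
      - eval_time: 6m
        alertname: HighMemoryUsage
        exp_alerts:
          - exp_labels:
              instance: web01:9100
              job: node
              severity: warning
            exp_annotations:
              description: 'Memory usage is 87.5% (threshold: 85%)'
              summary: High memory usage on web01:9100
      # 87.5 does not exceed 95, so the critical rule does not fire.
      - eval_time: 6m
        alertname: CriticalMemoryUsage
        exp_alerts: []
```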

web (last evaluation: 22.569s ago, evaluation time: 1.011ms)

alert: NginxDown
expr: node_systemd_unit_state{name="nginx.service",state="active"} == 0
for: 1m
labels:
  service: nginx
  severity: critical
annotations:
  description: Nginx web server has been inactive for more than 1 minute
  summary: Nginx is down on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 22.569s ago; Evaluation Time: 488.1us

alert: ApacheDown
expr: node_systemd_unit_state{name="apache2.service",state="active"} == 0
for: 1m
labels:
  service: apache2
  severity: critical
annotations:
  description: Apache web server has been inactive for more than 1 minute
  summary: Apache is down on {{ $labels.instance }}
State: ok; Error: none; Last Evaluation: 22.568s ago; Evaluation Time: 194.2us

alert: SSLCertExpiringWarning
expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
for: 1h
labels:
  severity: warning
annotations:
  description: Certificate expires in {{ $value | humanizeDuration }}
  summary: 'SSL certificate expiring soon: {{ $labels.instance }}'
State: ok; Error: none; Last Evaluation: 22.568s ago; Evaluation Time: 161.8us

alert: SSLCertExpiringCritical
expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 7
for: 1h
labels:
  severity: critical
annotations:
  description: Certificate expires in {{ $value | humanizeDuration }} — renew immediately
  summary: 'SSL certificate expiring in 7 days: {{ $labels.instance }}'
State: ok; Error: none; Last Evaluation: 22.568s ago; Evaluation Time: 145.7us
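
The certificate rules subtract `time()` from an absolute expiry timestamp, so the alert value is the number of seconds remaining. The sketch below illustrates this with `promtool test rules`, where evaluation time starts at Unix time 0. It assumes a rule file named `web.rules.yml` containing the web group as listed; the file names and the `instance` and `job` label values are hypothetical. The expiry is set to 864000 (ten days after time 0), so at the two hour mark 864000 - 7200 = 856800 seconds remain, which is under 30 days but above 7 days.

```yaml
# web_test.yml (hypothetical name); run with: promtool test rules web_test.yml
rule_files:
  - web.rules.yml           # assumed to hold the "web" group shown above

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Certificate expires 10 days (864000 s) after the test's time origin.
      - series: 'probe_ssl_earliest_cert_expiry{instance="example.org:443",job="blackbox"}'
        values: '864000+0x130'
    alert_rule_test:
      # for: 1h has elapsed by 2h, and 856800 s is below 86400 * 30.
      - eval_time: 2h
        alertname: SSLCertExpiringWarning
        exp_alerts:
          - exp_labels:
              instance: example.org:443
              job: blackbox
              severity: warning
            exp_annotations:
              description: Certificate expires in 9d 22h 0m 0s
              summary: 'SSL certificate expiring soon: example.org:443'
      # 856800 s is still above 86400 * 7, so the critical rule does not fire.
      - eval_time: 2h
        alertname: SSLCertExpiringCritical
        exp_alerts: []
```

The expected description assumes `humanizeDuration` renders 856800 seconds as days, hours, minutes and seconds; if the installed Prometheus version formats it differently, promtool prints the actual annotation in its failure output.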