Nagios
URL: http://109.199.120.120:8089/nagios/
Credentials: nagiosadmin / coderz123
Container: coderz-nagios
Config: /opt/coderz/configs/nagios/
Nagios provides active health monitoring for the server and all stack services. Unlike Prometheus, which passively collects the metrics that services expose, Nagios probes every endpoint on a fixed interval of 1–5 minutes and flags anything that goes DOWN or CRITICAL.
What It Monitors
System Resources
| Check | Warning | Critical | Interval |
|---|---|---|---|
| CPU Load (1m avg) | > 5.0 | > 10.0 | 1 min |
| Memory Usage | > 85% | > 95% | 1 min |
| Disk Space (/) | < 20% free | < 10% free | 5 min |
| Swap Usage | < 20% free | < 10% free | 5 min |
| Running Processes | > 400 | > 600 | 5 min |
Web / Reverse Proxy
| Service | Port | Path | Interval |
|---|---|---|---|
| Nginx HTTP | 80 | / | 1 min |
| Nginx HTTPS | 443 | / | 1 min |
Monitoring Stack
| Service | Port | Path | Interval |
|---|---|---|---|
| Grafana | 3000 | /login | 1 min |
| Prometheus | 9090 | /-/healthy | 1 min |
| Node Exporter | 9100 | /metrics | 2 min |
| cAdvisor | 8080 | /healthz | 2 min |
| Nagios (self) | 8089 | /nagios/ | 2 min |
APIs
| Service | Port | Path | Interval |
|---|---|---|---|
| .NET API health | 5050 | /api/health | 1 min |
| .NET API items | 5050 | /api/items | 2 min |
| WebApp API | 8888 | /api/health | 1 min |
Logging Stack
| Service | Port | Path / Type | Interval |
|---|---|---|---|
| Kibana | 5601 | /api/status | 2 min |
| Elasticsearch | 9200 | /_cluster/health | 2 min |
| Loki | 3100 | /loki/api/v1/status/buildinfo | 2 min |
| Logstash Beats | 5044 | TCP | 2 min |
| Logstash TCP | 5000 | TCP | 2 min |
Orchestration & OTel
| Service | Port | Path / Type | Interval |
|---|---|---|---|
| Prefect UI | 4200 | /api/health | 2 min |
| OTel gRPC | 4317 | TCP | 2 min |
| OTel HTTP | 4318 | TCP | 2 min |
Database
| Service | Port | Path / Type | Interval |
|---|---|---|---|
| pgAdmin | 5080 | / | 5 min |
| PostgreSQL | 5433 | TCP | 2 min |
| Redis | 6379 | TCP | 2 min |
| Elasticsearch Transport | 9300 | TCP | 5 min |
Load Testing
| Service | Port | Path | Interval |
|---|---|---|---|
| k6 Runner | 9000 | /health | 5 min |
APISIX Gateway (Kubernetes NodePort)
| Service | Port | Path | Interval |
|---|---|---|---|
| APISIX Gateway | 30080 | / | 1 min |
| APISIX Admin API | 30180 | /apisix/admin/routes | 2 min |
| APISIX Dashboard | 30900 | / | 2 min |
| APISIX Prometheus Metrics | 30091 | /apisix/prometheus/metrics | 2 min |
| Redis Exporter | 30121 | /metrics | 2 min |
Documentation
| Service | Port | Path | Interval |
|---|---|---|---|
| Mintlify Docs | 3333 | / | 5 min |
How It Works
Nagios runs the standard Nagios plugins (check_http, check_tcp, check_load, check_disk, check_swap, check_procs) plus custom shell-based commands for host memory and CPU load. The custom commands read from the host /proc filesystem, which is mounted inside the container.
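To illustrate how a standard plugin maps onto the thresholds in the System Resources table, the sketch below wires check_swap and check_procs to coderz-server. The command names (coderz_check_swap, coderz_check_procs) and the generic-service template are assumptions made for this example, not the contents of the real configuration files. The thresholds and the 5-minute interval come from the table, and check_interval is in minutes only if interval_length is left at its default of 60 seconds.

```cfg
# Hypothetical command wrappers around two standard plugins.
# $USER1$ is the plugin directory macro, normally set in resource.cfg.
define command {
    command_name    coderz_check_swap
    command_line    $USER1$/check_swap -w $ARG1$ -c $ARG2$
}

define command {
    command_name    coderz_check_procs
    command_line    $USER1$/check_procs -w $ARG1$ -c $ARG2$
}

# Swap: WARNING below 20% free, CRITICAL below 10% free.
define service {
    use                     generic-service    ; assumed template name
    host_name               coderz-server
    service_description     Swap Usage
    check_command           coderz_check_swap!20%!10%
    check_interval          5
}

# Processes: WARNING above 400, CRITICAL above 600.
define service {
    use                     generic-service    ; assumed template name
    host_name               coderz-server
    service_description     Running Processes
    check_command           coderz_check_procs!400!600
    check_interval          5
}
```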
Accessing the UI
Services Status Page
Go to Current Status → Services to see all service checks at a glance.
- Green (OK): service is up and responding normally
- Yellow (WARNING): service is responding but a warning threshold is exceeded
- Red (CRITICAL): service is down or a critical threshold is exceeded
- Grey (UNKNOWN): check could not run
Hosts Page
Go to Current Status → Hosts to see the status of the coderz-server host.
Tactical Overview
The Tactical Overview on the Nagios home screen shows a count summary:
- Hosts UP/DOWN
- Services OK/WARNING/CRITICAL
- Scheduled downtimes
Configuration Files
All configuration is mounted from /opt/coderz/configs/nagios/ into the container:
| File | Purpose |
|---|---|
| coderz-hosts.cfg | Host definitions (coderz-server) |
| coderz-services.cfg | All service check definitions |
| coderz-commands.cfg | Custom check commands |
| cgi.cfg | Web UI authorization (authorizes the nagiosadmin user) |
Adding a New Service Check
Edit /opt/coderz/configs/nagios/coderz-services.cfg and add a service definition for the new check.
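A minimal sketch of such a definition is shown here. The service name, port 8081, and path /health are placeholders. The generic-service template and the argument order for check_http_port (port first, then path) are assumptions, so check them against the existing entries in coderz-services.cfg before copying.

```cfg
define service {
    use                     generic-service                 ; assumed template name
    host_name               coderz-server
    service_description     Example Service health          ; placeholder name
    check_command           check_http_port!8081!/health    ; placeholder port and path
    check_interval          2       ; minutes, with the default interval_length of 60
    retry_interval          1
    max_check_attempts      3
}
```

Nagios reads object definitions at startup, so the new check will most likely appear only after the coderz-nagios container is restarted or Nagios is reloaded.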
Custom Commands
The check_http_port and check_tcp_port commands are defined in coderz-commands.cfg.
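The actual definitions are not reproduced here. As a rough sketch, wrappers of this kind are usually written along the following lines, assuming the first argument is the port and the second is the URL path. The real command lines in coderz-commands.cfg may use different flags or argument order.

```cfg
# Sketch only: HTTP check against a given port and path.
# $HOSTADDRESS$ resolves to the address of the host the service is attached to.
define command {
    command_name    check_http_port
    command_line    $USER1$/check_http -H $HOSTADDRESS$ -p $ARG1$ -u $ARG2$
}

# Sketch only: plain TCP connect check against a given port.
define command {
    command_name    check_tcp_port
    command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$
}
```

With definitions like these, a service passes its arguments after the command name, separated by exclamation marks, for example check_tcp_port!6379 for the Redis check.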
How Host Metrics Work
Nagios runs inside Docker but monitors real host resources by reading from the host filesystem:
| Mount | Container Path | Used For |
|---|---|---|
| Host /proc | /hostfs/proc | CPU load, memory (/proc/meminfo, /proc/loadavg) |
| Host /sys | /hostfs/sys | System devices |
| Host / | /hostfs/root | Disk usage (check_disk -p /hostfs/root) |
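The disk check in the last row could be expressed as follows. This is a sketch that combines the /hostfs/root mount with the Disk Space thresholds listed under System Resources (20% free for WARNING, 10% free for CRITICAL). The command name coderz_check_host_disk and the generic-service template are hypothetical.

```cfg
# Point check_disk at the host root filesystem as seen from inside the container.
define command {
    command_name    coderz_check_host_disk    ; hypothetical name
    command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p /hostfs/root
}

define service {
    use                     generic-service    ; assumed template name
    host_name               coderz-server
    service_description     Disk Space (/)
    check_command           coderz_check_host_disk!20%!10%
    check_interval          5
}
```

Without the -p /hostfs/root argument, check_disk would report on the container's own filesystems rather than the host's.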
Docker Compose
Comparison with Prometheus / Grafana
| | Nagios | Prometheus + Grafana |
|---|---|---|
| Type | Active probing | Passive scraping |
| Best for | Up/down, reachability | Metrics, trends, performance |
| Alerting | Built-in, per-check | Rule-based, threshold over time |
| Dashboards | Basic status tables | Rich graphs and histograms |
| Latency data | No | Yes |