Nagios

URL: http://109.199.120.120:8089/nagios/
Credentials: nagiosadmin / coderz123
Container: coderz-nagios
Config: /opt/coderz/configs/nagios/

Nagios provides active health monitoring for the server and all stack services. Unlike Prometheus (which collects metrics passively), Nagios actively probes every endpoint every 1–5 minutes and immediately flags anything that goes DOWN or CRITICAL.

What It Monitors

System Resources

| Check | Warning | Critical | Interval |
|---|---|---|---|
| CPU Load (1m avg) | > 5.0 | > 10.0 | 1 min |
| Memory Usage | > 85% | > 95% | 1 min |
| Disk Space (/) | < 20% free | < 10% free | 5 min |
| Swap Usage | < 20% free | < 10% free | 5 min |
| Running Processes | > 400 | > 600 | 5 min |
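The thresholds in this table run in two directions: load, memory, and process counts alert when a value rises above the limit, while disk and swap alert when free space falls below it. The following is a minimal sketch of that logic, not the actual plugin code; the function name classify and its "above"/"below" mode argument are hypothetical.

```shell
# classify VALUE WARN CRIT [MODE]
# MODE "above" (default): alert when VALUE exceeds a threshold (load, memory, processes)
# MODE "below": alert when VALUE drops under a threshold (percent free disk or swap)
classify() {
    awk -v v="$1" -v w="$2" -v c="$3" -v mode="${4:-above}" 'BEGIN {
        # Negate everything in "below" mode so one comparison serves both directions
        if (mode == "below") { v = -v; w = -w; c = -c }
        if (v > c)      print "CRITICAL"
        else if (v > w) print "WARNING"
        else            print "OK"
    }'
}

classify 7.2 5.0 10.0      # CPU load of 7.2 -> WARNING
classify 8 20 10 below     # 8% disk free    -> CRITICAL
```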

Web / Reverse Proxy

| Service | Port | Path | Interval |
|---|---|---|---|
| Nginx HTTP | 80 | / | 1 min |
| Nginx HTTPS | 443 | / | 1 min |

Monitoring Stack

| Service | Port | Path | Interval |
|---|---|---|---|
| Grafana | 3000 | /login | 1 min |
| Prometheus | 9090 | /-/healthy | 1 min |
| Node Exporter | 9100 | /metrics | 2 min |
| cAdvisor | 8080 | /healthz | 2 min |
| Nagios (self) | 8089 | /nagios/ | 2 min |

APIs

| Service | Port | Path | Interval |
|---|---|---|---|
| .NET API health | 5050 | /api/health | 1 min |
| .NET API items | 5050 | /api/items | 2 min |
| WebApp API | 8888 | /api/health | 1 min |

Logging Stack

| Service | Port | Path / Type | Interval |
|---|---|---|---|
| Kibana | 5601 | /api/status | 2 min |
| Elasticsearch | 9200 | /_cluster/health | 2 min |
| Loki | 3100 | /loki/api/v1/status/buildinfo | 2 min |
| Logstash Beats | 5044 | TCP | 2 min |
| Logstash TCP | 5000 | TCP | 2 min |

Orchestration & OTel

| Service | Port | Path / Type | Interval |
|---|---|---|---|
| Prefect UI | 4200 | /api/health | 2 min |
| OTel gRPC | 4317 | TCP | 2 min |
| OTel HTTP | 4318 | TCP | 2 min |

Database

| Service | Port | Path / Type | Interval |
|---|---|---|---|
| pgAdmin | 5080 | / | 5 min |
| PostgreSQL | 5433 | TCP | 2 min |
| Redis | 6379 | TCP | 2 min |
| Elasticsearch Transport | 9300 | TCP | 5 min |

Load Testing

| Service | Port | Path | Interval |
|---|---|---|---|
| k6 Runner | 9000 | /health | 5 min |

APISIX Gateway (Kubernetes NodePort)

| Service | Port | Path | Interval |
|---|---|---|---|
| APISIX Gateway | 30080 | / | 1 min |
| APISIX Admin API | 30180 | /apisix/admin/routes | 2 min |
| APISIX Dashboard | 30900 | / | 2 min |
| APISIX Prometheus Metrics | 30091 | /apisix/prometheus/metrics | 2 min |
| Redis Exporter | 30121 | /metrics | 2 min |

Documentation

| Service | Port | Path | Interval |
|---|---|---|---|
| Mintlify Docs | 3333 | / | 5 min |
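When one of these HTTP checks turns red, it can help to repeat the probe by hand from another machine. The sketch below is one way to do that with curl, using the same 10-second timeout as the Nagios commands. The helper names probe_url and probe_http are hypothetical, and curl's pass/fail rules only approximate those of check_http.

```shell
HOST="109.199.120.120"

# Build the URL for one table row: probe_url PORT PATH
probe_url() {
    printf 'http://%s:%s%s\n' "$HOST" "$1" "$2"
}

# Request the endpoint, discard the body, and print the HTTP status code.
# With -f, curl exits non-zero on connection errors and on HTTP status >= 400.
probe_http() {
    curl -fsS -o /dev/null -m 10 -w '%{http_code}\n' "$(probe_url "$1" "$2")"
}

# Example (needs network access to the server):
#   probe_http 3000 /login
```

The Nginx HTTPS row on port 443 would need an https:// URL instead, and the rows marked TCP are plain port checks that this helper does not cover.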

How It Works

Nagios actively probes every service
          │
          ▼
   Check result: OK / WARNING / CRITICAL / UNKNOWN
          │
          ├── OK → green in UI, no action
          ├── WARNING → yellow, soft alert
          ├── CRITICAL → red, hard alert + notification
          └── UNKNOWN → grey, check could not run
Checks use the standard Nagios plugin set (check_http, check_tcp, check_load, check_disk, check_swap, check_procs) plus custom shell-based commands for host memory and CPU load (reading from the host /proc filesystem mounted inside the container).
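Every plugin, standard or custom, reports its result to Nagios through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The first line of its output becomes the status text shown in the UI. The sketch below illustrates that convention; state_name and run_check are hypothetical helpers for experimenting on the command line, not part of Nagios.

```shell
# Map a plugin exit code to the Nagios state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "INVALID($1)" ;;   # codes outside 0-3 are not valid plugin results
    esac
}

# Run any plugin command and print the state its exit code represents,
# followed by the plugin's own output.
run_check() {
    output=$("$@" 2>&1) && rc=0 || rc=$?
    printf '%s: %s\n' "$(state_name "$rc")" "$output"
}

# A stand-in "plugin" that reports a warning:
run_check sh -c 'echo "disk at 91%"; exit 1'   # prints "WARNING: disk at 91%"
```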

Accessing the UI

Services Status Page

Go to Current Status → Services to see all service checks at a glance.
  • Green (OK) — service is up and responding normally
  • Yellow (WARNING) — service is responding but threshold exceeded
  • Red (CRITICAL) — service is down or threshold critically exceeded
  • Grey (UNKNOWN) — check could not run

Hosts Page

Go to Current Status → Hosts to see the status of the coderz-server host.

Tactical Overview

The Tactical Overview on the Nagios home screen shows a count summary:
  • Hosts UP/DOWN
  • Services OK/WARNING/CRITICAL
  • Scheduled downtimes

Configuration Files

All configuration is mounted from /opt/coderz/configs/nagios/ into the container:
| File | Purpose |
|---|---|
| coderz-hosts.cfg | Host definitions (coderz-server) |
| coderz-services.cfg | All service check definitions |
| coderz-commands.cfg | Custom check commands |
| cgi.cfg | Web UI authorization (authorizes nagiosadmin user) |

Adding a New Service Check

Edit /opt/coderz/configs/nagios/coderz-services.cfg and add:
define service {
    host_name               coderz-server
    service_description     HTTP - My New Service
    check_command           check_http_port!PORT!/PATH
    check_interval          1
    retry_interval          1
    max_check_attempts      3
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    contact_groups          admins
}
Then restart Nagios:
docker compose -f /opt/coderz/docker-compose.yml restart nagios
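A syntax error in a .cfg file can stop Nagios from starting, so it may be worth running its built-in configuration check (nagios -v) before the restart. The sketch below wraps both steps. It assumes the binary and main config live at /opt/nagios/bin/nagios and /opt/nagios/etc/nagios.cfg inside the image, which this page does not confirm; the function name and the DOCKER dry-run variable are hypothetical.

```shell
# Set DOCKER="echo docker" to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

validate_and_restart() {
    # nagios -v parses the full configuration and exits non-zero on errors.
    # ASSUMPTION: these two paths match the image in use; adjust if they differ.
    if $DOCKER exec coderz-nagios /opt/nagios/bin/nagios -v /opt/nagios/etc/nagios.cfg; then
        $DOCKER compose -f /opt/coderz/docker-compose.yml restart nagios
    else
        echo "Config check failed; Nagios was not restarted" >&2
        return 1
    fi
}

# Usage, after editing coderz-services.cfg:
#   validate_and_restart
```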

Custom Commands

The check_http_port and check_tcp_port commands are defined in coderz-commands.cfg:
# HTTP check on a specific port and path
define command {
    command_name    check_http_port
    command_line    $USER1$/check_http -H 109.199.120.120 -p $ARG1$ -u $ARG2$ -t 10
}

# TCP port check
define command {
    command_name    check_tcp_port
    command_line    $USER1$/check_tcp -H 109.199.120.120 -p $ARG1$ -t 10
}
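In a service definition, the check_command value is split on "!": the first part names the command, and the remaining parts fill $ARG1$, $ARG2$, and so on. The sketch below prints the command line that results for the two commands above, which can then be run by hand inside the container. The function name is hypothetical, and /opt/nagios/libexec as the value of $USER1$ is an assumption about the image.

```shell
# PLUGIN_DIR stands in for the $USER1$ macro (ASSUMPTION: /opt/nagios/libexec).
PLUGIN_DIR="${PLUGIN_DIR:-/opt/nagios/libexec}"

# expand_check_command 'name!ARG1!ARG2'
expand_check_command() {
    spec=$1
    name=${spec%%!*}     # text before the first "!"
    rest=${spec#*!}      # everything after the first "!"
    arg1=${rest%%!*}     # $ARG1$: the port
    arg2=${rest#*!}      # $ARG2$: the path (used by HTTP checks only)

    case "$name" in
        check_http_port)
            printf '%s/check_http -H 109.199.120.120 -p %s -u %s -t 10\n' \
                "$PLUGIN_DIR" "$arg1" "$arg2" ;;
        check_tcp_port)
            printf '%s/check_tcp -H 109.199.120.120 -p %s -t 10\n' \
                "$PLUGIN_DIR" "$arg1" ;;
        *)
            echo "unknown command: $name" >&2
            return 1 ;;
    esac
}

expand_check_command 'check_http_port!3000!/login'
expand_check_command 'check_tcp_port!6379'
```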

How Host Metrics Work

Nagios runs inside Docker but monitors real host resources by reading from the host filesystem:
| Mount | Container Path | Used For |
|---|---|---|
| Host /proc | /hostfs/proc | CPU load, memory (/proc/meminfo, /proc/loadavg) |
| Host /sys | /hostfs/sys | System devices |
| Host / | /hostfs/root | Disk usage (check_disk -p /hostfs/root) |
This avoids the need for NRPE agents on the host while still giving accurate system readings.
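To illustrate the idea, here is a minimal sketch of a memory check that reads the host's meminfo through the /hostfs mount and follows the plugin exit-code convention. It is not the actual command from coderz-commands.cfg; the function name and the formula (MemTotal minus MemAvailable) are assumptions, and the default thresholds are taken from the System Resources table.

```shell
# Usage: check_host_memory [MEMINFO_PATH] [WARN_PCT] [CRIT_PCT]
check_host_memory() {
    meminfo="${1:-/hostfs/proc/meminfo}"
    warn="${2:-85}"
    crit="${3:-95}"

    if [ ! -r "$meminfo" ]; then
        echo "MEMORY UNKNOWN - cannot read $meminfo"
        return 3
    fi

    # used% = (MemTotal - MemAvailable) * 100 / MemTotal
    # awk's exit status becomes the function's return code (the plugin state).
    awk -v warn="$warn" -v crit="$crit" '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END {
            if (total <= 0 || avail == "") {
                print "MEMORY UNKNOWN - MemTotal/MemAvailable not found"
                exit 3
            }
            used = (total - avail) * 100 / total
            if (used > crit)      { state = "CRITICAL"; rc = 2 }
            else if (used > warn) { state = "WARNING";  rc = 1 }
            else                  { state = "OK";       rc = 0 }
            printf "MEMORY %s - %.1f%% used\n", state, used
            exit rc
        }' "$meminfo"
}
```

Because the path is a parameter, the same function can be tried outside the container by pointing it at /proc/meminfo on any Linux host.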

Docker Compose

nagios:
  image: jasonrivers/nagios:latest
  container_name: coderz-nagios
  ports:
    - "8089:80"
  environment:
    - NAGIOSADMIN_USER=nagiosadmin
    - NAGIOSADMIN_PASS=coderz123
  volumes:
    - ./configs/nagios:/etc/nagios4/conf.d/coderz:ro
    - ./configs/nagios/cgi.cfg:/opt/nagios/etc/cgi.cfg:ro
    - /proc:/hostfs/proc:ro
    - /sys:/hostfs/sys:ro
    - /:/hostfs/root:ro
  networks:
    - coderz-net

Comparison with Prometheus / Grafana

| | Nagios | Prometheus + Grafana |
|---|---|---|
| Type | Active probing | Passive scraping |
| Best for | Up/down, reachability | Metrics, trends, performance |
| Alerting | Built-in, per-check | Rule-based, threshold over time |
| Dashboards | Basic status tables | Rich graphs and histograms |
| Latency data | No | Yes |
Use Nagios to know if something is up or down. Use Grafana to understand why it’s slow or degraded.