FreeBSD.software
guide·2026-04-09·9 min read

FreeBSD Monitoring Stack: Complete Production Setup

Build a complete monitoring stack on FreeBSD: Prometheus + Grafana + Alertmanager + Loki for logs, with node_exporter, ZFS dashboards, jail monitoring, and PagerDuty integration.


This guide builds a full production monitoring stack on FreeBSD: Prometheus for metrics collection, Grafana for dashboards, Alertmanager for notifications, Loki for log aggregation, and the exporters that tie it all together. Every component runs natively on FreeBSD. No Docker. No containers. Just packages, rc.conf, and configuration files.

For the basic Prometheus and Grafana setup, see Prometheus and Grafana on FreeBSD. For general monitoring concepts, see FreeBSD Server Monitoring Guide. For a comparison of monitoring tools, see Best Monitoring Tools for FreeBSD.

Architecture Overview

The monitoring stack has five layers:

  1. Exporters: Run on every monitored host. Expose metrics as HTTP endpoints.
  2. Prometheus: Scrapes metrics from exporters on a schedule. Stores time-series data. Evaluates alerting rules.
  3. Alertmanager: Receives alerts from Prometheus. Deduplicates, groups, and routes them to notification channels (email, PagerDuty, Slack).
  4. Grafana: Visualizes metrics from Prometheus and logs from Loki.
  5. Loki + Promtail: Collects and queries log data (the logging layer).

All components can run on a single FreeBSD server for small deployments (under 50 monitored hosts) or be distributed across multiple hosts for scale.

Installing the Stack

```sh
# Core components
pkg install prometheus
pkg install grafana10
pkg install alertmanager
pkg install loki
pkg install promtail

# Exporters
pkg install node_exporter
pkg install blackbox_exporter
```

Enable services:

```sh
sysrc prometheus_enable="YES"
sysrc grafana_enable="YES"
sysrc alertmanager_enable="YES"
sysrc loki_enable="YES"
sysrc promtail_enable="YES"
sysrc node_exporter_enable="YES"
```

Prometheus Configuration

The main configuration file is /usr/local/etc/prometheus.yml:

```sh
cat > /usr/local/etc/prometheus.yml << 'CONF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

rule_files:
  - "/usr/local/etc/prometheus/rules/*.yml"

scrape_configs:
  # Prometheus itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Node exporter on all hosts
  - job_name: "node"
    static_configs:
      - targets:
          - "localhost:9100"
          - "server2:9100"
          - "server3:9100"
        labels:
          env: "production"

  # Blackbox exporter (HTTP probes)
  - job_name: "blackbox-http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - "https://example.com"
          - "https://api.example.com"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
CONF
```

Storage Configuration

Prometheus stores data locally by default. For production, tune the retention:

```sh
# In /etc/rc.conf, add flags
sysrc prometheus_args="--storage.tsdb.retention.time=90d --storage.tsdb.retention.size=50GB --web.enable-lifecycle"
```
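With --web.enable-lifecycle set, the running server can re-read its configuration without a restart. A minimal check-and-reload sequence (assuming curl from `pkg install curl`, since the base fetch(1) cannot easily send POST requests):

```sh
# Validate the config first; promtool ships with the prometheus package
promtool check config /usr/local/etc/prometheus.yml

# Ask the running server to re-read prometheus.yml and rule files
curl -X POST http://localhost:9090/-/reload
```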

Storage sizing: Prometheus uses about 1-2 bytes per sample. With 500 time series scraped every 15 seconds (roughly 33 samples per second), that is on the order of 90-170 MB per month.
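That rule of thumb turns into a quick back-of-the-envelope calculation. A sketch in plain sh; the series count, interval, and 2 bytes/sample figure are illustrative assumptions:

```sh
#!/bin/sh
# Rough Prometheus TSDB sizing: samples/sec * seconds retained * bytes/sample
series=500          # active time series (assumption)
interval=15         # scrape interval in seconds
bytes_per_sample=2  # upper end of Prometheus's ~1-2 bytes/sample
retention_days=90

samples_per_sec=$(( series / interval ))
total_bytes=$(( samples_per_sec * 86400 * retention_days * bytes_per_sample ))
echo "estimated TSDB size: $(( total_bytes / 1024 / 1024 )) MiB"
```

Plug in your own scrape interval and an estimate of active series (visible at http://localhost:9090/tsdb-status) to size the disk before setting retention flags.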

Service Discovery

For environments with many hosts, use file-based service discovery instead of static configs:

```sh
# In prometheus.yml
scrape_configs:
  - job_name: "node"
    file_sd_configs:
      - files:
          - "/usr/local/etc/prometheus/targets/*.json"
        refresh_interval: 5m
```

Create target files:

```sh
mkdir -p /usr/local/etc/prometheus/targets
cat > /usr/local/etc/prometheus/targets/production.json << 'JSON'
[
  {
    "targets": ["server1:9100", "server2:9100", "server3:9100"],
    "labels": {
      "env": "production",
      "datacenter": "us-east"
    }
  }
]
JSON
```

Start Prometheus:

```sh
service prometheus start

# Verify it is running
fetch -qo - http://localhost:9090/-/healthy
# Should return "Prometheus Server is Healthy."
```

Node Exporter

node_exporter exposes system metrics (CPU, memory, disk, network) in Prometheus format.

Configuration

```sh
# FreeBSD-specific node_exporter flags
sysrc node_exporter_args="--collector.cpu --collector.meminfo --collector.diskstats --collector.netdev --collector.loadavg --collector.zfs --collector.uname"
```

The ZFS collector (--collector.zfs) is critical for FreeBSD -- it exposes pool health, ARC statistics, and dataset metrics.
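The stock ZFS collector does not cover everything (scrub status, for example). One common pattern -- sketched here, not part of the package defaults -- is the textfile collector: a cron job writes custom metrics to a directory that node_exporter reads. The directory path and metric name below are illustrative, and node_exporter must be started with `--collector.textfile.directory=/var/tmp/node_exporter`:

```sh
#!/bin/sh
# Cron script: export zpool health as a custom metric via the textfile collector.
TEXTFILE_DIR=/var/tmp/node_exporter
mkdir -p "$TEXTFILE_DIR"

{
  echo '# HELP zpool_healthy 1 if the pool reports ONLINE, 0 otherwise'
  echo '# TYPE zpool_healthy gauge'
  for pool in $(zpool list -H -o name); do
    health=$(zpool list -H -o health "$pool")
    [ "$health" = "ONLINE" ] && up=1 || up=0
    echo "zpool_healthy{pool=\"$pool\"} $up"
  done
} > "$TEXTFILE_DIR/zpool.prom.$$" && mv "$TEXTFILE_DIR/zpool.prom.$$" "$TEXTFILE_DIR/zpool.prom"
```

The write-then-rename dance avoids node_exporter scraping a half-written file.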

```sh
service node_exporter start

# Verify
fetch -qo - http://localhost:9100/metrics | head -20
```

Key Metrics for FreeBSD

Important metrics exposed by node_exporter on FreeBSD:

```shell
# CPU
node_cpu_seconds_total

# Memory
node_memory_free_bytes
node_memory_active_bytes
node_memory_inactive_bytes
node_memory_wired_bytes

# Disk
node_disk_read_bytes_total
node_disk_written_bytes_total
node_disk_io_time_seconds_total

# Network
node_network_receive_bytes_total
node_network_transmit_bytes_total

# ZFS
node_zfs_arc_size
node_zfs_arc_hits_total
node_zfs_arc_misses_total
```

Alerting Rules

Create alert rules:

```sh
mkdir -p /usr/local/etc/prometheus/rules
cat > /usr/local/etc/prometheus/rules/system.yml << 'CONF'
groups:
  - name: system
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value | printf \"%.1f\" }}% for 10+ minutes."

      - alert: HighMemoryUsage
        expr: (1 - node_memory_free_bytes / node_memory_size_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"

      - alert: HostDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Host {{ $labels.instance }} is down"

  - name: zfs
    rules:
      - alert: ZFSPoolDegraded
        expr: node_zfs_zpool_state{state!="online"} > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ZFS pool is degraded on {{ $labels.instance }}"

      - alert: ZFSARCHitRateLow
        expr: rate(node_zfs_arc_hits_total[5m]) / (rate(node_zfs_arc_hits_total[5m]) + rate(node_zfs_arc_misses_total[5m])) < 0.85
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "ZFS ARC hit rate is low on {{ $labels.instance }}"
CONF
```

Validate the rules:

```sh
promtool check rules /usr/local/etc/prometheus/rules/system.yml
```
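Beyond syntax checking, promtool can unit-test rules against synthetic series with `promtool test rules`. A sketch of a test file for the HostDown rule; the file name and the fake instance are illustrative:

```sh
cat > /usr/local/etc/prometheus/rules/system_test.yml << 'CONF'
rule_files:
  - /usr/local/etc/prometheus/rules/system.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # server2 reports down for 10 consecutive minutes
      - series: 'up{job="node", instance="server2:9100"}'
        values: '0x10'
    alert_rule_test:
      - eval_time: 5m
        alertname: HostDown
        exp_alerts:
          - exp_labels:
              severity: critical
              job: node
              instance: server2:9100
            exp_annotations:
              summary: "Host server2:9100 is down"
CONF

promtool test rules /usr/local/etc/prometheus/rules/system_test.yml
```

Running this in CI catches rule regressions before they reach the production server.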

Alertmanager Configuration

```sh
cat > /usr/local/etc/alertmanager/alertmanager.yml << 'CONF'
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: 'email'
      repeat_interval: 4h

receivers:
  - name: 'default'
    email_configs:
      - to: 'ops@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_password: 'smtp-password'
  - name: 'email'
    email_configs:
      - to: 'ops@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_password: 'smtp-password'
  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: 'your-pagerduty-integration-key'
        severity: '{{ .CommonLabels.severity }}'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
CONF
```

Validate and start:

```sh
amtool check-config /usr/local/etc/alertmanager/alertmanager.yml
service alertmanager start
```
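amtool is also the tool for managing silences during planned maintenance, so routine work does not page anyone. A sketch; the matcher, author, and duration are examples:

```sh
# Silence all alerts for server2 for two hours
amtool silence add instance="server2:9100" \
    --alertmanager.url=http://localhost:9093 \
    --author="ops" --comment="planned maintenance" --duration="2h"

# Review active silences and currently firing alerts
amtool silence query --alertmanager.url=http://localhost:9093
amtool alert query --alertmanager.url=http://localhost:9093
```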

Slack Integration

Replace or add a Slack receiver:

```sh
receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'
```

Grafana Configuration

```sh
service grafana start

# Grafana runs on port 3000 by default
# Default credentials: admin / admin
# Change the password on first login
```

Add Prometheus as a Data Source

Navigate to Grafana (http://localhost:3000) > Configuration > Data Sources > Add data source:

  • Type: Prometheus
  • URL: http://localhost:9090
  • Access: Server (default)
  • Save & Test
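Instead of clicking through the UI, the data source can be provisioned as a file, which keeps the setup reproducible. A sketch using the grafana package's provisioning tree (the file name is arbitrary):

```sh
mkdir -p /usr/local/etc/grafana/provisioning/datasources
cat > /usr/local/etc/grafana/provisioning/datasources/prometheus.yml << 'CONF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
CONF

service grafana restart
```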

FreeBSD System Dashboard

Import a dashboard or create one. Key panels for a FreeBSD system dashboard:

CPU Usage Panel (PromQL):

```shell
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```

Memory Usage Panel:

```shell
(1 - node_memory_free_bytes / node_memory_size_bytes) * 100
```

ZFS ARC Size:

```shell
node_zfs_arc_size
```

ZFS ARC Hit Rate:

```shell
rate(node_zfs_arc_hits_total[5m]) / (rate(node_zfs_arc_hits_total[5m]) + rate(node_zfs_arc_misses_total[5m]))
```

Disk I/O:

```shell
rate(node_disk_read_bytes_total[5m])
rate(node_disk_written_bytes_total[5m])
```

Network Traffic:

```shell
rate(node_network_receive_bytes_total{device!="lo0"}[5m]) * 8
rate(node_network_transmit_bytes_total{device!="lo0"}[5m]) * 8
```

Provisioning Dashboards as Code

Store dashboard JSON in /usr/local/etc/grafana/provisioning/dashboards/:

```sh
mkdir -p /usr/local/etc/grafana/provisioning/dashboards
cat > /usr/local/etc/grafana/provisioning/dashboards/default.yml << 'CONF'
apiVersion: 1
providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /usr/local/etc/grafana/provisioning/dashboards
      foldersFromFilesStructure: false
CONF
```

Loki: Log Aggregation

Loki is Prometheus-like but for logs. It indexes log metadata (labels) but not the log content, keeping storage costs low.

Loki Configuration

```sh
cat > /usr/local/etc/loki-local-config.yaml << 'CONF'
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /var/db/loki
  storage:
    filesystem:
      chunks_directory: /var/db/loki/chunks
      rules_directory: /var/db/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 720h

storage_config:
  filesystem:
    directory: /var/db/loki/chunks

compactor:
  working_directory: /var/db/loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
CONF

mkdir -p /var/db/loki/chunks /var/db/loki/rules /var/db/loki/compactor
chown -R loki:loki /var/db/loki
```
```sh
service loki start
```

Promtail Configuration

Promtail is the log shipping agent. It reads log files and sends them to Loki.

```sh
cat > /usr/local/etc/promtail-local-config.yaml << 'CONF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/db/promtail/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: syslog
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          host: myhost
          __path__: /var/log/messages

  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          host: myhost
          __path__: /var/log/auth.log

  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          host: myhost
          __path__: /var/log/nginx/access.log

  - job_name: cron
    static_configs:
      - targets:
          - localhost
        labels:
          job: cron
          host: myhost
          __path__: /var/log/cron
CONF

mkdir -p /var/db/promtail
```
```sh
service promtail start
```

Viewing Logs in Grafana

Add Loki as a data source in Grafana:

  • Type: Loki
  • URL: http://localhost:3100

Query logs in the Explore view:

```shell
{job="syslog"} |= "error"
{job="auth"} |= "Failed password"
{job="nginx"} | json | status >= 500
```
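LogQL can also turn log lines into metrics for panels and alerts. Two illustrative queries; the label values assume the Promtail jobs configured above:

```shell
# Error rate per host from syslog, over 5-minute windows
sum by (host) (rate({job="syslog"} |= "error" [5m]))

# Count of failed SSH logins in the last hour
count_over_time({job="auth"} |= "Failed password" [1h])
```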

Jail Monitoring

Monitor FreeBSD jails by running node_exporter inside each jail:

```sh
# From the host: install and enable node_exporter in the jail
pkg -j myjail install node_exporter
jexec myjail sysrc node_exporter_enable="YES"
jexec myjail service node_exporter start
```

Add jail targets to Prometheus:

```sh
# In prometheus.yml
- job_name: "jails"
  static_configs:
    - targets:
        - "10.0.0.2:9100"  # jail1
        - "10.0.0.3:9100"  # jail2
      labels:
        type: "jail"
```

Blackbox Exporter

Monitor external endpoints (HTTP, HTTPS, DNS, TCP, ICMP):

```sh
cat > /usr/local/etc/blackbox.yml << 'CONF'
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200]
      method: GET
      follow_redirects: true
      preferred_ip_protocol: "ip4"
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
    timeout: 5s
  icmp:
    prober: icmp
    timeout: 5s
  dns_lookup:
    prober: dns
    dns:
      query_name: "example.com"
      query_type: "A"
CONF

sysrc blackbox_exporter_enable="YES"
service blackbox_exporter start
```

Security

Grafana

```sh
# In /usr/local/etc/grafana.ini or grafana/grafana.ini:
# - change the default admin password
# - disable anonymous access
# - enable HTTPS
```

Prometheus

Prometheus has no built-in authentication. Put it behind a reverse proxy:

```sh
# nginx reverse proxy with basic auth
pkg install nginx

# Generate htpasswd (the htpasswd utility ships with the apache24 package)
pkg install apache24
htpasswd -c /usr/local/etc/nginx/.htpasswd admin

# nginx config snippet
cat > /usr/local/etc/nginx/conf.d/prometheus.conf << 'NGINX'
server {
    listen 9091 ssl;
    ssl_certificate /usr/local/etc/ssl/monitoring.crt;
    ssl_certificate_key /usr/local/etc/ssl/monitoring.key;

    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /usr/local/etc/nginx/.htpasswd;
        proxy_pass http://127.0.0.1:9090;
    }
}
NGINX
```

Firewall Rules

```sh
# pf rules for monitoring stack (restrict to management network)
pass in on $int_if proto tcp from $mgmt_net to self port { 9090 9093 3000 3100 }
block in on $ext_if proto tcp to self port { 9090 9093 9100 3000 3100 }
```

FAQ

What resources does the monitoring stack need?

For monitoring up to 50 hosts: 2 CPU cores, 4 GB RAM, 100 GB disk. Prometheus is the heaviest component -- memory scales with the number of active time series. Grafana and Alertmanager are lightweight.

Can I run this stack in a jail?

Yes. All components work inside FreeBSD jails. The main consideration is that node_exporter inside a jail will only see the jail's resources, not the host. Run node_exporter on the host for host-level metrics.

How long should I retain Prometheus data?

90 days is a good default for operational monitoring. For capacity planning, consider 1 year. Use Prometheus's --storage.tsdb.retention.time and --storage.tsdb.retention.size flags to control retention.

What is the difference between Loki and a traditional log system like ELK?

Loki indexes only log metadata (labels), not the full text. This makes it far cheaper to run but slower for ad-hoc full-text search. For most operational use cases (filtering by host, service, and known patterns), Loki is sufficient and far simpler than Elasticsearch.

How do I monitor ZFS pool health?

node_exporter's ZFS collector exposes pool state metrics. Create an alert rule that fires when node_zfs_zpool_state is not "online". Combine with Grafana panels showing ARC hit rate, pool capacity, and scrub status.

Can I use this stack to monitor non-FreeBSD systems?

Yes. Prometheus and Grafana are platform-agnostic. node_exporter runs on Linux, FreeBSD, and other systems. The stack on FreeBSD can scrape metrics from any host running compatible exporters.

How do I add PagerDuty integration?

Add a PagerDuty receiver in alertmanager.yml with your integration key. Route critical alerts to the PagerDuty receiver and warning alerts to email or Slack. See the Alertmanager configuration section above for the exact syntax.
