FreeBSD Monitoring Stack: Complete Production Setup
This guide builds a full production monitoring stack on FreeBSD: Prometheus for metrics collection, Grafana for dashboards, Alertmanager for notifications, Loki for log aggregation, and the exporters that tie it all together. Every component runs natively on FreeBSD. No Docker. No containers. Just packages, rc.conf, and configuration files.
For the basic Prometheus and Grafana setup, see Prometheus and Grafana on FreeBSD. For general monitoring concepts, see FreeBSD Server Monitoring Guide. For a comparison of monitoring tools, see Best Monitoring Tools for FreeBSD.
Architecture Overview
The monitoring stack has five layers:
- Exporters: Run on every monitored host. Expose metrics as HTTP endpoints.
- Prometheus: Scrapes metrics from exporters on a schedule. Stores time-series data. Evaluates alerting rules.
- Alertmanager: Receives alerts from Prometheus. Deduplicates, groups, and routes them to notification channels (email, PagerDuty, Slack).
- Grafana: Visualizes metrics from Prometheus and logs from Loki.
- Loki + Promtail: Collects and queries log data (the logging layer).
All components can run on a single FreeBSD server for small deployments (under 50 monitored hosts) or be distributed across multiple hosts for scale.
Installing the Stack
```sh
# Core components
pkg install prometheus
pkg install grafana10
pkg install alertmanager
pkg install loki
pkg install promtail

# Exporters
pkg install node_exporter
pkg install blackbox_exporter
```
Enable services:
```sh
sysrc prometheus_enable="YES"
sysrc grafana_enable="YES"
sysrc alertmanager_enable="YES"
sysrc loki_enable="YES"
sysrc promtail_enable="YES"
sysrc node_exporter_enable="YES"
```
Prometheus Configuration
The main configuration file is /usr/local/etc/prometheus.yml:
```sh
cat > /usr/local/etc/prometheus.yml << 'CONF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

rule_files:
  - "/usr/local/etc/prometheus/rules/*.yml"

scrape_configs:
  # Prometheus itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Node exporter on all hosts
  - job_name: "node"
    static_configs:
      - targets:
          - "localhost:9100"
          - "server2:9100"
          - "server3:9100"
        labels:
          env: "production"

  # Blackbox exporter (HTTP probes)
  - job_name: "blackbox-http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - "https://example.com"
          - "https://api.example.com"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
CONF
```
Storage Configuration
Prometheus stores data locally by default. For production, tune the retention:
```sh
# In /etc/rc.conf, add flags
sysrc prometheus_args="--storage.tsdb.retention.time=90d --storage.tsdb.retention.size=50GB --web.enable-lifecycle"
```
Storage sizing: Prometheus uses about 1-2 bytes per sample on disk after compression. With 500 time series scraped every 15 seconds, that is roughly 86 million samples per month, on the order of 170 MB; budget extra headroom for the WAL and compaction.
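The estimate is easy to reproduce; a back-of-envelope sketch in plain sh, where the per-sample figure is the assumption that drives everything:

```sh
# Rough Prometheus storage estimate (all figures are assumptions)
series=500
scrape_interval=15            # seconds
bytes_per_sample=2            # upper end of Prometheus's ~1-2 bytes/sample
seconds_per_month=$((60 * 60 * 24 * 30))
samples_per_month=$((series * seconds_per_month / scrape_interval))
mb_per_month=$((samples_per_month * bytes_per_sample / 1024 / 1024))
echo "${samples_per_month} samples, ~${mb_per_month} MB per month"
```

Scale the `series` variable to your fleet; per-host series counts vary with enabled collectors, so check `prometheus_tsdb_head_series` on a running instance for real numbers.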
Service Discovery
For environments with many hosts, use file-based service discovery instead of static configs:
```sh
# In prometheus.yml
scrape_configs:
  - job_name: "node"
    file_sd_configs:
      - files:
          - "/usr/local/etc/prometheus/targets/*.json"
        refresh_interval: 5m
```
Create target files:
```sh
mkdir -p /usr/local/etc/prometheus/targets
cat > /usr/local/etc/prometheus/targets/production.json << 'JSON'
[
  {
    "targets": ["server1:9100", "server2:9100", "server3:9100"],
    "labels": {
      "env": "production",
      "datacenter": "us-east"
    }
  }
]
JSON
```
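In larger fleets, target files are usually generated from an inventory rather than written by hand. A minimal sketch in plain sh; the host list, port, and label are placeholders:

```sh
# Hypothetical helper: build a file_sd target group from a host list
hosts="server1 server2 server3"
targets=""
for h in $hosts; do
  targets="${targets}\"${h}:9100\", "
done
targets="${targets%, }"   # trim trailing comma and space
json="[{\"targets\": [${targets}], \"labels\": {\"env\": \"production\"}}]"
echo "$json"
```

Write the output to a file under the `targets/` directory; Prometheus picks up changes on its own at the configured `refresh_interval`, with no restart needed.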
Start Prometheus:
```sh
service prometheus start

# Verify it is running
fetch -qo - http://localhost:9090/-/healthy
# Should return "Prometheus Server is Healthy."
```
Node Exporter
node_exporter exposes system metrics (CPU, memory, disk, network) in Prometheus format.
Configuration
```sh
# FreeBSD-specific node_exporter flags
sysrc node_exporter_args="--collector.cpu --collector.meminfo --collector.diskstats --collector.netdev --collector.loadavg --collector.zfs --collector.uname"
```
The ZFS collector (--collector.zfs) is critical for FreeBSD -- it exposes pool health, ARC statistics, and dataset metrics.
```sh
service node_exporter start

# Verify
fetch -qo - http://localhost:9100/metrics | head -20
```
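Metrics that node_exporter does not collect natively can be published through its textfile collector. A sketch, assuming `--collector.textfile.directory=/var/tmp/node_exporter` has been added to the flags above; the metric name and the hardcoded timestamp are illustrative only (in production the value would be parsed from `zpool status`):

```sh
# Hypothetical cron script: expose a custom metric via the textfile collector
outdir=/var/tmp/node_exporter
mkdir -p "$outdir"

# Placeholder value; a real script would derive this from "zpool status"
scrub_epoch=1735689600

# Write atomically: build a temp file, then rename into place
printf 'zfs_last_scrub_timestamp_seconds %s\n' "$scrub_epoch" > "$outdir/zfs_scrub.prom.tmp"
mv "$outdir/zfs_scrub.prom.tmp" "$outdir/zfs_scrub.prom"
cat "$outdir/zfs_scrub.prom"
```

The write-then-rename pattern matters: node_exporter may scrape at any moment, and a rename is atomic while a partial write is not.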
Key Metrics for FreeBSD
Important metrics exposed by node_exporter on FreeBSD:
```shell
# CPU
node_cpu_seconds_total

# Memory
node_memory_free_bytes
node_memory_active_bytes
node_memory_inactive_bytes
node_memory_wired_bytes

# Disk
node_disk_read_bytes_total
node_disk_written_bytes_total
node_disk_io_time_seconds_total

# Network
node_network_receive_bytes_total
node_network_transmit_bytes_total

# ZFS
node_zfs_arc_size
node_zfs_arc_hits_total
node_zfs_arc_misses_total
```
Alerting Rules
Create alert rules:
```sh
mkdir -p /usr/local/etc/prometheus/rules
cat > /usr/local/etc/prometheus/rules/system.yml << 'CONF'
groups:
  - name: system
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value | printf \"%.1f\" }}% for 10+ minutes."

      - alert: HighMemoryUsage
        expr: (1 - node_memory_free_bytes / node_memory_size_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"

      - alert: HostDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Host {{ $labels.instance }} is down"

  - name: zfs
    rules:
      - alert: ZFSPoolDegraded
        expr: node_zfs_zpool_state{state!="online"} > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ZFS pool is degraded on {{ $labels.instance }}"

      - alert: ZFSARCHitRateLow
        expr: rate(node_zfs_arc_hits_total[5m]) / (rate(node_zfs_arc_hits_total[5m]) + rate(node_zfs_arc_misses_total[5m])) < 0.85
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "ZFS ARC hit rate is low on {{ $labels.instance }}"
CONF
```
Validate the rules:
```sh
promtool check rules /usr/local/etc/prometheus/rules/system.yml
```
Alertmanager Configuration
```sh
cat > /usr/local/etc/alertmanager/alertmanager.yml << 'CONF'
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: 'email'
      repeat_interval: 4h

receivers:
  - name: 'default'
    email_configs:
      - to: 'ops@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_password: 'smtp-password'
  - name: 'email'
    email_configs:
      - to: 'ops@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_password: 'smtp-password'
  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: 'your-pagerduty-integration-key'
        severity: '{{ .CommonLabels.severity }}'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
CONF
```
Validate and start:
```sh
amtool check-config /usr/local/etc/alertmanager/alertmanager.yml
service alertmanager start
```
Slack Integration
Replace or add a Slack receiver:
```sh
receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'
```
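A receiver does nothing until a route points at it. A sketch of a route entry that sends warnings to the Slack receiver, added under the `route:` block in alertmanager.yml (adjust the match and interval to taste):

```sh
# Under route: in alertmanager.yml
  routes:
    - match:
        severity: warning
      receiver: 'slack'
      repeat_interval: 4h
```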
Grafana Configuration
```sh
service grafana start

# Grafana runs on port 3000 by default
# Default credentials: admin / admin
# Change the password on first login
```
Add Prometheus as a Data Source
Navigate to Grafana (http://localhost:3000) > Configuration > Data Sources > Add data source:
- Type: Prometheus
- URL: http://localhost:9090
- Access: Server (default)
- Save & Test
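The same data source can also be declared as code instead of clicked through the UI; Grafana reads provisioning files at startup. A sketch using Grafana's datasource provisioning format (the filename is arbitrary; the path assumes the FreeBSD package layout):

```sh
# /usr/local/etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```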
FreeBSD System Dashboard
Import a dashboard or create one. Key panels for a FreeBSD system dashboard:
CPU Usage Panel (PromQL):
```shell
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```
Memory Usage Panel:
```shell
(1 - node_memory_free_bytes / node_memory_size_bytes) * 100
```
ZFS ARC Size:
```shell
node_zfs_arc_size
```
ZFS ARC Hit Rate:
```shell
rate(node_zfs_arc_hits_total[5m]) / (rate(node_zfs_arc_hits_total[5m]) + rate(node_zfs_arc_misses_total[5m]))
```
Disk I/O:
```shell
rate(node_disk_read_bytes_total[5m])
rate(node_disk_written_bytes_total[5m])
```
Network Traffic:
```shell
rate(node_network_receive_bytes_total{device!="lo0"}[5m]) * 8
rate(node_network_transmit_bytes_total{device!="lo0"}[5m]) * 8
```
Provisioning Dashboards as Code
Store dashboard JSON in /usr/local/etc/grafana/provisioning/dashboards/:
```sh
mkdir -p /usr/local/etc/grafana/provisioning/dashboards
cat > /usr/local/etc/grafana/provisioning/dashboards/default.yml << 'CONF'
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /usr/local/etc/grafana/provisioning/dashboards
      foldersFromFilesStructure: false
CONF
```
Loki: Log Aggregation
Loki is Prometheus-like but for logs. It indexes log metadata (labels) but not the log content, keeping storage costs low.
Loki Configuration
```sh
cat > /usr/local/etc/loki-local-config.yaml << 'CONF'
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /var/db/loki
  storage:
    filesystem:
      chunks_directory: /var/db/loki/chunks
      rules_directory: /var/db/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 720h

storage_config:
  filesystem:
    directory: /var/db/loki/chunks

compactor:
  working_directory: /var/db/loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
CONF

mkdir -p /var/db/loki/chunks /var/db/loki/rules /var/db/loki/compactor
chown -R loki:loki /var/db/loki
```
```sh
service loki start
```
Promtail Configuration
Promtail is the log shipping agent. It reads log files and sends them to Loki.
```sh
cat > /usr/local/etc/promtail-local-config.yaml << 'CONF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/db/promtail/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: syslog
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          host: myhost
          __path__: /var/log/messages

  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          host: myhost
          __path__: /var/log/auth.log

  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          host: myhost
          __path__: /var/log/nginx/access.log

  - job_name: cron
    static_configs:
      - targets:
          - localhost
        labels:
          job: cron
          host: myhost
          __path__: /var/log/cron
CONF

mkdir -p /var/db/promtail
```
```sh
service promtail start
```
Viewing Logs in Grafana
Add Loki as a data source in Grafana:
- Type: Loki
- URL: http://localhost:3100
Query logs in the Explore view:
```shell
{job="syslog"} |= "error"
{job="auth"} |= "Failed password"
{job="nginx"} | json | status >= 500
```
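LogQL can also aggregate log lines into metrics, which makes logs graphable and alertable like any Prometheus series. Two sketches built on the labels configured above:

```shell
# Failed SSH logins per hour
count_over_time({job="auth"} |= "Failed password" [1h])

# Rate of nginx 5xx responses
rate({job="nginx"} | json | status >= 500 [5m])
```

These queries work in Grafana panels as well as in Loki's own ruler for alerting.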
Jail Monitoring
Monitor FreeBSD jails by running node_exporter inside each jail:
```sh
# From the host: install, enable, and start node_exporter inside the jail
pkg -j myjail install node_exporter
jexec myjail sysrc node_exporter_enable="YES"
jexec myjail service node_exporter start
```
Add jail targets to Prometheus:
```sh
# In prometheus.yml
  - job_name: "jails"
    static_configs:
      - targets:
          - "10.0.0.2:9100"   # jail1
          - "10.0.0.3:9100"   # jail2
        labels:
          type: "jail"
```
Blackbox Exporter
Monitor external endpoints (HTTP, HTTPS, DNS, TCP, ICMP):
```sh
cat > /usr/local/etc/blackbox.yml << 'CONF'
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200]
      method: GET
      follow_redirects: true
      preferred_ip_protocol: "ip4"
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
    timeout: 5s
  icmp:
    prober: icmp
    timeout: 5s
  dns_lookup:
    prober: dns
    dns:
      query_name: "example.com"
      query_type: "A"
CONF

sysrc blackbox_exporter_enable="YES"
service blackbox_exporter start
```
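Probes are only useful with alerts on their results. A sketch of rules built on the standard blackbox_exporter metrics (`probe_success` and `probe_ssl_earliest_cert_expiry`); thresholds are examples:

```sh
# In a rules file, e.g. /usr/local/etc/prometheus/rules/blackbox.yml
groups:
  - name: blackbox
    rules:
      - alert: EndpointDown
        expr: probe_success == 0
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "Probe failed for {{ $labels.instance }}"

      - alert: TLSCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 14 * 86400
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "TLS certificate for {{ $labels.instance }} expires in under 14 days"
```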
Security
Grafana
```sh
# In /usr/local/etc/grafana.ini (or grafana/grafana.ini, depending on the package):
#
# [security]        change the default admin password (admin_password)
# [auth.anonymous]  disable anonymous access (enabled = false)
# [server]          enable HTTPS (protocol = https, plus cert_file and cert_key)
```
Prometheus
Prometheus has no built-in authentication. Put it behind a reverse proxy:
```sh
# nginx reverse proxy with basic auth
pkg install nginx

# Generate htpasswd
pkg install apache24-utils   # for htpasswd
htpasswd -c /usr/local/etc/nginx/.htpasswd admin

# nginx config snippet
cat > /usr/local/etc/nginx/conf.d/prometheus.conf << 'NGINX'
server {
    listen 9091 ssl;
    ssl_certificate /usr/local/etc/ssl/monitoring.crt;
    ssl_certificate_key /usr/local/etc/ssl/monitoring.key;

    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /usr/local/etc/nginx/.htpasswd;
        proxy_pass http://127.0.0.1:9090;
    }
}
NGINX
```
Firewall Rules
```sh
# pf rules for monitoring stack (restrict to management network)
pass in on $int_if proto tcp from $mgmt_net to self port { 9090 9093 3000 3100 }
block in on $ext_if proto tcp to self port { 9090 9093 9100 3000 3100 }
```
FAQ
What resources does the monitoring stack need?
For monitoring up to 50 hosts: 2 CPU cores, 4 GB RAM, 100 GB disk. Prometheus is the heaviest component -- memory scales with the number of active time series. Grafana and Alertmanager are lightweight.
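Those numbers come from simple multiplication; a sketch where the per-host series count and per-series memory cost are rough assumptions, not measurements:

```sh
# Ballpark memory estimate for Prometheus's in-memory (head) series
hosts=50
series_per_host=1000      # typical node_exporter cardinality, assumed
bytes_per_series=8192     # rough head-memory cost per active series, assumed
total_series=$((hosts * series_per_host))
mem_mb=$((total_series * bytes_per_series / 1024 / 1024))
echo "${total_series} active series, ~${mem_mb} MB head memory"
```

Verify against a running instance with the `prometheus_tsdb_head_series` and `process_resident_memory_bytes` metrics rather than trusting the constants.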
Can I run this stack in a jail?
Yes. All components work inside FreeBSD jails. The main consideration is that node_exporter inside a jail will only see the jail's resources, not the host. Run node_exporter on the host for host-level metrics.
How long should I retain Prometheus data?
90 days is a good default for operational monitoring. For capacity planning, consider 1 year. Use Prometheus's --storage.tsdb.retention.time and --storage.tsdb.retention.size flags to control retention.
What is the difference between Loki and a traditional log system like ELK?
Loki indexes only log metadata (labels), not the full text. This makes it far cheaper to run but slower for ad-hoc full-text search. For most operational use cases (filtering by host, service, and known patterns), Loki is sufficient and far simpler than Elasticsearch.
How do I monitor ZFS pool health?
node_exporter's ZFS collector exposes pool state metrics. Create an alert rule that fires when node_zfs_zpool_state is not "online". Combine with Grafana panels showing ARC hit rate, pool capacity, and scrub status.
Can I use this stack to monitor non-FreeBSD systems?
Yes. Prometheus and Grafana are platform-agnostic. node_exporter runs on Linux, FreeBSD, and other systems. The stack on FreeBSD can scrape metrics from any host running compatible exporters.
How do I add PagerDuty integration?
Add a PagerDuty receiver in alertmanager.yml with your integration key. Route critical alerts to the PagerDuty receiver and warning alerts to email or Slack. See the Alertmanager configuration section above for the exact syntax.