FreeBSD Monitoring Stack: Complete Production Setup
This guide builds a full production monitoring stack on FreeBSD: Prometheus for metrics collection, Grafana for dashboards, Alertmanager for notifications, Loki for log aggregation, and the exporters that tie it all together. Every component runs natively on FreeBSD. No Docker. No containers. Just packages, rc.conf, and configuration files.
For the basic Prometheus and Grafana setup, see Prometheus and Grafana on FreeBSD. For general monitoring concepts, see FreeBSD Server Monitoring Guide. For a comparison of monitoring tools, see Best Monitoring Tools for FreeBSD.
Architecture Overview
The monitoring stack has five layers:
- Exporters: Run on every monitored host. Expose metrics as HTTP endpoints.
- Prometheus: Scrapes metrics from exporters on a schedule. Stores time-series data. Evaluates alerting rules.
- Alertmanager: Receives alerts from Prometheus. Deduplicates, groups, and routes them to notification channels (email, PagerDuty, Slack).
- Grafana: Visualizes metrics from Prometheus and logs from Loki.
- Loki + Promtail: Collects and queries log data (the logging layer).
All components can run on a single FreeBSD server for small deployments (under 50 monitored hosts) or be distributed across multiple hosts for scale.
Installing the Stack
```sh
# Core components
pkg install prometheus
pkg install grafana10
pkg install alertmanager
pkg install loki
pkg install promtail

# Exporters
pkg install node_exporter
pkg install blackbox_exporter
```
Enable services:
```sh
sysrc prometheus_enable="YES"
sysrc grafana_enable="YES"
sysrc alertmanager_enable="YES"
sysrc loki_enable="YES"
sysrc promtail_enable="YES"
sysrc node_exporter_enable="YES"
```
Prometheus Configuration
The main configuration file is /usr/local/etc/prometheus.yml:
```sh
cat > /usr/local/etc/prometheus.yml << 'CONF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 10s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

rule_files:
  - "/usr/local/etc/prometheus/rules/*.yml"

scrape_configs:
  # Prometheus itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Node exporter on all hosts
  - job_name: "node"
    static_configs:
      - targets:
          - "localhost:9100"
          - "server2:9100"
          - "server3:9100"
        labels:
          env: "production"

  # Blackbox exporter (HTTP probes)
  - job_name: "blackbox-http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - "https://example.com"
          - "https://api.example.com"
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
CONF
```
Storage Configuration
Prometheus stores data locally by default. For production, tune the retention:
```sh
# In /etc/rc.conf, add flags
sysrc prometheus_args="--storage.tsdb.retention.time=90d --storage.tsdb.retention.size=50GB --web.enable-lifecycle"
```
Storage sizing: Prometheus uses about 1-2 bytes per sample on disk after compression. With 500 time series scraped every 15 seconds, that is roughly 86 million samples per month, on the order of 170 MB; budget extra headroom for the WAL and compaction.
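The estimate is easy to reproduce; a back-of-envelope sketch in plain sh, where the per-sample figure is the assumption that drives everything:

```sh
# Rough Prometheus storage estimate (all figures are assumptions)
series=500
scrape_interval=15            # seconds
bytes_per_sample=2            # upper end of Prometheus's ~1-2 bytes/sample
seconds_per_month=$((60 * 60 * 24 * 30))
samples_per_month=$((series * seconds_per_month / scrape_interval))
mb_per_month=$((samples_per_month * bytes_per_sample / 1024 / 1024))
echo "${samples_per_month} samples, ~${mb_per_month} MB per month"
```

Scale the `series` variable to your fleet; per-host series counts vary with enabled collectors, so check `prometheus_tsdb_head_series` on a running instance for real numbers.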
Service Discovery
For environments with many hosts, use file-based service discovery instead of static configs:
```sh
# In prometheus.yml
scrape_configs:
  - job_name: "node"
    file_sd_configs:
      - files:
          - "/usr/local/etc/prometheus/targets/*.json"
        refresh_interval: 5m
```
Create target files:
```sh
mkdir -p /usr/local/etc/prometheus/targets
cat > /usr/local/etc/prometheus/targets/production.json << 'JSON'
[
  {
    "targets": ["server1:9100", "server2:9100", "server3:9100"],
    "labels": {
      "env": "production",
      "datacenter": "us-east"
    }
  }
]
JSON
```
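In larger fleets, target files are usually generated from an inventory rather than written by hand. A minimal sketch in plain sh; the host list, port, and label are placeholders:

```sh
# Hypothetical helper: build a file_sd target group from a host list
hosts="server1 server2 server3"
targets=""
for h in $hosts; do
  targets="${targets}\"${h}:9100\", "
done
targets="${targets%, }"   # trim trailing comma and space
json="[{\"targets\": [${targets}], \"labels\": {\"env\": \"production\"}}]"
echo "$json"
```

Write the output to a file under the `targets/` directory; Prometheus picks up changes on its own at the configured `refresh_interval`, with no restart needed.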
Start Prometheus:
```sh
service prometheus start

# Verify it is running
fetch -qo - http://localhost:9090/-/healthy
# Should return "Prometheus Server is Healthy."
```
Node Exporter
node_exporter exposes system metrics (CPU, memory, disk, network) in Prometheus format.
Configuration
```sh
# FreeBSD-specific node_exporter flags
sysrc node_exporter_args="--collector.cpu --collector.meminfo --collector.diskstats --collector.netdev --collector.loadavg --collector.zfs --collector.uname"
```
The ZFS collector (--collector.zfs) is critical for FreeBSD -- it exposes pool health, ARC statistics, and dataset metrics.
```sh
service node_exporter start

# Verify
fetch -qo - http://localhost:9100/metrics | head -20
```
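Metrics that node_exporter does not collect natively can be published through its textfile collector. A sketch, assuming `--collector.textfile.directory=/var/tmp/node_exporter` has been added to the flags above; the metric name and the hardcoded timestamp are illustrative only (in production the value would be parsed from `zpool status`):

```sh
# Hypothetical cron script: expose a custom metric via the textfile collector
outdir=/var/tmp/node_exporter
mkdir -p "$outdir"

# Placeholder value; a real script would derive this from "zpool status"
scrub_epoch=1735689600

# Write atomically: build a temp file, then rename into place
printf 'zfs_last_scrub_timestamp_seconds %s\n' "$scrub_epoch" > "$outdir/zfs_scrub.prom.tmp"
mv "$outdir/zfs_scrub.prom.tmp" "$outdir/zfs_scrub.prom"
cat "$outdir/zfs_scrub.prom"
```

The write-then-rename pattern matters: node_exporter may scrape at any moment, and a rename is atomic while a partial write is not.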
Key Metrics for FreeBSD
Important metrics exposed by node_exporter on FreeBSD:
```shell
# CPU
node_cpu_seconds_total

# Memory
node_memory_free_bytes
node_memory_active_bytes
node_memory_inactive_bytes
node_memory_wired_bytes

# Disk
node_disk_read_bytes_total
node_disk_written_bytes_total
node_disk_io_time_seconds_total

# Network
node_network_receive_bytes_total
node_network_transmit_bytes_total

# ZFS
node_zfs_arc_size
node_zfs_arc_hits_total
node_zfs_arc_misses_total
```
Alerting Rules
Create alert rules:
```sh
mkdir -p /usr/local/etc/prometheus/rules
cat > /usr/local/etc/prometheus/rules/system.yml << 'CONF'
groups:
  - name: system
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value | printf \"%.1f\" }}% for 10+ minutes."

      - alert: HighMemoryUsage
        expr: (1 - node_memory_free_bytes / node_memory_size_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"

      - alert: HostDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Host {{ $labels.instance }} is down"

  - name: zfs
    rules:
      - alert: ZFSPoolDegraded
        expr: node_zfs_zpool_state{state!="online"} > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "ZFS pool is degraded on {{ $labels.instance }}"

      - alert: ZFSARCHitRateLow
        expr: rate(node_zfs_arc_hits_total[5m]) / (rate(node_zfs_arc_hits_total[5m]) + rate(node_zfs_arc_misses_total[5m])) < 0.85
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "ZFS ARC hit rate is low on {{ $labels.instance }}"
CONF
```
Validate the rules:
```sh
promtool check rules /usr/local/etc/prometheus/rules/system.yml
```
Alertmanager Configuration
```sh
cat > /usr/local/etc/alertmanager/alertmanager.yml << 'CONF'
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: 'email'
      repeat_interval: 4h

receivers:
  - name: 'default'
    email_configs:
      - to: 'ops@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_password: 'smtp-password'
  - name: 'email'
    email_configs:
      - to: 'ops@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_password: 'smtp-password'
  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: 'your-pagerduty-integration-key'
        severity: '{{ .CommonLabels.severity }}'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
CONF
```
Validate and start:
```sh
amtool check-config /usr/local/etc/alertmanager/alertmanager.yml
service alertmanager start
```
Slack Integration
Replace or add a Slack receiver:
```sh
receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        title: '{{ .CommonAnnotations.summary }}'
        text: '{{ .CommonAnnotations.description }}'
```
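A receiver does nothing until a route points at it. A sketch of a route entry that sends warnings to the Slack receiver, added under the `route:` block in alertmanager.yml (adjust the match and interval to taste):

```sh
# Under route: in alertmanager.yml
  routes:
    - match:
        severity: warning
      receiver: 'slack'
      repeat_interval: 4h
```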
Grafana Configuration
```sh
service grafana start

# Grafana runs on port 3000 by default
# Default credentials: admin / admin
# Change the password on first login
```
Add Prometheus as a Data Source
Navigate to Grafana (http://localhost:3000) > Configuration > Data Sources > Add data source:
- Type: Prometheus
- URL: http://localhost:9090
- Access: Server (default)
- Save & Test
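The same data source can also be declared as code instead of clicked through the UI; Grafana reads provisioning files at startup. A sketch using Grafana's datasource provisioning format (the filename is arbitrary; the path assumes the FreeBSD package layout):

```sh
# /usr/local/etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```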
FreeBSD System Dashboard
Import a dashboard or create one. Key panels for a FreeBSD system dashboard:
CPU Usage Panel (PromQL):
```shell
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```
Memory Usage Panel:
```shell
(1 - node_memory_free_bytes / node_memory_size_bytes) * 100
```
ZFS ARC Size:
```shell
node_zfs_arc_size
```
ZFS ARC Hit Rate:
```shell
rate(node_zfs_arc_hits_total[5m]) / (rate(node_zfs_arc_hits_total[5m]) + rate(node_zfs_arc_misses_total[5m]))
```
Disk I/O:
```shell
rate(node_disk_read_bytes_total[5m])
rate(node_disk_written_bytes_total[5m])
```
Network Traffic:
```shell
rate(node_network_receive_bytes_total{device!="lo0"}[5m]) * 8
rate(node_network_transmit_bytes_total{device!="lo0"}[5m]) * 8
```
Provisioning Dashboards as Code
Store dashboard JSON in /usr/local/etc/grafana/provisioning/dashboards/:
```sh
mkdir -p /usr/local/etc/grafana/provisioning/dashboards
cat > /usr/local/etc/grafana/provisioning/dashboards/default.yml << 'CONF'
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /usr/local/etc/grafana/provisioning/dashboards
      foldersFromFilesStructure: false
CONF
```
Loki: Log Aggregation
Loki is Prometheus-like but for logs. It indexes log metadata (labels) but not the log content, keeping storage costs low.
Loki Configuration
```sh
cat > /usr/local/etc/loki-local-config.yaml << 'CONF'
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /var/db/loki
  storage:
    filesystem:
      chunks_directory: /var/db/loki/chunks
      rules_directory: /var/db/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

limits_config:
  retention_period: 720h

storage_config:
  filesystem:
    directory: /var/db/loki/chunks

compactor:
  working_directory: /var/db/loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
CONF

mkdir -p /var/db/loki/chunks /var/db/loki/rules /var/db/loki/compactor
chown -R loki:loki /var/db/loki
```
```sh
service loki start
```
Promtail Configuration
Promtail is the log shipping agent. It reads log files and sends them to Loki.
```sh
cat > /usr/local/etc/promtail-local-config.yaml << 'CONF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/db/promtail/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: syslog
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          host: myhost
          __path__: /var/log/messages

  - job_name: auth
    static_configs:
      - targets:
          - localhost
        labels:
          job: auth
          host: myhost
          __path__: /var/log/auth.log

  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          host: myhost
          __path__: /var/log/nginx/access.log

  - job_name: cron
    static_configs:
      - targets:
          - localhost
        labels:
          job: cron
          host: myhost
          __path__: /var/log/cron
CONF

mkdir -p /var/db/promtail
```
```sh
service promtail start
```
Viewing Logs in Grafana
Add Loki as a data source in Grafana:
- Type: Loki
- URL: http://localhost:3100
Query logs in the Explore view:
```shell
{job="syslog"} |= "error"
{job="auth"} |= "Failed password"
{job="nginx"} | json | status >= 500
```
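LogQL can also aggregate log lines into metrics, which makes logs graphable and alertable like any Prometheus series. Two sketches built on the labels configured above:

```shell
# Failed SSH logins per hour
count_over_time({job="auth"} |= "Failed password" [1h])

# Rate of nginx 5xx responses
rate({job="nginx"} | json | status >= 500 [5m])
```

These queries work in Grafana panels as well as in Loki's own ruler for alerting.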
Jail Monitoring
Monitor FreeBSD jails by running node_exporter inside each jail:
```sh
# From the host: install, enable, and start node_exporter inside the jail
pkg -j myjail install node_exporter
jexec myjail sysrc node_exporter_enable="YES"
jexec myjail service node_exporter start
```
Add jail targets to Prometheus:
```sh
# In prometheus.yml
  - job_name: "jails"
    static_configs:
      - targets:
          - "10.0.0.2:9100"   # jail1
          - "10.0.0.3:9100"   # jail2
        labels:
          type: "jail"
```
Blackbox Exporter
Monitor external endpoints (HTTP, HTTPS, DNS, TCP, ICMP):
```sh
cat > /usr/local/etc/blackbox.yml << 'CONF'
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200]
      method: GET
      follow_redirects: true
      preferred_ip_protocol: "ip4"
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
    timeout: 5s
  icmp:
    prober: icmp
    timeout: 5s
  dns_lookup:
    prober: dns
    dns:
      query_name: "example.com"
      query_type: "A"
CONF

sysrc blackbox_exporter_enable="YES"
service blackbox_exporter start
```
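Probes are only useful with alerts on their results. A sketch of rules built on the standard blackbox_exporter metrics (`probe_success` and `probe_ssl_earliest_cert_expiry`); thresholds are examples:

```sh
# In a rules file, e.g. /usr/local/etc/prometheus/rules/blackbox.yml
groups:
  - name: blackbox
    rules:
      - alert: EndpointDown
        expr: probe_success == 0
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "Probe failed for {{ $labels.instance }}"

      - alert: TLSCertExpiringSoon
        expr: probe_ssl_earliest_cert_expiry - time() < 14 * 86400
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "TLS certificate for {{ $labels.instance }} expires in under 14 days"
```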
Security
Grafana
```sh
# In /usr/local/etc/grafana.ini (or grafana/grafana.ini, depending on the package):
#
# [security]        change the default admin password (admin_password)
# [auth.anonymous]  disable anonymous access (enabled = false)
# [server]          enable HTTPS (protocol = https, plus cert_file and cert_key)
```
Prometheus
Prometheus has no built-in authentication. Put it behind a reverse proxy:
```sh
# nginx reverse proxy with basic auth
pkg install nginx

# Generate htpasswd
pkg install apache24-utils   # for htpasswd
htpasswd -c /usr/local/etc/nginx/.htpasswd admin

# nginx config snippet
cat > /usr/local/etc/nginx/conf.d/prometheus.conf << 'NGINX'
server {
    listen 9091 ssl;
    ssl_certificate /usr/local/etc/ssl/monitoring.crt;
    ssl_certificate_key /usr/local/etc/ssl/monitoring.key;

    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /usr/local/etc/nginx/.htpasswd;
        proxy_pass http://127.0.0.1:9090;
    }
}
NGINX
```
Firewall Rules
```sh
# pf rules for monitoring stack (restrict to management network)
pass in on $int_if proto tcp from $mgmt_net to self port { 9090 9093 3000 3100 }
block in on $ext_if proto tcp to self port { 9090 9093 9100 3000 3100 }
```
FAQ
What resources does the monitoring stack need?
For monitoring up to 50 hosts: 2 CPU cores, 4 GB RAM, 100 GB disk. Prometheus is the heaviest component -- memory scales with the number of active time series. Grafana and Alertmanager are lightweight.
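Those numbers come from simple multiplication; a sketch where the per-host series count and per-series memory cost are rough assumptions, not measurements:

```sh
# Ballpark memory estimate for Prometheus's in-memory (head) series
hosts=50
series_per_host=1000      # typical node_exporter cardinality, assumed
bytes_per_series=8192     # rough head-memory cost per active series, assumed
total_series=$((hosts * series_per_host))
mem_mb=$((total_series * bytes_per_series / 1024 / 1024))
echo "${total_series} active series, ~${mem_mb} MB head memory"
```

Verify against a running instance with the `prometheus_tsdb_head_series` and `process_resident_memory_bytes` metrics rather than trusting the constants.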
Can I run this stack in a jail?
Yes. All components work inside FreeBSD jails. The main consideration is that node_exporter inside a jail will only see the jail's resources, not the host. Run node_exporter on the host for host-level metrics.
How long should I retain Prometheus data?
90 days is a good default for operational monitoring. For capacity planning, consider 1 year. Use Prometheus's --storage.tsdb.retention.time and --storage.tsdb.retention.size flags to control retention.
What is the difference between Loki and a traditional log system like ELK?
Loki indexes only log metadata (labels), not the full text. This makes it far cheaper to run but slower for ad-hoc full-text search. For most operational use cases (filtering by host, service, and known patterns), Loki is sufficient and far simpler than Elasticsearch.
How do I monitor ZFS pool health?
node_exporter's ZFS collector exposes pool state metrics. Create an alert rule that fires when node_zfs_zpool_state is not "online". Combine with Grafana panels showing ARC hit rate, pool capacity, and scrub status.
Can I use this stack to monitor non-FreeBSD systems?
Yes. Prometheus and Grafana are platform-agnostic. node_exporter runs on Linux, FreeBSD, and other systems. The stack on FreeBSD can scrape metrics from any host running compatible exporters.
How do I add PagerDuty integration?
Add a PagerDuty receiver in alertmanager.yml with your integration key. Route critical alerts to the PagerDuty receiver and warning alerts to email or Slack. See the Alertmanager configuration section above for the exact syntax.