FreeBSD High Availability: CARP, HAST, and Failover Guide
High availability on FreeBSD is built from kernel-level primitives that have been production-tested for over two decades. CARP provides virtual IP failover. Pfsync synchronizes firewall state between nodes. HAST replicates block devices in real time. ZFS send/recv handles asynchronous dataset replication. These tools combine to build redundant infrastructure without third-party clustering software.
This guide covers each HA component, how they work together, and complete configurations for common failover scenarios.
Architecture Overview
FreeBSD's HA stack:
| Component | Function | Layer |
|---|---|---|
| CARP | Virtual IP failover | Network |
| pfsync | PF firewall state sync | Firewall |
| HAST | Block-level replication | Storage |
| ZFS send/recv | Dataset replication | Filesystem |
| ifstated / devd | Failover automation | Orchestration |
| relayd | Load balancing | Application |
A typical two-node HA pair uses CARP for the virtual IP, pfsync to keep firewall connections alive during failover, and either HAST or ZFS replication for data synchronization.
CARP: Virtual IP Failover
CARP (Common Address Redundancy Protocol) allows multiple FreeBSD machines to share a virtual IP address. One machine is the master (handles traffic), and the others are backups. When the master fails, a backup takes over within seconds.
How CARP Works
CARP nodes send multicast advertisements on the network. The master sends advertisements at a rate determined by its advbase and advskew parameters. If backup nodes stop receiving advertisements, the one with the lowest advskew becomes the new master.
The failover is layer 2 -- the new master sends gratuitous ARP packets to update switches' MAC tables. Clients see no change except a brief interruption (typically under 3 seconds).
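The arithmetic behind these numbers can be sketched in plain shell. This assumes the conventional CARP timing model: the master advertises every advbase + advskew/256 seconds, and a backup declares it dead after roughly three missed intervals.

```shell
#!/bin/sh
# CARP timing sketch: advertisement interval and worst-case detection time.
# Assumption: interval = advbase + advskew/256 seconds, ~3-interval dead time.
advbase=1      # seconds (the default)
advskew=100    # the backup's skew from the examples in this guide

# Integer math in milliseconds to stay within POSIX sh arithmetic
interval_ms=$(( advbase * 1000 + advskew * 1000 / 256 ))
dead_ms=$(( 3 * interval_ms ))

echo "advertisement interval: ${interval_ms} ms"
echo "detection time:         ~${dead_ms} ms"
```

With the defaults above, detection lands comfortably inside the "typically under 3 seconds" window; raising advbase slows failover but reduces advertisement traffic.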
Basic CARP Setup
Two-node configuration with a shared virtual IP of 10.0.0.100.
Node 1 (primary):
```sh
# Load the carp(4) kernel module now and at boot
kldload carp
echo 'carp_load="YES"' >> /boot/loader.conf

# Physical interface
sysrc ifconfig_em0="inet 10.0.0.1 netmask 255.255.255.0"

# CARP virtual IP
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 0 pass secretpass alias 10.0.0.100/32"

# Enable preemption now and at boot
sysctl net.inet.carp.preempt=1
echo 'net.inet.carp.preempt=1' >> /etc/sysctl.conf
```
Node 2 (backup):
```sh
# Load the carp(4) kernel module now and at boot
kldload carp
echo 'carp_load="YES"' >> /boot/loader.conf

# Physical interface
sysrc ifconfig_em0="inet 10.0.0.2 netmask 255.255.255.0"

# CARP virtual IP (higher advskew = lower priority)
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 100 pass secretpass alias 10.0.0.100/32"

# Enable preemption now and at boot
sysctl net.inet.carp.preempt=1
echo 'net.inet.carp.preempt=1' >> /etc/sysctl.conf
```
Apply configuration:
```sh
service netif restart
```
Verifying CARP Status
```sh
ifconfig em0
```
On the master, you will see:
```shell
carp: MASTER vhid 1 advbase 1 advskew 0
inet 10.0.0.100 netmask 0xffffffff broadcast 10.0.0.100
```
On the backup:
```shell
carp: BACKUP vhid 1 advbase 1 advskew 100
inet 10.0.0.100 netmask 0xffffffff broadcast 10.0.0.100
```
Testing Failover
```sh
# On the master, force demotion
ifconfig em0 vhid 1 state backup

# Or simulate failure by shutting down the interface
ifconfig em0 down

# On the backup, watch it promote
ifconfig em0   # Should show MASTER
```
CARP with Multiple Virtual IPs
For load distribution, use multiple CARP groups with different masters:
Node 1:
```sh
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 0 pass pass1 alias 10.0.0.100/32"
sysrc ifconfig_em0_alias1="inet vhid 2 advskew 100 pass pass2 alias 10.0.0.101/32"
```
Node 2:
```sh
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 100 pass pass1 alias 10.0.0.100/32"
sysrc ifconfig_em0_alias1="inet vhid 2 advskew 0 pass pass2 alias 10.0.0.101/32"
```
Node 1 is master for 10.0.0.100, Node 2 is master for 10.0.0.101. Both handle traffic in normal operation. If either fails, the other handles all virtual IPs.
CARP Preemption
With net.inet.carp.preempt=1, when the original master recovers, it reclaims its CARP groups automatically. Without preemption, the backup remains master even after the original comes back online.
Additionally, preemption causes all CARP groups to fail over together when any interface on the host goes down. This prevents split-brain scenarios where some VIPs are on one node and others are on another.
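CARP also exposes a demotion counter, net.inet.carp.demotion, which effectively raises this node's advskew without touching interfaces. A hedged sketch for draining a node before maintenance (per carp(4), writes to this sysctl are treated as deltas, so the same magnitude is subtracted afterward):

```shell
# Make this node less preferred; with preemption on, the peer takes the VIPs
sysctl net.inet.carp.demotion=240

# ... perform maintenance ...

# Subtract the delta so this node can become master again
sysctl net.inet.carp.demotion=-240
```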
Pfsync: Firewall State Synchronization
Pfsync synchronizes PF firewall state tables between CARP nodes. Without pfsync, active TCP connections break during failover because the new master has no knowledge of existing sessions. With pfsync, the backup already knows about every connection and can continue processing them seamlessly.
Pfsync Setup
Use a dedicated network link (crossover cable or dedicated VLAN) between nodes for pfsync traffic:
Node 1:
```sh
# Dedicated sync interface
sysrc ifconfig_em1="inet 192.168.255.1 netmask 255.255.255.252"

# Enable pfsync
sysrc pfsync_enable="YES"
sysrc pfsync_syncdev="em1"
sysrc pfsync_syncpeer="192.168.255.2"
```
Node 2:
```sh
sysrc ifconfig_em1="inet 192.168.255.2 netmask 255.255.255.252"
sysrc pfsync_enable="YES"
sysrc pfsync_syncdev="em1"
sysrc pfsync_syncpeer="192.168.255.1"
```
To apply immediately without a reboot (shown for node 1; on node 2, use syncpeer 192.168.255.1):
```sh
ifconfig pfsync0 syncdev em1 syncpeer 192.168.255.2 up
```
PF Configuration for Pfsync
Allow pfsync traffic on the dedicated interface:
```sh
cat >> /etc/pf.conf << 'EOF'
# Pfsync traffic on the dedicated sync link
pass quick on em1 proto pfsync keep state
# CARP advertisements on the interface carrying the virtual IP
pass quick on em0 proto carp keep state
EOF
pfctl -f /etc/pf.conf
```
Verifying Pfsync
```sh
# Check pfsync status
ifconfig pfsync0

# Count synchronized states
pfctl -s info | grep states
# Both nodes should show similar state counts
```
HAST: Block-Level Replication
HAST (Highly Available STorage) provides synchronous block-level replication between two FreeBSD machines. It is similar to Linux's DRBD. One node is primary (read-write), the other is secondary (receives writes in real time). On failover, the secondary promotes to primary.
HAST Architecture
HAST operates below the filesystem layer. It replicates disk blocks, not files. This means you can use any filesystem (ZFS, UFS) on top of a HAST device. The replication is transparent to the filesystem.
HAST Setup
On both nodes:
```sh
sysrc hastd_enable="YES"
```
Create the HAST configuration:
```sh
cat > /etc/hast.conf << 'EOF'
resource shared_storage {
        on node1 {
                local /dev/gpt/hast0
                remote 192.168.255.2
        }
        on node2 {
                local /dev/gpt/hast0
                remote 192.168.255.1
        }
}
EOF
```
This requires a partition labeled hast0 on both nodes:
```sh
gpart add -t freebsd-zfs -s 100g -l hast0 ada1
```
Copy the same /etc/hast.conf to both nodes.
Starting HAST
On both nodes:
```sh
service hastd start
```
On the primary node:
```sh
hastctl role primary shared_storage
```
On the secondary node:
```sh
hastctl role secondary shared_storage
```
The HAST device appears as /dev/hast/shared_storage on the primary. Create a filesystem on it:
```sh
# UFS
newfs -U /dev/hast/shared_storage
mount /dev/hast/shared_storage /shared

# Or ZFS
zpool create shared /dev/hast/shared_storage
```
HAST Failover
When the primary fails, promote the secondary:
```sh
# On the secondary node
hastctl role primary shared_storage

# If using UFS
fsck -y /dev/hast/shared_storage
mount /dev/hast/shared_storage /shared

# If using ZFS
zpool import shared
```
HAST Replication Modes
| Mode | Description | Data Safety | Performance |
|---|---|---|---|
| memsync | Ack after remote receives, before disk write | Good | Better |
| fullsync | Ack after remote writes to disk | Best | Slower |
| async | Ack immediately, replicate in background | Lower | Fastest |
Set in hast.conf:
```shell
resource shared_storage {
        replication fullsync
        ...
}
```
For data safety, use fullsync. For performance-sensitive workloads where some data loss is acceptable, use memsync.
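The replication mode sits alongside other per-resource options in hast.conf; the `checksum` and `compression` keywords below are documented in hast.conf(5), but the values chosen here are illustrative:

```shell
# /etc/hast.conf fragment (sketch; option values are illustrative)
resource shared_storage {
        replication fullsync   # ack only after the remote disk write
        checksum crc32         # verify replicated blocks in transit
        compression lzf        # compress replication traffic
        on node1 {
                local /dev/gpt/hast0
                remote 192.168.255.2
        }
        on node2 {
                local /dev/gpt/hast0
                remote 192.168.255.1
        }
}
```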
HAST with CARP Automation
Use devd(8) to automate HAST role changes when CARP state changes. Since FreeBSD 10, CARP generates devd events under the "CARP" system, with the subsystem formatted as vhid@interface:
```sh
cat > /etc/devd/carp-hast.conf << 'EOF'
notify 100 {
        match "system" "CARP";
        match "subsystem" "1@em0";
        match "type" "MASTER";
        action "/usr/local/sbin/carp-hast-switch master";
};
notify 100 {
        match "system" "CARP";
        match "subsystem" "1@em0";
        match "type" "BACKUP";
        action "/usr/local/sbin/carp-hast-switch slave";
};
EOF
```
The switch script:
```sh
cat > /usr/local/sbin/carp-hast-switch << 'SCRIPT'
#!/bin/sh
case "$1" in
master)
        logger "CARP: Promoting to master"
        hastctl role primary shared_storage
        sleep 2
        mount /dev/hast/shared_storage /shared
        # Start services that depend on shared storage
        service postgresql start
        ;;
slave)
        logger "CARP: Demoting to slave"
        service postgresql stop
        umount -f /shared
        hastctl role secondary shared_storage
        ;;
esac
SCRIPT
chmod +x /usr/local/sbin/carp-hast-switch
service devd restart
```
ZFS Replication
ZFS send/recv provides asynchronous dataset replication. It is simpler than HAST, does not require dedicated hardware, and works across any network. The trade-off is that replication is periodic (not real-time), so some data loss is possible.
Basic ZFS Replication
```sh
# Create initial snapshot and send to remote
zfs snapshot zroot/data@initial
zfs send zroot/data@initial | ssh backup-host zfs recv tank/replica/data

# Incremental send (only changes since last snapshot)
zfs snapshot zroot/data@2026-04-09
zfs send -i zroot/data@initial zroot/data@2026-04-09 | \
    ssh backup-host zfs recv tank/replica/data
```
Automated Replication Script
```sh
cat > /usr/local/sbin/zfs-replicate.sh << 'SCRIPT'
#!/bin/sh
# ZFS incremental replication
DATASET="zroot/data"
REMOTE="backup-host"
REMOTE_DATASET="tank/replica/data"
SNAP_PREFIX="autorep"

# Create new snapshot
NEW_SNAP="${SNAP_PREFIX}-$(date +%Y%m%d-%H%M%S)"
zfs snapshot ${DATASET}@${NEW_SNAP}

# Find previous replication snapshot (second-newest with our prefix)
PREV_SNAP=$(zfs list -H -t snapshot -o name -s creation ${DATASET} | \
    grep "@${SNAP_PREFIX}" | tail -2 | head -1 | cut -d@ -f2)

if [ -n "${PREV_SNAP}" ] && [ "${PREV_SNAP}" != "${NEW_SNAP}" ]; then
    # Incremental send
    zfs send -i ${DATASET}@${PREV_SNAP} ${DATASET}@${NEW_SNAP} | \
        ssh ${REMOTE} zfs recv -F ${REMOTE_DATASET}
    if [ $? -eq 0 ]; then
        # Clean up old snapshots (keep last 5). BSD head(1) has no
        # "head -n -5", so compute the count to drop explicitly.
        SNAPS=$(zfs list -H -t snapshot -o name -s creation ${DATASET} | \
            grep "@${SNAP_PREFIX}")
        TOTAL=$(printf '%s\n' "${SNAPS}" | wc -l | tr -d ' ')
        if [ "${TOTAL}" -gt 5 ]; then
            printf '%s\n' "${SNAPS}" | head -n $(( TOTAL - 5 )) | \
                xargs -n 1 zfs destroy
        fi
    fi
else
    # Full send (first time)
    zfs send ${DATASET}@${NEW_SNAP} | \
        ssh ${REMOTE} zfs recv -F ${REMOTE_DATASET}
fi
SCRIPT
chmod +x /usr/local/sbin/zfs-replicate.sh
```
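One portability detail worth isolating: FreeBSD's head(1) only accepts positive counts, so GNU's `head -n -5` ("all but the last 5") is unavailable, and the snapshot-retention step has to compute the number of lines to drop explicitly. The idiom on its own, with a stand-in snapshot list:

```shell
#!/bin/sh
# "Keep the newest 5, select the rest for deletion" on an oldest-first list,
# without GNU head's negative -n (unsupported by BSD head).
snaps="autorep-1
autorep-2
autorep-3
autorep-4
autorep-5
autorep-6
autorep-7"
keep=5

total=$(printf '%s\n' "${snaps}" | wc -l | tr -d ' ')
if [ "${total}" -gt "${keep}" ]; then
    expired=$(printf '%s\n' "${snaps}" | head -n $(( total - keep )))
fi

printf '%s\n' "${expired}"   # the oldest entries beyond the keep window
```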
Schedule it:
```sh
# Replicate every 15 minutes
echo '*/15 * * * * root /usr/local/sbin/zfs-replicate.sh' >> /etc/crontab
```
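If a large send ever takes longer than the 15-minute interval, a second run could race the first. FreeBSD's lockf(1) from the base system can serialize the cron entries; `-t 0` makes a run skip immediately instead of queueing when the previous one still holds the lock:

```shell
# One run at a time: skip if the lock is held (-t 0 = do not wait)
echo '*/15 * * * * root lockf -t 0 /var/run/zfs-replicate.lock /usr/local/sbin/zfs-replicate.sh' >> /etc/crontab
```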
ZFS Replication with Failover
When the primary fails, the backup promotes the replica:
```sh
# On the backup host
zfs set readonly=off tank/replica/data
zfs mount tank/replica/data

# Update CARP to attract traffic
ifconfig em0 vhid 1 advskew 0
```
When the primary recovers:
```sh
# Reverse replication direction temporarily
# Then re-sync and restore normal roles
```
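A hedged sketch of that recovery sequence, assuming the backup (now active) has been writing to tank/replica/data while the primary's zroot/data went stale; `<last-common-snapshot>` is a placeholder for the newest snapshot both sides still share:

```shell
# On the backup (currently active): snapshot the live replica
zfs snapshot tank/replica/data@failback

# Send the accumulated changes back; -F rolls the primary back to the
# common snapshot, discarding any divergent writes it made before failing
zfs send -i tank/replica/data@<last-common-snapshot> tank/replica/data@failback | \
    ssh primary-host zfs recv -F zroot/data

# Demote the backup and let the primary reclaim the CARP master role
zfs set readonly=on tank/replica/data
ifconfig em0 vhid 1 advskew 100
```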
Load Balancer Failover with relayd
relayd is a load balancer and application-layer gateway ported from OpenBSD, available on FreeBSD as the net/relayd package. Combined with CARP, it provides highly available load balancing.
relayd Configuration
```sh
pkg install relayd
sysrc relayd_enable="YES"
```
```sh
cat > /usr/local/etc/relayd.conf << 'EOF'
# Macros
ext_addr = "10.0.0.100"  # CARP virtual IP
web1 = "10.0.0.10"
web2 = "10.0.0.11"
web3 = "10.0.0.12"

# Health checks
http protocol "http_check" {
        match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
        match request header append "X-Forwarded-Port" value "$REMOTE_PORT"
        match response header "Content-Type" value "text/html*"
}

# Backend table
table <webhosts> { $web1 $web2 $web3 }

# Relay
relay "www" {
        listen on $ext_addr port 443 tls
        protocol "http_check"
        forward to <webhosts> port 8080 mode loadbalance \
                check http "/" code 200
}

# Redirect (layer 3)
redirect "web_redirect" {
        listen on $ext_addr port 80
        forward to <webhosts> port 8080 check http "/" code 200
}
EOF
```
Start relayd:
```sh
service relayd start
relayctl show summary
```
HA Load Balancer Pair
Deploy relayd on two nodes with CARP:
Node 1:
```sh
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 0 pass lbpass alias 10.0.0.100/32"
sysrc relayd_enable="YES"
sysrc pfsync_enable="YES"
sysrc pfsync_syncdev="em1"
```
Node 2:
```sh
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 100 pass lbpass alias 10.0.0.100/32"
sysrc relayd_enable="YES"
sysrc pfsync_enable="YES"
sysrc pfsync_syncdev="em1"
```
Both nodes run relayd with the same configuration. Only the CARP master handles traffic. On failover, pfsync ensures existing connections continue on the new master.
Complete HA Example: Database Server
A two-node PostgreSQL HA setup using CARP + HAST:
Node Configuration
```sh
# Both nodes: /etc/hast.conf
resource pgdata {
        on db1 {
                local /dev/gpt/pgdata
                remote 192.168.255.2
        }
        on db2 {
                local /dev/gpt/pgdata
                remote 192.168.255.1
        }
        replication fullsync
}
```
Failover Script
```sh
cat > /usr/local/sbin/db-failover.sh << 'SCRIPT'
#!/bin/sh
case "$1" in
master)
        logger "DB failover: becoming master"
        hastctl role primary pgdata
        sleep 3
        fsck -y /dev/hast/pgdata
        mount /dev/hast/pgdata /var/db/postgres/data
        service postgresql start
        ;;
slave)
        logger "DB failover: becoming slave"
        service postgresql stop
        sleep 2
        umount -f /var/db/postgres/data
        hastctl role secondary pgdata
        ;;
esac
SCRIPT
chmod +x /usr/local/sbin/db-failover.sh
```
Monitoring
```sh
# Check CARP status
ifconfig em0 | grep carp

# Check HAST status
hastctl status

# Check pfsync
ifconfig pfsync0

# Check relayd backends
relayctl show summary
```
FAQ
How fast is CARP failover?
Typical CARP failover completes in 1-3 seconds. The exact time depends on the advbase and advskew settings. The default advbase is 1 second, so the backup detects master failure within 3 missed advertisements and then takes over. For TCP connections, the interruption is usually short enough that most clients simply retry.
Can I have more than two CARP nodes?
Yes. CARP supports multiple backups. Each backup has a different advskew value -- lower values mean higher priority. On master failure, the backup with the lowest advskew takes over. Three or more nodes provide additional redundancy.
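For example, a hypothetical third node joins the vhid 1 group from the earlier two-node setup with a still-higher advskew, making it the last-resort backup:

```shell
# Node 3 (second backup): highest advskew = lowest priority
sysrc ifconfig_em0="inet 10.0.0.3 netmask 255.255.255.0"
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 200 pass secretpass alias 10.0.0.100/32"
```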
Does HAST work with ZFS?
Yes, but with caveats. You can create a ZFS pool on a HAST device. However, ZFS expects to manage its own disks, and placing a ZFS pool on a HAST block device adds a layer of complexity. For ZFS-native replication, use zfs send/recv instead. Use HAST with UFS when you need synchronous block-level replication.
What is the data loss window with ZFS replication?
ZFS send/recv is asynchronous. The data loss window equals the time since the last successful replication snapshot. If you replicate every 15 minutes, you could lose up to 15 minutes of data. For zero data loss, use HAST with fullsync replication.
Can pfsync work over a WAN?
Technically yes, but it is not recommended. Pfsync sends high volumes of state updates and is designed for low-latency links. Over a WAN, the latency and potential packet loss degrade performance and reliability. Use a dedicated, low-latency link (crossover cable, dedicated VLAN, or dedicated switch port) between pfsync peers.
How do I prevent split-brain with CARP?
Split-brain occurs when both nodes think they are master. Prevent it by: using a dedicated, reliable network link between nodes for CARP and pfsync; enabling net.inet.carp.preempt so all interfaces fail over together; and using ifstated or devd to demote a node that loses its sync link.
Can relayd replace HAProxy?
For basic load balancing and health checking, yes. Relayd handles HTTP and TCP load balancing, TLS termination, and basic health checks. HAProxy offers more advanced features: sophisticated ACLs, stick tables, detailed statistics, and more load balancing algorithms. For simple setups, relayd keeps you on base-system tools. For complex routing, HAProxy is more capable.