FreeBSD High Availability: CARP, HAST, and Failover Guide
High availability on FreeBSD is built from kernel-level primitives that have been production-tested for over two decades. CARP provides virtual IP failover. Pfsync synchronizes firewall state between nodes. HAST replicates block devices in real time. ZFS send/recv handles asynchronous dataset replication. These tools combine to build redundant infrastructure without third-party clustering software.
This guide covers each HA component, how they work together, and complete configurations for common failover scenarios.
Architecture Overview
FreeBSD's HA stack:
| Component | Function | Layer |
|---|---|---|
| CARP | Virtual IP failover | Network |
| pfsync | PF firewall state sync | Firewall |
| HAST | Block-level replication | Storage |
| ZFS send/recv | Dataset replication | Filesystem |
| ifstated / devd | Failover automation | Orchestration |
| relayd | Load balancing | Application |
A typical two-node HA pair uses CARP for the virtual IP, pfsync to keep firewall connections alive during failover, and either HAST or ZFS replication for data synchronization.
CARP: Virtual IP Failover
CARP (Common Address Redundancy Protocol) allows multiple FreeBSD machines to share a virtual IP address. One machine is the master (handles traffic), and the others are backups. When the master fails, a backup takes over within seconds.
How CARP Works
CARP nodes send multicast advertisements on the network. The master sends advertisements at a rate determined by its advbase and advskew parameters. If backup nodes stop receiving advertisements, the one with the lowest advskew becomes the new master.
The failover is layer 2 -- the new master sends gratuitous ARP packets to update switches' MAC tables. Clients see no change except a brief interruption (typically under 3 seconds).
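The arithmetic behind these numbers can be sketched in plain shell. This assumes the conventional CARP timing model: the master advertises every advbase + advskew/256 seconds, and a backup declares it dead after roughly three missed intervals.

```shell
#!/bin/sh
# CARP timing sketch: advertisement interval and worst-case detection time.
# Assumption: interval = advbase + advskew/256 seconds, ~3-interval dead time.
advbase=1      # seconds (the default)
advskew=100    # the backup's skew from the examples in this guide

# Integer math in milliseconds to stay within POSIX sh arithmetic
interval_ms=$(( advbase * 1000 + advskew * 1000 / 256 ))
dead_ms=$(( 3 * interval_ms ))

echo "advertisement interval: ${interval_ms} ms"
echo "detection time:         ~${dead_ms} ms"
```

With the defaults above, detection lands comfortably inside the "typically under 3 seconds" window; raising advbase slows failover but reduces advertisement traffic.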
Basic CARP Setup
Two-node configuration with a shared virtual IP of 10.0.0.100.
Node 1 (primary):
```sh
# Load the carp(4) kernel module now and at boot
kldload carp
echo 'carp_load="YES"' >> /boot/loader.conf

# Physical interface
sysrc ifconfig_em0="inet 10.0.0.1 netmask 255.255.255.0"

# CARP virtual IP
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 0 pass secretpass alias 10.0.0.100/32"

# Enable preemption now and at boot
sysctl net.inet.carp.preempt=1
echo 'net.inet.carp.preempt=1' >> /etc/sysctl.conf
```
Node 2 (backup):
```sh
# Load the carp(4) kernel module now and at boot
kldload carp
echo 'carp_load="YES"' >> /boot/loader.conf

# Physical interface
sysrc ifconfig_em0="inet 10.0.0.2 netmask 255.255.255.0"

# CARP virtual IP (higher advskew = lower priority)
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 100 pass secretpass alias 10.0.0.100/32"

# Enable preemption now and at boot
sysctl net.inet.carp.preempt=1
echo 'net.inet.carp.preempt=1' >> /etc/sysctl.conf
```
Apply configuration:
```sh
service netif restart
```
Verifying CARP Status
```sh
ifconfig em0
```
On the master, you will see:
```shell
carp: MASTER vhid 1 advbase 1 advskew 0
inet 10.0.0.100 netmask 0xffffffff broadcast 10.0.0.100
```
On the backup:
```shell
carp: BACKUP vhid 1 advbase 1 advskew 100
inet 10.0.0.100 netmask 0xffffffff broadcast 10.0.0.100
```
Testing Failover
```sh
# On the master, force demotion
ifconfig em0 vhid 1 state backup

# Or simulate failure by shutting down the interface
ifconfig em0 down

# On the backup, watch it promote
ifconfig em0   # Should show MASTER
```
CARP with Multiple Virtual IPs
For load distribution, use multiple CARP groups with different masters:
Node 1:
```sh
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 0 pass pass1 alias 10.0.0.100/32"
sysrc ifconfig_em0_alias1="inet vhid 2 advskew 100 pass pass2 alias 10.0.0.101/32"
```
Node 2:
```sh
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 100 pass pass1 alias 10.0.0.100/32"
sysrc ifconfig_em0_alias1="inet vhid 2 advskew 0 pass pass2 alias 10.0.0.101/32"
```
Node 1 is master for 10.0.0.100, Node 2 is master for 10.0.0.101. Both handle traffic in normal operation. If either fails, the other handles all virtual IPs.
CARP Preemption
With net.inet.carp.preempt=1, when the original master recovers, it reclaims its CARP groups automatically. Without preemption, the backup remains master even after the original comes back online.
Additionally, preemption causes all CARP groups to fail over together when any interface on the host goes down. This prevents split-brain scenarios where some VIPs are on one node and others are on another.
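CARP also exposes a demotion counter, net.inet.carp.demotion, which effectively raises this node's advskew without touching interfaces. A hedged sketch for draining a node before maintenance (per carp(4), writes to this sysctl are treated as deltas, so the same magnitude is subtracted afterward):

```shell
# Make this node less preferred; with preemption on, the peer takes the VIPs
sysctl net.inet.carp.demotion=240

# ... perform maintenance ...

# Subtract the delta so this node can become master again
sysctl net.inet.carp.demotion=-240
```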
Pfsync: Firewall State Synchronization
Pfsync synchronizes PF firewall state tables between CARP nodes. Without pfsync, active TCP connections break during failover because the new master has no knowledge of existing sessions. With pfsync, the backup already knows about every connection and can continue processing them seamlessly.
Pfsync Setup
Use a dedicated network link (crossover cable or dedicated VLAN) between nodes for pfsync traffic:
Node 1:
```sh
# Dedicated sync interface
sysrc ifconfig_em1="inet 192.168.255.1 netmask 255.255.255.252"

# Enable pfsync
sysrc pfsync_enable="YES"
sysrc pfsync_syncdev="em1"
sysrc pfsync_syncpeer="192.168.255.2"
```
Node 2:
```sh
sysrc ifconfig_em1="inet 192.168.255.2 netmask 255.255.255.252"
sysrc pfsync_enable="YES"
sysrc pfsync_syncdev="em1"
sysrc pfsync_syncpeer="192.168.255.1"
```
To apply immediately without a reboot (shown for node 1; on node 2, use syncpeer 192.168.255.1):
```sh
ifconfig pfsync0 syncdev em1 syncpeer 192.168.255.2 up
```
PF Configuration for Pfsync
Allow pfsync traffic on the dedicated interface:
```sh
cat >> /etc/pf.conf << 'EOF'
# Pfsync traffic on the dedicated sync link
pass quick on em1 proto pfsync keep state
# CARP advertisements on the interface carrying the virtual IP
pass quick on em0 proto carp keep state
EOF
pfctl -f /etc/pf.conf
```
Verifying Pfsync
```sh
# Check pfsync status
ifconfig pfsync0

# Count synchronized states
pfctl -s info | grep states
# Both nodes should show similar state counts
```
HAST: Block-Level Replication
HAST (Highly Available STorage) provides synchronous block-level replication between two FreeBSD machines. It is similar to Linux's DRBD. One node is primary (read-write), the other is secondary (receives writes in real time). On failover, the secondary promotes to primary.
HAST Architecture
HAST operates below the filesystem layer. It replicates disk blocks, not files. This means you can use any filesystem (ZFS, UFS) on top of a HAST device. The replication is transparent to the filesystem.
HAST Setup
On both nodes:
```sh
sysrc hastd_enable="YES"
```
Create the HAST configuration:
```sh
cat > /etc/hast.conf << 'EOF'
resource shared_storage {
        on node1 {
                local /dev/gpt/hast0
                remote 192.168.255.2
        }
        on node2 {
                local /dev/gpt/hast0
                remote 192.168.255.1
        }
}
EOF
```
This requires a partition labeled hast0 on both nodes:
```sh
gpart add -t freebsd-zfs -s 100g -l hast0 ada1
```
Copy the same /etc/hast.conf to both nodes.
Starting HAST
On both nodes:
```sh
service hastd start
```
On the primary node:
```sh
hastctl role primary shared_storage
```
On the secondary node:
```sh
hastctl role secondary shared_storage
```
The HAST device appears as /dev/hast/shared_storage on the primary. Create a filesystem on it:
```sh
# UFS
newfs -U /dev/hast/shared_storage
mount /dev/hast/shared_storage /shared

# Or ZFS
zpool create shared /dev/hast/shared_storage
```
HAST Failover
When the primary fails, promote the secondary:
```sh
# On the secondary node
hastctl role primary shared_storage

# If using UFS
fsck -y /dev/hast/shared_storage
mount /dev/hast/shared_storage /shared

# If using ZFS
zpool import shared
```
HAST Replication Modes
| Mode | Description | Data Safety | Performance |
|---|---|---|---|
| memsync | Ack after remote receives, before disk write | Good | Better |
| fullsync | Ack after remote writes to disk | Best | Slower |
| async | Ack immediately, replicate in background | Lower | Fastest |
Set in hast.conf:
```shell
resource shared_storage {
        replication fullsync
        ...
}
```
For data safety, use fullsync. For performance-sensitive workloads where some data loss is acceptable, use memsync.
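The replication mode sits alongside other per-resource options in hast.conf; the `checksum` and `compression` keywords below are documented in hast.conf(5), but the values chosen here are illustrative:

```shell
# /etc/hast.conf fragment (sketch; option values are illustrative)
resource shared_storage {
        replication fullsync   # ack only after the remote disk write
        checksum crc32         # verify replicated blocks in transit
        compression lzf        # compress replication traffic
        on node1 {
                local /dev/gpt/hast0
                remote 192.168.255.2
        }
        on node2 {
                local /dev/gpt/hast0
                remote 192.168.255.1
        }
}
```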
HAST with CARP Automation
Use devd(8) to automate HAST role changes when CARP state changes. Since FreeBSD 10, CARP generates devd events under the "CARP" system, with the subsystem formatted as vhid@interface:
```sh
cat > /etc/devd/carp-hast.conf << 'EOF'
notify 100 {
        match "system" "CARP";
        match "subsystem" "1@em0";
        match "type" "MASTER";
        action "/usr/local/sbin/carp-hast-switch master";
};
notify 100 {
        match "system" "CARP";
        match "subsystem" "1@em0";
        match "type" "BACKUP";
        action "/usr/local/sbin/carp-hast-switch slave";
};
EOF
```
The switch script:
```sh
cat > /usr/local/sbin/carp-hast-switch << 'SCRIPT'
#!/bin/sh
case "$1" in
master)
        logger "CARP: Promoting to master"
        hastctl role primary shared_storage
        sleep 2
        mount /dev/hast/shared_storage /shared
        # Start services that depend on shared storage
        service postgresql start
        ;;
slave)
        logger "CARP: Demoting to slave"
        service postgresql stop
        umount -f /shared
        hastctl role secondary shared_storage
        ;;
esac
SCRIPT
chmod +x /usr/local/sbin/carp-hast-switch
service devd restart
```
ZFS Replication
ZFS send/recv provides asynchronous dataset replication. It is simpler than HAST, does not require dedicated hardware, and works across any network. The trade-off is that replication is periodic (not real-time), so some data loss is possible.
Basic ZFS Replication
```sh
# Create initial snapshot and send to remote
zfs snapshot zroot/data@initial
zfs send zroot/data@initial | ssh backup-host zfs recv tank/replica/data

# Incremental send (only changes since last snapshot)
zfs snapshot zroot/data@2026-04-09
zfs send -i zroot/data@initial zroot/data@2026-04-09 | \
    ssh backup-host zfs recv tank/replica/data
```
Automated Replication Script
```sh
cat > /usr/local/sbin/zfs-replicate.sh << 'SCRIPT'
#!/bin/sh
# ZFS incremental replication
DATASET="zroot/data"
REMOTE="backup-host"
REMOTE_DATASET="tank/replica/data"
SNAP_PREFIX="autorep"

# Create new snapshot
NEW_SNAP="${SNAP_PREFIX}-$(date +%Y%m%d-%H%M%S)"
zfs snapshot ${DATASET}@${NEW_SNAP}

# Find previous replication snapshot (second-newest with our prefix)
PREV_SNAP=$(zfs list -H -t snapshot -o name -s creation ${DATASET} | \
    grep "@${SNAP_PREFIX}" | tail -2 | head -1 | cut -d@ -f2)

if [ -n "${PREV_SNAP}" ] && [ "${PREV_SNAP}" != "${NEW_SNAP}" ]; then
    # Incremental send
    zfs send -i ${DATASET}@${PREV_SNAP} ${DATASET}@${NEW_SNAP} | \
        ssh ${REMOTE} zfs recv -F ${REMOTE_DATASET}
    if [ $? -eq 0 ]; then
        # Clean up old snapshots (keep last 5). BSD head(1) has no
        # "head -n -5", so compute the count to drop explicitly.
        SNAPS=$(zfs list -H -t snapshot -o name -s creation ${DATASET} | \
            grep "@${SNAP_PREFIX}")
        TOTAL=$(printf '%s\n' "${SNAPS}" | wc -l | tr -d ' ')
        if [ "${TOTAL}" -gt 5 ]; then
            printf '%s\n' "${SNAPS}" | head -n $(( TOTAL - 5 )) | \
                xargs -n 1 zfs destroy
        fi
    fi
else
    # Full send (first time)
    zfs send ${DATASET}@${NEW_SNAP} | \
        ssh ${REMOTE} zfs recv -F ${REMOTE_DATASET}
fi
SCRIPT
chmod +x /usr/local/sbin/zfs-replicate.sh
```
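One portability detail worth isolating: FreeBSD's head(1) only accepts positive counts, so GNU's `head -n -5` ("all but the last 5") is unavailable, and the snapshot-retention step has to compute the number of lines to drop explicitly. The idiom on its own, with a stand-in snapshot list:

```shell
#!/bin/sh
# "Keep the newest 5, select the rest for deletion" on an oldest-first list,
# without GNU head's negative -n (unsupported by BSD head).
snaps="autorep-1
autorep-2
autorep-3
autorep-4
autorep-5
autorep-6
autorep-7"
keep=5

total=$(printf '%s\n' "${snaps}" | wc -l | tr -d ' ')
if [ "${total}" -gt "${keep}" ]; then
    expired=$(printf '%s\n' "${snaps}" | head -n $(( total - keep )))
fi

printf '%s\n' "${expired}"   # the oldest entries beyond the keep window
```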
Schedule it:
```sh
# Replicate every 15 minutes
echo '*/15 * * * * root /usr/local/sbin/zfs-replicate.sh' >> /etc/crontab
```
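If a large send ever takes longer than the 15-minute interval, a second run could race the first. FreeBSD's lockf(1) from the base system can serialize the cron entries; `-t 0` makes a run skip immediately instead of queueing when the previous one still holds the lock:

```shell
# One run at a time: skip if the lock is held (-t 0 = do not wait)
echo '*/15 * * * * root lockf -t 0 /var/run/zfs-replicate.lock /usr/local/sbin/zfs-replicate.sh' >> /etc/crontab
```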
ZFS Replication with Failover
When the primary fails, the backup promotes the replica:
```sh
# On the backup host
zfs set readonly=off tank/replica/data
zfs mount tank/replica/data

# Update CARP to attract traffic
ifconfig em0 vhid 1 advskew 0
```
When the primary recovers:
```sh
# Reverse replication direction temporarily
# Then re-sync and restore normal roles
```
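A hedged sketch of that recovery sequence, assuming the backup (now active) has been writing to tank/replica/data while the primary's zroot/data went stale; `<last-common-snapshot>` is a placeholder for the newest snapshot both sides still share:

```shell
# On the backup (currently active): snapshot the live replica
zfs snapshot tank/replica/data@failback

# Send the accumulated changes back; -F rolls the primary back to the
# common snapshot, discarding any divergent writes it made before failing
zfs send -i tank/replica/data@<last-common-snapshot> tank/replica/data@failback | \
    ssh primary-host zfs recv -F zroot/data

# Demote the backup and let the primary reclaim the CARP master role
zfs set readonly=on tank/replica/data
ifconfig em0 vhid 1 advskew 100
```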
Load Balancer Failover with relayd
relayd is a load balancer and application-layer gateway ported from OpenBSD, available on FreeBSD as the net/relayd package. Combined with CARP, it provides highly available load balancing.
relayd Configuration
```sh
pkg install relayd
sysrc relayd_enable="YES"
```
```sh
cat > /usr/local/etc/relayd.conf << 'EOF'
# Macros
ext_addr = "10.0.0.100"  # CARP virtual IP
web1 = "10.0.0.10"
web2 = "10.0.0.11"
web3 = "10.0.0.12"

# Health checks
http protocol "http_check" {
        match request header append "X-Forwarded-For" value "$REMOTE_ADDR"
        match request header append "X-Forwarded-Port" value "$REMOTE_PORT"
        match response header "Content-Type" value "text/html*"
}

# Backend table
table <webhosts> { $web1 $web2 $web3 }

# Relay
relay "www" {
        listen on $ext_addr port 443 tls
        protocol "http_check"
        forward to <webhosts> port 8080 mode loadbalance \
                check http "/" code 200
}

# Redirect (layer 3)
redirect "web_redirect" {
        listen on $ext_addr port 80
        forward to <webhosts> port 8080 check http "/" code 200
}
EOF
```
Start relayd:
```sh
service relayd start
relayctl show summary
```
HA Load Balancer Pair
Deploy relayd on two nodes with CARP:
Node 1:
```sh
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 0 pass lbpass alias 10.0.0.100/32"
sysrc relayd_enable="YES"
sysrc pfsync_enable="YES"
sysrc pfsync_syncdev="em1"
```
Node 2:
```sh
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 100 pass lbpass alias 10.0.0.100/32"
sysrc relayd_enable="YES"
sysrc pfsync_enable="YES"
sysrc pfsync_syncdev="em1"
```
Both nodes run relayd with the same configuration. Only the CARP master handles traffic. On failover, pfsync ensures existing connections continue on the new master.
Complete HA Example: Database Server
A two-node PostgreSQL HA setup using CARP + HAST:
Node Configuration
```sh
# Both nodes: /etc/hast.conf
resource pgdata {
        on db1 {
                local /dev/gpt/pgdata
                remote 192.168.255.2
        }
        on db2 {
                local /dev/gpt/pgdata
                remote 192.168.255.1
        }
        replication fullsync
}
```
Failover Script
```sh
cat > /usr/local/sbin/db-failover.sh << 'SCRIPT'
#!/bin/sh
case "$1" in
master)
        logger "DB failover: becoming master"
        hastctl role primary pgdata
        sleep 3
        fsck -y /dev/hast/pgdata
        mount /dev/hast/pgdata /var/db/postgres/data
        service postgresql start
        ;;
slave)
        logger "DB failover: becoming slave"
        service postgresql stop
        sleep 2
        umount -f /var/db/postgres/data
        hastctl role secondary pgdata
        ;;
esac
SCRIPT
chmod +x /usr/local/sbin/db-failover.sh
```
Monitoring
```sh
# Check CARP status
ifconfig em0 | grep carp

# Check HAST status
hastctl status

# Check pfsync
ifconfig pfsync0

# Check relayd backends
relayctl show summary
```
FAQ
How fast is CARP failover?
Typical CARP failover completes in 1-3 seconds. The exact time depends on the advbase and advskew settings. The default advbase is 1 second, so the backup detects master failure within 3 missed advertisements and then takes over. For TCP connections, the interruption is usually short enough that most clients simply retry.
Can I have more than two CARP nodes?
Yes. CARP supports multiple backups. Each backup has a different advskew value -- lower values mean higher priority. On master failure, the backup with the lowest advskew takes over. Three or more nodes provide additional redundancy.
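For example, a hypothetical third node joins the vhid 1 group from the earlier two-node setup with a still-higher advskew, making it the last-resort backup:

```shell
# Node 3 (second backup): highest advskew = lowest priority
sysrc ifconfig_em0="inet 10.0.0.3 netmask 255.255.255.0"
sysrc ifconfig_em0_alias0="inet vhid 1 advskew 200 pass secretpass alias 10.0.0.100/32"
```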
Does HAST work with ZFS?
Yes, but with caveats. You can create a ZFS pool on a HAST device. However, ZFS expects to manage its own disks, and placing a ZFS pool on a HAST block device adds a layer of complexity. For ZFS-native replication, use zfs send/recv instead. Use HAST with UFS when you need synchronous block-level replication.
What is the data loss window with ZFS replication?
ZFS send/recv is asynchronous. The data loss window equals the time since the last successful replication snapshot. If you replicate every 15 minutes, you could lose up to 15 minutes of data. For zero data loss, use HAST with fullsync replication.
Can pfsync work over a WAN?
Technically yes, but it is not recommended. Pfsync sends high volumes of state updates and is designed for low-latency links. Over a WAN, the latency and potential packet loss degrade performance and reliability. Use a dedicated, low-latency link (crossover cable, dedicated VLAN, or dedicated switch port) between pfsync peers.
How do I prevent split-brain with CARP?
Split-brain occurs when both nodes think they are master. Prevent it by: using a dedicated, reliable network link between nodes for CARP and pfsync; enabling net.inet.carp.preempt so all interfaces fail over together; and using ifstated or devd to demote a node that loses its sync link.
Can relayd replace HAProxy?
For basic load balancing and health checking, yes. Relayd handles HTTP and TCP load balancing, TLS termination, and basic health checks. HAProxy offers more advanced features: sophisticated ACLs, stick tables, detailed statistics, and more load balancing algorithms. For simple setups, relayd keeps you on base-system tools. For complex routing, HAProxy is more capable.