Mastering ZFS on FreeBSD: Advanced Guide
This guide goes beyond basic ZFS setup. If you can already create a pool and take a snapshot, this is where you learn the architecture decisions that separate a fragile hobbyist setup from production-grade storage. Every command is real. Every recommendation comes from operational experience.
For the basics, see our ZFS on FreeBSD setup guide. For ZFS vs UFS comparison, see ZFS vs UFS on FreeBSD. For NAS-specific builds, see FreeBSD NAS Build Guide.
Pool Design Strategies
The most important ZFS decision happens before you type a single command: how you structure your vdevs. This decision is permanent -- you cannot change the vdev layout of a pool after creation (with the exception of RAIDZ expansion in OpenZFS 2.3+).
Mirrors vs RAIDZ: The Real Tradeoff
Mirrors (2-way or 3-way):
- Best random read IOPS (reads can come from any disk in the mirror)
- Fastest resilver times (only need to copy data from the surviving mirror partner)
- 50% usable capacity (2-way) or 33% (3-way)
- Can lose N-1 disks in each mirror vdev
RAIDZ1 (single parity):
- Better capacity efficiency (lose only 1 disk per vdev to parity)
- Slower random reads (must read across the full stripe)
- Resilver time depends on the amount of data, not disk size
- Risk window during resilver -- a second disk failure kills the vdev
RAIDZ2 (double parity):
- Can survive 2 disk failures per vdev
- Slightly lower capacity than RAIDZ1
- Standard recommendation for pools with large disks (8TB+)
RAIDZ3 (triple parity):
- Can survive 3 failures per vdev
- Rarely needed outside extreme reliability requirements
Production Recommendations
```sh
# High-performance database server: striped mirrors
zpool create dbpool \
    mirror /dev/da0 /dev/da1 \
    mirror /dev/da2 /dev/da3 \
    mirror /dev/da4 /dev/da5

# General storage server (8+ TB disks): RAIDZ2
zpool create storage \
    raidz2 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5

# Archive / backup (maximize capacity): RAIDZ3
zpool create archive \
    raidz3 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5 /dev/da6 /dev/da7

# High-availability NAS: RAIDZ2 with hot spare
zpool create nas \
    raidz2 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5 \
    spare /dev/da6
```
Vdev Width Guidelines
The number of disks per vdev affects both performance and capacity efficiency:
- Mirrors: Always 2 or 3 disks
- RAIDZ1: 3-5 disks per vdev (do not exceed 5 -- resilver risk is too high with more)
- RAIDZ2: 4-8 disks per vdev
- RAIDZ3: 6-12 disks per vdev
For pools with many disks, use multiple vdevs:
```sh
# 12 disks as 2 RAIDZ2 vdevs (better performance than 1 wide vdev)
zpool create bigpool \
    raidz2 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5 \
    raidz2 /dev/da6 /dev/da7 /dev/da8 /dev/da9 /dev/da10 /dev/da11
```
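As a back-of-the-envelope check on the capacity side of these tradeoffs, the usable fraction of a RAIDZ vdev is (disks - parity) / disks. A small helper (the function name is illustrative, not a real tool) makes the comparison concrete:

```sh
# Usable-capacity fraction of a RAIDZ vdev: (disks - parity) / disks.
# raidz_usable <disks-per-vdev> <parity-level>
raidz_usable() {
    awk -v n="$1" -v p="$2" 'BEGIN { printf "%.0f%%\n", (n - p) / n * 100 }'
}

raidz_usable 6 2    # 6-disk RAIDZ2  -> 67% usable
raidz_usable 12 2   # 12-disk RAIDZ2 -> 83% usable, but worse IOPS and resilver
```

The wide vdev wins on raw capacity, which is exactly why the performance and resilver costs are easy to underestimate.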
L2ARC: Read Cache
L2ARC (Level 2 Adaptive Replacement Cache) extends ZFS's in-memory read cache (ARC) to a fast storage device, typically an SSD or NVMe drive.
When L2ARC Helps
- Your working set exceeds available RAM
- You serve many random reads from a dataset larger than RAM
- You have a fast SSD/NVMe available
When L2ARC Does Not Help
- Your working set fits in RAM (ARC handles it)
- Your workload is primarily sequential (streaming reads/writes)
- Your L2ARC device is not significantly faster than your pool disks
Configuration
```sh
# Add an L2ARC device to an existing pool
zpool add tank cache /dev/nvd0

# Verify
zpool status tank
```
Tune L2ARC behavior:
```sh
# Maximum L2ARC write speed (bytes/sec, default ~8 MB/s)
sysctl vfs.zfs.l2arc_write_max=67108864     # 64 MB/s

# Maximum L2ARC write speed during warmup boost (after reboot)
sysctl vfs.zfs.l2arc_write_boost=134217728  # 128 MB/s

# Seconds between L2ARC feed cycles
sysctl vfs.zfs.l2arc_feed_secs=1

# Make persistent
echo 'vfs.zfs.l2arc_write_max=67108864' >> /etc/sysctl.conf
echo 'vfs.zfs.l2arc_write_boost=134217728' >> /etc/sysctl.conf
```
L2ARC Persistence
OpenZFS 2.0+ supports persistent L2ARC -- the cache survives reboots:
```sh
# Check if persistent L2ARC is enabled
sysctl vfs.zfs.l2arc_rebuild_enabled
# 1 = enabled (default on FreeBSD 14.x with OpenZFS 2.2)

# After a reboot, the L2ARC contents are rebuilt from the device
# instead of starting with a cold cache
```
Monitoring L2ARC
```sh
# L2ARC hit rate
sysctl kstat.zfs.misc.arcstats.l2_hits
sysctl kstat.zfs.misc.arcstats.l2_misses

# Detailed L2ARC stats
zpool iostat -v tank 5
```
SLOG: Write Log (ZIL)
The SLOG (Separate Log) device accelerates synchronous writes by providing a fast location for the ZFS Intent Log (ZIL).
When SLOG Helps
- Workloads with heavy synchronous writes (NFS, databases with fsync)
- iSCSI targets
- Applications that use O_SYNC or fsync() frequently
When SLOG Does Not Help
- Asynchronous writes (most file copy operations)
- Workloads that do not call fsync
- Systems where the pool disks are already fast enough
Requirements
The SLOG device must be:
- Fast: Low latency is more important than throughput. Optane or enterprise NVMe.
- Durable: Consumer SSDs without power-loss protection can lose data. Use enterprise-grade SSDs with capacitors.
- Sized appropriately: 5-10 seconds worth of write throughput is enough. For most workloads, 16-32 GB is plenty.
```sh
# Add a SLOG device (mirrored for safety)
zpool add tank log mirror /dev/nvd1 /dev/nvd2

# Verify
zpool status tank
```
Always mirror the SLOG. A failed SLOG device can lose in-flight synchronous writes.
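The sizing rule above (5-10 seconds of sync write throughput) is simple arithmetic. A quick sketch with a hypothetical helper function, using example numbers rather than measurements from a real system:

```sh
# Rough SLOG sizing: sustained sync write rate x seconds of in-flight data.
# slog_size_gb <rate-in-MB/s> <seconds>
slog_size_gb() {
    awk -v rate_mb="$1" -v secs="$2" 'BEGIN { printf "%.1f GB\n", rate_mb * secs / 1024 }'
}

slog_size_gb 500 10   # 500 MB/s of sync writes, 10 s window -> ~4.9 GB
```

Even a very aggressive sync workload lands well under 16 GB, which is why small, low-latency devices are the right choice here.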
Monitoring SLOG
```sh
# ZIL commit statistics
sysctl kstat.zfs.misc.zil

# SLOG I/O
zpool iostat -v tank 5
```
Special VDEV (Metadata)
The special vdev stores metadata, dedup tables, and small file blocks on a fast device:
```sh
# Add a mirrored special vdev
zpool add tank special mirror /dev/nvd3 /dev/nvd4

# Direct small blocks to the special vdev
zfs set special_small_blocks=32768 tank
# Blocks <= 32KB go to the special vdev
```
This dramatically improves metadata-heavy operations (directory listings, finds, tree walks) and file access for small files.
Always mirror the special vdev. Losing an unmirrored special vdev means losing the pool.
ZFS Native Encryption
OpenZFS supports native dataset-level encryption. This is independent of GELI or any disk-level encryption.
Creating an Encrypted Dataset
```sh
# Create an encrypted dataset with a passphrase
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/secure

# Create with a key file
dd if=/dev/random of=/root/zfs-key bs=32 count=1
zfs create -o encryption=aes-256-gcm -o keyformat=raw \
    -o keylocation=file:///root/zfs-key tank/encrypted
```
Key Management
```sh
# Load a key (unlock a dataset after boot)
zfs load-key tank/secure

# Load all keys
zfs load-key -a

# Unload a key (lock the dataset)
zfs unload-key tank/secure

# Change the passphrase
zfs change-key tank/secure

# Change to a key file
zfs change-key -o keyformat=raw -o keylocation=file:///root/newkey tank/secure
```
Automatic Unlock at Boot
For datasets that should be available at boot, store the key file and configure automatic loading:
```sh
# Store the key securely
chmod 600 /root/zfs-key

# In /etc/rc.conf
zfs_enable="YES"

# ZFS will attempt to load keys at boot for datasets with keylocation=file://
```
Encryption Inheritance
Child datasets inherit encryption from their parent:
```sh
# All datasets under tank/secure are encrypted with the same key
zfs create tank/secure/documents
zfs create tank/secure/photos
```
Performance Impact
Encryption has minimal performance impact on modern CPUs with AES-NI:
```sh
# Check if AES-NI is available
dmesg | grep -i aes
# Look for "AESNI" in the features list

# Benchmark encrypted vs unencrypted
# Typical overhead: 2-5% on AES-NI hardware
```
ZFS Delegation
ZFS delegation allows non-root users to manage specific ZFS operations on specific datasets:
```sh
# Allow user 'backup' to take and send snapshots of tank/data
zfs allow backup snapshot,send,hold tank/data

# Allow user 'devops' to create and destroy datasets under tank/vms
zfs allow devops create,destroy,mount,snapshot tank/vms

# View delegated permissions
zfs allow tank/data

# Remove permissions
zfs unallow backup snapshot,send,hold tank/data
```
Useful delegation sets for common roles:
```sh
# Backup operator
zfs allow backupuser snapshot,send,hold,release tank

# Developer (manage datasets under a project)
zfs allow devuser create,destroy,mount,snapshot,rollback,clone,promote tank/projects

# VM administrator
zfs allow vmadmin create,destroy,mount,snapshot,send,receive,volsize tank/vms
```
ZFS Channel Programs
Channel programs allow you to run Lua scripts atomically inside the ZFS kernel module. This is an advanced feature for complex operations that must be atomic.
```sh
# Example: atomically snapshot multiple datasets
cat > /tmp/multi-snap.lua << 'LUA'
args = ...
snap_name = args["argv"][1]
datasets = {"tank/db", "tank/app", "tank/config"}

-- Check every snapshot first so the program fails before taking any of them
for _, ds in ipairs(datasets) do
    zfs.check.snapshot(ds .. "@" .. snap_name)
end

for _, ds in ipairs(datasets) do
    zfs.sync.snapshot(ds .. "@" .. snap_name)
end
LUA

# Run the channel program (positional arguments appear in args["argv"])
zfs program tank /tmp/multi-snap.lua "consistent-$(date +%Y%m%d%H%M%S)"
```
Channel programs are useful for:
- Consistent multi-dataset snapshots
- Conditional operations (snapshot only if a property matches)
- Complex cleanup logic that must be atomic
Production Deployment Patterns
Boot Environment Integration
```sh
# Create a boot environment before system updates
bectl create pre-update-$(date +%Y%m%d)

# List boot environments
bectl list

# Roll back if the update fails
bectl activate pre-update-20260409
reboot
```
Automated Snapshots
```sh
# Install zfs-periodic for automatic snapshots
pkg install zfs-periodic

# Or use a simple cron-based approach
# Add to root's crontab:
crontab -e
```
```sh
# Hourly recursive snapshots
0 * * * * /sbin/zfs snapshot -r tank/data@auto-$(date +\%Y\%m\%d-\%H\%M)

# Daily cleanup: keep the newest 168 hourly snapshots (7 days)
15 0 * * * /sbin/zfs list -H -t snapshot -o name -S creation tank/data | grep '@auto-' | tail -n +169 | xargs -n 1 /sbin/zfs destroy
```
Replication
```sh
# Initial full send to backup server
zfs snapshot -r tank/data@replicate-base
zfs send -Rv tank/data@replicate-base | ssh backup-server zfs receive -Fduv backup/data

# Incremental replication (subsequent runs)
zfs snapshot -r tank/data@replicate-$(date +%Y%m%d%H%M)
zfs send -Rvi tank/data@replicate-base tank/data@replicate-$(date +%Y%m%d%H%M) | \
    ssh backup-server zfs receive -Fduv backup/data
```
Monitoring Pool Health
```sh
# Check pool status
zpool status -v

# Check for errors
zpool status -x
# "all pools are healthy" means no issues

# Scrub regularly (monthly recommended)
zpool scrub tank

# Monitor scrub progress
zpool status tank | grep scan

# Add to periodic.conf for automated scrubs
echo 'daily_scrub_zfs_enable="YES"' >> /etc/periodic.conf
```
ARC Tuning
```sh
# View current ARC size and limits
sysctl kstat.zfs.misc.arcstats.size
sysctl vfs.zfs.arc_max
sysctl vfs.zfs.arc_min

# Set maximum ARC size (e.g., 8 GB on a 16 GB system)
sysctl vfs.zfs.arc_max=8589934592

# Make permanent in /boot/loader.conf
echo 'vfs.zfs.arc_max="8589934592"' >> /boot/loader.conf

# For database servers, reduce ARC to leave RAM for the database cache
# PostgreSQL example: 16 GB total RAM, 10 GB for PostgreSQL, 4 GB for ARC
echo 'vfs.zfs.arc_max="4294967296"' >> /boot/loader.conf
```
Dataset Properties for Production
```sh
# Database dataset
zfs create -o recordsize=8K -o compression=lz4 -o atime=off \
    -o primarycache=metadata -o logbias=throughput tank/db/postgres

# Virtual machine images (volblocksize applies to volumes, not filesystems)
zfs create -o compression=lz4 tank/vms
zfs create -o volblocksize=64K -V 50G tank/vms/vm1-disk

# Log files
zfs create -o recordsize=128K -o compression=zstd -o atime=off tank/logs

# General file storage
zfs create -o compression=lz4 -o atime=off -o copies=2 tank/important-files
```
Troubleshooting
Slow Resilver
```sh
# Increase resilver priority: let resilver I/O run longer per txg
# (default 3000 ms)
sysctl vfs.zfs.resilver_min_time_ms=5000

# Check resilver progress
zpool status tank
```
ARC Thrashing
If ARC hit rate is low:
```sh
# Check ARC stats
sysctl kstat.zfs.misc.arcstats | grep -E "hits|misses|size"

# Calculate hit rate
# hits / (hits + misses) should be > 90%
```
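The hit-rate arithmetic can be wrapped in a small helper. The function name is illustrative; on a live system you would feed it the two counters from sysctl as shown in the comment:

```sh
# ARC hit rate = hits / (hits + misses), as a percentage.
arc_hit_rate() {
    awk -v h="$1" -v m="$2" 'BEGIN { printf "%.1f%%\n", h / (h + m) * 100 }'
}

# On a live system:
# arc_hit_rate "$(sysctl -n kstat.zfs.misc.arcstats.hits)" \
#              "$(sysctl -n kstat.zfs.misc.arcstats.misses)"

arc_hit_rate 950000 50000   # example counters -> 95.0%
```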
Pool Import Issues
```sh
# Force import a pool that was not cleanly exported
zpool import -f tank

# Import a pool from specific device paths
zpool import -d /dev tank

# Import with a new name
zpool import oldname newname
```
FAQ
How many disks should I put in a RAIDZ vdev?
RAIDZ1: 3-5 disks. RAIDZ2: 4-8 disks. RAIDZ3: 6-12 disks. Wider vdevs have better capacity efficiency but worse random I/O performance and longer resilver times. For most production deployments, 2 narrow RAIDZ2 vdevs outperform 1 wide vdev.
Should I use L2ARC?
Only if your working set exceeds RAM and your workload involves random reads. Adding L2ARC to a system where data fits in RAM wastes SSD writes and can actually reduce performance because L2ARC entries consume RAM for indexing.
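The RAM cost of that indexing can be estimated: each block cached in L2ARC keeps a header in ARC (roughly 80 bytes per block is a commonly cited ballpark; the exact figure varies by OpenZFS version). A hypothetical helper sketches the math:

```sh
# Estimated ARC RAM consumed by L2ARC headers, assuming ~80 bytes per
# cached block (a rough ballpark, not an exact OpenZFS constant).
# l2arc_ram_overhead_mb <L2ARC-size-GB> <average-block-size-KB>
l2arc_ram_overhead_mb() {
    awk -v gb="$1" -v blk_kb="$2" 'BEGIN {
        blocks = gb * 1024 * 1024 / blk_kb
        printf "%.0f MB\n", blocks * 80 / 1024 / 1024
    }'
}

l2arc_ram_overhead_mb 512 16   # 512 GB L2ARC of 16K blocks -> ~2560 MB of RAM
```

A large L2ARC full of small blocks can quietly consume gigabytes of the very ARC it was meant to supplement.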
Is ZFS native encryption as secure as GELI?
Yes. ZFS native encryption uses AES-256-GCM, which is well-audited. The advantage over GELI is that encryption is per-dataset, you can send encrypted snapshots without exposing the key, and it does not require an additional disk layer.
Can I convert a RAIDZ pool to mirrors?
Not in place. You need to create a new pool with the desired layout and migrate data. OpenZFS 2.3+ supports RAIDZ expansion (adding disks to a RAIDZ vdev) but not changing the vdev type.
How much SLOG capacity do I need?
A few seconds of write throughput. If your synchronous write rate is 500 MB/s sustained, a 16 GB SLOG handles 32 seconds of writes. In practice, 16-32 GB is more than enough for almost any workload. Latency matters far more than capacity for SLOG devices.
What is the performance overhead of ZFS compression?
LZ4 compression typically adds negligible CPU overhead and often improves performance by reducing I/O. Data that compresses well (text, logs, databases) reads faster with compression because less data moves from disk. Use LZ4 for everything unless you have a specific reason not to. Use ZSTD for archival data where higher compression ratios justify the extra CPU cost.
How do I replace a failed disk in a RAIDZ pool?
```sh
# Identify the failed disk
zpool status tank

# Replace the failed disk
zpool replace tank /dev/da2 /dev/da7

# Monitor resilver progress
zpool status tank
```