Mastering ZFS on FreeBSD: Advanced Guide
This guide goes beyond basic ZFS setup. If you can already create a pool and take a snapshot, this is where you learn the architecture decisions that separate a fragile hobbyist setup from production-grade storage. Every command is real. Every recommendation comes from operational experience.
For the basics, see our ZFS on FreeBSD setup guide. For ZFS vs UFS comparison, see ZFS vs UFS on FreeBSD. For NAS-specific builds, see FreeBSD NAS Build Guide.
Pool Design Strategies
The most important ZFS decision happens before you type a single command: how you structure your vdevs. This decision is permanent -- you cannot change the vdev layout of a pool after creation (with the exception of RAIDZ expansion in OpenZFS 2.3+).
Mirrors vs RAIDZ: The Real Tradeoff
Mirrors (2-way or 3-way):
- Best random read IOPS (reads can come from any disk in the mirror)
- Fastest resilver times (only need to copy data from the surviving mirror partner)
- 50% usable capacity (2-way) or 33% (3-way)
- Can lose N-1 disks in each mirror vdev
RAIDZ1 (single parity):
- Better capacity efficiency (lose only 1 disk per vdev to parity)
- Slower random reads (must read across the full stripe)
- Resilver time depends on the amount of data, not disk size
- Risk window during resilver -- a second disk failure kills the vdev
RAIDZ2 (double parity):
- Can survive 2 disk failures per vdev
- Slightly lower capacity than RAIDZ1
- Standard recommendation for pools with large disks (8TB+)
RAIDZ3 (triple parity):
- Can survive 3 failures per vdev
- Rarely needed outside extreme reliability requirements
Production Recommendations
```sh
# High-performance database server: striped mirrors
zpool create dbpool \
    mirror /dev/da0 /dev/da1 \
    mirror /dev/da2 /dev/da3 \
    mirror /dev/da4 /dev/da5

# General storage server (8+ TB disks): RAIDZ2
zpool create storage \
    raidz2 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5

# Archive / backup (maximize capacity): RAIDZ3
zpool create archive \
    raidz3 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5 /dev/da6 /dev/da7

# High-availability NAS: RAIDZ2 with hot spare
zpool create nas \
    raidz2 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5 \
    spare /dev/da6
```
Vdev Width Guidelines
The number of disks per vdev affects both performance and capacity efficiency:
- Mirrors: Always 2 or 3 disks
- RAIDZ1: 3-5 disks per vdev (do not exceed 5 -- resilver risk is too high with more)
- RAIDZ2: 4-8 disks per vdev
- RAIDZ3: 6-12 disks per vdev
For pools with many disks, use multiple vdevs:
```sh
# 12 disks as 2 RAIDZ2 vdevs (better performance than 1 wide vdev)
zpool create bigpool \
    raidz2 /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4 /dev/da5 \
    raidz2 /dev/da6 /dev/da7 /dev/da8 /dev/da9 /dev/da10 /dev/da11
```
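As a back-of-the-envelope check on the capacity side of these tradeoffs, the usable fraction of a RAIDZ vdev is (disks - parity) / disks. A small helper (the function name is illustrative, not a real tool) makes the comparison concrete:

```sh
# Usable-capacity fraction of a RAIDZ vdev: (disks - parity) / disks.
# raidz_usable <disks-per-vdev> <parity-level>
raidz_usable() {
    awk -v n="$1" -v p="$2" 'BEGIN { printf "%.0f%%\n", (n - p) / n * 100 }'
}

raidz_usable 6 2    # 6-disk RAIDZ2  -> 67% usable
raidz_usable 12 2   # 12-disk RAIDZ2 -> 83% usable, but worse IOPS and resilver
```

The wide vdev wins on raw capacity, which is exactly why the performance and resilver costs are easy to underestimate.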
L2ARC: Read Cache
L2ARC (Level 2 Adaptive Replacement Cache) extends ZFS's in-memory read cache (ARC) to a fast storage device, typically an SSD or NVMe drive.
When L2ARC Helps
- Your working set exceeds available RAM
- You serve many random reads from a dataset larger than RAM
- You have a fast SSD/NVMe available
When L2ARC Does Not Help
- Your working set fits in RAM (ARC handles it)
- Your workload is primarily sequential (streaming reads/writes)
- Your L2ARC device is not significantly faster than your pool disks
Configuration
```sh
# Add an L2ARC device to an existing pool
zpool add tank cache /dev/nvd0

# Verify
zpool status tank
```
Tune L2ARC behavior:
```sh
# Maximum L2ARC write speed (bytes/sec, default ~8 MB/s)
sysctl vfs.zfs.l2arc_write_max=67108864     # 64 MB/s

# Maximum L2ARC write speed during warmup boost (after reboot)
sysctl vfs.zfs.l2arc_write_boost=134217728  # 128 MB/s

# Seconds between L2ARC feed cycles
sysctl vfs.zfs.l2arc_feed_secs=1

# Make persistent
echo 'vfs.zfs.l2arc_write_max=67108864' >> /etc/sysctl.conf
echo 'vfs.zfs.l2arc_write_boost=134217728' >> /etc/sysctl.conf
```
L2ARC Persistence
OpenZFS 2.0+ supports persistent L2ARC -- the cache survives reboots:
```sh
# Check if persistent L2ARC is enabled
sysctl vfs.zfs.l2arc_rebuild_enabled
# 1 = enabled (default on FreeBSD 14.x with OpenZFS 2.2)

# After a reboot, the L2ARC contents are rebuilt from the device
# instead of starting with a cold cache
```
Monitoring L2ARC
```sh
# L2ARC hit rate
sysctl kstat.zfs.misc.arcstats.l2_hits
sysctl kstat.zfs.misc.arcstats.l2_misses

# Detailed L2ARC stats
zpool iostat -v tank 5
```
SLOG: Write Log (ZIL)
The SLOG (Separate Log) device accelerates synchronous writes by providing a fast location for the ZFS Intent Log (ZIL).
When SLOG Helps
- Workloads with heavy synchronous writes (NFS, databases with fsync)
- iSCSI targets
- Applications that use O_SYNC or fsync() frequently
When SLOG Does Not Help
- Asynchronous writes (most file copy operations)
- Workloads that do not call fsync
- Systems where the pool disks are already fast enough
Requirements
The SLOG device must be:
- Fast: Low latency is more important than throughput. Optane or enterprise NVMe.
- Durable: Consumer SSDs without power-loss protection can lose data. Use enterprise-grade SSDs with capacitors.
- Sized appropriately: 5-10 seconds worth of write throughput is enough. For most workloads, 16-32 GB is plenty.
```sh
# Add a SLOG device (mirrored for safety)
zpool add tank log mirror /dev/nvd1 /dev/nvd2

# Verify
zpool status tank
```
Always mirror the SLOG. A failed SLOG device can lose in-flight synchronous writes.
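The sizing rule above (5-10 seconds of sync write throughput) is simple arithmetic. A quick sketch with a hypothetical helper function, using example numbers rather than measurements from a real system:

```sh
# Rough SLOG sizing: sustained sync write rate x seconds of in-flight data.
# slog_size_gb <rate-in-MB/s> <seconds>
slog_size_gb() {
    awk -v rate_mb="$1" -v secs="$2" 'BEGIN { printf "%.1f GB\n", rate_mb * secs / 1024 }'
}

slog_size_gb 500 10   # 500 MB/s of sync writes, 10 s window -> ~4.9 GB
```

Even a very aggressive sync workload lands well under 16 GB, which is why small, low-latency devices are the right choice here.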
Monitoring SLOG
```sh
# ZIL commit statistics
sysctl kstat.zfs.misc.zil

# SLOG I/O
zpool iostat -v tank 5
```
Special VDEV (Metadata)
The special vdev stores metadata, dedup tables, and small file blocks on a fast device:
```sh
# Add a mirrored special vdev
zpool add tank special mirror /dev/nvd3 /dev/nvd4

# Direct small blocks to the special vdev
zfs set special_small_blocks=32768 tank
# Blocks <= 32KB go to the special vdev
```
This dramatically improves metadata-heavy operations (directory listings, finds, tree walks) and file access for small files.
Always mirror the special vdev. Losing an unmirrored special vdev means losing the pool.
ZFS Native Encryption
OpenZFS supports native dataset-level encryption. This is independent of GELI or any disk-level encryption.
Creating an Encrypted Dataset
```sh
# Create an encrypted dataset with a passphrase
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/secure

# Create with a key file
dd if=/dev/random of=/root/zfs-key bs=32 count=1
zfs create -o encryption=aes-256-gcm -o keyformat=raw \
    -o keylocation=file:///root/zfs-key tank/encrypted
```
Key Management
```sh
# Load a key (unlock a dataset after boot)
zfs load-key tank/secure

# Load all keys
zfs load-key -a

# Unload a key (lock the dataset)
zfs unload-key tank/secure

# Change the passphrase
zfs change-key tank/secure

# Change to a key file
zfs change-key -o keyformat=raw -o keylocation=file:///root/newkey tank/secure
```
Automatic Unlock at Boot
For datasets that should be available at boot, store the key file and configure automatic loading:
```sh
# Store the key securely
chmod 600 /root/zfs-key

# In /etc/rc.conf
zfs_enable="YES"

# ZFS will attempt to load keys at boot for datasets with keylocation=file://
```
Encryption Inheritance
Child datasets inherit encryption from their parent:
```sh
# All datasets under tank/secure are encrypted with the same key
zfs create tank/secure/documents
zfs create tank/secure/photos
```
Performance Impact
Encryption has minimal performance impact on modern CPUs with AES-NI:
```sh
# Check if AES-NI is available
dmesg | grep -i aes
# Look for "AESNI" in the features list

# Benchmark encrypted vs unencrypted
# Typical overhead: 2-5% on AES-NI hardware
```
ZFS Delegation
ZFS delegation allows non-root users to manage specific ZFS operations on specific datasets:
```sh
# Allow user 'backup' to take and send snapshots of tank/data
zfs allow backup snapshot,send,hold tank/data

# Allow user 'devops' to create and destroy datasets under tank/vms
zfs allow devops create,destroy,mount,snapshot tank/vms

# View delegated permissions
zfs allow tank/data

# Remove permissions
zfs unallow backup snapshot,send,hold tank/data
```
Useful delegation sets for common roles:
```sh
# Backup operator
zfs allow backupuser snapshot,send,hold,release tank

# Developer (manage datasets under a project)
zfs allow devuser create,destroy,mount,snapshot,rollback,clone,promote tank/projects

# VM administrator
zfs allow vmadmin create,destroy,mount,snapshot,send,receive,volsize tank/vms
```
ZFS Channel Programs
Channel programs allow you to run Lua scripts atomically inside the ZFS kernel module. This is an advanced feature for complex operations that must be atomic.
```sh
# Example: atomically snapshot multiple datasets
cat > /tmp/multi-snap.lua << 'LUA'
args = ...
snap_name = args["argv"][1]
datasets = {"tank/db", "tank/app", "tank/config"}

-- Check every snapshot first so the program fails before taking any of them
for _, ds in ipairs(datasets) do
    zfs.check.snapshot(ds .. "@" .. snap_name)
end

for _, ds in ipairs(datasets) do
    zfs.sync.snapshot(ds .. "@" .. snap_name)
end
LUA

# Run the channel program (positional arguments appear in args["argv"])
zfs program tank /tmp/multi-snap.lua "consistent-$(date +%Y%m%d%H%M%S)"
```
Channel programs are useful for:
- Consistent multi-dataset snapshots
- Conditional operations (snapshot only if a property matches)
- Complex cleanup logic that must be atomic
Production Deployment Patterns
Boot Environment Integration
```sh
# Create a boot environment before system updates
bectl create pre-update-$(date +%Y%m%d)

# List boot environments
bectl list

# Roll back if the update fails
bectl activate pre-update-20260409
reboot
```
Automated Snapshots
```sh
# Install zfs-periodic for automatic snapshots
pkg install zfs-periodic

# Or use a simple cron-based approach
# Add to root's crontab:
crontab -e
```
```sh
# Hourly recursive snapshots
0 * * * * /sbin/zfs snapshot -r tank/data@auto-$(date +\%Y\%m\%d-\%H\%M)

# Daily cleanup: keep the newest 168 hourly snapshots (7 days)
15 0 * * * /sbin/zfs list -H -t snapshot -o name -S creation tank/data | grep '@auto-' | tail -n +169 | xargs -n 1 /sbin/zfs destroy
```
Replication
```sh
# Initial full send to backup server
zfs snapshot -r tank/data@replicate-base
zfs send -Rv tank/data@replicate-base | ssh backup-server zfs receive -Fduv backup/data

# Incremental replication (subsequent runs)
zfs snapshot -r tank/data@replicate-$(date +%Y%m%d%H%M)
zfs send -Rvi tank/data@replicate-base tank/data@replicate-$(date +%Y%m%d%H%M) | \
    ssh backup-server zfs receive -Fduv backup/data
```
Monitoring Pool Health
```sh
# Check pool status
zpool status -v

# Check for errors
zpool status -x
# "all pools are healthy" means no issues

# Scrub regularly (monthly recommended)
zpool scrub tank

# Monitor scrub progress
zpool status tank | grep scan

# Add to periodic.conf for automated scrubs
echo 'daily_scrub_zfs_enable="YES"' >> /etc/periodic.conf
```
ARC Tuning
```sh
# View current ARC size and limits
sysctl kstat.zfs.misc.arcstats.size
sysctl vfs.zfs.arc_max
sysctl vfs.zfs.arc_min

# Set maximum ARC size (e.g., 8 GB on a 16 GB system)
sysctl vfs.zfs.arc_max=8589934592

# Make permanent in /boot/loader.conf
echo 'vfs.zfs.arc_max="8589934592"' >> /boot/loader.conf

# For database servers, reduce ARC to leave RAM for the database cache
# PostgreSQL example: 16 GB total RAM, 10 GB for PostgreSQL, 4 GB for ARC
echo 'vfs.zfs.arc_max="4294967296"' >> /boot/loader.conf
```
Dataset Properties for Production
```sh
# Database dataset
zfs create -o recordsize=8K -o compression=lz4 -o atime=off \
    -o primarycache=metadata -o logbias=throughput tank/db/postgres

# Virtual machine images (volblocksize applies to volumes, not filesystems)
zfs create -o compression=lz4 tank/vms
zfs create -o volblocksize=64K -V 50G tank/vms/vm1-disk

# Log files
zfs create -o recordsize=128K -o compression=zstd -o atime=off tank/logs

# General file storage
zfs create -o compression=lz4 -o atime=off -o copies=2 tank/important-files
```
Troubleshooting
Slow Resilver
```sh
# Increase resilver priority: let resilver I/O run longer per txg
# (default 3000 ms)
sysctl vfs.zfs.resilver_min_time_ms=5000

# Check resilver progress
zpool status tank
```
ARC Thrashing
If ARC hit rate is low:
```sh
# Check ARC stats
sysctl kstat.zfs.misc.arcstats | grep -E "hits|misses|size"

# Calculate hit rate
# hits / (hits + misses) should be > 90%
```
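The hit-rate arithmetic can be wrapped in a small helper. The function name is illustrative; on a live system you would feed it the two counters from sysctl as shown in the comment:

```sh
# ARC hit rate = hits / (hits + misses), as a percentage.
arc_hit_rate() {
    awk -v h="$1" -v m="$2" 'BEGIN { printf "%.1f%%\n", h / (h + m) * 100 }'
}

# On a live system:
# arc_hit_rate "$(sysctl -n kstat.zfs.misc.arcstats.hits)" \
#              "$(sysctl -n kstat.zfs.misc.arcstats.misses)"

arc_hit_rate 950000 50000   # example counters -> 95.0%
```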
Pool Import Issues
```sh
# Force import a pool that was not cleanly exported
zpool import -f tank

# Import a pool from specific device paths
zpool import -d /dev tank

# Import with a new name
zpool import oldname newname
```
FAQ
How many disks should I put in a RAIDZ vdev?
RAIDZ1: 3-5 disks. RAIDZ2: 4-8 disks. RAIDZ3: 6-12 disks. Wider vdevs have better capacity efficiency but worse random I/O performance and longer resilver times. For most production deployments, 2 narrow RAIDZ2 vdevs outperform 1 wide vdev.
Should I use L2ARC?
Only if your working set exceeds RAM and your workload involves random reads. Adding L2ARC to a system where data fits in RAM wastes SSD writes and can actually reduce performance because L2ARC entries consume RAM for indexing.
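The RAM cost of that indexing can be estimated: each block cached in L2ARC keeps a header in ARC (roughly 80 bytes per block is a commonly cited ballpark; the exact figure varies by OpenZFS version). A hypothetical helper sketches the math:

```sh
# Estimated ARC RAM consumed by L2ARC headers, assuming ~80 bytes per
# cached block (a rough ballpark, not an exact OpenZFS constant).
# l2arc_ram_overhead_mb <L2ARC-size-GB> <average-block-size-KB>
l2arc_ram_overhead_mb() {
    awk -v gb="$1" -v blk_kb="$2" 'BEGIN {
        blocks = gb * 1024 * 1024 / blk_kb
        printf "%.0f MB\n", blocks * 80 / 1024 / 1024
    }'
}

l2arc_ram_overhead_mb 512 16   # 512 GB L2ARC of 16K blocks -> ~2560 MB of RAM
```

A large L2ARC full of small blocks can quietly consume gigabytes of the very ARC it was meant to supplement.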
Is ZFS native encryption as secure as GELI?
Yes. ZFS native encryption uses AES-256-GCM, which is well-audited. The advantage over GELI is that encryption is per-dataset, you can send encrypted snapshots without exposing the key, and it does not require an additional disk layer.
Can I convert a RAIDZ pool to mirrors?
Not in place. You need to create a new pool with the desired layout and migrate data. OpenZFS 2.3+ supports RAIDZ expansion (adding disks to a RAIDZ vdev) but not changing the vdev type.
How much SLOG capacity do I need?
A few seconds of write throughput. If your synchronous write rate is 500 MB/s sustained, a 16 GB SLOG handles 32 seconds of writes. In practice, 16-32 GB is more than enough for almost any workload. Latency matters far more than capacity for SLOG devices.
What is the performance overhead of ZFS compression?
LZ4 compression typically adds negligible CPU overhead and often improves performance by reducing I/O. Data that compresses well (text, logs, databases) reads faster with compression because less data moves from disk. Use LZ4 for everything unless you have a specific reason not to. Use ZSTD for archival data where higher compression ratios justify the extra CPU cost.
How do I replace a failed disk in a RAIDZ pool?
```sh
# Identify the failed disk
zpool status tank

# Replace the failed disk
zpool replace tank /dev/da2 /dev/da7

# Monitor resilver progress
zpool status tank
```