Best Text Processing Tools for FreeBSD in 2026
FreeBSD ships with a solid base system, but when it comes to parsing JSON API responses, converting documentation between formats, slicing through gigabytes of log files, or wrangling CSV exports, you need specialized text processing tools. This guide covers the ten best options available through the FreeBSD ports and packages collection, with install commands, practical examples, and head-to-head comparisons so you can pick the right tool for each job.
Whether you are running a production server or a local development workstation, these utilities form the backbone of any serious text processing workflow on FreeBSD. If you are still setting up your environment, start with our FreeBSD VPS setup guide before installing these packages.
1. jq -- JSON Processing on the Command Line
What it does: jq is a lightweight command-line JSON processor. It parses, filters, transforms, and formats JSON data using a concise query language. Think of it as sed for JSON.
Install:
```sh
pkg install jq
```
Quick example:
```sh
# Extract all container names from a Docker-style JSON config
cat containers.json | jq -r '.containers[].name'

# Pretty-print an API response
curl -s https://api.example.com/status | jq '.'

# Filter objects where status is "running"
cat services.json | jq '.services[] | select(.status == "running") | .name'
```
Best for: Parsing REST API responses, processing JSON logs (e.g., from applications using structured logging), extracting fields from configuration files, and building shell pipelines that interact with modern web services.
jq handles nested structures, array operations, string interpolation, and conditional logic. It is indispensable on any FreeBSD server that interacts with APIs or processes JSON-formatted application logs.
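To illustrate the string interpolation and conditional logic mentioned above, here is a small self-contained example (the field names are invented for the demo):

```sh
# Turn a JSON array into readable status lines using string
# interpolation and an if/then/else expression (hypothetical fields)
echo '[{"name":"web","up":true},{"name":"db","up":false}]' \
  | jq -r '.[] | "\(.name): \(if .up then "healthy" else "DOWN" end)"'
# prints:
#   web: healthy
#   db: DOWN
```

The `\(...)` syntax interpolates any jq expression into a string, including full conditionals, which keeps formatting logic out of your shell script.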
2. xmlstarlet -- XML Querying and Transformation
What it does: xmlstarlet is a command-line toolkit for querying, editing, validating, and transforming XML documents. It supports XPath, XSLT, and XML Schema validation -- all from a single binary.
Install:
```sh
pkg install xmlstarlet
```
Quick example:
```sh
# Extract all <title> elements from an XML feed
xmlstarlet sel -t -v "//title" -n feed.xml

# Change the value of an attribute
xmlstarlet ed -u "//server/@port" -v "8443" config.xml

# Validate against an XSD schema
xmlstarlet val -e -s schema.xsd document.xml
```
Best for: Parsing XML configuration files (think Apache, nginx includes, or Java application configs), processing RSS/Atom feeds, SOAP response handling, and automated XML document transformations in CI/CD pipelines.
While JSON has taken over much of the web, XML remains deeply embedded in enterprise systems, build tools, and configuration management. xmlstarlet keeps you from needing a full programming language just to edit an XML attribute.
3. pandoc -- Universal Document Converter
What it does: pandoc converts documents between dozens of markup formats: Markdown, HTML, LaTeX, DOCX, reStructuredText, EPUB, PDF (via LaTeX), man pages, and many more. It is often called the "Swiss Army knife" of document conversion.
Install:
```sh
pkg install pandoc
```
Quick example:
```sh
# Markdown to HTML
pandoc README.md -o README.html

# Markdown to PDF (requires texlive)
pandoc report.md -o report.pdf

# Convert HTML to plain text
pandoc -f html -t plain page.html

# Generate a man page from Markdown
pandoc tool-docs.md -s -t man -o tool.1
```
Best for: Documentation pipelines, static site generation, converting legacy documentation formats, producing PDFs from Markdown, and generating man pages from structured text. If your team writes docs in Markdown but needs to deliver HTML, PDF, and EPUB, pandoc is the single tool that handles all three.
4. ripgrep (rg) -- Blazing-Fast Recursive Search
What it does: ripgrep recursively searches directories for a regex pattern, similar to grep but dramatically faster. It respects .gitignore rules by default, uses smart case sensitivity, and supports a wide range of file encodings.
Install:
```sh
pkg install ripgrep
```
Quick example:
```sh
# Search for a function name across a project
rg "def process_request" /usr/local/www/myapp/

# Search only Python files
rg -t py "import requests"

# Count matches per file
rg -c "ERROR" /var/log/app/

# Show lines before and after matches
rg -B 2 -A 2 "panic" /var/log/messages
```
Best for: Searching large codebases, scanning log directories, finding configuration values across scattered files, and any scenario where standard grep feels slow. On a FreeBSD server with tens of thousands of files, ripgrep consistently finishes searches in a fraction of the time grep takes, thanks to its parallel execution and SIMD-optimized regex engine.
For servers under heavy load, see our FreeBSD performance tuning guide to make sure your I/O subsystem keeps up with ripgrep's throughput.
5. sed and awk -- Classic Stream Processing
What they do: sed (stream editor) performs text transformations on an input stream -- substitutions, deletions, insertions -- using compact one-liners. awk is a pattern-scanning and processing language that excels at column-based data extraction and reporting. Both ship with the FreeBSD base system.
Install:
```sh
# Already included in FreeBSD base -- no install needed.
# GNU versions available if you prefer:
pkg install gsed gawk
```
Quick examples (sed):
```sh
# Replace all occurrences of "http:" with "https:" in a config file
sed -i '' 's/http:/https:/g' config.conf

# Delete blank lines
sed '/^$/d' input.txt

# Extract lines between two markers
sed -n '/BEGIN/,/END/p' logfile.txt
```
Quick examples (awk):
```sh
# Print the 5th column of a space-delimited file
awk '{print $5}' access.log

# Sum values in column 3
awk '{sum += $3} END {print sum}' data.txt

# Print lines where response code is 500
awk '$9 == 500' /var/log/nginx/access.log
```
Best for: Quick in-place edits, extracting columns from structured log files, performing arithmetic on fields, and building text transformation pipelines. sed and awk are the workhorses of Unix text processing, and every FreeBSD administrator should know them well.
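Building on the examples above, awk's field arithmetic extends naturally to aggregates such as averages (the input here is inline sample data, so you can run it anywhere):

```sh
# Average the values in column 2 of whitespace-delimited input
printf '%s\n' 'a 10' 'b 20' 'c 30' \
  | awk '{sum += $2; n++} END {print sum / n}'
# prints: 20
```

The same `END`-block pattern works for minima, maxima, and per-key totals with associative arrays.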
6. csvkit -- CSV Processing Toolkit
What it does: csvkit is a suite of command-line tools for converting to and working with CSV files. It includes utilities for converting Excel/JSON to CSV, querying CSV with SQL, computing statistics, and more.
Install:
```sh
pkg install py311-csvkit
```
Quick example:
```sh
# View CSV in a readable table format
csvlook data.csv

# Query CSV with SQL
csvsql --query "SELECT name, revenue FROM data WHERE revenue > 10000" data.csv

# Get column statistics
csvstat data.csv

# Convert JSON to CSV
in2csv data.json > data.csv

# Cut specific columns
csvcut -c 1,3,5 data.csv
```
Best for: Analyzing CSV exports from databases or applications, quick ad-hoc SQL queries against flat files without loading them into a database, converting between tabular data formats, and generating summary statistics from data dumps. csvkit bridges the gap between raw CSV files and the kind of analysis you would normally need a spreadsheet or database for.
7. yq -- YAML Processing
What it does: yq is a command-line YAML processor that uses a jq-like syntax. It reads, writes, and transforms YAML documents, which makes it essential for working with Kubernetes manifests, Ansible playbooks, CI/CD configs, and any YAML-heavy workflow.
Install:
```sh
pkg install yq
```
Quick example:
```sh
# Extract a nested value
yq '.spec.containers[0].image' deployment.yaml

# Update a field in place
yq -i '.metadata.namespace = "production"' deployment.yaml

# Convert YAML to JSON
yq -o=json '.' config.yaml

# Merge two YAML files
yq eval-all 'select(fileIndex == 0) * select(fileIndex == 1)' base.yaml override.yaml
```
Best for: Editing Kubernetes manifests in CI/CD pipelines, updating Ansible variable files programmatically, extracting values from YAML configs in shell scripts, and converting between YAML and JSON. If you manage infrastructure as code on FreeBSD, yq is non-negotiable.
8. groff -- Text Typesetting and Formatting
What it does: groff (GNU troff) is a typesetting system that reads plain text mixed with formatting commands and produces formatted output for terminal display, PostScript, PDF, or HTML. It is the engine behind man page rendering on FreeBSD.
Install:
```sh
# groff is included in the FreeBSD base system.
# For additional macro packages:
pkg install groff
```
Quick example:
```sh
# Render a man page to terminal
groff -man -Tascii mycommand.1

# Generate a PDF from a troff document
groff -ms -Tpdf document.ms > document.pdf

# Format a man page as HTML
groff -man -Thtml mycommand.1 > mycommand.html
```
Best for: Producing man pages for custom tools, generating formatted reports in PostScript or PDF from structured text, and maintaining documentation in the traditional Unix troff/nroff ecosystem. groff remains the standard for Unix documentation and is how every man page on your FreeBSD system gets rendered.
9. aspell -- Spell Checking from the Command Line
What it does: aspell is an interactive spell checker that can process plain text, HTML, TeX/LaTeX, and other formats. It suggests corrections and can be used in batch mode for automated spell checking in scripts and CI pipelines.
Install:
```sh
pkg install aspell en-aspell
```
Quick example:
```sh
# Interactive spell check
aspell check document.txt

# List misspelled words (batch mode)
cat document.txt | aspell list

# Spell check only comments in a source file
aspell --mode=ccpp check program.c

# Use a custom dictionary
aspell --personal=./project-words.txt check README.md
```
Best for: Spell checking documentation before publishing, validating text content in CI/CD pipelines, batch-processing large numbers of text files, and maintaining consistent terminology across projects with custom dictionaries. aspell supports over 70 languages and can be integrated into Makefiles or pre-commit hooks.
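As one way to wire this into a workflow, a pre-commit hook along these lines can block commits whose staged Markdown files contain unknown words. This is a sketch under the assumption that aspell and an English dictionary are installed; the hook path and file layout are up to your repository:

```sh
#!/bin/sh
# Hypothetical .git/hooks/pre-commit: reject commits whose staged
# Markdown files contain words aspell does not recognize.
fail=0
for f in $(git diff --cached --name-only --diff-filter=ACM | grep '\.md$'); do
    typos=$(aspell list --lang=en < "$f" | sort -u)
    if [ -n "$typos" ]; then
        echo "Spelling issues in $f:" >&2
        echo "$typos" | sed 's/^/  /' >&2
        fail=1
    fi
done
exit $fail
```

Install it as `.git/hooks/pre-commit` and mark it executable; project-specific terms can be whitelisted with `--personal` as shown above.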
10. diffutils -- Comparing Files and Directories
What it does: diffutils provides commands for comparing files line by line (diff), producing side-by-side comparisons (sdiff), combining diffs (diff3), and applying patches (patch). These are foundational tools for tracking changes and managing patches.
Install:
```sh
# diff and patch are in the FreeBSD base system.
# GNU diffutils available for additional features:
pkg install diffutils
```
Quick example:
```sh
# Standard unified diff
diff -u original.conf modified.conf

# Side-by-side comparison
sdiff original.conf modified.conf

# Generate a patch file
diff -u old/ new/ > changes.patch

# Apply a patch
patch < changes.patch

# Three-way merge
diff3 mine.txt base.txt theirs.txt
```
Best for: Reviewing configuration changes before deploying, generating and applying patch files, three-way merging during conflict resolution, and auditing what changed between two versions of a file or directory tree. Every sysadmin who manages FreeBSD servers by hand (rather than through automation) uses diff daily.
Comparison Table
| Tool | Format | Install Size | Speed | Learning Curve | Base System |
|------------|--------------|--------------|---------|----------------|-------------|
| jq | JSON | ~1 MB | Fast | Moderate | No |
| xmlstarlet | XML | ~2 MB | Fast | Moderate | No |
| pandoc | Multi-format | ~60 MB | Moderate | Low-Moderate | No |
| ripgrep | Plain text | ~5 MB | Very fast | Low | No |
| sed | Plain text | Minimal | Fast | Moderate | Yes |
| awk | Plain text | Minimal | Fast | Moderate | Yes |
| csvkit | CSV | ~15 MB | Moderate | Low | No |
| yq | YAML | ~5 MB | Fast | Low | No |
| groff      | Troff markup | ~15 MB       | Fast    | High           | Yes         |
| aspell | Plain text | ~5 MB | Fast | Low | No |
| diffutils | Plain text | Minimal | Fast | Low | Yes |
Practical Section: Log Processing Pipelines
The real power of these tools shows up when you chain them together. Here are six real-world pipelines for FreeBSD server administration.
Pipeline 1: Extract Error Counts from JSON Logs
Many modern applications emit structured JSON logs. This pipeline extracts error-level entries and counts them by module:
```sh
# Parse JSON logs, filter errors, count by module
cat /var/log/app/app.json.log \
  | jq -r 'select(.level == "error") | .module' \
  | sort | uniq -c | sort -rn | head -20
```
Pipeline 2: Find and Summarize Slow HTTP Requests
Parse an nginx access log to find all requests that took over 2 seconds and produce a summary:
```sh
# Extract slow requests from nginx access log
awk '$NF > 2.0 {print $7, $NF}' /var/log/nginx/access.log \
  | sort -k2 -rn \
  | head -50 \
  | awk '{printf "%-60s %6.2fs\n", $1, $2}'
```
Pipeline 3: Convert XML Config Dump to CSV Report
Pull data from an XML configuration export and convert it to a CSV for a spreadsheet:
```sh
# Extract server names and ports from XML, output as CSV
# (a header line is prepended with echo, since GNU-style "sed 1i\text"
# does not work with BSD sed)
xmlstarlet sel -t -m "//server" \
    -v "concat(@name, ',', @port, ',', @status)" -n servers.xml \
  | { echo "name,port,status"; cat; } \
  | csvlook
```
Pipeline 4: Multi-Format Documentation Build
Convert a Markdown document to HTML, PDF, and a man page in one shot:
```sh
# Build docs in three formats from a single Markdown source
SRC="docs/tool-guide.md"
pandoc "$SRC" -o docs/tool-guide.html --standalone
pandoc "$SRC" -o docs/tool-guide.pdf
pandoc "$SRC" -s -t man -o docs/tool-guide.1
```
Pipeline 5: YAML Config Audit Across Multiple Files
Check all Kubernetes deployment files for containers running as root:
```sh
# Find deployments with runAsUser: 0 (root)
# (the loop variable is named run_as to avoid clobbering $USER)
for f in $(rg -l "runAsUser" /usr/local/etc/k8s/); do
    run_as=$(yq '.spec.template.spec.securityContext.runAsUser' "$f")
    if [ "$run_as" = "0" ]; then
        echo "WARNING: $f runs as root"
    fi
done
```
Pipeline 6: Diff Config Changes and Spell-Check Release Notes
A release pipeline that audits config changes and validates docs:
```sh
# Generate a diff report of config changes
diff -u /etc/base-config/ /etc/current-config/ > /tmp/config-changes.diff

# Spell check the release notes, output misspelled words
cat RELEASE_NOTES.md | aspell list --lang=en | sort -u > /tmp/typos.txt

# Report
echo "Config changes: $(grep -c '^[+-]' /tmp/config-changes.diff) lines"
echo "Spelling issues: $(wc -l < /tmp/typos.txt) words"
```
How to Choose the Right Tool
The decision tree is straightforward:
- JSON data? Use jq. No contest.
- XML data? Use xmlstarlet. Avoid writing Python scripts for simple XPath queries.
- YAML data? Use yq. Same logic as jq, different format.
- CSV data? Use csvkit. The SQL query feature alone justifies the install.
- Searching files? Use ripgrep for speed. Fall back to grep for simple one-offs where it is already in your muscle memory.
- Stream editing? Use sed for substitutions and line operations. Use awk when you need column extraction or arithmetic.
- Document conversion? Use pandoc. If you are writing anything that needs to exist in more than one format, pandoc saves hours.
- Spell checking? Use aspell. Integrate it into your CI pipeline for documentation quality.
- Comparing files? Use diff/sdiff. They are already on your system.
- Typesetting? Use groff if you are writing man pages or need traditional Unix document formatting.
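To make the diff/patch item above concrete, here is a minimal round trip using only base-system tools (the file names are made up):

```sh
# Create two versions of a config, capture the change as a patch,
# then bring the old file up to date by applying it
printf 'port=80\n'   > old.conf
printf 'port=8443\n' > new.conf
diff -u old.conf new.conf > change.patch || true  # diff exits 1 when files differ
patch old.conf < change.patch                     # old.conf now equals new.conf
```

The `|| true` matters in scripts run with `set -e`: a non-empty diff exits with status 1, which would otherwise abort the script.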
Frequently Asked Questions
What text processing tools come pre-installed on FreeBSD?
FreeBSD's base system includes sed, awk, grep, diff, patch, sort, uniq, cut, tr, wc, head, tail, and groff. These cover basic text processing needs without installing any packages. For format-specific processing (JSON, XML, YAML, CSV), you need to install additional tools from the ports collection.
Is ripgrep really faster than grep on FreeBSD?
Yes, significantly. In benchmarks on typical FreeBSD servers, ripgrep searches large directory trees 5-10x faster than GNU grep. The speed advantage comes from parallel directory traversal, SIMD-accelerated regex matching, and automatic filtering of binary and ignored files. For searching a handful of small files, the difference is negligible. For scanning /var/log or a large codebase, ripgrep is the clear winner.
Can I use jq to edit JSON files in place?
jq does not support in-place editing natively. The standard pattern is to write to a temporary file and then move it:
```sh
jq '.settings.debug = false' config.json > config.json.tmp && mv config.json.tmp config.json
```
Alternatively, use sponge from the moreutils package:
```sh
pkg install moreutils
jq '.settings.debug = false' config.json | sponge config.json
```
How do I process both JSON and YAML in the same pipeline?
Use yq to convert YAML to JSON, then pipe to jq for processing:
```sh
yq -o=json '.' config.yaml | jq '.database.host'
```
Or convert JSON to YAML:
```sh
jq '.' data.json | yq -P
```
This approach lets you standardize on jq's powerful query language regardless of the input format.
What is the best way to handle large log files on FreeBSD?
For large log files (multiple gigabytes), combine these strategies:
- Use ripgrep to narrow down to relevant lines first -- it handles large files efficiently.
- Pipe through awk or sed for field extraction and transformation.
- Use sort and uniq for aggregation.
- Avoid loading the entire file into memory -- all of these tools operate as stream processors.
A typical pipeline for a 10 GB access log:
shrg "500|502|503" /var/log/nginx/access.log \ | awk '{print $1}' \ | sort | uniq -c | sort -rn | head -20
This pipeline finds all server error responses, extracts client IPs, and ranks them by frequency -- all without loading the full file into memory.
Can pandoc replace a full word processor for technical documentation?
For technical documentation, largely yes. pandoc handles Markdown-to-PDF with LaTeX quality, supports bibliographies, cross-references, code highlighting, and custom templates. It will not replace a word processor for complex layouts with precise visual positioning, but for technical reports, API docs, and manuals, it is more efficient and version-control friendly than any GUI tool.
Conclusion
FreeBSD's packaging system gives you access to a complete text processing toolkit. Start with the base system tools -- sed, awk, grep, and diff -- for everyday tasks. Add jq, yq, and xmlstarlet when you work with structured data formats. Install ripgrep the moment you find yourself searching through anything larger than a single directory. Bring in pandoc when documentation needs to ship in multiple formats, and csvkit when someone hands you a CSV file and expects answers.
The real leverage comes from combining these tools in pipelines. A single line chaining ripgrep, awk, sort, and uniq can replace a 50-line Python script, run faster, and need zero dependencies beyond what is already on your FreeBSD system. Master the pipeline patterns in this guide and you will handle any text processing task your servers throw at you.