Best Text Processing Tools for FreeBSD in 2026
FreeBSD ships with a solid base system, but when it comes to parsing JSON API responses, converting documentation between formats, slicing through gigabytes of log files, or wrangling CSV exports, you need specialized text processing tools. This guide covers the ten best options available through the FreeBSD ports and packages collection, with install commands, practical examples, and head-to-head comparisons so you can pick the right tool for each job.
Whether you are running a production server or a local development workstation, these utilities form the backbone of any serious text processing workflow on FreeBSD. If you are still setting up your environment, start with our FreeBSD VPS setup guide before installing these packages.
1. jq -- JSON Processing on the Command Line
What it does: jq is a lightweight command-line JSON processor. It parses, filters, transforms, and formats JSON data using a concise query language. Think of it as sed for JSON.
Install:
```sh
pkg install jq
```
Quick example:
```sh
# Extract all container names from a Docker-style JSON config
cat containers.json | jq -r '.containers[].name'

# Pretty-print an API response
curl -s https://api.example.com/status | jq '.'

# Filter objects where status is "running"
cat services.json | jq '.services[] | select(.status == "running") | .name'
```
Best for: Parsing REST API responses, processing JSON logs (e.g., from applications using structured logging), extracting fields from configuration files, and building shell pipelines that interact with modern web services.
jq handles nested structures, array operations, string interpolation, and conditional logic. It is indispensable on any FreeBSD server that interacts with APIs or processes JSON-formatted application logs.
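To illustrate the string interpolation and conditional logic mentioned above, here is a small self-contained example (the field names are invented for the demo):

```sh
# Turn a JSON array into readable status lines using string
# interpolation and an if/then/else expression (hypothetical fields)
echo '[{"name":"web","up":true},{"name":"db","up":false}]' \
  | jq -r '.[] | "\(.name): \(if .up then "healthy" else "DOWN" end)"'
# prints:
#   web: healthy
#   db: DOWN
```

The `\(...)` syntax interpolates any jq expression into a string, including full conditionals, which keeps formatting logic out of your shell script.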
2. xmlstarlet -- XML Querying and Transformation
What it does: xmlstarlet is a command-line toolkit for querying, editing, validating, and transforming XML documents. It supports XPath, XSLT, and XML Schema validation -- all from a single binary.
Install:
```sh
pkg install xmlstarlet
```
Quick example:
```sh
# Extract all <title> elements from an XML feed
xmlstarlet sel -t -v "//title" -n feed.xml

# Change the value of an attribute
xmlstarlet ed -u "//server/@port" -v "8443" config.xml

# Validate against an XSD schema
xmlstarlet val -e -s schema.xsd document.xml
```
Best for: Parsing XML configuration files (think Apache, nginx includes, or Java application configs), processing RSS/Atom feeds, SOAP response handling, and automated XML document transformations in CI/CD pipelines.
While JSON has taken over much of the web, XML remains deeply embedded in enterprise systems, build tools, and configuration management. xmlstarlet keeps you from needing a full programming language just to edit an XML attribute.
3. pandoc -- Universal Document Converter
What it does: pandoc converts documents between dozens of markup formats: Markdown, HTML, LaTeX, DOCX, reStructuredText, EPUB, PDF (via LaTeX), man pages, and many more. It is often called the "Swiss Army knife" of document conversion.
Install:
```sh
pkg install pandoc
```
Quick example:
```sh
# Markdown to HTML
pandoc README.md -o README.html

# Markdown to PDF (requires texlive)
pandoc report.md -o report.pdf

# Convert HTML to plain text
pandoc -f html -t plain page.html

# Generate a man page from Markdown
pandoc tool-docs.md -s -t man -o tool.1
```
Best for: Documentation pipelines, static site generation, converting legacy documentation formats, producing PDFs from Markdown, and generating man pages from structured text. If your team writes docs in Markdown but needs to deliver HTML, PDF, and EPUB, pandoc is the single tool that handles all three.
4. ripgrep (rg) -- Blazing-Fast Recursive Search
What it does: ripgrep recursively searches directories for a regex pattern, similar to grep but dramatically faster. It respects .gitignore rules by default, uses smart case sensitivity, and supports a wide range of file encodings.
Install:
```sh
pkg install ripgrep
```
Quick example:
```sh
# Search for a function name across a project
rg "def process_request" /usr/local/www/myapp/

# Search only Python files
rg -t py "import requests"

# Count matches per file
rg -c "ERROR" /var/log/app/

# Show lines before and after matches
rg -B 2 -A 2 "panic" /var/log/messages
```
Best for: Searching large codebases, scanning log directories, finding configuration values across scattered files, and any scenario where standard grep feels slow. On a FreeBSD server with tens of thousands of files, ripgrep consistently finishes searches in a fraction of the time grep takes, thanks to its parallel execution and SIMD-optimized regex engine.
For servers under heavy load, see our FreeBSD performance tuning guide to make sure your I/O subsystem keeps up with ripgrep's throughput.
5. sed and awk -- Classic Stream Processing
What they do: sed (stream editor) performs text transformations on an input stream -- substitutions, deletions, insertions -- using compact one-liners. awk is a pattern-scanning and processing language that excels at column-based data extraction and reporting. Both ship with the FreeBSD base system.
Install:
```sh
# Already included in FreeBSD base -- no install needed.
# GNU versions available if you prefer:
pkg install gsed gawk
```
Quick examples (sed):
```sh
# Replace all occurrences of "http:" with "https:" in a config file
sed -i '' 's/http:/https:/g' config.conf

# Delete blank lines
sed '/^$/d' input.txt

# Extract lines between two markers
sed -n '/BEGIN/,/END/p' logfile.txt
```
Quick examples (awk):
```sh
# Print the 5th column of a space-delimited file
awk '{print $5}' access.log

# Sum values in column 3
awk '{sum += $3} END {print sum}' data.txt

# Print lines where response code is 500
awk '$9 == 500' /var/log/nginx/access.log
```
Best for: Quick in-place edits, extracting columns from structured log files, performing arithmetic on fields, and building text transformation pipelines. sed and awk are the workhorses of Unix text processing, and every FreeBSD administrator should know them well.
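Building on the examples above, awk's field arithmetic extends naturally to aggregates such as averages (the input here is inline sample data, so you can run it anywhere):

```sh
# Average the values in column 2 of whitespace-delimited input
printf '%s\n' 'a 10' 'b 20' 'c 30' \
  | awk '{sum += $2; n++} END {print sum / n}'
# prints: 20
```

The same `END`-block pattern works for minima, maxima, and per-key totals with associative arrays.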
6. csvkit -- CSV Processing Toolkit
What it does: csvkit is a suite of command-line tools for converting to and working with CSV files. It includes utilities for converting Excel/JSON to CSV, querying CSV with SQL, computing statistics, and more.
Install:
```sh
pkg install py311-csvkit
```
Quick example:
```sh
# View CSV in a readable table format
csvlook data.csv

# Query CSV with SQL
csvsql --query "SELECT name, revenue FROM data WHERE revenue > 10000" data.csv

# Get column statistics
csvstat data.csv

# Convert JSON to CSV
in2csv data.json > data.csv

# Cut specific columns
csvcut -c 1,3,5 data.csv
```
Best for: Analyzing CSV exports from databases or applications, quick ad-hoc SQL queries against flat files without loading them into a database, converting between tabular data formats, and generating summary statistics from data dumps. csvkit bridges the gap between raw CSV files and the kind of analysis you would normally need a spreadsheet or database for.
7. yq -- YAML Processing
What it does: yq is a command-line YAML processor that uses a jq-like syntax. It reads, writes, and transforms YAML documents, which makes it essential for working with Kubernetes manifests, Ansible playbooks, CI/CD configs, and any YAML-heavy workflow.
Install:
```sh
pkg install yq
```
Quick example:
```sh
# Extract a nested value
yq '.spec.containers[0].image' deployment.yaml

# Update a field in place
yq -i '.metadata.namespace = "production"' deployment.yaml

# Convert YAML to JSON
yq -o=json '.' config.yaml

# Merge two YAML files
yq eval-all 'select(fileIndex == 0) * select(fileIndex == 1)' base.yaml override.yaml
```
Best for: Editing Kubernetes manifests in CI/CD pipelines, updating Ansible variable files programmatically, extracting values from YAML configs in shell scripts, and converting between YAML and JSON. If you manage infrastructure as code on FreeBSD, yq is non-negotiable.
8. groff -- Text Typesetting and Formatting
What it does: groff (GNU troff) is a typesetting system that reads plain text mixed with formatting commands and produces formatted output for terminal display, PostScript, PDF, or HTML. It is the engine behind man page rendering on FreeBSD.
Install:
```sh
# groff is included in the FreeBSD base system.
# For additional macro packages:
pkg install groff
```
Quick example:
```sh
# Render a man page to terminal
groff -man -Tascii mycommand.1

# Generate a PDF from a troff document
groff -ms -Tpdf document.ms > document.pdf

# Format a man page as HTML
groff -man -Thtml mycommand.1 > mycommand.html
```
Best for: Producing man pages for custom tools, generating formatted reports in PostScript or PDF from structured text, and maintaining documentation in the traditional Unix troff/nroff ecosystem. groff remains the standard for Unix documentation and is how every man page on your FreeBSD system gets rendered.
9. aspell -- Spell Checking from the Command Line
What it does: aspell is an interactive spell checker that can process plain text, HTML, TeX/LaTeX, and other formats. It suggests corrections and can be used in batch mode for automated spell checking in scripts and CI pipelines.
Install:
```sh
pkg install aspell en-aspell
```
Quick example:
```sh
# Interactive spell check
aspell check document.txt

# List misspelled words (batch mode)
cat document.txt | aspell list

# Spell check only comments in a source file
aspell --mode=ccpp check program.c

# Use a custom dictionary
aspell --personal=./project-words.txt check README.md
```
Best for: Spell checking documentation before publishing, validating text content in CI/CD pipelines, batch-processing large numbers of text files, and maintaining consistent terminology across projects with custom dictionaries. aspell supports over 70 languages and can be integrated into Makefiles or pre-commit hooks.
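As one way to wire this into a workflow, a pre-commit hook along these lines can block commits whose staged Markdown files contain unknown words. This is a sketch under the assumption that aspell and an English dictionary are installed; the hook path and file layout are up to your repository:

```sh
#!/bin/sh
# Hypothetical .git/hooks/pre-commit: reject commits whose staged
# Markdown files contain words aspell does not recognize.
fail=0
for f in $(git diff --cached --name-only --diff-filter=ACM | grep '\.md$'); do
    typos=$(aspell list --lang=en < "$f" | sort -u)
    if [ -n "$typos" ]; then
        echo "Spelling issues in $f:" >&2
        echo "$typos" | sed 's/^/  /' >&2
        fail=1
    fi
done
exit $fail
```

Install it as `.git/hooks/pre-commit` and mark it executable; project-specific terms can be whitelisted with `--personal` as shown above.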
10. diffutils -- Comparing Files and Directories
What it does: diffutils provides commands for comparing files line by line (diff), producing side-by-side comparisons (sdiff), combining diffs (diff3), and applying patches (patch). These are foundational tools for tracking changes and managing patches.
Install:
```sh
# diff and patch are in the FreeBSD base system.
# GNU diffutils available for additional features:
pkg install diffutils
```
Quick example:
```sh
# Standard unified diff
diff -u original.conf modified.conf

# Side-by-side comparison
sdiff original.conf modified.conf

# Generate a patch file
diff -u old/ new/ > changes.patch

# Apply a patch
patch < changes.patch

# Three-way merge
diff3 mine.txt base.txt theirs.txt
```
Best for: Reviewing configuration changes before deploying, generating and applying patch files, three-way merging during conflict resolution, and auditing what changed between two versions of a file or directory tree. Every sysadmin who manages FreeBSD servers by hand (rather than through automation) uses diff daily.
Comparison Table
| Tool | Format | Install Size | Speed | Learning Curve | Base System |
|------------|--------------|--------------|---------|----------------|-------------|
| jq | JSON | ~1 MB | Fast | Moderate | No |
| xmlstarlet | XML | ~2 MB | Fast | Moderate | No |
| pandoc | Multi-format | ~60 MB | Moderate | Low-Moderate | No |
| ripgrep | Plain text | ~5 MB | Very fast | Low | No |
| sed | Plain text | Minimal | Fast | Moderate | Yes |
| awk | Plain text | Minimal | Fast | Moderate | Yes |
| csvkit | CSV | ~15 MB | Moderate | Low | No |
| yq | YAML | ~5 MB | Fast | Low | No |
| groff      | Troff markup | ~15 MB       | Fast    | High           | Yes         |
| aspell | Plain text | ~5 MB | Fast | Low | No |
| diffutils | Plain text | Minimal | Fast | Low | Yes |
Practical Section: Log Processing Pipelines
The real power of these tools shows up when you chain them together. Here are six real-world pipelines for FreeBSD server administration.
Pipeline 1: Extract Error Counts from JSON Logs
Many modern applications emit structured JSON logs. This pipeline extracts error-level entries and counts them by module:
```sh
# Parse JSON logs, filter errors, count by module
cat /var/log/app/app.json.log \
  | jq -r 'select(.level == "error") | .module' \
  | sort | uniq -c | sort -rn | head -20
```
Pipeline 2: Find and Summarize Slow HTTP Requests
Parse an nginx access log to find all requests that took over 2 seconds and produce a summary:
```sh
# Extract slow requests from nginx access log
awk '$NF > 2.0 {print $7, $NF}' /var/log/nginx/access.log \
  | sort -k2 -rn \
  | head -50 \
  | awk '{printf "%-60s %6.2fs\n", $1, $2}'
```
Pipeline 3: Convert XML Config Dump to CSV Report
Pull data from an XML configuration export and convert it to a CSV for a spreadsheet:
```sh
# Extract server names and ports from XML, output as CSV
# (a header line is prepended with echo, since GNU-style "sed 1i\text"
# does not work with BSD sed)
xmlstarlet sel -t -m "//server" \
    -v "concat(@name, ',', @port, ',', @status)" -n servers.xml \
  | { echo "name,port,status"; cat; } \
  | csvlook
```
Pipeline 4: Multi-Format Documentation Build
Convert a Markdown document to HTML, PDF, and a man page in one shot:
```sh
# Build docs in three formats from a single Markdown source
SRC="docs/tool-guide.md"
pandoc "$SRC" -o docs/tool-guide.html --standalone
pandoc "$SRC" -o docs/tool-guide.pdf
pandoc "$SRC" -s -t man -o docs/tool-guide.1
```
Pipeline 5: YAML Config Audit Across Multiple Files
Check all Kubernetes deployment files for containers running as root:
```sh
# Find deployments with runAsUser: 0 (root)
# (the loop variable is named run_as to avoid clobbering $USER)
for f in $(rg -l "runAsUser" /usr/local/etc/k8s/); do
    run_as=$(yq '.spec.template.spec.securityContext.runAsUser' "$f")
    if [ "$run_as" = "0" ]; then
        echo "WARNING: $f runs as root"
    fi
done
```
Pipeline 6: Diff Config Changes and Spell-Check Release Notes
A release pipeline that audits config changes and validates docs:
```sh
# Generate a diff report of config changes
diff -u /etc/base-config/ /etc/current-config/ > /tmp/config-changes.diff

# Spell check the release notes, output misspelled words
cat RELEASE_NOTES.md | aspell list --lang=en | sort -u > /tmp/typos.txt

# Report
echo "Config changes: $(grep -c '^[+-]' /tmp/config-changes.diff) lines"
echo "Spelling issues: $(wc -l < /tmp/typos.txt) words"
```
How to Choose the Right Tool
The decision tree is straightforward:
- JSON data? Use jq. No contest.
- XML data? Use xmlstarlet. Avoid writing Python scripts for simple XPath queries.
- YAML data? Use yq. Same logic as jq, different format.
- CSV data? Use csvkit. The SQL query feature alone justifies the install.
- Searching files? Use ripgrep for speed. Fall back to grep for simple one-offs where it is already in your muscle memory.
- Stream editing? Use sed for substitutions and line operations. Use awk when you need column extraction or arithmetic.
- Document conversion? Use pandoc. If you are writing anything that needs to exist in more than one format, pandoc saves hours.
- Spell checking? Use aspell. Integrate it into your CI pipeline for documentation quality.
- Comparing files? Use diff/sdiff. They are already on your system.
- Typesetting? Use groff if you are writing man pages or need traditional Unix document formatting.
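To make the diff/patch item above concrete, here is a minimal round trip using only base-system tools (the file names are made up):

```sh
# Create two versions of a config, capture the change as a patch,
# then bring the old file up to date by applying it
printf 'port=80\n'   > old.conf
printf 'port=8443\n' > new.conf
diff -u old.conf new.conf > change.patch || true  # diff exits 1 when files differ
patch old.conf < change.patch                     # old.conf now equals new.conf
```

The `|| true` matters in scripts run with `set -e`: a non-empty diff exits with status 1, which would otherwise abort the script.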
Frequently Asked Questions
What text processing tools come pre-installed on FreeBSD?
FreeBSD's base system includes sed, awk, grep, diff, patch, sort, uniq, cut, tr, wc, head, tail, and groff. These cover basic text processing needs without installing any packages. For format-specific processing (JSON, XML, YAML, CSV), you need to install additional tools from the ports collection.
Is ripgrep really faster than grep on FreeBSD?
Yes, significantly. In benchmarks on typical FreeBSD servers, ripgrep searches large directory trees 5-10x faster than GNU grep. The speed advantage comes from parallel directory traversal, SIMD-accelerated regex matching, and automatic filtering of binary and ignored files. For searching a handful of small files, the difference is negligible. For scanning /var/log or a large codebase, ripgrep is the clear winner.
Can I use jq to edit JSON files in place?
jq does not support in-place editing natively. The standard pattern is to write to a temporary file and then move it:
```sh
jq '.settings.debug = false' config.json > config.json.tmp && mv config.json.tmp config.json
```
Alternatively, use sponge from the moreutils package:
```sh
pkg install moreutils
jq '.settings.debug = false' config.json | sponge config.json
```
How do I process both JSON and YAML in the same pipeline?
Use yq to convert YAML to JSON, then pipe to jq for processing:
```sh
yq -o=json '.' config.yaml | jq '.database.host'
```
Or convert JSON to YAML:
```sh
jq '.' data.json | yq -P
```
This approach lets you standardize on jq's powerful query language regardless of the input format.
What is the best way to handle large log files on FreeBSD?
For large log files (multiple gigabytes), combine these strategies:
- Use ripgrep to narrow down to relevant lines first -- it handles large files efficiently.
- Pipe through awk or sed for field extraction and transformation.
- Use sort and uniq for aggregation.
- Avoid loading the entire file into memory -- all of these tools operate as stream processors.
A typical pipeline for a 10 GB access log:
shrg "500|502|503" /var/log/nginx/access.log \ | awk '{print $1}' \ | sort | uniq -c | sort -rn | head -20
This pipeline finds all server error responses, extracts client IPs, and ranks them by frequency -- all without loading the full file into memory.
Can pandoc replace a full word processor for technical documentation?
For technical documentation, largely yes. pandoc handles Markdown-to-PDF with LaTeX quality, supports bibliographies, cross-references, code highlighting, and custom templates. It will not replace a word processor for complex layouts with precise visual positioning, but for technical reports, API docs, and manuals, it is more efficient and version-control friendly than any GUI tool.
Conclusion
FreeBSD's packaging system gives you access to a complete text processing toolkit. Start with the base system tools -- sed, awk, grep, and diff -- for everyday tasks. Add jq, yq, and xmlstarlet when you work with structured data formats. Install ripgrep the moment you find yourself searching through anything larger than a single directory. Bring in pandoc when documentation needs to ship in multiple formats, and csvkit when someone hands you a CSV file and expects answers.
The real leverage comes from combining these tools in pipelines. A single line chaining ripgrep, awk, sort, and uniq can replace a 50-line Python script, run faster, and need zero dependencies beyond what is already on your FreeBSD system. Master the pipeline patterns in this guide and you will handle any text processing task your servers throw at you.