Jul 20, 2023

Cross-platform and ultrafast toolkit for FASTA/Q file manipulation

FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q file include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools only implement some of these manipulations, and not particularly efficiently, and some are only available for certain operating systems. Furthermore, the complicated installation process of required packages and running environments can render these programs less user friendly.

SeqKit is a cross-platform ultrafast comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OS X, and can be directly used without any dependencies or pre-configurations. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations.

Checkout these related ports:
  • Wise - Intelligent algorithms for DNA searches
  • Wfa2-lib - Exact gap-affine algorithm using homology to accelerate alignment
  • Vt - Discovers short variants from Next Generation Sequencing data
  • Vsearch - Versatile open-source tool for metagenomics
  • Viennarna - Alignment tools for the structural analysis of RNA
  • Velvet - Sequence assembler for very short reads
  • Vcftools - Tools for working with VCF genomics files
  • Vcflib - C++ library and CLI tools for parsing and manipulating VCF files
  • Vcf2hap - Generate .hap file from VCF for haplohseq
  • Vcf-split - Split a multi-sample VCF into single-sample VCFs
  • Unikmer - Toolkit for nucleic acid k-mer analysis, set operations on k-mers
  • Unanimity - Pacific Biosciences consensus library and applications
  • Ugene - Integrated bioinformatics toolkit
  • Ucsc-userapps - Command line tools from the UCSC Genome Browser project
  • Trimmomatic - Flexible read trimming tool for Illumina NGS data