Jul 20, 2023

NCBI’s toolkit for handling data in INSDC Sequence Read Archives

SRA tools is a toolkit for using data in the INSDC Sequence Read Archives.

The SRAs, operated by the International Nucleotide Sequence Database Collaboration (INSDC), house sequence reads and alignments generated by “next-gen” sequencers. SRA tools converts .sra files, which the INSDC SRAs maintain, to and from other formats that “next-gen” sequencers generate, including:

  • csfasta/csqual (ABI SOLiD)
  • fastq and fasta (for writing)
  • hdf5 (PacBio, reading only)
  • qseq (older Illumina)
  • sam (writing only) / bam (reading only)
  • sff

The toolkit uses the NCBI-VDB back end, which enables seamless access to both remote SRA data and local SRA files.
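
The conversions described above can be scripted. The following is a minimal sketch, not an official interface: it maps a few output formats to the sra-tools commands that write them (fasterq-dump, sam-dump, sff-dump, abi-dump) and builds a bare command line for a given run. The helper names `dump_command` and `convert`, the `DUMP_TOOLS` table, and the accession SRR000001 are assumptions made for illustration. Real runs usually need further options, so check each tool's `--help` first.

```python
import shutil
import subprocess

# Output format -> sra-tools command that writes it (illustrative subset).
# Each tool is invoked with only an accession; real runs usually need
# extra options, which are deliberately left out of this sketch.
DUMP_TOOLS = {
    "fastq": "fasterq-dump",
    "sam": "sam-dump",
    "sff": "sff-dump",
    "csfasta": "abi-dump",
}


def dump_command(accession, fmt):
    """Return the argv list for converting one SRA run to `fmt`."""
    try:
        tool = DUMP_TOOLS[fmt]
    except KeyError:
        raise ValueError(f"no dump tool known for format {fmt!r}") from None
    return [tool, accession]


def convert(accession, fmt):
    """Run the conversion if the tool is installed; return True if it ran."""
    argv = dump_command(accession, fmt)
    if shutil.which(argv[0]) is None:
        return False  # sra-tools not on PATH, so nothing was executed
    subprocess.run(argv, check=True)
    return True


if __name__ == "__main__":
    # SRR000001 is a placeholder accession chosen for illustration.
    print(" ".join(dump_command("SRR000001", "fastq")))
```

Calling `convert` with an accession rather than a local .sra path relies on the remote access mentioned above, so it needs network connectivity and may download a large amount of data.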

Check out these related ports:
  • Wise - Intelligent algorithms for DNA searches
  • Wfa2-lib - Exact gap-affine algorithm using homology to accelerate alignment
  • Vt - Discovers short variants from Next Generation Sequencing data
  • Vsearch - Versatile open-source tool for metagenomics
  • Viennarna - Alignment tools for the structural analysis of RNA
  • Velvet - Sequence assembler for very short reads
  • Vcftools - Tools for working with VCF genomics files
  • Vcflib - C++ library and CLI tools for parsing and manipulating VCF files
  • Vcf2hap - Generate .hap file from VCF for haplohseq
  • Vcf-split - Split a multi-sample VCF into single-sample VCFs
  • Unikmer - Toolkit for nucleic acid k-mer analysis, set operations on k-mers
  • Unanimity - Pacific Biosciences consensus library and applications
  • Ugene - Integrated bioinformatics toolkit
  • Ucsc-userapps - Command line tools from the UCSC Genome Browser project
  • Trimmomatic - Flexible read trimming tool for Illumina NGS data