Supported Applications

Name Description Links
A5-miseq is a pipeline for assembling DNA sequence data generated on the Illumina sequencing platform. A5-miseq can produce high-quality microbial genome assemblies on a laptop computer without any parameter tuning by automating the process of adapter trimming, quality filtering, error correction, contig and scaffold generation and detection of misassemblies.
a simple transcriptome assembler based on kallisto and Cortex graphs.
an extended version of Partial Order Alignment (POA) that performs adaptive banded dynamic programming (DP) with an SIMD implementation.
mass screening of contigs for antibiotic resistance genes.
an abundance-based tool for binning metagenomic sequences.
(Another Gff Analysis Toolkit) a suite of tools to handle gene annotations in any GTF/GFF format.
(Ancestry and Kinship Toolkit) a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It provides a handful of useful statistical genetics routines using the htslib API for input/output. This means it can seamlessly read BCF/VCF files and play nicely with bcftools.
an efficient and versatile command-line application that computes multi-sample quality control metrics in a read-group aware manner.
AlignStats produces various alignment, whole genome coverage, and capture coverage metrics for sequence alignment files in SAM, BAM, and CRAM format.
AMPtk: Amplicon tool kit for processing high throughput amplicon sequencing data.
an open-source, community-driven analysis and visualization platform for ‘omics data. Its interactive interface facilitates the management of metagenomic contigs and associated data for automatic or human-guided identification of genome bins and their curation.
ARAGORN identifies tRNA and tmRNA genes. The program employs heuristic algorithms to predict tRNA secondary structure, based on homology with recognized tRNA consensus sequences and ability to form a base‐paired cloverleaf.
(Antibiotic Resistance Identification By Assembly) a tool that identifies antibiotic resistance genes by running local assemblies. It can also be used for MLST calling.
Get assembly statistics from FASTA and FASTQ files.
trim adapters from high-throughput sequencing reads.
(Amazon Web Services Command Line Interface) a command line interface tool to manage multiple Amazon Web Services and automate them through scripts.
rapid and standardized annotation of bacterial genomes & plasmids.
bam-readcount generates metrics at single nucleotide positions.
BAMscale is a one-step tool for either 1) quantifying and normalizing the coverage of peaks or 2) generating scaled BigWig files for easy visualization of commonly used DNA-seq capture based methods.
a fast, flexible C++ API & toolkit for reading, writing, and manipulating BAM files.
a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam.
is a tool to extract paired reads in FASTQ format from coordinate sorted BAM files. Bazam is a smarter way to realign reads from one genome to another. If you've tried to use Picard SAMtoFASTQ or samtools bam2fq before and ended up unsatisfied with complicated, long running inefficient pipelines, bazam might be what you wanted. Bazam will output FASTQ in a form that can …
a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving.
is a bioinformatics tool for constructing the compacted de Bruijn graph from sequencing data.
provides best-practice pipelines for automated analysis of high throughput sequencing data with the goal of being quantifiable, analyzable, scalable and reproducible. The development process is fully open and sustained by contributors from multiple institutions. Bioinformaticians, biologists and the general public should be able to run these tools on inputs ranging from research materials to clinical samples to personal genomes.
a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale. Tasks can be easily split by chromosome for distributing whole-genome analyses across a computational cluster.
a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic. Bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), sophisticated analyses …
(Binding and Expression Target Analysis) a software package that integrates ChIP-seq of transcription factors or chromatin regulators with differential gene expression data to infer direct target genes.
is a compact file format for efficiently storing and querying whole-genome genotypes of tens to hundreds of thousands of samples. It can be considered as an alternative to genotype-only BCFv2. BGT is more compact in size, more efficient to process, and more flexible on query.
A quality assessment package for next-generation sequencing data. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read GC-content distribution, and base Ns quality. More importantly, BIGpre incorporates associated programs to detect and remove duplicate reads after taking sequencing errors into account and trimming low quality reads from raw data as well.
an extension to Brian Kernighan's awk, with added support for several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q, and TAB-delimited formats with column names along with new built-in functions and a command line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk should behave exactly like the original BWK awk.
tools for early stage NGS alignment file processing including fast sorting and duplicate marking.
subtype microbial whole-genome sequencing (WGS) data using SNV targeting k-mer subtyping schemes.
The bioinfokit toolkit aims to provide various easy-to-use functionalities to analyze, visualize, and interpret the biological data generated from genome-scale omics experiments.
BLASR (Basic Local Alignment with Successive Refinement) maps Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error.
(Basic Local Alignment Search Tool) finds regions of similarity between biological sequences.
a suite of BLAST (Basic Local Alignment Search Tool) tools that utilizes the NCBI C++ Toolkit with a number of performance and feature improvements over the legacy BLAST applications.
is a k-mer spectrum-based read error corrector, designed to correct large datasets with a very low memory footprint. It uses the disk streaming k-mer counting algorithm contained in the GATB library, and inserts solid k-mers in a bloom-filter. The correction procedure is similar to the Musket multistage approach. Bloocoo yields similar results while requiring far less memory: as an example, it can correct whole …
aka Best Match Tagger is for removing human reads from metagenomics datasets
bmtool is part of BMTagger aka Best Match Tagger, for removing human reads from metagenomics datasets.
an ultrafast, memory-efficient short read aligner for short DNA sequences (reads) from next-gen sequencers.
an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
implements a versatile high-performance version of the BPP software
(Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
a computational pipeline for finding mutations relative to a reference sequence in short-read DNA re-sequencing data for microbial sized genomes. It reports single-nucleotide mutations, point insertions and deletions, large deletions, and new junctions supported by mosaic reads.
bustools is a program for manipulating BUS files for single cell RNA-Seq datasets. It can be used to error correct barcodes, collapse UMIs, produce gene count or transcript compatibility count matrices, and is useful for many other tasks. See the kallisto | bustools website for examples and instructions on how to use bustools as part of a single-cell RNA-seq workflow.
(Burrows-Wheeler Aligner) a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.
(Concatemeric Consensus Caller with Partial Order alignments) is a computational pipeline for calling consensus sequences on R2C2 nanopore data.
a reference-free whole-genome multiple alignment program based upon the notion of Cactus graphs.
clusters paired-end reads using their barcodes and sequences.
Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing. Canu specializes in assembling PacBio or Oxford Nanopore sequences. Canu operates in three phases: correction, trimming and assembly. The correction phase will improve the accuracy of bases in reads.
clusters and compares protein or nucleotide sequences.
a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
The set of analysis pipelines in this suite perform sample demultiplexing, barcode processing, identification of open chromatin regions, and simultaneous counting of transcripts and peak accessibility in single cells.
a set of analysis pipelines that perform identification of open chromatin regions, motif annotation, and differential accessibility analysis for Single Cell ATAC data.
is a very rapid and memory-efficient system for the classification of DNA sequences from microbial samples, with better sensitivity than and comparable accuracy to other leading systems. The system uses a novel indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (e.g., 4.3 GB for ~4,100 bacterial …
Circlator is a tool to circularize genome assemblies. The input is a genome assembly in FASTA format and corrected PacBio or nanopore reads in FASTA or FASTQ format. Circlator will attempt to identify each circular sequence and output a linearised version of it. It does this by assembling all reads that map to contig ends and comparing the resulting contigs with the input assembly.
fast, accurate and versatile k-mer based classification system.
A tool to detect CLIP-seq peaks.
is the latest version of Clustal: a multiple sequence alignment program for DNA or proteins.
is a support library for a sparse, compressed, binary persistent storage format, also called cooler, used to store genomic interaction data, such as Hi-C contact matrices.
(COmpressive Read-mapping Accelerator) a compressive-acceleration tool for NGS read mapping methods.
Crass is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set.
controllable lossy compression of BAM/CRAM files.
a reference-guided assembler that assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
a fast, parallel, and very lightweight memory tool to construct the compacted de Bruijn graph from genome reference(s).
a cython wrapper around htslib built for fast parsing of Variant Call Format (VCF) files.
finds all significant local alignments between reads.
a toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing, version 2
dDocent is a simple bash wrapper to QC, assemble, map, and call SNPs from almost any kind of RAD sequencing. If you have a reference already, dDocent can be used to call SNPs from almost any type of NGS data set.
a suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq. deepTools contains useful modules to process the mapped reads data for multiple quality checks, creating normalized coverage files in standard bedGraph and bigWig file formats, that allow comparison between different files (for example, treatment and control). Finally, using such normalized and standardized files, …
Delly is an integrated structural variant (SV) prediction method that can discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends, split-reads and read-depth to sensitively and accurately delineate genomic rearrangements throughout the genome. Structural variants can be visualized using Delly-maze and Delly-suave.
Genetic multiplexing of barcoded single cell RNA-seq.
a Bioconductor software package installed in R 3.2.2 that estimates variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution.
bax file decoder and data compressor.
a high-throughput program for aligning a file of short DNA sequencing reads against a protein reference database such as NR, at 20,000 times the speed of BLASTX, with high sensitivity.
a suite of tools for use in genome assembly and consensus.
a python program for rapidly comparing large numbers of genomes. dRep can also "de-replicate" a genome set by identifying groups of highly similar genomes and choosing the best representative genome for each genome set.
(Detection of RNA Outlier Pipeline) pipeline to find aberrant gene expression events in RNA sequencing data.
a whole genome simulator for next-generation sequencing based off of wgsim found in SAMtools, which was written by Heng Li, and forked from DNAA. It was modified to handle ABI SOLiD and Ion Torrent data, as well as various assumptions about aligners and positions of indels. Many new features have been subsequently added.
a Bioconductor software package installed in R 3.2.2 for gene and isoform differential expression analysis of RNA-seq data.
a Bioconductor software package installed in R 3.2.2 for examining differential expression of replicated count data.
Fast genome-wide functional annotation through orthology assignment.
(EHdn) is a suite of tools for detecting novel expansions of short tandem repeats (STRs).
a high-performance tool for analyzing .sam/.bam files (up to and including variant calling) in sequencing pipelines.
EMu is a relative abundance estimator for 16S genomic sequences
a FASTQ lossless compression algorithm especially designed for nanopore sequencing FASTQ files.
is a software package for Bayesian tree inference.
a streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences.
a DNA and protein sequence alignment software package that searches for matching sequence patterns or words, called k-tuples.
FastME provides distance algorithms to infer phylogenies.
Fastool is a simple and quick tool to read huge FastQ and FastA files (both normal and gzipped) and manipulate them. It makes use of the KSeq library for fast access to FastQ/A files.
is a tool designed to provide fast all-in-one preprocessing for FastQ files. This tool is developed in C++ with multithreading supported to afford high performance.
a quality control tool for high throughput sequence data.
fastq-scan reads a FASTQ from STDIN and outputs summary statistics (read lengths, per-read qualities, per-base qualities) in JSON format.
allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
a fast, flexible, user-friendly, cluster-friendly QTL mapper.
infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory.
an ultra-fast tool for identification of SARS-CoV-2 and other microbes from sequencing data.
a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
a set of tools to analyze genomic data with a focus on Next Generation Sequencing.
a tool for filtering long reads by quality.
(Fast Length Adjustment of SHort reads) is a very fast and accurate software tool to merge paired-end reads from next-generation sequencing experiments. FLASH is designed to merge pairs of reads when the original DNA fragments are shorter than twice the length of reads. The resulting longer reads can significantly improve genome assemblies. They can also improve transcriptome assembly when FLASH is used to merge …
fast and accurate de novo assembler for single molecule sequencing reads.
is an approximate sequence pattern matcher for FASTQ/FASTA files.
an efficient FASTQ manipulation suite.
Bayesian haplotype-based polymorphism discovery and genotyping.
Tool for plotting gene fusion events detected by various tools using Circos.
(Genome Analysis Toolkit) a software package developed to analyze high-throughput sequencing data capable of taking on projects of any size with a primary focus on variant discovery, genotyping, and data quality assurance.
a population genetics package that computes exact tests for Hardy-Weinberg equilibrium, for population differentiation and for genotypic disequilibrium among pairs of loci; computes estimates of F-statistics, null allele frequencies, allele size-based statistics for microsatellites, etc.; and performs analyses of isolation by distance from pairwise comparisons of individuals or population samples, including confidence intervals for “neighborhood size”.
compares and evaluates the accuracy of RNA-Seq transcript assemblers (Cufflinks, Stringtie), collapses (merges) duplicate transcripts from multiple GTF/GFF3 files (e.g. resulting from assembly of different samples), and classifies transcripts from one or multiple GTF/GFF3 files as they relate to reference transcripts provided in an annotation file (also in GTF/GFF3 format).
validates, filters, converts and performs various other operations on GFF files (use gffread -h to see the various usage options). Because the program shares the same GFF parser code with Cufflinks, Stringtie, and gffcompare, it could be used to verify that a GFF file from a certain annotation source is correctly "understood" by these programs. Thus the gffread utility can be used to simply …
(Generalized FOLD) a generalized fold change for ranking differentially expressed genes from RNA-seq data. GFOLD is especially useful when no replicate is available. GFOLD generalizes the fold change by considering the posterior distribution of log fold change, such that each gene is assigned a reliable fold change. It overcomes the shortcoming of p-value that measures the significance of whether a gene is differentially expressed …
a suite of motif tools, including a motif prediction pipeline for ChIP-seq experiments.
Genomic mapping and alignment program for mRNA and EST sequences.
(GNU-based Virus IDentification) a Python3 program for Gene Novelty Unit-based Virus Identification for SARS-CoV-2. It ranks CDS nucleotide sequences in a genome fna file based on the number of observed exact CDS nucleotide matches in a public or private database. It was created to type SARS-CoV-2 genomes using a whole genome multilocus sequence typing (wgMLST) approach.
a set of command line tools to manipulate multiple alignments. Implemented in Go language, Goalign aims to handle multiple alignments in Phylip, Fasta, Nexus, and Clustal formats, through several basic commands. Each command may print result (an alignment, for example) in the standard output, and thus can be piped to the standard input of the next goalign command.
provides functions for working on alignments in fasta format.
goleft is a collection of bioinformatics tools written in Go and distributed together as a single binary under a liberal (MIT) license. Running the binary goleft will give a list of subcommands with a short description. Running any subcommand without arguments will give full help for that command.
GraphMap is a novel mapper targeted at aligning long, error-prone third-generation sequencing data. It is designed to handle Oxford Nanopore MinION 1d and 2d reads with very high sensitivity and accuracy, and also presents a significant improvement over the state-of-the-art for PacBio read mappers.
The GraphMap2 update contains tuning of alignments specific to long RNA reads. GraphMap2 is a novel mapper targeted at aligning long, error-prone third-generation sequencing data. It is designed to handle Oxford Nanopore MinION 1d and 2d reads with very high sensitivity and accuracy, and also presents a significant improvement over the state-of-the-art for PacBio read mappers.
GROOT is a tool to type Antibiotic Resistance Genes (ARGs) in metagenomic samples (a.k.a. Resistome Profiling). It combines variation graph representation of gene sets with an LSH indexing scheme to allow for fast classification of metagenomic reads. Subsequent hierarchical local alignment of classified reads against graph traversals facilitates accurate reconstruction of full-length gene sequences using a simple scoring scheme.
an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).
is a set of programs to process, normalize, analyze and visualize Hi-C and cHi-C data.
is an open-source command-line toolkit that performs restriction fragment bias-aware preprocessing of HiChIP data.
An optimized and flexible pipeline for Hi-C data processing
A tool for mapping and performing quality control on Hi-C data
HiC alignment and classification pipeline.
(Hierarchical Indexing for Spliced Alignment of Transcripts) a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome). HISAT2 is a successor to both HISAT and TopHat2.
(Hypergeometric Optimization of Motif EnRichment) a suite of sequencing analysis and sequence motif discovery tools.
a Python package that provides infrastructure to process data from high-throughput sequencing assays.
a C library for reading/writing high-throughput sequencing data.
is a quality control and processing pipeline for High Throughput Sequencing data.
(Histosketching Using Little Kmers) a tool that creates small, fixed-size sketches from streaming microbiome sequencing data, enabling rapid metagenomic dissimilarity analysis.
is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads).
(IMmunogenetic SEQuence Analysis) is a fast, PCR and sequencing error aware tool to analyze high throughput data from recombined T-cell receptor or immunoglobulin gene sequencing experiments. It derives immune repertoires from sequencing data in FASTA / FASTQ format.
a pipeline for processing inDrops sequencing data.
efficient RNA-RNA interaction prediction incorporating seeding and accessibility of interacting sites.
efficient and versatile phylogenomic software by maximum likelihood.
an efficient de novo transcriptome assembler for RNA-Seq data. It can assemble transcripts from RNA-Seq reads (in fasta format). Unlike most de novo assembly methods, which build a de Bruijn graph or splicing graph by connecting k-mers (sets of overlapping substrings generated from reads), IsoTree constructs the splicing graph by connecting reads directly. For each splicing graph, IsoTree applies an iterative scheme of …
is a computational package that contains functions broadly useful for viral amplicon-based sequencing.
a one-click pipeline for processing terabase scale Hi-C datasets. Using Juicer, you can: go from raw fastq files to Hi-C maps binned at many resolutions; automatically annotate loops and contact domains with the Juicer tools; and run the pipeline in the cloud, on LSF, Univa, or SLURM, or on a single CPU. Juicer creates hic files from raw (unaligned) reads derived from a Hi-C experiment.
fast and sensitive taxonomic classification for metagenomics.
a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
implements a method designed to map raw reads directly against redundant databases, in an ultra-fast manner using seed and extend. KMA is particularly good at aligning high quality reads against highly redundant databases, where unique matches often do not exist. It works for long low quality reads as well, such as those from Nanopore. Non-unique matches are resolved using the "ConClave" sorting scheme, and …
KMC (K-mer Counter) is a utility designed for counting k-mers (sequences of consecutive k symbols) in a set of reads from genome sequencing projects. K-mer counting is important for many bioinformatics applications, e.g., developing de Bruijn graph assemblers. Building de Bruijn graphs is a commonly used approach for genome assembly with data from second-generation sequencers. Unfortunately, sequencing errors (frequent in practice) result in huge memory …
is a tool designed to perform quality control on metagenomic sequencing data, especially data from microbiome experiments.
a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
finds & aligns related regions of sequences. LAST is designed for moderately large data (e.g. genomes, DNA reads, proteomes).
a kmer-based error correction method for whole genome sequencing data.
is the standard tool to identify barcode and primer sequences in PacBio single-molecule sequencing data.
(Linked Read Analysis) a computational tool for removing amplification artifacts from single-cell DNA sequencing data and estimating mutation rates in single cells.
a computational algorithm and software tool for fast and accurate detection of gene fusion by long-read transcriptome sequencing
a variant calling tool for diploid genomes using long error prone reads such as Pacific Biosciences (PacBio) SMRT and Oxford Nanopore Technologies (ONT).
is a tool for digital spoligotyping of MTB strains from Illumina read data.
(Model Based Analysis of ChIP-Seq data) a novel algorithm for identifying transcription factor binding sites.
An efficient and versatile approach for short-read alignment and variant detection in high-throughput sequenced genomes.
The MAPS (Model-based Analysis of PLAC-Seq data) pipeline is a set of scripts used to analyze PLAC-Seq and HiChIP data.
a set of fast and accurate sequence read classification tools designed to assign taxonomy and OTU classifications to ribosomal RNA sequences. This is done by using a reference set of full-length ribosomal RNA sequences for which taxonomies are known, and for which a set of high quality OTU clusters has been previously generated. For each read, the best guess and corresponding confidence in …
a command line tool that is able to parse alignments in SAM format and produce a range of useful stats.
(Mapping and Assembly with Qualities) builds mapping assemblies from short reads generated by the next-generation sequencing machines.
is a fast sequence distance estimator that uses the MinHash algorithm and is designed to work with genomes and metagenomes in the form of assemblies or reads.
a tool to create consensus sequences and variant calls from nanopore sequencing data.
(Manipulation Environment for Genetic Analyses) - data-handling program for facilitating genetic linkage and association analyses.
an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph.
Nanopore modified base and sequence variant detection.
MLST (multi-locus sequence typing) is a classic technique for genotyping bacteria, widely applied for pathogen outbreak surveillance.
a modular toolkit designed for large-scale gene discovery and annotation in eukaryotic metagenomic contigs.
ATLAS - Three commands to start analysing your metagenome data
The MetaGraph framework allows for indexing and analysis of very large biological sequence collections, producing compressed indexes that can represent several petabases of input data. The indexes can be efficiently queried with any query sequence of interest.
(Metagenomic Phylogenetic Analysis) is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
MethylDackel will process a coordinate-sorted and indexed BAM or CRAM file containing some form of BS-seq alignments and extract per-base methylation metrics from them. MethylDackel requires an indexed fasta file containing the reference genome as well.
(Metagenomic Inquiry Compressive Acceleration) a family of programs for performing compressively-accelerated metagenomic sequence searches based on BLASTX and DIAMOND.
Minialign is a little bit fast and moderately accurate nucleotide sequence alignment tool designed for PacBio and Nanopore long reads. It is built on three key algorithms, minimizer-based index of the minimap overlapper, array-based seed chaining, and SIMD-parallel Smith-Waterman-Gotoh extension.
Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to …
is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap …
whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent data and PacBio (the latter at the moment only CCS and error-corrected CLR reads).
(Mixture-of-Isoforms) for isoform quantitation using RNA-Seq is a probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples. MISO is installed as a standalone program and as a module within python.
scan contig files against PubMLST typing schemes.
MMseqs2: ultra fast and sensitive sequence search and clustering suite
MOB-suite: software tools for clustering, reconstruction and typing of plasmids from draft assemblies. The MOB-suite is designed to be a modular set of tools for the typing and reconstruction of plasmid sequences from WGS assemblies.
fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
a project to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. Includes accelerated versions of DOTUR and SONS and the functionality of a number of other popular tools.
marker gene-based OTU (mOTU) profiling.
aggregates results from bioinformatics analyses across many samples into a single report.
a versatile alignment tool for DNA and protein sequences.
antibiotic resistance prediction in minutes.
RNA modification detection using Nanopore raw reads with Deep One Class classification.
Filtering and trimming of long read sequencing data.
a set of tools developed for visualization and processing of long-read sequencing data from Oxford Nanopore Technologies and Pacific Biosciences.
a standard tool to demultiplex Nanopore long read sequencing data.
Plotting tool for long read sequencing data and alignments.
software package for signal-level analysis of Oxford Nanopore sequencing data.
Ultra-fast quality control and summary reports for nanopore reads
NanoSim is a fast and scalable read simulator for Nanopore sequencing data.
calculates various statistics from a long read sequencing dataset in FASTQ, BAM or Albacore sequencing summary format.
a genomic structural variant (SV) caller that utilizes low-depth long-read sequencing such as Oxford Nanopore Technologies (ONT).
Viral genome sequence alignment tool
SARS-CoV-2 genome clade assignment, mutation calling, and sequence quality checks
real-time tracking of pathogen evolution.
(NGS Processing with Less Work) enables creation of a pipeline covering the whole first phase of NGS analysis, up to and including annotation.
(coNvex Gap-cost alignMents for Long Reads) a long-read mapper designed to sensitively align PacBio or Oxford Nanopore reads to (large) reference genomes.
Quick mining and visualization of NGS data by integrating genomic databases
the first program capable of inferring variants in real time, as read alignments are fed in. Ococo takes unsorted alignments from a stream and infers single-nucleotide variants, together with a genomic consensus, using statistics stored in compact several-bit counters.
Oncofuse is a framework designed to estimate the oncogenic potential of de novo discovered gene fusions. It uses several hallmark features and employs a Bayesian classifier to provide the probability of a given gene fusion being a driver mutation.
is a simple interface to HDF5 files of the Oxford Nanopore .fast5 file format.
(Open Reading Frame - Regression Algorithm for Translational Evaluation of Ribosome-protected footprints) comprises a series of scripts for coding sequence annotation based on ribosome profiling data.
a fast, accurate and comprehensive platform for comparative genomics. OrthoFinder makes accurate inference of orthogroups, orthologues, gene trees and the rooted species tree easy.
A package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood.
(Phylogenetic Assignment of Named Global Outbreak LINeages) software package for assigning SARS-CoV-2 genome sequences to global lineages.
is an implementation of the PASTA (Practical Alignment using Saté and TrAnsitivity) algorithm.
pbalign aligns PacBio reads to reference sequences, filters aligned reads according to user-specified filtering criteria, and converts the output to either the SAM format or the PacBio Compare HDF5 (e.g., .cmp.h5) format. The output Compare HDF5 file will be compatible with Quiver if the --forQuiver option is specified.
The pbbam software package provides components to create, query, & edit PacBio BAM files and associated indices. These components include a core C++ library, bindings for additional languages, and command-line utilities.
pbmm2 is a SMRT C++ wrapper for minimap2's C API. Its purpose is to support native PacBio in- and output, provide sets of recommended parameters, generate sorted output on-the-fly, and postprocess alignments. Sorted output can be used directly for polishing using GenomicConsensus, if BAM has been used as input to pbmm2. Benchmarks show that pbmm2 outperforms BLASR in mapped concordance, number of mapped bases, …
PacBio structural variant (SV) calling and analysis tools
a tool that takes a set of CLIP-seq peak regions and for each region, individually extracts the most likely site context (transcript or genomic).
a collection of Bayesian approaches to infer hidden determinants and their effects from gene expression profiles using factor analysis methods.
Phantompeakqualtools computes informative enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data. It can also be used to obtain robust estimates of the predominant fragment length or characteristic tag shift values in these assays.
(phasing and Allele Specific Expression from RNA-seq) performs haplotype phasing using read alignments in BAM format from both DNA and RNA based assays, and provides measures of haplotypic expression for RNA based assays.
(phy-loo-chee) is a software package for analyzing data collected from UCE loci, as well as data collected from other types of loci, for phylogenomic studies at the species, population, and individual levels.
a set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.