Supported Applications

Name Description Links
A5-miseq is a pipeline for assembling DNA sequence data generated on the Illumina sequencing platform. A5-miseq can produce high-quality microbial genome assemblies on a laptop computer without any parameter tuning by automating the process of adapter trimming, quality filtering, error correction, contig and scaffold generation and detection of misassemblies.
(Ancestry and Kinship Toolkit) a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It provides a handful of useful statistical genetics routines using the htslib API for input/output. This means it can seamlessly read BCF/VCF files and play nicely with bcftools.
estimates the evolutionary distance between closely related genomes. These distances can be used to rapidly infer phylogenies for big sets of genomes. Because andi does not compute full alignments, it is so efficient that it scales even up to thousands of bacterial genomes.
a command-line genome browser running from terminal window and solely based on ASCII characters.
a gene prediction program for eukaryotes that can be used as an ab initio program, which means it bases its prediction purely on the sequence.
A universal protein model for prokaryotic gene prediction
(BAsic Rapid Ribosomal RNA Predictor) predicts the location of ribosomal RNA genes in genomes (bacteria, archaea, metazoan mitochondria and eukaryotes).
Prioritize small variants, structural variants and coverage based on biological inputs. The goal is to use pre-existing knowledge of relevant genes, domains and pathways involved with a disease to extract the most interesting signal from a set of high quality small or structural variant calls. Given information on coverage, it will be able to identify poorly covered regions in potential genes of interest.
bcbio-variation is a toolkit to analyze genome variation data, built on top of the Genome Analysis Toolkit (GATK) with Clojure. It supports scoring for the Archon Genomics X PRIZE competition and is also a general framework for variant file comparison. It enables validation of variants and exploration of algorithm differences between calling methods by automating the process involved with comparing two sets of variants. …
Parallel merging, squaring off and ensemble calling for genomic variants. Provide a general framework meant to combine multiple variant calls, either from single individuals, batched family calls, or multiple approaches on the same sample. Splits inputs based on shared genomic regions without variants, allowing independent processing of smaller regions with variant calls.
is a software package for phasing genotypes and imputing ungenotyped markers.
is a cross-platform program for Bayesian analysis of molecular sequences using MCMC.
BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale. Tasks can be easily split by chromosome for distributing whole-genome analyses across a computational cluster.
a Swiss Army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic. Bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), sophisticated analyses …
an extension to Brian Kernighan's awk, with added support for several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q, and TAB-delimited formats with column names along with new built-in functions and a command line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk should behave exactly like the original BWK awk.
a set of tools for the time-efficient analysis of Bisulfite-Seq (BS-Seq) data. Bismark performs alignments of bisulfite-treated reads to a reference genome and cytosine methylation calls at the same time.
a Perl/Cpp package that provides genome-wide detection of structural variants from next generation paired-end sequencing reads. It includes two complementary programs.
Bam and Variant Analysis Tools
a tool for calling copy number variants (CNVs) from human DNA sequencing data.
CEFCIG (Computational Epigenetic Framework for Cell Identity Gene Discovery)
a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis.
Efficient genotyping of bi-allelic SNPs on single cells
an interactive explorer for single-cell transcriptomics data
is a very rapid and memory-efficient system for the classification of DNA sequences from microbial samples, with better sensitivity than and comparable accuracy to other leading systems. The system uses a novel indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (e.g., 4.3 GB for ~4,100 bacterial …
A complete suite for gene-by-gene schema creation and strain identification.
is an ultrafast method for aligning and preprocessing high throughput chromatin profiles.
a command-line toolkit and Python library for detecting copy number variants and alterations genome-wide from high-throughput sequencing.
uses exome sequencing data to find copy number variants (CNVs) and genotype the copy-number of duplicated genes.
Copy number and genotype annotation from whole genome and whole exome sequencing data.
enables the easy detection of CRISPRs and cas genes in user-submitted sequence data (allows sequences up to 50 MB; for larger inputs, download the standalone program). This is an update of the CRISPRFinder program with improved specificity and indication of the CRISPR orientation. MacSyFinder is used to identify cas genes, the CRISPR-Cas type and subtype.
a program for genome coordinates conversion between different genome assemblies.
a toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing.
deconvolutes mixed genomes with unknown proportions.
is a flexible and customizable pipeline for prokaryotic genome annotation as well as data submission to the INSDC.
a whole genome simulator for next-generation sequencing based off of wgsim found in SAMtools, which was written by Heng Li, and forked from DNAA. It was modified to handle ABI SOLiD and Ion Torrent data, as well as various assumptions about aligners and positions of indels. Many new features have been subsequently added.
estimates haplotype phase either within a genotyped cohort or using a phased reference panel. Eagle2 is now the default phasing method used by the Sanger and Michigan imputation servers and uses a new, very fast HMM-based algorithm that improves speed and accuracy over existing methods via two key ideas: a new data structure based on the positional Burrows-Wheeler transform and a rapid search algorithm …
The EIGENSOFT package combines functionality from our population genetics methods (Patterson et al. 2006) and our EIGENSTRAT stratification correction method (Price et al. 2006).
a Java program that finds potential disease-causing variants from whole-exome or whole-genome sequencing data. Starting from a VCF file and a set of phenotypes encoded using the Human Phenotype Ontology (HPO), it will annotate, filter and prioritize likely causative variants based on user-defined criteria such as a variant's predicted pathogenicity, frequency of occurrence in a population and also how closely the given phenotype matches …
developed for fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI).
infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory.
performs fast principal component analysis (PCA) of single nucleotide polymorphism (SNP) data, similar to smartpca from EIGENSOFT (http://www.hsph.harvard.edu/alkes-price/software/) and shellfish (https://github.com/dandavison/shellfish). FlashPCA is based on the https://github.com/yixuan/spectra/ library.
finds somatic fusion-genes in RNA-seq data.
a tool for genome-wide profiling tandem repeats from short reads. A key advantage of GangSTR over existing tools (e.g. lobSTR or hipSTR) is that it can handle repeats that are longer than the read length. GangSTR takes aligned reads (BAM) and a set of repeats in the reference genome as input and outputs a VCF file containing genotypes for each locus.
Genesis Applications for Phylogenetic Placement Analysis
(Genome-wide Complex Trait Analysis) a tool for genome-wide complex trait analysis with five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait …
a free tool offered by Golden Helix that delivers stunning visualizations of your genomic data, enabling you to see what is occurring at each base pair in your samples.
a peak-caller for genomic enrichment assays (e.g. ChIP-seq, ATAC-seq).
a tool based on a genomic ordered relational architecture that allows analysis of large sets of genomic and phenotypic tabular data using a declarative query language in a parallel execution engine. It is very efficient in a wide range of use cases, including genome-wide batch analysis, range queries, genomic table joins of variants and segments, filtering, aggregation etc.
an ultra-fast sequence alignment algorithm for intra-species genome comparison.
is an ultra-fast and scalable microbial genome search program based on MinHash-like metric and graph-based approximate nearest neighbor search
a tool to sort genomic files according to a genomefile.
is a user-friendly workflow for phylogenomics intended to give more researchers the capability to create phylogenomic trees.
an open-source, general-purpose, Python-based data analysis library with additional data types and methods for working with genomic data.
Developed for the detection of subtle allelic imbalance events from next-generation sequencing data, hapLOHseq is a sequencing-based extension of hapLOH, a method for the detection of subtle allelic imbalance events from SNP array data. It is capable of identifying events of 10 megabases or greater occurring in as little as 16% of the sample using exome sequencing data (at 80x) and 4% …
(Holistic Allele-specific Tumor Copy-number Heterogeneity) is an algorithm that infers allele and clone-specific CNAs and WGDs jointly across multiple tumor samples from the same patient, and that leverages the relationships between clones in these samples.
a framework that integrates optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole genome sequencing to systematically detect SVs in a variety of normal or cancer samples and cell lines.
Blazing fast toolkit to work with .hic and .cool files
(Haplotype inference and phasing for Short Tandem Repeats) a novel haplotype-based method for robustly genotyping and phasing STRs from Illumina sequencing data. HipSTR was specifically developed to deal with short tandem repeats (STRs) in genomic sequences in the hopes of obtaining more robust STR genotypes.
Hopla enables classic genomic single, duo, trio, etc., analysis, by studying a single (multisample) vcf-file, eventually generating interactive visualizations.
(Hypothesis Testing using Phylogenies) an open-source software package for comparative sequence analysis using stochastic evolutionary models.
The IDR (Irreproducible Discovery Rate) framework is a unified approach to measure the reproducibility of findings identified from replicate experiments and provide highly stable thresholds based on reproducibility.
(INFERence of RNA ALignment) searches DNA sequence databases for RNA structure and sequence similarities and uses a special case of profile stochastic context-free grammars called covariance models (CMs). In many cases it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.
is a tool for intersection and visualization of multiple genomic region and gene sets (or lists of items).
a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers quickly by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.
a command line tool for estimating heritability and genetic correlation from GWAS summary statistics. ldsc also computes LD Scores.
is a toolkit to compare genes lifted between genome assemblies.
is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. It makes full use of base-call qualities and other sources of errors inherent in sequencing (e.g. mapping or base/indel alignment uncertainty), which are usually ignored by other methods or only used for filtering.
A long-read analysis toolbox for cancer genomics.
a probabilistic framework for structural variant discovery.
a program to model and detect macromolecular systems, genetic pathways in protein datasets. In prokaryotes, these systems have often evolutionarily conserved properties: they are made of conserved components and are encoded in compact loci (conserved genetic architecture). The user models these systems with MacSyFinder to reflect these conserved features and to allow their efficient detection.
(Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout) a computational tool to identify important genes from the recent genome-scale CRISPR-Cas9 knockout screens (or GeCKO) technology.
a comprehensive quality control, analysis and visualization workflow for CRISPR/Cas9 screens.
calls structural variants (SVs) and indels from mapped paired-end sequencing reads. Manta is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. It discovers, assembles, and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow. The method is designed for rapid analysis on standard compute hardware: NA12878 at 50x genomic coverage …
a method for rapid genotype refinement for whole-genome sequencing data using multi-variate normal distribution. Whole-genome low-coverage sequencing has been combined with linkage-disequilibrium (LD) based genotype refinement to accurately and cost-effectively infer genotypes in large cohorts of individuals.
uses sparse trees to represent gene flow in pedigrees and is a fast pedigree analysis package.
MinCED is a program to find Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) in full genomes or environmental datasets such as assembled contigs from metagenomes.
a lower memory and more computationally efficient implementation of the genotype imputation algorithms in minimac/minimac2/minimac3.
a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
Octopus is a mapping-based variant caller that implements several calling models within a unified haplotype-aware framework. Octopus takes inspiration from particle filtering by constructing a tree of haplotypes and dynamically pruning and extending the tree based on haplotype posterior probabilities in a sequential manner. This allows octopus to implicitly consider all possible haplotypes at a given locus in reasonable time.
an Open Source Web Application for Genetic Data (SNPs) using 23AndMe and Data Crawling Technologies.
an acronym that stands for Unveil Hi-C Anchors and Peaks. Peakachu takes genome-wide contact data as input and returns coordinates of likely interactions such as chromatin loops.
compares familial-relationships and sexes as reported in a PED/FAM file with those inferred from a VCF.
a Java-based variant caller designed for detecting somatic deletions from high-coverage (~30x) single-cell whole-genome sequencing (scWGS) data.
(phasing and Allele Specific Expression from RNA-seq) performs haplotype phasing using read alignments in BAM format from both DNA and RNA based assays, and provides measures of haplotypic expression for RNA based assays.
(Phylogenetic Analysis with Space/Time models) a software package for comparative and evolutionary genomics.
PHESANT - PHEnome Scan ANalysis Tool. Run a phenome scan (pheWAS, Mendelian randomisation (MR)-pheWAS etc.) in UK Biobank. There are three components in this project: (1) running a phenome scan in UK Biobank; (2) post-processing of results; (3) PHESANT-viz: visualising the results.
can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
Platypus is a tool designed for efficient and accurate variant-detection in high-throughput sequencing data. By using local realignment of reads and local assembly it achieves both high sensitivity and high specificity. Platypus can detect SNPs, MNPs, short indels, replacements and (using the assembly option) deletions up to several kb. It has been extensively tested on whole-genome, exon-capture, and targeted capture data, it has been …
a comprehensive update to Shaun Purcell's PLINK command-line program -- a whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses.
infers undirected graphical models to describe coevolution and covariation in families of biological sequences. With a multiple sequence alignment as an input, plmc can quantify inferred coupling strengths between all pairs of positions (couplingsfile output) or infer a generative model of the sequences for predicting the effects of mutations or designing new sequences (paramfile output).
(Population-wide Deletion Calling) fast structural deletion calling on population-scale short read paired-end germline WGS data.
(POPulation Partitioning Using Nucleotide Kmers) Calculate core and accessory distances, cluster genomes, assign new genomes to clusters, make visualisations
is a fast, reliable protein-coding gene prediction for prokaryotic genomes.
A fork of Prodigal meant to improve gene calling for giant viruses and viruses that use alternative genetic codes.
is a variant caller for single cell data from whole genome amplification with multiple displacement amplification (MDA). It relies on a pair of samples, where one is from an MDA single cell and the other from a bulk sample of the same cell population, sequenced with any next-generation sequencing technology.
QTLtools is a tool set for molecular QTL discovery and analysis.
(Regulatory Genomics Toolbox) is an open source Python library for analysis of regulatory genomics. RGT is programmed in an object-oriented fashion and its core classes provide functionality for handling regulatory genomics data.
(Rapid ORF Description & Evaluation Online) evaluates one or many genes, characterizing a gene neighborhood based on the presence of profile hidden Markov models (pHMMs).
Structural variant (SV) annotation.
genotyper for somatic SNV and indel discovery in PTA-amplified single cells.
runs as a two-step process. First cluster_identifier is used to generate soft-clipped read cluster consensus sequences. Second, SCRAMBle-MEIs.R analyzes the cluster file for likely Mobile Element Insertions.
a program to calculate EHH-based scans for positive selection in genomes.
implements a collapsed haplotype pattern (CHP) method to generate markers from sequence data for linkage analysis.
SMALT aligns DNA sequencing reads with a reference genome. Reads from a wide range of sequencing platforms can be processed, for example Illumina, Roche-454, Ion Torrent, PacBio or ABI-Sanger. Paired reads are supported. There is no support for SOLiD reads.
structural variant calling and genotyping with existing tools, but, smoothly.
a structural variation caller using third generation sequencing.
genomic variant annotation and functional effect prediction toolbox.
is an ensemble somatic SNV/indel caller that has the ability to use machine learning to filter out false positives from other callers.
(CRISPR Spacer Phage-Host pAiRs findER) a modular toolkit for sensitive phage-host interaction identification using CRISPR spacers.
provides analysis and publication quality printing of linear and circular RNA splicing, expression and regulation.
Mutual information based detection of pairs of genomic loci co-evolving under a shared selective pressure
(Sequence Read Archive Toolkit) a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. The germline caller employs an efficient tiered haplotype model to improve accuracy and provide read-backed phasing, adaptively selecting between assembly and a faster alignment-based haplotyping approach at each variant locus. The germline caller also analyzes input sequencing data using a mixture-model …
a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that …
Inference of population structure using multilocus genotype data
a tool set for simulating/evaluating SVs, merging and comparing SVs within and among samples, and includes various methods to reformat or summarize SVs.
a computational tool for detecting structural variations from cell free DNA (cfDNA) containing low dilutions of circulating tumor DNA (ctDNA).
software that estimates telomere length from whole genome sequencing data (BAMs).
an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning, …
is a software package for visualising and analysing the MCMC trace files generated through Bayesian phylogenetic inference. Tracer provides kernel density estimation, multivariate visualisation, demographic trajectory reconstruction, conditional posterior distribution summary and more. Tracer v1.7.1 can read output files from MrBayes, BEAST, BEAST2, RevBayes, Migrate, LAMARC and possibly other MCMC programs from other domains.
a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data.
a platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments.

Restriction: available to non-profit users only. See technical notes for additional information on for-profit user licensing.

allows you to quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files. It uses a simple conf file to allow the user to specify the source annotation files and fields and how they will be added to the info of the query VCF.
allows you to simultaneously filter variants based on any INFO field, CHROM, POS, REF, ALT, QUAL, and the annotation field ANN.
Variation graphs provide a succinct encoding of the sequences of many genomes. A variation graph (in particular as implemented in vg) is composed of: nodes, which are labeled by sequences and ids; edges, which connect two nodes via either of their respective ends; and paths, which describe genomes, sequence alignments, and annotations (such as gene models and transcripts) as walks through nodes connected …
a versatile software tool for efficiently solving large scale sequence matching tasks. Vmatch subsumes the software tool REPuter, but is much more general, with a very flexible user interface, and improved space and time requirements.
A tool set for short variant discovery in genetic sequence data.
WASP is a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs.
software for phasing genomic variants using DNA sequencing reads, also called read-based phasing or haplotype assembly.
discretizes several ChIP-seq replicates simultaneously and resolves conflicts between them. After the job is done, Zerone checks the results and tells you whether it passes the quality control.