Supported Applications


Name Description Links
an Any-to-PostScript filter that processes plain text files, but also pretty prints quite a few popular languages.
Keywords:
Utilities
A5-miseq is a pipeline for assembling DNA sequence data generated on the Illumina sequencing platform. A5-miseq can produce high-quality microbial genome assemblies on a laptop computer without any parameter tuning by automating the process of adapter trimming, quality filtering, error correction, contig and scaffold generation and detection of misassemblies.
a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
(Ancestry and Kinship Toolkit) a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It provides a handful of useful statistical genetics routines using the htslib API for input/output. This means it can seamlessly read BCF/VCF files and play nicely with bcftools.
a suite of programs that allows users to carry out molecular dynamics simulations, particularly on biomolecules. The suite can be used to carry out complete (non-periodic) molecular dynamics simulations (using NAB) with either explicit water or generalized Born solvent models. The independently developed packages work well by themselves, and with Amber itself.
(Alignment of Multiple Protein Sequences) a suite of programs for protein multiple sequence alignment, pairwise alignment, statistical analysis and flexible pattern matching.
a Python distribution that includes more than 400 of the most popular Python packages for science, math, engineering, and data analysis.
a command-line genome browser that runs in a terminal window and is based solely on ASCII characters.
a fast, flexible C++ API & toolkit for reading, writing, and manipulating BAM files.
a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam.
a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving.
provides best-practice pipelines for automated analysis of high throughput sequencing data with the goal of being quantifiable, analyzable, scalable and reproducible. The development process is fully open and sustained by contributors from multiple institutions. Bioinformaticians, biologists and the general public should be able to run these tools on inputs ranging from research materials to clinical samples to personal genomes.
a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
a cross-platform program for Bayesian phylogenetic analysis of molecular sequences. It estimates rooted, time-measured phylogenies using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST 2 uses Markov chain Monte Carlo (MCMC) to average over tree space, so that each ...
a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic. Bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), sophisticated ...
(Binding and Expression Target Analysis) a software package that integrates ChIP-seq of transcription factors or chromatin regulators with differential gene expression data to infer direct target genes.
A quality assessment package for next-generation sequencing data. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read GC-content distribution, and base Ns quality. More importantly, BIGpre incorporates associated programs to detect and remove duplicate reads after taking sequencing errors into account and trimming low quality reads from raw data as well.
an extension to Brian Kernighan's awk, with added support for several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q, and TAB-delimited formats with column names along with new built-in functions and a command line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk should behave exactly like the original ...
tools for early stage NGS alignment file processing including fast sorting and duplicate marking.
tools to analyze and comprehend high-throughput genomic data.
Installation Client for the BioGrids software collection.
Keywords:
Utilities
a set of tools for biological computation written in Python by an international team of developers.
Keywords:
Utilities
a set of tools for the time-efficient analysis of Bisulfite-Seq (BS-Seq) data. Bismark performs alignments of bisulfite-treated reads to a reference genome and cytosine methylation calls at the same time.
(Basic Local Alignment Search Tool) finds regions of similarity between biological sequences.
a suite of BLAST (Basic Local Alignment Search Tool) tools that utilizes the NCBI C++ Toolkit with a number of performance and feature improvements over the legacy BLAST applications.
(BLAST-Like Alignment Tool) a very fast sequence alignment tool similar to BLAST.
(Best Match Tagger) a tool for removing human reads from metagenomics datasets.
the Amazon Web Services (AWS) SDK for Python, which allows Python developers to write software that makes use of Amazon services like S3 and EC2. Boto provides an easy to use, object-oriented API as well as low-level direct service access.
an ultrafast, memory-efficient short read aligner for short DNA sequences (reads) from next-gen sequencers.
an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
a Perl/Cpp package that provides genome-wide detection of structural variants from next generation paired-end sequencing reads. It includes two complementary programs.
a computational pipeline for finding mutations relative to a reference sequence in short-read DNA re-sequencing data for microbial sized genomes. It reports single-nucleotide mutations, point insertions and deletions, large deletions, and new junctions supported by mosaic reads.
Bam and Variant Analysis Tools
(Burrows-Wheeler Aligner) a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.
designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically.
Ultrafast and accurate clustering through imputation and dimensionality reduction for single-cell RNA-seq data.
a multiple sequence alignment program that uses seeded guide trees and HMM profile-profile techniques to generate alignments between three or more sequences.
(COmpressive Read-mapping Accelerator) a compressive-acceleration tool for NGS read mapping methods.
a Workflow Management System geared towards scientific workflows.
redistributable software libraries to support CUDA applications for Linux.
a reference-guided assembler that assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
Finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself. cython is installed as a module within python.
Keywords:
Python Module
a software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.
GNU datamash is a command-line program which performs basic numeric, textual and statistical operations on input textual data files.
a Bioconductor software package installed in R 3.2.2 that estimates variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution.
a high-throughput program for aligning a file of short DNA sequencing reads against a protein reference database such as NR, at 20,000 times the speed of BLASTX, with high sensitivity.
a comprehensive DNASTAR software package that includes the software suites for genomics, structural biology, and molecular biology research.
Limited license: supported at Boston Children's Hospital
a whole genome simulator for next-generation sequencing based off of wgsim found in SAMtools, which was written by Heng Li, and forked from DNAA. It was modified to handle ABI SOLiD and Ion Torrent data, as well as various assumptions about aligners and positions of indels. Many new features have been subsequently added.
a Bioconductor software package installed in R 3.2.2 for gene and isoform differential expression analysis of RNA-seq data.
a Bioconductor software package installed in R 3.2.2 for examining differential expression of replicated count data.
integrates a range of currently available packages and tools for sequence analysis into a seamless whole.
a tool to predict protein structure, function, and mutations using evolutionary sequence covariation.
a Java program that finds potential disease-causing variants from whole-exome or whole-genome sequencing data. Starting from a VCF file and a set of phenotypes encoded using the Human Phenotype Ontology (HPO), it will annotate, filter and prioritize likely causative variants based on user-defined criteria such as a variant's predicted pathogenicity, frequency of occurrence in a population and also how closely the given phenotype ...
a streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences.
a DNA and protein sequence alignment software package that searches for matching sequence patterns or words, called k-tuples.
a quality control tool for high throughput sequence data.
allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. FastTree can handle alignments with up to a million sequences in a reasonable amount of time and memory.
a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
(Feature frequency profile) an alignment free comparison tool for phylogenetic analysis and text comparison. It can be applied to nucleotide sequences, complete genomes, proteomes and even used for text comparison.
(Fast Length Adjustment of SHort reads) is a very fast and accurate software tool to merge paired-end reads from next-generation sequencing experiments. FLASH is designed to merge pairs of reads when the original DNA fragments are shorter than twice the length of reads. The resulting longer reads can significantly improve genome assemblies. They can also improve transcriptome assembly when FLASH is used to merge ...
a tool for genome-wide profiling tandem repeats from short reads. A key advantage of GangSTR over existing tools (e.g. lobSTR or hipSTR) is that it can handle repeats that are longer than the read length. GangSTR takes aligned reads (BAM) and a set of repeats in the reference genome as input and outputs a VCF file containing genotypes for each locus.
a software package developed to analyze high-throughput sequencing data capable of taking on projects of any size with a primary focus on variant discovery, genotyping, and data quality assurance.
(Genome-wide Complex Trait Analysis) a tool for genome-wide complex trait analysis with five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait ...
a powerful and comprehensive suite of molecular biology and NGS analysis tools.
Limited license: supported at Boston Children's Hospital
a population genetics package that computes exact tests for Hardy-Weinberg equilibrium, for population differentiation and for genotypic disequilibrium among pairs of loci; computes estimates of F-statistics, null allele frequencies, allele size-based statistics for microsatellites, etc.; and performs analyses of isolation by distance from pairwise comparisons of individuals or population samples, including confidence intervals for “neighborhood size”.
a free tool offered by Golden Helix that delivers stunning visualizations of your genomic data, enabling you to see what is occurring at each base pair in your samples.
compares and evaluates the accuracy of RNA-Seq transcript assemblers (Cufflinks, Stringtie), collapses (merges) duplicate transcripts from multiple GTF/GFF3 files (e.g. resulting from assembly of different samples), and classifies transcripts from one or multiple GTF/GFF3 files as they relate to reference transcripts provided in an annotation file (also in GTF/GFF3 format).
validates, filters, converts and performs various other operations on GFF files (use gffread -h to see the various usage options). Because the program shares the same GFF parser code with Cufflinks, Stringtie, and gffcompare, it could be used to verify that a GFF file from a certain annotation source is correctly "understood" by these programs. Thus the gffread utility can be used to simply ...
an interpreter for the PostScript (TM) language. It can display and convert PostScript files. The software can be invoked with the gs command.
Keywords:
Utilities
a set of command line tools to manipulate multiple alignments. Implemented in Go language, Goalign aims to handle multiple alignments in Phylip, Fasta, Nexus, and Clustal formats, through several basic commands. Each command may print result (an alignment, for example) in the standard output, and thus can be piped to the standard input of the next goalign command.
a query tool for working with sequence data based on a Genomic Ordered Relational (GOR) architecture.
a set of command line tools to manipulate phylogenetic trees. It is implemented in Go language. The goal is to handle phylogenetic trees in Newick, Nexus and PhyloXML formats, through several basic commands. Each command may print result (a tree for example) in the standard output, and thus can be piped to the standard input of the next gotree command.
Developed for the detection of subtle allelic imbalance events from next-generation sequencing data, hapLOHseq is a sequencing-based extension of hapLOH, which is a method for the detection of subtle allelic imbalance events from SNP array data. It is capable of identifying events of 10 mega-bases or greater occurring in as little as 16% of the sample using exome sequencing data (at 80x) and 4 ...
an open-source software package for sensitive protein sequence searching based on the pairwise alignment of hidden Markov models (HMMs).
(Haplotype inference and phasing for Short Tandem Repeats) a novel haplotype-based method for robustly genotyping and phasing STRs from Illumina sequencing data. HipSTR was specifically developed to deal with short tandem repeats (STRs) in genomic sequences in the hopes of obtaining more robust STR genotypes.
(Hierarchical Indexing for Spliced Alignment of Transcripts) a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome). HISAT2 is a successor to both HISAT and TopHat2.
a tool specially designed to analyze histone modification ChIP-seq data produced from cancer genomes. HMCan corrects for the GC-content and copy number bias and then applies Hidden Markov Models to detect the signal from the corrected data. On simulated data, HMCan outperformed several commonly used tools developed to analyze histone modification data produced from genomes without copy number alterations.
Installation client for Harvard Medical School
Limited license: supported at Harvard Medical School
(Hypergeometric Optimization of Motif EnRichment) a suite of sequencing analysis and sequence motif discovery tools.
a Python package that provides infrastructure to process data from high-throughput sequencing assays.
a C library for reading/writing high-throughput sequencing data.
a statistical software package offered by IBM that is used for statistical analysis. Includes IBM SPSS Modeler, IBM SPSS Statistics, IBM SPSS Analytic Server, & IBM SPSS Collaboration & Deployment Services.
Limited license: supported at Boston Children's Hospital
a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
a pipeline for processing inDrops sequencing data.
(INFERence of RNA ALignment) searches DNA sequence databases for RNA structure and sequence similarities and uses a special case of profile stochastic context-free grammars called covariance models (CMs). In many cases it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.
an efficient de novo transcriptome assembler for RNA-Seq data. It can assemble transcripts from RNA-Seq reads (in fasta format). Unlike most de novo assembly methods, which build a de Bruijn graph or splicing graph by connecting k-mers (sets of overlapping substrings generated from reads), IsoTree constructs the splicing graph by connecting reads directly. For each splicing graph, IsoTree applies an iterative scheme of ...
(Iterative Threading ASSEmbly Refinement) a hierarchical approach to protein structure and function prediction. Structural templates are first identified from the PDB by the multiple-threading approach LOMETS; full-length atomic models are then constructed by iterative template fragment assembly simulations. Finally, function insights of the target are derived by threading the 3D models through the protein function database BioLiP.
a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers quickly by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.
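As a minimal illustration of what "counting k-mers" means in the entry above, the following Python sketch tallies every overlapping substring of length k in a single sequence. It uses a plain in-memory Counter and is not how JELLYFISH itself is implemented; the function name count_kmers and the example sequence are hypothetical.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a sequence.

    Illustrative only: a plain dictionary-based tally, not the
    lock-free hash table described for JELLYFISH.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers
    # (none at all if the sequence is shorter than k).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical example sequence.
counts = count_kmers("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

For real datasets the number of distinct k-mers grows quickly, which is why dedicated tools focus on memory-efficient encodings and parallelism.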
a language-agnostic HTML notebook application for Project Jupyter.
Keywords:
Pipelines
the next-generation web-based user interface for Project Jupyter. JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner.
Keywords:
Utilities
a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. Developed with a focus on enabling fast experimentation, Keras is a deep learning library that allows for easy and fast prototyping (through user friendliness, modularity, and extensibility); supports both convolutional networks and recurrent networks, as well as combinations of the two; and runs seamlessly on ...
a system designed for automated collection of images from a transmission electron microscope; it includes the Python-side programs written in Python and C, the MySQL database and server, and the mainly PHP-based image and data viewers on a web server.
Keywords:
Bioimaging
a probabilistic framework for structural variant discovery.
(Model Based Analysis of ChIP-Seq data) a novel algorithm for identifying transcription factor binding sites.
a comprehensive Macintosh application that provides sequence editing, primer design, internet database searching, protein analysis, sequence confirmation, multiple sequence alignment, phylogenetic reconstruction, coding region analysis, agarose gel simulation and a variety of other functions.
Limited license: supported at Boston Children's Hospital
Keywords:
Nucleic Acids
a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <200 sequences), FFT-NS-2 (fast; for alignment of <30,000 sequences).
a comprehensive quality control, analysis and visualization workflow for CRISPR/Cas9 screens.
calls structural variants (SVs) and indels from mapped paired-end sequencing reads. Manta is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. It discovers, assembles, and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow. The method is designed for rapid analysis on standard compute hardware: NA12878 at 50x genomic ...
is a set of fast and accurate sequence read classification tools designed to assign taxonomy and OTU classifications to ribosomal RNA sequences. This is done by using a reference set of full-length ribosomal RNA sequences for which taxonomies are known, and for which a set of high quality OTU clusters has been previously generated. For each read, the best guess and corresponding confidence ...
(Mapping and Assembly with Qualities) builds mapping assemblies from short reads generated by the next-generation sequencing machines.
a method for rapid genotype refinement for whole-genome sequencing data using multi-variate normal distribution. Whole-genome low-coverage sequencing has been combined with linkage-disequilibrium (LD) based genotype refinement to accurately and cost-effectively infer genotypes in large cohorts of individuals.
a matrix-based, high-performance language for scientific and engineering computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
Limited license: supported at Boston Children's Hospital


Keywords:
Other Tools
an object-oriented Python library to analyze trajectories from molecular dynamics (MD) simulations in many popular formats. It can write most of these formats, too, together with atom selections suitable for visualization or native analysis tools.
Keywords:
Python Module
(Multiple EM for Motif Elicitation) a collection of motif-based sequence analysis tools for discovering motifs in a group of related DNA or protein sequences.
(Metagenomic Inquiry Compressive Acceleration) a family of programs for performing compressively-accelerated metagenomic sequence searches based on BLASTX and DIAMOND.
whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent data and PacBio (the latter at the moment only CCS and error-corrected CLR reads).
(Mixture-of-Isoforms) for isoform quantitation using RNA-Seq is a probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples. MISO is installed as a standalone program and as a module within python.
an open source implementation of Microsoft's .NET Framework based on the ECMA standards for C# and the Common Language Runtime.
provides a toolkit for analyzing single cell gene expression experiments. Monocle was originally developed to analyze dynamic biological processes such as cell differentiation, although it also supports other experimental settings.
a project to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. Includes accelerated versions of DOTUR and SONS and the functionality of a number of other popular tools.
a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.
Keywords:
Utilities
a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
a reactive workflow framework and programming DSL that ease writing computational pipelines with complex data. It is designed around the idea that the Linux platform is the lingua franca of data science. Linux provides many simple but powerful command-line and scripting tools that, when chained together, facilitate complex data manipulations. Nextflow extends this approach, adding the ability to define complex program interactions and a ...
a powerful tool designed for mapping of short reads onto a reference genome from Illumina, Ion Torrent, & 454 NGS platforms.
contains among other things: a powerful N-dimensional array object, sophisticated (broadcasting) functions, tools for integrating C/C++ and Fortran code, useful linear algebra, Fourier transform, and random number capabilities. NumPy is installed as a module within Python.
Keywords:
Python Module
(Open Java Development Kit) is a free and open source implementation of the Java Platform, Standard Edition (Java SE).
an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available.
a library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. pandas is installed as a module within python.
Pediatric Scholar installation client for Boston Children's Hospital
Limited license: supported at Boston Children's Hospital
Keywords:
Utilities
(Pathologically Eclectic Rubbish Lister or sometimes called the Practical Extraction and Reporting Language) a highly capable, feature-rich family of programming languages.
Keywords:
Utilities
a set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.
can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
a comprehensive update to Shaun Purcell's PLINK command-line program -- a whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses.
infers undirected graphical models to describe coevolution and covariation in families of biological sequences. With a multiple sequence alignment as an input, plmc can quantify inferred coupling strengths between all pairs of positions (couplingsfile output) or infer a generative model of the sequences for predicting the effects of mutations or designing new sequences (paramfile output).
predicts the regulatory role of CRP transcription factor in Escherichia coli. PredCRP provides an accurate method for deriving an optimised model (named PredCRP-model) and a set of four interpretable rules (named PredCRP-ruleset) for predicting and analysing the regulatory roles of CRP from sequences of CRP-binding sites.
GraphPad Prism combines scientific graphing, comprehensive curve fitting (nonlinear regression), understandable statistics, and data organization.
uses a simple and accurate secondary structure prediction method incorporating two feed-forward neural networks which perform an analysis on output obtained from BLAST.
an interface to the REDCap Application Programming Interface (API), PyCap is designed to be a minimal interface exposing all required and optional API parameters.
Keywords:
Utilities
open source version of the widely used molecular visualization package developed by Warren DeLano.
a python module that makes it easy to read and manipulate genomic data sets. It is a lightweight wrapper of the htslib C-API; it provides facilities to read and write SAM/BAM/VCF/BCF/BED/GFF/GTF/FASTA/FASTQ files as well as access to the command line functionality of the SAMtools and BCFtools packages. Pysam is installed as a module within python.
a general-purpose, interpreted, object oriented, high-level dynamic programming language that emphasizes code readability. Its syntax allows programmers to express concepts in fewer lines of code than in C++ or Java, thus allowing programmers to work more quickly and integrate their systems more effectively.
Keywords:
Utilities
an open source deep learning platform that provides a seamless path from research prototyping to production deployment.
cross platform time zone library that brings the Olson tz database into Python and allows accurate and cross platform timezone calculations using Python 2.4 or higher. It also solves the issue of ambiguous times at the end of daylight saving time, which you can read more about in the Python Library Reference (datetime.tzinfo).
Keywords:
Bioimaging
a powerful, extensible, and decentralized microbiome analysis package with a focus on data and analysis transparency. QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.
a cross-platform application framework that is used for developing application software that can be run on various software and hardware platforms with little or no change in the underlying codebase, while still being a native application with native capabilities and speed.
(QUAlity score Reduction at Terabyte scale) an efficient de novo quality score compression tool based on traversing the k-mer landscape of NGS read datasets.
(QUality ASsessment Tool) evaluates genome assemblies by computing various metrics, including N50, length for which the collection of all contigs of that length or longer covers at least 50% of assembly length; NG50, where length of the reference genome is being covered; NA50 and NGA50, where aligned blocks instead of contigs are taken; misassemblies, misassembled and unaligned contigs or contigs bases; and genes and ...
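The N50 and NG50 definitions in the entry above can be made concrete with a short Python sketch. This is an illustration of the definitions only, not QUAST's own code; the function name n50, its reference_length parameter, and the contig lengths are hypothetical.

```python
def n50(contig_lengths, reference_length=None):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together cover
    at least 50% of the assembly. If reference_length is given, the 50%
    threshold is taken from the reference genome instead (NG50).
    Returns None when the contigs never reach the threshold.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = reference_length if reference_length is not None else sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return None


# Hypothetical assembly: six contigs totalling 290 bases.
lengths = [80, 70, 50, 40, 30, 20]
print(n50(lengths))                        # N50 of the assembly
print(n50(lengths, reference_length=400))  # NG50 against a 400-base reference
```

NA50 and NGA50 follow the same calculation, applied to aligned blocks rather than whole contigs.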
a free software environment for statistical computing and graphics.
Keywords:
Utilities
rapid sensitive and accurate read mapping via quasi-mapping.
(RNA-Seq by Expectation-Maximization) a software package for estimating gene and isoform expression levels from RNA-Seq data.
an integrated development environment (IDE) for R that includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
Keywords:
enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms.
a tool for quantifying the expression of transcripts using RNA-seq data. Salmon uses new algorithms to provide very quick, accurate expression estimates using little memory and performs inference using an expressive and realistic model of RNA-seq data that takes into account experimental attributes and biases commonly observed in real RNA-seq data.
a high performance, highly parallel, robust and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Because of its efficiency, it is an important workhorse running in many sequencing centres around the world today.
a fast, flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file.
(Sequence Alignment/Map) a generic format for storing large nucleotide sequence alignments that provides various utilities for manipulating alignments, including sorting, merging, indexing and generating alignments in a per-position format.
runs as a two-step process. First, cluster_identifier is used to generate soft-clipped read cluster consensus sequences. Second, SCRAMBle-MEIs.R analyzes the cluster file for likely Mobile Element Insertions.
a Bioconductor software package installed in R 3.2.2 that takes the position weight matrix of a DNA sequence motif and plots the corresponding sequence logo.
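As a rough illustration of how a position weight matrix maps to a sequence logo, the Python sketch below computes letter heights using the standard information-content formula for DNA (2 bits minus the column's Shannon entropy, with no small-sample correction). It is not the package's own code. The function name `logo_heights` and the example matrix are hypothetical.

```python
import math


def logo_heights(pwm):
    """Compute per-base letter heights (in bits) for each motif position.

    pwm is a list of columns; each column is a dict mapping a base
    (A, C, G, T) to its probability, with the probabilities summing to 1.
    """
    heights = []
    for column in pwm:
        # Shannon entropy of the column; zero-probability bases contribute 0.
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        # Information content: log2(4) = 2 bits is the maximum for DNA.
        info = 2.0 - entropy
        # Each letter is drawn with height proportional to its probability.
        heights.append({base: p * info for base, p in column.items()})
    return heights


# Hypothetical three-position motif.
pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully conserved
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},      # two equally likely bases
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative
]
for position, column in enumerate(logo_heights(pwm), start=1):
    print(position, column)
```

A fully conserved position yields a single letter 2 bits tall, while a uniform position yields letters of zero height, which is why conserved positions dominate a logo visually.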
a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files, which can also be optionally compressed by gzip.
a DNA sequence assembly and analysis software package for Sanger Sequencing and Next Generation Sequencing.
Limited license: supported at Boston Children's Hospital
A method and tool to control single-cell RNA-seq data quality.
implements the bit-masked k-difference matching algorithm dedicated to the task of adapter trimming. It is specially designed for processing next-generation sequencing (NGS) paired-end sequences.
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human readable, Python based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition. Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment.
genomic variant annotation and functional effect prediction toolbox.
SolexaQA calculates sequence quality statistics and creates visual representations of data quality for second-generation sequencing data.
(St. Petersburg genome assembler) a genome assembly algorithm designed for single-cell and multi-cell bacterial data sets.
a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
(Spliced Transcripts Alignment to a Reference) is an ultrafast universal RNA-seq aligner.
Data Analysis and Statistical Software
Limited license: supported at Boston Children's Hospital
a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. The germline caller employs an efficient tiered haplotype model to improve accuracy and provide read-backed phasing, adaptively selecting between assembly and a faster alignment-based haplotyping approach at each variant locus. The germline caller also analyzes input sequencing data using a ...
a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that ...
(Tool Command Language) is a very powerful but easy to learn dynamic programming language, suitable for a very wide range of uses, including web and desktop applications, networking, administration, testing and many more. Tk is a graphical user interface toolkit that takes developing desktop applications to a higher level than conventional approaches.
an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep ...
cleans up raw data files and converts them to PDF format with LaTeX. TeX Live offers an easy way to get up and running with the TeX document production system.
Keywords:
Utilities
a fast splice junction mapper for RNA-Seq reads that aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data.
a flexible read trimming tool for Illumina NGS data.
a software package comprised of three independent software modules (Inchworm, Chrysalis, and Butterfly) for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
The UCSC tool suite contains utilities designed to work with UCSC Genome Browser data and databases.
a platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments.
a program package designed to provide easily accessible methods for working with complex genetic variation data in the form of VCF files, such as those generated by the 1000 Genomes Project.
a sequence assembler for very short reads.
a multi-threaded Perl script for automatically optimising the three primary parameter options (K, -exp_cov, -cov_cutoff) for the Velvet de novo sequence assembler.
an alternative to the USEARCH tool developed by Robert C. Edgar (2010), for which the source code is not publicly available. VSEARCH is an open source, multithreaded 64-bit tool for processing and preparing metagenomics, genomics, and population genomics nucleotide sequence data. It supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact ...
(Visualization Tool Kit) a C++ class library and several interpreted interface layers including Tcl/Tk, Java, and Python. VTK supports a wide variety of visualization algorithms including scalar, vector, tensor, texture, and volumetric methods, as well as advanced modeling techniques such as implicit modeling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation.
command line tools for sequence logo generation.
a blending of the wxWidgets C++ class library with the Python programming language.
an extensible parallel framework, written in Python using OpenMPI libraries, that allows researchers to quickly build high throughput big data pipelines without extensive knowledge of parallel programming.
a full rigid-body search of docking orientations between two proteins.