an Any-to-PostScript filter that processes plain text files, but also pretty prints quite a few popular languages.
a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
consists of several independently developed packages that work well by themselves, and with Amber itself. The suite can also be used to carry out complete (non-periodic) molecular dynamics simulations (using NAB), with generalized Born solvent models.
(Alignment of Multiple Protein Sequences) a suite of programs for protein multiple sequence alignment, pairwise alignment, statistical analysis and flexible pattern matching.
a Python distribution that includes more than 400 of the most popular Python packages for science, math, engineering, and data analysis.
a fast, flexible C++ API & toolkit for reading, writing, and manipulating BAM files.
a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic. Bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF, and VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), sophisticated ...
(Binding and Expression Target Analysis) a software package that integrates ChIP-seq of transcription factors or chromatin regulators with differential gene expression data to infer direct target genes.
A quality assessment package for next-generation sequencing data. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read GC-content distribution, and base Ns quality. More importantly, BIGpre incorporates associated programs to detect and remove duplicate reads after taking sequencing errors into account and to trim low-quality reads from raw data as well.
an extension to Brian Kernighan's awk, with added support for several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q, and TAB-delimited formats with column names along with new built-in functions and a command line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk should behave exactly like the original ...
tools for early stage NGS alignment file processing including fast sorting and duplicate marking.
Installation Client for the BioGrids software collection.
(Basic Local Alignment Search Tool) finds regions of similarity between biological sequences.
a suite of BLAST (Basic Local Alignment Search Tool) tools that utilizes the NCBI C++ Toolkit with a number of performance and feature improvements over the legacy BLAST applications.
(BLAST-Like Alignment Tool) a very fast sequence alignment tool similar to BLAST.
the Amazon Web Services (AWS) SDK for Python, which allows Python developers to write software that makes use of Amazon services like S3 and EC2. Boto provides an easy to use, object-oriented API as well as low-level direct service access.
an ultrafast, memory-efficient short read aligner for short DNA sequences (reads) from next-gen sequencers.
an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
a computational pipeline for finding mutations relative to a reference sequence in short-read DNA re-sequencing data for microbial sized genomes. It reports single-nucleotide mutations, point insertions and deletions, large deletions, and new junctions supported by mosaic reads.
(Burrows-Wheeler Aligner) a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.
designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically.
a highly extensible, interactive molecular graphics program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles. It is often a tool of choice for rendering EM volumes.
Ultrafast and accurate clustering through imputation and dimensionality reduction for single-cell RNA-seq data.
(COmpressive Read-mapping Accelerator) a compressive-acceleration tool for NGS read mapping methods.
redistributable software libraries to support CUDA applications for Linux.
a reference-guided assembler that assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
Finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself. Cython is installed as a module within Python.
a software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.
GNU datamash is a command-line program which performs basic numeric, textual and statistical operations on input textual data files.
a Bioconductor software package installed in R 3.2.2 that estimates variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution.
a high-throughput program for aligning a file of short DNA sequencing reads against a protein reference database such as NR, at up to 20,000 times the speed of BLASTX, with high sensitivity.
a comprehensive DNASTAR software package that includes the software suites for genomics, structural biology, and molecular biology research.
a Bioconductor software package installed in R 3.2.2 for gene and isoform differential expression analysis of RNA-seq data.
a Bioconductor software package installed in R 3.2.2 for examining differential expression of replicated count data.
integrates a range of currently available packages and tools for sequence analysis into a seamless whole.
a tool to predict protein structure, function, and mutations using evolutionary sequence covariation.
a streaming tool for quantifying the abundances of a set of target sequences from sampled subsequences.
a DNA and protein sequence alignment software package that searches for matching sequence patterns or words, called k-tuples.
a quality control tool for high throughput sequence data.
allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
(Feature frequency profile) an alignment free comparison tool for phylogenetic analysis and text comparison. It can be applied to nucleotide sequences, complete genomes, proteomes and even used for text comparison.
a software package developed to analyze high-throughput sequencing data capable of taking on projects of any size with a primary focus on variant discovery, genotyping, and data quality assurance.
(Genome-wide Complex Trait Analysis) a tool for genome-wide complex trait analysis with five main functions: data management, estimation of the genetic relationships from SNPs, mixed linear model analysis of variance explained by the SNPs, estimation of the linkage disequilibrium structure, and GWAS simulation. GCTA estimates the variance explained by all the SNPs on a chromosome or on the whole genome for a complex trait ...
a powerful and comprehensive suite of molecular biology tools.
a free tool offered by Golden Helix that delivers stunning visualizations of your genomic data, enabling you to see what is occurring at each base pair in your samples.
an interpreter for the PostScript (TM) language. It can display and convert PostScript files. The software can be invoked with the gs command.
creates a Globus endpoint on your laptop or other personal computer and allows you to transfer and share files, regardless of whether you have administrative privileges on your machine. Globus Connect Personal is available for Mac OS X and Linux operating systems.
a query tool for working with sequence data based on a Genomic Ordered Relational (GOR) architecture.
(Hierarchical Indexing for Spliced Alignment of Transcripts) a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome). HISAT2 is a successor to both HISAT and TopHat2.
Installation client for Harvard Medical School
(Hypergeometric Optimization of Motif EnRichment) a suite of sequencing analysis and sequence motif discovery tools.
a Python package that provides infrastructure to process data from high-throughput sequencing assays.
a statistical software package offered by IBM that is used for statistical analysis. Includes IBM SPSS Modeler, IBM SPSS Statistics, IBM SPSS Analytic Server, & IBM SPSS Collaboration & Deployment Services.
a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
(INFERence of RNA ALignment) searches DNA sequence databases for RNA structure and sequence similarities using a special case of profile stochastic context-free grammars called covariance models (CMs). In many cases it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.
(Iterative Threading ASSEmbly Refinement) a hierarchical approach to protein structure and function prediction. Structural templates are first identified from the PDB by the multiple threading approach LOMETS; full-length atomic models are then constructed by iterative template fragment assembly simulations. Finally, function insights of the target are derived by threading the 3D models through the protein function database BioLiP.
a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
a system designed for automated collection of images from a transmission electron microscope; it includes the Python-side programs written in Python and C, the MySQL database and server, and the mainly PHP-based image and data viewers on a web server.
(Model Based Analysis of ChIP-Seq data) a novel algorithm for identifying transcription factor binding sites.
a comprehensive Macintosh application that provides sequence editing, primer design, internet database searching, protein analysis, sequence confirmation, multiple sequence alignment, phylogenetic reconstruction, coding region analysis, agarose gel simulation and a variety of other functions.
a multiple sequence alignment program for Unix-like operating systems. It offers a range of multiple alignment methods, such as L-INS-i (accurate; for alignment of <200 sequences) and FFT-NS-2 (fast; for alignment of <30,000 sequences).
a comprehensive quality control, analysis and visualization workflow for CRISPR/Cas9 screens.
(Mapping and Assembly with Qualities) builds mapping assemblies from short reads generated by the next-generation sequencing machines.
a matrix-based, high-performance language for scientific and engineering computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
an object-oriented Python library to analyze trajectories from molecular dynamics (MD) simulations in many popular formats. It can write most of these formats, too, together with atom selections suitable for visualization or native analysis tools.
(Multiple EM for Motif Elicitation) a collection of motif-based sequence analysis tools for discovering motifs in a group of related DNA or protein sequences.
(Metagenomic Inquiry Compressive Acceleration) a family of programs for performing compressively-accelerated metagenomic sequence searches based on BLASTX and DIAMOND.
(Mixture-of-Isoforms) a probabilistic framework for isoform quantitation that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples. MISO is installed as a standalone program and as a module within Python.
used for homology or comparative modeling of protein three-dimensional structures. From a sequence alignment with known related structures MODELLER automatically calculates a model containing all non-hydrogen atoms using comparative protein structure modeling by satisfaction of spatial restraints. It can also perform de novo modeling of loops in protein structures, optimize various models of protein structure with respect to a flexibly defined objective function, multiple ...
an open source implementation of Microsoft's .NET Framework based on the ECMA standards for C# and the Common Language Runtime.
provides a toolkit for analyzing single cell gene expression experiments. Monocle was originally developed to analyze dynamic biological processes such as cell differentiation, although it also supports other experimental settings.
a project to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. Includes accelerated versions of DOTUR and SONS and the functionality of a number of other popular tools.
a high performance and widely portable implementation of the Message Passing Interface (MPI) standard.
a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
a powerful tool designed for mapping of short reads onto a reference genome from Illumina, Ion Torrent, & 454 NGS platforms.
contains among other things: a powerful N-dimensional array object, sophisticated (broadcasting) functions, tools for integrating C/C++ and Fortran code, useful linear algebra, Fourier transform, and random number capabilities. NumPy is installed as a module within Python.
(Open Java Development Kit) is a free and open source implementation of the Java Platform, Standard Edition (Java SE).
The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available.
a library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. pandas is installed as a module within Python.
Pediatric Scholar installation client for Boston Children's Hospital
(Practical Extraction and Report Language, sometimes humorously called the Pathologically Eclectic Rubbish Lister) a highly capable, feature-rich family of programming languages.
a set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.
a comprehensive update to Shaun Purcell's PLINK command-line program -- a whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses.
GraphPad Prism combines scientific graphing, comprehensive curve fitting (nonlinear regression), understandable statistics, and data organization.
uses a simple and accurate secondary structure prediction method incorporating two feed-forward neural networks which perform an analysis on output obtained from BLAST.
open source version of the widely used molecular visualization package developed by Warren DeLano.
a Python module that makes it easy to read and manipulate genomic data sets. It is a lightweight wrapper of the htslib C-API and provides facilities to read and write SAM/BAM/VCF/BCF/BED/GFF/GTF/FASTA/FASTQ files as well as access to the command line functionality of the SAMtools and BCFtools packages. Pysam is installed as a module within Python.
a general-purpose, interpreted, object oriented, high-level dynamic programming language that emphasizes code readability. Its syntax allows programmers to express concepts in fewer lines of code than in C++ or Java, thus allowing programmers to work more quickly and integrate their systems more effectively.
a cross-platform time zone library that brings the Olson tz database into Python and allows accurate and cross-platform timezone calculations using Python 2.4 or higher. It also solves the issue of ambiguous times at the end of daylight saving time, which you can read more about in the Python Library Reference (datetime.tzinfo).
a cross-platform application framework that is used for developing application software that can be run on various software and hardware platforms with little or no change in the underlying codebase, while still being a native application with native capabilities and speed.
(QUAlity score Reduction at Terabyte scale) an efficient de novo quality score compression tool based on traversing the k-mer landscape of NGS read datasets.
a free software environment for statistical computing and graphics.
a software suite for modeling macromolecular structures and for predicting and designing protein structures, protein folding mechanisms, and protein-protein interactions.
(RNA-Seq by Expectation-Maximization) a software package for estimating gene and isoform expression levels from RNA-Seq data.
an integrated development environment (IDE) for R that includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms.
a tool for quantifying the expression of transcripts using RNA-seq data. Salmon uses new algorithms to provide very quick, accurate expression estimates using little memory and performs inference using an expressive and realistic model of RNA-seq data that takes into account experimental attributes and biases commonly observed in real RNA-seq data.
(Sequence Alignment and Modeling system) a collection of tools for creating, refining, and using linear hidden Markov models for biological sequence analysis.
(Sequence Alignment/Map) SAM is a generic format for storing large nucleotide sequence alignments; SAMtools provides various utilities for manipulating alignments in this format, including sorting, merging, indexing and generating alignments in a per-position format.
a Bioconductor software package installed in R 3.2.2 that takes the position weight matrix of a DNA sequence motif and plots the corresponding sequence logo.
a DNA sequence assembly and analysis software package for Sanger Sequencing and Next Generation Sequencing.
A method and tool to control single-cell RNA-seq data quality.
SolexaQA calculates sequence quality statistics and creates visual representations of data quality for second-generation sequencing data.
(St. Petersburg genome assembler) a genome assembly algorithm designed for single-cell and multi-cell bacterial data sets.
a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
(Spliced Transcripts Alignment to a Reference) is an ultrafast universal RNA-seq aligner.
Data Analysis and Statistical Software
(Tool Command Language) is a very powerful but easy to learn dynamic programming language, suitable for a very wide range of uses, including web and desktop applications, networking, administration, testing and many more. Tk is a graphical user interface toolkit that takes developing desktop applications to a higher level than conventional approaches.
cleans up raw data files and converts them to PDF format with LaTeX. TeX Live offers an easy way to get up and running with the TeX document production system.
a fast splice junction mapper for RNA-Seq reads that aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
a flexible read trimming tool for Illumina NGS data.
a software package comprised of three independent software modules (Inchworm, Chrysalis, and Butterfly) for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
a program package designed to provide easily accessible methods for working with complex genetic variation data in the form of VCF files, such as those generated by the 1000 Genomes Project.
a sequence assembler for very short reads.
a multi-threaded Perl script for automatically optimising the three primary parameter options (K, -exp_cov, -cov_cutoff) for the Velvet de novo sequence assembler.
(Visualization Toolkit) a C++ class library and several interpreted interface layers including Tcl/Tk, Java, and Python. VTK supports a wide variety of visualization algorithms including scalar, vector, tensor, texture, and volumetric methods, as well as advanced modeling techniques such as implicit modeling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation.
command line tools for sequence logo generation.
a blending of the wxWidgets C++ class library with the Python programming language.
an extensible parallel framework, written in Python using Open MPI libraries, that allows researchers to quickly build high-throughput big data pipelines without extensive knowledge of parallel programming.
a full rigid-body search of docking orientations between two proteins.