February 2020 Newsletter
Our February newsletter includes twenty-one new and updated software titles. This month's Software Highlight focuses on MultiQC, a software package that aggregates results from bioinformatics analyses across multiple tools and samples. Spring training schedules include nine workshops and classes available to HMS researchers.
We remind readers that BioGrids is not compatible with macOS 10.15 at this time.
While we recommend not upgrading to 10.15 on any Mac with BioGrids already installed, we have implemented a workaround to install BioGrids and SBGrid on new machines.
If BioGrids-supplied software was an important element in your publication, please include the following statement in your work:
"Software used in the project was installed and configured by BioGrids
(cite: eLife 2013;2:e01456, Collaboration gets the most out of software.)"
See our Grant Support page for additional details.
Register here to try out our software installer, which allows users to choose from over 290 bioinformatics and life sciences tools that can be installed as ready-to-run applications on Mac or Linux machines with the click of a button or a short command from the CLI. No need to worry about dependencies or compilation.
BioGrids is supported by a team of scientists and engineers at HMS. We provide direct support to BioGrids members, covering all aspects of software installation and management. If you need assistance of any kind, please send a note to: firstname.lastname@example.org.
The BioGrids Installer is an easy-to-use application that makes installing and managing life sciences software simple and quick.
A command line version is also available for Macs and Linux. Download using the link button above and register here for activation.
The BioGrids team provides support, infrastructure and testing for scientific software packages. We currently provide over 290 titles in five categories and an additional 1,500 R, Python and Perl packages and modules. The collection grows weekly. Learn more here: About BioGrids
If you are new to BioGrids and would like to quickly get started with the command line version, follow the instructions below:
1: Download the BioGrids Installer command line version
Linux:
curl -kLO https://biogrids.org/wiki/downloads/biogrids-1.0.694-Linux.tgz
tar zxf biogrids-1.0.694-Linux.tgz
Mac:
curl -kLO https://biogrids.org/wiki/downloads/biogrids-1.0.694-Darwin.tgz
tar zxf biogrids-1.0.694-Darwin.tgz
2: Activate BioGrids
./biogrids activate biogrid-production jvinent1 70rYFTDnmCr93VUklfbf1s3M4jdyC9bFVYHew==
Replace the site name, user name and activation key with your own credentials.
3: Install software with BioGrids
./biogrids install fastqc trimmomatic samtools star subread igv
When finished, verify applications are installed:
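One generic way to check, sketched here in plain shell rather than with any BioGrids command, is to test whether each expected executable is on your PATH. The `check_tools` function below is a hypothetical helper. It assumes the BioGrids environment is already active in your shell and that each executable shares its package name, which is not always the case (STAR, for example, installs an executable named `STAR`).

```shell
# Hypothetical helper, not part of BioGrids: report which commands are on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every requested command was found.
    [ "$missing" -eq 0 ]
}

# Example call (executable names are assumptions):
#   check_tools fastqc samtools STAR
```

The function prints one line per tool and returns a non-zero status if any of them is missing, so it can also be used in a script to stop early.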
Software Highlight: MultiQC
MultiQC is an aggregator of bioinformatics analyses across multiple tools and samples. It collects statistics and results from pipelines run on multiple samples and combines them in a single report. This allows you to see results from all samples in one place.
From the MultiQC website:
- MultiQC collects numerical stats from each module at the top of the report, so that you can track how your data behaves as it proceeds through your analysis.
- Visualizing your samples together allows detailed comparison, not possible by scanning one report after another.
- MultiQC supports many common bioinformatics tools out of the box. If you're missing something, just create an issue on GitHub to request it - if you have an example log file it's usually pretty fast.
Many example reports are provided that can be viewed on the website or directly downloaded.
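As a minimal sketch of typical usage, MultiQC is pointed at a directory that holds the output of other tools (FastQC logs, for instance) and writes a combined report. The `run_multiqc` wrapper and the default directory names below are hypothetical and only illustrate the call; `multiqc <directory> -o <output directory>` is the underlying command.

```shell
# Hypothetical wrapper: run MultiQC on a results directory if it is installed.
run_multiqc() {
    results_dir=${1:-.}            # directory holding tool outputs
    report_dir=${2:-multiqc_out}   # directory the report is written to
    if [ ! -d "$results_dir" ]; then
        echo "no such directory: $results_dir" >&2
        return 1
    fi
    if ! command -v multiqc >/dev/null 2>&1; then
        echo "multiqc not found on PATH" >&2
        return 127
    fi
    # Scan results_dir for recognised logs and write the report to report_dir.
    multiqc "$results_dir" -o "$report_dir"
}

# Example call (directory names are assumptions):
#   run_multiqc ./qc_results ./qc_report
```

By default the report is written as `multiqc_report.html` inside the output directory and can be opened in any web browser.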
Anaconda is a Python distribution that includes more than 400 of the most popular Python packages for science, math, engineering, and data analysis.
ASCIIgenome is a command-line genome browser that runs in a terminal window and is based solely on ASCII characters.
bamtools is a fast, flexible C++ API & toolkit for reading, writing, and manipulating BAM files.
CUDA provides redistributable software libraries to support CUDA applications on Linux.
feba includes scripts for estimating mutant fitness by sequencing randomly barcoded transposons (RB-TnSeq).
GATK (Genome Analysis Toolkit) is a software package developed to analyze high-throughput sequencing data capable of taking on projects of any size with a primary focus on variant discovery, genotyping, and data quality assurance.
Mash is a fast sequence distance estimator that uses the MinHash algorithm and is designed to work with genomes and metagenomes in the form of assemblies or reads.
Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database.
MultiQC aggregates results from bioinformatics analyses across many samples into a single report.
Percolator uses semi-supervised learning for peptide identification from shotgun proteomics datasets.
phaser performs haplotype phasing using read alignments in BAM format from both DNA and RNA based assays, and provides measures of haplotypic expression for RNA based assays.
Sambamba is a high performance, highly parallel, robust and fast tool for working with SAM and BAM files.
SAMtools provides various utilities for manipulating alignments in SAM (Sequence Alignment/Map), a generic format for storing large nucleotide sequence alignments. The utilities include sorting, merging, indexing and generating alignments in a per-position format.
SeqKit is a cross-platform ultrafast comprehensive toolkit for FASTA/Q processing.
STAR (Spliced Transcripts Alignment to a Reference) is an ultrafast universal RNA-seq aligner.
TaxonKit is a command-line toolkit for rapid manipulation of NCBI taxonomy data.
TeX Live is a TeX distribution that offers an easy way to get up and running with the TeX document production system, including LaTeX for producing PDF documents.
TPMCalculator quantifies mRNA abundance directly from the alignments by parsing BAM files.
vcfanno allows you to quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files.
VSEARCH is an open source, multithreaded 64-bit tool for processing and preparing metagenomics, genomics, and population genomics nucleotide sequence data.
WASPQTL is a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs.
Training sessions available to HMS trainees:
HMS Research Computing
Intro to Python, 3/4/2020, 3-5p, Countway 506 Minot Room
Intro to MATLAB, 3/18/2020, 2-4p, TMEC 328 Learning studio (Cannon)
The Harvard Chan Bioinformatics Core
Workshops for HSCI and on-quad HMS researchers:
Introduction to the command-line interface (shell/bash/Unix), Basic, March 6th
Introduction to bulk RNA-seq Analysis, Advanced, March 16th & 17th
Introduction to R, Basic, April 1st & 2nd (1.5 days)
Introduction to Differential Gene Expression Analysis, Advanced, May 19th & 20th
Workshops for all researchers at Harvard University and affiliated institutions:
Gene Annotations and Functional Analysis of Gene Lists, March 18th, 1 PM, HSPH Kresge G1
Generating Data Analysis Reports with RMarkdown, April 15th, 1 PM, HSPH Kresge G1
Countway Library of Medicine
Practical Presentation Skills
Meets Wednesdays, Jan 29 - Jun 17, from 5:30pm - 7:00pm, Countway L2: Room 025, Harvard Longwood Campus
Need help getting software installed on new machines? Have you been planning to try Amazon Web Services (AWS) cloud computing?
BioGrids can help you get started. We have expertise in bioinformatics, programming, workflow development and high performance computing.
We improve the collection with feedback from the community.
Want to see a new application in BioGrids?
Let us know: email@example.com
BioGrids is supported by Harvard Medical School and Boston Children's Hospital and relies on a framework that was developed by SBGrid.