March 2021 Newsletter
Our March newsletter includes thirty-five new software titles and an additional fifteen updates. Spring workshop and training registrations are open for both The Harvard Chan Bioinformatics Core and HMS Research Computing.
macOS 11 Big Sur support
We've been testing new application installations on macOS 11 Big Sur. The majority of applications appear to work normally on both x86_64 and Apple Silicon hardware, with a few exceptions. If you have a new M1 Mac running Big Sur, you should be able to install software with the BioGrids installation manager. Be sure to install XQuartz, and let us know if you encounter any problems.
Installation Manager Beta Testing
Our new graphical Installation Manager for macOS and Linux is nearly ready to go and we are looking for some beta testers! Internal testing is going well and we would now like feedback from the community. Email help@biogrids.org and we'll get you set up.
As always, please let us know if you have any questions or problems upgrading: help@biogrids.org
Remote Working Help
The BioGrids Wiki provides step by step instructions for installing BioGrids software on a local laptop or desktop machine. If you prefer a live demonstration, or run into trouble, please contact help@biogrids.org. We can set up a Zoom meeting to assist you.
macOS 10.15 Catalina
While we recommend not upgrading to 10.15 on any Mac with BioGrids already installed, we have implemented a workaround to install BioGrids and SBGrid on new machines. Two approaches are available.
Cite BioGrids
If your use of BioGrids-supplied software was an important element in your publication, please include the following statement in your work:
"Software used in the project was installed and configured by BioGrids
(cite: eLife 2013;2:e01456, Collaboration gets the most out of software.)"
See our Grant Support page for additional details.
Register here to try out our software installer, which allows users to choose from over 290 bioinformatics and life sciences tools that can be installed as ready-to-run applications on Mac or Linux machines with the click of a button or a short command from the CLI. No need to worry about dependencies or compilation.
BioGrids is supported by a team of scientists and engineers at HMS. We provide direct support to BioGrids members. This includes all aspects of software installation and management. If you need assistance of any kind please send a note to: help@biogrids.org.
BioGrids Installer
The BioGrids Installer is an easy-to-use application that makes installing and managing life sciences software simple and quick.
A command line version is also available for Macs and Linux. Download using the link button above and register here for activation.
The BioGrids team provides support, infrastructure and testing for scientific software packages. We currently provide 335 titles in five categories and over 1,500 R, Python and Perl packages and modules. The collection grows weekly. Learn more here: About BioGrids
BioGrids QuickStart
If you are new to BioGrids and would like to quickly get started with the command line version, follow the instructions below:
1: Download the BioGrids Installer command line version
Linux CLI
curl -kLO https://biogrids.org/wiki/downloads/biogrids-1.0.695-Linux.tgz
tar zxf biogrids-1.0.695-Linux.tgz
cd biogrids-1.0.695-Linux
OSX CLI
curl -kLO https://biogrids.org/wiki/downloads/biogrids-1.0.695-Darwin.tgz
tar zxf biogrids-1.0.695-Darwin.tgz
cd biogrids-1.0.695-Darwin
2: Activate biogrids
./biogrids activate biogrid-production jvinent1 70rYFTDnmCr93VUklfbf1s3M4jdyC9bFVYHew==
Replace the site name, user name and activation key with your own credentials.
3: Install software with BioGrids
./biogrids install fastqc trimmomatic samtools star subread igv
When finished, verify applications are installed:
./biogrids installed
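The download commands in step 1 name the same archive three times, so the version number can drift between the curl, tar and cd lines. The sketch below is one way to avoid that by deriving all three from a single variable. It is an illustration only, not part of the BioGrids installer: the function names `biogrids_basename` and `biogrids_download_commands` and the `BIOGRIDS_VERSION` variable are hypothetical, and it assumes the archive unpacks into a directory with the same name as the tarball. It prints the commands for review rather than running them.

```shell
#!/bin/sh
# Sketch only: helper names and BIOGRIDS_VERSION are hypothetical, not BioGrids features.
# Assumption: biogrids-<version>-<platform>.tgz unpacks into biogrids-<version>-<platform>/.

BIOGRIDS_VERSION="${BIOGRIDS_VERSION:-1.0.695}"   # version from the step 1 download URL
BIOGRIDS_BASE_URL="https://biogrids.org/wiki/downloads"

# Print the archive base name for a version and platform.
# The platform defaults to `uname -s`, which reports "Linux" or "Darwin".
biogrids_basename() {
    version="$1"
    platform="${2:-$(uname -s)}"
    printf 'biogrids-%s-%s\n' "$version" "$platform"
}

# Print the three step 1 commands with a consistent archive name.
biogrids_download_commands() {
    name="$(biogrids_basename "$@")"
    printf 'curl -kLO %s/%s.tgz\n' "$BIOGRIDS_BASE_URL" "$name"
    printf 'tar zxf %s.tgz\n' "$name"
    printf 'cd %s\n' "$name"
}

biogrids_download_commands "$BIOGRIDS_VERSION"
```

Running the script prints the commands for the current platform. Setting `BIOGRIDS_VERSION` in the environment before running it changes all three lines together.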
Software Updates
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.
Updated versions:1.12 | Linux 64 | OS X INTEL
biohansel subtypes microbial whole-genome sequencing (WGS) data using SNV-targeting k-mer subtyping schemes.
Updated versions:2.6.1 | OS X INTEL 2.6.1 | Linux 64
bioinfokit is a toolkit aimed to provide various easy-to-use functionalities to analyze, visualize, and interpret the biological data generated from genome-scale omics experiments.
Updated versions:2.0.1 | OS X INTEL 2.0.1 | Linux 64
BLAST+ is a suite of BLAST (Basic Local Alignment Search Tool) tools that utilizes the NCBI C++ Toolkit with a number of performance and feature improvements over the legacy BLAST applications.
Updated versions:2.11.0 | Linux 64 2.11.0 | OS X INTEL
DANPOS2 is a toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing, version 2.
Updated versions:2.2.2 | Linux 64
DIAMOND is a high-throughput program for aligning a file of short DNA sequencing reads against a protein reference database such as NR, at 20,000 times the speed of BLASTX, with high sensitivity.
Updated versions:2.0.4 | OS X INTEL 2.0.4 | Linux 64
dRep is a Python program for rapidly comparing large numbers of genomes. dRep can also "de-replicate" a genome set by identifying groups of highly similar genomes and choosing the best representative genome for each genome set.
Updated versions:3.1.1 | OS X INTEL 3.1.1 | Linux 64
emu is a relative abundance estimator for 16S genomic sequences.
Updated versions:1.0.1 | OS X INTEL 1.0.1 | Linux 64
fastv is an ultra-fast tool for identification of SARS-CoV-2 and other microbes from sequencing data.
Updated versions:0.8.1 | OS X INTEL 0.8.1 | Linux 64
Genrich is a peak-caller for genomic enrichment assays (e.g. ChIP-seq, ATAC-seq).
Updated versions:0.6.1 | OS X INTEL 0.6.1 | Linux 64
GNUVID (GNU-based Virus IDentification) is a Python3 program. It ranks CDS nucleotide sequences in a genome fna file based on the number of observed exact CDS nucleotide matches in a public or private database. It was created to type SARS-CoV-2 genomes using a whole genome multilocus sequence typing (wgMLST) approach.
Updated versions:2.2 | OS X INTEL 2.2 | Linux 64
HiLine is a HiC alignment and classification pipeline.
Updated versions:0.2.2 | Linux 64 0.2.2 | OS X INTEL
IGV (Integrative Genomics Viewer) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
Updated versions:2.9.4 | Linux 64 | OS X INTEL
LongGF is a computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing.
Updated versions:0.1.2 | OS X INTEL 0.1.2 | Linux 64
Mapula is a command line tool that is able to parse alignments in SAM format and produce a range of useful stats.
Updated versions:2.1.1 | OS X INTEL 2.1.1 | Linux 64
MetaEuk is a modular toolkit designed for large-scale gene discovery and annotation in eukaryotic metagenomic contigs.
Updated versions:4.a0f584d | Linux 64 4.a0f584d | OS X INTEL
metagraph is a framework for indexing and analysis of very large biological sequence collections, producing compressed indexes that can represent several petabases of input data. The indexes can be efficiently queried with any query sequence of interest.
Updated versions:0.1.0 | OS X INTEL 0.1.0 | Linux 64
mokapot implements fast and flexible semi-supervised learning for peptide detection.
Updated versions:0.6.0 | Linux 64 0.6.0 | OS X INTEL
msstitch is a tool to integrate a number of shotgun proteomics tools, generating ready-to-use result files.
Updated versions:3.6 | OS X INTEL 3.6 | Linux 64
Nanopolish is a software package for signal-level analysis of Oxford Nanopore sequencing data.
Updated versions:0.13.2 | Linux 64 0.13.2 | OS X INTEL
NanoStat calculates various statistics from a long read sequencing dataset in fastq, bam or albacore sequencing summary format.
Updated versions:1.5.0 | OS X INTEL 1.5.0 | Linux 64
ngmlr (CoNvex Gap-cost alignMents for Long Reads) is a long-read mapper designed to sensitively align PacBio or Oxford Nanopore reads to (large) reference genomes.
Updated versions:0.2.7 | Linux 64
ont_fast5_api is a simple interface to HDF5 files of the Oxford Nanopore .fast5 file format.
Updated versions:3.3.0 | OS X INTEL 3.3.0 | Linux 64
pangolin stands for Phylogenetic Assignment of Named Global Outbreak LINeages.
Updated versions:2.3.2 | OS X INTEL 2.3.2 | Linux 64
PLASS (Protein-Level ASSembler) is a software to assemble short read sequencing data on a protein level.
Updated versions:4.687d7 | OS X INTEL 4.687d7 | Linux 64
PLINK is a comprehensive update to Shaun Purcell's PLINK command-line program -- a whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses.
Updated versions:1.90 | OS X INTEL 1.90 | Linux 64 2.00a3 | OS X INTEL 2.00a3 | Linux 64
Python is a general-purpose, interpreted, object oriented, high-level dynamic programming language that emphasizes code readability.
Updated versions:3.7.0 | OS X INTEL 3.7.0 | Linux 64
Raven is a de novo genome assembler for long uncorrected reads.
Updated versions:1.5.0 | Linux 64 1.5.0 | OS X INTEL
refgenie manages storage, access, and transfer of reference genome resources.
Updated versions:0.9.3 | OS X INTEL 0.9.3 | Linux 64
RNAblueprint is a library that solves the problem of stochastically sampling RNA/DNA sequences compatible with multiple structural constraints.
Updated versions:1.3.2 | OS X INTEL 1.3.2 | Linux 64
RODEO evaluates one or many genes, characterizing a gene neighborhood based on the presence of profile hidden Markov models (pHMMs).
Updated versions:2.3.3 | OS X INTEL 2.3.3 | Linux 64
Rust-Bio-Tools is a set of ultra fast and robust command line utilities for bioinformatics tasks based on Rust-Bio.
Updated versions:0.19.6 | OS X INTEL 0.19.6 | Linux 64
TOBIAS (Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal) is a collection of command-line bioinformatics tools for performing footprinting analysis on ATAC-seq data.
Updated versions:0.12.10 | Linux 64 0.12.10 | OS X INTEL
TreeSAPP is a functional and taxonomic annotation tool for microbial genomes and proteins.
Updated versions:0.10.2 | OS X INTEL 0.10.2 | Linux 64
SAMtools provides various utilities for manipulating alignments in SAM (Sequence Alignment/Map), a generic format for storing large nucleotide sequence alignments. The utilities include sorting, merging, indexing and generating alignments in a per-position format.
Updated versions: 1.12 | OS X INTEL 1.12 | Linux 64
SECIMTools is a suite of tools for processing of metabolomics data.
Updated versions:21.3.4 | OS X INTEL 21.3.4 | Linux 64
SeqFu is a general-purpose program to manipulate and parse information from FASTA/FASTQ files.
Updated versions:0.8.11 | OS X INTEL 0.8.11 | Linux 64 0.8.10 | OS X INTEL 0.8.10 | Linux 64
smallgenomeutilities is a collection of scripts for handling and manipulating NGS data of small viral genomes.
Updated versions:0.3.2 | Linux 64 0.3.2 | OS X INTEL
SpacePHARER is a modular toolkit for sensitive phage-host interaction identification using CRISPR spacers.
Updated versions:4.228b9e5 | OS X INTEL 4.228b9e5 | Linux 64
SPAdes (St. Petersburg genome assembler) is a genome assembly algorithm designed for single-cell and multi-cell bacterial data sets.
Updated versions:3.15.2 | Linux 64 3.15.2 | OS X INTEL
spaTyper is a computational method for finding spa types.
Updated versions:0.3.3 | OS X INTEL 0.3.3 | Linux 64
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts.
Updated versions:2.1.5 | Linux 64 2.1.5 | OS X INTEL
TensorFlow is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.
Updated versions: 2.4.1 | Linux 64 1.14.0 | OS X INTEL 1.14.0 | Linux 64 2.0.0 | OS X INTEL
Ultraplex is an all-in-one software package for processing and demultiplexing fastq files.
Updated versions:1.1.4 | OS X INTEL 1.1.4 | Linux 64
UShER is a program that rapidly places new samples onto an existing phylogeny using maximum parsimony. It is particularly helpful in understanding the relationships of newly sequenced SARS-CoV-2 genomes with each other and with previously sequenced genomes in a global phylogeny.
Updated versions:0.2.0 | Linux 64 0.2.0 | OS X INTEL
varlociraptor is a flexible, arbitrary-scenario, uncertainty-aware variant caller with parameter-free filtration via FDR control.
Updated versions:2.3.0 | OS X INTEL 2.6.5 | Linux 64
VIRULIGN is a tool for codon-correct pairwise alignments, with an augmented functionality to annotate the alignment according to the positions of the proteins.
Updated versions:1.0.1 | OS X INTEL 1.0.1 | Linux 64
WiggleTools is a package that allows genome-wide data files to be manipulated as numerical functions, equipped with all the standard functional analysis operators (sum, product, product by a scalar, comparators), and derived statistics (mean, median, variance, stddev, t-test, Wilcoxon's rank sum test, etc.).
Updated versions:1.2.8 | Linux 64 1.2.8 | OS X INTEL
Software Training
Training sessions available to HMS trainees:
HMS Research Computing
New courses and registrations for Spring 2021 are now open.
See the HMS Research Computing Training Portal for the most current updates.
Date | Topic | Registration
April 7th | Systems Modeling and Controls with Simulink & Simscape | Register
April 21st | What’s New in MATLAB for Research | Register
May 5th | Distance Learning and Virtual Labs | Register
The Harvard Chan Bioinformatics Core
Courses for Spring 2021 are now open. See the Workshop Updates page for updates.
Topic | Category | Date | Duration | Prerequisites
Command-line interface and the O2 cluster (shell/Unix/Linux) | Basic | March 5th, 9th, 12th | Three 2.5h sessions | None
Bulk RNA-seq (Part I - FASTQ to counts) | Advanced | March 23rd, 26th, 30th & April 2nd | Four 2.5h sessions | Command-line interface
R | Basic | April 13th, 16th, 20th, 23rd | Four 2h sessions | None
Bulk RNA-seq (Part II - Differential gene expression) | Advanced | May 4th, 7th, 11th, 14th | Four 2h sessions | R
scRNA-seq | Advanced | May 25th, 28th & June 1st | Three 2.5h sessions | R
Bioinformatics Support
Need help getting software installed on new machines? Have you been planning to try Amazon Web Services (AWS) cloud computing?
BioGrids can help you get started. We have expertise in bioinformatics, programming, workflow development and high performance computing.
We improve the collection with feedback from the community.
Want to see a new application in BioGrids?
Let us know: help@biogrids.org
BioGrids is supported by Harvard Medical School and Boston Children's Hospital and relies on a framework that was developed by SBGrid.