November 2020 Newsletter
Our November newsletter includes ten new and updated software titles. We include a Software Spotlight on Dask, a high-level library for scaling Python code. We also remind users that macOS Big Sur is not yet supported by BioGrids.
Upgrade Your BioGrids Install Client
BioGrids Installer versions 1.0.694 and earlier display the message "Missing Internet connection" because of an expired certificate.
All users must upgrade the install client to continue using it.
The issue has been corrected in the latest version (1.0.695), available here:
https://biogrids.org/wiki/client_install
Activated users can download the new version and use it without additional activation or registration steps.
As always, please let us know if you have any questions or problems upgrading: help@biogrids.org
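The rule above (versions 1.0.694 and earlier are affected) can be checked with a small shell function. This is a minimal sketch, assuming you already know your installer's version number and that versions follow the MAJOR.MINOR.PATCH form used in this newsletter; `needs_upgrade` is a hypothetical helper, not part of the BioGrids client.

```shell
# Hypothetical helper: is installer version $1 affected (1.0.694 or earlier)?
# Returns 0 (true) when an upgrade is needed, 1 (false) otherwise.
needs_upgrade() {
  local -a have last=(1 0 694)
  local i a b
  IFS=. read -r -a have <<< "$1"
  for i in 0 1 2; do
    a=${have[i]:-0}   # a missing component counts as 0
    b=${last[i]}
    # 10# forces base 10 so a part such as "08" is not read as octal
    if (( 10#$a < b )); then return 0; fi
    if (( 10#$a > b )); then return 1; fi
  done
  return 0            # exactly 1.0.694
}
```

For example, `needs_upgrade 1.0.694 && echo "Upgrade at https://biogrids.org/wiki/client_install"` prints the reminder, while 1.0.695 prints nothing.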
Remote Working Help
The BioGrids Wiki provides step by step instructions for installing BioGrids software on a local laptop or desktop machine. If you prefer a live demonstration, or run into trouble, please contact help@biogrids.org. We can set up a Zoom meeting to assist you.
macOS 10.15 Catalina
We recommend against upgrading to 10.15 on any Mac that already has BioGrids installed. For new machines, however, we have implemented a workaround to install BioGrids and SBGrid. Two approaches are available.
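To see where a given Mac falls relative to the Catalina guidance and the Big Sur note, a version string can be classified in the shell. This is a minimal sketch: `macos_status` and its output labels are hypothetical, and it relies on `sw_vers -productVersion`, which prints the installed macOS version. It treats 10.16 as Big Sur because that is how Big Sur reports itself to some older tools.

```shell
# Hypothetical helper: classify a macOS version string such as "10.15.7".
macos_status() {
  local major minor
  IFS=. read -r major minor _ <<< "$1"
  major=${major:-0}
  minor=${minor:-0}
  if (( major >= 11 || (major == 10 && minor >= 16) )); then
    echo "big-sur-or-later"   # not yet supported by BioGrids
  elif (( major == 10 && minor == 15 )); then
    echo "catalina"           # see the workaround for new machines
  else
    echo "earlier"
  fi
}

# On a Mac, report the status of the running system
if command -v sw_vers >/dev/null 2>&1; then
  macos_status "$(sw_vers -productVersion)"
fi
```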
Cite BioGrids
If your use of BioGrids supplied software was an important element in your publication, please include the following statement in your work:
"Software used in the project was installed and configured by BioGrids
(cite: eLife 2013;2:e01456, Collaboration gets the most out of software.)"
See our Grant Support page for additional details.
Register here to try out our software installer, which allows users to choose from over 290 bioinformatics and life sciences tools that can be installed as ready-to-run applications on Mac or Linux machines with the click of a button or a short command from the CLI. No need to worry about dependencies or compilation.
BioGrids is supported by a team of scientists and engineers at HMS. We provide direct support to BioGrids members. This includes all aspects of software installation and management. If you need assistance of any kind please send a note to: help@biogrids.org.
BioGrids Installer
The BioGrids Installer is an easy to use application that makes installing and managing life sciences software simple and quick.
A command line version is also available for Macs and Linux. Download using the link button above and register here for activation.
The BioGrids team provides support, infrastructure and testing for scientific software packages. We currently provide 335 titles in five categories and over 1,500 R, Python and Perl packages and modules. The collection grows weekly. Learn more here: About BioGrids
BioGrids QuickStart
If you are new to BioGrids and would like to quickly get started with the command line version, follow the instructions below:
1: Download the BioGrids Installer command line version
Linux CLI
curl -kLO https://biogrids.org/wiki/downloads/biogrids-1.0.695-Linux.tgz
tar zxf biogrids-1.0.695-Linux.tgz
cd biogrids-1.0.695-Linux
macOS CLI
curl -kLO https://biogrids.org/wiki/downloads/biogrids-1.0.695-Darwin.tgz
tar zxf biogrids-1.0.695-Darwin.tgz
cd biogrids-1.0.695-Darwin
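Step 1 can also be scripted so the same commands work on either platform. This is a minimal sketch, assuming only the Linux and Darwin (macOS) tarballs listed above and version 1.0.695; `installer_name`, `fetch_installer` and `BIOGRIDS_VERSION` are hypothetical names chosen for illustration.

```shell
BIOGRIDS_VERSION=1.0.695

# Map the output of `uname -s` to the matching tarball name
installer_name() {
  case "$1" in
    Linux)  echo "biogrids-${BIOGRIDS_VERSION}-Linux" ;;
    Darwin) echo "biogrids-${BIOGRIDS_VERSION}-Darwin" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Download, unpack and enter the installer directory (same commands as step 1)
fetch_installer() {
  local name
  name=$(installer_name "$(uname -s)") || return 1
  curl -kLO "https://biogrids.org/wiki/downloads/${name}.tgz" &&
    tar zxf "${name}.tgz" &&
    cd "$name"
}
```

Because `fetch_installer` runs `cd` in the current shell, calling it interactively (or after sourcing the snippet) leaves you inside the unpacked directory, ready for step 2.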
2: Activate biogrids
./biogrids activate <site_name> <user_name> <activation_key>
Replace the site name, user name and activation key with your own credentials.
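If you script the activation step, one option is to read the credentials from environment variables so the script itself can be shared without embedding your key. This is a minimal sketch: `BIOGRIDS_SITE`, `BIOGRIDS_USER`, `BIOGRIDS_KEY` and `BIOGRIDS_CMD` are hypothetical variable names, and the only BioGrids command used is `activate` as shown in step 2.

```shell
# Hypothetical wrapper around `./biogrids activate <site> <user> <key>`
activate_biogrids() {
  local cmd=${BIOGRIDS_CMD:-./biogrids}
  local var
  # Refuse to run if any credential is missing or empty
  for var in BIOGRIDS_SITE BIOGRIDS_USER BIOGRIDS_KEY; do
    if [ -z "${!var:-}" ]; then
      echo "error: $var is not set" >&2
      return 1
    fi
  done
  "$cmd" activate "$BIOGRIDS_SITE" "$BIOGRIDS_USER" "$BIOGRIDS_KEY"
}
```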
3: Install software with BioGrids
./biogrids install fastqc trimmomatic samtools star subread igv
When finished, verify applications are installed:
./biogrids installed
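For longer tool lists, steps 3 and the verification can be driven from a plain-text file. This is a minimal sketch, assuming a hypothetical file format of one title per line with `#` starting a comment; `install_from_list` and `BIOGRIDS_CMD` are illustrative names, and the only client commands used are `install` and `installed` from above.

```shell
# Hypothetical helper: install every title listed in file $1, then list installed titles
install_from_list() {
  local cmd=${BIOGRIDS_CMD:-./biogrids}
  local -a titles=()
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%%#*}               # strip a trailing comment
    line=${line//[[:space:]]/}     # strip all whitespace (assumes titles contain none)
    if [ -n "$line" ]; then
      titles+=("$line")
    fi
  done < "$1"
  if [ "${#titles[@]}" -eq 0 ]; then
    echo "no titles found in $1" >&2
    return 1
  fi
  # Install everything in one call, then verify
  "$cmd" install "${titles[@]}" && "$cmd" installed
}
```

A file containing `fastqc`, `trimmomatic` and `samtools` on separate lines would produce the same `install` call as typing the titles by hand.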
Software Spotlight - Dask
https://dask.org
Dask is a flexible library for parallel computing in Python. It natively scales Python, providing advanced parallelism to enable performance at scale.
Dask is composed of two parts:
Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
“Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of dynamic task schedulers.
Dask emphasizes the following virtues:
Familiar: Provides parallelized NumPy array and Pandas DataFrame objects
Flexible: Provides a task scheduling interface for more custom workloads and integration with other projects.
Native: Enables distributed computing in pure Python with access to the PyData stack.
Fast: Operates with low overhead, low latency, and minimal serialization necessary for fast numerical algorithms
Scales up: Runs resiliently on clusters with 1000s of cores
Scales down: Trivial to set up and run on a laptop in a single process
Responsive: Designed with interactive computing in mind, it provides rapid feedback and diagnostics to aid humans
Get started using Dask with the tutorial here: https://github.com/dask/dask-tutorial
Software Updates
CellRanger - Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis.
Updated versions: 4.0.0 (Linux 64)
CUDA - redistributable software libraries that support CUDA applications on Linux.
Updated versions: 11.1 (Linux 64), 11.0.3 (Linux 64), 11.0 (Linux 64)
dask - Dask is a flexible library for parallel computing in Python.
Updated versions: 2.30.0 (Linux 64)
idr - The IDR (Irreproducible Discovery Rate) framework is a unified approach to measuring the reproducibility of findings identified from replicate experiments, and it provides highly stable thresholds based on reproducibility.
Updated versions: 2.0.3 (Linux 64), 2.0.3 (OS X INTEL)
Python - a general-purpose, interpreted, object-oriented, high-level dynamic programming language that emphasizes code readability. Its syntax allows programmers to express concepts in fewer lines of code than in C++ or Java, so they can work more quickly and integrate their systems more effectively.
Updated versions: 3.6.5 (Linux 64), 3.7.0 (Linux 64)
R - a free software environment for statistical computing and graphics.
Updated versions: 4.0.2 (OS X INTEL), 3.6.2 (OS X INTEL), 4.0.2 (Linux 64), 3.6.2 (Linux 64)
rclone - Rclone is a command line program to manage files on cloud storage.
Updated versions: 1.53.2 (OS X INTEL), 1.53.2 (Linux 64)
RStudio - an integrated development environment (IDE) for R that includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
Updated versions: 1.3.1093 (Linux 64), 1.3.1093 (OS X INTEL)
scvi-tools (single-cell variational inference tools) - a package for end-to-end analysis of single-cell omics data.
Updated versions: 0.7.0-beta.0 (Linux 64), 0.6.8 (Linux 64)
SEQLinkage - implements a collapsed haplotype pattern (CHP) method to generate markers from sequence data for linkage analysis.
Updated versions: 1.0.1 (Linux 64)
Software Training
Training sessions available to HMS trainees:
HMS Research Computing
Fall 2020 registration is now open.
See the HMS Research Computing Training Portal for the most current updates.
Intro to Python | Wednesday, December 2, 2020 | 3-5 pm
Intro to R/Bioconductor | Wednesday, December 9, 2020 | 3-5 pm
The Harvard Chan Bioinformatics Core
See the Workshop Updates page for recent changes.
Reproducible research with Git/GitHub and RMarkdown | December 8, 11 and 15 | Three 2.5-hour sessions
Bioinformatics Support
Need help getting software installed on new machines? Have you been planning to try Amazon Web Services (AWS) cloud computing?
BioGrids can help you get started. We have expertise in bioinformatics, programming, workflow development and high performance computing.
We improve the collection with feedback from the community.
Want to see a new application in BioGrids?
Let us know: help@biogrids.org
BioGrids is supported by Harvard Medical School and Boston Children's Hospital and relies on a framework that was developed by SBGrid.