April 2018 Newsletter
Our April newsletter brings eleven new software titles and three updates. The Software Spotlight below provides an overview of the application StringTie, a transcriptome assembler. The companion programs GFFRead and GFFCompare are also included in this month's new releases.
StringTie
Overview (from the StringTie website)
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. To identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software such as Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).
Many transcriptome assemblers have a hard time assembling transcripts in regions where introns have been retained, even in a small percentage of transcripts. Here is an example where RNA-seq data from a human kidney cell line shows increased transcription activity in the region containing the miR-17-92 cluster, one of the most potent oncogenic miRNA polycistrons. The six miRNAs included in the miR-17-92 cluster reside inside the third intron of the non-coding MIR17HG gene. Read alignments across the entire span of this intron prevent other transcriptome assemblers from assembling it correctly, but StringTie gets it right, as shown here. The BAM alignment file (produced by TopHat2) of the RNA-seq data from this region can be downloaded from http://ccb.jhu.edu/software/stringtie/dl/mir-17-92.bam.
[Figure: miR example]
Highly covered regions pose a great challenge for transcriptome assembly, and most software cannot handle them. RNA-seq data from the cytosol of fetal lung fibroblasts (downloaded from the ENCODE data set, GEO accession GSM981244) shows a very high level of expression for the COL1A1 gene, yet StringTie can still handle it. The TopHat alignment BAM file of the RNA-seq data from this region is also available for download.
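To try the miR-17-92 example locally, the downloaded BAM file can be passed to StringTie directly. The following is a minimal Python sketch, not an official workflow: it assumes the `stringtie` executable is on your PATH, that the BAM file is coordinate-sorted, and it uses only the basic `-o` (output GTF), `-p` (threads) and `-G` (optional reference annotation) options. The function names and the output file name `mir-17-92.gtf` are illustrative choices, not part of StringTie.

```python
import shutil
import subprocess
from pathlib import Path


def build_stringtie_command(bam, out_gtf, threads=1, guide_gff=None):
    """Return the argument list for a basic StringTie assembly run.

    bam       -- coordinate-sorted alignment file (assumed to exist)
    out_gtf   -- path where the assembled transcripts (GTF) are written
    threads   -- number of processing threads (-p)
    guide_gff -- optional reference annotation used to guide assembly (-G)
    """
    cmd = ["stringtie", str(bam), "-o", str(out_gtf), "-p", str(threads)]
    if guide_gff is not None:
        cmd += ["-G", str(guide_gff)]
    return cmd


def run_stringtie(bam, out_gtf, threads=1, guide_gff=None):
    """Run StringTie and return the path to the output GTF file."""
    if shutil.which("stringtie") is None:
        raise FileNotFoundError("stringtie executable not found on PATH")
    if not Path(bam).is_file():
        raise FileNotFoundError(f"alignment file not found: {bam}")
    cmd = build_stringtie_command(bam, out_gtf, threads, guide_gff)
    # check=True raises CalledProcessError if StringTie exits with an error
    subprocess.run(cmd, check=True)
    return Path(out_gtf)


if __name__ == "__main__":
    # Show the command that would be run for the example BAM file.
    print(" ".join(build_stringtie_command("mir-17-92.bam", "mir-17-92.gtf", threads=8)))
```

Calling `run_stringtie("mir-17-92.bam", "mir-17-92.gtf", threads=8)` would then write the assembled transcripts to `mir-17-92.gtf`. Keeping command construction separate from execution makes it easy to inspect the command before running it.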
How long does it take to run StringTie?
StringTie is not only accurate but also very fast compared to most other transcriptome assemblers. Here we show some typical running times for StringTie and Cufflinks on four large real data sets: three human RNA-seq data sets downloaded from the ENCODE project (GEO accessions GSM981256, GSM981244, and GSM984609) and one RNA-seq data set generated from nuclear RNA from a human kidney cell line (NCBI Study accession number SRP041943). Both programs were run on the same multi-core 2.1 GHz AMD Opteron servers using 8 threads. Times are shown as hours:minutes.
[Figure: run times]
Do you have a favorite software package you'd like to use in BioGrids or see highlighted here? Drop a note to help@biogrids.org.
BioGrids Installer
BioGrids Installer Download
New to BioGrids? Give it a try today! The BioGrids Installer is easy to use. Download and activate using the link button above.
Need a jumpstart with your bioinformatics software, workflow development or large scale computations? BioGrids can help. We have expertise in bioinformatics, programming, workflow development and high performance computing. Send a note to: help@biogrids.org
Software Update
For a full listing of available applications see the BioGrids.org website.
New Software
Software Updates
Training sessions available to HMS trainees
May 4   1-3p  Intro to R/Bioconductor      TMEC 423
May 2   3-5p  Intro to Perl                TMEC 333
May 9   3-5p  Intro to Git/GitHub          Countway 403
May 16  3-5p  Intro to MATLAB              Countway 403
May 23  3-5p  Intro to Parallel Computing  Countway 403
May 30  3-5p  Intro to O2                  Countway 403
No classes scheduled.
The Harvard Chan Bioinformatics Core
May 30th - May 31st Introduction to shell and High-Performance Computing. Registration opens on May 9th.
Happy to help.
We'd like to hear from you! We improve the collection with feedback from the community. Want to see a new application in BioGrids? Let us know: help@biogrids.org
============================================================
BioGrids is supported by the HMS TnT fund and based upon SBGrid.org