Genome Informatics Section

News

MashMap: approximate long-read mapping using minimizers and MinHash

May 22, 2017

Chirag Jain recently presented a paper at RECOMB’17 titled “A fast approximate algorithm for mapping long reads to large reference databases” (preprint | proceedings). The paper describes the algorithms behind MashMap, our new tool for approximate read mapping. Chirag joined the lab last year as a summer fellow, and I asked him to write a new read mapper. (How else does one learn bioinformatics?) He clearly rose to the challenge, and I think the paper contains some useful ideas for the looming “long-read” era. I summarize those ideas here for anyone who missed RECOMB.

Assembling the Cliveome

April 28, 2017

We recently participated in a collaborative effort to sequence, assemble, and analyze a human genome (GM12878) using the Oxford Nanopore MinION (Jain et al. 2017). As part of that project, Josh Quick and Nick Loman developed a nanopore sequencing protocol capable of generating “ultra-long” reads of 100 kb and longer. In the paper we predict that reads of this length could enable the most continuous human assemblies to date, with NG50 contig sizes exceeding 30 Mbp. Thus far, we have collected only 5x coverage using the ultra-long read protocol and cannot fully test this prediction. However, another human dataset, the “Cliveome”, lets us compare the effects of read length and coverage on nanopore assembly. Here we present a brief analysis of that assembly, which achieved a remarkable contig NG50 of 24.5 Mbp.
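For readers unfamiliar with the metric, NG50 is the contig length L such that contigs of length L or greater together cover at least half of the estimated genome size (N50 uses the total assembly size instead). The sketch below is a minimal illustration only: the function name `ng50` and the toy contig lengths are hypothetical and are not taken from the Cliveome assembly or from any tool used in the paper.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if the contigs
    cover less than half of the (estimated) genome size.

    contig_lengths: iterable of contig lengths in bases
    genome_size:    estimated genome size in bases (an input assumption)
    """
    half = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest, accumulating length
    # until half of the genome size has been covered.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return None


# Toy example (illustrative numbers only): the contigs sum to 100,
# but the assumed genome size is 160, so half the genome is 80.
toy_contigs = [40, 30, 20, 10]
print(ng50(toy_contigs, genome_size=160))
```

Because NG50 is normalized by genome size rather than assembly size, it allows assemblies of the same genome with different total lengths to be compared directly.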

People

Software

Canu

A single molecule sequence assembler for genomes large and small

Mash

Fast genome and metagenome distance estimation using MinHash

Publications

Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes
Briefings in Bioinformatics, August 7, 2017
Olson ND, Treangen TJ, Hill CM, Cepeda-Espinoza V, Ghurye J, Koren S, Pop M

Draft Genome Sequences from a Novel Clade of Bacillus cereus Sensu Lato Strains, Isolated from the International Space Station
Genome Announcements, July 12, 2017
Venkateswaran K, Sielaff AC, Ratnayake S, Pope RK, Blank TE, Stepanov VG, Fox GE, van Tongeren SP, Torres C, Allen J, Jaing C, Pierson D, Perry J, Koren S, Phillippy AM, Klubnik J, Treangen TJ, Rosovitz MJ, Bergman NH