News


Human genome assemblies with nanopore, an update

May 23, 2018

We recently participated in a collaborative effort to sequence, assemble, and analyze a human genome (GM12878) using the Oxford Nanopore MinION (Jain et al. 2018). Since then, we’ve also developed a trio-based strategy for assembling complete haplotypes from long-read data (Koren et al. 2018). Oxford Nanopore has continued to advance in the meantime, releasing several major base-calling updates. Other tools, such as Nanopolish, have also gotten faster and added new functionality, like methylation-aware polishing. So, we decided to re-analyze the dataset from the paper using the latest base-calling and assembly tools. The new assembly increases the NG50 to over 10 Mbp, and trio binning accurately reconstructs key MHC genes for both haplotypes.

We are hiring!

October 4, 2017

The Genome Informatics Section is hiring! Come and join our outstanding team at the National Human Genome Research Institute. The official advertisement is for a postdoctoral position, but we will also consider outstanding postbacs or graduate students (via the NIH GPP). Simply follow the same application procedure described below. Update 2018-02-22: this position has now been filled.

Mash Screen: what's in my sequencing run?

September 25, 2017

Last year we published Mash (Ondov et al. 2016) for the rapid comparison of genomes and metagenomes. Mash builds on Andrei Broder’s foundational paper “On the resemblance and containment of documents” (Broder 1997) and uses the MinHash technique to rapidly estimate resemblance, e.g. the similarity of two whole genomes. Mash is great for this purpose but is not well suited for estimating containment, e.g. detecting genomes contained within a metagenome. Implementing containment was the next logical step for Mash; it is in the title of the Broder paper, after all! Here we introduce a new Mash screen operation that implements containment in the context of genomics.
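To make the resemblance/containment distinction concrete, here is a minimal Python sketch. It assumes a bottom-s MinHash sketch over forward-strand k-mers and a BLAKE2b-based 64-bit hash (Mash itself uses MurmurHash3 and canonical k-mers). The function names are hypothetical, and this is not the Mash or Mash screen implementation. It only illustrates why a genome fully contained in a much larger metagenome has low resemblance but high containment.

```python
import hashlib
import random


def kmers(seq, k):
    """Yield every k-mer of seq (forward strand only; real tools use canonical k-mers)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_hash(kmer):
    """64-bit hash of a k-mer. Stand-in only: Mash itself uses MurmurHash3."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_sketch(seq, k, s):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes."""
    return sorted({kmer_hash(x) for x in kmers(seq, k)})[:s]


def resemblance(sketch_a, sketch_b, s):
    """Estimate Jaccard |A ∩ B| / |A ∪ B| from two bottom-s sketches."""
    union = sorted(set(sketch_a) | set(sketch_b))[:s]
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for x in union if x in shared) / len(union)


def containment(query_sketch, target_seq, k):
    """Estimate |Q ∩ T| / |Q|: the fraction of the query's sketch hashes
    that occur anywhere in the (unsketched) target, e.g. a metagenome."""
    target = {kmer_hash(x) for x in kmers(target_seq, k)}
    return sum(1 for x in query_sketch if x in target) / len(query_sketch)


def random_dna(n, seed):
    """Hypothetical helper: a random DNA string for the toy example."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    genome = random_dna(20_000, seed=1)
    metagenome = genome + random_dna(80_000, seed=2)  # genome plus unrelated sequence
    q = bottom_sketch(genome, 21, 1000)
    m = bottom_sketch(metagenome, 21, 1000)
    print("resemblance:", round(resemblance(q, m, 1000), 3))  # roughly 0.2
    print("containment:", containment(q, metagenome, 21))      # 1.0
```

In the toy run the genome accounts for only about a fifth of the metagenome's k-mers, so the resemblance estimate sits near 0.2, while every hash in the query's sketch is found in the metagenome and the containment estimate is exactly 1.0. Checking a small query sketch against the target's full k-mer set is what makes the estimate asymmetric; resemblance has no such asymmetry.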

MashMap: approximate long-read mapping using minimizers and MinHash

May 22, 2017

Chirag Jain recently presented a paper at RECOMB’17 titled “A fast approximate algorithm for mapping long reads to large reference databases” (available as a preprint and in the conference proceedings). This paper describes the algorithms behind MashMap, our new tool for approximate read mapping. Chirag joined the lab last year as a summer fellow, and I asked him to write a new read mapper. (How else does one learn bioinformatics?) He clearly lived up to the challenge, and I think the paper contains some useful ideas for the looming “long-read” era. I wanted to summarize those ideas here for anyone who missed RECOMB.
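The MashMap title pairs minimizers with MinHash. As a rough illustration of the minimizer half only, the Python sketch below samples window minimizers from a reference and from a read, then maps the read by voting on the offset implied by each shared minimizer. This is a simplified stand-in, not MashMap's algorithm. The hash function, the parameters (k = 15, w = 10), the helper names, and the offset-voting step are assumptions for illustration, and the paper's MinHash-based estimate of read-to-reference similarity is not modeled.

```python
import hashlib
import random
from collections import Counter, defaultdict


def kmer_hash(kmer):
    """64-bit k-mer hash (stand-in; not necessarily the hash MashMap uses)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minimizers(seq, k, w):
    """Set of (position, hash) pairs: for each window of w consecutive k-mers,
    keep the k-mer with the smallest hash (leftmost on ties)."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(hashes) - w + 1):
        j = min(range(start, start + w), key=hashes.__getitem__)
        picked.add((j, hashes[j]))
    return picked


def build_index(ref, k, w):
    """Map each reference minimizer hash to the positions where it was picked."""
    index = defaultdict(list)
    for pos, hv in minimizers(ref, k, w):
        index[hv].append(pos)
    return index


def map_read(read, index, k, w):
    """Hypothetical offset vote: each shared minimizer votes for ref_pos - read_pos.
    Returns (best_offset, votes), or None if nothing is shared."""
    votes = Counter()
    for pos, hv in minimizers(read, k, w):
        for ref_pos in index.get(hv, ()):
            votes[ref_pos - pos] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


def random_dna(n, seed):
    """Hypothetical helper: a random DNA string for the toy example."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    ref = random_dna(50_000, seed=7)
    index = build_index(ref, k=15, w=10)
    print(map_read(ref[12_000:17_000], index, k=15, w=10))  # offset 12000 wins
```

Because every window of an exact substring is also a window of the reference, the read shares all of its minimizers with the reference at the true offset, which collects hundreds of votes while a chance hash collision collects at most a few. Sampling one k-mer per window keeps the index to roughly 2/(w + 1) of all k-mer positions.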

Assembling the Cliveome

April 28, 2017

We recently participated in a collaborative effort to sequence, assemble, and analyze a human genome (GM12878) using the Oxford Nanopore MinION (Jain et al. 2017). As part of that project, Josh Quick and Nick Loman developed a nanopore sequencing protocol capable of generating “ultra-long” reads 100 kb or longer. In the paper we predict that reads of such length could enable the most continuous human assemblies to date, with contig NG50s exceeding 30 Mbp. Thus far, we have collected only 5x coverage using the ultra-long read protocol and cannot fully test this prediction. However, another human dataset, the “Cliveome”, lets us compare the effect of read length and coverage on nanopore assembly. Here we present a brief analysis of that assembly, which achieved a remarkable contig NG50 of 24.5 Mbp.
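NG50 is the continuity metric quoted here (and in the May 2018 update). A minimal Python sketch of how it is computed follows, assuming the common convention that the running total must reach at least half the genome size, and returning 0 when the assembly covers less than half. The contig lengths in the example are toy values, not data from the Cliveome or GM12878 assemblies.

```python
def ng50(contig_lengths, genome_size):
    """NG50: sort contigs longest first and return the length of the contig at
    which the running total first reaches half of genome_size. Unlike N50,
    the denominator is the (estimated) genome size, not the assembly size.
    Returns 0 if the whole assembly covers less than half the genome."""
    half = genome_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    contigs = [40, 30, 20, 10]             # toy lengths (e.g. in Mbp)
    print(ng50(contigs, genome_size=100))  # 30
    print(ng50(contigs, genome_size=200))  # 10
```

Normalizing by genome size rather than assembly size means an assembly cannot inflate its score by dropping short contigs, which is why NG50 is preferred when comparing assemblies of the same genome.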

Fast and highly accurate HLA typing by linearly-seeded graph alignment

March 22, 2017

HLA*PRG:LA approximates the graph alignment process by starting with linear sequence alignments. It reduces the per-sample resource requirements for HLA typing to 30 GB of RAM and 30 CPU hours, and produces highly accurate calls.
