
We are hiring!

October 2, 2019

The Genome Informatics Section is hiring! Come join our outstanding team at the NIH’s National Human Genome Research Institute and contribute to the development of new reference genomes and computational methods for DNA sequencing and analysis. Both postdoc and PhD student positions are available. More information and application instructions follow below. Update 2020-02-14: these positions have now been filled.

De novo assembly of haplotype-resolved genomes with trio binning

October 22, 2018

Our latest paper with Tim Smith (USDA) is now out in Nature Biotechnology — “Complex allelic variation hampers the assembly of haplotype-resolved sequences from diploid genomes. We developed trio binning, an approach that simplifies haplotype assembly by resolving allelic variation before assembly … Trio binning uses short reads from two parental genomes to first partition long reads from an offspring into haplotype-specific sets. Each haplotype is then assembled independently, resulting in a complete diploid reconstruction.” Here are links to the full paper and a nice summary from NHGRI with quotes from me and Tim. Credit to Sergey Koren and Arang Rhie for developing this great new method. We have many more trios planned!
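To make the binning step concrete, here is a minimal, simplified sketch of the idea in the abstract. It assumes that parental k-mers are taken directly from error-free reads, uses forward-strand k-mers only, and assigns each offspring read to the parent whose haplotype-specific k-mers it hits most, scaled by the size of each parent's specific set. The function names are hypothetical. The real method also handles k-mer counting, sequencing errors, and reverse complements, which are all omitted here.

```python
def kmers(seq, k):
    """Yield every length-k substring of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def haplotype_specific_kmers(mother_reads, father_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = {km for r in mother_reads for km in kmers(r, k)}
    pat = {km for r in father_reads for km in kmers(r, k)}
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Classify one offspring read as 'maternal', 'paternal', or 'unknown'."""
    read_kmers = list(kmers(read, k))
    m_hits = sum(km in mat_only for km in read_kmers)
    p_hits = sum(km in pat_only for km in read_kmers)
    # Scale by set size so a parent with more specific k-mers is not favored by default.
    m_score = m_hits / max(len(mat_only), 1)
    p_score = p_hits / max(len(pat_only), 1)
    if m_score > p_score:
        return "maternal"
    if p_score > m_score:
        return "paternal"
    return "unknown"


def partition(offspring_reads, mat_only, pat_only, k):
    """Split offspring reads into haplotype-specific sets (each is assembled separately)."""
    bins = {"maternal": [], "paternal": [], "unknown": []}
    for read in offspring_reads:
        bins[bin_read(read, mat_only, pat_only, k)].append(read)
    return bins
```

Reads that carry no haplotype-specific k-mers, for example those from regions where the parents are identical, land in the "unknown" bin. How such reads are handled downstream is not shown here.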

Human genome assemblies with nanopore, an update

May 23, 2018

We recently participated in a collaborative effort to sequence, assemble, and analyze a human genome (GM12878) using the Oxford Nanopore MinION (Jain et al. 2018). Since then, we’ve also developed a trio-based strategy for assembling complete haplotypes from long-read data (Koren et al. 2018). Oxford Nanopore has continued to advance in the meantime, releasing several major base-calling updates. Other tools, such as Nanopolish, have also gotten faster and added new functionality, like methylation-aware polishing. So, we decided to re-analyze the dataset from the paper using the latest base calling and assembly tools. The new assembly increases the NG50 to over 10 Mbp and trio binning accurately reconstructs key MHC genes for both haplotypes.
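For readers less familiar with NG50, it is the contig length L such that contigs of length L or longer together cover at least half of an estimated genome size. Unlike N50, it is not half of the assembly's own total length. This means assemblies of different total sizes are measured against the same yardstick. Below is a minimal sketch. The genome size is a caller-supplied estimate, and the value used for the GM12878 analysis is not stated here.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50 for the given contig lengths and estimated genome size.

    Walk contigs from longest to shortest, accumulating length; the first
    contig that brings the running total to at least half the genome size
    is the NG50. Returns 0 if the assembly covers less than half the genome.
    """
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:
            return length
    return 0
```

For example, `ng50([5, 4, 3, 2, 1], 20)` returns 3, because 5 + 4 + 3 = 12 is the first running total that reaches half of 20. Passing `genome_size=sum(contig_lengths)` recovers the ordinary N50.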

We are hiring!

October 4, 2017

The Genome Informatics Section is hiring! Come and join our outstanding team at the National Human Genome Research Institute. The official advertisement is for a postdoctoral position, but we will also consider outstanding postbacs or graduate students (via the NIH GPP). Simply follow the same application procedure described below. Update 2018-02-22: this position has now been filled.

Mash Screen: what's in my sequencing run?

September 25, 2017

Last year we published Mash (Ondov et al. 2016) for the rapid comparison of genomes and metagenomes. Mash builds on Andrei Broder’s foundational paper “On the resemblance and containment of documents” (Broder 1997), and uses the MinHash technique to rapidly estimate resemblance, e.g. the similarity of two whole genomes. Mash is great for this purpose, but is not well suited for estimating containment, e.g. detecting genomes contained within a metagenome. Implementing containment was the next logical step for Mash — it is in the title of the Broder paper, after all! Here we introduce a new Mash screen operation that implements containment in the context of genomics.
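The difference between resemblance and containment can be illustrated with MinHash bottom-s sketches. Each sketch keeps the s smallest k-mer hash values. Resemblance (Jaccard, |A∩B|/|A∪B|) is estimated from the s smallest hashes of the merged sketches. Containment (|Q∩M|/|Q|) asks what fraction of a query genome's sketch is present in a metagenome. This is a minimal sketch under our own assumptions. It hashes forward-strand k-mers with BLAKE2b truncated to 64 bits, whereas Mash uses MurmurHash3 on canonical k-mers. It also loads all metagenome k-mers into a set rather than streaming reads the way `mash screen` does.

```python
import hashlib


def kmer_hashes(seq, k):
    """Hash every forward-strand k-mer of seq to a 64-bit integer."""
    return {
        int.from_bytes(hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big")
        for i in range(len(seq) - k + 1)
    }


def bottom_sketch(seq, k, s):
    """MinHash bottom-s sketch: the s smallest k-mer hash values."""
    return set(sorted(kmer_hashes(seq, k))[:s])


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate resemblance |A∩B|/|A∪B| from two bottom-s sketches."""
    union_sketch = set(sorted(sketch_a | sketch_b)[:s])
    if not union_sketch:
        return 0.0
    shared = union_sketch & sketch_a & sketch_b
    return len(shared) / len(union_sketch)


def containment_estimate(query_sketch, metagenome_hashes):
    """Estimate |Q∩M|/|Q|: fraction of query sketch hashes found in the metagenome."""
    if not query_sketch:
        return 0.0
    return len(query_sketch & metagenome_hashes) / len(query_sketch)
```

A genome embedded in a larger metagenome has a containment of 1.0 but a Jaccard similarity well below 1.0. This is the reason resemblance alone is a poor signal for "is this genome in my sample?"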

MashMap: approximate long-read mapping using minimizers and MinHash

May 22, 2017

Chirag Jain recently presented a paper at RECOMB’17 titled “A fast approximate algorithm for mapping long reads to large reference databases” (available as a preprint and in the conference proceedings). This paper describes the algorithms behind MashMap, our new tool for approximate read mapping. Chirag joined the lab last year as a summer fellow, and I asked him to write a new read mapper. (How else does one learn bioinformatics?) He clearly lived up to the challenge, and I think the paper contains some useful ideas for the looming “long-read” era. I wanted to summarize those ideas here for anyone who missed RECOMB.
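The title names minimizers as one of MashMap's ingredients. A minimizer scheme slides a window of w consecutive k-mers along a sequence and keeps the k-mer with the smallest hash in each window. As a result, similar sequences tend to select the same k-mers, and the selected set is much smaller than the full k-mer set. The sketch below is a generic, naive O(n·w) illustration. The hash function (BLAKE2b truncated to 64 bits), the leftmost tie-breaking rule, and the function names are our own assumptions, not MashMap's implementation.

```python
import hashlib


def kmer_hash(kmer):
    """Hypothetical 64-bit k-mer hash (BLAKE2b truncated to 8 bytes)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minimizers(seq, k, w):
    """Return sorted (position, hash) pairs chosen as window minimizers.

    Each window spans w consecutive k-mers; the k-mer with the smallest
    hash is kept, breaking ties in favor of the leftmost position.
    Consecutive windows often pick the same k-mer, so duplicates collapse.
    """
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    chosen = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        offset = min(range(w), key=lambda j: window[j])  # leftmost smallest hash
        chosen.add((start + offset, window[offset]))
    return sorted(chosen)
```

By construction, every window of w consecutive k-mers contains at least one selected position. This guarantee is what lets a mapper sample a long read sparsely without leaving gaps larger than a window.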
