News


The (near) complete sequence of a human genome

September 22, 2020

The Telomere-to-Telomere (T2T) consortium is proud to announce our v1.0 assembly of a complete human genome. This post briefly summarizes our work over the past year, including a month-long virtual workshop in June, as we strove to complete as many human chromosomes as possible. Our progress over the summer exceeded our wildest expectations and resulted in the completion of all human chromosomes, with the sole exception of the five rDNA arrays. Our v1.0 assembly includes more than 100 Mbp of novel sequence compared to GRCh38, achieves near-perfect sequence accuracy, and unlocks the most complex regions of the genome for functional study. We plan to release a series of preprints in the coming months that fully describe our methods and analyses, but because of the assembly's tremendous value, we are releasing it immediately.

De novo assembly of haplotype-resolved genomes with trio binning

October 22, 2018

Our latest paper with Tim Smith (USDA) is now out in Nature Biotechnology: “Complex allelic variation hampers the assembly of haplotype-resolved sequences from diploid genomes. We developed trio binning, an approach that simplifies haplotype assembly by resolving allelic variation before assembly … Trio binning uses short reads from two parental genomes to first partition long reads from an offspring into haplotype-specific sets. Each haplotype is then assembled independently, resulting in a complete diploid reconstruction.” Here are links to the full paper and a nice summary from NHGRI with quotes from me and Tim. Credit to Sergey Koren and Arang Rhie for developing this great new method. We have many more trios planned!
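
For readers who want a feel for the partitioning step, here is a toy sketch. It assumes the binning is done by counting k-mers that appear in only one parent's short reads; the function names, the tiny value of k, and the simple majority rule are all illustrative assumptions, not the published implementation.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign an offspring long read to a haplotype bin by majority vote (assumed rule)."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


# Toy data: the parents differ at a single base.
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], k=4)
print(bin_read("CGTACGTA", mat_only, pat_only, k=4))  # maternal
print(bin_read("CGTTCGTA", mat_only, pat_only, k=4))  # paternal
```

Each bin would then be assembled on its own. Real data would need much longer k-mers and some handling of sequencing errors, which this sketch ignores.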

Human genome assemblies with nanopore, an update

May 23, 2018

We recently participated in a collaborative effort to sequence, assemble, and analyze a human genome (GM12878) using the Oxford Nanopore MinION (Jain et al. 2018). Since then, we’ve also developed a trio-based strategy for assembling complete haplotypes from long-read data (Koren et al. 2018). Oxford Nanopore has continued to advance in the meantime, releasing several major base-calling updates. Other tools, such as Nanopolish, have also gotten faster and added new functionality, like methylation-aware polishing. So, we decided to re-analyze the dataset from the paper using the latest base calling and assembly tools. The new assembly increases the NG50 to over 10 Mbp and trio binning accurately reconstructs key MHC genes for both haplotypes.
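
As a reminder of what the NG50 figure measures, here is a minimal sketch using the usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the (estimated) genome size. The function name and the toy numbers are illustrative only.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50: the contig length at which the running total of
    contigs, taken from longest to shortest, first reaches half of
    genome_size. Returns 0 if the assembly never covers half the genome."""
    half = genome_size / 2
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= half:
            return length
    return 0


# Toy example: four contigs against an assumed genome size of 200.
print(ng50([50, 30, 20, 10], genome_size=200))  # 20
```

Unlike N50, which uses the total assembly length, NG50 uses the genome size, so assemblies of different total length can be compared on the same footing.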

Mash Screen: what's in my sequencing run?

September 25, 2017

Last year we published Mash (Ondov et al. 2016) for the rapid comparison of genomes and metagenomes. Mash builds on Andrei Broder’s foundational paper “On the resemblance and containment of documents” (Broder 1997), and uses the MinHash technique to rapidly estimate resemblance, e.g., the similarity of two whole genomes. Mash is great for this purpose, but it is not well suited for estimating containment, e.g., detecting genomes contained within a metagenome. Implementing containment was the next logical step for Mash; it is in the title of the Broder paper, after all! Here we introduce a new Mash screen operation that implements containment in the context of genomics.
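
To make the resemblance versus containment distinction concrete, here is a small sketch on k-mer sets. It computes both quantities exactly and also shows a bottom-s MinHash estimate of resemblance in the style of Broder (1997). The helper names, the SHA-1 based hash, and the toy sequences are assumptions made for illustration; this is not Mash's code.

```python
import hashlib


def kmer_set(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def resemblance(a, b):
    """Jaccard index: |A & B| / |A | B|."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def containment(a, b):
    """Fraction of A that is found in B: |A & B| / |A|."""
    return len(a & b) / len(a) if a else 0.0


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (illustrative choice of hash)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def sketch(kmers, s):
    """Bottom-s sketch: the s smallest hash values of the k-mer set."""
    return set(sorted(hash64(x) for x in kmers)[:s])


def minhash_resemblance(sketch_a, sketch_b, s):
    """Estimate the Jaccard index from two bottom-s sketches."""
    merged = sorted(sketch_a | sketch_b)[:s]
    shared = sum(1 for h in merged if h in sketch_a and h in sketch_b)
    return shared / len(merged) if merged else 0.0


genome = kmer_set("ACGTACGGT", 3)
metagenome = kmer_set("ACGTACGGT" + "TTTTGGGGCCCCAAAA", 3)
print(resemblance(genome, metagenome))  # well below 1: the metagenome is much larger
print(containment(genome, metagenome))  # 1.0: every genome k-mer is present
```

The genome is fully present in the mixture, yet its resemblance to the mixture is low because the union is dominated by unrelated k-mers. Containment is the quantity that answers "is this genome in my sequencing run?".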

MashMap: approximate long-read mapping using minimizers and MinHash

May 22, 2017

Chirag Jain recently presented a paper at RECOMB’17 titled “A fast approximate algorithm for mapping long reads to large reference databases” (preprint | proceedings). This paper describes the algorithms behind MashMap, our new tool for approximate read mapping. Chirag joined the lab last year as a summer fellow, and I asked him to write a new read mapper. (How else does one learn bioinformatics?) He clearly rose to the challenge, and I think the paper contains some useful ideas for the looming “long-read” era. I wanted to summarize those ideas here for anyone who missed RECOMB.
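
For anyone unfamiliar with minimizers, here is a minimal sketch of the general windowed scheme: slide a window over w consecutive k-mers and keep the smallest k-mer in each window. It uses plain lexicographic order and breaks ties by leftmost position, both of which are simplifying assumptions; practical tools typically order k-mers by a hash, and nothing here is taken from MashMap's code.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs, one minimizer per window
    of w consecutive k-mers, with duplicates across windows removed."""
    kms = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kms) - w + 1):
        window = kms[start:start + w]
        # Smallest k-mer in the window; ties go to the leftmost position.
        offset = min(range(w), key=lambda j: (window[j], j))
        picked.add((start + offset, window[offset]))
    return sorted(picked)


print(minimizers("ACGTTGCA", k=3, w=3))
# [(0, 'ACG'), (1, 'CGT'), (2, 'GTT'), (5, 'GCA')]
```

Two sequences that share a long enough exact match will select the same minimizers inside it, so comparing the sampled k-mers is a cheap stand-in for comparing all of them.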
