Genome Informatics Section

Our section develops and applies computational methods for the analysis of massive genomics datasets, focusing on the challenges of genome sequencing and comparative genomics. We aim to improve such foundational processes and translate emerging genomic technologies into practice.


Going ape for T2T

November 27, 2023

Last year we released complete, gapless, “T2T” sex chromosomes for chimp, bonobo, gorilla, Sumatran orangutan, Bornean orangutan, and siamang gibbon. This December we are proud to announce our latest preprint “The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes”! Over the past year, we have also finished the autosomes for these genomes! The v2.0 assemblies for these species are now available from our T2T-primates project page, and all of the raw HiFi, ONT, Hi-C, and Illumina sequencing data can be found on GenomeArk. This has been a Herculean effort involving nearly everyone in the lab and a large swath of the T2T team. It turns out that finishing six genomes is a lot more work than finishing one! A huge thank you to everyone involved, especially Kateryna Makova for spearheading the project.

The Q100 project

October 31, 2023

Today, we are excited to release the v1.0 T2T assembly of the HG002 benchmark genome! This assembly is part of what we have dubbed the “Q100” project, or in other words, our quest to assemble a completely error-free human genome (in the Phred QV scale, Q100 equates to an error rate of 1 per 10 billion bases). The Genome in a Bottle consortium has released some tremendous resources over the years, including DNA reference materials such as HG002. However, these reference materials are currently defined as a list of variants called against the GRCh38 reference genome. A more natural representation is the complete sequence of the genome itself, i.e. a “genome benchmark” as opposed to a “variant benchmark”. This is our first step towards creating such a genome benchmark. We will have much more to say about this in the coming year, but for now you can find more information at the GitHub page linked above.
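For readers less familiar with the Phred scale, the conversion behind the Q100 figure can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the genome size of roughly 6 billion bases for a diploid human is a round-number assumption, not a measurement of the HG002 assembly.

```python
import math

def phred_to_error_rate(qv):
    """Per-base error probability for a Phred quality value: p = 10^(-QV/10)."""
    return 10 ** (-qv / 10)

def error_rate_to_phred(p):
    """Phred quality value for a per-base error probability: QV = -10 * log10(p)."""
    return -10 * math.log10(p)

# Assumption for illustration only: a diploid human genome is roughly 6 billion bases.
APPROX_DIPLOID_GENOME_BASES = 6e9

for qv in (30, 60, 100):
    p = phred_to_error_rate(qv)
    expected_errors = p * APPROX_DIPLOID_GENOME_BASES
    print(f"Q{qv}: 1 error per {1 / p:,.0f} bases, "
          f"~{expected_errors:g} expected errors per genome")
```

Under that assumed genome size, Q100 is the point at which the expected number of errors in the whole diploid assembly drops below one, which is why it serves as shorthand for "error-free".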

The reference genome of Macropodus opercularis (the paradise fish)
Scientific Data, May 25, 2024
Fodor E, Okendo J, Szabó N, Szabó K, Czimer D, Tarján-Rácz A, Szeverényi I, Low BW, Liew JH, Koren S, Rhie A, Orbán L, Miklósi Á, Varga M, Burgess SM

Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph
Nature Methods, May 10, 2024
Cheng H, Asri M, Lucas J, Koren S, Li H


Canu: A single molecule sequence assembler for genomes large and small


Mash: Fast genome and metagenome distance and containment estimation using MinHash


Krona: Interactively explore metagenomes and more from a web browser