
Through the Human Genome Project, which began in 1990 and concluded in 2003, the scientific community gained a reference genome representing approximately 92% of our genetic blueprint. While the insights gained from this 13-year endeavor have been invaluable, the remaining 8% of the human genome, comparable in size to an entire chromosome, has left unanswered questions about as-yet-undiscovered genetic variations and their potential impact on human health. In the two decades since the publication of the first human reference genome, sequencing technologies have become much faster, less expensive and more accurate, allowing researchers to fill in the gaps and deepen our understanding of human genomic variation. Now, the Telomere-to-Telomere (T2T) consortium, led in part by researchers at the National Human Genome Research Institute (NHGRI), the University of California, Santa Cruz and the University of Washington, Seattle, has generated the first complete, gapless sequence of a human genome, providing a comprehensive view of the roughly 3 billion bases in our DNA blueprint.
To complete the sequence, the T2T researchers sequenced CHM13, a uniformly homozygous human cell line (the resulting assembly is named T2T-CHM13), and drew on recent advances in sequencing technology, namely more accurate long-read methods. Long- and ultralong-read methods made it possible to fully map some of the most challenging, repetitive stretches of the genome, such as the regions near the telomeres and centromeres of each chromosome. Specifically, the researchers used the Oxford Nanopore DNA sequencing method, which can read up to 1 million DNA bases in a single read with modest accuracy, and the PacBio HiFi DNA sequencing method, which can read about 20,000 bases with nearly perfect accuracy, according to the NHGRI.
The results of the T2T consortium’s work have been published in a package of six papers in the journal Science, including the article “The complete sequence of the human genome” by Nurk et al., which describes the methods used for sequencing. Using the now-complete sequence as a reference, the researchers have discovered more than 2 million additional variants in the human genome and obtained more accurate information about genomic variants within 622 medically relevant genes. Several other research groups are also using a pre-release version of the complete human genome in their own research, according to the NHGRI. The completed genome opens up new possibilities for the study of human disease and could ultimately lead to improved clinical diagnostics and patient care.
“In the future, when someone has their genome sequenced, we will be able to identify all of the variants in their DNA and use that information to better guide their healthcare,” said consortium co-chair Adam Phillippy, whose research group at NHGRI led the finishing effort. “Truly finishing the human genome sequence was like putting on a new pair of glasses. Now that we can clearly see everything, we are one step closer to understanding what it all means.”
The sequencing effort also included researchers from Johns Hopkins University, the University of Connecticut, the University of California, Davis, the Howard Hughes Medical Institute and the National Institute of Standards and Technology. The six papers reporting this accomplishment were published in Science on March 31, 2022, along with companion papers in several other journals.