Twenty years ago, the Human Genome Project made history by releasing the first sequences of the human genome. This milestone came soon after the biotech firm Celera Genomics used data from the Human Genome Project to achieve large reductions in the cost of sequencing.
However, technological limitations meant the researchers couldn’t fill all the gaps in their final sequence. Reads of DNA from regions that held long sections of repeated base pairs posed major challenges. So, when the researchers finished the first draft of the sequence, approximately 15% of it was still missing.
Since then, researchers have dedicated considerable time and resources to filling these gaps. But even the most recent genome, which researchers completed in 2013 and patched in 2019, lacks 8% of the full sequence because of challenges associated with sequencing heterochromatin and other complex sections of DNA. That said, geneticists from the Telomere-to-Telomere (T2T) Consortium have used the latest technologies to complete more of the human genome sequence, leaving only the Y chromosome unsequenced.
The life sciences journal BioTechniques publishes updates in next-generation sequencing, including developments in the human genome sequence. Here, BioTechniques explains why the T2T Consortium used long-read technologies, how it sequenced the human genome, and how its future work may fill the final gaps in the genome.
What Is the T2T Consortium?
The T2T Consortium is an international association of approximately 30 institutions. Karen Miga from the University of California, Santa Cruz (CA, USA), Evan E Eichler from the University of Washington School of Medicine (WA, USA), and Adam Phillippy from the National Human Genome Research Institute (MD, USA) set up the Consortium to conduct research into “unmappable” centromere regions.
In May 2021, the Consortium published a preprint titled “The complete sequence of a human genome”, which explains how its researchers sequenced the remaining parts of the human genome using long-read sequencing technologies. The researchers added 115 protein-coding genes and nearly 200 million DNA base pairs to the human genome sequence, marking a 4.5% increase in the number of base pairs and a 0.4% increase in the number of protein-coding genes, which has now reached 19,969.
What Are the Benefits of Long-Read Techniques Over Short-Read Techniques?
To complete its latest draft of the human genome sequence (T2T-CHM13), the T2T Consortium employed some of the latest next-generation sequencing technologies from Pacific Biosciences (CA, USA) and Oxford Nanopore (Oxford, UK). These long-read sequencing technologies allowed the researchers to sequence stretches of DNA that contained as many as 20,000 base pairs in a single read.
Long-Read Sequencing Techniques
Although long-read sequencing techniques were, at their outset, prone to errors, Pacific Biosciences’ latest long-read sequencing technology enables scientists to identify minute variations in long stretches of repeated sequences, making long, repetitive chromosome segments tractable. Meanwhile, Oxford Nanopore’s platform captures several modifications to DNA that modulate gene expression. This enabled the T2T researchers to map genome-wide “epigenetic tags”.
Short-Read Sequencing Techniques
By contrast, the short-read sequencing techniques previously employed only allow researchers to sequence a few hundred base pairs at a time. After sequencing, researchers must then reassemble the reads, much like puzzle pieces. Although the reads produced by short-read technologies are accurate, they aren’t long enough to unambiguously map highly repetitive genomic sequences, such as the centromeres that coordinate the partitioning of newly replicated DNA during cell division and the telomeres that cap chromosome ends. The longer reads produced by long-read sequencing techniques are easier to piece together because they are more likely to contain sequences that overlap.
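The ambiguity that repeats create for short reads can be illustrated with a small sketch. This is a toy example only: the sequences are invented rather than real human DNA, the function name find_placements is hypothetical, and exact string matching stands in for the far more sophisticated alignment and assembly software that real projects rely on. It is not the T2T Consortium's method.

```python
def find_placements(genome: str, read: str) -> list[int]:
    """Return every start position where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one base so overlapping matches are also counted.
        start = genome.find(read, start + 1)
    return positions


# Invented toy sequences (not real human DNA): two unique flanks around
# a tandem repeat made of ten copies of a 7-base unit.
LEFT_FLANK = "ACGTTGCAAGCTTAGGCTAACGGATCCATG"
RIGHT_FLANK = "TTCAGGCATCGCTAACTGCCGTATAGCTAC"
REPEAT_UNIT = "GATTACA"
TOY_GENOME = LEFT_FLANK + REPEAT_UNIT * 10 + RIGHT_FLANK

# A "short" read that lies entirely inside the repeat array.
SHORT_READ = REPEAT_UNIT * 2
# A "long" read that spans the whole array plus 10 bases of each flank.
LONG_READ = TOY_GENOME[20:110]

if __name__ == "__main__":
    print(len(find_placements(TOY_GENOME, SHORT_READ)))  # 9 candidate positions
    print(find_placements(TOY_GENOME, LONG_READ))        # [20], a single placement
```

In this toy genome, the short read fits equally well at nine positions inside the repeat, so its true origin cannot be determined. The long read reaches into the unique sequence on both sides of the repeat and therefore has only one possible placement.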
How Did the T2T Consortium Sequence the Human Genome?
Instead of taking DNA from a living human, the T2T Consortium used a cell line from a complete hydatidiform mole. (A hydatidiform mole is a type of tissue that forms in humans when a sperm fertilizes an egg that doesn’t have a nucleus.) The cell line contained two identical sets of chromosomes, both derived from the same sperm. The benefit of this was that the researchers didn’t have to distinguish between chromosomes from two parents. However, because the sperm cell carried only an X chromosome, the new sequence didn’t cover the Y chromosome, which triggers male biological development.
On top of this, the T2T Consortium estimates that approximately 0.3% of the sequence could contain errors because of challenges relating to passaged cell lines and difficulties performing quality checks on some problematic areas of the genome.
What Does the Future Hold for the Human Genome Sequence?
The T2T Consortium is now attempting to sequence the Y chromosome using the same method applied in its latest draft. The researchers also plan to sequence a genome that contains chromosomes from two parents, and they have teamed up with the Human Pangenome Reference Consortium to sequence over 300 genomes from around the globe over the next three years. They will use T2T-CHM13 as a reference to understand which parts of the genome usually differ among individuals.
As new next-generation sequencing tools, technologies, and resources continue to emerge, ongoing research into the human genome sequence should fill in the remaining gaps and help us identify links between newly sequenced areas and human diseases. As a result, human genome sequencing could become a mainstream practice in the coming years.
Sharing Methodologies in Life Sciences and Medicine
First printed in 1983, BioTechniques was the first publication to review lab methodologies instead of treatments. Today, the open-access, peer-reviewed publication has developed a global reputation as a leading life sciences journal. Scientists from an array of disciplines, like physics, chemistry, climate science, plant and agricultural science, and computer science, use the journal to understand the reproducibility of new methods that will contribute to the future of science and medicine.