The complete sequence of a human genome, covering the autosomes and X chromosomes of a cell line derived from female tissue, has been finished. It includes the 8% of the genome that was missing from the original draft released in 2001.
The Telomere-to-Telomere (T2T) Consortium has released the complete sequence of a human genome, all 3.055 billion base pairs of it. This represents the largest improvement to the human reference genome since its initial release in 2001 by Celera Genomics and the International Human Genome Sequencing Consortium. That original sequence covered most of the euchromatic regions but either left out the heterochromatic regions or represented them erroneously. These regions make up the 8% of the human genome that has now been resolved. The new T2T-CHM13 reference [1] includes the complete sequence of all 22 autosomes plus Chromosome X. It also corrects numerous errors and adds approximately 200 million bp of novel sequence containing 2,226 gene copies, 115 of which are predicted to be protein coding.
The current reference genome, GRCh38.p13, is the product of two major updates to the 2001 draft sequence, one in 2013 and the other in 2019. However, it still contains 151 million base pairs of unknown sequence distributed throughout the genome, including pericentromeric and subtelomeric regions, duplications, and gene and ribosomal DNA (rDNA) arrays, all of which are necessary for fundamental cellular processes. The new reference is named T2T-CHM13 because it was produced by the T2T Consortium from DNA of the CHM13 (complete hydatidiform mole) cell line. A hydatidiform mole arises from an abnormally fertilized egg and grows as an overgrowth of placental tissue, so that the woman appears to be pregnant (a false pregnancy). As a result, the sequence represents only the autosomes and X chromosomes. Several sequencing technologies were used, including PacBio, Oxford Nanopore, and Illumina sequencing at 100x and 70x coverage, to name a few. These advances in sequencing technology made it possible to resolve the remaining 8% mentioned above.
The main limitation of the T2T-CHM13 sequence is the lack of a Y chromosome. Sequencing of the Y chromosome is currently underway using DNA from the HG002 cell line, which has a 46,XY karyotype (23 chromosome pairs, including one X and one Y). That sequence will then be assembled using the same methods developed for the homozygous CHM13 genome.
The availability of T2T-CHM13 as a new reference genome is a major breakthrough that will help researchers understand the role of heterochromatic regions and their effects on cellular processes in greater detail. Until the Y chromosome sequencing is completed, T2T-CHM13 will serve as the reference genome for future studies of cellular processes and functions.
- Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, et al. The complete sequence of a human genome. bioRxiv 2021.05.26.445798. DOI: https://doi.org/10.1101/2021.05.26.445798