University of California, Davis

New Human Reference Genome Opens Unexplored Regions

Assistant professor Megan Dennis (center) with graduate student Colin Shew (left) and Daniela Soto outside the UC Davis Genome Center. Dennis’ lab, with Professor Charles Langley at the College of Biological Sciences, took part in an NIH-led consortium that has completed sequencing of the human genome. (Karin Higgins/UC Davis)

A complete sequence of the human genome has finally been published by an international consortium of scientists. The new reference genome fills in gaps left by earlier drafts, which will help researchers better understand genetic variation and how it can sometimes lead to disease.

The work is described in a series of papers published April 1 in Science by the Telomere-to-Telomere (T2T) Consortium. Several University of California, Davis, investigators contributed to the studies. They include Megan Dennis, assistant professor of biochemistry and molecular medicine at the UC Davis Genome Center, School of Medicine and MIND Institute, and integrative genetics and genomics graduate students Daniela Soto and Colin Shew. Also contributing were Charles Langley, distinguished professor of evolution and ecology at the UC Davis College of Biological Sciences, and his daughter Sasha Langley, a project scientist at UC Berkeley.

The original human genome sequence, published in 2001, left out about 8% of the DNA, Dennis said. The missing regions included nearly identical duplications that contain functional genes, as well as centromeres and telomeres, which sit in the middle and at the tips of chromosomes, respectively. These regions contain long runs of repeated sequence.

“These are important regions but difficult to sequence,” Dennis said.

Sequencing a genome is rather like slicing a book into snippets of text and then trying to reconstruct the book by piecing them back together. Stretches of text that contain many common or repeated words and phrases are harder to put back in their correct place than more distinctive passages.
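To make the book analogy concrete, here is a small Python sketch. The helper `placements` and the sample text are hypothetical, made up for illustration, and the sketch only counts where a snippet could fit. That is a deliberate simplification, not how real genome assemblers work. A snippet drawn from a repeated phrase matches several positions, while a distinctive snippet matches exactly one.

```python
def placements(text: str, snippet: str) -> list[int]:
    """Return every start index where snippet occurs in text (overlaps allowed)."""
    hits = []
    start = text.find(snippet)
    while start != -1:
        hits.append(start)
        # Resume one character later so overlapping matches are also found.
        start = text.find(snippet, start + 1)
    return hits


# Hypothetical toy "book" with a repetitive stretch in the middle.
book = "call me ishmael. " + "the sea, the sea, the sea, " + "said the captain."

print(placements(book, "the sea, "))  # repeated phrase → [17, 26, 35]
print(placements(book, "ishmael"))    # distinctive phrase → [8]
```

A snippet with three candidate positions cannot be placed from its own content alone, which is the difficulty repeated regions pose for short pieces of sequence.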

Earlier DNA sequencing technology could only read relatively short runs of sequence.

“A major leap in technology has been long-read sequencing,” Dennis said. Newer sequencers can decode much longer pieces, up to a million base pairs, or “letters,” of DNA. Because the pieces are much larger, they are much easier to assemble back into the original sequence.
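The same idea suggests why longer reads help. In the hypothetical toy genome below, a short repeated motif (CAG, five copies) sits between unique flanking sequences. A short read taken from inside the repeat fits at several positions, while a longer read that spans the whole repeat and reaches into both flanks fits at only one. The genome, the read coordinates and the `placements` helper are all made up for illustration; real long reads are vastly longer than this, and real assemblers are far more sophisticated, so treat this only as a sketch of the principle.

```python
def placements(genome: str, read: str) -> list[int]:
    """Return every start index where read occurs in genome (overlaps allowed)."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Hypothetical toy genome: unique flanks around a tandem repeat (CAG x 5).
genome = "TTGCA" + "CAG" * 5 + "GTACT"

short_read = genome[8:14]  # 6 letters, entirely inside the repeat
long_read = genome[2:23]   # 21 letters, spans the repeat into both flanks

print(len(short_read), placements(genome, short_read))  # → 6 [5, 8, 11, 14]
print(len(long_read), placements(genome, long_read))    # → 21 [2]
```

Both reads come from the same region; only the longer one carries enough unique flanking sequence to be placed unambiguously.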

“It’s a game changer,” Dennis said.

UC Davis researchers contributed to the project by carrying out some of the long-read sequencing with machines at the Genome Center, and by analyzing variants and duplicated sequences.

The new reference genome comes from a single human sample, although not exactly from a person. The DNA came from a cell line derived from a bundle of cells called a hydatidiform mole. These form when an egg that has lost its own genome is fertilized by a sperm. Because the sperm's chromosomes are duplicated, the resulting cell ends up with two identical copies of each chromosome, unlike most human cells, which carry two slightly different copies. Despite its odd origin, nothing suggests that the cell line's genome is out of the ordinary, Dennis said.

The sperm came from a person of European descent. The original human reference genome, by contrast, was stitched together from the DNA of several people, which introduced some errors and artifacts.

Exploring the centromere

About 90% of the new sequence actually comes from the centromeres of chromosomes, Langley said. Structurally distinct a