What is the human pangenome and why do we need it?
A sequence for the human genome was first published in 2001, but this original reference doesn’t reflect the full genetic diversity of humanity – a shortcoming that a new “pangenome” attempts to address
By Michael Le Page
10 May 2023
The pangenome aims to better reflect the diversity of human populations
Darryl Leja, NHGRI
An effort to expand on the Human Genome Project by capturing the diversity of people around the world has produced the first draft of a new resource called the “pangenome reference”.
What is a pangenome?
It is a set of genomes from many individuals put together to show where the sequences are identical or different. The draft human pangenome consists of 47 genomes, and the plan is to expand this to 350 genomes by 2024.
Why do we need it?
The pangenome will help researchers discover what effects genetic variants have and develop treatments for conditions linked to those variants. At present, some variants are essentially invisible to researchers because of the reliance on a single reference genome.
Hold on, what is a reference genome?
It is a kind of map. When researchers sequence someone’s DNA, they get lots of pieces that they put together based on where they fit on the reference genome. It is a bit like assembling a skeleton by looking in an anatomy textbook to see where each bone fits. For the vast majority of bones, that works fine, but some people have extra bones such as cervical ribs that aren’t in the textbook. “Currently, when we map a sequence from a patient, there’s always a fraction of the sequence, sometimes a significant fraction, that can’t be mapped,” says Evan Eichler at the University of Washington in Seattle.
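To make the mapping idea concrete, here is a minimal sketch in Python. It is a toy, not how real alignment software works: the reference and reads are made-up strings, the function names `map_reads` and `unmapped_fraction` are hypothetical, and a read counts as "mapped" only if it matches the reference exactly. Real tools tolerate sequencing errors and small differences.

```python
def map_reads(reference, reads):
    """Place each read at its first exact match on the reference.

    Returns a list of (read, position) pairs; position is None when the
    read does not occur anywhere in the reference.
    """
    placements = []
    for read in reads:
        pos = reference.find(read)  # str.find returns -1 when there is no match
        placements.append((read, pos if pos != -1 else None))
    return placements


def unmapped_fraction(placements):
    """Fraction of reads that could not be placed on the reference."""
    unmapped = sum(1 for _, pos in placements if pos is None)
    return unmapped / len(placements)


# Made-up data for illustration only.
reference = "ACGTTGCAAGGCTTACGGAT"
reads = ["ACGTTG", "AGGCTT", "TTTTTT", "ACGGAT"]

placements = map_reads(reference, reads)
for read, pos in placements:
    print(read, "->", pos)
print("unmapped fraction:", unmapped_fraction(placements))
```

In this toy case, the read "TTTTTT" plays the role of the extra bone that isn't in the textbook: it comes from the person being sequenced, but the single reference offers nowhere to put it.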
Whose DNA was the reference genome based on?
The reference genome was supposed to be made from a mix of DNA from 20 anonymous donors, but in the end, 73 per cent of it came from one individual. Later analyses have shown that this person was African American, and that the next biggest donor, at around 6 per cent, was mainly of east Asian ancestry.
We have already sequenced millions of genomes. Why haven’t we got a pangenome already?
The many genomes we have sequenced are far from complete – in fact, the single reference genome was only 92 per cent complete when the Human Genome Project was declared “complete”. Only short pieces of DNA could be sequenced at the time, and because much of the genome is highly repetitive, many of these small pieces couldn’t be reassembled. The pangenome project has used methods that read much longer pieces of DNA at a time, producing what are known as long “reads”. As a result, the pangenome is based on extremely high-quality sequences that are 99 per cent complete.
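Why repetitive DNA defeats short reads can be shown with another toy sketch in Python. The sequences and the function name `count_placements` are invented for illustration, and exact string matching stands in for the far more sophisticated methods real assemblers use. The point is only that a short piece taken from inside a repeat fits in several places, while a longer piece that reaches the unique sequence on either side fits in just one.

```python
def count_placements(genome, read):
    """Count every position (overlaps included) where the read matches exactly."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)  # keep searching after this hit
    return count


# Made-up genome: unique sequence flanking five copies of the repeat "CAG".
genome = "GATTACA" + "CAG" * 5 + "TTGACC"

short_read = "CAGCAG"                    # lies entirely inside the repeat
long_read = "ACA" + "CAG" * 5 + "TTG"    # spans the repeat plus unique flanks

print("short read fits in", count_placements(genome, short_read), "places")
print("long read fits in", count_placements(genome, long_read), "place")
```

Here the short read fits in four places, so there is no way to tell where it belongs, whereas the long read has a single possible position.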