Researchers have created the largest-ever family tree – a genealogical network of human genetic diversity – that is a major step towards mapping the genetic relationships between all humans.
Published in Science, the study reveals in unprecedented detail how individuals across the world are related to each other. It predicts common ancestors, approximately when and where they lived, and even key events in human evolutionary history such as the migration out of Africa.
“We have basically built a huge family tree, a genealogy for all of humanity that models as exactly as we can the history that generated all the genetic variation we find in humans today,” says author Dr Yan Wong, an evolutionary geneticist at the University of Oxford’s Big Data Institute, UK.
“This genealogy allows us to see how every person’s genetic sequence relates to every other, along all the points of the genome.”
The past two decades have seen incredible advancements in human genetic research. We have generated massive amounts of data by sequencing thousands of genomes from prehistoric and modern humans. But combining these data from many different databases, and developing algorithms to handle such enormous volumes, has been a major challenge.
Now, scientists have devised a new method to do this, using “tree sequences” to accommodate potentially millions of genome sequences.
Because each region of the human genome is inherited from only one of our parents, the ancestry of each genetic region can be traced back in time – a bit like a family tree – to the ancestor in whom the particular genetic variation first appeared. The set of these trees along the genome is a tree sequence.
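To make the idea concrete, here is a minimal sketch of how a tree sequence might be represented, assuming a toy genome of length 10 with a single recombination breakpoint. The `Edge` class, the `local_tree` and `lineage` helpers, and all the node numbers are hypothetical illustrations, not the data format or algorithm used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    left: int    # start of the genomic interval (inclusive)
    right: int   # end of the genomic interval (exclusive)
    parent: int  # ID of the ancestral node
    child: int   # ID of the descendant node


# Hypothetical toy data: samples 0, 1, 2; inferred ancestors 3, 4, 5;
# a genome of length 10 with one recombination breakpoint at position 5.
EDGES = [
    Edge(0, 10, parent=3, child=1),  # stored once, used by both local trees
    Edge(0, 5, parent=3, child=0),
    Edge(0, 5, parent=4, child=3),
    Edge(0, 5, parent=4, child=2),
    Edge(5, 10, parent=3, child=2),
    Edge(5, 10, parent=5, child=3),
    Edge(5, 10, parent=5, child=0),
]


def local_tree(edges, position):
    """Return a child -> parent map for the tree covering `position`."""
    return {e.child: e.parent for e in edges if e.left <= position < e.right}


def lineage(edges, sample, position):
    """List the ancestors of `sample` at `position`, nearest first."""
    parents = local_tree(edges, position)
    path = []
    node = sample
    while node in parents:
        node = parents[node]
        path.append(node)
    return path


print(lineage(EDGES, 0, 2))  # ancestry of sample 0 left of the breakpoint
print(lineage(EDGES, 0, 7))  # ancestry of the same sample right of it
```

The same individual has a different line of ancestors on either side of the breakpoint, which is why a single family tree is not enough. An edge that spans both regions is recorded only once, which hints at why this kind of structure can be compact.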
“Essentially, we are reconstructing the genomes of our ancestors and using them to form a vast network of relationships,” explains lead author Dr Anthony Wilder Wohns, from the Broad Institute of Massachusetts Institute of Technology and Harvard University, U.S. “We can then estimate when and where these ancestors lived.”
The researchers integrated data on modern and ancient human genomes from eight different databases, including a total of 3609 individual genome sequences from 215 populations. Using computer algorithms, they were then able to predict where common ancestors must be present in the evolutionary trees of these individuals to explain the patterns of genetic variation seen.
The resulting network contained almost 27 million ancestors and, after adding location data on the sample genomes, it could even be used to estimate where the predicted common ancestors had lived.
The team plans to make this genealogical map even more comprehensive by incorporating more genetic data as it becomes available. This is feasible because tree sequences store data in a highly efficient way, leaving room for potentially millions of additional genome sequences.
“This study is laying the groundwork for the next generation of DNA sequencing,” says Wong. “As the quality of genome sequences from modern and ancient DNA samples improves, the trees will become even more accurate and we will eventually be able to generate a single, unified map that explains the descent of all the human genetic variation we see today.”
The researchers suggest that the underlying method used could also have applications in medical research, such as the identification of genetic predictors of disease risk.
“While humans are the focus of this study, the method is valid for most living things, from orangutans to bacteria,” says Wohns. “It could be particularly beneficial in medical genetics, in separating out true associations between genetic regions and diseases from spurious connections arising from our shared ancestral history.”