How can science get genetic ancestry right?

Who’s afraid of genetic ancestry?

The concept of ancestry is used across population genetics, archaeology, and genetic genealogy. With the push for more clinical genomics and personalised medicine, it’s also increasingly popular in health research.

In a recent policy forum in the prestigious journal Science, an interdisciplinary team of scholars from fields including medicine, genetics, sociology and bioethics has questioned the use of genetic ancestry categories in scientific research.

But what exactly is genetic ancestry, and why does it matter how we use it in science?

What is genetic ancestry?

‘Genetic ancestry’ is not synonymous with plain old ‘ancestry’. Your ancestry, or genealogical ancestry, refers to your ancestors – the people you are descended from. That’s your parents, grandparents, great-grandparents and so on, stretching back through history.

However, your genome may contain only a small amount of information from some of these ancestors. Each parent passes on only 50% of their DNA to a child, and those bits of parental DNA are shuffled and combined in different ways in each sperm and egg cell.
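The arithmetic behind that halving can be sketched in a few lines of Python. This is only an illustration of the averages involved: the function names are invented for this example, and it assumes that no ancestor appears in more than one place in the family tree.

```python
def expected_share(generations_back: int) -> float:
    """Average fraction of a person's DNA expected from ONE ancestor
    who lived `generations_back` generations ago (1 = parent)."""
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    # Each generation halves the expected contribution.
    return 0.5 ** generations_back


def pedigree_slots(generations_back: int) -> int:
    """Number of ancestor positions in the family tree at that generation.

    Assumption: no ancestor fills more than one position.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    return 2 ** generations_back


if __name__ == "__main__":
    for g in (1, 2, 3, 5, 10):
        print(
            f"{g:>2} generations back: {pedigree_slots(g):>5} ancestors, "
            f"about {expected_share(g):.4%} of your DNA from each on average"
        )
```

These figures are averages. Because DNA is shuffled at every generation, the real share from any one ancestor varies around them, and a sufficiently distant ancestor may have contributed no DNA at all.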

So, to know an individual’s genetic ancestry, you need to look at the genome itself and trace how the DNA segments have been passed on through time. This can be described in a representation called an ancestral recombination graph – something that looks a bit like a family tree, but specifically tracks the inheritance of DNA.

So far, so good. The main issue comes from how scientists and others choose to categorise and label this ancestry.

“We define it [genetic ancestry] as the ways in which each of us has inherited these different sections of our genome through our family tree,” says Anna Lewis, a research associate at the Edmond J. Safra Center for Ethics at Harvard University and lead author of the Science policy forum. “You might notice that that definition doesn’t refer to any sort of groupings.”

However, scientific studies frequently do group people into categories, and one system is particularly popular. A 2017 analysis of articles in the journal Nature Genetics reported a marked growth since the 1990s in the use of continental ancestry labels, such as ‘African’ and ‘European’ ancestry.

These labels may not be based on direct knowledge of an individual’s genetics, but simply on their presumed or self-identified ancestry. For example, someone who is white may be placed in the category of ‘European ancestry’ without anyone knowing whether every ancestor who contributed meaningfully to that individual’s DNA actually came from Europe.

The long shadow of race science

To grasp what’s at stake here, it’s important to understand a little of the dark history of race and science – a huge topic in itself – and how it echoes in genetics and medicine today.

In 1735, the Swedish taxonomist Carl Linnaeus – father of the modern scientific naming system – divided humanity into four coloured “races”: ‘white’ Europeans, ‘red’ Americans, ‘yellow’ Asians, and ‘black’ Africans. Western science has struggled to transcend similar categorisations ever since. This is despite genetic data clearly showing that human “races”, as commonly understood, do not correspond to clearly distinguishable genetic groups. Two people can belong to the same “race” and have very different DNA and ancestries.

Nevertheless, racial categories can still permeate scientific and medical practice. For example, a 2020 article in the New England Journal of Medicine highlights how many algorithms used by healthcare professionals for clinical decision-making explicitly factor in the patient’s race – often in a way that produces better medical care for white patients.

Example of Race Correction in Clinical Medicine excerpt. Credit: Darshali A. Vyas, M.D., Leo G. Eisenstein, M.D., and David S. Jones, M.D., Ph.D. / Hidden in Plain Sight — Reconsidering the Use of Race Correction in Clinical Algorithms.

Race needs to be understood, experts say, as a social and political construct, not as biologically intrinsic. Both Lewis and Aylwyn Scally, a population geneticist at the University of Cambridge, point out that in the early 20th-century United States and Europe, people from southern European countries such as Italy and Greece were often excluded from the category of whiteness, conceptualised as a different race from northern Europeans.

“Who gets defined as what is very context-dependent, and very often reflects what categories are going to suit whoever’s in power,” Lewis summarises.

“Race is an appropriate proxy for racism, and that’s an important thing to track,” she continues. “So in various studies, it’s very appropriate to collect race as a variable … but race is not a good proxy for anything biological.”

But we do need some way to talk about human genetic variation, even though race isn’t it. Loïc Yengo, a statistical geneticist at the University of Queensland, works on developing models to predict disease risk based on DNA. Currently, he says, scientists’ and clinicians’ ability to make these predictions accurately is hampered because the majority of available genetic data comes from people of European ancestry.

Because those data represent only a subset of the genetic diversity of our whole species, predictions of disease may be less accurate for people who don’t share the DNA variants that are common in the available data.

Genetic ancestry to the rescue?

Against this backdrop, genetic ancestry seems like a more scientific, less problematic, and all-round more attractive way to think about human variation.

But is genetic ancestry living up to this promise? In their recent paper, Lewis and her colleagues argue that it’s not.

For one thing, continental ancestry groupings don’t accurately capture the breadth of human genetic diversity. People, and their genomes, don’t fall neatly into continental ‘types’. Instead, we all exist on a spectrum of being more closely or distantly related, genetically speaking.

“This is consistent with what we know of human demographic history, in which mass migration and constant mixing across groups have been the norm,” Lewis and colleagues write. “We all have multiple ancestries depending on the time horizon considered.”

Scally agrees: “We know there’s been gene flow and migration and movements between parts of the world and between groups of people constantly throughout our history, as far as we can tell … And genetic ancestry, as a result, reflects that complexity.”

In addition, Lewis and her co-authors warn that uncritical use of continental ancestry categories risks continuing the same simplistic and harmful racial categories that the field is trying to avoid. It’s hard to see how continental ancestry categories help matters if we simply replace “white” with “European ancestry”, “black” with “African ancestry”, “Asian” with “East Asian ancestry” and so forth.

“The danger is that we just switch out one set of words for another, and we make no progress and we end up with exactly the same problems that we’ve been trying to solve,” says Lewis.

“Continued uncritical conflation of ‘race’ and genetic ancestry shapes how people interpret research results and then make decisions about funding allocation, research priorities, and patient care based on that faulty interpretation,” adds Aliya Saperstein, a sociologist at Stanford University.

“If the existence of racism and other ‘environmental factors’ are not properly accounted for in research – genetic and otherwise – then the research results are, at best, not likely to alleviate racial inequality, and at worst are likely to exacerbate it.”

How can science get genetic ancestry right?

The Science piece calls for researchers to implement “a more complex notion of ancestry” – one that more closely reflects the reality of our nuanced genetic history and relatedness, and that turns away from simplistic labels. But what would that practically look like?

It depends on what the research is, Lewis says, but she gives an example of geneticists trying to find links between genetic variants and diseases. These geneticists may worry that having people with different continental genetic ancestries in their study will obscure the associations between genes and disease, making their results less informative.

“Very often, they end up just focusing on one ancestral group, or dividing people into these large groups and then doing the analysis,” she says. “We think that a lot of those practices are just entirely avoidable.”

Figure illustrating the concept of the “ancestral recombination graph” (ARG). (A) a pedigree for one individual (represented by a black circle) and their direct ancestors; (B) the pathways in the pedigree through which the DNA at a single position in that individual’s genome has been inherited; (C) ARG for one individual, which represents only pathways of DNA inheritance rather than every ancestor; (D) merged ARGs for multiple individuals. Credit: Mathieson and Scally (2020), What is ancestry? PLoS Genetics 16(3). © 2020 Mathieson, Scally, reproduced under a Creative Commons Attribution License.

Instead, she says, scientists should turn to analytical techniques and tools that can treat genetic ancestry in a continuous rather than a categorical way.
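As a rough illustration of what “continuous” can mean here, the Python sketch below scores every pair of individuals by how many alleles they share (a simple identity-by-state measure) instead of first sorting them into groups. It is not a method named in the Science piece: the measure is swapped in for illustration, the function names are invented, and the genotypes are made-up toy values, not real data.

```python
from itertools import combinations


def ibs_similarity(a, b):
    """Identity-by-state similarity between two genotype vectors.

    Each entry is an allele count (0, 1 or 2) at one variant.
    Returns 1.0 for identical genotypes and 0.0 when the two people
    are opposite homozygotes at every variant.
    """
    if len(a) != len(b) or not a:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    # At each variant, two people share between 0 and 2 alleles by state.
    shared = sum(2 - abs(x - y) for x, y in zip(a, b))
    return shared / (2 * len(a))


def pairwise_similarity(genotypes):
    """Map each pair of sample names to their IBS similarity."""
    return {
        (p, q): ibs_similarity(genotypes[p], genotypes[q])
        for p, q in combinations(sorted(genotypes), 2)
    }


# Hypothetical toy genotypes, invented for this example.
toy_genotypes = {
    "sample_a": [0, 1, 2, 1, 0, 2],
    "sample_b": [0, 1, 2, 0, 0, 2],
    "sample_c": [2, 1, 0, 1, 2, 0],
}

if __name__ == "__main__":
    for pair, score in pairwise_similarity(toy_genotypes).items():
        print(pair, round(score, 3))
```

The output is a set of graded similarity scores, one per pair, with no group label attached to anyone. Real analyses use far more variants and more sophisticated statistics, but the underlying idea of working with degrees of relatedness instead of boxes is the same.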

Both Scally and Yengo generally agree that maintaining more complexity is preferable – not just from an ethical perspective, but scientifically too.

As a statistical geneticist trained in mathematics, Yengo tells me that he’s accustomed to thinking in terms of continuous measures – but many analytical tools in genetics don’t work this way by default.

“If we can have software that doesn’t require us to put people under labels in the first place, I think that will be a good place to start,” he says.

Since genetic ancestry is so complex and difficult to accurately classify, why should we bother trying to sort people into categories at all?

“I think a lot of the time, we should try to avoid it,” Scally says. “There definitely will be better ways of keeping categories out for as long as possible and working with these complex structures for longer. That will require larger, better tools, larger computing resources and things like that, but that’s totally feasible.”

He’s sceptical that categories can be abandoned entirely, though.

“It’s going to get to a point, still, where [scientists] want to try and present their results and talk about what they found,” he says. It can be hard to do that without applying some sort of label.

But abandoning all ancestry categories isn’t really what Lewis and her co-authors are calling for. They want scientists to avoid using categories where possible, and to think more deeply and more critically about what those categories describe – a genetic or environmental difference, or both.

Hope for the future?

Saperstein points out that the new Science forum isn’t the first time similar issues have been raised.

“Unfortunately, even if most people agree with the argument in theory, it doesn’t seem to get put into practice by the majority of genetics and biomedical researchers,” she says.

“I would like to believe the tide is finally turning and genetics researchers are finally questioning what have become the taken-for-granted methods in their field. But this is not the first article to make this argument, and I suspect it will not be the last.”

Lewis is more optimistic, citing her experience steering an interdisciplinary group that included geneticists, bioethicists, sociologists and anthropologists to consensus in the new paper.

“I think we had a surprisingly easy time, actually, on aligning on this stuff,” she reflects. “I think many more people are thinking about these issues than they were even, say, three years ago.

“The field does have some ways to move beyond them already, but people, I think, don’t quite get what the ethical imperative is to make those changes … we need to do more work, and we need to change a lot of hearts and minds.”

In the meantime, the take-home message for all of us is to keep in mind the complexity that genetic ancestry categories simplify, and to remember to ask ourselves what they truly represent.
