Genetic knowledge gets a serious boost

May 28, 2020

Ian Connellan

It seems a long, long time since the Human Genome Project was declared complete: in fact, it’s just over 17 years, and the work that’s come after it hasn’t always attracted the same attention.

Sometimes it’s seemed like the more we know about human genetics, the more we understand how little we know.

But the release of the first major studies of human genetic variation by the international Genome Aggregation Database consortium – the gnomAD Consortium – is a really big deal. Really.

For many years, a number of issues have stymied researchers’ ability to understand what the differences between individuals’ genetic codes mean for their development and health.

As Deanna Church notes in the journal Nature – in a commentary accompanying the release of seven research papers in Nature, Nature Communications, and Nature Medicine – one issue is that unravelling genetic variation requires analysing huge numbers of sequences, because humans carry many rare variants but only a few cause genetic diseases.

Another issue is that most understanding of genetic variation has come from the study of single nucleotide variants (SNVs), but structural variants, more than 50 nucleotides long, can have a larger impact on physiological traits – and are major contributors to disease. We also lack an understanding of variation outside protein-coding sequences.

For the past eight years, the gnomAD Consortium and its predecessor, the Exome Aggregation Consortium, or ExAC, have been trying to address these gaps in knowledge.

Working with geneticists around the world, they have compiled and studied more than 125,000 exomes and 15,000 whole genomes. More than 100 scientists and groups internationally have provided data and analytical effort.

Their analyses reveal new details on rare types of genetic variation and provide better tools for genetic disease diagnosis and drug development.

Computational biologist Konrad Karczewski, lead author on the collection’s flagship paper in Nature, says each of the new studies “represents someone bringing a new angle to the dataset, saying, ‘I have an idea on how we can put all of this to work’ and creating a new resource for the genetics community. It was amazing to see it unfold.”

The project’s scientific lead is Daniel MacArthur, who works at the Garvan Institute of Medical Research in Sydney and Murdoch Children’s Research Institute in Melbourne, Australia.

He and colleagues built ExAC, then gnomAD, to expand on the work of other projects, particularly the 1000 Genomes Project, the first large-scale international effort to catalogue human genetic variation.

ExAC’s first collection of whole exome data was released in 2014. It then started gathering whole genome data, evolving into the gnomAD Consortium and releasing gnomAD v1.0 in February 2017.

Subsequent gnomAD releases focussed on increasing the numbers of exomes and genomes, the volume of variants highlighted in the data, and the diversity of the dataset.

The new papers are based on the gnomAD v2.1.1 dataset, which includes genomes and exomes from more than 25,000 people of East and South Asian descent, nearly 18,000 of Latino descent, and 12,000 of African or African American descent.

Two of the seven papers show how large genomic datasets can help researchers learn more about rare or understudied types of genetic variants.

In the flagship study, the authors identified more than 443,000 loss-of-function (LoF) variants in the gnomAD dataset, dramatically exceeding all previous catalogues.

By comparing the number of these rare variants in each gene with the predictions of a new model of the human genome’s mutation rate, the authors were also able to classify all protein-coding genes according to how tolerant they are to disruptive mutations – that is, how likely genes are to cause significant disease when disrupted by genetic changes.

This new classification scheme pinpoints genes that are more likely to be involved in severe diseases such as intellectual disability.
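The observed-versus-predicted comparison described above can be illustrated with a short Python sketch. It is a simplification, not the consortium's method: the published analysis is more involved, and the gene names, counts, and the helper names `GeneCounts`, `oe_ratio` and `rank_by_constraint` are all invented here for illustration. The idea is only that a gene with far fewer LoF variants than a mutation-rate model predicts is probably intolerant of disruption.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    name: str            # hypothetical gene label
    observed_lof: int    # LoF variants actually seen in the dataset
    expected_lof: float  # LoF variants predicted by a mutation-rate model


def oe_ratio(gene: GeneCounts) -> float:
    """Observed/expected ratio: values near 0 suggest low tolerance to LoF."""
    if gene.expected_lof <= 0:
        raise ValueError("expected_lof must be positive")
    return gene.observed_lof / gene.expected_lof


def rank_by_constraint(genes: list[GeneCounts]) -> list[GeneCounts]:
    """Sort genes from least tolerant (lowest ratio) to most tolerant."""
    return sorted(genes, key=oe_ratio)


# Made-up numbers, purely for illustration.
genes = [
    GeneCounts("GENE_A", observed_lof=2, expected_lof=40.0),
    GeneCounts("GENE_B", observed_lof=30, expected_lof=32.0),
    GeneCounts("GENE_C", observed_lof=10, expected_lof=25.0),
]

ranked = rank_by_constraint(genes)
for gene in ranked:
    print(f"{gene.name}: observed/expected = {oe_ratio(gene):.2f}")
```

In this toy example GENE_A, with two observed variants against 40 expected, ranks as the least tolerant, which is the kind of gene the classification would flag as a candidate for involvement in severe disease.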

[Image: people spiral. Credit: Hiroshi Watanabe / Getty Images]

“The gnomAD catalogue gives us our best look so far at the spectrum of genes’ sensitivity to variation and provides a resource to support gene discovery in common and rare disease,” Karczewski explains.

Other researchers used gnomAD to explore structural variants. This class of genomic variation includes duplications, deletions, inversions, and other changes involving larger DNA segments (generally greater than 50–100 bases long).

Their study presents gnomAD-SV, a catalogue of more than 433,000 structural variants identified within the gnomAD genomes. The variants in gnomAD-SV represent most of the major known classes of structural variation and collectively form the largest map of structural variation to date.

“Structural variants are notoriously challenging to identify within whole genome data, and have not previously been surveyed at this scale,” says study author Michael Talkowski, a US-based geneticist.

“But they alter more individual bases in the genome than any other form of variation and are well established drivers of human evolution and disease.”

Among surprising results, the study found that at least 25% of all rare LoF variants in the average individual genome are actually structural variants, and that many people carry what should be deleterious or harmful structural alterations, but without the phenotypes or clinical outcomes that would be expected.

Researchers also learned that many genes were just as sensitive to duplication as to deletion – that, from an evolutionary perspective, gaining one or more copies of a gene can be just as undesirable as losing one.

“We learned a great deal by building this catalogue in gnomAD, but we’ve clearly only scratched the surface of understanding the influence of genome structure on biology and disease,” says Talkowski.

Other papers reveal how gnomAD’s deep catalogues of different types of genetic variation and the cellular context in which variants arise can help clinical geneticists more accurately determine whether a given variant might be protective, neutral, or harmful in patients.

Researchers found that tissue-based differences in how segments of a given gene are expressed can change the downstream effects of variants within those segments on biology and disease risk. The team combined data from gnomAD and the Genotype Tissue Expression (GTEx) project to develop a method that uses these differences to assess the clinical significance of variants.

Another study surveyed multinucleotide variants – ones consisting of two or more nearby base pair changes that are inherited together. This was the first attempt to systematically catalogue these variants, which can have complex effects, and to examine their distribution throughout the genome and predict their effects on gene structure and function.

Of potentially most public interest, two gnomAD studies describe how diverse, population-scale genetic data can help researchers assess and pick the best drug targets.

One study group used LoF variants to evaluate the consequences of reducing the expression of a gene called LRRK2, which is associated with risk of Parkinson’s disease. They used these data to predict that drugs that reduce LRRK2 protein levels, or partially block the gene’s activity, are unlikely to have severe side effects.

“We’ve catalogued large amounts of gene-disrupting variation in gnomAD,” says MacArthur. “And with these two studies we’ve shown how you can then leverage those variants to illuminate and assess potential drug targets.”

Public sharing of data is a core principle of the gnomAD project: the data behind these seven papers were publicly released in 2016 via the gnomAD browser, without usage or publication restrictions.

Papers

The mutational constraint spectrum quantified from variation in 141,456 humans

Corresponding author: Konrad Karczewski

Evaluating drug targets through human loss-of-function genetic variation

Corresponding author: Eric Minikel

A structural variation reference for medical and population genetics

Corresponding author: Michael Talkowski

Transcript expression-aware annotation improves rare variant interpretation

Corresponding author: Beryl Cummings

The effect of LRRK2 loss-of-function variants in humans

Corresponding author: Nicola Whiffin

Characterising the loss-of-function impact of 5’ untranslated region variants in 15,708 individuals