Identifying viruses needs a “slaughter of a thousand oxen”

Sifting through the estimated 10³¹ viral particles on Earth seems like an almost impossible task, but it’s one that could lead to enormous breakthroughs.

Viruses can cause disease and pandemics, and they affect the health of terrestrial and marine environments. Most of them remain unknown or poorly understood.

Now, researchers in Australia and the US have built new software that streamlines metagenomic testing (the genetic analysis of all the DNA and RNA in environmental samples) so that they can more accurately identify the viruses present.

The open-source bioinformatics platform, Hecatomb, described in a new paper in the journal GigaScience, is already being used to help identify viruses affecting people with cystic fibrosis in a clinical setting in South Australia.

“How do we identify [viruses] from DNA sequences or RNA sequences? How do we categorise them and classify them?” co-author Robert Edwards, director of bioinformatics and human microbiology for the Flinders Accelerator for Microbiome Exploration (FAME) at Flinders University in Australia, told Cosmos.

“We’ve been really interested in this problem for a long time.”

A bacteriophage, a virus that infects the bacterium Acinetobacter baumannii. Infections caused by A. baumannii, an opportunistic human pathogen that affects people with compromised immune systems and is often linked to hospital-acquired infections, could one day be treated with phage therapy. Credit: Flinders University

Metagenomic testing involves amplifying and sequencing the genetic material in an environmental sample, such as stool, soil or water. Separating viral genomes from all the other genetic material in the sample can be incredibly difficult.

“If we take, for example, a faecal sample from people, most of the DNA you get out of that is either bacterial or maybe human DNA,” says Edwards.

Filtering the human DNA out of the data can get complicated, because at least 8% of the human genome is made up of the remnants of genetic material from ancient retroviruses.
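For readers curious about what removing host DNA can look like in code, here is a minimal Python sketch. It is not Hecatomb’s implementation: the function names `kmers` and `remove_host_reads`, the k-mer size and the 50% threshold are hypothetical, and real pipelines usually align reads against a whole host genome with a dedicated aligner. The idea is simply to discard any read whose short subsequences (k-mers) mostly appear in the host reference.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def remove_host_reads(reads, host_reference, k=21, max_shared=0.5):
    """Hypothetical helper: drop reads whose k-mers largely match the host reference.

    Simplification: only the forward strand is checked; real tools also
    consider reverse complements and tolerate sequencing errors.
    """
    host_kmers = kmers(host_reference, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            # Read is shorter than k and cannot be assessed; keep it for later steps.
            kept.append(read)
            continue
        # Fraction of this read's k-mers that also occur in the host reference.
        shared = len(read_kmers & host_kmers) / len(read_kmers)
        if shared <= max_shared:
            kept.append(read)
    return kept


# Toy usage with a tiny k so the example stays readable.
host = "ACGTACGTACGTAAGGCCTT"
print(remove_host_reads(["ACGTACGTAC", "TTTTGGGGCCCCAAAA"], host, k=4))
```

This also shows the catch described above: if a genuinely viral read resembles an ancient retroviral remnant in the host reference, a filter like this would discard it along with the human DNA.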

Robert Edwards. Credit: Flinders University

Viral sequences also often resemble sequences found elsewhere across the tree of life, which can lead to many false-positive identifications.
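One general way to guard against this kind of false positive is to compare each read’s best viral match with its best match to non-viral references, and keep the viral call only if it wins. The Python sketch below illustrates that idea; it is not a description of Hecatomb’s internals, and the `Hit` record, its fields and the scores are hypothetical stand-ins for the output of a sequence search tool.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    read_id: str
    target: str      # name of the matched reference sequence
    is_viral: bool   # whether the reference comes from a viral genome
    score: float     # alignment score; higher means a better match


def confirm_viral_reads(hits):
    """Hypothetical helper: map read IDs to a viral target only if that is the read's best hit.

    A read that matches a virus but matches a bacterial or human sequence
    even better is treated as a likely false positive and dropped.
    """
    best = {}
    for hit in hits:
        current = best.get(hit.read_id)
        # Ties keep the first hit seen for that read.
        if current is None or hit.score > current.score:
            best[hit.read_id] = hit
    return {read_id: hit.target for read_id, hit in best.items() if hit.is_viral}


hits = [
    Hit("r1", "phage_X", True, 80.0),
    Hit("r1", "E_coli_chromosome", False, 95.0),  # better non-viral match
    Hit("r2", "virus_Y", True, 70.0),
    Hit("r2", "human_chr1", False, 40.0),
]
print(confirm_viral_reads(hits))
```

Here read r1 is dropped because its bacterial match is stronger, while r2 is kept as a viral read.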

“So, we built this software, this program, to separate out the different types of samples, to kind of clean the DNA [and RNA] when we get it back from the sequencer,” says Edwards.

“We need to do a lot of quality control to get rid of the bad stuff.”
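As a rough illustration of this kind of clean-up, the Python sketch below filters sequencing reads on length, ambiguous bases (“N”) and mean base quality, assuming the quality strings use the common Phred+33 FASTQ encoding. The function names and thresholds are hypothetical choices for the example, not Hecatomb’s settings, and real pipelines typically also trim adapters and low-quality read ends.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def quality_filter(records, min_mean_q=20.0, min_length=50, max_n_fraction=0.1):
    """Hypothetical helper: keep (read_id, sequence, quality) records that pass simple QC checks."""
    passed = []
    for read_id, seq, qual in records:
        if len(seq) < min_length:
            continue  # too short to classify reliably
        if seq.upper().count("N") > max_n_fraction * len(seq):
            continue  # too many bases the sequencer could not call
        if mean_phred(qual) < min_mean_q:
            continue  # too many likely base-calling errors
        passed.append((read_id, seq, qual))
    return passed


records = [
    ("good", "A" * 60, "I" * 60),           # 'I' encodes Phred 40
    ("short", "A" * 10, "I" * 10),
    ("low_quality", "A" * 60, "#" * 60),    # '#' encodes Phred 2
    ("many_Ns", "N" * 10 + "A" * 50, "I" * 60),
]
print([read_id for read_id, _, _ in quality_filter(records)])
```

Only the read named “good” survives; the others fail the length, quality and ambiguous-base checks respectively.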

Edwards collaborated with US researcher Professor Scott Handley and his group at Washington University in St. Louis to develop Hecatomb. The software was named for the Greek word meaning “a slaughter of a thousand oxen” after it helped a Greek collaborator narrow down the list of potential viruses infecting a patient from hundreds to just one or two.

“We had really narrowed down his initial analysis, where he had seen all these results that were very spurious and quite misleading … The sequence was bad quality, or he was using the wrong databases, or he was just sort of misled by some of the other approaches that people use,” says Edwards.

“Especially when you’re thinking about it for a clinical setting, or advising a doctor, you don’t want to tell the doctor that this patient may have 100 viruses when, in fact, it’s very unlikely,” says Edwards.

In effect, Hecatomb rules out, or ‘sacrifices’, thousands of genomic sequences to find the virus in the sequence haystack.

In the new paper, Edwards and collaborators show that Hecatomb can be used to assess the viruses present in stool samples of rhesus macaques infected with simian immunodeficiency virus (SIV) and in seawater and coral mucus from a coral reef system in Bermuda.

A bacteriophage that infects the bacterium Stenotrophomonas maltophilia, which can be multidrug-resistant and difficult to treat. Credit: Flinders University

“Subsequently, we used it on cystic fibrosis patients,” says Edwards.

“I work with people at the Women’s and Children’s Hospital and the Royal Adelaide Hospital [in South Australia], and we’ve sequenced a lot of stool samples from people with cystic fibrosis.

“And using this pipeline has allowed us to identify … the viruses that may be affecting the patient. So we can really quickly focus in and report back to the clinician and say: ‘We think this person has this virus, you might want to do a follow-up test’.”

Hecatomb can also identify the viruses that infect bacteria, known as bacteriophages, which can serve as markers of the bacteria present in a person or environment.

“The other problem is that those viruses, those phages that infect bacteria, very often carry very nasty genes. So lots of diseases like cholera or diphtheria or E. coli … the bad things in those bacteria are actually provided by phages,” says Edwards.

“It’s quite important that we understand what those viruses are doing.”

The open-source Hecatomb software is freely available online for researchers to download, use and improve upon.

