Police in the US opened a can of worms – as well as making headlines – when they used data from a genealogy website to help capture the notorious “Golden State Killer” earlier this year.
Now researchers from Stanford University have upped the ante by revealing they have found a way to link ancestry databases with those of law-enforcement agencies, which may make such searches far easier, and far more common, in the future.
Technically it’s an impressive achievement, as these databases use completely different genetic markers.
However, that is unlikely to appease those who have expressed concerns about whether data uploaded in good faith to help with tracking family histories should be used, possibly without consent, in criminal investigations.
In April, a former policeman was arrested and charged with a series of rapes and murders committed in California in the 1970s and ’80s, after police uploaded DNA from a crime scene to a genealogy website and got a match with some of his relatives.
What makes that seemingly simple process complicated, according to Stanford biology professor Noah Rosenberg, is that genealogy websites are a relatively new thing.
“There’s a legacy problem in that so many DNA profiles have been collected with this older genetic marker system that’s been used by law enforcement since the 1990s,” he explains.
“The system is not designed for the more challenging queries that are currently of interest, such as identifying people represented in a DNA mixture or identifying relatives of the contributor of a DNA sample.
“In this study, we were trying to pose the question of whether a newer, more modern system of genetic markers could be tested against the old system and still get matches and find relatives.”
The answer, according to a paper published in the journal Cell, is yes.
Rosenberg and colleagues developed a computational method for linking individuals in two quite different databases. In a trial with 872 people, when one member of a pair had been analysed with one type of marker and the other member with the other type, 30 to 32% of parent-offspring pairs and 35 to 36% of sibling pairs could be linked.
The database used by the FBI and other law-enforcement agencies, known as the Combined DNA Index System (CODIS), relies on short tandem repeat (STR) markers, a type of copy-number variation, in noncoding regions of the DNA. By contrast, ancestry databases record single-nucleotide polymorphisms (SNPs), single-letter differences in the DNA sequence, at hundreds of thousands of sites in the genome.
Rosenberg says the study was intended to provide data for discussing the many issues surrounding forensic genetics and genomic privacy.
“We wanted to examine to what extent these different types of databases can communicate with each other,” he says.
“It’s important for the public to be aware that information between these two types of genetic data can be connected, often in unexpected ways.”
In the paper, the researchers note other policy-relevant issues, including the fact that some population groups are overrepresented in law-enforcement databases. Expanding the use of database searches could change the calculation of who becomes accessible to investigators through the profiles in those databases.
The findings may also have less controversial applications elsewhere. For example, ecologists studying organisms in the field could use this approach to determine whether animals living in a particular geographic site are descended from animals whose DNA had been collected on a previous sampling trip, even if only STR data is available from the older samples.