Open-source intelligence (OSINT): helping us or hurting us?

Open-source intelligence: as smart as it seems?

Every phone is an undercover agent. Every security system is a spy. Webcams are real-time reconnaissance feeds. And even fire-sensing satellites are collaborators.

Welcome to the world of open-source intelligence (OSINT).

Its secret headquarters is your study. Its resources and assets are freely available. And its undercover operatives are scattered through the World Wide Web. It’s the democratisation and commercialisation of military-grade intelligence.

It can’t end well, can it?

Students. IT troubleshooters. Academics. Over the past decade, they’ve been mocked as amateurs at best or propagandists at worst. Now they’re being accused of knowing too much. And the full implications of extracting so much information in real time are yet to be appreciated.

In recent weeks, Ukraine’s security service has highlighted how citizen-supplied information has led to military strikes. Other groups are sifting through images, searching for evidence of potential war crimes.

But the process often also involves working out where a picture was taken. And that could lead to unintended consequences.

“The collection and analysis of public data have exploded in recent years,” says Dr Zac Rogers, a research fellow in digital technology, security and governance.

“This has already moved far beyond backyard operatives and into the realm of venture capitalism. This is a trillion-dollar marketplace. It’s a huge problem.”

When information is power

“It wasn’t even that people didn’t know how to do online open-source investigation,” says Eliot Higgins, founder of the open-source investigative agency Bellingcat. “It was they hadn’t even heard of it to know it was a thing they didn’t know about. The last decade has been a long process of educating everyone about it and getting them to take it seriously.”

OSINT, by definition, relies on information openly available to everyone. Google Maps and Street View; the ESA’s Sentinel radar and visual earth observation satellites; the NOAA global fire monitoring system. Even web-based video streams and holiday picture snaps. All can be used to verify the integrity of social media videos of war, crime and disaster.

Higgins and his team contributed to the prosecution of several Russian agents behind the shooting down of Malaysia Airlines flight MH17 in 2014, which killed all 298 people on board.

“Back then, the community was just a handful of people from places like Storyful, Amnesty International, Human Rights Watch, and me and my blog. No-one outside of that group had a clue what open-source investigation or things like geolocation was.”

Now tutorials, resource packages and advice abound. Higgins’ views are widely sought, even by those at the serious and respected end of the media universe.

“What we’re seeing now is the network growing and developing further,” says Higgins. “So next time there’s a conflict, there’s more chance that open-source evidence of war crimes and atrocities that emerge from the conflict will count towards the public’s awareness of the conflict and accountability.”

But, according to Rogers, verifying the truth of combat footage and finding secret Chinese detention facilities is just one side of OSINT’s double-edged sword.

OSINT is also used as the basis of a wide variety of assessments and predictions. These are then sold to global corporations and governments.

“We’re actually critiquing the various tools now being offered, anything from sentiment analysis to event aggregation data,” he says. “Essentially, it’s all open-source intelligence taken into a machine learning system.”

When knowledge is power

OSINT has become big business. In this market, it’s not about identifying war crimes or assessing the impact of a natural disaster. It’s about predicting outcomes, and selling those predictions.

Everything from Google Street View to Facebook and Twitter is being “scraped” to infer the character of individuals. Analyst agencies then use their secret sauce – usually an artificial intelligence algorithm – to turn it into a marketable prophecy.

They have eager markets. They’re offering consumer insights to advertisers and manufacturers. They’re offering risk assessments to insurers and banks. They’re offering sentiment forecasting to politicians. They’re offering threat-warning analysis to law enforcement and national security agencies.

“We’re just going sort of whoa, whoa, whoa, whoa!” Rogers says. “It’s one thing to train an algorithm to identify ships in a satellite photograph. That’s a simple image classification problem. It’s a whole new thing to say you can accurately predict a person’s behaviour by harvesting social media posts.”

Errors already happen all the time. The quest to eradicate pornography from Facebook, for example, suppressed important breast cancer discussion forums.

“What if such an unintended mistake – a mistake narrow-minded AIs make all the time, by the way – is buried amid the data a spy agency uses to target suspects?” Rogers asks. “Who is accountable? The prediction’s user? The prediction’s creator? The data supplier used to train the AI that made the prediction?

“We already know who will suffer: you and I.”

Miss out on a loan because of an outdated Google Street View photo? You’ll never know, says Rogers. “The bank doesn’t care if it’s true – so long as it’s profitable for them to act as if it is. And if a few people fall by the wayside – that’s just business!”

Can wisdom win?

“It’s the old ‘rubbish in, rubbish out’ problem,” says Rogers. “Can you really expect to get accurate, reliable insight into the state of society just by scraping social media? Are the psychological models assessing them based on truly representative information?”

Predictive algorithms usually rely on identifying correlation, Rogers says – not causation. In the academic world, that should already be a red flag.
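
A toy simulation shows why correlation alone deserves suspicion. In this minimal sketch, two invented signals (labelled "posts per day" and "missed payments" purely for illustration; the names and every number are hypothetical, not drawn from any real product) never influence each other. Both are driven by a hidden third factor, yet they correlate strongly until that factor is accounted for.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hidden factor that the scraper never sees (hypothetical).
hidden = [rng.gauss(0, 1) for _ in range(2000)]

# Two observed signals. Each depends only on the hidden factor plus noise;
# neither one has any effect on the other.
posts_per_day = [h + rng.gauss(0, 0.5) for h in hidden]
missed_payments = [h + rng.gauss(0, 0.5) for h in hidden]

# Strong correlation (about 0.8 in theory for these settings).
r_observed = pearson(posts_per_day, missed_payments)

# Subtract the hidden factor and the relationship disappears.
r_adjusted = pearson(
    [p - h for p, h in zip(posts_per_day, hidden)],
    [m - h for m, h in zip(missed_payments, hidden)],
)

print(f"observed correlation: {r_observed:.2f}")
print(f"after removing hidden factor: {r_adjusted:.2f}")
```

A model trained only on the two observed signals would happily use one to predict the other, even though changing one would do nothing to the other.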

And then there’s their reliance on Bayesian inference. It’s a statistical tool designed to make predictions from incomplete data.

“What Bayesian inference allows you to do is to infer what the missing data may be,” Rogers says. “So what we get is a predictive algorithm that derives knowledge for effect versus knowledge for truth. It’s built to fill in knowledge gaps – connect the dots – to reach a conclusion. Of course, chances are it’s going to find what it’s trained to look for.”
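
The risk of a system finding what it is trained to look for can be illustrated with Bayes' rule itself. The sketch below is only an illustration under assumed numbers, not a description of any vendor's system: the function name and all probabilities are hypothetical. The same piece of evidence leads to very different conclusions depending on the prior belief the model starts with.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence


# Hypothetical signal: appears in 90% of genuine cases,
# but also in 10% of cases where nothing is going on.
P_IF_TRUE = 0.9
P_IF_FALSE = 0.1

# Same evidence, two different starting assumptions.
sceptical = posterior(0.01, P_IF_TRUE, P_IF_FALSE)   # hypothesis assumed rare
suspicious = posterior(0.50, P_IF_TRUE, P_IF_FALSE)  # hypothesis assumed likely

print(f"sceptical prior  -> {sceptical:.1%}")
print(f"suspicious prior -> {suspicious:.1%}")
```

With these assumed inputs, the sceptical starting point gives a probability of roughly 8 per cent, while the suspicious one gives 90 per cent. The evidence is identical; only the assumption built into the model has changed.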

Knowledge is obtained or learned. Wisdom is developed. Machines don’t do wisdom. They just gather more and more knowledge – and find patterns among it all.

So is it unwise to let machines decide the future?

“In terms of knowledge for truth and our ability to get at the truth, that’s being lost as we desperately try to grapple with a glut of raw information,” says Rogers.

But users – be they insurers or national security agencies – tend to treat algorithmic prophecies as truth. So should they be willing to bet the bank, or the nation, on them? “Open-source prediction is spurious,” Rogers concludes. “It’s largely snake oil. But that snake oil is selling really well among businesses and governments at the moment because everyone wants to believe it works – or is worried they will miss out if it does.”
