Searching the Deep Web for science clues

The internet contains vast amounts of information that does not show up in Google searches. The so-called “Deep Web” contains data that is not indexed by search engines.

But all that information could soon become accessible to law enforcement agencies and scientists under Memex, a program being developed by the US Defense Advanced Research Projects Agency (DARPA).

Researchers at NASA’s Jet Propulsion Laboratory in Pasadena, California, have joined the program, hoping it will help catalogue the vast amounts of data NASA spacecraft deliver on a daily basis.

“We’re developing next-generation search technologies that understand people, places, things and the connections between them,” said Chris Mattmann, principal investigator for JPL’s work on Memex.

Memex examines not just standard text-based content online but also images, videos, pop-up ads, forms, scripts and other ways of storing information, to see how they are interrelated.

“We’re augmenting Web crawlers to behave like browsers – in other words, executing scripts and reading ads in ways that you would when you usually go online. This information is normally not catalogued by search engines,” Mattmann said.

Memex can even recognise what’s in videos and pair it with searches on the same subjects. The search tool could identify the same object across many frames of a video or even different videos.

All of the code written for Memex is open-source.
