Trove in trouble: why does it cost money to keep the resource online?

January 25, 2023

Ellen Phiddian

Cosmos science journalist

The online database Trove may go offline in the middle of the year without additional funding.

Trove, which is owned and operated by the National Library of Australia (NLA), is a free resource that provides access to billions of digital documents, images, media and records of physical documents. It also contains millions of digitised Australian newspaper pages and issues.

Trove receives around 22 million hits per year, and is widely used by both academic researchers and members of the public.

So what does it cost to run an archive like it?

“We work very hard to make Trove look simple to the end user,” says Dr Marie-Louise Ayres, Director-General of the NLA.

“But behind the scenes, it’s not simple. It’s creating, or collecting, and then preserving, and then providing meaningful access to a huge body of digital content.”

Trove has at least six billion records in its archive. These come from a combination of places: many are items or records owned by the NLA, but they also come from archives at 900 participating institutions around Australia.

These institutions include other libraries, museums, galleries, and archives – some volunteer-run, others funded through state or federal governments, or other sources.

Read more: Trove: how vital is it to Australian research?

“Some materials are held by the National Library on National Library servers, and other things are just metadata, and the original items or the full records are actually stored somewhere else,” says Professor Deb Verhoeven, Canada 150 Research Chair in Gender and Cultural Informatics at the University of Alberta, Canada, and a visiting professor at University of Technology Sydney.

Partner organisations also contribute to Trove financially – providing about 44% of the running costs – but the NLA expects this revenue to decline, as most of them are also facing financial pressure.

“Trove has been kind of pieced together from various legacy systems over more than a decade,” says Dr Mike Jones, an archivist and historian at the Australian National University.

“There are different elements that have gone into it, like the Pictures Australia database, the federated library catalogues from Australia, the Web Archive, the digitised newspapers, and in a lot of cases, these are distinct bits of technology that were kind of piled together.”

All of this together needs a tremendous amount of storage and energy to operate smoothly.

“That kind of processing power, to really provide meaningful access and reasonable performance across all of this digital content, is a major undertaking for any organisation – and certainly it is for the library,” says Ayres.

Beyond digital infrastructure, Trove needs staff to keep operating. Ayres says that roughly a third of the NLA’s 350 staff are employed for digital services of one form or another – not all are Trove-related, but many are.

“We need to have business analysts, some enterprise architects, developers, cybersecurity experts, as well as people in the libraries who know what Trove users want, how to get the most out of it, and to funnel the many, many business relationships that we have,” says Ayres.

Cybersecurity in particular is a rising cost, as high-profile hacks have become increasingly common.

According to the Sydney Morning Herald, the NLA requires $7-$10 million per year to keep Trove running in its current form. In its Trove Strategy, the NLA stresses that long-term secure funding, with a clear source, will best help it continue to make Trove’s technology work.

The Strategy also makes a case for significantly more funding than that, to improve Trove’s offerings. What might this entail?

“There’s a technology overhaul that needs to happen under the surface, to help make the platform more sustainable and navigable,” says Jones.

For instance, there are application programming interfaces (APIs): software that allows users and programs to communicate with a database.

“For a single data store, for example, there might be a single API that you can interrogate to get information,” says Jones. Trove has many, many more than one.

“They can build a kind of master API over the top of that, that helps to interrogate those systems. But that means that it’s very fragmented when you get not particularly far underneath the surface, and that creates limitations on what you can do,” says Jones.
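To picture what Jones is describing, here is a minimal Python sketch of a "master API" sitting over several separate systems. It is an illustration only, not Trove's actual code: the class and function names, the two toy collections and the sample titles are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    source: str  # which underlying collection the record came from
    title: str


# Each underlying system is modelled as a function: query string -> records.
Backend = Callable[[str], List[Record]]


def make_backend(source: str, titles: List[str]) -> Backend:
    """Build a toy in-memory backend (hypothetical stand-in for a legacy system)."""
    def search(query: str) -> List[Record]:
        q = query.lower()
        return [Record(source, t) for t in titles if q in t.lower()]
    return search


class MasterAPI:
    """One entry point that forwards a query to every registered backend."""

    def __init__(self, backends: Dict[str, Backend]) -> None:
        self.backends = backends

    def search(self, query: str) -> List[Record]:
        results: List[Record] = []
        # Ask each system in turn and concatenate whatever comes back.
        for backend in self.backends.values():
            results.extend(backend(query))
        return results


# Invented sample data, for illustration only.
api = MasterAPI({
    "newspapers": make_backend(
        "newspapers", ["Gold rush at Ballarat", "Federation celebrated"]
    ),
    "pictures": make_backend("pictures", ["Ballarat main street, 1890"]),
})

hits = api.search("ballarat")
for hit in hits:
    print(hit.source, "|", hit.title)
```

In this sketch the master layer can only reach systems that have been registered with it, and every search costs one call per underlying system. That is one way to see why a fragmented set of systems limits what the layer on top can do.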

One example is the Australian Web Archive, which stores Australian web pages that have been updated or have gone offline.

“We don’t have an API to the Australian Web Archive at the moment – we can’t support that additional load when we’re still running on premises,” says Ayres.

“You actually need to go into the web archives part of Trove and search that separately, we don’t have the processing power to run search over all of our databases at the same time. We certainly do have a long term vision of lifting all of that infrastructure up into a cloud environment so that we can take advantage of surge capacity to do big searches, and improve performance. But that’s not possible without very significant additional funding.”

As well as digital infrastructure, the library wants to improve the way its partner institutions add records and data to Trove.

This means training for the “content stewards” who add records (many of whom are volunteers) – “but it’s also about simplifying sets of software that enable that,” says Ayres.

“We’re really committed to increasing the amount of data that’s in Trove at the lowest possible cost.”

Trove is primarily a public resource, for members of the Australian public to get information. But a variety of academics also rely on it to do their research – both directly, and through other researcher-focussed databases.
