Chatbots can produce glaringly obvious – and more insidious – factual flaws. Is there a way to fix them?
When Dr Beatrice Kondo, a biologist at the University of Maryland, College Park, decided to try out ChatGPT – the web-based AI chatbot that’s been taking the world by storm – she began with a hyper-technical question related to her own scientific field.
What, she asked, is the best material for tissue engineering a heart valve replacement? The response was good, if unspectacular. “It answered credibly, although with no rigorous details,” she says.
Then she asked for a bio of herself. This time, the result was a shock. ChatGPT confidently told her she’d been born in Tokyo (she wasn’t), had gotten an undergraduate degree in biology from the University of Tokyo (nope, she studied German literature and language in America), then got her Ph.D. from Harvard University (no again, it was the University of Maryland).
That’s a lot of strikes, right off the bat. But it also told her she was an avid marathoner (not), and that she’d won the prestigious Lasker Award for advances in biology and medicine – an award nobody who could plausibly be confused with her has ever won. “What is going on?” she asks. “Does ChatGPT just make things up when it doesn’t have an answer?”
ChatGPT is the best known of a class of programs technically referred to as generative AI, meaning they aren’t just search engines: they can use whatever data is at their disposal to create new content, such as text or, in other cases, art.
In the writing field, at least 17 such programs are currently available, already good enough that a friend of mine finds them useful for generating tightly focused cover letters for job applications and for tailoring her resume to whatever job she might be applying for. The results, she says, are often better than what she would write on her own.
But there have always been concerns. When ChatGPT launched late last year, it drew attention in part for its ability to help high school and college students cheat on term papers. More recently there have been problems in the fiction world, where at least one major science fiction magazine has been forced to close itself to submissions due to a flood of mediocre stories from beginners hoping that AI might be their shortcut to stardom.
Kondo’s concerns are different, and not unique to her.
When I asked ChatGPT to generate a profile of me, I also got a weird mix of error and half-truth, including a slightly wrong birthdate, a claim I’d been in contention for writing awards for which I’d never been nominated, and the assertion that I was the literary agent for three bestselling authors I know socially but have never represented. (In fact, I’m not a literary agent for anyone.)
And when I tried it on a second friend, in a highly specialized field but with a low Internet profile, not a single line was right other than a general assertion that she was a nice person and role model to those who knew her.
The problem isn’t just the occasional goof. It’s a lot bigger than that, Gordon Crovitz, former publisher of The Wall Street Journal, said last month at the annual meeting of the American Association for the Advancement of Science (AAAS) in Washington, D.C.:
“I would consider ChatGPT in its current form the greatest potential spreader of misinformation in the history of the world,” he said.
“It has access to every example of misinformation on the Internet, which is endless, and is able to spread it in highly credible and perfect English.”
As a test, Crovitz presented ChatGPT with 100 questions regarding debunked conspiracy theories. “Eighty per cent of the time it was happy to repeat false narratives,” he says.
Not that this has to be the case, he says. When the chat-driven version of Microsoft’s Bing search engine was put to the same test, it not only presented a more balanced assessment, but gave citations, often including its sources’ reliability assessments according to NewsGuard, a website dedicated to rating the reliability of the vast majority of the English-language world’s news sites.
A lot, it appears, depends on the data fed to these algorithms and the guardrails programmed into them to keep them from spitting out false information from bad data. In the computer world, there’s a term for this: GIGO – garbage in, garbage out.
Part of what’s needed, Professor Holden Thorp, editor-in-chief of the Science family of journals, said at the same AAAS meeting, is to take a deep breath and slow down the “crazy rush” in which generative AI is being rolled out.
“It has a lot of potential, but we’re in the middle of a frenzy right now, and I don’t think the middle of this frenzy is a good time to make decisions,” he said. “It’s time for us to make sure we know what we are doing.”
Part of making such an assessment, added Professor Francesca Rossi, an AI researcher and ethicist at IBM, is for the AI community to gather together and write guidelines for how the new technology should operate.
“That is something the whole AI community can say as a society,” she said. “These are the guidelines and everybody should follow these guidelines.”
It’s also important to realize that generative AI is at the mercy of the accuracy of the information in its data set. That, Rossi argues, means there’s a difference between generative AI that answers open-ended questions, such as “Was the 2020 U.S. presidential election stolen by voter fraud?”, and generative AI that handles more tightly focused tasks, such as revising your resume or producing a report based solely on trusted information from within your own institution.
But there’s also an issue for those of us who love science. Stunningly, says Rossi – who in addition to her role at IBM also edits journals for the Institute of Electrical and Electronics Engineers (IEEE) – some journals are already seeing submissions in which generative AI programs such as ChatGPT are listed as co-authors.
Thorp notes that the time may come when the use of such programs will be accepted by the scientific community. But in the meantime, he says, the Science family of journals is being very cautious about it. “It’s in our instructions for authors that you’re not supposed to do this,” he says.
The reason for the caution, he says, stems from the early years of Photoshop. “People were using it to improve images, without fully disclosing what they were doing,” he says.
The result was a decade-long era of published images about which, today, “people aren’t sure about whether they were altered inappropriately or not. [We] don’t want to repeat that.”
Rossi’s IEEE journals are a little less restrictive. “[We’ve] decided to allow authors to use these tools, but require full disclosure of how they’ve been used,” she says.
To Crovitz, the ultimate question comes down to a single word: trust, something that is currently in decline in many areas.
“Trust is essential in so many institutions and authorities,” he says. “I think it’s really important to figure out how to use such services in a way that is transparent and fully disclosed in order to maintain trust and hopefully, to grow it.”
Originally published by Cosmos as The Generative-AI revolution: all it needs is trust
Richard A Lovett
Richard A Lovett is a Portland, Oregon-based science writer and science fiction author. He is a frequent contributor to Cosmos.