Something extraordinary appears to be happening, unfolding on a scale and at a speed that compares only to the sudden and ubiquitous arrival of the World Wide Web in the last weeks of 1993.
Before that moment, few had even heard of hypertext. After that moment, everyone used it, and most of the world’s data found a place within it. For people who have never lived in a pre-Web world, it’s hard to articulate the difference between that world – a world of scarce information resources that you visited in order to use – and this.
We are so far on the other side of that transition that we simply take for granted our smartphones have access to something approaching the entire corpus of human knowledge, and connections to most of the people on the planet. As spectacular as that is, we almost never think about it. That magic is just part of the fabric of our lives.
There’s more magic coming our way.
Earlier this year I wrote about DALL-E, the first of a new generation of ‘generative AI’ tools. DALL-E turns a bit of text – a ‘prompt’, in the lingo – into an image. Although I found DALL-E entirely amazing, it was something ‘over there’ – in the cloud – that had the potential to become a new tool for artists and visual creatives.
How wrong I was.
On the 22nd of August, 2022, a startup named Stability AI publicly released its own generative AI tool – ‘Stable Diffusion’. It’s similar in function to DALL-E (and DALL-E’s commercial competitor, Midjourney), but that’s pretty much where the similarities end. Rather than being something that runs ‘over there’ on a cloud service – for which you pay hefty monthly subscription or pay-by-use fees – Stable Diffusion is designed to run entirely self-contained on a PC. Not a particularly low-end PC, mind you – but one that you might find in a teenager’s bedroom, tricked out with support for high-resolution video gaming. Still, machines like that are fairly common: overnight, the number of computers capable of generative AI image generation went from a handful to a few tens of millions.
That would have been hugely significant on its own, but Stability AI accelerated the age of generative AI by releasing all of its work as open source – anyone can take the code and modify it for their own needs. A technique that had been locked up behind the walls of OpenAI – hoarded like a dragon guarding jewels – immediately became the foundation for hundreds, then thousands of ‘forks’: projects that used the code and data provided by Stability AI to power their own generative AI applications.
In retrospect we’ll likely see that as the ‘big bang’ moment in the field of artificial intelligence, when the whole field took a giant leap forward in both usage and ubiquity. In rapid-fire succession, Meta announced it had created a tool to generate video from text prompts, Canva and Microsoft both previewed integrations of generative AI tools within their design tool suites, and Google researchers showed off ‘DreamFusion’, a tool that uses the same techniques as DALL-E and Stable Diffusion to create three-dimensional objects from prompts.
The capstone event occurred at the start of November, when a developer of apps for iPhones and iPads introduced their own app that implemented Stable Diffusion on Apple’s smartphones and tablets. In less than nine weeks, generative AI had gone from the bleeding edge of technology to an app on my iPhone.
As the saying goes, “Quantity has a quality all its own.” We are now awash in images created by generative AI, and the code that enables AI experts to craft the tools creatives use is freely available. Where we are today is only the very first taste of something that is poised to mushroom into a completely new and singular environment of imagery – both still and moving images – generated on-demand and to-taste by a clever arrangement of prompts.
This moment resembles that moment 29 years ago, when the Web stood at a similar threshold. It was already available, already open source, and a few folks had been having their own ‘penny drop’ moments about the transformations to come. Knowing what we know today about what the Web got right – and what it got very, very wrong – we would be well served to have a think about how best to guide our actions (and our expectations) for generative AI, looking for a path that produces the maximum benefit for the least pain.
To this point, two significant pain points have been identified in generative AI, each echoing back to a similar issue that confronted the early Web: safety and copyright.
The safety issue boils down to a basic fact of human nature – we’re not all nice people, and even those of us who are nice are not always as nice as we could be. Given a potent technology for the translation of hate speech, sexual violence or other forms of degrading behaviour into visual form, it seems sensible to police the use of generative AI for the creation of such imagery, and to prevent its widespread dissemination.
However necessary, this is far easier said than done. Short of licensing all generative AIs – and ‘watermarking’ their outputs, so that any imagery can be traced to a specific generative AI and its user – it’s not immediately clear how this can be meaningfully policed.
Social media services are already drowning in human-generated material that is exploitative, abusive, hateful, and violent. Adding automation via generative AI will simply create a tsunami of material that could effectively overwhelm any attempt at human moderation. We will need to fundamentally rethink the processes of moderation, and we will likely need some sort of solution in place quickly – within six to twelve months – before this wave of humanity’s ugliness, amplified by generative AI, collides with our social networks.
Issues of copyright have been contentious since the invention of the printing press. The Web took those issues to an entirely new level, as it created a platform for the ‘liberation’ (more often, expropriation) of materials under copyright. Stable Diffusion, trained on a massive hoovering-up of more than 100 terabytes of images gleaned from the public internet, and encoded into a ‘checkpoint’ file of just a few gigabytes, reduces the last ten thousand years of human imagery to a set of ‘weights’. Inside this incredibly compressed rendering of human visual history are the catalogues of almost every artist whose works have been photographed and published online. That’s not just Michelangelo or Hokusai or Monet – the checkpoint includes the works of many fine and commercial artists working today, artists who have every expectation of being paid for their work. That the Stable Diffusion model can produce imagery ‘in the style of’ a working artist is a triumph of generative AI – and, at the same time, a deeply concerning development. Not because it’s wrong, but because the model does not recognise any copyright in its sources, and therefore cannot filter them from the images it generates.
The solutions here are both obvious and relatively easy to implement: Stability AI can produce an updated ‘checkpoint’ model that avoids the works of living artists – or artists whose works remain under copyright – except where their explicit permission is given. A ‘sanitised’ checkpoint file would then be the generative AI version of a public resource such as Wikipedia – open to all, free to all, and powered by the vast wealth of human images. Conversely, artists can proactively license their works for inclusion within generative AI tools. Anthony Breslin, a Melbourne-based artist, has done precisely this, pointing toward a generative AI future which isn’t simply extractive, but works in concert with artists, helping them to create tools that scale their creative output in ways never before possible.
It’s the scale that comes with automation that remains the most significant aspect of this generative AI revolution. Within a few years most of the imagery we see will have passed through a generative AI tool. In itself, that’s not much different from the fact that nearly all commercial images are Photoshopped in some way. But these generative AI images won’t be one-offs, sent to a website or magazine or billboard. They will be everywhere, created on the fly, fed by all the analytics systems that already keep us under continuous surveillance: tuning themselves to our needs, our moods, and our desires. That’s the world we’re heading into, and there’s no turning back from it.
Even so, we have enough time to think things through. We don’t want to find ourselves echoing that famous line from Rosencrantz and Guildenstern Are Dead: “There must have been a moment, at the beginning, where we could have said no. But somehow we missed it.”