We all harbour the desire to make our wishes come true. Children use that wish to explore the infinite capacities of their imaginations. As adults, we confront the limits of the possible, wishing it were otherwise.
Those limits, so fixed for so long, seem to be changing rapidly. We’ve used the latest breakthroughs in artificial intelligence to build a magic lamp. Now a genie has popped its head out to ask us how it can fulfil our heart’s desire.
One of my friends, whose taste in art and music centres on the “Prog-rock” era of the mid-1970s – when the aesthetic of bands like Yes and Pink Floyd established the look of the times – asked the genie to create some new artworks for his personal collection. Could the genie perhaps redesign the iconic cover of Pink Floyd’s Dark Side of the Moon as if designed not by Hipgnosis, but by Gashlycrumb Tinies illustrator Edward Gorey?
His wish was the genie’s command:
The genie abstracted to the Gothic core of Gorey’s technique, then applied it to the album’s iconic prism-and-rainbow. The whole thing works – satisfying both intellectual sensibility and aesthetic sense.
Alright, my friend said, let’s make this a bit more obscure. How about the same album cover, in stained glass…
Somehow the genie had also absorbed a range of styles of stained glass – medieval to Pre-Raphaelite to modern – and once again used that as an aesthetic template for the prism-and-rainbow. This genie knows a lot about art, aesthetics and culture.
This genie is going to change everything.
Perhaps once in a generation we cross a threshold in technological progress. Just as things appear to be stalled, we find ourselves taking a massive leap into a radically new and unfamiliar territory of capabilities. We saw it when the World Wide Web arrived; all those lonely, lonesome PCs finally had some purpose beyond word processing and spreadsheets. We saw it with the smartphone, which took computing from something fixed to a place to ubiquity. (Simultaneously turning us all into device addicts, but that’s a story for another time…)
This time, a genie – formally known as DALL-E – embodies the essence of this moment: a breakthrough into a new style of computing, heralding the actualisation of one of the oldest ideas in computer science, “intelligence augmentation”.
The foundations for DALL-E have been in development for several years. OpenAI – a consortium of researchers founded by a range of Silicon Valley luminaries (including Elon Musk) and dedicated to the creation of leading-edge artificial intelligence tools available to all – has been hard at work building increasingly sophisticated “language models”. These models take in billions of machine learning “parameters” – best thought of as relationships between multiple bits of data – and from these billions of parameters can do things such as write a basic news report or press release, without any human involvement beyond an instruction such as “write a press release about OpenAI’s latest work in artificial intelligence”.
Last year, OpenAI applied its latest-and-greatest language model – known as GPT-3 – to the practice of computer programming. The result – “GitHub Copilot” – is both useful and uncanny: having digested tens of millions of computer programs, it can suggest solutions to problems from nothing more than a bit of textual commentary added to the program’s code.
In much the same way that GitHub Copilot can make recommendations about the right piece of code to use in the right place within a computer program – because it “understands” the structure of a well-written program – DALL-E can place visual elements within a scene based on its own learned sense of what belongs where. It doesn’t always work perfectly, as we see in the images my friend generated for a reimagined cover of Elton John’s Goodbye Yellow Brick Road:
For all that some of these images look horribly wrong – on close examination, the image in the lower left-hand corner is positively horrifying – they still look “right enough”.
(It should be noted that none of these images were generated by DALL-E itself, which can only be accessed by a tightly limited number of testers; they came from the less powerful but still uncannily good DALL-E Mini. You can have a play with DALL-E Mini at craiyon.com.)
Does DALL-E spell the end of the visual artist? Will photographers and painters and sculptors be able to compete with a text box that can churn out an infinite series of engaging imagery and forms? Should they even try?
For the answer to that, look at what happened when I gave DALL-E Mini a go:
Grant Wood has nothing to fear from DALL-E – at least not in the short term.
This new wave of generative AI systems won’t rival artists; instead, artists will adopt these tools as new canvases, new brushes and new palettes. The advent of the lithographic printing press in the middle of the 19th century made design and art accessible to even the poorest. This new breakthrough means that the creation of designs – rather than their consumption – has become utterly democratised. Most people will never achieve the aesthetic refinements of the trained and talented artist, but we will all have access to creative tools that turn our wishes into visions that we can share.
In this we find ourselves circling back to the visionary Douglas Engelbart, who invented the mouse and hypertext and videoconferencing more than half a century ago as tools to unleash human potential, working to create machines that could “augment” our intelligence, and help us shoulder the burden of an increasingly complex world.
DALL-E is the first of the next generation of tools to help us work with and share our dreams, our visions, and the boundless depths of human imagination. We already inhabit a world where we see thousands of images every day – advertisements, memes, scenes, and much else besides – all of it carefully produced by people for consumption by people.
Where we’re headed, machines will be deeply embedded in that conversation, creating and consuming images, watching us respond to them, using those responses to generate new images. The moment of computing’s creative play has finally arrived. Everything, after this, looks different.