Do you see what I see?

July 16, 2021

Mark Pesce

Mark Pesce is a professional futurist and public speaker. He invented the technology for 3D on the Web.

In the middle of May, Google dropped a surprising new demo on us: “Project Starline” shows people communicating in what seem to be face-to-face conversations. Google then reveals its technological trickery: these folks are separated by vast distances, yet appear to be together in the same space, breathing the same air. Project Starline looks like a fully realised “holodeck” from Star Trek: The Next Generation – and not a moment too soon.

Credit: Google

During the pandemic, we’ve all become familiar with the feeling of “Zoom fatigue”.

Unlike a purely physical exhaustion, this feeling develops between our ears, brought on by hours of closely watching versions of our colleagues (or friends and family), who, when reduced to tiny patches of video, lose most of the body language we rely on to interpret the meanings behind their words.

Our social brains start working overtime, sifting through a paucity of clues for someone’s mood, hidden intentions, and others’ openness to engage in discussion or negotiation.

In person, these wholly human qualities telegraph themselves at a level mostly beneath our conscious perception. We only become aware of someone’s agitation or passivity when their behavior falls far beyond our expectations; most of our face-to-face experiences rest in a narrower range – exactly where videoconferencing fails us. We get some – but not all – of the person, and our minds do their best to fill in the gaps.

Sixty years ago, media futurist Marshall McLuhan described this as the difference between a “hot” and a “cool” medium.

A hot medium – face to face with anyone – provides a rich set of sensory perceptions, easily integrated with all of the other direct interactions we’ve had over our lifetimes, and tells us (subconsciously) to sit back and let it wash over us.

Project Starline being used in a Google office. Credit: Google

A cool medium provides minimal sensory detail, creating space for our minds to get deeply involved in sense-making. A cool medium – McLuhan pointed to cartoons, comics, and the low-resolution images offered up by analog television – invites participation. It’s that participation – hour upon endless hour of video meetings – that’s left us all fatigued. Zoom may be cool, but it’s not chill.

The hot technology Google uses in Project Starline to capture, transmit and re-create the presence of a person as if they’re really there has its roots in cutting-edge computer-graphics research from the mid-1990s.

Using sophisticated image-processing software, running on computers a thousand times slower than the ones we have in our smartphones today, researchers photographed an object from hundreds of different angles, carefully noting the position and orientation of the camera in each photograph. These photographs would then be analysed – in a process that could take anywhere from hours to days – generating three-dimensional objects from all the different points of view. This “photogrammetry” allows us to convert our world into data, and archaeology (to name but one discipline) has been revolutionised by highly accurate photogrammetric capture of sites and artifacts. Ditto product design and gaming.
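The geometric core of that analysis can be sketched in a few lines. This is a minimal illustration, assuming only two cameras whose positions and viewing directions toward the same point are already known; real photogrammetry pipelines combine hundreds of views and must first work out which pixels in each photo correspond to one another. The function name `triangulate` and the sample coordinates are hypothetical, chosen for illustration.

```python
def triangulate(cam1, dir1, cam2, dir2):
    """Estimate a 3D point from two camera rays.

    Each ray starts at a known camera position and points toward the
    object. Noisy rays rarely cross exactly, so this returns the midpoint
    of the shortest segment joining them.
    """
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    w = [p - q for p, q in zip(cam1, cam2)]
    a, b, c = dot(dir1, dir1), dot(dir1, dir2), dot(dir2, dir2)
    d, e = dot(dir1, w), dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be recovered")
    t = (b * e - c * d) / denom  # distance along ray 1 to its closest point
    s = (a * e - b * d) / denom  # distance along ray 2 to its closest point
    p1 = [p + t * u for p, u in zip(cam1, dir1)]
    p2 = [p + s * u for p, u in zip(cam2, dir2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]


# Two cameras one unit either side of the origin, both looking at a
# point five units in front of them (made-up numbers).
point = triangulate([-1, 0, 0], [1, 0, 5], [1, 0, 0], [-1, 0, 5])
print(point)
```

Repeating this for many matched points across many photographs is, very roughly, what turns a stack of images into a three-dimensional model.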

Photogrammetry has even found its way into our smartphones. Apps like Autodesk’s 123D (sadly discontinued) allowed anyone to point their smartphone at an object, snap a lot of photos, and create their own three-dimensional models from real-world items. And as smartphones have grown more capable – the latest iPhones include LiDAR, a laser-based depth-sensing technology – they can now create photogrammetry models of real objects in just a few seconds.

Photogrammetry captures a moment in time, giving you the 3D equivalent of a two-dimensional photograph of a person, but falling short of conversation – something that requires the fourth dimension of time. To add a timeline to photogrammetry – turning it into ‘videogrammetry’ – requires not just many photographs of the subject taken from many angles, but many photographs taken from many angles many times a second.

In mid-2016, at a startup in Los Angeles, I toured my first videogrammetry studio. On a large soundstage, within a framework that looked like a large yurt, I counted 40 high-definition video cameras, all pointed inwards toward the centre. Those cameras piped data into digital recorders and a bank of disk drives. During a capture, performers had their movements captured from 40 different angles, generating a terabyte of data per minute of capture.

Once captured, the data would be uploaded to Amazon’s vast cloud computing infrastructure, where it would be processed. Each frame would be examined from those 40 angles and the performance pulled into three dimensions from all those two-dimensional images; then the process would move on to the next frame.

At 50 or 60 frames per second, even a minute of capture adds up to several thousand frames, so it took at least 10 hours to convert every minute into three-dimensional videogrammetry – even when powered by a massive cloud computing resource, rented for hundreds of dollars an hour.
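To put those numbers side by side, here is a rough back-of-envelope sketch. The camera count, the terabyte per minute and the 10 hours of processing come from the studio visit described above; the decimal definition of a terabyte and the even split of data across cameras are assumptions, and because the 10 hours was a minimum, the derived figures are lower bounds.

```python
# Figures quoted in the article; everything else is derived from them.
CAMERAS = 40
TERABYTES_PER_MINUTE = 1.0           # total across all cameras
PROCESSING_HOURS_PER_MINUTE = 10.0   # "at least 10 hours" per captured minute


def capture_stats(fps):
    """Derive per-minute figures for a given frame rate."""
    frames = fps * 60                 # frames in one minute of capture
    images = frames * CAMERAS         # 2D images feeding the reconstruction
    processing_seconds = PROCESSING_HOURS_PER_MINUTE * 3600
    return {
        "frames_per_minute": frames,
        "images_per_minute": images,
        "seconds_per_frame": processing_seconds / frames,
        "slowdown_vs_real_time": processing_seconds / 60,
        # Assumes decimal units (1 TB = 1,000,000 MB) and an even split.
        "megabytes_per_camera_per_second":
            TERABYTES_PER_MINUTE * 1_000_000 / CAMERAS / 60,
    }


for fps in (50, 60):
    print(fps, capture_stats(fps))
```

Under these assumptions each three-dimensional frame took 10 to 12 seconds of cloud time to reconstruct, which works out to roughly 600 times slower than real time.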

Because it’s fully three-dimensional, the end product possesses all of the body language and subtle cues that tell us there’s a human right there beside us. It feels uncanny. As promising as it appeared to be, videogrammetry also seemed so difficult and expensive I reckoned it would remain a niche technology for another decade or more.

Yet just a month later, Microsoft released its “Holoportation” demo video, showing that not only had they mastered videogrammetry, they could do it in real time. A person in one room – under the gaze of many cameras capturing from many angles – could “beam” themselves to a person in another room, anywhere in the world. True, it took a roomful of powerful computers to make all of that happen, but the sudden advance from 10-hours-per-minute to real-time felt so utterly extraordinary that it appeared as though videogrammetry might arrive very soon.

Key breakthroughs include 3D imaging, real-time compression and a 3D display. Credit: Google

The final breakthrough that made Project Starline possible came from recent improvements in display technology. Microsoft’s Holoportation required their bulky and uncomfortable HoloLens augmented-reality headset – something you wouldn’t expect folks to endure (or, at $4,000, pay for) just to have a meeting. A tiny startup in Brooklyn, New York, called Looking Glass Factory, solved this problem when they created a “holographic” display.

The technology blends the latest in displays with a bit of the sleight-of-hand used in old-timey “lenticular” 3D displays, using optical gratings to vary the view seen by each eye as you move your head around. The effect feels amazingly realistic; like an aquarium, you feel as though something lives “inside” the display, in three dimensions – you can even peek around the display to get a sense of the depth of the objects within it.

Put this holographic display (Looking Glass Factory) together with real-time videogrammetry (Microsoft Holoportation) and some original Google wizardry to handle the data compression (videogrammetry consumes a huge amount of data) and Project Starline goes from a nice bit of science fiction to a realised – and likely expensive – bit of kit.

It’s possible, it’s beautiful – but will it scale to billions of daily users? Scott O’Brien, founder of Australian videogrammetry startup Humense, sees the potential.

“A ‘hands and eyes free’ holographic display is practical for these coming years,” he says. “Global connectivity and other tools in hand now make an exceptional inflection point.

“We are just at the precipice of new forms of actual and perceived identity affecting trust, loyalty, persuasion and bonding and with this.”

Because we can see one another eye-to-eye in videogrammetry, we can connect – with all of our senses. Videogrammetry has the potential to reinvent the telephone call, the television broadcast, even the office meeting. We’ll all enjoy the feeling of one another’s presence, even at a distance.
