Amazon’s Astro Robot Sound Turns Movement into Story

Mosegas 10 hours ago

In 2018, Amazon brought me on as lead UX Sound Designer for Astro, their first consumer home robot. Astro uses cameras and other sensors for mapping and navigation at home and work, and can monitor the home, check on loved ones, and transport small items using its built-in cargo bin. While there was a well-defined feature set and form factor, there was initially no character direction. In fact, even before Astro had a name, there was one main question: was it just Alexa on wheels, or was it a robot with its own character?

The Astro team was split. One option was to center Alexa and treat the mobile robot as an additional utility. Along with most of the UX team, I argued for centering Astro, not Alexa. Our belief was that something that moves through your home and turns toward you with purpose will never be just a thing you use. People would give it a character whether we wanted them to or not, so the only question was whether we shaped that character or let it happen by accident.

In the end, Astro became Astro instead of Alexa, and user testing supported our decision. People didn’t see the robot as Alexa. They saw it as its own character, and that’s what they wanted it to be. Alexa’s voice coming from the device felt strange and unsettling, but building a voice for Astro was slow and expensive in 2018. So we settled on Alexa as a supporting character who handles any real talking, while Astro is the main character, communicating as much as possible without words, through sound, movement, and facial expression.

I was brought onto the Astro team to define the robot’s sound language and vocabulary. But there was no one defining the robot’s actual character. You can’t make a single real decision about a character without defining it first. Every choice about how Astro moved, sounded, paused, or reacted was a character choice, and those choices required all disciplines to work together. As the sound designer, I worked across sound, movement, and character, and on how they played together within each moment of the story. The animators, who programmed Astro’s movements and facial expressions, were amazing at what they did, but the emotional arc they were animating came from the sound (and therefore the character) being worked out first. So I stepped into that role, and that’s when my real work started. What I learned about creating a robot character applies to almost everything being built with integrated AI right now.

Character is a Design System

Developing the character of Astro meant answering questions that had never been asked about a product at Amazon: How emotional is the baseline state of this robot? How does this robot deal with uncertainty without destroying trust? Where is the line between expressive and annoying? What are the risks of this device?

These are design questions. They have real answers, and every team working on the product should build on them. For example, Astro’s range of emotions was designed to be relatively narrow at first. We didn’t want Astro to be too sad or too angry. It might act sad, but it would snap out of it quickly and end the reaction on a high note to keep things light.

Character leaks through every seam and can create a disjointed feel if it is not well defined. Whether it’s animation timing that is slightly off, or a response that is technically correct but tone-deaf, users feel all of these conflicts, even if they can’t name them. Watch what happens at the beginning and end of this singing sequence:

Astro goes from blank, to expressive, and back again. There is no build-up, no cool-down, no sense that the feeling is coming from somewhere or has somewhere to go. I pushed hard for better character stitching, with transitions in and out of expressive moments that would make the performance feel continuous rather than spliced together, but it didn’t work out. The moment itself works. But without the stitching, it reads like a clip played on a robot rather than something coming from within the robot’s character.

Story and Sound at the Beginning

We had decided that Astro would not have spoken dialogue, but it had something that worked the same way: a vocabulary of sounds, tones, and rhythms that served as its voice. This vocabulary was the primary output of the character’s personality. The robot’s movements and facial expressions were built around it.

Astro’s wake-up sequence is a good example. Waking up wasn’t just a boot animation on the screen; it was a whole performance. Slow and tentative at first, the robot quietly oriented itself, then lifted its screen, checked its wheels, and finally, moving up toward the telescoping pole, it slowly revealed itself and did a little dance of joy. Sound, movement, and eyes all hit their beats together in full choreography.

The character’s output in that sequence was first written as a story. Astro wakes up in its new home for the first time. Its greatest desire is to be part of a family, so this is the moment it has been waiting for; this is its purpose. Being the responsible character that it is, it wants to make sure everything is fine before it introduces itself and starts learning its new home.

This narrative came first because it drove all the other decisions we made. After the story was written, sound gave that story a figurative voice: the happy tones, the pacing as it checks its wheels, and the sweet chatter as Astro looks at its new family for the first time and introduces itself. Once the sound was laid down, the animators did their thing with movement and facial expressions, taking cues from the emotional arc the sound had created. Movement didn’t lead. It followed the feel of the story and the sounds, the same way an animator follows a recorded voice.

That wake-up sequence was one of the most talked-about moments in early user testing. People described it as “alive.” What they were responding to was not one thing. It was all three channels (sound, movement, and facial expression) expressing the same defined character in harmony.

Context Is Where a Character Becomes Real

The most compelling characters are defined not by a fixed state but by how they react to their environments and the people in them. They are still figuring themselves out or getting used to the situation. This is what I call a contextual character. A robot that lives at home does not have a single emotional state. It moves through rooms with different energies, meets people with different backgrounds, works at different times of the day, and responds to an endless variety of social situations for which it was never expressly designed.

We got close with Astro’s contextual sound and character at release. When a piece of real context was wired in, the system adapted well, and Astro felt completely alive. But every state like this was still a prediction that we made by hand: a situation that we had to anticipate and craft a response to. A real home throws more situations at a robot than anyone could predict, so there was always a long tail of moments the system wasn’t designed for.

The difference between a product that people describe as “smart” and one that they describe as “aware” often comes down to this. Intelligence is a skill. Awareness is the essence. Presence is character. And a character always reacts to the people around it, to its environment, to its own evolving state. That’s what makes it feel like it has an emotional connection with you.

This is where AI changes the character design game in ways that go beyond what was possible with Astro. AI-driven personalization doesn’t require the context guesswork we had to rely on. It learns the specific rhythms, preferences, and overall mood of the people it lives and works with. Character does not simply respond to context. It grows within it.

What the Industry Is Missing

The character and soul of the next wave of integrated AI products always seems to be an afterthought. And a character defined late is a character defined by default. It becomes the sum of a thousand small decisions made by different people with anything but character in mind. People project character onto machines whether it was designed or not, especially if those machines move; a moving robot already has character. If no one has designed this character, the result will be products that feel empty, or worse, feel confused and unreliable. Technically impressive, but not alive.

We didn’t fully solve this with Astro. So much was going on at once that character was rarely treated as a priority, and it made sense why. When you’re building a first-of-its-kind product, the loudest things are the ones that break: the deadlines, the costs, the features that the customer can point to on the box. Character is quieter than all that. It is easy to imagine that it can come later. On a team as big as Astro’s, any idea is lucky to make it onto the roadmap when it’s competing with a hundred others that feel more urgent in the moment. None of this came from people being indifferent. It turns out that character is the kind of thing that’s hard to prioritize until you see what its absence costs.

My Ask of Product Leaders

If you’re building a product that will share a physical or social space with people, there are three things to consider:

Define the character before you define the interaction. You need a defensible character with enough emotional intelligence to answer the tough questions when they come up. Get the answers to the character questions early, and every discipline builds on the same foundation.

Bring story and sound in at the character stage, not the production stage. Story, sound, and character definition developed early have the opportunity to inform movement, expressions, and the logic of interaction. This requires a different kind of collaboration, and a different kind of hiring.

Design for adaptability, not just consistency. A consistent character is needed, but the products that will matter most in people’s lives are the ones that deepen with use. The infrastructure to support this is now very accessible, but the design thinking to take advantage of it is still rare.

An abridged version of this story can be read on Medium.
