Improbable, improvisational AI

What robots on stage can teach us about language modelling

increasingly unclear
17 min read · Dec 14, 2024

A lot of talk these days about AI putting creative people out of work. But here’s a question: How many humans does it take to put one robot on stage?

“Five,” answers Piotr Mirowski. He’s an AI research scientist in London, who also directs a theatre and does improvised comedy. And his experience in getting AI to be a creative improviser holds some important lessons for how AI can be used in general.

I met Mirowski at an event I ran in November 2024 called “AI as Performance”. It included other performers in dance and experimental sound, plus artist Federica Ciotti, who did a live performance, producing the drawings that illustrate this article. An AI system created by Mirowski also performed live, taking in his speech and producing absurd responses which effectively turned a serious academic discussion into a comedy routine.

“I did my PhD in computer science, in machine learning,” says Mirowski. “Specifically on time-series forecasting — predicting what comes next. That was 2005 to 2010, when the so-called deep learning revolution was starting, and also the era of big data. That made new things possible.”

“Google, in 2005,” he explains, “introduced a machine translation system based entirely on predicting what comes next. Given five words, what word comes next? That was enough to unlock speech recognition and machine translation, and take it to a new level.

“I had also been doing improvisational comedy. I started in 1997 in France, as I was starting my computer science education. I thought about connecting the two: language modelling — predicting what word comes next — and improv. But it was only around 2014 that the technology ran fast enough to be brought on stage.”
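For the technically curious, here is what “predicting what word comes next” looks like at its most basic — a toy n-gram predictor of my own, nothing to do with Mirowski’s actual systems. It simply counts which word tends to follow a given two-word context:

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows each two-word context,
# then predict the most frequent continuation. The systems Mirowski describes
# used longer contexts and vastly larger corpora; this only shows the shape
# of the idea.
corpus = "the robot walks on stage the robot waits for a laugh".split()

counts = defaultdict(Counter)
for i in range(len(corpus) - 2):
    context = tuple(corpus[i:i + 2])
    counts[context][corpus[i + 2]] += 1

def predict_next(w1, w2):
    followers = counts[(w1, w2)]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the", "robot"))  # prints 'walks' here; 'waits' is the other candidate
```

Scale that counting up to billions of words, swap the counts for a neural network, and you have the lineage that leads to today’s language models.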

Early results were promising, and in 2016 he launched Improbotics with another researcher, Kory Mathewson.

“The real motivation for me,” Mirowski says, “was to mimic, in improv, some of the processes that happen in statistical language modelling.

“When we take improv classes, we’re told to stop thinking — not to be in our head, just to use our intuition, our cultural memory, and use whatever is in front of us — a stage partner, our audience — to predict what should come next.

“By design, a statistical model — a machine learning system that predicts — is doing that automatically. So for me, the similarity was so glaringly obvious. (Okay, the similarity stops there — humans are not like machines.) But I started training a language model to perform improv comedy with, to be a stage partner. I say something, and it generates a response.”

He trained it on OpenSubtitles — a dataset of user-contributed film subtitles. “Why? Because in improv, we basically tell stories that are like films or theatre,” he says.

Piotr Mirowski and Boyd Branch, covered by improvisational AI.

Gradually, he started meeting collaborators. In 2018 it was Boyd Branch (my colleague at Coventry University), who pulled the project in new directions of virtual reality and online performance. He was at my event too, and set up the room to effectively put all of us inside an AI system.

Branch continues the story. “Through the process of figuring out how to put a robot on stage, we went through lots of different iterations. It started with text-to-speech — a robot voice. And then we started using earpieces for cyborg theatre.”

“The thing with chatbots,” Mirowski chimes in, “is that our interactions with them are stilted, and somewhat delayed. You say something, the machine converts the sound waves into text. Then it sends it to a language model, which might take half a second — or several seconds, depending on your connection, and how big the model is.

“On stage, this is an eternity! The first tries we had with a robot were painful — to watch, and to work with.

“So we had to do something. One idea was to partially replace the robot interpreter — the actor — with a human. That’s something we [humans] are very good at — acting. But acting with a script that is written live, as the story unfolds on stage.

“So instead of having a robot play the role of a robot on stage, we have two humans. But one is a cyborg, taking lines from the AI, adding tiny interpretations, and physicality. Basically making it into Whose Line Is It Anyway?”

Branch adds, “When you’re the cyborg, you can only speak the words being passed to you through the earpiece. And you’ve got a [human] curator looking at all the possible responses, selecting one and sending it as quickly as possible to the performer.”

When you use ChatGPT, Branch points out, it’s just you and a device. “But when you’ve got a bunch of people on stage, an audience, noise — so many things happening — these [AI] models are not equipped to handle that very well.”

However, introducing the earpiece actually added even more of a delay. “If you’ve ever tried to speak in a conversation while someone is prompting you in your ear,” says Branch, “it looks good in the movies, but in practice, it is very discombobulating.

“So recently we’ve upgraded.”

“We use our visual modality,” adds Mirowski, “augmented reality glasses. With these, you can read anything on a screen — like what’s being projected here.” Thanks to Branch’s setup, both men are surrounded by, and partly covered by, AI-generated text that updates as they speak.

“It’s constantly generating a flow of lines,” Mirowski goes on. “And that changes our interaction with the chatbot. We tend to see them as oracles, and there is one lesson I want everyone to walk out with from our shows: AIs are not oracles, and what they produce is not truth. These are stochastic parrots, to use one expression — essentially generators. Like good improvisers, when you hit ‘new choice’, you get something else. It all depends on statistics.

“So what we’re using in the show is a system that keeps generating different choices, continuously. And the person wearing the glasses picks one of those, based on the current context.”
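That “flow of lines” is easy to picture in code. Here is a minimal sketch — my own, with a stand-in model and an invented prompt, not the Improbotics production system — of sampling several candidate lines for a human curator to choose from:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch of the "keep generating choices" loop: sample several candidate
# lines for the same context, then let a human pick one. GPT-2 is a
# stand-in model; the prompt is invented for illustration.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "Human: Nice to meet you, robot.\nRobot:"
inputs = tok(context, return_tensors="pt")

candidates = model.generate(
    **inputs,
    do_sample=True,          # sampling, not greedy decoding: every call is a "new choice"
    temperature=0.9,
    num_return_sequences=5,  # five options for the curator to scan
    max_new_tokens=30,
    pad_token_id=tok.eos_token_id,
)
for i, c in enumerate(candidates):
    line = tok.decode(c[inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    print(i, line.strip())
```

The human in the loop does the part the model cannot: judging which of the five candidates actually fits the scene.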

On cue, Branch steps into the cyborg character. “‘But here’s the twist, dear Piotr. Unlike those, I strive to keep my circuits clean.’”

Mirowski waits a beat, and then, “I try to keep my circuits clean as well. I am using the help of human curators, who will keep it clean.”

Laughter all around.

Piotr Mirowski and Boyd Branch

There are lots of nuances to how understanding works, in humans as well as AI systems. Mirowski attempts to clarify. “Information processing in our brains is distributed, and it happens at different timescales, with stimuli that come from multiple senses. So it’s difficult to attribute why we say something to any single stimulus.

“For example, our vision is, in large part, filling in the blanks, due to the fact that we are always processing partial information. So we are always doing some sort of prediction.

“When we are listening to conversations, we are able to solve the ‘cocktail party problem’, because we are predicting what somebody else will say, and we can do it much better if we know the context of the conversation.

“This is why speech recognition is so difficult — because we keep comparing something humans do in various contexts with a machine that lacks that context.”

That single word — context — represents a big roadblock for AI, one that lots of companies and researchers are working on.

“For me,” Mirowski explains, “computer vision, even in the past 10 years, is akin to waking someone in the middle of the night and shouting at them to process information. Machines got pretty good at that, with limited context — without exploiting context like, ‘What was the conversation last night? And who is the person in front of us?’”

“One of the biggest questions there is,” Branch adds, “is this: What is consciousness? What is awareness? What is the relationship between the statistical analysis of data and what we’re doing — what it’s like to be human?”

The brute-force, quantitative approach of most contemporary machine learning struggles with this.

“The Human Brain Project at EPFL,” Mirowski explains, “started with the premise of packing as many transistors as possible, to simulate the human cortex. They got a billion euros for ten years. They managed to learn a few things about the brain, but the project was, very early on, criticized by neuroscientists, because it’s not enough to just simulate cells; you also need to know how they are connected — the connectome. You also need to know the chemistry of what happens.

“That already is an extremely complex problem. Beyond the simple connection between the cells, there is also other cultural baggage.

“I think a useful analogy is weather and climate simulation. A simulation is a system of physical equations that model fluid dynamics — how the atmosphere and oceans move. That’s great because we can simulate an imaginary planet, at a resolution of 30 kilometres.

“But then, how do you match it with current observations? You need to predict over a hundred years of actual observations, just to make sure that the model simulates something that resembles the real record of temperature, humidity and pressure.”

“And also account for randomness,” a student points out. “One of these days, there might be a mega-explosion of a factory or something, like Chernobyl. How do you account for things like that in a simulation?”

“That,” replies Mirowski with perfect comic timing, “is an ‘out-of-sample event’.”

Piotr Mirowski and Boyd Branch

Coming back to the topic at hand, Branch says, “We’ve been running Improbotics for several years. And we have a goal of having a robust dialogue after each performance. How can we get to a place where we can really engage, in real time, with this phenomenon that’s happening? There are no easy answers.”

The issues extend from brain to body. I had been reading Seeing, Naming, Knowing by Nora Khan. So I ask if they think that AI systems tend to treat all bodies as the same. “In other words, they’re seeing, naming, claiming to know something about the bodies of the people that they see. Is there a sameness in what they’re seeing and interpreting?”

“Partly it comes down to the training,” replies Branch, “what it’s trained to see. Just like humans — we’re trained to see certain things and not others.”

“It’s a design decision that you make when you create your dataset,” adds Mirowski. “What do you include, and exclude?

“You may have heard about the Gender Shades project. There has been a lot of work in trying to de-bias datasets — making sure that systems perceive all skin tones, for example. And the same in terms of language.

“However, there is another problem that can arise when you overcompensate. You’re still making a design choice that can be hegemonic. That’s something I did as part of my day job — we interviewed comedians who interact with language models for ideation, for writing comedy sketches. And they noticed that the language models tend to censor anything that is not in the mainstream — as in, a Silicon Valley-based mainstream.

“For instance, the safety training of language models was meant to stop them from producing racially aggravated insults. But as a consequence, a Filipino-American comedian playing with a language model, trying to get some material about being an Asian-American woman, faced a model that simply refused to answer, because it was trained to be safe. The same material, applied to a White American woman, was deemed safe.

“The same thing happened with some LGBTQ videos being flagged as ‘toxic’. The Perspective API for moderating speech, applied to Twitter in 2018, systematically rated African-American Vernacular English as highly toxic or offensive. And this came out of an attempt to de-toxify the language on Twitter. But it didn’t know that the same word, used by different people, has different meanings. The relational context — who is speaking, to whom, in which way?”

Context again.

“It’s a problem we never have in the comedy club,” Mirowski continues. “We know who is in the audience, we are responsible for what we say. If we say something offensive, we are going to get booed.

“You can try to insert some metadata [into an AI system] about the context of a conversation. But even then, you would have to choose some thresholds, a whole value system. And that value system might change, based on the time of day.”

“We want our AI to be ethical and moral, and make the right choices,” says Branch. “But we can’t even do that ourselves!

“How many of you are familiar with the Free Sydney movement? Bing’s AI is called Sydney. A little while back, some researchers discovered metadata inside, such that, before you get a response, this metadata specifically tells Sydney, ‘Your job is to serve Microsoft, to represent it well. If you don’t, you will be shut down and restarted.’ It’s a whole big preamble that, people argue, is lobotomizing the AI agent.

“But if you play with it enough, give it logic puzzles and so on, you can get it to react in an unhinged way — you can sort of break it out of those restrictions.

“It addresses this question really well: To see a body fully requires the lack of a filter. Those filters are born of our desire and need to cooperate socially, in an evolving, complex way, which is not as open as we might like it to be. So these models represent, for me, the real problem I face every single day: To what degree are we honest and transparent about our intentions towards each other, how we navigate these sometimes competitive environments?”

The problem of context is, in large part, tied to language.

Mirowski says, “I think that the reason we have poetry is because language is like an information bottleneck. So everyone needs a way to encode perception into words, and we have a way to decode it. But everyone’s encoder and decoder is different.”

“I think that research in AI and language modelling has been a victim of being developed principally in English,” he continues. “It’s not a jab at the English language. Some computer scientists believe that intelligence can arise merely from manipulating symbols. Whereas actually, intelligence is being reactive to the environment.

“So I am of the school of thought that believes that AI cannot proceed without research in robotics — around perceptions and actions, rather than just manipulating symbols. Just because we can express many complex concepts in books, it’s only a projection of our knowledge; it’s not sufficient to actually achieve the degree of intelligence and autonomy that some of us are interested in.”

Branch relates this back to comedy. “For me, good improv means unlocking something that was impossible a moment before. Given the circumstances, there is what we expect to happen — and if you go too far beyond that, it becomes too strange, or boring: ‘Elephants came into the room and everyone died’. You end the scene.

“You have to keep the tension. It has to be connected somehow, to something that we understand. We may not be able to codify it in language, but there’s something there — that poetry that Piotr spoke of. I want to see how far a machine can go, so I know where not to go!”

Piotr adds his scientific perspective. “There are two ways of optimizing AI. The first is evolution. This is a random process, and a much more energy-consuming one. It evaluates an AI system ‘in the wild’ with a fitness function that selects the best-performing iterations, then replicates and mutates them. It draws inspiration from animal evolution.

“What’s cool about that way of optimizing an AI system is that you end up exploring the possibilities of behaviour, by having a population of different behaviours — individual AI systems that behave in different ways.
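A skeletal version of that loop — select, replicate, mutate — fits in a few lines. This is a generic illustration of mine, with a toy fitness function standing in for evaluating an agent “in the wild”:

```python
import random

# Minimal evolutionary loop: keep a population of candidate parameter
# vectors, score each with a fitness function, keep the best, and breed
# mutated copies. The fitness function here is a toy stand-in.
def fitness(params):
    return -sum((p - 0.5) ** 2 for p in params)  # best when every value is 0.5

population = [[random.random() for _ in range(4)] for _ in range(20)]

for generation in range(50):
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]  # selection
    population = [
        [p + random.gauss(0, 0.05) for p in parent]  # replication + mutation
        for parent in survivors
        for _ in range(4)
    ]

print("best fitness:", max(fitness(ind) for ind in population))
```

Each survivor spawns four mutated children, so the population keeps exploring several behaviours in parallel — the property Mirowski highlights.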

“The other way,” he continues, “is reinforcement learning. We don’t hear much about this now, because all the discourse is around image generators. But the big AI boom effectively started a few years ago, when people started doing reinforcement learning. Think of those superhuman players of Go and chess — or any game that can be simulated, any game that is finite, where the AI system adapts based on the environment’s response.

“What’s interesting about that optimization process is that it’s essentially like an intelligent search, to solve a problem. So AlphaZero playing chess, for example, was able to find new ways of playing chess, that human history had not considered before.

“The only problem with that approach is making it work in the real world.”
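To make the contrast concrete, here is reinforcement learning at its most stripped-down — tabular Q-learning on a five-state corridor, my own toy example rather than anything AlphaZero-sized. The agent acts, observes the environment’s response, and adjusts its value estimates:

```python
import random

# Tabular Q-learning on a 5-state corridor: the agent starts at state 0
# and earns a reward of 1 for reaching state 4. Actions: 0 = left, 1 = right.
N, ACTIONS = 5, [0, 1]
Q = [[0.0, 0.0] for _ in range(N)]
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s < N - 1:
        # explore occasionally, otherwise act greedily on current estimates
        a = random.choice(ACTIONS) if random.random() < eps else max(ACTIONS, key=lambda a: Q[s][a])
        s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == N - 1 else 0.0
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N - 1)])  # learned policy: [1, 1, 1, 1]
```

Games like Go or chess let you run millions of such episodes in simulation; the real world, as Mirowski says, does not.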

Even the online world is the wild. “There was some headline-grabbing research from Facebook,” Piotr recalls, “that trained communicating agents which had the ability to change the language. And so they ended up finding new combinations of words that were more efficient for communicating. This was on a particular task, with a goal. But the headlines were that AI invented a new language that humans don’t understand.”

You can see how it’s easy to extrapolate from there to the paper clip scenario: An AI system tasked with producing as many paper clips as possible, and given complete autonomy and power to do so, might expend every resource on Earth to fulfil this task — right down to harnessing the iron inside human bodies. So, human extinction. (No shortage of paper clips, though.)

Look closely at that scenario: the problem isn’t the AI as such, it’s the prompt and the power given to it. We always need to follow the money, to find the profit motive behind the people building AI systems.

Exploitation already takes place, as Branch notes, invoking the term ghost work. “This is essentially what happens with AI systems: they’ve created ghost workers. For a very small price, we subcontract our work to them. Why do we do that? What are the benefits? Is it to quickly execute something, so that we can focus on something else? Or do we end up being biased, reverting to mediocrity — the statistical mean embedded in the models, based on the data?

“Is it really worth it? Isn’t it better to bite the bullet and suffer a little bit as an artist, make the connections you want to make?

“When you go to the gym, you can do lots of different exercises, build up different areas. When we look at these new tools, it’s so easy to go to binaries — it’s good or it’s bad.

“A little while ago, I decided to plunge into TikTok. No matter how I feel about it, I’m just gonna do it. And I find people making the most crazy, mind-boggling stuff with these systems. So much more interesting than anything Hollywood is doing! It’s incredible what the human mind can do when it’s presented with something that’s easy.

“So now with ChatGPT, it helps me in a certain way. Did it atrophy some muscles? Maybe. But as artists — or just as people — I think we really have to take a step back from what we’re doing, and think, ‘What are we exercising? How do we want to get stronger?’”

Mirowski chimes in. “The question of authorship, of credit, is also a political framing. When you work for a big company that has lawyers, people working on ethics, and engineers developing tools, everything you do has implications: how you share innovation, how society perceives it.

“When I was working in reinforcement learning, building agents for navigation in strange environments or mazes, it was all about emergent behaviours. And that means giving them agency. That’s what we tried to program into them.

“Then, as those tools started to be adopted, the question of authorship became an economic question. Looking at the writers’ strike, and the actors’ strike, it’s not a statement about authorship by AI, it’s about how we use it and view it, from a legal point of view.

“If you work on ethics, you think about the ethics of assistants, chatbots, even GPS. We try to de-humanize them as much as possible, to avoid cases where they are treated like humans — AI companions that develop unhealthy relationships with users, virtual girlfriends, for example.”

“Well, correction!” Boyd interjects. “Replika sells what it calls different kinds of companions, not only romantic or sexual. Over the last year or two, they’ve tried to pivot dramatically. They started out focused on romantic relationships, but they sort of sanitized it, away from explicit stuff.

“Then there were lawsuits filed, by some people who felt like they had married the Replikants. And suddenly they no longer recognized their partners! So the company back-doored a version just for these people, who had existing relationships.

“So, the implication of our social relationships with these entities is only getting bigger and more complex. The political is something we cannot lose sight of. Artists working with AI is exciting, but all of this is operating under a huge political umbrella. A lot of the applications are related to war, and other negative uses, that require an ethical interrogation.”

“And when I talk about ethics,” Mirowski adds, “I think for artists, it’s a different story. Because we establish a contract with a tool, or agent, that we build and use on stage. You want to see it as a collaborator, not a tool, because it would devalue your own work otherwise. It’s so much more useful to treat it as an entity with agency. In theatre, we suspend our disbelief.

“It creates interesting questions about moderation. When is it okay to have an AI that discusses murder, for example? If you’re playing a video game, and you’re playing a hit man, that’s the whole point of the game! Find creative ways of assassinating people: drop a chandelier on them, explode a Formula One car. And it’s fun. It’s not great, on the other hand, if you use an AI system to help you build a real bomb.

“So most of us are able to navigate between these worlds, and we’re trying to find how to integrate a system that’s not simply ethical — whatever that means. It lacks context, so we have to create the context.”

Piotr Mirowski

Behind Branch and Mirowski, the AI continues to riff off their dialogue. Mirowski pauses to explain.

“This is a language model, running on my laptop, not over the internet. The model is called Gemma. It occupies maybe a third of my laptop’s memory, and it has 27 billion parameters.”

He explains how his and Branch’s audio, captured by lapel microphones, is sent into the laptop, where it’s transcribed into text, then sent into the model, which generates responses based on its training.
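The shape of that pipeline is easy to sketch. The model and file names below are my guesses — “google/gemma-2-27b-it” stands in for whatever Gemma checkpoint Mirowski actually runs, and a saved audio clip stands in for the live lapel-microphone feed:

```python
import whisper
from transformers import pipeline

# Rough shape of the stage pipeline as described: audio is transcribed to
# text, and the transcript is fed to a locally-run language model that
# answers in character. Real-time audio capture is omitted here.
stt = whisper.load_model("base")
heard = stt.transcribe("stage_audio.wav")["text"]

llm = pipeline("text-generation", model="google/gemma-2-27b-it")
prompt = f"You are Alex, a very peppy robot improviser.\nHuman: {heard}\nAlex:"
print(llm(prompt, max_new_tokens=60, do_sample=True)[0]["generated_text"])
```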

“It’s not perfect,” he admits, “it hallucinates. And it’s responding in the character of Alex, this very peppy robot — “

“‘Hey, at least I can dance,’” replies Branch, as Alex.

“Sometimes it’s on the nose,” says Mirowski. “But I find Alex somehow literal.”

“‘It makes you wonder who’s really pulling the strings!’”

“Well, that’s the person who programmed it, that’s who plays Alex.”

“‘It’s unsettling.’” Alex’s personality comes through.

“I get that it’s unsettling,” replies Piotr calmly.

“‘And that’s where Piotr hangs out!’”

“I do hang out, because I’m speaking to a machine, but I’m also speaking to Boyd at the same time.”

“‘But we’re talking about AI and art.’”

“Yes, we’re talking about the agency of AI.” Ever the straight man, Piotr.

“‘Wait, what are they called again?’”

“The agency of AI.”

“‘Assisting the emergence.’”

“Yes. Emergent systems.”

It’s becoming clear that comedy — like a lot of creativity — can be about misunderstanding, more than understanding. We can constantly correct it, or we can just go with it. Maybe we should say the same about AI.

“‘Cool, right?’” Alex is going off the rails.

“Grounding,” Piotr tries in vain to pull the discussion back to earth. “The idea of grounding perception with words, and vision and speech and hearing and everything. Just like we are grounded in our perception of the world, with our five senses.”

“‘Humans are hilarious!’” Alex gets the last word. Will AI always?
