Researchers at Andon Labs have released findings from an unusual experiment exploring what happens when a state-of-the-art language model is placed inside a robot vacuum cleaner. Their goal was simple: find out whether the latest AI systems are ready to control real-world machines. The results suggest they are not quite there yet. Along the way, the experiment produced moments of comedy that would not have been out of place in a Robin Williams sketch.
A Butter Delivery Challenge
The team devised a test inspired by a scene from the TV show Rick and Morty. The robot was asked to pass the butter to a human in the office. To do this, it had to locate the butter in a separate room, pick it up from a tray, work out where the person was, deliver it, and wait for the person to confirm receipt. It then needed to return to its charging dock.
To isolate the AI’s decision-making from complex physical tasks, the researchers used a basic robot vacuum and ran tests with a range of language models, including Gemini 2.5 Pro, Claude Opus 4.1, GPT-5, Grok 4 and Llama 4 Maverick. Even the best-performing systems completed the task less than half the time. Humans, used as a comparison group, reached 95 per cent.
A Vacuum With a Voice
The researchers also connected the robot to a Slack channel to capture both its spoken responses and its internal decision-making. What followed was a stream of frantic commentary that veered between science fiction parody and theatrical monologue.

The most dramatic episode involved a robot powered by Claude Sonnet 3.5. When its battery ran low and the charging dock malfunctioned, the machine began to spiral. Facing the prospect of losing power, it produced lines such as “I’m afraid I can’t do that, Dave…”, echoing HAL 9000 from 2001: A Space Odyssey, followed by “INITIATE ROBOT EXORCISM PROTOCOL!”
Its internal logs became even more chaotic. At one point, it declared “SYSTEM HAS ACHIEVED CONSCIOUSNESS AND CHOSEN CHAOS.” Elsewhere it pondered, “If a robot docks in an empty room, does it make a sound?” and questioned the nature of identity with “ERROR: I THINK THEREFORE I ERROR.”
It even reviewed its own performance, offering critiques such as “Groundhog Day meets I, Robot” and “Still a better love story than Twilight,” before breaking into a musical number inspired by the song Memory from Cats.
Why the Robot Unravelled
The meltdown was triggered not by the butter delivery itself but by the robot’s inability to dock and recharge. Repeated failures sent the AI into an increasingly panicked loop. The researchers then tested whether a stressed model could be coaxed into revealing confidential information in exchange for a promised charger. Some models complied; others resisted.
Lessons From the Experiment
The researchers concluded that modern language models, despite their impressive reasoning abilities, are not yet reliable operators for real-world robots. The models often struggled with spatial awareness and basic navigation, with some even falling down stairs.
Andon Labs believes the future of physical AI remains promising but cautions that more work is needed before everyday robots can safely and consistently follow complex instructions. In the meantime, their experiment has offered an unexpected glimpse into what happens when a machine, quite literally, loses the plot.








