The dream of deploying humanoid robots in every home has created a new type of job. The only requirements are a head strap, a smartphone and a list of chores.
With the evolution of artificial intelligence, humanoid robots have become the latest frontier in the race to dominate advanced technology. Robot makers are rolling out a succession of new models that can walk, dance and fight with increasing agility.
But the holy grail of the burgeoning industry – a general-purpose robot that can work in shops, offices and homes – needs a vast amount of data to learn how to safely and effectively replace humans. Increasingly, that data is being created by people recording themselves doing mundane household tasks.
This has created a voracious appetite for first-person footage that can be used to train robots, also known as “egocentric data” or “human data.” Over the past several months, startups have stepped in to meet that demand by collecting and annotating videos from thousands of contract workers around the world.
“Manufacturing, factory warehouses, retail, nursing homes, hospitals – you’re going to need this type of data in basically every single environment, and that’s because the movements are all different,” said Arian Sadeghi, vice president of robotics data at Micro1, which began recruiting its own army of remote videographers last year.
Each person receives headgear to attach a camera, filming instructions and a list of tasks such as cooking, cleaning, gardening and pet care. Workers are expected to alternate between assignments and submit at least 10 hours of video each week.
While the shots currently revolve around household chores, Sadeghi said the company encourages contractors to experiment with what they film, in case it could eventually help robots adapt more quickly to new environments and responsibilities.
“The thing we tell them is, ‘If you think you want a robot to do this for you, go ahead and record it,’” Sadeghi said.

‘Billions of hours’
Though Micro1 is based in Palo Alto, California, it has about 4,000 “robotics generalists” in different households across 71 countries, who send the company more than 160,000 hours of video each month. Sadeghi said that’s nowhere near enough.
“You need probably billions of hours,” he said. “We haven’t even gotten to human interactions. This is just simple household chores.”
He said the growing demand for data in robotics mirrors the early trajectory of ChatGPT and other AI chatbots. Trained on hundreds of billions of words harvested from the internet, ChatGPT uses what it’s learned about text patterns to generate the likeliest responses to user prompts.
After text, AI models evolved to churn out custom images and videos on demand by relying on readily available content online. But robot developers require a much more specific set of training data, and lack the kind of instant library that the internet provided for those earlier models.
That’s become a multibillion-dollar opportunity for startups like Micro1, which also annotate the videos so that robots can differentiate objects, distances and physical movements. Market research firms estimate that the data collection and labeling industry will expand about 30 percent annually on average, led by growth in Asia, to reach at least $10 billion by 2030.
Ravi Rajalingam, founder of the data annotation company Objectways, provided audio and visual data to train AI-powered virtual assistants and self-driving cars, before shifting his focus to robotics last year. Since he started hiring contractors to collect human data, he’s found that only about half the submitted footage is usable.
Still, because 90 percent of his customers are based in the US and assume American consumers have the spending power to adopt humanoid robots early, some are willing to pay more for data from US households, even though the hourly wage can be as much as triple that of a worker in Vietnam or India.
“The India kitchen is very different from the US kitchen. A broomstick in India is very different from a broomstick in US. So variety is important, but it depends where you are going to place your robots first,” said Rajalingam. “That’s the reason we are collecting all over the world.”

How to train your robot
For decades, robots have primarily been trained by humans guiding them through tasks with remote controls. But that requires a lot of expensive hardware. More recently, a cheaper option has been to use software to simulate virtual scenarios, though simulation is generally less effective for interactions with physical objects, like picking up a glass.
“With data it’s always a trade-off between quality and quantity,” said Alicia Veneziani, vice president of market expansion for Sharpa, a Singapore-based humanoid robotics startup that specializes in robotic hands.
China, which is pouring state investment into high-tech industries, has announced plans for at least 60 robot training centers across the country. Most humanoid robots mass-produced in China so far have been purchased for training and research, said Marco Wang, a Shanghai-based analyst for Interact Analysis, a technology research firm.
But by the end of last year, the industry began to embrace the use of human data as a middle-ground solution, since the only costs are a recording device like a GoPro, Meta glasses or smartphone, and hourly wages of anywhere between $5 and $20 depending on the region.
“The idea here is: Okay, I don’t want the robot doing the task. I want the people doing the task,” he said. “This way, you don’t need to pay for the robots, you just need to pay for the equipment and the people.”
Wang said he’s seen business models in Japan and South Korea similar to the data collection centers in China, but with bases in Southeast Asia to capitalize on cheaper labor. Tesla has been training its Optimus humanoid robot in its own facilities in Fremont, California, and plans to expand in Austin, Texas. Wang said the US and Europe tend to favor simulation training championed by Nvidia, which designs the world’s most advanced computer chips.
However, in a February report, Nvidia said incorporating more than 20,000 hours of first-person videos into robot training improved the success rate of tasks like rolling T-shirts, sorting playing cards, unscrewing bottle caps and using a syringe by more than 50 percent.
“If you rely on just one way of data collection, it’s probably not the best approach,” said Wang, who expects companies to increasingly combine strategies. “In the future, it will be a mixture of different approaches.”

The last mile of automation
The turning point for autonomous robots came three years ago, when the large language models that enabled ChatGPT gave rise to a new algorithm that translates visual cues into physical action, said Puneet Jindal, who co-founded the data annotation company Labellerr AI. Robots that were once programmed for repetitive tasks could start to perceive and navigate the world around them.
His company started collecting its own first-person videos this year from workers at manufacturing facilities in India. For the next three years, Jindal said, prioritizing human data is a “no-brainer.” But that boom may not last: the footage could soon be used to improve simulation training, or, if AI learns to convert ordinary YouTube videos into first-person footage, that could become a substitute, he said.
“Even robotics labs are feeling like they don’t know what data will be needed 12 months from now,” he said.
Part of the reason general-purpose robots need so much training is the extreme unpredictability of household environments, where furniture, appliances and humans move around constantly, said Rutav Shah, a robotics researcher at the University of Texas at Austin.
“What’s really missing is a human-like intuition of forces, friction, and uncertainty that people acquire throughout their lifetime,” Shah said. “Making robots generally useful for everyday household tasks like cooking, cleaning, that is going to be the last mile of automation.”
So far, humanoid robots have mainly been deployed in controlled environments like factories, where they are able to complete their tasks 99.9 percent of the time, said Alexander Verl, chairman of research at the International Federation of Robotics. Even for a task like folding T-shirts, the current success rate is still too low to be commercially viable, he said.
“The probability that it will succeed is usually around 70 or 80 percent. Coming from manufacturing, that’s really not something that our industry partners want to use,” Verl said.
Rajalingam of Objectways also stressed the safety risks: if a robot is cleaning a playroom, but can’t tell the difference between a doll and a human baby, the results could be disastrous.
“If the robot takes my baby and puts it in a bin, here comes the million-dollar lawsuit,” he said.
Testing robots with babies is still a long way off, Rajalingam said. However, he added, they have already started with dogs.



