If there ever was a robot system with the potential to change the future, WHIRL might be it. WHIRL stands for In-the-Wild Human Imitating Robot Learning, and it may come very close to cracking a problem called Moravec's paradox that the robotics and AI industry has struggled with for decades.
WHIRL was created by Shikhar Bahl, a Ph.D. student at the Robotics Institute (RI) in Carnegie Mellon University's School of Computer Science. Bahl collaborated with RI faculty members Deepak Pathak and Abhinav Gupta.
The creators of WHIRL acknowledge that humans and robots have different body parts and move differently. What matters is that the outcome is the same: a door must be opened, a switch deactivated, a faucet turned on.
Creating the WHIRL software was made possible by advances in computer vision. Computers can now understand and model movement in 3D using models trained on internet data. The team used these models to better understand human movement, which helped in training WHIRL.
A robot can complete tasks in its natural environment using WHIRL. Nothing was altered or manipulated to accommodate the robot during tests with appliances, doors, drawers, lids, chairs, and a garbage bag.
Deployment and scalability
Using WHIRL, robots can learn things out in the real world. They won’t have to learn everything before they are deployed.
“To scale robotics in the wild, the data must be reliable and stable, and the robots must improve in their environment by practicing on their own,” Pathak explained.
How robots currently learn movement
Imitation or reinforcement learning is currently used to teach a robot a task. Humans manually operate a robot to teach it how to complete a task in imitation learning. Before the robot learns, this process must be repeated several times for each job. In reinforcement learning, the robot is typically trained in simulation on millions of examples before being asked to adapt that training to the real world.
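The contrast between the two approaches can be sketched on a toy task. In the following (an illustrative Python sketch, not any real robotics codebase), a gripper must reach a target position on a line: the imitation policy simply copies what a human demonstrated for each state, while the reinforcement-learning policy must discover the same behavior through thousands of trial-and-error episodes of tabular Q-learning.

```python
import random

TARGET = 5                    # goal position on a 1-D line
STATES = range(0, 11)
ACTIONS = (-1, 0, 1)

def step(pos, action):
    """Move the gripper one unit; reward 1.0 only at the target."""
    new_pos = max(0, min(10, pos + action))
    return new_pos, (1.0 if new_pos == TARGET else 0.0)

# --- Imitation learning: one pass over human demonstrations ---
# Each demo records the action a human took in each state.
demos = {pos: (1 if pos < TARGET else (-1 if pos > TARGET else 0))
         for pos in STATES}

def imitation_policy(pos):
    return demos[pos]         # simply copy the demonstrated action

# --- Reinforcement learning: thousands of trial-and-error episodes ---
def train_q_learning(episodes=2000, alpha=0.5, gamma=0.9, eps=0.2):
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        pos = random.choice(list(STATES))
        for _ in range(20):
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda x: q[(pos, x)]))
            new_pos, r = step(pos, a)
            best_next = max(q[(new_pos, x)] for x in ACTIONS)
            q[(pos, a)] += alpha * (r + gamma * best_next - q[(pos, a)])
            pos = new_pos
    return lambda p: max(ACTIONS, key=lambda x: q[(p, x)])

def final_position(policy, start=0, steps=20):
    pos = start
    for _ in range(steps):
        pos, _ = step(pos, policy(pos))
    return pos
```

Both policies end up reaching the target, but the imitation policy needed only one demonstration per state, while Q-learning needed thousands of practice episodes, which is exactly the sample-efficiency gap described above.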
This is a very cumbersome way to learn. While AI is well suited to learning intelligent processes quickly, it has historically struggled to learn motion. This is explained by a theory called Moravec's paradox.
Nick Saraev, a cutting-edge AI and tech writer, provides a simple description of Moravec's paradox on his blog. He explains why computers and robots can outperform artists and mathematicians, while learning to make something as simple as a sandwich remains a big challenge for AI.
Hans Moravec, a computer scientist and roboticist, came up with an interesting paradox at the end of the 1980s. Moravec's theory was that computers find it easy to do things humans find hard (like abstract reasoning), yet find it hard to do the perception and mobility tasks that humans find easy.
Comparing people and AI, Saraev explains that people are good at seeing and moving around but not much else. When you look at human abilities from the point of view of evolution, he states, survival was a much more significant driver of how our brains changed than logic, reasoning, or making art.
Moving around, like walking, running, grabbing things, and keeping our balance, is so easy for us that we don't even think about it; it happens automatically. These abilities have been refined over millions of years, and you could say that people were literally built to move.
The challenge robots face in making sandwiches
In his article, Saraev explains the challenge a robot faces in making a sandwich. Here are some of the things a robot has to think about to make a ham and cheese sandwich:
- How far away is the fridge from my body?
- How hard should I expect the door to be when I try to open it?
- How far can I lean over without putting my center of mass in danger?
- How much does the thing I’m reaching for weigh, roughly?
- How much force do I need to use to ensure it doesn’t slip when I grab it?
- What kinds of things are in there? How will my movement affect them? Will I crush them if I move too quickly or forcefully?
- How hard should I put this on the counter?
…and that’s before the robot can start putting the sandwich together!
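Even one item on that list hides real computation. As a purely illustrative example (the physics model and numbers are my assumptions, not from the article), sizing the grip force so an object doesn't slip comes down to Coulomb friction: the friction force, the coefficient of friction times the gripper's normal force, must exceed the object's weight.

```python
# Illustrative sketch: minimum grip force so a grasped object won't slip.
# Assumes simple Coulomb friction: mu * F_grip >= m * g, plus a safety margin.
G = 9.81  # gravitational acceleration, m/s^2

def min_grip_force(mass_kg, friction_coeff, safety_factor=2.0):
    """Smallest normal force (in newtons) the gripper must apply."""
    return safety_factor * mass_kg * G / friction_coeff

# A 0.3 kg block of cheese with a rubber-on-plastic friction coefficient
# of roughly 0.5 needs about 12 N of grip force with a 2x safety margin.
force = min_grip_force(0.3, 0.5)
```

A human hand solves this estimate instantly and unconsciously; a robot has to compute it, and dozens of estimates like it, for every grasp.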
How does WHIRL software help robots learn?
The WHIRL software created by Bahl and his team can be used by attaching a camera to a commercially available robot and installing the software on it.
WHIRL is a fast, one-shot visual imitation algorithm. A robot running it has shown that it can learn very quickly, teaching itself by watching videos of people doing housework and learning from its own mistakes.
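The observe-then-practice loop this describes can be sketched in a few lines. The sketch below is a loose illustration under my own assumptions (the function names and the hill-climbing strategy are not WHIRL's actual API or algorithm): the robot starts from the trajectory seen in the human video, repeatedly perturbs it, and keeps whichever attempt best reproduces the demonstrated outcome.

```python
import random

def refine_by_imitation(human_waypoints, execute, score, attempts=200, noise=0.2):
    """Start from the trajectory observed in a human video, then practice:
    perturb the waypoints and keep whichever variant scores best against
    the demonstrated outcome. Names are illustrative, not WHIRL's API."""
    best = list(human_waypoints)
    best_score = score(execute(best))
    for _ in range(attempts):
        candidate = [w + random.gauss(0.0, noise) for w in best]
        s = score(execute(candidate))
        if s > best_score:            # keep only improvements
            best, best_score = candidate, s
    return best

# Toy setup: the robot's arm systematically overshoots each waypoint by
# 0.5 (a stand-in for the human-robot morphology gap), and success is
# negative squared error against the outcome seen in the human video.
human = [1.0, 2.0, 3.0]
execute = lambda traj: [w + 0.5 for w in traj]
score = lambda outcome: -sum((a - b) ** 2 for a, b in zip(outcome, human))

random.seed(0)
refined = refine_by_imitation(human, execute, score)
```

The point of the toy is the one-shot structure: a single human demonstration seeds the search, and practice only has to close the gap between human and robot bodies rather than discover the task from scratch.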
Household tasks are a great training ground for robots using WHIRL
Robots using the WHIRL software are well-suited to learning household chores because they can learn directly from human-interaction videos and generalize that information to new tasks.
People at home are constantly performing a variety of tasks. A robot can use WHIRL to observe those tasks and collect the video data required to eventually determine how to complete the job.
In tests done so far using WHIRL software with a commercially available robot, the robot learned to do more than 20 tasks, including opening and closing appliances, cabinet doors, and drawers, as well as putting a lid on a pot, pushing in a chair, and even taking a garbage bag out of the bin. Each time, the robot observed a human perform the task once before practicing and learning to complete it independently.
WHIRL helps robots become excellent mimics
WHIRL's creator, Shikhar Bahl, opened a refrigerator door, and the robot watched. It recorded his movements, the door's swing, the location of the fridge, and other information, analyzing it all and preparing to mimic what Bahl had done.
The robot initially failed, missing the handle entirely at times, grabbing it in the wrong place, or pulling it incorrectly. However, after a few hours of practice, the robot could open the door.
What’s unique about WHIRL?
Rather than waiting for robots to be programmed or trained to complete various tasks before they are deployed into people's homes, this technology opens up the potential to deploy robots first and have them learn how to complete tasks on site, adapting to their environments and improving simply by watching.
WHIRL is capable of learning from any video of a human performing a task. It is easily scalable, is not limited to a single task, and can function in realistic home environments.
Robots can imitate humans by watching YouTube and Flickr videos
Although getting robots to be able to imitate people is still an unresolved challenge in the industry, “having robots learn directly from people” is an essential step toward making that capability a reality.
The designers of WHIRL are modifying their software so that robots can soon consume information and learn how to mimic people’s actions from YouTube and Flickr videos.
One wonders, with this latest enhancement, will robots self-teach to surf and learn anything they want from YouTube, or will the human still be in control of what the robot learns?
Be careful what you let your WHIRL robot tune into. One day, you might be in the kitchen, and your robot might imitate the dance moves or stunts of your favorite YouTube and Flickr influencer.
Robots that imitate humans will be bigger than the car industry, predicts Musk
In robotics, software like WHIRL that can teach robots to learn from observing through cameras is essential.
Elon Musk, the CEO of Tesla, says that Optimus, the company's robot, will be worth more than its car business. Optimus is a general-purpose robot that can move around with the help of many cameras, sensors, and software that lets it navigate on its own.
WHIRL is just the beginning of many more powerful software systems and algorithms that will likely power robots in the future. AI without robotic movement is already doing impressive things, such as writing scientific papers about AI. It is both exciting and scary to think what AI-powered robots will do once they can mimic our physical actions too.