Robotic machine learning has been gaining a lot of attention lately. Advancements such as Nvidia's Project GR00T and Stanford's Aloha Project have left the world wondering what the future holds. While the bipedal androids that Tesla, Boston Dynamics, and others have been creating are shiny and awe-inspiring, the research behind how they are engineered and programmed is a lot less flashy. At the heart of robotic machine learning is data-driven intelligence that can advance robotics from basic mimicry to intuitive action on command. It's worth highlighting how researchers and engineers are tackling the challenge of developing and advancing robotic machine-learning models for the future.
Let's categorize robots and machines into two groups: those that imitate human mobility and those that perform tasks beyond human capability. The latter includes machines such as cranes, dump trucks, and space satellites, each designed for a specific application. These machines are not general purpose and have a limited range of intended functions. Machines that mimic human movement, on the other hand, tend to be more versatile, and more complex to design, because general-purpose use means matching humans' ability to master a wide range of skills: building houses, baking bread, swimming, running, sewing, driving, gesturing, exercising, and more.
In the early years of robotic machine learning research, hardware was a major expense. One of the first android-like robots created for research was the PR2 by Willow Garage. It cost an estimated $400,000.00 [1], and only 11 labs in the world could afford it. Fast forward to today: reliable, cost-effective hardware has accelerated the pace at which researchers can develop and iterate. The same can be said for research kits optimized for specific modes of movement, like our Aloha Research Kits, whose hardware is designed around a specific subset of human movement, such as the arms and hands.
Our arms and hands are responsible for a significant portion of the tasks we undertake each day. They are our most versatile tools, arguably second only to speech and communication. Because we use our arms and hands more than almost any other appendage, they are one of the most sought-after subsets of movement for researchers to focus on. A human arm requires six degrees of freedom [2] to position the wrist and orient the palm, which is exactly the number of degrees of freedom in the ViperX and WidowX Robotic Manipulator Arms used in the Aloha Research Kits.
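To make that degrees-of-freedom count concrete, here is a minimal sketch in Python showing how six joint angles of a serial arm map, through chained homogeneous transforms, to a wrist position (three numbers) and a palm orientation (three more). The joint layout and link lengths below are illustrative assumptions and do not describe the actual ViperX or WidowX geometry.

```python
import numpy as np

def rot(axis, theta):
    """4x4 homogeneous rotation about the local x, y, or z axis."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    i, j = {"x": (1, 2), "y": (2, 0), "z": (0, 1)}[axis]
    R[i, i], R[i, j], R[j, i], R[j, j] = c, -s, s, c
    return R

def trans(axis, d):
    """4x4 homogeneous translation along the local x, y, or z axis."""
    T = np.eye(4)
    T[{"x": 0, "y": 1, "z": 2}[axis], 3] = d
    return T

def forward_kinematics(q, upper_arm=0.3, forearm=0.3):
    """Map six joint angles q to a 4x4 wrist/palm pose.

    Illustrative joint layout only (not the ViperX/WidowX geometry):
    base yaw, shoulder pitch, elbow pitch, then wrist roll/pitch/roll.
    """
    T = (rot("z", q[0])                            # base yaw
         @ rot("y", q[1]) @ trans("x", upper_arm)  # shoulder pitch + upper arm
         @ rot("y", q[2]) @ trans("x", forearm)    # elbow pitch + forearm
         @ rot("x", q[3])                          # wrist roll
         @ rot("y", q[4])                          # wrist pitch
         @ rot("x", q[5]))                         # wrist (palm) roll
    return T

pose = forward_kinematics(np.deg2rad([30, 45, -60, 0, 20, 10]))
position = pose[:3, 3]       # 3 numbers fix where the wrist is
orientation = pose[:3, :3]   # 3 more angles fix how the palm faces
print(position, orientation, sep="\n")
```

Any reachable wrist pose is fully specified by those six values, which is why six actuated joints are enough to match the positioning and orienting task the human arm performs.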
The Aloha Research Kits implement a leader-follower arm design: a researcher holds and manipulates the leader arms to teleoperate the follower arms, whose grippers perform the training tasks. As mentioned above, the arms are designed to mimic the joints of a human arm, so the researcher can move and position the arm and gripper naturally. The system records the joint positions and speeds in joint space, continuously logging every step of the training episode for a user-defined duration. Combined with time-encoded video feeds from multiple cameras, this provides the primary inputs for a reliable machine-learning model capable of recreating tasks with a high degree of success.
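Conceptually, each training episode is a time-aligned log of joint states and camera frames captured at a fixed rate. The sketch below shows what that recording loop might look like; read_joint_state() and read_cameras() are hypothetical stand-ins for the kit's actual driver calls, and the HDF5 layout is an assumption rather than the exact format the Aloha software writes.

```python
import time
import numpy as np
import h5py

RATE_HZ = 50            # control/logging rate (assumed)
EPISODE_SECONDS = 20    # user-defined episode length
CAMERAS = ["overhead", "wrist_left", "wrist_right"]

def record_episode(path, read_joint_state, read_cameras):
    """Log follower-arm joint positions/velocities and camera frames
    at a fixed rate for one teleoperated training episode.

    read_joint_state() -> (qpos, qvel) arrays for all follower joints
    read_cameras()     -> dict of camera name -> HxWx3 uint8 image
    (Both are hypothetical callables standing in for real drivers.)
    """
    steps = RATE_HZ * EPISODE_SECONDS
    qpos_log, qvel_log, stamps = [], [], []
    image_log = {name: [] for name in CAMERAS}

    for _ in range(steps):
        t0 = time.time()
        qpos, qvel = read_joint_state()   # joint-space positions and speeds
        frames = read_cameras()           # time-encoded video frames
        stamps.append(t0)
        qpos_log.append(qpos)
        qvel_log.append(qvel)
        for name in CAMERAS:
            image_log[name].append(frames[name])
        time.sleep(max(0.0, 1.0 / RATE_HZ - (time.time() - t0)))

    # Assumed on-disk layout: one HDF5 file per episode.
    with h5py.File(path, "w") as f:
        f.create_dataset("timestamps", data=np.array(stamps))
        f.create_dataset("observations/qpos", data=np.array(qpos_log))
        f.create_dataset("observations/qvel", data=np.array(qvel_log))
        for name in CAMERAS:
            f.create_dataset(f"observations/images/{name}",
                             data=np.array(image_log[name], dtype=np.uint8))
```

Logging at a fixed rate keeps the joint trajectory and the video frames on a common clock, which is what lets the model associate what the cameras saw with how the joints moved at every step.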
To achieve general-purpose functionality, researchers need to train models to perform as many tasks as a human can, multiple times and in multiple ways. Think of the number of ways you could peel a banana depending on where it is positioned relative to you, its size, shape, ripeness, and so on. Training also needs to be distributed over a number of hardware instances to reach the sheer volume of data required; a single machine or individual operator cannot be relied upon for thousands of hours of training. Natural language models like ChatGPT reportedly took approximately 1 million hours of training, drawn from thousands of participants, to achieve general-purpose functionality [3].
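One simple way to capture that kind of variety is to randomize the task setup before every demonstration, so no two episodes start from exactly the same state. A toy sketch, with made-up parameter ranges for a banana-peeling task:

```python
import random

def sample_task_conditions():
    """Randomize the setup of a 'peel the banana' style task before
    each demonstration. Ranges are illustrative, not from any real study."""
    return {
        "object_position_cm": (random.uniform(-15, 15),   # left/right of center
                               random.uniform(20, 40)),   # distance from operator
        "object_yaw_deg": random.uniform(0, 360),
        "object_length_cm": random.uniform(15, 22),
        "ripeness": random.choice(["green", "ripe", "overripe"]),
    }

# Each operator and kit draws fresh conditions per episode, so the pooled
# dataset covers many variations of the same nominal task.
for episode_idx in range(3):
    print(episode_idx, sample_task_conditions())
```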
Researchers can build an extensive pool of training data using multiple low-cost research kits like the Aloha Stationary and Aloha Mobile. The larger the data pool, the more “experience” the model has to draw from, both successes and failures. As with humans, this pool of experience can be passed on from one model to the next, allowing rapid development and iteration.
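In practice, passing experience from one model to the next starts with pooling the episode files from every kit into a single dataset. A rough sketch of that aggregation step, assuming the per-episode HDF5 layout from the recording example above and a hypothetical one-folder-per-kit directory layout:

```python
from pathlib import Path
import h5py

def index_episode_pool(roots):
    """Collect every recorded episode from multiple research kits into
    one flat index, tagging each file with the kit it came from."""
    pool = []
    for kit_name, root in roots.items():
        for path in sorted(Path(root).glob("*.hdf5")):
            with h5py.File(path, "r") as f:
                steps = f["observations/qpos"].shape[0]
            pool.append({"kit": kit_name, "path": str(path), "steps": steps})
    return pool

# Hypothetical directory layout: one folder of episodes per kit.
pool = index_episode_pool({
    "aloha_stationary_01": "data/stationary_01",
    "aloha_mobile_01": "data/mobile_01",
})
print(f"{len(pool)} episodes, {sum(e['steps'] for e in pool)} total steps")
# A policy trained on this pooled data can later serve as the starting
# weights for the next round of fine-tuning instead of training from scratch.
```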
The development of general-purpose robotic machine-learning models is evolving rapidly. In the coming months and years, it will move out of the lab and into the consumer and commercial markets, where it will be applied in the home, the factory, and many other settings. We at Trossen Robotics are excited to be part of the R&D community by providing the hardware that researchers and engineers need to tackle these challenges.
1. “PR2”. Robots Guide. https://robotsguide.com/robots/pr2
2. H. Kim, L. M. Miller, N. Byl, G. M. Abrams and J. Rosen, "Redundancy Resolution of the Human Arm and an Upper Limb Exoskeleton," in IEEE Transactions on Biomedical Engineering, vol. 59, no. 6, pp. 1770-1779, June 2012, doi: 10.1109/TBME.2012.2194489.
3. Jonathan Vanian and Kif Leswing, March 13, 2023. “ChatGPT and generative AI are booming, but the costs can be extraordinary”. CNBC. https://www.cnbc.com/2023/03/13/chatgpt-and-generative-ai-are-booming-but-at-a-very-expensive-price.html