
Cloud ML Computing Part 1: Google Colab


Cloud Computing

As machine learning continues to expand into robotics, the need for efficient and accessible tools has never been greater. In our latest development, we’ve created a Google Colab Notebook designed to streamline the process of training robotic models using cloud-based GPUs, enabling users to harness the power of top-tier hardware without the cost or complexity of maintaining it themselves. Whether you're working with our Aloha Kits or any other dataset, this notebook makes it incredibly easy to train models in the cloud—eliminating the need for expensive local resources.



Why Colab and GPUs?


Training machine learning models can be computationally demanding, especially for robotics tasks like imitation learning, where large amounts of data and high-performance hardware are required. GPUs can dramatically speed up training. Google Colab provides an accessible way to use powerful GPUs like the A100 and T4, making it an excellent option for those without high-end local hardware. In our tests, using an A100 GPU, we trained the Action Chunking Transformer (ACT) model for a peg insertion task with 50 episodes and 80,000 steps in just 5 hours, consuming 70 compute units. This notebook is designed to let you tap into that power with minimal setup, making model training faster and easier than ever.


Key Features of the Colab Notebook


  1. Seamless Setup: The notebook guides you through selecting the appropriate GPU, downloading datasets from Hugging Face, and configuring training parameters. It even supports resuming training from a checkpoint, ensuring you don't lose progress.

  2. Access to Datasets: By linking directly to the Trossen Robotics Community on Hugging Face, the notebook makes it simple to choose from various datasets. You can also use your data if preferred—paste the repository ID into the provided cell, and you’re ready to go.

  3. Flexible Training Parameters: We’ve built in customization options, allowing users to adjust batch sizes, learning rates, and training steps through a YAML file. For instance, the batch size used for our ACT model was set to 8, ensuring efficient processing of each training step while still maintaining high performance.

  4. GPU Utilization: The notebook ensures smooth execution on powerful GPUs by monitoring the initial training epochs and confirming sufficient compute units. This avoids interruptions, ensuring long training sessions proceed without hitches. For our peg insertion task, utilizing 70 compute units over a 5-hour training period was optimal, and the model achieved solid results, demonstrating how leveraging cloud GPUs can significantly boost training efficiency.
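As a rough sketch of what editing those training parameters looks like, the snippet below builds a small config and serializes it to YAML-style text. The key names here are illustrative placeholders, not the notebook's actual schema:

```python
# Hypothetical training parameters; key names are illustrative only,
# not the notebook's actual YAML schema.
config = {
    "batch_size": 8,        # batch size used for our ACT peg-insertion run
    "learning_rate": 1e-5,  # placeholder value
    "training_steps": 80_000,
}

# Serialize to simple YAML-style text with no external dependencies.
yaml_text = "\n".join(f"{key}: {value}" for key, value in config.items())
print(yaml_text)
```

In the notebook itself you would edit the provided YAML file directly; the point is simply that each run is fully described by a handful of scalar parameters.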


Why This Matters for Robotics


Robotic tasks like fine manipulation require models to be trained efficiently and quickly, which can be difficult without the proper hardware. Our notebook bridges this gap by allowing users to harness the power of cloud-based GPUs, making training accessible to a much broader audience. Whether you're a researcher, developer, or enthusiast, you can now train advanced models like ACT on your tasks, improving your robot's performance without needing to worry about expensive hardware.


How to Use the LeRobot Colab Notebook


  1. Start with the Basics: After launching the notebook, select your GPU type (A100, T4) and ensure that you have enough compute units for the session. If you’re resuming training, make sure your checkpoint file is available.

  2. Log into Hugging Face: Simply enter your Hugging Face token to log in and access datasets from the Trossen Robotics Community or any dataset you choose to work with.

  3. Set Training Parameters: Adjust training parameters like batch size and learning rate by editing the provided YAML configuration file. For example, we used 50 episodes, a batch size of 8, and 80,000 training steps when training our ACT model on the peg insertion task.

  4. Upload or Download Results: Once training is complete, you can store the model on Hugging Face for easy access and sharing or download the outputs locally to safeguard your data.
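For the "download locally" path, one simple approach is to zip the run directory so it can be grabbed from Colab's file browser in a single click. This is a minimal sketch; the directory layout and file names below are hypothetical, not the notebook's actual output structure:

```python
import pathlib
import shutil
import tempfile

# Hypothetical output layout; the notebook's actual directory and
# checkpoint names may differ.
workdir = pathlib.Path(tempfile.mkdtemp())
outputs = workdir / "outputs" / "train"
outputs.mkdir(parents=True)
(outputs / "checkpoint_080000.safetensors").write_bytes(b"\x00" * 16)  # stand-in weights

# Zip the whole run directory into a single downloadable archive.
archive = shutil.make_archive(str(workdir / "act_peg_insertion"), "zip", outputs)
print(pathlib.Path(archive).name)
```

Uploading to the Hugging Face Hub instead keeps the model shareable by repository ID, which is handy if you plan to resume training from a checkpoint later.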


Training Results and Performance


In our trials, the ACT model was trained for a peg insertion task, which involved 50 episodes with a batch size of 8, running for 80,000 steps. Using an A100 GPU, the training completed in just 5 hours, utilizing 70 compute units. This demonstrates the efficiency and performance improvements that cloud-based GPUs can offer when training complex models, reducing training times by more than half compared to local machines.

By the end of the training, the ACT model showed remarkable performance in fine manipulation tasks, such as peg insertion, highlighting the capabilities of low-cost, cloud-based model training for robotics. Using the Colab notebook ensures you’re not limited by your local hardware, opening up the possibility for more complex and computationally intensive tasks.


Estimating Compute Units in Real-Time


Once you’ve connected to a GPU or CPU and started training, you can easily estimate the number of compute units required for your entire session. By opening the Resources tab (as shown in the image), you’ll see the approximate usage rate, which indicates how many compute units are consumed per hour.

You can also estimate the total time required for training by timing how long it takes to complete the first few steps. For example, if it takes 20 seconds to complete 100 steps, and your full training session includes 80,000 steps, you can calculate the total training time using the following formula:

Total Training Time = (Time for Sample Steps / Sample Steps) × Total Steps

Using the above example:

Total Training Time = (20 s / 100 steps) × 80,000 steps = 16,000 s ≈ 4.44 hours

With this training time and the compute unit usage rate (e.g., 15.9 compute units per hour as shown in the image), you can now estimate the total compute units required:

Total Compute Units = Usage Rate (units/hour) × Total Training Time (hours)

For example, with a usage rate of 15.9 units per hour and an estimated training time of 4.44 hours:

Total Compute Units ≈ 15.9 units/hour × 4.44 hours ≈ 70.6 units

Always remember to add a buffer of extra compute units to accommodate any variability in the training process. This method gives you a quick and practical way to estimate the resources you'll need for your entire session before proceeding with the full training.
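The whole estimate, buffer included, is a few lines of arithmetic. Here is a small sketch using the numbers from the example above (the function name and the 10% buffer are our own choices, not part of the notebook):

```python
def estimate_compute_units(seconds_for_sample, sample_steps, total_steps,
                           units_per_hour, buffer_fraction=0.10):
    """Estimate total training time (hours) and compute units for a session.

    Times the first `sample_steps` steps, extrapolates to `total_steps`,
    and adds a safety buffer on top of the compute-unit estimate.
    """
    seconds_per_step = seconds_for_sample / sample_steps
    total_hours = seconds_per_step * total_steps / 3600
    units_with_buffer = units_per_hour * total_hours * (1 + buffer_fraction)
    return total_hours, units_with_buffer

# Values from the example above: 20 s for 100 steps, 80,000 total steps,
# and a usage rate of 15.9 compute units per hour.
hours, units = estimate_compute_units(20, 100, 80_000, 15.9)
print(f"{hours:.2f} hours, ~{units:.1f} compute units incl. 10% buffer")
```

Running the first hundred steps before committing to a full session costs almost nothing and tells you whether your remaining compute units will cover the run.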



Get Started Today!




Ready to simplify your robotic model training? Check out our LeRobot Colab Notebook today and experience how easy and efficient cloud-based training can be. Whether you're a beginner or an expert, this tool is designed to accelerate your workflows and get you closer to deploying powerful models in the real world.


For more details and a step-by-step walk-through, you can watch our video tutorial.

Happy experimenting!


