"ACT: Learning Fine-Grained Bimanual Manipulation With Low-Cost Hardware", 2023-04-23:
Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback.
Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up.
Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface.
Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary.
To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences [but not the environment or rewards, so not a Decision Transformer]…the model has around 80M parameters, and we train from scratch for each task. The training takes around 5 hours on a single 11 GB Nvidia RTX 2080 Ti GPU, and the inference time is around 0.01 seconds on the same machine.
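The basic control loop implied by "action chunking" can be sketched in Python. This is a minimal illustration under stated assumptions, not ACT's implementation. It assumes the policy maps the current observation to the next `chunk_size` actions, and that those actions are executed before the policy is queried again. `run_chunked_policy`, `toy_policy`, and the use of the timestep as a stand-in for camera and joint observations are all hypothetical. The real model is the ~80M-parameter transformer described above, and any smoothing across overlapping chunks it may use is not shown.

```python
import math
from typing import Callable, List, Sequence, Tuple

Action = List[float]  # hypothetical: one action = a vector of joint targets


def run_chunked_policy(
    policy: Callable[[int], Sequence[Action]],  # hypothetical: observation (here, timestep) -> next k actions
    horizon: int,
    chunk_size: int,
) -> Tuple[List[Action], int]:
    """Execute a policy that predicts chunks of actions, re-querying only after each chunk."""
    executed: List[Action] = []
    queries = 0
    t = 0
    while t < horizon:
        chunk = policy(t)  # one forward pass yields several future actions
        queries += 1
        if len(chunk) < chunk_size:
            raise ValueError("policy returned fewer actions than chunk_size")
        # Run the chunk open-loop, truncating it at the end of the episode.
        for action in chunk[: min(chunk_size, horizon - t)]:
            executed.append(list(action))
            t += 1
    return executed, queries


def toy_policy(t: int) -> List[Action]:
    # Hypothetical stand-in for the learned model: emits 4 one-dimensional actions.
    return [[float(t + i)] for i in range(4)]


if __name__ == "__main__":
    actions, n_queries = run_chunked_policy(toy_policy, horizon=10, chunk_size=4)
    # With chunk size k, the policy is queried ceil(T / k) times instead of T times.
    print(n_queries, math.ceil(10 / 4))
```

The point of the sketch is the query count. With chunk size k, a T-step episode requires only about T/k policy decisions. That shortens the effective horizon over which a single-step imitation policy's errors could compound, which is the motivation stated in the abstract.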
ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery with 80–90% success, with only 10 minutes' worth of demonstrations.
Project website: ALOHA.