“Pathways: Asynchronous Distributed Dataflow for ML”, 2022-03-23:
We present the design of a new large-scale orchestration layer for accelerators.
Our system, Pathways, is explicitly designed to enable exploration of new systems and ML research ideas, while retaining state-of-the-art performance for current models. Pathways uses a sharded dataflow graph of asynchronous operators that consume and produce futures, and efficiently gang-schedules heterogeneous parallel computations on thousands of accelerators while coordinating data transfers over their dedicated interconnects. Pathways makes use of a novel asynchronous distributed dataflow design that lets the control plane execute in parallel despite dependencies in the data plane. This design, with careful engineering, allows Pathways to adopt a single-controller model that makes it easier to express complex new parallelism patterns.
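The single-controller, futures-based design described above can be illustrated with a minimal sketch (this is not Pathways' actual API; the `Controller` class and thread-pool backend here are illustrative stand-ins): a controller dispatches operations that return futures immediately, so the control plane keeps issuing work while data-plane dependencies resolve asynchronously on the workers.

```python
# Hypothetical sketch of a single-controller model with asynchronous
# operators that consume and produce futures (not Pathways' real API).
from concurrent.futures import ThreadPoolExecutor


class Controller:
    def __init__(self, n_workers=4):
        # A thread pool stands in for a set of accelerator hosts.
        self.pool = ThreadPoolExecutor(max_workers=n_workers)

    def run(self, fn, *dep_futures):
        # Dispatch returns a future right away; the worker, not the
        # controller, blocks on input futures (the data-plane dependency),
        # so the control plane runs ahead of the data plane.
        def task():
            args = [f.result() for f in dep_futures]
            return fn(*args)
        return self.pool.submit(task)


ctrl = Controller()
a = ctrl.run(lambda: 2)                  # issued immediately
b = ctrl.run(lambda: 3)                  # issued without waiting on a
c = ctrl.run(lambda x, y: x * y, a, b)   # consumes futures, produces one
print(c.result())  # 6
```

Because `run` never blocks, the controller can enqueue an entire sharded dataflow graph up front; ordering is enforced only where futures are actually consumed.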
We demonstrate that Pathways can achieve performance parity (~100% accelerator utilization) with state-of-the-art systems when running SPMD [single-program multiple-data] computations over 2×1024 = 2048 TPUv3s [97% utilization training a 128B T5 model], while also delivering throughput comparable to the SPMD case for Transformer models that are pipelined across 16 stages, or sharded across two islands of accelerators connected over a data center network.