“ZigMa: Zigzag Mamba Diffusion Model”, 2024-03-20:
The diffusion model has long been plagued by scalability and quadratic complexity issues, especially within transformer-based structures. In this study, we aim to leverage the long sequence modeling capability of the SSM Mamba to extend its applicability to visual data generation.
Firstly, we identify a critical oversight in most current Mamba-based vision methods, namely the lack of consideration for spatial continuity in the scan scheme of Mamba. Secondly, building upon this insight, we introduce a simple, plug-and-play, zero-parameter method named Zigzag Mamba (ZigMa), which outperforms Mamba-based baselines and demonstrates improved speed and memory usage compared to transformer-based baselines.
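To illustrate the spatial-continuity point, here is a minimal sketch that compares a plain row-major (raster) scan of a patch grid with a serpentine zigzag scan. It is an assumption-laden illustration, not the authors' implementation: the function names (`raster_order`, `zigzag_order`, `max_step`) are hypothetical, and the single row-wise serpentine pattern shown here is only one simple way to keep consecutive tokens spatially adjacent.

```python
def raster_order(height, width):
    """Row-major scan: every row is read left to right."""
    return [r * width + c for r in range(height) for c in range(width)]


def zigzag_order(height, width):
    """Serpentine scan: even rows left to right, odd rows right to left.

    Hypothetical illustration of a spatially continuous scan; it adds
    no learnable parameters, only a fixed permutation of token indices.
    """
    order = []
    for r in range(height):
        cols = range(width) if r % 2 == 0 else range(width - 1, -1, -1)
        order.extend(r * width + c for c in cols)
    return order


def max_step(order, width):
    """Largest Manhattan distance between consecutive tokens in a scan."""
    def pos(index):
        return divmod(index, width)  # (row, column) of a flat index

    return max(
        abs(pos(a)[0] - pos(b)[0]) + abs(pos(a)[1] - pos(b)[1])
        for a, b in zip(order, order[1:])
    )


if __name__ == "__main__":
    h, w = 3, 3
    print(raster_order(h, w))               # [0, 1, 2, 3, 4, 5, 6, 7, 8]
    print(zigzag_order(h, w))               # [0, 1, 2, 5, 4, 3, 6, 7, 8]
    print(max_step(raster_order(h, w), w))  # 3: jump from row end to next row start
    print(max_step(zigzag_order(h, w), w))  # 1: every step moves to a neighbor
```

In the raster scan, the token at the end of each row is followed by one at the opposite edge of the grid, so neighbors in the sequence are not neighbors in the image. The serpentine order removes those jumps, and because it is a fixed permutation, the original layout can be restored by applying the inverse permutation after the sequence model.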
Lastly, we integrate Zigzag Mamba with the Stochastic Interpolant framework to investigate the scalability of the model on large-resolution visual datasets, such as FacesHQ 1,024×1,024 and UCF101, MultiModal-CelebA-HQ, and MS COCO 256×256.
Code will be released at https://taohu.me/zigma/.