“China’s GPT-3? BAAI Introduces Superscale Intelligence Model ‘Wu Dao 1.0’: The Beijing Academy of Artificial Intelligence (BAAI) Releases Wu Dao 1.0, China’s First Large-Scale Pretraining Model.”, 2021-03-23 (; similar):
[Fun note: the corpus uses The Pile.] In a bid to promote the research and development of China’s own large-scale pretraining models and further explore universal intelligence from a more fundamental perspective, the Beijing Academy of Artificial Intelligence (BAAI) recently unveiled Wu Dao 1.0, China’s first homegrown super-scale intelligent model system. The work was led by BAAI Research Academic Vice President and Tsinghua University Professor Tang Jie, with contributions from a team of more than 100 AI scientists from Peking University, Tsinghua University, Renmin University of China, Chinese Academy of Sciences and other institutes.
Wu Dao 1.0 has initiated large-scale research projects via 4 related models: Wu Dao—Wen Yuan, Wu Dao—Wen Lan, Wu Dao—Wen Hui, and Wu Dao—Wen Su.
Wu Dao—Wen Yuan: is China’s largest-ever pretraining language model, boasting the best processing power in mainstream languages, including Chinese and English. It has surpassed average human performance benchmarks on text categorization, sentiment analysis, natural language inference, reading comprehension and more. The Wu Dao—Wen Yuan project is designed to explore universal natural language understanding (NLU) techniques and study brain-inspired language models. It has 2.6 billion parameters and is capable of performing cognitive activities such as memorization, comprehension, retrieval, numerical calculation, multi-language, etc. Wu Dao—Wen Yuan has achieved GPT-3 comparable performance on 20 Chinese NLP tasks such as open-domain answering, grammar correction, sentiment analysis, etc.
…Wen Yuan introduces the open-source Chinese pretraining model (CPM). Based on CPM, the CPM-Distill model reduces language confusion by 38% and achieves better results on downstream tasks.
Wu Dao—Wen Lan: meanwhile, is the first publicly available Chinese universal graphic multimodal pretraining model. The ultra-large-scale multimodal pretraining model aims to break through the theoretical challenges of pretraining multimodal data based on a combination of graphics, text and video, and eventually generate industrial-grade Chinese graphics pretraining models and applications that exceed SOTA performance. Currently, the model has 1 billion parameters and is trained on 50 million graphic pairs collected from open sources. The Wu Dao—Wen Lan model has reached SOTA performance, scoring 5% higher than the champion team on the Image Caption task on the Chinese public multimodal test set AIC-ICC and 20% higher than the most popular UNITER model on the Visual Entailment task.
…Wen Lan is the first Chinese generic multimodal pretraining model that can understand “connotative information” based on weak correlations of images and text. Wen Lan uses an advanced cross-modal contrast learning algorithm: Given an image-text pair, it can enlarge the number of negative samples for each modal, especially for those which are difficult to distinguish, further improving the expression ability of neural networks. It can easily replace image and text encoders with the most advanced single-mode pretraining model, achieving 20× faster performance than the UNITER model.
Wu Dao—Wen Hui: is an ultra-large-scale cognitive-oriented pretraining model that focuses on a series of essential problems in general artificial intelligence from a cognitive perspective, aiming to develop and enhance the logic/consciousness/reasoning-based cognitive capabilities of pretraining models. Wu Dao—Wen Hui has reached 11.3 billion parameters, and through simple fine-tuning can generate poetry, make videos, draw pictures, retrieve text, perform complex reasoning, etc. BAAI says the model achieves near-human performance on poetry generation on the Turing test.
…Wen Hui proposes a new pretraining paradigm, Generative Language Model, breaking the bottlenecks of BERT and GPT. For the first time in history, a single model has achieved the best results in language understanding and generating tasks, and surpassed common pretraining models such as BERT, RoBERTa and T5 that trained on the same volume of data. Wen Hui’s continuous vector based fine-tuning method, P-tuning, is the first autoregressive model that surpasses the autoencoder model in NLU tasks and has achieved SOTA results on more than 10 tasks such as Knowledge Extraction and SuperGLUE Few-shot Learning, with over 20% performance improvement. Wen Hui’s inverse prompting algorithm achieves close to human performance on the task of Q&A and poetry generation, and is the first model that can generate classical Chinese poetry based on modern themes.
Wu Dao—Wen Su: is a large-scale training model for biomolecular structure prediction. It can handle super long biomolecular structures, where it has achieved SOTA performance, interpretability and robustness. Based on Google’s BERT language model, Wu Dao—Wen Su has completed protein training on the 100 GB UNIPARC database and gene training on 5–100,000 human peripheral blood immune cells (25–30 cell types) and 10,000 drug-resistant bacteria.
…Wen Su’s open-sourced FastMoE is the first high-performance MoE (Mixture-of-Experts Model) system that supports the PyTorch framework and a variety of hardware. Only one line of code is required to complete the MoE transformation, and model training speed is increased by 47× compared with the traditional PyTorch implementation.