“ChinAI #141: The PanGu Origin Story: Notes from an Informative Zhihu Thread on PanGu”, 2021-05-17:
…Crucially, PanGu was a joint effort by researchers from both Huawei and Recurrent AI (循环智能), a provider of AI enterprise services. I was curious about PanGu. A simple search led me to a Zhihu thread titled: “What do you think of the PanGu model released by Huawei on April 25?” Zhihu, known as China’s Quora, is the country’s largest Q&A forum. The initial post linked to an article by Recurrent AI on PanGu. Plus, there were 40 responses to the thread, many of which were very insightful.
Key takeaways from the article linked in the initial Zhihu post: In the article, Recurrent AI claims that PanGu improves on GPT-3 in 3 aspects. The key word here is “claims,” as I wasn’t able to trace many of these points to the results reported in the PanGu paper itself:
First, it supposedly “surpasses GPT-3 in few-shot learning tasks, addressing issues the latter faces in dealing with complex commercial scenarios with few (training data) samples. For example, in scenarios involving customer voice analysis and analysis of employees’ ability to carry out tasks, when the PanGu NLP large model is used to produce semantic analysis, the sample size required to obtain the target result is only one-tenth of that required by GPT-3. That is, AI’s production efficiency can be increased 10×.”
Second, the PanGu team added prompt-based tasks in the pre-training phase, which greatly reduced the difficulty of fine-tuning. There have been difficulties with fine-tuning previous large models for different industry scenarios. One example from the article: “In a scenario about finding more target customers to increase the conversion rate, in which companies use communication content to determine customer purchase intentions, we found that the PanGu model can increase the order conversion rate by 27% compared to GPT-3.”
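To make the “prompt-based tasks” idea concrete, here is a minimal sketch (my own illustration, not PanGu’s actual code) of how a downstream classification task can be recast as text generation by filling a natural-language template, so the same pre-trained language model interface serves many tasks with little task-specific fine-tuning. The template wording and label set are hypothetical.

```python
# Sketch of prompt-based task formatting: intent classification becomes
# a fill-in-the-blank generation problem. Template and labels are
# illustrative, not taken from the PanGu paper.

def build_prompt(dialogue: str, candidate_labels: list) -> str:
    """Recast purchase-intent classification as a prompt the LM completes."""
    options = "/".join(candidate_labels)
    return (
        "Dialogue: " + dialogue + "\n"
        "Question: Does the customer intend to buy? (" + options + ")\n"
        "Answer:"
    )

prompt = build_prompt(
    "I'd like to hear more about the pricing plans.",
    ["yes", "no"],
)
# The language model would then generate "yes" or "no" after "Answer:".
```

Because the task is expressed in the model’s own input space, switching to a new industry scenario mostly means writing a new template rather than retraining a task-specific head.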
I’m not completely sure what Recurrent AI is arguing with the third innovation that PanGu makes on top of GPT-3. They write, “PanGu can recognize intent (of customers?) through few-shot learning, and transform it into queries of knowledge bases and databases, which addresses the issue that, in the past, large models have been difficult to integrate with industry knowledge and data.” My best guess is that they are arguing PanGu can adapt better to industry-specific vocabularies and communications.
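One plausible reading of that claim can be sketched in a few lines: a recognized intent (say, from few-shot classification) is mapped to a structured query against an industry database, rather than having the language model answer from its parameters alone. Every name below is hypothetical; this is just my guess at the architecture, not anything documented in the article.

```python
# Hypothetical intent-to-query routing: the LM's few-shot prediction
# selects a parameterized query template, so domain data lives in the
# database, not in the model. All table and intent names are invented.

INTENT_TO_QUERY = {
    "check_price": "SELECT price FROM products WHERE name = ?",
    "check_stock": "SELECT stock FROM inventory WHERE name = ?",
}

def intent_to_query(intent: str, entity: str):
    """Translate a predicted intent plus an extracted entity into SQL."""
    template = INTENT_TO_QUERY[intent]
    return template, (entity,)

sql, params = intent_to_query("check_price", "enterprise plan")
```

Under this reading, the model only needs enough few-shot examples to pick the right intent; the “industry knowledge” stays in the knowledge base and can be updated without retraining.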
…Right at the beginning of his post, Jin clarifies that Huawei actually released 2 large NLP models at the HDC conference (both named after PanGu). The other one was an encoder-decoder Transformer. Here’s the key point: the training of both 100-billion-parameter-scale models was a collaboration between various Huawei divisions and Peng Cheng Lab (PCL), which provided computing power support…CloudBrain 1 is a large-scale cluster system with 100 Petaflops of computing power, including NVIDIA GPUs, Huawei GPUs, and Cambrian AI chips. A machine of 1,000 Petaflops will probably be built next year, which can be used by universities, research institutes, and SMEs for training models. The goal of 1,000 Petaflops (an exaflop) is generally considered a big milestone for compute over the next few years.
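To give a rough sense of why the jump from 100 Petaflops to an exaflop matters for 100-billion-parameter models, here is a back-of-the-envelope estimate (mine, not from the thread), using the common ~6 × parameters × tokens approximation for total training FLOPs. The token count and hardware utilization are illustrative assumptions.

```python
# Back-of-the-envelope training-time estimate. The 6*N*D FLOPs rule of
# thumb is standard; the 300B-token corpus and 30% utilization figures
# are assumptions chosen for illustration only.

def training_days(params, tokens, flops_per_s, utilization=0.3):
    """Days to train, assuming ~6*N*D total FLOPs at the given utilization."""
    total_flops = 6 * params * tokens
    return total_flops / (flops_per_s * utilization) / 86_400

days_100pf = training_days(100e9, 300e9, 100e15)  # CloudBrain 1 scale
days_1ef = training_days(100e9, 300e9, 1e18)      # planned exaflop machine
```

Under these assumptions the exaflop machine cuts the wall-clock training time by exactly the 10× ratio of the two clusters, which is the practical argument for the next milestone.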
He concludes with his expectation of future trends: “In order to gain more knowledge from pre-training, models such as GPT-3 and PanGu will become larger and larger. After all, we have not seen the limit of the pre-training benefits for large models. At that point, this type of model will have greater infrastructure requirements, and data parallelism and optimization strategies will become more complex…in the future, we need more researchers to devote themselves to research on general intelligence and large-scale distributed computing.”