“ChinAI #141: The PanGu Origin Story: Notes from an Informative Zhihu Thread on PanGu”, Jeffrey Ding, 2021-05-17:

…Crucially, PanGu was a joint effort by researchers from both Huawei and Recurrent AI (循环智能), a provider of AI enterprise services. I was curious about PanGu. A simple search led me to a Zhihu thread titled: “What do you think of the PanGu model released by Huawei on April 25?” Zhihu, known as China’s Quora, is the country’s largest Q&A forum. The initial post linked to an article by Recurrent AI on PanGu. Plus, there were 40 responses to the thread, many of which were very insightful.

Key takeaways from the article linked in the initial Zhihu post: In the article, Recurrent AI claims that PanGu improves on GPT-3 in 3 aspects. The key word here is “claims,” as I wasn’t able to trace many of these points to the results reported in the PanGu article itself:

  1. First, it supposedly “surpasses GPT-3 in few-shot learning tasks, addressing issues the latter faces in dealing with complex commercial scenarios with few (training data) samples. For example, in scenarios involving customer voice analysis and analysis of employees’ ability to carry out tasks, when the PanGu NLP large model is used to produce semantic analysis, the sample size required to obtain the target result is only one-tenth of the GPT-3 model. That is, AI’s production efficiency can be increased 10×.”

  2. Second, the PanGu team added prompt-based tasks in the pre-training phase, which greatly reduced the difficulty of fine-tuning. There have been difficulties with fine-tuning previous large models for different industry scenarios. One example from the article: “In a scenario about finding more target customers to increase the conversion rate, in which companies use communication content to determine customer purchase intentions, we found that the PanGu model can increase the order conversion rate by 27% compared to GPT-3.”

  3. I’m not completely sure what Recurrent AI is arguing on the third innovation that PanGu makes on top of GPT-3. They write, “PanGu can recognize intent (of customers?) through few-shot learning, and transform them into queries of knowledge bases and databases, which addresses the issue that large models are difficult to integrate with industry knowledge and data in the past.” My best guess is that they are arguing PanGu can adapt better to industry-specific vocabularies and communications.