“Beyond the Pass Mark: the Accuracy of ChatGPT and Bing in the National Medical Licensure Examination in Japan”, 2023-03-10:
We evaluated the accuracy of ChatGPT, which was released in late 2022, and Bing, which was released in February 2023, in solving clinical questions.
As a measurement tool, we used the questions of the National Medical Licensure Examination in Japan [written in Japanese].
Bing is accurate enough to pass the National Medical Licensure Examination in Japan. However, Bing has an error rate of about 20% [78% right total; ChatGPT: 38%], and users could not determine whether an answer was correct from its wording alone.
Bing cannot be used for clinical decision-making in its current form.
[Keywords: Bing, ChatGPT, clinical problem solving, evidence based medicine, large language models]
…The accuracy of ChatGPT was lower than in prior studies using the United States Medical Licensing Examination [3, 4]. The limited amount of Japanese-language data may have affected the ability of ChatGPT to correctly answer medical questions in Japanese [8]. This is an important point to consider when applying LLMs to other languages.