“Do Large Language Models Understand Chemistry? A Conversation With ChatGPT”, Cayque Monteiro Castro Nascimento, André Silva Pimentel, 2023-03-16:

Large language models (LLMs), exemplified by ChatGPT, have promised a revolution in answering complex questions. Their application in chemistry is still in its infancy. This viewpoint addresses the question of how well ChatGPT understands chemistry by posing 5 simple tasks in different subareas of chemistry.

…To illustrate the underlying issues, we focus our discussion on specific tasks to which LLMs might be applied in chemistry, using the OpenAI ChatGPT with the InstructGPT model, text-davinci-003, which has knowledge of chemistry equations and common calculations.27 However, the outcomes might not be relevant to other LLMs described elsewhere. The control parameters used for the predictions made in this viewpoint are as follows. Temperature is one of the most important settings for controlling the output of the GPT-3 engine: it controls the randomness of the generated text.28 A value of 0 makes the engine deterministic, meaning that it always generates the same output for a given input text; a small value such as 0.1 keeps the output close to deterministic. The model can generate a maximum of 256 tokens (1 token is around 4 characters).29 The “top p” parameter, which controls how much of the probability mass of candidate words or phrases the language model considers when predicting a sentence, was left at its standard value of 1, so no candidates are excluded. The frequency penalty, which lowers the chances of a word being selected again, was set to 0, and the presence penalty, which encourages the model to make novel predictions, was also set to 0; at 0, neither penalty has any effect.
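The effect of these control parameters can be illustrated with a small sketch. It is not the OpenAI implementation: the function names and the toy logits are hypothetical, and the penalty formula follows the description in OpenAI's documentation (subtract the token count times the frequency penalty, plus the presence penalty if the token has appeared at all). Temperature is left as an argument because the passage mentions both 0 and 0.1.

```python
import math

# Settings reported in the text. Temperature is omitted on purpose: the
# passage mentions both 0 and 0.1, so it is passed explicitly below.
REPORTED_SETTINGS = {
    "model": "text-davinci-003",
    "max_tokens": 256,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def penalize(logits, counts, frequency_penalty, presence_penalty):
    """Lower the logit of tokens that already appeared in the output."""
    return {
        tok: logit
        - counts.get(tok, 0) * frequency_penalty
        - (1.0 if counts.get(tok, 0) > 0 else 0.0) * presence_penalty
        for tok, logit in logits.items()
    }


def token_probabilities(logits, temperature):
    """Softmax over logits / temperature; temperature 0 means greedy choice."""
    if temperature == 0:
        best = max(logits, key=logits.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in logits}
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs, top_p):
    """Keep the likeliest tokens until their cumulative probability reaches top_p."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())  # renormalize the surviving tokens
    return {tok: p / total for tok, p in kept.items()}


if __name__ == "__main__":
    toy_logits = {"C": 2.0, "O": 1.0, "N": 0.0}  # hypothetical next-token scores
    already_generated = {"C": 2}                 # "C" has appeared twice so far
    adjusted = penalize(
        toy_logits,
        already_generated,
        REPORTED_SETTINGS["frequency_penalty"],
        REPORTED_SETTINGS["presence_penalty"],
    )
    for temperature in (0, 0.1, 1.0):
        probs = token_probabilities(adjusted, temperature)
        print(temperature, top_p_filter(probs, REPORTED_SETTINGS["top_p"]))
```

With both penalties at 0 the logits pass through unchanged, and with top p at 1 every token with nonzero probability stays in play, so temperature is the only setting in this configuration that shapes the distribution. Lowering it from 1.0 to 0.1 concentrates nearly all probability on the top-scoring token, and 0 selects it outright.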

  1. Convert a Compound Name into the SMILES Chemical Representation and Vice Versa

  2. Finding Information on Octanol-Water Partition Coefficients of Chemical Compounds

  3. Getting Structural Information on Coordination Compounds

  4. Water Solubility of Polymers

  5. Molecular Point Groups

…across the 5 tasks presented here, the accuracy in answering the questions was between 25% and 100% without any tricks. Whether the accuracy is low or high depends on several important considerations: reasonable prompts should give correct answers; questions on popular subjects are easily answered; very specific topics that are not well represented in a database, or not well learned by the model, give low accuracy; and the development of better prompts, or of better strategies for training and fitting this knowledge into models, might yield better results.13

In this viewpoint, we attempted to mimic a regular student prompting the ChatGPT model to answer questions on chemistry subjects without using any tricks such as inserting copyright notices in source files or fine-tuning with human feedback.