Bibliography (7):

https://github.com/vlievin/medical-reasoning
GPT-3: Language Models are Few-Shot Learners
InstructGPT: Training language models to follow instructions with human feedback
PubMedQA: A Dataset for Biomedical Research Question Answering
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Language Models (Mostly) Know What They Know
Wikipedia Bibliography:
1. https://en.wikipedia.org/wiki/United_States_Medical_Licensing_Examination