Bibliography (4):

  1. https://www.lesswrong.com/posts/oBpebs5j5ngs3EXr5/a-summary-of-anthropic-s-first-paper-3

  2. Evaluating Large Language Models Trained on Code

  3. https://gist.github.com/jareddk/2509330f8ef3d787fc5aaac67aab5f11

  4. HellaSwag: Can a Machine Really Finish Your Sentence?