“No Robots: Look Ma, an Instruction Dataset That Wasn’t Generated by GPTs!”:
No Robots is a high-quality dataset of 10,000 instructions and demonstrations created by skilled human annotators. This data can be used for supervised fine-tuning (SFT) to make language models follow instructions better.
No Robots was modelled after the instruction dataset described in OpenAI’s InstructGPT paper and consists mostly of single-turn instructions across the following categories:
| Category   | Count |
|------------|-------|
| Generation | 4,560 |
| Open QA    | 1,240 |
| Brainstorm | 1,120 |
| Chat       | 850   |
| Rewrite    | 660   |
| Summarize  | 420   |
| Coding     | 350   |
| Classify   | 350   |
| Closed QA  | 260   |
| Extract    | 190   |
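As a quick sanity check on the category counts listed above, the following minimal sketch sums them and computes each category's share of the dataset. The counts are copied from the listing; the names `CATEGORY_COUNTS` and `category_shares` are illustrative and not part of the dataset or any library.

```python
# Category counts copied from the listing above.
# CATEGORY_COUNTS and category_shares are illustrative names, not a dataset API.
CATEGORY_COUNTS = {
    "Generation": 4560,
    "Open QA": 1240,
    "Brainstorm": 1120,
    "Chat": 850,
    "Rewrite": 660,
    "Summarize": 420,
    "Coding": 350,
    "Classify": 350,
    "Closed QA": 260,
    "Extract": 190,
}


def category_shares(counts):
    """Return each category's share of the total as a percentage (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total = sum(CATEGORY_COUNTS.values())
shares = category_shares(CATEGORY_COUNTS)

print(f"total examples: {total}")
for name, pct in shares.items():
    print(f"{name:<10} {pct:5.1f}%")
```

The counts add up to the stated 10,000 examples, and Generation alone accounts for a little under half of them, so a model fine-tuned on this data sees far more open-ended writing prompts than, say, extraction or closed QA tasks.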