“No Robots: Look Ma, an Instruction Dataset That Wasn’t Generated by GPTs!”, HuggingFace:

No Robots is a high-quality dataset of 10,000 instructions and demonstrations created by skilled human annotators. This data can be used for supervised fine-tuning (SFT) to make language models follow instructions better.

No Robots was modelled after the instruction dataset described in OpenAI’s InstructGPT paper, and consists mostly of single-turn instructions across the following categories:

Category     Count
Generation   4,560
Open QA      1,240
Brainstorm   1,120
Chat           850
Rewrite        660
Summarize      420
Coding         350
Classify       350
Closed QA      260
Extract        190
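
To make the category mix easier to read, the counts in the table above can be converted to shares of the whole dataset. The following is a minimal sketch: the numbers are copied from the table, while the names `counts` and `category_shares` are illustrative and not part of the dataset or any library.

```python
# Category counts copied from the table above.
counts = {
    "Generation": 4560,
    "Open QA": 1240,
    "Brainstorm": 1120,
    "Chat": 850,
    "Rewrite": 660,
    "Summarize": 420,
    "Coding": 350,
    "Classify": 350,
    "Closed QA": 260,
    "Extract": 190,
}


def category_shares(counts):
    """Return each category's fraction of the total (hypothetical helper)."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(counts.values())
shares = category_shares(counts)

# Print categories from largest to smallest with their share of the total.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<12}{counts[name]:>6,}{share:>8.1%}")
```

The counts sum to 10,000, matching the stated dataset size, and Generation alone accounts for 45.6% of the examples. A model fine-tuned on this data will therefore see far more generation tasks than, say, extraction tasks (1.9%).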