“SAYCam: A Large, Longitudinal Audiovisual Dataset Recorded from the Infant’s Perspective”, 2020-01-14:
We introduce a new resource: the SAYCam corpus.
Infants aged 6–32 months wore a head-mounted camera for ~2 hours per week, over the course of ~2.5 years.
The result is a large, naturalistic, longitudinal dataset of infant-perspective and child-perspective videos. Transcription efforts are underway, with over 200,000 words of naturalistic dialogue already transcribed. In addition, the dataset is searchable by a number of criteria (e.g. age of participant, location, setting, objects present).
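To illustrate what searching by such criteria might look like, here is a minimal sketch in Python using a toy metadata table. The column names (`age_months`, `location`, `objects`) and the `search` helper are hypothetical, not the dataset's actual schema or API.

```python
# Hypothetical sketch: filtering SAYCam-style session metadata with pandas.
# The schema below is illustrative only, not the real dataset's format.
import pandas as pd

# Toy metadata for a few recording sessions
sessions = pd.DataFrame({
    "session_id": ["s01", "s02", "s03", "s04"],
    "age_months": [7, 14, 25, 30],
    "location": ["home", "park", "home", "daycare"],
    "objects": [["ball", "cup"], ["dog"], ["book", "ball"], ["blocks"]],
})

def search(df, min_age=None, max_age=None, location=None, obj=None):
    """Return sessions matching the given criteria (all optional)."""
    mask = pd.Series(True, index=df.index)
    if min_age is not None:
        mask &= df["age_months"] >= min_age
    if max_age is not None:
        mask &= df["age_months"] <= max_age
    if location is not None:
        mask &= df["location"] == location
    if obj is not None:
        mask &= df["objects"].apply(lambda objs: obj in objs)
    return df[mask]

# Sessions recorded at home, with a ball present, participant <= 26 months
hits = search(sessions, max_age=26, location="home", obj="ball")
print(hits["session_id"].tolist())  # → ['s01', 's03']
```

Each criterion narrows a boolean mask, so any combination of filters composes naturally; omitted criteria simply match everything.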
The resulting dataset will be of broad use to psychologists, linguists, and computer scientists.