“Goodbooks-10k: a New Dataset for Book Recommendations”, 2017-11-29:
There have been a few recommendation datasets for movies (Netflix, MovieLens) and music (Million Songs), but not for books. That is, until now. The dataset contains six million ratings for the ten thousand most popular books (those with the most ratings). There are also:
books marked to read by the users
book metadata (author, year, etc.)
tags/shelves/genres
As to the source, let’s say that these ratings come from a site similar to goodreads.com, but with more permissive terms of use. There are a few types of data here:
explicit ratings
implicit feedback indicators (books marked to read)
tabular data (book info)
tags
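To make the difference between the first two data types concrete, here is a minimal sketch of how explicit ratings and implicit to-read marks might be held in memory. The column names (user_id, book_id, rating) and the inline rows are assumptions made up for illustration; they are not taken from the dataset, and the actual files may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; NOT taken from the dataset.
# The column names (user_id, book_id, rating) are assumptions.
RATINGS_CSV = """user_id,book_id,rating
1,10,5
1,20,3
2,10,4
"""

TO_READ_CSV = """user_id,book_id
1,30
2,20
"""


def load_explicit(text):
    """Explicit feedback: map (user_id, book_id) to an integer rating."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        (int(row["user_id"]), int(row["book_id"])): int(row["rating"])
        for row in reader
    }


def load_implicit(text):
    """Implicit feedback: map user_id to the set of books marked to read."""
    marked = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        marked[int(row["user_id"])].add(int(row["book_id"]))
    return dict(marked)


explicit = load_explicit(RATINGS_CSV)
implicit = load_implicit(TO_READ_CSV)
```

An explicit rating carries a value the user chose, while a to-read mark only records that the user showed interest, so the sketch stores the former as a number per (user, book) pair and the latter as plain set membership.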
…All files are available on GitHub. Some of them are quite large, so GitHub won’t show their contents online. See samples for smaller CSV snippets. You can download individual zipped files from releases.