“Goodreads Scraper”, 2020-01-11:
These Python scripts can be used to collect book reviews and metadata from Goodreads.
We were motivated to develop this Goodreads Scraper because the Goodreads API is difficult to work with and does not provide access to the full text of reviews. The Goodreads Scraper instead uses the web scraping libraries Beautiful Soup and Selenium to collect data.
We used this Goodreads Scraper to collect data for our article, “The Goodreads ‘Classics’: A Computational Study of Readers, Amazon, and Crowdsourced Literary Criticism”. To allow others to reproduce (approximately) the data we used in the essay, we include a file with 144 Goodreads book IDs for the 144 classics that we analyzed (goodreads_classics.txt). You can use these IDs to collect corresponding reviews and metadata with the Goodreads Scraper as described below.
Note: Updates to the Goodreads website may break this code. We don’t guarantee that the scraper will continue to work in the future, but feel free to post an issue if you run into a problem. …
get_books.py: You can use the Python script get_books.py to collect metadata about books on Goodreads, such as the total number of Goodreads reviews and ratings, average Goodreads rating, and most common Goodreads “shelves” for each book. This script takes as input a list of book IDs, stored in a plain text file with one book ID per line. Book IDs are unique to Goodreads and can be found at the end of a book’s URL. For example, the book ID for Little Women is 1934.Little_Women.
get_reviews.py: You can use the Python script get_reviews.py to collect reviews and review metadata about books on Goodreads, including the text of the review, star rating, username of the reviewer, number of likes, and categories or “shelves” that the user has tagged for the book.
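The input file that get_books.py expects (one book ID per line, each taken from the end of a book's URL) could be prepared with a small helper like the one below. This is only a sketch and not part of the Goodreads Scraper itself: the function names book_id_from_url and write_book_ids are hypothetical, and the URL pattern https://www.goodreads.com/book/show/<ID> is an assumption about how Goodreads book pages are addressed.

```python
from urllib.parse import urlparse


def book_id_from_url(url: str) -> str:
    """Return the last path segment of a book URL, e.g. '1934.Little_Women'.

    Hypothetical helper: assumes the book ID is the final path component,
    as described above. Query strings and trailing slashes are ignored.
    """
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def write_book_ids(urls, out_path):
    """Write one book ID per line, skipping blank lines and duplicates.

    Returns the list of IDs written, in first-seen order.
    """
    seen = set()
    ids = []
    for url in urls:
        url = url.strip()
        if not url:
            continue
        book_id = book_id_from_url(url)
        if book_id and book_id not in seen:
            seen.add(book_id)
            ids.append(book_id)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("".join(f"{book_id}\n" for book_id in ids))
    return ids


if __name__ == "__main__":
    # Assumed URL form; only the trailing segment matters to the helper.
    example_urls = ["https://www.goodreads.com/book/show/1934.Little_Women"]
    print(write_book_ids(example_urls, "my_book_ids.txt"))
```

The resulting text file has the same one-ID-per-line layout as goodreads_classics.txt, so it should be usable as the book ID list described above; the exact command-line invocation of the scripts is not shown here.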