Which books on GoodReads are most difficult to finish? Estimating proportions in December 2019 gives an entirely different result than absolute counts.
What books are hardest for a reader who starts them to finish, and most likely to be abandoned? I scrape a crowdsourced tag, abandoned, from the GoodReads book social network on 2019-12-09 to estimate conditional probability of being abandoned.
The default GoodReads tag interface presents only raw counts of tags, not counts divided by total ratings (=reads). This conflates popularity with probability of being abandoned: a popular but rarely-abandoned book may have more abandoned tags than a less popular but often-abandoned book. There is also residual error from the winner’s curse where books with fewer ratings are more mis-estimated than popular books. I fix that to see what more correct rankings look like.
Correcting for both changes the top-5 ranking completely, from (raw counts):
I also consider a model adjusting for covariates (author/average-rating/year), to see what books are most surprisingly often-abandoned given their pedigrees & rating etc. Abandon rates increase the newer a book is, and the lower the average rating.
Books at the top of the adjusted list appear to reflect a mix of highly-popular authors changing genres, and ‘prestige’ books which are highly-rated but a slog to read.
These results are interesting for how they highlight how people read books for many reasons (such as marketing campaigns, literary prestige, or following a popular author), and this is reflected in their decision whether to continue reading or to abandon a book.