Hacker News submission analysis
Describing HN submissions; estimating manipulability
http://nathanael.hevenet.com/the-best-time-to-post-on-hacker-news-a-comprehensive-answer/
https://minimaxir.com/2014/10/hn-comments-about-comments/
http://karpathy.ca/myblog/2013/11/27/quantifying-hacker-news-with-50-days-of-data/
https://metamarkets.com/2011/hacking-hacker-news-headlines/
https://gkosev.blogspot.com/2012/08/fixing-hacker-news-mathematical-approach.html
https://blog.rjmetrics.com/2012/10/17/surprising-hacker-news-data-analysis/
https://blog.rjmetrics.com/2012/10/24/how-to-get-on-the-front-page-of-hacker-news/
https://blog.datadive.net/which-topics-get-the-upvote-on-hacker-news/
https://porter.io/blog/hackernews-cheaters-catch-me-if-you-can/
https://www.righto.com/2013/11/how-hacker-news-ranking-really-works.html
Page view counts:
11.8k https://web.archive.org/web/20130719064119/http://aberrant.me/front-page-of-hacker-news/
15k http://www.mikedellanoce.com/2012/09/my-first-hacker-news-effect-experience.html
30k, 12k https://shkspr.mobi/blog/2012/11/whats-the-front-page-of-hackernews-worth/
8.5k https://tumbling.alastair.is/post/17661390124/fun-with-analytics-pitting-hacker-news-and
15k http://najafali.com/zero-to-fifteen-thousand-in-twenty-four-hours.html
7k unique visitors https://www.greig.cc/journal/2013/1/what-does-a-hacker-news-traffic-spike-look-like
10k unique visitors https://whoapi.com/blog/554/how-hacker-news-hit-us-with-10-000-unique-visitors-in-10-hours/
40k unique visitors https://news.ycombinator.com/item?id=7427542 “in one 24 hour period, 35,000 visits and 32,000 uniques”
https://medium.com/routific/what-61-points-on-hn-did-for-my-startup-81bd75a39425
https://baremetrics.com/blog/hacker-news-1500-recurring-revenue
5k pageviews, 3.6k, +42 uniques http://blog.remoteworknewsletter.com/2014/10/15/how-to-valdiate-an-idea-on-hacker-news/
41.3k https://ptotrading.blogspot.com/2014/11/a-message-to-all-non-daytraders-and.html
15,223 sessions https://blog.wearewizards.io/the-hacker-news-effect-examined
+315, 17,000 sessions https://hackernewslater.com/posts/post-launch-front-page-hn/
Submissions
Question: is submitting to HN worthwhile?
Simple experiment: each day, submit one link to Gwern.net plus 2 links to other domains. The other links serve as a rough control for that day’s front-page difficulty and for any benefits or penalties applied to my account, and as repayment to HN for the potential spamming.
Per nathanael’s post, I tried to post consistently around 10AM EST (I don’t get up early enough to do 7-8AM EST). Apparently he’s wrong? Oh well.
https://
Sep 2013, started with: “Equoid” (Charles Stross meets My Little Pony) https://
Multi-level Poisson model? Group by domain and cross-group by day. Mixture model? Seems appropriate for two different groups (submissions that make the front page and those that don’t).
Can’t filter Google Analytics by ycombinator.com: HTTPS breaks referrers
HN API:
wget 'https://hn.algolia.com/api/v1/search_by_date?tags=story,author_gwern&hitsPerPage=1000'
How to get all my comments? This doesn’t work:
wget 'https://hn.algolia.com/api/v1/search?tags=author_gwern,(comment)&hitsPerPage=999&page=2'
https://hn.algolia.com/api/v1/search?tags=author_gwern,(comment)&hitsPerPage=1000
{"hits":[],"page":2,"nbHits":0,"nbPages":0,"hitsPerPage":999,"processingTimeMS":1,"message":"you can only fetch the 1000 hits for this query, contact us to increase the limit","query":" ","params":"advancedSyntax=true\u0026analytics=false\u0026hitsPerPage=999\u0026page=2\u0026tags=author_gwern%2C%28comment%29"}
The Algolia API is apparently heavily limited: https://
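One possible workaround for the 1000-hit cap is to walk backwards in time: query `search_by_date`, note the oldest `created_at_i` timestamp returned, and repeat with a `numericFilters` bound below it. The following is a minimal sketch that only builds the URLs; the helper name `comments_url` and the example timestamp are hypothetical, and whether this fully escapes the limit for a given account is untested here.

```r
# Hypothetical helper: build a search_by_date URL for one author's comments,
# optionally restricted to items created before a Unix timestamp.
comments_url <- function(author, before = NULL, hits_per_page = 1000) {
  url <- paste0("https://hn.algolia.com/api/v1/search_by_date",
                "?tags=comment,author_", author,
                "&hitsPerPage=", sprintf("%.0f", hits_per_page))
  if (!is.null(before)) {
    # %3C is the URL-encoded "<"; sprintf avoids scientific notation
    # (paste0 alone would render 1395000000 as "1.395e+09").
    url <- paste0(url, "&numericFilters=created_at_i%3C", sprintf("%.0f", before))
  }
  url
}

first_page <- comments_url("gwern")
next_page  <- comments_url("gwern", before = 1395000000)  # example timestamp only
```

Each fetched batch would supply the `before` value for the next call, stopping when a batch comes back empty.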
New API using Firebase https://
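library(RCurl)
library(rjson)
# Fetch the user record, then every submitted item (stories and comments), pausing 1s between requests
username <- "gwern"
user <- fromJSON(getURL(paste0("https://hacker-news.firebaseio.com/v0/user/", username, ".json")))
# lapply always returns a list, one element per item ID
userAll <- lapply(user$submitted, function(id) { Sys.sleep(1); fromJSON(getURL(paste0("https://hacker-news.firebaseio.com/v0/item/", id, ".json"))) })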
/newest
The social news service Hacker News has a two-layered organization, where newly submitted links are displayed on a ‘/newest’ page seen by few users, and the best (as determined by users voting on each submission) are automatically selected for display on the high-traffic main front-page which most HN users read. I hypothesized that there is a lack of traffic on /newest and this implies that even one vote can substantially affect the chance a particular submission will reach the front-page, its ultimate score, and page-views of the submitted link. A randomized experiment in upvoting small batches of links confirms that the effect is real & large: TODO.
While using HN, as a sort of ‘public service’, I occasionally made sure to visit the newest submissions page rather than just the main front page most people read. After a while, I noticed that the links I upvoted there seemed to be turning up a lot on the front page, more than I would expect from my usual pattern of upvoting perhaps 5 links out of the 30 available. A horrible suspicion struck me: could the apparent arbitrariness of what links made the front page be caused by the /
Methodology
I decided to do a randomized parallel groups experiment to test: On /echo "$((RANDOM % 2 < 1))"
); and make no votes on any other /
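The shell coin flip `echo "$((RANDOM % 2 < 1))"` prints 1 or 0 with equal probability. The same assignment could be done reproducibly in R; this is a minimal sketch assuming one flip per submission, and the function name `assign_arm`, the arm labels, the batch size of 30, and the seed are all hypothetical.

```r
# Hypothetical helper: assign each of n items to "upvote" or "control"
# with probability 1/2, mirroring the shell coin flip.
assign_arm <- function(n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # optional seed so the assignment can be re-derived
  flips <- rbinom(n, size = 1, prob = 0.5)
  ifelse(flips == 1, "upvote", "control")
}

arms <- assign_arm(30, seed = 2014)
table(arms)
```

Recording the seed alongside the submission IDs would let the assignment be audited later.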
A power calculation for sample size is hard to do: I don’t have a Poisson power function handy, and I don’t expect the data to fit a Poisson too well due to the stark contrast between the front-page and /
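Without a closed-form Poisson power function, power can be estimated by simulation instead. The sketch below assumes a two-component mixture (most items stay near +1, a few reach the front page and score much higher) and that an upvote raises the front-page probability; every parameter value and function name here is hypothetical, chosen only for illustration.

```r
# Simulate one arm: each item reaches the front page with probability p_front;
# front-page items get a high Poisson score, the rest a low one (values hypothetical).
simulate_scores <- function(n, p_front, low_mean = 1, high_mean = 50) {
  front <- rbinom(n, size = 1, prob = p_front)
  1 + rpois(n, lambda = ifelse(front == 1, high_mean, low_mean))
}

# Estimated power: the share of simulated experiments in which a Wilcoxon
# rank-sum test rejects at level alpha.
estimate_power <- function(n_per_arm, p_control, p_treated,
                           n_sims = 500, alpha = 0.05) {
  rejections <- replicate(n_sims, {
    control <- simulate_scores(n_per_arm, p_control)
    treated <- simulate_scores(n_per_arm, p_treated)
    # exact = FALSE: count data has many ties, so use the normal approximation
    wilcox.test(treated, control, exact = FALSE)$p.value < alpha
  })
  mean(rejections)
}

set.seed(1)
power_est <- estimate_power(n_per_arm = 100, p_control = 0.05, p_treated = 0.15)
power_est
```

Rerunning `estimate_power` over a grid of `n_per_arm` values gives a rough sample-size curve under these assumptions.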
The experiment ran from 2014-03-16–2014-03-31. On 22 A lot of the sample is just a waste as they are stuck at +1/
Analysis Plan
The analysis strategy is:
a non-parametric test of difference in mean scores
dichotomize the items at a score of 10 (≥10 as a proxy for having made it to the front-page for a meaningful amount of time), for a logistic regression to estimate increase in front-page odds
attempt a Poisson regression on scores, to extract an estimate of the difference in means
something fancier, suggested by the data (such as a mixture model of Poissons, perhaps, to split between front-page and non-front-page)
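The first three steps of this plan can be sketched in base R. The data below is simulated under a hypothetical mixture (the probabilities, means, and sample size are made up), so the outputs say nothing about the real experiment; the column names `upvoted`, `score`, and `frontpage` are likewise assumptions.

```r
set.seed(2)
n <- 200
d <- data.frame(upvoted = rbinom(n, 1, 0.5))
# Hypothetical data-generating process: upvoted items reach the front page more often.
front <- rbinom(n, 1, ifelse(d$upvoted == 1, 0.15, 0.05))
d$score <- 1 + rpois(n, ifelse(front == 1, 50, 1))

# 1. Non-parametric comparison of scores between arms
w <- wilcox.test(score ~ upvoted, data = d, exact = FALSE)

# 2. Logistic regression on the dichotomized score (>= 10 as front-page proxy)
d$frontpage <- as.integer(d$score >= 10)
l <- glm(frontpage ~ upvoted, family = binomial, data = d)

# 3. Poisson regression on raw scores (log link, so the coefficient is a log ratio of means)
p <- glm(score ~ upvoted, family = poisson, data = d)

w$p.value
exp(coef(l)["upvoted"])  # odds ratio for reaching the front page
exp(coef(p)["upvoted"])  # ratio of mean scores, upvoted vs. control
```

Because the Poisson model uses a log link, it yields a ratio of means rather than a difference; a difference can be recovered from the fitted means of the two arms. Heavy overdispersion from the front-page items would also argue for a quasi-Poisson or mixture fit, as the last step of the plan anticipates.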
TODO: extract page views & time on page from my old Analytics, letting me calculate ‘how much time am I steering with, say, 10 upvotes on /
URLs
upvoted: https://