"Authorship Analysis on Dark Marketplace Forums", Spitters et al 2015

Anonymity networks like Tor harbor many underground markets and discussion forums dedicated to the trade of illegal goods and services. As they are gaining in popularity, the analysis of their content and users is becoming increasingly urgent for many different parties, ranging from law enforcement and security agencies to financial institutions. A major issue in cyber forensics is that anonymization techniques like Tor's onion routing have made it very difficult to trace the identities of suspects. In this paper we propose classification set-ups for two tasks related to user identification, namely alias classification and authorship attribution. We apply our techniques to data from a Tor discussion forum mainly dedicated to drug trafficking, and show that for both tasks we achieve high accuracy using a combination of character-level n-grams, stylometric features and timestamp features of the user posts.
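To make the feature types named in the abstract concrete, here is a minimal sketch of how character-level n-gram counts and posting-hour counts could be combined into one per-user profile and compared. It is not the paper's actual set-up: the function names, the `(unix_timestamp, text)` post layout, the choice of trigrams, and the use of cosine similarity in place of a trained classifier are all assumptions made for illustration.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def hour_histogram(timestamps):
    """Count posts per hour of day (UTC) from Unix timestamps."""
    return Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )


def user_profile(posts, n=3):
    """Merge n-gram and posting-hour counts into one sparse feature vector.

    `posts` is a list of (unix_timestamp, text) pairs; this layout is an
    assumption for the example, not the paper's data format.
    """
    profile = Counter()
    for _, text in posts:
        for gram, count in char_ngrams(text, n).items():
            profile[("ngram", gram)] += count
    for hour, count in hour_histogram(ts for ts, _ in posts).items():
        profile[("hour", hour)] += count
    return profile


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two aliases could then be compared with `cosine_similarity(user_profile(posts_a), user_profile(posts_b))`. Mixing raw n-gram and hour counts in one vector is crude, and a real classifier would normalize or weight each feature group. Character n-grams are plain substrings, so any recurring vocabulary in a user's posts contributes to the profile along with spelling and punctuation habits.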


Comments


[3 Points] None:

Ah, stylometric analysis. I've had an idea for a long time for a service that helps vendors beat stylometric analysis; I just haven't ever seen the interest. While we're on the subject, I may as well plug Anonymouth, a tool that helps anonymize what you write so you can't be identified from your text.


[1 Points] ShulginsCat:

Could character n-grams pick up specific names (or street names) of drugs? Because that could artificially inflate their accuracy (which is already not that impressive).


[1 Points] None:

[deleted]


[1 Points] ShulginsCat:

I'm interested in writing this scammer account detector as an exercise in ML, and I know you wrote a couple of crawler scripts that go through the markets and download all the listings. A couple of questions on that:

  1. I'm not too keen on being connected to Tor 100% of the time through my home connection. Would it be a bad idea to rent a server somewhere and have it do the crawling?

  2. How would you handle the DoS-protection captchas on most markets (I think this started after you left)? Did you have agreements with the market admins? Or, alternatively, is it possible to solve the captcha once on each market and stay logged in forever?


[-1 Points] ghiwidh5f:

Writing analysis only works if you are reusing the same accounts. This paper says they proved it with 99% accuracy without having the answers. I can prove with 99% accuracy that gwern is Madeleine Albright because of comma placement. Do I know what I am talking about? Hell no, and neither do these authors.