Silk Road forums
Discussion => Security => Topic started by: nomodeset on August 24, 2012, 06:42 pm
-
The work I found presents a framework for creating three kinds of adversarial passages: obfuscation, where a subject attempts to hide their identity; imitation, where a subject attempts to frame another subject by imitating their writing style; and translation, where original passages are obfuscated with machine translation services. The research demonstrates that manual circumvention methods work very well, while automated translation methods are not effective. Obfuscation reduces the stylometry techniques' effectiveness to the level of random guessing, and imitation attempts succeed up to 67% of the time, depending on the stylometry technique used. These results are all the more significant given that the experimental subjects were unfamiliar with stylometry, were not professional writers, and spent little time on the attacks:
https://www.cs.drexel.edu/~sa499/papers/adversarial_stylometry.pdf
In short,
We developed three methods of circumvention against stylometry techniques in the form of obfuscation, imitation and machine translation passages. Two of these, obfuscation and imitation, were manually written by human subjects. These passages were very effective at circumventing attempts at authorship recognition. Machine Translation passages are automated attempts at obfuscation utilizing machine translation services. These passages were not sufficient in obfuscating the identity of an author. The full results and effectiveness of these circumvention methods are detailed in the evaluation section.
1. Obfuscation. In the obfuscation approach the author attempts to write a document in such a way that their personal writing style will not be recognized. There is no guidance for how to do this and there is no specific target for the writing sample. An ideal obfuscated document would be difficult to attribute to any author. For our study, however, we only look at whether or not it successfully deters recognition of the true author.
2. Imitation. The imitation approach is when an author attempts to write a document such that their writing style will be recognized as that of another specific author. The target is decided upon before a document is written and success is measured both by how successful the document is in deterring authorship recognition systems and how successful it is in imitating the target author. This could also be thought of as a “framing” attack.
3. Machine Translation. The machine translation approach translates an unmodified passage written in English to another language, or to two other languages, and then back to English. Our hypothesis was that this would sufficiently alter the writing style and obfuscate the identity of the original author. We did not find this to be the case. We studied this problem through a variety of translation services and languages. We measured the effectiveness of the translation as an automated method as well as the accuracy of the translation in producing comprehensible, coherent obfuscation passages.
The study demonstrated the effectiveness of adversarial writing against modern methods of stylometry: Neural Network, Writeprints-Static Approach and Synonym-Based Approach.
A short description: neural networks can be used to analyze the authorship of texts. Feature vectors taken from texts of known authorship serve as the training set. The network is trained through a process known as backpropagation, where the error on each training example is calculated and used to adjust the network's weights so that accuracy increases.
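To make the "error is calculated and used to update" part concrete, here is a minimal sketch in Python. It is not the network from the paper: it trains a single logistic unit by gradient descent, which is the one-layer case of the weight update that backpropagation applies layer by layer. The two features per text, the sample values, the learning rate and the names `train` and `predict` are all assumptions made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    # Estimated probability that feature vector x belongs to author B (label 1).
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(samples, labels, epochs=1000, lr=0.5):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # The error is the gap between the prediction and the known author.
            error = predict(weights, bias, x) - y
            # Nudge each weight against the error, scaled by its input.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical training vectors: two made-up style features per text.
samples = [(0.2, 0.9), (0.3, 0.8), (0.1, 0.7),   # texts known to be by author A
           (0.8, 0.2), (0.9, 0.3), (0.7, 0.1)]   # texts known to be by author B
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train(samples, labels)
print(predict(weights, bias, (0.85, 0.15)))  # B-like text, should be above 0.5
print(predict(weights, bias, (0.15, 0.85)))  # A-like text, should be below 0.5
```

A real stylometry network would use many more features and at least one hidden layer, but the update rule has the same shape.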
Developed by Clark & Hannon, the synonym-based approach exploits the choice of a specific word given all the possible alternatives that exist. The theory behind this method is that when a word has a large number of synonyms, the choice the author makes is significant in understanding his or her writing style.
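A rough sketch of the idea in Python, as I understand it: words shared between two texts count for more when the author had many synonyms to choose from. This is not Clark & Hannon's actual formula. The scoring rule, the function names and the tiny synonym table are assumptions for illustration only; a real system would look synonym counts up in a thesaurus.

```python
import re
from collections import Counter

# Hypothetical synonym counts per word (made up for this example).
SYNONYM_COUNTS = {"big": 12, "huge": 12, "said": 8, "stated": 8, "the": 0, "cat": 1}

def word_counts(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))

def synonym_match_score(text_a, text_b, synonym_counts=SYNONYM_COUNTS):
    a, b = word_counts(text_a), word_counts(text_b)
    score = 0
    # Only words both texts use contribute, weighted by how many
    # alternatives the author passed over to pick that word.
    for word in a.keys() & b.keys():
        score += synonym_counts.get(word, 0) * min(a[word], b[word])
    return score

known = "the huge cat said the huge thing"
print(synonym_match_score(known, "the huge dog said hello"))
print(synonym_match_score(known, "the big dog stated hello"))
```

The second candidate says the same thing with different synonyms, so it shares only "the" with the known text, and "the" has no synonyms to weigh. That is also why rewording by hand can throw this kind of method off.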
Writeprints is one of the most successful methods of stylometry that has been published to date because of its high levels of accuracy on a range of data sets with large numbers of unique authors. If I've interpreted it correctly, the method uses the writer invariant: a property of a text that stays roughly the same across everything a given author writes. An example of a writer invariant is the frequency of function words used by the writer. In one such method, the text is analyzed to find its 50 most common words. The text is then broken into 5,000-word chunks, and each chunk is analyzed to find the frequency of those 50 words in that chunk. This generates a 50-number identifier for each chunk. These numbers place each chunk of text at a point in a 50-dimensional space. This 50-dimensional space is flattened into a plane using principal components analysis (PCA). This results in a display of points that correspond to an author's style. If two literary works are placed on the same plane, the resulting pattern may show whether both works were written by the same author or by different authors.
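Here is a minimal sketch in Python of the chunk-and-count step described above. It is not the Writeprints implementation, and it stops at the frequency vectors: the PCA projection is left out. The tokenizing regex and the names `tokenize` and `chunk_vectors` are my own assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def chunk_vectors(text, top_n=50, chunk_size=5000):
    words = tokenize(text)
    # The top_n most common words across the whole text.
    vocab = [w for w, _ in Counter(words).most_common(top_n)]
    vectors = []
    for start in range(0, len(words), chunk_size):
        chunk = words[start:start + chunk_size]
        counts = Counter(chunk)
        # One relative frequency per vocabulary word: a point in top_n-dimensional space.
        vectors.append([counts[w] / len(chunk) for w in vocab])
    return vocab, vectors

# Tiny demo with small parameters so the output is readable.
demo_text = "the cat and the dog and the bird " * 10
vocab, vectors = chunk_vectors(demo_text, top_n=2, chunk_size=40)
print(vocab)
print(vectors)
```

Note that the last chunk can be shorter than `chunk_size`, which makes its frequencies noisier. To get the two-dimensional picture, the vectors would then go through a PCA routine, for example scikit-learn's `PCA(n_components=2)`.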
The obfuscation approach weakens all methods to the point that they are no better than randomly guessing the correct author of a document. The imitation approach was widely successful in causing authorship to be attributed to the intended imitation target. Additionally, these passages were generated by participants in very short periods of time by amateur writers who lacked expertise in stylometry. Translation with widely available machine translation services does not appear to be a viable mode of circumvention. Our evaluation did not demonstrate sufficient anonymization and the translated document has, at best, questionable grammar and quality.
-
Thank you for this great post!
-
It is safe to say that the other side has all the tools it could ever need to defeat little Silk Road drug dealers... so what is the problem? Maybe they are just going to slow-play it until the feds relent and give the goons their green light...
No doubt there are a few cops whose sole purpose is to disrupt the various discussions on Silk Road.
Their end will come, but it is the future that scares me... the past saddens me...