Authorship recognition based on linguistic analysis, known as stylometry, has contributed to literary and historical breakthroughs. These successes have led to its use in criminal investigations and prosecutions. Stylometry, however, can also be used to infringe on the privacy of individuals who wish to publish documents anonymously. Our research demonstrates how various types of attacks can reduce the effectiveness of stylometric techniques to the level of random guessing, and sometimes below it. These results are made more significant by the fact that the experimental subjects were unfamiliar with stylometric techniques, had no specialized knowledge of linguistics, and spent little time on the attacks. This talk examines the ways in which authorship recognition can be used to thwart privacy and anonymity, and how these attacks can be used to mitigate that threat. It also covers our current progress in establishing a large corpus of writing samples and attack data, and in creating a tool to help authors preserve their privacy when publishing anonymously.

This research was originally motivated by the idea of using stylometry to increase security: could stylometry serve as an aid for verifying the identity of a user? The first step was to see how stylometry held up against adversarial attacks. We developed two attacks and found them devastatingly effective against several stylometric methods. This shifted the goal of the research from using stylometry to increase security by verifying identities to attacking stylometry to increase security by helping anonymous authors maintain their privacy and protect their identities.
This research presents a framework for adversarial attacks, including obfuscation attacks, in which a subject attempts to hide their identity, and imitation attacks, in which a subject attempts to frame another subject by imitating their writing style. The major contribution of this research is demonstrating that both attacks work very well: the obfuscation attack reduces the effectiveness of the techniques to the level of random guessing, and the imitation attack succeeds with 68-91% probability, depending on the stylometric technique used. Another significant contribution to the field is the use of human subjects to empirically validate the claim of high accuracy for current techniques (in the absence of attacks) by reproducing results for three representative stylometric methods.

The talk examines the threat that stylometry poses to anonymity and what can be done about it. Advice is offered on how to obfuscate your writing style, based on what was learned from the subjects in this study. The talk also discusses ongoing work to create a tool that helps authors hide their writing style. This tool will use a large corpus of existing writing and attack passages in multiple languages, along with a variety of stylometric techniques based on different features and machine learning methods. Listeners and readers of this research are invited to participate in the creation of this corpus in multiple languages so that the tool can be helpful to as many authors as possible.
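To make the idea of a stylometric technique concrete: such methods typically pair linguistic features with a classifier. The sketch below is a minimal, illustrative example only, not the system used in this research; the short function-word list and the cosine-similarity matching rule are assumptions chosen for brevity, whereas real systems use hundreds of features (function words, character n-grams, syntactic markers) and trained machine-learning models.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list; real stylometric
# systems use far larger and more varied feature sets.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "was", "it", "for", "on", "with", "as", "but"]

def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def attribute(unknown_text, candidates):
    """Attribute a text to the candidate author whose known writing
    has the most similar function-word frequency profile."""
    target = style_vector(unknown_text)
    return max(candidates,
               key=lambda a: cosine(target, style_vector(candidates[a])))
```

An obfuscation attack, in these terms, is any rewriting that shifts the author's feature vector away from their usual profile; an imitation attack shifts it toward another author's profile instead.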