This thesis aimed to develop a Drug Name Recognition system that extracts drug terms from the Silk Road 2 cryptomarket forum. The system relies on a Conditional Random Fields (CRF) model to classify terms into three groups: terms that are entirely new with respect to a database of well-known drugs, variants of already-known drugs, and variants of new drug terms.
The thesis pursued two objectives. First, we wanted to analyze whether using a CRF model could improve drug name recognition performance. Second, we aimed to investigate whether forum posts could provide national agencies with useful information about the early appearance of drug names.
Our model identified 232 new drug names and outperformed both the pre-annotation phase and the results reported in other studies.
[Keywords: drug name recognition, conditional random fields, forensic linguistics, cryptomarket forums]