“Crawling and Analysis of Dark Network Data”, Ying Yang, Guichun Zhu, Lina Yang, Huanhuan Yu (2020):

Due to its anonymity and non-traceability, websites on the dark web are very difficult to research. Studying the dark web is important for network security, yet very little data is currently available for doing so, so we independently developed a dark web crawler that runs automatically.
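The paper summarized here does not include code, but the core of any dark web crawler is routing HTTP requests through Tor so that `.onion` addresses resolve. A minimal sketch, assuming a local Tor client listening on its default SOCKS port 9050 (the function name and port are illustrative, not from the paper):

```python
import requests


def make_tor_session(socks_port=9050):
    """Return a requests session routed through a local Tor SOCKS proxy.

    Assumes a Tor client is running on 127.0.0.1:socks_port. The
    'socks5h' scheme is important: it makes Tor itself resolve the
    hostname, which is required for .onion addresses.
    """
    session = requests.Session()
    proxy = f"socks5h://127.0.0.1:{socks_port}"
    session.proxies = {"http": proxy, "https": proxy}
    return session


# Usage (requires a running Tor client and the PySocks extra):
#   session = make_tor_session()
#   page = session.get("http://exampleonionaddress.onion/", timeout=60)
```

From a session like this, a crawler repeatedly fetches pages, extracts `.onion` links, and enqueues the ones it has not yet visited.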

This article details the implementation of our dark web crawler and the analysis of the crawled data. Currently, we can use the crawled data to detect whether multiple URLs belong to the same site and to extract features of similar websites, and we have generated an ever-growing dataset that can be used for simple website classification.
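The paper does not specify which features it uses to decide that two URLs serve the same site. One simple, common approach is to compare the textual content of the pages; a minimal sketch using Jaccard similarity over word tokens (the threshold value is an illustrative assumption, not from the paper):

```python
def jaccard(text_a, text_b):
    """Jaccard similarity between the word-token sets of two pages."""
    set_a = set(text_a.lower().split())
    set_b = set(text_b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def same_site(page_a, page_b, threshold=0.6):
    """Heuristic: treat two pages as mirrors of the same site if their
    token sets overlap heavily. The 0.6 threshold is a placeholder."""
    return jaccard(page_a, page_b) >= threshold
```

In practice one would compare richer signals as well (page titles, HTML structure, link graphs), but content overlap alone already catches the many mirrored hidden services on the dark web.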

We use the crawled data as a labeled dataset to classify newly discovered URLs. Once we have collected a certain number of new URLs, we crawl again, and the newly crawled data is added to the previous dataset. After multiple rounds of crawling, our dataset grows increasingly rich.
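The round-based scheme above, classify new pages against the existing dataset and then fold them in, can be sketched with a nearest-neighbor classifier. The function names and the token-overlap similarity are illustrative assumptions; the paper does not describe its classifier in this passage:

```python
def token_similarity(text_a, text_b):
    """Jaccard similarity over word tokens, used as a stand-in feature."""
    set_a, set_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def classify(text, dataset):
    """Label a new page with the label of its most similar page (1-NN)."""
    best_label, best_score = None, -1.0
    for sample_text, label in dataset:
        score = token_similarity(text, sample_text)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def crawl_round(dataset, new_pages):
    """One round: label the newly crawled pages, then add them to the
    dataset so later rounds classify against a richer collection."""
    labeled = [(text, classify(text, dataset)) for text in new_pages]
    dataset.extend(labeled)
    return dataset
```

Each call to `crawl_round` enlarges the dataset, mirroring the paper's claim that repeated rounds make the collection more and more abundant.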

Our approach addresses the scarcity of dark web data: researchers can use our method to gather enough data to study many aspects of the dark web.