The crawling software DARPA MEMEX Undercrawler was used to create a dataset of HTML posts as part of the FloraGuard project, for the purpose of studying the online illegal plant trade. HTML posts in the dataset contain personal data and may contain evidence of criminality relating to the online illegal trade in plants. In total, nine wildlife-trade-related forums and marketplaces were crawled, providing 13,697 posts by 4,009 authors in 1,826 forum threads. Posts are dated from 2006 to 2019. The DARPA MEMEX Undercrawler software is available via Related Resources. The dataset includes processed versions of this raw data, including JSON files of extracted text and metadata, and JSON files of clausal text extracted using OpenIE algorithms. This dataset was used to produce results for the following published work: Middleton, S.E., Lavorgna, A., Neumann, G., Whitehead, D. Information Extraction from the Long Tail: A Socio-Technical AI Approach for Criminology Investigations into the Online Illegal Plant Trade. In Proceedings of ACM Web Science conference (WebSci 2020). ACM, July 6–10, 2020, Southampton, United Kingdom. 4 pages. https://doi.org/10.1145/3394332.3402838
Over the last 60 years, commerce in exotic wild plants has increased in Western countries (Sajeva et al. 2007). Alongside the legal trade in plants, the profitability of the market has also boosted illegal markets. Wild plant crimes have long been a focus of concern, mainly in conservation science. In criminology, while the illegal trade in wild animals (and animal parts, e.g. ivory) is receiving increasing attention, the illegal trade in plants has so far been under-investigated. However, wild plant trafficking threatens and destroys numerous species and important natural resources (Herbig & Joubert 2006), and it hinders the rule of law and security, as profits are also used to finance other forms of trafficking (WWF 2016).
The Internet has increased the illegal trade in wild plants by making it easier for supply and demand to meet; no matter how highly specialised the market in certain wild plants, it is much easier to find potential buyers or sellers online than in the physical world (Lavorgna 2014a). There is consensus that the policing of such criminal activity is still scarce and poorly resourced (Nurse 2011; Elliot 2012; Lavorgna 2014a; Lemieux 2014; Runhovde 2016). A major challenge is that law enforcement agencies have limited training opportunities and lack the equipment and specific expertise to counter this illegal trade effectively (CITES 2016). In this context, the question of how we can best control and prevent this criminal market needs to be addressed. The proposed project combines innovative and cross-disciplinary ways of analysing online marketplaces for the illegal trade in endangered plants with analyses of existing policing practices, to assist law enforcement in the detection and investigation of illegal trades of endangered plants. It focuses on the UK, which serves as a major transit and destination market for the European region (EU Commission 2016). The results of this research will be of significant importance for the work of law enforcement (e.g. national wildlife crime units, customs officers) in combating the illegal trade in endangered plants (in both its online and offline elements), disrupting criminal networks involved in such trade, and preserving biodiversity. In line with the latest WWF position paper (WWF 2016), the project fosters the improvement of awareness and technical capacity in investigation and prosecution services for wildlife crimes.
The proposed approach will identify and disseminate best practice for other researchers and law enforcement officers with an interest in online crime markets and wildlife policing; in addition, it will improve our understanding of the online marketplaces and the offline market routes for the trafficking of endangered plants into Western countries, supporting new avenues of investigation. By integrating insights and expertise from criminology, computer science and conservation science, the proposed project also has important implications for demonstrating interdisciplinary methodological developments. The research is structured around three cumulative work-packages (WP). WP1 comprises analysis of the economic, social and geographical dynamics of a sample of online marketplaces active in the UK and associated with the illegal trade of endangered plants. WP2 focuses on the policing of this criminal activity by mapping current law enforcement practices and interventions, assessing their effectiveness in the light of the findings of WP1, and identifying law enforcement's needs for more effective policing. WP3 develops and tests a digital package of resources to assist law enforcement investigations into illegal trades of endangered plants in the UK. In doing so, it promotes engagement and effective communication with a non-academic audience (law enforcement, NGOs, botanic gardens, international institutions). The Royal Botanic Garden (Kew, the scientific authority for CITES plant trade in the UK) and the UK Border Force are formal non-academic partners to this project.
Automated crawling