IDTraffickers: An Authorship Attribution Dataset to link and connect Potential Human-Trafficking Operations on Text Escort Advertisements


Human trafficking (HT) is a pervasive global issue affecting vulnerable individuals, violating their fundamental human rights. Investigations reveal that a significant number of HT cases are associated with online advertisements (ads), particularly in escort markets. Consequently, identifying and connecting HT vendors has become increasingly challenging for Law Enforcement Agencies (LEAs). To address this issue, we introduce IDTraffickers, an extensive dataset consisting of 87,595 text ads and 5,244 vendor labels to enable the verification and identification of potential HT vendors on online escort markets. To establish a benchmark for authorship identification, we train a DeCLUTR-small model, achieving a macro-F1 score of 0.8656 in a closed-set classification environment. Next, we leverage the style representations extracted from the trained classifier to conduct authorship verification, resulting in a mean r-precision score of 0.8852 in an open-set ranking environment. Finally, to encourage further research and ensure responsible data sharing, we plan to release IDTraffickers for the authorship attribution task to researchers under specific conditions, considering the sensitive nature of the data. We believe that the availability of our dataset and benchmarks will empower future researchers to utilize our findings, thereby facilitating the effective linkage of escort ads and the development of more robust approaches for identifying HT indicators.
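The two metrics reported in the abstract, macro-F1 for closed-set classification and R-precision for open-set ranking, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy labels, and the choice to average F1 over the classes present in the true labels are assumptions made for illustration only.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumption: classes taken from y_true)."""
    labels = sorted(set(y_true))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def r_precision(query_label, ranked_labels):
    """Share of the top-R ranked items that match the query's vendor label,
    where R is the number of matching items in the whole ranking."""
    r = sum(1 for label in ranked_labels if label == query_label)
    if r == 0:
        return 0.0
    hits = sum(1 for label in ranked_labels[:r] if label == query_label)
    return hits / r


# Hypothetical vendor labels, not taken from the dataset
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
f1 = macro_f1(y_true, y_pred)

# Ranking of candidate ads (by similarity to a query ad from vendor "v1")
rp = r_precision("v1", ["v1", "v2", "v1", "v3"])
```

A mean R-precision, as quoted in the abstract, would be the average of `r_precision` over all query ads.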

Each record in the dataset pairs the text sequence of an advertisement (the input) with an arbitrary vendor ID (the label). The vendor IDs were obtained by linking the phone numbers mentioned in Backpage escort market advertisements. To protect privacy, all personal information, except for the pseudonyms used by the escorts and the post locations, has been redacted so that it cannot be retrieved. For more details, please refer to the research paper attached to the submission. This dataset should be used only for its intended purpose, research on authorship attribution of vendors on escort markets, and not for any other purpose, commercial or non-commercial.

Identifier
DOI https://doi.org/10.34894/NZ7VLC
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/NZ7VLC
Provenance
Creator Saxena, Vageesh; Van Dijck, Gijs; Spanakis, Gerasimos; Bashpole, Benjamin
Publisher DataverseNL
Contributor Saxena, Vageesh; Van Dijck, Gijs; Spanakis, Jerry
Publication Year 2023
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Saxena, Vageesh (Maastricht University); Van Dijck, Gijs (Maastricht University); Spanakis, Jerry (Maastricht University)
Representation
Resource Type Dataset
Version 1.0
Discipline Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Construction Engineering and Architecture; Engineering; Engineering Sciences; Jurisprudence; Law; Life Sciences; Social Sciences; Social and Behavioural Sciences; Soil Sciences