AVIATOR: A MITRE Emulation Plan-Derived Living Dataset for Advanced Persistent Threat Detection and Investigation


With the growing trend toward developing new systems for detecting and investigating Advanced Persistent Threats (APTs), the lack of sound and authentic datasets has become an increasingly visible problem. New datasets for research on APT detection and investigation have been released at an accelerating pace over the past few years. Yet our examination of the existing datasets reveals a significant gap between their attack scenarios and real-world APT attacks. Recognizing the flaws of prior datasets, particularly in terms of attack scenario complexity and authenticity, we develop a novel, sound dataset called Aviator, which is backed by MITRE emulation plans. The well-known organization MITRE has released nearly a dozen emulation plans that closely reproduce real-world attack campaigns previously observed from APT groups. However, MITRE has not published any datasets. We therefore implement these emulation plans stringently ourselves. Further, we extend these emulation plans to include an industrial control system and attack steps against it, mimicking the APT groups best known for past attacks on critical infrastructure. Compared to existing datasets, Aviator has the highest attack scenario complexity and authenticity. Moreover, Aviator is designed with operability, usability, reproducibility, and extensibility in mind, areas in which existing datasets lag far behind. Along with the Aviator dataset, we also provide log shipping tools, log parsing tools, and logging configuration files to encourage other researchers to build their own datasets, which may better suit the evaluation of their detection systems. In addition, we plan to add more log types in future versions of Aviator. We are committed to maintaining Aviator as a living dataset.

Identifier
DOI https://doi.org/10.35097/8s5b0u5yqgfs2y0d
Related Identifier IsSupplementTo https://doi.org/10.1109/BigData62323.2024.10826006
Metadata Access https://www.radar-service.eu/oai/OAIHandler?verb=GetRecord&metadataPrefix=datacite&identifier=10.35097/8s5b0u5yqgfs2y0d
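The Metadata Access link above is an OAI-PMH `GetRecord` request for the record in DataCite format. The following is a minimal sketch, using only the Python standard library, of how that request URL can be rebuilt and the returned XML scanned for titles. The helper names (`getrecord_url`, `extract_titles`, `fetch_titles`) are hypothetical, and the title extraction simply matches elements by local name rather than assuming a particular DataCite schema version.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint and identifier taken from the "Metadata Access" link of this record.
OAI_BASE = "https://www.radar-service.eu/oai/OAIHandler"
DOI = "10.35097/8s5b0u5yqgfs2y0d"


def getrecord_url(base: str, identifier: str, prefix: str = "datacite") -> str:
    """Build an OAI-PMH GetRecord URL (hypothetical helper)."""
    params = {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier}
    # safe="/" keeps the slash in the DOI unescaped, matching the published link.
    return f"{base}?{urlencode(params, safe='/')}"


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}title' -> 'title'."""
    return tag.rsplit("}", 1)[-1]


def extract_titles(xml_text: str) -> list[str]:
    """Collect the text of every <title> element, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if local_name(el.tag) == "title" and el.text
    ]


def fetch_titles(timeout: float = 30.0) -> list[str]:
    """Download the record over the network and return its titles."""
    with urlopen(getrecord_url(OAI_BASE, DOI), timeout=timeout) as resp:
        return extract_titles(resp.read().decode("utf-8"))
```

Matching on local names keeps the sketch independent of the exact OAI-PMH and DataCite namespace URIs the server returns.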
Provenance
Creator Liu, Qi (ORCID: 0000-0002-9334-953X)
Publisher Karlsruhe Institute of Technology
Contributor RADAR
Publication Year 2025
Funding Reference Helmholtz-Gemeinschaft (Crossref Funder ID 501100001656), award 37.12.01; Helmholtz-Gemeinschaft (Crossref Funder ID 501100001656), award 46.23.02; Karlsruhe Institute of Technology (Crossref Funder ID 100009133)
Rights Open Access; Creative Commons Attribution 4.0 International; info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by/4.0/legalcode
OpenAccess true
Representation
Language English
Resource Type Dataset
Format application/x-tar
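Because the dataset is distributed as a tar archive, its contents can be inspected before extraction. The sketch below uses Python's standard `tarfile` module; the local filename `aviator.tar` and the helper `list_members` are hypothetical, since the record states only the format, not the archive name or internal layout.

```python
import tarfile
from pathlib import Path

# Hypothetical local filename; the record only states the format (application/x-tar).
ARCHIVE = Path("aviator.tar")


def list_members(archive: Path) -> list[tuple[str, int]]:
    """Return (name, size in bytes) for every regular file in the archive."""
    # "r:*" opens plain or compressed tar files transparently.
    with tarfile.open(archive, mode="r:*") as tar:
        return [(m.name, m.size) for m in tar.getmembers() if m.isfile()]


if __name__ == "__main__" and ARCHIVE.exists():
    for name, size in list_members(ARCHIVE):
        print(f"{size:>12}  {name}")
```

Listing members first makes it possible to check the archive's size and structure before committing disk space to a full extraction.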
Discipline Computer Science; Computer Science, Electrical and System Engineering; Engineering Sciences