Normalization of Anomaly Detection Scores Based on Antagonistic Fuzzy-sets - Evaluation Tests - Dataset

Evaluation tests for the normalization of anomaly detection scores based on antagonistic fuzzy sets, conducted for the paper: Interpreting and Unifying Anomaly Scores with Antagonistic Fuzzy Sets, by Félix Iglesias, Tanja Zseby, and Arthur Zimek. 2025 IEEE International Conference on Fuzzy Systems.

Context and methodology

In addition to, or instead of, binary labels (anomalous vs. non-anomalous), most anomaly detection algorithms produce a score that expresses how anomalous each data point is. Raw scores are often difficult to interpret directly because they depend on the specific data and on the algorithm used. To overcome this drawback, scores are traditionally normalized using the probabilistic interpretation proposed by Kriegel et al. [1]. This approach has clear benefits, but it also raises conceptual issues and entails some loss of information. A normalization based on antagonistic fuzzy sets fits more naturally with the measurements that algorithms in the field provide, and it minimizes the possible loss of information. The experiments in this repository explore and evaluate the advantages of fuzzy normalization over probabilistic normalization.
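As a point of reference for the probabilistic normalization mentioned above, the following is a minimal sketch of Gaussian scaling, one of the transformations described by Kriegel et al. [1]. It is not code from this repository: the function name gaussian_scaling and the toy scores are hypothetical, and the sketch assumes that higher raw scores mean "more anomalous".

```python
import math
import statistics


def gaussian_scaling(scores):
    """Map raw anomaly scores to [0, 1] with Gaussian scaling.

    Illustrative helper (hypothetical name, not part of the repository).
    Assumes that higher raw scores mean "more anomalous".
    """
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    if sigma == 0:
        # All scores are identical, so no point stands out.
        return [0.0 for _ in scores]
    # erf of the standardized score, clipped at 0 for below-average scores.
    return [
        max(0.0, math.erf((s - mu) / (sigma * math.sqrt(2))))
        for s in scores
    ]


# Toy scores: five similar values and one clear outlier.
raw = [0.9, 1.0, 1.1, 1.0, 0.95, 4.0]
normalized = gaussian_scaling(raw)
```

In this sketch every score at or below the mean is mapped to exactly 0, so differences among low-scoring points are no longer visible after normalization. This is one simple way in which a probabilistic mapping can discard information.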

Technical details

Experiments are written in Python 3 (tested with v3.9.6). The provided scripts process all data and generate the results. The results reported in the paper are kept in the repository for comparability and replicability. The file and folder structure is as follows:

[datasets], folder with datasets in .npz format.

[LICENSES], folder with third-party licenses.

[results], folder with results as shown in the paper.

ensemble.py runs evaluation experiments.

extract_tables.py extracts .tex tables and .pdf plots as shown in the paper.

indices.py implements different accuracy performance metrics commonly used in anomaly detection.

perf.csv contains experiment results in tabular format as shown in the paper.

requirements.txt lists Python package dependencies.

LICENSE contains the GNU GPL license text.

README.md provides explanations and step-by-step instructions for replication.
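To illustrate how the files listed above fit together, the following sketch loads a dataset in .npz format and computes one accuracy metric commonly used in anomaly detection (ROC AUC). It is not the code in ensemble.py or indices.py. The function names are hypothetical, and the array keys "X" and "y" are an assumption based on the ADBench convention; check the actual files before relying on them.

```python
import numpy as np


def load_dataset(path):
    """Load features and binary labels from an .npz file.

    Assumption: ADBench-style keys "X" (features) and "y" (1 = anomaly).
    """
    with np.load(path, allow_pickle=True) as data:
        return data["X"], data["y"].astype(int)


def roc_auc(labels, scores):
    """ROC AUC: probability that a random anomaly gets a higher score
    than a random normal point (ties count as one half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one anomaly and one normal point")
    # Pairwise comparison; simple, but uses O(n_pos * n_neg) memory.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)
```

A typical use would be to call load_dataset on one of the files in the datasets folder, score the points with a detector of choice, and pass the labels and scores to roc_auc. The pairwise formulation is chosen here for readability; a rank-based computation is preferable for large datasets.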

References

[1] H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek, “Interpreting and unifying outlier scores,” in Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011, B. Liu, H. Liu, C. Clifton, T. Washio, and C. Kamath, Eds. United States: Society for Industrial and Applied Mathematics, Dec. 2011, pp. 13–24.

Licenses

All distributed code is released under the GNU General Public License v3.0 or later.

The datasets were originally published in the ADBench repository (https://github.com/Minqi824/ADBench/tree/main), specifically in the Classical collection (https://github.com/Minqi824/ADBench/tree/main/adbench/datasets/Classical).

These datasets are © the original authors and are licensed under the BSD 2-Clause "Simplified" License. No endorsement by the original authors is implied.

Identifier
DOI	https://doi.org/10.48436/043x1-fws73
Related Identifier	HasPart https://github.com/Minqi824/ADBench/
Related Identifier	IsVersionOf https://doi.org/10.48436/dbyyd-ss702
Metadata Access	https://researchdata.tuwien.ac.at/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:researchdata.tuwien.ac.at:043x1-fws73

Provenance
Creator	Iglesias Vazquez, Felix (ORCID: 0000-0001-6081-969X)
Publisher	TU Wien
Publication Year	2025
Rights	GNU General Public License v3.0 or later; https://www.gnu.org/licenses/gpl-3.0-standalone.html
OpenAccess	true
Contact	tudata(at)tuwien.ac.at

Representation
Language	English
Resource Type	Software
Discipline	Other