Temporal Silhouette for Stream Clustering Validation - Evaluation Tests

Temporal Silhouette for Stream Clustering Validation - Evaluation Tests conducted for the paper: Temporal Silhouette: Validation of Stream Clustering Robust to Concept Drift

Context and methodology

The Temporal Silhouette (TS) is an index for the internal validation of stream clustering that is robust and consistent in the event of concept drift and different types of outliers. TS is based on the well-known Silhouette index (Rousseeuw, 1987).

In this repository, TS is compared with 3 popular CVIs (Silhouette, Davies-Bouldin, Calinski-Harabasz) and 3 iCVIs (incremental Xie-Beni index, incremental Partition Separation index, incremental representative Cross Information Potential) when evaluating the performance of 4 stream clustering algorithms (CluStream, DenStream, BIRCH and StreamKMeans++).

Different data scenarios are used: 2 real-life cases, 4 popular stationary datasets for clustering evaluation subjected to 32 different forms and levels of degradation, and 200 synthetic scenarios that implement different types of concept drift identified in the literature, as well as spatial and temporal outliers.

This repository is framed within research in the following domains: algorithm evaluation, streaming data analysis, stream clustering, unsupervised learning, machine learning, data mining, data analysis. Datasets and algorithms can be used for experiment replication and for further clustering evaluation and comparison.

References

Rousseeuw PJ (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20:53–65

Technical details

Experiments are conducted in Python 3. The file and folder structure is as follows:

[dataR] contains 4 datasets obtained from real data.
[dataS] contains 80 synthetic datasets for concept drift tests.
[dataT] contains 4 datasets for stationary tests.
[plots] contains plots generated by the test scripts.
[results] contains tables with results generated by the test scripts.
[utils] contains utilities for transforming data and plotting results.
"dependencies.py" installs the required Python packages.
"LICENSE" file.
"README.md" for further details, links to sources and instructions for reproducibility.
"run_analysis_real.py" runs experiments with stream clustering and real data.
"run_analysis_synthetic.py" runs experiments with stream clustering and synthetic data subjected to concept drift.
"run_stationary.py" runs experiments with stationary data subjected to different perturbations.
"run_TS_stability.py" runs a sensitivity analysis of the TS parameters w and k.
"toy_tests.py" shows some simple examples of TS in the main cases of concept drift.
"TSindex.py" implements and provides the TS functions.

License

The CC-BY license applies to all data generated with MDCgen. All distributed code is under the GNU GPL license.

Note

This version replaces and makes obsolete: Iglesias Vázquez, Felix (2023). py-temporal-silhouette-main.zip. figshare. Conference contribution. https://doi.org/10.6084/m9.figshare.22149854.v1
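
Illustrative example

TS is based on the Silhouette index (Rousseeuw, 1987). As a point of reference, the following is a minimal sketch of the classic, static Silhouette in plain Python with Euclidean distance. It is not the TS implementation provided in "TSindex.py", and it does not reproduce any experiment in this repository; the function name `silhouette` and the toy points are assumptions made only for illustration.

```python
from math import dist
from statistics import mean


def silhouette(points, labels):
    """Mean Silhouette width (Rousseeuw, 1987), Euclidean distance.

    Illustrative helper only; not part of TSindex.py.
    """
    # Group the points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the Silhouette needs at least two clusters")

    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by n - 1).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return mean(widths)


# Toy data (assumed): two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])  # labels follow the two groups
bad = silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good partition: {good:.3f}, bad partition: {bad:.3f}")
```

The index lies in [-1, 1]: values close to 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own. This static form weighs all points equally regardless of when they arrive, which is the aspect that TS modifies for streaming data.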

Identifier
DOI https://doi.org/10.48436/ss6a3-3r720
Related Identifier IsVersionOf https://github.com/CN-TU/py-temporal-silhouette/tree/main
Related Identifier IsNewVersionOf https://doi.org/10.6084/m9.figshare.22149854.v1
Metadata Access https://researchdata.tuwien.ac.at/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:researchdata.tuwien.ac.at:ss6a3-3r720
Provenance
Creator Iglesias Vázquez, Félix (ORCID: 0000-0001-6081-969X)
Publisher TU Wien
Publication Year 2023
Rights Creative Commons Attribution 4.0 International; GNU General Public License v3.0 or later; https://creativecommons.org/licenses/by/4.0/legalcode; https://www.gnu.org/licenses/gpl-3.0-standalone.html
OpenAccess true
Contact tudata(at)tuwien.ac.at
Representation
Resource Type Software
Version 2.0.0
Discipline Other