Dataset 1: Beijing Subway Smart Card Data (April 2010)
This dataset contains anonymized smart card transaction records from the Beijing subway system, collected during one week in April 2010. It consists of 239,728 records, each corresponding to a single trip made with a smart card. Each record includes a unique card identifier, check-in and check-out timestamps, and station locations. Although these smart cards are purchased anonymously and do not store personal identity information, the card identifier links all trips made with the same card, so travel patterns can be extracted and individuals potentially re-identified. This is particularly relevant under the General Data Protection Regulation (GDPR), as pseudonymization alone is not sufficient for privacy protection.
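The re-identification risk described above can be illustrated with a minimal sketch. It counts the cards whose complete set of trips is shared with no other card, which is a rough indicator of how identifying a travel pattern is. The record layout (card ID, check-in station, check-in hour, check-out station) and the sample values are assumptions made for illustration only and do not reflect the actual file format or contents of the dataset.

```python
from collections import Counter, defaultdict

# Hypothetical records: (card_id, check_in_station, check_in_hour, check_out_station)
records = [
    ("c1", "A", 8, "B"),
    ("c1", "B", 18, "A"),
    ("c2", "A", 8, "B"),
    ("c2", "B", 18, "A"),
    ("c3", "A", 8, "C"),
    ("c3", "C", 17, "A"),
]


def unique_pattern_cards(records):
    """Return the card IDs whose full set of trips is shared with no other card."""
    # Collect the set of trips observed for every card.
    patterns = defaultdict(set)
    for card_id, origin, hour, destination in records:
        patterns[card_id].add((origin, hour, destination))

    # Freeze each pattern so it can be counted, then keep patterns seen once.
    frozen = {card: frozenset(trips) for card, trips in patterns.items()}
    counts = Counter(frozen.values())
    return sorted(card for card, pattern in frozen.items() if counts[pattern] == 1)


unique_cards = unique_pattern_cards(records)
print(unique_cards)
```

In this toy input, cards c1 and c2 share the same pattern and hide each other, while c3 is the only card with its pattern and would stand out despite having no name attached.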
To mitigate re-identification risks, we applied detection k-anonymity, a privacy-preserving method that generalizes individual trip data within groups of similar travel patterns. The accompanying Python implementation of detection k-anonymity demonstrates how this technique can be applied to large-scale mobility data while preserving both analytical utility and privacy. This dataset and method were used as part of our research on privacy-preserving mobility analytics.
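For readers unfamiliar with the underlying idea, the following is a minimal sketch of plain k-anonymity with suppression. It is not the detection k-anonymity method of the accompanying implementation. It treats the combination of origin, destination, and hour as a quasi-identifier and drops every trip whose group contains fewer than k distinct cards. The function name k_anonymous_trips, the tuple layout, and the sample values are hypothetical.

```python
from collections import defaultdict


def k_anonymous_trips(trips, k):
    """Keep only trips whose (origin, destination, hour) group has >= k distinct cards."""
    # Count the distinct cards observed in each quasi-identifier group.
    cards_per_group = defaultdict(set)
    for card_id, origin, destination, hour in trips:
        cards_per_group[(origin, destination, hour)].add(card_id)

    # Suppress trips that fall in groups smaller than k.
    return [
        trip for trip in trips
        if len(cards_per_group[(trip[1], trip[2], trip[3])]) >= k
    ]


# Hypothetical trips: (card_id, origin, destination, hour)
trips = [
    ("c1", "A", "B", 8),
    ("c2", "A", "B", 8),
    ("c3", "A", "B", 8),
    ("c4", "A", "C", 8),
]

kept = k_anonymous_trips(trips, k=3)
print(kept)
```

With k=3, the three trips from A to B are kept because three different cards share that group, and the single trip from A to C is suppressed. A generalization-based approach would instead coarsen the stations or time bins until each group reaches size k, trading detail for fewer dropped records.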
Dataset 2: Lelystad Public Transport Smart Card Data (January–April 2023)
This dataset comprises 230,228 smart card travel records collected from the public bus network of Lelystad, the Netherlands, covering the period from 8 January 2023 to 1 April 2023. The Lelystad bus network consists of 98 transport nodes connected by 194 edges, primarily bidirectional routes. Each trip record contains the origin and destination bus stops, timestamps, the bus lines used, and a uniquely encrypted smart card identifier per trip.
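The network structure described above can be sketched as a set of nodes and directed edges. The example below assumes, purely for illustration, that edges are directed links between consecutive stops on a bus line and that a route counts as bidirectional when the reverse link also exists. The stop names, line names, and both helper functions are hypothetical and are not taken from the dataset.

```python
def build_network(line_stop_sequences):
    """Build node and directed-edge sets from per-line stop sequences."""
    nodes, edges = set(), set()
    for stops in line_stop_sequences.values():
        nodes.update(stops)
        # Each pair of consecutive stops on a line becomes one directed edge.
        edges.update(zip(stops, stops[1:]))
    return nodes, edges


def bidirectional_share(edges):
    """Return the fraction of directed edges whose reverse edge also exists."""
    if not edges:
        return 0.0
    return sum((v, u) in edges for u, v in edges) / len(edges)


# Hypothetical lines: line "1" runs in both directions, line "2" in one only.
lines = {
    "1": ["S1", "S2", "S3"],
    "1r": ["S3", "S2", "S1"],
    "2": ["S2", "S4"],
}

nodes, edges = build_network(lines)
share = bidirectional_share(edges)
print(len(nodes), len(edges), share)
```

Applied to the real stop sequences, the same counts could be compared against the 98 nodes and 194 edges reported for the Lelystad network, provided the edge definition assumed here matches the one used to produce those figures.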
Because it reflects the complexity of a real-world public transportation network, this dataset was used to evaluate advanced data processing techniques, particularly for handling complex queries related to passenger flows, route optimization, and privacy. The dataset was analyzed using privacy-preserving methods to comply with data protection standards.
The associated Python implementation demonstrates how detection k-anonymity can be applied to smart card data to prevent individual re-identification while maintaining the dataset's analytical utility. This dataset supports research on privacy-aware public transportation analytics and network modeling.