Compressed Population-scale Network Netherlands 2009-2023 in MLN format


This record (and attached Documentation, see Files) describes the content and structure of “Compressed Population-scale Network Netherlands 2009-2023”, a project-generated file available via the CBS Microdata environment. The dataset consists of yearly node-aligned multilayer network (MLN) objects for the entire resident population of the Netherlands. Edges are based on the person networks published within Microdata by CBS (BURENNETWERKTAB, COLLEGANETWERKTAB, HUISGENOTENNETWERKTAB, FAMILIENETWERKTAB, KLASGENOTENNETWERKTAB). Basic node information and layer decoding are included alongside the edge information.

The advantage of this data source is that compared to the original data, MLN objects are small in size, encode all different edge types (layers) between two people in the same location, and are memory-efficient if read with the corresponding mlnlib Python library. The mlnlib library also contains optimized procedures for further network analysis, sample selection and aggregation tasks, as well as exports into common network libraries or formats.

Each yearly network is directed and unweighted, and consists of edges of 32 different types (layers) belonging to 5 larger layer categories (called groups in the object) that correspond to the Microdata person network tables. The MLN objects follow the intermediate format required for fast loading by the mlnlib Python library (https://pypi.org/project/mlnlib), with each yearly network consisting of the following three files:

layers.csv (list of layers and layer groups), nodes.csv.gz (node table with attributes, compressed CSV), and edges.npz (sparse adjacency matrix whose 64‑bit unsigned integer entries encode layer presence in binary form).
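The binary layer encoding can be illustrated with a minimal sketch. It assumes that each layer occupies one bit of the stored integer and that a layer's position in the layer list determines its bit; the layer and group names below are hypothetical placeholders, and the actual mapping is the one given in layers.csv and implemented by mlnlib.

```python
# Hypothetical layers: list position = bit position (an assumption, not the
# documented layout). The real dataset has 32 layers, which fit in 64 bits.
LAYERS = ["family_parent", "family_sibling", "household", "neighbor", "colleague"]

# Hypothetical layer groups, mirroring the idea of 5 larger layer categories.
GROUPS = {
    "family": ["family_parent", "family_sibling"],
    "household": ["household"],
    "neighbor": ["neighbor"],
    "work": ["colleague"],
}

BIT = {name: 1 << position for position, name in enumerate(LAYERS)}


def encode(layer_names):
    """Combine several layers into one integer by OR-ing their bits."""
    value = 0
    for name in layer_names:
        value |= BIT[name]
    return value


def decode(value):
    """List the layers whose bit is set in an encoded edge value."""
    return [name for name in LAYERS if value & BIT[name]]


def group_mask(group):
    """Bit mask covering every layer of a group."""
    return encode(GROUPS[group])


# Two people who are siblings and also share a household: one stored value.
edge_value = encode(["family_sibling", "household"])
print(edge_value)                                  # 6 (binary 110)
print(decode(edge_value))                          # ['family_sibling', 'household']
print(bool(edge_value & group_mask("family")))     # True
print(bool(edge_value & group_mask("work")))       # False
```

Storing one integer per pair of people is what allows all edge types between two people to sit in the same matrix cell; selecting a layer or a group then reduces to a bitwise AND with the matching mask.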

The mlnlib library, which uses this as its native format, can be installed through ‘pip’ in a Python environment; as such, it can be requested when setting up a Microdata project.

Please check "Notes" for the general description of the micro databases, such as the definition of the population, methodological details, and the quality and origin of the data. "Notes" also contains a description of the format of the database and the list of all possible values of the categorical variables together with their meaning.

This dataset is only available under strict conditions within the Microdata environment at CBS. Since this is a project-generated dataset made available for reuse via the CBS Data Storage, access to the following source CBS files has to be requested:

BURENNETWERKTAB 2009–2023

COLLEGANETWERKTAB 2009–2023

FAMILIENETWERKTAB 2009–2023

HUISGENOTENNETWERKTAB 2009–2023

KLASGENOTENNETWERKTAB 2009–2023

GBAPERSOONTAB 2009–2023

KINDOUDERTAB 2024

GBAADRESOBJECTBUS 2009–2023

GBAOVERLIJDENTAB 2022–2024

(see full list under "Data Source"). Read more about the conditions to access and use the CBS data: www.cbs.nl/nl-nl/onze-diensten/maatwerk-en-microdata/microdata-zelf-onderzoek-doen

Description

This dataset was constructed from the five CBS person network microdata layers with additional information on the underlying population for each calendar year 2009–2023. The construction uses the conversion toolchain from the repository https://github.com/planet-nl/compressed_population_scale_network_nl, which reads the original edge lists from the RA along with some node information, and produces per‑year MLN objects that are fast to load. The conversion aligns nodes across layers and years, encodes multilayer edges in a single sparse adjacency matrix using binary encoding for the layers, merges node attributes from multiple CBS sources into nodes.csv.gz, and produces analysis‑ready files per year (layers.csv, nodes.csv.gz, edges.npz).
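The merging of several per-layer edge lists into one encoded structure can be sketched as follows. This is an illustration under stated assumptions, not the toolchain's code: the function name merge_edge_lists is hypothetical, nodes are assumed to be already mapped to integer indices, each layer is assumed to own one bit, and a plain dictionary stands in for the sparse adjacency matrix.

```python
from collections import defaultdict


def merge_edge_lists(edge_lists):
    """Merge per-layer directed edge lists into one encoded edge map.

    edge_lists maps a layer's bit position to an iterable of
    (source_index, target_index) pairs. The result maps each directed pair
    to an integer whose set bits mark the layers in which the edge occurs.
    """
    merged = defaultdict(int)
    for bit_position, edges in edge_lists.items():
        for source, target in edges:
            merged[(source, target)] |= 1 << bit_position
    return dict(merged)


# Toy input: three layers over three already-aligned node indices.
toy_edge_lists = {
    0: [(0, 1), (1, 0)],  # layer at bit 0
    1: [(0, 1)],          # layer at bit 1
    2: [(2, 0)],          # layer at bit 2
}

merged = merge_edge_lists(toy_edge_lists)
print(merged)  # {(0, 1): 3, (1, 0): 1, (2, 0): 4}
```

The pair (0, 1) appears in two layers but is stored once, with value 3 (bits 0 and 1 set), which is the source of the space savings compared to keeping one edge list per layer.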

Performance & footprint:

Each yearly MLN typically loads into RA memory in ~1–2 minutes (depending on year size). Storage footprint is reduced to ~6% of the raw source edge‑list volume by using compressed sparse matrices and compressed node tables.

Reproducibility:

The RA conversion pipeline is fully scripted and can automatically be re‑run for additional new years. Configuration files (JSON) define file linkages and layer definitions. Its code is available in the following repository: https://github.com/planet-nl/compressed_population_scale_network_nl. Public code for MLN analysis and the generic CSV→MLN conversion used in the above toolchain is available on PyPI (https://pypi.org/project/mlnlib/, installable with pip), as well as on the PLANET-NL GitHub: https://github.com/planet-nl/mlnlib.

Description of the population

Year YYYY: all people alive and registered in the BRP with an active address on January 1 of year YYYY. All years: the network nodes are aligned based on the union of the yearly populations between 2009 and 2023. The full population of the dataset is therefore everyone who was alive and had an active address on at least one January 1 between 2009 and 2023. The number of rows and columns in each yearly edges.npz adjacency matrix corresponds to the number of people in this unioned population.
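A small sketch of what node alignment over the unioned population means in practice. The function name align_nodes, the toy identifiers, and the choice to order the union by sorted identifier are all assumptions made for illustration; the actual ordering is defined by the conversion toolchain.

```python
def align_nodes(yearly_ids):
    """Build one shared node index from the union of yearly populations.

    yearly_ids maps a year to the set of person identifiers present on
    January 1 of that year. Returns the shared identifier -> row index map
    and, per year, a presence flag for every row of the shared index.
    """
    all_ids = sorted(set().union(*yearly_ids.values()))
    index = {person_id: row for row, person_id in enumerate(all_ids)}
    present = {
        year: [person_id in ids for person_id in all_ids]
        for year, ids in yearly_ids.items()
    }
    return index, present


# Toy populations: person "A" leaves after 2009, person "C" enters in 2010.
yearly_ids = {2009: {"A", "B"}, 2010: {"B", "C"}}
index, present = align_nodes(yearly_ids)

print(index)          # {'A': 0, 'B': 1, 'C': 2}
print(present[2009])  # [True, True, False]
print(present[2010])  # [False, True, True]
```

Because every year uses the same index, row i refers to the same person in each yearly matrix, and every matrix has as many rows and columns as the union (3 in this toy case), even for people who are absent in a given year.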

Structure of database

Each year YYYY has a separate folder, and each year folder contains three files: layers.csv, nodes.csv.gz, and edges.npz.
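For a quick look at the two tabular files of a year folder, the Python standard library is sufficient. The sketch below is not the mlnlib loading routine: the function name read_year_tables and the max_nodes parameter are hypothetical, comma-separated files with a header row are assumed, and no column names are assumed. The edges.npz file is deliberately left out, since it is meant to be read through mlnlib.

```python
import csv
import gzip
from itertools import islice
from pathlib import Path


def read_year_tables(year_dir, max_nodes=None):
    """Read layers.csv fully and (optionally only the first rows of)
    nodes.csv.gz from one year folder, as lists of dictionaries."""
    year_dir = Path(year_dir)

    # layers.csv is small: one row per layer, read it completely.
    with open(year_dir / "layers.csv", newline="", encoding="utf-8") as handle:
        layers = list(csv.DictReader(handle))

    # nodes.csv.gz covers the whole population: stream it in text mode and
    # stop after max_nodes rows when a limit is given.
    with gzip.open(year_dir / "nodes.csv.gz", mode="rt", newline="",
                   encoding="utf-8") as handle:
        nodes = list(islice(csv.DictReader(handle), max_nodes))

    return layers, nodes


# Usage (the folder name is a placeholder for an actual year folder):
# layers, first_nodes = read_year_tables("2015", max_nodes=5)
```

Limiting the number of node rows keeps such an inspection cheap; for analysis of a full yearly network, the memory-efficient loading provided by mlnlib is the intended route.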

Identifier
DOI https://doi.org/10.34894/8575OP
Related Identifier IsCitedBy https://doi.org/10.1038/s41598-023-36324-9
Related Identifier IsCitedBy https://doi.org/10.5281/ZENODO.10838866
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/8575OP
Provenance
Creator Bokányi, Eszter; Kazmina, Yuliia; Heemskerk, Eelke M.; Takes, Frank W.; PLANET-NL Team
Publisher DataverseNL
Contributor Bokányi, Eszter; PLANET-NL Team; Leiden University and University of Amsterdam through the PLANET-NL project; Centraal Bureau voor Statistiek
Publication Year 2025
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Bokányi, Eszter (Leiden University); PLANET-NL Team
Representation
Resource Type Dataset
Format application/pdf
Size 343273
Version 1.0
Discipline Agriculture, Forestry, Horticulture, Aquaculture; Agriculture, Forestry, Horticulture, Aquaculture and Veterinary Medicine; Life Sciences; Social Sciences; Social and Behavioural Sciences; Soil Sciences