This record (and the attached documentation, see Files) describes the content and structure of “Compressed Population-scale Network Netherlands 2009-2023”, a project-generated file available via the CBS Microdata environment. The dataset consists of yearly node-aligned multilayer network (MLN) objects for the entire resident population of the Netherlands. Edges are based on the person networks published within Microdata by CBS (BURENNETWERKTAB, COLLEGANETWERKTAB, HUISGENOTENNETWERKTAB, FAMILIENETWERKTAB, KLASGENOTENNETWERKTAB); basic node information and a layer decoding table are stored alongside the edge information.
The advantage of this data source over the original data is that MLN objects are small in size, encode all edge types (layers) between two people in a single location, and are memory-efficient when read with the corresponding mlnlib Python library. The mlnlib library also contains optimized procedures for further network analysis, sample selection and aggregation tasks, as well as exports to common network libraries and formats.
Each yearly network is directed and unweighted, and consists of edges of 32 different types (layers) belonging to 5 larger layer categories (called groups in the object) that correspond to the Microdata person network tables. The MLN objects follow the intermediate format required for fast loading by the mlnlib Python library (https://pypi.org/project/mlnlib), with each yearly network consisting of the following three files:
layers.csv (list of layers and layer groups),
nodes.csv.gz (node table with attributes, compressed CSV), and
edges.npz (sparse adjacency matrix whose 64‑bit unsigned integer entries encode layer presence in binary).
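The binary encoding of layer presence can be illustrated with a minimal sketch in plain Python. It assumes that bit k of a matrix entry corresponds to the k-th layer; the actual bit order is defined by layers.csv and is not reproduced here. The layer names and the helper functions encode_layers and decode_layers are hypothetical and are not part of mlnlib.

```python
# Hypothetical layer names; the real names and their order come from layers.csv.
LAYERS = ["family_parent", "household_member", "neighbour", "colleague", "classmate"]


def encode_layers(layer_indices):
    """Pack a collection of layer indices (0..63) into one 64-bit value."""
    value = 0
    for k in layer_indices:
        if not 0 <= k < 64:
            raise ValueError(f"layer index {k} does not fit in 64 bits")
        value |= 1 << k  # set bit k to mark layer k as present
    return value


def decode_layers(value):
    """Return the sorted layer indices whose bits are set in `value`."""
    return [k for k in range(64) if (value >> k) & 1]


# An edge that exists in layer 0 and layer 3 is stored as a single integer.
edge_value = encode_layers([0, 3])
print(edge_value)                                      # 9 (binary 1001)
print([LAYERS[k] for k in decode_layers(edge_value)])  # ['family_parent', 'colleague']
```

With 32 layers, every combination of layers between two people fits comfortably in one 64-bit entry.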
The mlnlib library, which uses this as its native format, can be installed through ‘pip’ in a Python environment; as such, it can be requested when setting up a Microdata project.
Please check "Notes" for the general description of the micro databases, such as the definition of the population, methodological details, and the quality and origin of the data. "Notes" also contains a description of the format of the database and the list of all possible values of the categorical variables and their meaning.
This dataset is only available under strict conditions within the Microdata environment at CBS. Since this is a project-generated dataset made available for reuse via the CBS Data Storage, access to the following source CBS files has to be requested:
BURENNETWERKTAB 2009–2023
COLLEGANETWERKTAB 2009–2023
FAMILIENETWERKTAB 2009–2023
HUISGENOTENNETWERKTAB 2009–2023
KLASGENOTENNETWERKTAB 2009–2023
GBAPERSOONTAB 2009–2023
KINDOUDERTAB 2024
GBAADRESOBJECTBUS 2009–2023
GBAOVERLIJDENTAB 2022–2024
(see full list under "Data Source"). Read more about the conditions to access and use the CBS data: www.cbs.nl/nl-nl/onze-diensten/maatwerk-en-microdata/microdata-zelf-onderzoek-doen
Description
This dataset was constructed from the five CBS person network microdata layers with additional information on the underlying population for each calendar year 2009–2023. The construction uses the conversion toolchain from the repository https://github.com/planet-nl/compressed_population_scale_network_nl, which reads the original edge lists from the RA along with some node information, and produces per‑year MLN objects that are fast to load. The conversion aligns nodes across layers and years, encodes multilayer edges in a single sparse adjacency matrix using binary encoding for the layers, merges node attributes from multiple CBS sources into nodes.csv.gz, and produces analysis‑ready files per year (layers.csv, nodes.csv.gz, edges.npz).
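The idea of storing several layers in one adjacency structure can be sketched as follows. This is an illustration only, not the code of the conversion toolchain: it uses a plain dictionary keyed by (source, target) instead of a compressed sparse matrix, and the function name merge_layers and the toy edge lists are hypothetical.

```python
from collections import defaultdict


def merge_layers(edge_lists):
    """Combine per-layer edge lists into one {(source, target): bitmask} mapping.

    `edge_lists` maps a layer index (used as bit position) to an iterable of
    (source, target) pairs of aligned node indices.
    """
    merged = defaultdict(int)
    for layer_index, edges in edge_lists.items():
        bit = 1 << layer_index
        for source, target in edges:
            # OR-ing keeps the layers already recorded for this node pair.
            merged[(source, target)] |= bit
    return dict(merged)


# Toy input: two layers over three nodes (directed edges).
toy = {
    0: [(0, 1), (1, 0)],
    1: [(0, 1), (2, 0)],
}
combined = merge_layers(toy)
print(combined)  # {(0, 1): 3, (1, 0): 1, (2, 0): 2}
```

The pair (0, 1) occurs in both layers and is stored once with value 3 (bits 0 and 1 set), which is why the combined structure needs only one entry per connected pair regardless of the number of layers.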
Performance & footprint:
Each yearly MLN typically loads into RA memory in ~1–2 minutes (depending on year size).
Storage footprint is reduced to ~6% of the raw source edge‑list volume by using compressed sparse matrices and compressed node tables.
Reproducibility:
The RA conversion pipeline is fully scripted and can be re‑run automatically for additional years. Configuration files (JSON) define file linkages and layer definitions. Its code is available in the following repository: https://github.com/planet-nl/compressed_population_scale_network_nl
Public code for MLN analysis and for the generic CSV→MLN conversion used in the above toolchain is available via pip (https://pypi.org/project/mlnlib/) and on the PLANET-NL GitHub: https://github.com/planet-nl/mlnlib.
Description of the population
Year YYYY: all people alive and registered in the BRP with an active address on January 1 in year YYYY.
All years: The network nodes are aligned based on the union of the yearly populations between 2009 and 2023. The full population of the dataset is therefore everyone who was alive and had an active address on at least one January 1 between 2009 and 2023. The number of rows and columns in each yearly edges.npz adjacency matrix corresponds to the number of people in this unioned population.
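Node alignment over the union of yearly populations can be sketched as below. The identifiers, the sorted ordering, and the function name align_nodes are assumptions made for illustration; the actual ordering of nodes is defined by the nodes.csv.gz files.

```python
def align_nodes(yearly_populations):
    """Build one shared node index from the union of all yearly populations.

    `yearly_populations` maps a year to the set of person identifiers
    present on January 1 of that year.
    """
    all_ids = sorted(set().union(*yearly_populations.values()))
    index_of = {person_id: i for i, person_id in enumerate(all_ids)}
    return all_ids, index_of


# Toy populations with hypothetical identifiers.
populations = {
    2009: {"A", "B"},
    2010: {"B", "C"},
    2011: {"C", "D"},
}
ids, index_of = align_nodes(populations)
n = len(ids)  # every yearly adjacency matrix would be n x n

print(ids)            # ['A', 'B', 'C', 'D']
print(index_of["C"])  # 2
print(n)              # 4
```

Because the index is shared, person "C" occupies row and column 2 in every year, including 2009, where that person has no edges. This is what makes matrices from different years directly comparable.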
Structure of database
Each year YYYY has a separate folder, and each year folder contains three files: layers.csv, nodes.csv.gz, and edges.npz.
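A small helper for locating the three files of one year might look like the sketch below. It assumes that each year folder is named by the four-digit year directly under a common root directory; that naming, the root path, and the helper names year_files and missing_files are assumptions, not something this record specifies.

```python
from pathlib import Path

# File names as listed in this record.
EXPECTED_FILES = ("layers.csv", "nodes.csv.gz", "edges.npz")


def year_files(root, year):
    """Return the expected file paths for one yearly network."""
    folder = Path(root) / str(year)  # assumed folder name: the year itself
    return {name: folder / name for name in EXPECTED_FILES}


def missing_files(root, year):
    """List the expected files that are not present on disk."""
    return [
        name
        for name, path in year_files(root, year).items()
        if not path.is_file()
    ]


# Hypothetical root directory; replace with the actual project location.
paths = year_files("mln_data", 2015)
print(paths["edges.npz"])
print(missing_files("mln_data", 2015))
```

Checking for missing files before loading gives a clearer error than a failed read partway through a multi-year analysis.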