Abstract copyright UK Data Service and data collection copyright owner.
The Annual Survey of Hours and Earnings, 2020: Synthetic Data Pilot is a synthetic version of the Annual Survey of Hours and Earnings (ASHE) study available via Trusted Research Environments (TREs). ASHE is one of the most extensive surveys of the earnings of individuals in the UK. Data on the wages, paid hours of work, and pension arrangements of nearly one per cent of the working population are collected. Other variables relating to age, occupation and industrial classification are also available. The ASHE sample is drawn from National Insurance records for working individuals, and the survey forms are sent to their respective employers to complete. ASHE is available for research projects demonstrating public good to accredited or approved researchers via TREs such as the Office for National Statistics Secure Research Service (SRS) or the UK Data Service Secure Lab (at SN 6689). To access collections stored within TREs, researchers need to undergo an accreditation process. Gaining access to data in a secure environment can be time- and resource-intensive. This pilot has created a low-fidelity, low-disclosure-risk synthetic version of ASHE data, which can be made available to researchers more quickly while they wait for access to the real data.

The synthetic data were created using the Synthpop package in R. The sample method was used; this takes a simple random sample with replacement from the real values. The project was carried out between 19th December 2022 and 3rd January 2023. Further information is available within the documentation. User feedback received through this pilot will help the ONS to maximise the benefits of data access and further explore the feasibility of synthesising more data in future.
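As a rough illustration of the sample method described above, the following Python sketch resamples values with replacement from a small set of invented records. It is not the R code used in the pilot. It assumes that each variable is resampled independently of the others, which is one reading of "low fidelity"; the function name `synthesise_by_sampling`, the variable names and the toy records are all hypothetical and contain no real ASHE values.

```python
import random


def synthesise_by_sampling(records, seed=None):
    """Return synthetic records built by resampling each variable
    independently, with replacement, from the observed values.

    Assumption: variables are treated independently, so relationships
    between them (e.g. age and pay) are not preserved.
    """
    if not records:
        return []
    rng = random.Random(seed)
    n = len(records)
    columns = list(records[0].keys())
    # Draw n values per variable, with replacement, from the observed values.
    sampled = {
        col: rng.choices([row[col] for row in records], k=n)
        for col in columns
    }
    # Reassemble the independently sampled columns into row records.
    return [{col: sampled[col][i] for col in columns} for i in range(n)]


# Invented toy records; not real ASHE data.
real = [
    {"age": 34, "hourly_pay": 15.20, "full_time": True},
    {"age": 51, "hourly_pay": 22.75, "full_time": True},
    {"age": 23, "hourly_pay": 10.50, "full_time": False},
    {"age": 45, "hourly_pay": 18.00, "full_time": False},
]

synthetic = synthesise_by_sampling(real, seed=1)
```

Every synthetic value in this sketch is a value that occurs in the input for the same variable, but a synthetic row as a whole need not match any input row.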
Main Topics:
The ASHE synthetic data contain the same variables as ASHE for each individual, relating to wages, hours of work, pension arrangements, and occupation and industrial classifications. There are also variables for age, gender and full/part-time status. Because ASHE data are collected by the employer, there are also variables relating to the organisation employing the individual. These include employment size and legal status (e.g. public company). Various geography variables are included in the data files. The year variable in this synthetic dataset is 2020.
Simple random sample
Compilation/Synthesis