Synthetic Dataset of Crimes in England and Wales

Dataset

This resource includes the code needed to generate a synthetic dataset of all crimes that occurred in each output area in England and Wales in 2011. Counts of violence, property crime and criminal damage can be generated, and three different approaches to counting crime are possible: synthetic data of all crimes, synthetic data of police recorded crimes, and synthetic data of survey estimated crimes. Once the crime counts have been generated at output area level, they can be aggregated to any spatial scale of interest. Crime counts are synthesised by predicting individual victimisation propensities using the Crime Survey for England and Wales (2011), then mapping these propensities on to individuals (and households) based on population counts from the UK census.

There is probably no other scientific endeavour more relevant to the field of Criminology than to count crime accurately. Crime estimates are central to policy. They are used in the allocation of police resources, and more generally they are a central theme of political debate, with apparent increases in crime serving as an indictment of existing law and order policies. Academics also make regular use of crime statistics in their work, both seeking to understand why some places and people are more prone to crime, and using variations in crime to help explain other social outcomes. Members of the public also refer to this information. For example, historic crime trends are now included on many house-buying websites.

Currently, there are two main ways of estimating the amount of crime: directly, using police records of incidents that the police are aware of; and approximately, using victimisation surveys like the Crime Survey for England and Wales, where a sample of people are asked to report any victimisations in the past year. Theoretical work has highlighted a number of sources of potential error in these data, suggesting that both approaches are deficient. However, we currently lack an empirically robust quantification of the different sources of error in each. Nor do we fully understand the potential impact that these errors might have on the estimates from analyses that make use of these data, although evidence from other fields suggests that it is likely to be at least substantial.

In this project we will use cutting edge statistical models developed in the fields of epidemiology, biostatistics and survey research to estimate and adjust for problems of measurement error present in police recorded crime and crime survey data. Drawing on data from 2011 to 2019, we will show the extent of systematic bias and random error in these two data sources, and how these errors may have evolved over time. Once the examination of measurement error in crime data is completed, we will use our findings to generate adjusted counts of crime across England and Wales, providing a unique picture of how different crimes vary across space and time. Finally, we will use these new crime estimates in tandem with 'off the shelf' measurement error adjustment techniques to demonstrate the potential influence that measurement error has on the findings of existing research. Alongside this rigorous empirical work, we will also engage in a range of capacity building exercises to furnish researchers with the necessary skills to incorporate measurement error adjustments in their own work with crime data.
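
The description above notes that counts generated at output area level can be aggregated to any spatial scale of interest. As a minimal sketch of what that aggregation might look like, assuming a lookup table from output areas to larger areas is available, the Python below sums counts by area. The function name `aggregate_counts`, the area codes and the counts are all hypothetical placeholders and are not taken from the dataset or its code.

```python
from collections import defaultdict


def aggregate_counts(oa_counts, oa_to_area):
    """Sum output-area crime counts into larger areas using a lookup table."""
    totals = defaultdict(int)
    for oa_code, count in oa_counts.items():
        # Each output area contributes its count to the area it belongs to.
        totals[oa_to_area[oa_code]] += count
    return dict(totals)


# Hypothetical codes and counts, for illustration only.
oa_counts = {"OA_001": 3, "OA_002": 0, "OA_003": 5}
oa_to_area = {"OA_001": "AREA_A", "OA_002": "AREA_A", "OA_003": "AREA_B"}

area_counts = aggregate_counts(oa_counts, oa_to_area)
print(area_counts)
```

The same pattern would apply to any nesting geography, provided every output area appears in the lookup.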

Data are synthetic. The following steps were followed to generate a synthetic dataset of crimes in England and Wales:

1. Download Census data aggregates at the Output Area level under an Open Government Licence.
2. Download microdata of the Crime Survey for England and Wales (CSEW) 2011/12 from the UK Data Service.
3. Generate a synthetic population of residents (or households) in Output Areas based on empirical parameters observed in Census data and the covariance matrix observed in the CSEW.
4. Based on parameters from the CSEW 2011/12, generate crimes (violence, property crime and damage) reported within each unit in the synthetic population.
5. Based on parameters from the CSEW 2011/12, predict whether each crime generated in Step 4 is known to, and recorded by, the police (this will be the synthetic dataset of police-recorded crimes).
6. Draw a random sample of units from the synthetic population following the sampling design of the CSEW (this will be the synthetic dataset of crimes recorded by the CSEW).

This generates three sets of synthetic crime data, which can then be compared at different spatial scales:

i) 'synthetic_population_crimes.RData': synthetic data of all crime, split into 7 files (generated in Step 4)
ii) 'synthetic_police_crimes.RData': synthetic data of police-recorded crime (generated in Step 5)
iii) 'synthetic_survey_crimes.RData': synthetic data of survey-recorded crime (generated in Step 6)
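
To make the logic of Steps 3 to 6 concrete, the following Python sketch runs a heavily simplified version of the pipeline. It is not the authors' code, and it makes several assumptions that the source does not state: every resident has the same expected number of crimes (`mean_crimes`), crime counts are Poisson distributed, each crime is recorded by the police independently with a fixed probability (`p_recorded`), and the survey is a simple random sample rather than a reproduction of the CSEW sampling design. All names and parameter values are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson variate using Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def synthesise(oa_population, mean_crimes, p_recorded, sample_fraction, seed=0):
    """Return (synthetic population, survey sample) as lists of dicts."""
    rng = random.Random(seed)
    population = []
    for oa_code, n_residents in oa_population.items():
        for _ in range(n_residents):
            # Step 4 (simplified): crimes experienced by this synthetic resident.
            crimes = poisson(rng, mean_crimes)
            # Step 5 (simplified): each crime is police-recorded with a fixed probability.
            recorded = sum(rng.random() < p_recorded for _ in range(crimes))
            population.append(
                {"oa": oa_code, "crimes": crimes, "police_recorded": recorded}
            )
    # Step 6 (simplified): a simple random sample stands in for the survey design.
    n_sample = round(sample_fraction * len(population))
    survey = rng.sample(population, n_sample)
    return population, survey


# Hypothetical output areas and parameter values, for illustration only.
oa_population = {"OA_001": 200, "OA_002": 150}
population, survey = synthesise(
    oa_population, mean_crimes=0.1, p_recorded=0.4, sample_fraction=0.1
)
print(sum(unit["crimes"] for unit in population))
print(sum(unit["police_recorded"] for unit in population))
print(len(survey))
```

In the dataset itself, victimisation propensities vary between individuals and are estimated from the CSEW, so the constant rates used here serve only to show how the three datasets relate to one another.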

Identifier
DOI	https://doi.org/10.5255/UKDA-SN-857314
Metadata Access	https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=022c66d6eb20f463ee7a1f6bba74c2a73f577291d8f17259407ccb789a94e829

Provenance
Creator	Brunton-Smith, I, University of Surrey; Buil-Gil, D, University of Manchester; Pina-Sanchez, J, University of Leeds; Cernat, A, University of Manchester; Moretti, A, Utrecht University
Publisher	UK Data Service
Publication Year	2024
Funding Reference	Economic and Social Research Council
Rights	David Buil-Gil, University of Manchester; The Data Collection is available from an external repository. Access is available via Related Resources.
OpenAccess	true

Representation
Resource Type	Numeric; Geospatial
Discipline	Jurisprudence; Law; Social and Behavioural Sciences
Spatial Coverage	census output areas across England and Wales; England and Wales