This resource includes the code needed to generate a synthetic dataset of all crimes that occurred in each Output Area in England and Wales in 2011. Counts of violence, property crime and criminal damage can be generated, and three different approaches to counting crime are possible: synthetic data of all crimes, synthetic data of police-recorded crimes, and synthetic data of survey-estimated crimes. Once generated at Output Area level, the crime counts can be aggregated to any spatial scale of interest. Crime counts are synthesised by predicting individual victimisation propensities using the Crime Survey for England and Wales (2011), then mapping these propensities on to individuals (and households) based on population counts from the UK census.

There is probably no other scientific endeavour more relevant to the field of Criminology than counting crime accurately. Crime estimates are central to policy. They are used in the allocation of police resources, and more generally they are a central theme of political debate, with apparent increases in crime serving as an indictment of existing law and order policies. Academics also make regular use of crime statistics in their work, both seeking to understand why some places and people are more prone to crime, and using variations in crime to help explain other social outcomes. Members of the public also refer to this information. For example, historic crime trends are now included on many house-buying websites. Currently, there are two main ways of estimating the amount of crime: directly, using police records of incidents that the police are aware of; and approximately, using victimisation surveys like the Crime Survey for England and Wales, where a sample of people are asked to report any victimisations in the past year. Theoretical work has highlighted a number of sources of potential error in these data, suggesting that both approaches are deficient.
However, we currently lack an empirically robust quantification of the different sources of error in each. Nor do we fully understand the potential impact that these errors might have on the estimates from analyses that make use of these data, although evidence from other fields suggests that it may be substantial. In this project we will use cutting-edge statistical models developed in the fields of epidemiology, biostatistics and survey research to estimate and adjust for the measurement error present in police recorded crime and crime survey data. Drawing on data from 2011 to 2019, we will show the extent of systematic bias and random error in these two data sources, and how these errors may have evolved over time. Once the examination of measurement error in crime data is complete, we will use our findings to generate adjusted counts of crime across England and Wales, providing a unique picture of how different crimes vary across space and time. Finally, we will use these new crime estimates in tandem with 'off the shelf' measurement error adjustment techniques to demonstrate the potential influence that measurement error has on the findings of existing research. Alongside this rigorous empirical work, we will also engage in a range of capacity building exercises to furnish researchers with the necessary skills to incorporate measurement error adjustments in their own work with crime data.
Data are synthetic. The following steps were followed to generate a synthetic dataset of crimes in England and Wales:

1. Download Census data aggregates at the Output Area level under an Open Government Licence.
2. Download microdata of the Crime Survey for England and Wales (CSEW) 2011/12 from the UK Data Service.
3. Generate a synthetic population of residents (or households) in Output Areas based on empirical parameters observed in Census data and the covariance matrix observed in the CSEW.
4. Based on parameters from the CSEW 2011/12, generate crimes (violence, property crime and damage) reported within each unit in the synthetic population.
5. Based on parameters from the CSEW 2011/12, predict whether each crime generated in Step 4 is known to, and recorded by, the police (this will be the synthetic dataset of police-recorded crimes).
6. Draw a random sample of units from the synthetic population following the sampling design of the CSEW (this will be the synthetic dataset of crimes recorded by the CSEW).

This generates three sets of synthetic crime data, which can then be compared at different spatial scales:

i) 'synthetic_population_crimes.RData': synthetic data of all crime, split into 7 files (generated in Step 4)
ii) 'synthetic_police_crimes.RData': synthetic data of police-recorded crime (generated in Step 5)
iii) 'synthetic_survey_crimes.RData': synthetic data of survey-recorded crime (generated in Step 6)
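The logic of Steps 3 to 6, and the later aggregation to larger areas, can be illustrated with a heavily simplified sketch. It is not the code distributed with this resource (the file names above indicate that the resource itself uses R). Every name and number below is a hypothetical placeholder: the victimisation rates and recording probabilities are invented rather than CSEW estimates, residents carry no covariates, and simple random sampling stands in for the CSEW sampling design.

```python
import math
import random
from collections import Counter

CRIME_TYPES = ("violence", "property", "damage")

# Hypothetical parameters for illustration only (NOT estimated from the CSEW).
VICTIM_RATE = {"violence": 0.05, "property": 0.10, "damage": 0.08}  # mean crimes per resident
RECORD_PROB = {"violence": 0.40, "property": 0.50, "damage": 0.30}  # P(recorded by police)


def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def synthesise(oa_populations, sample_fraction=0.01, seed=2011):
    """Return (all, police, survey) crime counts keyed by (output_area, crime_type).

    oa_populations maps an Output Area code to its number of residents.
    """
    rng = random.Random(seed)
    all_crimes, police, survey = Counter(), Counter(), Counter()
    for oa, n_residents in oa_populations.items():
        for _ in range(n_residents):            # Step 3: one synthetic resident
            sampled = rng.random() < sample_fraction  # Step 6: simple random sample
            for crime in CRIME_TYPES:
                n = poisson(rng, VICTIM_RATE[crime])  # Step 4: crimes experienced
                # Step 5: each crime is independently recorded by the police or not
                recorded = sum(rng.random() < RECORD_PROB[crime] for _ in range(n))
                all_crimes[(oa, crime)] += n
                police[(oa, crime)] += recorded
                if sampled:
                    survey[(oa, crime)] += n
    return all_crimes, police, survey


def aggregate(counts, lookup):
    """Sum Output Area counts to larger areas using an OA -> area lookup."""
    out = Counter()
    for (oa, crime), n in counts.items():
        out[(lookup[oa], crime)] += n
    return out


# Usage with made-up Output Area codes and a made-up lookup to a larger area.
populations = {"OA_001": 250, "OA_002": 310, "OA_003": 180}
all_crimes, police, survey = synthesise(populations, sample_fraction=0.1)
larger = aggregate(all_crimes, {"OA_001": "AREA_A", "OA_002": "AREA_A", "OA_003": "AREA_B"})
```

Because police-recorded and survey-recorded counts are both derived from the same set of generated crimes, the three outputs can be compared directly at any level of aggregation, which is the comparison the three synthetic datasets are designed to support.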