This house price per square metre dataset was created on 1/4/2021 and is based on the LR PPD, Domestic EPCs and NSPL downloaded on the same day. It covers over 18 million transactions with 104 variables in England and Wales between 1/1/1995 and 26/2/2021. Of the 104 variables, 16 come from the LR PPD, 84 come from the Domestic EPCs, one (lad21cd) comes from the NSPL and three (id, classt and priceper) are created by the first author. Before the data linkage, a unique identifier (id) is created for each unique EPC after removing the individual lodgement identifier (the LMK_KEY variable). During the data linkage, a variable named classt is created to identify 1:1 and 1:n linkage relationships. After the data linkage, a derived house price per square metre variable (priceper) is obtained by dividing the transaction price paid in the LR PPD by the total floor area variable in the EPC dataset. The NSPL (May 2021 version) is used to assign the local authority unit (lad21cd) to the house price per square metre dataset. During the data linkage process, transactions in the LR PPD assigned to category B (Additional Price Paid entry) or to the 'Other' property type are removed. This version of the dataset, unlike the previous version, can be described as ‘uncorrected’ because we have not removed transactions with improbable price per square metre values (for example, where the total floor area is null or 0). This uncorrected version offers researchers the most flexibility, and they are advised to clean it according to their research needs. This repository covers an updated but uncorrected version of the attribute-linked residential property price dataset in UK Data Service ReShare 854240 (https://reshare.ukdataservice.ac.uk/854240/).
It is also the full uncorrected version of the open access (limited attribute) house price per square metre dataset published by local authority in the Greater London Authority (GLA) London Datastore (https://data.london.gov.uk/dataset/house-price-per-square-metre-in-england-and-wales). This linked dataset contains individual property transactions and associated variables from the Land Registry Price Paid Dataset (LR PPD), linked at address level to all attributes in Version VI of the Domestic Energy Performance Certificate (EPC) data published by the Ministry for Housing, Communities and Local Government (MHCLG), other than the individual lodgement identifier, address and postcode attributes. The linked data in this repository is the uncorrected version, recording over 18 million transactions with 104 variables in England and Wales between 1/1/1995 and 26/2/2021. We provide technical validation and data cleaning code in UKDA ReShare 854240 to help users evaluate how representative the linked data is for a given time period. The data cleaning code shows our method for removing unlikely floor size records before the data is used in analysis. Users can define their own rules and carry out this clean-up according to their own experience and research aims. This repository also includes the original LR PPD and Domestic EPCs used to create the linked data (the house price per square metre dataset). The LR PPD in this repository is the open access LR PPD with field headers added. The Domestic EPCs in this repository have had six variables removed (individual lodgement identifier, address, address 1, address 2, address 3, postcode) and a newly created unique identifier (id) added. This id column is newly created for the Version VI Domestic EPCs and is not the same as the id in the Domestic EPCs from UK Data Service ReShare 854240.
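As a rough illustration of how a user might derive priceper and apply their own clean-up rule, consider the minimal Python sketch below. It is not the cleaning code published in ReShare 854240. The column names `price` and `tfarea`, the sample rows and the floor-area bounds (10 to 1000 square metres) are assumptions chosen only for this example.

```python
from typing import Optional


def price_per_sqm(price: float, floor_area: Optional[float]) -> Optional[float]:
    """Return price divided by floor area, or None if the area is null or not positive."""
    if floor_area is None or floor_area <= 0:
        return None
    return price / floor_area


def clean(records, min_area=10.0, max_area=1000.0):
    """Keep records whose floor area lies inside an assumed plausible range.

    The bounds are illustrative defaults, not values taken from the dataset
    documentation; users should set them to suit their own research aims.
    """
    kept = []
    for rec in records:
        area = rec.get("tfarea")
        if area is None or not (min_area <= area <= max_area):
            continue  # drop null, zero or implausible floor areas
        kept.append({**rec, "priceper": rec["price"] / area})
    return kept


# Hypothetical rows: one plausible record and three that a user might drop.
sample = [
    {"id": 1, "price": 250000, "tfarea": 100.0},
    {"id": 2, "price": 180000, "tfarea": 0.0},
    {"id": 3, "price": 320000, "tfarea": None},
    {"id": 4, "price": 150000, "tfarea": 2.0},
]
cleaned = clean(sample)
```

With these assumed bounds, only the first sample row survives. Tighter or looser thresholds can be passed through `min_area` and `max_area`.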
The LR PPD dataset is open and available online (https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads). The LR PPD records 25,914,817 transactions in England and Wales between 1/1/1995 and 26/02/2021. The Domestic Energy Performance Certificates (EPCs) dataset is open and available online from the Ministry for Housing, Communities and Local Government (MHCLG) (https://epc.opendatacommunities.org/). The Domestic EPCs dataset downloaded on 1/4/2021 is the sixth released version and contains EPCs issued between 1/10/2008 and 20/9/2020, comprising 18,575,357 energy performance records with 85 fields. Both datasets contain property information at address level, but their address structures differ, so a four-stage matching method (251 matching rules) was designed to link them. Details of the data linkage are published in a UCL Open Environment paper (https://ucl.scienceopen.com/hosted-document?doi=10.14324/111.444/ucloe.000019). The linkage methodology used to create this version of the data is the same as that in UK Data Service ReShare (https://reshare.ukdataservice.ac.uk/854942/).
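To give a feel for what address-level linkage between two differently structured sources involves, the sketch below matches records on a normalised postcode and address string and labels each link as 1:1 or 1:n. It is a deliberately simplified, hypothetical example and does not reproduce any of the 251 published matching rules. The field names (`tx`, `address`, `postcode`, `relation`), the normalisation steps and the sample addresses are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def normalise(text: str) -> str:
    """Upper-case the text, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^A-Z0-9 ]", " ", text.upper())
    return " ".join(text.split())


def link(transactions, epcs):
    """Link each transaction to the EPCs that share its normalised postcode and address."""
    index = defaultdict(list)
    for epc in epcs:
        key = (normalise(epc["postcode"]), normalise(epc["address"]))
        index[key].append(epc["id"])

    links = []
    for tx in transactions:
        key = (normalise(tx["postcode"]), normalise(tx["address"]))
        matches = index.get(key, [])
        if matches:
            # Hypothetical label: one matched EPC is "1:1", several are "1:n".
            relation = "1:1" if len(matches) == 1 else "1:n"
            links.append({"tx": tx["tx"], "epc_ids": matches, "relation": relation})
    return links


# Made-up records; the addresses and postcodes are placeholders.
transactions = [
    {"tx": "A", "address": "Flat 2, 10 High St.", "postcode": "ab1 2cd"},
    {"tx": "B", "address": "5 Mill Lane", "postcode": "AB1 2CE"},
    {"tx": "C", "address": "7 Park Road", "postcode": "AB1 2CF"},
]
epcs = [
    {"id": 1, "address": "FLAT 2 10 HIGH ST", "postcode": "AB1 2CD"},
    {"id": 2, "address": "5, Mill Lane", "postcode": "AB1 2CE"},
    {"id": 3, "address": "5 MILL LANE", "postcode": "AB1 2CE"},
]
links = link(transactions, epcs)
```

In this toy data, transaction A links to a single EPC, transaction B links to two EPCs for the same address, and transaction C stays unlinked. A single exact-match rule like this would miss many real address variants, which is presumably why the published method uses several stages and many rules.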