Understanding and Improving Data Linkage Consent in Surveys, 2018-2019

Dataset

Linking survey and administrative data offers the possibility of combining the strengths, and mitigating the weaknesses, of both. Such linkage is therefore an extremely promising basis for future empirical research in social science. For ethical and legal reasons, linking administrative data to survey responses will usually require obtaining explicit consent. It is well known that not all respondents give consent. Past research on consent has generated many null and inconsistent findings. A weakness of the existing literature is that little effort has been made to understand the cognitive processes of how respondents make the decision whether or not to consent. The overall aim of this project was to improve our understanding about how to pursue the twin goals of maximizing consent and ensuring that consent is genuinely informed. The ultimate objective is to strengthen the data infrastructure for social science and policy research in the UK.

Specific aims were:
1. To understand how respondents process requests for data linkage: which factors influence their understanding of data linkage, which factors influence their decision to consent, and to open the black box of consent decisions to begin to understand how respondents make the decision.
2. To develop and test methods of maximising consent in web surveys, by understanding why web respondents are less likely to give consent than face-to-face respondents.
3. To develop and test methods of maximising consent with requests for linkage to multiple data sets, by understanding how respondents process multiple requests.
4. As a by-product of testing hypotheses about the previous points, to test the effects of different approaches to wording consent questions on informed consent.

Our findings are based on a series of experiments conducted in four surveys using two different studies: The Understanding Society Innovation Panel (IP) and the PopulusLive online access panel (AP).
The Innovation Panel is part of Understanding Society: the UK Household Longitudinal Study. It is a probability sample of households in Great Britain used for methodological testing, with a design that mirrors that of the main Understanding Society survey. The Innovation Panel survey was conducted in wave 11, fielded in 2018. The Innovation Panel data are available from the UK Data Service (SN: 6849, http://doi.org/10.5255/UKDA-SN-6849-12).

Since the Innovation Panel sample size (around 2,900 respondents) constrained the number of experimental treatment groups we could implement, we fielded a parallel survey with additional experiments, using a different sample. PopulusLive is a non-probability online panel with around 130,000 active sample members, who are recruited through web advertising, word of mouth, and database partners. We used age, gender and education quotas to match the sample composition of the Innovation Panel.

A total of nine experiments were conducted across the two sample sources. Experiments 1 to 5 all used variations of a single consent question, about linkage to tax data (held by HM Revenue and Customs, HMRC). Experiments 6 and 7 also used single consent questions, but respondents were assigned to questions on linkage to either tax data or health data (held by the National Health Service, NHS). Experiments 8 and 9 used five different data linkage requests: tax data (held by HMRC), health data (held by the NHS), education data (held by the Department for Education in England, DfE, and equivalent departments in Scotland and Wales), household energy data (held by the Department for Business, Energy and Industrial Strategy, BEIS), and benefit and pensions data (held by the Department for Work and Pensions, DWP).

The experiments, and the survey(s) on which they were conducted, are briefly summarized here:
1. Easy vs. standard wording of consent request (IP and AP). Half the respondents were allocated to the ‘standard’ question wording, used previously in Understanding Society. The balance was allocated to an ‘easy’ version, where the text was rewritten to reduce reading difficulty and to provide all essential information about the linkage in the question text rather than an additional information leaflet.
2. Early vs. late placement of consent question (IP). Half the respondents were asked for consent early in the interview, the other half were asked at the end.
3. Web vs. face-to-face interview (IP). This experiment exploits the random assignment of IP cases to explore mode effects on consent.
4. Default question wording (AP). Experiment 4 tested a default approach to giving consent, asking respondents to “Press ‘next’ to continue” or explicitly opt out, versus the standard opt-in consent procedure.
5. Additional information question wording (AP). This experiment tested the effect of offering additional information, with a version that added a third response option (“I need more information before making a decision”) to the standard ‘yes’ or ‘no’ options.
6. Data linkage domain (AP). Half the respondents were assigned to a question asking for consent to link to HMRC data; the other half were asked for linkage to NHS data.
7. Trust priming (AP). This experiment was crossed with the data linkage domain experiment, and focused on the effect of priming trust on consent. Half the sample saw an additional statement: “HMRC / The NHS is a trusted data holder” on an introductory screen prior to the consent question. This was followed by an icon symbolizing data security: a shield and lock symbol with the heading “Trust”. The balance was not shown the additional statement or icon.
8. Format of multiple consents (AP). For one group, the five consent questions were each presented on a separate page, with respondents consenting to each in turn. For the second group the questions were all presented on one page; however, the respondent still had to answer each consent question individually. For the third group all five data requests were presented on a single page and the respondent answered a single yes/no question, whether they consented to all the linkages or not.
9. Order of multiple consents (AP). One version asked the five consent questions in ascending order of sensitivity of the request (based on previous data), with NHS asked first. The other version reversed the order, with consent to linkage to HMRC data asked first.

For all of the experiments described above, we examined the rates of consent. We also tested comprehension of the consent request, using a series of knowledge questions about the consent process. We also measured subjective understanding, to get a sense of how much respondents felt they understood about the request. Finally, we also ascertained subjective confidence in the decision they had made.

In addition to the experiments, we used digital audio-recordings of the IP11 face-to-face interviews (recorded with respondents’ permission) to explore how interviewers communicate the consent request to respondents, whether and how they provide additional information or attempt to persuade respondents to consent, and whether respondents raise questions when asked for consent to data linkage.

Key Findings

Correlates of consent:
(1) Respondents who have better understanding of the data linkage request (as measured by a set of knowledge questions) are also more likely to consent.
(2) As in previous studies, we find no socio-demographic characteristics that consistently predict consent in all samples. The only consistent predictors are positive attitudes towards data sharing, trust in HMRC, and knowledge of what data HMRC have.
(3) Respondents are less likely to consent to data linkage if the wording of the request is difficult and the question is asked late in the questionnaire. Position has no effect on consent if the wording is easy; wording has no effect on consent if the position is early.
(4) Priming respondents to think about trust in the organisations involved in the data linkage increases consent.
(5) The only socio-demographic characteristic that consistently predicts objective understanding of the linkage request is education. Understanding is positively associated with the number of online data sharing behaviours (e.g., posting text or images on social media, downloading apps, online purchases or banking) and with trust in HMRC.
(6) Easy wording of the consent question increases objective understanding of the linkage request. Position of the consent question in the questionnaire has no effect on understanding.

The consent decision process:
(7) Respondents decide about the consent request in different ways: some use more reflective decision-making strategies, others use less reflective strategies.
(8) Different decision processes are associated with very different levels of consent, comprehension, and confidence in the consent decision.
(9) Placing the consent request earlier in the survey increases the probability of the respondent using a reflective decision-making process.

Effects of mode of data collection on consent:
(10) As in previous studies, respondents are less likely to consent online than with an interviewer.
(11) Web respondents have lower levels of understanding than face-to-face respondents.
(12) There is no difference by mode in respondents’ confidence in their decisions.
(13) Web respondents report higher levels of concern about data security than face-to-face respondents.
(14) Web respondents are less likely to use reflective strategies to make their decision than face-to-face respondents, and instead more likely to make habit-based decisions.
(15) Easier wording of the consent request does not reduce mode effects on rates of consent.
(16) Respondents rarely ask questions and interviewers rarely provide additional information.

Multiple consent requests:
(17) The format in which a sequence of consent requests is asked does not seem to matter.
(18) The order of multiple consent requests affects consent rates, but not in a consistent way.
(19) Objective knowledge, subjective understanding and subjective confidence in the decision do not differ much by order and format of sequential consent requests.
(20) The order effects of multiple consent requests from Study 1 do not replicate in Study 2.

Conclusions and Recommendation

This series of studies has shed light on some of the processes underlying the consent process and offered a theoretical framework for better understanding how the consent decision is made. The different decision processes employed by survey respondents are associated with different levels of consent, comprehension, and confidence in the consent decision. Generally, respondents reach a consent decision relatively quickly. Given this, simply providing more information on the consent process is unlikely to be effective. Rather, wording consent requests in an easy-to-read format and emphasising trust in the organisations involved will likely increase rates of consent without compromising understanding of the request or confidence in the decision.

This research has advanced our understanding of how the decision to consent to administrative data linkages is made. It points to the importance of understanding how respondents process the request for consent in different ways, suggesting that targeting different strategies based on respondents’ decision-making preferences may be effective at increasing informed consent. Our work also points to the importance of focusing not only on the outcome of the request (i.e., maximizing consent rates) but also on understanding how informed the consent is, measured both objectively and subjectively. However, more work remains to achieve the goal of maximizing informed consent to administrative record linkage in surveys, especially those administered online.

One of the most promising avenues for empirical social science research involves linking administrative or process-generated data with survey data. Administrative data (whether held by government or private entities) are useful on their own, but will be much more useful if we can use surveys to “fill the gaps”. Sometimes the gaps will be specific types of information (e.g. administrative data do not contain information on expectations or subjective wellbeing), and sometimes it will be to provide a suitable frame to allow inference to the general population (especially in the UK where there is not an appropriate individual identifier, or register, to provide a frame). In the UK, survey data can only be linked to administrative or other process-generated data if survey respondents give informed consent to the linkage. Previous research suggests that people do not have strong fixed views on consent and that the decision to consent can be influenced. Our aims are to examine which factors influence the decision to consent and to develop and test ways of maximising informed consent, in particular in web surveys and when consent for linkages to multiple datasets is requested. We will design experiments to test whether different features of the consent request are effective for different types of people, to measure the respondent decision-making process, to ascertain how informed the consent decision is, whether and how informed consent varies with the experimental treatments and respondent characteristics, and how it differs between face-to-face interviews and self-completion web surveys.

The data were collected on two independent samples from the PopulusLive online access panel. The first sample was surveyed twice, with a one-year interval. The first wave (AP1-1) was fielded in June 2018 and included eleven experimental conditions with n~500 respondents each. A total of 46,206 panelists were invited to AP1-1, of whom 6,532 started the survey and 5,633 completed it (401 broke off and 498 were screened out), for a survey response rate of 12.2%. To track changes in consent over time, four of these eleven experimental groups were re-interviewed about a year later (AP1-2). Of the 2,053 panelists invited to AP1-2, 1,693 started the survey and 1,630 completed it, for a response rate of 79.4%.

As a follow-up to the results from these two surveys, a second sample was drawn (AP2) and surveyed, with eight experimental groups designed to address further research questions. This sample was fielded in December 2019. A total of 30,682 panelists were invited to AP2, of whom 6,459 started the survey and 3,850 completed it (301 broke off and 2,308 were screened out), for a response rate of 21.1%.

The samples were restricted to Great Britain with quotas to match the composition of the Understanding Society Innovation Panel: gender (50% male, 50% female), age (33% 16-40, 33% 41-59, 33% 60+), and highest educational qualification (40% degree or equivalent, 20% A-level or equivalent, 40% GCSE or lower). All surveys included either a single question asking for consent to link the survey data to government administrative records or a set of five consent questions, as well as background questions on socio-demographics, understanding of the linkage request, perceived sensitivity of the consent request, trust in data holding institutions, and general data sharing attitudes and behaviours. Depending on experimental group, median times for completion of the questionnaire ranged between 9 and 12 minutes (in AP1-1).
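
For readers who want to check the fieldwork figures, the following is a minimal sketch that recomputes two simple rates from the counts reported above. The class and method names (`Fieldwork`, `completion_rate`, `start_rate`) are illustrative assumptions, not part of the deposited data or documentation, and the record does not state which response rate definition was used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fieldwork:
    """Counts reported in the record for one survey (hypothetical container)."""
    invited: int
    started: int
    completed: int

    def completion_rate(self) -> float:
        # Completed interviews as a percentage of invited panelists.
        return 100 * self.completed / self.invited

    def start_rate(self) -> float:
        # Started interviews as a percentage of invited panelists.
        return 100 * self.started / self.invited


# Counts copied from the description above.
SURVEYS = {
    "AP1-1": Fieldwork(invited=46_206, started=6_532, completed=5_633),
    "AP1-2": Fieldwork(invited=2_053, started=1_693, completed=1_630),
    "AP2": Fieldwork(invited=30_682, started=6_459, completed=3_850),
}

if __name__ == "__main__":
    for name, survey in SURVEYS.items():
        print(
            f"{name}: completed/invited = {survey.completion_rate():.1f}%, "
            f"started/invited = {survey.start_rate():.1f}%"
        )
```

Under these assumed definitions, the reported rates for AP1-1 (12.2%) and AP1-2 (79.4%) correspond to completed interviews divided by invitations. The reported AP2 rate (21.1%) corresponds to started interviews divided by invitations; completed divided by invited would be about 12.5% for AP2. Users comparing response rates across the three surveys may therefore want to recompute them on a common basis.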

Identifier
DOI	https://doi.org/10.5255/UKDA-SN-855036
Metadata Access	https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=5142c936d7f618050905444f5657c5e96d0622e42142210830f5cb4e7e491bcc

Provenance
Creator	Jäckle, A, University of Essex; Burton, J, University of Essex; Couper, M, University of Michigan; Crossley, T, European University Institute
Publisher	UK Data Service
Publication Year	2021
Funding Reference	Nuffield Foundation; Economic and Social Research Council
Rights	Annette Jäckle, University of Essex. Jonathan Burton, University of Essex. Mick Couper, University of Michigan. Thomas Crossley, European University Institute. Sandra Walzenbach, University of Konstanz; The Data Collection is available for download to users registered with the UK Data Service.
OpenAccess	true

Representation
Resource Type	Numeric
Discipline	Economics; Social and Behavioural Sciences
Spatial Coverage	Great Britain