Interactions on social media have the potential to help us understand human behaviour, including the development of both good and poor mental health. However, to do the best science we need to know as much as possible about the people who participate in our research. The CLOSER group of UK longitudinal cohorts includes people who have contributed their data to research since birth. By inviting participants in these cohorts to also allow us to derive information from their social media feeds, we will be able to relate this information to gold-standard measures of the behaviours we are trying to understand, and to world-class data on other aspects of their lives. To work out the best way to do this, our project will engage with participants in the Children of the '90s cohort to find out what is acceptable to them in terms of collecting and using their social media interactions. We will use what we learn to develop software that collects and codes social media data while protecting participants' anonymity: Tweets are scored without the text being made available to researchers. We will share this software with other CLOSER cohorts to make it easy for them to invite participants to contribute their Twitter data safely and securely. The high-resolution data collected in this way will help us to understand human behaviour and how mental health changes over time. Collecting these data in well-characterised groups of people will also give scientists the information they need to improve the quality of all research using social media.
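As a minimal sketch of what scoring Tweets without making the text available to researchers could look like, the code below scores each post inside the collection step and emits only a pseudonymous ID, a timestamp and a numeric score. The toy word lists, the HMAC-SHA256 keyed hash and the names `derive`, `score_text`, `pseudonymise` and `DerivedRecord` are all hypothetical assumptions; the project's actual scoring method is not described here.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical toy lexicon (assumption); a real pipeline would use a
# validated sentiment model or dictionary instead.
POSITIVE = {"happy", "great", "love", "good"}
NEGATIVE = {"sad", "awful", "hate", "bad"}


@dataclass(frozen=True)
class DerivedRecord:
    """What a researcher would receive: no tweet text and no Twitter ID."""
    pseudo_id: str
    created_at: str
    sentiment: float


def pseudonymise(user_id: str, secret_key: bytes) -> str:
    """Keyed hash: the ID cannot be recomputed without the secret key."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def score_text(text: str) -> float:
    """Toy score in [-1, 1]: (positive - negative) / matched words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


def derive(user_id: str, created_at: str, text: str, secret_key: bytes) -> DerivedRecord:
    """Score the text and return only derived values; the text is not kept."""
    return DerivedRecord(
        pseudo_id=pseudonymise(user_id, secret_key),
        created_at=created_at,
        sentiment=score_text(text),
    )


if __name__ == "__main__":
    key = b"held-only-inside-the-secure-environment"  # hypothetical secret
    rec = derive("12345", "2019-05-01T10:00:00Z", "Love this, great day!", key)
    print(rec.sentiment)  # 1.0
```

Because the secret key would stay inside the secure environment, researchers receiving `DerivedRecord` values should not be able to link them back to a Twitter account or recover the original text. Swapping the toy lexicon for a validated sentiment model would not change this output shape.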
We are demonstrating the collection, anonymisation and analysis of social media data from consenting participants in the Avon Longitudinal Study of Parents and Children. Initially, we are studying Twitter use and gathering data through the platform's API. Our software collects participants' posts and interactions every few days, and the resulting datasets are stored in an ISO 27001-certified secure environment. Derived, depersonalised datasets can be made available to approved researchers, and we aim to provide a means of evaluating sentiment analysis methods against ground-truth data.
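One way to approach evaluating sentiment analysis methods against ground-truth data is to compare each method's labels with reference labels (for example, participants' own ratings of the same posts) using agreement statistics. The sketch below assumes three categorical labels and uses accuracy plus Cohen's kappa; the labels, the example data and the function names are hypothetical, and the project's actual evaluation design may differ.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def accuracy(pred: Sequence[str], truth: Sequence[str]) -> float:
    """Share of items where a method's label matches the ground-truth label."""
    if not truth or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and the same length")
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def cohens_kappa(pred: Sequence[str], truth: Sequence[str]) -> float:
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(truth)
    p_o = accuracy(pred, truth)  # observed agreement (also validates input)
    pred_counts, truth_counts = Counter(pred), Counter(truth)
    labels = set(pred_counts) | set(truth_counts)
    # Expected agreement if both sets of labels were assigned independently.
    p_e = sum(pred_counts[lab] * truth_counts[lab] for lab in labels) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both use one identical label
    return (p_o - p_e) / (1 - p_e)


def evaluate(methods: Dict[str, List[str]], truth: List[str]) -> List[Tuple[str, float, float]]:
    """Return (method, accuracy, kappa) for each method, highest kappa first."""
    results = [(name, accuracy(p, truth), cohens_kappa(p, truth)) for name, p in methods.items()]
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    # Made-up labels for illustration only; not project data.
    truth = ["pos", "neg", "neu", "pos", "neg", "neu"]
    methods = {
        "method_a": ["pos", "neg", "neu", "pos", "neg", "pos"],
        "method_b": ["pos"] * 6,
    }
    for name, acc, kappa in evaluate(methods, truth):
        print(f"{name}: accuracy={acc:.2f} kappa={kappa:.2f}")
```

Kappa is reported alongside accuracy because a method that always predicts the same label can still match some items by chance; in the toy data above, `method_b` agrees on a third of the posts but has a kappa of 0, meaning no agreement beyond chance.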