Birmingham Elsevier interdisciplinary research discourse datasets


This project investigated the discourse of interdisciplinary research (IDR) through comprehensive linguistic analyses of the full holdings of a successful IDR journal, Global Environmental Change (GEC), for the period 1990-2010, and of ten other comparison journals published by Elsevier. The ten were chosen to represent other interdisciplinary (ID) journals and monodisciplinary (MD) journals. The corpus data cannot be included in the repository because they belong to Elsevier; individual files can all be consulted through the Elsevier website. The main line of analysis was multidimensional analysis (MDA), for which Doug Biber (Northern Arizona University) acted as a consultant. From the MDA, we derived six constellations, each a cluster of papers with similar MDA profiles.
We then examined the N-grams and P-frames in each constellation; the raw numerical data are available in this repository. A second computational approach used topic modelling to establish, inductively, what the papers in the GEC corpus are ‘about’. The TopicModel folder contains data for this investigation, some of which are discussed in our paper in the journal Corpora (due for publication in mid-2016). We also analysed survey and interview data, and the anonymised data are presented here.

The corpus was built from XML files supplied by the publisher Elsevier, which the research team converted into annotated text files. The interviews were conducted by Skype or telephone and then transcribed. The survey data were collected online through a web-based survey interface maintained by Elsevier.

Identifier
DOI https://doi.org/10.5255/UKDA-SN-852198
Metadata Access https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=32fcc4ef0f140c53e19a5352808d0e696a7aeabf5225d768ff63ef1b4dca2c07
Provenance
Creator Thompson, P, University of Birmingham
Publisher UK Data Service
Publication Year 2016
Funding Reference ESRC
Rights Paul Thompson, University of Birmingham
OpenAccess true
Representation
Resource Type Numeric
Discipline Social Sciences
Spatial Coverage United Kingdom