Key verbs in academic writing: Dataset for "Evaluation of keyness metrics: Performance and reliability"

This dataset contains corpus-based frequency data for an analysis of key verbs in published academic writing. The data are from the Corpus of Contemporary American English (COCA; Davies 2008-) and cover a period of 30 years (1990-2019). The ‘academic’ section, which contains research articles from peer-reviewed journals, represents the target variety. The reference variety is fictional writing, as represented in the ‘fiction’ section, which contains short stories, plays, movie scripts, and the first chapters of novels. The total number of text files is 26,137 (academic) and 25,992 (fiction). To reduce computational expense for our methodological simulation study, we restrict our attention to verb lemmas whose normalized frequency across the whole academic section of COCA exceeds 10 per million words (pmw). The dataset therefore contains frequency information on only 700 verb lemmas.
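The frequency threshold above can be sketched as follows. This is a minimal illustration, assuming that normalized frequency is the raw lemma count divided by the section's word count and scaled to one million words; the function names, the toy counts, and the 2,000,000-word corpus size are hypothetical and are not COCA figures.

```python
def per_million(count: int, corpus_words: int) -> float:
    """Normalized frequency in occurrences per million words (pmw)."""
    # Multiply before dividing to keep round values (e.g. exactly 10.0) exact.
    return count * 1_000_000 / corpus_words


def select_lemmas(counts: dict[str, int], corpus_words: int,
                  threshold_pmw: float = 10.0) -> list[str]:
    """Keep lemmas whose whole-corpus normalized frequency exceeds the threshold."""
    # "Exceeds" is read here as strictly greater than the threshold (an assumption).
    return sorted(lemma for lemma, n in counts.items()
                  if per_million(n, corpus_words) > threshold_pmw)


# Hypothetical counts for a hypothetical 2,000,000-word corpus (not COCA data).
toy_counts = {"argue": 300, "suggest": 25, "giggle": 4}
print(select_lemmas(toy_counts, 2_000_000))  # ['argue', 'suggest']
```

With these toy numbers, "argue" (150 pmw) and "suggest" (12.5 pmw) pass the filter, while "giggle" (2 pmw) is dropped.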

Identifier
DOI https://doi.org/10.18710/EUXSMW
Related Identifier IsCitedBy https://doi.org/10.1515/cllt-2022-0116
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/EUXSMW
Provenance
Creator Sönning, Lukas (ORCID: 0000-0002-2705-395X)
Publisher DataverseNO
Contributor Sönning, Lukas; University of Bamberg; The Tromsø Repository of Language and Linguistics (TROLLing)
Publication Year 2023
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Sönning, Lukas (University of Bamberg)
Representation
Resource Type corpus data; Dataset
Format text/plain; application/octet-stream
Size 13948; 18570556; 2357139; 125329955; 155604865; 10555; 10980
Version 1.2
Discipline Humanities; Linguistics
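The Metadata Access URL listed under Identifier is an OAI-PMH `GetRecord` request. A minimal sketch of rebuilding and fetching it with the Python standard library might look like this. The helper names are hypothetical, and the title extraction assumes only that the returned DataCite XML contains elements named `title`, whatever their namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_ENDPOINT = "https://dataverse.no/oai"


def get_record_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord URL for a DOI-identified record."""
    params = {"verb": "GetRecord", "metadataPrefix": metadata_prefix,
              "identifier": f"doi:{doi}"}
    # Keep ':' and '/' unescaped so the URL matches the form given in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


def fetch_record(url: str, timeout: float = 30.0) -> str:
    """Download the raw XML response (requires network access)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def titles(xml_text: str) -> list[str]:
    """Collect the text of every element whose local name is 'title'."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "title" and el.text]


if __name__ == "__main__":
    url = get_record_url("10.18710/EUXSMW")
    print(url)
    print(titles(fetch_record(url)))
```

Matching on the local tag name avoids hard-coding a particular DataCite schema version in the namespace URI.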