Replication Data for: The copular subschema [become/devenir + past participle] in English and French: Productivity and degrees of passivity

Dataset

These data form the basis for a contrastive analysis of the English copular subschema [become + past participle] and the equivalent copular subschema [devenir + past participle] in French. See the article abstract below.

The dataset contains 2500 corpus examples for each copular subschema. These two samples were extracted from the English Web corpus 2013 and the French Web corpus 2012, respectively, both of which belong to the Sketch Engine family of corpora (https://www.sketchengine.eu/). Several variables were encoded, covering the past participles in subject complement position as well as quantitative measurements pertaining to these past participles and the infinitives from which they are derived. See the codebook file for more details. Finally, the dataset can be analyzed with the accompanying R script in order to reproduce the findings of the associated research article.

Article abstract: This article presents a contrastive analysis of the English copular subschema [become + past participle] and the equivalent copular subschema [devenir + past participle] in French, based on web data. It is shown that both patterns are almost equally productive at the subject complement level. Furthermore, a more in-depth analysis demonstrates that, in the segment of participles with a high adjectival potential, devenir accumulates more participle tokens than become. Conversely, the reverse holds true for participles with a high verbal potential, in which case become is characterized by more participle tokens than devenir. This high amount of combinations between become and eventive participles also suggests a higher degree of passivity for become. However, in the segment of participles with an intermediate verbal potential, devenir is slightly more type frequent than become, which hints at an emerging productivity in this area for devenir as well.

Identifier
DOI	https://doi.org/10.18710/UDVRZM
Related Identifier	IsCitedBy https://doi.org/10.1075/lic.19013.van
Metadata Access	https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/UDVRZM
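
The Metadata Access URL above appears to be a standard OAI-PMH GetRecord request. As a minimal sketch, it could be rebuilt from the DOI as shown below. The function names `build_getrecord_url` and `fetch_record` are hypothetical helpers introduced only for illustration; the endpoint, verb, metadata prefix, and identifier are copied from the record itself. The fetch step requires network access and is not guaranteed to match the repository's current behaviour.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Values copied from the Identifier block of this record.
OAI_ENDPOINT = "https://dataverse.no/oai"
DOI = "10.18710/UDVRZM"


def build_getrecord_url(doi, metadata_prefix="oai_datacite", endpoint=OAI_ENDPOINT):
    """Hypothetical helper: assemble an OAI-PMH GetRecord URL for a DOI."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": f"doi:{doi}",
        },
        safe=":/",  # keep the DOI readable instead of percent-encoding ':' and '/'
    )
    return f"{endpoint}?{query}"


def fetch_record(url, timeout=30):
    """Hypothetical helper: download the record and parse it as XML.

    Assumes the endpoint is reachable and returns well-formed XML.
    """
    with urlopen(url, timeout=timeout) as response:
        return ET.fromstring(response.read())


url = build_getrecord_url(DOI)
print(url)
```

Calling `fetch_record(url)` would return the root element of the OAI-PMH response, from which the DataCite fields listed in this record could be read.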

Provenance
Creator	Van Wettere, Niek (ORCID: 0000-0002-9455-368X)
Publisher	DataverseNO
Contributor	Van Wettere, Niek; Vrije Universiteit Brussel; Ghent University; The Tromsø Repository of Language and Linguistics (TROLLing)
Publication Year	2020
Rights	CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess	true
Contact	Van Wettere, Niek (Universiteit Gent & Vrije Universiteit Brussel)

Representation
Resource Type	corpus data; Dataset
Format	text/plain; application/pdf; text/csv; type/x-r-syntax
Size	4935; 518234; 471239; 3158064; 15073; 1955
Version	1.1
Discipline	Humanities