Replication Data for: A Corpus Based Analysis of V2 Variation in West Flemish and French Flemish Dialects

Dataset abstract The dataset comprises N = 1413 annotated sentences (or parts thereof) taken from authentic spoken corpus data from West Flemish and French Flemish (dialects of Dutch). The sentences are annotated for V2 variation (subject-verb inversion, the outcome variable of the associated study) and seven predictor variables, including city, region, prosodic integration, form and function of the topicalized constituent, form of the subject, and the number of constituents in the prefield. The dataset also includes geographical data for creating a dialect map that shows the relative frequencies of V2 variation. An R Notebook with the data analysis is provided.
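The relative frequencies mentioned above can be illustrated with a minimal sketch. It is not the authors' R Notebook: the column names `city` and `inversion`, the value `no` for noninverted order, and the sample rows are all hypothetical placeholders, since the actual file layout is not described in this record.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows for illustration only; NOT taken from the dataset.
# The headers "city" and "inversion" are assumed, not the real column names.
SAMPLE_CSV = """city,inversion
A,yes
A,no
A,no
B,yes
B,yes
"""


def noninversion_rate_by_city(csv_text, city_col="city",
                              outcome_col="inversion", noninverted="no"):
    """Return {city: share of sentences with noninverted word order}."""
    totals = defaultdict(int)
    noninverted_counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[city_col]] += 1
        if row[outcome_col] == noninverted:
            noninverted_counts[row[city_col]] += 1
    # Divide per-city noninverted counts by per-city totals.
    return {city: noninverted_counts[city] / totals[city] for city in totals}


rates = noninversion_rate_by_city(SAMPLE_CSV)
```

With the real data, the per-city shares would be joined to the geographical data to draw the map; the column names would need to be adapted to the actual headers.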

Article abstract This paper explores V2 variation in West Flemish and French Flemish dialects of Dutch based on an extensive corpus of authentic spoken data. After taking stock of the existing literature, we probe into the effect of region, prosodic integration, form and function of the topicalized constituent, form of the subject, and the number of constituents in the prefield on (non)inverted word order. This is the first study that carries out regression analysis on the combined impact of these variables in the entire West Flemish and French Flemish region, with additional visualization of effect sizes. The results show that noninversion is generally more widespread than originally anticipated, with unexpected higher occurrence of noninversion in continental West Flemish and lower frequencies in western West Flemish. With the exception of the variable number of constituents in the prefield, all other variables had a significant impact on word order: Clausal topicalized elements, elements that have peripheral functions, and elements that lack prosodic integration all favor noninverted word order. The form of the subject also impacted word order, but its effect is sometimes overruled by discourse considerations.

MS Excel, Microsoft Office Professional Plus 2016

R, version 4.0.5

RStudio, version 1.4.1106

Data for the present study was gathered from the dialect recordings collected by Ghent University and the Meertens Institute in Amsterdam in the 1960s and 1970s; see Dialectloket: Stemmen uit het verleden. URL: http://www.dialectloket.be/geluid/stemmen-uit-het-verleden.
The purpose of these recordings was to capture authentic local dialects that had been affected as little as possible by Standard Dutch or other dialects. Recorded speakers had to meet several criteria: they had to have been born and raised in the same place, be relatively old (over 60), and have a low level of education. Ideally, both their parents and their partner spoke the same dialect. Most of the recorded dialect speakers who met these criteria were farmers born around 1900. All of the dialect speakers were born and raised well before the democratisation of education and the introduction of the mass media, which furthered the spread of Standard Dutch in Flanders.
The recordings were collected by means of what Mesthrie et al. (2009:90) refer to as sociolinguistic interviews: an interviewer asks questions about the interviewee’s youth, profession, experiences in times of war, and so on. To minimize the distance between the “middle-class researcher versus the subject” (Mesthrie et al. 2009:90) and the impact of age or class differences between interviewer and interviewee, the interviews took place in an informal setting, with the interviewer taking on the role of a student.
References: Mesthrie, Rajend, Joan Swann, Ana Deumert, & William L. Leap. 2009. Introducing sociolinguistics. 2nd edn. Edinburgh: Edinburgh University Press.

Identifier
DOI https://doi.org/10.18710/NSFN2B
Related Identifier https://doi.org/10.1017/S1470542718000028
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/NSFN2B
Provenance
Creator Lybaert, Chloé; De Clerck, Bernard; Saelens, Jorien; De Cuypere, Ludovic
Publisher DataverseNO
Contributor De Cuypere, Ludovic; Ghent University; The Tromsø Repository of Language and Linguistics
Publication Year 2021
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact De Cuypere, Ludovic (Vrije Universiteit Brussel - Ghent University)
Representation
Resource Type corpus data; Dataset
Format text/plain; text/csv; application/octet-stream; text/comma-separated-values
Size 15549; 85373; 14055; 823; 93006
Version 1.1
Discipline Humanities; Linguistics
Spatial Coverage (2.200W, 50.700S, 3.700E, 51.500N); Ghent
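
The Metadata Access entry above has the shape of an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from the dataset DOI; the function name `oai_getrecord_url` is a hypothetical helper, and the code only builds the string without contacting the server.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the Metadata Access field of this record.
OAI_ENDPOINT = "https://dataverse.no/oai"


def oai_getrecord_url(doi, metadata_prefix="oai_datacite"):
    """Build a GetRecord URL for a DOI (hypothetical helper, illustration only)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ":" and "/" unescaped so the result matches the URL in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = oai_getrecord_url("10.18710/NSFN2B")
```

Whether the endpoint accepts other metadata prefixes is not stated in this record, so `metadata_prefix` defaults to the one shown in it.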