Replication Data for: Perceiving and identifying vowels in regional accents of English: Evidence from Dutch- and Spanish-speaking L2 listeners

Dataset abstract

This dataset contains the results of a study on cross-language and second-language vowel perception in Dutch-speaking and Spanish-speaking learners of English. The dataset includes both acoustic similarity predictions and behavioral data from two perceptual tasks.

For the acoustic comparisons, Linear Discriminant Analysis (LDA) models were trained on native vowel data from Dutch and Spanish speakers, recorded in earlier studies. The models were then tested on English vowel tokens produced by speakers of Southern British English (S.Eng), Northern British English (N.Eng), and Australian English (AusE). The resulting classifications predict how similar these English vowels are to Dutch and Spanish vowels in terms of acoustic properties such as formant frequencies and vowel duration.
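
As an illustration of the train-on-L1, test-on-L2 logic described above, the following is a minimal sketch in Python and not the code deposited in this dataset (the repository's analyses are in R). It assumes only two acoustic features (F1 and F2 in Hz) and leaves out duration; the function names, vowel labels, and formant values are all hypothetical and chosen only to make the example run.

```python
import math

def train_lda(tokens):
    """Fit a two-feature LDA. tokens: list of (label, (f1, f2)) pairs."""
    # Per-class sums, used for class means and priors.
    sums = {}
    for label, (f1, f2) in tokens:
        s = sums.setdefault(label, [0.0, 0.0, 0])
        s[0] += f1
        s[1] += f2
        s[2] += 1
    means = {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}
    priors = {lab: s[2] / len(tokens) for lab, s in sums.items()}

    # Within-class covariance pooled over all classes (the LDA assumption).
    sxx = sxy = syy = 0.0
    for label, (f1, f2) in tokens:
        dx = f1 - means[label][0]
        dy = f2 - means[label][1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    dof = len(tokens) - len(means)
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof

    # Inverse of the symmetric 2x2 covariance matrix.
    det = sxx * syy - sxy * sxy
    inverse = (syy / det, -sxy / det, sxx / det)
    return means, inverse, priors

def posterior(model, x):
    """Posterior probability of each L1 vowel category for one test token."""
    means, (a, b, c), priors = model
    scores = {}
    for lab, (m1, m2) in means.items():
        d1 = x[0] - m1
        d2 = x[1] - m2
        mahalanobis_sq = a * d1 * d1 + 2 * b * d1 * d2 + c * d2 * d2
        scores[lab] = math.log(priors[lab]) - 0.5 * mahalanobis_sq
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    total = sum(exps.values())
    return {lab: e / total for lab, e in exps.items()}

# Hypothetical L1 training tokens (made-up F1/F2 values, not from the dataset).
l1_tokens = [
    ("i", (280.0, 2300.0)), ("i", (300.0, 2250.0)),
    ("i", (310.0, 2400.0)), ("i", (290.0, 2350.0)),
    ("a", (750.0, 1300.0)), ("a", (800.0, 1250.0)),
    ("a", (780.0, 1400.0)), ("a", (820.0, 1350.0)),
    ("u", (320.0, 800.0)), ("u", (300.0, 850.0)),
    ("u", (340.0, 750.0)), ("u", (330.0, 900.0)),
]
model = train_lda(l1_tokens)

# Hypothetical English test token: which L1 category is it acoustically closest to?
probs = posterior(model, (400.0, 2100.0))
predicted = max(probs, key=probs.get)
print(predicted, round(probs[predicted], 3))
```

The posterior probabilities play the role of acoustic similarity scores here: an English token whose probability mass falls on a single L1 category is predicted to be heard as that category, whereas mass split across categories predicts a less clear-cut mapping.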

In addition to these acoustic predictions, the dataset includes behavioral responses collected during two experimental sessions. In the first session, 40 L1 Dutch and 40 L1 Spanish participants completed (i) a demographic and language background questionnaire, (ii) a cross-language vowel categorization task consisting of 210 trials, and (iii) a general vocabulary test (LexTALE; Lemhöfer & Broersma, 2012). During the cross-language categorization task, participants listened to English vowels produced in the three accents and indicated which vowel from their native language was most similar to that vowel, followed by a goodness-of-fit rating (i.e., how good an example of that vowel the sound was). In the second session, the same participants completed a second-language vowel categorization task with the same 210 trials, in which they were asked to identify which English vowel they heard and to rate how good an example of that vowel it was.

The participants’ cross-language categorization responses were compared to the acoustic similarity scores from the LDA models to assess how closely perceived (phonetic) similarity and acoustic similarity align. Participants’ identification accuracy in the second-language task was analyzed using a mixed-effects logistic regression model. The repository includes all raw and processed data, the R code used for statistical analysis, and the model outputs.
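
One simple way to set listeners' categorizations against LDA classifications is to tabulate, for each English vowel, the percentage of responses per L1 category and check whether the most frequent category matches. The sketch below does only that; it is an assumption about how such a comparison could be set up, not the analysis used in the study, and it does not reproduce the mixed-effects logistic regression. The vowel labels, category codes, and responses are hypothetical.

```python
from collections import Counter, defaultdict

def classification_percentages(pairs):
    """pairs: iterable of (english_vowel, chosen_l1_category) tuples."""
    counts = defaultdict(Counter)
    for english_vowel, l1_category in pairs:
        counts[english_vowel][l1_category] += 1
    return {
        vowel: {cat: 100.0 * n / sum(c.values()) for cat, n in c.items()}
        for vowel, c in counts.items()
    }

def modal_category(percentages):
    """Most frequently chosen L1 category for each English vowel."""
    return {vowel: max(dist, key=dist.get) for vowel, dist in percentages.items()}

def modal_agreement(listener_pairs, lda_pairs):
    """True where listeners and the LDA model share the same modal category."""
    listener_modal = modal_category(classification_percentages(listener_pairs))
    lda_modal = modal_category(classification_percentages(lda_pairs))
    shared = sorted(set(listener_modal) & set(lda_modal))
    return {vowel: listener_modal[vowel] == lda_modal[vowel] for vowel in shared}

# Hypothetical responses (not taken from the dataset).
listener_pairs = [
    ("FLEECE", "i"), ("FLEECE", "i"), ("FLEECE", "i"), ("FLEECE", "I"),
    ("TRAP", "E"), ("TRAP", "E"), ("TRAP", "E"), ("TRAP", "A"),
]
lda_pairs = [
    ("FLEECE", "i"), ("FLEECE", "i"), ("FLEECE", "i"), ("FLEECE", "i"),
    ("TRAP", "A"), ("TRAP", "A"), ("TRAP", "A"), ("TRAP", "E"),
]

listener_percentages = classification_percentages(listener_pairs)
agreement = modal_agreement(listener_pairs, lda_pairs)
print(listener_percentages["FLEECE"])
print(agreement)
```

Modal agreement is a coarse measure. A fuller comparison would also examine the whole response distribution and the goodness-of-fit ratings collected in the task.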

Article abstract

This study examines how L2 English listeners perceive and categorize vowels produced in three regional accents of English: Southern British (S.Eng), Northern British (N.Eng), and Australian English (AusE). Specifically, we investigate how L1 speakers of Belgian Dutch and European Spanish classify these vowels in terms of their native vowel categories, and how such perceptual classifications relate to acoustic similarity between L1-L2 vowels and L2 vowel identification accuracy. To quantify cross-language acoustic similarity and predict which L2 vowel contrasts would be perceptually challenging, Linear Discriminant Analysis (LDA) models were trained on Dutch and Spanish vowel data and tested on English vowel data. 40 Dutch-speaking and 40 Spanish-speaking participants then completed a cross-language categorization task and second-language vowel identification task using naturally produced /CVC/ syllables. The results demonstrate that LDA-based acoustic similarity largely predicts cross-language perception, although certain vowel categorization patterns point to differences in acoustic cue-weighting between the LDA models and participants. Compared to Spanish listeners, Dutch listeners’ classifications showed greater divergence from the LDA model, likely reflecting the denser vowel inventory of Dutch and the resulting increase in category competition. Additionally, participants’ cross-language vowel categorization responses predicted their L2 vowel identification accuracy. That is, L2 vowels consistently mapped onto a (single) different L1 category with high goodness-of-fit were more likely to be identified correctly. Identification accuracy was highest for S.Eng vowels, aligning with participants’ greater self-reported familiarity with that accent. Together, our findings highlight the complex interplay between cross-language similarity, vowel inventory and accent familiarity in shaping L2 perception.

R, 4.4.1

RStudio, 2024.04.2+764

Praat, 6.4.18

PsychoPy, 2023.1.2

MS Excel, 16.76

Identifier
DOI https://doi.org/10.18710/FEC2BO
Related Identifier IsCitedBy https://doi.org/10.1016/j.wocn.2026.101499
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/FEC2BO
Provenance
Creator Verbeke, Gil; Escudero, Paola; Mitterer, Holger; Simon, Ellen
Publisher DataverseNO
Contributor Verbeke, Gil; Ghent University; The Tromsø Repository of Language and Linguistics (TROLLing)
Publication Year 2026
Funding Reference Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) 1178623N ; Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) K253023N ; Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) K131425N
Rights CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Contact Verbeke, Gil (Ghent University)
Representation
Resource Type sociodemographic and linguistic background information; Dataset
Format text/plain; text/comma-separated-values; application/pdf; text/x-r-notebook
Size 57438; 5026; 5345; 5007; 120468; 492821; 487997; 2227858; 93416; 933960; 83869; 357292; 46611688; 5110; 5396; 130179; 87488; 27693052; 9337; 3232; 5347; 4393; 42479589; 164192; 716; 29495372; 4316956; 1696; 1809; 1702
Version 1.1
Discipline Acoustics; Engineering Sciences; Humanities; Mechanical and industrial Engineering; Mechanics and Constructive Mechanical Engineering
Spatial Coverage Flanders, Belgium