Replication Data for: Perceiving and identifying vowels in regional accents of English: Evidence from Dutch- and Spanish-speaking L2 listeners

Dataset

Dataset abstract

This dataset contains the results of a study on cross-language and second-language vowel perception in Dutch-speaking and Spanish-speaking learners of English. The dataset includes both acoustic similarity predictions and behavioral data from two perceptual tasks.

For the acoustic comparisons, Linear Discriminant Analysis (LDA) models were trained on native vowel data from Dutch and Spanish speakers recorded in earlier studies. The models were then tested on English vowel tokens produced by speakers of Southern British English (S.Eng), Northern British English (N.Eng), and Australian English (AusE). Their output predicts how similar these English vowels are to Dutch and Spanish vowels on the basis of acoustic properties such as formant frequencies and vowel duration.
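The train-on-L1, test-on-L2 logic can be illustrated with a minimal Python sketch. It is not the authors' R code: it assumes only two acoustic features (F1 and F2 in Hz), equal priors, and a pooled covariance matrix, and every formant value and vowel label below is a hypothetical placeholder rather than data from this repository.

```python
import math

def fit_lda(samples):
    """Fit a two-feature LDA with pooled covariance and equal priors.

    samples maps an L1 vowel label to a list of (F1, F2) tuples.
    """
    means = {}
    sxx = sxy = syy = 0.0
    n_total = 0
    for label, points in samples.items():
        n = len(points)
        mx = sum(p[0] for p in points) / n
        my = sum(p[1] for p in points) / n
        means[label] = (mx, my)
        # Accumulate within-class scatter around each class mean.
        for x, y in points:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        n_total += n
    dof = n_total - len(samples)
    a, b, d = sxx / dof, sxy / dof, syy / dof
    det = a * d - b * b
    inverse = (d / det, -b / det, a / det)  # entries of the 2x2 inverse
    return means, inverse

def posteriors(model, token):
    """Return the posterior probability of each L1 category for one token."""
    means, (ia, ib, idd) = model
    scores = {}
    for label, (mx, my) in means.items():
        wx = ia * mx + ib * my   # first component of Sigma^-1 * mu
        wy = ib * mx + idd * my  # second component of Sigma^-1 * mu
        scores[label] = token[0] * wx + token[1] * wy - 0.5 * (mx * wx + my * wy)
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

# Hypothetical L1 training tokens (F1, F2 in Hz), for illustration only.
l1_training = {
    "i": [(300, 2300), (320, 2250), (280, 2350), (310, 2280)],
    "e": [(450, 2000), (470, 1950), (430, 2050), (460, 1980)],
    "a": [(750, 1400), (770, 1350), (730, 1450), (760, 1420)],
}
model = fit_lda(l1_training)

# Hypothetical English test token.
english_token = (330, 2200)
similarity = posteriors(model, english_token)
predicted = max(similarity, key=similarity.get)
```

In this sketch the posterior probabilities play the role of acoustic similarity scores: the more probability mass a single L1 category receives, the closer the English token sits to that category in the chosen acoustic space.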

In addition to these acoustic predictions, the dataset includes behavioral responses collected during two experimental sessions. In the first session, 40 L1 Dutch and 40 L1 Spanish participants completed (i) a demographic and language background questionnaire, (ii) a cross-language vowel categorization task consisting of 210 trials, and (iii) a general vocabulary test (LexTALE; Lemhöfer & Broersma, 2012). During the cross-language categorization task, participants listened to English vowels produced in the three accents and indicated which vowel from their native language was most similar to that vowel, followed by a goodness-of-fit rating (i.e., how good an example of that vowel the sound was). In the second session, the same participants completed a second-language vowel categorization task with the same 210 trials, in which they were asked to identify which English vowel they heard and to rate how good an example of that vowel it was.

The participants’ cross-language categorization responses were compared to the acoustic similarity scores from the LDA models to assess how well perceived (phonetic) similarity and acoustic similarity align. Participants’ identification accuracy in the second-language task was analyzed using a mixed-effects logistic regression model. The repository includes all raw and processed data, the R code used for statistical analysis, and the model outputs.
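One way to picture the comparison between perceptual responses and LDA predictions is sketched below in Python. This is an assumed simplification, not the analysis in the repository's R code: the trial tuples, vowel labels, rating values, and the names `summarize` and `agreement` are all hypothetical, and the sketch only checks whether the most frequent listener response matches the category predicted by the acoustic model.

```python
from collections import Counter, defaultdict

def summarize(trials):
    """Summarize cross-language categorization trials per English vowel.

    trials is an iterable of (english_vowel, l1_response, goodness_rating).
    """
    counts = defaultdict(Counter)
    ratings = defaultdict(list)
    for vowel, response, rating in trials:
        counts[vowel][response] += 1
        ratings[(vowel, response)].append(rating)
    summary = {}
    for vowel, counter in counts.items():
        total = sum(counter.values())
        response, n = counter.most_common(1)[0]
        modal_ratings = ratings[(vowel, response)]
        summary[vowel] = {
            "modal_response": response,
            "proportion": n / total,
            "mean_goodness": sum(modal_ratings) / len(modal_ratings),
        }
    return summary

def agreement(summary, lda_prediction):
    """Share of English vowels whose modal response matches the LDA prediction."""
    hits = sum(
        1 for vowel, row in summary.items()
        if lda_prediction.get(vowel) == row["modal_response"]
    )
    return hits / len(summary)

# Hypothetical trials: (English vowel, chosen L1 vowel, goodness rating).
trials = [
    ("FLEECE", "i", 6), ("FLEECE", "i", 5), ("FLEECE", "e", 3),
    ("KIT", "i", 4), ("KIT", "e", 4), ("KIT", "e", 5),
]
# Hypothetical LDA output: most probable L1 category per English vowel.
lda_prediction = {"FLEECE": "i", "KIT": "i"}

summary = summarize(trials)
match_rate = agreement(summary, lda_prediction)
```

A vowel whose modal response disagrees with the LDA prediction, like the hypothetical KIT case here, is the kind of mismatch that would point to listeners weighting acoustic cues differently from the model.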

Article abstract

This study examines how L2 English listeners perceive and categorize vowels produced in three regional accents of English: Southern British (S.Eng), Northern British (N.Eng), and Australian English (AusE). Specifically, we investigate how L1 speakers of Belgian Dutch and European Spanish classify these vowels in terms of their native vowel categories, and how such perceptual classifications relate to acoustic similarity between L1-L2 vowels and L2 vowel identification accuracy. To quantify cross-language acoustic similarity and predict which L2 vowel contrasts would be perceptually challenging, Linear Discriminant Analysis (LDA) models were trained on Dutch and Spanish vowel data and tested on English vowel data. Forty Dutch-speaking and 40 Spanish-speaking participants then completed a cross-language categorization task and a second-language vowel identification task using naturally produced /CVC/ syllables. The results demonstrate that LDA-based acoustic similarity largely predicts cross-language perception, although certain vowel categorization patterns point to differences in acoustic cue-weighting between the LDA models and participants. Compared to Spanish listeners, Dutch listeners’ classifications showed greater divergence from the LDA model, likely reflecting the denser vowel inventory of Dutch and the resulting increase in category competition. Additionally, participants’ cross-language vowel categorization responses predicted their L2 vowel identification accuracy. That is, L2 vowels consistently mapped onto a (single) different L1 category with high goodness-of-fit were more likely to be identified correctly. Identification accuracy was highest for S.Eng vowels, aligning with participants’ greater self-reported familiarity with that accent. Together, our findings highlight the complex interplay between cross-language similarity, vowel inventory, and accent familiarity in shaping L2 perception.

R, 4.4.1

RStudio, 2024.04.2+764

Praat, 6.4.18

PsychoPy, 2023.1.2

MS Excel, 16.76

Identifier
DOI	https://doi.org/10.18710/FEC2BO
Related Identifier	IsCitedBy https://doi.org/10.1016/j.wocn.2026.101499
Metadata Access	https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/FEC2BO
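The Metadata Access link above is an OAI-PMH `GetRecord` request. As a minimal sketch, the same URL can be assembled in Python from the endpoint, metadata prefix, and DOI listed in this record; the function name `oai_record_url` is a hypothetical helper, not part of any DataverseNO client.

```python
from urllib.parse import urlencode

# Endpoint and DOI as they appear in this record.
OAI_ENDPOINT = "https://dataverse.no/oai"
DATASET_DOI = "10.18710/FEC2BO"

def oai_record_url(doi, prefix="oai_datacite"):
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the URL matches the form shown in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"

record_url = oai_record_url(DATASET_DOI)
```

The resulting `record_url` can be passed to `urllib.request.urlopen` to retrieve the DataCite XML, assuming the endpoint is reachable.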

Provenance
Creator	Verbeke, Gil ; Escudero, Paola ; Mitterer, Holger ; Simon, Ellen
Publisher	DataverseNO
Contributor	Verbeke, Gil; Ghent University; The Tromsø Repository of Language and Linguistics (TROLLing)
Publication Year	2026
Funding Reference	Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) 1178623N ; Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) K253023N ; Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) K131425N
Rights	CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess	true
Contact	Verbeke, Gil (Ghent University)

Representation
Resource Type	sociodemographic and linguistic background information; Dataset
Format	text/plain; text/comma-separated-values; application/pdf; text/x-r-notebook
Size	57438; 5026; 5345; 5007; 120468; 492821; 487997; 2227858; 93416; 933960; 83869; 357292; 46611688; 5110; 5396; 130179; 87488; 27693052; 9337; 3232; 5347; 4393; 42479589; 164192; 716; 29495372; 4316956; 1696; 1809; 1702
Version	1.1
Discipline	Acoustics; Engineering Sciences; Humanities; Mechanical and industrial Engineering; Mechanics and Constructive Mechanical Engineering
Spatial Coverage	Flanders, Belgium