Replication Data for: Learning to predict - second language perception of reduced multi-word sequences - Dataset

DATASET ABSTRACT

This dataset contains the data and code from a word-monitoring task in which advanced learners of English responded to the word 'to' in verb + to-infinitive structures (V-to-Vinf), where 'to' could occur in a full or reduced pronunciation (e.g. "prefer to" [tʊ] or "preferda" [ɾə]). The design of this experiment replicates our earlier study with American English native speakers (Lorenz & Tizón-Couto, 2019, see link to paper and dataset below *). We tested the effects of string frequency (V+to) and transitional probability (of 'to' given the V) on the accuracy and speed of recognition of "to" in spoken sentences. These effects were analysed with generalized additive mixed models (GAMMs); the code also includes visualisations of these models. The experiment was run with OpenSesame (version 3.2.6 for Mac, see Mathôt et al. 2012). The data include information on frequencies of occurrence of words and bigrams, extracted from the Corpus of Contemporary American English (COCA, Davies 2008–). We used R (version 4.3.1, R Core Team 2023) for all data analyses, so the analyses are best reproduced in R.

*) Lorenz, D. & Tizón-Couto, D. (2019). Chunking or predicting – frequency information and reduction in the perception of multi-word sequences. Cognitive Linguistics, 30(4), 751-784. https://doi.org/10.1515/cog-2017-0138 (the paper); https://doi.org/10.18710/7TSABU (the data)
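As a reading aid for the two frequency predictors named above, the following is a minimal sketch in Python, not the authors' R code. It assumes the common forward definition of transitional probability, i.e. the count of the bigram "V to" divided by the count of V; the record itself does not spell out the formula. The function name, the dictionary layout and all counts are hypothetical and are not taken from COCA or from this dataset.

```python
def transitional_probability(bigram_count: int, first_word_count: int) -> float:
    """Forward TP of the second word given the first: count(w1 w2) / count(w1).

    Assumed definition for illustration; check the paper for the exact measure used.
    """
    if first_word_count <= 0:
        raise ValueError("first_word_count must be positive")
    if not 0 <= bigram_count <= first_word_count:
        raise ValueError("bigram_count must lie between 0 and first_word_count")
    return bigram_count / first_word_count


# Hypothetical corpus counts (NOT from COCA or from the dataset files).
counts = {
    "prefer": {"verb": 1000, "verb_to": 400},
    "want": {"verb": 5000, "verb_to": 3500},
}

# String frequency is the raw count of "V to"; TP normalises it by the verb's count.
tp = {
    verb: transitional_probability(c["verb_to"], c["verb"])
    for verb, c in counts.items()
}

for verb, value in tp.items():
    print(f"{verb} to: string frequency = {counts[verb]['verb_to']}, TP(to | {verb}) = {value:.2f}")
```

The sketch shows how the two measures differ: string frequency is an absolute count of the sequence, whereas transitional probability is relative to how often the verb occurs at all, so a verb can be high on one and low on the other.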

PUBLICATION ABSTRACT

The cognitive entrenchment of frequent sequences comes as ‘chunking’ (holistic storage) and as ‘procedure strengthening’ (predicting elements in a sequence). A growing body of research shows effects of entrenchment of multi-word sequences in the native language, which is learned and shaped continuously and intuitively. But how do they affect L2 speakers, whose language acquisition is more analytic but who nonetheless also learn through usage? The present study tests advanced English learners’ receptive processing of multi-word sequences with a word-monitoring experiment. Recognition of to in the construction V-to-Vinf was tested for full and reduced forms ([tʊ] vs [ɾə]), conditioned by the general frequency of the V-to sequence and the transitional probability (TP) of to given the verb (V > to). The results are compared with those previously obtained from native speakers (Lorenz & Tizón-Couto, 2019). Results show that recognition profits from surface frequency, but not from TP. Reduced forms delay recognition, but this is mitigated in high-frequency sequences. Unlike native speakers, advanced learners do not exhibit a chunking effect of high-frequency reduced forms, and no facilitating effect of TP. We attribute these findings to learners’ lesser experience with spontaneous speech and phonetic reduction. They recognize reduced forms less easily, show weaker entrenchment of holistic representations, and do not draw on the full range of probabilistic cues available to native speakers.

OpenSesame, 3.2.6 for Mac

R, 4.3.1

Identifier
DOI	https://doi.org/10.18710/TE5ZOG
Metadata Access	https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/TE5ZOG

Provenance
Creator	Lorenz, David (ORCID: 0000-0002-7451-099X); Tizón-Couto, David
Publisher	DataverseNO
Contributor	Lorenz, David; Universidade de Vigo; Universität Rostock; Beller, Danielle (speaker on the audio stimulus items); The Tromsø Repository of Language and Linguistics (TROLLing)
Publication Year	2024
Funding Reference	Ministerio de Ciencia e Innovación (MCIN) / Agencia Estatal de Investigación (AEI) PID2020-118143GA-I00; Xunta de Galicia ED431C2021/52
Rights	info:eu-repo/semantics/openAccess
OpenAccess	true
Contact	Lorenz, David (Lunds universitet)

Representation
Resource Type	experimental data (responses and response times); Dataset
Format	text/plain; application/pdf; application/x-rlang-transport; text/comma-separated-values
Size	19528; 1868; 391164; 231512; 38980; 28672; 400776; 564295; 1243370
Version	1.0
Discipline	Humanities
Spatial Coverage	Vigo