Dataset description:
This dataset contains the two data files on which the related publication is based. The data file Dataset_Diminutives contains a total of 1,886 diminutive constructions extracted from the Bangor Miami Corpus and the El Paso Bilingual Corpus. These constructions are coded for intralinguistic variables relating to the linguistic properties of both the base and the diminutive marker. The data file Metadata_Conversations_El_Paso_Bilingual_Corpus contains metadata about the conversations in the El Paso Bilingual Corpus.
Article Abstract:
Research on language contact outcomes, such as code-switching, continues to face theoretical and methodological challenges, particularly because findings are difficult to compare across studies that use divergent data collection methods (Parafita Couto et al., 2021; Toribio, 2017). Accordingly, scholars have emphasized the need for publicly available and comparable bilingual corpora (Deuchar, 2020; Gullberg et al., 2009; Munarriz & Parafita Couto, 2014). This paper introduces the El Paso Bilingual Corpus, a new Spanish-English bilingual corpus recorded in El Paso (TX) in 2022 and designed to be methodologically comparable to the Bangor Miami Corpus (Deuchar et al., 2014). The paper is organized into three main sections. First, we review existing Spanish-English corpora and examine the theoretical challenges posed by studies using non-comparable methodologies (Parafita Couto et al., 2021; Toribio, 2017), thereby underscoring the gap addressed by the El Paso Bilingual Corpus. Second, we outline the corpus creation process, covering participant recruitment, data collection, and transcription, and provide an overview of the resulting data, including participants’ sociolinguistic profiles. Third, to demonstrate the practical value of methodologically aligned corpora, we report a comparative case study of diminutive expressions in the El Paso and Bangor Miami corpora, illustrating how shared data collection protocols can elucidate the role of community-specific social factors in bilinguals’ morphosyntactic choices.