Dataset - B2FIND

CooccurrenceFieldSampler (CFS)

The CooccurrenceFieldSampler (CFS) was developed for sampling from corpora to facilitate lexicographical data analysis. It works with corpora from different sources, text types...

Corpus PINO: A spoken language resource for multiple simultaneous comparisons

Corpus PINO (Corpus Pluristilistico di Italiano e Napoletano Orali, “Multistylistic Corpus of Spoken Italian and Neapolitan”) is a resource designed for research on different...

Connecting Conditionals (Reuneker 2022; dissertation)

Scripts (Python, R) and data (corpus data from CGN and SoNaR) belonging to the PhD dissertation 'Connecting Conditionals: A corpus-based approach to conditional constructions in...

Replication Data for: Metaphor analysis meets lexical strings: Finetuning the...

Dataset Abstract: This is the data that serves as the basis for a methodological article which proposes and illustrates two ways to extend the Metaphor Identification Procedure...

The Multilingual Emotional Football Corpus (MEmoFC)

The Multilingual Emotional Football Corpus (MEmoFC) has been manually collected from English, German, and Dutch websites of individual football clubs to investigate the way...

Background Data for Constructionalization of body part constructions in Russi...

Dataset abstract: The dataset includes the attestations of five Russian constructions with body part term anchors (face, eyes, forehead, and back): VP v lico ('in face'), VP v...

Replication Data for: From machine learning to classroom learning: mobile vow...

The present study reports on a machine learning experiment concerning mobile vowels in the Russian preposition v ‘in(to)’. Data are extracted from the Russian National Corpus....

Replication Data for: Seeing from without, seeing from within: aspectual diff...

This is the data that serves as the basis for an article comparing the grammatical category of aspect in Spanish and Russian. Here is the abstract of the article: Linguistic...

Replication Data for: Non-prototypical aspectual clusters and corpus data: th...

The dataset accompanies the article Non-prototypical Aspectual Clusters and Corpus Data: the grozit’ ‘threaten’ cluster in Russian. The dataset includes contexts where the...

Replication Data for: Less is More: Why All Paradigms are Defective, and Why ...

Only a fraction of lexemes are encountered in all their paradigm forms in any corpus or even in the lifetime of any speaker. This raises a question as to how it is that native...

K-SPAN (Korean Surface Phones and Neighborhoods)

This corpus provides surface phonetic forms derived from a publicly available orthographic corpus of Korean, along with neighborhood density statistics for each word in the...

Replication Data for: Motion verbs and secondary predications: What corpus da...

This dataset sheds light on the secondary predication construction in Russian, which involves a choice between adjectives or numerals in the long form nominative, short form...

Replication data for: Russian adverbials with the preposition "v"

In contemporary cognitive linguistics, considerable attention is paid to the relationship between the source (donor) and target (recipient) domains in metaphors. The article examines this relationship on...

Die mentale Repräsentation von Aspektpartnerschaften russischer Verben

Hardly any linguistic structure of Russian is debated as controversially as verbal aspect: Which verbs can form aspectual correlations? What counts as an aspect pair? ...

Multi-Dimensional Analysis of Czech

Original data for a general-purpose multi-dimensional analysis model of register variation in Czech. This post contains a CSV data set of 137 linguistic features measured on...

Replication Data for: When modality and tense meet. The future marker budet ‘...

Dataset description: This is a study of examples of Russian impersonal constructions with the modal word možno ‘can, be possible’ with and without the future copula budet ‘will...

Replication Data for: Looking into the Russian future

This dataset contains the data for an article on future tense meanings in Russian. Abstract: The relationship between future time and future tense forms...

Replication Data for: The trajectory of the “Možno ja X?” construction: Varia...

This dataset contains the data for an article on variation in speech acts of request with the modal word možno in Russian. I explore the ongoing language...

Replication data for: Verbal borrowability and turnover rates

This is the dataset used in the study of verbal and nominal borrowings in the written literary Russian language, their diachronic developments, and their connection to frequency. The...

Replication Data for: Threatening in Russian with or without -sja: grozit' vs...

In order to get a clearer picture of the constructions of grozit’ and grozit’sja, we have put together a database with examples of usages of both words from the Russian National...

200 datasets found