Dataset - B2FIND

Replication data for: Prefix variation in Russian путать

This thesis explores the prefix variation in путать and consists of three case studies: Case study 1 “The choice of prefix under prefix variation”: Is it possible to predict...

Replication data for: Animacy and Differential Object Marking in Old Church S...

This article explores the synchronic variation between the nominative-accusative (NA) and genitive-accusative (GA) in the oldest layer of canonical Old Church Slavonic (OCS),...

Replication data for: Death of a construction: Old Church Slavonic touch verbs

Data set for a study of the locative argument structure construction in Old Church Slavonic. In this article I examine an argument structure construction on its deathbed,...

Replication data for: Who needs particles? A challenge to the classification ...

In 1985, Zwicky argued that “particle” is a pretheoretical notion that should be eliminated from linguistic analysis. We propose a reclassification of Russian particles that...

Replication data for: The ongoing eclipse of possessive suffixes in North Saa...

North Saami is replacing the use of possessive suffixes on nouns with a morphologically simpler analytic construction. Our data (>2K examples culled from >.5M words) track...

Replication Data for: A network of allostructions: quantified subject constru...

Data and R code are provided for statistical analysis of approximately 39,000 corpus examples of predicate agreement in constructions with quantified subjects in Russian. The...

Replication data for: Slangs go online, or the rise and fall of the Olbanian ...

All the data were taken from the website udaff.com (the center of the padonki culture and one of the cradles of the Olbanian language), from the section kreativy ('creative...

Solving Russian velars: Palatalization, the lexicon and gradient contrast uti...

This dataset consists of (1) an Excel file with type and token counts of all paired consonants word-finally and before non-front vowels, their probabilities, and the entropies...

Metonymy in Word-Formation: Russian, Czech, and Norwegian

Publication abstract: A foundational goal of cognitive linguistics is to explain linguistic phenomena in terms of general cognitive strategies rather than postulating an...

Replication data for: Prefix variation in путать: в-, за-, пере- and с-

This case study of the four Natural Perfectives of the Russian simplex verb путать ‘tangle’ sheds light on the following questions: Is it possible to predict the choice of...

LOCOLE (Longitudinal Corpus of Learner English)

Information about LOCOLE: This corpus comprises essays written by university students of English Philology over the course of one academic year. The essays were collected four...

SIKOR North Saami corpus

SIKOR North Saami corpus is a monolingual text corpus of North Saami that contains administrative, law, religious, non-fiction, fiction, and science texts. It is work done at...

Aspect and prefixation in Old Church Slavonic

In this article we focus on one grammaticalization path to perfective markers, that of the so-called "bounder perfectives" (Bybee and Dahl 1989). Systems with this kind of...

Corpus of Transcriptions - part 1

The first part of the Corpus of Transcriptions contains phonemic transcriptions of a short passage from Lecumberri and Maidment (2000, p. 78) performed by the undergraduate...

English-Lithuanian Parallel Migration Corpus

English-Lithuanian Parallel Migration Corpus includes original English texts and their Lithuanian translations, aligned at the sentence level. The texts are drawn from EU legal...

LegISTyr test set

LegISTyr is a machine translation test set for evaluating the quality of legal terminology translation from Italian to South Tyrolean German, a minor standard variety of German....

SFU Opinion and Comments Corpus (SOCC) for NoSketch Engine

The SFU Opinion and Comments Corpus (SOCC) is a corpus for the analysis of online news comments. It contains opinionated articles and comments. It was tagged using TreeTagger...

StarwarsNER French Italian Corpus - sample

The StarwarsNER French Italian Corpus - sample is a multilingual benchmark resource for Named Entity Recognition (NER) in the wastewater and stormwater management domain. It...

KIParla - KIPasti transcripts

The KIPasti corpus is part of the larger KIParla collection (www.kiparla.it), which can be freely queried through the NoSketch Engine interface. The ParlaBO corpus was compiled...

2,943 datasets found