-
Replication Data for: Non-prototypical aspectual clusters and corpus data: th...
The dataset accompanies the article Non-prototypical Aspectual Clusters and Corpus Data: the grozit’ ‘threaten’ cluster in Russian. The dataset includes contexts where the... -
Replication Data for: Motion verbs and secondary predications: What corpus da...
This dataset sheds light on the secondary predication construction in Russian, which involves a choice between adjectives or numerals in the long form nominative, short form... -
GER_SET: Situation Entity Type labelled corpus for German
Semantic clause types, also called Situation Entity (SE) types (Smith, 2003), are linguistic characterizations of aspectual properties shown to be useful for tasks like... -
TALN - archives : articles des conférences TALN et RECITAL
French-language digital archive of research articles in Natural Language Processing published by the Association pour le Traitement Automatique des LAngues (ATALA) during... -
Corpus PINO: A spoken language resource for multiple simultaneous comparisons
Corpus PINO (Corpus Pluristilistico di Italiano e Napoletano Orali, “Multistylistic Corpus of Spoken Italian and Neapolitan”) is a resource designed for research on different... -
Key verbs in academic writing: Dataset for "Evaluation of keyness metrics: Pe...
This dataset contains corpus-based frequency data for an analysis of key verbs in published academic writing. The data are from the Corpus of Contemporary American English... -
Connecting Conditionals (Reuneker 2022; dissertation)
Scripts (Python, R) and data (corpus data from CGN and SoNaR) belonging to the PhD dissertation 'Connecting Conditionals: A corpus-based approach to conditional constructions in... -
The Multilingual Emotional Football Corpus (MEmoFC)
The Multilingual Emotional Football Corpus (MEmoFC) has been manually collected from English, German, and Dutch websites of individual football clubs to investigate the way... -
Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2017 – VERSION 1)
German version: see below. The ‘Ancillary Monitor Corpus: Common Crawl - german web’ was designed with the aim of enabling a broad-based linguistic analysis of the... -
Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2021 – VERSION 1)
German version: see below. The ‘Ancillary Monitor Corpus: Common Crawl - german web’ was designed with the aim of enabling a broad-based linguistic analysis of the... -
Individual Textual Profiles of Hillary Clinton and Donald Trump
This corpus consists of full transcriptions of both Democratic and Republican 2016 presidential candidate debates, with a special focus on the idiolects of Hillary Clinton and... -
CsEnVi Pairwise Parallel Corpora
CsEnVi Pairwise Parallel Corpora consist of a Vietnamese-Czech parallel corpus and a Vietnamese-English parallel corpus. The corpora were assembled from the following sources:... -
LiFR-Law. Corpus of Paraphrased Czech Administrative Texts with Reading Compr...
LiFR-Law is a corpus of Czech legal and administrative texts with measured reading comprehension and a subjective expert annotation of diverse textual properties based on the... -
International Corpus of English: East Africa (ICE-EA)
One million words of spoken and written English from Kenya and Tanzania. Part of the ICE project -
Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2019 – VERSION 1)
German version: see below. The ‘Ancillary Monitor Corpus: Common Crawl - german web’ was designed with the aim of enabling a broad-based linguistic analysis of the... -
KAMOKO-Digitalizer
This editor was developed especially for the needs of the KAMOKO project (https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-3261). The editor allows the quick entry... -
Khresmoi Query Translation Test Data 2.0
This package contains data sets for development and testing of machine translation of medical queries between Czech, English, French, German, Hungarian, Polish, Spanish and... -
CEHugeWebCorpus
This corpus was originally created for performance testing (server infrastructure CorpusExplorer - see: diskurslinguistik.net / diskursmonitor.de). It includes the filtered... -
Indonesian web corpus (idWac)
Indonesian text corpus from the web. Crawled by SpiderLing in 2017. Filtered by JusText and Onion (see http://corpus.tools/ for details). Tagged and lemmatized by MorphInd... -
Ancillary Monitor Corpus: Common Crawl - german web (YEAR 2013 – VERSION 1)
German version: see below. The ‘Ancillary Monitor Corpus: Common Crawl - german web’ was designed with the aim of enabling a broad-based linguistic analysis of the...
