Dataset - B2FIND

Replication Data for Grammatical Gender in Norwegian Dialects: Variation, Acq...

The dataset consists of data related to investigations of grammatical gender across multiple Norwegian dialects. The data has been collected as part of the...

Genusvariasjon i norsk skriftspråk (Gender variation in written Norwegian)

This dataset contains the material from a study of grammatical gender variation in written Norwegian. The study originated in two assignments I received from Språkrådet (the Language Council of Norway) to...

Sõnaveeb 2025. EKI language portal

Sõnaveeb is the new dictionary portal of the Institute of the Estonian Language (Eesti Keele Instituut, EKI), which brings together language information from many of the institute's word collections and databases. More info at https://sonaveeb.ee/...

Ekilex 2025. EKI dictionary and term base system

Ekilex, the dictionary and term base system of the Institute of the Estonian Language, was created for compiling and updating dictionaries and term bases, for lexicographers, terminologists and...

Monitor corpus of Slovene Trendi 2025-12

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 59 publishers. Trendi 2025-12 covers the period from January...

Monitor corpus of Slovene Trendi 2025-11

The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 59 publishers. Trendi 2025-11 covers the period from January...

Little Big Translation Literature – Czech and German Translations of Yiddish ...

In order to make the process of preparing analyses for a planned monograph about Czech and German translations of Yiddish texts transparent, five source texts were transcribed...

CantusCorpus v1.0

CantusCorpus 1.0 is a large dataset of Gregorian chant intended for computational research. The dataset consists of all chants that are accessible through the Cantus Index...

Judikatura 2024

A corpus of court decisions from the three main courts in the Czech Republic (namely the Supreme Court, the Supreme Administrative Court and the Constitutional Court). The corpus is tagged...

The German Political Sentiment Dictionary (SUF edition)

Full edition for scientific use. This dataset contains a German-language sentiment dictionary of 5001 negative words and their associated sentiment strength on a...

Austrian Immigrant Survey 2016 (SUF edition)

Full edition for scientific use. The Austrian Immigrant Survey 2016 is a supplementary telephone survey to the main survey of the Social Survey Austria (SSÖ) 2016. In the...

Training Data for German Sentiment Analysis of Political Communication (SUF e...

Full edition for scientific use. The dataset contains 125871 sentences extracted from Austrian parliamentary debates and party press releases. Press releases were collected...

AUTNES Automatic Content Analysis of the Media Coverage 2013 (SUF edition)

Full edition for scientific use. This is a dataset on the media coverage of the 2013 Austrian National Election. 43021 contributions from 61 print, TV, radio and online media...

AUTNES Content Analysis of Party Leader Statements 2002 (SUF edition)

Full edition for scientific use. The AUTNES coding of leader statements covers all public statements and actions of leaders of relevant parties in the six weeks before the 2002...

Is it a Southern thing? Accent bias of Italian listeners

Voice recordings and SPSS datasets of three experiments reported in the Journal of Sociolinguistics. The experimental materials were audio recordings of accented speakers. The...

CCLL Lemmatised Frequency Lists

The resource contains 6 frequency lists for the Corpus of Contemporary Lithuanian language (CCLL) (https://sitti.vdu.lt/en/services/): 1-LT_token_freq_list.txt - a full frequency...

Lithuanian Science and Research Terminology: Multilingual Term List

Tab-separated (TSV) UTF-8 text file containing 223 Lithuanian science and research terms with definitions and translation equivalents in English, German, and French. Intended...

Morphological lexicon Sloleks 2.0

Sloleks is the reference morphological lexicon for the Slovenian language, developed to be used in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains...

Morphological lexicon Sloleks 1.2

Sloleks is the reference morphological lexicon for the Slovenian language, developed to be used in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains...

Supporting data for: Ukrainian Indefinite Pronouns and Language Typology

In order to shed light on the distribution of Ukrainian indefinite pronouns and adverbs, we carried out corpus searches and created datasets with corpus data...

5,717 datasets found