Monitor corpus of Slovene Trendi 2025-08
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 58 publishers. Trendi 2025-08 covers the period from January...
Monitor corpus of Slovene Trendi 2025-07
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 57 publishers. Trendi 2025-07 covers the period from January...
Comparable corpus of parliamentary debates ParlaMint-IL 1.0
The ParlaMint-IL corpus is the Israeli contribution to the ParlaMint collection of comparable parliamentary corpora (https://www.clarin.eu/parlamint), which contain...
Slovene instruction-following dataset for large language models GaMS-Instruct...
GaMS-Instruct-MED is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions in the medical domain. It consists of pairs of...
Domain-Specific Languages for the GreekSchools project
The repository hosts the context-free grammars for the domain-specific languages (DSLs) developed within the GreekSchools project. The repository includes diplomatic and literary DSLs...
GreekSchools Public Editions
An archive of the GitHub repository hosting the XML documents for the open-access critical edition of the 885222-GreekSchools ERC project. GreekSchools XML Data for PHerc. 327...
Women’s Empowerment – Inner and Outer Communication (Pilot Corpus)
The submitted data consists of the Women’s Empowerment Pilot Corpus, a curated collection of 30 short texts and dialogue excerpts documenting the communicative journey of...
Oral History Resource: Lithuanian Testimonies of Siberian Deportations
The oral history resource includes: (1) Audio recordings (recorded in 2009-2010) of personal narratives by siblings Pranas Šuminskas and Vladislava Šuminskaitė about their...
The corpus of older Slovenian narrative prose PriLit 1.0
The PriLit corpus contains 37 texts of older Slovenian narrative prose by 12 authors. One text, Sreča v nesreči (Fortune in Misfortune) by Janez Cigler (first published in...
Semantic lexicon of Slovene sloWNet 3.1
sloWNet is the Slovene WordNet developed using the expand approach: it contains the complete Princeton WordNet 3.0 and over 70,000 Slovene literals. These literals have been added...
Monitor corpus of Slovene Trendi 2025-06
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 57 publishers. Trendi 2025-06 covers the period from January...
Dataset for primary stress identification in Croatian and related languages a...
The dataset contains recordings and offset annotations of a sample of the Croatian parliamentary recordings from the corpus ParlaSpeech-HR. It contains training and testing data...
Slovenian Day of Resistance X & news corpus
The dataset contains social media posts from X and traditional media articles from online news sources related to the Slovenian commemorations of the Day of Resistance. We used...
Corpus of Slovenian periodicals (1771-1914) sPeriodika 1.0
The corpus of Slovenian periodicals sPeriodika contains linguistically annotated periodicals published during the 18th and 19th centuries and the beginning of the 20th century (1771-1914). The...
Uniform Meaning Representation 2.1 (Czech and Latin)
Czech and Latin UMR data, both manually annotated and programmatically converted from manually annotated tectogrammatical data.
Desam v2.0
DESAM is a Czech morphologically annotated corpus which has been manually disambiguated. Each token is annotated for lemma, part of speech and all grammatical categories using the...
Carniolan Provincial Assembly corpus Kranjska 1.0
The corpus contains meeting proceedings of the Carniolan Provincial Assembly from 1861 to 1913 (Obravnave deželnega zbora kranjskega / Bericht über die Verhandlungen des...
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint-en.ana 5.0 is the English machine translation of the ParlaMint.ana 5.0 (http://hdl.handle.net/11356/2005) set of corpora of parliamentary debates across Europe. The...
Multilingual comparable corpora of parliamentary debates ParlaMint 4.1
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and...
