-
Domain-Specific Languages for the GreekSchools project
The repository hosts the Context-Free Grammars for the Domain-Specific Languages developed within the GreekSchools project. The repository includes diplomatic and literary DSLs... -
GreekSchools Public Editions
The GitHub repository archive hosting the XML documents for the open access critical edition of the 885222-GreekSchools ERC project. GreekSchools XML Data for PHerc. 327... -
Women’s Empowerment – Inner and Outer Communication (Pilot Corpus)
The submitted data consists of the Women’s Empowerment Pilot Corpus, a curated collection of 30 short texts and dialogue excerpts documenting the communicative journey of... -
Oral History Resource: Lithuanian Testimonies of Siberian Deportations
The oral history resource includes: (1) Audio recordings (recorded in 2009-2010) of personal narratives by siblings Pranas Šuminskas and Vladislava Šuminskaitė about their... -
Lists of Slovene accentuated units SNES 1.0
SNES (Stalno naglašene enote iz Sloleksa; Constantly accentuated units from Sloleks) is a dataset containing Slovene final accentuated word parts (i.e., the ending part of an... -
The corpus of older Slovenian narrative prose PriLit 1.0
The PriLit corpus contains 37 texts of older Slovenian narrative prose by 12 authors. One text, Sreča v nesreči (Fortune in Misfortune) by Janez Cigler (first published in... -
Semantic lexicon of Slovene sloWNet 3.1
sloWNet is the Slovene WordNet developed using the expand approach: it contains the complete Princeton WordNet 3.0 and over 70,000 Slovene literals. These literals have been added... -
Monitor corpus of Slovene Trendi 2025-07
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 57 publishers. Trendi 2025-07 covers the period from January... -
Monitor corpus of Slovene Trendi 2025-06
The Trendi corpus is a monitor corpus of Slovenian. It contains news articles from 106 media websites, published by 57 publishers. Trendi 2025-06 covers the period from January... -
Dataset for primary stress identification in Croatian and related languages a...
The dataset contains recordings and offset annotations of a sample of the Croatian parliamentary recordings from the corpus ParlaSpeech-HR. It contains training and testing data... -
Spoken corpora of parliamentary debates ParlaSpeech 3.0
The ParlaSpeech corpora are built from the transcripts of parliamentary proceedings of Croatian, Serbian, Polish, and Czech parliaments available in the ParlaMint 4.0 corpus... -
Slovenian Day of Resistance X & news corpus
The dataset contains social media posts from X and traditional media articles from online news sources related to the Slovenian commemorations of the Day of Resistance. We used... -
Corpus of Slovenian periodicals (1771-1914) sPeriodika 1.0
The corpus of Slovenian periodicals sPeriodika contains linguistically annotated periodicals published in the 18th, 19th, and early 20th centuries (1771-1914). The... -
Uniform Meaning Representation 2.1 (Czech and Latin)
Czech and Latin UMR data, both manually annotated and programmatically converted from manually annotated tectogrammatical data. -
The "Mobile languages" corpus MoJezik 1.0 (audio)
The "Mobile Languages" corpus documents in-depth, semi-structured sociolinguistic interviews with speakers from two Slovene regions and distinctive dialects: Idrija (Cerkno... -
The "Mobile languages" corpus MoJezik 1.0 (transcription)
The "Mobile Languages" corpus documents in-depth, semi-structured sociolinguistic interviews with speakers from two Slovene regions and distinctive dialects: Idrija (Cerkno... -
Multilingual comparable corpora of parliamentary debates ParlaMint 5.0
ParlaMint 5.0 is a set of comparable corpora containing transcriptions of parliamentary debates of 29 European countries and autonomous regions, mostly starting in 2015 and... -
Desam v2.0
DESAM is a morphologically annotated Czech corpus that has been manually disambiguated. Each token is annotated for lemma, part of speech, and all grammatical categories using the... -
Carniolan Provincial Assembly corpus Kranjska 1.0
The corpus contains meeting proceedings of the Carniolan Provincial Assembly from 1861 to 1913 (Obravnave deželnega zbora kranjskega / Bericht über die Verhandlungen des... -
Linguistically annotated multilingual comparable corpora of parliamentary deb...
ParlaMint-en.ana 5.0 is the English machine translation of the ParlaMint.ana 5.0 (http://hdl.handle.net/11356/2005) set of corpora of parliamentary debates across Europe. The...