163 datasets found

Keywords: corpus

  • Lithuanian Corpus of the EU Primary and Secondary Law Acts of the Period 2015...

    A 274,460-word corpus comprising selected primary and secondary law acts of the EU from the period 2015-2017. The corpus was compiled from documents containing words with the root...
  • English-Lithuanian Comparable Vaccination Corpus

    Two news portals were selected for comparable corpora building: the Lithuanian portal DELFI and the English portal The Guardian. The compiled corpora comprise 135 Lithuanian...
  • Corpus of Discourse on Crime

    The specialised "Corpus of Discourse on Crime" is synchronic, monolingual, and unannotated, and consists of two subcorpora. Subcorpus 1: all texts on crime, published in criminal columns on...
  • Lithuanian Coreference Corpus

    The corpus is made up of 100 articles from news portals focusing on political news, as such texts are rich in quotations and named entity...
  • MWE Kraszewski

    Józef Ignacy Kraszewski
  • Corpus of the Contemporary Lithuanian Language

    The Corpus of the Contemporary Lithuanian Language, which comprises 208 million words, is a collection of texts designed to represent current Lithuanian. The corpus has been...
  • 1000 Novels Corpus

    Corpus of literary texts intended as a benchmark collection for text categorization. It contains 1000 novels written in Polish or translated into Polish by various authors. Each...
  • MWE Rodziewicz

    Maria Rodziewicz
  • Polish Parliamentary Corpus

    The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of the Polish Parliament (Sejm and Senate). The corpus files are...
  • DELFI.lt corpus

    DELFI.lt is a corpus of articles published by the news portal DELFI.lt from March 2014 to November 2016. Metadata was collected along with the articles: author, title, date,...
  • MWE Mniszek

    Helena Mniszek
  • Lithuanian morphologically annotated corpus - MATAS

    MATAS v0.2 - Morphologically Annotated Lithuanian Corpus (manually checked). Contains 4 parts: Documents (21%), Fiction (19%), Periodicals (36%), Scientific texts (24%). Wordform...
  • MWE Świętochowski

    Aleksander Świętochowski
  • Corpus KLASIUS v.02

    900 extracts for the corpus were collected from manuals and publications for secondary school students included in the compulsory bibliographic descriptions of the university...
  • MWE Domańska

    Antonina Domańska
  • MWE Kaczkowski

    Zygmunt Kaczkowski
  • LITIS v.1

    Corpus of user-generated comments collected from two Lithuanian portals: www.delfi.lt and www.lrytas.lt. Each comment is in a separate file (TXT). Each file contains: a comment,...
  • Polish Spatial Texts (PST) 1.0

    Texts derived from Polish travel blogs, manually annotated with spatial expressions. A spatial expression is a text fragment which describes a relative location of two or more...
  • Lithuanian Parliament Corpus for Authorship Attribution

    This 23.9-million-word Lithuanian Parliament corpus is specially designed for the authorship attribution task. The corpus consists of 111 thousand samples of speech transcripts by 147...
  • English-French-Lithuanian Parallel Corpus of EU Financial Documents

    The corpus comprises 154 EU legislative documents (English documents and their translations into French and Lithuanian) related to various financial issues and enacted in...
You can also access this registry using the API (see API Docs).