161 datasets found

Keywords: corpus

  • Corpus of Discourse on Crime

    The specialised "Corpus of Discourse on Crime" is synchronic, monolingual and unannotated, and consists of two subcorpora. Subcorpus 1: all texts on crime, published in crime columns on...
  • Lithuanian Coreference Corpus

    Lithuanian Coreference Corpus. The corpus is made up of 100 articles from news portals focusing on political news, as such texts are rich in quotations and named entity...
  • MWE Kraszewski

    Józef Ignacy Kraszewski
  • Corpus of the Contemporary Lithuanian Language

    The Corpus of the Contemporary Lithuanian Language, which comprises 208 million words, is a collection of texts designed to represent present-day Lithuanian. The corpus has been...
  • 1000 Novels Corpus

    A corpus of literary texts intended as a benchmark collection for text categorization. It contains 1000 novels written in Polish or translated into Polish, by various authors. Each...
  • MWE Rodziewicz

    Maria Rodziewicz
  • Polish Parliamentary Corpus

    The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of the Polish Parliament (the Sejm and the Senate). The corpus files are...
  • DELFI.lt corpus

    The DELFI.lt corpus is made up of articles published by the news portal DELFI.lt from March 2014 to November 2016. Metadata was collected along with the articles: author, title, date,...
  • MWE Mniszek

    Helena Mniszek
  • Lithuanian morphologically annotated corpus - MATAS

    MATAS v0.2: Morphologically Annotated Lithuanian Corpus (manually checked). Contains 4 parts: Documents (21%), Fiction (19%), Periodicals (36%), Scientific texts (24%). Wordform...
  • MWE Świętochowski

    Aleksander Świętochowski
  • Corpus KLASIUS v.02

    The 900 extracts in the corpus were collected from textbooks and publications for secondary school students included in the compulsory bibliographic descriptions of the university...
  • MWE Domańska

    Antonina Domańska
  • MWE Kaczkowski

    Zygmunt Kaczkowski
  • LITIS v.1

    A corpus of user-generated comments collected from two Lithuanian portals: www.delfi.lt and www.lrytas.lt. Each comment is stored in a separate TXT file. Each file contains: a comment,...
  • Polish Spatial Texts (PST) 1.0

    Texts derived from Polish travel blogs, manually annotated with spatial expressions. A spatial expression is a text fragment that describes the relative location of two or more...
  • Lithuanian Parliament Corpus for Authorship Attribution

    This 23.9-million-word Lithuanian Parliament corpus is specially designed for the authorship attribution task. The corpus consists of 111 thousand samples of speech transcripts by 147...
  • English-French-Lithuanian Parallel Corpus of EU Financial Documents

    The corpus comprises 154 EU legislative documents (English documents and their translations into French and Lithuanian) related to various financial issues and enacted in...
  • MWE Prus

    Bolesław Prus
  • MWE Sygietyński

    Antoni Sygietyński
You can also access this registry using the API (see API Docs).