Eesti keele ühendkorpus 2023 (annoteerimata) | Estonian National Corpus 2023 (unannotated; prevert)


Description

Estonian corpus of written texts. It consists of the Estonian Reference Corpus (1990–2008), contemporary and old literature, Estonian Web (2013, 2017, 2019, 2021, 2023), timestamped Estonian corpora (2014–2021, 2020–2023), Estonian Wikipedia (articles: 2023; talk pages: 2017) and Estonian academic writing (2020–2023). The texts have been cleaned and deduplicated. Text type annotation covers topics and genres.

ENCODING: UTF-8
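As a rough illustration of working with the UTF-8 prevert files, the sketch below tallies documents by one text-type attribute. It assumes, without confirmation from this record, that each document opens with a Sketch Engine style `<doc ...>` line carrying double-quoted attributes; the attribute names (`genre`, `topic`) and the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

# Assumed layout: each document starts with a line like <doc id="..." genre="...">.
DOC_OPEN = re.compile(r"^<doc\b([^>]*)>")
# key="value" pairs inside the tag (double quotes assumed).
ATTR = re.compile(r'(\w+)="([^"]*)"')


def iter_doc_attrs(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the attributes of every <doc> opening tag as a dict."""
    for line in lines:
        match = DOC_OPEN.match(line.strip())
        if match:
            yield dict(ATTR.findall(match.group(1)))


def count_by(lines: Iterable[str], key: str) -> Counter:
    """Count documents per value of one attribute; docs lacking it count as '<missing>'."""
    return Counter(attrs.get(key, "<missing>") for attrs in iter_doc_attrs(lines))


def count_file(path: str, key: str) -> Counter:
    """Stream a prevert file line by line, decoding it explicitly as UTF-8."""
    with open(path, encoding="utf-8") as handle:
        return count_by(handle, key)
```

For example, `count_file("enc2023.prevert", "genre").most_common(10)` would list the ten most frequent values of that attribute (the file name is hypothetical). Reading line by line keeps memory flat, which matters for a corpus of this size.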

== Comparison to the ENC 2021 corpus

Balanced Corpus 1990–2008 ................. kept without changes
Reference Corpus 1990–2008 ................ kept without changes
Literature Old 1864–1945 .................. updated according to the source
Literature Contemporary 2000–2023 ......... updated according to the source (licensed under CLARIN ACA)
Web 2013 .................................. kept without changes
Web 2017 .................................. kept without changes
Wikipedia Talk 2017 ....................... kept without changes
Academic Texts (formerly DOAJ) up to 2023 . updated with new data
Web 2019 .................................. kept without changes
Web 2021 .................................. kept without changes
Wikipedia 2023 ............................ replaces Wikipedia 2021
Feeds (JSI) 2014–2021 ..................... kept without changes
Feeds (LC) 2020–2023 ...................... updated with new data
Web 2023 .................................. new

Identifier
DOI https://doi.org/10.15155/3-00-0000-0000-0000-08C04M
Metadata Access https://metashare.ut.ee/oai_pmh/?verb=GetRecord&metadataPrefix=olac&identifier=ec397bb9bae611ee9c10e99c00eb27649a7f673b85724ebfaeb0f267373423c0
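The Metadata Access link is an OAI-PMH `GetRecord` request with the `olac` metadata prefix. A minimal standard-library sketch for rebuilding that request and pulling Dublin Core fields out of the response follows; it assumes the OLAC record exposes its fields as `dc:` elements in the `http://purl.org/dc/elements/1.1/` namespace, and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Dublin Core element namespace (assumed to be used inside the OLAC record).
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_getrecord_url(base: str, identifier: str, prefix: str = "olac") -> str:
    """Build an OAI-PMH GetRecord URL for one record identifier."""
    query = urllib.parse.urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier}
    )
    return f"{base}?{query}"


def fetch_record(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw response as bytes so the XML declaration decides the encoding."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def dc_values(xml_data, element: str) -> list:
    """Return the stripped text of every dc:<element> node, in document order."""
    root = ET.fromstring(xml_data)
    return [(node.text or "").strip() for node in root.iter(f"{{{DC_NS}}}{element}")]
```

Usage: `dc_values(fetch_record(build_getrecord_url("https://metashare.ut.ee/oai_pmh/", "<identifier>")), "title")`, with the identifier taken from the Metadata Access URL. Building the query with `urlencode` keeps the parameters correctly escaped if an identifier ever contains reserved characters.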
Provenance
Creator Jelena Kallas, jelena.kallas[at]eki.ee, Eesti Keele Instituut; Kristina Koppel, Kristina.Koppel[at]eki.ee, Eesti Keele Instituut; Helen Kaljumäe, helen.kaljumae[at]eki.ee, Eesti Keele Instituut
Publisher CLARIN
Contributor Jelena Kallas, jelena.kallas[at]eki.ee, Eesti Keele Instituut; Kristina Koppel, Kristina.Koppel[at]eki.ee, Eesti Keele Instituut
Publication Year 2024
Rights CC-BY; CLARIN_ACA
OpenAccess true
Contact info(at)keeleressursid.ee
Representation
Language Estonian
Resource Type Text
Size 3,400,000,000 tokens (3.4 billion)
Discipline Linguistics