Automatically Annotated Corpora with Stanza and UDPipe for Czech, English, and Greek

Dataset

This resource contains six automatically annotated corpora derived from the Leipzig Corpora Collection, covering three languages: Czech, English, and Greek. For each language, two corpora are provided: one annotated with Stanza and one annotated with UDPipe.

Identifier
PID	http://hdl.handle.net/11234/1-6120
Metadata Access	http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-6120
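
The Metadata Access URL above is an OAI-PMH GetRecord request for the Dublin Core (oai_dc) form of this record. The following is a minimal sketch of how such a URL can be rebuilt from the handle and how a response could be read. The function names build_get_record_url and dc_fields are hypothetical, and SAMPLE_RESPONSE is a hand-written illustration of the standard OAI-PMH envelope, not the repository's actual output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint and identifier prefix are taken from the Metadata Access URL.
BASE_URL = "http://lindat.mff.cuni.cz/repository/oai/request"
# Standard Dublin Core element namespace used by oai_dc.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_get_record_url(handle):
    """Build the OAI-PMH GetRecord URL for a handle such as '11234/1-6120'."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": "oai_dc",
        "identifier": f"oai:lindat.mff.cuni.cz:{handle}",
    }
    # Keep ':' and '/' unescaped so the URL matches the form shown in the record.
    return f"{BASE_URL}?{urlencode(params, safe=':/')}"


def dc_fields(xml_text):
    """Collect Dublin Core elements from an OAI-PMH response into a dict of lists."""
    fields = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith(DC_NS) and element.text:
            name = element.tag[len(DC_NS):]
            fields.setdefault(name, []).append(element.text.strip())
    return fields


# Illustrative response only (assumption): values copied from this record,
# wrapped in the envelope that the OAI-PMH specification defines.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator>Diamantopoulos, Konstantinos</dc:creator>
      <dc:language>Czech</dc:language>
      <dc:language>English</dc:language>
    </oai_dc:dc>
  </metadata></record></GetRecord>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_get_record_url("11234/1-6120"))
    print(dc_fields(SAMPLE_RESPONSE))
```

To query the live repository, the URL returned by build_get_record_url can be passed to urllib.request.urlopen and the decoded body handed to dc_fields; the fields actually present in the response may differ from the sample.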

Provenance
Creator	Diamantopoulos, Konstantinos
Publisher	Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year	2026
Rights	Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0); http://creativecommons.org/licenses/by-nc-sa/3.0/; PUB
OpenAccess	true
Contact	lindat-help(at)ufal.mff.cuni.cz

Representation
Language	Czech; English; Greek, Modern (1453-); Greek
Resource Type	corpus
Format	application/x-gzip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline	Linguistics
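
The Format field lists a gzip download containing UTF-8 plain text. As a minimal sketch, assuming the downloaded file is a single gzip-compressed text file (if it is a tar archive, the tarfile module would be needed instead), its first lines can be inspected without unpacking it to disk. The function name head_lines and the path in the usage comment are hypothetical.

```python
import gzip
from itertools import islice


def head_lines(path, n=5):
    """Return the first n lines of a gzip-compressed UTF-8 text file."""
    # "rt" opens the stream in text mode so lines are decoded as UTF-8.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


# Usage (hypothetical file name):
# for line in head_lines("corpus.txt.gz", n=10):
#     print(line)
```

Reading through islice keeps memory use constant, which matters for large corpora because only the requested lines are decompressed.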