Annotated Corpus of Czech Case Law for Segmentation Tasks


Annotated corpus of 350 decisions of Czech top-tier courts (Supreme Court, Supreme Administrative Court, Constitutional Court).

Of these, 280 decisions were annotated by one trained annotator and then manually adjudicated by one trained curator. The remaining 70 decisions were annotated by two trained annotators and then manually adjudicated by one trained curator. Adjudication was conducted destructively; the dataset therefore contains only the adjudicated (correct) annotations and does not retain all of the original annotations.

The corpus was developed as training and testing material for text segmentation tasks. The dataset contains decisions segmented into Header, Procedural History, Submission/Rejoinder, Court Argumentation, Footer, Footnotes, and Dissenting Opinion. Segmentation makes it possible to treat different parts of a text differently even if they contain similar linguistic or other features.
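
As an illustration of how the seven segment types might be used downstream, the following is a minimal Python sketch. It assumes each annotation can be reduced to a label plus character offsets into the decision text. The corpus's actual file format is not described in this record, so the names SegmentType, Segment, and split_by_segment are hypothetical and not part of the dataset.

```python
from dataclasses import dataclass
from enum import Enum


class SegmentType(Enum):
    """The seven segment labels named in the corpus description."""
    HEADER = "Header"
    PROCEDURAL_HISTORY = "Procedural History"
    SUBMISSION_REJOINDER = "Submission/Rejoinder"
    COURT_ARGUMENTATION = "Court Argumentation"
    FOOTER = "Footer"
    FOOTNOTES = "Footnotes"
    DISSENTING_OPINION = "Dissenting Opinion"


@dataclass(frozen=True)
class Segment:
    """Hypothetical annotation: a label over text[start:end] (end exclusive)."""
    label: SegmentType
    start: int
    end: int


def split_by_segment(text, segments):
    """Group the text of each annotated span by its segment label.

    Spans are visited in document order, so the pieces collected
    under each label keep their original order.
    """
    grouped = {}
    for seg in sorted(segments, key=lambda s: s.start):
        grouped.setdefault(seg.label, []).append(text[seg.start:seg.end])
    return grouped
```

With the text grouped this way, a downstream step could, for example, process only the Court Argumentation spans and skip Header and Footer material.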

Related Identifier
Metadata Access
Creator Harašta, Jakub; Šavelka, Jaromír; Kasl, František; Míšek, Jakub
Publisher Masaryk University, Brno
Publication Year 2019
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); PUB
OpenAccess true
Contact Masaryk University, Brno
Language Czech
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream; application/pdf; downloadable_files_count: 2
Discipline Linguistics