Annotated Corpus of Czech Case Law for Segmentation Tasks

Dataset

Annotated corpus of 350 decisions of Czech top-tier courts (Supreme Court, Supreme Administrative Court, Constitutional Court).

280 decisions were annotated by one trained annotator and then manually adjudicated by one trained curator. 70 decisions were annotated by two trained annotators and then manually adjudicated by one trained curator. Adjudication was destructive; therefore, the dataset contains only the correct (adjudicated) annotations, not all of the original annotations.

The corpus was developed as training and testing material for text segmentation tasks. The dataset contains decisions segmented into Header, Procedural History, Submission/Rejoinder, Court Argumentation, Footer, Footnotes, and Dissenting Opinion. Segmentation allows different parts of a text to be treated differently, even when they share similar linguistic or other features.
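To make the seven segment types concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each annotation is a character span `(start, end, label)` over the decision text. This record does not document the actual file format, so `Segment`, `group_by_segment`, and the span layout are hypothetical names and structures, not part of the distributed data.

```python
from collections import defaultdict
from enum import Enum


class Segment(Enum):
    """The seven segment types named in the corpus description."""
    HEADER = "Header"
    PROCEDURAL_HISTORY = "Procedural History"
    SUBMISSION_REJOINDER = "Submission/Rejoinder"
    COURT_ARGUMENTATION = "Court Argumentation"
    FOOTER = "Footer"
    FOOTNOTES = "Footnotes"
    DISSENTING_OPINION = "Dissenting Opinion"


def group_by_segment(text, spans):
    """Collect the text of each span under its segment type.

    `spans` is a HYPOTHETICAL representation: (start, end, label) tuples,
    where start/end are character offsets into `text` and label is one of
    the segment names above. An unknown label raises ValueError.
    """
    grouped = defaultdict(list)
    for start, end, label in spans:
        grouped[Segment(label)].append(text[start:end])
    return dict(grouped)


# Toy usage with placeholder text (not taken from the corpus).
text = "HEADER TEXT. Court reasoning."
spans = [(0, 12, "Header"), (13, 29, "Court Argumentation")]
by_segment = group_by_segment(text, spans)
print(by_segment[Segment.COURT_ARGUMENTATION])  # ['Court reasoning.']
```

A downstream task could, for example, keep only the `Segment.COURT_ARGUMENTATION` texts and skip Header and Footer material, even though all of them share the same legal vocabulary.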

Identifier
PID	http://hdl.handle.net/11372/LRT-2901
Related Identifier	http://jusletter-it.weblaw.ch/issues/2019/23-Mai-2019/automatic-segmentati_f1ab10b8b5.html
Related Identifier	https://www.muni.cz/vyzkum/projekty/36467
Metadata Access	http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11372/LRT-2901
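The Metadata Access URL is an OAI-PMH `GetRecord` request for unqualified Dublin Core (`metadataPrefix=oai_dc`). The following is a minimal Python sketch, using only the standard library, of how that record could be fetched and flattened into a field-to-values mapping. It assumes the endpoint answers with standard OAI-PMH 2.0 XML; the helper names `get_record_url`, `parse_dc`, and `fetch_dc` are illustrative, not part of any LINDAT API.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Endpoint and namespaces assumed from the OAI-PMH 2.0 / oai_dc conventions.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def get_record_url(identifier, prefix="oai_dc"):
    """Build a GetRecord request URL with URL-encoded arguments."""
    query = urllib.parse.urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier}
    )
    return f"{OAI_ENDPOINT}?{query}"


def parse_dc(xml_bytes):
    """Map each Dublin Core element name (e.g. 'creator') to its values."""
    root = ET.fromstring(xml_bytes)
    fields = {}
    metadata = root.find(".//oai:metadata", NS)
    if metadata is None:  # e.g. an OAI-PMH error response
        return fields
    for elem in metadata.iter():
        if elem.tag.startswith(DC_NS) and elem.text and elem.text.strip():
            name = elem.tag[len(DC_NS):]
            fields.setdefault(name, []).append(elem.text.strip())
    return fields


def fetch_dc(identifier):
    """Fetch and parse one record over the network (not run here)."""
    with urllib.request.urlopen(get_record_url(identifier), timeout=30) as resp:
        return parse_dc(resp.read())


# Example call (requires network access):
# fetch_dc("oai:lindat.mff.cuni.cz:11372/LRT-2901")
```

The identifier is percent-encoded by `urlencode`, as OAI-PMH requires for request arguments, so the generated URL differs cosmetically from the one listed above but carries the same query.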

Provenance
Creator	Harašta, Jakub; Šavelka, Jaromír; Kasl, František; Míšek, Jakub
Publisher	Masaryk University, Brno
Publication Year	2019
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); http://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess	true
Contact	lindat-help(at)ufal.mff.cuni.cz

Representation
Language	Czech
Resource Type	corpus
Format	application/octet-stream; application/pdf; downloadable_files_count: 2
Discipline	Linguistics