Semantic hypergraph corpus SemCRO 1.0

Dataset

This corpus can be used to build and evaluate methods for extracting and presenting knowledge based on a semantic hypergraph. The corpus consists of 184 simple, complex and dependently complex sentences. All sentences are annotated at the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, named entities, and semantic roles. The resource also includes a representation of a subset of 176 sentences in the form of a semantic hypergraph, which can be used to evaluate knowledge extraction methods for Croatian. The sentences in this corpus are taken from the textbook:

Hudeček, L., Mihaljević, M., Sršen, J. and Čamagajevac, S. (2017). Hrvatska Školska Gramatika. Zagreb: Institut za hrvatski jezik i jezikoslovlje. https://gramatika.hr/impresum/

Identifier
PID	http://hdl.handle.net/11356/1377
Related Identifier	https://www.acnltutor.net/
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1377
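
The Metadata Access link above is an OAI-PMH GetRecord request. As a minimal sketch, the same request could be assembled and read in Python as shown below. The function names `build_get_record_url` and `fetch_titles` are hypothetical helpers introduced here for illustration, and the sketch assumes the endpoint returns standard `oai_dc` (Dublin Core) XML; neither is part of the record itself.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint taken from the Metadata Access field of this record.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"
# Standard Dublin Core element namespace (assumed to be used by oai_dc here).
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"


def build_get_record_url(handle: str, prefix: str = "oai_dc") -> str:
    """Assemble an OAI-PMH GetRecord URL for a repository handle (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": prefix,
        "identifier": f"oai:www.clarin.si:{handle}",
    }
    # Keep ':' and '/' unescaped so the URL matches the form shown in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


def fetch_titles(url: str, timeout: float = 30.0) -> list:
    """Download the record and return the text of every dc:title element (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        root = ET.parse(response).getroot()
    return [element.text for element in root.iter(DC_TITLE)]


if __name__ == "__main__":
    url = build_get_record_url("11356/1377")
    print(url)
    # Requires network access; uncomment to query the repository.
    # print(fetch_titles(url))
```

Calling `build_get_record_url("11356/1377")` reproduces the Metadata Access URL listed above; the network call is left commented out so the sketch can be run offline.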

Provenance
Creator	Vasić, Daniel; Žitko, Branko; Gašpar, Angelina; Ljubešić, Nikola; Štrkalj Despot, Kristina; Merkler, Danijela
Publisher	University of Mostar; University of Split; Jožef Stefan Institute
Publication Year	2020
Rights	Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Croatian
Resource Type	corpus
Format	application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline	Linguistics