This corpus can be used to build and evaluate methods for extracting knowledge and representing it as a semantic hypergraph. The corpus consists of 184 simple, compound and complex sentences. All sentences are annotated at the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, named entities and semantic roles. The resource also includes a semantic hypergraph representation of a subset of 176 sentences, which can be used to evaluate knowledge extraction methods for Croatian.
The sentences in this corpus are taken from the following textbook:
Hudeček, L., Mihaljević, M., Sršen, J. and Čamagajevac, S. (2017). Hrvatska školska gramatika. Zagreb: Institut za hrvatski jezik i jezikoslovlje. https://gramatika.hr/impresum/