Croatian language corpus Riznica 0.1

The Croatian Language Corpus was built between 2007 and 2011 at the Institute of Croatian Language and Linguistics within the research programme "Hrvatska jezična riznica" as a reference corpus of the Croatian language, intended to serve lexicographic, other linguistic, and language technology projects. The corpus consists of 28% fiction and 72% specialized texts. In 2017, the corpus was segmented, part-of-speech tagged, and lemmatized within the MREŽNIK project so that it could be used to develop the first Croatian corpus-based dictionary.

Identifier
PID http://hdl.handle.net/11356/1180
Related Identifier http://riznica.ihjj.hr/CLC-Slavicorp.pdf
Related Identifier http://riznica.ihjj.hr
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1180
Provenance
Creator Brozović Rončević, Dunja; Ćavar, Damir; Ćavar, Małgorzata; Stojanov, Tomislav; Štrkalj Despot, Kristina; Ljubešić, Nikola; Erjavec, Tomaž
Publisher Institute of Croatian Language and Linguistics
Publication Year 2018
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); https://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Croatian
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics
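
The Metadata Access URL listed above is an OAI-PMH GetRecord request that asks for the record in Dublin Core (oai_dc). A minimal Python sketch of how such a request could be built and its response read follows. The function names (build_get_record_url, fetch_record, parse_dublin_core) are hypothetical, and it is an assumption that the endpoint still answers at this address and returns standard oai_dc elements.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Endpoint taken from the Metadata Access field of this record.
OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"
# Namespace of the Dublin Core elements carried inside an oai_dc record.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_get_record_url(identifier, metadata_prefix="oai_dc",
                         endpoint=OAI_ENDPOINT):
    """Build an OAI-PMH GetRecord URL for one record identifier."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":/",  # keep the OAI identifier readable, as in the record
    )
    return f"{endpoint}?{query}"


def fetch_record(identifier):
    """Download the raw XML response (needs network access; assumed reachable)."""
    with urlopen(build_get_record_url(identifier), timeout=30) as response:
        return response.read().decode("utf-8")


def parse_dublin_core(xml_text):
    """Collect Dublin Core fields from an OAI-PMH response as {name: [values]}."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root.iter():
        if element.tag.startswith(DC_NS) and element.text and element.text.strip():
            name = element.tag[len(DC_NS):]
            fields.setdefault(name, []).append(element.text.strip())
    return fields
```

Calling build_get_record_url("oai:www.clarin.si:11356/1180") reproduces the Metadata Access URL of this record; passing the text returned by fetch_record to parse_dublin_core would then give a dictionary of whatever Dublin Core fields the repository exposes.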