Collocations Dictionary of Modern Slovene KSSS 2.0

The database of the Collocations Dictionary of Modern Slovene 2.0 contains 4,491,958 collocations in 81,443 entries. The collocations occur in 81 different syntactic relations. Each collocation is labelled by status as either "automatic" (automatically extracted, not yet manually validated) or "manual" (manually validated). In total, there are 2,090 completed entries (all collocations manually validated) and 11,227 entries with sense division and a combination of manual and automatic collocations. The IDs provided for headwords, senses and collocations come from the Digital Dictionary Database for Slovene.

Collocations were extracted from the Gigafida 2.0 corpus using a method based on a formal definition of syntactic structures. The method takes into account not only the POS-tagging level of annotation but also syntactic parsing (syntactic treebank model), and it makes it possible to control the canonical form of extracted collocations using statistical data on forms with different properties in the corpus. The paper describing the procedure (Krek et al. 2022) is listed as a reference in this entry.

The dictionary is split into 41 files of up to 2,000 entries each to keep the file size manageable.
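
The file count follows from the figures in the description: 81,443 entries in chunks of 2,000. The short Python sketch below checks that arithmetic. It assumes the files are filled sequentially so that only the last one is smaller; the description does not state this, and the function names are illustrative only.

```python
import math

TOTAL_ENTRIES = 81_443    # entry count stated in the description
ENTRIES_PER_FILE = 2_000  # chunk size stated in the description


def file_count(total: int, per_file: int) -> int:
    """Number of files needed to hold `total` entries, `per_file` at a time."""
    return math.ceil(total / per_file)


def entries_in_last_file(total: int, per_file: int) -> int:
    """Size of the final file, assuming all earlier files are full (an assumption)."""
    remainder = total % per_file
    return remainder if remainder else per_file


n_files = file_count(TOTAL_ENTRIES, ENTRIES_PER_FILE)
last = entries_in_last_file(TOTAL_ENTRIES, ENTRIES_PER_FILE)
print(n_files, last)
```

Under that assumption, 40 files would hold 2,000 entries each and the remaining entries would go into the 41st file.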

Identifier
PID http://hdl.handle.net/11356/1933
Related Identifier https://elex.link/elex2023/wp-content/uploads/100.pdf
Related Identifier http://euralex.org/wp-content/themes/euralex/proceedings/Euralex%202022/EURALEX2022_Pr_p240-252_Krek-Gantar-Kosem.pdf
Related Identifier http://hdl.handle.net/11356/1250
Related Identifier http://hdl.handle.net/11356/2090
Related Identifier https://www.cjvt.si/kssj/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1933
Provenance
Creator Kosem, Iztok; Arhar Holdt, Špela; Krek, Simon; Gantar, Polona; Pori, Eva; Čibej, Jaka; Klemenc, Bojan; Laskowski, Cyprian; Dobrovoljc, Kaja; Gorjanc, Vojko; Ljubešić, Nikola; Zgaga, Karolina; Roblek, Rebeka
Publisher Centre for Language Resources and Technologies, University of Ljubljana
Publication Year 2023
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type lexicalConceptualResource
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 1
Discipline Linguistics
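
The Metadata Access URL listed above has the shape of an OAI-PMH GetRecord request. As a minimal sketch, the Python snippet below splits that URL into its endpoint and query parameters without contacting the server. The helper name `describe_request` is hypothetical, and nothing here is taken from the repository's own tooling.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied verbatim from the "Metadata Access" field of this record.
METADATA_URL = (
    "http://www.clarin.si/repository/oai/request"
    "?verb=GetRecord&metadataPrefix=oai_dc"
    "&identifier=oai:www.clarin.si:11356/1933"
)


def describe_request(url: str) -> dict:
    """Split a request URL into its endpoint and single-valued query parameters."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    result = {"endpoint": f"{parts.scheme}://{parts.netloc}{parts.path}"}
    # Each parameter occurs once in this URL, so keep only the first value.
    result.update({key: values[0] for key, values in query.items()})
    return result


request = describe_request(METADATA_URL)
for key, value in request.items():
    print(f"{key}: {value}")
```

The `identifier` parameter ends in the same handle suffix (11356/1933) as the PID of this entry.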