Collocations Dictionary of Modern Slovene KSSS 1.0

The database of the Collocations Dictionary of Modern Slovene 1.0 contains entries for 35,862 headwords (18,043 nouns, 5,148 verbs, 10,259 adjectives and 2,412 adverbs) and 7,310,983 collocations that were automatically extracted from the Gigafida 1.0 corpus. The automatic extraction was carried out via the Sketch Engine API, using a specially adapted Sketch grammar for Slovene and a set of parameters chosen on the basis of manual evaluation. These parameters determined the maximum number of collocates per grammatical relation, the minimum frequency of a collocate, the minimum frequency of a grammatical relation, the minimum salience (logDice) score of a collocate, and the minimum salience of a grammatical relation.
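As a rough illustration of how such thresholds could act on candidate collocates, the sketch below computes logDice with its standard definition, 14 + log2(2·f_xy / (f_x + f_y)), and applies the collocate-level limits. The `Candidate` class, the function names, and all threshold values are hypothetical, since the record does not state the values actually used; the relation-level thresholds are left out for brevity.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Standard logDice: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


@dataclass
class Candidate:
    # Hypothetical record for one extracted collocate (not the dataset's schema).
    collocate: str
    relation: str
    frequency: int
    score: float  # logDice salience


def filter_candidates(
    candidates: List[Candidate],
    max_per_relation: int,
    min_freq: int,
    min_score: float,
) -> Dict[str, List[Candidate]]:
    """Keep the top-scoring collocates per relation that pass both minimums."""
    kept: Dict[str, List[Candidate]] = {}
    # Visit candidates from most to least salient so each bucket fills with the best ones.
    for c in sorted(candidates, key=lambda c: c.score, reverse=True):
        if c.frequency < min_freq or c.score < min_score:
            continue
        bucket = kept.setdefault(c.relation, [])
        if len(bucket) < max_per_relation:
            bucket.append(c)
    return kept


# Illustrative values only; the real parameters are not given in this record.
sample = [
    Candidate("črn", "adj+noun", 50, 9.1),
    Candidate("dober", "adj+noun", 200, 7.5),
    Candidate("redek", "adj+noun", 2, 10.0),
    Candidate("piti", "verb+noun", 80, 8.0),
]
result = filter_candidates(sample, max_per_relation=1, min_freq=5, min_score=7.0)
```

With these made-up inputs, the low-frequency candidate is dropped despite its high score, and each relation keeps only its single most salient collocate.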

The procedure of automatic extraction, which produced a list of collocates (lemmas) in a particular relation, was followed by a set of post-processing steps:
- removal of collocations that were represented by repetitions of the same sentence
- preparation of full collocations by the addition of the headword and, if needed, the third element in the grammatical relation (such as a preposition). The headwords/collocates were also put in the correct case, depending on the grammatical relation.
- addition of IDs from the Slovenian morphological lexicon Sloleks (http://hdl.handle.net/11356/1230) to every element in the collocation.
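The post-processing steps can be pictured with a minimal sketch like the one below. It is not the authors' pipeline: the helper names, the repetition criterion (several corpus hits that collapse to one distinct sentence), and the lexicon IDs are all assumptions made for illustration, and case inflection is assumed to have happened already.

```python
from typing import Dict, List, Optional, Tuple


def only_repeats(sentences: List[str]) -> bool:
    """True when several hits collapse to a single distinct sentence.

    Assumed criterion; the record does not define repetition precisely.
    """
    distinct = {" ".join(s.split()).lower() for s in sentences}
    return len(sentences) > 1 and len(distinct) == 1


def build_collocation(
    elements: List[Tuple[str, str]],
    lexicon: Dict[str, str],
) -> Dict[str, object]:
    """Join (form, lemma) pairs into a full collocation and attach lexicon IDs.

    Forms are expected to be in the correct case already; a lemma that is
    missing from the lexicon gets None as its ID.
    """
    forms = [form for form, _ in elements]
    ids: List[Optional[str]] = [lexicon.get(lemma) for _, lemma in elements]
    return {"form": " ".join(forms), "ids": ids}


# Placeholder IDs, not real Sloleks identifiers.
toy_lexicon = {"črn": "ID-1", "kava": "ID-2"}

entry = build_collocation([("črna", "črn"), ("kava", "kava")], toy_lexicon)
drop = only_repeats(["Pijem črno kavo.", "pijem  črno kavo."])
```

Here `entry` holds the assembled collocation with one ID per element, and `drop` flags a candidate whose only evidence is the same sentence repeated.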

Identifier
PID http://hdl.handle.net/11356/1250
Related Identifier http://euralex.org/wp-content/themes/euralex/proceedings/Euralex%202018/118-4-2939-1-10-20180820.pdf
Related Identifier https://www.cjvt.si/kssj/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1250
Provenance
Creator Kosem, Iztok; Gantar, Polona; Krek, Simon; Arhar Holdt, Špela; Čibej, Jaka; Laskowski, Cyprian; Pori, Eva; Klemenc, Bojan; Dobrovoljc, Kaja; Gorjanc, Vojko; Ljubešić, Nikola
Publisher Centre for Language Resources and Technologies, University of Ljubljana
Publication Year 2019
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type lexicalConceptualResource
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics