The multiword expression (MWE) lexicon was extracted from the Gigafida 2.1 Corpus of Written Standard Slovene (https://www.clarin.si/noske/run.cgi/corp_info?corpname=gfida21) using specialized scripts for extracting data from corpora with syntactic dependency annotations. The lexicon contains 5,242 MWEs with 12,358 examples from Gigafida 2.1. Each MWE entry (or sense) has between one and three extracted examples.
MWEs were analysed using the JOS dependency parser system (http://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf) and were assigned matching syntactic structure IDs. Corpus sentences that contain the MWE components and match the features of the assigned syntactic structure were then identified and attached to the corresponding headword or sense.
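The matching step described above can be illustrated with a minimal sketch. It is not the extraction script used for the lexicon: the Token class, the sentence_matches function, and the dependency label "mod" are hypothetical names chosen for illustration, and the label is a placeholder rather than an actual JOS label. The sketch assumes that a syntactic structure can be reduced to a set of (head lemma, dependent lemma, label) edges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One parsed token (hypothetical representation)."""
    idx: int      # 1-based position in the sentence
    lemma: str
    head: int     # idx of the syntactic head, 0 for the root
    deprel: str   # dependency label (placeholder values, not JOS labels)


def sentence_matches(tokens, component_lemmas, required_edges):
    """Return True if all component lemmas occur in the sentence and every
    required (head_lemma, dependent_lemma, deprel) edge is present."""
    lemmas = {t.lemma for t in tokens}
    if not set(component_lemmas) <= lemmas:
        return False
    by_idx = {t.idx: t for t in tokens}
    edges = set()
    for t in tokens:
        head = by_idx.get(t.head)  # None for the root token
        if head is not None:
            edges.add((head.lemma, t.lemma, t.deprel))
    return set(required_edges) <= edges


# Toy parse of "Odkrili so črno luknjo." with placeholder labels.
sentence = [
    Token(1, "odkriti", 0, "root"),
    Token(2, "biti", 1, "aux"),
    Token(3, "črn", 4, "mod"),
    Token(4, "luknja", 1, "obj"),
]

# Hypothetical structure for the MWE "črna luknja": adjective modifying noun.
matched = sentence_matches(sentence, ["črn", "luknja"], [("luknja", "črn", "mod")])
not_matched = sentence_matches(sentence, ["bel", "luknja"], [("luknja", "bel", "mod")])
```

Sentences for which the check succeeds would be candidates for the examples attached to an entry; how the real scripts select the final one to three examples is not specified here.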
MWE variants (or variant senses) are linked by their "senseKey" attribute values, forming an MWE cluster of related variants or variant senses. For a sample of MWE headwords, the entries also include manually created sense divisions with a description of the meaning of each sense.
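As a rough illustration of how such clusters could be rebuilt from the data, the sketch below groups entries by their "senseKey" value. It assumes that members of one cluster carry the same "senseKey" value and that the entries have already been loaded into dictionaries. The field names "id" and "headword", the sample values, and the function cluster_by_sense_key are hypothetical and do not reflect the actual lexicon schema.

```python
from collections import defaultdict

# Hypothetical, already-parsed entries; only "senseKey" is named in the description.
entries = [
    {"id": "mwe-1", "headword": "variant A", "senseKey": "k1"},
    {"id": "mwe-2", "headword": "variant B", "senseKey": "k1"},
    {"id": "mwe-3", "headword": "unrelated MWE", "senseKey": "k2"},
    {"id": "mwe-4", "headword": "MWE without a key"},
]


def cluster_by_sense_key(items):
    """Group entry ids by senseKey; entries without a key are skipped."""
    clusters = defaultdict(list)
    for item in items:
        key = item.get("senseKey")
        if key is not None:
            clusters[key].append(item["id"])
    return dict(clusters)


clusters = cluster_by_sense_key(entries)
# Clusters with more than one member correspond to linked variants.
linked = {key: ids for key, ids in clusters.items() if len(ids) > 1}
```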