Morphological lexicon Sloleks 2.0

Sloleks is the reference morphological lexicon for the Slovenian language, developed for use in NLP applications and language manuals. Encoded in LMF XML, the lexicon contains approximately 100,000 of the most frequent Slovenian lemmas, their inflected or derived word forms, and the corresponding grammatical descriptions. Lemmatization rules, part-of-speech categorization and the set of feature-value pairs follow the JOS morphosyntactic specifications. In addition to grammatical information, each word form is annotated with its absolute corpus frequency and its compliance with the reference language standard.

Sloleks 2.0 includes accents assigned automatically using neural networks (Krsnik 2017) and partially corrected by hand, as well as automatically generated IPA and SAMPA transcriptions of lemmas and word forms.

The canonical version is encoded in XML, conforming to the Sloleks LMF DTD. The resource is also available as a TSV file in the MULTEXT-East format, with word form, lemma, MSD and frequency columns, along with a mapping to Universal Dependencies features.
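As an illustration of how the TSV distribution might be consumed, the following Python sketch reads rows into records and groups the word forms by lemma. It assumes that the first four tab-separated columns appear in the order listed above (word form, lemma, MSD, frequency), that there is no header row, and that any further columns (for example the Universal Dependencies mapping) can be ignored; none of this is confirmed by the description, so check it against the actual file. The names Entry, read_entries and forms_by_lemma are hypothetical, and the sample rows, including their MSD tags and frequencies, are invented for demonstration only.

```python
import csv
from collections import defaultdict
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    """One lexicon row; field order is an assumption, not a documented layout."""
    wordform: str
    lemma: str
    msd: str
    frequency: int


def read_entries(lines: Iterable[str]) -> list[Entry]:
    """Parse tab-separated lines, keeping only the first four columns."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    entries = []
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or malformed lines
        wordform, lemma, msd, freq = row[:4]
        entries.append(Entry(wordform, lemma, msd, int(freq)))
    return entries


def forms_by_lemma(entries: Iterable[Entry]) -> dict[str, list[Entry]]:
    """Group entries under their lemma, preserving file order."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.lemma].append(entry)
    return dict(index)


# Invented sample rows (not taken from Sloleks); the trailing fifth field
# stands in for extra columns that this sketch ignores.
SAMPLE = [
    "hiša\thiša\tNcfsn\t120\tNOUN",
    "hiše\thiša\tNcfsg\t80\tNOUN",
]

if __name__ == "__main__":
    for lemma, forms in forms_by_lemma(read_entries(SAMPLE)).items():
        print(lemma, [(f.wordform, f.msd, f.frequency) for f in forms])
```

To run this on the real file, pass an open file object instead of SAMPLE, opened with encoding="utf-8" and newline="" as the csv module recommends. The file name depends on the downloaded archive and is not given here.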

References: Kaja Dobrovoljc, Simon Krek and Tomaž Erjavec, 2017: The Sloleks Morphological Lexicon and its Future Development. In: Vojko Gorjanc, Polona Gantar, Iztok Kosem and Simon Krek (eds.), Dictionary of Modern Slovene: Problems and Solutions. Ljubljana University Press, Faculty of Arts. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/download/2/1/47-1

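Krsnik, Luka. Napovedovanje naglasa slovenskih besed z metodami strojnega učenja [Predicting the stress of Slovenian words with machine learning methods]: magistrsko delo [master's thesis]: magistrski program druge stopnje Računalništvo in informatika [second-cycle master's programme in Computer and Information Science]. Ljubljana: [L. Krsnik], 2017. http://eprints.fri.uni-lj.si/3978/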

Identifier
PID http://hdl.handle.net/11356/1230
Related Identifier https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/download/2/1/47-1?inline=1
Related Identifier http://eprints.fri.uni-lj.si/3978/
Related Identifier http://hdl.handle.net/11356/1039
Related Identifier http://hdl.handle.net/11356/1745
Related Identifier http://eng.slovenscina.eu/sloleks/opis
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1230
Provenance
Creator Dobrovoljc, Kaja; Krek, Simon; Holozan, Peter; Erjavec, Tomaž; Romih, Miro; Arhar Holdt, Špela; Čibej, Jaka; Krsnik, Luka; Robnik-Šikonja, Marko
Publisher Centre for Language Resources and Technologies, University of Ljubljana
Publication Year 2019
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); https://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type lexicalConceptualResource
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 2
Discipline Linguistics