SNES (Stalno naglašene enote iz Sloleksa; Constantly accentuated units from Sloleks) is a dataset of Slovene final accentuated word parts. A final accentuated word part is the end of an accentuated word form, running from the last grapheme that bears an accentuation diacritic to the end of the word (for instance, -álnik for "računálnik", -úlja for "hodúlja"). The parts were automatically extracted from the accentuated forms of the approximately 100,800 manually validated lexemes of Sloleks 3.0 (http://hdl.handle.net/11356/1745). They were then manually categorized to compile a manually validated, machine-readable list of final word parts that are always or almost always accentuated in Slovene (e.g. -álnik, -ílnik). Only word parts that are accentuated in at least 80% of examples were included in the manual list. The list can be used as a post-processing resource to correct some of the errors in the output of Slovene accentuation models.
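The extraction rule described above (take everything from the last grapheme with an accentuation diacritic to the end of the word) can be illustrated with a minimal Python sketch. This is not the code used to build SNES. The function name `final_accentuated_part` is hypothetical, and the sketch assumes that accentuation is marked only with the acute, grave, and circumflex diacritics, while the caron in č, š, ž is an ordinary part of the letter. The leading hyphen used in the examples above (-álnik) is not added.

```python
import unicodedata

# Assumption: only these combining marks signal accentuation
# (acute, grave, circumflex). The caron (U+030C) in č, š, ž does not.
ACCENT_MARKS = {"\u0301", "\u0300", "\u0302"}


def is_accented(ch):
    """Return True if the character carries one of the assumed accent marks."""
    return any(m in ACCENT_MARKS for m in unicodedata.normalize("NFD", ch))


def final_accentuated_part(word):
    """Return the part of the word from its last accented grapheme to the end.

    Returns None if the word carries no accentuation diacritic.
    """
    word = unicodedata.normalize("NFC", word)
    # Scan from the end so that the first hit is the last accented grapheme.
    for i in range(len(word) - 1, -1, -1):
        if is_accented(word[i]):
            return word[i:]
    return None


print(final_accentuated_part("računálnik"))  # álnik
print(final_accentuated_part("hodúlja"))     # úlja
```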
Version 1.0 includes 24,188 automatically extracted final accentuated word parts, 1,013 of which have been manually validated, categorized, and included in a separate manual list of Slovene final word parts that are always or almost always accentuated. For more details on the structure of the files, please consult 00README.txt.
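As a rough illustration of the post-processing use of the manual list, the following Python sketch checks whether a word produced by an accentuation model ends in one of the listed parts once accents are ignored, and if so moves the accent to the listed position. It is only a sketch under several assumptions: the function names are hypothetical, the parts are passed as plain strings without the leading hyphen (the actual file layout is described in 00README.txt), each word is assumed to carry a single accent, and the longest matching part is preferred. Because the listed parts are accentuated in at least 80% of examples rather than always, such a rule can also introduce errors.

```python
import unicodedata

# Assumption: acute, grave, and circumflex mark accentuation;
# the caron in č, š, ž is kept as part of the letter.
ACCENT_MARKS = {"\u0301", "\u0300", "\u0302"}


def strip_accents(text):
    """Remove the assumed accent marks but keep other diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if c not in ACCENT_MARKS)
    return unicodedata.normalize("NFC", kept)


def correct_with_list(predicted, parts):
    """Re-accentuate a predicted word if it ends in a listed final part."""
    predicted = unicodedata.normalize("NFC", predicted)
    plain = strip_accents(predicted)
    # Try longer parts first so that the most specific entry wins.
    for part in sorted(parts, key=len, reverse=True):
        part = unicodedata.normalize("NFC", part)
        plain_part = strip_accents(part)
        if plain.endswith(plain_part):
            if predicted.endswith(part):
                return predicted  # already agrees with the list
            # Replace the unaccented ending with the listed accentuated part.
            return plain[: len(plain) - len(plain_part)] + part
    return predicted  # no listed part applies


print(correct_with_list("račúnalnik", ["álnik", "ílnik"]))  # računálnik
```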