Frequency lists of word parts from the GOS 1.0 corpus 1.1

PID

Frequency lists of words split into word parts were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all lemmas, lower-case word forms or standardized word forms occurring in the corpus, split into their initial or final part (i.e. the initial or final string of 1, 2, 3, 4 or 5 characters in the word) and the rest. In addition, the lists also contain absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy.

The lists were extracted for each part-of-speech category. For each part-of-speech, a total of 30 lists were extracted: 1) 10 lists for initial or final word parts extracted from lemmas, 2) 10 lists for initial or final word parts extracted from lower-case word forms, 3) 10 lists for initial or final word parts extracted from standardized word forms.

In addition, 30 lists were extracted from all words (regardless of their part-of-speech category).

Compared to the previous version (http://hdl.handle.net/11356/1270), this one includes fixes of several typos and substitutes all instances of "normalized forms" with the more adequate term "standardized forms" (as used in the SSJ project).

Identifier
PID http://hdl.handle.net/11356/1366
Related Identifier http://hdl.handle.net/11356/1270
Related Identifier http://slovnica.ijs.si/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1366
Provenance
Creator Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja; Krek, Simon
Publisher Centre for Language Resources and Technologies, University of Ljubljana; Jožef Stefan Institute
Publication Year 2020
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type lexicalConceptualResource
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics