Nova beseda Frequency Lexicon (ELEXIS)

The Nova beseda Frequency Lexicon was compiled from the Nova beseda text corpus at the Fran Ramovš Institute of Slovenian Language, with hyphen characters unified and leading and trailing non-breaking spaces deleted. Unlike most other Slovenian corpora, the Nova beseda texts were pre-processed before inclusion: typos and words with superfluous hyphens originating from incorrect line joins were corrected, and passages in foreign (non-Slovenian) languages were marked up and excluded from the lexicon. The corpus contains 318 million tokens, mostly wordforms. It can be searched at http://bos.zrc-sazu.si/a_beseda.html, where wordform search is reached by selecting "word search" in the "What to do?" column on the right-hand side. The same web page also explains the corpus structure. See also: http://hdl.handle.net/11356/1155
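
The normalization described above (unifying hyphen characters and deleting leading and trailing non-breaking spaces) might be sketched as follows. This is an illustration only, not the procedure actually used to build the lexicon: the set of hyphen variants and the names HYPHEN_VARIANTS, normalize_wordform and frequency_lexicon are assumptions introduced here.

```python
from collections import Counter

# Assumption: which hyphen-like characters get unified is not stated in the
# record; U+2010 (hyphen) and U+2011 (non-breaking hyphen) are illustrative.
HYPHEN_VARIANTS = "\u2010\u2011"
NBSP = "\u00a0"  # non-breaking space

# Translation table mapping every hyphen variant to the ASCII hyphen-minus.
_HYPHEN_TABLE = str.maketrans({ch: "-" for ch in HYPHEN_VARIANTS})


def normalize_wordform(token: str) -> str:
    """Unify hyphen characters, then drop leading/trailing non-breaking spaces."""
    return token.translate(_HYPHEN_TABLE).strip(NBSP)


def frequency_lexicon(tokens):
    """Count normalized wordforms, skipping tokens that normalize to nothing."""
    counts = Counter()
    for token in tokens:
        form = normalize_wordform(token)
        if form:
            counts[form] += 1
    return counts
```

Under these assumptions, variants of the same wordform that differ only in hyphen character or surrounding non-breaking spaces are counted as a single lexicon entry.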

Identifier
PID http://hdl.handle.net/11356/1619
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1619
Provenance
Creator Jakopin, Primož
Publisher ZRC SAZU
Publication Year 2020
OpenAccess true
Contact info(at)clarin.si
Representation
Language Slovenian; Slovene
Resource Type lexicalConceptualResource
Format downloadable_files_count: 0
Discipline Linguistics