- KRES corpus n-grams 1.0
This is a collection of n-grams extracted from the KRES corpus of written Slovene. In addition to the separate lists of n-grams for tokens and their attributes (morphosyntactic...
- Gos corpus n-grams 1.0
This is a collection of n-grams extracted from the Gos corpus of spoken Slovene (http://hdl.handle.net/11356/1040). In addition to the separate lists of n-grams for tokens and...
- Janes corpus n-grams 1.0
A collection of n-grams extracted from the Janes corpus of Slovenian user-generated content version 1.0 (cf. http://nl.ijs.si/janes/). Three sets of n-gram lists are provided...
- Keywords and n-grams from a textbook corpus
Wordlists, keywords and n-grams were extracted from a corpus of textbooks for Slovenian elementary and secondary schools. The corpus contains 4,302,857 words (5,373,268 tokens),...
- Gos corpus n-grams 2.0
A collection of n-grams extracted from the Gos corpus of spoken Slovene (cf. http://eng.slovenscina.eu/korpusi/gos). Three sets of n-gram lists are provided for lowercased word...
- Kres corpus n-grams 2.0
A collection of n-grams extracted from the Kres corpus of written Slovene (cf. http://eng.slovenscina.eu/korpusi/kres). Three sets of n-gram lists are provided for lowercased...
- IMP corpus n-grams 1.0
This is a collection of n-grams extracted from the IMP corpus of historical Slovene (http://hdl.handle.net/11356/1031). In addition to the separate lists of n-grams for tokens...
- IMP corpus n-grams 2.0
A collection of n-grams extracted from the IMP corpus of historical Slovene (cf. http://nl.ijs.si/imp/). Three sets of n-gram lists are provided for lowercased word n-grams of...
- Assessment Data of the Dictionary of Modern Lithuanian versus Joint Corpora
The resource is the assessment data of The Dictionary of Modern Lithuanian, 6th edition (DML6) [1], from the point of view of its coverage in the Joint Corpus of Lithuanian...
- Wordlist of the Contemporary Corpus of Lithuanian language
Frequency wordlists of wordforms of the Contemporary Corpus of Lithuanian language. Corpus Structure...
- Wordlist of Lemmas from the Joint Corpus of Lithuanian
The resource is a wordlist of lemmas from the Joint Corpus of Lithuanian (JCL). The JCL merges three corpora: 1) the Vilnius University corpus compiled from the Lithuanian...
- Lemmatised Wordlist of 1 m. Corpus of Contemporary Lithuanian
The lemmatised wordlist of the 1 m. word Lithuanian corpus. The structure of the tab-delimited text file (dazninis.txt): Headword, Part of Speech, Wordform, Frequency of Occurrence. The...
- Cameroonian Languages Dataset
This is a collection of resources on Cameroonian Languages. The collection comprises electronic copies of scanned wordlists and TEI-XML encoded files of the wordlists.