-
Cameroonian Languages Dataset
This is a collection of resources on Cameroonian Languages. The collection comprises electronic copies of scanned wordlists and TEI-XML encoded files of the wordlists. -
Lemmatised Wordlist of 1 m. Corpus of Contemporary Lithuanian
The lemmatised wordlist of the 1-million-word Lithuanian corpus. The structure of the tab-delimited text file (dazninis.txt): Headword, Part of Speech, Wordform, Frequency of Occurrence. The... -
Assessment Data of the Dictionary of Modern Lithuanian versus Joint Corpora
The resource is the assessment data of The Dictionary of Modern Lithuanian, 6th edition (DML6) [1], from the point of view of its coverage in the Joint Corpus of Lithuanian... -
Wordlist of the Contemporary Corpus of Lithuanian language
Frequency lists of word forms of the Contemporary Corpus of Lithuanian Language (Dabartinės lietuvių kalbos tekstyno žodžių formų dažniniai sąrašai). Corpus Structure... -
Wordlist of Lemmas from the Joint Corpus of Lithuanian
The resource is a wordlist of lemmas from the Joint Corpus of Lithuanian (JCL). The JCL is a merge of three corpora: 1) the Vilnius University corpus compiled out of the Lithuanian... -
Wordlist of the Contemporary Corpus of Lithuanian Language in the Face of War...
We present a comparative wordlist based on the Corpus of the Contemporary Lithuanian Language (CCLL2 version 2, pre-2020), supplemented by the media (courtesy of the news... -
Gos corpus n-grams 1.0
This is a collection of n-grams extracted from the Gos corpus of spoken Slovene (http://hdl.handle.net/11356/1040). In addition to the separate lists of n-grams for tokens and... -
Gos corpus n-grams 2.0
A collection of n-grams extracted from the Gos corpus of spoken Slovene (cf. http://eng.slovenscina.eu/korpusi/gos). Three sets of n-gram lists are provided for lowercased word... -
Keywords and n-grams from a textbook corpus
Wordlists, keywords and n-grams were extracted from a corpus of textbooks for Slovenian elementary and secondary schools. The corpus contains 4,302,857 words (5,373,268 tokens),... -
Kres corpus n-grams 2.0
A collection of n-grams extracted from the Kres corpus of written Slovene (cf. http://eng.slovenscina.eu/korpusi/kres). Three sets of n-gram lists are provided for lowercased... -
Janes corpus n-grams 1.0
A collection of n-grams extracted from the Janes corpus of Slovenian user-generated content version 1.0 (cf. http://nl.ijs.si/janes/). Three sets of n-gram lists are provided... -
IMP corpus n-grams 1.0
This is a collection of n-grams extracted from the IMP corpus of historical Slovene (http://hdl.handle.net/11356/1031). In addition to the separate lists of n-grams for tokens... -
KRES corpus n-grams 1.0
This is a collection of n-grams extracted from the KRES corpus of written Slovene. In addition to the separate lists of n-grams for tokens and their attributes (morphosyntactic... -
IMP corpus n-grams 2.0
A collection of n-grams extracted from the IMP corpus of historical Slovene (cf. https://nl.ijs.si/imp/). Three sets of n-gram lists are provided for lowercased word n-grams of...