Wordlist of Lemmas from the Joint Corpus of Lithuanian


The resource is a wordlist of lemmas from the Joint Corpus of Lithuanian (JCL). The JCL merges three corpora: 1) a Vilnius University corpus compiled from Lithuanian internet content from 2014 and used primarily for machine translation (779.2 million tokens); 2) a corpus of legal documents in the form of a wordlist (courtesy of the Office of the Seimas of the Republic of Lithuania, 2011; 443.1 million tokens); and 3) the Corpus of the Contemporary Lithuanian Language (CCLL) of Vytautas Magnus University (112.6 million tokens). The JCL contains more than 1.3 billion tokens in total. The frequency list contains 169,787 lemmas.

Identifier
PID http://hdl.handle.net/20.500.11821/41
Metadata Access https://clarin.vdu.lt/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.vdu.lt:20.500.11821/41
Provenance
Creator Dadurkevičius, Virginijus
Publisher Vilnius University
Publication Year 2020
Rights PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT; https://clarin.vdu.lt/licenses/eula/PUB_CLARIN-LT_End-User-Licence-Agreement_EN-LT.htm; PUB
OpenAccess true
Contact info(at)clarin.vdu.lt
Representation
Language Lithuanian
Resource Type lexicalConceptualResource
Format text/plain; charset=utf-8; application/zip; downloadable_files_count: 1
Discipline Linguistics
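
The Metadata Access URL above has the shape of an OAI-PMH GetRecord request: an endpoint followed by the verb, metadataPrefix and identifier parameters. As a minimal sketch, the same URL can be assembled from the handle. The function name build_getrecord_url is hypothetical and used only for illustration, and the identifier prefix oai:clarin.vdu.lt: is assumed from this single record rather than from repository documentation.

```python
from urllib.parse import urlencode

# Endpoint copied from the Metadata Access URL of this record.
OAI_ENDPOINT = "https://clarin.vdu.lt/oai/request"


def build_getrecord_url(handle: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a GetRecord URL for a handle such as '20.500.11821/41'.

    Assumption: identifiers follow the 'oai:clarin.vdu.lt:<handle>' pattern
    seen in this record.
    """
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:clarin.vdu.lt:{handle}",
    }
    # Keep ':' and '/' unescaped so the result matches the URL as listed.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


if __name__ == "__main__":
    print(build_getrecord_url("20.500.11821/41"))
```

The sketch only builds the request string and does not contact the server. Fetching and parsing the returned XML is left out.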