ccGigafida ARPA language model 1.0

Dataset


The ccGigafida ARPA language model was created from the ccGigafida written corpus of Slovenian (https://www.clarin.si/repository/xmlui/handle/11356/1035) using the KenLM toolkit included in the Moses machine translation framework. It is a general language model of contemporary standard Slovenian, intended for use in statistical machine translation systems.
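As a quick sanity check after download, the header of an ARPA file can be inspected without loading the whole model. The sketch below assumes only the standard ARPA text layout (a `\data\` section listing `ngram N=COUNT` lines, followed by the `\N-grams:` sections) and the gzip-compressed UTF-8 encoding stated in this record. The function names and the file path in the usage note are hypothetical and are not part of the distributed resource.

```python
import gzip
import re
from typing import Dict, Iterable


def read_arpa_ngram_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Return {n-gram order: count} parsed from the header of an ARPA file."""
    counts: Dict[int, int] = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            # The header starts at the \data\ marker.
            in_header = True
            continue
        if not in_header:
            continue
        match = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line:
            # The first non-empty line that is not a count (e.g. "\1-grams:")
            # ends the header, so the n-gram sections are never read.
            break
    return counts


def read_arpa_ngram_counts_from_gzip(path: str) -> Dict[int, int]:
    """Open a gzip-compressed, UTF-8 encoded ARPA file and read its header."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return read_arpa_ngram_counts(handle)
```

For example, `read_arpa_ngram_counts_from_gzip("model.arpa.gz")` (a placeholder path) would return a dictionary mapping each n-gram order to the number of entries declared for it, which also reveals the order of the model.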

The language model was created as part of the master's thesis: Kadivec, Jože. 2016. Prilagoditev statističnega strojnega prevajalnika za specifično domeno v slovenskem jeziku (Domain specific adaptation of a statistical machine translation engine in Slovene language). Master's thesis, Faculty of Computer and Information Science, University of Ljubljana. https://repozitorij.uni-lj.si/IzpisGradiva.php?id=84815

Identifier
PID	http://hdl.handle.net/11356/1119
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1119

Provenance
Creator	Kadivec, Jože; Robnik-Šikonja, Marko; Vintar, Špela
Publisher	Faculty of Computer and Information Science, University of Ljubljana
Publication Year	2017
Rights	Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	toolService
Format	application/gzip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline	Linguistics