Indonesian web corpus

An Indonesian web corpus crawled in 2010. The text is encoded in UTF-8, cleaned, deduplicated, and tagged with Morphind.

Identifier
PID http://hdl.handle.net/11234/1-2970
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-2970
Provenance
Creator Medveď, Marek; Suchomel, Vít
Publisher Masaryk University, NLP Centre
Publication Year 2019
Rights NLP Centre Web Corpus License; https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC; ACA
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Indonesian
Resource Type corpus
Format text/plain; charset=utf-8; application/octet-stream (1 downloadable file)
Discipline Linguistics