Indonesian web corpus (idWac)

Indonesian text corpus crawled from the web. The crawl was done with SpiderLing in 2017. The text was filtered with JusText and Onion (see http://corpus.tools/ for details), then tagged and lemmatized with MorphInd (http://septinalarasati.com/morphind/).

Identifier
PID http://hdl.handle.net/11234/1-2586
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-2586
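The Metadata Access link above is an OAI-PMH GetRecord request in the oai_dc (Dublin Core) format. The following is a minimal sketch of how such a request URL could be assembled and how the Dublin Core fields of a response could be collected. The helper names build_get_record_url and dc_fields are hypothetical, and the parser assumes only that the response uses the standard Dublin Core element namespace.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint taken from the Metadata Access field of this record.
OAI_ENDPOINT = "http://lindat.mff.cuni.cz/repository/oai/request"

# Standard Dublin Core element namespace, as it appears in ElementTree tags.
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_get_record_url(identifier, metadata_prefix="oai_dc"):
    """Assemble an OAI-PMH GetRecord URL (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        safe=":/",  # keep the identifier readable, as in the record above
    )
    return f"{OAI_ENDPOINT}?{query}"


def dc_fields(xml_text):
    """Collect Dublin Core elements from an XML response into a dict of lists."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root.iter():
        if element.tag.startswith(DC_NS):
            name = element.tag[len(DC_NS):]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


if __name__ == "__main__":
    print(build_get_record_url("oai:lindat.mff.cuni.cz:11234/1-2586"))
```

Fetching the URL (for example with urllib.request.urlopen) and passing the decoded body to dc_fields would then give the record's fields as a dictionary. The network call is left out here so the sketch runs offline.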
Provenance
Creator Medveď, Marek; Suchomel, Vít
Publisher Natural Language Processing Centre, Faculty of Informatics, Masaryk University
Publication Year 2017
Rights NLP Centre Web Corpus License; https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC; ACA
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language Indonesian
Resource Type corpus
Format text/plain; charset=utf-8; application/x-xz; downloadable_files_count: 1
Discipline Linguistics
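
Because the Format field lists UTF-8 plain text compressed with xz, the downloaded file can be read line by line without unpacking it to disk first. The sketch below uses Python's standard lzma module. The file name idwac.txt.xz is a hypothetical placeholder, and the helper names are illustrative only; nothing here assumes a particular internal layout of the corpus lines.

```python
import lzma


def iter_lines(path, encoding="utf-8"):
    """Yield decoded lines from an xz-compressed text file, one at a time."""
    with lzma.open(path, mode="rt", encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")


def count_lines(path):
    """Count lines by streaming, so memory use stays small for large files."""
    return sum(1 for _ in iter_lines(path))


if __name__ == "__main__":
    import sys

    # Hypothetical default file name; pass the real path as the first argument.
    corpus_path = sys.argv[1] if len(sys.argv) > 1 else "idwac.txt.xz"
    for number, line in enumerate(iter_lines(corpus_path)):
        if number >= 10:  # preview only the first ten lines
            break
        print(line)
```

Streaming this way avoids holding the decompressed corpus in memory, which matters for web-scale text files.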