Corpus of the Colloquial Polish Language


The Corpus of the Colloquial Polish Language (CCPL) is a corpus of user-generated content (UGC) tagged with morphosyntactic features by a team of professional linguists from the Wrocław University of Technology. It consists of 400 000 tagged segments and has been used to train the UGC-tagger, which is also available in the CLARIN repository.

Main resources:
Corpus files (NCP tagset): CCPL - anonimizacja_xml_out_ver(3.05).zip
Manual annotation guidelines: Specification for morphosyntactic tagging of UGC texts.pdf
Corpus files (UD tagset): corpus_petrov_tags.zip

Identifier
PID http://hdl.handle.net/11321/637
Related Identifier https://sentione.com/knowledge/eu-research-project
Metadata Access https://clarin-pl.eu/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin-pl.eu:11321/637
Provenance
Creator Oleksy, Marcin; Dominiak, Daria; Wróż, Anita; Kobylińska, Wioleta; Kałkus, Dagmara; Zielińska, Kamila; Fikus, Dominika; Walentynowicz, Wiktor
Publisher SentiOne
Publication Year 2019
Rights Creative Commons - Attribution 4.0 International (CC BY 4.0); https://creativecommons.org/licenses/by/4.0/; CC
OpenAccess true
Contact clarin-pl(at)pwr.edu.pl
Representation
Language Polish
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; application/pdf; downloadable_files_count: 5
Discipline Linguistics
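
The Metadata Access URL in this record has the shape of an OAI-PMH GetRecord request: an endpoint plus the verb, metadataPrefix and identifier query parameters. As a minimal sketch, assuming only the values printed in the record, the same URL could be assembled in Python as shown below. The helper name build_get_record_url is hypothetical and is not part of any CLARIN tooling.

```python
from urllib.parse import urlencode

# Values copied from the "Metadata Access" line of this record.
OAI_ENDPOINT = "https://clarin-pl.eu/oai/request"
RECORD_ID = "oai:clarin-pl.eu:11321/637"


def build_get_record_url(endpoint, identifier, metadata_prefix="oai_dc"):
    """Assemble a GetRecord request URL (hypothetical helper, for illustration)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": identifier,
        },
        # Keep ':' and '/' unescaped so the result matches the URL in the record.
        safe=":/",
    )
    return f"{endpoint}?{query}"


url = build_get_record_url(OAI_ENDPOINT, RECORD_ID)
print(url)
```

The resulting string can be passed to any HTTP client to retrieve the Dublin Core (oai_dc) metadata for this corpus. The response format is whatever the repository returns and is not shown here.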