KrdWrd CANOLA Corpus 1.0

The CANOLA Corpus is a visually annotated English web corpus for training classification engines to remove boilerplate from unseen web pages. It was harvested, annotated and evaluated with the tools and infrastructure of the KrdWrd Project.

Identifier
PID http://hdl.handle.net/20.500.12124/8
Related Identifier https://github.com/krdwrd/data/releases/tag/v1.0
Related Identifier https://www.sigwac.org.uk/raw-attachment/wiki/WAC5/WAC5_proceedings.pdf
Related Identifier http://hdl.handle.net/20.500.12124/9
Related Identifier https://krdwrd.github.io
Metadata Access http://clarin.eurac.edu/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:clarin.eurac.edu:20.500.12124/8
Provenance
Creator Stemle, Egon W.; Steger, Johannes M.
Publisher Institute for Applied Linguistics, Eurac Research
Publication Year 2010
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); https://creativecommons.org/licenses/by-sa/4.0/; PUB
OpenAccess true
Contact clarin(at)eurac.edu
Representation
Language English
Resource Type corpus
Format application/gzip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics
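
The Metadata Access URL in this record follows the OAI-PMH GetRecord pattern (a verb, a metadata prefix, and a record identifier). As a minimal sketch, the URL could be rebuilt from the handle as shown here. The helper name build_getrecord_url is hypothetical, and the assumption that other records in the same repository use the same "oai:clarin.eurac.edu:" identifier prefix is inferred only from the single URL above. No network request is made.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Endpoint and handle are copied from the record above.
OAI_ENDPOINT = "http://clarin.eurac.edu/repository/oai/request"
HANDLE = "20.500.12124/8"


def build_getrecord_url(handle: str, metadata_prefix: str = "oai_dc") -> str:
    """Return an OAI-PMH GetRecord URL for a handle (hypothetical helper).

    Assumption: the identifier prefix "oai:clarin.eurac.edu:" seen in the
    record's Metadata Access URL applies to the given handle as well.
    """
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:clarin.eurac.edu:{handle}",
    }
    # Leave ':' and '/' unescaped so the result matches the record's URL.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = build_getrecord_url(HANDLE)
# Parse the query string back into its parts to inspect the request.
query = parse_qs(urlparse(url).query)
print(url)
print(query["verb"][0], query["metadataPrefix"][0], query["identifier"][0])
```

Passing the printed URL to any HTTP client should return the Dublin Core (oai_dc) description of this record, provided the repository endpoint is still reachable.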