DK-CLARIN LSP Corpus - Nanotechnology domain

Texts in the Nanotechnology domain come from iNano (Interdisciplinary Nanoscience Center, AU), Nano (DTU), Niels Bohr Institutet, Forskningscenter Risø, Ministeriet for Sundhed og Forebyggelse (via DTU), Miljøstyrelsen, and Aktuel Naturvidenskab. They were collected in the DK-CLARIN project, WP2.2, 2008-2011. The corpus consists of 358,144 words in 157 files.

Communicative setting (number of files): expert->advanced (13), expert->basic (144).

All texts are in TEI P5 XML format (TEIP5DKCLARIN format), with tokenisation, sentence and paragraph segmentation, POS tagging, lemmatisation, and termhood annotation placed in separate text-external span groups.

"DK-CLARIN LSP Corpus - Nanotechnology domain" is part of the Danish DK-CLARIN LSP corpus, which consists of seven sub-corpora from the following subject domains: Agriculture, Construction, Economics, Environment, Health, IT, and Nanotechnology.

Identifier
PID http://hdl.handle.net/20.500.12115/16
Metadata Access http://repository.clarin.dk/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:repository.clarin.dk:20.500.12115/16
Provenance
Creator Olsen, Sussi; Braasch, Anna; Halskov, Jakob; Hansen, Dorte Haltrup
Publisher Centre for Language Technology, NorS, University of Copenhagen; The Danish Language Council
Publication Year 2011
Rights CLARIN-ACA-NC; https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&NORED=1; ACA
OpenAccess true
Contact info(at)clarin.dk
Representation
Language Danish
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; text/plain; application/pdf; text/xml; downloadable_files_count: 12
Discipline Linguistics
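
The Metadata Access link above is an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from the handle part of the PID. The function name get_record_url is hypothetical, and the identifier pattern "oai:repository.clarin.dk:<handle>" is an assumption inferred from this single record, not a documented rule of the repository.

```python
from urllib.parse import urlencode

# Base URL taken from the Metadata Access field of this record.
OAI_ENDPOINT = "http://repository.clarin.dk/repository/oai/request"


def get_record_url(handle, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL for a handle such as '20.500.12115/16'.

    Assumption: the OAI identifier is 'oai:repository.clarin.dk:' + handle,
    as seen in this record.
    """
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"oai:repository.clarin.dk:{handle}",
    }
    # Keep ':' and '/' unescaped so the result matches the form shown in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = get_record_url("20.500.12115/16")
print(url)
```

The sketch only builds the request string; fetching and parsing the Dublin Core response is left out.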