DK-CLARIN LSP Corpus - Health domain

The texts in the Health and Medicine domain come from netpatient.dk, Søfartsstyrelsen, Sundhedsstyrelsen, regionH, Libris and Aktuel Naturvidenskab, and were collected in the DK-CLARIN project, WP2.2, 2008-2011. The corpus consists of 3,972,573 words in 3,273 files. Communicative setting/number of files: expert->expert (27), expert->advanced (40), expert->basic (3,206). All texts are in XML TEI P5 format (TEIP5DKCLARIN-format), with tokenisation, sentence and paragraph segmentation, POS tagging, lemmatisation and termhood annotation placed in separate text-external span groups. "DK-CLARIN LSP Corpus - Health and Medicine domain" is part of the Danish DK-CLARIN LSP corpus, which consists of seven sub-corpora from the following subject domains: Agriculture, Construction, Economics, Environment, Health, IT and Nanotechnology.

Identifier
PID http://hdl.handle.net/20.500.12115/14
Metadata Access http://repository.clarin.dk/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:repository.clarin.dk:20.500.12115/14
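The Metadata Access link above is an OAI-PMH GetRecord request: a base URL followed by the verb, the metadata prefix, and an identifier built from the handle. A minimal sketch of how that URL is assembled, assuming the same base URL and identifier scheme hold for other records in the repository (the function name build_getrecord_url is hypothetical, chosen for illustration):

```python
from urllib.parse import urlencode

# Base URL and handle copied from this record.
BASE_URL = "http://repository.clarin.dk/repository/oai/request"
HANDLE = "20.500.12115/14"


def build_getrecord_url(handle, metadata_prefix="oai_dc"):
    """Assemble an OAI-PMH GetRecord URL for a handle (illustrative helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        # Assumption: identifiers follow the oai:repository.clarin.dk:<handle> pattern.
        "identifier": f"oai:repository.clarin.dk:{handle}",
    }
    # safe=":/" leaves the colons and slashes of the identifier unescaped,
    # matching the form of the URL shown in the record.
    return f"{BASE_URL}?{urlencode(params, safe=':/')}"


url = build_getrecord_url(HANDLE)
print(url)
```

Fetching the resulting URL should return the Dublin Core (oai_dc) description of the record as XML; the sketch only builds the URL and makes no network request.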
Provenance
Creator Olsen, Sussi; Braasch, Anna; Halskov, Jakob; Hansen, Dorte Haltrup
Publisher Centre for Language Technology, NorS, University of Copenhagen; The Danish Language Council
Publication Year 2011
Rights CLARIN-ACA-NC; https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&NORED=1; ACA
OpenAccess true
Contact info(at)clarin.dk
Representation
Language Danish
Resource Type corpus
Format text/plain; charset=utf-8; application/zip; text/plain; text/xml; application/pdf; downloadable_files_count: 15
Discipline Linguistics