Texts in the Environment domain come from Hovedland, Danske Miljøundersøgelser, Det Økologiske Råd and Aktuel Naturvidenskab (via DMI).
The corpus consists of 1,478,298 words in 93 files.
Communicative setting (number of files): expert->expert (2), expert->advanced (23), expert->basic (68).
All texts are in XML TEI P5 format (TEIP5DKCLARIN-format), with tokenisation, POS tagging, sentence and paragraph segmentation, lemmatisation and termhood annotation placed in separate text-external span groups.
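As a rough illustration of how such stand-off annotation layers might be read, the following Python sketch parses a tiny, hypothetical TEI fragment and pairs tokens with their lemmas. The sample markup is an assumption made for this example: the `type` values, the use of `xml:id` and `target`, and the placement of the `spanGrp` elements in the same string as the text are not taken from the corpus files, whose actual TEIP5DKCLARIN layout may differ. Only the TEI namespace and the `spanGrp`/`span` element names come from TEI P5 itself.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace, plus the expanded name ElementTree uses for xml:id.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature document. The attribute names and layer types
# below are assumptions for illustration, not the corpus's real layout.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>Klimaet bliver varmere</p></body></text>
  <spanGrp type="token">
    <span xml:id="t1">Klimaet</span>
    <span xml:id="t2">bliver</span>
    <span xml:id="t3">varmere</span>
  </spanGrp>
  <spanGrp type="lemma">
    <span target="#t1">klima</span>
    <span target="#t2">blive</span>
    <span target="#t3">varm</span>
  </spanGrp>
</TEI>"""


def read_layers(xml_text):
    """Return {layer type: {token id: span text}} for every spanGrp found."""
    root = ET.fromstring(xml_text)
    layers = {}
    for group in root.iterfind(".//tei:spanGrp", TEI):
        layer = layers.setdefault(group.get("type", "unknown"), {})
        for span in group.iterfind("tei:span", TEI):
            # A span either defines an id (token layer) or points at one.
            key = span.get(XML_ID) or span.get("target", "").lstrip("#")
            layer[key] = (span.text or "").strip()
    return layers


layers = read_layers(SAMPLE)
# Join the token layer with the lemma layer on the shared token id.
pairs = [(tok, layers["lemma"].get(tid)) for tid, tok in layers["token"].items()]
print(pairs)
```

Keying each layer by token id keeps the layers independent, so a further layer (for example POS tags or termhood) could be joined in the same way, provided it refers to the same ids.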
"DK-CLARIN LSP Corpus - Environment domain" is part of the Danish DK-CLARIN LSP corpus, which consists of seven sub-corpora from the following subject domains: Agriculture, Construction, Economics, Environment, Health, IT and Nanotechnology.