The aligned corpus consists of press releases from the European Commission Press Relase Database (Rapid) harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm).
The corpus comprises 5330 + 2200 press releases (files) for each language Danish, English and German with app. 5,000,000 words per language and 260,000 - 270,000 aligned sentences for the language pair Danish - English and Danish - German.
All documents are processed with Uplug (https://bitbucket.org/tiedemann/uplug/wiki/Home) and aligned with HunAlign.
Files with more than 10 % negative alignments have been removed and so has all 0-alignmants.
The documents are in txt-format for each language and in tmx-format for the aligned language pairs (da-en and da-de).