MULTEXT-East "1984" document corpus 4.0

The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the English original of the novel (about 100,000 words) and its translations into a number of languages.

This version of the corpus contains only structurally annotated texts, with elements such as paragraphs, footnotes, and highlighted text. As for linguistic annotation, only names and sentences are marked up.

The linguistically annotated texts are available as a separate submission (http://hdl.handle.net/11356/1043), which covers a somewhat different set of languages.

Identifier
PID http://hdl.handle.net/11356/1044
Related Identifier https://doi.org/10.1007/s10579-011-9174-8
Related Identifier http://nl.ijs.si/ME/Vault/V4/
Metadata Access http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/1044
Provenance
Creator Erjavec, Tomaž; Bruda, Ştefan; Dimitrova, Ludmila; Ide, Nancy; Kaalep, Heiki-Jaan; Krstev, Cvetana; Orav, Heili; Oravecz, Csaba; Paldre, Leho; Petkevič, Vladimír; Priest-Dorman, Greg; Simov, Kiril; Sinapova, Lydia; Sokolovsky, Paul; Sryvkin, Sergey; Tufiş, Dan; Utka, Andrius; Villandi, Viire; Vitas, Duško; Vuković, Olga
Publisher Jožef Stefan Institute
Publication Year 2010
Funding Reference info:eu-repo/grantAgreement/EC/FP7/211938
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); https://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact info(at)clarin.si
Representation
Language Bulgarian; Czech; English; Estonian; Hungarian; Lithuanian; Romanian; Moldavian; Moldovan; Russian; Slovenian; Slovene; Serbian
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics