MULTEXT-East "1984" document corpus 4.0


The novel "1984" by George Orwell is the central component of the MULTEXT-East corpus. This parallel, sentence-aligned corpus contains the novel in the English original (about 100,000 words in length) and its translations into a number of languages.

This version of the corpus contains only the structurally annotated texts, which mark up elements such as paragraphs, footnotes, and highlighted text. In terms of linguistic annotation, the texts contain markup for names and sentences.

The linguistically annotated texts are a separate submission, which also covers a somewhat different set of languages.

Creator Erjavec, Tomaž; Bruda, Ştefan; Dimitrova, Ludmila; Ide, Nancy; Kaalep, Heiki-Jaan; Krstev, Cvetana; Orav, Heili; Oravecz, Csaba; Paldre, Leho; Petkevič, Vladimír; Priest-Dorman, Greg; Simov, Kiril; Sinapova, Lydia; Sokolovsky, Paul; Sryvkin, Sergey; Tufiş, Dan; Utka, Andrius; Villandi, Viire; Vitas, Duško; Vuković, Olga
Publisher Jožef Stefan Institute
Publication Year 2010
Funding Reference info:eu-repo/grantAgreement/EC/FP7/211938
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); PUB
OpenAccess true
Contact info(at)
Language Bulgarian; Czech; English; Estonian; Hungarian; Lithuanian; Romanian; Moldavian; Moldovan; Russian; Slovenian; Slovene; Serbian
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics