PatTR: Patent Translation Resource

PatTR is a sentence-parallel corpus extracted from the MAREC patent collection. The current version contains more than 22 million German-English and 18 million French-English parallel sentences collected from all patent text sections, as well as 5 million German-French sentence pairs from patent titles, abstracts and claims. The corpus is organized by language pair and by patent text section, namely title, abstract, claims and description.

Parallel data from the title, abstract and claims sections were extracted from documents belonging to the European Patent Office (EPO) and World Intellectual Property Organization (WIPO) corpora in MAREC. Both corpora contain multilingual documents, for example a document with both an English and a German abstract. Since there are no multilingual descriptions, data from this section were collected by exploiting patent families to align German and French documents from the EPO corpus with English documents from the United States Patent and Trademark Office (USPTO) corpus, following Utiyama, Masao and Isahara, Hitoshi: A Japanese-English patent parallel corpus. MT Summit XI (2007), 475-482. All sections were sentence-aligned using the Gargantua aligner. Preprocessing was done automatically, and sentence boundaries were detected using the Europarl processing tools. For a detailed description of the corpus construction process, please see the publications above.
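The family-based document alignment described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record fields (`doc_id`, `lang`, `family_id`), the function name `pair_by_family` and the sample identifiers are all hypothetical, chosen only to show the idea of pairing an EPO document in German or French with a USPTO document in English when both belong to the same patent family. Sentence alignment of the paired documents (done with Gargantua for PatTR) is a separate step and is not shown.

```python
from collections import defaultdict


def pair_by_family(documents, source_lang, target_lang="en"):
    """Pair documents that share a patent family ID.

    `documents` is an iterable of dicts with the hypothetical keys
    'doc_id', 'lang' and 'family_id'. Returns a list of
    (source_doc_id, target_doc_id) tuples, ordered by family ID.
    """
    # family_id -> language -> list of document IDs
    by_family = defaultdict(lambda: defaultdict(list))
    for doc in documents:
        by_family[doc["family_id"]][doc["lang"]].append(doc["doc_id"])

    pairs = []
    for family_id in sorted(by_family):
        langs = by_family[family_id]
        # Families lacking either language yield no pairs.
        for src in langs.get(source_lang, []):
            for tgt in langs.get(target_lang, []):
                pairs.append((src, tgt))
    return pairs


# Hypothetical records, for illustration only.
docs = [
    {"doc_id": "EP-0001", "lang": "de", "family_id": "F1"},
    {"doc_id": "US-0001", "lang": "en", "family_id": "F1"},
    {"doc_id": "EP-0002", "lang": "fr", "family_id": "F2"},
    {"doc_id": "US-0002", "lang": "en", "family_id": "F2"},
    {"doc_id": "EP-0003", "lang": "de", "family_id": "F3"},  # no English family member
]

german_english = pair_by_family(docs, "de")
french_english = pair_by_family(docs, "fr")
```

In this toy input, the German document in family `F3` is dropped because its family has no English member, which is why only description text with a cross-office family link can contribute parallel data.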

Identifier
DOI https://doi.org/10.11588/data/10002
Metadata Access https://heidata.uni-heidelberg.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.11588/data/10002
Provenance
Creator Wäschle, Katharina; Riezler, Stefan
Publisher heiDATA
Contributor Prof. Dr. Stefan Riezler; Wäschle, Katharina
Publication Year 2014
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Prof. Dr. Stefan Riezler (Department of Computational Linguistics)
Representation
Resource Type Dataset
Format application/x-gzip; text/plain
Size 245640661; 1348412107; 1403883854; 702207465; 1113706581; 658863580; 676894791; 4505
Version 3.1
Discipline Other
Spatial Coverage Heidelberg, Germany
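
The Metadata Access link above is an OAI-PMH `GetRecord` request. As a small sketch of how that URL is composed, the Python snippet below rebuilds it from the DOI using only the standard library. The helper name `build_getrecord_url` is hypothetical, and the snippet only constructs and parses the URL; it does not send a request or assume anything about the response.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the Metadata Access link in this record.
BASE_URL = "https://heidata.uni-heidelberg.de/oai"


def build_getrecord_url(doi, metadata_prefix="oai_datacite", base_url=BASE_URL):
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the URL matches the form shown in this record.
    return f"{base_url}?{urlencode(params, safe=':/')}"


url = build_getrecord_url("10.11588/data/10002")

# Split the URL back into its OAI-PMH parameters.
query = parse_qs(urlsplit(url).query)
```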