GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation

Paper: "GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation" (COLING 2020) by Zhijing Jin, Qipeng Guo, Xipeng Qiu, and Zheng Zhang. (https://aclanthology.org/2020.coling-main.217/)

Abstract: Data collection for the knowledge graph-to-text generation is expensive. As a result, research on unsupervised models has emerged as an active field recently. However, most unsupervised models have to use non-parallel versions of existing small supervised datasets, which largely constrain their potential. In this paper, we propose a large-scale, general-domain dataset, GenWiki. Our unsupervised dataset has 1.3M text and graph examples, respectively. With a human-annotated test set, we provide this new benchmark dataset for future research on unsupervised text generation from knowledge graphs.

Identifier
DOI https://doi.org/10.17617/3.YGO7EW
Metadata Access https://edmond.mpg.de/api/datasets/export?exporter=dataverse_json&persistentId=doi:10.17617/3.YGO7EW
Provenance
Creator Jin, Zhijing
Publisher Edmond
Publication Year 2024
OpenAccess true
Contact zhij.jin(at)gmail.com
Representation
Language English
Resource Type Dataset
Version 1
Discipline Other
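
The Metadata Access link listed under Identifier can also be queried programmatically. The following is a minimal Python sketch, not an official client: it rebuilds that export URL from the DOI and the `dataverse_json` exporter name shown in the link, and fetches it with the standard library. The helper names `build_export_url` and `fetch_metadata` are hypothetical, and the assumption that the endpoint returns a JSON document is inferred from the exporter name only; the structure of the response is not described in this record.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and identifier copied from the "Metadata Access" entry of this record.
EXPORT_ENDPOINT = "https://edmond.mpg.de/api/datasets/export"
PERSISTENT_ID = "doi:10.17617/3.YGO7EW"


def build_export_url(persistent_id: str = PERSISTENT_ID,
                     exporter: str = "dataverse_json") -> str:
    """Rebuild the metadata export URL (hypothetical helper, for illustration)."""
    # Keep ':' and '/' unescaped so the DOI appears exactly as in the record.
    query = urlencode(
        {"exporter": exporter, "persistentId": persistent_id},
        safe=":/",
    )
    return f"{EXPORT_ENDPOINT}?{query}"


def fetch_metadata(url: str, timeout: float = 30.0) -> dict:
    """Download and parse the export (assumes the response body is JSON)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_metadata(build_export_url())` requires network access. Because the response layout is not documented here, it is safer to inspect the top-level keys of the returned dictionary before relying on any particular field.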