Source code and data for the PhD Thesis "Linguistically-Inspired Neural Coherence Modeling"

This dataset contains source code and data used in the PhD thesis "Linguistically-Inspired Neural Coherence Modeling". The dataset is split into five repositories:

StruSim: Source code to run experiments for Chapter 4 "Document Structure Similarity-Enhanced Coherence Modeling".

ConnRel: Source code to run experiments for Chapter 5 "Annotation-inspired Implicit Discourse Relation Classification".

Exp2Imp: Source code to run experiments for Chapter 6 "Explicit to Implicit Discourse Relation Classification".

RelCoh: Source code to run experiments for Chapter 7 "Discourse Relation-Enhanced Coherence Modeling".

EntyRelCoh: Source code to run experiments for Chapter 8 "Coherence Modeling Using Entities and Discourse Relations".

The corpora used in the experiments can be obtained from the Linguistic Data Consortium (https://www.ldc.upenn.edu/) or, for GCDC and CoheSentia, from GitHub:

PDTB 2.0: https://catalog.ldc.upenn.edu/LDC2008T05

PDTB 3.0: https://catalog.ldc.upenn.edu/LDC2019T05

TOEFL Dataset: https://catalog.ldc.upenn.edu/LDC2014T06

GCDC: https://github.com/aylai/GCDC-corpus

CoheSentia: https://github.com/AviyaMn/CoheSentia

Identifier
DOI	https://doi.org/10.11588/DATA/ZBNUCG
Metadata Access	https://heidata.uni-heidelberg.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.11588/DATA/ZBNUCG
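
The Metadata Access link above is an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from the DOI as shown below. The function name oai_getrecord_url is hypothetical and chosen only for illustration; the endpoint, verb, metadata prefix, and identifier are taken from the link itself. The sketch only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Endpoint and DOI copied from the Identifier fields of this record.
OAI_ENDPOINT = "https://heidata.uni-heidelberg.de/oai"
DOI = "10.11588/DATA/ZBNUCG"


def oai_getrecord_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the result matches the link as listed.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


url = oai_getrecord_url(DOI)
print(url)
```

The response format is determined by the metadataPrefix parameter; oai_datacite is the only prefix named in this record, so other prefixes would need to be checked against the repository.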

Provenance
Creator	Liu, Wei
Publisher	heiDATA
Contributor	Liu, Wei; heiDATA: Heidelberg Research Data Repository
Publication Year	2025
Funding Reference	Klaus Tschira Foundation
Rights	info:eu-repo/semantics/openAccess
OpenAccess	true
Contact	Liu, Wei (Heidelberg University, Heidelberg Institute for Theoretical Studies (HITS))

Representation
Resource Type	Dataset
Format	application/zip; text/plain
Size	41966; 889; 356954; 573748; 51456; 41290
Version	1.0
Discipline	Humanities; Linguistics