Source code and data for the PhD Thesis "Linguistically-Inspired Neural Coherence Modeling"


This dataset contains source code and data used in the PhD thesis "Linguistically-Inspired Neural Coherence Modeling". The dataset is split into five repositories:

StruSim: Source code to run experiments for Chapter 4 "Document Structure Similarity-Enhanced Coherence Modeling".

ConnRel: Source code to run experiments for Chapter 5 "Annotation-inspired Implicit Discourse Relation Classification".

Exp2Imp: Source code to run experiments for Chapter 6 "Explicit to Implicit Discourse Relation Classification".

RelCoh: Source code to run experiments for Chapter 7 "Discourse Relation-Enhanced Coherence Modeling".

EntyRelCoh: Source code to run experiments for Chapter 8 "Coherence Modeling Using Entities and Discourse Relations".

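The data used in the experiments can be obtained from the following sources. PDTB 2.0, PDTB 3.0 and the TOEFL dataset are distributed by the Linguistic Data Consortium (https://www.ldc.upenn.edu/); GCDC and CoheSentia are available on GitHub: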

PDTB 2.0: https://catalog.ldc.upenn.edu/LDC2008T05

PDTB 3.0: https://catalog.ldc.upenn.edu/LDC2019T05

TOEFL Dataset: https://catalog.ldc.upenn.edu/LDC2014T06

GCDC: https://github.com/aylai/GCDC-corpus

CoheSentia: https://github.com/AviyaMn/CoheSentia

Identifier
DOI https://doi.org/10.11588/DATA/ZBNUCG
Metadata Access https://heidata.uni-heidelberg.de/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.11588/DATA/ZBNUCG
Provenance
Creator Liu, Wei
Publisher heiDATA
Contributor Liu, Wei; heiDATA: Heidelberg Research Data Repository
Publication Year 2025
Funding Reference Klaus Tschira Foundation
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Liu, Wei (Heidelberg University, Heidelberg Institute for Theoretical Studies (HITS))
Representation
Resource Type Dataset
Format application/zip; text/plain
Size 41966; 889; 356954; 573748; 51456; 41290
Version 1.0
Discipline Humanities; Linguistics
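The "Metadata Access" entry points to an OAI-PMH GetRecord request. The snippet below is a minimal sketch that rebuilds that request URL from the DOI so the DataCite record can be retrieved programmatically. It assumes only the endpoint and parameters visible in the "Metadata Access" URL; the function names `build_getrecord_url` and `fetch_record` are hypothetical helpers, not part of the dataset.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base endpoint, taken from the "Metadata Access" URL of this record.
OAI_BASE = "https://heidata.uni-heidelberg.de/oai"


def build_getrecord_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Build an OAI-PMH GetRecord URL for a record identified by its DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",                 # OAI-PMH verb for fetching a single record
        "metadataPrefix": metadata_prefix,   # metadata format, here DataCite
        "identifier": f"doi:{doi}",          # record identifier in the form used by heiDATA
    }
    # safe=":/" keeps the "doi:10.11588/..." identifier unescaped, as in the original URL.
    return f"{OAI_BASE}?{urlencode(params, safe=':/')}"


def fetch_record(doi: str, timeout: float = 30.0) -> str:
    """Download the raw OAI-PMH XML response (hypothetical helper; requires network access)."""
    with urlopen(build_getrecord_url(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `build_getrecord_url("10.11588/DATA/ZBNUCG")` reproduces the "Metadata Access" URL listed above, and `fetch_record` with the same DOI would return the XML document served by that endpoint.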