This dataset contains the source code and data used in the PhD thesis "Linguistically-Inspired Neural Coherence Modeling". It is organized into five repositories, one per experimental chapter:
StruSim: Source code to run experiments for Chapter 4 "Document Structure Similarity-Enhanced Coherence Modeling".
ConnRel: Source code to run experiments for Chapter 5 "Annotation-inspired Implicit Discourse Relation Classification".
Exp2Imp: Source code to run experiments for Chapter 6 "Explicit to Implicit Discourse Relation Classification".
RelCoh: Source code to run experiments for Chapter 7 "Discourse Relation-Enhanced Coherence Modeling".
EntyRelCoh: Source code to run experiments for Chapter 8 "Coherence Modeling Using Entities and Discourse Relations".
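The data used in the experiments can be obtained from the sources below. PDTB 2.0, PDTB 3.0, and the TOEFL dataset are distributed by the Linguistic Data Consortium (https://www.ldc.upenn.edu/); GCDC and CoheSentia are available on GitHub: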
PDTB 2.0: https://catalog.ldc.upenn.edu/LDC2008T05
PDTB 3.0: https://catalog.ldc.upenn.edu/LDC2019T05
TOEFL Dataset: https://catalog.ldc.upenn.edu/LDC2014T06
GCDC: https://github.com/aylai/GCDC-corpus
CoheSentia: https://github.com/AviyaMn/CoheSentia