- Scrambled text: training Language Models to correct OCR errors using syntheti...
This data repository contains the key datasets required to reproduce the paper "Scrambled text: training Language Models to correct OCR errors using synthetic data". In addition...
- NCSE v2.0: A Dataset of OCR-Processed 19th Century English Newspapers
NCSE v2.0 Dataset Repository. This repository contains the NCSE v2.0 dataset and associated supporting data used in the paper "Reading the unreadable: Creating a dataset of 19th...
