This deposit contains Historical Postcards Dataset (COCO) — v1.0 (2025), a Common Objects in COntext (COCO) format dataset of historical postcard images and structured annotations intended for text and postal markings detection. Printed text detections include transcriptions (manual and OCR), text orientation, and OCR confidence scores — suitable for detection and historical OCR benchmarking. Transcriptions of postal markings, handwritten texts, and scene texts will be added in future versions.
Because the annotations follow the COCO format, they can be read with the COCO API (pycocotools in Python).
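As a rough illustration of what reading a COCO annotation file involves, the sketch below indexes a tiny in-memory sample using only the Python standard library. The top-level keys (`images`, `annotations`, `categories`) and the `[x, y, width, height]` box layout are standard COCO. The file name, the category names, and the `attributes` payload are hypothetical placeholders; the actual field names used by this dataset are documented in the README.md file.

```python
import json
from collections import defaultdict

# Tiny stand-in for one annotation file. Only the standard COCO keys are
# reliable here; the category names and the "attributes" payload are
# hypothetical placeholders, not the dataset's real schema.
SAMPLE_COCO = """
{
  "images": [{"id": 1, "file_name": "postcard_0001.jpg", "width": 1400, "height": 900}],
  "categories": [{"id": 1, "name": "printed_text"}, {"id": 2, "name": "postmark"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 50, 300, 40],
     "area": 12000, "iscrowd": 0,
     "attributes": {"transcription": "NANCY", "ocr_confidence": 0.91}},
    {"id": 11, "image_id": 1, "category_id": 2, "bbox": [900, 60, 180, 180],
     "area": 32400, "iscrowd": 0}
  ]
}
"""


def index_coco(coco):
    """Map category ids to names and group annotations by image id."""
    category_names = {c["id"]: c["name"] for c in coco["categories"]}
    by_image = defaultdict(list)
    for ann in coco["annotations"]:
        by_image[ann["image_id"]].append(ann)
    return category_names, dict(by_image)


coco = json.loads(SAMPLE_COCO)
category_names, anns_by_image = index_coco(coco)

# Print one line per annotated region of each image.
for image in coco["images"]:
    for ann in anns_by_image.get(image["id"], []):
        x, y, w, h = ann["bbox"]
        label = category_names[ann["category_id"]]
        print(f'{image["file_name"]}: {label} at x={x}, y={y}, w={w}, h={h}')
```

With pycocotools installed, `COCO(annotation_path)` together with `getImgIds()`, `getAnnIds(imgIds=...)`, and `loadAnns(...)` provides the same kind of indexing directly on the extracted annotation files.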
Contents & purpose
The deposit contains 4,293 postcard images (Image’Est archive, Grand Est region, France; period ca. 1899–1930) with COCO annotations for text and postal-marking detection. The dataset is intended for training and evaluating text detection, OCR pipelines, and postal-marking recognition in cultural-heritage research.
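Evaluating a detector against COCO annotations usually rests on the intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below computes it for boxes in the COCO `[x, y, width, height]` layout. The two example boxes are made-up values, and this is not necessarily the evaluation protocol used in the related paper.

```python
def bbox_iou(box_a, box_b):
    """Intersection over union of two COCO boxes [x, y, width, height]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Made-up boxes: a ground-truth text line and a slightly shifted prediction.
ground_truth = [100, 50, 300, 40]
prediction = [110, 55, 300, 40]
print(round(bbox_iou(ground_truth, prediction), 3))
```

A detection is commonly counted as correct when its IoU with a ground-truth box exceeds a chosen threshold such as 0.5; the COCO evaluation in pycocotools averages results over several thresholds.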
This dataset is presented at the 7th workshop on analySis, Understanding and proMotion of heritAge Contents (7th SUMAC @ ACM Multimedia 2025), on 27 October 2025 in Dublin, Ireland. The related paper describes the dataset and methodological details.
Provided archives
Historical_Postcards_Dataset_v1-Train2025.zip — set used in 5-fold cross-validation
annotations-Historical_Postcards_Dataset_v1-Train2025.zip — annotations only
Historical_Postcards_Dataset_v1-Test2025.zip — test set used in the conference article
annotations-Historical_Postcards_Dataset_v1-Test2025.zip — annotations only
Historical_Postcards_Dataset_v1-Synth2025.zip — set with synthetic annotations
annotations-Historical_Postcards_Dataset_v1-Synth2025.zip — annotations only
This subdivision of the archives facilitates import into CVAT.
More information is available in the README.md file.
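The training archive is described above as the set used in 5-fold cross-validation. The sketch below shows one simple way to build such folds at image level from a list of image ids. The ids, the seed, and the shuffling strategy are assumptions for illustration; the fold assignment actually used by the authors is not given here.

```python
import random


def k_fold_split(image_ids, k=5, seed=0):
    """Shuffle image ids deterministically and deal them into k folds."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def train_val_pairs(folds):
    """Yield (train_ids, val_ids), holding out one fold at a time."""
    for held_out, val_ids in enumerate(folds):
        train_ids = [
            image_id
            for j, fold in enumerate(folds)
            if j != held_out
            for image_id in fold
        ]
        yield train_ids, val_ids


# Placeholder ids; in practice these would come from the "images" list
# of the training annotation file.
image_ids = list(range(1, 24))
folds = k_fold_split(image_ids, k=5, seed=0)
for n, (train_ids, val_ids) in enumerate(train_val_pairs(folds)):
    print(f"fold {n}: {len(train_ids)} train / {len(val_ids)} val")
```

Splitting by image id rather than by annotation keeps all regions of one postcard in the same fold, which avoids leakage between training and validation data.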
File structure & Annotation schema
See the README.md file for more details.
Acknowledgments
We thank Image'Est for making the historical postcard data available and the Grand Est Region, France, which supported this work.
Software versions
Python, 3.12.8
numpy, 2.1.3
pandas, 2.2.3
pillow, 11.2.1
torch, 2.7.0
ultralytics, 8.3.140
pytesseract, 0.3.13
easyocr, 1.7.2
pycocotools, 2.0.10
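To reproduce a comparable environment, the package versions listed above can be turned into pip-style pins. The sketch below parses the "name, version" pairs into `name==version` strings. Python itself is left out because it is not installed through pip, and the helper name `to_requirements` is hypothetical. Whether the authors distribute an environment file of their own is not stated here.

```python
# Package versions copied from the list above (Python itself is omitted).
VERSIONS = """\
numpy, 2.1.3
pandas, 2.2.3
pillow, 11.2.1
torch, 2.7.0
ultralytics, 8.3.140
pytesseract, 0.3.13
easyocr, 1.7.2
pycocotools, 2.0.10
"""


def to_requirements(text):
    """Convert 'name, version' lines into pip-style 'name==version' pins."""
    pins = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, version = (part.strip() for part in line.split(",", 1))
        pins.append(f"{name}=={version}")
    return pins


requirements = to_requirements(VERSIONS)
print("\n".join(requirements))
```

Saving the printed lines to a requirements.txt file and running `pip install -r requirements.txt` in a Python 3.12 environment would install the pinned packages; note that pytesseract additionally needs the Tesseract OCR engine installed on the system.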