- Ground Truth Model for Pracalit for Sanskrit and Newar MSS 16th to 19th C.
  Ground truth data for an OCR model. Will be continually updated. Originally trained on Transkribus with a PyLaia model created from ground truth data based on transcripts into...
- Ground Truth data for printed Malayalam
  Ground Truth (GT) data (JPG, PAGE and ALTO XML files) which can be used to train OCR models that recognize printed text in Malayalam script. The training material is gathered...
- Ground Truth data for printed Devanagari
  Ground truth (GT) data (JPG and ALTO XML files) for an OCR model that recognizes printed text in Devanagari script. The GT data was trained on Transkribus with the HTR+ engine....
- Ground Truth transcriptions for training OCR of historical Bengali printed te...
  This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print...