Digital library and corpus of historical Slovene IMP 1.1


The IMP digital library contains historical Slovene books and other publications: 658 texts in total, with over 45,000 pages, from the period 1584-1919. Each text comes with extensive metadata, per-page links to facsimiles, and hand-corrected transcriptions with structural and editorial annotations.

These texts were also annotated for use as a language corpus. In the corpus, each word is marked up with its modernised form, lemma, and morphosyntactic description (a fine-grained PoS tag). Note that these annotations were assigned automatically, so they contain a fair number of errors.

The digital library is available in the source TEI P5 XML and in derived HTML. The corpus is available in the source TEI P5 XML and in the simpler and smaller vertical format used by concordancers such as CWB and Sketch Engine. Note that the vertical format does not contain all the information present in the source TEI.
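As a rough illustration of how the vertical format can be consumed outside a concordancer, here is a minimal Python sketch that groups token lines into sentences. It rests on assumptions this record does not confirm: that each token line holds four tab-separated columns in the order word form, modernised form, lemma, MSD, and that sentences are delimited by `<s>` and `</s>` lines. The class `Token` and function `parse_vertical` are hypothetical names; check the corpus documentation for the actual column order and structural tags.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    """One token line. ASSUMPTION: this column order is illustrative only."""
    word: str   # word as spelled in the historical text
    norm: str   # modernised form
    lemma: str  # lemma
    msd: str    # morphosyntactic description (fine-grained PoS tag)


def parse_vertical(lines: Iterable[str]) -> List[List[Token]]:
    """Group token lines into sentences delimited by <s> ... </s> lines.

    Lines that look like structural markup (<doc ...>, <p>, <s>, <g/>, ...)
    are skipped; only the closing </s> is used to end a sentence.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            # Structural markup occupies a line of its own in vertical files.
            if line == "</s>" and current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 tab-separated columns: {line!r}")
        # Extra columns, if any, are ignored by this sketch.
        current.append(Token(*fields[:4]))
    if current:  # trailing tokens not closed by </s>
        sentences.append(current)
    return sentences
```

For example, with an invented two-token fragment (not taken from the corpus), `parse_vertical(["<s>", "Bog\tbog\tbog\tNcmsn", "je\tje\tbiti\tVa-r3s-n", "</s>"])` returns one sentence whose second token has the lemma `biti`. Because the parser only needs an iterable of strings, an open file handle can be passed directly.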

Creator Erjavec, Tomaž
Publisher Jožef Stefan Institute
Publication Year 2014
Funding Reference info:eu-repo/grantAgreement/EC/FP7/215064
Rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0); PUB
OpenAccess true
Contact info(at)
Language Slovenian; Slovene
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 4
Discipline Linguistics