The Mapping Manuscript Migrations (MMM) project was funded from 2017 to 2020 by the Digging into Data Challenge of the Trans-Atlantic Platform. The project partners were the University of Oxford, the University of Pennsylvania, Aalto University, and the Institut de recherche et d'histoire des textes. The project's goal was to bring together data from different sources relating to the history and provenance of medieval and Renaissance manuscripts, enabling large-scale browsing and searching through a semantic Web portal as well as by direct access to the data. Three separate datasets, covering more than 200,000 manuscripts, were combined into a unified knowledge graph using Linked Open Data technologies. The resulting knowledge graph uses a unified data model based on the CIDOC-CRM and FRBRoo ontologies and contains more than 20 million RDF triples. Overlapping vocabularies for persons, places, and organizations in the source datasets were reconciled against identifiers from VIAF, GeoNames, and the Getty Thesaurus of Geographic Names. Works and manuscripts were reconciled by semi-automatic matching techniques based on string similarities. The three source datasets were: (1) the Schoenberg Database of Manuscripts from the Schoenberg Institute for Manuscript Studies, University of Pennsylvania; (2) the Bibale database from the Institut de recherche et d'histoire des textes (IRHT-CNRS, Paris); and (3) the Medieval Manuscripts in Oxford Libraries catalogue from the Bodleian Libraries, University of Oxford. To test and demonstrate its usefulness, the MMM Knowledge Graph is used in the MMM Semantic Portal. Based on the Sampo-UI software developed at Aalto University, the portal enables browsing, searching, and filtering across the project's triple store, together with map-based visualizations of the results.
Hundreds of thousands of European pre-modern manuscripts have survived until the present day. As a result of changes in their ownership over the centuries, they are now spread all over the world. Collectively they constitute a great cultural and scholarly treasure. There are many sources of data relating to them, and new sources continue to proliferate in the digital environment. This project will link disparate datasets from Europe and North America to provide an international view of the history and provenance of these manuscripts. The aggregated data will enable researchers to analyse and visualize these topics at scales ranging from individual manuscripts to thousands of manuscripts. We will be able to show how these manuscripts have travelled across time and space to their current locations, where they continue to find new audiences. The project will also be of particular relevance and value to libraries and other collecting institutions. The results of its analyses will situate their manuscript collections in the broader historical context of patterns and trends in collecting, while its methodology and its body of data will provide an important resource for further aggregation and exploration in the future. The data linkage techniques and visualization methodologies deployed by the project will also be applicable to other kinds of cultural heritage objects and collections, not only manuscripts.
The Mapping Manuscript Migrations (MMM) project transformed three separate datasets into a unified knowledge graph: the Schoenberg Database of Manuscripts (relational database); Bibale (relational database); and Medieval Manuscripts in Oxford Libraries (XML documents in Text Encoding Initiative format). Each source dataset was transformed into RDF (Resource Description Framework) triples and mapped to the MMM Data Model, which combined elements from the CIDOC-CRM and FRBRoo ontologies. Overlapping vocabularies were reconciled using two methods: (1) automatic reconciliation using references to external authoritative Linked Open Data identifiers, and (2) semi-automatic reconciliation using expert review of possible matches identified by string similarity. The combined data were then loaded into a public triple store and made available through a SPARQL endpoint and a semantic portal interface using the Sampo-UI software.
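The semi-automatic reconciliation step can be illustrated with a minimal sketch. It is not the project's actual implementation: the similarity measure (Python's standard-library difflib), the 0.85 threshold, the function names, and the sample titles are all assumptions chosen for illustration. The sketch only shows the general idea of proposing candidate pairs by string similarity and leaving the final decision to an expert reviewer.

```python
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not affect the score."""
    return " ".join(label.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_matches(source_a, source_b, threshold=0.85):
    """Pair up labels from two sources whose similarity meets the threshold.

    Returns (label_a, label_b, score) tuples sorted by descending score.
    The result is a review queue for a human expert, not a list of final matches.
    The threshold is an assumed value, not one taken from the project.
    """
    pairs = []
    for a in source_a:
        for b in source_b:
            score = similarity(a, b)
            if score >= threshold:
                pairs.append((a, b, round(score, 3)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical work titles, invented for illustration only.
titles_a = ["De civitate Dei", "Historia scholastica"]
titles_b = ["De Civitate  Dei", "Historia scolastica", "Legenda aurea"]

for a, b, score in candidate_matches(titles_a, titles_b):
    print(f"{score:.3f}  {a!r} ~ {b!r}")
```

Comparing every label against every other label is quadratic in the number of labels, so a dataset of realistic size would likely need a blocking or indexing step before scoring. That refinement is omitted here to keep the sketch short.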