Mapping onto EUDAT-B2FIND Metadata Schema

The offered metadata must be mapped to the B2FIND schema in a meaningful way. Currently this is done jointly, through iterative discussions between the data provider and the B2FIND team.

   Specification of Community Metadata
   Homogenisation and Semantic Mapping
   EUDAT-B2FIND Metadata Schema
    Concordance with other Standards
    The central facet Discipline

Specification of Community Metadata

The implementation of the mapping, as described in the following subsection, is based on a detailed specification and documentation of the community-specific metadata. For this, a spreadsheet must be filled out. The Excel template can be requested via the support form or by email, or downloaded from Google Drive as Community-B2FIND_template.xlsx.

The template is divided into several tabs:

  • General Information : Data providers should give information here about the contact persons and the community.
  • Metadata Specification : More detailed information about the specific metadata formats, schemas and structure used.
  • Harvesting : Specify here the 'harvesting endpoints' (e.g. OAI URLs), the protocols and APIs used and the subsets, if available.
  • Mapping : This table specifies the mapping of the community properties to the B2FIND schema and coverage information. It is iteratively discussed and developed with the data provider during the uptake process.
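
To illustrate the kind of information the Harvesting tab asks for, the following minimal sketch builds an OAI-PMH ListRecords request from an endpoint, a metadata prefix and an optional subset. The endpoint URL, the set name and the function name are hypothetical placeholders; only the request arguments `verb`, `metadataPrefix` and `set` come from the OAI-PMH protocol.

```python
from urllib.parse import urlencode


def build_list_records_url(endpoint, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a harvesting endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # 'set' restricts the harvest to one subset of the repository
        params["set"] = set_spec
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and subset, for illustration only
print(build_list_records_url("https://repository.example.org/oai", set_spec="climate"))
```

With these three values (endpoint, metadata format, subset) a harvester has what it needs to request the raw records.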

Homogenisation and Semantic Mapping

To transform the harvested 'raw' metadata records into datasets that can be uploaded to the B2FIND catalogue, indexed and displayed in the B2FIND portal, the following processing steps are carried out:
  1. Select entries from the XML records, based on XPath rules that depend on the community-specific metadata formats (see providing metadata).
  2. Parse the selected values and assign them to the keys specified in the XPath rules, i.e. to fields of the B2FIND schema.
  3. Store the resulting key-value pairs in JSON dictionaries.
  4. Check and validate these JSON records before the upload to the B2FIND repository.
This mapping procedure needs regular adaptation and extension as the requirements of the communities change.
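
The four steps above can be sketched as follows. This is a simplified illustration, not the B2FIND implementation: the Dublin Core sample record, the rule table `XPATH_RULES`, the set `LIST_FIELDS` and both function names are assumptions made for this example, and the check in step 4 covers only the mandatory Title field.

```python
import json
import xml.etree.ElementTree as ET

NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical community-specific rules: B2FIND key -> XPath expression
XPATH_RULES = {
    "Title": "dc:title",
    "Creator": "dc:creator",
    "Tags": "dc:subject",
    "Source": "dc:identifier",
}

# Fields kept as lists; all other fields keep the first value found (assumption)
LIST_FIELDS = {"Creator", "Tags"}

# Hypothetical raw record, for illustration only
RAW_RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sea surface temperature 1990-2000</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:subject>ocean</dc:subject>
  <dc:subject>temperature</dc:subject>
  <dc:identifier>https://repository.example.org/record/42</dc:identifier>
</record>"""


def map_record(xml_text, rules=XPATH_RULES):
    """Steps 1-3: select values by XPath and store them under B2FIND keys."""
    root = ET.fromstring(xml_text)
    record = {}
    for key, xpath in rules.items():
        values = [el.text.strip() for el in root.findall(xpath, NAMESPACES)
                  if el.text and el.text.strip()]
        if not values:
            continue  # leave out fields the raw record does not provide
        record[key] = values if key in LIST_FIELDS else values[0]
    return record


def is_uploadable(record):
    """Step 4 (reduced to one rule here): the mandatory Title must be present."""
    return bool(record.get("Title"))


mapped = map_record(RAW_RECORD)
if is_uploadable(mapped):
    print(json.dumps(mapped, indent=2))
```

Keeping the rules in a separate table means that a new community format only needs a new set of XPath expressions, while the mapping code stays the same.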

EUDAT-B2FIND Metadata Schema

To allow a unified search space, B2FIND established a common, interdisciplinary metadata schema. This schema is based on the DataCite Metadata Schema 3.1 and is therefore compatible with other e-infrastructures such as OpenAIRE, which also uses the DataCite schema.

The B2FIND Metadata Schema 0.1 is the first published version and was released on August 30, 2016. The associated XSD file can be downloaded from b2find_schema_0.1.xsd .

Currently the schema comprises 17 fields or facets, listed in the following table with their semantic definitions, allowed values and references to the associated properties in the DataCite Metadata Schema 4.0.

Element name | Semantic Definition | Allowed values, constraints and CVs | DataCite reference | Obligation | Occurrence | Comments and Issues
Title | A name or a title by which a resource is known | Free text | 3. Title | Mandatory | 1 | Coding must be UTF-8 (unicode)
Description | Additional information describing the content of the resource. Could be an abstract, a summary or a table of contents. | Free text | 17. Description | Recommended | 0-1 | Coding should be UTF-8 (unicode)
Tags | A subject, keyword, classification code, or key phrase describing the content. | List of strings, filter out 'non nouns' by using 'stop words' | 6. Subject | Optional | 1 | Try to use keyword thesauri from communities
Source | An identifier (URL) that uniquely identifies a resource. | Should be resolvable URL | 1. Identifier | 1 Identifier is mandatory | 0-1 |
PID | A persistent identifier (implemented as a handle in a Handle server) that uniquely identifies a resource. | Must be resolvable URL and registered at a handle server | 1. Identifier | 1 Identifier is mandatory | 0-1 |
DOI | A persistent, citable identifier (registered at DataCite) that uniquely identifies a resource. | Must be resolvable URL, registered at DataCite as DOI | 1. Identifier, 1.1. identifierType = DOI | 1 Identifier is mandatory | 0-1 |
Checksum | Checksum of the underlying data resource | MD5 checksum | N/A | Optional | 0-1 |
Rights | Any rights information for this resource. | Free text | 16. Rights | Optional | 0-1 |
Discipline | The scientific disciplines linked with the resource. | Controlled vocabulary, see b2find_disciplines.json | N/A [ sometimes information in 6. Subject ] | Optional | 0-n |
Creator | The main researchers involved in producing the data, or the authors of the publication, in priority order. | List of names | 2. Creator | Optional | 0-1 |
Publisher | The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role. | List of names | 4. Publisher | Optional | 0-1 |
PublicationYear | The year when the data was or will be made publicly available. | | 5. PublicationYear | Optional | 0-1 |
Language | | Allowed values are taken from ISO 639-1 language codes. | 9. Language | Optional | 0-1 | Examples: English, German, French
Temporal Coverage | | | 8. Date | Optional | 0-1 |
Spatial Coverage | | | | Optional | 0-1 |
Format | | | | Optional | 0-1 |
ResourceType | A description of the resource | | 10. ResourceType | Optional | 0-1 |
Contact | | | [ may be 7. Contributor ] | Optional | 0-1 |
MetaDataAccess | Link to the original harvested metadata record (GetRecord request) | | | Optional | 0-1 |
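
A few of the constraints in this table lend themselves to automatic checking. The sketch below covers three of them: the mandatory Title, the requirement that one identifier (Source, PID or DOI) is present, and the MD5 form of the Checksum. The function name, the record layout as a plain dictionary and the reading of "MD5 checksum" as 32 hexadecimal digits are assumptions made for this example; the actual B2FIND validation may apply different or additional rules.

```python
import re

# An MD5 digest written in hexadecimal has 32 digits (assumed representation)
MD5_PATTERN = re.compile(r"[0-9a-fA-F]{32}")
IDENTIFIER_FIELDS = ("Source", "PID", "DOI")


def validate_record(record):
    """Return a list of problems found in one mapped record (empty if none)."""
    problems = []
    if not record.get("Title"):
        problems.append("Title is mandatory")
    if not any(record.get(field) for field in IDENTIFIER_FIELDS):
        problems.append("at least one of Source, PID or DOI is required")
    checksum = record.get("Checksum")
    if checksum is not None and not MD5_PATTERN.fullmatch(checksum):
        problems.append("Checksum is not a 32-digit hexadecimal MD5 value")
    return problems


# Hypothetical records, for illustration only
good = {"Title": "Sea surface temperature", "DOI": "https://doi.org/10.1234/example"}
bad = {"Description": "No title and no identifier", "Checksum": "abc"}

print(validate_record(good))
print(validate_record(bad))
```

Returning a list of problems instead of a single true/false value makes it easier to report back to the data provider which fields need attention.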

Concordance with other Standards

As noted above, the EUDAT-B2FIND schema is compatible with other widely used standards that are based on the DataCite schema. The following table shows the compatibility with the core schema of EUDAT-B2SHARE and the open access initiative OpenAIRE by referring to the DataCite schema.

DCite # DataCite 4.0 B2FIND B2SHARE OpenAIRE Comments and Issues
1 Identifier (+ 1.1. identifierType=[DOI]) Source | DOI | PID N/A (self referenced) Identifier (+ 1.1. identifierType=[DOI , ...])
2 Creator Creator Creator Creator
3 Title Title Title Title
4 Publisher Publisher Publisher Publisher
5 PublicationYear PublicationYear PublicationYear PublicationYear
6 Subject Tags and Discipline Keywords and Discipline
7 Contributor [ --> Contact]
8 Date [ --> Temporal Coverage]
9 Language Language
10 ResourceType Not yet
11 AlternateIdentifier
12 RelatedIdentifier
13 Size
14 Format Format
15 Version N/A [ --> checksum]
16 Rights
17 Description Description
18 GeoLocation Spatial Coverage
19 FundingReference N/A

The central facet Discipline

For the central facet Discipline, B2FIND has defined a closed vocabulary with three levels of sub-disciplines:

    1. Humanities
      1.1 Human History
      • 1.1.1 African History
      • 1.1.2 American History
      • 1.1.3 Ancient History
      • 1.1.4 History of Australia|Australian History
      • 1.1.5 History of Asia|Asian History
      • 1.1.6 History of Europe|European History
      • 1.1.7 History of China|Chinese History
      • 1.1.8 Economic History of the world|Economic History
      • 1.1.9 Ancient Greece|Greek History
      • 1.1.10 History of Iran|Iranian History
      • 1.1.11 History of India|Indian History
      • 1.1.12 History of Indonesia|Indonesian History
      • 1.1.13 Intellectual History
      • 1.1.14 History of Latin America|Latin American History
      • 1.1.15 Modern History
      • 1.1.16 History of political thought|Political History
      • 1.1.17 Pre-Columbian era
      • 1.1.18 Ancient Rome|Roman History
      • 1.1.19 History of Russia|Russian History
      • 1.1.20 History of Science|Scientific History
      • 1.1.21 History of Technology|Technological History
      • 1.1.22 World History
      1.2 Linguistics

    2. Social Sciences
    3. Natural Sciences
    4. Formal Sciences
    5. Professions
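
Because the vocabulary is closed, a value offered by a community either matches an entry in the hierarchy or is rejected. The following sketch shows one possible way to do such a lookup. The nested-dictionary layout, the excerpt of entries and the function names are assumptions made for this example; the actual structure of b2find_disciplines.json may differ.

```python
# Excerpt of the hierarchy as nested dictionaries (assumed layout)
DISCIPLINES = {
    "Humanities": {
        "Human History": {
            "African History": {},
            "Ancient History": {},
            "World History": {},
        },
        "Linguistics": {},
    },
    "Social Sciences": {},
    "Natural Sciences": {},
    "Formal Sciences": {},
    "Professions": {},
}


def index_paths(tree, parents=()):
    """Map every discipline name to its full path from the top level."""
    index = {}
    for name, children in tree.items():
        path = parents + (name,)
        index[name] = path
        index.update(index_paths(children, path))
    return index


PATHS = index_paths(DISCIPLINES)


def resolve_discipline(value):
    """Return the path of a known discipline, or None if it is not in the vocabulary."""
    return PATHS.get(value.strip())


print(resolve_discipline("Ancient History"))
print(resolve_discipline("Astrology"))
```

Resolving a value to its full path also gives the parent disciplines, which is useful when a facet should match on any of the three levels.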