Mapping onto EUDAT-B2FIND Metadata Schema

The provided metadata must be mapped to the B2FIND schema in a meaningful way. Currently this is done in close cooperation between the data provider and the B2FIND team. A suitable solution is reached in each case by discussing the mapping iteratively.

Specification of Community Metadata

The implementation of the mapping, as described in the following subsection, is based on a detailed specification and documentation of the community-specific metadata. We have designed a template for gathering the required data, see B2FIND community_template. This file will document the communication process and decisions regarding the ingestion of the provider's metadata into B2FIND.

This template is divided into several parts:

  • General Information: In this tab, data providers should provide information about the contact persons and the community.
  • Metadata Specification: Please give us more detailed information about the specific metadata formats, schemas and structure used.
  • Harvesting: Here the harvesting endpoints (e.g. OAI-URLs) should be provided, as well as the protocols and APIs used, and the subsets, if available.
  • Mapping: In this table, the mapping of the community properties to the B2FIND schema and coverage information should be laid out. This is iteratively discussed and developed with the data provider during the initial intake process.

Homogenisation and Semantic Mapping

To transform the harvested raw metadata records into datasets that can be uploaded to the B2FIND catalogue, then indexed and displayed in the B2FIND portal, the following processing steps must be carried out:

  1. Select entries from the XML records. Which entries are selected depends on the community-specific metadata format (see Providing Metadata).
  2. Parse through the selected values and assign them to the keys specified in the elements of the B2FIND schema.
  3. Store the resulting key-value pairs in JSON dictionaries.
  4. Check and validate these JSON records before uploading to B2FIND.

This mapping procedure requires regular adaptation and extension as the requirements of the communities change.
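
The four processing steps can be illustrated with a minimal sketch. It is not the B2FIND implementation: it assumes a Dublin Core style input record, and the names `MAPPING_RULES`, `REQUIRED_KEYS`, `map_record`, `validate` and the sample record are all hypothetical, chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Namespace for Dublin Core elements (assumed input format for this sketch).
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical mapping rules: B2FIND key -> XPath into the community record.
MAPPING_RULES = {
    "Title": ".//dc:title",
    "Creator": ".//dc:creator",
    "Publisher": ".//dc:publisher",
    "Keywords": ".//dc:subject",
    "Source": ".//dc:identifier",
}

# Hypothetical subset of required keys used for the check in step 4.
REQUIRED_KEYS = ("Title", "Publisher")


def map_record(xml_text: str) -> dict:
    """Steps 1-3: select values, assign them to B2FIND keys, build a dict."""
    root = ET.fromstring(xml_text)
    record = {}
    for key, xpath in MAPPING_RULES.items():
        values = [el.text.strip() for el in root.findall(xpath, NS)
                  if el.text and el.text.strip()]
        if values:
            record[key] = values
    return record


def validate(record: dict) -> list:
    """Step 4: return the required keys that are missing from the record."""
    return [key for key in REQUIRED_KEYS if key not in record]


# Invented sample record, for illustration only.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sea surface temperature 2004</dc:title>
  <dc:creator>Smith, John</dc:creator>
  <dc:publisher>World Data Center for Climate (WDCC)</dc:publisher>
  <dc:subject>oceanography</dc:subject>
  <dc:identifier>https://example.org/dataset/42</dc:identifier>
</record>"""

if __name__ == "__main__":
    mapped = map_record(SAMPLE)
    print(json.dumps(mapped, indent=2))   # the JSON dictionary of step 3
    print("missing:", validate(mapped))   # an empty list means the check passed
```

Keeping the rules in a plain dictionary means that a change in a community's format only touches the rule table, not the mapping code.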

EUDAT-B2FIND Metadata Schema

To allow a single but interdisciplinary search space, B2FIND established a generic, non-hierarchical metadata schema. This schema is based on the DataCite Metadata Schema and is therefore also compatible with the guidelines of other e-infrastructures such as OpenAIRE, whose schemas are based on the DataCite schema as well. Additional elements of the B2FIND schema include "Discipline", "Instrument" and "TemporalCoverage".

The B2FIND Metadata Schema 2.0 is the current version and was released on November 11, 2020. The associated XSD file is available at b2find_schema_2.0.xsd.

Currently the schema consists of 26 elements. These are listed in the following table with their description, occurrences and allowed values. The level of obligation is indicated with each element as follows:

  • Mandatory (M): properties must be provided.
  • Mandatory if applicable (M/A): if your metadata contains this value, you must provide it.
  • Recommended (R): properties are optional, but strongly recommended for interoperability and higher quality of the metadata.
  • Optional (O): properties are optional and provide richer description.

Providers who submit both the mandatory and the recommended sets of properties significantly enhance the chance of their metadata being found, cited and linked to original research.
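
As a rough illustration of how the obligation levels could be checked for one mapped record, the sketch below groups the element names by level as they appear in the schema table of this section. The function name `check_obligations`, the split into "errors" and "warnings", and the sample record are assumptions made for this example, not part of B2FIND. OpenAccess is left out because it is generated automatically.

```python
# Element names grouped by obligation level, taken from the schema table.
MANDATORY = ("Community", "Title", "Publisher", "PublicationYear", "Discipline")
IDENTIFIERS = ("DOI", "PID", "Source")  # M/A: at least one identifier is required
RECOMMENDED = ("Description", "Keywords", "MetadataAccess", "Creator",
               "Rights", "Language", "ResourceType", "Format")


def check_obligations(record: dict) -> dict:
    """Report missing mandatory and recommended elements for one record."""
    def missing(names):
        # An element counts as missing if it is absent or empty.
        return [name for name in names if not record.get(name)]

    errors = missing(MANDATORY)
    if not any(record.get(name) for name in IDENTIFIERS):
        errors.append("DOI|PID|Source")
    return {"errors": errors, "warnings": missing(RECOMMENDED)}


# Invented sample record, for illustration only.
record = {
    "Community": "ExampleCommunity",
    "Title": ["A title"],
    "Publisher": ["Example Publisher"],
    "PublicationYear": "2020",
    "Discipline": ["Oceanography"],
    "Source": "https://example.org/dataset/42",
}
report = check_obligations(record)
print(report)
```

Treating missing recommended elements as warnings rather than errors mirrors the distinction above: such records are still accepted, but they are harder to find and cite.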

| Metadata Type | B2FIND Name | Description | Occurrence | Allowed values | Comments and Issues |
|---|---|---|---|---|---|
| General Information | Community (M) | The scientific community, research infrastructure, project or data provider from which B2FIND harvests the metadata. | 1 | Textual | |
| | Title (M) | A name or a title by which a resource is known. | 1-n | Textual | |
| | Description (R) | All additional information that does not fit in any of the other categories. May be used for technical information. Could be an abstract, a summary or a table of contents. It is good practice to supply a description. | 0-1 | Textual | |
| | Keywords (R) | Subject, keyword, classification code, or key phrase describing the resource. | 0-n | List of strings | Try to use keyword thesauri from community-specific vocabularies. |
| Identifier | DOI (M/A) | A persistent citable identifier that uniquely identifies a resource. | 0-1 | Must be a resolvable URI, registered at DataCite as DOI. | At least one resource identifier is mandatory. |
| | PID (M/A) | A persistent identifier that uniquely identifies a resource. | 0-1 | Must be a resolvable URI, registered at a handle server. | |
| | Source (M/A) | An identifier that uniquely identifies a resource. It may link to the data itself or to a landing page that curates the data. | 0-1 | Should be a resolvable URI. | |
| | RelatedIdentifier (O) | Identifiers of related resources. | 0-n | Should be a resolvable URI. | |
| | MetadataAccess (R) | Link to the originally harvested metadata record. | 0-1 | Should be a resolvable URI. | Automatically generated by B2FIND script (GetRecord request for OAI-PMH). |
| Provenance | Creator (R) | The main researchers involved in working on the data, or the authors of the publication, in priority order. May be a corporate/institutional or personal name. | 0-n | The personal name format should be: family, given. Non-roman names may be transliterated according to the ALA-LC schemes. Examples: Smith, John; Miller, Elizabeth. | |
| | Publisher (M) | The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role. | 1-n | Examples: World Data Center for Climate (WDCC); GeoForschungsZentrum Potsdam (GFZ); Geological Institute, University of Tokyo; GitHub | |
| | Contributor (O) | The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource. | 0-n | List of names | |
| | Instrument (O) | The technical instrument(s) used to generate, observe or measure the data. | 0-n | Could be instrument ID (or name) and hosting facility name. | |
| | PublicationYear (M) | Year when the data is made publicly available. If an embargo period has been in effect, use the date when the embargo period ends. | 1 | UTC year format (YYYY) | |
| | FundingReference (O) | Information about financial support (funding) for the resource. | 0-n | Could be funder name or grant number. | |
| | Rights (R) | Any rights information for this resource. | 0-n | Textual | |
| | OpenAccess (M/A) | Information on whether the resource is openly accessible or not. | 1 | Boolean | Automatically generated by B2FIND script based on the information given in the "Rights" element. Default value is "True" unless stated otherwise. |
| | Contact (O) | A reference to contact information for this resource. | 0-n | List of names | |
| Representation | Language (R) | Language(s) of the resource. | 0-n | Allowed values are ISO 639-1 or ISO 639-3 language codes or text. Examples: en; eng; English | |
| | ResourceType (R) | The type(s) of the resource. | 0-n | Free text. Examples: Dataset; Image; Audiovisual | |
| | Format (R) | Technical format of the resource. | 0-n | Textual. Use file extension or MIME type where possible, e.g. PDF, XML, MPG or application/pdf, text/xml, video/mpeg. | |
| | Size (O) | Size information about the resource. | 0-n | Free text. Examples: 15 pages; 6 MB; 45 minutes. | |
| | Version (O) | Version information about the resource. | 0-n | Suggested practice: track major_version.minor_version. Example: v1.02 | |
| | Discipline (M) | The research discipline(s) the resource can be categorized in. | 1-n | Controlled vocabulary, see b2find_disciplines.json. | If not applicable, add a community-specific discipline term. |
| | SpatialCoverage (O) | The spatial coverage the research data is related to. Content of this category is displayed in plain text. If longitude/latitude information is given, it will be displayed on the map. | 0-1 | Geographical coordinates: lat/lon for a point; [min_lat, min_lon, max_lat, max_lon] for a bounding box; or free text. | Recommended, in accordance with DataCite: use WGS 84 (World Geodetic System) coordinates. Use only decimal numbers for coordinates. Longitudes are -180 to 180 (0 is Greenwich; negative numbers are west, positive numbers are east). Latitudes are -90 to 90 (0 is the equator; negative numbers are south, positive numbers are north). |
| | TemporalCoverage (O) | Period of time the research data itself is related to. Could be a date format or plain text. | 0-1 | YYYY, YYYY-MM-DD, YYYY-MM-DDThh:mm:ssTZD or any other format or level of granularity described in W3CDTF. Use the RKMS-ISO8601 standard for depicting date ranges. Example: 2004-03-02/2005-06-02. Years before 0000 must be prefixed with a - sign, e.g. -0054 to indicate 55 BC. You can also use plain text, e.g. Viking Age. | |
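
The coverage formats described above lend themselves to simple checks. The following sketch is only an illustration under stated assumptions: the helper names `valid_point`, `valid_bbox` and `split_temporal` are hypothetical, and the temporal pattern handles only the YYYY, YYYY-MM and YYYY-MM-DD forms (optionally as a start/end range), not the full date-time form.

```python
import re


def valid_point(lat: float, lon: float) -> bool:
    """Latitude must lie in [-90, 90] and longitude in [-180, 180]."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def valid_bbox(bbox) -> bool:
    """Check a [min_lat, min_lon, max_lat, max_lon] bounding box."""
    if len(bbox) != 4:
        return False
    min_lat, min_lon, max_lat, max_lon = bbox
    # min_lon <= max_lon is deliberately not enforced, so that boxes
    # crossing the 180 degree meridian are not rejected.
    return (valid_point(min_lat, min_lon)
            and valid_point(max_lat, max_lon)
            and min_lat <= max_lat)


# A date with an optional leading minus sign for years before 0000.
DATE = r"-?\d{4}(?:-\d{2}(?:-\d{2})?)?"
RANGE_RE = re.compile(rf"^({DATE})(?:/({DATE}))?$")


def split_temporal(value: str):
    """Return (start, end) for a date or a start/end range, else None.

    None signals free text (e.g. "Viking Age") or a format that this
    simplified pattern does not cover.
    """
    match = RANGE_RE.match(value.strip())
    if not match:
        return None
    start, end = match.group(1), match.group(2)
    return (start, end or start)


if __name__ == "__main__":
    print(valid_bbox([47.0, 5.0, 55.0, 15.0]))
    print(split_temporal("2004-03-02/2005-06-02"))
    print(split_temporal("Viking Age"))
```

Returning `None` for unmatched values keeps plain-text coverage usable: such values can still be displayed as text even though they cannot be placed on a map or a time filter.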

Concordance with Other Standards

As mentioned before, the EUDAT-B2FIND schema is compatible with other widely used standards. The following table shows its concordance with the metadata schemas of DataCite, OpenAIRE, Dublin Core and DDI 2.x.

| DataCite 4.3 | B2FIND | OpenAIRE | DublinCore | DDI 2.x | Comments and Issues |
|---|---|---|---|---|---|
| 1. Identifier | Identifier [DOI or PID or Source (URL)] | 1. Identifier | Identifier | <IDNo> 2.1.1.5 or <holdings location="" callno="" URI=""> 2.1.8 | While for DataCite a DOI is mandatory as identifier, B2FIND requires "only" at least a URL linked to the underlying data resource. |
| 2.1 creatorName | Creator | 2.1 creatorName | Creator | <AuthEnty> 2.1.2.1 | |
| 3. Title | Title | 3. Title | Title | <titl> 2.1.1.1 | |
| 4. Publisher | Publisher | 4. Publisher | Publisher | <producer> 2.1.3.1 | |
| 5. PublicationYear | PublicationYear | PublicationYear | Date | <distDate> 1.4.4.5 | |
| 6. Subject | Keywords and/or Discipline | 6. Subject | Subject | <keyword> 2.2.1.1 or <topcClas> 2.2.1.2 | |
| 7.1 contributorName | Contributor | 7. Contributor | Contributor | <othId> 2.1.2.2 | |
| 8. Date | PublicationYear or TemporalCoverage | 8. Date | Date | <prodDate> 2.1.3.3 | The DataCite definition here is somewhat vague ("Different dates relevant to the work"). B2FIND has the element "PublicationYear", i.e. the year the dataset is published or its embargo period ends. Another temporal element of B2FIND is "TemporalCoverage", i.e. the interval of time that the underlying data of the resource covers; it is associated with a "Filter by time" search option in the B2FIND GUI. |
| 9. Language | Language | 9. Language | Language | N/A | |
| 10. ResourceType | ResourceType | 10. ResourceType | Type | <dataKind> 2.2.3.8 | |
| 11. AlternateIdentifier | N/A | 11. AlternateIdentifier | N/A | N/A | |
| 12. RelatedIdentifier | RelatedIdentifier | 12. RelatedIdentifier | Relation or Source | <othrStdyMat> 2.5 or <sources> 2.3.1.8 | |
| 13. Size | Size | 13. Size | N/A | <collSize> 2.4.1.4 | |
| 14. Format | Format | 14. Format | Format | <fileType> 3.1.5 | |
| 15. Version | Version | 15. Version | N/A | <version> 1.1.6.1 | |
| 16. Rights | Rights | 16. Rights | Rights | <copyright> 2.1.3.2 | |
| 17. Description | Description | 17. Description | Description | <abstract> 2.2.2 | |
| 18. GeoLocation | SpatialCoverage | 18. GeoLocation | Coverage | <geogCover> 2.2.3.4 | In B2FIND, "SpatialCoverage", i.e. the geospatial coverage, is associated with a "Filter by location" map search interface. |
| 19. FundingReference | FundingReference | 7. Contributor, 7.1 contributorType="Funder" | N/A | <fundAg> 1.4.3.6 | |
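
For mapping scripts, the DataCite and B2FIND columns of the concordance table could be kept as a simple lookup structure. The sketch below transcribes those two columns; the names `DATACITE_TO_B2FIND`, `b2find_targets` and `datacite_sources` are hypothetical and not part of any B2FIND software.

```python
# DataCite property -> B2FIND element(s), transcribed from the table above.
DATACITE_TO_B2FIND = {
    "Identifier": ["DOI", "PID", "Source"],
    "creatorName": ["Creator"],
    "Title": ["Title"],
    "Publisher": ["Publisher"],
    "PublicationYear": ["PublicationYear"],
    "Subject": ["Keywords", "Discipline"],
    "contributorName": ["Contributor"],
    "Date": ["PublicationYear", "TemporalCoverage"],
    "Language": ["Language"],
    "ResourceType": ["ResourceType"],
    "AlternateIdentifier": [],  # no B2FIND counterpart (N/A)
    "RelatedIdentifier": ["RelatedIdentifier"],
    "Size": ["Size"],
    "Format": ["Format"],
    "Version": ["Version"],
    "Rights": ["Rights"],
    "Description": ["Description"],
    "GeoLocation": ["SpatialCoverage"],
    "FundingReference": ["FundingReference"],
}


def b2find_targets(datacite_property: str) -> list:
    """B2FIND elements that a DataCite property can be mapped to."""
    return DATACITE_TO_B2FIND.get(datacite_property, [])


def datacite_sources(b2find_element: str) -> list:
    """DataCite properties that can feed a given B2FIND element."""
    return [prop for prop, targets in DATACITE_TO_B2FIND.items()
            if b2find_element in targets]


if __name__ == "__main__":
    print(b2find_targets("Subject"))
    print(datacite_sources("PublicationYear"))
```

The reverse lookup makes one-to-many cases visible, such as "PublicationYear", which can be filled from either of two DataCite properties.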