Mapping onto EUDAT-B2FIND Metadata Schema
The provided metadata must be mapped to the B2FIND schema in a meaningful way. Currently this is done in close cooperation between the data provider and the B2FIND team: the mapping is discussed iteratively until a suitable solution is reached in each case.
- Specification of Community Metadata
- Homogenisation and Semantic Mapping
- EUDAT-B2FIND Metadata Schema
- Concordance with other Standards
The implementation of the mapping, as described in the following subsection, is based on a detailed specification and documentation of the community-specific metadata. We have designed a template for gathering the required data, see B2FIND community_template. This file will document the communication process and decisions regarding the ingestion of the provider's metadata into B2FIND.
This template is divided into several parts:
- General Information: In this tab, data providers should provide information about the contact persons and the community.
- Metadata Specification: Please give us more detailed information about the specific metadata formats, schemas and structure used.
- Harvesting: Here the harvesting endpoints (e.g. OAI-URLs) should be provided, as well as the protocols and APIs used, and the subsets, if available.
- Mapping: In this table, the mapping of the community properties to the B2FIND schema and coverage information should be laid out. This is iteratively discussed and developed with the data provider during the initial intake process.
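As an illustration of why the Harvesting part asks for the endpoint, the metadata format and the subsets, the sketch below combines these three values into an OAI-PMH `ListRecords` request URL. The function name, the base URL and the set name are hypothetical placeholders, and this is not the B2FIND harvester itself; only the `verb`, `metadataPrefix` and `set` arguments come from the OAI-PMH protocol.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, oai_set=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if oai_set:
        # The subset is optional, as in the template ("if available").
        params["set"] = oai_set
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and subset, for illustration only.
url = list_records_url("https://example.org/oai", "oai_dc", "climate")
print(url)
```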
To transform and reformat the harvested raw metadata records to datasets which can be uploaded to the B2FIND catalogue and indexed and displayed in the B2FIND portal, the following processing steps must be carried out:
- Select entries from the XML records that depend on community-specific metadata formats (see Providing Metadata).
- Parse through the selected values and assign them to the keys specified in the elements of the B2FIND schema.
- Store the resulting key-value pairs in JSON dictionaries.
- Check and validate these JSON records before uploading to B2FIND.
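The four steps above can be sketched as follows. This is a minimal illustration, not the B2FIND implementation: it assumes a simple Dublin Core record as the community format, and the `MAPPING` table, the function names, the community name and the discipline term are hypothetical placeholders.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: the community delivers simple Dublin Core records.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

RAW_RECORD = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sea surface temperature 2004</dc:title>
  <dc:creator>Smith, John</dc:creator>
  <dc:creator>Miller, Elizabeth</dc:creator>
  <dc:publisher>Example Data Centre</dc:publisher>
  <dc:date>2005-06-02</dc:date>
  <dc:identifier>https://example.org/dataset/42</dc:identifier>
</record>"""

# Hypothetical mapping: B2FIND key -> path into the community record.
MAPPING = {
    "Title": "dc:title",
    "Creator": "dc:creator",
    "Publisher": "dc:publisher",
    "PublicationYear": "dc:date",
    "Source": "dc:identifier",
}

# Elements marked (M) in the schema table.
MANDATORY = ("Community", "Title", "Publisher", "PublicationYear", "Discipline")


def map_record(xml_text, community, discipline):
    """Steps 1-3: select entries, assign them to B2FIND keys, build a dict."""
    root = ET.fromstring(xml_text)
    record = {"Community": community, "Discipline": [discipline]}
    for key, path in MAPPING.items():
        values = [el.text.strip() for el in root.findall(path, NS) if el.text]
        if values:
            record[key] = values
    # Single-valued elements: keep the first value only.
    if "PublicationYear" in record:
        record["PublicationYear"] = record["PublicationYear"][0][:4]  # YYYY
    if "Source" in record:
        record["Source"] = record["Source"][0]
    return record


def missing_mandatory(record):
    """Step 4 (partial): list mandatory keys that are absent or empty."""
    return [key for key in MANDATORY if not record.get(key)]


# Community and discipline values are placeholders, not checked against
# the controlled vocabulary.
record = map_record(RAW_RECORD, "example-community", "Oceanography")
problems = missing_mandatory(record)
print(json.dumps(record, indent=2))
print("missing mandatory elements:", problems)
```

Keeping the mapping in a plain table rather than in code means it can be filled in directly from the Mapping part of the community template.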
The B2FIND Metadata Schema 2.0 is the current version and was released on November 11, 2020. The associated XSD file is available at b2find_schema_2.0.xsd.
Currently the schema consists of 26 elements. These are listed in the following table with their description, occurrences and allowed values. The level of obligation is indicated with each element as follows:
- Mandatory (M): properties must be provided.
- Mandatory if applicable (M/A): if your metadata contains this value, you must provide it.
- Recommended (R): properties are optional, but strongly recommended for interoperability and higher quality of the metadata.
- Optional (O): properties are optional and provide richer description.
|Metadata Type||B2FIND Name||Description||Occurrence||Allowed values||Comments and Issues|
|General Information||Community (M)||The scientific community, research infrastructure, project or data provider from which B2FIND harvests the metadata.||1||Textual|
|Title (M)||A name or a title by which a resource is known||1-n||Textual|
|Description (R)||All additional information that does not fit in any of the other categories. May be used for technical information. Could be an abstract, a summary or a table of contents. It is good practice to supply a description.||0-1||Textual|
|Keywords (R)||Subject, keyword, classification code, or key phrase describing the resource.||0-n||List of strings||Try to use keyword thesauri from community-specific vocabularies.|
|Identifier||DOI (M/A)||A persistent citable identifier that uniquely identifies a resource.||0-1||Must be resolvable URI, registered at DataCite as DOI.||At least one resource identifier is mandatory.|
|PID (M/A)||A persistent identifier that uniquely identifies a resource.||0-1||Must be resolvable URI, registered at a handle server.|
|Source (M/A)||An identifier that uniquely identifies a resource. It may link to the data itself or a landing page that curates the data.||0-1||Should be resolvable URI.|
|RelatedIdentifier (O)||Identifiers of related resources.||0-n||Should be resolvable URI.|
|MetadataAccess (R)||Link to the originally harvested metadata record.||0-1||Should be resolvable URI.||Automatically generated by B2FIND script (GetRecord request for OAI-PMH).|
|Provenance||Creator (R)||The main researchers involved working on the data, or the authors of the publication in priority order. May be a corporate/institutional or personal name.||0-n||The personal name format should be: family, given. Non-roman names may be transliterated according to the ALA-LC schemes.||Examples: Smith, John; Miller, Elizabeth.|
|Publisher (M)||The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role.||1-n||Examples: World Data Center for Climate (WDCC); GeoForschungsZentrum Potsdam (GFZ); Geological Institute, University of Tokyo; GitHub|
|Contributor (O)||The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource.||0-n||List of names|
|Instrument (O)||The technical instrument(s) used to generate, observe or measure the data.||0-n||Could be instrument ID (or name) and hosting facility name.|
|PublicationYear (M)||Year when the data is made publicly available. If an embargo period has been in effect, use the date when the embargo period ends.||1||UTC Year format (YYYY)|
|FundingReference (O)||Information about financial support (funding) for the resource.||0-n||Could be funder name or grant number.|
|Rights (R)||Any rights information for this resource.||0-n||Textual|
|OpenAccess (M/A)||Information on whether the resource is openly accessible or not.||1||Boolean||Automatically generated by B2FIND script based on the information given in "Rights" element. Default value is "True" unless stated otherwise.|
|Contact (O)||A reference to contact information for this resource.||0-n||List of Names|
|Representation||Language (R)||Language(s) of the resource.||0-n||Allowed values are ISO 639-1 or ISO 639-3 language codes or text.||Examples: en; eng; English|
|ResourceType (R)||The type(s) of the resource.||0-n||Free text.||Examples: Dataset; Image; Audiovisual|
|Format (R)||Technical format of the resource.||0-n||Textual.||Use file extension or MIME type where possible, e.g. PDF, XML, MPG or application/pdf, text/xml, video/mpeg.|
|Size (O)||Size information about the resource.||0-n||Free text.||Examples: 15 pages; 6 MB; 45 minutes.|
|Version (O)||Version information about the resource.||0-n||Suggested practice: track major_version.minor_version.||Example: v1.02|
|Discipline (M)||The research discipline(s) the resource can be categorized in.||1-n||Controlled vocabulary, see b2find_disciplines.json.||If not applicable, add a community-specific discipline term.|
|Spatial Coverage (O)||The spatial coverage the research data is related to. The content of this category is displayed in plain text. If longitude/latitude information is given, it is displayed on the map.||0-1||Geographical coordinates||Recommended, in accordance with DataCite: Use WGS 84 (World Geodetic System) coordinates. Use only decimal numbers for coordinates. Longitudes are -180 to 180 (0 is Greenwich, negative numbers are west, positive numbers are east). Latitudes are -90 to 90 (0 is the equator, negative numbers are south, positive numbers are north).|
|Temporal Coverage (O)||Period of time the research data itself is related to. Could be a date format or plain text.||0-1||YYYY, YYYY-MM-DD, YYYY-MM-DDThh:mm:ssTZD or any other format or level of granularity described in W3CDTF.||Use the RKMS-ISO8601 standard for depicting date ranges. Example: 2004-03-02/2005-06-02. Years before 0000 must be prefixed with a - sign, e.g. -0054 to indicate 55 BC. You can also use plain text, e.g. Viking Age.|
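Some of the value rules in the table lend themselves to simple automated checks. The following sketch covers three of them: the requirement of at least one resource identifier, the WGS 84 coordinate ranges, and date ranges written as `start/end`. All function names are hypothetical, and the checks are an illustration under these assumptions rather than the validation B2FIND actually performs.

```python
def has_identifier(record):
    """At least one of DOI, PID or Source must be present."""
    return any(record.get(key) for key in ("DOI", "PID", "Source"))


def valid_point(longitude, latitude):
    """WGS 84 decimal degrees: longitude -180..180, latitude -90..90."""
    return -180.0 <= longitude <= 180.0 and -90.0 <= latitude <= 90.0


def split_range(temporal):
    """Split 'start/end'; a single date is returned as start == end."""
    start, _, end = temporal.partition("/")
    return start, (end or start)


def parse_year(text):
    """Return the signed four-digit year, or None for plain text.

    A leading '-' marks years before 0000, so '-0054' gives -54.
    """
    negative = text.startswith("-")
    digits = text[1:5] if negative else text[:4]
    if len(digits) != 4 or not digits.isdigit():
        return None
    return -int(digits) if negative else int(digits)


start, end = split_range("2004-03-02/2005-06-02")
print(parse_year(start), parse_year(end))
```

Plain-text values such as "Viking Age" are allowed by the schema, so `parse_year` returns `None` for them instead of raising an error.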
As mentioned before, the EUDAT-B2FIND schema is compatible with other widely used standards. The following table shows the compatibility with the metadata schemas of DataCite, OpenAIRE, DublinCore and DDI 2.x.
|DataCite 4.3||B2FIND||OpenAIRE||DublinCore||DDI 2.x||Comments and Issues|
|1. Identifier||Identifier [DOI or PID or Source (URL)]||1. Identifier||Identifier||<IDNo>220.127.116.11 or <holdings location="" callno="" URI="">2.1.8||While for DataCite a DOI is mandatory as identifier, B2FIND requires "only" at least a URL linked to the underlying data resource.|
|2.1 creatorName||Creator||2.1 creatorName||Creator||<AuthEnty>18.104.22.168|
|3. Title||Title||3. Title||Title||<titl> 22.214.171.124|
|4. Publisher||Publisher||4. Publisher||Publisher||<producer> 126.96.36.199|
|6. Subject||Keywords and/or Discipline||6. Subject||Subject||<keyword>188.8.131.52 or
|7.1 contributorName||Contributor||7. Contributor||Contributor||<othId>184.108.40.206|
|8. Date||PublicationYear or TemporalCoverage||8. Date||Date||<prodDate>220.127.116.11||The DataCite definition here is a bit vague (*Different dates relevant to the work*). B2FIND has the element *PublicationYear*, i.e. the year the dataset is published or when its embargo period ends. Another temporal element of B2FIND is *TemporalCoverage*, i.e. the interval of time that the underlying data of the resource covers, with a useful 'Filter by time' search option associated on the B2FIND GUI.|
|9. Language||Language||9. Language||Language||N/A|
|10. ResourceType||ResourceType||10. ResourceType||Type||<dataKind>18.104.22.168|
|11. AlternateIdentifier||N/A||11. AlternateIdentifier||N/A||N/A|
|12. RelatedIdentifier||RelatedIdentifier||12. RelatedIdentifier||Relation or Source||<othrStdyMat>2.5 or
|13. Size||Size||13. Size||N/A||<collSize>22.214.171.124|
|14. Format||Format||14. Format||Format||<fileType>3.1.5|
|15. Version||Version||15. Version||N/A||<version>126.96.36.199|
|16. Rights||Rights||16. Rights||Rights||<copyright>188.8.131.52|
|17. Description||Description||17. Description||Description||<abstract>2.2.2|
|18. GeoLocation||SpatialCoverage||18. GeoLocation||Coverage||<geogCover>184.108.40.206||In B2FIND *SpatialCoverage*, i.e. the geospatial coverage, is associated with a 'Filter by location' map search interface.|
|19. FundingReference||FundingReference||7. Contributor, 7.1 contributorType="Funder"||N/A||<fundAg>220.127.116.11|
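The DataCite and B2FIND columns of the concordance table can also be written as a lookup structure, for example to find out which B2FIND elements a DataCite property may feed into. The dictionary below only transcribes those two columns; the names `DATACITE_TO_B2FIND`, `b2find_targets` and `unmapped` are hypothetical and not part of any B2FIND tool.

```python
# DataCite property -> B2FIND element(s), transcribed from the table above.
DATACITE_TO_B2FIND = {
    "Identifier": ["DOI", "PID", "Source"],
    "creatorName": ["Creator"],
    "Title": ["Title"],
    "Publisher": ["Publisher"],
    "Subject": ["Keywords", "Discipline"],
    "contributorName": ["Contributor"],
    "Date": ["PublicationYear", "TemporalCoverage"],
    "Language": ["Language"],
    "ResourceType": ["ResourceType"],
    "AlternateIdentifier": [],  # N/A in B2FIND
    "RelatedIdentifier": ["RelatedIdentifier"],
    "Size": ["Size"],
    "Format": ["Format"],
    "Version": ["Version"],
    "Rights": ["Rights"],
    "Description": ["Description"],
    "GeoLocation": ["SpatialCoverage"],
    "FundingReference": ["FundingReference"],
}


def b2find_targets(datacite_property):
    """Return the B2FIND elements a DataCite property can map to."""
    return DATACITE_TO_B2FIND.get(datacite_property, [])


def unmapped(datacite_properties):
    """Return the properties that have no B2FIND counterpart."""
    return [p for p in datacite_properties if not b2find_targets(p)]


print(b2find_targets("Date"))
print(unmapped(["Title", "AlternateIdentifier"]))
```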