Mapping onto EUDAT-B2FIND Metadata Schema
The offered metadata must be mapped onto the B2FIND schema in a meaningful way. This is currently done jointly, through iterative discussions between the data provider and the B2FIND team.
Specification of Community Metadata
Homogenisation and Semantic Mapping
EUDAT-B2FIND metadata schema
Concordance with other Standards
The central facet Discipline
The implementation of the mapping, as described in the following subsection, is based on a detailed specification and documentation of the community-specific metadata. For this, a spreadsheet must be filled out. The Excel template can be requested via the support form or by email, or downloaded from Google Drive as Community-B2FIND_template.xlsx. The template is divided into several tabs:
- General Information: Data providers should give information about the contact persons and the community here.
- Metadata Specification: More detailed information about the specific metadata formats, schemas and structure used.
- Harvesting: Specify the 'harvesting endpoints' (e.g. OAI URLs), the protocols and APIs used, and the subsets, if available.
- Mapping: This table specifies the mapping of the community properties to the B2FIND schema and coverage information. It is discussed and developed iteratively with the data provider during the uptake process.
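As an illustration of what the Harvesting tab captures, an endpoint, a metadata format and an optional subset combine into an OAI-PMH request roughly as follows. This is a minimal sketch: the endpoint URL and the set name are hypothetical placeholders, and `build_list_records_url` is a helper invented for this example, not part of any B2FIND tool. Only the standard OAI-PMH arguments `verb`, `metadataPrefix` and `set` are used.

```python
from urllib.parse import urlencode


def build_list_records_url(endpoint, metadata_prefix="oai_dc", subset=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if subset:
        # 'set' is the OAI-PMH argument for selective harvesting of a subset
        params["set"] = subset
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and subset, for illustration only
url = build_list_records_url("https://repository.example.org/oai", subset="climate")
print(url)
```

The three values passed in here correspond to what a data provider would enter as endpoint, metadata format and subset.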
- Select entries from the XML records, based on XPath rules that depend on the community-specific metadata formats (see providing metadata).
- Parse the selected values and assign them to the keys specified in the XPath rules, i.e. to fields of the B2FIND schema.
- Store the resulting key-value pairs in JSON dictionaries.
- Check and validate these JSON records before uploading them to the B2FIND repository.
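The four steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the actual B2FIND mapping code: the XPath rules, the choice of multi-valued fields, the sample Dublin Core record and the function names `map_record` and `check` are all invented for this example. It relies on the limited XPath subset supported by the standard library's `xml.etree.ElementTree`.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of Dublin Core elements, as used in 'oai_dc' records
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical mapping rules: B2FIND field -> XPath into the community record
XPATH_RULES = {
    "Title": ".//dc:title",
    "Description": ".//dc:description",
    "Tags": ".//dc:subject",
    "Creator": ".//dc:creator",
    "Source": ".//dc:identifier",
}

# Fields kept as lists in this sketch (an assumption)
MULTI_VALUED = {"Tags", "Creator"}

# Field names as listed in the B2FIND schema table
B2FIND_FIELDS = {
    "Title", "Description", "Tags", "Source", "PID", "DOI", "Checksum",
    "Rights", "Discipline", "Creator", "Publisher", "PublicationYear",
    "Language", "Temporal Coverage", "ResourceType", "Contact", "MetaDataAccess",
}


def map_record(xml_text):
    """Steps 1-3: select values by XPath and store them under B2FIND keys."""
    root = ET.fromstring(xml_text)
    record = {}
    for key, xpath in XPATH_RULES.items():
        values = [el.text.strip() for el in root.findall(xpath, NS)
                  if el.text and el.text.strip()]
        if values:
            record[key] = values if key in MULTI_VALUED else values[0]
    return record


def check(record):
    """Step 4 (simplified): known keys only, and the record must be valid JSON."""
    problems = [f"unknown field: {k}" for k in record if k not in B2FIND_FIELDS]
    if json.loads(json.dumps(record)) != record:
        problems.append("record does not survive a JSON round trip")
    return problems


# Made-up sample record, for illustration only
SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sea surface temperature 1990-2000</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:subject>oceanography</dc:subject>
  <dc:subject>temperature</dc:subject>
  <dc:identifier>https://repository.example.org/record/42</dc:identifier>
</metadata>"""

record = map_record(SAMPLE)
problems = check(record)
print(json.dumps(record, indent=2))
print(problems)
```

Keeping the rules in a plain dictionary mirrors the idea that the mapping is agreed per community and can change without touching the parsing code.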
The B2FIND Metadata Schema 0.1 is the first published version and was released on August 30, 2016. The associated XSD file can be downloaded from b2find_schema_0.1.xsd.
Currently the schema comprises 17 fields or facets, listed in the following table with their semantic definitions, allowed values and references to the associated properties in the DataCite Metadata Schema 4.0.
|Element name||Semantic Definition||Allowed values, constraints and CVs||DataCite reference||Obligation||Occurrence||Comments and Issues|
|Title||A name or a title by which a resource is known||Free text||3. Title||Mandatory||1||Coding must be UTF-8 (Unicode)|
|Description||Additional information describing the content of the resource. Could be an abstract, a summary or a table of contents.||Free text||17. Description||Recommended||0-1||Coding should be UTF-8 (Unicode)|
|Tags||A subject, keyword, classification code, or key phrase describing the content.||List of strings, filter out 'non nouns' by using 'stop words'||6. Subject||Optional||1||Try to use keyword thesauri from communities|
|Source||An identifier (URL) that uniquely identifies a resource.||Should be resolvable URL||1. Identifier||One identifier is mandatory||0-1|
|PID||A persistent identifier (implemented as a handle in a Handle server) that uniquely identifies a resource.||Must be resolvable URL and registered at a handle server||1. Identifier||One identifier is mandatory||0-1|
|DOI||A persistent, citable identifier (registered at DataCite) that uniquely identifies a resource.||Must be resolvable URL, registered at DataCite as DOI||1. Identifier, 1.1. identifierType = DOI||One identifier is mandatory||0-1|
|Checksum||Checksum of the underlying data resource||MD5 checksum||N/A||Optional||0-1|
|Rights||Any rights information for this resource.||Free text||16. Rights||Optional||0-1|
|Discipline||The scientific disciplines linked with the resource.||Controlled vocabulary, see b2find_disciplines.json||N/A [ sometimes information in 6. Subject ]||Optional||0-n|
|Creator||The main researchers involved in producing the data, or the authors of the publication, in priority order.||List of names||2. Creator||Optional||0-1|
|Publisher||The name of the entity that holds, archives, publishes prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role.||List of names||4. Publisher||Optional||0-1|
|PublicationYear||The year when the data was or will be made publicly available.|| ||5. PublicationYear||Optional||0-1|
|Language|| ||Allowed values are taken from ISO 639-1 language codes.||9. Language||Optional||0-1||Examples: English, German, French|
|Temporal Coverage|| || ||8. Date||Optional||0-1|
|ResourceType||A description of the resource|| ||10. ResourceType||Optional||0-1|
|Contact|| || ||[ may be 7. Contributor ]||Optional||0-1|
|MetaDataAccess||Link to the original harvested metadata record (GetRecord request)|| || ||Optional||0-1|
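A few of the constraints in this table lend themselves to automated checks. The following is a rough sketch only: `check_record` is a hypothetical function, the four-digit test for PublicationYear is an assumption not stated in the schema, and the URL test is purely syntactic (it does not verify that an identifier actually resolves or is registered). The DOI in the example is made up.

```python
import re

IDENTIFIER_FIELDS = ("Source", "PID", "DOI")
MD5_PATTERN = re.compile(r"^[0-9a-fA-F]{32}$")  # MD5 hex digest: 32 hex characters
YEAR_PATTERN = re.compile(r"^\d{4}$")           # assumption: four-digit year


def check_record(record):
    """Return a list of violations of selected table constraints."""
    errors = []
    title = record.get("Title")
    if not isinstance(title, str) or not title.strip():
        errors.append("Title: mandatory, exactly one value expected")
    if not any(record.get(f) for f in IDENTIFIER_FIELDS):
        errors.append("Identifier: one of Source, PID or DOI is mandatory")
    for field in IDENTIFIER_FIELDS:
        value = record.get(field)
        # Syntactic check only; resolvability is not tested here
        if value and not str(value).startswith(("http://", "https://")):
            errors.append(f"{field}: expected a URL")
    checksum = record.get("Checksum")
    if checksum and not MD5_PATTERN.match(checksum):
        errors.append("Checksum: expected an MD5 hex digest")
    year = record.get("PublicationYear")
    if year is not None and not YEAR_PATTERN.match(str(year)):
        errors.append("PublicationYear: expected a four-digit year")
    return errors


# Made-up records, for illustration only
good = {
    "Title": "Sea surface temperature 1990-2000",
    "DOI": "https://doi.org/10.1234/example",
    "Checksum": "d41d8cd98f00b204e9800998ecf8427e",
    "PublicationYear": 2016,
}
bad = {"Description": "no title, no identifier", "Checksum": "abc"}

print(check_record(good))
print(check_record(bad))
```

Returning a list of messages rather than raising on the first problem makes it easier to report all issues of a record back to the data provider at once.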
As noted above, the EUDAT-B2FIND schema is compatible with other widely used standards that are based on the DataCite schema. The following table shows the compatibility with the core schemas of EUDAT-B2SHARE and of the open access initiative OpenAIRE, by reference to the DataCite schema.
|DCite #||DataCite 4.0||B2FIND||B2SHARE||OpenAIRE||Comments and Issues|
|1||Identifier (+ 1.1. identifierType=[DOI])||Source / DOI / PID||N/A (self referenced)||Identifier (+ 1.1. identifierType=[DOI, ...])|
|2||Creator||Creator||Creator||Creator|
|6||Subject||Tags and Discipline||Keywords and Discipline|
|7||Contributor||[ --> Contact]|
|8||Date||[ --> Temporal Coverage]|
|15||Version||N/A [ --> checksum]|
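For programmatic use, the B2FIND column of this concordance can be held as a simple lookup structure. The sketch below covers only the rows listed in the table above; the names `DATACITE_TO_B2FIND` and `B2FIND_TO_DATACITE` are invented for this example.

```python
# DataCite property number -> (DataCite name, B2FIND fields), as in the table
DATACITE_TO_B2FIND = {
    1: ("Identifier", ["Source", "DOI", "PID"]),
    2: ("Creator", ["Creator"]),
    6: ("Subject", ["Tags", "Discipline"]),
    7: ("Contributor", ["Contact"]),
    8: ("Date", ["Temporal Coverage"]),
    15: ("Version", []),  # N/A in B2FIND; the table only hints at Checksum
}

# Reverse index: B2FIND field -> DataCite reference such as "6. Subject"
B2FIND_TO_DATACITE = {}
for number, (name, fields) in DATACITE_TO_B2FIND.items():
    for field in fields:
        B2FIND_TO_DATACITE[field] = f"{number}. {name}"

print(B2FIND_TO_DATACITE["Tags"])
print(B2FIND_TO_DATACITE["Contact"])
```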
For the central facet Discipline, B2FIND has defined a closed vocabulary with three levels of subdisciplines: