This is a structured JSON dataset for training and evaluating AI models for automated subject indexing of research data and for linking research data with publications. It was created for the DA-FDM project to train a vector-based model that suggests topic classifications for research datasets in DaRUS. As part of this project, the DFG classification was integrated into DaRUS as a controlled vocabulary for the Topic Classification field. DFG classes were added manually to DaRUS datasets that had been uploaded before the integration.
For datasets from DaRUS and TUdatalib, this dataset includes classification tags (DFG, GND, Wikidata), links to publications, their open-access information, and, for open-access publications, the full texts.
Example object for the dataset:
{
  "name": "doi:10.18419/darus-1234",
  "tags": [
    {
      "name": "dfg-fs$102-04",
      "url": "https://w3id.org/dfgfo/2020/102-04"
    },
    {
      "name": "ResearchDataSet"
    }
  ],
  "links": [
    {
      "name": "doi:10.12345/abc5678",
      "type": "publication",
      "is_open_access": true,
      "open_access_url": "https://www.asdfg.com/10.12345/abc5678",
      "text": "Extracted publication full text ..."
    }
  ]
}
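A minimal Python sketch of how records with this structure might be read is given here for orientation. It rests on several assumptions that the description above does not confirm: that the data is a JSON array of such objects, that every DFG tag name starts with the prefix "dfg-fs$" as in the example, and that pairing each open-access full text with the DFG classes of the linked dataset is a useful training input. The helper names (dfg_classes, open_access_texts, training_pairs) are hypothetical.

```python
import json

# One record copied from the example object above, wrapped in a list.
# Assumption: the real file is a JSON array of such objects.
SAMPLE = """
[
  {
    "name": "doi:10.18419/darus-1234",
    "tags": [
      {"name": "dfg-fs$102-04", "url": "https://w3id.org/dfgfo/2020/102-04"},
      {"name": "ResearchDataSet"}
    ],
    "links": [
      {
        "name": "doi:10.12345/abc5678",
        "type": "publication",
        "is_open_access": true,
        "open_access_url": "https://www.asdfg.com/10.12345/abc5678",
        "text": "Extracted publication full text ..."
      }
    ]
  }
]
"""

# Assumption: DFG tags are recognizable by this prefix, as in the example.
DFG_PREFIX = "dfg-fs$"


def dfg_classes(record):
    """Return the DFG class codes (e.g. '102-04') found in a record's tags."""
    return [
        tag["name"][len(DFG_PREFIX):]
        for tag in record.get("tags", [])
        if tag.get("name", "").startswith(DFG_PREFIX)
    ]


def open_access_texts(record):
    """Return the full texts of linked open-access publications."""
    return [
        link["text"]
        for link in record.get("links", [])
        if link.get("type") == "publication"
        and link.get("is_open_access")
        and link.get("text")
    ]


def training_pairs(records):
    """Pair each open-access full text with the dataset's DFG classes."""
    pairs = []
    for record in records:
        labels = dfg_classes(record)
        for text in open_access_texts(record):
            pairs.append((text, labels))
    return pairs


records = json.loads(SAMPLE)
pairs = training_pairs(records)
for text, labels in pairs:
    print(labels, text[:40])
```

Tags without the assumed prefix, such as "ResearchDataSet", are skipped, and links without open-access full text contribute no pair.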