B2FIND Search Guide


   Introduction
   The B2FIND Portal
     Free Text Search
       Full Text Search
       Key-Value Search
       Combined Search
     Faceted Search Interface
       Filter by Location
       Filter by Time
       Filter by Publication Year
       Filter and Sort Textual Facets
     Search Results
       Metadata Display
       Data Access
         Resource Data
         Metadata as Harvested
   The Command Line Search
   Use Case Scenario

Introduction

The EUDAT metadata service B2FIND can be used in two ways:

  • The discovery web portal offers user-friendly navigation and filtering. Its search functions include:
    • Free text search over the full text of all datasets indexed in the B2FIND catalog
    • Geospatial and temporal search for all datasets that cover a chosen region or a chosen time period
    • Other 'faceted' search, i.e. selecting values from certain metadata fields
  • The script searchB2FIND.py submits search requests from the command line using the CKAN API

These search requests can be combined and executed in one go. A successful search returns the list of all datasets in the B2FIND catalogue that fulfil the search criteria. The metadata fields of each dataset found can be displayed; they include links to the underlying data objects. The following sections describe the usage of B2FIND step by step.


The B2FIND Portal

You can access the B2FIND web portal at b2find.eudat.eu. Apart from internet access, the only prerequisite for using the web interface is that JavaScript is enabled in your web browser.

Fig. 1 - Entry page of the B2FIND portal

Clicking 'Communities' gives you an overview of all communities that provide metadata to B2FIND.
There are two ways to start a search and reach the search result page (fig. 2), which lists all available datasets on the right side and offers the search and filter functions in the navigation bar on the left side:

  1. Click 'Faceted Search'.
  2. Click the magnifying glass in the free text field 'Search your data'. (At this stage you can already enter a search string or choose one of the 'Popular Tags' shown.)

Free Text Search

Like the entry page shown in fig. 1, the main search page provides a "Google-like" free text search: an input box in which you type your query.


Full Text Search

You would usually type the keywords you are interested in and hit return. For example, if you are interested in documents or datasets on Climate and you know that somebody named Scott worked on the topic, you would type

    Climate Scott

and get this result:

Fig. 2 - Search Result Page for Free text Search
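
Since B2FIND is built on CKAN, the same free text query can in principle be sent programmatically. The following is a minimal sketch, assuming the portal exposes CKAN's standard package_search action under /api/3/action/ at b2find.eudat.eu. The helper names build_search_url and count_results are made up for this example and are not part of B2FIND.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal serves the standard CKAN action API at this path.
API_URL = "https://b2find.eudat.eu/api/3/action/package_search"


def build_search_url(query, rows=10):
    """Return a package_search URL for a free text query."""
    return API_URL + "?" + urlencode({"q": query, "rows": rows})


def count_results(query):
    """Send the query and return the number of matching datasets.

    Requires network access; CKAN wraps the answer in a 'result' object
    that carries the total hit count in 'count'.
    """
    with urlopen(build_search_url(query, rows=0)) as response:
        payload = json.load(response)
    return payload["result"]["count"]
```

Calling build_search_url("Climate Scott") only assembles the URL; count_results("Climate Scott") actually contacts the portal.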

Key-Value Search

You can also use the free text search field to search for specific values of 'keys' or 'facets' by separating key and value with a colon. Facets are searchable B2FIND categories. The next section explains the faceted search interfaces in the navigation bar in detail; here we only address searching for facets via the free text search field.

For example, you can find all resources originating from the discipline Biology by typing:

    Discipline:Biology

Fig. 3 - Search Result Page for Facet Search

A full list of the facets that can be searched in the B2FIND catalog, with their descriptions, is found in the user documentation in the section Metadata field definitions. Please note that this search method is case-sensitive and requires accurate spelling. E.g. typing

    discipline:Biology

will lead to no result, because there is no facet discipline with a lowercase d.
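
Because a misspelled facet name silently returns nothing, it can help to check the name before sending a query. The sketch below is illustrative only: KNOWN_FACETS is a small subset of facet names mentioned in this guide (the full list is in the Metadata field definitions), and facet_term is a hypothetical helper. Quoting values that contain spaces follows common Lucene/Solr query syntax, which is assumed to apply here.

```python
# Illustrative subset only; see "Metadata field definitions" for the full list.
KNOWN_FACETS = {"Creator", "Discipline", "Language", "Publisher"}


def facet_term(field, value):
    """Build a 'field:value' term, quoting values that contain whitespace."""
    if field not in KNOWN_FACETS:  # deliberately case-sensitive
        raise ValueError(f"unknown facet {field!r}; facet names are case-sensitive")
    if any(ch.isspace() for ch in value):
        value = '"' + value + '"'
    return f"{field}:{value}"
```

With this helper, facet_term("Discipline", "Biology") yields a usable term, while facet_term("discipline", "Biology") raises an error instead of producing a query that finds nothing.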


Combined Search

To refine your search you can combine several search methods. With the Boolean operators AND and OR you can narrow or widen a search across keywords and facets. In our example you can search for all resources within the discipline Biology that include the word foraminifera or have something to do with the name Schiebel by typing a query like this:

    Discipline:Biology AND (foraminifera OR Schiebel)

Fig. 4 - Search Result Page for Combined Search
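
When combined queries are assembled in a script, explicit parentheses keep the grouping of AND and OR unambiguous. The following sketch composes such a query string; the helpers any_of and all_of are hypothetical names introduced for this example, and the parenthesised form assumes the usual Lucene/Solr query syntax.

```python
def any_of(*terms):
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """Join terms with AND."""
    return " AND ".join(terms)


# Discipline Biology, and either the word foraminifera or the name Schiebel.
query = all_of("Discipline:Biology", any_of("foraminifera", "Schiebel"))
print(query)  # Discipline:Biology AND (foraminifera OR Schiebel)
```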

Faceted Search Interface

The faceted search interface lets you filter your search by choosing 'facets', which helps you narrow the results down to what you need.

B2FIND can filter for datasets that have a given extent in space or in time. This is implemented by the following three graphical interfaces:

  • 'Filter by location' searches for all datasets that intersect a region chosen from the world map.
  • 'Filter by time' searches for all datasets that cover a chosen time period.
  • 'Publication Year' searches for all datasets that were 'published' within a given period of years.

Furthermore, filtering and sorting are provided for facets like 'Creator', 'Discipline', 'Language' or 'Publisher'. The following subsections describe these kinds of faceted search in detail.

Filter by Location

In the world map widget in the upper left corner you can select a region by dragging the mouse. This triggers a search for all datasets whose spatial extent intersects the selected region.

1. Clear any previous search request by clicking the 'Clear' button in the 'Filter by location' interface in the navigation bar.

2. Select a region from the world map.

2.a Click the 'Draw a rectangle' button in the upper right corner of the world map widget to start the spatial selection.

2.b Drag the mouse over the desired area; the rectangle's borders are marked in red. Finally press the 'Apply' button to execute the search request.

3. Search result: Once the query has finished, all datasets whose spatial extent overlaps the selected region are listed in the right panel.
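
The overlap criterion used by the location filter can be illustrated with a simple rectangle test. This is a sketch of the general idea only, not the portal's implementation: it assumes bounding boxes given as longitude/latitude corners, ignores boxes that cross the 180° meridian, and uses rough, made-up coordinates.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two rectangles share at least one point (touching edges count)."""
    return (a.min_lon <= b.max_lon and b.min_lon <= a.max_lon
            and a.min_lat <= b.max_lat and b.min_lat <= a.max_lat)


# Rough, hypothetical boxes for illustration only.
selection = BBox(-10.0, 35.0, 30.0, 70.0)      # rectangle drawn on the map
north_sea = BBox(-4.0, 51.0, 9.0, 61.0)        # dataset extent inside it
pacific = BBox(-170.0, -20.0, -120.0, 20.0)    # dataset extent far away
```

Here intersects(selection, north_sea) is true and intersects(selection, pacific) is false, so only the first dataset would be listed.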

Filter by Time

With the search widget 'Filter by time' you can select the time period the research data relate to by zooming into date and time histograms, as described below.

1. Clear any previous search request by clicking the 'Clear' button of the 'Filter by time' interface in the navigation bar.

2. Click the 'Filter by time' button in the navigation panel to open the timeline chart.

2.a Select a base period by dragging the mouse with the left button pressed in the 'histogram' at the bottom. This opens the datasets/time diagram for the chosen period in the upper part of the chart.

2.b Zoom in time: drag the mouse with the left button held down over a time interval in the upper time graph (the zoomed part is shown).

2.c Repeat zooming until the desired section is shown in the chart.

2.d Optionally, reset the last zoom by clicking the 'Reset zoom' button.

3. Select a time interval by holding down "Ctrl" (Win/Linux) or "Cmd" (Mac) and clicking on two points (start and end time) on the chart line. Close the timeline chart by clicking 'Apply'.

4. At this point the search request is set but not yet executed. To perform the search, click the magnifying glass in the free text search field. The number of datasets shown is reduced to those matching the chosen time period, and this period is also displayed in the time boxes on the left side.

5. Please note that the time period is displayed in 'Seconds since/before Christ'. For easier reading, the actual date is displayed when you hover the mouse over the digits. The start and end time can also be changed directly by editing the integers in the fields or by using the arrow buttons to increase or decrease the time span.
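
To get a feeling for the integers shown in the time boxes, the conversion between a calendar date and a second count can be sketched as follows. The reference point is an assumption: the sketch takes second 0 to be 0001-01-01T00:00:00 in the proleptic Gregorian calendar, which may differ from the portal's actual convention. Python's datetime cannot represent dates before year 1, so negative values ('before Christ') are out of scope here.

```python
from datetime import datetime, timedelta

# Assumption: second 0 is 0001-01-01T00:00:00 (proleptic Gregorian calendar).
# The reference point actually used by the portal may differ.
EPOCH = datetime(1, 1, 1)


def to_seconds(moment: datetime) -> int:
    """Whole seconds elapsed between the assumed epoch and the given date."""
    return (moment - EPOCH) // timedelta(seconds=1)


def from_seconds(seconds: int) -> datetime:
    """Inverse of to_seconds; only valid for non-negative values."""
    return EPOCH + timedelta(seconds=seconds)


start = to_seconds(datetime(1897, 1, 1))
end = to_seconds(datetime(2012, 12, 31, 23, 59, 59))
```

The values start and end correspond to the kind of integers that can be edited directly in the start and end time fields, under the stated assumption about the reference point.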

Filter by Publication Year

With the search widget 'Publication Year' you can search for datasets that were published within a certain period of time.

1. Clear any previous search request by clicking the 'Clear' button.

2. Select a period of publication years.

2.a Select a start year: Click the left text field in the 'Publication Year' interface. This opens a selection panel showing the years of the current decade. With the '<<' and '>>' buttons you can switch to the previous or following decade. Finally, select the desired start year by clicking on it in the decade panel. This applies the chosen year to the search, and all datasets published in this year or later are listed.

2.b Select an end year: Click the right text field in the 'Publication Year' interface. Selecting the end year works in the same manner as for the start year and executes the search for all datasets published between the start and end year.

3. Search result: Once the query has finished, all datasets published within the chosen years are listed in the right panel.
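
The behaviour of the two year fields (a start year alone means 'this year or later', adding an end year closes the range) can be mimicked on a local list of records. This is a sketch with made-up records; the key name 'PublicationYear' and the function published_between are assumptions for illustration.

```python
from typing import Optional


def published_between(records, start_year: int, end_year: Optional[int] = None):
    """Keep records whose 'PublicationYear' lies in [start_year, end_year].

    If end_year is None, only the lower bound is applied.
    """
    selected = []
    for record in records:
        year = record.get("PublicationYear")
        if year is None or year < start_year:
            continue
        if end_year is not None and year > end_year:
            continue
        selected.append(record)
    return selected


# Made-up records for illustration only.
records = [
    {"title": "A", "PublicationYear": 2007},
    {"title": "B", "PublicationYear": 2009},
    {"title": "C", "PublicationYear": 2016},
    {"title": "D", "PublicationYear": 2018},
]
```

With only a start year of 2009 the records B, C and D remain; adding 2016 as end year leaves B and C.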

Filter and Sort Textual Facets

For the key textual facets 'Communities', 'Tags', 'Creator', 'Disciplines', 'Languages' and 'Publisher', filtering and sorting is provided with auto-complete functionality. The following steps illustrate this with a search for a specific 'Creator'.

Filter with the auto-complete functionality: Typing Morr into the filter field of the 'Creator' menu restricts the full value list to all names containing the string 'Morr'.

Select a name from the creator list: Click on Morris Riedel (10) and the ten matching datasets are listed. You can also see that Gabrielle Cavallero is a 'Creator' of two of Morris Riedel's datasets as well. To narrow the results down to these two records, just click on this name.
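
The two mechanisms described here, restricting the value list by a typed string and showing a count next to each remaining value, can be sketched on toy data. The datasets below are invented for illustration and do not reflect the catalogue; case-insensitive matching is an assumption, and autocomplete and facet_counts are hypothetical helper names.

```python
from collections import Counter


def autocomplete(values, typed):
    """Return the values containing the typed string (case-insensitive by assumption)."""
    needle = typed.lower()
    return [v for v in values if needle in v.lower()]


def facet_counts(datasets, facet):
    """Count how many datasets carry each value of a multi-valued facet."""
    return Counter(value for d in datasets for value in d.get(facet, []))


# Toy data, not taken from the catalogue.
datasets = [
    {"title": "d1", "Creator": ["Morris Riedel"]},
    {"title": "d2", "Creator": ["Morris Riedel", "Gabrielle Cavallero"]},
    {"title": "d3", "Creator": ["Jane Doe"]},
]

# Selecting one creator narrows the list; counting again shows co-creators.
selected = [d for d in datasets if "Morris Riedel" in d["Creator"]]
co_creators = facet_counts(selected, "Creator")
```

After the selection, co_creators holds a count for every creator that appears in the remaining datasets, which is what the numbers in parentheses in the facet menu express.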

Search Results

The datasets that fulfil the search criteria are shown on the right side of the result page. Consider, for example, the following use case: a user wants to search within the community B2SHARE for all data created by Morris Riedel, belonging to the discipline Remote sensing and tagged with the keyword cross validation.

Fig. 5 - B2FIND result of combined textual facet search

Metadata Display

Clicking a dataset title opens the dataset view. The textual metadata are shown on the right side: at the top the title, the description and the tags as clickable buttons, and underneath the other B2FIND fields with their values in tabular form. In the left sidebar the spatial extent is shown as a red bounding box.

Fig. 6 - B2FIND dataset view

Data Access

Among these fields are links that give access to related data:

  • The data resources the metadata are related to.
  • The original metadata - as harvested from the provider by OAI-PMH and formatted in XML.

Resource Data

If you want to download the data resource behind the metadata found, use the link provided in the field 'Source'. Where available, the associated 'PID' and/or 'DOI' is provided as well.

Fig. 7 - 'Landing page' of the link in the B2FIND field 'Source' (excerpt)

In our example the link leads you to the landing page of the related data at the WDCC. This page provides further metadata and, under 'Data access', a further link that lets you download the data if you have the required authorisation.

Metadata as Harvested

If you want to examine the 'raw' metadata as originally harvested by B2FIND from the data provider, click on the field 'MetadataAccess' and the associated XML record is displayed in the browser.

Fig. 8 - Metadata record as originally harvested in XML format (excerpt)

In this case the metadata are provided in a community-specific format (ISO 19139), and namespaces are used to describe the georeferenced fields.


The Command Line Search

Besides searching via the web interface, you can also use the CKAN API to submit search requests directly from the command line. The Python script searchB2FIND.py, adapted to the needs and features of B2FIND, provides a powerful and user-friendly tool for submitting complex search requests. The script resides in the git repository https://github.com/EUDAT-B2FIND/md-ingestion and can be downloaded from there. If you want, for instance, a list of all records that belong to the discipline 'Earth Sciences', enter:

>./searchB2FIND.py Discipline:"Earth?Sciences"

----------------------------------------------------------------------------------------------------
Search in       b2find.eudat.eu
for pattern     Discipline:Earth?Sciences
.....
=> 3410 datasets found

The script writes the list of the IDs of the found records to the file results.txt.
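
The ID list can then be processed further, for example to turn each ID into a link to the dataset page. The sketch below assumes that results.txt holds one record ID per line and that dataset pages follow CKAN's usual <portal>/dataset/<id> pattern; neither is confirmed by this guide, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: CKAN's usual dataset URL pattern, <portal>/dataset/<id>.
PORTAL = "https://b2find.eudat.eu"


def parse_ids(text):
    """Return the non-empty lines of the text, assuming one record ID per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_ids(path="results.txt"):
    """Read record IDs from the result file written by searchB2FIND.py."""
    return parse_ids(Path(path).read_text(encoding="utf-8"))


def dataset_urls(ids, portal=PORTAL):
    """Map record IDs to dataset page URLs."""
    return [f"{portal}/dataset/{record_id}" for record_id in ids]
```

A call such as dataset_urls(read_ids()) would list one URL per record found, provided the file layout matches the assumption above.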

Further options and arguments are shown by entering:

>./searchB2FIND.py -h
usage: searchB2FIND.py [-h] [--ckan IP/URL] [--output STRING]
                       [--community STRING] [--ids [IDS [IDS ...]]]
                       [PATTERN [PATTERN ...]]

Description: Lists identifers of datasets that fulfill the given search criteria

positional arguments:
  PATTERN               CKAN search pattern, i.e. by logical conjunctions
                        joined field:value terms.

optional arguments:
  -h, --help            show this help message and exit
  --ckan IP/URL         CKAN portal address, to which search requests are
                        submitted (default is b2find.eudat.eu)
  --output STRING, -o STRING
                        Output file name and format. Format is determined by
                        the extention, supported are 'txt' (plain ascii file)
                        or 'hd5' file. Default is the ascii file results.txt.
  --community STRING, -c STRING
                        Community where you want to search in
  --ids [IDS [IDS ...]], -i [IDS [IDS ...]]
                        Identifiers of found records outputed. Default is
                        'id'. Additionally 'Source','PID' and 'DOI' are
                        supported.

Examples:
           1. >./searchB2FIND.py -c aleph tags:LEP
             searchs for all datasets of community ALEPH with tag "LEP" in b2find.eudat.eu.
           2. >./searchB2FIND.py author:"Jones*" AND Discipline:"Crystal?Structure" --ckan eudat-b1.dkrz.de
             searchs in eudat-b1.dkrz.de for all datasets having an author starting with "Jones" and belongs to the discipline "Crystal Structure"
           3. >./searchB2FIND.py -c narcis DOI:'*' --ids DOI
             returns the list of id's and DOI's for all records in community "NARCIS" that have a DOI

Use Case Scenario

To demonstrate a combined search using the faceted search functions, we take the following use case scenario:

A biologist is looking for research data in her discipline concerning the temporal period from year 1897 to 2012 and the European region. She is interested only in datasets published between 2009 and 2016 and created by her colleague V. Neumann.

The results can be narrowed down step by step using the filter functions provided in the navigation bar, as described in the following steps:

  1. She starts the search by choosing the 'Filter by time' tool in order to get only search results that relate to the period from 1897 to 2012.
  2. To filter for all datasets related to the European area, she draws a bounding box around the European continent.
  3. In the widget 'Publication Year' the researcher selects 2009 as start year and 2016 as end year.
  4. Next our scientist restricts her query to her research area, i.e. she chooses 'Biology' from the facet 'Discipline', which leaves 443 datasets.
  5. Finally she selects the 'Creator' 'Neumann, V.', which results in two datasets. The chosen values of the textual facets (here for Discipline and Creator) are displayed above the dataset titles. Clicking the 'x' of a value starts a new search query that excludes the closed facet.