Indian art music tonic datasets

These datasets comprise audio excerpts and manual annotations of the tonic pitch of the lead artist in each excerpt. Each excerpt is accompanied by its editorial metadata. The datasets can be used to develop and evaluate computational approaches for automatic tonic identification in Indian art music, and they have been used in several articles mentioned below. Most of these datasets come from the CompMusic corpora of Indian art music, in which each recording is associated with a MusicBrainz ID (MBID). Using the MBID, further information can be obtained through the Dunya API. Here we provide an overview of the tonic identification datasets.

Datasets
--------

The statistics of the datasets for tonic identification are listed in the table below. These six datasets are used for a comparative evaluation in Gulati, S., Bellur, A., Salamon, J., Ranjani, H. G., Ishwar, V., Murthy, H. A., & Serra, X. (2014). Automatic Tonic Identification in Indian Art Music: Approaches and Evaluation. Journal of New Music Research, 43(01), 55–73. To the best of our knowledge, these are the largest datasets available for tonic identification in Indian art music. The datasets vary in audio quality, recording period (decade), and the number of recordings of Carnatic and Hindustani music, of male and female singers, and of instrumental and vocal excerpts. For detailed information about these datasets, we refer to Chapter 3 of this thesis (http://hdl.handle.net/10803/398984). The audio files corresponding to these datasets are available on request for research purposes only. To obtain the files, fill in the form (https://goo.gl/forms/kWzpCsZW8DM7noW63).
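A comparative evaluation judges each estimated tonic against the annotated one. The exact criterion used in the cited article is not stated in this record, so the following is only a minimal sketch under stated assumptions: it measures the deviation in cents (1200 times the base-2 logarithm of the frequency ratio), counts an estimate as correct within a hypothetical 50-cent tolerance, and can optionally ignore octave errors. The function names and the tolerance value are illustrative, not part of the dataset.

```python
import math
from typing import Iterable, Tuple


def cents_deviation(estimated_hz: float, reference_hz: float) -> float:
    """Signed deviation of an estimated tonic from the annotated one, in cents."""
    return 1200.0 * math.log2(estimated_hz / reference_hz)


def is_correct(estimated_hz: float, reference_hz: float,
               tolerance_cents: float = 50.0,  # hypothetical tolerance, an assumption
               allow_octave_errors: bool = False) -> bool:
    """Judge one estimate against the annotation within a cent tolerance."""
    dev = cents_deviation(estimated_hz, reference_hz)
    if allow_octave_errors:
        # Fold the deviation into [-600, 600) cents so that estimates an
        # integer number of octaves away from the annotation count as correct.
        dev = (dev + 600.0) % 1200.0 - 600.0
    return abs(dev) <= tolerance_cents


def accuracy(pairs: Iterable[Tuple[float, float]], **kwargs) -> float:
    """Fraction of (estimated_hz, annotated_hz) pairs judged correct."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_correct(est, ref, **kwargs) for est, ref in pairs) / len(pairs)
```

Working in cents rather than Hz makes the tolerance independent of the register of the lead artist, since male and female singers typically use tonics in different frequency ranges.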

CompMusic Tonic Identification Datasets
Datasets: CM1, CM2, CM3
Features: pitch + multipitch histogram + pitch histograms

IITM Tonic Identification Datasets
Datasets: IITM1, IITM2
Features: pitch + multipitch histogram + pitch histograms

IISc Tonic Identification Dataset
Dataset: IISc
Features: pitch + multipitch histogram + pitch histograms
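The feature files are described only as pitch and pitch/multipitch histograms; their file format and computation parameters are not given in this record. To illustrate what an octave-folded pitch histogram is, here is a minimal sketch assuming a pitch contour in Hz with unvoiced frames marked by values of zero or less. The reference frequency (55 Hz) and the 10-cent bin width are hypothetical choices, not the settings used to produce the released features.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def pitch_histogram(pitch_hz: Iterable[float], ref_hz: float = 55.0,
                    bin_cents: int = 10, fold_octave: bool = True) -> Dict[int, float]:
    """Normalized histogram of a pitch contour on a cent scale.

    Assumptions (hypothetical): values <= 0 mark unvoiced frames, ref_hz is an
    arbitrary reference, and bin_cents should divide 1200 so folded bins are equal.
    Returns a mapping from each bin's lower edge (in cents) to its relative count.
    """
    counts: Counter = Counter()
    for f in pitch_hz:
        if f <= 0:
            continue  # skip unvoiced frames
        cents = 1200.0 * math.log2(f / ref_hz)
        if fold_octave:
            cents %= 1200.0  # collapse all octaves into a single octave
        counts[math.floor(cents / bin_cents)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {b * bin_cents: c / total for b, c in sorted(counts.items())}
```

Folding into one octave makes pitches an octave apart fall into the same bin, which is why such histograms tend to show a prominent peak at the tonic regardless of the register a passage is sung in.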

Annotation Format
-----------------

The tonic annotations are available in both TSV and JSON formats.

TSV columns: relative path to the audio file, tonic (Hz), tradition (Carnatic or Hindustani), artist name, gender of the singer, vocal or instrumental.

JSON: the keys of the main dictionary are the file paths of the audio files (the path of a feature file is identical except for the file extension). Each entry contains:
- the name of the lead artist, if available
- 'filepath': relative path to the audio file
- the gender of the lead singer, if available
- 'mbid': MusicBrainz ID, when available
- 'tonic': tonic in Hz
- 'tradition': Hindustani or Carnatic
- 'type': vocal or instrumental
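The two annotation formats can be read with the Python standard library alone. The sketch below is illustrative, not an official loader: the column names in `TSV_FIELDS` are hypothetical labels for the six documented TSV columns, it assumes the TSV has no header row unless `has_header=True` is passed, and the feature extension passed to `feature_path` is whatever the feature files actually use (not specified here).

```python
import csv
import json
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical names for the six TSV columns, in the documented order.
TSV_FIELDS = ["filepath", "tonic", "tradition", "artist", "gender", "type"]


def read_tsv_annotations(text: str, has_header: bool = False) -> List[Dict[str, object]]:
    """Parse TSV annotation text into one dict per excerpt."""
    reader = csv.reader(text.splitlines(), delimiter="\t")
    if has_header:
        next(reader, None)  # skip a header row, if the file has one (an assumption)
    rows: List[Dict[str, object]] = []
    for record in reader:
        if not record:
            continue  # ignore blank lines
        row: Dict[str, object] = dict(zip(TSV_FIELDS, record))
        row["tonic"] = float(record[1])  # the second column is the tonic in Hz
        rows.append(row)
    return rows


def read_json_annotations(text: str) -> Dict[str, float]:
    """Map each audio file path (a top-level key) to its tonic in Hz."""
    data = json.loads(text)
    return {path: float(entry["tonic"]) for path, entry in data.items()}


def feature_path(audio_path: str, feature_ext: str) -> str:
    """Feature files share the audio path and differ only in the extension."""
    return str(PurePosixPath(audio_path).with_suffix(feature_ext))
```

Both readers take the file contents as a string, so a file can be loaded with, for example, `Path(...).read_text(encoding="utf-8")`; this keeps the parsing logic independent of where the annotation files are stored.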

This dataset comprises 597 commercially available audio recordings of Indian art music (Hindustani and Carnatic), each manually annotated with the tonic of the lead artist. The dataset is used as the test corpus for developing tonic identification approaches.

Identifier
DOI	https://doi.org/10.34810/data458
Metadata Access	https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data458

Provenance
Creator	CompMusic
Publisher	CORA.Repositori de Dades de Recerca
Publication Year	2023
Funding Reference	European Commission EC/FP7/267583
Rights	Custom Dataset Terms; info:eu-repo/semantics/openAccess; https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data458
OpenAccess	true

Representation
Resource Type	Other; Dataset
Format	text/html; application/zip; text/plain
Size	160760; 464518; 225660; 66451368; 446832158; 761299033; 57556; 49688; 251374; 1459
Version	1.0
Discipline	Fine Arts, Music, Theatre and Media Studies; Humanities; Music