Indian art music tonic datasets

These datasets comprise audio excerpts and manual annotations of the lead artist's tonic pitch for each excerpt. Each excerpt is accompanied by its editorial metadata. The datasets can be used to develop and evaluate computational approaches for automatic tonic identification in Indian art music, and they have been used in several articles mentioned below. Most of these datasets come from the CompMusic corpora of Indian art music, in which each recording is associated with an MBID (MusicBrainz identifier). Given the MBID, further information can be obtained through the Dunya API. An overview of the tonic identification datasets follows.

Datasets
-------
Statistics for the tonic identification datasets are listed in the table below. These six datasets are used for a comparative evaluation in Gulati, S., Bellur, A., Salamon, J., Ranjani, H. G., Ishwar, V., Murthy, H. A., & Serra, X. (2014). Automatic Tonic Identification in Indian Art Music: Approaches and Evaluation. Journal of New Music Research, 43(01), 55–73. To the best of our knowledge, these are the largest datasets available for tonic identification in Indian art music. The datasets vary in audio quality, recording period (decade), and the number of recordings of Carnatic and Hindustani music, of male and female singers, and of instrumental and vocal excerpts. For detailed information about these datasets, see Chapter 3 of this thesis (http://hdl.handle.net/10803/398984).

The audio files corresponding to these datasets are available on request, for research purposes only. To obtain the files, fill in the FORM (https://goo.gl/forms/kWzpCsZW8DM7noW63).

CompMusic Tonic Identification Datasets
---
Datasets: CM1, CM2, CM3
Features: pitch + multipitch histogram + pitch histograms

IITM Tonic Identification Datasets
---
Datasets: IITM1, IITM2
Features: pitch + multipitch histogram + pitch histograms

IISc Tonic Identification Dataset
---
Dataset: IISc
Features: pitch + multipitch histogram + pitch histograms

Annotation Format
---
The tonic annotations are available in both TSV and JSON formats.

TSV: relative path to audio, tonic (Hz), Carnatic or Hindustani, artist_name, gender of the singer, vocal or instrumental

JSON: the keys of the main dictionary are the file paths to the audio files (the feature path is exactly the same, with a different file extension). Each entry contains: name of the lead artist if available, 'filepath': relative path to the audio file, gender of the lead singer if available, 'mbid': MusicBrainz ID when available, 'tonic': tonic in Hz, 'tradition': Hindustani or Carnatic, 'type': vocal or instrumental.
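As an illustration of the two annotation formats described above, the following Python sketch parses both into simple structures. It rests on several assumptions that the description does not confirm: the TSV files are assumed to have no header row, the column labels in `TSV_COLUMNS` are names chosen here for convenience, and the sample rows (paths, artist, gender value) are invented for the example. Only the JSON keys 'filepath', 'mbid', 'tonic', 'tradition' and 'type' are taken from the description.

```python
import csv
import io
import json

# Column order as listed in the TSV description; the labels themselves are
# hypothetical names chosen for this sketch, not names defined by the dataset.
TSV_COLUMNS = ["filepath", "tonic", "tradition", "artist_name", "gender", "type"]


def read_tsv_annotations(stream):
    """Parse tab-separated rows (assumed to have no header) into dicts."""
    rows = []
    for fields in csv.reader(stream, delimiter="\t"):
        if not fields:  # skip empty lines
            continue
        row = dict(zip(TSV_COLUMNS, fields))
        row["tonic"] = float(row["tonic"])  # tonic is given in Hz
        rows.append(row)
    return rows


def read_json_annotations(stream):
    """Map each audio file path (top-level key) to its tonic in Hz."""
    data = json.load(stream)
    return {path: float(entry["tonic"]) for path, entry in data.items()}


# Invented sample data, used only to show the expected shape of the files.
sample_tsv = "audio/a.mp3\t146.83\tCarnatic\tArtist A\tmale\tvocal\n"
sample_json = json.dumps({
    "audio/a.mp3": {
        "filepath": "audio/a.mp3",
        "tonic": 146.83,
        "tradition": "Carnatic",
        "type": "vocal",
    }
})

tsv_rows = read_tsv_annotations(io.StringIO(sample_tsv))
json_tonics = read_json_annotations(io.StringIO(sample_json))
```

With real files, the `io.StringIO` objects would be replaced by file handles opened on the downloaded annotation files; if the TSV files turn out to carry a header row, that row would need to be skipped first.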

This dataset comprises 597 commercially available audio music recordings of Indian art music (Hindustani and Carnatic music), each manually annotated with the tonic of the lead artist. This dataset is used as the test corpus for the development of tonic identification approaches.
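When the dataset is used as a test corpus, an estimated tonic has to be compared with the annotated one. The sketch below shows one possible way to do this in Python, by measuring the distance in cents between the estimate and the annotation. The 50-cent tolerance, the function names and the distinction between tonic pitch and octave-folded tonic pitch class are assumptions made for this example, not a definition taken from this dataset description.

```python
import math


def cents_between(estimate_hz, reference_hz):
    """Signed distance from reference to estimate, in cents."""
    return 1200.0 * math.log2(estimate_hz / reference_hz)


def is_correct_pitch(estimate_hz, reference_hz, tolerance_cents=50.0):
    """True if the estimate lies within the tolerance of the annotation."""
    return abs(cents_between(estimate_hz, reference_hz)) <= tolerance_cents


def is_correct_pitch_class(estimate_hz, reference_hz, tolerance_cents=50.0):
    """Like is_correct_pitch, but octave errors are ignored."""
    distance = cents_between(estimate_hz, reference_hz) % 1200.0
    return min(distance, 1200.0 - distance) <= tolerance_cents


def accuracy(estimates, annotations, check=is_correct_pitch):
    """Fraction of annotated files whose estimate passes the check.

    Both arguments map a file path to a tonic in Hz; files without an
    estimate count as errors.
    """
    if not annotations:
        return 0.0
    hits = sum(
        1
        for path, reference_hz in annotations.items()
        if path in estimates and check(estimates[path], reference_hz)
    )
    return hits / len(annotations)


# Invented values: the second estimate is one octave too high.
annotations = {"a.mp3": 110.0, "b.mp3": 220.0}
estimates = {"a.mp3": 111.0, "b.mp3": 440.0}

pitch_accuracy = accuracy(estimates, annotations)
pitch_class_accuracy = accuracy(estimates, annotations, check=is_correct_pitch_class)
```

Reporting both figures separates genuine errors from octave errors, which matters for vocal and instrumental excerpts alike since the same pitch class can be annotated in different octaves.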

Identifier
DOI https://doi.org/10.34810/data458
Metadata Access https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data458
Provenance
Creator CompMusic
Publisher CORA.Repositori de Dades de Recerca
Publication Year 2023
Funding Reference European Commission EC/FP7/267583
Rights Custom Dataset Terms; info:eu-repo/semantics/openAccess; https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data458
OpenAccess true
Representation
Resource Type Other; Dataset
Format text/html; application/zip; text/plain
Size 160760; 464518; 225660; 66451368; 446832158; 761299033; 57556; 49688; 251374; 1459
Version 1.0
Discipline Fine Arts, Music, Theatre and Media Studies; Humanities; Music