These datasets comprise audio excerpts and manual annotations of the tonic pitch of the lead artist for each excerpt. Each excerpt is accompanied by its editorial metadata. The datasets can be used to develop and evaluate computational approaches for automatic tonic identification in Indian art music, and they have been used in several articles mentioned below. A majority of these datasets come from the CompMusic corpora of Indian art music, in which each recording is associated with a MusicBrainz identifier (MBID). Using the MBID, other information about a recording can be obtained through the Dunya API. Here we provide an overview of the tonic identification datasets.
Datasets
-------
The statistics of the tonic identification datasets are listed in the table below. These six datasets were used for a comparative evaluation in Gulati, S., Bellur, A., Salamon, J., Ranjani, H. G., Ishwar, V., Murthy, H. A., & Serra, X. (2014). Automatic Tonic Identification in Indian Art Music: Approaches and Evaluation. Journal of New Music Research, 43(01), 55–73. To the best of our knowledge, these are the largest datasets available for tonic identification in Indian art music. The datasets vary in audio quality, recording period (decade), and the number of Carnatic and Hindustani recordings, of male and female singers, and of instrumental and vocal excerpts. For detailed information about these datasets, we refer to Chapter 3 of this thesis (http://hdl.handle.net/10803/398984).
The audio files corresponding to these datasets are made available on request, for research purposes only. To obtain the files, fill in the FORM (https://goo.gl/forms/kWzpCsZW8DM7noW63).
CompMusic Tonic Identification Datasets
---
Datasets: CM1, CM2, CM3
Features: pitch + multipitch histogram + pitch histograms
IITM Tonic Identification Datasets
---
Datasets: IITM1, IITM2
Features: pitch + multipitch histogram + pitch histograms
IISc Tonic Identification Dataset
---
Dataset: IISc
Features: pitch + multipitch histogram + pitch histograms
Annotation Format
---
The tonic annotations are available in both TSV and JSON formats.
TSV: relative path to the audio file, tonic (Hz), tradition (Carnatic or Hindustani), artist_name, gender of the singer, type (vocal or instrumental)
JSON: a dictionary whose keys are the file paths to the audio files (the feature path is exactly the same, with a different file extension). Each entry contains the name of the lead artist if available, 'filepath': relative path to the audio file, the gender of the lead singer if available, 'mbid': MusicBrainz ID when available, 'tonic': tonic in Hz, 'tradition': Hindustani or Carnatic, and 'type': vocal or instrumental.
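As a rough illustration of the two formats described above, the following Python sketch reads tonic values from each. It rests on several assumptions that this description does not confirm: the TSV is taken to have no header row and to list its columns in the order given above, the column names in `TSV_COLUMNS` are hypothetical labels, and only the JSON keys quoted above ('filepath', 'mbid', 'tonic', 'tradition', 'type') are used. The sample records are made up for the example and are not taken from the datasets.

```python
import csv
import io
import json

# Hypothetical column labels, in the order the TSV description lists them.
TSV_COLUMNS = ["filepath", "tonic", "tradition", "artist_name", "gender", "type"]


def read_tonic_tsv(handle):
    """Parse tab-separated annotation rows into a list of dicts.

    Assumes there is no header row (not confirmed by the dataset description).
    """
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:  # skip blank lines
            continue
        row = dict(zip(TSV_COLUMNS, fields))
        row["tonic"] = float(row["tonic"])  # tonic is given in Hz
        rows.append(row)
    return rows


def read_tonic_json(handle):
    """Parse the JSON annotations into a {filepath: tonic in Hz} mapping."""
    data = json.load(handle)
    return {path: float(entry["tonic"]) for path, entry in data.items()}


# Made-up sample records, for illustration only.
sample_tsv = "audio/example_01.mp3\t146.8\tCarnatic\tSome Artist\tmale\tvocal\n"
sample_json = json.dumps({
    "audio/example_01.mp3": {
        "filepath": "audio/example_01.mp3",
        "mbid": None,
        "tonic": 146.8,
        "tradition": "Carnatic",
        "type": "vocal",
    }
})

tsv_rows = read_tonic_tsv(io.StringIO(sample_tsv))
json_tonics = read_tonic_json(io.StringIO(sample_json))
print(tsv_rows[0]["tonic"], json_tonics["audio/example_01.mp3"])
```

With real files, the `io.StringIO` objects would be replaced by file handles opened on the annotation files; if the TSV turns out to have a header row, the first row would need to be skipped.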
This dataset comprises 597 commercially available audio music recordings of Indian art music (Hindustani and Carnatic music), each manually annotated with the tonic of the lead artist. This dataset is used as the test corpus for the development of tonic identification approaches.
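When such annotations serve as a test corpus, an estimated tonic is typically compared with the annotated one on a logarithmic (cents) scale. The sketch below shows one possible way to do this; it is not the evaluation procedure of any particular article. The function names are hypothetical, and the 50-cent tolerance is an assumption chosen for illustration, since this description does not specify one. Note that this version counts octave errors as wrong.

```python
import math


def cents_between(estimated_hz, annotated_hz):
    """Signed distance in cents from the annotated tonic to the estimate."""
    return 1200.0 * math.log2(estimated_hz / annotated_hz)


def is_correct(estimated_hz, annotated_hz, tolerance_cents=50.0):
    """True if the estimate lies within the tolerance of the annotation.

    The 50-cent default is an assumption made for this example.
    """
    return abs(cents_between(estimated_hz, annotated_hz)) <= tolerance_cents


def accuracy(estimates, annotations, tolerance_cents=50.0):
    """Fraction of annotated files whose estimate is within the tolerance.

    Both arguments map a file path to a tonic in Hz; files without an
    estimate are ignored.
    """
    shared = [path for path in annotations if path in estimates]
    if not shared:
        return 0.0
    hits = sum(
        is_correct(estimates[path], annotations[path], tolerance_cents)
        for path in shared
    )
    return hits / len(shared)


# Made-up values: the first estimate is close, the second is an octave too high.
annotations = {"a.mp3": 146.8, "b.mp3": 110.0}
estimates = {"a.mp3": 147.0, "b.mp3": 220.0}
print(accuracy(estimates, annotations))
```

To score only the pitch class of the tonic, the cents distance could be folded into a single octave before applying the tolerance.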