This deposit contains transcriptions of oral interviews and conversations in various dialects of Udmurt (Permic < Uralic; ISO 639-2 code udm). It comprises 25 recordings with transcripts totaling 93.6 thousand words.
Description of the contents
The contents are as follows:
eaf (directory as ZIP archive): sound files and their transcripts in ELAN format
metadata_texts.csv: tab-delimited metadata for the transcriptions
metadata_speakers.csv: tab-delimited metadata for speakers
readme.txt: documentation
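Although the two metadata files carry a .csv extension, they are tab-delimited, so a reader has to be told to split on tabs rather than commas. The following is a minimal Python sketch using only the standard library; the UTF-8 encoding is an assumption (it is not stated here), and since it is not specified whether the first row is a header, the sketch returns it like any other row. The function names are hypothetical.

```python
import csv
from pathlib import Path


def parse_tab_delimited(lines):
    """Split an iterable of text lines on tabs, dropping blank lines."""
    # The files are named *.csv but use tabs, so override the default comma.
    reader = csv.reader(lines, delimiter="\t")
    # csv.reader yields an empty list for a blank line; skip those.
    return [row for row in reader if row]


def read_tab_delimited(path):
    """Read a metadata file into a list of rows (lists of strings).

    ASSUMPTION: the file is UTF-8 encoded.
    """
    with open(path, encoding="utf-8", newline="") as f:
        return parse_tab_delimited(f)


if __name__ == "__main__":
    for name in ("metadata_texts.csv", "metadata_speakers.csv"):
        if Path(name).exists():
            print(name, len(read_tab_delimited(name)), "rows")
```

Because commas are not treated as separators, a field such as a place name containing a comma stays intact.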
Transcriptions
All sound recordings are in WAV format, although some were originally recorded in a compressed format (see metadata). Transcriptions are stored in ELAN files, each linked to one recording. The transcriptions have not been thoroughly proofread and may contain mistakes, so please listen to the relevant segments to verify a transcription before relying on it. See readme.txt for further details.
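To check a transcription against the audio, it helps to know where each segment starts and ends. ELAN .eaf files are XML: a TIME_ORDER block maps time-slot IDs to millisecond offsets, and each TIER holds annotations that refer to those slots. The Python sketch below lists the time-aligned annotations of a file. It is not an official ELAN tool, and it deliberately skips symbolic (REF_ANNOTATION) tiers, which inherit their timing from a parent tier. The tier names used in this corpus are not documented here, so the sketch simply reports whatever TIER_ID values it finds.

```python
import xml.etree.ElementTree as ET


def segments_from_root(root):
    """Return (tier_id, start_ms, end_ms, text) for each time-aligned annotation."""
    # Map time-slot IDs to millisecond offsets; unaligned slots have no TIME_VALUE.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # boundary not anchored to a time value
            text = ann.findtext("ANNOTATION_VALUE", default="")
            segments.append((tier_id, start, end, text))
    # Chronological order makes it easy to step through the recording.
    return sorted(segments, key=lambda s: (s[1], s[2]))


def read_eaf_segments(path):
    """Parse an .eaf file from disk and return its time-aligned segments."""
    return segments_from_root(ET.parse(path).getroot())
```

Dividing start_ms and end_ms by 1000 gives the offsets in seconds, which can then be located in the linked WAV file with any audio player.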
Metadata
The transcript-level metadata are:
filename (without the extension);
code of the collector (TA: Timofey Arkhangelskiy; NA: Nikolai Anisimov; YZ: Iuliia Zubova);
name of the place where the recording was made (in Russian);
original format of the recording (wav/wma/mp3);
genre;
date of the recording.
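The transcript-level fields can be held in a small record type. The Python sketch below ASSUMES that the columns of metadata_texts.csv appear in the order listed above and that any header row has already been removed; neither is confirmed by this description. The class and function names are hypothetical. As an example query, it picks out the recordings whose WAV files were converted from a compressed original (wma or mp3).

```python
from dataclasses import dataclass

# Collector codes as listed in this description.
COLLECTORS = {"TA": "Timofey Arkhangelskiy", "NA": "Nikolai Anisimov", "YZ": "Iuliia Zubova"}
COMPRESSED_FORMATS = {"wma", "mp3"}


@dataclass
class TextMeta:
    filename: str         # without the extension
    collector: str        # collector code, e.g. "TA"
    place: str            # in Russian
    original_format: str  # "wav", "wma" or "mp3", lower-cased
    genre: str
    date: str             # kept as a string: the date format is not specified


def parse_text_row(row):
    """Build a TextMeta from one row, ASSUMING the column order listed above."""
    filename, collector, place, fmt, genre, date = (cell.strip() for cell in row[:6])
    return TextMeta(filename, collector, place, fmt.lower(), genre, date)


def originally_compressed(records):
    """Filenames of recordings that were originally in a compressed format."""
    return [r.filename for r in records if r.original_format in COMPRESSED_FORMATS]
```

Such recordings may have lower audio quality than those recorded directly in WAV, which is worth keeping in mind when checking their transcriptions.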
The speaker-level metadata are:
code of the speaker;
speaker type: native speaker vs. (non-native) linguist;
sex (F/M);
year of birth (when known);
variety of Udmurt the speaker represents, usually the settlement where they were born or spent their formative years.
The recordings were transcribed by Tatiana Anisimova and Nikolai Anisimov. Sound alignment was performed by Timofey Arkhangelskiy and Marina Pankova.
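A similar record type works for the speaker-level fields. The Python sketch below again ASSUMES that the columns of metadata_speakers.csv follow the order listed above and that any header row has been removed. The exact labels used for speaker type are not given here, so they are kept as plain strings. Because the year of birth is recorded only when known, an empty or non-numeric value is mapped to None. The names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakerMeta:
    code: str
    speaker_type: str              # native speaker vs. (non-native) linguist
    sex: str                       # "F" or "M"
    year_of_birth: Optional[int]   # None when unknown
    variety: str                   # usually a settlement name


def parse_speaker_row(row):
    """Build a SpeakerMeta from one row, ASSUMING the column order listed above."""
    code, stype, sex, year, variety = (cell.strip() for cell in row[:5])
    return SpeakerMeta(code, stype, sex, int(year) if year.isdigit() else None, variety)


def speakers_by_variety(speakers):
    """Group speaker codes by the Udmurt variety they represent."""
    groups = defaultdict(list)
    for s in speakers:
        groups[s.variety].append(s.code)
    return dict(groups)
```

Grouping by variety gives a quick overview of how many speakers represent each settlement.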
References
ELAN (Version 6.9) [Computer software]. (2024). Nijmegen: Max Planck Institute for Psycholinguistics, The Language Archive. Retrieved from https://archive.mpi.nl/tla/elan
Contact
If you have any questions or would like to propose a collaboration, please email Timofey Arkhangelskiy at timarkh@gmail.com.
The preparation of the corpus, as well as the collection of some of the data, was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project no. 428175960.