kidLUCID: London UCL Children's clear speech in interaction database


This collection contains the quantitative data resulting from the analysis of the kidLUCID audio corpus, a set of speech recordings collected from 96 children aged 9 to 14 years inclusive. Recordings were made while participants carried out a collaborative ‘spot the difference’ picture task (‘diapix’) in pairs under three conditions: (1) good listening conditions, in which both participants could hear each other normally (‘no barrier’, NB); (2) a condition in which perception was impaired for one participant by distorting the input received from the conversation partner via a three-channel vocoder (VOC); and (3) a condition in which multi-speaker babble noise was added to that input (BAB). The aim was to examine the clarification strategies children use in adverse communicative conditions to maintain effective communication. The SPSS data file contains, for each of the 96 participants, quantitative data resulting from (a) the acoustic analysis of the recordings, (b) measures of communication efficiency, and (c) perceptual ratings of clarity collected for excerpts from the recordings.

The project investigates how children learn to adapt their speech to maximise communication effectiveness in difficult listening situations. Little is known about how this crucial skill develops and at what age it reaches maturity. A large corpus of speech from 96 children aged 9 to 14 years was recorded, with pairs of children working cooperatively to complete a ‘spot the difference’ picture task. The task was conducted in good and poor listening conditions so that we could analyse how children clarify their speech to overcome the communication barrier. We investigated how the ability to produce ‘clear’ speech develops, whether children make similar acoustic-phonetic enhancements to adults, and whether these enhancements vary according to the communication barrier. We also related the perceived clarity of individual speakers to the acoustic-phonetic characteristics of their speech. This work helps to expand models of speech production such as Lindblom’s Hyper-Hypo model. Understanding how children control their speech production is important for developing better strategies for communication with children with hearing or language impairments, and in noisy environments. The LUCID Child corpus (kidLUCID), fully transcribed and web-accessible, is a rich resource for a wide range of analyses of child speech.
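The signal processing used for the VOC condition is not distributed with this collection. As a rough illustration only, the sketch below shows what a noise-excited three-channel vocoder of the kind described above does to a speech signal. It is written in Python with NumPy and SciPy; the channel count is taken from the description, but the band edges, filter order, and envelope method are assumptions, not the project's actual parameters.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=3, lo=100.0, hi=8000.0):
    # Illustrative noise-excited channel vocoder (not the study's implementation):
    # split the signal into log-spaced frequency bands, extract each band's
    # amplitude envelope, and use it to modulate band-limited noise. With only
    # three channels, most spectral detail is removed, degrading intelligibility.
    x = np.asarray(x, dtype=float)
    edges = np.logspace(np.log10(lo), np.log10(hi), n_channels + 1)
    noise = np.random.randn(len(x))
    out = np.zeros_like(x)
    for k in range(n_channels):
        sos = butter(4, [edges[k], edges[k + 1]], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)            # speech energy in this band
        envelope = np.abs(hilbert(band))      # amplitude envelope of the band
        carrier = sosfiltfilt(sos, noise)     # band-limited noise carrier
        out += envelope * carrier
    return out / (np.max(np.abs(out)) + 1e-9) # normalise to avoid clipping

Applied to one channel of a recording (for example at the corpus's 44,100 Hz sampling rate), the output preserves the temporal envelope of the speech while discarding fine spectral structure, which is the kind of degradation the VOC condition imposes on the listening partner.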

A total of 96 children and adolescents aged between 9 and 14 years took part in the study (46 male, 50 female; mean age 11;8 years, range 9;0 to 15;0 years). Participants were non-bilingual native speakers of Southern British English who reported no history of hearing or language impairments. Pairs of children and adolescents carried out the diapix task, a 'spot the difference' game in which they had to describe their pictures to each other to work out what the differences were. The conversations were audio-recorded; during the recording, the two participants sat in different rooms and communicated via headsets fitted with a cardioid condenser microphone (Beyerdynamic DT297). The speech of each participant was recorded on a separate channel at a sampling rate of 44,100 Hz (16 bit) using an E-MU 0404 USB audio interface and Adobe Audition. For each pair of participants, six recordings were made, so that each participant acted as the 'lead speaker' once in each of the three conditions described above (NB, VOC, BAB).

For all recordings, each channel was orthographically transcribed according to a set of transcription guidelines. Word- and phoneme-level alignment software developed in-house at UCL was used to align the transcriptions with the waveform automatically and to create Praat TextGrids with separate word and phoneme tiers. The alignment was manually checked and corrected for Speaker A in two stages: first, the word-level alignment of all files was checked and adjusted where necessary; the corrected word-level TextGrids were then automatically re-aligned to correct the phoneme level. The alignment of three vowels ([i:], [o] and [ae:]) was subsequently verified and corrected by hand where necessary in these new TextGrid files. Acoustic analyses of fundamental frequency, articulation rate, pause rate, intensity, and vowel formant frequencies were carried out using Praat scripts, and the number of words and syllables produced was also calculated. Samples of the spontaneous speech were presented to adult listeners in a perceptual rating test in which they judged clarity on a 7-point scale.
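The Praat scripts used for the acoustic analyses are not part of this collection. As an illustration of how comparable measures (median fundamental frequency, mean intensity, and vowel formants at interval midpoints) can be read from a recording and its aligned TextGrid, the following Python sketch uses the praat-parselmouth bindings to Praat; the file names, tier numbers, vowel labels, pitch range, and formant ceiling are assumptions rather than the project's actual settings.

import numpy as np
import parselmouth
from parselmouth.praat import call

WAV = "pair01_NB_speakerA.wav"           # hypothetical file name
TEXTGRID = "pair01_NB_speakerA.TextGrid" # hypothetical file name
PHONE_TIER = 2                           # assumed: tier 1 = words, tier 2 = phonemes
VOWELS = {"i:", "o", "ae:"}              # assumed plain-text labels for the three vowels

snd = parselmouth.Sound(WAV)

# Median fundamental frequency over voiced frames (pitch range assumed)
pitch = snd.to_pitch(pitch_floor=100, pitch_ceiling=500)
f0 = pitch.selected_array["frequency"]
median_f0 = np.median(f0[f0 > 0])

# Mean intensity (dB) over the whole recording
intensity = snd.to_intensity()
mean_db = call(intensity, "Get mean", 0, 0, "energy")

# F1/F2 at the midpoint of each target vowel interval in the aligned TextGrid
formant = snd.to_formant_burg(maximum_formant=5500)  # ceiling for children's voices (assumed)
tg = parselmouth.read(TEXTGRID)
vowel_formants = []
n_intervals = int(call(tg, "Get number of intervals", PHONE_TIER))
for i in range(1, n_intervals + 1):
    label = call(tg, "Get label of interval", PHONE_TIER, i)
    if label in VOWELS:
        start = call(tg, "Get starting point", PHONE_TIER, i)
        end = call(tg, "Get end point", PHONE_TIER, i)
        mid = 0.5 * (start + end)
        vowel_formants.append((label, formant.get_value_at_time(1, mid),
                               formant.get_value_at_time(2, mid)))

print(f"median F0: {median_f0:.0f} Hz, mean intensity: {mean_db:.1f} dB")
print(vowel_formants[:5])

Measures such as articulation rate, pause rate, and word or syllable counts would additionally require the word-tier timings and the orthographic transcriptions, which are available in the corpus TextGrids.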

Identifier
DOI https://doi.org/10.5255/UKDA-SN-851525
Metadata Access https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=c9aba84b4f7eb481e8f5081e6859815d8785a8f3cb9cdfa1debed59760a49ce9
Provenance
Creator Hazan, V, University College London; Pettinato, M, University College London; Tuomainen, O, University College London
Publisher UK Data Service
Publication Year 2014
Funding Reference Economic and Social Research Council
Rights Valerie Hazan, University College London; The Data Collection is available to any user without the requirement for registration for download/access.
OpenAccess true
Representation
Resource Type Numeric
Discipline Acoustics; Engineering Sciences; Mechanical and Industrial Engineering; Mechanics and Constructive Mechanical Engineering; Psychology; Social and Behavioural Sciences
Spatial Coverage London; United Kingdom