The KROHOT corpus consists of 10 audio recordings of private, spontaneous conversations between two or three speakers, with a total duration of 232 minutes. Most recordings were made between May and September 2025. The conversations include recollections of past events that triggered spontaneous humorous reactions among the participants (conversational humour).
Segments containing humour were manually annotated using a tagging scheme developed specifically for this corpus. The scheme comprises five primary categories: vocabulary (lexical choice, including figurative use), relation (the relationship between the speakers), content (topical focus), attitude (the speaker's opinion of the topic), and manner (a deliberately humorous way of speaking). The categories are not mutually exclusive, so a single segment can be assigned more than one.
The corpus enables the analysis of linguistic and communicative phenomena in informal, private spoken conversation, including markers of humour and the strategies used to achieve humorous effects (e.g. teasing, mocking, irony, or metaphorical language).
The audio recordings are available in WAV format. The transcriptions, aligned with the audio, are provided in the formats of the EXMARaLDA (https://exmaralda.org/en/) and Transcriber (https://trans.sourceforge.net/) tools, as well as in plain text.