Corpus of conversational humor Krohot 1.0

Dataset

The KROHOT corpus consists of 10 audio recordings of private, spontaneous conversations between two or three speakers, with a total duration of 232 minutes. Most recordings were made between May and September 2025. The conversations include recollections of past events that triggered spontaneous humorous reactions among the participants (conversational humour).

Segments containing humour were manually annotated using a tagging scheme developed specifically for this corpus. The scheme comprises five primary categories: vocabulary (lexical choice, including figurative use), relation (the relationship between the speakers), content (topical focus), attitude (the speaker’s opinion of the topic), and manner (a deliberately humorous way of speaking). The categories are not mutually exclusive, so a single segment can be assigned several of them.
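Because the five categories can be combined, one convenient way to handle them in analysis code is as bit flags, so that any combination fits in a single value. The following Python sketch is illustrative only: the class and function names are hypothetical, and the "+"-separated label syntax parsed by `parse_tags` is an assumption, not the encoding actually used in the corpus files.

```python
from enum import Flag, auto


class HumourCategory(Flag):
    """The five primary categories of the tagging scheme.

    Modelled as flags because the categories are not mutually
    exclusive: one segment may carry several at once.
    """
    VOCABULARY = auto()  # lexical choice, including figurative use
    RELATION = auto()    # relationship between speakers
    CONTENT = auto()     # topical focus
    ATTITUDE = auto()    # speaker's opinion of the topic
    MANNER = auto()      # deliberately humorous way of speaking


def parse_tags(label: str) -> HumourCategory:
    """Turn a label such as 'content+attitude' into a combined flag.

    The '+'-separated syntax is a hypothetical convention for this
    sketch; unknown category names raise KeyError.
    """
    result = HumourCategory(0)  # empty combination
    for part in label.split("+"):
        name = part.strip().upper()
        if name:
            result |= HumourCategory[name]
    return result


tags = parse_tags("content+attitude")
print(HumourCategory.ATTITUDE in tags)  # True
print(HumourCategory.MANNER in tags)    # False
```

Storing the combination as one value makes it cheap to count how often, say, content and attitude co-occur across annotated segments.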

The corpus enables the analysis of linguistic and communicative phenomena in informal, private spoken conversation, including markers of humour and the strategies used to achieve humorous effects (such as teasing, mocking, irony, or metaphorical language).

The audio is available as WAV recordings, and the aligned transcriptions are provided in the native formats of the EXMARaLDA (https://exmaralda.org/en/) and Transcriber (https://trans.sourceforge.net/) tools, as well as in plain text.
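To load the transcriptions programmatically, an EXMARaLDA basic transcription (.exb) can be read with the Python standard library. The sketch below assumes the usual basic-transcription layout: a `common-timeline` of `tli` points, plus `tier` elements whose `event`s reference those points by id. The function and type names are hypothetical, and since this description does not say how the humour tags are stored in the files, the sketch simply returns every event together with its tier's `speaker` and `category` attributes.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class Event(NamedTuple):
    speaker: Optional[str]        # speaker id from the tier, e.g. "SPK0"
    tier_category: Optional[str]  # tier category attribute, e.g. "v"
    start: Optional[float]        # seconds, or None for untimed points
    end: Optional[float]
    text: str


def exb_events(root: ET.Element) -> list[Event]:
    """Collect all events of a parsed <basic-transcription> element."""
    body = root.find("basic-body")
    # Map timeline point ids (e.g. "T0") to seconds; untimed points map to None.
    times = {}
    for tli in body.find("common-timeline").iter("tli"):
        t = tli.get("time")
        times[tli.get("id")] = float(t) if t is not None else None
    events = []
    for tier in body.iter("tier"):
        for ev in tier.iter("event"):
            events.append(Event(
                speaker=tier.get("speaker"),
                tier_category=tier.get("category"),
                start=times.get(ev.get("start")),
                end=times.get(ev.get("end")),
                text=(ev.text or "").strip(),
            ))
    # Order by start time across tiers; untimed events go last.
    events.sort(key=lambda e: (e.start is None, e.start or 0.0))
    return events


def read_exb(path: str) -> list[Event]:
    """Read all events from an .exb file on disk."""
    return exb_events(ET.parse(path).getroot())
```

For example, `read_exb("recording.exb")` (a hypothetical file name) returns the events in start-time order, ready to be grouped by speaker or filtered by tier category.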

Identifier
PID	http://hdl.handle.net/11356/2065
Related Identifier	https://doi.org/10.18690/um.ff.4.2024.10
Related Identifier	https://www.clarin.si/info/services/projects/#Corpus_of_conversational_humor_Krohot
Metadata Access	http://www.clarin.si/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:www.clarin.si:11356/2065
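The Metadata Access link is an OAI-PMH `GetRecord` request for the `oai_dc` (Dublin Core) metadata format. A minimal Python sketch of building that request and reading the Dublin Core fields from the response, using only the standard library, might look as follows; the function names are hypothetical, and the parsing assumes the standard OAI-PMH 2.0 and `oai_dc` namespaces.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "http://www.clarin.si/repository/oai/request"
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def get_record_url(identifier: str, prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL for one item."""
    query = urllib.parse.urlencode(
        {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier},
        safe=":/",  # keep ':' and '/' unescaped, as in the URL shown above
    )
    return f"{OAI_ENDPOINT}?{query}"


def dublin_core_fields(xml_text: str) -> dict[str, list[str]]:
    """Collect Dublin Core elements (title, creator, ...) from a GetRecord response."""
    root = ET.fromstring(xml_text)
    dc = root.find(".//oai_dc:dc", NS)
    fields: dict[str, list[str]] = {}
    if dc is None:  # e.g. an OAI-PMH error response
        return fields
    for child in dc:
        # Tags look like '{http://purl.org/dc/elements/1.1/}creator'.
        name = child.tag.split("}", 1)[-1]
        fields.setdefault(name, []).append((child.text or "").strip())
    return fields
```

With network access, the response could be fetched with `urllib.request.urlopen(get_record_url("oai:www.clarin.si:11356/2065")).read().decode("utf-8")` and passed to `dublin_core_fields`.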

Provenance
Creator	Krajnc Ivič, Mira; Mihailović, Larisa; Ivič, Dominik; Verdonik, Darinka
Publisher	Filozofska fakulteta, Univerza v Mariboru; Faculty of Electrical Engineering and Computer Science, University of Maribor
Publication Year	2025
Rights	Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0); https://creativecommons.org/licenses/by-nc-nd/4.0/; PUB
OpenAccess	true
Contact	info(at)clarin.si

Representation
Language	Slovenian; Slovene
Resource Type	corpus
Format	text/plain; charset=utf-8; application/octet-stream; application/zip; downloadable_files_count: 2
Discipline	Linguistics