SpokenSTS

Spoken versions of the Semantic Textual Similarity (STS) dataset for testing semantic sentence-level embeddings. The dataset contains thousands of sentence pairs annotated by humans for semantic similarity. The spoken sentences can be fed to a sentence embedding model to test whether the model learns to capture sentence semantics. All sentences are available in 6 synthetic WaveNet voices, and a subset (5%) is available in 4 real voices recorded in a sound-attenuated booth. Code to train a visually grounded spoken sentence embedding model, together with evaluation code, is available at https://github.com/DannyMerkx/speech2image/tree/Interspeech21
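
One common way to run such a test is to embed both spoken sentences of each pair, compute the cosine similarity of the two embeddings, and correlate those similarities with the human ratings. The sketch below illustrates that idea only; it is not taken from the authors' evaluation code. The function names, the toy embeddings, and the toy ratings are all hypothetical, and the choice of Pearson correlation is an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def evaluate(embedding_pairs, human_ratings):
    """Correlate model similarities with human similarity ratings."""
    model_sims = [cosine(a, b) for a, b in embedding_pairs]
    return pearson(model_sims, human_ratings)


# Hypothetical toy data: three sentence pairs, each as two made-up embeddings,
# plus made-up human similarity ratings for the same pairs.
toy_pairs = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    ([0.0, 1.0, 0.0], [0.5, 0.5, 0.0]),
    ([0.0, 0.0, 1.0], [1.0, 0.0, 0.1]),
]
toy_ratings = [4.8, 2.5, 0.4]

score = evaluate(toy_pairs, toy_ratings)
print(f"Correlation with human ratings: {score:.3f}")
```

In practice the embeddings would come from the model under test, one per audio file, and the ratings from the dataset's annotations.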

The total size of the dataset exceeds the download limits. Please contact DANS at info@dans.knaw.nl for delivery of the dataset.

Identifier
DOI	https://doi.org/10.17026/dans-z48-3ev6
Metadata Access	https://ssh.datastations.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.17026/dans-z48-3ev6

Provenance
Creator	D.G.M. Merkx; S.L. Frank; M.T.C. Ernestus
Publisher	DANS Data Station Social Sciences and Humanities
Contributor	RU Radboud University
Publication Year	2022
Rights	CC BY 4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess	true
Contact	RU Radboud University

Representation
Resource Type	Dataset
Format	application/zip
Size	18181; 9675354515; 9671319514; 9671553240; 7572257296; 9677185662
Version	2.0
Discipline	Humanities
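
The Size field lists one value per file. As a rough check on the total download size, the values can be summed; the sketch below assumes the values are in bytes, which the record does not state explicitly.

```python
# File sizes copied from the Size field (assumed to be in bytes).
sizes = [18181, 9675354515, 9671319514, 9671553240, 7572257296, 9677185662]

total = sum(sizes)

# Report in decimal gigabytes (1 GB = 10**9 bytes).
print(f"{len(sizes)} files, {total} bytes, about {total / 1e9:.2f} GB")
```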