KIParla - ParlaTO transcripts


The ParlaTO corpus is part of the larger KIParla collection (www.kiparla.it), which can be freely queried through the NoSketch Engine interface.

The ParlaTO corpus was funded by the CRT Foundation ("ParlaTO - Corpus del Parlato di Torino" project).

It consists of about 50 hours of interactions collected in Turin and its province through semi-structured interviews. The interviews, conducted between 2018 and 2020, involved 88 speakers with different origins, ages, education levels, and types of occupation, and addressed personal life experiences in the city (study, work, leisure activities, retirement, memories of the past, etc.). The transcriptions have been anonymized.

Overall, the module is made up of 68 conversations and includes 100 speakers.

This repository contains:
• metadata for both speakers (occupation, gender, age, origin, L1, educational achievement) and conversations (collection point, year, languages used), in the metadata subfolder;
• descriptions of the set of transcription conventions used for this module;
• for each conversation:
  • a .eaf file in the eaf/ folder (time-aligned Jefferson-style transcription);
  • a .txt file in the linear-jefferson/ folder (linearized Jefferson-style transcription);
  • a .txt file in the linear-orthographic/ folder (linearized transcription retaining only orthographic words);
  • a .tsv file in the tsv/ folder (tokenised version of the transcription).
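Because every conversation is distributed in four parallel formats, a first step when working with the download is usually to check which formats are present for each conversation. The Python sketch below does this. It assumes (not stated in this description) that files for the same conversation share a base name across the eaf/, linear-jefferson/, linear-orthographic/ and tsv/ folders, and that each .tsv file starts with a header row. The function names `index_conversations`, `missing_formats` and `read_tokens` are hypothetical helpers. The README.md file remains the authoritative description of the column layout.

```python
import csv
from pathlib import Path

# Folder names as listed in the repository description, mapped to the
# file extension expected in each folder.
FOLDERS = {
    "eaf": ".eaf",
    "linear-jefferson": ".txt",
    "linear-orthographic": ".txt",
    "tsv": ".tsv",
}


def index_conversations(root):
    """Map each conversation ID (file stem) to {folder: path}.

    Assumption: the same conversation uses the same base name in every folder.
    """
    root = Path(root)
    index = {}
    for folder, ext in FOLDERS.items():
        directory = root / folder
        if not directory.is_dir():
            continue  # folder absent from this copy of the corpus
        for path in sorted(directory.glob(f"*{ext}")):
            index.setdefault(path.stem, {})[folder] = path
    return index


def missing_formats(index):
    """Return {conversation ID: [missing folders]} for incomplete conversations."""
    report = {}
    for conv, files in index.items():
        missing = sorted(set(FOLDERS) - set(files))
        if missing:
            report[conv] = missing
    return report


def read_tokens(tsv_path):
    """Read a tokenised .tsv file into dicts keyed by its (assumed) header row.

    QUOTE_NONE keeps quote characters that may occur in transcription
    symbols as literal text instead of treating them as CSV quoting.
    """
    with open(tsv_path, encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE))
```

Disabling CSV quoting is a deliberate choice: Jefferson-style notation uses punctuation as transcription symbols, so it is safer to split only on tabs.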

More information can be found in the README.md file.

Due to GDPR restrictions, pseudo-anonymized audio files (MP3) are available under a restricted-access license. To request access, please contact the corpus coordinators through the KIParla website and follow the provided procedure.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Identifier
PID http://hdl.handle.net/20.500.11752/OPEN-1051
Related Identifier http://ceur-ws.org/Vol-2481/
Related Identifier https://doi.org/10.60760/unibo/parlato
Related Identifier https://kiparla.it/parlato/
Metadata Access http://dspace-clarin-it.ilc.cnr.it/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/OPEN-1051
Provenance
Creator Ballarè, Silvia; Cerruti, Massimo
Publisher Università degli studi di Torino
Publication Year 2020
Rights Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); http://creativecommons.org/licenses/by-nc-sa/4.0/; PUB
OpenAccess true
Contact dspace-clarin-it-ilc-help(at)ilc.cnr.it
Representation
Language Italian
Resource Type corpus
Format application/zip; application/octet-stream; text/plain; charset=utf-8; downloadable_files_count: 4
Discipline Linguistics
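The Metadata Access entry above is a standard OAI-PMH GetRecord request that returns the record as Dublin Core (metadataPrefix=oai_dc). The Python sketch below builds that request from the base URL and identifier shown in this record and parses the Dublin Core fields into a dictionary. It uses only the standard library and the standard OAI-PMH, oai_dc and Dublin Core namespaces. The helper names (`get_record_url`, `parse_dc`, `fetch_dc`) are hypothetical. The repository may return additional or differently ordered fields.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Standard namespaces used in OAI-PMH responses with oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Base URL and identifier taken from the Metadata Access entry of this record.
BASE_URL = "http://dspace-clarin-it.ilc.cnr.it/repository/oai/request"
IDENTIFIER = "oai:dspace-clarin-it.ilc.cnr.it:20.500.11752/OPEN-1051"


def get_record_url(base_url=BASE_URL, identifier=IDENTIFIER):
    """Build an OAI-PMH GetRecord URL requesting Dublin Core metadata."""
    query = urllib.parse.urlencode(
        {"verb": "GetRecord", "metadataPrefix": "oai_dc", "identifier": identifier}
    )
    return f"{base_url}?{query}"


def parse_dc(xml_text):
    """Collect Dublin Core elements as {field name: [values in document order]}."""
    root = ET.fromstring(xml_text)
    dc = root.find(".//oai_dc:dc", NS)
    fields = {}
    if dc is None:
        return fields  # e.g. an OAI-PMH error response with no record
    dc_prefix = "{" + NS["dc"] + "}"
    for elem in dc:
        if elem.tag.startswith(dc_prefix):
            name = elem.tag[len(dc_prefix):]
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields


def fetch_dc(url=None, timeout=30):
    """Download the record (network access required) and parse it."""
    with urllib.request.urlopen(url or get_record_url(), timeout=timeout) as resp:
        return parse_dc(resp.read())
```

Calling `fetch_dc()` would return, for example, the creators under the "creator" key. Parsing is kept separate from downloading so that saved responses can be processed offline.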