The Torlak corpus represents a spoken variety of the endangered Torlak dialect from the Timok area in Southeast Serbia. It comprises transcripts of interviews with the local population, collected in the field between 2015 and 2017. The semi-structured interviews elicited spontaneous speech in the form of long narratives about traditional culture and history. The corpus is made up of semi-orthographic transcripts of 86.5 hours of recordings from locations evenly distributed across the Timok area of the Torlak dialect zone. The dialect is presently under the influence of the more prestigious Standard Serbian variety and exhibits a great deal of variation in the use of non-standard features. The corpus contains samples from typical representatives of the dialect, who show little influence from the standard, as well as from a smaller group of speakers who use both dialect and standard features. The corpus contains 489,021 tokens with accentuation, morphosyntactic tags and lemmatisation. Accentuation was done manually by trained transcribers. Morphosyntactic annotation and lemmatisation (available in the TEI and vertical formats of the corpus) were done automatically, with minor manual corrections. The morphosyntactic tags follow the MULTEXT-East specifications for Torlak, cf. https://github.com/clarinsi/mte-msd.