Educational dataset: Application of long read sequencing to determine expressed antigen diversity in Trypanosoma brucei infections

This dataset was created for teaching the undergraduate course BINF200 - Analysis of Biological Sequences and Structures, and was first used in Autumn 2023.

The dataset contains (1) the processed, filtered, and sample-annotated sequencing reads in FASTA format from Jayaraman et al. (2019), repackaged into one file per sample, and (2) a local database of variant surface glycoprotein (VSG) sequences from the Trypanosoma brucei reference strain TREU927.
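As a minimal sketch of how one of the per-sample FASTA files might be inspected, the Python snippet below parses FASTA records and reports the read count and mean read length. It uses only the standard library. The function names and the file name sample.fasta are hypothetical and not part of the dataset; the header layout of the actual files is not assumed, so headers are returned verbatim.

```python
import io
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    The header is returned without the leading '>' and is otherwise
    left untouched; multi-line sequences are joined into one string.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_reads(lines: Iterable[str]) -> Dict[str, float]:
    """Return the read count, total bases, and mean read length."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    n = len(lengths)
    return {
        "reads": n,
        "total_bases": sum(lengths),
        "mean_length": sum(lengths) / n if n else 0.0,
    }


if __name__ == "__main__":
    # In-memory stand-in for a real file; for an actual sample one would
    # use e.g. `with open("sample.fasta") as handle:` (hypothetical name).
    demo = io.StringIO(">read1\nACGT\nAC\n>read2\nGGG\n")
    print(summarize_reads(demo))
```

The parser works on any iterable of lines, so the same functions apply to an open file handle or to text already held in memory.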

The original data are available in GEO dataset GSE114843. Further details about the original data are available at github.com/siddharthjayaraman/longread-application (archived at https://doi.org/10.5281/zenodo.10043245).

The database of TREU927 VSG sequences is based on the (outdated) TREU927 genome v26. It is included for educational purposes and to allow the results of the original publication to be reproduced. For research purposes, always use the most recent genome version at tritrypdb.org.

The code used to generate this dataset, together with educational notebooks for downstream analyses, is available at github.com/tmichoel/BINF200-bio-sequences-structures (archived at https://doi.org/10.5281/zenodo.10043222).

Consider using Tree View to browse the files efficiently.

Identifier
DOI https://doi.org/10.18710/FFANM0
Related Identifier https://doi.org/10.1371/journal.pntd.0007262
Metadata Access https://dataverse.no/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.18710/FFANM0
Provenance
Creator Michoel, Tom
Publisher DataverseNO
Contributor Michoel, Tom; Morrison, Liam (The University of Edinburgh); Jayaraman, Siddharth (The University of Edinburgh); University of Bergen
Publication Year 2023
Funding Reference Royal Society UF090083 ; Royal Society UF140610 ; Royal Society RG110378 ; BBSRC BB/J004227/1 ; BBSRC BB/J004235/1 ; BBSRC BBS/E/D/20002173
Rights CC0 1.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/publicdomain/zero/1.0
OpenAccess true
Contact Michoel, Tom (University of Bergen)
Representation
Resource Type sequencing data; Dataset
Format text/plain; application/octet-stream
Size 5324; 512909223; 31545649; 23962094; 22246377; 29394269; 33589359; 39316971; 30613434; 37892945; 11613340; 36859733; 6671747; 24187634; 16175290; 20883261; 20666356; 24555636; 29750177; 29554042; 22812408; 20321564; 2350389; 481397; 18784; 485559
Version 1.0
Discipline Life Sciences; Medicine