This dataset was created for teaching the undergraduate course BINF200 - Analysis of Biological Sequences and Structures, first taught in Autumn 2023.
The dataset contains (1) the processed, filtered and sample-annotated sequencing reads in FASTA format from Jayaraman et al. (2019), repackaged into individual per-sample files, and (2) a local database of variant surface glycoprotein (VSG) sequences from the Trypanosoma brucei reference strain TREU927.
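A minimal way to get a first overview of the per-sample read files is to parse each FASTA file and count its reads and total bases. The sketch below uses only the Python standard library. The directory name and the `*.fasta` file extension in `summarize_dir` are assumptions, not the dataset's documented layout, and the function names are hypothetical. The parser treats everything after `>` as an opaque header, so it makes no assumption about how sample annotations are encoded.

```python
import io
from pathlib import Path
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream.

    Multi-line sequences are joined; blank lines are skipped.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    # Emit the last record in the stream.
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle: TextIO) -> Tuple[int, int]:
    """Return (number of reads, total number of bases) for one FASTA stream."""
    n_reads = 0
    n_bases = 0
    for _, seq in read_fasta(handle):
        n_reads += 1
        n_bases += len(seq)
    return n_reads, n_bases


def summarize_dir(directory: str, pattern: str = "*.fasta") -> Dict[str, Tuple[int, int]]:
    """Summarize every per-sample file in a directory.

    Hypothetical helper: the directory and file pattern are assumptions.
    """
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        with path.open() as fh:
            results[path.name] = summarize(fh)
    return results


if __name__ == "__main__":
    # Synthetic two-read example, not taken from the dataset.
    demo = io.StringIO(">read1\nACGT\nAC\n>read2\nGGG\n")
    print(summarize(demo))  # (2, 9)
```

For the real files, something like `summarize_dir("reads")` would return a dictionary mapping each file name to its read and base counts, assuming the files have been downloaded into a local `reads` directory.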
The original data are available from the Gene Expression Omnibus (GEO) under series accession GSE114843. Further details about the original data are available at github.com/siddharthjayaraman/longread-application (archived at https://doi.org/10.5281/zenodo.10043245).
The database of TREU927 VSG sequences is based on the (outdated) TREU927 genome v26. It is included for educational purposes and to reproduce results from the original publication. For research purposes, always use the most recent genome version at tritrypdb.org.
The code used to generate this dataset, as well as educational notebooks with downstream analyses, is available at github.com/tmichoel/BINF200-bio-sequences-structures (archived at https://doi.org/10.5281/zenodo.10043222).
Consider using Tree View to browse the files efficiently.