Protein structures predicted using DMPfold2, plus training data

Dataset

This dataset comprises predicted protein structures from the paper "Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins". Structures were predicted using DMPfold2.

The file BFD_1.3M.hdf5 contains all the models from the set of 1.3M that were generated. The models can be retrieved from this file using the provided hdf5_extract.py script and the list of IDs in bfdfold_1.3M_target_ids.csv.

Also provided are tarballs of the models and sequence alignments for the 5193 Pfam families modelled in the paper, as well as for the set of 255 Pfams with released structures used for comparisons against DMPfold1 and C-I-TASSER.

The file train_data.tar.bz2 contains the data used to train the DMPfold2 neural network. Further scripts and instructions are available on the associated GitHub page: https://github.com/psipred/DMPfold2
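
The tarballs can be inspected before unpacking them in full. The following is a minimal sketch using only the Python standard library, not a script shipped with the dataset. The helper names `list_members` and `extract_all` and the `train_data` output directory are hypothetical, and nothing is assumed about the layout inside the archive. For the HDF5 file, the provided hdf5_extract.py script remains the intended route.

```python
import tarfile
from pathlib import Path


def list_members(archive_path, limit=10):
    """Return the names of up to `limit` members of a bzip2-compressed tarball."""
    names = []
    # "r:bz2" opens a bzip2-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:bz2") as archive:
        for member in archive:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


def extract_all(archive_path, dest_dir):
    """Extract every member into dest_dir, refusing paths that would escape it."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:bz2") as archive:
        for member in archive.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Newer Python versions offer a stricter built-in extraction filter.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest, **kwargs)
    return dest


if __name__ == "__main__":
    # Assumes train_data.tar.bz2 has been downloaded to the working directory.
    print(list_members("train_data.tar.bz2"))
    extract_all("train_data.tar.bz2", "train_data")
```

Listing a few members first gives a quick view of the directory structure without committing disk space to a full extraction.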

Identifier
DOI	https://doi.org/10.5522/04/14979990.v3
Related Identifier	HasPart https://ndownloader.figshare.com/files/28839921
Related Identifier	HasPart https://ndownloader.figshare.com/files/28839924
Related Identifier	HasPart https://ndownloader.figshare.com/files/28839930
Related Identifier	HasPart https://ndownloader.figshare.com/files/28839933
Related Identifier	HasPart https://ndownloader.figshare.com/files/28839936
Related Identifier	HasPart https://ndownloader.figshare.com/files/28912617
Related Identifier	HasPart https://ndownloader.figshare.com/files/28912623
Related Identifier	HasPart https://ndownloader.figshare.com/files/31185027
Related Identifier	HasPart https://ndownloader.figshare.com/files/33897233
Metadata Access	https://api.figshare.com/v2/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:figshare.com:article/14979990

Provenance
Creator	Kandathil, Shaun; Lau, Andy; Greener, Joe; Jones, David
Publisher	University College London (UCL)
Contributor	Figshare
Publication Year	2022
Rights	https://creativecommons.org/licenses/by/4.0/
OpenAccess	true
Contact	researchdatarepository(at)ucl.ac.uk

Representation
Language	English
Resource Type	Dataset
Discipline	Biology; Life Sciences