GraphML files for protein sequence networks of expansin homologues


GraphML files for undirected, weighted graphs whose nodes represent protein sequences of expansin homologues. Protein sequences were clustered at a sequence identity threshold to derive representative sequences. Pairwise sequence identity between two sequences was derived from a global Needleman-Wunsch alignment. Protein sequence networks were generated with pairwise sequence identity as edge weights, and edges were filtered by a predefined threshold. Metadata of the nodes (e.g. annotations) and of the edges (the edge weights) are summarized in the GraphML files.
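The construction described above can be illustrated with a minimal Python sketch. It is not the pipeline used to produce this dataset: the scoring scheme (match 1, mismatch -1, linear gap -1), the identity definition (identical columns divided by alignment length, in percent), the "keep edges at or above the threshold" rule, and the function names are all assumptions made for illustration.

```python
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return one optimal global alignment of a and b as two gapped strings.

    The scoring values are illustrative assumptions, not the dataset's settings.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned columns as a percentage of the alignment length (assumed definition)."""
    aln_a, aln_b = needleman_wunsch(a, b)
    if not aln_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)


def build_edges(seqs, threshold):
    """Return (id_a, id_b, weight) for every pair whose identity reaches the threshold."""
    edges = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(sorted(seqs.items()), 2):
        weight = percent_identity(seq_a, seq_b)
        if weight >= threshold:
            edges.append((id_a, id_b, weight))
    return edges
```

For example, `build_edges({"s1": "MKTAYIAK", "s2": "MKTAYIAK", "s3": "WWWWWWWW"}, 50)` keeps only the edge between the two identical toy sequences. The quadratic-memory alignment here is only suitable for short examples; real protein sets would normally be aligned with an established tool.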

The GraphML attributes for the edges comprise the edge weights (pairwise sequence identity, "weight"). The GraphML attributes for the nodes comprise the identifiers from the ExED ("sequence_id", "protein_id", "hfam_id", and "sfam_id" for sequence, protein, homologous family and superfamily identifiers, respectively), the NCBI taxonomy ID ("tax_id"), the annotated (organism) source name ("tax_name"), the taxonomic lineage of the source organism ("lineage", with taxa separated by "<--"), and the length of the amino acid sequence ("sequence_length"). In addition, suggested color names are given for both fill color and border color of each node ("color" and "color_border").
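As a rough sketch of how these attributes might be read, the following Python example parses a GraphML document with the standard library and splits the lineage on "<--". The embedded sample is hypothetical: its key ids ("d0", "d1", ...), node ids, and all values are placeholders, and only the attribute names are taken from the description above. It also assumes the files use the standard GraphML namespace and store each attribute in a `<data>` element.

```python
import io
import xml.etree.ElementTree as ET

# Standard GraphML namespace (assumed to be used by the dataset files).
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical two-node sample; values are placeholders, not real records.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="sequence_id" attr.type="string"/>
  <key id="d1" for="node" attr.name="lineage" attr.type="string"/>
  <key id="d2" for="node" attr.name="sequence_length" attr.type="int"/>
  <key id="d3" for="edge" attr.name="weight" attr.type="double"/>
  <graph edgedefault="undirected">
    <node id="n0">
      <data key="d0">1</data>
      <data key="d1">taxon_a&lt;--taxon_b</data>
      <data key="d2">250</data>
    </node>
    <node id="n1">
      <data key="d0">2</data>
      <data key="d1">taxon_c&lt;--taxon_b</data>
      <data key="d2">245</data>
    </node>
    <edge source="n0" target="n1"><data key="d3">0.8</data></edge>
  </graph>
</graphml>
"""


def read_graphml(source):
    """Return (nodes, edges) from a GraphML file path or binary file object.

    nodes maps node id -> {attribute name: text value};
    edges is a list of (source, target, weight) tuples.
    """
    root = ET.parse(source).getroot()
    # Map key ids (e.g. "d0") to their declared attribute names.
    keys = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}

    def attrs(elem):
        return {keys.get(d.get("key"), d.get("key")): (d.text or "")
                for d in elem.findall("g:data", NS)}

    graph = root.find("g:graph", NS)
    nodes = {n.get("id"): attrs(n) for n in graph.findall("g:node", NS)}
    edges = [(e.get("source"), e.get("target"), float(attrs(e)["weight"]))
             for e in graph.findall("g:edge", NS)]
    return nodes, edges


nodes, edges = read_graphml(io.BytesIO(SAMPLE.encode("utf-8")))
lineage = nodes["n0"]["lineage"].split("<--")  # list of taxa for node n0
```

Because `ET.parse` loads the whole document into memory, a streaming parser such as `xml.etree.ElementTree.iterparse` may be preferable for the larger files in this dataset.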

Creator Lohoff, Caroline (Universität Stuttgart)
Publisher DaRUS
Contributor Pleiss, Jürgen
Publication Year 2020
Rights CC BY 4.0; info:eu-repo/semantics/openAccess
OpenAccess true
Contact Pleiss, Jürgen (Universität Stuttgart)
Resource Type Dataset
Format text/xml-graphml
Size 74017190; 187334563; 26408406; 21810184; 90050475
Version 1.1
Discipline Life Sciences; Medicine