Clustered IMG VR v3 file: IMGVR70

How was this file created? All proteins (n=66,585,678) were retrieved from the IMG/VR v3 database (https://genome.jgi.doe.gov/portal/IMG_VR/IMG_VR.home.html), version IMG_VR_2020-10-12_5.1 (https://doi.org/10.1093/nar/gkaa946). We used MMseqs2 (https://doi.org/10.1038/nbt.3988) for similarity-based clustering at a 70% sequence identity threshold (default greedy mode, with 80% reciprocal coverage of target and query). We then extracted one representative sequence per cluster (n=16,555,061) to build the IMGVR70.fa file.
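The exact command line is not given in this record, so the following Python sketch is only an illustration of how the stated thresholds could map onto MMseqs2 options. It assumes the `easy-cluster` workflow was used (the authors may have run `createdb` and `cluster` separately). The function names and the file paths are hypothetical. The second helper counts FASTA headers, which is one way to check a downloaded file against the reported n=16,555,061.

```python
import gzip
from pathlib import Path


def build_cluster_command(fasta, out_prefix, tmp_dir):
    """Return an MMseqs2 argument list for the thresholds stated above.

    Assumption: the easy-cluster workflow; the record does not name the
    exact workflow or any other options that were used.
    """
    return [
        "mmseqs", "easy-cluster", str(fasta), str(out_prefix), str(tmp_dir),
        "--min-seq-id", "0.7",  # 70% sequence identity threshold
        "-c", "0.8",            # 80% alignment coverage
        "--cov-mode", "0",      # coverage required on both query and target
    ]


def count_fasta_records(path):
    """Count header lines (starting with '>') in a plain or gzipped FASTA file."""
    path = Path(path)
    # Pick the opener from the file extension; both are read in text mode.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call.
    print(" ".join(build_cluster_command("IMGVR_proteins.fa", "IMGVR70", "tmp")))
```

The argument list can be passed to `subprocess.run` on a machine where MMseqs2 is installed. Calling `count_fasta_records` on the downloaded file (plain or gzipped) should return the representative count reported above if the download is complete.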

Identifier
DOI https://doi.org/10.15454/RZDFOR
Metadata Access https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.15454/RZDFOR
Provenance
Creator Florian Maumus
Publisher Recherche Data Gouv
Contributor Francillonne, Nicolas; Florian Maumus
Publication Year 2021
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Contact Francillonne, Nicolas (INRAE); Florian Maumus (INRAE)
Representation
Resource Type Dataset
Format application/gzip
Size 2394973344
Version 1.0
Discipline Agriculture, Forestry, Horticulture; Life Sciences; Plant Science; Biology; Omics