Replication data for: Garbage in, garbage out: An industrial perspective on drug absorption modeling

Abstract Lead optimization failures are often linked to poor absorption, compounded by strong efflux transport and low recovery. Here we report a comprehensive analysis and modeling of public and industrial data on the absorption of organic molecules. Comparative analysis of one pharma-industrial chemical space was used to examine the relationship between critical permeability parameters. Our findings highlight misconceptions in the characterization of transport routes. We demonstrate the importance of considering recovery, distribution coefficient, and topological polar surface area during Multi-Parameter Optimization (MPO). A Multi-Task Learning approach was employed for predictive model development. The models built on the public data were validated on the industrial data, revealing key discrepancies driven by variation in experimental protocols. Our analysis emphasizes the value of building models on proprietary data for industrial absorption evaluations, which avoids applicability domain issues and benefits from standardized measurement protocols. Finally, the integration of predictive models with Generative Topographic Mapping for chemical space exploration introduces a novel strategy to better understand optimization challenges. This work proposes a visual approach for MPO to improve drug discovery efficiency. The developed public models and curated public datasets are openly accessible.

This repository contains the data collected for this work.

Identifier
DOI https://doi.org/10.57745/ANHVUS
Metadata Access https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.57745/ANHVUS
Provenance
Creator LLOMPART, Pierre; MINOLETTI, Claire; MARCOU, Gilles; VARNEK, Alexandre (ORCID: 0000-0003-1886-925X)
Publisher Recherche Data Gouv
Contributor Marcou, Gilles; VARNEK, Alexandre; Université de Strasbourg; Centre national de la recherche scientifique; IDD/CADD, Sanofi, Vitry-Sur-Seine, France; Entrepôt-Catalogue Recherche Data Gouv
Publication Year 2025
Funding Reference CIFRE 2021/1684
Rights etalab 2.0; info:eu-repo/semantics/openAccess; https://spdx.org/licenses/etalab-2.0.html
OpenAccess true
Contact Marcou, Gilles (Laboratory of Chemoinformatics UMR 7140 ; University of Strasbourg, CNRS ; Strasbourg ; France); VARNEK, Alexandre (Laboratory of Chemoinformatics UMR 7140 ; University of Strasbourg, CNRS ; Strasbourg ; France)
Representation
Resource Type Dataset
Format text/tab-separated-values; text/plain
Size 5910417; 1170805; 1759755; 101943; 207585; 1825451; 3215235; 2528565; 11958871; 1758174; 436395; 387713; 218005; 358143; 222881; 1011904; 537314; 2847831; 5345; 31452; 951
Version 1.0
Discipline Chemistry; Natural Sciences
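
The Metadata Access link above is an OAI-PMH GetRecord request. As a minimal sketch, the same URL can be assembled from the DOI as shown below. The function name build_getrecord_url is hypothetical and introduced only for illustration. The base URL, verb, metadata prefix, and identifier scheme are copied from this record. No network request is made.

```python
from urllib.parse import urlencode

# Base endpoint taken from the Metadata Access field of this record.
BASE_URL = "https://entrepot.recherche.data.gouv.fr/oai"


def build_getrecord_url(doi: str, metadata_prefix: str = "oai_datacite") -> str:
    """Assemble an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the result matches the URL in this record.
    return f"{BASE_URL}?{urlencode(params, safe=':/')}"


url = build_getrecord_url("10.57745/ANHVUS")
print(url)
```

Fetching the resulting URL with any HTTP client should return the DataCite-formatted metadata for this dataset, assuming the endpoint is reachable.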