Abstract
Lead optimization failures are often linked to poor absorption, compounded by strong efflux transport and low recovery. Here we report a comprehensive analysis and modeling of public and industrial data on the absorption of organic molecules. A comparative analysis of one pharma-industrial chemical space was used to examine the relationships among critical permeability parameters. Our findings highlight misconceptions in transport route characterization and demonstrate the importance of considering recovery, distribution coefficient, and topological polar surface area during Multi-Parameter Optimization (MPO). A Multi-Task Learning approach was employed for predictive model development. Models built on the public data were validated on the industrial data, revealing key discrepancies driven by variation in experimental protocols. Our analysis emphasizes the value of building models on proprietary data for industrial absorption evaluations, which avoids applicability domain issues and benefits from standardized measurement protocols. Finally, integrating the predictive models with Generative Topographic Mapping for chemical space exploration introduces a novel strategy to better understand optimization challenges. This work proposes a visual approach to MPO to improve drug discovery efficiency. The models developed on public data and the curated public datasets are publicly accessible.
This repository contains the data collected for this work.