α-Amino ester hydrolases (AEHs) offer a promising route to the stereoselective synthesis of β-lactams such as cephalexin. However, published kinetic studies have encountered difficulties when extended beyond fitting the data, which indicates practical non-identifiability of the underlying kinetic models. Here, we address this issue using Bayesian inference combined with a reaction-consistent neural ODE surrogate that substantially accelerates parameter estimation. This framework enables efficient development of complex enzyme kinetic models even on limited hardware while providing rigorous uncertainty quantification for all parameters. To account for batch-dependent differences, the active enzyme concentration was treated as a free parameter in each time series. Using this approach, the number of kinetic parameters was reduced from 12 to 9, yielding a kinetic model that is identifiable, mechanistically consistent, and predictive even at high substrate concentrations.
Available Models
models/model_04.json: The most comprehensive 12-parameter model including all major reaction pathways, competitive inhibition, substrate inhibition, and detailed enzyme regulation mechanisms. This model provides the most biologically detailed description but requires the most parameters to be estimated.
models/model_06.json: A streamlined 9-parameter model that simplifies some regulatory interactions while maintaining core kinetic behavior. This represents a good compromise between detail and parameter identifiability.
models/model_07.json: An intermediate 10-parameter model that includes additional regulatory terms compared to Model 06, capturing more complex enzyme behavior under varying substrate conditions.
models/model_08.json: An optimized 9-parameter model that balances predictive accuracy with parameter parsimony. This model was developed through systematic model reduction to retain essential kinetic features while minimizing parameter uncertainty.
models/model_04_no_e0.json: Identical to Model 04 but with fixed enzyme concentration (E₀) rather than estimating it from data. Use this when enzyme concentration is known or measured separately.
models/model_08_no_e0.json: Identical to Model 08 but with fixed enzyme concentration. This provides a direct comparison of modeling approaches with and without enzyme concentration estimation.
Model File Structure and Components
Each model file (JSON format) contains a complete mathematical description of the kinetic system:
Species definitions: Lists all chemical species with their names and symbolic identifiers used in equations
Constants: Quantities such as the enzyme concentration (p0) that are either held fixed or, when E₀ estimation is enabled, estimated from the data
ODEs: The system of ordinary differential equations describing how each species concentration changes over time. These equations encode the reaction kinetics and mass balances.
Parameters: Adjustable kinetic parameters (rate constants, binding affinities, inhibition constants) with their prior distributions for Bayesian inference
Algebraic assignments: Complex mathematical expressions that define reaction rates, enzyme-substrate complexes, and regulatory terms as functions of the parameters and species concentrations
The models use symbolic mathematics where enzyme-substrate complexes and reaction rates are expressed algebraically, making them both interpretable and computationally efficient.
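To get a first look at one of these files, a small inspection helper can list its top-level sections. The following is a minimal sketch, not part of the repository: the function name `summarize_model_file` is hypothetical, and the only assumption about the file is that its top level is a JSON object. No section names are assumed, so the output shows whatever keys the file actually contains.

```python
import json
from pathlib import Path


def summarize_model_file(path):
    """Return {top-level key: number of entries} for a model JSON file.

    Hypothetical helper: it assumes only that the top level of the file
    is a JSON object and makes no assumption about the key names.
    """
    with Path(path).open(encoding="utf-8") as handle:
        model = json.load(handle)

    summary = {}
    for key, value in model.items():
        # Lists and dicts have a natural size; scalars count as one entry.
        summary[key] = len(value) if isinstance(value, (list, dict)) else 1
    return summary


if __name__ == "__main__":
    model_path = Path("models/model_08.json")
    if model_path.exists():
        for section, size in summarize_model_file(model_path).items():
            print(f"{section}: {size} entries")
```

Run from the repository root, this prints one line per section of models/model_08.json, which makes it easy to check the number of parameters against the model descriptions above.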
System Requirements
Software Dependencies
The analysis pipeline relies on specialized Python packages for scientific computing, probabilistic programming, and machine learning, which are installed through the catalax package:
pip install catalax
Hardware Requirements
The computational analysis is moderately demanding due to Bayesian MCMC sampling and neural network training:
CPU: Multi-core processor (recommended: 12+ cores) - MCMC chains run in parallel across available cores for efficient sampling
RAM: 16GB minimum, 32GB recommended - Memory requirements peak during MCMC sampling when storing large arrays of posterior samples
Operating System and Python Version
Supported OS: Linux or macOS (tested primarily on macOS)
Python version: 3.10 or higher required for compatibility with JAX and NumPyro
Shell: Bash-compatible shell for running analysis scripts
How to Reproduce
Quick Start
Install dependencies:
pip install catalax
Train the neural ODE surrogate:
jupyter notebook TrainNeuralODE.ipynb
Run all cells to create trained/rateflowode.eqx
Run the complete analysis:
export XLA_FLAGS="--xla_force_host_platform_device_count=12" # Adjust number for your CPU cores
chmod +x fit_all.sh
./fit_all.sh
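The export line above applies to the shell that launches fit_all.sh. When working in a notebook or an interactive Python session instead, the same flag can be set from Python, as long as this happens before JAX is imported for the first time. The sketch below is an illustration under that assumption; the helper names `host_device_flag` and `set_host_devices` are hypothetical and not part of the repository.

```python
import os

FLAG_NAME = "--xla_force_host_platform_device_count"


def host_device_flag(n_devices=None):
    """Build the XLA flag that exposes `n_devices` virtual CPU devices."""
    if n_devices is None:
        # os.cpu_count() can return None on unusual platforms; fall back to 1.
        n_devices = os.cpu_count() or 1
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return f"{FLAG_NAME}={n_devices}"


def set_host_devices(n_devices=None):
    """Set XLA_FLAGS for this process; call before the first `import jax`."""
    existing = os.environ.get("XLA_FLAGS", "")
    # Keep unrelated flags that are already set, replace an older device count.
    kept = [flag for flag in existing.split() if not flag.startswith(FLAG_NAME)]
    os.environ["XLA_FLAGS"] = " ".join(kept + [host_device_flag(n_devices)])
    return os.environ["XLA_FLAGS"]


if __name__ == "__main__":
    print(set_host_devices())
```

Setting the device count to the number of MCMC chains (12 in this analysis) lets each chain run on its own virtual device; on machines with fewer cores, a smaller value is the safer choice.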
What This Does
The analysis pipeline:
Uses Bayesian inference (MCMC) to estimate kinetic parameters with uncertainty quantification
Compares multiple model complexities (Models 04, 06, 07, 08)
Treats enzyme concentration as a free parameter in each experiment
Generates diagnostic plots and statistical summaries
Saves all results to the results/ directory
Individual Model Analysis
To analyze just one model:
python run_inference.py models/model_08.json
For analysis without enzyme concentration estimation:
python run_inference.py models/model_08_no_e0.json --no-e0
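To run a chosen subset of models without editing fit_all.sh, the two command forms above can be generated in a loop. This is a sketch only: the helper `inference_command` is hypothetical, the model list is taken from this README, and whether fit_all.sh issues exactly these commands is an assumption. The example prints the commands as a dry run rather than executing them.

```python
import shlex
import sys
from pathlib import Path

# Model files named in this README.
MODEL_FILES = [
    "models/model_04.json",
    "models/model_06.json",
    "models/model_07.json",
    "models/model_08.json",
    "models/model_04_no_e0.json",
    "models/model_08_no_e0.json",
]


def inference_command(model_file, python=sys.executable):
    """Return the argument list for one run_inference.py call."""
    command = [python, "run_inference.py", model_file]
    # Files ending in _no_e0 are run with the --no-e0 flag, as shown above.
    if Path(model_file).stem.endswith("_no_e0"):
        command.append("--no-e0")
    return command


if __name__ == "__main__":
    for model_file in MODEL_FILES:
        # Dry run: print each command instead of executing it.
        print(shlex.join(inference_command(model_file)))
```

To execute a command rather than print it, the argument list can be passed to `subprocess.run(command, check=True)`.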
Outputs
Statistical Results Files
These files contain the quantitative outcomes of the parameter estimation and model evaluation:
{model_name}_summary.csv: Comprehensive MCMC parameter statistics including posterior means, standard deviations, 95% credible intervals, effective sample sizes (ESS), and R-hat convergence diagnostics. This file provides the key numerical results for parameter interpretation.
{model_name}_samples.nc: Complete posterior distribution samples stored in NetCDF format. Contains 10,000 samples × 12 chains for each parameter, enabling detailed uncertainty analysis, prediction intervals, and further statistical computations.
{model_name}_metrics.json: Model performance metrics including various error measures (L1, L2 losses), coefficient of determination (R²), Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). These metrics allow comparison of model quality and complexity.
{model_name}_mean_e0.npy: Estimated enzyme concentrations for each experimental measurement (when E₀ estimation is enabled). This file contains the posterior mean enzyme concentrations that can be used for subsequent analyses or experimental validation.
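For reading the AIC and BIC values in the metrics file, it helps to recall their standard definitions: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the number of data points, and ln L the maximized log-likelihood. The sketch below implements these textbook formulas. How the pipeline evaluates the log-likelihood, and whether the per-experiment enzyme concentrations count toward k, is not stated here, so the numbers in the example are purely illustrative and are not results from this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_points):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_points) - 2 * log_likelihood


if __name__ == "__main__":
    # Made-up values for illustration only, not results from this study.
    n_points = 200
    candidates = [("larger model", -150.0, 12), ("smaller model", -152.0, 9)]
    for name, log_lik, k in candidates:
        print(f"{name}: AIC={aic(log_lik, k):.1f}, BIC={bic(log_lik, k, n_points):.1f}")
```

For both criteria, lower values are better. BIC penalizes each additional parameter more strongly than AIC once n exceeds about 7 data points (ln n > 2), so a reduced model can win on BIC even with a slightly lower likelihood.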
Visualization Outputs (plots/ subdirectory)
Diagnostic and result plots for model assessment and interpretation:
Trace plots: Time series of MCMC samples for each parameter, allowing visual inspection of mixing and convergence
Corner plots: Two-dimensional projections of parameter correlations and marginal distributions
Posterior distributions: Histograms and density plots showing parameter uncertainty
Model fit plots: Comparison of model predictions vs. experimental data over time
MCMC diagnostics: Monte Carlo Standard Error (MCSE) and Effective Sample Size (ESS) plots to assess sampling quality
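The idea behind the R-hat diagnostic, which complements the trace plots, can be illustrated with the classic Gelman-Rubin statistic: it compares the variance between chains with the variance within chains, and values close to 1 suggest that the chains have mixed. The following is a minimal sketch of that classic, non-split form using only the standard library. Current libraries usually report a split, rank-normalized variant, and which variant the pipeline writes to the summary file is not stated here, so the numbers will not match exactly.

```python
import random
from statistics import fmean, variance


def gelman_rubin(chains):
    """Classic (non-split) R-hat for one parameter.

    `chains` is a list of equally long lists of samples, one list per chain.
    """
    n_chains = len(chains)
    n_draws = len(chains[0])
    if n_chains < 2 or n_draws < 2:
        raise ValueError("need at least 2 chains with at least 2 draws each")
    if any(len(chain) != n_draws for chain in chains):
        raise ValueError("all chains must have the same length")

    chain_means = [fmean(chain) for chain in chains]
    # W: mean within-chain variance; B: between-chain variance scaled by n.
    within = fmean([variance(chain) for chain in chains])
    if within == 0:
        raise ValueError("within-chain variance is zero")
    between = n_draws * variance(chain_means)
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return (pooled / within) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Four chains sampling the same distribution versus two pairs stuck apart.
    mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
    stuck = [[rng.gauss(shift, 1.0) for _ in range(500)] for shift in (0.0, 0.0, 3.0, 3.0)]
    print(f"well mixed:   {gelman_rubin(mixed):.3f}")
    print(f"poorly mixed: {gelman_rubin(stuck):.3f}")
```

Chains that explore different regions produce a large between-chain term and therefore an R-hat well above 1, which is the same pattern that shows up as separated bands in a trace plot.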
Fitted Model Files (models/ subdirectory)
Updated model definitions with estimated parameters:
{model_name}_bi.json: Model with parameters set to their Bayesian posterior means. The posterior mean is a point summary of the parameter distribution given the data and priors (it need not coincide with the most probable value) and is suitable for point predictions and further analysis.
{model_name}_fitted.json: Model with parameters optimized using deterministic methods. These parameters minimize prediction errors and are typically used for the best-fit model predictions.
Catalax, 0.5.2
Python, 3.11