Dataset: Molecular Simulations Assisted by an Artificial Intelligence Agent

This dataset accompanies the paper Molecular Simulations Assisted by an Artificial Intelligence Agent (ArIA). It contains the code and full datasets needed to reproduce the results shown in the paper.

Dataset Structure

All scripts require uv (https://docs.astral.sh/uv/getting-started/installation/). All test scripts were tested on our local cluster with an L40S GPU.

This set contains three main directories:

App_deployment: This directory is used for deploying the ArIA chatbot. LoRA adapters trained in the Model_development directory are transferred here for use within the LangGraph framework.

Make_prompt: This directory contains scripts for generating synthetic prompts from ORCA input files, as well as synthetic reasoning texts (CoT, CoVe, ToT, GoT, and intrinsic reasoning) used for model fine-tuning. The ORCA input files were generated with the method described in this paper: https://doi.org/10.1039/D4DD00366G. Scripts for calculating F1 scores and classifying errors are also included in this directory.

Model_development: This directory is dedicated to LoRA adapter development. It includes execution of ORCA input files to verify that they run, along with validation and feedback agents.

Each directory is self-contained. Change into a directory to load and use the corresponding module.
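Because every script in the dataset requires uv, it can help to confirm that uv is available before entering a directory. The following is a minimal sketch, not part of the dataset itself; the function name `uv_version` is a hypothetical helper introduced here for illustration.

```python
import shutil
import subprocess


def uv_version():
    """Return the output of `uv --version`, or None if uv is not on PATH.

    Hypothetical helper for illustration; it is not shipped with the dataset.
    """
    uv_path = shutil.which("uv")
    if uv_path is None:
        return None
    # check=False: a non-zero exit code is reported as None instead of raising.
    result = subprocess.run(
        [uv_path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = uv_version()
    if version is None:
        print("uv not found; see https://docs.astral.sh/uv/getting-started/installation/")
    else:
        print(f"Found {version}")
```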

DOI of the preprint: https://chemrxiv.org/doi/full/10.26434/chemrxiv.15002344/v1

Identifier
DOI https://doi.org/10.34894/RNPTDS
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/RNPTDS
Provenance
Creator Chanmungkalakul, Supphachok; van der Ree, Michiel (ORCID: 0009-0005-0442-936X); Giuntoli, Andrea; Pollice, Robert
Publisher DataverseNL
Contributor Groningen Digital Competence Centre; DataverseNL Network
Publication Year 2026
Rights CC-BY-4.0; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0
OpenAccess true
Contact Groningen Digital Competence Centre (University of Groningen)
Representation
Resource Type Dataset
Format application/zip; text/markdown
Size 3550691608; 1125
Version 1.0
Discipline Chemistry; Natural Sciences
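
The Metadata Access URL listed under Identifier has the shape of an OAI-PMH GetRecord request. As a minimal sketch, the URL can be rebuilt from the dataset DOI as shown below. The endpoint, verb, metadata prefix, and identifier are taken from the record above; the function name `oai_getrecord_url` and the constant `OAI_ENDPOINT` are hypothetical names introduced for illustration. The sketch only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the Metadata Access field of this record.
OAI_ENDPOINT = "https://dataverse.nl/oai"


def oai_getrecord_url(doi, metadata_prefix="oai_datacite"):
    """Build an OAI-PMH GetRecord URL for a DOI (hypothetical helper)."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": f"doi:{doi}",
    }
    # Keep ':' and '/' unescaped so the result matches the form used in the record.
    return f"{OAI_ENDPOINT}?{urlencode(params, safe=':/')}"


if __name__ == "__main__":
    print(oai_getrecord_url("10.34894/RNPTDS"))
```

The resulting URL can be passed to any HTTP client to retrieve the metadata record for the dataset.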