This dataset accompanies the paper "Molecular Simulations Assisted by an Artificial Intelligence Agent (ArIA)". It contains the code and full datasets needed to reproduce the results shown in the paper.
Dataset Structure
All scripts require uv (https://docs.astral.sh/uv/getting-started/installation/). All test scripts were tested on our local cluster with an NVIDIA L40S GPU.
The dataset contains three main directories:
App_deployment
This directory is used for deploying the ArIA chatbot. LoRA adapters trained in the Model_development directory are transferred here for use within the LangGraph framework.
Make_prompt
This directory contains scripts for generating synthetic prompts from ORCA input files, as well as synthetic reasoning texts (CoT, CoVe, ToT, GoT, and intrinsic reasoning) used for model fine-tuning.
ORCA input files were generated using the method described in https://doi.org/10.1039/D4DD00366G.
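As a rough illustration of how an ORCA input file can be turned into a synthetic prompt, the sketch below parses the "!" simple-input line and an inline "* xyz charge multiplicity" coordinate block, then fills a prompt template. The function names (parse_orca_input, hill_formula, make_prompt) and the template wording are hypothetical; the actual scripts in Make_prompt may parse inputs and phrase prompts differently.

```python
from collections import Counter
from typing import Optional


def parse_orca_input(text: str) -> dict:
    """Extract simple-input keywords, charge, multiplicity and atom symbols."""
    keywords: list = []
    charge: Optional[int] = None
    multiplicity: Optional[int] = None
    atoms: list = []
    in_coords = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop ORCA comments
        if not line:
            continue
        if in_coords:
            if line == "*":  # a lone '*' closes the coordinate block
                in_coords = False
            else:
                atoms.append(line.split()[0])
        elif line.startswith("!"):
            keywords.extend(line[1:].split())
        elif line.startswith("*"):
            # e.g. "* xyz 0 1": coordinate type, charge, multiplicity
            parts = line[1:].split()
            if len(parts) >= 3:
                charge, multiplicity = int(parts[1]), int(parts[2])
                in_coords = parts[0].lower() == "xyz"  # only inline xyz blocks
    return {"keywords": keywords, "charge": charge,
            "multiplicity": multiplicity, "atoms": atoms}


def hill_formula(atoms: list) -> str:
    """Molecular formula in Hill order (C, then H, then alphabetical)."""
    counts = Counter(atoms)
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(f"{el}{counts[el] if counts[el] > 1 else ''}" for el in order)


def make_prompt(parsed: dict) -> str:
    # Hypothetical template for illustration only.
    return (
        f"Write an ORCA input file that runs '{' '.join(parsed['keywords'])}' "
        f"on {hill_formula(parsed['atoms'])} with charge {parsed['charge']} "
        f"and spin multiplicity {parsed['multiplicity']}."
    )


if __name__ == "__main__":
    sample = "! B3LYP def2-SVP Opt\n* xyz 0 1\nO 0.0 0.0 0.0\nH 0.0 0.757 0.586\nH 0.0 -0.757 0.586\n*\n"
    print(make_prompt(parse_orca_input(sample)))
```

Keeping parsing and templating separate makes it easy to swap in different prompt phrasings for the same parsed input.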
Scripts for calculating F1 scores and classifying errors are also included in this directory.
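The README does not specify what the F1 score is computed over. Purely as an assumption for illustration, the sketch below computes a token-level F1 between the "!" keyword lines of a generated ORCA input and a reference input, comparing keywords case-insensitively. The function name keyword_f1 is hypothetical and may not match the metric implemented in Make_prompt.

```python
from collections import Counter


def _keywords(text: str) -> Counter:
    """Collect lower-cased tokens from all '!' simple-input lines."""
    tokens = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop ORCA comments
        if line.startswith("!"):
            tokens.extend(t.lower() for t in line[1:].split())
    return Counter(tokens)


def keyword_f1(predicted: str, reference: str) -> float:
    """Token-level F1 (harmonic mean of precision and recall) over keywords."""
    pred, ref = _keywords(predicted), _keywords(reference)
    if not pred and not ref:
        return 1.0  # both empty: treat as a perfect match
    overlap = sum((pred & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(keyword_f1("! B3LYP def2-SVP Opt", "! B3LYP def2-TZVP Opt"))
```

With one mismatched basis set out of three keywords, precision and recall are both 2/3, so F1 is also 2/3.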
Model_development
This directory is dedicated to LoRA adapter development. It includes scripts that execute ORCA input files to verify that they run, along with validation and feedback agents.
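A runnability check of this kind can be sketched as follows, assuming an ORCA executable is available and relying on the fact that ORCA prints "****ORCA TERMINATED NORMALLY****" at the end of a successful run. The function names run_orca and terminated_normally and the choice to write the output next to the input are hypothetical; the scripts in Model_development may handle execution, timeouts, and error parsing differently.

```python
import subprocess
from pathlib import Path
from typing import Optional

# Marker ORCA prints at the end of a successful calculation.
NORMAL_TERMINATION = "ORCA TERMINATED NORMALLY"


def terminated_normally(output_text: str) -> bool:
    """Return True if the ORCA output contains the normal-termination marker."""
    return NORMAL_TERMINATION in output_text


def run_orca(inp: Path, orca_exe: str = "orca", timeout: Optional[float] = None) -> bool:
    """Run ORCA on `inp`, save stdout as <name>.out, and report clean termination.

    orca_exe: path to the ORCA executable (assumed; adjust for your cluster).
    """
    result = subprocess.run(
        [orca_exe, inp.name],
        cwd=inp.parent,          # run in the input's directory so scratch files land there
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    inp.with_suffix(".out").write_text(result.stdout)
    return terminated_normally(result.stdout)
```

Checking the termination marker rather than only the process exit code is a conservative choice, since it confirms that the calculation reached its normal end.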
Each directory is self-contained. Enter a directory to load and use the corresponding module.
DOI of the preprint: https://chemrxiv.org/doi/full/10.26434/chemrxiv.15002344/v1