PHITS simulations of neutron and gamma-ray production from and transport of 70–250 MeV protons in heterogeneous 1D tissue phantoms


Hunter N. Ratliff¹, Francesco Blangiardi², Toni Kögler³˒⁴

¹Department of Computer science, Electrical engineering and Mathematical sciences, Western Norway University of Applied Sciences, Inndalsveien 28, Bergen, 5063, Vestland, Norway

²Technology Methods and Systems Data Based Methods, Fraunhofer ENAS, Technologie Campus 3, Chemnitz, 09126, Saxony, Germany

³Helmholtz-Zentrum Dresden-Rossendorf, Institute of Radiooncology - OncoRay, Dresden, Germany; ⁴OncoRay - National Center for Radiation Research in Oncology, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Helmholtz-Zentrum Dresden-Rossendorf, Dresden, Germany

Introduction

This dataset corresponds to the PHITS simulation data used in "Fast proton transport and neutron production in proton therapy using Fourier neural operators" [1]. A concise description of the simulation setup is provided here; please refer to the paper for detailed discussion, description, analysis, and further results derived from this dataset, along with additional references.

Description of simulations

This dataset consists of PHITS [2] simulations of protons at 47 different energies from 70 MeV to 250 MeV incident upon different "1D" heterogeneous cylindrical phantoms, whose material varies every 0.5 mm along their length but is uniform radially and rotationally. The phantom compositions (materials and their sequence along the length) are taken from randomly sampled rays cast through a 3D CT phantom, with CT number mapped to material composition and density via the HumanVoxelTable-KumamotoUniv.data conversion table within the RT-PHITS utility distributed with PHITS. The included tallies score spatial distributions of energy deposition, LET, proton current (with an additional angular dimension), neutron production, and gamma-ray production, alongside a variety of diagnostic tallies. Event-by-event "list-mode" data, called "dump" tallies in PHITS, are scored for neutron and gamma-ray production.

Because these simulations were intended for AI model development, the 47 energies are divided into 37 training energies (70 MeV to 250 MeV in 5 MeV steps) and 10 testing energies (73 MeV to 245.8 MeV in 19.2 MeV steps). For each energy, two simulations were run: (1) a simulation of 1E8 (one hundred million) protons with all tallies (including dump tallies) enabled and (2) a simulation of 1E9 (one billion) protons with only dump tallies enabled (other tallies were disabled to reduce memory consumption and increase simulation speed). Furthermore, all of the above was performed twice: (1) initially with purely monoenergetic beams and a spatial spread of 2.5 mm and (2) a second "more realistic" set with Gaussian-distributed energies (with energy-dependent FWHM) and a slightly wider 4.0 mm beam spread.
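As a small illustration of the energy grid described above, the following Python sketch regenerates the two lists from the stated start values and step sizes. The variable names are illustrative only and do not appear in the dataset.

```python
# Training energies: 70 MeV to 250 MeV in 5 MeV steps (37 values).
training_energies = [70 + 5 * i for i in range(37)]

# Testing energies: 73 MeV to 245.8 MeV in 19.2 MeV steps (10 values).
# Integer tenths of an MeV are used to avoid floating-point drift.
testing_energies = [(730 + 192 * i) / 10 for i in range(10)]

# Together these give the 47 simulated beam energies.
all_energies = sorted(training_energies + testing_energies)

print(len(training_energies), len(testing_energies), len(all_energies))
```

Note that no testing energy coincides with a training energy, so the combined list has 47 distinct entries.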

All simulation outputs were automatically processed from the plaintext and binary files produced by PHITS into compressed pickle file objects (NumPy arrays, Pandas DataFrames, dictionaries) using the PHITS Tools [3] Python utility. These Python objects were then utilized in the subsequent analysis of the paper this simulation set was generated for. The corresponding data repository used for AI model development can be found at [4].

Structure of this repository

The volume of data in this repository is substantial (~700 GB available here; several TB when including the files only available upon request). The repository has therefore been structured to allow downloading only the data of interest.

The root directory of this repository consists of 39 top-level directories whose names indicate their contents. Each has been archived with tar and compressed, either by applying xz to the tarball or by applying Python's LZMA compression to the tarball's contents prior to archiving. Within each are two directories: training and testing. Within each of these are directories of the format ???_MeV, where ??? is replaced by three digits specifying the nominal beam energy in MeV. (For the energies of the testing dataset this is ???p?, with p in place of a decimal point.) Thus, each training directory contains 37 subdirectories, and each testing directory contains 10 subdirectories. (Note that there are no setup differences between training and testing data; they are simply divided here in the same way as in the paper.) Each ???_MeV/???p?_MeV directory contains simulation input/output and/or PHITS Tools processed output, depending on the top-level directory containing it. Input and output file names do not differ between energies; the directory structure is what keeps them separated.
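The naming scheme above can be sketched in Python as follows. The helper names are hypothetical, and zero-padding to three digits (e.g. 070 for 70 MeV) is an assumption inferred from the ??? pattern rather than something stated explicitly here.

```python
from pathlib import PurePosixPath


def energy_dir_name(energy_mev: float, testing: bool = False) -> str:
    """Build a ???_MeV (training) or ???p?_MeV (testing) directory name.

    Assumption: energies are zero-padded to three digits.
    """
    if testing:
        tenths = round(energy_mev * 10)  # e.g. 92.2 MeV -> 922
        return f"{tenths // 10:03d}p{tenths % 10}_MeV"
    return f"{round(energy_mev):03d}_MeV"


def energy_dir(top_level: str, energy_mev: float, testing: bool = False) -> PurePosixPath:
    """Relative path of one energy directory inside an extracted top-level directory."""
    subset = "testing" if testing else "training"
    return PurePosixPath(top_level) / subset / energy_dir_name(energy_mev, testing)


print(energy_dir("GaussE_1E8_processed_other", 70))
print(energy_dir("GaussE_1E8_processed_other", 92.2, testing=True))
```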

PHITS input information

One top-level directory differs from all of the others, and this is common_inputs. As the name suggests, this directory contains all PHITS input information used in generating all of the simulation outputs.

The two core PHITS input files are beam-on-target_phits-input_MonoE.inp for the monoenergetic beam simulation set and beam-on-target_phits-input_GaussE.inp for the Gaussian-distributed beam energy simulation set. These inputs contain lines using the PHITS insert file function infl:{}; all inserted files used in the PHITS simulations are also contained within this common_inputs directory. The single exception is PARAMETERS_files-1-and-7.txt, which simply holds the file(1) and file(7) PHITS [Parameters] arguments; these are system-specific paths to the PHITS installation/data files. Also note that the infl:{} commands use relative paths, and these differ from the structure of this repository because the repository was restructured afterwards for distribution convenience. File names are still unique and can be found in this common_inputs directory. The CELL subdirectory contains the [Cell] sections used for the varied phantom compositions, and the MAPPINGS_OF_ENERGY_TO_CELL_FILES.csv file details how these files are paired with the 47 different beam energies.

PHITS outputs (raw and processed)

The remaining 38 top-level directories contain simulation/processed output. When these simulations were run, all output was contained in each ???_MeV directory. As detailed earlier, these have been split into various top-level directories here to allow more convenient download of only the desired files. Nominally, each of these ???_MeV directories contained the following before being split:

a beam-on-target_phits-input.inp PHITS input file (and a simple phits.in pointing to this input file, needed for parallel running of PHITS); note that these inputs have all specific source energy information populated within this file
a phantom_composition_info.csv file also detailing the phantom composition used for that beam energy
phits*.out file(s), raw summary output files generated by PHITS
*.out raw plaintext tally output files from PHITS
*.eps graphical visualizations of tally output, generated by PHITS
*_dmp.out* raw binary tally dump files from PHITS
*.pickle.xz processed tally output (and phits.out metadata) from PHITS Tools, LZMA-compressed pickle files
*_dmp_namedtuple_list.pickle.xz processed tally dump output from PHITS Tools, formatted as a NumPy record array (np.recarray)
*_dmp_Pandas_df.pickle.xz processed tally dump output from PHITS Tools, formatted as a Pandas DataFrame (same numerical data as in NumPy recarray)
*.png and *.pdf graphical visualizations of tally output, generated by PHITS Tools
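
The processed *.pickle.xz files listed above are described as LZMA-compressed pickle files, so a minimal way to read one with only the Python standard library might look like the sketch below. The helper name load_pickle_xz and the demo file name are hypothetical; unpickling the NumPy or Pandas variants additionally requires those packages to be installed, and PHITS Tools may provide its own loading routines.

```python
import lzma
import pickle
import tempfile
from pathlib import Path


def load_pickle_xz(path):
    """Load one LZMA-compressed pickle file (*.pickle.xz).

    Only unpickle files from a source you trust: pickle can execute code.
    """
    with lzma.open(path, "rb") as f:
        return pickle.load(f)


# Self-contained round trip using a stand-in file, not a file from the dataset.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "example.pickle.xz"  # hypothetical file name
    with lzma.open(demo_path, "wb") as f:
        pickle.dump({"tally": "demo", "values": [1, 2, 3]}, f)
    demo_obj = load_pickle_xz(demo_path)

print(type(demo_obj).__name__, sorted(demo_obj))
```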

The top-level directories of this repository are named in a way to detail (1) which simulations their contents pertain to and (2) which output files are contained within them. The directories are named using an underscore-delimited pattern whose components have the following names and meanings:

Beam type:

    MonoE refers to simulations with the monoenergetic beams with 2.5 mm spread
    GaussE refers to simulations with the Gaussian-distributed energies and 4.0 mm spread


Simulated number of protons:

    1E8 refers to simulations with 10^8 (one hundred million) protons simulated
    1E9 refers to simulations with 10^9 (one billion) protons simulated


Output source/type:

    raw refers to the PHITS input and PHITS-generated output
    processed refers to the Python-formatted processed output produced by PHITS Tools
    plots refers to the *.eps files produced by PHITS and the *.png and *.pdf files produced by PHITS Tools, all containing graphical plots of tally output (only relevant to 1E8 simulations)


Other labels:

    proton-tally refers to output from the huge [T-Cross] tally used only in 1E8 simulations for scoring proton phase space as a function of energy, position, and direction (separated from others owing to its considerable size)
    neutron-dump refers to the event-by-event neutron production data scored by a [T-Product] tally's "dump" option

        for processed directories, followed by NumPy or Pandas to denote whether the contents are formatted as NumPy record arrays or Pandas DataFrames


    gamma-dump refers to the event-by-event gamma-ray production data scored by a [T-Product] tally's "dump" option

        for processed directories, followed by NumPy or Pandas to denote whether the contents are formatted as NumPy record arrays or Pandas DataFrames


    other refers to output from all other tallies aside from the above three (energy deposition, LET, diagnostic tallies, etc.; only relevant to 1E8 simulations given all tallies except dump tallies were disabled for 1E9 simulations) along with (for raw directories) PHITS input-related files and phits*.out file(s).

For clarity, the dataset notation here corresponds to that used in [1] as follows: GaussE_1E8 = ES8 and GaussE_1E9 = ES9. (The paper did not use MonoE_1E8 and MonoE_1E9, but if it had they would've been designated with NES8 and NES9, respectively.)
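
The underscore-delimited naming pattern described above lends itself to simple parsing. The following Python sketch splits a top-level directory name into its components and looks up the corresponding paper notation; the class, function, and field names are illustrative and not part of the dataset or of PHITS Tools.

```python
from typing import NamedTuple, Optional

BEAM_TYPES = ("MonoE", "GaussE")
PROTON_COUNTS = ("1E8", "1E9")

# Notation used in [1]; the NES labels are the hypothetical ones mentioned above.
PAPER_NOTATION = {
    ("GaussE", "1E8"): "ES8",
    ("GaussE", "1E9"): "ES9",
    ("MonoE", "1E8"): "NES8",
    ("MonoE", "1E9"): "NES9",
}


class TopLevelDir(NamedTuple):
    beam: str                    # MonoE or GaussE
    protons: str                 # 1E8 or 1E9
    kind: str                    # raw, processed, or plots
    label: Optional[str]         # proton-tally, neutron-dump, gamma-dump, other
    array_format: Optional[str]  # NumPy or Pandas (processed dump data only)


def parse_top_level(name: str) -> TopLevelDir:
    """Split a simulation-output directory name into its components."""
    parts = name.split("_")
    if len(parts) < 3 or parts[0] not in BEAM_TYPES or parts[1] not in PROTON_COUNTS:
        raise ValueError(f"not a simulation output directory: {name!r}")
    beam, protons, kind = parts[:3]
    label = parts[3] if len(parts) > 3 else None
    array_format = parts[4] if len(parts) > 4 else None
    return TopLevelDir(beam, protons, kind, label, array_format)


info = parse_top_level("GaussE_1E8_processed_neutron-dump_NumPy")
print(info, PAPER_NOTATION[(info.beam, info.protons)])
```

The common_inputs directory does not follow this pattern, so the sketch rejects it with a ValueError.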

All put together, this results in the following top-level directories contained in this repository:

\(\begin{array}{lrrc}
\textbf{Directory} & \textbf{Files} & \textbf{Uncompressed size (GB)} & \textbf{Available upon request} \\ \hline
\texttt{common_inputs} & 54 & 0.002 & \\
\texttt{GaussE_1E8_raw_proton-tally} & 564 & 361.30 & \\
\texttt{GaussE_1E8_raw_neutron-dump} & 611 & 41.24 & \\
\texttt{GaussE_1E8_raw_gamma-dump} & 611 & 85.82 & \\
\texttt{GaussE_1E8_raw_other} & 1927 & 22.73 & \\
\texttt{GaussE_1E8_processed_proton-tally} & 564 & 17.19 & \\
\texttt{GaussE_1E8_processed_neutron-dump_NumPy} & 611 & 37.25 & \\
\texttt{GaussE_1E8_processed_neutron-dump_Pandas} & 611 & 45.77 & \\
\texttt{GaussE_1E8_processed_gamma-dump_NumPy} & 611 & 73.81 & \\
\texttt{GaussE_1E8_processed_gamma-dump_Pandas} & 611 & 90.16 & \\
\texttt{GaussE_1E8_processed_other} & 1551 & 4.35 & \\
\texttt{GaussE_1E8_plots} & 3525 & 7.60 & \\
\texttt{GaussE_1E9_raw_neutron-dump} & 1081 & 408.81 & \times \\
\texttt{GaussE_1E9_raw_gamma-dump} & 1081 & 854.58 & \times \\
\texttt{GaussE_1E9_raw_other} & 1316 & 0.59 & \\
\texttt{GaussE_1E9_processed_neutron-dump_NumPy} & 1081 & 372.46 & \times \\
\texttt{GaussE_1E9_processed_neutron-dump_Pandas} & 1081 & 457.42 & \times \\
\texttt{GaussE_1E9_processed_gamma-dump_NumPy} & 1081 & 738.00 & \times \\
\texttt{GaussE_1E9_processed_gamma-dump_Pandas} & 1081 & 901.41 & \times \\
\texttt{GaussE_1E9_processed_other} & 1175 & 0.02 & \\
\texttt{MonoE_1E8_raw_proton-tally} & 94 & 360.92 & \\
\texttt{MonoE_1E8_raw_neutron-dump} & 282 & 40.83 & \\
\texttt{MonoE_1E8_raw_gamma-dump} & 282 & 83.69 & \\
\texttt{MonoE_1E8_raw_other} & 1222 & 30.17 & \\
\texttt{MonoE_1E8_processed_proton-tally} & 47 & 13.69 & \\
\texttt{MonoE_1E8_processed_neutron-dump_NumPy} & 94 & 37.23 & \\
\texttt{MonoE_1E8_processed_neutron-dump_Pandas} & 94 & 45.77 & \\
\texttt{MonoE_1E8_processed_gamma-dump_NumPy} & 94 & 72.23 & \\
\texttt{MonoE_1E8_processed_gamma-dump_Pandas} & 94 & 88.29 & \\
\texttt{MonoE_1E8_processed_other} & 846 & 2.18 & \\
\texttt{MonoE_1E8_plots} & 799 & 2.25 & \\
\texttt{MonoE_1E9_raw_neutron-dump} & 2364 & 407.99 & \times \\
\texttt{MonoE_1E9_raw_gamma-dump} & 2364 & 836.55 & \times \\
\texttt{MonoE_1E9_raw_other} & 329 & 0.04 & \\
\texttt{MonoE_1E9_processed_neutron-dump_NumPy} & 94 & 371.30 & \times \\
\texttt{MonoE_1E9_processed_neutron-dump_Pandas} & 94 & 455.94 & \times \\
\texttt{MonoE_1E9_processed_gamma-dump_NumPy} & 94 & 721.08 & \times \\
\texttt{MonoE_1E9_processed_gamma-dump_Pandas} & 94 & 879.13 & \times \\
\texttt{MonoE_1E9_processed_other} & 188 & 0.01 & \\ \hline
\textbf{TOTAL} & \textbf{30397} & \textbf{8969.84} &
\end{array}\)

(Directories marked with × in the "Available upon request" column are not hosted here and are provided only upon specific request.)

As stated earlier, each of these top-level directories is divided into a training subdirectory (containing 37 ???_MeV directories) and a testing subdirectory (containing 10 ???p?_MeV directories), where each ???[p?]_MeV directory contains only the particular files, from the particular simulations, specified by the top-level directory's name.

As a note to anyone surveying the raw files, all GaussE simulations were run with OpenMP parallelization using 10 processes. For 1E8 simulations, this was conducted as ten PHITS runs of 1E7 protons each; for 1E9 simulations, this was conducted as twenty runs of 5E7 protons each. (PHITS runs can be "chained" as "restart calculations", where one run resumes from where a previous run ended.) In these simulations, the phits.out file generated by each run was renamed to phits-#A-#B.out (where #A is an internal number, 1 to 47, pairing with each simulated beam energy, and #B is the run number, 0 to 19) and moved into a phitsout subdirectory after each run's completion. The MonoE simulations were less uniform; for those, the strategy was to complete each simulation in a single run of PHITS. This generally involved hybrid OpenMP + MPI parallelization with anywhere from 80 to 160 processes each, split between OMP and MPI (noting that some 1E9 runs were conducted with only MPI parallelization). None of this influences the output format of the standard tally outputs. However, the number of dump files produced is equal to the number of MPI processes utilized. This means that each GaussE simulation has only one dump file per dump tally, owing to its use of only OpenMP parallelization (which merges its dump files at the end of the calculation), while the MonoE simulations contain a varied number of dump files per dump tally, owing to variations in the parallelization strategies employed in those simulations. PHITS Tools ultimately merges all dump outputs back together in its processing, so this quirk of how the simulations were conducted should not be apparent at all in the processed output.
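
For readers working through the phitsout subdirectories of the GaussE simulations, the run chaining and file naming described above can be sketched in Python as follows. The names GAUSSE_RUN_PLAN, total_protons, and parse_phits_out_name are hypothetical, and the regular expression assumes only that #A and #B are written as plain decimal integers (with or without zero-padding).

```python
import re

# (number of chained runs, protons per run) for the GaussE simulations
GAUSSE_RUN_PLAN = {"1E8": (10, 10_000_000), "1E9": (20, 50_000_000)}


def total_protons(label: str) -> int:
    """Total protons simulated across all chained runs of one simulation."""
    runs, per_run = GAUSSE_RUN_PLAN[label]
    return runs * per_run


# phits-#A-#B.out: #A is the energy index (1 to 47), #B is the run number.
_PHITS_OUT = re.compile(r"phits-(\d+)-(\d+)\.out")


def parse_phits_out_name(file_name: str) -> tuple:
    """Return (energy_index, run_number) from a phits-#A-#B.out file name."""
    match = _PHITS_OUT.fullmatch(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    return int(match.group(1)), int(match.group(2))


# The chained runs add up to the nominal proton counts.
assert total_protons("1E8") == 10**8
assert total_protons("1E9") == 10**9
print(parse_phits_out_name("phits-12-3.out"))  # example name, not taken from the dataset
```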

Because PHITS Tools was under ongoing development while this dataset was being produced, the GaussE directories contain some extra output not present in the MonoE directories. Most notably, only for the GaussE simulations do the plots directories contain PNG and PDF plot files generated by PHITS Tools and the processed directories contain dictionary objects of the processed phits*.out files.

Note that, for convenience, the phits.out file(s) for each simulation are also copied to all raw directories. The phits.out file(s) contain the full PHITS input echo, among other information about the simulation. For the GaussE simulations, these are within a further phitsout subdirectory for each beam energy. The processed phits.out file(s), phits_out.pickle.xz, are also included in all GaussE processed directories.

References

[1] F. Blangiardi, H.N. Ratliff et al., "Fast proton transport and neutron production in proton therapy using Fourier neural operators", (in preparation for submission) (2025)

[2] T. Sato, Y. Iwamoto, S. Hashimoto, T. Ogawa, T. Furuta, S. Abe, T. Kai, Y. Matsuya, N. Matsuda, Y. Hirata, T. Sekikawa, L. Yao, P.E. Tsai, H.N. Ratliff, H. Iwase, Y. Sakaki, K. Sugihara, N. Shigyo, L. Sihver and K. Niita, "Recent improvements of the Particle and Heavy Ion Transport code System - PHITS

Identifier
DOI https://doi.org/10.14278/rodare.4526
Related Identifier IsIdenticalTo https://www.hzdr.de/publications/Publ-43014
Related Identifier IsPartOf https://doi.org/10.14278/rodare.3996
Related Identifier IsPartOf https://rodare.hzdr.de/communities/novo
Related Identifier IsPartOf https://rodare.hzdr.de/communities/rodare
Metadata Access https://rodare.hzdr.de/oai2d?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:rodare.hzdr.de:4526
Provenance
Creator Ratliff, Hunter; Blangiardi, Francesco; Kögler, Toni
Publisher Rodare
Publication Year 2025
Rights Creative Commons Attribution 4.0 International; Open Access; https://creativecommons.org/licenses/by/4.0/legalcode; info:eu-repo/semantics/openAccess
OpenAccess true
Contact https://rodare.hzdr.de/support
Representation
Language English
Resource Type Dataset
Version 1.0.0
Discipline Life Sciences; Natural Sciences; Engineering Sciences