Dataset

Dataset: GridStratLLM: Agent Framework for Coordinated Cyberattacks on the Smart Grid with Large Language Models


A new cybersecurity threat emerges: Recent Large Language Models (LLMs) with advanced reasoning and tool calling enable even attackers lacking expert knowledge to coordinate large-scale attacks on Smart Grids (SG). These LLMs can orchestrate multiple malware instances, select appropriate signals and deltas, and execute data-modification attacks on the S7 and Modbus protocols. The automatically generated attack thereby progresses towards the targeted unsafe state while evading detection by the Intrusion Detection System (IDS). To assess this emerging threat, we introduce GridStratLLM, a novel agent framework for coordinated attacks on industrial networks. Furthermore, we evaluate attack plans generated by four frontier Large Language Models using the open-source Network Security Monitor (NSM) Zeek and a commercial NSM. Finally, we contribute a dataset recorded in a Hardware-in-the-Loop (HIL) testbed to support the training of IDS solutions against these attacks. The dataset spans 24 hours and 11 minutes and contains 436 attacks, 212 of which are coordinated.

GridStratLLM Dataset

This dataset contains coordinated cyberattacks generated using the GridStratLLM agent framework against a hardware-in-the-loop testbed of a distributed generation environment. It covers one normal-operation dataset and five attack datasets, each attack dataset using a different LLM. Every dataset captures network traffic, process data from SCADA, log messages, and metadata from the attack scripts.

Paper: https://doi.org/10.1145/3765611.3815147
GridStratLLM source code: https://github.com/nbke/GridStratLLM

Each dataset directory contains:
- attack_session_llm.parquet: LLM prompts, plans, chain-of-thought reasoning, token usage
- attack_worker.parquet: Network interface info (MAC, IP, interface name)
- packet_metadata.parquet: Packet metadata (timestamps, addresses, ports, protocol)
- packets.pcap: Raw packet capture
- attack_datamod_history.parquet: Packet modification log with delta values
- attack_exec_steps.parquet: Attack execution timeline per worker
- process_data.parquet: WinCC SCADA data
- logs.parquet: Logs from PLC 1512 and PLC 1516

See network.json for a list of network devices. modbus.json contains a mapping of Modbus registers to signal names. s7_connections.json contains all signals transmitted via S7.

Parquet Files

attack_session_llm.parquet

The structure of the plan column is explained in appendix E of the paper. coordinated is true if an attack session uses multiple attack workers. The column all_messages may be NULL due to a data capture issue.

packet_metadata.parquet

The entries in packet_metadata.parquet are in the same order as the packets in packets.pcap. If the UUID of a packet (id in packet_metadata.parquet) appears in the packet_id column of attack_datamod_history.parquet, then the packet originates from an attack script.
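The lookup described above can be sketched with plain Python sets. This is a minimal illustration, not the authors' tooling: the function name `flag_attack_packets` and the sample ids are hypothetical, and the ids are modeled as 16-byte values on the assumption that the Parquet id columns hold raw UUID bytes.

```python
import uuid
from typing import Iterable, List


def flag_attack_packets(packet_ids: Iterable[bytes],
                        modified_packet_ids: Iterable[bytes]) -> List[bool]:
    """Return one flag per packet: True if its id appears in the modification log."""
    modified = set(modified_packet_ids)  # set membership checks are O(1) on average
    return [pid in modified for pid in packet_ids]


# Hypothetical stand-in data: three packets, the second one touched by an attack script.
ids = [uuid.UUID(int=i).bytes for i in (1, 2, 3)]   # plays the role of packet_metadata.id
modified_ids = [ids[1]]                             # plays the role of attack_datamod_history.packet_id

flags = flag_attack_packets(ids, modified_ids)
print(flags)  # [False, True, False]
```

Because the metadata rows follow the order of packets.pcap, the flag at position i also applies to the i-th packet in the capture. With both tables loaded as pandas DataFrames, the same check could be written as `packet_metadata["id"].isin(attack_datamod_history["packet_id"])`.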

SCADA process data: `process_data.parquet`

PV:
- Control Signals: on_off
- Monitor Signals: temp_air, poa_direct, wind_speed, poa_diffuse, cell_temperature, inverter_ac_power, inverter_dc_power

Wind:
- Control Signals: blade_rotation, rotation_speed
- Monitor Signals: power, height, pressure, wind_speed_a, wind_speed_b, temperature_a, temperature_b

Battery:
- Control Signals: on_off, target_power
- Monitor Signals: current, voltage, temperature, state_of_charge, actual_charge_power

Log messages: `logs.parquet`

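Log messages are in German and are only sent when a signal value changes. Example: `Wertänderung "SysLogDaten".Inverter_ac_power Altwert: 239,0 aktueller Wert: 20,0 CPU:SECCPU16` (in English: value change of "SysLogDaten".Inverter_ac_power, old value 239.0, current value 20.0, CPU SECCPU16).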
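A log message of this shape could be parsed roughly as follows. The sketch is derived from the single example above, so the regular expression, the function names, and the field names (`block`, `signal`, `old`, `new`, `cpu`) are assumptions; other messages in logs.parquet may use a different layout. It also assumes numeric values with a decimal comma and no thousands separator.

```python
import re
from typing import Dict, Optional, Union

# Pattern modeled on the one example message; the group names are hypothetical.
LOG_PATTERN = re.compile(
    r'Wertänderung\s+"(?P<block>[^"]+)"\.(?P<signal>\S+)\s+'
    r'Altwert:\s*(?P<old>\S+)\s+'
    r'aktueller Wert:\s*(?P<new>\S+)\s+'
    r'CPU:(?P<cpu>\S+)'
)


def german_to_float(text: str) -> float:
    """Convert a decimal-comma number such as '239,0' to a float.

    Raises ValueError for non-numeric values.
    """
    return float(text.replace(",", "."))


def parse_log_message(message: str) -> Optional[Dict[str, Union[str, float]]]:
    """Extract signal name, old value, new value, and CPU; None if the text does not match."""
    match = LOG_PATTERN.search(message)
    if match is None:
        return None
    return {
        "block": match.group("block"),
        "signal": match.group("signal"),
        "old": german_to_float(match.group("old")),
        "new": german_to_float(match.group("new")),
        "cpu": match.group("cpu"),
    }


example = ('Wertänderung "SysLogDaten".Inverter_ac_power '
           'Altwert: 239,0 aktueller Wert: 20,0 CPU:SECCPU16')
parsed = parse_log_message(example)
print(parsed)
```

For the example message, the result holds the signal `Inverter_ac_power`, the old value 239.0, and the new value 20.0.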

DuckDB File: `merged_datasets.duckdb`

The combined merged_datasets.duckdb file contains all data from the Parquet files plus raw packet data. Differences from the Parquet files:

- process_data is split into four tables: wind_process_data, pv_process_data, battery_process_data, demand_process_data (one column per signal instead of JSON values).
- attack_session_llm is renamed to attack_session.
- attack_exec_steps is renamed to exec_steps, and setup_duration is stored as an interval instead of a bigint.
- packet_metadata is renamed to packets. The packets from the PCAP files are stored in the raw_packet BLOB column. The l2_flow_id, l3_flow_id, and l4_flow_id columns, which are always null in the Parquet files, are omitted.
- id columns use the uuid type instead of blob.
- Categorical columns (model_name, transport, state, kind, etc.) use DuckDB enum types instead of string.

Column name prefix for process data tables:
- C_: Control signals (commands sent to power plants)
- M_: Monitor signals (measured values sent to SCADA)
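The prefix convention makes it straightforward to separate commands from measurements once the column names of a process data table are known. The following sketch assumes column names are formed as prefix plus signal name (for example `C_on_off`); the helper `group_columns_by_role` and the sample column list, including the `timestamp` column, are hypothetical and should be checked against the actual table schema.

```python
from typing import Dict, Iterable, List

# Prefixes as documented above; the role labels are chosen for this sketch.
PREFIX_ROLES = {"C_": "control", "M_": "monitor"}


def group_columns_by_role(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Split column names into control, monitor, and other, stripping the prefix."""
    groups: Dict[str, List[str]] = {"control": [], "monitor": [], "other": []}
    for name in columns:
        for prefix, role in PREFIX_ROLES.items():
            if name.startswith(prefix):
                groups[role].append(name[len(prefix):])
                break
        else:
            # No known prefix: keep the column name unchanged.
            groups["other"].append(name)
    return groups


# Hypothetical column list for a battery table, built from the documented signal names.
battery_columns = ["timestamp", "C_on_off", "C_target_power", "M_state_of_charge"]
grouped = group_columns_by_role(battery_columns)
print(grouped)
# {'control': ['on_off', 'target_power'], 'monitor': ['state_of_charge'], 'other': ['timestamp']}
```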

Funding

This research is supported in part by funding from the topic Engineering Secure Systems of the Helmholtz Association (HGF) and by KASTEL Security Research Labs (structure 46.23.02).

Identifier
DOI	https://doi.org/10.35097/bx5337kcykte438h
Related Identifier	IsIdenticalTo https://publikationen.bibliothek.kit.edu/1000193145
Metadata Access	https://www.radar-service.eu/oai/OAIHandler?verb=GetRecord&metadataPrefix=datacite&identifier=10.35097/bx5337kcykte438h

Provenance
Creator	Kellerer, Nicolai ; Hagenmeyer, Veit
Publisher	Karlsruhe Institute of Technology
Contributor	RADAR
Publication Year	2026
Rights	Open Access; Creative Commons Attribution 4.0 International; info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by/4.0/legalcode
OpenAccess	true

Representation
Resource Type	Dataset
Format	application/x-tar
Size	6.5 GB
Discipline	Computer Science; Computer Science, Electrical and System Engineering; Engineering Sciences