A new cybersecurity threat emerges: Recent Large Language Models (LLMs) with advanced reasoning and tool calling enable even attackers lacking expert knowledge to coordinate large-scale attacks on Smart Grids (SG).
These LLMs can orchestrate multiple malware instances, select appropriate signals and deltas, and execute data-modification attacks on the S7 and Modbus protocols.
In doing so, the automatically generated attack drives the system towards the targeted unsafe state while evading detection by the Intrusion Detection System (IDS).
To assess this emerging threat, we introduce GridStratLLM, a novel agent framework for coordinated attacks on industrial networks.
Furthermore, we evaluate attack plans generated by four frontier LLMs using the open-source Network Security Monitor (NSM) Zeek and a commercial NSM.
Finally, we contribute a dataset recorded in a Hardware-in-the-Loop (HIL) testbed to support the training of IDS solutions against these attacks.
The dataset spans 24 hours and 11 minutes and contains 436 attacks, 212 of which are coordinated.
GridStratLLM Dataset
This dataset contains coordinated cyberattacks generated using the GridStratLLM agent framework against a hardware-in-the-loop testbed of a distributed generation environment.
It comprises one normal-operation dataset and five attack datasets, each using a different LLM.
Every dataset captures network traffic, process data from SCADA, log messages, and metadata from the attack scripts.
Paper: https://doi.org/10.1145/3765611.3815147
GridStratLLM source code: https://github.com/nbke/GridStratLLM
Each dataset directory contains:
- attack_session_llm.parquet: LLM prompts, plans, chain-of-thought reasoning, token usage
- attack_worker.parquet: Network interface info (MAC, IP, interface name)
- packet_metadata.parquet: Packet metadata (timestamps, addresses, ports, protocol)
- packets.pcap: Raw packet capture
- attack_datamod_history.parquet: Packet modification log with delta values
- attack_exec_steps.parquet: Attack execution timeline per worker
- process_data.parquet: WinCC SCADA data
- logs.parquet: Logs from PLC 1512 and PLC 1516
See network.json for a list of network devices. modbus.json contains a mapping of Modbus registers to signal names.
s7_connections.json contains all signals transmitted via S7.
Parquet Files
attack_session_llm.parquet
The structure of the plan column is explained in Appendix E of the paper.
coordinated is true if an attack session uses multiple attack workers.
The column all_messages may be NULL due to a data capture issue.
packet_metadata.parquet
The entries in packet_metadata.parquet are in the same order as packets.pcap.
If the UUID of a packet (id in packet_metadata.parquet) appears in the packet_id column of attack_datamod_history.parquet, then the packet originates from an attack script.
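The membership rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset tooling: the function name label_attack_packets is hypothetical, and loading the two Parquet columns (for example with a Parquet reader of your choice) is left out. The IDs are treated as opaque hashable values, so raw bytes work as well as UUID objects.

```python
from typing import Hashable, Iterable, List, Set


def label_attack_packets(
    packet_ids: Iterable[Hashable],
    modified_packet_ids: Iterable[Hashable],
) -> List[bool]:
    """Return one flag per packet: True if it originates from an attack script.

    packet_ids:          values of the `id` column in packet_metadata.parquet,
                         in file order.
    modified_packet_ids: values of the `packet_id` column in
                         attack_datamod_history.parquet.
    """
    # A set gives constant-time membership checks for large captures.
    modified: Set[Hashable] = set(modified_packet_ids)
    return [pid in modified for pid in packet_ids]


# Toy example with made-up IDs (not taken from the dataset).
flags = label_attack_packets([b"a", b"b", b"c"], [b"b"])
```

Because packet_metadata.parquet is in the same order as packets.pcap, the i-th flag also applies to the i-th packet in the capture.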
SCADA process data: process_data.parquet
PV:
- Control Signals: on_off
- Monitor Signals: temp_air, poa_direct, wind_speed, poa_diffuse, cell_temperature, inverter_ac_power, inverter_dc_power
Wind:
- Control Signals: blade_rotation, rotation_speed
- Monitor Signals: power, height, pressure, wind_speed_a, wind_speed_b, temperature_a, temperature_b
Battery:
- Control Signals: on_off, target_power
- Monitor Signals: current, voltage, temperature, state_of_charge, actual_charge_power
Log messages: logs.parquet
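Log messages are in German and are sent only when a signal value changes. Example (a "Wertänderung", i.e. value change, from the old value "Altwert" to the current value "aktueller Wert"):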
Wertänderung "SysLogDaten".Inverter_ac_power Altwert: 239,0 aktueller Wert: 20,0 CPU:SECCPU16
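A message of this shape can be parsed with a regular expression. The sketch below is derived only from the single example above, so the pattern, the ValueChange type, and the parse_value_change function are assumptions; other message layouts in logs.parquet may not match and would return None. Values are assumed to use a decimal comma without thousands separators.

```python
import re
from typing import NamedTuple, Optional


class ValueChange(NamedTuple):
    data_block: str   # e.g. "SysLogDaten"
    signal: str       # e.g. "Inverter_ac_power"
    old_value: float
    new_value: float
    cpu: str          # e.g. "SECCPU16"


# Pattern inferred from the example message; this is an assumption.
_PATTERN = re.compile(
    r'Wertänderung\s+"(?P<db>[^"]+)"\.(?P<signal>\S+)\s+'
    r'Altwert:\s*(?P<old>-?\d+(?:,\d+)?)\s+'
    r'aktueller Wert:\s*(?P<new>-?\d+(?:,\d+)?)\s+'
    r'CPU:(?P<cpu>\S+)'
)


def _to_float(text: str) -> float:
    # Convert the German decimal comma to a decimal point.
    return float(text.replace(",", "."))


def parse_value_change(message: str) -> Optional[ValueChange]:
    """Parse one value-change log message, or return None if it does not match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return ValueChange(
        data_block=match.group("db"),
        signal=match.group("signal"),
        old_value=_to_float(match.group("old")),
        new_value=_to_float(match.group("new")),
        cpu=match.group("cpu"),
    )
```

Returning None for non-matching messages, rather than raising, makes it easy to filter a whole log column and inspect the unmatched rows separately.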
DuckDB File: merged_datasets.duckdb
The combined merged_datasets.duckdb file contains all data from the Parquet files plus raw packet data.
Differences from the Parquet files:
- process_data is split into four tables: wind_process_data, pv_process_data, battery_process_data, and demand_process_data (one column per signal instead of JSON values).
- attack_session_llm is renamed to attack_session.
- attack_exec_steps is renamed to exec_steps, and setup_duration is stored as an interval instead of a bigint.
- packet_metadata is renamed to packets. The packets from the PCAP files are stored in the raw_packet BLOB column.
- The l2_flow_id, l3_flow_id, and l4_flow_id columns, which are always null in the Parquet files, are omitted.
- id columns use the uuid type instead of blob.
- Categorical columns (model_name, transport, state, kind, etc.) use DuckDB enum types instead of string.
Column name prefix for process data tables:
- C_: Control signals (commands sent to power plants)
- M_: Monitor signals (measured values sent to SCADA)
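The prefix convention makes it straightforward to separate control from monitor columns once the column names of a process data table are known. The following is a minimal sketch: the function name split_signal_columns is hypothetical, and the example column names (prefix followed by the signal name) are an assumption about how the columns are spelled in the DuckDB file.

```python
from typing import Iterable, List, Tuple


def split_signal_columns(columns: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split column names into (control, monitor) lists by their prefix.

    Columns without a C_ or M_ prefix (e.g. a timestamp) are ignored.
    """
    control: List[str] = []
    monitor: List[str] = []
    for name in columns:
        if name.startswith("C_"):
            control.append(name)
        elif name.startswith("M_"):
            monitor.append(name)
    return control, monitor


# Hypothetical column names for the battery table, for illustration only.
control_cols, monitor_cols = split_signal_columns(
    ["timestamp", "C_on_off", "C_target_power", "M_voltage", "M_state_of_charge"]
)
```

Separating the two groups is useful when comparing commanded values against measured ones, for instance to check whether a monitored signal follows its control signal during an attack.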
Funding
This research is supported in part by funding from the topic Engineering Secure Systems of the Helmholtz Association (HGF) and by KASTEL Security Research Labs (structure 46.23.02).