Solar Images and Physical Features for Probabilistic Solar Wind Speed Prediction


This dataset provides solar images and physical features from June 2010 to June 2024, which can be used for solar wind speed (SWS) forecasting. The data is a supplement to the publication "PROSWIN: Probabilistic Solar Wind Speed Forecasting Using Deep Distributional Regression With Solar Images" (Journal TBD) by Collin et al. (2026) and facilitates reproducing all results therein. We provide:

(1) three preprocessed solar image channels and preprocessed magnetograms from the SDO satellite (AIA: 171 Å, 193 Å, 211 Å; HMI: magnetograms; note that the images are neither normalized nor scaled to the interval [-1, 1]),

(2) 63 physical features describing the solar wind conditions from the previous solar rotation, the state of the solar cycle, and the position angle of Earth relative to the solar equator,

(3) the trained neural network models based on combinations of solar image channels, magnetograms, and physical features from the journal publication,

(4) a list of high-speed solar wind streams (HSSs) and coronal mass ejections (CMEs), which can be used for investigating the effectiveness of a prediction model with regard to HSSs and CMEs,

(5) the predicted time series of all the models we trained and of the models from the literature that we compare to in the journal publication, and

(6) the solar wind speed and sunspot number time series we use in the journal publication.

Image downloading and preprocessing were done using the following Python code: https://github.com/DanielCollin96/solar_image_processing. For full details on data preparation and usage, see the original journal article by Collin et al. (2026).
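
Because the images are provided without normalization, users will typically need to rescale them before feeding them to a model. The following is a minimal sketch of one common approach: clipping to a chosen range and mapping it linearly onto [-1, 1]. It is not necessarily the procedure used in the publication. The function name `scale_to_unit_range` and the bounds `lo` and `hi` are hypothetical, and the toy values are illustrative and not taken from the dataset. The sketch uses plain Python lists to stay dependency-free.

```python
from typing import List, Sequence


def scale_to_unit_range(
    image: Sequence[Sequence[float]], lo: float, hi: float
) -> List[List[float]]:
    """Clip pixel values to [lo, hi] and map them linearly onto [-1, 1].

    `lo` and `hi` are user-chosen bounds (an assumption of this sketch);
    suitable values depend on the channel and are not given in this record.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    span = hi - lo
    return [
        # min/max performs the clipping; the affine map sends lo -> -1, hi -> 1.
        [2.0 * (min(max(px, lo), hi) - lo) / span - 1.0 for px in row]
        for row in image
    ]


# Toy 2x3 "image" with made-up intensities.
toy = [[0.0, 50.0, 100.0], [25.0, 75.0, 150.0]]
scaled = scale_to_unit_range(toy, lo=0.0, hi=100.0)
```

For signed data such as magnetograms, a symmetric range (for example `lo = -hi`) keeps zero field strength at zero after scaling. Whether that matches the publication's preprocessing should be checked against the article.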

Identifier
DOI https://doi.org/10.5880/GFZ.OJSJ.2026.001
Related Identifier Cites https://doi.org/10.1029/2020SW002673
Related Identifier Cites https://doi.org/10.1029/2021SW002976
Related Identifier Cites https://doi.org/10.24414/qnza-ac80
Related Identifier Cites https://doi.org/10.1029/2024SW004125
Related Identifier Cites https://doi.org/10.3847/1538-4365/ab1005
Related Identifier Cites https://doi.org/10.1002/2017JA024586
Related Identifier Cites https://doi.org/10.1007/s11207-024-02321-y
Related Identifier Cites https://doi.org/10.3847/1538-4365/adbaed
Related Identifier Cites https://doi.org/10.1051/0004-6361/202140640
Related Identifier Cites https://doi.org/10.1007/s11207-011-9776-8
Related Identifier Cites https://doi.org/10.48322/1shr-ht18
Related Identifier Cites https://doi.org/10.1029/2004JA010598
Related Identifier Cites https://doi.org/10.7910/DVN/C2MHTH
Related Identifier Cites https://doi.org/10.1007/s11207-011-9834-2
Metadata Access http://doidb.wdc-terra.org/oaip/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=oai:doidb.wdc-terra.org:8680
Provenance
Creator Collin, Daniel; Shprits, Yuri; Chiarabini, Luca; Hofmeister, Stefan; Klein, Nadja; Gallego, Guillermo
Publisher GFZ Data Services
Contributor Collin, Daniel
Publication Year 2026
Funding Reference Helmholtz International Berlin Research School in Data Science (HEIBRiDS)
Rights Creative Commons Attribution 4.0 International; https://creativecommons.org/licenses/by/4.0/legalcode
OpenAccess true
Contact Collin, Daniel (GFZ Helmholtz Centre for Geosciences, Potsdam, Germany)
Representation
Resource Type Dataset
Discipline Geosciences
Spatial Coverage (0.000 LON, 0.000 LAT); Covers the solar disk from June 2010 to June 2024