Data from Hackathon "GenHack 3 Generative Modeling Challenge": Predicting maize crop yield distribution under stochastic weather

Hackathon Overview

GenHack 3 is a data challenge organized by École Polytechnique in 2024.
The task was to construct generative models for predicting maize crop yield distributions conditional on temperatures and rainfall across multiple locations simultaneously.
The hackathon was divided into two rounds, each with a different level of weather conditioning; the aim in both was to reproduce accurately the effects of weather on final crop yield. The detailed task description is available upon request.

Dataset Creation

The dataset was generated using a Stochastic Weather Generator (SWG) and a crop model. The SWG was trained on data from four French weather stations. 
This weather station data was downloaded through the INRAE CLIMATIK platform, managed by the AgroClim laboratory of Avignon, France (site available in French).
The stations and their identification numbers are as follows: Montreuil-Bellay (49215002), Mons-en-Chaussée (80557001), Saint-Martin-de-Hinx (40272002), and Saint-Gènes-Champanelle (63345002). 
We trained a daily multi-site and multivariate SWG using the following weather variables: daily minimum and maximum temperatures, precipitation, solar irradiance, and Penman evapotranspiration.


The SWG is an extension of the model described in the paper "Interpretable Seasonal Hidden Markov Model for Spatio-temporal Stochastic Rain Generation". The full training details are available in the tutorial of the Julia package StochasticWeatherGenerators.jl.
The SWG generated N years of weather data, which was input into the STICS crop model for maize (see the STICS website) to produce N annual crop yield values.
The parameters used in the STICS model are also described in the tutorial. The most important modification to the default parameters is that no irrigation was provided, to highlight the hydric stress on the plant.

Weather Data Aggregation

Daily maximum temperatures and average rainfall were aggregated into nine periods spanning April 27 to October 27 (the maize growth period):


Period 1: April 27 - May 16
Period 2: May 17 - June 5
Period 3: June 6 - June 25
Period 4: June 26 - July 15
Period 5: July 16 - August 4
Period 6: August 5 - August 24
Period 7: August 25 - September 13
Period 8: September 14 - October 3
Period 9: October 4 - October 27


Details on reproducing these aggregated variables are explained in the tutorial section "Sensitivity of maize on rainfall during key growth periods".
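As an illustration of the period layout only (the tutorial remains the reference for the actual aggregation), the nine periods can be sketched in Python. Periods 1 to 8 each last 20 days and period 9 runs for the remaining 24 days up to October 27. The function names `growth_periods` and `aggregate`, and the use of a calendar year to build dates, are assumptions made for this sketch; they do not come from the dataset or from StochasticWeatherGenerators.jl.

```python
from datetime import date, timedelta


def growth_periods(year):
    """Return the nine (start, end) date pairs of the maize growth period.

    `year` is only used to build calendar dates (hypothetical helper).
    """
    start = date(year, 4, 27)
    season_end = date(year, 10, 27)
    periods = []
    for i in range(9):
        # Periods 1-8 last 20 days; period 9 runs to the end of the season.
        end = start + timedelta(days=19) if i < 8 else season_end
        periods.append((start, end))
        start = end + timedelta(days=1)
    return periods


def aggregate(daily, year):
    """Average a {date: value} mapping of daily values over each period."""
    means = []
    for start, end in growth_periods(year):
        n_days = (end - start).days + 1
        values = [daily[start + timedelta(days=k)] for k in range(n_days)]
        means.append(sum(values) / n_days)
    return means
```

Calling `aggregate` once on a daily temperature series and once on a daily rainfall series for the same simulated year would give the 18 weather values of one dataset row, under the assumptions stated above.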


Participants were provided with these aggregated weather variables and the resulting yield data. The objective was to build a generative model capable of generating yield values conditionally on specific weather conditions (e.g., high or low rainfall).
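To make the objective concrete, here is a deliberately naive baseline for conditional generation: resampling yields from the training years whose weather is closest to a requested condition. This is not the method used by the organizers or by any participant; it is a sketch under the assumption that weather rows are plain numeric vectors, and the function name `sample_yield_given_weather` and its parameters are hypothetical.

```python
import math
import random


def sample_yield_given_weather(train_weather, train_yield, query,
                               k=50, n_samples=1, rng=None):
    """Draw yields conditionally on `query` by nearest-neighbour resampling.

    train_weather: list of weather vectors (one per simulated year)
    train_yield:   list of yields aligned with train_weather
    query:         weather vector to condition on
    """
    rng = rng or random.Random()
    # Rank training years by Euclidean distance to the requested weather.
    # No feature scaling is applied here; in practice temperature and
    # rainfall would probably need standardizing first.
    ranked = sorted(
        zip(train_weather, train_yield),
        key=lambda pair: math.dist(pair[0], query),
    )
    neighbours = [y for _, y in ranked[:k]]
    # Resample (with replacement) from the yields of the k closest years.
    return [rng.choice(neighbours) for _ in range(n_samples)]
```

Such a baseline can only reproduce yields already seen in training, which is one reason a proper generative model is needed for this task.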

Dataset Structure

The dataset provided to participants included 10^4 realizations, with a separate validation dataset of 10^5 realizations used for evaluation.


Column 1 (YEAR): Year number
Columns 2-10 (W_1-W_9): Mean daily temperature (°C) over each of the nine periods
Columns 11-19 (W_10-W_18): Mean daily rainfall (mm/mm²) over each of the nine periods
Column 20 (YIELD): Annual maize yield (t/ha)
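The column layout above can be read with a few lines of Python. This is a minimal sketch that assumes each file has a header row using exactly the names YEAR, W_1 to W_18 and YIELD; the function name `read_records` is hypothetical, and the delimiter should be set to a tab for the tab-separated files.

```python
import csv
import io

TEMPERATURE_COLS = [f"W_{i}" for i in range(1, 10)]   # W_1 .. W_9
RAINFALL_COLS = [f"W_{i}" for i in range(10, 19)]     # W_10 .. W_18


def read_records(lines, delimiter=","):
    """Parse rows into dicts with temperature, rainfall and yield fields.

    Assumes a header row with the column names listed above.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    records = []
    for row in reader:
        records.append({
            "year": row["YEAR"],  # kept as text; its numeric format is not assumed
            "temperature": [float(row[c]) for c in TEMPERATURE_COLS],
            "rainfall": [float(row[c]) for c in RAINFALL_COLS],
            "yield": float(row["YIELD"]),
        })
    return records


# Usage on a made-up single row (values are placeholders, not real data).
header = ",".join(["YEAR"] + TEMPERATURE_COLS + RAINFALL_COLS + ["YIELD"])
row = ",".join(["1"] + [str(float(i)) for i in range(1, 19)] + ["9.5"])
records = read_records(io.StringIO(header + "\n" + row + "\n"))
```

For a real file, an open file handle (for example from `open(path, newline="")`) can be passed in place of the `io.StringIO` object.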

Julia, 1

A tutorial on how to generate a similar dataset is available at https://dmetivie.github.io/StochasticWeatherGenerators.jl/dev/examples/tuto_add_station_variable/

Identifier
DOI https://doi.org/10.57745/C3FNBY
Metadata Access https://entrepot.recherche.data.gouv.fr/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.57745/C3FNBY
Provenance
Creator Métivier, David; Allouche, Michael; Saux, Marine; Gobet, Emmanuel; Pachebat, Jean
Publisher Recherche Data Gouv
Contributor Métivier, David; Entrepôt-Catalogue Recherche Data Gouv
Publication Year 2025
Rights etalab 2.0; info:eu-repo/semantics/openAccess; https://spdx.org/licenses/etalab-2.0.html
OpenAccess true
Contact Métivier, David (INRAE)
Representation
Resource Type Dataset
Format text/comma-separated-values; text/tab-separated-values
Size 35721192; 3443984; 35807198; 3458521; 35652151; 3447731; 35843381; 3459653
Version 1.2
Discipline Geosciences; Mathematics; Earth and Environmental Science; Environmental Research; Natural Sciences