Hackathon Overview
GenHack 3 is a data challenge organized by École Polytechnique in 2024.
The task was to construct generative models for predicting maize crop yield distributions conditional on temperatures and rainfall across multiple locations simultaneously.
The hackathon was divided into two rounds, each with a different level of weather conditioning, with the goal of accurately reproducing the effects of weather on final crop yield. A detailed task description is available upon request.
Dataset Creation
The dataset was generated using a Stochastic Weather Generator (SWG) and a crop model. The SWG was trained on data from four French weather stations.
This weather station data was downloaded through the INRAE CLIMATIK platform, managed by the AgroClim laboratory of Avignon, France (site available in French).
The stations and their identification numbers are as follows: Montreuil-Bellay (49215002), Mons-en-Chaussée (80557001), Saint-Martin-de-Hinx (40272002), and Saint-Gènes-Champanelle (63345002).
We trained a daily multi-site and multivariate SWG using the following weather variables: daily minimum and maximum temperatures, precipitation, solar irradiance, and Penman evapotranspiration.
The SWG is an extension of the model described in the paper "Interpretable Seasonal Hidden Markov Model for Spatio-temporal Stochastic Rain Generation". The full training details are available in the tutorial of the Julia package StochasticWeatherGenerators.jl.
The SWG generated N years of weather data, which was input into the STICS crop model for maize (see the STICS website) to produce N annual crop yield values.
The parameters used in the STICS model are also described in the tutorial. The most important modification to the default parameters is that no irrigation was provided, to highlight the hydric stress on the plant.
Weather Data Aggregation
Daily maximum temperatures and average rainfall were aggregated into nine periods spanning April 27 to October 27 (the maize growth period):
Period 1: April 27 - May 16
Period 2: May 17 - June 5
Period 3: June 6 - June 25
Period 4: June 26 - July 15
Period 5: July 16 - August 4
Period 6: August 5 - August 24
Period 7: August 25 - September 13
Period 8: September 14 - October 3
Period 9: October 4 - October 27
Details on reproducing these aggregated variables are explained in the tutorial section "Sensitivity of maize on rainfall during key growth periods".
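As an illustration of the period definitions above, here is a minimal Python sketch that averages a daily series over the nine periods. It is not the tutorial's code (the tutorial uses Julia); the function names `period_bounds` and `aggregate_by_period` and the `{date: value}` input layout are assumptions made for this example, and the arithmetic mean is used because the dataset columns are described as means over each period.

```python
from datetime import date, timedelta

# Period start dates (month, day) as listed above; the season ends on October 27.
PERIOD_STARTS = [(4, 27), (5, 17), (6, 6), (6, 26), (7, 16),
                 (8, 5), (8, 25), (9, 14), (10, 4)]
SEASON_END = (10, 27)


def period_bounds(year):
    """Return the nine (start, end) date pairs, both inclusive, for `year`."""
    starts = [date(year, m, d) for m, d in PERIOD_STARTS]
    # Each period ends the day before the next one starts.
    ends = [s - timedelta(days=1) for s in starts[1:]] + [date(year, *SEASON_END)]
    return list(zip(starts, ends))


def aggregate_by_period(daily, year):
    """Average a {date: value} mapping of daily values over each period."""
    means = []
    for start, end in period_bounds(year):
        n_days = (end - start).days + 1
        values = [daily[start + timedelta(days=k)] for k in range(n_days)]
        means.append(sum(values) / n_days)
    return means
```

With these boundaries, the first eight periods each cover 20 days and the last one covers 24 days, for a total of 184 days per season.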
Participants were provided with these aggregated weather variables and the resulting yield data. The objective was to build a generative model capable of generating yield values conditionally on specific weather conditions (e.g., high or low rainfall).
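To make "generating yield values conditionally on weather" concrete, the sketch below shows one very simple baseline: resampling the yields of the k training years whose weather is closest to a requested condition. This is only an illustrative assumption, not the method used by the organizers or by any participant, and the function name `knn_conditional_sample` and its parameters are hypothetical.

```python
import math
import random


def knn_conditional_sample(train_x, train_y, condition, k=5, n_samples=100, rng=None):
    """Draw yields from the k training years whose weather is closest to `condition`.

    train_x:   list of weather vectors (one per year)
    train_y:   list of yields, aligned with train_x
    condition: weather vector to condition on (same length as each train_x entry)
    """
    rng = rng or random.Random()
    # Euclidean distance between each training weather vector and the condition.
    distances = [math.dist(x, condition) for x in train_x]
    # Indices of the k closest training years.
    nearest = sorted(range(len(train_x)), key=distances.__getitem__)[:k]
    # Resample (with replacement) the yields observed in those years.
    return [train_y[rng.choice(nearest)] for _ in range(n_samples)]


# Toy data, made up for illustration: one weather feature, yield equal to that feature.
toy_x = [[float(i)] for i in range(10)]
toy_y = [float(i) for i in range(10)]
samples = knn_conditional_sample(toy_x, toy_y, [0.0], k=3, n_samples=20,
                                 rng=random.Random(0))
```

Such a resampler can only return yields already present in the training set, so it serves as a reference point rather than a full generative model.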
Dataset Structure
The dataset provided to participants included 104 realizations, with a separate validation dataset of 105 realizations used for evaluation.
Column 1 (YEAR): Year number
Columns 2-10 (W_1-W_9): Mean daily temperature (°C) over each of the nine periods
Columns 11-19 (W_10-W_18): Mean daily rainfall (mm/mm²) over each of the nine periods
Column 20 (YIELD): Annual maize yield (t/ha)
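The column layout above can be read into separate temperature, rainfall, and yield fields as in the following sketch. It assumes a comma-separated file whose header uses the column names listed above; the actual file name and delimiter are not stated here, and the two sample rows are made up purely to make the example runnable.

```python
import csv
import io


def parse_dataset(text):
    """Parse CSV text in the 20-column layout into a list of per-year records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "year": int(row["YEAR"]),
            # W_1..W_9: temperature over the nine periods.
            "temperature": [float(row[f"W_{i}"]) for i in range(1, 10)],
            # W_10..W_18: rainfall over the nine periods.
            "rainfall": [float(row[f"W_{i}"]) for i in range(10, 19)],
            "yield": float(row["YIELD"]),
        })
    return records


# Build a tiny in-memory sample with illustrative (not real) values.
header = ["YEAR"] + [f"W_{i}" for i in range(1, 19)] + ["YIELD"]
rows = [
    [1] + [20.0 + i for i in range(9)] + [1.0 + 0.5 * i for i in range(9)] + [9.5],
    [2] + [21.0 + i for i in range(9)] + [0.5 + 0.5 * i for i in range(9)] + [7.25],
]
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)

records = parse_dataset(buffer.getvalue())
```

To read a real file, the same function can be applied to its contents once the delimiter and header names have been checked against the file.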
A tutorial on how to generate a similar dataset is available at https://dmetivie.github.io/StochasticWeatherGenerators.jl/dev/examples/tuto_add_station_variable/