When learning a movement based on binary success information, one is more variable
following failure than following success. Theoretically, the additional variability after
failure might reflect exploration of possibilities to obtain success. When average
behavior is changing (as in learning), variability can be estimated from differences
between successive movements.
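As a minimal sketch of this idea (the function and variable names are our assumptions, not taken from the paper): for independent trial-to-trial noise, the variance of successive differences equals twice the noise variance, so halving it recovers the variability even while the mean drifts.

    import numpy as np

    def variability_from_differences(x):
        """Estimate trial-to-trial variability from successive differences.

        For independent noise e[t] around a (possibly drifting) mean,
        x[t+1] - x[t] = drift + e[t+1] - e[t], so with slow drift
        var(diff) ~= 2 * var(e); halving the variance of differences
        recovers var(e) without inflation by the learning curve itself.
        """
        return np.var(np.diff(x)) / 2.0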
Can one estimate exploration reliably from such trial-to-trial changes when studying
reward-based motor learning? To answer this
question, we tried to reconstruct the exploration underlying learning as described by
four existing reward-based motor learning models. We simulated learning for various
learner and task characteristics. If we simply determined the additional change after
failure, estimates of exploration were sensitive to learner and task characteristics.
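One plausible reading of this simple approach (our sketch, not the authors' code) is to treat the variance of changes after success as a motor-noise baseline and subtract it from the variance of changes after failure:

    import numpy as np

    def naive_exploration_variance(x, success):
        """Naive estimate of exploration: additional change after failure.

        x       : movement outcome on each trial (1-D numpy array)
        success : boolean numpy array, True where the trial was rewarded
        """
        d = np.diff(x)            # change x[t+1] - x[t] following trial t
        prev = success[:-1]       # outcome of trial t
        # Variance after failure minus the success baseline.
        return np.var(d[~prev]) - np.var(d[prev])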
We identified two pitfalls in quantifying exploration based on trial-to-trial changes. Firstly,
performance-dependent feedback can cause correlated samples of motor noise and
exploration on successful trials, which biases exploration estimates. Secondly, the trial
relative to which trial-to-trial change is calculated may also contain exploration, which
causes underestimation. As a solution, we developed the additional trial-to-trial change
(ATTC) method. By moving the reference trial one trial back and subtracting trial-to-trial
changes following specific sequences of trial outcomes, exploration can be estimated
reliably for the three models that explore based only on the outcome of the previous
trial.
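A minimal sketch of this estimator under our reading of the description above (the exact outcome sequences and all names are our assumptions, not the authors' published code): use two-back changes x[t+1] - x[t-1], require success on the reference trial t-1 so that it carries no exploration, and subtract the variance after a success-success sequence from that after a success-failure sequence.

    import numpy as np

    def attc_exploration_variance(x, success):
        """ATTC-style estimate of exploration variance (our interpretation).

        x       : movement outcome on each trial (1-D numpy array)
        success : boolean numpy array, True where the trial was rewarded
        """
        d2 = x[2:] - x[:-2]       # change relative to reference trial t-1
        ref_ok = success[:-2]     # success on the reference trial t-1
        curr = success[1:-1]      # outcome of trial t
        var_sf = np.var(d2[ref_ok & ~curr])  # success at t-1, failure at t
        var_ss = np.var(d2[ref_ok & curr])   # success at t-1, success at t
        return var_sf - var_ss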
Since ATTC estimates are based on a selection of trial sequences, the method requires
many trials. In conclusion, if exploration is a binary function of the previous trial's
outcome, the ATTC method allows for a model-free quantification of exploration.
See the OSF page: https://osf.io/x7hp9/