Dataset - B2FIND

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Cr...

Themis-CodeRewardBench is a code-specific reward model evaluation benchmark comprising ~8.9k diverse code preference pairs across eight programming languages and five quality...

Reward Modeling for Scientific Writing Evaluation

The components of this dataset are used in the experiments of the paper "Reward Modeling for Scientific Writing Evaluation". Please see README.md for more information.

2 datasets found