This repository contains five datasets used in the analysis of parliamentary speech and scientific claims. Below is a description of each file and its role in the research pipeline:
DB_Polgeo_Degurba.csv
This is the main dataset compiled for the study. It includes the full set of observations and variables used in the primary analyses.
DB_Polgeo_Degurba_coded.csv
This dataset includes the initial labels generated through embedding-based classification.
DF_Train_2.csv and DF_Test_2.csv
These files correspond to training and test datasets automatically generated using Mixtral. They serve as a synthetic gold standard for model validation.
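As a quick sanity check after downloading, the files listed above can be inspected with a short script. The following is only a minimal sketch, not part of the study's pipeline: it assumes the files are comma-delimited, UTF-8 encoded, have a header row, and sit in a single directory. The function names `summarize_csv` and `summarize_repository` are hypothetical helpers introduced here for illustration.

```python
import csv
from pathlib import Path

# File names as listed in this README; the directory layout is an assumption.
DATASET_FILES = [
    "DB_Polgeo_Degurba.csv",
    "DB_Polgeo_Degurba_coded.csv",
    "DF_Train_2.csv",
    "DF_Test_2.csv",
]


def summarize_csv(path):
    """Return (row_count, column_names) for one CSV file.

    Assumes a comma delimiter, UTF-8 encoding, and a header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)  # counts data rows, not the header
        columns = list(reader.fieldnames or [])
    return row_count, columns


def summarize_repository(data_dir="."):
    """Summarize every listed dataset found in data_dir; skip missing files."""
    summaries = {}
    for name in DATASET_FILES:
        path = Path(data_dir) / name
        if path.is_file():
            summaries[name] = summarize_csv(path)
    return summaries


if __name__ == "__main__":
    for name, (rows, cols) in summarize_repository().items():
        print(f"{name}: {rows} rows, {len(cols)} columns")
```

Run from the directory that holds the CSV files, the script prints one line per file it finds. If the files use a different delimiter or encoding, the `open` and `csv.DictReader` arguments would need adjusting.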
Additional Information:
Please note that the scripts and code used to process and analyze these data are still being revised and prepared for replication. They will be uploaded once the replication pipeline is finalized and verified, and this note will be updated at that point. In the meantime, they are available upon request (https://github.com/PSUBRodilla/RepMat_-Stereotypes).