Training on Plausible Counterfactuals Removes Spurious Correlations

We introduce a training paradigm based on plausible counterfactual explanations (p-CFEs) that matches the accuracy of standard training while reducing the model's reliance on spurious correlations.