Why do models trained on anonymized data score worse?

Classic anonymization techniques all work by manipulating the original data in order to make it harder to trace records back to individuals. In doing so, they inevitably destroy information: the more you anonymize, the better your data is protected, but the more of its value is lost. This is especially damaging for AI and modeling tasks, where predictive power is essential, because poor-quality data yields poor insights from the model. SAS demonstrated this empirically: models trained on classically anonymized data performed by far the worst, with an area under the curve (AUC*) close to 0.5, i.e. no better than random guessing.
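To make the AUC benchmark concrete, here is a small illustrative sketch (not the SAS study itself, and the data is synthetic): AUC is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, so uninformative scores land near 0.5 while informative scores land well above it.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a random positive outranks a random negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

random.seed(0)
labels = [random.randint(0, 1) for _ in range(2000)]

# Uninformative scores, analogous to a model whose training data lost
# its signal to anonymization: AUC hovers around 0.5 (random guessing).
noise = [random.random() for _ in labels]

# Informative scores (label signal plus noise): AUC is clearly higher.
signal = [y + random.gauss(0, 0.8) for y in labels]

print(round(auc(labels, noise), 2))   # close to 0.5
print(round(auc(labels, signal), 2))  # well above 0.5
```

The point of the comparison: an AUC near 0.5 means the model's scores carry essentially no information about the outcome, which is exactly the regime the anonymized-data models ended up in.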