Tag: Training Data

F1 Score and Beyond — When You Need One Metric to Rule Them All

Sixth in a Series: Mastering data quality for safe and scalable AI Saibal Banerjee, PhD and Steven Atneosen, JD Introduction AI practitioners face a paradox: the more complex the model, the more elusive its reliability becomes. Precision and recall are essential metrics, but each tells only part of the story. When both false positives and…

October 29, 2025
Proving the Tradeoff. A New Way to Think About Precision and Recall.

Fifth in a Series: Mastering data quality for safe and scalable AI Saibal Banerjee, PhD and Steven Atneosen, JD Introduction Every AI practitioner knows the uneasy balance: boosting precision usually costs recall, and maximizing recall often tanks precision. Yet in high-stakes workflows – healthcare diagnostics, fraud detection, border security, or financial approvals – “intuition” about…

September 12, 2025
Measure twice, cut once. Weighted metrics matter.

Fourth in a Series: Mastering data quality for safe and scalable AI Saibal Banerjee, PhD and Steven Atneosen, JD Introduction AI practitioners often struggle with assessing the performance of multi-class classification models. When your model has to recognize multiple categories, especially if the populations of these categories are imbalanced – i.e., they vary widely –…

August 20, 2025
If it walks like a duck, talks like a duck, it’s a pig… The Confusion Matrix is a simple tool for model performance

Third in a Series: Mastering data quality for safe and scalable AI Saibal Banerjee, PhD and Steven Atneosen, JD Introduction If there’s one tool every data scientist should master, it’s the confusion matrix. Despite its simplicity, it offers unparalleled insight into model performance, including class-wise behaviors, average trends, systemic biases, and types of errors. If…

July 9, 2025
Precision vs. Recall — Choosing the Right Metric for the Job

Second in a Series: Mastering Data Quality for Safe & Scalable AI Saibal Banerjee, PhD and Steven Atneosen, JD Introduction In order to master data quality for safe and scalable AI, precision and recall go beyond the simplicity of accuracy by helping quantify the nature of classification errors. These metrics are essential for professionals aiming to…

May 22, 2025
Accuracy Isn’t Enough: Why You Need to Rethink AI Model Metrics

First in a Series: Mastering Data Quality for Safe & Scalable AI Saibal Banerjee, PhD and Steven Atneosen, JD Introduction Accuracy is the most commonly cited metric in machine learning. But if your dataset is imbalanced, accuracy can be dangerously misleading. For professionals working with sensitive PII, this misdirection can be catastrophic. Understanding how your…

May 15, 2025
If you can’t measure data privacy, you can’t manage data privacy.

Data privacy is important to consumers and a material compliance obligation for the enterprises that serve them, but it is difficult to achieve – and prove – due to the absence of clear standards. Pew Research reports that over 80% of Americans highly value their data privacy and don’t trust enterprises to properly protect it. “Privacy” is defined…

December 7, 2023