Tag: Training Data

  • Proving the Tradeoff. A New Way to Think About Precision and Recall.

    Fifth in a Series: Mastering data quality for safe and scalable AI. Saibal Banerjee, PhD and Steven Atneosen, JD. Introduction: Every AI practitioner knows the uneasy balance: boosting precision usually costs recall, and maximizing recall often tanks precision. Yet in high-stakes workflows – healthcare diagnostics, fraud detection, border security, or financial approvals – “intuition” about…

  • Measure twice, cut once. Weighted metrics matter.

    Fourth in a Series: Mastering data quality for safe and scalable AI. Saibal Banerjee, PhD and Steven Atneosen, JD. Introduction: AI practitioners often struggle to assess the performance of multi-class classification models. When your model has to recognize multiple categories, especially if the populations of these categories are imbalanced – i.e., they vary widely –…

  • If it walks like a duck, talks like a duck, it’s a pig… The Confusion Matrix is a simple tool for model performance

    Third in a Series: Mastering data quality for safe and scalable AI. Saibal Banerjee, PhD and Steven Atneosen, JD. Introduction: If there’s one tool every data scientist should master, it’s the confusion matrix. Despite its simplicity, it offers unparalleled insight into model performance, including class-wise behaviors, average trends, systemic biases, and types of errors. If…

  • Precision vs. Recall — Choosing the Right Metric for the Job

    Second in a Series: Mastering Data Quality for Safe & Scalable AI. Saibal Banerjee, PhD and Steven Atneosen, JD. Introduction: Precision and recall go beyond the simplicity of accuracy by helping quantify the nature of classification errors, which is key to mastering data quality for safe and scalable AI. These metrics are essential for professionals aiming to…

  • Accuracy Isn’t Enough: Why You Need to Rethink AI Model Metrics

    First in a Series: Mastering Data Quality for Safe & Scalable AI. Saibal Banerjee, PhD and Steven Atneosen, JD. Introduction: Accuracy is the most commonly cited metric in machine learning. But if your dataset is imbalanced, accuracy can be dangerously misleading. For professionals working with sensitive PII, this misdirection can be catastrophic. Understanding how your…

  • If you can’t measure data privacy, you can’t manage data privacy.

    Data privacy is important to consumers and a material compliance obligation for the enterprises that serve them, but it is difficult to achieve – and prove – because clear standards are absent. Pew Research reports that over 80% of Americans highly value their data privacy and don’t trust enterprises to properly protect it. “Privacy” is defined…