Blog

  • If you can’t measure data privacy, you can’t manage data privacy.

    Data privacy is important to consumers and a material compliance obligation for the enterprises that serve them, but it is difficult to achieve, and to prove, due to the absence of clear standards. Pew Research reports that over 80% of Americans highly value their data privacy and don’t trust enterprises to properly protect it. “Privacy” is defined…


  • Predicting rare instances for AI/ML in a GDPR world

    Success with AI/ML depends first and foremost on your data. “Garbage in, garbage out” is a refrain both familiar and accurate. The integrity of your data pipeline will be the biggest determinant of whether you can produce impactful AI/ML products that solve critical problems, the kind that face us in every industry to address…


  • Ethical AI begins with trusted data that has fidelity and privacy

    As I prepared to speak with my fellow ethical AI panelists for the April 24, 2023 EAIGG discussion “New Wave of Data Centric AI Startups”, I thought further about how to articulate the mandate and opportunity presented by AI to solve critical problems faced by humanity. We are in the infancy of enterprise AI. The…


  • HIPAA fails both patients and product innovators

    The HIPAA privacy rule fails to provide patient privacy or data value for enterprises that want to leverage sensitive data to innovate and improve patient care. A patient’s health information is vulnerable to disclosure, through accident or bad actor, even if an organization adheres to HIPAA. HIPAA should have established a much higher standard of…


  • Synthetic data is a hammer. Not everything is a nail.

    Synthetic data was designed to generate data, not privacy. It is popular in the MLOps space, but it’s often used for the wrong applications. It is useful for amplifying data where you don’t have enough original data, but it’s the wrong method when applied to sensitive data that must be kept private, especially where GDPR requirements are…


  • The Sorry State of Data Anonymization Performance Metrics

    The lack of data anonymization standards forces customers to flip a coin when selecting solutions to protect their sensitive data. Many new data anonymization technologies have been launched to serve data scientists and ML engineers, but without a standard measure of privacy and utility, customers can’t make objective choices about which technology to use.…