How we measure privacy, data utility, and accuracy
No universally accepted privacy or accuracy metric currently exists for comparing privacy-enhancing technologies (PETs) or anonymization technologies. We therefore compare our products with our competitors' using their published standards and scientific methods, as well as our own method, which we consider more accurate and objective and which is based on generally accepted scientific methods. Note below how tomtA.ai compares with synthetic data solutions. Synthetic data is an amplification technology, not a privacy technology, and it forces the enterprise to choose between accuracy and privacy rather than having both.
Privacy is our competitive advantage, and we exceed the compliance requirements of GDPR by providing an enhanced, AI-based differential privacy mathematical guarantee of privacy. Our True Atomic Privacy displays selected material privacy metrics that indicate the level of privacy you achieve. We use the most accepted scientific method for data privacy, developed by MIT professor Robert M. Fano (Fano). The Fano method, and the Naïve Privacy Measure that we derived from it, measure the amount of sensitive information leaked.
In addition to the Naïve Privacy Measure, tomtA.ai calculates the Epsilon Loss Bound, which shows how well one can anonymize sensitive data through noise addition and suppression.
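To give a feel for how an epsilon parameter relates to noise addition, here is a minimal sketch of the textbook Laplace mechanism from differential privacy. It is a generic illustration, not tomtA.ai's Epsilon Loss Bound calculation; the function names, the counting query, and the sensitivity of 1 are all assumptions made for the example.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count: float, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    Assumption: adding or removing one individual changes the count
    by at most `sensitivity` (1 for a simple counting query).
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(42)
# Smaller epsilon means a stronger privacy guarantee and noisier answers.
strict = [noisy_count(100, epsilon=0.1, rng=rng) for _ in range(5000)]
loose = [noisy_count(100, epsilon=1.0, rng=rng) for _ in range(5000)]

def mean_abs_error(samples, truth=100):
    return sum(abs(s - truth) for s in samples) / len(samples)
```

In this sketch the average error grows as epsilon shrinks, which is the usual trade-off between privacy and utility that an epsilon bound quantifies.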
To measure data utility, we use the same scientific method as other PETs, including synthetic data solutions such as Gretel and tonic.ai: the Jensen-Shannon (JS) divergence. It produces a score that compares the accuracy of two ML models, one trained on the original data and the other trained on our anonymized data.
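For readers unfamiliar with the JS divergence, the sketch below computes it for two discrete probability distributions, which is the form in which the divergence is defined. Which distributions are compared in practice (for example, model outputs or column value frequencies) is not specified here, so the inputs and function names are assumptions for illustration only.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical distributions over the same three outcomes.
original = [0.5, 0.3, 0.2]
anonymized = [0.45, 0.35, 0.2]
score = js_divergence(original, anonymized)
```

With base-2 logarithms the score lies between 0 (identical distributions) and 1 (distributions with no overlap), and it is symmetric in its two arguments.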
We measure the classification accuracy of any trial, like most competitive PETs, as follows. First, we take a random 50% subset of the original labeled dataset for training and keep the remaining subset for testing. We train a classifier, RFC O, on the original training subset and validate it against the testing subset, which produces the percentage original accuracy. We then train a second classifier, RFC A, on supersampled anonymous data produced from the training subset and validate it against the same testing subset, which produces the percentage anonymous accuracy. From these we calculate accuracy difference = original accuracy minus anonymous accuracy, and classification accuracy = 100 minus accuracy difference. Finally, we report the average classification accuracy over all N trials.
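The trial procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: `train` and `anonymize` are hypothetical callables supplied by the caller, and a tiny nearest-centroid classifier stands in for the RFC classifiers so the example runs with the Python standard library alone.

```python
import random
from statistics import mean

def accuracy_pct(model, rows):
    """Percentage of (features, label) rows the model labels correctly."""
    return 100.0 * sum(model(x) == y for x, y in rows) / len(rows)

def classification_accuracy(data, train, anonymize, n_trials=10, seed=0):
    """Average of 100 - (original accuracy - anonymous accuracy) over n_trials."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        rows = list(data)
        rng.shuffle(rows)                      # random 50/50 split per trial
        half = len(rows) // 2
        train_rows, test_rows = rows[:half], rows[half:]
        original_acc = accuracy_pct(train(train_rows), test_rows)
        anonymous_acc = accuracy_pct(train(anonymize(train_rows)), test_rows)
        scores.append(100.0 - (original_acc - anonymous_acc))
    return mean(scores)

def train_nearest_centroid(rows):
    """Stand-in classifier (assumption): predict the label of the closest class mean."""
    sums, counts = {}, {}
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return predict

# Hypothetical two-class data set; an identity "anonymizer" changes nothing.
demo_rng = random.Random(1)
demo = [((demo_rng.gauss(3.0 * label, 1.0), demo_rng.gauss(3.0 * label, 1.0)), label)
        for label in (0, 1) for _ in range(100)]
identity_score = classification_accuracy(demo, train_nearest_centroid,
                                         anonymize=lambda rows: rows)
```

With the identity anonymizer the two classifiers are identical, so the score is exactly 100. A real anonymizer would typically score somewhat lower, and under this formula the score can exceed 100 when the classifier trained on anonymized data happens to do better on the test subset.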
tomtA.ai trust is in our outcomes
tomtA.ai’s True Atomic Privacy provides utility, accuracy, and privacy metrics for every data set transformed into Data as a Safe Asset. Whatever the privacy metric produced, enterprises want clarity on the likelihood that Data as a Safe Asset can be re-identified or, more broadly, whether one could devise a method to identify the original sensitive data set. Identifying the original sensitive data set by exhaustive search is practically impossible.
Impossibility of sensitive data re-identification
Assume a sensitive table with m binary-valued attributes and n rows of individuals, e.g., five columns and 100 rows. (The binary assumption simply makes the analysis easier.) Given an n × m anonymized table of this sensitive table, the probability of hitting the original data set of 5 columns and 100 rows is around 0.5 × 10^-233. This is similar to the probability of guessing a binary password that is 777 characters long.
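As a rough sanity check on the password comparison, the sketch below computes the chance of guessing a uniformly random binary string in a single try and compares orders of magnitude. The function name and the uniform single-guess model are assumptions for illustration; this section does not spell out how the quoted probability was derived.

```python
import math

def log10_single_guess(bits: int) -> float:
    """log10 of the probability of guessing a uniform random `bits`-bit string in one try."""
    return -bits * math.log10(2)

password_log10 = log10_single_guess(777)      # about -233.9, i.e. roughly 1.3 x 10^-234
quoted_log10 = math.log10(0.5) - 233          # 0.5 x 10^-233 -> about -233.3

# For reference: one uniform guess over the n * m = 500 bits of the example table.
table_log10 = log10_single_guess(5 * 100)     # about -150.5, i.e. roughly 3 x 10^-151
```

Under these assumptions the 777-character binary password and the quoted figure agree to within a factor of about four. A single uniform guess over only the 500 bits of the example table would succeed with a probability of roughly 3 × 10^-151, so the quoted figure presumably reflects a larger search space than the raw table bits.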
Data transformed by tomtA.ai True Atomic Privacy is safe to share.