Microsoft and the OpenDP Initiative at Harvard have collaborated on a new platform that will offer differential privacy for large datasets. Differential privacy allows researchers to analyze datasets without having important data withheld, while also preserving the privacy of that data, according to Microsoft.
“Differential privacy, the heart of today’s landmark milestone, was invented at Microsoft Research a mere 15 years ago. In the life cycle of transformative research, the field is still young. I am excited to see what this platform will make possible,” said Cynthia Dwork, Gordon McKay professor of CS at Harvard and Distinguished Scientist at Microsoft.
Microsoft says data analysis is necessary to come up with solutions for the major issues facing us today, such as climate change, racial inequality, and COVID-19. However, according to John Kahan, chief data analytics officer at Microsoft, the deeper a researcher goes into a dataset, the more likely they are to reveal personally identifiable information (PII).
Microsoft and Harvard’s differential privacy platform uses two mechanisms for protecting PII in datasets.
First, it adds a small amount of statistical noise to each result, which masks the contribution of any individual data point and protects privacy without rendering the dataset useless.
Second, it calculates the amount of information revealed by each query and deducts that from an overall privacy budget. Once further queries might compromise personal privacy, it halts them.
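To make the two mechanisms concrete, here is a minimal Python sketch. It is not the platform's API: the `PrivateCounter` class, its method names, and the epsilon values are hypothetical, and the choice of Laplace noise on a count query is an assumption (a standard textbook technique in differential privacy, not something the announcement specifies). Noise is added to each query result, and each query's epsilon is deducted from a fixed budget until further queries are refused.

```python
import random


class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


class PrivateCounter:
    """Hypothetical wrapper that answers noisy count queries under a budget."""

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)
        self.remaining = total_epsilon  # overall privacy budget (assumed unit: epsilon)
        self._rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two independent exponentials with mean `scale`
        # follows a Laplace distribution centered at 0 with that scale.
        return self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)

    def noisy_count(self, predicate, epsilon):
        """Return a noisy count of matching records, spending `epsilon` of the budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            # Mechanism two: halt queries once the budget cannot cover them.
            raise BudgetExhausted("privacy budget exhausted; query refused")
        self.remaining -= epsilon
        true_count = sum(1 for record in self._records if predicate(record))
        # Mechanism one: a count changes by at most 1 when one person is added
        # or removed (sensitivity 1), so the Laplace noise scale is 1 / epsilon.
        return true_count + self._laplace(1.0 / epsilon)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 29, 38]  # made-up illustrative data
    counter = PrivateCounter(ages, total_epsilon=1.0)
    print(counter.noisy_count(lambda age: age >= 40, epsilon=0.5))
    print(counter.noisy_count(lambda age: age < 30, epsilon=0.5))
    try:
        counter.noisy_count(lambda age: True, epsilon=0.5)
    except BudgetExhausted as err:
        print(err)
```

In this sketch a smaller epsilon per query means more noise but a slower drain on the budget, which is the trade-off between accuracy and privacy that the budget is meant to make explicit.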
Because PII in the datasets is masked, researchers are no longer blocked from valuable data, and they can keep using it in their research without being able to gather PII on the sources of the data. This also lets researchers share their findings more safely and easily, without worrying about exposing PII.
“The resulting insights will have an enormous and lasting impact and will open new avenues of research that allow us to develop creative solutions for some of the most pressing problems we currently face,” Kahan wrote in a post.