Summation and Average Queries: Detecting Trends in Your Data

In our last Differential Privacy Blog post, we discussed how to determine how many people drink pumpkin spice lattes in a given time period without learning their identifying information. But say, for example, you would like to know the total amount spent on pumpkin spice lattes this year, or the average price of a pumpkin spice latte since 2010. You'd like to detect these trends in the data without learning identifying information about specific customers, so that their privacy is protected. To do this, you can use summation and average queries answered with differential privacy.

In this post, we will move beyond counting queries and dive into answering summation and average queries with differential privacy.

Starting with the basics: in SQL, summation and average queries are specified using the SUM and AVG aggregation functions:

    SELECT SUM(price) FROM PumpkinSpiceLatteSales WHERE year = 2020
    SELECT AVG(price) FROM PumpkinSpiceLatteSales WHERE year > 2010

In Pandas, these queries can be expressed using the sum() and mean() functions, respectively. But how would we run these queries while also guaranteeing differential privacy?
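Before turning to that question, here is a minimal sketch of what the non-private Pandas versions of the two queries above might look like. The DataFrame name `sales`, its column names, and the toy rows are assumptions made up for illustration; they stand in for the PumpkinSpiceLatteSales table and are not real data.

```python
import pandas as pd

# Hypothetical toy data standing in for the PumpkinSpiceLatteSales table.
# The column names "year" and "price" mirror the SQL queries above.
sales = pd.DataFrame({
    "year": [2009, 2015, 2020, 2020],
    "price": [3.50, 4.25, 4.75, 5.25],
})

# SELECT SUM(price) FROM PumpkinSpiceLatteSales WHERE year = 2020
total_2020 = sales[sales["year"] == 2020]["price"].sum()

# SELECT AVG(price) FROM PumpkinSpiceLatteSales WHERE year > 2010
avg_after_2010 = sales[sales["year"] > 2010]["price"].mean()

print(total_2020, avg_after_2010)
```

Note that these are exact answers computed directly from the data, so they offer no privacy guarantee on their own.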