Cybersecurity Insights

a NIST Blog

Summation and Average Queries: Detecting Trends in Your Data

In our last Differential Privacy Blog, we discussed how to determine how many people drink pumpkin spice lattes in a given time period without learning their identifying information. But say, for example, you would like to know the total amount spent on pumpkin spice lattes this year, or the average price of a pumpkin spice latte since 2010. You'd like to detect these trends in data without being able to learn identifying information about specific customers to protect their privacy. To do this, you can use summation and average queries answered with differential privacy.

In this post, we will move beyond counting queries and dive into answering summation and average queries with differential privacy. Starting with the basics: in SQL, summation and average queries are specified using the SUM and AVG aggregation functions:

SELECT SUM(price) FROM PumpkinSpiceLatteSales WHERE year = 2020SELECT AVG(price) FROM PumpkinSpiceLatteSales WHERE year > 2010

In Pandas, these queries can be expressed using the sum() and mean() functions, respectively. But how would we run these queries while also guaranteeing differential privacy?

If you have questions or problems with the subscription service, please contact subscriberhelp.govdelivery.com.
Technical questions? Contact inquiries@nist.gov. (301) 975-NIST (6478).

This service is provided to you at no charge by National Institute of Standards and Technology (NIST). 100 Bureau Drive, Stop 1070 · Gaithersburg, MD 20899 · 301-975-6478

News Information and Media Site

Thursday, December 17, 2020

Summation and Average Queries: Detecting Trends in Your Data

Cybersecurity Insights

Summation and Average Queries: Detecting Trends in Your Data

Read More

Connect with us

No comments:

Page List

Blog Archive

Report Abuse

About Me

Search This Blog

Patent Examination Data Systems (PEDS) retiring soon