In Apple’s latest Machine Learning Journal entry, the company details how it collects data from its customers while addressing privacy concerns by using “local,” rather than central, differential privacy.
Apple claims that local differential privacy allows the tech giant to harvest big data from iPads, iPhones, and Macs while maintaining user anonymity because the process randomizes data on user devices prior to uploading it to the company server.
This means Apple, or whoever else accesses the server, never sees raw data from a specific device. Central differential privacy, on the other hand, requires collecting raw data on the server itself.
Apple anonymizes data on user devices by adding random noise to it before uploading it en masse to its cloud server. On the server, the noise averages out, allowing Apple to glean insights (such as identifying popular emoji and slang) that it can use to improve the user experience. This means Apple can improve its products and services through data collection without tying the data directly back to you.
“When many people submit data, the noise that has been added averages out and meaningful information emerges,” according to Apple.
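Apple’s production system relies on more sophisticated machinery (its paper describes count-mean-sketch and Hadamard-based algorithms), but the averaging-out idea can be illustrated with the classic randomized-response technique. The sketch below is not Apple’s code; the 75% truth-telling probability and the emoji-usage scenario are assumptions chosen purely for illustration.

```python
import random

def randomize(truth: bool, p: float = 0.75) -> bool:
    """Local noise: report the true bit with probability p,
    otherwise report a fair coin flip. The server never sees
    any individual's raw answer."""
    if random.random() < p:
        return truth
    return random.random() < 0.5

def estimate_true_rate(reports, p: float = 0.75) -> float:
    """Server-side aggregation: the expected reported rate is
    p * true_rate + (1 - p) * 0.5, so invert that to recover
    the population-level statistic."""
    reported = sum(reports) / len(reports)
    return (reported - (1 - p) * 0.5) / p

# Simulate 100,000 users, 30% of whom actually used a given emoji.
random.seed(0)
reports = [randomize(random.random() < 0.30) for _ in range(100_000)]
print(round(estimate_true_rate(reports), 2))  # close to 0.30
```

No single report is trustworthy on its own (any "yes" may be a coin flip), yet across a hundred thousand noisy reports the noise cancels and the aggregate usage rate emerges — which is exactly the trade-off Apple describes.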
Local differential privacy means that Apple can collect data from a large number of users without ever receiving fully accurate or raw data from their devices. Any association between records, along with IP addresses and timestamps, is discarded as soon as the data arrives at the restricted-access server.
“At this point, we cannot distinguish, for example, if an emoji record and a Safari web domain record came from the same user,” the paper continues. “The records are processed to compute statistics. These aggregate statistics are then shared internally with the relevant teams at Apple.”
As an additional privacy safeguard, this is an opt-in system, meaning that users must explicitly grant Apple permission to collect data. You can find the toggle for reporting usage data to Apple under Settings > Privacy > Analytics on iOS, and under System Preferences > Security & Privacy > Privacy > Analytics on macOS Sierra and later.
While Apple suggests in its recent paper that by adopting local differential privacy (which it calls a “superior form of privacy”), it is maximizing user privacy protections, a team of researchers from USC, Indiana University, and Tsinghua University argue otherwise in a study published in September.
The researchers reverse-engineered Apple’s software to determine how thoroughly user data had been anonymized, and were troubled to find that the data Apple collects is significantly more specific than privacy researchers typically recommend. Moreover, Apple keeps its collection software secret, meaning it could modify its privacy protections without disclosing the change to the public.
“Apple’s privacy loss parameters exceed the levels typically considered acceptable by the differential privacy research community,” USC professor Aleksandra Korolova told Wired.
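The “privacy loss parameter” is the ε (epsilon) of differential privacy: it caps how much any one person’s data can shift the reported statistics, and a larger ε means less noise and weaker protection. As a hedged illustration (binary randomized response, not Apple’s actual mechanism), the standard ε-differentially-private truth-telling probability is e^ε / (1 + e^ε):

```python
import math

def truth_probability(epsilon: float) -> float:
    """For binary randomized response, telling the truth with
    probability e^eps / (1 + e^eps) satisfies eps-differential
    privacy; higher epsilon means less noise."""
    return math.exp(epsilon) / (1 + math.exp(epsilon))

for eps in (0.5, 1.0, 4.0, 8.0):
    print(f"epsilon={eps}: truthful {truth_probability(eps):.2%} of the time")
```

At small ε the answer is barely better than a coin flip, while by ε = 8 it is truthful more than 99.9% of the time — which is why researchers treat large privacy loss parameters as tissue-paper protection.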
Frank McSherry, one of the inventors of differential privacy, put it more bluntly to Wired. “Apple has put some kind of handcuffs on in how they interact with your data,” he said. “It just turns out those handcuffs are made out of tissue paper.”
In response, Apple strongly disputed the accuracy of the team’s findings as well as some of the assumptions made in the course of their research, and noted that the data collection is purely opt-in.