Ohm, Paul. "Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization." UCLA Law Review 57:6 (August 2010), pp. 1701-1777.
Existing Citations
anonymized data (p.1703-1704): Imagine a database packed with sensitive information about many people. Perhaps this database helps a hospital track its patients, a school its students, or a bank its customers. Now imagine that the office that maintains this database needs to place it in long-term storage or disclose it to a third party without compromising the privacy of the people tracked. To eliminate the privacy risk, the office will anonymize the data, consistent with contemporary, ubiquitous data-handling practices. First, it will delete personal identifiers like names and social security numbers. Second, it will modify other categories of information that act like identifiers in the particular context—the hospital will delete the names of next of kin, the school will excise student ID numbers, and the bank will obscure account numbers. What will remain is a best-of-both-worlds compromise: Analysts will still find the data useful, but unscrupulous marketers and malevolent identity thieves will find it impossible to identify the people tracked. Anonymization will calm regulators and keep critics at bay. Society will be able to turn its collective attention to other problems because technology will have solved this one. Anonymization ensures privacy. Unfortunately, this rosy conclusion vastly overstates the power of anonymization. Clever adversaries can often reidentify or deanonymize the people hidden in an anonymized database. This Article is the first to comprehensively incorporate an important new subspecialty of computer science, reidentification science, into legal scholarship. This research unearths a tension that shakes a foundational belief about data privacy: Data can be either useful or perfectly anonymous but never both. (†1629)
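Illustrative sketch (not from the article): the release-and-forget workflow the excerpt describes amounts to dropping direct-identifier columns before release. The column names below (name, ssn, next_of_kin) are hypothetical stand-ins for the hospital example.

    # Release-and-forget anonymization sketch; field names are assumptions.
    import csv

    DIRECT_IDENTIFIERS = {"name", "ssn", "next_of_kin"}

    def anonymize(in_path, out_path):
        with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
            reader = csv.DictReader(src)
            kept = [c for c in reader.fieldnames if c not in DIRECT_IDENTIFIERS]
            writer = csv.DictWriter(dst, fieldnames=kept)
            writer.writeheader()
            for row in reader:
                writer.writerow({c: row[c] for c in kept})

Note that quasi-identifiers such as ZIP code, birth date, and sex survive this step; the excerpts below turn on exactly that residue.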
anonymized data (p.1724): Many anonymization techniques would be perfect, if only the adversary knew nothing else about people in the world. In reality, of course, the world is awash in data about people, with new databases created every day. Adversaries combine anonymized data with outside information to pry out obscured identities. (†1631)
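A toy version of that combination step, in the spirit of the ZIP/birth-date/sex voter-roll linkage Ohm recounts from Sweeney's work (all records below are invented):

    # Join an "anonymized" release against a public roster on quasi-identifiers.
    QUASI = ("zip", "birth_date", "sex")

    anonymized = [{"zip": "02138", "birth_date": "1945-07-31", "sex": "F",
                   "diagnosis": "hypertension"}]             # invented
    voter_roll = [{"name": "Jane Doe", "zip": "02138",
                   "birth_date": "1945-07-31", "sex": "F"}]  # invented

    def link(release, roster):
        # Assumes each quasi-identifier key is unique in the roster.
        index = {tuple(r[q] for q in QUASI): r["name"] for r in roster}
        for row in release:
            name = index.get(tuple(row[q] for q in QUASI))
            if name:                  # a unique match reidentifies the row
                yield name, row["diagnosis"]

    print(list(link(anonymized, voter_roll)))   # [('Jane Doe', 'hypertension')]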
anonymized data (p.1746): The accretion problem is this: Once an adversary has linked two anonymized databases together, he can add the newly linked data to his collection of outside information and use it to help unlock other anonymized databases. Success breeds further success. Narayanan and Shmatikov explain that “once any piece of data has been linked to a person’s real identity, any association between this data and a virtual identity breaks the anonymity of the latter.” This is why we should worry even about reidentification events that seem to expose only nonsensitive information, because they increase the linkability of data, and thereby expose people to potential future harm. (†1632)
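Schematically, the accretion problem is a fixed-point loop: each linked attribute enlarges the outside information used against the next database. A runnable sketch with invented data; the matcher below is a naive stand-in, not a real reidentification algorithm.

    # Newly linked data joins the pool of outside information (all invented).
    def find_unique_match(db, known):
        # A record matches if it agrees with the known facts on every
        # field the record actually contains.
        hits = [r for r in db
                if all(r.get(k) == v for k, v in known.items() if k in r)]
        return hits[0] if len(hits) == 1 else None

    def accrete(known, anonymized_dbs):
        progress = True
        while progress:                 # success breeds further success
            progress = False
            for db in anonymized_dbs:
                match = find_unique_match(db, known)
                new = {k: v for k, v in (match or {}).items() if k not in known}
                if new:
                    known.update(new)   # linked data becomes outside information
                    progress = True
        return known

    known = {"zip": "02138", "birth_date": "1945-07-31"}        # one real identity
    db_a = [{"zip": "02138", "birth_date": "1945-07-31", "rating_id": "u17"}]
    db_b = [{"rating_id": "u17", "diagnosis": "hypertension"}]  # pseudonymous
    print(accrete(known, [db_a, db_b]))

Linking db_a ties the real identity to the pseudonym u17; that pseudonym then unlocks db_b, echoing Narayanan and Shmatikov's point that tying data to a real identity breaks every virtual identity associated with it.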
anonymized data (p.1752): Utility and privacy are, at bottom, two goals at war with one another. In order to be useful, anonymized data must be imperfectly anonymous. “[P]erfect privacy can be achieved by publishing nothing at all—but this has no utility; perfect utility can be obtained by publishing the data exactly as received from the respondents, but this offers no privacy.” No matter what the data administrator does to anonymize the data, an adversary with the right outside information can use the data’s residual utility to reveal other information. Thus, at least for useful databases, perfect anonymization is impossible. Theorists call this the impossibility result. There is always some piece of outside information that could be combined with anonymized data to reveal private information about an individual. (†1633)
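The tradeoff in the excerpt can be made concrete with a generalization dial: each ZIP-code digit suppressed enlarges every record's anonymity set (more privacy) while collapsing the distinctions an analyst could use (less utility). A toy illustration with invented ZIP codes:

    # Publishing nothing (0 digits kept) is perfectly private and useless;
    # publishing exact ZIPs (5 digits) is maximally useful and least private.
    from collections import Counter

    records = ["02138", "02139", "02141", "02138", "02139"]   # invented

    for kept in range(5, -1, -1):
        groups = Counter(z[:kept] for z in records)
        print(f"digits kept={kept}  smallest group={min(groups.values())}  "
              f"distinct values={len(groups)}")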
data de-identification (p.1716): Privacy lawyers tend to refer to release-and-forget anonymization techniques using two other names: deidentification and the removal of personally identifiable information (PII). Deidentification has taken on special importance in the health privacy context. Regulations implementing the privacy provisions of the Health Insurance Portability and Accountability Act (HIPAA) expressly use the term, exempting health providers and researchers who deidentify data before releasing it from all of HIPAA’s many onerous privacy requirements. (†1630)
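The HIPAA route the excerpt refers to includes the "safe harbor" method of 45 C.F.R. § 164.514(b)(2), which enumerates eighteen identifier categories to strip or generalize. An abbreviated sketch; the field names are hypothetical and the set below is far short of the full regulatory enumeration.

    # Abbreviated safe-harbor deidentification sketch; field names assumed.
    SAFE_HARBOR_FIELDS = {   # partial list, for illustration only
        "name", "ssn", "medical_record_number", "email", "phone",
        "street_address", "full_face_photo",
    }

    def deidentify(record):
        out = {k: v for k, v in record.items() if k not in SAFE_HARBOR_FIELDS}
        # The rule generalizes as well as deletes: dates reduce to the year
        # and ZIP codes to their first three digits (subject to exceptions).
        if "birth_date" in out:
            out["birth_year"] = out.pop("birth_date")[:4]   # assumes ISO dates
        if "zip" in out:
            out["zip3"] = out.pop("zip")[:3]
        return out

As the surrounding excerpts make plain, data deidentified this way is exempt from HIPAA's privacy requirements yet can remain linkable through the very quasi-identifiers the method leaves behind.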