Richardson, et al. 2015 (†714) Richardson, Victor, Sallie Milam, and Denise Chrysler. "Is Sharing De-identified Data Legal? The State of Public Health Confidentiality Laws and Their Interplay with Statistical Disclosure Limitation Techniques." Journal of Law, Medicine, and Ethics 43:1 Supplement (Spring 2015), p.83-86.
- data de-identification (p.85): The science of de-identification continues to advance, and data de-identification has become an accepted form of protecting the confidentiality of personal information under federal regulation. At the same time, re-identification studies have continued to focus on data disclosures that fail to meet any modern standard of de-identification. Thus, while public health organizations may lack specific guidance on how to de-identify data in a way permissible under their applicable state confidentiality laws, they can reasonably rely on the efficacy of modern de-identification techniques, so long as the governing confidentiality standard allows for the disclosure of data that does not identify an individual. (†1634)
- data de-identification (p.84): In recent years, researchers have studied techniques to re-identify purportedly confidential datasets. These studies often report startlingly high success rates, and have caused some scholars to question the efficacy of de-identification entirely. For example, an often cited 2000 study found 87% of the U.S. population could be uniquely identified by their combination of gender, date of birth, and zip code. Even when new researchers replicated the study to reflect a growing population, they still found 63% of the population uniquely identifiable using these variables. Out of context, these numbers are startling. In reality, however, these unique and exact combinations of gender, date of birth, and zip code would never be present in a de-identified dataset. Such combinations are either generalized or removed entirely, drastically reducing the risk of re-identification. In the latter study above, researchers found the risk of unique identification dropped sharply when given slightly more abstract data. When they replaced an individual’s full date of birth with only the month and year, only 4.2% of the population remained uniquely identifiable, and when they also replaced zip code with county, just 0.2% remained uniquely identifiable. More impressively, data de-identified using the HIPAA safe harbor method is said to present only a 0.04% risk of unique identification. Still, the majority of re-identification studies continue to target data that is not truly de-identified, leading to what some call “the myth of easy re-identification.” While academics and scientists debate de-identification’s merits, however, a more pertinent question has been neglected: is sharing de-identified data legal? (†1635)
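
The generalization effect described in the p.84 excerpt (full date of birth reduced to month and year, zip code reduced to county) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the six toy records, the `ZIP_TO_COUNTY` lookup, and the function names are hypothetical, and the resulting fractions do not reproduce the 63%, 4.2%, or 0.2% figures reported in the studies the authors cite.

```python
from collections import Counter
from datetime import date

# Hypothetical toy records: (gender, date of birth, zip code).
RECORDS = [
    ("F", date(1980, 3, 14), "25301"),
    ("F", date(1980, 3, 2), "25301"),
    ("M", date(1975, 7, 9), "25302"),
    ("M", date(1975, 7, 21), "25303"),
    ("F", date(1990, 11, 5), "26501"),
    ("F", date(1990, 11, 5), "26505"),
]

# Hypothetical zip-to-county lookup, invented for this example.
ZIP_TO_COUNTY = {
    "25301": "County A",
    "25302": "County A",
    "25303": "County A",
    "26501": "County B",
    "26505": "County B",
}


def exact(record):
    """Keep gender, full date of birth, and zip code unchanged."""
    gender, dob, zip_code = record
    return (gender, dob, zip_code)


def month_year(record):
    """Generalize the date of birth to (year, month); keep the zip code."""
    gender, dob, zip_code = record
    return (gender, (dob.year, dob.month), zip_code)


def month_year_county(record):
    """Generalize date of birth to (year, month) and zip code to county."""
    gender, dob, zip_code = record
    return (gender, (dob.year, dob.month), ZIP_TO_COUNTY[zip_code])


def unique_fraction(records, generalize):
    """Fraction of records whose generalized key appears exactly once."""
    keys = [generalize(r) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


if __name__ == "__main__":
    for name, fn in [
        ("exact", exact),
        ("month/year", month_year),
        ("month/year + county", month_year_county),
    ]:
        print(f"{name}: {unique_fraction(RECORDS, fn):.2f}")
```

In this toy data, every record is unique on the exact variables, fewer are unique once the day of birth is dropped, and none are unique once zip code is also coarsened to county. That is the same direction of change the excerpt describes, though not its magnitudes.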