pseudonymized data [English]

Syndetic Relationships

InterPARES Definition

n. ~ Data that has been anonymized by replacing personally identifiable information (US) or personal data (EU) with a different identity in order to remove any association between the individual and the data about them.

General Notes

Pseudonymization anonymizes an individual's identifying information in a consistent manner across multiple datasets, allowing information in those datasets to be linked.
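The consistency described above can be illustrated with a keyed hash. The following is a minimal Python sketch, assuming HMAC-SHA256 as the pseudonymization function; the key, the helper names `pseudonymize`, `pseudonymize_records` and `link`, and the sample records are hypothetical and do not come from the sources cited in this entry. Pseudonymizing two datasets with the same key lets them be joined on the pseudonym, while transforming each dataset with a different key breaks the link.

```python
import hashlib
import hmac

# Hypothetical secret key held by the data custodian; in practice it would be
# generated randomly and stored separately from the pseudonymized data.
SECRET_KEY = b"example-secret-key"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always yield the same pseudonym, so records
    from different datasets can be linked on the pseudonym.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records, key: bytes = SECRET_KEY):
    """Return copies of the records with patient_id replaced by a pseudonym."""
    return [{**r, "patient_id": pseudonymize(r["patient_id"], key)} for r in records]


def link(left, right):
    """Join two record lists on equal patient_id values."""
    return [
        (a["visit"], b["visit"])
        for a in left
        for b in right
        if a["patient_id"] == b["patient_id"]
    ]


# Two illustrative datasets keyed by a made-up patient identifier.
community_care = [{"patient_id": "9434765919", "visit": "home care"}]
gp_records = [{"patient_id": "9434765919", "visit": "GP consultation"}]

# Same key for both datasets: the pseudonyms match, so the records can be joined.
same_key = link(
    pseudonymize_records(community_care, SECRET_KEY),
    pseudonymize_records(gp_records, SECRET_KEY),
)
print(same_key)  # [('home care', 'GP consultation')]

# A different key per dataset: the identifiers are transformed in different
# ways, so the join finds nothing.
different_keys = link(
    pseudonymize_records(community_care, b"key-for-dataset-A"),
    pseudonymize_records(gp_records, b"key-for-dataset-B"),
)
print(different_keys)  # []
```

A keyed hash is used here rather than a plain hash because identifiers such as ten-digit patient numbers come from a small space, so an unkeyed hash could be reversed by hashing every possible value. Whoever holds the key can still perform that reversal, which is one reason pseudonymized data is often still treated as personal data.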

Other Definitions


  • Health Analytics 2013 (†704): Pseudonymization (from pseudonym) allows for the removal of an association with a data subject. It differs from anonymization (anonymous) in that it allows for data to be linked to the same person across multiple data records or information systems without revealing the identity of the person. The technique is recognized as an important method for privacy protection of personal health information. It can be performed with or without the possibility of re-identifying the subject of the data (reversible or irreversible pseudonymization). [Citing: Health Informatics 2008.] (†1615)
  • Health Analytics 2013 (†704): In simple terms, then, anonymization is used to ensure you don't know whose record you're looking at; "pseudonymization" achieves the same thing but allows you to link multiple pseudonymized data sets together. . . . For example, if you have Community Care records and GP records both stored by NHS Number: · If you anonymize both sets of data, you can't then join them to list all records of either type for a single patient, because the NHS numbers in each case have been anonymized in different ways. · If you pseudonymize both record sets, you can still join them together, because each NHS number is replaced with the same thing in each set. (†1616)
  • Hon, et al. 2011 (†692 p.215-216): Pseudonyms, involving substituting nicknames, etc., for names, are indirect identifiers. WP136 describes pseudonymization as ‘the process of disguising identities’, to enable collection of additional information on the same individual without having to know his identity, particularly in research and statistics. There are two types: · Retraceable/reversible pseudonymization aims to allow ‘retracing’ or re-identification in restricted circumstances. For example, ‘key-coded data’ involves changing names to code numbers, with a ‘key’ mapping numbers to names. This is common in pharmaceutical trials. Another example is applying two-way cryptography to direct identifiers. · Irreversible pseudonymization is intended to render re-identification impossible, for example ‘hashing’, applying one-way cryptography (hash functions) to direct identifiers. Retraceably pseudonymized data may be ‘personal data’, as the purpose is to enable re-identification, albeit in limited circumstances. If each code is unique to an individual, identification is still a risk, so pseudonymized information remains ‘personal data’. However, if pseudonymization reduces the risks for individuals, data protection rules could be applied more flexibly, and the processing of pseudonymized data subjected to less strict conditions, than the processing of information regarding directly identifiable individuals. (†1648)
  • Sonehara, et al. 2011 (†723 p.158): Whereas anonymity mechanisms are not suitable for personalized service since they either give unlimited access to personal data or prohibit any access, pseudonymity mechanisms allow a controlled disclosure of a subset of an individual’s personal data. (†1649)
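The two types of pseudonymization described by Hon, et al. can be contrasted in a short sketch. This is a minimal Python illustration under stated assumptions: ‘key-coded data’ is modelled as random codes plus an in-memory code-to-name table standing in for the separately held key, and irreversible pseudonymization is modelled as an unsalted SHA-256 hash. The names `KeyCodedPseudonymizer` and `hash_pseudonym` are hypothetical.

```python
import hashlib
import secrets


class KeyCodedPseudonymizer:
    """Reversible ("retraceable") pseudonymization using key-coded data.

    Each name is replaced by a random code. The code-to-name table plays the
    role of the "key"; in practice it would be stored separately and access
    to it restricted, so re-identification happens only when permitted.
    """

    def __init__(self):
        self._name_to_code = {}
        self._code_to_name = {}  # the "key" mapping codes back to names

    def pseudonymize(self, name: str) -> str:
        """Return the code for a name, issuing a new unique code if needed."""
        if name not in self._name_to_code:
            code = "P-" + secrets.token_hex(4)
            while code in self._code_to_name:  # avoid reusing an existing code
                code = "P-" + secrets.token_hex(4)
            self._name_to_code[name] = code
            self._code_to_name[code] = name
        return self._name_to_code[name]

    def reidentify(self, code: str) -> str:
        """Look a code up in the key to recover the original name."""
        return self._code_to_name[code]


def hash_pseudonym(name: str) -> str:
    """Irreversible pseudonymization via a one-way hash (SHA-256).

    Caveat: an unsalted hash of a guessable value such as a name can be
    reversed by hashing candidate names and comparing digests.
    """
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


coder = KeyCodedPseudonymizer()
code = coder.pseudonymize("Alice Smith")
print(code == coder.pseudonymize("Alice Smith"))  # True: same name, same code
print(coder.reidentify(code))  # Alice Smith
print(hash_pseudonym("Alice Smith") == hash_pseudonym("Alice Smith"))  # True
```

Both approaches are consistent (the same input always yields the same pseudonym), which is what allows linkage. Only the key-coded approach offers a built-in route back to the name; the hash has no key to consult, but, as the code comment notes, its irreversibility depends on the identifiers not being easy to enumerate.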