OHRP Definition of Coded:
Coded means that
identifying information (such as name or social security number) that would enable the investigator to readily ascertain the identity of the individual to whom the private information or specimens pertain has been replaced with a number, letter, symbol, and/or combination thereof (i.e., the code); and
a key to decipher the code exists, enabling linkage of the identifying information to the private information or specimens.
In order to minimize risk to subjects, data sets and biospecimens are frequently coded. Coding a data set means replacing the name, medical record number (MR#), and other readily identifiable fields with a unique identifier code number. A coded data set does not have to be stripped of every element of PHI, but it should contain only the minimum PHI necessary to meet the requirements of the research.
A wide variety of methods can be used to code data and specimens. Codes can be sequential numbers, hashes of sequential numbers, randomly assigned numbers, or a site identifier combined with a sequential number. The method does not matter as long as the codes are unique and not otherwise associated with the individual.
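The sketch below illustrates three of these schemes in Python. The function names, prefixes (A, B, S01), and code widths are assumptions chosen only to resemble the example IDs on this page; they are not a CHOP or OHRP standard. None of the functions reads any subject information, which is what keeps the resulting codes from being associated with the individual.

```python
import random

# Illustrative code formats only. The prefixes and widths are assumptions
# chosen to resemble the example IDs on this page (A00002, B0065).


def sequential_codes(n, prefix="A", width=5):
    """Sequential codes: A00001, A00002, ..."""
    return [f"{prefix}{i:0{width}d}" for i in range(1, n + 1)]


def site_sequential_codes(site, n, width=4):
    """Site identifier plus a sequential number: S01-0001, S01-0002, ..."""
    return [f"{site}-{i:0{width}d}" for i in range(1, n + 1)]


def random_codes(n, prefix="B", width=4):
    """Randomly assigned codes, drawn without replacement so none repeat."""
    space = 10 ** width
    if n > space:
        raise ValueError("code space is too small for this many subjects")
    # SystemRandom draws from the operating system's randomness source
    numbers = random.SystemRandom().sample(range(space), n)
    return [f"{prefix}{k:0{width}d}" for k in numbers]


print(sequential_codes(3))              # ['A00001', 'A00002', 'A00003']
print(site_sequential_codes("S01", 2))  # ['S01-0001', 'S01-0002']
print(random_codes(3))                  # three distinct codes in the form B####
```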
Under HIPAA, a code number derived from any element of PHI is itself still PHI. For example, a code made from a subject's initials plus a unique number would still be PHI. Commonly, the medical record number (MR#) is hashed or encrypted to create a unique code number. Deriving the code from the MR# is useful because it prevents the same subject from being enrolled more than once and ensures that data updates are merged with the correct subject's data.
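A minimal sketch of deriving a code from the MR#, using Python's standard `hmac` and `hashlib` modules. It uses a keyed hash (HMAC-SHA256) rather than a plain hash; that choice is an assumption of this sketch, not something the page specifies. MR#s come from a small, predictable range, so an unkeyed hash could be reversed by hashing every candidate MR# and comparing the results. The key value, the 12-character truncation, and the second MR# (12345678) are hypothetical. The resulting code is still PHI, for the reasons given above.

```python
import hashlib
import hmac

# Hypothetical secret key. In practice it would be a long random value kept
# separately from the data set.
SECRET_KEY = b"replace-with-a-long-random-secret"


def code_from_mrn(mrn, key=SECRET_KEY, length=12):
    """Derive a stable code from a medical record number using HMAC-SHA256.

    The same MR# always produces the same code, which lets a study detect
    duplicate enrollment and merge later updates. Because the code is derived
    from the MR#, it is still PHI under HIPAA.
    """
    normalized = mrn.strip()  # ignore stray whitespace from data entry
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length].upper()


# Hypothetical incoming enrollments; the third repeats the first MR#.
incoming = ["09890580", "12345678", "09890580 "]
already_seen = set()
for mrn in incoming:
    code = code_from_mrn(mrn)
    status = "duplicate" if code in already_seen else "new"
    already_seen.add(code)
    print(code, status)
```

Because the same normalized MR# always yields the same code, the repeated third entry is reported as a duplicate, which is the enrollment and merging benefit described above.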
Even though the code number is unique and the subject is not readily identifiable, the code was derived from PHI, so the data set cannot be claimed to be de-identified, and HIPAA protections apply when the data are shared. One solution is to generate new ID numbers to replace the hashed or encrypted MR# when exporting the data or specimens.
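One way to sketch this re-keying step, under stated assumptions: the hashed-MR# code lives in a field named `code`, and the export ID is a random letter-plus-digits value. Every name and value here (`rekey_for_export`, `export_id`, the sample hashed code, the `Visit` field) is hypothetical. Rows for the same subject receive the same new ID, so they still line up in the export, but the new ID is not derived from PHI.

```python
import random


def rekey_for_export(rows, code_field="code", new_field="export_id",
                     prefix="B", width=4):
    """Replace a hashed-MR# code with a freshly generated random ID.

    Returns (exported_rows, crosswalk). The new IDs are drawn at random and
    are not derived from any PHI. The crosswalk (old code -> new ID) is the
    only link back to the original code.
    """
    old_codes = sorted({row[code_field] for row in rows})
    space = 10 ** width
    if len(old_codes) > space:
        raise ValueError("code space is too small for this many subjects")
    numbers = random.SystemRandom().sample(range(space), len(old_codes))
    crosswalk = {old: f"{prefix}{n:0{width}d}" for old, n in zip(old_codes, numbers)}

    exported = []
    for row in rows:
        # Copy every field except the hashed code, then add the new ID.
        new_row = {k: v for k, v in row.items() if k != code_field}
        new_row[new_field] = crosswalk[row[code_field]]
        exported.append(new_row)
    return exported, crosswalk


# Hypothetical rows keyed by a hashed MR#; both rows belong to one subject.
rows = [
    {"code": "7C1E0A9D42B3", "Visit": 1, "Diagnosis": "ASD"},
    {"code": "7C1E0A9D42B3", "Visit": 2, "Diagnosis": "ASD"},
]
exported, crosswalk = rekey_for_export(rows)
for row in exported:
    print(row)
```

The returned crosswalk is what would let someone go backwards from the export to the hashed MR#, so where it is stored, or whether it is kept at all, matters for how the export is treated.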
More information about sharing data and biospecimens with other investigators, both at CHOP and outside CHOP, can be found on the IRB's page devoted to Sharing Data.
Complete Data Set with Identifiers
This data set contains complete identifiers, including name, MR#, date of birth, and date of service (here, the date of surgery).
| Subject ID# | Last Name | First Name | MR# | Birth Date | Date of Surgery | Age (yrs) | Diagnosis | Surgical Procedure |
|---|---|---|---|---|---|---|---|---|
| A00002 | Smith | Sally | 09890580 | 03/02/85 | 02/05/99 | 13.93 | ASD | Patch closure ASD |
Master List + Coded Data Set
The risk of a breach of confidentiality can be reduced by moving as much of the PHI as possible into a separate Master List. The data set should contain only the minimum PHI necessary for the purposes of the research. In this example, dates are retained so that age can be calculated at a later point in time.
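Master List:
| Subject ID# | Last Name | First Name | MR# | Birth Date | Date of Surgery |
|---|---|---|---|---|---|
| A00002 | Smith | Sally | 09890580 | 03/02/85 | 02/05/99 |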
Coded Data Set with Minimum Necessary PHI:
| Subject ID# | Birth Date | Date of Surgery | Diagnosis | Surgical Procedure |
|---|---|---|---|---|
| A00002 | 03/02/85 | 02/05/99 | ASD | Patch closure ASD |
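The split shown in these tables can be sketched in Python as follows. The record is copied from the Complete Data Set with Identifiers table, and the column lists follow this example; the helper name `split_master_list` and the use of in-memory dictionaries are illustrative assumptions. In practice the Master List would be kept apart from the coded data set.

```python
# The complete record from the Complete Data Set with Identifiers table.
complete = [
    {"Subject ID#": "A00002", "Last Name": "Smith", "First Name": "Sally",
     "MR#": "09890580", "Birth Date": "03/02/85", "Date of Surgery": "02/05/99",
     "Age (yrs)": 13.93, "Diagnosis": "ASD", "Surgical Procedure": "Patch closure ASD"},
]

# Columns that go into each file, following the example on this page.
MASTER_LIST_FIELDS = ["Subject ID#", "Last Name", "First Name", "MR#",
                      "Birth Date", "Date of Surgery"]
CODED_FIELDS = ["Subject ID#", "Birth Date", "Date of Surgery",
                "Diagnosis", "Surgical Procedure"]


def split_master_list(records):
    """Split identified records into a Master List and a coded data set.

    Subject ID# appears in both and is the key that re-links them.
    Age is dropped because it can be recalculated from the retained dates.
    """
    master = [{f: r[f] for f in MASTER_LIST_FIELDS} for r in records]
    coded = [{f: r[f] for f in CODED_FIELDS} for r in records]
    return master, coded


master, coded = split_master_list(complete)
print(coded[0])
```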
Coded Data Sets That Could Still Be Used to Re-Identify Individuals
Both data sets below are coded and could be relinked to subjects. The top data set retains the original subject ID code, which is the key for linking back to the Master List. In the limited data set on the bottom, a new ID code has replaced the original code. Even though the new ID number can no longer serve as a key to relink subjects to their PHI, the data set still contains some PHI (the dates are retained). Subjects could potentially be re-identified from either data set.
Coded Data Set without PHI (with a Key to Re-Link):
| Subject ID# | Age (yrs) | Diagnosis | Surgical Procedure |
|---|---|---|---|
| A00002 | 13.93 | ASD | Patch closure ASD |
Limited Data Set (with Dates that Can be Used to Re-Link):
| Random ID# | Birth Date | Date of Surgery | Diagnosis | Surgical Procedure |
|---|---|---|---|---|
| B0065 | 03/02/85 | 02/05/99 | ASD | Patch closure ASD |
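To show why the retained dates matter, the sketch below joins the limited data set to an identified source on Birth Date and Date of Surgery. The rows come from this page's example tables, with the Master List standing in for any identified source that records the same dates; `relink_on_dates` is a hypothetical helper. No subject ID key is used, yet the match recovers the subject's name.

```python
# Rows copied from the example tables on this page. The Master List stands in
# for any identified source that records the same dates.
identified = [
    {"Last Name": "Smith", "First Name": "Sally", "MR#": "09890580",
     "Birth Date": "03/02/85", "Date of Surgery": "02/05/99"},
]
limited = [
    {"Random ID#": "B0065", "Birth Date": "03/02/85", "Date of Surgery": "02/05/99",
     "Diagnosis": "ASD", "Surgical Procedure": "Patch closure ASD"},
]


def relink_on_dates(limited_rows, identified_rows,
                    keys=("Birth Date", "Date of Surgery")):
    """Return (Random ID#, last name, first name) for every date match."""
    # Index the identified rows by their date combination.
    index = {}
    for person in identified_rows:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)
    # Look up each limited-data-set row by the same date combination.
    matches = []
    for row in limited_rows:
        for person in index.get(tuple(row[k] for k in keys), []):
            matches.append((row["Random ID#"], person["Last Name"], person["First Name"]))
    return matches


print(relink_on_dates(limited, identified))  # [('B0065', 'Smith', 'Sally')]
```

With a single subject the match is trivial, but a birth date paired with a procedure date is often distinctive enough to single out one person in a much larger source.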
Deidentified or Anonymized Dataset
All identifiers have been stripped from this data set and the original ID number has been replaced with a new unique ID number.
- If the new code is generated pseudorandomly (randomly, but in a reproducible way) and a key is retained that allows the process to "go backwards" and regenerate the original code, the data set is encrypted.
- If the original code cannot be regenerated, so that subjects cannot be re-identified from this data set, then the data set has been anonymized or deidentified.
- If the data was originally collected in this format without a link to subjects, then the data set would be anonymous.
Deidentified Data Set:
| Random ID# | Age (yrs) | Diagnosis | Surgical Procedure |
|---|---|---|---|
| B0065 | 13.93 | ASD | Patch closure ASD |
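A sketch of how the deidentified data set above could be derived from the coded data set, assuming dates are stored as MM/DD/YY (Python's `%y` reads 85 as 1985 and 99 as 1999) and that age is computed as elapsed days divided by 365.25, one common convention that reproduces the 13.93 in the tables. The new IDs are random and the old-to-new mapping is discarded, so this is the case where the original code cannot be regenerated. Function names are hypothetical, and the sketch shows only the mechanics; it does not by itself establish that a data set meets HIPAA's de-identification standard.

```python
import random
from datetime import datetime


def age_in_years(birth, event, fmt="%m/%d/%y"):
    """Age at the event in years: elapsed days / 365.25, rounded to 2 places."""
    born = datetime.strptime(birth, fmt).date()
    when = datetime.strptime(event, fmt).date()
    return round((when - born).days / 365.25, 2)


def deidentify(coded_rows, prefix="B", width=4):
    """Drop dates and the original Subject ID#, keeping age and clinical fields.

    Each original Subject ID# gets one new random ID. The old-to-new mapping
    exists only inside this function and is never returned or saved.
    """
    subjects = sorted({row["Subject ID#"] for row in coded_rows})
    space = 10 ** width
    if len(subjects) > space:
        raise ValueError("code space is too small for this many subjects")
    numbers = random.SystemRandom().sample(range(space), len(subjects))
    new_id = {s: f"{prefix}{n:0{width}d}" for s, n in zip(subjects, numbers)}
    return [
        {
            "Random ID#": new_id[row["Subject ID#"]],
            "Age (yrs)": age_in_years(row["Birth Date"], row["Date of Surgery"]),
            "Diagnosis": row["Diagnosis"],
            "Surgical Procedure": row["Surgical Procedure"],
        }
        for row in coded_rows
    ]


coded = [{"Subject ID#": "A00002", "Birth Date": "03/02/85",
          "Date of Surgery": "02/05/99", "Diagnosis": "ASD",
          "Surgical Procedure": "Patch closure ASD"}]
print(age_in_years("03/02/85", "02/05/99"))  # 13.93
print(deidentify(coded))
```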
For more information, see OHRP: Guidance on Research Involving Coded Private Information or Biological Specimens