Sharing Data

Few things are more confusing than the requirements for sharing data with other investigators. This page presents the requirements for sharing deidentified, limited and identifiable data sets and biospecimens (referred to on this page together as "data/specimens").

For more information on sharing data with other investigators with several specific examples, download the NIH Data Sharing Workbook. This document also reviews items that should be incorporated into the protocol when data sharing with other investigators is part of the research plan.

What is needed to share data/biospecimens?

Investigators may obtain data/specimens in a variety of ways. For example, the data/specimens are likely to have come from one of the following:

  • Research data/specimens obtained with the informed consent/HIPAA authorization of the subject. The investigator obtaining consent could have stored the data/specimens or else they could have been sent to a data coordinating center or central laboratory. In either case, one of the three following situations likely applies:
    • Consent/HIPAA authorization was obtained that included sharing the data/specimens for any type of future research (broad consent) or for limited types of future research; or
    • Consent/HIPAA authorization was obtained that was silent regarding sharing the data/specimens for future research; or
    • Consent/HIPAA authorization was obtained that precluded sharing the data/specimens for future research.
  • Data/specimens obtained under an IRB-approved waiver of the requirements for consent/HIPAA. For example, an IRB could waive the requirements for both consent and HIPAA authorization to allow an investigator to receive leftover specimens originally obtained for clinical care purposes;
  • Data/specimens originally obtained under a data use agreement between a provider and a recipient.

Sharing Data/Specimens Stored with Identifiers

Data/specimens that include identifiers can be used or disclosed as permitted in the consent form and written authorization or with a waiver of consent and HIPAA authorization issued by the IRB. Investigators often want to use data collected for clinical or QI purposes or to analyze research data or use research specimens for other purposes (secondary use). If the investigator needs to retain identifiers for a legitimate research purpose, then secondary uses of the identifiable data/specimens might or might not qualify as human subjects research, depending on what is shared.

The diagram on the right shows that data from a clinical or research database could qualify as not human subjects research (with or without a data use agreement), might be considered exempt, or might require IRB review and approval. To qualify as not human subjects research, the individual exporting the identifiable data has to be independent of the research. § Independent means not connected in any way to the original research involved in the creation of the clinical or research source database or to the new research; in other words, the individual is an "honest broker". The second issue is whether or not there are any identifiers in the exported (new research) data set. * Identifier means one or more data elements that render the subjects readily identifiable. This includes a code number or study ID# that can be used to link back to the individual.

The IRB needs to first determine whether or not the data is readily identifiable. If it is, the Common Rule applies; if it isn't, the Common Rule does not apply. Only after making this decision does the IRB consider whether or not the data contains PHI. A data set with PHI might not be considered readily identifiable, but HIPAA would still apply. For example, a data set from the PHIS database represents patients from over 40 free-standing pediatric hospitals and is a national sample. The data could include dates of birth and services without making the individuals readily identifiable. (Readily identifiable does not mean potentially identifiable or identifiable with substantial effort.)
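
The order of these two questions can be sketched in code. This is only an illustration of the reasoning in the paragraph above: the function name and its boolean inputs are hypothetical, and the actual determination is made by the IRB, not by a script.

```python
from dataclasses import dataclass


@dataclass
class Determination:
    common_rule_applies: bool
    hipaa_applies: bool


def assess_data_set(readily_identifiable: bool, contains_phi: bool) -> Determination:
    """Apply the two questions in the order described in the text (hypothetical helper)."""
    # Question 1: readily identifiable data falls under the Common Rule.
    common_rule = readily_identifiable
    # Question 2, considered separately: any PHI means HIPAA still applies.
    hipaa = contains_phi
    return Determination(common_rule, hipaa)


# A national-sample data set with dates of birth and service:
# it contains PHI, but the subjects are not readily identifiable.
result = assess_data_set(readily_identifiable=False, contains_phi=True)
print(result)
```

The point of keeping the two flags separate is that one answer does not settle the other: a data set can be outside the Common Rule and still be subject to HIPAA.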

Summary of Requirements for Sharing Data Stored with Identifiers

The requirements vary depending upon what type of data is shared. #1 and #2 below involve removal of identifiers so that the data is no longer readily identifiable, while #3 and #4 involve data for which the IRB must either determine that the research is exempt (#3) or review and approve the research (#4).

  1. Creation of a de-identified data set (so that the data is not readily identifiable) by one of the following means:
    • A knowledgeable person provides an analysis documenting that the subjects are not individually identifiable in accordance with 45 CFR 164.514(b)(1).
    • All PHI elements in the data set are removed from the data/specimens in accordance with 45 CFR 164.514(b)(2).
  2. Creation of a Limited Data Set (all PHI elements in the data set are removed except dates and zip codes so that the use of the data/specimens will no longer be human subjects research), together with one of the following:
    • An IRB waives the requirements of HIPAA authorization; OR
    • A data use agreement is executed between the provider and the recipient.
  3. Recording the data to be used without inclusion of the direct identifiers and having an IRB determine that the research qualifies as exempt from the regulation (Exempt Category 4). (Sharing identifiable data with someone outside of the original investigative team would not qualify for an exemption.); OR

  4. Requesting that the IRB review and approve the research, in which case either:
    • the investigator obtains the subjects' consent and HIPAA authorization; OR
    • the investigator obtains a waiver of the requirements for consent and authorization from the IRB.

Sharing Deidentified Data and Biospecimens

Data/specimens that have been deidentified would not be considered human subjects research and may be used or shared under the HIPAA Privacy Rule. Most commonly, data/biospecimens qualify as deidentified under the Safe Harbor provision of HIPAA, where all of the 18 HIPAA identifiers have been removed. If data is collected without recording any of the 18 HIPAA identifiers, then it is said to be anonymous. If the 18 identifiers are removed after data collection, then the data/specimens have been anonymized or deidentified.

Data/specimens are often linked to a Master List using a unique code. Codes should not be derived from PHI, for example by using initials. Codes derived from PHI are still considered PHI. Obscured or permuted dates derived from actual dates will usually still be considered to be PHI. If data/specimens have had all PHI removed but there is a code that links back to PHI, the data can only be considered to be deidentified under the following conditions:

  1. If the data/specimens were not collected specifically for the currently proposed research; AND
  2. If the investigator cannot readily ascertain the identity of the individuals whose coded data/specimens will be used because:
    • the key to decipher the code has been destroyed before the research begins; OR
    • the investigators and the holder of the key enter into an agreement prohibiting the release of the key to the investigators; OR
    • there are IRB-approved written policies and operating procedures for the data registry or specimen repository that prohibit the release of identifiers.
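
The two conditions above combine as an AND of condition 1 with an OR over the three safeguards in condition 2. A minimal sketch of that logic follows; the function and parameter names are hypothetical and are used only to make the structure of the rule explicit.

```python
def coded_data_is_deidentified(
    collected_for_this_research: bool,
    key_destroyed: bool,
    agreement_prohibits_key_release: bool,
    irb_approved_policy_prohibits_release: bool,
) -> bool:
    """Return True when coded data meets both listed conditions (hypothetical helper)."""
    # Condition 1: the data/specimens were NOT collected specifically
    # for the currently proposed research.
    if collected_for_this_research:
        return False
    # Condition 2: at least one safeguard keeps the key away from the investigators.
    return (
        key_destroyed
        or agreement_prohibits_key_release
        or irb_approved_policy_prohibits_release
    )


# Existing repository specimens, with a key-holder agreement in place.
print(coded_data_is_deidentified(False, False, True, False))
# Data collected for this very study: condition 1 fails regardless of safeguards.
print(coded_data_is_deidentified(True, True, True, True))
```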

Sharing a Limited Data Set (LDS)

If all elements of PHI have been removed from the data/specimens except for dates (birth date, dates of service, etc.) and/or zip codes, then the data is considered a limited data set. Geocode data can include both 5- and 9-digit zip codes and can include census tract data (see page 53235 of the 2002 HIPAA Privacy Rule). It may not include street address. An LDS may be used or disclosed for the purposes of research with the subjects' consent/authorization (e.g. at the time of the original consent) or without their written authorization provided that one (not both) of the following takes place:

  • An IRB issues a waiver of HIPAA authorization and (if applicable) a waiver of consent; OR
  • A data use agreement is duly executed that assures that the recipient of the LDS will use or disclose the PHI only for specified purposes.
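
As a rough illustration of the definition above, the sketch below classifies a data set by the types of PHI elements that remain in it and then checks whether one of the routes to sharing is in place. The element labels and function names are hypothetical, and the permitted set mirrors only the items named on this page (dates, zip codes, census tract); it is not a complete statement of the HIPAA rule.

```python
from typing import Iterable

# Hypothetical labels; they mirror only the elements named on this page.
LDS_PERMITTED = {"date", "zip_code", "census_tract"}


def classify_data_set(remaining_phi: Iterable[str]) -> str:
    """Classify a data set by the PHI element types it still contains."""
    remaining = set(remaining_phi)
    if not remaining:
        return "no PHI elements remain"
    if remaining <= LDS_PERMITTED:
        return "limited data set"
    return "identifiable"


def lds_may_be_shared(authorization: bool, irb_waiver: bool, dua: bool) -> bool:
    """An LDS may be shared with authorization, or with a waiver or a data use agreement."""
    return authorization or irb_waiver or dua


print(classify_data_set({"date", "zip_code"}))
print(classify_data_set({"date", "street_address"}))
print(lds_may_be_shared(authorization=False, irb_waiver=False, dua=True))
```

A street address pushes the data set out of the limited data set category even when everything else has been removed, which is why the subset test is used rather than a simple count of remaining elements.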

If the data are not readily identifiable, an IRB can conclude that the use of the data/specimens does not constitute human subjects research (45 CFR 46 does not apply). (If the data are considered readily identifiable because they contain dates and zip codes, then the research would be subject to 45 CFR 46 and would require IRB review.)

When the data/specimens are not readily identifiable, the research is not subject to the Common Rule (45 CFR 46) and IRB review or approval is not required under the regulations. Instead of obtaining a waiver of HIPAA authorization from an IRB or privacy board, the data can be shared as long as a Data Use Agreement has been executed between the provider (or their institution) and the recipient (or their institution). The IRB does not need to review Data Use Agreements. A Data Use Agreement can be used to permit the sharing of data/specimens in all of the following situations:

  1. A provider at CHOP and a recipient investigator external to CHOP; or
  2. A provider external to CHOP and a recipient investigator at CHOP; or
  3. A provider at CHOP and a recipient investigator at CHOP.

Sharing a Limited Data Set Between Covered Entities

Provided that (1) an IRB has waived the requirement for HIPAA authorization or (2) a data use agreement has been duly executed between the provider and the recipient institutions, the PHI may be used and disclosed as part of the research. If the IRB has waived the requirement for HIPAA authorization, a data use agreement is not required (HIPAA requires one or the other). However, some institutions will insist on executing a data use agreement even though either the provider's or recipient's IRB has issued a waiver. For those at CHOP, more information about obtaining a data use agreement between CHOP and another institution can be found on the Office of Technology Transfer intranet page.

PHIS Data Sets:
Many investigators at CHOP obtain and use data from the Pediatric Health Information System (PHIS), which is maintained by the Child Health Corporation of America (CHCA). The CHOP IRB does not consider this activity to meet the definition of human subjects research because the PHIS data is considered not readily identifiable. A Letter Regarding Use of PHIS Data Sets can be downloaded and provided to a funding agency outlining the CHOP IRB's policy for use of these data sets.

Sharing a Limited Data Set Within CHOP

An LDS may be provided by someone within CHOP to be used by an investigator at CHOP. Since the LDS contains data derived from CHOP patients, it is more likely that the individuals might be potentially identifiable by the CHOP investigators. The requirements for sharing are dependent on whether or not the investigator receiving the LDS can readily identify the subjects.

  1. If CHOP subjects are readily identifiable from the LDS, then the CHOP investigator may only receive and use the LDS without the subjects' written authorization after:
    (a) the IRB has either reviewed and approved the research or made a determination that the research is exempt; and
    (b) the IRB has determined that the use or disclosure satisfies the requirements for waiver of authorization.
  2. If CHOP subjects are not readily identifiable from an LDS and cannot be re-linked to the individuals' PHI, then the investigator at CHOP may receive and use the LDS from a provider at CHOP, provided the investigator enters into a data use agreement with the provider of the data set.

CHOP Internal Data Use Agreement

Investigators can execute their own Data Use Agreement if they are sharing a limited data set within CHOP.

  1. Download the CHOP Internal Data Use Agreement 3-25-15;
  2. The Agreement should be filled out and then printed;
  3. The Data Provider and all of the Data Recipients must sign the form;
  4. The Data Provider should retain the original copy and the Data Recipient should retain either a 2nd signed copy or a photocopy.

Can individuals be re-identified from a limited data set?

Probability of Being Reidentified

Part of the basis for allowing investigators to share an LDS with just a data use agreement and without IRB oversight is that the risks are low and that subjects are not individually identifiable from date and zip code data alone. But is this in fact true? The Data Privacy Lab has a tool on their website (How Unique are You?) that demonstrates that often only one or two individuals have the same birthdate and sex within a specific zip code. It is also now clear that, with the aid of powerful computer algorithms, subjects can be re-identified with a high probability even from data devoid of any PHI (such as patterns of lab test results).
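
The uniqueness problem can be checked on any data set by counting how many records share the same combination of birthdate, sex and zip code. The sketch below does this for a handful of made-up records; the function names are hypothetical, and the smallest group size it reports corresponds to the "k" in the k-anonymity references listed further down.

```python
from collections import Counter

# Made-up records for illustration: (birth date, sex, zip code).
records = [
    ("2010-03-14", "F", "19104"),
    ("2010-03-14", "F", "19104"),
    ("2011-07-02", "M", "19104"),
    ("2009-11-30", "F", "19146"),
]


def smallest_group_size(rows):
    """Size of the smallest group of rows sharing the same values (0 if no rows)."""
    counts = Counter(rows)
    return min(counts.values()) if counts else 0


def unique_rows(rows):
    """Rows whose combination of values appears exactly once."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] == 1]


print(smallest_group_size(records))
print(len(unique_rows(records)))
```

In this toy data set two of the four records are unique on these three fields alone, so anyone who knows a subject's birthdate, sex and zip code could single out that record.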

Given how easy it is to re-identify individuals from very limited personal data, no data set can be considered truly anonymous even when devoid of all elements of PHI. To minimize risk to individuals, all sharing agreements, including data use agreements and agreements to share de-identified data, should include provisions requiring that the recipient investigator promise not to make any attempt to re-identify the individuals in the data set.

The Limits of Anonymity: Selected References

  1. Sweeney L, Abu A, and Winn J. Identifying participants in the personal genome project by name.
  2. Bohannon J. Genetics. Genealogy databases enable naming of anonymous DNA donors.
  3. Rothstein MA. Is deidentification sufficient to protect health privacy in research?
  4. Brothers KB, and Clayton EW. "Human non-subjects research": privacy and compliance.
  5. Gellman R. Why deidentification fails research subjects and researchers.
  6. McGraw D. Data identifiability and privacy.
  7. Wjst M. Caught you: threats to confidentiality due to the public release of large-scale genetic data sets.
  8. El Emam K, and Dankar FK. Protecting privacy using k-anonymity.
  9. Sweeney L. k-anonymity: A model for protecting privacy.

Summary:

In order to minimize risk to subjects, data sets and biospecimens are frequently coded. Coding a data set means replacing the name, MR# and other readily identifiable fields with a unique identifier code number. To be unique, the code number cannot be derived from any element of PHI. For example, a code that was derived from a subject's initials plus a unique number would still be considered PHI. Just because the data is coded does not mean that all elements of PHI have to be removed. Data sets should contain the minimum PHI necessary consistent with the requirements of the research.
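
One way to produce codes that are not derived from any element of PHI is to generate them at random and keep the link only in the Master List. The following is a minimal sketch under that assumption; the function name, the code length and the sample identifiers are all hypothetical.

```python
import secrets


def assign_codes(subject_ids):
    """Map each subject identifier to a random code that is not derived from PHI."""
    master_list = {}
    used = set()
    for subject_id in subject_ids:
        # token_hex(4) yields 8 random hex characters, independent of the identifier.
        code = secrets.token_hex(4)
        while code in used:  # regenerate in the unlikely event of a collision
            code = secrets.token_hex(4)
        used.add(code)
        master_list[subject_id] = code
    return master_list


# Made-up medical record numbers for illustration only.
master = assign_codes(["MRN-0001", "MRN-0002", "MRN-0003"])
for mrn, code in master.items():
    print(mrn, "->", code)
```

The Master List produced here is itself identifiable and would need to be stored separately from the coded data set, since anyone holding it can link the codes back to individuals.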

Summary:

This document makes clear that even though sharing de-identified data and specimens is not human subjects research under 45 CFR 46, the NIH Policy requires the IO and the IRB to review the investigator's plan for data submission and to consider the appropriateness of the informed consent process and document. The Policy goes beyond the regulatory requirements because the genotype and phenotype information generated about individuals will be substantial and, in some instances, sensitive (such as data related to the presence or risk of developing particular diseases or conditions and information regarding family relationships or ancestry); the confidentiality of the data and the privacy of participants must therefore be protected.

Summary:

An honest broker can provide a firewall between the investigator and subjects' identifiable information. For example, an honest broker could generate or receive a data set and then strip out subject identifiers so that the data is no longer readily identifiable. The honest broker could create either a de-identified data set or a limited data set.
