Purpose of this best practice: This document addresses aspects of data archiving and data sharing in the context of data management planning. At a research project’s formulation stage, it is important to consider the post-publication utility of the project data.

Deciding what will be retained and/or can be shared has implications for preparing data for preservation and reuse. Assessing future utility of data and documenting those decisions is an important step in these preparations. Other documents in the collection linked below discuss other components and contexts for retaining data and preparing it for re-use.

Positions and Responsibilities

  • Principal Investigator (PI): The Principal Investigator (PI) must comply with the JHU Policy on Access and Retention of Research Data and Materials. In general, this means keeping data needed for reconstruction and evaluation of published results for 5 years after publication. The PI is also responsible for meeting funder and legal requirements for data sharing and dissemination, including general requirements and those specific to the research project; for selecting, organizing, and documenting datasets for dissemination and sharing; and for maintaining continuity of security through staff changes (passwords, access to project materials and data).
  • Study Coordinator (SC): The Study Coordinator (SC) assists the PI in complying with JHU Policy and in keeping IRB paperwork and patient information (e.g., consent forms) in a secure physical and/or digital environment. The SC also assesses the need for data use agreements for sharing data.
  • Data Manager (DM): The Data Manager (DM) assists the PI in complying with JHU Policy and keeping patient data in a secure digital environment. Additionally, the DM assists the PI in de-identification of sensitive data for sharing with colleagues or for wider public sharing. The DM is also responsible for the preservation of, and access to, datasets selected for re-use (for datasets kept in house).
  • Data Analyst/Biostatistician: The Data Analyst/Biostatistician assists the PI in de-identification of sensitive data for sharing with colleagues or for wider public sharing, using statistical techniques (e.g., adding noise, microaggregation).
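
The two statistical techniques named above, adding noise and microaggregation, can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not a procedure prescribed by JHU Policy or JHU Data Services: the function names `add_noise` and `microaggregate`, the group size `k`, and the sample ages are all hypothetical, and the simple rank-based grouping shown here is one of several possible microaggregation schemes. Any real de-identification plan should be reviewed against the project's consent, HIPAA, and data use agreement obligations.

```python
import random
from statistics import mean


def add_noise(values, scale, seed=None):
    """Perturb each value with zero-mean Gaussian noise (std. dev. = scale).

    `scale` is an assumption the analyst must choose; larger values give
    more protection but distort the data more.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in values]


def microaggregate(values, k=3):
    """Replace each value with the mean of a group of at least k similar values.

    Values are ranked, split into consecutive groups of k, and a short
    final group is merged into the previous one. If there are fewer than
    k values in total, they all form a single group.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    if n == 0:
        return []
    # Indices of the records, ordered by value.
    order = sorted(range(n), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, n, k)]
    # Fold a too-small tail group into its neighbour so no group has < k members.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    result = [0.0] * n
    for group in groups:
        group_mean = mean(values[i] for i in group)
        for i in group:
            result[i] = group_mean
    return result


# Hypothetical example: ages of seven participants.
ages = [23, 25, 31, 40, 42, 47, 68]
aggregated_ages = microaggregate(ages, k=3)
noisy_ages = add_noise(ages, scale=2.0, seed=1)
```

In this sketch, microaggregation ensures that every released value is shared by at least `k` records, while noise addition keeps individual values distinct but inexact. Which approach fits a given dataset depends on the intended re-use and on the disclosure risk the project can accept.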

Best Practices

  1. If possible, assess the future utility of the data at the beginning of a project or in its data management plan, so that data collection, analysis, and documentation are oriented toward future use by others.

  2. Consider audiences for the data beyond its immediate field, including unanticipated uses, e.g., ‘big data’ informatics.

  3. Assessment of future utility must consider ethical and legal obligations (e.g., HIPAA, PII, PHI, consent limitations). For example, will the Informed Consent forms provide permission for sharing data or for retaining identifiers for a required period?

  4. Consider the effort required to share data and the costs of providing for data access and re-use (some funders allow these costs to be included in proposal budgets).

  5. Data acquired from other sources (e.g., another PI, EPIC) may come with ethical or legal stipulations that must be followed before they are retained and shared.

  6. If a data repository is to be utilized, consider selection and appraisal models that could be applied (e.g., common data elements).

Subject Experts / Contributing Authors:

The following subject experts may be consulted for additional information: JHU Data Services

JHU Policy on Access and Retention of Research Data and Materials

Five steps to decide what data to keep: a checklist for appraising research data v.1. Edinburgh: Digital Curation Centre (DCC: Five steps to decide what data to keep)

Data Sharing and Reproducible Research guides, JHU Data Services

Guidelines for de-identifying data for sharing, by JHU Data Services

Johns Hopkins Sheridan Libraries - Protecting Identifiers

Topic Experts:

JHU Data Services

Information Security

Johns Hopkins Biostatistics Center