This part of When You're Done discusses de-identifying your research data. Issues of health information privacy are complex, and the rules differ by country. You could probably take a whole course on the topic, and we'd like to have an expert present it later. But our goal in this section is just to give you a practical understanding of privacy issues in the sorts of studies and data sets we've been working with. We'll be covering this in the context of a specific U.S. regulation called HIPAA, the Health Insurance Portability and Accountability Act, but the principles are broadly applicable. European countries generally have stricter data privacy laws than the U.S. We'll discuss what health information is and what we mean by protected health information, and finally we'll review what identifiers are and how to remove them from our data.
As part of the Health Insurance Portability and Accountability Act of 1996, HIPAA, the U.S. Department of Health and Human Services issued regulations called Standards for Privacy of Individually Identifiable Health Information, or the Privacy Rule for short. Most U.S. groups had to comply with this rule beginning April 14th, 2003. The general purpose of the HIPAA Privacy Rule is to give individuals better control over how their health information is stored and used. It restricts how organizations can use that information by limiting the release and use of health records. It requires organizations to meet some compliance standards: disclosing to individuals how their information might be used, publishing privacy protocols and having somebody to monitor them, training employees about privacy, and securing health information physically, too. It also penalizes those organizations if they violate privacy rights. But what do we mean by health information in the first place?
As defined by HIPAA, health information means any information, oral or recorded in any form, that meets two conditions. One, it is created or received by one of the entities listed in section A here: health care providers, health plans, public health authorities, employers, life insurers, schools, universities, and health care clearinghouses. And two, it relates to the health or condition of an individual, the provision of care to that individual, or payment for that care at any point in time, past, present, or future. As an aside, HIPAA is often misspelled. Hippopotamus has a double P. HIPAA doesn't, so remember to spell HIPAA with two As.
So, what is protected health information, or PHI? It's the subset of that health information that identifies an individual directly, or that can be used and combined with other information to identify an individual. We have to remove this protected health information in order to de-identify our data sets. This image is from the webpage of the U.S. Department of Health and Human Services, and it shows the two approved ways to de-identify your clinical research data. The first method, in yellow, is expert determination. The second, in blue, is the Safe Harbor approach. This HIPAA Safe Harbor is a guideline on how to de-identify your data. There is a different Safe Harbor process that defines how U.S. companies can comply with the European Union's Data Protection Directive. That's a different thing, so be careful when you search for information online.
It takes a lot of knowledge to be an expert in the field of de-identification. The expert has to be familiar with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable. The expert has to apply those principles, determine at the end that the risk of re-identification in the resulting data set is very small, and also document the methods and results of the process. The Safe Harbor approach, on the other hand, is much more straightforward.
You have to remove 18 types of information that are considered identifiers. You have to consider identifiers not just of the individual but also of relatives, employers, or household members of that person. And to the best of your knowledge, you need to be certain that the remaining information cannot be used alone or in combination with other information to re-identify that individual.
There are 18 named identifiers. The first identifier on the list is names. And, as a reminder, this means the names of the individual or the individual's relatives, employers, or household members. Derivatives of people's names, like initials, are also considered identifiers. Second is location information, such as addresses, city names, postal codes, and geocodes. HIPAA provides specific rules for how to de-identify U.S. location information. Next are dates. These are tricky. You should remove all date information except for the year an event occurred, and you should group all people over 89 years old into a single category of 90 or older. There are some other ways to address dates also, and we'll discuss this later. The next three identifiers are telephone numbers, fax numbers, and email addresses.
The next six identifiers are all numbers that might be assigned uniquely to an individual. They include Social Security numbers, which is a U.S.-specific reference; medical record numbers; health plan beneficiary numbers; account numbers, like your credit card or your electric bill, for example; certificate and license numbers; and vehicle identifiers and serial numbers, including license plate numbers. Next we have three types of technology-related identifiers: the serial numbers of devices you own, URLs, and IP addresses. The final three identifiers are biometric identifiers, including fingerprints and voice prints; full-face photographs and other such images; and any other unique identifying number, characteristic, or code.
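To make the Safe Harbor idea a little more concrete, here is a minimal sketch in Python of three of the steps just described: dropping fields that hold direct identifiers, keeping only the year of an event date, and grouping ages of 90 or older into one category. The field names (name, admission_date, and so on) are hypothetical and chosen only for illustration, and this is not a complete Safe Harbor implementation. In particular, it does not cover the location rules or free-text fields.

```python
from datetime import date

# Hypothetical field names; a real data set will have its own.
# Fields listed here are dropped entirely from the shared record.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "initials", "street_address", "city", "postal_code",
    "phone", "fax", "email", "ssn", "medical_record_number",
    "ip_address", "birth_date",  # birth date is dropped; age is kept instead
}

# Event dates to reduce to a year: input field name -> output field name.
DATE_FIELDS = {"admission_date": "admission_year"}


def generalize_age(age):
    """Group everyone aged 90 or older into a single '90+' category."""
    return "90+" if age >= 90 else age


def deidentify(record):
    """Return a copy of `record` with the three sketch rules applied."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # remove the identifier entirely
        if field in DATE_FIELDS:
            cleaned[DATE_FIELDS[field]] = value.year  # keep only the year
        elif field == "age":
            cleaned[field] = generalize_age(value)
        else:
            cleaned[field] = value
    return cleaned


example = {
    "name": "Jane Example",
    "email": "jane@example.org",
    "age": 93,
    "admission_date": date(2019, 3, 14),
    "diagnosis": "asthma",
}
print(deidentify(example))
```

A list of fields to drop is easy to read, but it only protects you if every identifying field is actually on the list. Building the shared data set from a list of fields you have reviewed and decided to keep is a more cautious variation on the same idea.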
You can, however, assign a code that links the de-identified record back to the PHI-containing version, but if you disclose the mapping of that code, it's considered disclosing PHI. In my experience, it's much easier and more straightforward to apply the Safe Harbor principles, especially for routine studies, but always treat that as the bare minimum requirement. Handle your data securely and make sure you know with whom you're sharing it. Treat it as if it were still identified. If you're developing an important software application that will regularly handle large amounts of data over time, though, I'd recommend you also consider an expert consultant.
The issue of sharing and de-identifying your data is only going to grow in importance with time, so you'll want to learn more than this basic overview. De-identifying your data to share with other researchers can be different from posting it online for anyone to access. The article shown here, called Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers, is published in BMJ and is freely available on the internet. No subscription is required. I'd recommend you read it for some additional guidelines and references, especially non-U.S.-centric ones.
That concludes our brief review of protected health information: what it is, what it means in a U.S. context in particular, and how to remove identifying information from your data sets. Next we'll take a look at how this is done practically in REDCap.
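Going back to the linking code mentioned earlier, here is a minimal sketch in Python of one way to assign such a code. It is an illustration under stated assumptions, not the method used by REDCap or any other tool. The function name assign_study_codes and the field names are hypothetical. The code is generated at random rather than derived from the identifier, so it reveals nothing about the person on its own.

```python
import secrets


def assign_study_codes(records, id_field="medical_record_number"):
    """Replace `id_field` in each record with a random study code.

    Returns (coded_records, linkage). `linkage` maps each original
    identifier to its study code, so it must be treated as PHI and
    stored separately from the shared data.
    """
    linkage = {}
    coded_records = []
    for record in records:
        original_id = record[id_field]
        if original_id not in linkage:
            # Random code, not computed from the identifier itself.
            linkage[original_id] = "S-" + secrets.token_hex(8)
        shared = {k: v for k, v in record.items() if k != id_field}
        shared["study_code"] = linkage[original_id]
        coded_records.append(shared)
    return coded_records, linkage


visits = [
    {"medical_record_number": "MRN001", "visit": 1, "systolic_bp": 128},
    {"medical_record_number": "MRN001", "visit": 2, "systolic_bp": 124},
    {"medical_record_number": "MRN002", "visit": 1, "systolic_bp": 141},
]
coded, linkage = assign_study_codes(visits)
print(coded)  # safe to pass along only after the other identifiers are handled
```

Only the coded records would be shared. The linkage table stays behind with the PHI-containing version of the data.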