Welcome to this segment on data quality monitoring. We also call it data quality assurance, quality control, data querying, and data auditing. Remember that data house-cleaning we discussed in the last video? Well, this is it, so roll up your sleeves. In this section on data quality monitoring, we're going to discuss what the data quality requirements are for clinical research. We'll also review how we can measure data quality and what we can do to improve it.

Clinical trials are great at measuring data quality. They have very strict rules for quality in their data sets. Good Clinical Practice, or GCP, is a quality standard for clinical trials involving human subjects. It addresses more than just data practices, but it's an important thing to know about in our field. As an aside, there is a parallel document called the Good Clinical Data Management Practices guideline, maintained and published by the Society for Clinical Data Management. It's not freely available, though, making it practically inaccessible for many data teams.

Going back to GCP and clinical trials: those studies use data quality interventions such as regular staff retraining and performance evaluations, careful form design and testing, double data entry, and data auditing to increase the quality of the data they collect. These quality control activities have lots of indirect benefits, like ensuring the scientific validity of your study, increasing public confidence in your results and in clinical research in general, satisfying regulatory requirements like those from the US National Institutes of Health and the US Food and Drug Administration, and reducing future errors. Not every type of clinical research can leverage that full process, though. Clinical trials can implement processes like staff retraining to prevent future errors because they have an opportunity to standardize and monitor all steps of the data collection process.
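To make one of those interventions concrete, here is a minimal sketch of double data entry checking in Python. The form fields and values are purely illustrative, not from any real study.

```python
# A minimal sketch of double data entry: the same form is keyed twice by
# different staff, and any fields where the two entries disagree are
# flagged for review. Field names and values are illustrative.
entry_first = {"patient_id": "PT-007", "weight_kg": 72.5, "visit_date": "2021-04-02"}
entry_second = {"patient_id": "PT-007", "weight_kg": 75.2, "visit_date": "2021-04-02"}

def double_entry_mismatches(first, second):
    """Return the fields where the two independent entries disagree."""
    return [field for field in first if first.get(field) != second.get(field)]

print(double_entry_mismatches(entry_first, entry_second))  # ['weight_kg']
```

Only the mismatched fields go back for review against the paper form, which is what makes double entry effective at catching keying errors.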
Standardizing processes is important in research, as you've learned. In the case of a clinical trial, shown at the top here, data are generated during a patient visit, recorded in a study binder, copied to a case report form, and then entered (perhaps doubly, to reduce errors) into the study database, all using guidelines specified by the trial. Or, in paperless studies, you skip the study binder and paper forms altogether and enter data directly into the study database. But it doesn't work that way for every study. In retrospective studies that are reusing routine patient care data, all the information has already been recorded in the clinical record. It is what it is in terms of data quality, because all of those early data collection steps have already happened. But you can still standardize procedures for how the information is extracted from the clinical record and entered into the study database. That's illustrated in the bottom picture.

Now let's assume you've planned your forms to collect complete and accurate data and already standardized your data collection processes. You're partway through the study and you need to assess how well those procedures have been working. There are two main processes that researchers follow. They have many different names, but here we're going to call them data integrity checking and source document verification.

Let's start with data integrity checking. This process checks whether the elements in your research data set are properly formatted, whether the values make sense internally, and whether the data are internally consistent. I saw a research data set once with a male patient with a recorded hysterectomy; chances are, this was a data entry error of some sort. By internally consistent, I mean things like: patients should have no recorded clinical visits dated after their date of death, et cetera.
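Those internal-consistency rules can be written down as explicit checks. Here's a hypothetical sketch in Python; the record structure and field names are assumptions for illustration.

```python
from datetime import date

def integrity_errors(patient):
    """Return a list of internal-consistency problems for one record."""
    problems = []
    # Sex-specific procedure check, e.g. a male patient with a recorded hysterectomy.
    if patient["sex"] == "M" and "hysterectomy" in patient["procedures"]:
        problems.append("sex-specific procedure conflict")
    # No clinical visit should be dated after the patient's date of death.
    if patient["death_date"] is not None:
        if any(v > patient["death_date"] for v in patient["visit_dates"]):
            problems.append("visit dated after death")
    return problems

# Illustrative records: the first has a sex/procedure conflict, the second
# has a visit recorded after the date of death.
records = [
    {"id": 1, "sex": "M", "procedures": ["hysterectomy"],
     "death_date": None, "visit_dates": [date(2020, 3, 1)]},
    {"id": 2, "sex": "F", "procedures": [],
     "death_date": date(2020, 1, 15), "visit_dates": [date(2020, 2, 1)]},
]

for rec in records:
    for problem in integrity_errors(rec):
        print(f"patient {rec['id']}: {problem}")
```

Each rule is just a condition over one record, which is what makes this kind of check easy to automate.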
If you can define those data checks, then they can be automated and run on a regular basis to detect changes in data quality. If you're familiar with statistical tools like SAS or R, you might use those to run your data checks. If you're not, then programs like REDCap have tools to help you define and run those data quality checks. I'll show you where to find it in case you want to explore the functionality. REDCap's feature is called the Data Quality module, and you can see it on the left side here, where the big orange arrow is pointing. REDCap includes built-in checks for missing values, out-of-range data, outliers for numerical fields, hidden fields that actually contain values (which often indicates a failed integrity check), and multiple choice fields with invalid values, perhaps because the codes for those fields were changed partway through the study. You can write your own data checks in the bottom section circled in green. REDCap has a pretty straightforward logic syntax for expressing those rules. You can find all the details in the Help & FAQ menu linked on the left side by the red arrow. Log in and take a look at this feature if you have time. The principles you can learn from designing data quality checks are broadly applicable and will serve you well no matter what platform you use for your research data collection.

What happens when your automated checks turn up missing or conflicting data? If this is an entirely internal process, well, go and try to fix it if possible. But in many cases, you are receiving data from research partners operating in other locations. The process of checking with them about questionable data, receiving a response, and asking again until it's resolved is a data query loop. The data manager will open a data query on a specific value and describe a problem, which could be incomplete or inaccurate information, or data that violate some other dimension of data quality.
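If you wanted to track queries in your own tooling, the lifecycle of a single query in that loop might be modeled like this. This is a hypothetical sketch, not how any particular system implements it.

```python
from dataclasses import dataclass, field

@dataclass
class DataQuery:
    """One item in the data query loop; field names are illustrative."""
    record_id: str
    variable: str
    problem: str
    status: str = "open"
    responses: list = field(default_factory=list)

    def respond(self, text):
        """Record a response from the research site."""
        self.responses.append(text)

    def resolve(self):
        """Close the query once the data manager accepts the response."""
        self.status = "resolved"

# Open a query on a questionable value, record the site's response,
# then resolve it if the data manager finds the response sufficient.
query = DataQuery("PT-042", "weight_kg", "value outside plausible range")
query.respond("Checked the clinic chart: the recorded value is correct.")
query.resolve()
print(query.status)  # resolved
```

Keeping each query as a structured object with a status is what lets you see at a glance how many of your hundreds of queries are still open.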
The person responsible for data at the research site will investigate whether the value is an error or not and send you a response. This might take a week; some studies are less strict on time, but some are more so. The response might be a correction to the data value and the appropriate documentation, or confirmation that the questionable-looking data is actually correct. The data manager then evaluates the response. If it's sufficient, they can resolve the data query. If it's not, the process repeats. Some groups handle data queries via email, but it's difficult to track the status of potentially hundreds of open and resolved queries for a single project that way. REDCap has a built-in module to help you with this too. The formal version is called the Data Resolution Workflow, and here's how to enable it, in case you'd like to explore this functionality as well. Halfway down the Project Setup screen in your REDCap project, you'll see an option for additional customizations, in the green circle here. Click it and you'll get an in-screen pop-up that allows you to enable optional customizations for your project. A few items down the page is the option to enable the field comment log or Data Resolution Workflow. There's a great video there describing the process if you're interested in additional details.

The second important step in assessing your data quality is source document verification, also known as a clinical research data audit. Not every study needs this degree of auditing. It can be very time-consuming, expensive, and stressful for the sites being audited. On the other hand, it's very important to establish a culture of quality from the start of your study. And if you are receiving data from multiple performance sites, then you need to be able to trust that those data are comparable in quality. Some studies suggest that this process can capture ten times as many errors as you'd find through data integrity checks alone.
That's because data can look perfect in your database but still be inaccurate. If you conduct this process within your own group, it's called an internal audit. If you bring in a different organization to inspect your data and processes, it's an external audit. In general, the process is straightforward. You take the data set you've received for your research study and you get access to the source documents for that data. Then you go record by record and check variable by variable to see if the values match across the two sources.

Source documents are the gold standard for recorded information in a clinical study. In the GCP guideline, they are defined as original documents, data, and records from which the study data derive. That might include hospital records, clinical and administrative office charts, laboratory reports, pharmacy dispensing records, recorded data from instruments, and records from outside clinics, as well as certified copies or transcriptions of such documents. There are electronic equivalents of these systems, like the electronic health record, laboratory information system, and pharmacy database. Missing source documents are a major violation of clinical trial data quality codes, so don't throw anything out.

Here's how you can conduct a very basic audit of your data. From your data set, select a random set of records to audit. If you want to measure error rates and use those measurements in your analyses, then talk to your statistician about the appropriate sample size. Also take into account practical considerations. You might want to audit 10% of the total charts, but if that means 300 charts, it could take weeks, depending on the size of the chart and the type of data you're checking. And realistically, if you're doing off-site monitoring, you probably don't want to stay in a hotel for that long; your primary investigator isn't keen on paying for it either. There are also much more advanced record sampling techniques.
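The basic random record selection described above can be sketched in a few lines. The record IDs, the fixed seed, and the 10% fraction are all just illustrative choices.

```python
import random

# 300 illustrative chart IDs in the study database.
record_ids = [f"PT-{i:03d}" for i in range(1, 301)]

rng = random.Random(42)  # fixed seed so the audit sample is reproducible
audit_fraction = 0.10    # audit 10% of the total charts
sample_size = round(len(record_ids) * audit_fraction)
audit_sample = rng.sample(record_ids, sample_size)  # no duplicates
print(sample_size)  # 30
```

Fixing the seed means you can document exactly which charts were selected, which matters if the audit itself is later reviewed.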
If you suspect that your pediatric records might have lower quality data than your adult records, for example, then you might weight your audit record selection towards the pediatric patients. Or you might focus your audit on key variables that you suspect to be error-prone, and ones that are critical for your analysis. Print your de-identified data onto paper forms, unless you have an electronic audit system you feel comfortable with. If you've collected your data in REDCap, for example, you can do your data quality assessments in REDCap too, using its Data Resolution Workflow. Compare your research data to the source documents, and make note of any discrepancies you find. Summarize your preliminary findings in an exit interview with site personnel to tell them what you found, or discuss it with your team if this was internal. And document your audit findings and recommendations in a report. You can follow this procedure to check data in your own team or to check data from other sites that are contributing to your study.

I mentioned data discrepancies. What are those, specifically? Well, they relate to completeness and accuracy, as we discussed before. Here is a specific error coding scheme from the European Organisation for Research and Treatment of Cancer, the EORTC. I like it because it's very straightforward to apply. Here are the error codes. 1 is correct. 2 is a minor deviation. 3 means incorrect. 4 means a data value is missing in the database. And 5 means there is data in the database but no corresponding value in the patient record; this means no source document.

Now let's take a look at data discrepancies in context. Here, the error codes are applied to data from an HIV study. This table represents data from a single patient in both sources: the research database and the clinic record, which is acting as a source document. Code 1 means the values in the database and the clinical record match. In this case, the two birthdays match.
Which makes sense, because they're for the same patient. Let's skip code 2 for now; we'll come back to it in a minute. Code 3 is for incorrect data. These three-letter codes refer to drug names. The database records that the patient was taking two drugs, but in the clinical record we can see that the patient was actually taking three drugs. The two drug regimens don't match, so the value in the research database is incorrect. Code 4 means that a value in the clinical record, like this weight value, was not entered into the database. When you do your analysis, this is the missing data we discussed in the last segment. Code 5 is for values in the database that aren't recorded in the clinical record. They come out of nowhere; there's no proof, no source document. You don't know if they were made up. And now, going back to code 2: this is for minor errors, which is up for interpretation according to the needs of your study. In our case, we interpreted these as true errors, but ones that did not change the data set's fitness for use, according to our criteria. Some examples are dates within seven days of each other, or weight values rounded up or down to the nearest integer.

Remember that data integrity checks alone are not really enough to assure you have good quality data. Acceptable error rates in research data vary by study; it depends on how you plan to use the data. Some trials have a strict rule requiring the percentage of variables failing data integrity checks to be less than 0.01. Trial sponsors are often more forgiving on error rates for source document verification, but 5% is usually the upper bound. It's also important to close the audit feedback loop. Provide your sites with a timely report and make sure they create and act on a plan to correct data discrepancies. Re-audit as necessary to ensure that your data quality remains high. And finally, new data capture and data entry personnel often need a few weeks to get up to speed.
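Returning to the error codes for a moment: the EORTC-style scheme can be sketched as a small classification function. The numeric tolerance used for code 2 here is an assumption for this example; your study would define its own minor-deviation criteria.

```python
def error_code(db_value, source_value, minor_tolerance=0.5):
    """Assign an EORTC-style error code (1-5) by comparing a database
    value against its source-document value. None means 'not recorded'."""
    if db_value is None and source_value is not None:
        return 4  # value missing in the database
    if db_value is not None and source_value is None:
        return 5  # value in the database with no source document
    if db_value == source_value:
        return 1  # correct
    try:
        if abs(db_value - source_value) <= minor_tolerance:
            return 2  # minor deviation, per your study's criteria
    except TypeError:
        pass  # non-numeric values can't be "close", so fall through
    return 3  # incorrect

print(error_code("1970-01-01", "1970-01-01"))  # 1: values match
print(error_code(70.0, 70.4))                  # 2: rounded weight value
print(error_code(None, 68.2))                  # 4: missing in database
```

Coding every audited variable this way lets you tally discrepancies by type when you write up the audit report.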
Expect error rates to rise during this ramp-up period, and put some extra time into internal quality assessments. That's it for our overview of data quality monitoring. We discussed basic data quality guidelines like GCP and the importance of standardizing data collection processes. We also discussed how to measure data quality through data integrity checks and source document verification audits.