Hi. In this module we're going to talk about data standards: when to use them, why to use them, and what using standards can do for us. I will motivate this whole module by giving you an example from a data sharing workflow.

A typical data sharing workflow in clinical research involves collaborators who approach a local site where data is being collected, or some regional or central data collection entity. When the collaborators approach the stewards of the data, that usually starts what I call a data transfer dance. Part of that interaction involves the collaborators requesting certain data elements from the local sites. They usually provide a standard operating procedure with the variables and the variable definitions as they would like them to be provided. The site usually responds with an agreement for complete or partial participation, so it is handing over either a subset or the entire data set that it has. Then they typically discuss how the data needs to be transformed, and who has to do the data cleaning and the transformation so that it is compliant with the analysis requirements. Then there is usually a bulk transfer of the data from the participating site to the analysis centre or to the collaborators. It is inevitable that during analysis certain problems arise in the data, or certain records are flagged for more careful examination, and then there is usually a back and forth of individual queries.

This involved workflow arises when the collaborators request the data in a specific format and the local sites store the data in a different format; the dance then has to be unique for every collaborator and every site. You can imagine how, when collaborators try to collect data from multiple data providers, or multiple providers try to get together to write a uniform manuscript, as in the case of data sharing, this can become cumbersome. In our case, this kind of interaction had to happen seven different times. You can also imagine how that complexity is multiplied if the same group of data providers is approached by multiple collaborators, each requesting data in their own specific format. You end up with an explosion of different interfaces between the people requesting data and the people providing it, and this can get pretty cumbersome and very time consuming.

Remember that the data for these analyses is usually specified using a standard operating procedure, and that the local sites also have their own standard operating procedures. So we can look at the different standard operating procedures and logically ask: is there a way to use one standard operating procedure? This would be a schema like the one we have here, where you have the local databases. Data is typically stored in tables with a certain tabular structure and with column names corresponding to the different variables that you collect. The idea is to somehow transform those tables so they all look the same, so that everybody has the same column names and stores the same types of assertions down the rows of those tables.
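As a concrete sketch of what that local transformation might look like, here is a minimal Python example; the standard names and the local column names are invented purely for illustration and are not taken from any particular standard.

```python
import pandas as pd

# Hypothetical mapping from one site's local column names to the
# agreed-upon standard names (all names here are invented examples).
LOCAL_TO_STANDARD = {
    "pt_num": "patient_id",
    "dob": "date_of_birth",
    "seen_on": "visit_date",
    "tb_dx": "tb_diagnosis",
}

def to_standard(local_table: pd.DataFrame) -> pd.DataFrame:
    """Rename this site's columns to the standard names and keep only
    the columns the standard defines."""
    renamed = local_table.rename(columns=LOCAL_TO_STANDARD)
    return renamed[list(LOCAL_TO_STANDARD.values())]
```

Each site would write a mapping like this once, against its own internal representation, and reuse it every time a standard-compliant copy is needed.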
If we can have everyone save their data in tables that look the same, then harmonizing data from all these sources becomes a simple process: it is just a matter of collecting all that data and merging the rows of those tables.

Typically, when people talk about standards, they also refer to certain data quality checks that are specific to that standard. If I define a standard where the variable for date of birth has a certain name, and other pertinent dates have other names, then it is implicit that no other observation about a patient, say a medication or a lab result, can occur before the date of birth, or after the date of death if the patient record has a death date. These are the kinds of standards, both in terms of data quality and structure, that people can agree upon. In an ideal world, everyone would have a standard-compliant copy of their local data, or would store the data directly in a standard-compliant manner, which could then be shared with other people. This is often what you see when people collaborate with each other: such a standard emerges, and it is essentially a consensus on how people want to store the data to avoid this duplication of work.

One advantage of having a copy of your data set in that standard-compliant format is that it is instantly machine readable. You can overlay on top of it, for example, any statistical analysis software that expects this standard format for the data. Beyond the quality checks I mentioned, you can now invest in scripts that produce very complex visualizations. For example, if you are collecting epidemiology data from different registries around the country, and you know that everyone is storing the data in a specific format, then it is worth someone's while to write geographic-mapping or some other complex mapping software, knowing that all you have to do is go around and harvest all that data and produce those beautiful visualizations. You can write templates for reports: if all the collaborators have their data in a certain format, you can write one template and ask everyone to run it, and you will essentially get reports that look identical or that can easily be merged together. You can also write analyses up front, or even ad hoc, where you think of some pattern that you are looking for in the data. Say you are looking for a subset of patients that meet certain criteria or show a certain temporal pattern in a particular lab result; you can write an arbitrarily complex analysis and then instantly run it across all your collaborators.
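Here is a minimal sketch of both ideas, the row-level merge and a shared quality check of the kind described above; the file names and column names are assumptions made up for the example, not part of any real standard.

```python
import pandas as pd

# Hypothetical standard-compliant exports from three participating sites.
site_files = ["site_a.csv", "site_b.csv", "site_c.csv"]
date_cols = ["date_of_birth", "date_of_death", "lab_date"]

# Because every table uses the same column names, harmonization is
# just reading the files and stacking the rows.
merged = pd.concat(
    (pd.read_csv(f, parse_dates=date_cols) for f in site_files),
    ignore_index=True,
)

# A shared quality check every site can also run locally:
# no lab result may be dated before birth or after a recorded death.
flagged = merged[
    (merged["lab_date"] < merged["date_of_birth"])
    | (merged["lab_date"] > merged["date_of_death"])
]
print(f"{len(flagged)} records flagged for closer examination")
```

The same script works unchanged for any number of participating sites, which is exactly what makes the up-front agreement on structure worthwhile.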
Another thing you can do with data that conforms to a widely accepted standard is invest in building software around it, and not just statistical software: software that can do other useful tasks like manipulation and transformation of the data to meet other standards, if they exist, or to meet other kinds of analysis-ready formats. You can write software that allows people to automatically merge the data from the different sources or repositories where it lives. You can grab that data and make it accessible via the web. All of that is predicated on the fact that the data on the back end is in a standard-compliant format, so the software can easily merge it. This is also why registries typically encourage, or require, standards when local sites submit to them: you can conveniently run aggregate data operations, looking at means, populations, and trends, or you can drill down to record-level data operations and target certain records for more examination.

This, now, is a more simplified schema of the data sharing workflow I showed originally. In this workflow, every site may, for its own purposes, have its own internal data representation; that could be a legacy database or something the site believes is better suited to its daily operations. All the site has to do locally is create a standard-compliant copy, and that effort only has to be done once. What the schematic is showing is that collecting the data then becomes the equivalent of stacking cylinders that all look the same, so you don't have to worry about making them fit together.

Another alteration to the workflow is parallel analysis, which is what I mentioned earlier when I talked about creating analyses, report templates, or those complex visualizations up front. This scenario arises when the sites do not want to relinquish control of their own data. An external or central source writes the analysis that needs to be run; it is pushed out to the sites, all the participating sites run the queries on their own data, and then they submit aggregate results. So this is another workflow that is made easier when data is stored in a standard-compliant manner.

To recap the scenario I have been talking you through, the benefits of adopting data exchange standards are these. You have to do the data transformation and the quality checks only once; the standard-compliant copy of your local database is ready for instant transfer whenever it is needed, so whenever you choose to participate, it is just a matter of releasing your data, with no more massaging or transformation work to do. Identical quality checks are performed across the participating sites: everybody knows what the data should look like, so there is a minimum set of quality checks that everyone runs, and people can write communal code or communal tools and share them with each other to ensure that this kind of quality control is uniform across all the participating sites.
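To make the parallel analysis workflow a little more concrete, here is a minimal sketch of a query that a central group might write once and push out to every site; only aggregate counts ever leave a site. The function name and the column names are illustrative assumptions, not part of any specific standard.

```python
import pandas as pd

def site_level_summary(standard_table: pd.DataFrame) -> pd.DataFrame:
    """Written once by the analysis centre, run locally at each site.

    Only the aggregate counts in the returned table are sent back;
    the record-level data never leaves the site.
    """
    return (
        standard_table
        .groupby(["sex", "tb_diagnosis"])  # illustrative standard columns
        .size()
        .rename("n_patients")
        .reset_index()
    )

# Each site runs the same function on its standard-compliant copy, e.g.:
#   summary = site_level_summary(pd.read_csv("site_a.csv"))
# and returns only `summary` to the collaborators, who can stack and
# compare the aggregates across sites.
```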
An important side effect is that when you transform your data to comply with a standard, you do two things: you standardize the structure, or the syntax, of the data, but it can also bring you closer to standardizing the meaning, the semantics, of the types of states you capture and the types of assertions you make in the data. Consider multiple silos or registries assigning the label "lost to follow up" to a patient based on their own local criteria. One clinic can say a patient is lost to follow up because, by its definition, someone is lost to follow up if they haven't come in for a visit in six months; another clinic can assign lost to follow up to patients who haven't been seen in three months, for example. But when everybody is storing the data according to the same standard, and that standard requires you to define that label, the definition will most likely be uniform across the board. Or, in the better-case scenario, the standard captures just the dates of visits, and then lost to follow up can be calculated in a uniform way across the whole group.
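As a minimal sketch of that better-case scenario, here is how a lost-to-follow-up label could be derived uniformly once everyone captures raw visit dates; the column names and the 180-day window are assumptions made for the example.

```python
import pandas as pd

def lost_to_follow_up(visits: pd.DataFrame,
                      as_of: pd.Timestamp,
                      window_days: int = 180) -> pd.Series:
    """Derive 'lost to follow up' from raw visit dates instead of
    trusting each site's locally defined label.

    Assumes a standard-compliant table with `patient_id` and
    `visit_date` columns; a patient is flagged when their most recent
    visit is more than `window_days` before the `as_of` date.
    """
    last_visit = visits.groupby("patient_id")["visit_date"].max()
    return (as_of - last_visit).dt.days > window_days

# Example: lost_to_follow_up(visits, pd.Timestamp("2014-01-01"))
```

Because every site computes the label the same way, the groups being compared actually mean the same thing.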
Another example is: what does it mean for a patient to have TB? At one site, in one region, or in a primary care clinic, a patient "has TB" if a diagnosis was made by an external clinic, for example; whereas in a specialized infectious disease clinic, the meaning of a patient having a TB disease state is very much an operational definition, one that could come out of, say, a TB culture being positive. So when sharing data using standards, people are more likely, though not necessarily so, to agree on the meaning of these concepts, and that generally leads to better data harmonization.

Beyond data exchange, which is the example I gave today to motivate the use case for standards, you can use standards in other scenarios. One is data capture: case report forms typically have common data elements, and Paul refers to this in other videos in this series. If I define the gender or sex variable to be either M/F, or one for male and zero for female, we can have different case report forms, but if we all agree on the same standard, the same definitions for these data elements used for data capture, then data capture can be better harmonized. Another is storage and retrieval: there are standards that allow you to represent and store your facts and then aggregate and retrieve them in an easy manner, especially if the standards and the labels you assign to the different diseases, for example, are hierarchical; there are standards that allow that kind of meaningful aggregation and retrieval. The same goes for analysis: if you remember, in the workflow I was showing, the collaborators typically approach the local sites with their SOPs, and the table structure that the group doing the analysis would like for the data is very much related to how they would like their analysis-ready tables to look. So there are standards around analysis, and I'll discuss those later. Regulatory compliance can also be easier, both to require and to verify, if the compliance refers to a specific standard: if you show that you follow that standard, it is easy for someone to verify that your data, or your procedures using the data in that standard format, comply with a regulatory mandate, just like the other sites that use the same format.

Last but not least, standardizing how you store your data, or using a common standard, may allow you to reuse and repurpose your data down the road. If you use a standard that has a clear meaning to it, then down the road you can pull the data back up and know exactly how you stored it, and you can easily merge it with other data that you may have from other data collections and repurpose or reuse it for other studies.