In this video we're going to talk about how you can share the output of your work after your data collection is finished. There are multiple types of output that will come out of your data collection and analysis. First and foremost, there is the publication of your results: what are the conclusions and the findings that came out of your data collection and analysis? You may also want to share, or be required to share, your methods and resources, and in certain instances your data as well.
So let's start with publications. Peer-reviewed publications are the currency of scientific communication. In a peer-reviewed manuscript, the authors or investigators write up their methods, their results, and a description of their data. Then peers examine that work and can provide feedback on how clearly the work was communicated, on the methods themselves, or on the soundness of the conclusions that were drawn from the data, for example. The final product of this process is a peer-reviewed manuscript that is published by a journal in a final format and made available under different models: an open access model, a closed access model, which is more traditional, and hybrid access models in between.
In closed access journals, which has been the classical case, the journal shepherds your manuscript through the peer review process. The final version is edited and made suitable for online or print publication, and then it's made available to other scientists, who have to pay, either from their own funds or through their institutions, for the privilege of accessing that information. In contrast, in the open access model, the cost of the publication process is usually paid up front, typically by the researchers or someone supporting the researchers' work.
And then the end product of the peer review process is published without any technical or financial barriers, so essentially anyone can access and examine your scientific work.
In the United States a lot of clinical research is funded by the National Institutes of Health, so it's worth noting that the NIH has a public access policy. Any peer-reviewed manuscripts that result from NIH-funded research need to be made publicly available no later than 12 months after the official date of publication. The venue through which these manuscripts are made publicly available is the National Library of Medicine's PubMed Central database.
You may have heard of PubMed. PubMed is where you go to search the literature. Every article that is indexed in PubMed can be searched via the PubMed interface, and you can retrieve the title, the abstract, and some other information about those articles. However, PubMed itself does not guarantee you access to the full body of text, the graphs, the tables, and the supplemental material that are usually part of peer-reviewed manuscripts. PubMed Central, on the other hand, is a different database where the whole manuscript is made available, so for the same article, you can go to PubMed Central and access the entire content.
And just as a practical note, once you submit your manuscript to PubMed Central you get a PubMed Central ID. You can, and in some cases will be required to, reference that number when, say, providing a progress report to the NIH about your work, or when citing your grant and the output of your grant.
And this allows the tracking of all publicly funded research, and it gives the public access to the fruit of that research.
As a practical matter, when you choose a journal, keep in mind that journals have different policies relating to PubMed Central. Some journals keep their articles closed access for a while, say 12 months, and then make all of them publicly available in PubMed Central in a blanket way. In that case, the investigators do not have to worry about complying with the public access policy. Sometimes, citing the grant that funds the work, you can request that a certain article be submitted by the publisher to PubMed Central. In other cases, when the publisher does not provide that ability, the investigators can submit the peer-reviewed manuscript themselves. That manuscript might look different from the final version in the journal, but it is the version in which all the peer review comments are incorporated.
These are all different considerations, and obviously different countries and different funding agencies have different requirements. But it's always important to be aware of whether you are required to make the results of your work public, and PubMed Central is a very good venue to do that.
Now, moving on to sharing methods and resources. It's important for your data management and analysis procedures to be automated when possible. For obvious reasons, automation allows you to rerun the analysis when, say, you want to supplement your data or include new data, or when you have found some errors and cleaned your data. You do not want to rerun the analysis manually.
So making it scriptable and automated is obviously preferable. A side effect is that if your analysis is automated, it can be modularized and shared with others. Then other scientists can take your methods, improve upon them, provide feedback, and maybe examine them for errors and point those out to you. Everyone benefits from having that as a sharable resource.
Another important reason to make this available to other people is that you want your research to be reproducible. When your peers examine your work, it's expected that they can replicate your analysis and draw the same sound conclusions from your data. So that's another reason why you will want to share your methods and resources for data management.
Practically speaking, there are multiple ways you can do it. Virtual machines are one way to share your data pipeline or your analysis environment. Virtual machines are essentially snapshots of your operating system, made with software that allows this kind of virtualization. You can install all the tools that you use, maybe including the data sets, and then package those and make them available for other people to essentially replay your exact development environment.
Another way to share your automated data management and analysis tools, and the one I prefer, is to make the actual source code available. Other scientists can then take that source code and run it, or embed it in their own environment, and they can modify it without having to worry about your entire integrated environment needing to work together. They can take just the parts of your source code that they can use, change them, and then maybe even give that back to you.
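To make the idea of a scriptable, modular analysis a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the column names (`participant_id`, `age`, `cd4_count`), the sample rows, the cleaning rule, and the function names are all hypothetical and are not taken from any real study or from the code shown in this video. The point is only that each step (load, clean, summarize, report) is a small function, so the whole analysis can be rerun with one call after the data change, and other people can reuse or modify a single step.

```python
import csv
import io
import statistics
from string import Template

# Hypothetical raw export: one row per participant, including a blank
# value and a non-numeric value that the cleaning step should drop.
RAW_CSV = """participant_id,age,cd4_count
P001,34,512
P002,41,
P003,29,430
P004,abc,388
P005,52,610
"""

# Hypothetical report layout; $placeholders are filled in by render_report.
REPORT_TEMPLATE = Template(
    "<html><body><h1>$title</h1>"
    "<p>Rows kept: $n_kept of $n_total</p>"
    "<p>Mean CD4 count: $mean_cd4</p>"
    "</body></html>"
)


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Keep only rows whose age and cd4_count are both whole numbers."""
    cleaned = []
    for row in rows:
        age = row["age"].strip()
        cd4 = row["cd4_count"].strip()
        if age.isdigit() and cd4.isdigit():
            cleaned.append({
                "participant_id": row["participant_id"],
                "age": int(age),
                "cd4_count": int(cd4),
            })
    return cleaned


def summarize(rows):
    """Compute the summary statistics that the report needs."""
    return {
        "n": len(rows),
        "mean_cd4": statistics.mean(r["cd4_count"] for r in rows),
    }


def render_report(n_total, summary):
    """Fill the HTML template with the computed summary."""
    return REPORT_TEMPLATE.substitute(
        title="Data quality summary",
        n_kept=summary["n"],
        n_total=n_total,
        mean_cd4=f"{summary['mean_cd4']:.1f}",
    )


def run_pipeline(text):
    """Run every step in order; rerun this whenever the data change."""
    rows = load_rows(text)
    cleaned = clean_rows(rows)
    summary = summarize(cleaned)
    return cleaned, summary, render_report(len(rows), summary)


if __name__ == "__main__":
    _, _, report = run_pipeline(RAW_CSV)
    print(report)
```

If an error is found in the data, you fix the input and call `run_pipeline` again rather than repeating the steps by hand. A real project would read from files and would need cleaning rules that match its own data dictionary.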
Typically, if you have your data available in analysis-ready data sets, it is very useful to have templates for reporting and analysis, so that you just press a button and generate reports. There are many technologies that allow you to do that. The open source R language can interface with LaTeX, which is a way to produce PDF reports, or you can generate HTML reports, also based on your analysis. These are just two examples. There are many ways to run analyses and automatically generate well-produced reports out of them.
Before I move on, I did mention GitHub. Git is a distributed source management technology that allows you to back up your code and maintain different versions of it in a distributed manner. People can collaborate, make incremental changes, and then merge the work together. GitHub is a website where you can establish a public home for your source code. In this case I have an example of some quality assurance checks that are run on a large HIV database. These are very complicated checks, and it was useful to embed them all in one scripting mechanism. They are available on GitHub. People can download the code as a zip file, or by using the Git program, which is a specialized way to distribute this code across different computers. Either way, you can share the source code itself. You can version it, and people can examine it to see the differences every time you make changes to it, and they can choose whatever version they want, or the one that corresponds to a given version of your manuscript, for example. And you'll see at the bottom of the screen that GitHub allows you to generate a formatted description page.
Using Markdown, which is a markup language, you can generate structured text like this. You can also use HTML or other formats. Anyway, this is one example, and I urge you to look at it. We will provide in the course other ways in which you can disseminate and share your source code.
Finally, let's talk about the data itself. Why is it important to share your data? It reinforces open scientific inquiry. Other people can look at your data, and different analysis approaches and different opinions can be brought to the same data. Everyone is better off for having these different eyes looking at essentially the same data. It promotes new research and the testing of new or alternative hypotheses: other people can look at your data and look for associations that were not originally intended, things that you did not think of earlier. It facilitates the education of new researchers, who are eager to see how you would work with a large data set like the one you've collected. They can download it, live the analysis phase of the experience, and see what the caveats are of the way you've collected your data, for example. And, more importantly, if the data elements are well harmonized, you can combine these data sets into larger data sets and increase the power with which you can look for statistical patterns in your data.
Again, going back to the NIH: the NIH has a data sharing policy for NIH-funded research. It's always a good idea to look at the source of funding for your data and its data sharing requirements. The NIH requires that you have a data sharing plan in place, especially for large grants. However, it's also recognized within this policy that there are constraints on sharing.
Patient confidentiality needs to be protected. IRB considerations need to be addressed. Copyright and other issues would usually, and for obvious reasons, impede just sharing the data as is with everyone. So always look at the data sharing requirements of your funding agency. The NIH has a lot of resources, and I have the URL up on the slide. We will share some other resources from different agencies on the course wiki.
So this concludes this video. We talked about what to consider when you want to publish the results of your work, the different things you need to take into consideration when sharing the methods and resources with which you've analyzed your data, and finally the actual data itself: how and why you should try to share it with other scientists.
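As a rough illustration of the earlier point about combining well-harmonized data sets, here is a small Python sketch. The two sites, their field names (`wt_kg`, `weight_lb`), the shared field `weight_kg`, and all of the values are made up for this example. It maps each site's records onto one shared schema and unit, pools them, and compares the standard error of the mean for one site alone against the pooled data. With these particular numbers the pooled estimate is tighter, but that is not guaranteed in general; it depends on how similar the sites are.

```python
import math
import statistics

LB_PER_KG = 2.20462  # pounds per kilogram

# Hypothetical records: the same measurement, recorded with different
# field names and different units at each site.
SITE_A = [{"wt_kg": 70.0}, {"wt_kg": 82.5}, {"wt_kg": 64.0}, {"wt_kg": 77.5}]
SITE_B = [{"weight_lb": 150.0}, {"weight_lb": 176.0},
          {"weight_lb": 143.0}, {"weight_lb": 165.0}]


def harmonize_site_a(record):
    """Map a site A record onto the shared schema (already in kilograms)."""
    return {"site": "A", "weight_kg": record["wt_kg"]}


def harmonize_site_b(record):
    """Map a site B record onto the shared schema, converting pounds to kilograms."""
    return {"site": "B", "weight_kg": record["weight_lb"] / LB_PER_KG}


def pool():
    """Combine both sites into one list of harmonized records."""
    return ([harmonize_site_a(r) for r in SITE_A]
            + [harmonize_site_b(r) for r in SITE_B])


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    pooled = pool()
    site_a_weights = [r["weight_kg"] for r in pooled if r["site"] == "A"]
    all_weights = [r["weight_kg"] for r in pooled]
    print(f"Site A alone: n={len(site_a_weights)}, "
          f"SE={standard_error(site_a_weights):.2f}")
    print(f"Pooled:       n={len(all_weights)}, "
          f"SE={standard_error(all_weights):.2f}")
```

Keeping the `site` field in the pooled records is a deliberate choice in this sketch, so that anyone reusing the combined data can still account for differences between sources.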