“If I have seen farther, it is by standing on the shoulders of giants.” It is nearly impossible to describe the essential role that sharing results and ideas plays in research progress better than Isaac Newton did. Three centuries later, the importance of sharing knowledge is made even more compelling by the increased speed at which data are produced and the enormous possibilities for their dissemination created by the Internet. Certainly, there is now wide consensus in the scientific community about the importance of achieving an effective, responsible and robust form of data sharing. However, one critical step is still lacking: “You cannot manage what you do not measure”, says a famous adage! We need to know to what extent and in what ways data are actually shared, an effort which may help identify critical aspects and develop strategies which are better suited to scientific practices.
The gap is beginning to be filled thanks to a study carried out by Giovanni Destro Bisol of the Sapienza University of Rome and his collaborators. They chose to carry out a detailed analysis of data sharing in studies on human genetic variation. The relative simplicity of the information coded by DNA, the availability of online resources for data archiving and downloading, and the possible implications for human health make this research field a perfect forerunner in the attempt to arrive at complete data sharing.
The study analyzed a total of 543 genetic datasets reported in papers indexed by the popular PubMed database, covering the 2008-2011 period. Contrary to positive expectations, a substantial portion of datasets (21.9%) was found to have been withheld. Even worse, when the analysis was limited to journals which are the most cited or which adopt an explicit editorial policy in favor of data sharing, the sharing rate failed to increase beyond 80.5%.
So, what can be done to improve data sharing? The study shows, in three steps, how getting under the skin of scientific practice may help find remedies.
The study's authors received a very low rate of positive responses (28.6%) to e-mail requests sent to the corresponding authors of withheld datasets. This suggests that once the “magic moment” of paper acceptance has passed, it becomes difficult to convince authors to make their data fully available. It follows that sharing should be regarded as a prerequisite for final paper acceptance, rather than a recommendation. Making authors deposit their results in open online databases which provide data quality control seems to be the best-practice standard.
Furthermore, the researchers observed substantially lower sharing in medical genetics than in evolutionary and forensic genetics. Potential conflicts with privacy issues and commercial interests may account for this finding. The former problem could be counteracted by developing informed consent forms that do not preclude further development of the studies, which could be better pursued by getting participants more involved in the research design and realization. For the latter, it may be useful to limit the use of patenting and royalties for research tools, which seems appropriate especially when the work is supported by public funds.
Finally, the study provides the first estimate of research funding used to produce withheld data, an astonishing 30% of total resources. By making the scientific community and taxpayers aware of this important aspect, we may help popularize a more effective culture of data sharing in human genetic studies and other research fields.
Milia N, Congiu A, Anagnostou P, Montinaro F, Capocasa M, Sanna E & Destro-Bisol G. Mine, yours, ours? Sharing data on human genetic variation. PLoS ONE, June 5th 2012. http://dx.plos.org/10.1371/journal.pone.0037552