Abstract submitted to the 2nd International Digital Curation Conference, "Digital Data Curation in Practice", 21-22 November 2006, Hilton Glasgow Hotel, Glasgow. http://www.dcc.ac.uk/events/dcc-2006/
Data Publication at the British Atmospheric Data Centre
S. J. Pepler(1), S. E. Latham(1), S. A. Sufi(2), P. Simpson(3), K. A. Bouton(1), C. M. Jones(2), B. N. Lawrence(1), B. M. Matthews(2) and A. J. Miles(2)
(1) NERC Centres for Atmospheric Science (NCAS), (2) Council for the Central Laboratory of the Research Councils (CCLRC), (3) National Oceanography Centre, Southampton (NOCS)
The British Atmospheric Data Centre (BADC) is a data centre funded by the Natural Environment Research Council (NERC). Its mandate is to help researchers use atmospheric science data and to archive NERC data for the long term.
The BADC has a good reputation for making data available effectively. However, its data selection process and the coherence of its datasets need to improve before they approach the standards expected of academic journal publication. One of the aims of the JISC-funded CLADDIER project (Citation, Location, and Deposition in Discipline & Institutional Repositories) is to develop data publication methods at the BADC that are equivalent to those used in institutional repositories. This paper examines two of the major issues for data publication: peer review and data citation.
Data are selected for archiving at the BADC through negotiation with data suppliers and NERC-funded researchers. Data scientists at the data centre establish whether the data are usable and useful, but what is critically lacking is independent review. One possible solution is to use the peer review process of an existing journal. We contrast the criteria used for peer review of papers with those needed for peer review of data.
A key concern in the data publication process is establishing what a data citation refers to. To determine this, we interviewed scientists and asked what they would like to cite. Most would like to cite data at a coarse granularity: not individual bits and bytes, but natural aggregations such as "all the data collected by the Met Office weather station network" or "all the data collected in the XXX project". This is problematic because such views of the data are not necessarily clearly defined, nor do they necessarily form a hierarchy; a single observation may belong to both a station network and a project. We explore which approach is most appropriate for the BADC.
