Balancing the needs of consumers and producers for scientific data collections
Abstract
Recent emphasis on, and requirements for, open data publication have led to significant increases in data availability in the Earth sciences, which is critical to long-tail data integration. Currently, data are often published in a repository with an identifier and citation, similar to those for papers. Subsequent publications that use the data are expected to provide a citation in the reference section of the paper. However, the format of the data citation is still evolving, particularly with regard to citing dynamic data, subsets, and collections of data. Considering the motivations of both data producers and consumers, the most pressing need is to create user-friendly solutions that provide credit for data producers and enable accurate citation of data, particularly integrated data. Providing easy-to-use data citations is a critical foundation for addressing the socio-technical challenges around data integration. Studies that integrate data from dozens or hundreds of datasets must often include data citations in supplementary material due to page limits. However, citations in the supplementary material are not indexed, making it difficult to track citations and thus to give credit to the data producer. In this paper, we discuss our experiences and the challenges we have encountered with current citation guidance. We also review the relative merits of the currently available mechanisms designed to enable compact citation of collections of data, such as data collections, data papers, and dynamic data citations. We consider these options for three data producer scenarios: a domain-specific data collection, a data repository, and a large-scale, multidisciplinary project. We posit that a new mechanism is also needed to enable citation of multiple datasets and credit to data producers.
Related Works
Items connected by shared entities, co-authorship, citations, or semantic similarity.
Challenges in Building an End-to-End System for Acquisition, Management, and Integration of Diverse Data From Sensor Networks in Watersheds: Lessons From a Mountainous Community Observatory in East River, Colorado
BASIN-3D: A brokering framework to integrate diverse environmental data
The East River Community Observatory Data Collection: Diverse, multiscale data from a mountainous watershed in the East River, Colorado
Data Citation Guidelines for Earth Science Data, Version 2
Global Bee Interaction Data
Energy, Public Choices and Environment Data Needs, 1977
Conserving Biodiversity on Native Rangelands: Symposium Proceedings
Shrubland Ecosystem Genetics And Biodiversity: Proceedings
Cited 9 times
