During my stay in Leipzig, I attended a workshop named Digital Management of Science Data, given by two specialists from the Data Center for the Humanities at the University of Cologne. As a humanist, I find the concept of data management fairly new. After this workshop, however, I not only realize the necessity of managing research data but have also developed a deeper understanding of the role that data plays in our research.
The principles of research data management (RDM) are summed up as FAIR (Findable, Accessible, Interoperable, and Reusable). The presenters returned to this key concept several times, and I would like to elaborate on it a little.
Information that has been collected, observed, or generated to validate original research findings can be categorized as research data.
Findable: Metadata and data should be easy to find and read for both humans and computers.
|Metadata type|Description|
|---|---|
|Descriptive|Individual aspects or additional information about the data, for discovery and identification|
|Structural|Information about structures, methods, types, etc.|
|Bibliographic|Information for the representation of data in online catalogs|
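To make the three kinds of metadata more concrete for myself, I sketched how one record might group them. This is only an illustration: the class `DatasetMetadata`, its field names, and all the values are hypothetical and do not follow any official metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical record grouping the three kinds of metadata."""

    # Descriptive: helps humans and machines discover and identify the data.
    descriptive: dict = field(default_factory=dict)
    # Structural: how the data is organized, produced, and typed.
    structural: dict = field(default_factory=dict)
    # Bibliographic: what an online catalog needs to list the data.
    bibliographic: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # JSON is readable by people and parseable by software alike.
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


# All values below are invented for illustration.
record = DatasetMetadata(
    descriptive={"title": "Example letters corpus", "keywords": ["letters"]},
    structural={"format": "TEI-XML", "method": "manual transcription"},
    bibliographic={"creator": "Example, A.", "year": 2019},
)

print(record.to_json())
```

Writing the record out as JSON is one simple way to keep it readable for both humans and computers, which is what the Findable principle asks for.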
Accessible: Once we find the data, we need to know how to access it. Metadata and data should be retrievable by their identifiers.
Interoperable: The data need to interoperate with applications for storage, analysis, and processing. Here we mainly talked about the preservation of software (saving the version of a tool; providing an environment in which it runs) and the preservation of presentation systems (for example, when a website goes down).
Reusable: Metadata and data should be well described so that they can be reused.
1. The data must be findable, for example through online catalogs.
2. Access must be easy and clear, with explicit rules of access (in contrast to a dark archive).
3. The data must be citable, uniquely identifiable, and addressable (a persistent identifier looks like this: 10.1543/data.234565).
I personally find the data life cycle image very helpful.
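A persistent identifier such as the example DOI above has two parts: a prefix that starts with `10.` and a suffix chosen by whoever registers it. The sketch below splits a DOI and turns it into a link through the public resolver at https://doi.org/. The regular expression is a deliberately simplified assumption of mine, not the official DOI syntax, and the function names are hypothetical.

```python
import re
from urllib.parse import quote

# Simplified assumption: "10." plus 4 to 9 digits, a slash, then a suffix
# without whitespace. Real DOIs allow more variation than this.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")


def parse_doi(doi: str) -> tuple:
    """Split a DOI string into (prefix, suffix) or raise ValueError."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a DOI in the simplified form: {doi!r}")
    return match.group(1), match.group(2)


def doi_to_url(doi: str) -> str:
    """Build a resolver link that stays valid even if the data moves."""
    prefix, suffix = parse_doi(doi)
    # Percent-encode unusual characters in the suffix, keeping slashes.
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


print(parse_doi("10.1543/data.234565"))
print(doi_to_url("10.1543/data.234565"))
```

The point of citing the resolver link rather than the repository page is that the DOI keeps pointing to the data even when the hosting website changes.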
Data Creation → Think ahead of the circle!
Data Processing → Which format, how will the data be organized?
Data Analysis → Which data formats will be produced? Which tools?
Data Preservation → Which data should be archived (long-term)? Where? Also consider the data format, the metadata, and whether to go subject-specific or generic.
Data Access → Which data should be made accessible? Which forms of presentation are needed?
Data Reuse → Who should be able to find the data? How can it be reused?
The remaining practical issue is: how to publish your data?
- Find a repository to store your data, such as Figshare or Zenodo.
- The paper gets a DOI (digital object identifier) from Figshare, can be found via the search function, and is linked to the dataset we produced.
- Behind the scenes: the metadata is also sent to a number of aggregating services, e.g. DataCite, which hands out DOIs.
- The paper and the datasets are now permanently available via the DOI.
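Before uploading anything, it can help to check that the description of the dataset is complete. Here is a minimal sketch of such a check. The list `REQUIRED_FIELDS` is my own hypothetical choice and not the actual schema of Figshare, Zenodo, or DataCite, so the repository's own documentation decides what is really mandatory.

```python
# Hypothetical checklist; real repositories define their own required fields.
REQUIRED_FIELDS = ("title", "creators", "publication_year", "license", "description")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Invented draft record for illustration.
draft = {
    "title": "Example letters corpus",
    "creators": ["Example, A."],
    "publication_year": 2019,
}

print(missing_fields(draft))  # ['license', 'description']
```

An empty list means the draft covers everything on the checklist and is ready for the repository's upload form.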