Basic terms related to research data are defined below.
Research data are data collected, collated or observed during the research process with the aim of obtaining original results, i.e. the information necessary to evaluate the results of the research (for instance: numerical data, text documents, survey results, audio and video recordings, photographs, database content or software), whereby:
- not all research data are included in the final publication,
- not all research data are analysed.
Open research data are shared data: anyone can use them freely (e.g. modify, redistribute and disseminate them) while respecting the creators' property rights.
Research funding agencies define their own requirements. The requirements of, e.g., the European Commission (Horizon 2020 Programme) and the National Science Centre include:
- data opening,
- making the data available to the extent necessary to evaluate the findings of publications – validating research results,
- the need for a Data Management Plan,
- depositing data in repositories.
A Data Management Plan is a document that describes the steps to be taken at each stage of working with research data: presentation of the resource (file format and type, amount of data), rules for working with the data (ordering and description of materials, methodologies), methods of sharing, protection and long-term storage, and the data themselves. The data made available should include:
- data that were presented in the publication;
- raw data that were collected during the work but not analysed;
- software needed to analyse the data, if it is required to read them;
- metadata needed to identify and describe the research data.
The data management plan must also address ethical and legal issues, identify the owner of the data and describe the possibilities for disseminating them.
Preparing data for sharing includes:
- data selection – not all data need to be shared. The user should:
  - take into account the scientific value of the collected documents,
  - check that the data contain all the parameters necessary to reproduce the experiments,
  - make sure that identical datasets are not already available in open access,
  - consider whether the cost of storing the data is justified by their value;
- removal of sensitive data that could allow subjects to be identified, by means of:
  - anonymisation – transforming personal data so that the information can no longer be attributed to an identifiable person,
  - pseudonymisation – transforming data so that they cannot be attributed to the data subject without the use of additional information, which is kept separately;
- choice of a suitable file format that does not require commercial software and uses a standard encoding (ASCII, UTF-8).
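As an illustration of pseudonymisation and of choosing an open format, the Python sketch below replaces a direct identifier (a name) with a keyed hash (HMAC-SHA256), drops a column that is not needed (e-mail), and writes the result as UTF-8 CSV. The column names, the sample records and the `SECRET_KEY` value are hypothetical. The key plays the role of the "additional information": whoever holds it (together with a list of the original identifiers) can link pseudonyms back to people, so it must be stored separately from the shared data. The output is pseudonymised, not anonymised; remaining attributes such as age group can still contribute to re-identification.

```python
import csv
import hashlib
import hmac
import io
from typing import Dict, List

# Hypothetical secret key: the "additional information" needed to link a
# pseudonym back to a person. Keep it separate from the shared dataset.
SECRET_KEY = b"replace-with-a-randomly-generated-key"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier always yields the same pseudonym for a given key,
    so records of one subject stay linked within the dataset.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated to 16 hex characters for readability


def prepare_for_sharing(rows: List[Dict]) -> str:
    """Pseudonymise 'name', drop 'email' and return the rows as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["subject_id", "age_group", "score"])
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "subject_id": pseudonymise(row["name"]),
            "age_group": row["age_group"],
            "score": row["score"],
        })
    return out.getvalue()


if __name__ == "__main__":
    records = [  # hypothetical sample records
        {"name": "Anna Kowalska", "email": "anna@example.org", "age_group": "30-39", "score": 17},
        {"name": "Jan Nowak", "email": "jan@example.org", "age_group": "40-49", "score": 12},
    ]
    csv_text = prepare_for_sharing(records)
    # Save in an open, plain-text format with an explicit standard encoding.
    with open("shared_data.csv", "w", encoding="utf-8", newline="") as f:
        f.write(csv_text)
    print(csv_text)
```

Using a keyed hash rather than a plain hash matters here: without the key, names cannot simply be hashed and compared against the published pseudonyms.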
FAIR DATA
FAIR is a set of international principles for describing, storing and publishing research data. Data should be:
- Findable – by humans and computer programs (through metadata, a unique persistent identifier such as a DOI, and indexing of metadata in publicly available databases);
- Accessible – readily available, without the need for special software;
- Interoperable – prepared in a readable format, cross-referenced with other datasets;
- Reusable – accurately described and accompanied by a licence and by information about the authors and place of creation.
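The four FAIR criteria can be turned into a simple completeness check on a dataset's metadata. The Python sketch below is only an illustration: the field names in `FAIR_CHECKLIST`, the DOI and the URL are hypothetical and do not follow any specific metadata standard; a real repository would apply its own schema.

```python
from typing import Dict, List

# Hypothetical checklist: these field names are illustrative only and are
# NOT taken from any particular metadata standard.
FAIR_CHECKLIST: Dict[str, List[str]] = {
    "Findable": ["identifier", "title", "keywords"],
    "Accessible": ["access_url"],
    "Interoperable": ["file_format", "related_identifiers"],
    "Reusable": ["licence", "creators", "place_of_creation"],
}


def fair_gaps(record: Dict) -> Dict[str, List[str]]:
    """Return, per FAIR principle, the checklist fields that are missing or empty."""
    gaps: Dict[str, List[str]] = {}
    for principle, fields in FAIR_CHECKLIST.items():
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


if __name__ == "__main__":
    record = {
        "identifier": "10.1234/example",                            # hypothetical DOI
        "title": "Example survey dataset",
        "keywords": ["survey", "open data"],
        "access_url": "https://repository.example.org/dataset/1",   # hypothetical URL
        "file_format": "text/csv",
        "related_identifiers": [],                                   # no links to other datasets yet
        "licence": "CC-BY-4.0",
        "creators": ["Kowalska, Anna"],
    }
    print(fair_gaps(record))
    # → {'Interoperable': ['related_identifiers'], 'Reusable': ['place_of_creation']}
```

An empty list counts as missing, so a dataset with no cross-references to other datasets is flagged under Interoperable.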
Data deposition
Data are deposited in the form of datasets: collections of files containing data linked to a single publication, scientific project or experiment, together with their description in the form of metadata.
Data description – metadata
Metadata describe the content and provenance of a collection and the research methods used. They are divided into:
- descriptive metadata – necessary to identify the collection (e.g. title, abstract, author and keywords);
- structural metadata – describe the relationships between collections and their elements, e.g. to facilitate navigation;
- administrative metadata – information to help manage the resource (e.g. how and when the collection was created, access information).
Metadata enable research data to be accessed, understood and processed further. A correctly described dataset is more visible, including to computer programs that analyse data.
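To make the three categories of metadata concrete, the Python sketch below models a metadata record as nested dataclasses and serialises it to JSON, a machine-readable format that programs can parse. The class and field names, as well as the sample values, are hypothetical and are not taken from any metadata standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class DescriptiveMetadata:
    """Identifies the collection."""
    title: str
    abstract: str
    authors: List[str]
    keywords: List[str]


@dataclass
class StructuralMetadata:
    """Relationships between the collection and its elements."""
    files: List[str]                # elements (files) that make up the dataset
    part_of: Optional[str] = None   # hypothetical: identifier of a parent collection


@dataclass
class AdministrativeMetadata:
    """Information used to manage the resource."""
    created: str            # creation date, ISO 8601 (YYYY-MM-DD)
    creation_method: str    # how the collection was created
    access: str             # access information, e.g. "open"


@dataclass
class DatasetMetadata:
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata

    def to_json(self) -> str:
        # asdict() recursively converts the nested dataclasses into plain dicts;
        # ensure_ascii=False keeps non-ASCII characters (e.g. in Polish names) unescaped.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


if __name__ == "__main__":
    meta = DatasetMetadata(
        descriptive=DescriptiveMetadata(
            title="Example survey dataset",
            abstract="Responses to a hypothetical survey.",
            authors=["Kowalska, Anna"],
            keywords=["survey", "open data"],
        ),
        structural=StructuralMetadata(files=["responses.csv", "codebook.txt"]),
        administrative=AdministrativeMetadata(
            created="2023-05-10", creation_method="online survey", access="open"
        ),
    )
    print(meta.to_json())
```

Grouping fields by category here is a choice made for readability; actual metadata standards define their own element names and structure.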
Metadata standards (according to Digital Curation Centre)
Data opening and access
Data can be made available under the following licences:
- open licences (CC0, CC-BY);
- licences from the Open Data Commons project, e.g. the Public Domain Dedication and License (PDDL) – places databases in the public domain, with unrestricted freedom to download, share and modify them;
- Open Data Commons Attribution License (ODC-By) – the only condition for copying and modifying data is acknowledgement of authorship;
- Open Data Commons Open Database License (ODbL) – an open licence permitting copying, processing and dissemination of a database, provided that authorship is attributed and the database is disseminated under the same conditions;
- fair use.
Remember that to share data, you must hold the rights to it.
Adequate infrastructure is needed to ensure the sharing, long-term storage and archiving of research data. Data security, i.e. protection against unauthorised access, use, alteration, disclosure and destruction, is also essential.
Data can be deposited in the following types of repositories:
- domain-specific repositories – collect publications from specific scientific disciplines;
- institutional repositories – dedicated to the staff of a particular research unit;
- orphan repositories – provide access to papers from different disciplines and institutions and are intended for researchers who cannot deposit their papers in an institutional repository.