Data repositories (DRs) are platforms for storing, managing and preserving datasets over the long term. A repository contains datasets and their descriptions (metadata), which can then be retrieved and reused by humans or machines. The size of datasets is measured in kilobytes, megabytes, gigabytes or terabytes.
There are several types of repositories:
To help you choose a data repository that suits your needs and complies with the FAIR principles, here are 12 criteria to consider.
A persistent identifier (PID), such as a DOI, is a unique and long-lasting reference that makes it possible to cite an online resource and to link to it reliably [4]. It provides a permanent hyperlink through which the resource can be retrieved at any time, even if the URL of its page changes.
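As an illustration, DOIs are one widely used kind of PID, and they are resolved through the doi.org resolver, which redirects to wherever the landing page currently lives. The Python sketch below turns a DOI into that stable link. The syntax check is a simplified assumption (real DOI syntax is more permissive), and the DOI `10.1234/example-dataset` is hypothetical.

```python
import re

# Simplified, illustrative DOI pattern: "10.", a registrant code, "/", a suffix.
# This is an assumption for the example, not the full DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return a stable link to a DOI through the doi.org resolver.

    The link stays valid even if the repository moves the landing page,
    because the resolver is updated instead of the citation.
    """
    doi = doi.strip()
    # Accept DOIs written with the common "doi:" prefix.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.1234/example-dataset"))  # https://doi.org/10.1234/example-dataset
```

Citing the resolver link rather than the repository's own page URL is what keeps the reference stable over time.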
Metadata are elements that describe, in a standardized and structured way, the purpose, origin, temporal characteristics, geographical location, authorship, conditions of access and conditions of use of a resource such as a dataset [2]. They make the dataset easier to find and understand: the richer the description, the easier it is to find and understand the dataset.
Various description schemas exist, including Dublin Core, the Data Documentation Initiative (DDI), the Metadata Encoding and Transmission Standard (METS) and the DataCite Metadata Schema.
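To give a concrete idea of what such a schema looks like, the minimal Python sketch below serializes a dataset description as simple Dublin Core XML, restricted to the 15 elements of the Dublin Core Metadata Element Set 1.1. The wrapper element `record` and all field values are hypothetical; repositories and harvesting protocols (e.g., OAI-PMH) wrap Dublin Core in their own envelopes.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements defined by that element set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize a dataset description as simple Dublin Core XML.

    `fields` maps element names to one or more values, since Dublin Core
    elements may repeat (e.g. several creators).
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description.
record = dublin_core_record({
    "title": ["Example dataset (hypothetical)"],
    "creator": ["Doe, Jane", "Roe, Richard"],
    "date": ["2024"],
    "identifier": ["https://doi.org/10.1234/example-dataset"],
    "rights": ["CC BY 4.0"],
})
print(record)
```

Each value becomes its own `dc:` element, so the two creators above appear as two repeated `dc:creator` elements.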
Keeping metadata accessible even when the data cannot be shared or are no longer available helps meet the FAIR Findable and Accessible principles (e.g., metadata on authors, institutions and associated publications remain useful even if the data are missing). This approach also reduces storage costs.
To be machine-readable, metadata (descriptive, administrative and structural) must conform to standard schemas such as DataCite.
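Conformance to a schema can be checked automatically. The DataCite Metadata Schema defines six mandatory properties: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The Python sketch below reports which of them are missing from a record. The dictionary keys are a simplified, hypothetical encoding, not DataCite's exact XML element or API field names, and the values are invented.

```python
# The six mandatory properties of the DataCite Metadata Schema.
# These keys are a simplified, hypothetical encoding of them.
MANDATORY = (
    "identifier", "creators", "titles",
    "publisher", "publicationYear", "resourceType",
)


def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory properties that are absent or empty in `record`."""
    return [key for key in MANDATORY if not record.get(key)]


record = {
    "identifier": "10.1234/example-dataset",         # hypothetical DOI
    "creators": ["Doe, Jane"],
    "titles": ["Example dataset"],
    "publisher": "Example University Repository",  # hypothetical
    "publicationYear": 2024,
    # "resourceType" deliberately left out
}
print(missing_mandatory(record))  # ['resourceType']
```

A repository would typically reject, or ask you to complete, a deposit for which such a check returns a non-empty list.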
When depositing data in a repository, you can choose among several access conditions [8]:
| Access condition | Description and example |
|---|---|
| Closed data | A general description of the data is published, but the data themselves cannot be accessed. E.g., the dataset contains sensitive data that are not anonymized or are only pseudonymized. |
| Data on request | A record of the data is published, along with information on how to request access. Requests are generally sent to the researchers, who approve them. E.g., the dataset carries a high re-identification risk and can only be shared under certain conditions. |
| Data under embargo | A record of the data is published, but the data cannot be accessed until a specified period has ended; after that, they become openly accessible. The associated metadata must remain accessible to signal that the data exist while keeping them protected. E.g., the researcher wants to file a patent before publishing the data. Note that some repositories do not offer an embargo option. |
| Open data | The data are freely available and can be accessed by anyone. |
These options apply only to the data, not to the metadata: in all cases, the metadata remain publicly accessible.
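The four access conditions, together with the rule that metadata always stay public, can be summarized as a small decision function. This is a hypothetical Python model, not any repository's actual API; in particular, it assumes that an embargo lifts on its end date and that on-request access is a simple approved/not-approved flag.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    CLOSED = "closed"
    ON_REQUEST = "on request"
    EMBARGO = "embargo"
    OPEN = "open"


@dataclass
class Deposit:
    access: Access
    embargo_end: Optional[date] = None  # EMBARGO only; data open from this date on (assumption)
    request_approved: bool = False      # ON_REQUEST only


def access_status(deposit: Deposit, today: date) -> tuple[bool, bool]:
    """Return (metadata_visible, data_downloadable) for a deposit on a given day."""
    metadata_visible = True  # metadata stay public under every access condition
    if deposit.access is Access.OPEN:
        data_downloadable = True
    elif deposit.access is Access.EMBARGO:
        # Data open up once the embargo end date is reached.
        data_downloadable = (deposit.embargo_end is not None
                             and today >= deposit.embargo_end)
    elif deposit.access is Access.ON_REQUEST:
        # Access depends on the researchers approving the request.
        data_downloadable = deposit.request_approved
    else:  # Access.CLOSED: only the description is published
        data_downloadable = False
    return metadata_visible, data_downloadable


embargoed = Deposit(Access.EMBARGO, embargo_end=date(2026, 1, 1))
print(access_status(embargoed, date(2025, 6, 1)))  # (True, False)
print(access_status(embargoed, date(2026, 1, 1)))  # (True, True)
```

Whatever branch is taken, the first element of the result is always `True`, which mirrors the rule that the access condition never hides the metadata.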
Check where the repository's server is located. It is generally advised to avoid depositing data on a server operated by a US company or located in the USA [11].