
Research data management

Recommendations

  • Share your data in a FAIR, non-commercial data repository that ensures long-term preservation (for example SWISSUbase, OLOS or Zenodo)
  • Choose a repository that automatically assigns DOIs (Digital Object Identifiers) or allows them to be added
  • Choose a repository that offers Creative Commons licenses (CC0 or CC BY)

Data repositories

Data repositories (DRs) are platforms for storing, managing and preserving datasets over the long term. A repository contains datasets and their descriptions (metadata), which can then be retrieved and reused by humans or machines. The size of a dataset is measured in kilobytes, megabytes, gigabytes or terabytes.

There are several types of repositories:

  • disciplinary
  • institutional
  • generalist (multidisciplinary)
  • publisher-specific
  • project-specific

To help you choose the most suitable data repository for your needs that also complies with the FAIR principles, here is a list of 12 criteria to consider.

Does the repository offer to assign a unique, permanent identifier to the dataset?

A unique and persistent identifier, or PID (Persistent Identifier), makes it possible to reference, cite and provide a stable link to an online file [4]. It creates a unique, permanent hyperlink through which the resource can be retrieved at any time, even if the URL of the page changes.

  • There are several types of PID. The best known for identifying datasets or journal articles is the DOI (Digital Object Identifier).
    • Example of a DOI attached to a resource available on the Zenodo data repository:

 [Image: example of a DOI]

  • The persistent identifier is included in the bibliographic reference. This means that if the dataset is reused, it can easily be traced back from the citation.
    • Example: Valérie Gadrat, Yvette Lafosse, Claire Sowinski, & Coralie Wysoczynski (2019, April). GopenDoRe game: the cooperative game for acquiring good practices in managing and sharing research data (Version 2). Zenodo. http://doi.org/10.5281/zenodo.2657316
  • DOIs are generally assigned when your data is deposited in a data repository, or via the HES-SO Center for Scientific Information (CISO). You can obtain a DOI by sending an e-mail to open(at)hes-so.ch
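
As a minimal sketch of how a DOI acts as a stable link, the Python snippet below reduces the common spellings of a DOI to the bare identifier and rebuilds the resolver URL on doi.org. The helper name `doi_to_url` and the simplified pattern are assumptions made for illustration; they are not part of any repository's API, and real DOIs can be more permissive than this pattern. The DOI used is the GopenDoRe example cited above.

```python
import re

# Simplified pattern (an assumption of this sketch): "10.", a registrant
# prefix of 4 to 9 digits, a slash, then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Spellings under which a DOI is commonly written.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(raw: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI (hypothetical helper)."""
    doi = raw.strip()
    # Strip a known prefix so that every spelling reduces to the bare DOI.
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return f"https://doi.org/{doi}"


print(doi_to_url("http://doi.org/10.5281/zenodo.2657316"))
# prints https://doi.org/10.5281/zenodo.2657316
```

Because the resolver URL is derived from the identifier alone, a citation that carries the DOI keeps working even if the repository later moves the landing page.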

Is the license under which the data will be made available clearly stated, or can the user download/select a license?

  • Some repositories impose a specific license, while others allow you to choose your preferred license
  • When data is linked to a scientific article, check whether the journal applies specific conditions to the data distribution license. To do this, consult the journal's instructions for authors.
  • Creative Commons licenses were created in 2002 for the distribution of digital content such as text, images and films, but they can also be used for distribution on paper. They make it easy for authors to indicate the rights they want to retain and the rights they want to waive, so that others can reuse their work [5].
    • Four clauses can be combined to suit your needs: Attribution (BY), NonCommercial (NC), NoDerivatives (ND) and ShareAlike (SA).
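
The way the clauses combine can be sketched in a few lines of Python. The function `cc_license` is a hypothetical helper written for this page, not an official Creative Commons tool. It relies on two facts about the standard license suite: every license includes BY, and ND cannot be combined with SA, which leaves six possible licenses.

```python
def cc_license(nc: bool = False, nd: bool = False, sa: bool = False) -> str:
    """Compose a Creative Commons license code from optional clauses.

    BY (attribution) is part of all six standard licenses, so it is
    always included. Hypothetical helper for illustration only.
    """
    if nd and sa:
        # ShareAlike applies to adaptations, which NoDerivatives forbids.
        raise ValueError("ND and SA cannot be combined")
    parts = ["BY"]
    if nc:
        parts.append("NC")  # NonCommercial: no commercial reuse
    if nd:
        parts.append("ND")  # NoDerivatives: no adapted versions
    if sa:
        parts.append("SA")  # ShareAlike: adaptations keep the same license
    return "CC " + "-".join(parts)


print(cc_license())                   # prints CC BY
print(cc_license(nc=True, sa=True))   # prints CC BY-NC-SA
```

CC0 is not produced by this function: it is a public-domain dedication rather than a combination of clauses.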

Can I upload and/or add metadata?

Metadata are elements used to describe, in a standardized and structured way, the purpose, origin, temporal characteristics, geographical location, authorship, conditions of access and conditions of use of a resource, such as a dataset [2]. The more information the metadata provide about a dataset, the easier it is to find and understand.

There are various description schemas, including Dublin Core, Data Documentation Initiative (DDI), Metadata Encoding and Transmission Standard (METS) and the DataCite metadata schema.

Are citations and metadata always publicly accessible?

Keeping metadata accessible even when the data cannot be shared or is no longer available helps to meet the FAIR Findable and Accessible criteria (e.g. metadata on authors, institutions and associated publications can be useful even if the data is missing). In addition, this solution reduces storage costs.

Does the data repository provide a submission form requesting that intrinsic metadata comply with a specific format?

To be machine-readable, metadata (descriptive, administrative and structural) must conform to standard schemas such as DataCite.
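
To illustrate what conforming to a schema means in practice, here is a minimal Python sketch that checks a record against the six properties that version 4 of the DataCite Metadata Schema marks as mandatory (Identifier, Creator, Title, Publisher, PublicationYear, ResourceType). The flat dictionary keys and the function `missing_properties` are simplifications assumed for this example; they do not reproduce DataCite's actual XML or JSON serialization, and a real submission form performs far more checks.

```python
# Mandatory DataCite properties, written as simplified dictionary keys
# (an assumption of this sketch, not DataCite's own field names).
MANDATORY = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_properties(record: dict) -> list:
    """List the mandatory properties that are absent or empty in `record`."""
    return [key for key in MANDATORY if not record.get(key)]


record = {
    "identifier": "10.5281/zenodo.2657316",
    "creators": [
        "Gadrat, Valérie",
        "Lafosse, Yvette",
        "Sowinski, Claire",
        "Wysoczynski, Coralie",
    ],
    "titles": [
        "GopenDoRe game: the cooperative game for acquiring good "
        "practices in managing and sharing research data"
    ],
    "publisher": "Zenodo",
    "publication_year": 2019,
    "resource_type": "",  # deliberately left empty to trigger the check
}

print(missing_properties(record))  # prints ['resource_type']
```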

Does the data repository have a long-term preservation plan for archived data?

  • The preservation plan provides information on the actions put in place by the repository to ensure its long-term management and funding, as well as the preservation of data. "The more the repository moves towards preservation, the more solid its financing, resource management and governance must be" [7]
  • The COAR framework of best practices for repositories proposes 4 characteristics (2 essential, 2 recommended) for the preservation objective [1]:
    • "The repository (or the organization that manages the repository) has a long-term plan for repository management and financing
    • The repository has a policy defining the length of time resources will be managed, and provides documentation on preservation practices
    • The repository has a documented approach to preservation, adopting recognized preservation practices
    • The agreement between the depositor and the repository provides for all actions necessary to ensure preservation responsibilities - for example, rights to copy, transform and store resources"

Is the data repository adapted to the type of data you are going to deposit?

  • Does the repository accept all file formats?
  • Is there a size limit?
  • Are all data formats accepted?
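
These questions can be checked before depositing with a short script. The Python sketch below compares a planned deposit with a repository's constraints. The class `RepositoryLimits`, the function `deposit_problems`, the 50 GB limit and the file names are all hypothetical; the real values must be taken from the documentation of the repository you are considering.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional


@dataclass(frozen=True)
class RepositoryLimits:
    """Hypothetical constraints of a repository (not real values)."""
    max_total_bytes: int
    accepted_extensions: Optional[frozenset] = None  # None means any format


def deposit_problems(files: dict, limits: RepositoryLimits) -> list:
    """Return readable problems for a deposit given as {filename: size in bytes}."""
    problems = []
    total = sum(files.values())
    if total > limits.max_total_bytes:
        problems.append(
            f"total size {total} B exceeds limit {limits.max_total_bytes} B"
        )
    if limits.accepted_extensions is not None:
        for name in files:
            extension = PurePath(name).suffix.lower()
            if extension not in limits.accepted_extensions:
                problems.append(f"format not accepted: {name}")
    return problems


# Illustrative values only: a 50 GB limit and two accepted formats.
limits = RepositoryLimits(
    max_total_bytes=50 * 10**9,
    accepted_extensions=frozenset({".csv", ".txt"}),
)
planned = {"survey.csv": 2 * 10**6, "scan.tiff": 60 * 10**9}

for problem in deposit_problems(planned, limits):
    print(problem)
```

An empty list means the deposit fits the stated constraints; it says nothing about whether the chosen formats are suitable for long-term preservation.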

Does it meet the recommendations of your project sponsor, your institution, or the journal in which you publish?

  • SNSF: research data should be freely accessible
    • Requirement to include a data management plan (DMP) when submitting an application for most funding instruments
    • Open-access archiving of data produced during research in an open data repository is mandatory, as long as no legal, ethical, copyright or other clauses stand in the way.
    • The SNSF provides funding of up to CHF 10,000 for the preparation and validation of data. Downloading is not taken into account.
  • Other European countries

Is it recognized in your discipline and by the scientific community?

  • Some repositories, still few in number, hold CoreTrustSeal certification, which attests that a data infrastructure is sustainable and trustworthy.
  • The SNSF offers a list of data repositories that meet its ORD requirements and are frequently used by the community
    • Generalist: Zenodo, SWISSUbase, OLOS, GitHub...
    • Disciplinary: DaSCH, ENA, PlaTec...
  • The list of repositories recommended by Nature

Is it free? If not, is the data deposit fee reasonable and acceptable?

Does it offer data access options adapted to your needs?

When depositing your data in a data repository, you can choose between different access conditions [8]:

  • Closed data: a general description of the data is published, but the data itself cannot be accessed.
    • E.g., the dataset contains non-anonymized or pseudonymized sensitive data.
  • Data on request: a record of the data is published, together with information on how to request access. The request is generally sent to the researchers and approved by them.
    • E.g., the dataset contains data with a high re-identification risk and can only be shared under certain conditions.
  • Data under embargo: a record of the data is published, but the data cannot be accessed until the end of a specified period, after which it becomes openly accessible. The associated metadata must remain accessible to indicate the existence of the data while preserving its protection. Some data repositories do not offer the "embargo" option.
    • E.g., the researcher wants to file a patent before publishing the data.
  • Open data: the data is freely available and can be accessed by anyone.

These options only concern data, and not metadata: in all cases, metadata remains publicly accessible.
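
The four access conditions can be modelled as a small Python sketch. The names `Access` and `data_is_open` are illustrative assumptions, not the vocabulary of a particular repository, and the sketch assumes that an embargoed dataset becomes open on its end date. Only the data files are concerned; the metadata record is public in every case.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    """Access conditions for the data files (metadata is always public)."""
    CLOSED = "closed"
    ON_REQUEST = "on request"
    EMBARGO = "under embargo"
    OPEN = "open"


def data_is_open(access: Access, today: date,
                 embargo_end: Optional[date] = None) -> bool:
    """Return True if anyone may access the data files on `today`."""
    if access is Access.OPEN:
        return True
    if access is Access.EMBARGO:
        if embargo_end is None:
            raise ValueError("an embargoed dataset needs an end date")
        # Assumption of this sketch: the data opens on the end date itself.
        return today >= embargo_end
    # Closed data and data on request are never openly accessible.
    return False


# Hypothetical embargo ending on 30 June 2025.
print(data_is_open(Access.EMBARGO, date(2025, 1, 1), date(2025, 6, 30)))  # prints False
print(data_is_open(Access.EMBARGO, date(2025, 7, 1), date(2025, 6, 30)))  # prints True
```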

Where are the repository servers located?

Check where the DR's servers are located. It is generally advisable to avoid depositing data on a server that is operated by an American provider or located in the USA [11].

Tools & tutorials

Directories for choosing the most suitable data repository

References

  1. Confederation of Open Access Repositories. (2020, November 6). Cadre commun COAR de bonnes pratiques en matière d’entrepôts. Zenodo. https://doi.org/10.5281/zenodo.4118380
  2. Corti, L., Van den Eynden, V., Bishop, L., & Woollard, M. (2020). Managing and sharing research data: A guide to good practice (Second edition). SAGE.
  3. Dedieu, L., & Barale, M. (2020). Déposer des données dans un entrepôt, en 6 points. CIRAD. https://doi.org/10.18167/coopist/0070
  4. Deboin, M. C. (2017). Identifier et rechercher une publication ou un jeu de données par son DOI. CIRAD. https://doi.org/10.18167/COOPIST/0005
  5. Fily, M.-F. (2015). Connaitre et utiliser les licences Creative Commons. CIRAD. https://doi.org/10.18167/XTNV-D457
  6. Fonds national suisse. (2022). Open Research Data. FNS. https://www.snf.ch/fr/dMILj9t4LNk8NwyR/dossier/open-research-data
  7. Guirlet, M. (2020). Guide décisionnel et vade-mecum pour la mise à disposition d’un dépôt de données de recherche ouvertes en Suisse [Haute école de gestion. Travail de Master]. https://doc.rero.ch/record/329696/files/Guirlet-M_moire-Vdef.pdf
  8. Jouhar, M., Melly, P., Trombert, A., Elmers, J., & Beaulieu, M.-C. (2023, June 4). Mission RDM: the ultimate quest before the holidays: complementary guide. In Mission RDM: a card-based educational escape game on research data management. Zenodo. https://doi.org/10.5281/zenodo.8002770
  9. Scientific Data. (2022a). Data Policies. Nature. https://www.nature.com/sdata/policies/data-policies
  10. Scientific Data. (2022b). Data Repository Guidance. Nature. https://www.nature.com/sdata/policies/repositories
  11. Préposé fédéral à la protection des données et à la transparence (2022). European-U.S. Data Privacy Framework (EU-U.S. DPF). https://www.edoeb.admin.ch/edoeb/fr/home/kurzmeldungen/2022/20221007_eu_us_dpf.html