LibGuides: Research data management: Research data

Research data in US repositories or servers: potential data loss

On 20.05.2025, the HES-SO Research and Innovation department warned that research data archived in data repositories funded by US institutions (e.g. OSF), or hosted on servers located in the USA, may be deleted. If your data are at risk, we recommend that you make a backup so that you keep access to them if they are deleted, and that you look for an alternative data repository.

Please contact the research support network for more information or for help identifying an alternative data repository.

Definition

Research data is the "raw material" from which scientific research produces and justifies its results. Whatever the discipline, research results can only be considered scientifically sound if they are based on the analysis of primary or secondary data.

The Organisation for Economic Co-operation and Development (OECD) defines research data as "factual records (numbers, text, images and sound), which are used as primary sources for scientific research and are generally recognized by the scientific community as necessary to validate research results" ⁵.

Research data is what makes scientific knowledge possible: it is the basis on which evidence is established.

Research data	Not considered research data
Database	Preliminary analysis
Code, algorithm	Publication appendix (e.g. graphic or image)
Sketches (drawings, sketch notes)	Personal communications with colleagues
Data mining (results)	Peer reviews
Text document	Project administration file
Audio recording	Future work programs
Video recording	Drafts of scientific documents
Geospatial data	Text from publications
Measurements	
Physical object (e.g. artwork, prototype, textile)	
Photography	
Protocol	
Questionnaire	
Report	
Survey	
Statistics	
Transcription	

Type of data

The CNRS's Institut de l'information scientifique et technique (INIST) defines five types of data⁴:

Type of data	Definition	Examples
Observational data	Data gathered in real time, usually unique and therefore impossible to reproduce	Questionnaires, interviews, neuroimaging, photography
Experimental data	Data obtained from laboratory equipment, often reproducible but sometimes costly	Chromatograms, DNA chips, trials
Computational or simulation data	Data generated by computer or simulation models, often reproducible if the model is properly documented	Meteorological data, earthquake simulation data
Derived or compiled data	Data derived from the processing or combination of "raw" data, often reproducible but costly	Text mining, MRI imaging, compiled databases
Reference data	Collection or accumulation of small datasets that have been peer-reviewed, annotated and made available	Gene databases, old image databases, archive collections

It is important to bear in mind that data is never "given" but rather "obtained"³ through processes involving humans and/or machines. For example, temperature may be data for meteorologists, but what they actually process are factual records obtained through a transmission chain of sensors, transmitters and receivers. Likewise, an ancient text may be data for a historian, but the historian first had to discover that it existed, obtain authorization to access it, and then scan, translate and reconstruct it. In the same way, physicists and psychologists obtain data from experiments, sociologists from questionnaires and interviews, geographers from photographs and maps, archaeologists from excavations and from the dating and classification of samples, and epidemiologists from laboratory analyses (e.g. test results).

Data is therefore information that has been produced by a methodological process involving human and non-human agents. Every scientific discipline benefits from reflecting on its own modes of data production, so as not to confuse data with the reality it seeks to capture.

Data characteristics

Primary or secondary	Data produced by a research protocol itself is called primary data. However, not all research produces its own dataset before carrying out its analysis. Data may instead be supplied to research teams by other teams who have shared it in data repositories, or by third-party organizations that build databases (e.g. national observatories). Data that research teams exploit and analyze without having produced it themselves is called secondary (or second-hand) data.
Formatted and grouped	Research data needs to be processed so that it can be read, understood, contextualized and linked together. Once formatted and grouped together in the same place, it forms a corpus or dataset. Only once it has been assembled can it be analyzed, since demonstrating scientific correlations and/or evidence relies on searching for and analyzing recurring patterns.
Sensitive	Some personal data may be qualified as sensitive and require special precautions to ensure that its use does not harm individuals (see "Personal data" in the glossary). This includes health data.
Integrated	Because it is obtained by implementing a research protocol, data is a scientific product in its own right, with both a value (scientific, historical and also commercial) and a certain level of confidentiality (sensitive information, intellectual property). It must therefore be stored securely, and its sharing must be regulated to prevent it from being misappropriated for political or commercial purposes.
FAIR	In the context of Open Science policies, data must be FAIR: Findable, Accessible, Interoperable and Reusable. The terms "FAIR data" and "data FAIRization" now refer to all the operations involved in formatting, recording and sharing data in compliance with Open Data policies. More information on the FAIR principles
Quantitative or qualitative	Depending on the discipline, data can be quantitative (coded data in large quantities) or qualitative (observational data, speech and texts requiring interpretation). Although the two types can be used in a complementary or interconnected way (grounded theory), they belong to two distinct methodologies. Quantitative data follows a "hypothetico-deductive" method: the research hypothesis precedes the production of data, and the aim of data analysis is to confirm or refute that working hypothesis. This type of data is mainly used in the experimental sciences. Qualitative data follows an "empirico-inductive" method: the hypothesis is developed and refined while the data is being produced and interpreted. This type of data is mainly used in the humanities and social sciences.

References

Delamadeleine, C. (2023). Guide rapide de la gestion des données de recherche (p. e0230416). HES-SO. https://www.hes-so.ch/fileadmin/documents/HES-SO/Documents_HES-SO/pdf/open-science/liens-utiles/Brochure_Guide_Rapide_V20240214.pdf
GO-FAIR (2022). FAIR principles. https://www.go-fair.org/fair-principles/
Fournier, T. (2014). Les données de la recherche : définition et enjeux. Arabesques, 73, 4-6. https://dx.doi.org/10.35562/arabesques.985
Institut de l'information scientifique et technique. (2014). Une introduction à la gestion et au partage des données de recherche. https://www.inist.fr/wp-content/uploads/donnees/co/Donnees_recherche_web.html
Latour, B. (1996). Petites leçons de sociologie des sciences. La Découverte.
Organisation de Coopération et de Développement Économiques (2021). Recommandation du Conseil concernant l'accès aux données de la recherche financée sur fonds publics. https://legalinstruments.oecd.org/fr/instruments/OECD-LEGAL-0347
IRD Data (2022). Les données de la recherche. https://data.ird.fr/gerer/quelles-donnees/#Les_types_de_donnees

Data types: guideline
Guidelines Working Group of the HES-SO Open Science Community
