Data managers of the HES-SO Valais-Wallis
Guidelines working group of the Open Science HES-SO Community
Constance Delamadeleine, Open Research Data Project Manager HES-SO
Research data is the "raw material" from which scientific research produces and justifies its results. To be considered scientifically sound, research results in any discipline must be based on the analysis of primary or secondary data.
Research data is what makes scientific knowledge possible: it is the basis on which evidence is established.
Research data | Not considered research data | |
---|---|---|
|
|
|
CNRS's INIST defines five types of data [4]:
Type of data | Definition | Examples |
---|---|---|
Observational data | Data gathered in real time, usually unique and therefore impossible to reproduce | Questionnaires, interviews, neuroimaging, photography |
Experimental data | Data obtained from laboratory equipment, often reproducible but sometimes costly | Chromatograms, DNA chips, trials |
Computational or simulation data | Data generated by computer or simulation models, often reproducible if the model is properly documented | Meteorological data, earthquake simulation data |
Derived or compiled data | Data derived from the processing or combination of "raw" data, often reproducible but costly | Text mining, MRI imaging, compiled databases |
Reference data | Collection or accumulation of small datasets that have been peer-reviewed, annotated and made available | Gene databases, old image databases, archive collections |
It is important to bear in mind that data is never "given" but rather "obtained" [3] through processes involving humans and/or machines. Temperature may be data for meteorologists, but what they actually process are factual records produced by a transmission chain of sensors, transmitters and receivers. In the same way, an ancient text may be data for a historian, but the historian first had to discover its existence through research, obtain authorization to access it, scan it, translate it, reconstitute it, and so on. Likewise, physicists and psychologists obtain data from experiments, sociologists from questionnaires and interviews, geographers from photographs and maps, archaeologists from excavations and from the dating and classification of samples, and epidemiologists from laboratory analyses (e.g. test results).
Data is therefore information that has been produced by a methodological process involving human and non-human agents. Every scientific discipline benefits from reflecting on its own modes of data production, so as not to confuse data with the reality it seeks to capture.
Primary or secondary | When a research protocol produces its own data, that data is referred to as primary data. However, not all scientific research produces its own dataset before carrying out its analysis. Research data may instead be produced and supplied by other teams who have shared their data in data repositories, or by third-party organizations responsible for building databases (e.g. national observatories). Data is referred to as secondary (or second-hand) when research teams exploit and analyze data they have not produced themselves. |
---|---|
Formatted and grouped | Research data needs to be processed so that it can be read, understood, contextualized and linked together. Once formatted and grouped together in the same space, it forms a corpus or dataset. Only once it has been assembled can it be analyzed, since the demonstration of scientific correlation and/or evidence is based on the search for and analysis of recurring patterns. |
Sensitive | Some personal data, including health data, may be qualified as sensitive and require special precautions to ensure that its use does not harm individuals (see "Personal data" in the glossary). |
Integrated | Obtained as part of the implementation of a research protocol, data is in itself a scientific product that has both a value (scientific, historical, but also commercial) and a certain level of confidentiality (sensitive information, intellectual property). It is therefore imperative that data is stored securely and that its sharing is regulated, so that it is not misappropriated for political or commercial purposes. |
FAIR | In the context of Open Science policies, data must be FAIR, i.e. Findable, Accessible, Interoperable and Reusable. The operations involved in formatting, recording and sharing data in compliance with Open Data policies are now referred to as "FAIR data" or "data FAIRization". More information on the FAIR principles |
Quantitative or qualitative | Depending on the discipline, data can be quantitative (coded data in large quantities) or qualitative (observational data, speeches and texts requiring interpretation). Although these two types of data can be used in a complementary or interconnected way (grounded theory), they fall under two distinct methodologies: |
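
To make the FAIR entry in the table above more concrete, here is a rough sketch in Python of a checklist that flags which of the four FAIR aspects a dataset description leaves empty. The `DatasetRecord` class, its field names and the `missing_fair_fields` helper are hypothetical and chosen only for illustration: they do not follow any official metadata schema, and the actual FAIR principles are considerably more detailed than this one-field-per-letter simplification.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical, minimal dataset description (not a standard schema)."""
    title: str
    identifier: str = ""   # persistent identifier, e.g. a DOI (Findable)
    access_url: str = ""   # where the data or its metadata can be retrieved (Accessible)
    file_format: str = ""  # open, documented format, e.g. CSV (Interoperable)
    license: str = ""      # explicit reuse terms, e.g. CC-BY-4.0 (Reusable)


def missing_fair_fields(record: DatasetRecord) -> list[str]:
    """Return the FAIR aspects whose illustrative field is empty."""
    checks = {
        "Findable": record.identifier,
        "Accessible": record.access_url,
        "Interoperable": record.file_format,
        "Reusable": record.license,
    }
    return [aspect for aspect, value in checks.items() if not value.strip()]


# Example values are made up for illustration only.
record = DatasetRecord(
    title="Interview transcripts",
    identifier="doi:10.0000/example",
    file_format="CSV",
)
print(missing_fair_fields(record))  # ['Accessible', 'Reusable']
```

A check of this kind only confirms that a field has been filled in. It says nothing about whether the identifier resolves, whether the format is genuinely open, or whether the license suits the data, all of which still require human review.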