The HES-SO recommends that all personal and sensitive data be anonymized as early as possible in the research process and remain so for the duration of the project. At the very latest, anonymization must take place before the research data is put to use (valorized) and deposited in a database.
If anonymization is not possible, then pseudonymization is recommended. For projects submitted to cantonal human research ethics commissions, stricter or more specific conditions may be imposed and must be respected.
Anonymization and pseudonymization of research data are operations that modify data sets by deleting or transforming personal data, so that the people who are the subject of the research cannot be identified. Both operations aim to protect sensitive data concerning people's identity, health, cultural practices, political or religious opinions, or social affiliations. They fall within the scope of personal data protection laws (FADP/GDPR).
The same processes are sometimes referred to under different names or terms:
Anonymization is an irreversible operation whose consequences (an unavoidable loss of information) must be weighed beforehand. Because it no longer identifies anyone, anonymized data is not subject to data protection laws (FADP/GDPR). It can be shared and reused without restriction and stored for an unlimited period of time, provided that those responsible for data processing preserve the anonymous nature of the data over time.
To anonymize a data set, you can, for example, proceed as follows:
The aim in both cases is to blur the socio-geographical data, making it impossible to re-identify individuals by cross-referencing.
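As an illustration of blurring socio-geographical data, the following minimal Python sketch drops a direct identifier and generalizes two indirect ones (exact age to an age band, city to canton). It is not a procedure prescribed by the HES-SO: the field names (`name`, `age`, `city`, `answer`), the lookup table `CANTON_BY_CITY` and the helper functions are hypothetical and chosen only for this example.

```python
# Hypothetical lookup table: maps a precise location to a broader one.
CANTON_BY_CITY = {
    "Lausanne": "Vaud",
    "Yverdon-les-Bains": "Vaud",
    "Sion": "Valais",
}


def age_band(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def anonymize(record):
    """Return a new record without direct identifiers and with
    generalized indirect identifiers (assumed field names)."""
    return {
        # "name" is deliberately not copied: direct identifiers are deleted.
        "age_band": age_band(record["age"]),
        "region": CANTON_BY_CITY.get(record["city"], "other"),
        "answer": record["answer"],
    }


record = {"name": "A. Example", "age": 34, "city": "Lausanne", "answer": "yes"}
print(anonymize(record))
```

Generalization of this kind does not guarantee anonymity on its own. If a combination of values (for example one age band within one region) applies to only a few people, those people may still be re-identifiable, so the result should be checked before the data set is treated as anonymous.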
Pseudonymization consists of replacing directly identifying data (surname, first names, etc.) in a data set with indirectly identifying data (an alphanumeric code, for example). It is a reversible operation, since the information removed from the data set is kept in a separate document (the correspondence table) that can be consulted to re-identify the data. Because re-identification of survey participants remains possible, pseudonymized (or coded) data sets are considered personal data and are therefore subject to data protection laws (FADP/GDPR), particularly with regard to the retention period and the possibility for data subjects to exercise their rights. The sharing and re-use of pseudonymized data sets are subject to authorization, in particular access to the file containing the re-identifying data. Data controllers must maintain regulated access to the re-identification file (correspondence table) over time.
In order to pseudonymize a data set, a number of coding operations are performed:
In all three cases, the aim is to blur the socio-geographical data so that individuals cannot be re-identified without access to the correspondence tables. It is therefore essential that files containing re-identification information be accessible only to authorized persons, under strict and pre-specified conditions.
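The principle of coding with a separate correspondence table can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the HES-SO's own procedure: the function `pseudonymize`, the field names `surname` and `first_name`, and the `P-XXXXXXXX` code format are all hypothetical.

```python
import secrets


def new_code():
    """Generate a random alphanumeric code (assumed format: P-XXXXXXXX)."""
    return "P-" + secrets.token_hex(4).upper()


def pseudonymize(records, direct_fields=("surname", "first_name")):
    """Replace direct identifiers with a code.

    Returns (coded_records, correspondence_table). The table maps each
    code back to the removed identifiers and must be stored separately.
    """
    table = {}      # code -> removed direct identifiers
    code_for = {}   # identity -> code, so one person always gets one code
    coded = []
    for record in records:
        identity = tuple(record[field] for field in direct_fields)
        if identity not in code_for:
            code = new_code()
            while code in table:  # regenerate in the unlikely case of a clash
                code = new_code()
            code_for[identity] = code
            table[code] = dict(zip(direct_fields, identity))
        remaining = {k: v for k, v in record.items() if k not in direct_fields}
        coded.append({"code": code_for[identity], **remaining})
    return coded, table


interviews = [
    {"surname": "Example", "first_name": "Anna", "answer": "yes"},
    {"surname": "Example", "first_name": "Anna", "answer": "no"},
    {"surname": "Sample", "first_name": "Marc", "answer": "yes"},
]
coded, table = pseudonymize(interviews)
```

In this sketch, `coded` is the data set that would be analyzed and possibly shared under authorization, while `table` plays the role of the correspondence table. In practice the table would be written to a separate file with restricted access, never stored alongside the coded data.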
For a list of direct identifiers (from which a person can be immediately identified) and indirect identifiers (which can compromise data confidentiality if linked to other data sources), see the guideline published by Réseau Portage (note 7).
Original text
"Last year I followed a couple from Afghanistan. The husband had Hodgkins cancer, which was tough. They already had two children and she was pregnant with twins. She bled during the pregnancy and had to be hospitalized. He was in the middle of treatment. Weakened but still at the refugee center. He couldn't really look after the children in his wife's absence. He slept a lot. The children were 3 years and 18 months old. She was under a lot of stress, because he was texting her and saying that things weren't going well. He was also feeling sick from the chemo. One day, we found the children in the Ikea shopping center opposite the refugee center ... they had crossed the main road on their own. The social service ..." (dummy example - situations of similar complexity in the data)
Pseudonymized text
"Recently I followed a couple from the Middle East. The husband had a chronic illness, it was hard. They already had two children and she was pregnant with twins. She had complications during the pregnancy and had to be hospitalized. He was ... weakened but still at the refugee center. He couldn't really look after the children in his wife's absence. He slept a lot. The children were under 4. She was under a lot of stress, because he was texting her and saying that things weren't going well .... .... One day, the children ran away and were found after a few hours ...they had crossed a main road on their own. The social service ...".
Source: Perrenoud (2021)