
GDPR and Drupal, my loves – Questioning Data Anonymization

Published on 28 December 2021
The GDPR has significantly changed our relationship with personal data, and that's a good thing. In our daily work, how do we tackle the tricky issue of anonymizing user data in our development and testing environments? Here is the answer in a few lines, with Drupal and Drush.

Data anonymization, a key GDPR constraint

During development, redesign, or maintenance projects for Drupal sites, we regularly set up multiple hosting environments: local development environments with Lando, and staging and production environments on our dedicated infrastructure.

To meet the requirements of the GDPR, we must anonymize the personal data stored in Drupal databases (email addresses, postal addresses, names, and other user information) whenever it is copied to these environments.

Available functions, modules, and use cases

The Drush sql-sanitize command

Drush provides a sql-sanitize command (named sql:sanitize in Drush 9 and later) that runs sanitization operations on a site's database. It is used as follows:

drush sql-sanitize

or, with its short alias:

drush sqlsan

The documentation is available here: https://drushcommands.com/drush-9x/sql/sql:sanitize/
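To make "cleaning operations" concrete, here is a minimal Python sketch of the kind of in-place sanitization such a command performs, run against an in-memory SQLite database. The two tables are hypothetical, heavily cut-down stand-ins for Drupal's users_field_data and sessions tables, and this is not Drush's implementation: the email pattern is modeled on the one Drush documents for its --sanitize-email option, and the sketch writes a plain placeholder where a real site would store a password hash.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for two Drupal core tables.
SCHEMA = """
CREATE TABLE users_field_data (uid INTEGER PRIMARY KEY, name TEXT, mail TEXT, pass TEXT);
CREATE TABLE sessions (uid INTEGER, sid TEXT, session TEXT);
"""


def sanitize(conn: sqlite3.Connection,
             email_pattern: str = "user+%uid@localhost.localdomain",
             password_placeholder: str = "sanitized") -> None:
    """Overwrite emails and passwords in place, then empty the sessions table."""
    # Split the pattern around the %uid token so each account gets a
    # unique but meaningless address such as user+2@localhost.localdomain.
    prefix, _, suffix = email_pattern.partition("%uid")
    # uid 0 is Drupal's anonymous user and holds no real data, so skip it.
    # Assumption: a placeholder string instead of a real password hash.
    conn.execute(
        "UPDATE users_field_data SET mail = ? || uid || ?, pass = ? WHERE uid > 0",
        (prefix, suffix, password_placeholder),
    )
    # Session rows can reveal who was logged in and from where: drop them all.
    conn.execute("DELETE FROM sessions")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO users_field_data VALUES (?, ?, ?, ?)",
        [(0, "", None, None), (2, "bob", "bob@example.org", "hash")],
    )
    conn.execute("INSERT INTO sessions VALUES (2, 'abc', 'payload')")
    sanitize(conn)
    print(conn.execute("SELECT uid, mail FROM users_field_data ORDER BY uid").fetchall())
```

Note that the sanitization happens inside the target database, after the import: this is exactly the limitation discussed below.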

Drupal Modules

The User Sanitize module, based on this command, adds a user interface and a configuration page. It is not compatible with Drupal 9, but porting it should be easy because it is a very short module.
The module's maintainer is looking for someone to take it over. Update work started but stalled three years ago: https://www.drupal.org/project/user_sanitize/issues/3002688

Neither method solves the problem of non-anonymized copies of the database lying around on your local environments or servers. The databases are only anonymized after being imported into the development environment, so between the export from production and the import, the data is not anonymized. In other words, these solutions are mainly useful for sites whose databases already exist on the target environments.

GDPR Module

The Mask User Data module offered nearly the same features as sql-sanitize, except that it relied on the PHP Faker library (https://github.com/fzaninotto/Faker) and added a Drush command, drush mud. It is now obsolete and has been superseded by the popular "General Data Protection Regulation" module. This module, still in alpha, provides a Drush command whose main purpose is to keep developers from accessing sensitive personal data by obfuscating the configured fields of SQL dumps.

drush gdpr-sql-dump
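The idea of "obfuscating the configured fields" while dumping can be sketched as follows. This is a minimal illustration under stated assumptions, not the GDPR module's code: the OBFUSCATION_CONFIG mapping is hypothetical, and instead of Faker the sketch swaps in a keyed hash (HMAC) to derive stable fake values, so the same original value always maps to the same pseudonym within one dump.

```python
import hashlib
import hmac
import secrets
import sqlite3

# Hypothetical configuration: which columns to obfuscate, and as what kind
# of value. The GDPR module has its own configuration format; this is not it.
OBFUSCATION_CONFIG = {
    "users_field_data": {"name": "username", "mail": "email"},
}


def pseudonym(value: str, kind: str, key: bytes) -> str:
    """Derive a stable fake value from the original using a keyed hash."""
    # Keying the hash stops anyone without the key from confirming a guess
    # by hashing candidate emails themselves.
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    if kind == "email":
        return f"user-{digest}@example.invalid"
    return f"user-{digest}"


def obfuscated_rows(conn: sqlite3.Connection, table: str, config: dict, key: bytes):
    """Yield each row of `table` as a dict, with configured columns replaced."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')  # table name is trusted config
    columns = [d[0] for d in cursor.description]
    rules = config.get(table, {})
    for row in cursor:
        record = dict(zip(columns, row))
        for column, kind in rules.items():
            if record.get(column) is not None:
                record[column] = pseudonym(str(record[column]), kind, key)
        yield record


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users_field_data (uid INTEGER, name TEXT, mail TEXT)")
    conn.execute("INSERT INTO users_field_data VALUES (1, 'jane', 'jane@example.com')")
    dump_key = secrets.token_bytes(32)  # one random key per dump, then discarded
    for record in obfuscated_rows(conn, "users_field_data", OBFUSCATION_CONFIG, dump_key):
        print(record)
```

Discarding the key after the dump matters: as long as someone keeps it, the output is only pseudonymized, not anonymized.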

The GDPR Dump Project

A GDPR Dump project has been launched on GitHub to replace the mysqldump command with an alternative that exports an already anonymized database: https://github.com/machbarmacher/gdpr-dump

An extension for Drupal databases is also available. Example of use:

mysqldump drupal --host=xxxx --user=drupaluser --password=xxxxxxxx users_field_data --gdpr-expressions='{"users_field_data":{"name":"uid","mail":"uid","pass":"\"\""}}' --debug-sql
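The --gdpr-expressions option maps each column to an SQL expression: in the example above, name and mail are replaced by the uid and pass by an empty string. Because the expressions are evaluated in the SELECT, the real values never leave the database server. Here is a minimal Python/SQLite sketch of that idea, not the tool's own code; the only deliberate change is writing the empty string as '' (standard SQL), since SQLite treats double quotes as identifier delimiters whereas MySQL in its default SQL mode accepts them as string delimiters.

```python
import json
import sqlite3

# Same expression map as the example command, with '' instead of "".
GDPR_EXPRESSIONS = json.loads(
    """{"users_field_data": {"name": "uid", "mail": "uid", "pass": "''"}}"""
)


def sql_literal(value) -> str:
    """Render a Python value as a SQL literal for the INSERT statements."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_select(table: str, columns: list, expressions: dict) -> str:
    """Select each column, or its configured expression aliased to its name."""
    # Expressions are raw SQL taken from trusted configuration.
    parts = [
        f'{expressions[col]} AS "{col}"' if col in expressions else f'"{col}"'
        for col in columns
    ]
    return f'SELECT {", ".join(parts)} FROM "{table}"'


def dump_table(conn: sqlite3.Connection, table: str, expressions: dict):
    """Yield INSERT statements whose values were already rewritten by the database."""
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    query = build_select(table, columns, expressions.get(table, {}))
    column_list = ", ".join(f'"{c}"' for c in columns)
    for row in conn.execute(query):
        values = ", ".join(sql_literal(v) for v in row)
        yield f'INSERT INTO "{table}" ({column_list}) VALUES ({values});'


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users_field_data (uid INTEGER, name TEXT, mail TEXT, pass TEXT)")
    conn.execute("INSERT INTO users_field_data VALUES (7, 'jane', 'jane@example.com', 'hash')")
    for statement in dump_table(conn, "users_field_data", GDPR_EXPRESSIONS):
        print(statement)
```

With the sample row, the dump line reads INSERT INTO "users_field_data" ("uid", "name", "mail", "pass") VALUES (7, 7, 7, '');, so the dump file itself never contains the original name, email, or password hash.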

Likewise, an issue has been opened on the Drupal Backup & Migrate project to anonymize data during backups: https://www.drupal.org/project/backup_migrate/issues/2975065

This led to a sandbox project, Backup And Migrate Sanitizer (GDPR).

Conclusion

  • We have standardized the use of the Drush sql-sanitize command on all our development and pre-production environments. By default, it targets the user table (users_field_data), user fields (user__* tables), sessions, and comments (comment_field_data). The data is replaced with random values according to data type.
  • If needed, we are considering porting the User Sanitize module to Drupal 9.
  • We have standardized the use of a SQL dump command with data anonymization options on newly imported projects.
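"Random values according to data type" can be illustrated with a small Python sketch. The field-type names are hypothetical, loosely modeled on common Drupal field types, and the code is not Drush's implementation; it only shows the principle of generating a replacement of the right shape, unrelated to the original value, and refusing to pass through any field whose type it does not know.

```python
import random
import string


def random_value(field_type: str, rng: random.Random):
    """Return a random replacement whose type matches the field's data type."""
    if field_type == "email":
        local = "".join(rng.choices(string.ascii_lowercase, k=8))
        return f"{local}@example.invalid"
    if field_type == "telephone":
        return "".join(rng.choices(string.digits, k=10))
    if field_type == "integer":
        return rng.randint(0, 10_000)
    if field_type == "boolean":
        return rng.choice([0, 1])
    if field_type in ("string", "string_long", "text"):
        return "".join(rng.choices(string.ascii_letters, k=12))
    # Refuse to guess: leaving an unknown field untouched could leak real data.
    raise ValueError(f"no sanitizer for field type {field_type!r}")


def sanitize_record(record: dict, field_types: dict, rng: random.Random) -> dict:
    """Copy `record`, replacing every field listed in `field_types`."""
    clean = dict(record)
    for field, field_type in field_types.items():
        if field in clean:
            clean[field] = random_value(field_type, rng)
    return clean


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so the demo is repeatable
    # Hypothetical row from a user field table such as user__field_phone.
    row = {"entity_id": 5, "field_phone_value": "+33 6 12 34 56 78", "field_bio_value": "Jane from Lyon"}
    types = {"field_phone_value": "telephone", "field_bio_value": "text"}
    print(sanitize_record(row, types, rng))
```

Key columns such as entity_id are left untouched, so relationships between tables survive while the personal values do not.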
