Data De-Identification, Masking, and Redaction

There’s a fine balance between processing users’ PII and storing it safely. The primary ways to encrypt information are Data De-Identification, Data Masking, and Data Redaction.

But how exactly do they work? Let’s explore.

Data De-Identification

To put it plainly, data de-identification is the process of removing personally identifiable information (PII) from data. The purpose of data or PII de-identification is to modify the sensitive data in such a way that it’s of no or little value to unauthorized intruders while still being usable by software and authorized personnel.

There are plenty of ways to achieve this goal. Within the wider sphere of Data Anonymization, there are seemingly endless methods to remove PII from data sets to keep everyone involved anonymous. Maybe you’ve heard of some of these methods before; they include pseudonymization, encryption, tokenization, and more.

Irreversible PII De-Identification

For the sake of this article, let’s highlight two of the most popular methods of data or PII de-identification: masking and redaction. These two methods are irreversible, in that information from the original document is permanently obscured. On the other hand, encryption and tokenization are examples of reversible methods, where a party with the correct key can reverse the transformation and view the original document, including the PII.

Which is better: Reversible or irreversible data de-identification? That depends on your end goal. If you aim to pass documents to third parties, the “destructive” methods are safer and simpler, because they don’t rely on maintaining keys or token databases.

Contrastingly, if you need to come back to your data to perform analytics and machine learning, or for legal and archival purposes, then reversible methods such as tokenization or encryption are your only choice.

Data Masking

Data masking is a non-reversible transformation defined as the following: Masking sensitive data by partially or fully replacing characters with a symbol, such as an asterisk (*) or hash (#). See the example below:

Data Redaction

When it comes to data redaction, many of us immediately think of the spy world, with the likes of Jason Bourne or Ethan Hunt from Mission Impossible rummaging through files of documents where all of the useful information has been blacked out. And in this case, Hollywood is actually spot on.

PII redaction is the irreversible process of blacking out or removing information that is personally identifiable, sensitive, confidential, or otherwise classified – typically information coming from scanned documents, images, and PDFs. See the example image below to see what a redacted document might look like.

Source: Logikcull

Data Masking vs Data Redaction

Where the typical goal of data masking is to remove any sensitive information but maintain the same data structure so it can be used in applications, redaction is meant to completely remove certain pieces of information so the remaining text can be released (perhaps to the public, journalists, unauthorized employees, etc.) without exposing any personally identifiable information or classified data.

Examples of Proper Data Masking and Redaction

To further illustrate the differences between data de-identification, redaction, and masking, we’ve provided two comparative tables.

But first, remember that masking and redaction both fall under the data de-identification “umbrella”, as two different methods of removing personally identifying information from data.

De-identification method → Masking:

Now, let’s see how that same information would look as it’s redacted in this example customer complaint email.

De-identification method → Redaction:

Note */**: These two pieces of sensitive medical information make for great examples of PII that is much more difficult to detect in any given text. Understandably, addresses, emails, credit card numbers, etc., are easier for discovery tools to detect, but it takes highly trained software to also recognize someone describing their medical symptoms, etc.

PII Discovery & Protection

Now you’re armed with a better understanding of the data de-identification methods of Masking and Redaction, it’s important to remember there is so much more that goes into data protection than blacking out words and adding placeholders where names should be. If only it was that easy, right?

I mentioned some of these methods at the start of the article, but you can also check out our other sources How to Remediate PII? and Sensitive Data Discovery Tools to learn about the industry standards of PII discovery and protection.

And there’s no time to start like the present! Otherwise, what are you waiting for? All your personally identifiable information isn’t going to de-identify itself!

You Can’t De-Identify What You Can’t Find. Use PII Tools’ Automated Data Discovery Software to Locate and Remediate All At-Risk Data

Download our new whitepaper

PII De-Identification vs. Masking vs. Redaction