Following are some of the most popular methods for rendering stored information — especially primary account numbers (PANs) — unreadable.
Masking relates to maintaining the confidentiality of data when it’s presented to a person. The process is familiar to anyone who has used a payment card in a restaurant or shop and then checked the printed receipt; certain digits of the PAN are shown as Xs rather than the actual digits (see figure below). Per PCI DSS Requirement 3.3, PAN display should be limited to the minimum number of digits necessary to perform job functions and should not exceed the first six and last four digits.
Masking a PAN for display purposes.
Source: Thales
Truncation renders stored data unreadable by ensuring that only a subset of the complete PAN is stored. As in masking, no more than the first six and last four digits can be stored.
Truncating a PAN
Source: Thales
A hash function is a well‑defined, provably secure cryptographic process that converts an arbitrary block of data (in this case, a PAN) to a different, unique string of data. In other words, every PAN yields a different result. The one‑way hash process is irreversible (which is why it’s called one-way); it’s commonly used to ensure that data hasn’t been modified, because any changes in the original block of data would result in a different hash value.
The figure below illustrates the use of the hash function in the context of the PCI DSS. The technique provides confidentiality (it’s impossible to re‑create a PAN from a hashed version of that PAN), but like truncation, it makes using the stored data for subsequent transactions impossible.
One-way hash of a PAN
Source: Thales
You can’t retain truncated and hashed versions of the same payment card within your cardholder data environment unless you implement additional controls to ensure that the two versions can’t be correlated to reconstruct the PAN.
Tokenization is a process that replaces the original PAN with surrogate data — a token that may look like a legitimate PAN but has no value to an attacker. In most implementations, the process is reversible; tokens can be converted back to the original PANs on request. Tokenization is used when stored PANs need to be accessible for subsequent transactions.
You can create tokens in a variety of ways. Following are two common approaches:
The degree to which the tokenization process is deterministic can be important in certain scenarios. Everything depends on how the tokens are being used. In some cases, it’s desirable to preserve not just the format of the PAN during the tokenization process, but also certain digits of the PAN (see the figure below).
Tokenization of a PAN (last four digits preserved)
Source: Thales
In some ways, the goals of encryption are similar to those of tokenization, in that PAN data is replaced by data that has no intrinsic value to an attacker. Encryption uses standardized cryptographic algorithms and keys to derive the encrypted PAN from the original data. The algorithms are widely known, so the security of the process hinges on the strength and handling of the cryptographic keys, which is why hardware security modules are widely involved.
The encryption process generally changes the format of the data. Typically, data size increases, when that data is encrypted. For the same reason that tokenization attempts to preserve the format of the original PAN data — to minimize changes in existing systems that come into contact with the data — organizations often employ format‑preserving encryption (FPE; see the figure below).
Encryption of a PAN (with FPE and preservation of last four digits).
Source: Thales