“Compression and Information Leakage of Plaintext”, 2002 ():
Cryptosystems like AES and triple-DES are designed to encrypt a sequence of input bytes (the plaintext) into a sequence of output bytes (the ciphertext) in such a way that the output carries no information about that plaintext except its length. In recent years, concerns have been raised about “side-channel” attacks on various cryptosystems—attacks that make use of some kind of leaked information about the cryptographic operations (eg. power consumption or timing) to defeat them.
In this paper, we describe a somewhat different kind of side-channel provided by data compression algorithms, yielding information about their inputs by the size of their outputs.
The existence of some information about a compressor’s input in the size of its output is obvious; here, we discuss ways to use this apparently very small leak of information in surprisingly powerful ways.
The compression side-channel differs from side-channels described in 1996, Kelsey et al 199628ya, & et al 1999 in two important ways:
It reveals information about plaintext, rather than key material.
It is a property of the algorithm, not the implementation. That is, any implementation of the compression algorithm will be equally vulnerable.
Our results are as follows:
Commonly-used lossless compression algorithms leak information about the data being compressed, in the size of the compressor output. While this would seem like a very small information leak, it can be exploited in surprisingly powerful ways, by exploiting the ability of many compression algorithms to adapt to the statistics of their previously-processed input data.
We consider the “stateless compression side-channel”, based on the compression ratio of an unknown string without reference to the rest of the message’s contents. We also consider the much more powerful “stateful compression side-channel”, based on the compression ratio of an unknown string, given information about the rest of the message.
We describe a number of simple attacks based mainly on the stateless side-channel
We describe attacks to determine whether some string S appears often in a set of messages, using the stateful side-channel.
We describe attacks to extract a secret string S that is repeated in many compressed messages, under partial chosen plaintext assumptions, using the stateful side-channel.
We consider countermeasures that can make both the stateless and the stateful side-channels substantially harder to exploit, and which may thus block some of these attacks.
We discuss the implications of these results, in light of the widespread use of compression with encryption, and the ‘folk wisdom’ suggesting that adding compression to an encryption application will increase security.