“Estimating and Comparing Entropy across Written Natural Languages Using PPM Compression”, 2002:
Previous work on estimating the entropy of written natural language has focused primarily on English. We expand this work by considering other natural languages, including Arabic, Chinese, French, Greek, Japanese, Korean, Russian, and Spanish.
We present the results of PPM compression on machine-generated and human-generated translations of texts into various languages. Under the assumptions that languages are equally expressive and that PPM compresses them equally well, one would expect translated documents to compress to approximately the same size. We verify this empirically on a novel corpus of translated documents.
As an application of this finding, we suggest using the size of compressed natural-language texts as a means of automatically testing translation quality.
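The proposed test can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the paper uses PPM, which has no Python standard-library implementation, so LZMA stands in as a general-purpose compressor; the function names (`compressed_size`, `compression_gap`) are ours.

```python
import lzma

def compressed_size(text: str) -> int:
    """Size in bytes of the LZMA-compressed UTF-8 encoding of `text`.

    LZMA is used here as a stand-in for the PPM compressor in the paper.
    """
    return len(lzma.compress(text.encode("utf-8")))

def compression_gap(source: str, translation: str) -> float:
    """Relative difference in compressed size between two parallel texts.

    Under the assumption that languages are equally expressive and compress
    equally well, a faithful translation should compress to roughly the same
    size as its source, so a large gap flags a suspect translation.
    """
    a = compressed_size(source)
    b = compressed_size(translation)
    return abs(a - b) / max(a, b)
```

A text compared against itself yields a gap of exactly 0; in practice one would calibrate a threshold on known-good parallel corpora before flagging translations.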