Previous Next Contents

Chapter 7:   Message digest

7.1   What is a message digest?

A short answer: a message digest is a numerical summary or ``fingerprint'' of a given set of data (such as a data file or e-mail message).

A longer answer. A computer file is really just a sequence of numbers. For example, the file containing the single word ``Hello'' is really just the sequence 72 101 108 108 111. The file containing the file ``Goodbye'' is the sequence 71 111 111 100 98 121 101. Of course, most computer files are much bigger with correspondingly longer sequences.

Imagine that a file with one million characters needs to be sent from one computer to another computer thousands of miles away. How can the recipient be assured that the data received is in fact the data that was sent? One way would be to send the data twice and compare the two transmissions. This might work, but would be rather slow: it would take twice as long to send the message twice.

So, instead of sending the message again a second time we send a shorter message that gives a summary of the original message. This summary is calculated at both ends, and the receiver compares her summary with the one sent by the transmitter. If they match, this is evidence that the file was received intact.

Of course, the form of this summary will affect how reliable the summary is at detecting if the data was sent correctly.

One sort of summary is called a parity check: add up all the numbers in the file and see if this sum is odd or even. The message digest is then either 0 (if the sum is even) or 1 (if the sum is odd). As you may imagine, this form of fingerprint is not very useful when the files get long.

Thus one desirable aspect of a message digest is that different data sets which differ in a small number of places generate different message digests. This way, changes to the data are readily detected.

One system that does this well is the MD5 message digest. It is ``cryptographically secure'', that is, if one file generates a given MD5 message digest, it is hard to construct a different file with the same MD5 message digest. Thus, it is suitable for ``digital signatures'' that are hard to forge.

7.2   Why use a message digest with TeX?

The last (and often trickiest) step in setting a book is ``page layout''. In the TeX world this corresponds to optimizing figure placement and fixing bad page breaks. This is often delicate: changing one line of text can change not just the page on which the line appears, but the pages following (``text reflow'').

Thus, it would be very nice to determine, after making a small change, which pages have changed. One way to determine this is to look at a printout of each page and see which pages have changed. This, however, is not very practical except for the smallest documents.

My approach is to take each page's ``finger print'' (via a message digest) before making a change, make the change, and then take each page's fingerprint again after making the change. Then, just check which pages' fingerprints have changed.

Here is an example. Let filex.tex contain the following source:
a\eject b\eject c\eject d\eject e\eject f\end
The summary of this file is
dvii -u filex

File size: 628 bytes (1 K)
Comment string:  TeX output 1999.12.17:0937
Page count: 6
Number of fonts: 1
We can save and display the page information along with a message digest of each page if we use the -M1 option:
dvii -p -M1 filex >
[message digest: simple sum (ignore font)]
We now edit filex.tex and change the 'b' that appears on page 2 to a 'B'; we now have
a\eject B\eject c\eject d\eject e\eject f\end
We next TeX the file, save the new fingerprints, and see which pages have changed using the diff utility:
tex filex
dvii -p -M1 filex >
< p:[2/2]::9C8E26468F1B029011D2F28DA18B18CC
> p:[2/2]::9C8E26468F1AE29011D2F28DA18B18CC
Note that page 2 is the only page with a different message digest.

Previous Next Contents