The MD5 cryptographic hash function

The MD5 function is a cryptographic algorithm that takes an input of arbitrary length and produces a message digest that is 128 bits long. The digest is sometimes also called the "hash" or "fingerprint" of the input. MD5 is used in many situations where a potentially long message needs to be processed and/or compared quickly. The most common application is the creation and verification of digital signatures.

MD5 was designed by well-known cryptographer Ronald Rivest in 1991. In 2004, some serious flaws were found in MD5. The complete implications of these flaws has yet to be determined.

How MD5 works

Preparing the input

The MD5 algorithm first divides the input in blocks of 512 bits each. 64 Bits are inserted at the end of the last block. These 64 bits are used to record the length of the original input. If the last block is less than 512 bits, some extra bits are 'padded' to the end.

Next, each block is divided into 16 words of 32 bits each. These are denoted as M0 ... M15.

MD5 helper functions

The buffer

MD5 uses a buffer that is made up of four words that are each 32 bits long. These words are called A, B, C and D. They are initialized as

word A: 01 23 45 67
word B: 89 ab cd ef
word C: fe dc ba 98
word D: 76 54 32 10

The table

MD5 further uses a table K that has 64 elements. Element number i is indicated as Ki. The table is computed beforehand to speed up the computations. The elements are computed using the mathematical sin function:

Ki = abs(sin(i + 1)) * 232

Four auxiliary functions

In addition MD5 uses four auxiliary functions that each take as input three 32-bit words and produce as output one 32-bit word. They apply the logical operators and, or, not and xor to the input bits.

          F(X,Y,Z) = (X and Y) or (not(X) and Z)
          G(X,Y,Z) = (X and Z) or (Y and not(Z))
          H(X,Y,Z) = X xor Y xor Z
          I(X,Y,Z) = Y xor (X or not(Z))

Processing the blocks

The contents of the four buffers (A, B, C and D) are now mixed with the words of the input, using the four auxiliary functions (F, G, H and I). There are four rounds, each involves 16 basic operations. One operation is illustrated in the figure below.

One operation performed in a round of the MD5 function

The figure shows how the auxiliary function F is applied to the four buffers (A, B, C and D), using message word Mi and constant Ki. The item "<<<s" denotes a binary left shift by s bits.

The output

After all rounds have been performed, the buffers A, B, C and D contain the MD5 digest of the original input.

Reference implementation

A more complete discussion of MD5, and a reference implementation, can be found in RFC 1321.

See also