Message digest / cryptographic hash functions

Cryptographic hash functions take an input of arbitrary length and produces a message digest that is of a fixed, short length (e.g. 128 or 160 bits). The digest is sometimes also called the "hash" or "fingerprint" of the input. Hash functions are used in many situations where a potentially long message needs to be processed and/or compared quickly. The most common application is the creation and verification of digital signatures.

Requirements

A message digest or hash function is used to turn input of arbitrary length into an output of fixed length, which is called the digest or hash of the input. This output can then be used in place of the original input. This has many advantages. The output always has the same length, so this can be taken into account when processing or storing the message digest. Also, the output is much shorter than the input, so that processing and storing can be done much quicker.

The most common cryptographic hash function is MD5. MD5 was designed by well-known cryptographer Ronald Rivest in 1991. In 2004, some serious flaws were found in MD5. The complete implications of these flaws has yet to be determined. Another popular hash function is SHA-1.

To make hash functions work, they should have two properties:

  1. Given a particular message digest, it should be very difficult to find an input that has the same message digest.
  2. It should be very difficult to find two inputs that have the same message digest.

Reverting a digest function to recover the input

The first property prevents people from taking a particular digest and using it in connection with another message. You could publish a digest in a newspaper, which proves that you had access to the input message on that date. No one else can know the message, because of this first property.

Finding two inputs with the same digest

The second property means that if two inputs produce the same message digest, they must be the same input as well. Often the second requirement is taken even further: if a message is changed slightly, the message digest of the changed message should have changed a great deal.

Digital signatures

A popular application of message digest or hash functions is digital signatures. Computing a digital signature for a long message is very time-consuming. However, computing a digital signature for a message that is only 128 or 160 bits long can be done quickly. So, instead of digitally signing the message, the message's hash is signed.

To verify the signature, the recipient of the message computes the hash of the message he received. He compares this against the fingerprint that was signed. If they are the same, then (because of the second property above), the message he received is authentic.

Should an attacker have manipulated the message, then the fingerprint of the manipulated message will be very different from the fingerprint that has been signed by the sender. The attacker is not able to create a new fingerprint that is signed by the original sender.

Integrity verification

File transmissions over networks such as the Internet may sometimes introduce small errors. To verify whether the received file is identical to the original, the recipient computes the hash of the received file. This hash is then compared to the hash of the original file. That original hash was published on the website or FTP site where the original can be downloaded. It could also be transmitted along with the file.

This use of hash functions is comparable to the use of checksums or Cyclic Redundancy Check (CRC) functions. However CRC functions typically produce outputs of only 32 bits long. It is easy to find a different input that produces the same 32-bit output.

In this scenario, tampering with the original file cannot be detected. If that is also desirable, the creator of the original file should not just publish the hash but should digitally sign that hash first.

Message authentication codes

A Message Authentication Code (MAC) is somewhat similar to a digital signature. Sometimes a MAC is called a keyed hash function. The sender creates the MAC using the message to be authenticated and a secret key. The recipient verifies that the MAC is authentic using this same secret key. This is different from digital signatures, where a public/private key pair is used.