Crash course on cryptography: Digital signatures

Public key cryptography is not only used to protect messages. An important application is the creation and checking of so-called digital signatures. Digital signatures are coupled to the electronic document to which they apply. This coupling is established using public-key cryptography and so-called cryptographic hash functions.

The basic principle

In public key cryptography, anything Alice encrypts with Bob's public key can be decrypted by Bob with the corresponding private key. Alice can also encrypt a message with her private key, which means that Bob can decrypt it with Alice's public key. Since the public key is, as the name suggests, publicly available, this is not very good idea if Alice wants to keep that message a secret. Eve can also simply obtain a copy of Alice's public key and thus also decrypt the message.

But because Alice keeps her private key to herself, Bob knows that only Alice could have encrypted this message. Bob can now be sure that this message was written by Alice. A signature on a paper message serves as proof that this message was written by the person who signed it. Encrypting with a private key thus can be regarded as an equivalent to placing one's signature on the message. This is why this is called creating a digital signature for the message.

If Alice wants to keep the message a secret that only Bob is allowed to learn, she of course then simply encrypts the digitally signed message with Bob's public key. Bob first decrypts the message with his own private key and then decrypts the result with Alice's public key. He now knows that no one else could have read the message (because it was encrypted using his public key) and that no one but Alice could have written this message (because it was encrypted using her private key).

How digital signatures work

Digitally signing large messages takes a long time, just like encrypting large messages with someone's public key. Just like with public key encryption, placing digital signature therefore involves an extra step. First a summary of the message is computed, and then this summary is signed.

Cryptographic hash functions

The summary is generated using so-called cryptographic hash functions. A cryptographic hash function can transform input of an arbitrary length to an output of a certain number of bits, typically 128 or 160 bits. The output is called the hash value. Well-known hash functions are MD5 and SHA-1, although many more exist.

A very simple example of a hash function is to simply add up the position in the alphabet of all the characters in the message. For example, the message "ape" would give as output 22 (1 plus 16 plus 5).

Since the hash value is usually shorter than the message itself, this makes it easier and faster to compare two messages or to find a particular message in a table. For example, it is common in database management systems to compute the hash value of all the names in a database with information on people. To determine whether a particular person occurs in the database, the hash value of his name is computed and compared against the hash values of all the names. This is much faster than comparing the name itself against all the names in the database, because the hash value is a number of a fixed length. Names can be many characters long and each character has many more possibilities than just 0-9.

Hash functions should have two properties:

1. Given a particular output, it should be difficult to find a message that has that particular output (for cryptographers this means the hash function is "one-way").
2. Given two messages, the chance that they have the same hash value should be small (cryptographers refer to this as "collision-free").

If a particular hash function has these properties, it is called a cryptographic hash function. It is now possible to use the hash value of a message instead of the message itself.

The simple example given above does not have these properties. There are many messages that have the hash value 22. And furthermore, it is quite easy to find another message that also has this hash value.

Cyclic Redundancy Check functions

So-called CRC (Cyclic Redundancy Check) functions are often used to check the integrity of a message. The output of a CRC function is normally called the checksum of the message. CRC functions are designed to help detect errors in the message that occurred during transmission. If a disruption on the communication channel changed one of the bits of the message, the checksum of the resulting message will be different. A checksum thus can be seen as the hash of a message.

However, CRC functions are not as strong as cryptographic hash functions. They produce short checksums, typically 32 bits (4 characters), which means that the chance that two messages have the same checksum is quite real. Furthermore, with longer messages it is easily possible to modify the message without affecting the checksum. Sometimes it is sufficient to simply append a certain number of spaces to the end of the modified message to make sure that the result has the same checksum as the original message.

Cryptographic hash functions and digital signatures

Hash functions can be used to determine whether a message has been modified. Alice computes the hash value of the message she wants to send to Bob and sends the hash value of the message together with the message to Bob. Bob computes the hash value of the message he receives, and compares it against the hash value he received from Alice. If these two hash values are the same, Bob knows that the message was not modified. After all, the second property of the hash function says that the chance that the modified message has the same hash value as the original message is very small.

Eve can now no longer just modify the message without Bob noticing this. However, Eve can modify the message and compute the hash value of the modified message. She can then replace the hash value that Alice sent with the hash value she computed. Bob will then think that the message was not modified, because the message he received has the same hash value as the one he got from Alice. But Bob has no way to know that he did not get that hash value from Alice.

Of course this is where digital signatures come in. After computing the hash value of the message she wants to send, Alice digitally signs this hash value and sends the result (the digital signature of the message) to Bob. Bob then decrypts the digital signature using Alice's public key. He compares the result with the hash value he computed for the message he received and so determines whether the message was modified. If everything checks out, Bob knows that this message really came from Alice and it was not modified.

Because Eve does not have Alice's private key, she is no longer able to replace the hash value that Alice signed with the hash value of the modified message. And it is next to impossible for Eve to modify the message in such a way that the hash value remains the same. Because of the first property of the hash function, it is difficult for Eve to find another message that has the same hash value. And even if she manages to find one, the chance that this other message is even remotely the same as the original message from Alice is extremely small.

An important reason for using a cryptographic hash function is that the message remains in unencrypted form. Furthermore, the (digitally signed) hash value can now be transmitted and stored invisible to the user, for example as part of the headers of an e-mail message or encapsulated using the well-known MIME standard. The digital signature can also be transmitted over an entirely separate channel. Alice could publish the digital signature of a message in a newspaper. This way, she could later prove that she had a copy of this message on the date of publication of this newspaper without having to reveal the message. This can be useful for example if Alice had to prove that she wrote a particular message and did not infringe on somebody else's copyright.

Applications of digital signatures

Digital signatures offer many applications other than signing messages such as e-mail. A digital signature can be created for any kind of file. The digital signature then can be used as proof that the file was not modified after the digital signature was created. It can also be used to make the file unique, for example by appending a serial number to the file and signing the result.

Authenticating Web servers

Using public key cryptography a Web browser and server can communicate with each other securely. The browser can encrypt a session key using the public key of the server and send it to the server.

In this application the Web browser typically obtains a copy of the public key of the server by requesting a certificate containing this public key from the server. This certificate has been signed by some trusted third party. The public key of this trusted third party has been programmed into the Web browser beforehand. Using this public key the browser can determine that the certificate is authentic. The browser then knows it has the right public key.

Electronic money (digital cash)

Making files unique with digital signatures is the basis of digital cash (electronic money). Alice the banker creates electronic banknotes of various denominations and puts a unique number on every banknote. She signs the result. Bob the client now makes a withdrawal from his account with Alice and receives some of the signed banknotes. The banknotes can be anonymous or include Bob's name. Bob then goes to Charlie's electronic hardware store and purchases a digital camera using these banknotes as payment. Charlie verifies that the banknotes bear Alice's signature and so knows that they are not counterfeit.

Bob could of course make as many copies of the signed banknotes as he wants, since the banknotes are in electronic form. Charlie therefore now has to go to Alice and report to her the unique number on the banknote he received. Alice will then record that number as "spent" and indicate to Charlie that the transaction is okay. If the number was already recorded as "spent", Alice will reject the transaction. If the transaction is okay, the amount indicated on the banknotes is credited to Charlie's account.

This system has many advantages over traditional payment techniques. Alice can create banknotes of any denomination, including for example millicents (0.001 cents). This way for example an electronic archive could charge one millicent for every document Bob requests, and Bob could pay that without having to take a subscription or make a deposit in advance.

One disadvantage of this system is that it requires Charlie to immediately check with Alice whether the banknote he still valid. If Charlie waits even a few minutes, Bob can spend the banknote again at Dave's. Then either Charlie or Dave is not going to get his money.

This principle is currently used for electronic coupons. As a coupon is less valuable than a banknote, the risk of double spending a coupon appears to be acceptable. Furthermore, coupons are usually only valid at one particular store.

Timestamping services

The insight that the digital signature of a message can be handled totally separate from the message itself has given rise to several interesting new applications. It is now possible for example to offer a digital timestamping service. This basically works just like the above example of publishing a digital signature in a newspaper. Alice now sends the document she wants timestamped to a timestamping server. The server computes the hash value of Alice's document, adds the current date and time and digitally signs the result (the "timestamp"). The timestamp is then published for example on the Web site of the timestamping service.

From that moment on anyone can verify that that particular document existed at the time indicated by the timestamping server. Because the hash value contained in the timestamp is for all practical purposes unique for that document, it is not possible for Alice to re-date a later document by appending the timestamp of the original.

It is of course possible for the timestamping server to generate timestamps with any date and time. Alice could bribe the operator of the server to generate a timestamp that is one year in the past, for example. A simple trick to make this impossible is to assign a sequence number to every timestamp. The first timestamp gets the number 1, the second gets the number 2, and so on. This way it is not possible to insert a later timestamp between two earlier ones, because there is no sequence number available.

The operator of the timestamping server could now simply remove one of the other timestamps and replace it by the one Alice gave him. As long as nobody notices that a timestamp went missing or changed, he can get away with this. By inserting a sequence number and the hash value of the previous timestamp this also becomes impossible. If the operator replaced the fourth timestamp with something else, the hash value for the fourth timestamp contained in the fifth timestamp is no longer correct. The operator could then of course also replace the fifth timestamp, but then the sixth timestamp and all the other timestamps also have to be replaced.

While it is of course possible for the operator to replace all timestamps in order to insert one forged one, the chance that somebody will notice is of course much bigger. To increase the security of the system the server could regularly publish some of the timestamps in a location where it can no longer be modified. For example, every Sunday the most recent timestamp could be published in the New York Times. Or the server could make a document listing the last 100 timestamps and timestamp that one. This "super- timestamp" could then be published in a newspaper, or maybe sent to another timestamping server operated by a different entity. This way Alice would have to bribe different persons.

Signed computer programs

Digital signatures can also be used to authenticate software applications. The manufacturer of a computer program can generate a digital signature for the executable. When a user downloads the program, he can verify that the digital signature is correct. He then knows that this program was really made by that particular manufacturer. If he trusts that manufacturer, he can safely install the application. The manufacturer of course promises that the application will not do anything malicious.

The source code of many open source software programs is distributed together with the digital signature of the author(s). This way the recipients can check that they have not been modified by anyone else. For instance everyone can verify the authenticity of the Linux kernel by checking whether it was properly signed by Linus Torvalds.

ActiveX controls (more or less comparable to Java applets, but based on a Microsoft standard) are digitally signed. Microsoft's Internet Explorer checks the digital signature using a Microsoft public key installed in the browser. The control is only executed if the digital signature is authentic. If the signature does not check out, or the browser security level is set to high, the user will be asked to confirm execution.

Unfortunately it appears to be difficult to properly sign ActiveX controls so that all users can verify that the signature is authentic. This has led to the practice of telling users in the installation manual or on the web page containing the control to simply press "Yes" whenever Internet Explorer says anything about the digital signature. This makes it of course very easy for a hacker to replace the ActiveX control with anything he desires. Although the digital signature will not check out, the user will simply follow the manual and click "Yes" anyway.

More recently suggestions have been made to extend this application to hardware as well. The CPU in a PC would check the digital signature on the operating system or on applications to be executed. If the digital signature does not check out, or it was not created by an authorized manufacturer, the CPU refuses to execute the operating system or program. It is unclear at the time of writing whether the owner of the PC in question will be able to indicate who are authorized manufacturers.