Crash course on cryptography: Digital signatures

In this document

The basic principle
How digital signatures work
Applications of digital signatures

The basic principle

In public key cryptography, anything Alice encrypts with Bob's public key can be decrypted by Bob with the corresponding private key. Alice can also encrypt a message with her private key, which means that Bob can decrypt it with Alice's public key. Since the public key is, as the name suggests, publicly available, this is not very good idea if Alice wants to keep that message a secret. Eve can also simply obtain a copy of Alice's public key and thus also decrypt the message.

But because Alice keeps her private key to herself, Bob knows that only Alice could have encrypted this message. Bob can now be sure that this message was written by Alice. A signature on a paper message serves as proof that this message was written by the person who signed it. Encrypting with a private key thus can be regarded as an equivalent to placing one's signature on the message. This is why this is called creating a digital signature for the message.

If Alice wants to keep the message a secret that only Bob is allowed to learn, she of course then simply encrypts the digitally signed message with Bob's public key. Bob first decrypts the message with his own private key and then decrypts the result with Alice's public key. He now knows that no one else could have read the message (because it was encrypted using his public key) and that no one but Alice could have written this message (because it was encrypted using her private key).

How digital signatures work

Digitally signing large messages takes a long time, just like encrypting large messages with someone's public key. Just like with public key encryption, placing digital signature therefore involves an extra step. First a summary of the message is computed, and then this summary is signed.

Cryptographic hash functions

The summary is generated using so-called cryptographic hash functions. A cryptographic hash function can transform input of an arbitrary length to an output of a certain number of bits, typically 128 or 160 bits. The output is called the hash value. Well-known hash functions are MD5 and SHA-1, although many more exist.

A very simple example of a hash function is to simply add up the position in the alphabet of all the characters in the message. For example, the message "ape" would give as output 22 (1 plus 16 plus 5).

Since the hash value is usually shorter than the message itself, this makes it easier and faster to compare two messages or to find a particular message in a table. For example, it is common in database management systems to compute the hash value of all the names in a database with information on people. To determine whether a particular person occurs in the database, the hash value of his name is computed and compared against the hash values of all the names. This is much faster than comparing the name itself against all the names in the database, because the hash value is a number of a fixed length. Names can be many characters long and each character has many more possibilities than just 0-9.

Hash functions should have two properties:

Given a particular output, it should be difficult to find a message that has that particular output (for cryptographers this means the hash function is "one-way").
Given two messages, the chance that they have the same hash value should be small (cryptographers refer to this as "collision-free").

If a particular hash function has these properties, it is called a cryptographic hash function. It is now possible to use the hash value of a message instead of the message itself.

The simple example given above does not have these properties. There are many messages that have the hash value 22. And furthermore, it is quite easy to find another message that also has this hash value.

Cyclic Redundancy Check functions

So-called CRC (Cyclic Redundancy Check) functions are often used to check the integrity of a message. The output of a CRC function is normally called the checksum of the message. CRC functions are designed to help detect errors in the message that occurred during transmission. If a disruption on the communication channel changed one of the bits of the message, the checksum of the resulting message will be different. A checksum thus can be seen as the hash of a message.

However, CRC functions are not as strong as cryptographic hash functions. They produce short checksums, typically 32 bits (4 characters), which means that the chance that two messages have the same checksum is quite real. Furthermore, with longer messages it is easily possible to modify the message without affecting the checksum. Sometimes it is sufficient to simply append a certain number of spaces to the end of the modified message to make sure that the result has the same checksum as the original message.

Cryptographic hash functions and digital signatures

Hash functions can be used to determine whether a message has been modified. Alice computes the hash value of the message she wants to send to Bob and sends the hash value of the message together with the message to Bob. Bob computes the hash value of the message he receives, and compares it against the hash value he received from Alice. If these two hash values are the same, Bob knows that the message was not modified. After all, the second property of the hash function says that the chance that the modified message has the same hash value as the original message is very small.

Eve can now no longer just modify the message without Bob noticing this. However, Eve can modify the message and compute the hash value of the modified message. She can then replace the hash value that Alice sent with the hash value she computed. Bob will then think that the message was not modified, because the message he received has the same hash value as the one he got from Alice. But Bob has no way to know that he did not get that hash value from Alice.

Of course this is where digital signatures come in. After computing the hash value of the message she wants to send, Alice digitally signs this hash value and sends the result (the digital signature of the message) to Bob. Bob then decrypts the digital signature using Alice's public key. He compares the result with the hash value he computed for the message he received and so determines whether the message was modified. If everything checks out, Bob knows that this message really came from Alice and it was not modified.

Because Eve does not have Alice's private key, she is no longer able to replace the hash value that Alice signed with the hash value of the modified message. And it is next to impossible for Eve to modify the message in such a way that the hash value remains the same. Because of the first property of the hash function, it is difficult for Eve to find another message that has the same hash value. And even if she manages to find one, the chance that this other message is even remotely the same as the original message from Alice is extremely small.

An important reason for using a cryptographic hash function is that the message remains in unencrypted form. Furthermore, the (digitally signed) hash value can now be transmitted and stored invisible to the user, for example as part of the headers of an e-mail message or encapsulated using the well-known MIME standard. The digital signature can also be transmitted over an entirely separate channel. Alice could publish the digital signature of a message in a newspaper. This way, she could later prove that she had a copy of this message on the date of publication of this newspaper without having to reveal the message. This can be useful for example if Alice had to prove that she wrote a particular message and did not infringe on somebody else's copyright.

Applications of digital signatures

Making files unique with digital signatures is the basis of digital cash (electronic money). Alice the banker creates electronic banknotes of various denominations and puts a unique number on every banknote. She signs the result. Bob the client now makes a withdrawal from his account with Alice and receives some of the signed banknotes. The banknotes can be anonymous or include Bob's name. Bob then goes to Charlie's electronic hardware store and purchases a digital camera using these banknotes as payment. Charlie verifies that the banknotes bear Alice's signature and so knows that they are not counterfeit.

Bob could of course make as many copies of the signed banknotes as he wants, since the banknotes are in electronic form. Charlie therefore now has to go to Alice and report to her the unique number on the banknote he received. Alice will then record that number as "spent" and indicate to Charlie that the transaction is okay. If the number was already recorded as "spent", Alice will reject the transaction. If the transaction is okay, the amount indicated on the banknotes is credited to Charlie's account.

This system has many advantages over traditional payment techniques. Alice can create banknotes of any denomination, including for example millicents (0.001 cents). This way for example an electronic archive could charge one millicent for every document Bob requests, and Bob could pay that without having to take a subscription or make a deposit in advance.

One disadvantage of this system is that it requires Charlie to immediately check with Alice whether the banknote he still valid. If Charlie waits even a few minutes, Bob can spend the banknote again at Dave's. Then either Charlie or Dave is not going to get his money.

This principle is currently used for electronic coupons. As a coupon is less valuable than a banknote, the risk of double spending a coupon appears to be acceptable. Furthermore, coupons are usually only valid at one particular store.

Timestamping services

The insight that the digital signature of a message can be handled totally separate from the message itself has given rise to several interesting new applications. It is now possible for example to offer a digital timestamping service. This basically works just like the above example of publishing a digital signature in a newspaper. Alice now sends the document she wants timestamped to a timestamping server. The server computes the hash value of Alice's document, adds the current date and time and digitally signs the result (the "timestamp"). The timestamp is then published for example on the Web site of the timestamping service.

From that moment on anyone can verify that that particular document existed at the time indicated by the timestamping server. Because the hash value contained in the timestamp is for all practical purposes unique for that document, it is not possible for Alice to re-date a later document by appending the timestamp of the original.

It is of course possible for the timestamping server to generate timestamps with any date and time. Alice could bribe the operator of the server to generate a timestamp that is one year in the past, for example. A simple trick to make this impossible is to assign a sequence number to every timestamp. The first timestamp gets the number 1, the second gets the number 2, and so on. This way it is not possible to insert a later timestamp between two earlier ones, because there is no sequence number available.

The operator of the timestamping server could now simply remove one of the other timestamps and replace it by the one Alice gave him. As long as nobody notices that a timestamp went missing or changed, he can get away with this. By inserting a sequence number and the hash value of the previous timestamp this also becomes impossible. If the operator replaced the fourth timestamp with something else, the hash value for the fourth timestamp contained in the fifth timestamp is no longer correct. The operator could then of course also replace the fifth timestamp, but then the sixth timestamp and all the other timestamps also have to be replaced.

While it is of course possible for the operator to replace all timestamps in order to insert one forged one, the chance that somebody will notice is of course much bigger. To increase the security of the system the server could regularly publish some of the timestamps in a location where it can no longer be modified. For example, every Sunday the most recent timestamp could be published in the New York Times. Or the server could make a document listing the last 100 timestamps and timestamp that one. This "super- timestamp" could then be published in a newspaper, or maybe sent to another timestamping server operated by a different entity. This way Alice would have to bribe different persons.

Crash course on cryptography: Digital signatures

In this document

See also

The basic principle

How digital signatures work

Cryptographic hash functions

Cyclic Redundancy Check functions

Cryptographic hash functions and digital signatures

Applications of digital signatures

See also

Authenticating Web servers

Electronic money (digital cash)

See also

Timestamping services

Signed computer programs

See also

All parts