So much of what I will write on this blog will assume a knowledge of crypto. I thought I’d create a post I could reference back to for many future posts to keep things simple and easy to understand.
Algorithms and Keys
We all know what an algorithm is. In cryptography it’s not the algorithm you keep secret. The algorithm should be designed in such a way that if it is discovered, unless the snooper has the key, it is useless to him/her. Many algorithms are publicly published. Key secrecy is what’s important. It even goes as far as to say that if you know the algorithm, in other words, you know what mathematics was used and you also know the ciphertext itself, a good encryption algorithm will still keep the plaintext data secret! It sounds impossible because if somebody told me “the answer is 27 and I know to get that number the algorithm divided by 10, added 4 then multiplied by 3”, I’d be able to pretty quickly calculate the input was 50. But these algorithms use special one-way functions (which we’ll look at in a moment) which make that impossible (or at least so difficult you’d never bother to attempt it) to do.
There are 2 broad classes of algorithm: symmetric and asymmetric.
a) Symmetric algorithms use the same key for both encryption and decryption.
b) Asymmetric algorithms, such as public/private key cryptography use one key for encryption and a different, though mathematically related key for decryption. That last sentence sounds counter-intuitive. As does the idea that publishing your algorithm still maintains secrecy. But that’s the truth. Of course an algorithm is only secure until/if somebody finds a weakness.
There is a scenario known as the doomsday scenario. This is the day somebody publishes an algorithm for predicting prime numbers. Almost the entirety of public/private key cryptography (used by protocols such as SSL/TLS) is based on the notion that there is no pattern to a series of prime numbers, other than that they are prime.
There is a possibility that somebody has already come up with a prime-prediction algorithm. It would certainly be in their interest to keep it secret!
The encryption algorithm has 2 inputs – plaintext and the key. It has one output, ciphertext.
If decrypting data, 2 inputs: ciphertext and the key. It has one output, plaintext.

One Way Functions
Mathematical functions where it is difficult (or impossible) to get back to the source values, knowing only the output values, are known as one-way functions. There are many, but modular arithmetic gives us a method that is used extensively in cryptography.
A simple example is where, on a 12 hour clock face, you add 5 hours to 9am. The answer is 2 pm. Or written down we could say:
9+5=2
Because this is an example of modular arithmetic where the modulus is 12, we’d actually write:
9+5=2(mod12)
Let’s take a simple function:
3x where x=2
This is a function for turning 2 in to 9, because it’s the same as 3 * 3, which equals 9. There is a direct relationship between the magnitude of x and the magnitude of the function result. Using modular arithmetic can give the function a great property – unpredictability.
x 1 2 3 4 5 6
3x 3 9 27 81 243 729
3x(mod7) 3 2 6 4 5 1
Your password in Windows is stored as a one way function – albeit one that is considerably more complex than what you’ve just seen.
Hashes
There are some very useful algorithms that produce fixed length strings. For example it doesn’t matter how much data you feed in to an MD4 hashing algorithm, you’ll always get a 128 bit string out of it. That means there must be rather a lot of input strings that will produce exactly the same output strings – and indeed there are; they’re called collisions; they are so rare, you might as well consider them to never happen, in just the same way you are very unlikely to ever have 2 GUIDs of the same value if the programming framework you use has true randomness and a uses big enough numbers for GUIDs. MD4 is a complicated one-way function, so predicting collisions is, to all intents and purposes, impossible. Unless you keep trying and comparing the output strings. This is, in theory the only way to find out. It’s called a brute force attack. You just use computing power to very quickly run through all possible combinations. This could take a long time. A very long time. Hashes are very useful because they allow you to perform comparisons very quickly. If you have a large message, you can create an MD4 hash of the message and send the hash to somebody over a slow network. They can then run a hash on data they hold, which they believe to be identical and compare the hash you sent them, with the one they just generated. If they are the same, it means the 2 datasets are the same.
So say if I take the string “1” and put it through hashing functions – I’ll end up with:
MD4: 8BE1EC697B14AD3A53B371436120641D
MD5: C4CA4238A0B923820DCC509A6F75849B
However, if I take the string “2”, which is numerically almost identical to the first string – I’ll get massively different results.
MD4: 2687049D90DA05D5C9D9AEBED9CDE2A8
MD5: C81E728D9D4C2F636F067F89CC14862C
All this stuff comes in very useful with digital signatures which I’ll describe a little later.
Key Distribution
The problem with encrypting/decrypting data in the way I’ve told you so far, is that you have to somehow get the decryption key safely to your partner. It could easily be intercepted in this day of us all being permanently online. It’s the age-old problem that generals have had of communicating with their officers in the field for centuries. If the messenger who has the key is captured, all your communication can be decrypted, whether or not subsequent messengers know the key. This is called the key distribution problem.
3 mathematicians came up with an answer to this problem: Diffie, Hellman and Merkle. They do an exchange of data which can be intercepted by anybody but which allows both sender and receiver to generate the same key but doesn’t allow the interceptor to generate the key. Sounds incredible? It’s very simple to explain, now that you understand modular arithmetic.

Asymmetric Key Encryption
We use one key to encrypt, and a related key to decrypt data. You can actually swop the keys round. But the point is you don’t have one key. This gets round the key distribution problem. There’s a great way of describing the difference between symmetric and asymmetric key encryption. It involves the use of a box to put messages in and we have to assume the box, its clasps and the padlock used to lock it are impossible to penetrate.
Symmetric Key: You send a messenger out with a copy of the key. He gets it to your recipient who lives 10 miles away. On the way he stops at a pub and has his pocket picked. The key is whisked off to a locksmith who copies it and it is then secreted back in to the messenger’s pocket.
Some time later you send a messenger with the box containing your message. You are confident that your recipient is the only one who can read the message because the original messenger returned and reported nothing unusual about the key. The second messenger stops at the same pub. He is pick-pocketed. The copy key is used to unlock the box and read the message. The box with its message intact is secreted back in to the messenger’s pocket. You and your recipient have no idea that your communication has been compromised. There is no secrecy…
Asymmetric Key: Your recipient has a padlock and key. He keeps the key in a private place about his person. Let’s therefore call it a private key. He puts the padlock in to the box, but leaves it unlocked. He doesn’t mind if anybody sees the padlock. It’s publicly viewable. Even though it’s not really a key, let’s call it a public key. He sends a messenger to you with the box. The messenger stops at the pub and is pick pocketed. All the snoopers see is an open padlock. They secretly return the box. The messenger arrives at your door. You take the padlock out of the box and put your message in to it. You use the open padlock to lock the box, snapping it shut and you send the messenger on his way. He again stops at the pub and is pick-pocketed. They find only a padlocked box. No key. They have no way of getting in to the box. They secretly return the box to the messenger’s pocket. The messenger gets to your recipient, who use the key he secreted in a private place about his person (the private key) and uses it to unlock the padlock and read the message. Secrecy is maintained.
You can see the process is a bit more complicated for asymmetric key than for symmetric key, so it’s not something you’d want to do often. So what is often done is that instead of putting a message in the box and padlocking it, a symmetric key is put in the box and padlocked. That way, you solve the key distribution problem. That’s what happens with computer cryptography mostly. Public/private key cryptography is used to transport a symmetric key that is used for message exchanges. One reason for doing this is that asymmetric key crypto, or public/private key crypto, as it is known, is expensive, in terms of computing power, whereas symmetric key crypto is much more lightweight. When you see that a web site uses 256 bit encryption, they are talking about the symmetric key that is used after the public/private key crypto was used to transport the symmetric key from sender to receiver. Often the key lengths for public/private key cryptography is 2048 bits. You may have found yourself confused when setting up IIS with 256 bit SSL encryption and seeing keys of 1024 or 2048 bits. This is why – it’s the difference between what’s called the session key and the public/private keys.


The public and private keys are held on the ecommerce web server. The private key is heavily protected in the keystore. Many organisations will go as far as to have a special tamper-proof hardware device to protect their private keys. The public key doesn’t need to be protected because it’s, well, public. You could have daily printouts of it in the newspapers and have it broadcast every hour, on the hour, on the radio. The idea is that it doesn’t matter who sees it.
The website generates the public and private keys. They have to be generated as a key-pair because they are mathematically related to each other. You retrieve the public key from the website and use it as your encryption key. You’re not just going to send your credit card information across the Internet yet. You’re actually going to generate a symmetric key and that is going to become the plain-text input data to the asymmetric encryption algorithm. The cipher-text will traverse the Internet and the ecommerce site will now use its private key to decrypt the data. The resulting output plain-text will be the symmetric key you sent. Now that both you and the ecommerce site have a symmetric key that was transported secretly, you can encrypt all the data you exchange. This is what happens with a URL that starts https://.
There are still a couple of problems to solve here, but let’s put them on to the back-burner for a little while. We need to understand digital signatures and certificates for those problems. In the meantime let’s have a peek at the mathematics inside the public/private key algorithm. There is an interesting little story-ette around this algorithm. A researcher at the UK’s GCHQ called Clifford Cocks invented the algorithm in 1973. However, working for GCHQ, his work was secret, so he couldn’t tell anybody. About 3 years later, 3 mathematicians, Ron Rivest, Adi Shamir and Leonard Adelman also invented it. They went on to create the security company RSA (which stands for Rivest, Shamir and Adelman). It is said the RSA algorithm is the most widely used piece of software in the world.


The extended Euclidean algorithm is particularly useful when a and b are coprime, since x is the modular multiplicative inverse of a modulo b.
Now we want to use that to encrypt a message.

The Web server has access to the private key, so it can decrypt the ciphertext.

Plaintext = 11^23(mod 187)
OK – there’s actually a problem here. In this message, every “X” would come out in ciphertext as the value 11. We could perform a frequency analysis attack on the message. In the English language, certain letters tend to appear more frequently than others. The letters “e” and “i” for example are very common, but “x” and “z” are uncommon.

Digital Signatures
Asymmetric keys, as mentioned earlier can be swopped around. If you use one key for encryption, you must use the other key for decryption. This feature comes in very handy for the creation of digital signatures. You’ve heard of digitally signed documents, authenticode, digitally signed applications, digital certificates and so on.


• The plaintext in the message has been altered and that accounts for the difference.
• The ciphertext in the message has been altered and that accounts for the difference.
• They have both been altered and that accounts for the difference.
In order to have a consistent message, the attacker would need to have access to the key that was used to generate the ciphertext.
Do you remember earlier, I talked about hashes? Well, because a message might be quite large, it’s often best to generate a hash of the message and encrypt that. If it’s an MD5 hash, it means you’ll only have to encrypt 128 bytes. When you come to perform the validation of the signature, you have to take the plain text portion and generate a hash before you do the comparison. It just uses the CPU more efficiently.

Depending on the data you are looking at, you’ll often even find the keys you can use to decrypt the message in plaintext within the message body. It seems like complete madness because anybody who intercepts the message could simply modify the plaintext portion of the message and then use the included key to generate a new ciphertext equivalent. That would make the message consistent.
However, if the plaintext key included in the message is the message-issuer’s public key, then the attacker would need access to the corresponding private key, which they won’t get because it’s, well, private.
But even with this there is still a problem. How do you know the message came from the sender it purports to come from? As an attacker, I could easily generate my own key-pair. I could then create a message that says I am the issuer and use my private key to create the encrypted part of the message.
When you come to check the message you’ll know that it definitely wasn’t tampered with in transit, but how do you know you can trust the public key embedded in to the message? How do you know that it’s me that created the message. That’s where digital certificates come in to play.
Certificates
Certificates are data structures that conform to a specification: X.509. But really they are just documents that do what we just talked about. The plain text data is the public key, plus other distinguishing information like the issuer, the subject name, common name and so on. It is then hashed and the hash is encrypted using the private key of a special service called a certification authority (CA) – a service that issues certificates.
When we protect a web server with an SSL certificate, we go through a 2 stage process. generating a certificate request, and then finishing it off by receiving and installing the certificate. The request part, generates a public and private key. The public key plus the distinguishing information is sent to the CA, which then creates a digitally signed document, signed using the CA’s private key. The document conforms to X.509 certificate standards. The certificate is returned by the CA, and we install it on our web server.
Anytime anybody connects to the web server over SSL, they retrieve the certificate and perform a signature validation on it. Remember it was signed by the CA’s private key. So they have to have the CA’s public key to perform the validation. If you go in to Internet Explorer’s Internet Options and then to the Content tab, you’ll se a Certificates button. That shows you all the CAs you have certificates (and therefore public keys) for. It means if you see a certificate that was signed by a CA on a web site, in theory, the CA did a check to make sure the requester was indeed the requester before issuing the certificate. It means that you have to trust that the CA did a good job of checking the requester’s validity before issuing the certificate.
Even this creates a minor problemette – how do you know the CA’s certificate wasn’t created by an imposter of some description? Well, it can have its certificate signed by a CA higher up the food-chain than itself. Eventually you get to a CA at the top of the food chain and this is called a Root CA. Internet Explorer and all the other browsers have all the main Root CAs for the Internet built-in. These are trusted organisations. They have to be! They are so trusted, they are able to sign their own certificates.
Technorati Tags: crypto primer,crypto,cryptography primer,cryptography,encryption,decryption,asymmetric key,symmetric key,public/private key,public key infrastructure,pki,certificates,digital signatures,signatures,signing,digital certificates,x.509 certificates,planky,plankytronixx
You may from time to time play with a Visual Studio command line tool called makecert.exe. It’s a tool that creates self-signed certificates. If you are just using them for development purposes on a local machine they are probably fine. You trust yourself, presumably. Sometimes you can use self-signed certificates on Internet-facing services. For example if you upload your own self-signed certificate to a service and you are sure nobody intercepted it while you were uploading it (because you were using SSL maybe), it means you can have private conversations with the service and you can be sure the service is the service you issued the certificate to. If you just sent the naked certificate, they’d be able to encrypt messages that only you could decrypt, because you’d have the private key. It’s possible to also include the private key when you create a certificate. If you send one of these certificates to an Internet service, they can digitally sign messages they send to you with your private key. Because you are assured that you gave the private key only to the service, you can be sure the messages are genuinely coming from that service and not an imposter. You have to trust that the service do a good job of keeping your private key safe.
Of course it wouldn’t be practical if every time you wanted to buy something on the Internet, in order to create an SSL connection you had to first upload a self-signed certificate. That’s why there is a large infrastructure of CAs and Root CAs built on the Internet. This infrastrucutre is called a Public Key Infrastructure or PKI. Many organisations have their own internal PKIs.

You can also see the chain of CAs up to that chain’s corresponding Root CA when you look at a certificate.

I’ll write another post soon that goes through a complete SSL handshake. That's a great way to explain what’s happening in crypto.
Planky Sphere: Related Content