About the public key algorithms

On the Internet one can find a lot of information about the public key algorithms, their complete description, mathematical proofs etc. Even WikiPedia will give you all the details about the RSA algorithms for instance. Still, reading all this, even implementing an algorithm yourself you will remain uncertain about how to employ it in the real world. The deeper you understand the mathematics the more are the "side" questions you would have. For example the data encrypted with RSA must be shorter than the modulus of the key set - this is obvious from the algorithm explanation. However, in the real world you generate the keys randomly and you cannot be sure that they will give the length you need - how to deal with that, how much of the length to sacrifice in order to guarantee that any data with suitable length will pass with any key pair. Further you can ask yourself - "well I can encrypt something, but should I encrypt it directly or pre-preprocess it somehow?".

The answers to such questions are given by certain standards. One of the most complete set of standards is PKCS, however there are many and many applications employing RSA and other algorithms in their own way. So, having the core algorithm (in RSA object) and some other tools (such as SFBinaryData, the BigNumber object etc.) you can comply both standards or application specific implementations at the price of some code you will need to write. So, you need to know what to do. This page aims at giving you a good idea about what RSA can do, when it is sensible to be done and what kind of constraints are there. With that in mind you will be ready to understand standards and specific implementations and make your applications work with them. We will not comment the mathematics involved - you can read about that in many places on the Internet. What this page concentrates on is the properties of the algorithm and their implications.

The RSA algorithm - what it looks like and what can it do

All the parameters involved in RSA. Note that some of them are not (optionally) used during the encryption/decryption. We list all of them here and we will comment why after that
P, Q - two large prime numbers
Modulus - is actually P*Q
PublicExp - public exponent
SecretExp - secret exponent
RSA will work fine with data that is encoded as positive integer number less than the Modulus. For such a data the encryption and decryption can be considered simply as two functions each of of which is the reverse of the other. Thus you can "decrypt" the data, then to recover its original form you need to "encrypt" it and the reverse is true as well. Thus the terms decryption and encryption are subjective and are actually selected to reflect the typical sensible usage of the algorithm.
The "encrypt" operation takes 3 arguments - the data, the Modulus and the PublicExp
The "decrypt" operation takes 3 arguments - the data, the Modulus and the SecretExp. It can be performed in 4 times faster manner if the CRT (Chinese Remainder Theorem) is applied. In such case the "decrypt" operation will take also the P and Q as parameters (see DecryptPQ method).
The fact is that knowing the Modulus and the PublicExp it is fairly difficult to calculate the SecretExp while the reverse is not true. This is the factor that determines how and what is done with the algorithm. You should not be confused by the reverse usage of Encrypt and Decrypt when the algorithm is used for purposes other then typical encryption. It is hard to come with neutral yet informative names for them so, this naming fashion took roots. They are reverse to each other and the usage depends only on how you want to use them and on the common sense.

The common and not so-common RSA usage patterns

Encryption/Decryption

This is the usage that gives the names of the operations. In this case the data is encrypted using the Modulus and the PublicExp. On the other end it is decrypted by using the Modulus and the SecretKey.

Remembering the fact in 5 above means that the encrypting party needs to know only the Modulus and the PublicKey. With this knowledge this party can only use the Encrypt method so this side has access only to one of the functions and not the reverse one (Decrypt method). It is difficult to compute the SecretExp knowing from what the encrypting party knows - it is computationally infeasible. Therefore this is why algorithms like RSA are called public key encryption algorithms - knowing the public key you have access only to one of the directions.

Signing/Signature verification

This is a reverse usage of the algorithm. Its purpose is not to "hide" the data from other parties but to enable anyone who has the PublicExp and the Modulus to verify that the data has indeed originated from the party having the SecretExp.

In this case the party that have all the keys "decrypts" the data and sends it to the other party which in turn "encrypts" it to recover it in its original form. So, in contrast to the encryption/decryption here only one party can prepare the data and anyone having the PublicExp can recover it (i.e. the relation is one to many). This kind of usage makes sense for signing purposes hence the name of this operation. A simple example will be generating a hash of a document, "decrypting" that hash, then sending the document and the "decrypted" hash to the other party. The other party generates the same hash over the document, then "encrypts" the received hash to recover it and compares them both. If they match then the document has been indeed issued by the party having the SecretExp.

Usage authorization/product protection

This is actually a variant of the signature verification. In contrast to the above two for which most often certain standards are used, this one is not so typical and is virtually always implemented in application specific manner.

In this case the party that has the SecretExp encrypts some data (using "decrypt" again) and publishes or distributes it to its consumers. The latter must have the corresponding PublicExp in order to recover the data. Thus the PublicExp (together with the Modulus - don't forget it must be known for the both sides) is used as an authorization key - who has it can read the data, so it is given to those who have paid or otherwise are entitled for access to the distributed data.

Because the different parts of the data can use different keys one can authorize the consumers for different parts of the data by providing the corresponding keys. Of course this is not a complete software or data protection - the key can be stolen for instance.

Other usages

Keeping in mind the algorithm properties you can use it for other purposes as well. You only need to make sure that the direction in which it is employed is the correct direction for the case. So, ask yourself who must be able to access the both processing directions and who must have access to only one of them. The latter should have the public key only and the other has them both.

How it is actually used

First lets summarize: The public key is the pair (Modulus, PublicExp), The private key is the pair (Modulus, SecretExp). Virtually always the party that has the SecretExp also has the PublicExp recorded (and even if t does not - it can be calculated) so you can assume that this party has the both keys. Sometimes the P and Q are also kept in order to enable faster "decrypt". How the keys are packed and exchanged is another matter. For instance there are the PKCS standards that define a standard manner and some more details, but there are other less popular standards and also many applications doing this in their own specific way.

Certainly the way the SecretExp and even most importantly the P and Q are stored is important. They must be protected from adversaries in appropriate way. This is, of course, another matter, but if you need to devise some way to keep them from curious eyes you should consider some kind of password protection. How? For example you can use the password provided by the party that generates the keys and may be some other data (name, username - whatever is available in the specific scenario) to generate a hash or otherwise generate something that can be used as a key for a symmetric encryption algorithm. Then apply this algorithm with this key and save the RSA keys thus encrypted on the user's machine. This is what the certificate storages do, there are standards about that and today the operating systems or at least the WEB browsers implement the most popular of them.

Back to the usage. Lets remember some of the properties of the RSA algorithm - it is relatively slow, the data that can be encrypted in the big number used by the algorithm is limited to the size of the Modulus, even worse - the resulting number must be less than the Modulus). So, even if the algorithm was faster it is quite inconvenient to use it for stream encryption. Too much things must be agreed between the two sides and the way algorithm works is obviously not stream-friendly.

In the real world the RSA is used together with a symmetric algorithm. RSA encrypts the key for the symmetric algorithm and possibly other useful information (for instance the ID or the name of the symmetric algorithm used), but the bulk of the work is done with the symmetric algorithm. Virtually all the existing standards in this area define how such data is packed, ID-s for the algorithms and so on. Thus if you are going to construct your own encryption scheme you would need to do something similar and define a rule about how to put this crucial data in a memory block then encrypt that block with RSA and send it to the other party.

(the article will be continued in the next issue of NDL)