On the Internet one can find
a lot of information about the public key algorithms, their complete
description, mathematical proofs etc. Even WikiPedia will give you all
the details about the RSA algorithms for instance. Still, reading all
this, even implementing an algorithm yourself you will remain
uncertain about how to employ it in the real world. The deeper you
understand the mathematics the more are the "side" questions
you would have. For example the data encrypted with RSA must be
shorter than the modulus of the key set - this is obvious from the
algorithm explanation. However, in the real world you generate the
keys randomly and you cannot be sure that they will give the length
you need - how to deal with that, how much of the length to sacrifice
in order to guarantee that any data with suitable length will pass
with any key pair. Further you can ask yourself - "well I can
encrypt something, but should I encrypt it directly or pre-preprocess
it somehow?".
The answers to such questions are given by certain
standards. One of the most complete set of standards is PKCS, however
there are many and many applications employing RSA and other
algorithms in their own way. So, having the core algorithm (in RSA
object) and some other tools (such as SFBinaryData,
the BigNumber object etc.) you can comply
both standards or application specific implementations at the price of
some code you will need to write. So, you need to know what to do.
This page aims at giving you a good idea about what RSA can do, when
it is sensible to be done and what kind of constraints are there. With
that in mind you will be ready to understand standards and specific
implementations and make your applications work with them. We will not
comment the mathematics involved - you can read about that in many
places on the Internet. What this page concentrates on is the
properties of the algorithm and their implications.
The RSA algorithm - what it looks like and what can it do
- All the parameters involved in RSA. Note that some of them are
not (optionally) used during the encryption/decryption. We list all
of them here and we will comment why after that
P, Q - two large prime numbers
Modulus - is actually P*Q
PublicExp - public
exponent
SecretExp - secret
exponent
- RSA will work fine with data that is encoded as positive integer
number less than the Modulus. For such a data the
encryption and decryption can be considered simply as two
functions each of of which is the reverse of the other. Thus you
can "decrypt"
the data, then to recover its original form you need to "encrypt"
it and the reverse is true as well. Thus the terms decryption and
encryption are subjective and are actually selected to reflect the
typical sensible usage of the algorithm.
- The "encrypt" operation takes 3 arguments - the data,
the Modulus and the PublicExp
- The "decrypt" operation takes 3 arguments - the data,
the Modulus and the SecretExp. It can be performed
in 4 times faster manner if the CRT (Chinese Remainder Theorem) is
applied. In such case the "decrypt" operation will take
also the P and Q as parameters (see DecryptPQ
method).
- The fact is that knowing the Modulus and the PublicExp
it is fairly difficult to calculate the SecretExp while the
reverse is not true. This is the factor that determines how and
what is done with the algorithm. You should not be confused by the
reverse usage of Encrypt and Decrypt when the algorithm is used
for purposes other then typical encryption. It is hard to come
with neutral yet informative names for them so, this naming
fashion took roots. They are reverse to each other and the usage
depends only on how you want to use them and on the common sense.
The common and not so-common RSA usage patterns
Encryption/Decryption
This is the usage that gives the names of the operations. In this
case the data is encrypted using the Modulus and the PublicExp.
On the other end it is decrypted by using the Modulus and the
SecretKey.
Remembering the fact in 5 above means that the encrypting party
needs to know only the Modulus and the PublicKey. With
this knowledge this party can only use the Encrypt
method so this side has access only to one of the functions and not
the reverse one (Decrypt method).
It is difficult to compute the SecretExp knowing from what the
encrypting party knows - it is computationally infeasible. Therefore
this is why algorithms like RSA are called public key encryption
algorithms - knowing the public key you have access only to one of
the directions.
Signing/Signature verification
This is a reverse usage of the algorithm. Its purpose is not to
"hide" the data from other parties but to enable anyone
who has the PublicExp and the Modulus to verify that the data has
indeed originated from the party having the SecretExp.
In this case the party that have all the keys
"decrypts" the data and sends it to the other party which
in turn "encrypts" it to recover it in its original form.
So, in contrast to the encryption/decryption here only one party can
prepare the data and anyone having the PublicExp can recover it
(i.e. the relation is one to many). This kind of usage makes sense
for signing purposes hence the name of this operation. A simple
example will be generating a hash of a document,
"decrypting" that hash, then sending the document and the
"decrypted" hash to the other party. The other party
generates the same hash over the document, then "encrypts"
the received hash to recover it and compares them both. If they
match then the document has been indeed issued by the party having
the SecretExp.
Usage authorization/product protection
This is actually a variant of the signature verification. In
contrast to the above two for which most often certain standards are
used, this one is not so typical and is virtually always implemented
in application specific manner.
In this case the party that has the SecretExp encrypts some data
(using "decrypt" again) and publishes or distributes it to
its consumers. The latter must have the corresponding PublicExp in
order to recover the data. Thus the PublicExp (together with the
Modulus - don't forget it must be known for the both sides) is used
as an authorization key - who has it can read the data, so it is
given to those who have paid or otherwise are entitled for access to
the distributed data.
Because the different parts of the data can use different keys
one can authorize the consumers for different parts of the data by
providing the corresponding keys. Of course this is not a complete
software or data protection - the key can be stolen for
instance.
Other usages
Keeping in mind the algorithm properties you can use it for other
purposes as well. You only need to make sure that the direction in
which it is employed is the correct direction for the case. So, ask
yourself who must be able to access the both processing directions
and who must have access to only one of them. The latter should have
the public key only and the other has them both.
How it is actually used
First lets summarize: The public key is the pair (Modulus,
PublicExp), The private key is the pair (Modulus,
SecretExp). Virtually always the party that has the SecretExp
also has the PublicExp recorded (and even if t does not - it can be
calculated) so you can assume that this party
has the both keys. Sometimes the P and Q are also kept in order to
enable faster "decrypt". How the keys are packed and
exchanged is another matter. For instance there are the PKCS
standards that define a standard manner and some more details, but
there are other less popular standards and also many applications
doing this in their own specific way.
Certainly the way the SecretExp and even most importantly the P
and Q are stored is important. They must be protected from
adversaries in appropriate way. This is, of course, another matter,
but if you need to devise some way to keep them from curious eyes
you should consider some kind of password protection. How? For
example you can use the password provided by the party that
generates the keys and may be some other data (name, username -
whatever is available in the specific scenario) to generate a hash
or otherwise generate something that can be used as a key for a
symmetric encryption algorithm. Then apply this algorithm with this
key and save the RSA keys thus encrypted on the user's machine. This
is what the certificate storages do, there are standards about that
and today the operating systems or at least the WEB browsers
implement the most popular of them.
Back to the usage. Lets remember some of the properties of
the RSA algorithm - it is relatively slow, the data that can be
encrypted in the big number used by the algorithm is limited to the
size of the Modulus, even worse - the resulting number must be less
than the Modulus). So, even if the algorithm was faster it is quite
inconvenient to use it for stream encryption. Too much things must
be agreed between the two sides and the way algorithm works is
obviously not stream-friendly.
In the real world the RSA is used together with a symmetric
algorithm. RSA encrypts the key for the symmetric algorithm and
possibly other useful information (for instance the ID or the name
of the symmetric algorithm used), but the bulk of the work is done
with the symmetric algorithm. Virtually all the existing standards
in this area define how such data is packed, ID-s for the algorithms
and so on. Thus if you are going to construct your own encryption
scheme you would need to do something similar and define a rule
about how to put this crucial data in a memory block then encrypt
that block with RSA and send it to the other party.
(the article will be continued in the next issue of NDL)
|