Protect from Prying Eyes: Encryption in Oracle 10g - Part 1
What is Encryption?
Simply put, encryption is the art of disguising data. Suppose the combination to my wall safe is 37529. Because I am not very good at remembering numbers, I keep forgetting it. So I come up with an idea: I write it down on a piece of paper and tape it to the safe! It’s there when I need it, and I don’t have to worry about forgetting it. What a great idea!
But there is a slight problem. If a burglar happens to see it, there is no need for him to try to crack the code; it’s all there in plain sight! So I change the strategy a little: I decide to disguise the combination by altering it in a way that only I know. For instance, I may decide to reverse the digits; the number becomes 92573, which is what I write on the piece of paper. The burglar will see the number 92573, which is not the safe combination. He will not be able to guess the correct one since he does not know how it was altered in the first place.
Yet over time, this method is rendered useless. After a few burglars visit my house, they all learn that I simply reverse the digits. Alas! I have to figure out a new encryption scheme. The modified scheme I come up with is to add a number to the actual value, say 9. So my safe combination 37529 becomes 37529 + 9 = 37538, which goes on the piece of paper. To decipher the true value, the burglar has to know two things:
- The logic used to perform the encryption (i.e., adding a number to the actual value)
- The number that is added (i.e., 9, in this case)
If I suspect that the burglars have somehow learned the first, I can simply change the number 9 to something else. It’s important to understand the difference between the means used to achieve the encryption and the value used to scramble the original data. The first concept, the logic used to scramble the data, is known as the algorithm, and the value used to scramble the actual data is known as the encryption key. We can change one, keep the other constant, and get a different encrypted value. For instance, I can keep the algorithm the same, but change the key to arrive at a different encrypted value.
In the case of my safe combination, the algorithm is very simple: I merely add the key to the original data to come up with the encrypted value. However, this simplistic algorithm has no place in reality; encryption algorithms are actually far more complex. It’s not the purpose of this article to go into the details of encryption algorithms; that would fall into the realm of academia. As users, we do not need to delve into the details, only to learn enough of the basics to use them in practice. Because the workings of standard algorithms are common knowledge, it’s not practical to hide the algorithm itself. But by using a secret key, the encrypted value can remain safe, decipherable only by using the right key.
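To make the distinction between algorithm and key concrete, here is a minimal PL/SQL sketch of the toy scheme. The variable names are made up for illustration, and the "algorithm" is plain addition, so this is a teaching aid rather than real encryption.

```sql
-- Toy illustration only: the "algorithm" is addition, the "key" is the number added.
-- Variable names are hypothetical; run in SQL*Plus with SET SERVEROUTPUT ON.
declare
   l_combination   number := 37529;   -- the value to disguise
   l_key           number := 9;       -- the secret key
   l_encrypted     number;
begin
   -- Encrypt: same algorithm (addition), current key
   l_encrypted := l_combination + l_key;
   dbms_output.put_line ('Key 9 -> ' || l_encrypted);

   -- Same algorithm, different key: a different encrypted value
   l_key := 4;
   l_encrypted := l_combination + l_key;
   dbms_output.put_line ('Key 4 -> ' || l_encrypted);

   -- Decrypt: reverse the algorithm using the same key
   dbms_output.put_line ('Decrypted = ' || (l_encrypted - l_key));
end;
/
```

With key 9 the note reads 37538, with key 4 it reads 37533, and subtracting the key brings back 37529 either way.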
Since the “key” is literally the key to deciphering the data, my encrypted value is only as safe as the key itself. For instance, in this case, the burglar knows the algorithm (i.e., that I added a single-digit number to arrive at the encrypted value). But what is the number? Well, he doesn’t have to work hard; there are only ten single-digit numbers, from zero to nine. He has to make at most ten guesses to get the right number, something he can definitely accomplish in a short time. But what if I used a two-digit number? In that case, the burglar would have to make up to 100 guesses, a more difficult feat. I could frustrate the burglar more with a three-digit key, which forces him to make up to 1,000 guesses. The point is, while keeping the algorithm the same, encryption can be made more secure by increasing the number of digits in the key. For real-life encryption, the same concept holds true: increasing the size of the key forces the intruder to make more guesses and makes the code more difficult to crack. In digital encryption, since computers are used to encrypt and decrypt, making 1,000 guesses, or even 1 million guesses, is trivial and can be done in a matter of minutes; hence, keys must be very long to offer a real barrier to guessing.
In summary, you need the following for encryption:
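The guessing game can be sketched in a few lines of PL/SQL. The three-digit key 847 below is an arbitrary value chosen for illustration; the loop plays the burglar, who knows the algorithm (addition) but has to try every possible key.

```sql
-- Toy brute-force sketch; the key value 847 is a made-up example.
-- Run in SQL*Plus with SET SERVEROUTPUT ON.
declare
   l_combination   constant number := 37529;   -- known only to the owner (and the safe)
   l_key           constant number := 847;     -- hypothetical three-digit key
   l_note          number := l_combination + l_key;   -- what is written on the paper
   l_guesses       number := 0;
begin
   -- The burglar tries every possible three-digit key: 0 through 999
   for l_try in 0 .. 999 loop
      l_guesses := l_guesses + 1;
      if l_note - l_try = l_combination then   -- "the safe opens"
         dbms_output.put_line ('Key ' || l_try || ' found after ' || l_guesses || ' guesses');
         exit;
      end if;
   end loop;
end;
/
```

The loop finishes instantly, which is the point: a key space of 1,000 values is no obstacle to a computer, so real keys are measured in dozens or hundreds of bits.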
- An Algorithm
- A Key
Types of Encryption
In the encryption case we’ve been talking about, note a very important behavior: the key used to decrypt is the same as the one used to encrypt. This type of encryption is known as symmetric key encryption, since the same key is used in both cases. Symmetric key encryption makes the algorithms easier to implement; however, there is a fundamental security risk inherent in this model. Since the same key must be used to decrypt the data that is encrypted, the key must be protected along with the data. What if the intruder gets the key? He will be able to decrypt every encrypted value, seriously compromising the security of the system. In addition to this, another inherent problem is key storage itself. And if the key is lost, the encrypted data can never be decrypted.
To address these issues, another kind of encryption exists, in which two different keys are used to encrypt and decrypt data. Since the keys are different, this scheme is known as asymmetric encryption. In this method, a set of two keys is generated simultaneously. One key, known as the “public key,” is given out to others and is used to encrypt the data. When decrypting, the user uses the other key, known as the “private key.” The public key is made known to the world and is not kept secret. The private key can be stored securely since it is used only during the decryption process. Since the private key is not used during encryption, it does not have to be given out to the sender of the data. This obviously increases the security of the key.
However, note that the public and private keys, although not the same, are mathematically connected; they have to be. This makes guessing the private key a little easier for someone who knows the public key. To reduce the risk, the key must be longer, usually 1024 bits, unlike the typical 128-bit key sizes in symmetric encryption schemes. A larger key also means higher computational processing requirements on the server, which translates to longer elapsed time. This makes asymmetric encryption schemes somewhat less desirable.
When to Use What Type
For data-in-motion encryption, in which the data is encrypted before being sent across the wire to reduce the chance of eavesdropping, asymmetric encryption is usually used, for two primary reasons. First, sharing a single key between sender and receiver, as symmetric key encryption requires, is not practical; and second, the length of time for which the encryption key must be in use is very short, so much smaller encryption keys can be used. Even if the intruder cracks the key, the data has already been transmitted, reducing the chance of compromise. In the case of data-at-rest encryption, the key is held for a much longer duration, so a longer key is necessary. To balance the need to limit key length against the risk of exposure, symmetric encryption schemes are used for data-at-rest.
Block and Streaming Encryption
Now that we have seen how encryption schemes are commonly used, an interesting question is raised: How does an algorithm work on its input data? For the simple example of the safe combination, I used an algorithm that adds a number to the entire input data. Note that I stress the word “entire,” which means the entire data must be available to the algorithm before it can be encrypted. In the case of streaming data, such as data flowing between two servers over the network or from the Web server to the browser, the data is not available in its entirety. To encrypt data that is completely available, the algorithm breaks it into chunks (typically of eight bytes) and then applies encryption to each chunk; this type is known as block encryption. Streaming data cannot be broken into chunks in advance, so a different strategy, known as stream encryption, is employed for it.
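The chunking step can be pictured with the supplied utl_raw package. This is only a sketch of how data might be carved into eight-byte blocks before encryption; the input string and the block size are assumptions for illustration, and no encryption happens here.

```sql
-- Sketch: split a RAW value into 8-byte blocks (no encryption is performed).
-- Input string and block size are illustrative. Run with SET SERVEROUTPUT ON.
declare
   l_data       raw (100) := utl_raw.cast_to_raw ('ConfidentialData1234');   -- 20 bytes
   l_block_sz   constant pls_integer := 8;
   l_len        pls_integer := utl_raw.length (l_data);
   l_pos        pls_integer := 1;
begin
   while l_pos <= l_len loop
      -- Take up to 8 bytes starting at l_pos; the last block may be shorter
      dbms_output.put_line (
         'Block at byte ' || l_pos || ': '
         || rawtohex (utl_raw.substr (l_data, l_pos, least (l_block_sz, l_len - l_pos + 1))));
      l_pos := l_pos + l_block_sz;
   end loop;
end;
/
```

The 20-byte input yields two full blocks and a final block of only four bytes, which a block algorithm cannot process as it stands.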
Encryption Algorithms
Now that you understand how encryption works, let’s focus on one important part of encryption: the algorithm.
One of the first algorithms to be used was the Data Encryption Standard (DES), which breaks up data into blocks of 64 bits and encrypts them using a 64-bit key. Of the 64 bits, only 56 are used. A smaller key length makes this a fast encryption approach; however, with modern computers, an intruder can easily break the 56-bit key.
To address the security risks, a modified form of DES was introduced: Triple DES, or DES3. It makes three passes of the input data through the DES algorithm using two keys, so it is reasonably secure for most applications. Another form of DES3 uses three keys to further secure the encryption. Three-key DES3 and two-key DES3 use keys of 168 and 112 bits respectively.
DES3 was considered adequate for most applications for a significant amount of time, but advancement in computing resources diminished the safety of the algorithm. Additionally, the compute-intensive algorithm proved to be too strenuous on the server housing the database. This led to the development of Advanced Encryption Standard (AES).
AES uses one of three key lengths (128, 192, or 256 bits), depending on the application. The longer the key, the longer the computation cycle, but also the lower the risk of an intruder breaking it.
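Since keys are handled as bytes, it helps to translate the bit lengths: 128, 192, and 256 bits are 16, 24, and 32 bytes. The sketch below uses the randombytes function of the dbms_crypto package (introduced later in this article) to produce keys of each size; it assumes you have execute privilege on dbms_crypto, and the variable names are illustrative.

```sql
-- Sketch: random keys of the three AES sizes (assumes EXECUTE on dbms_crypto).
-- Run with SET SERVEROUTPUT ON.
declare
   l_key_128   raw (16) := dbms_crypto.randombytes (16);   -- 16 bytes = 128 bits
   l_key_192   raw (24) := dbms_crypto.randombytes (24);   -- 24 bytes = 192 bits
   l_key_256   raw (32) := dbms_crypto.randombytes (32);   -- 32 bytes = 256 bits
begin
   dbms_output.put_line ('AES128 key bits: ' || utl_raw.length (l_key_128) * 8);
   dbms_output.put_line ('AES192 key bits: ' || utl_raw.length (l_key_192) * 8);
   dbms_output.put_line ('AES256 key bits: ' || utl_raw.length (l_key_256) * 8);
end;
/
```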
Another type of encryption is RC4, which is a streaming encryption algorithm.
Padding
Let’s revisit a point that was made in the previous discussion about block encryption: an algorithm breaks the input data into blocks of eight bytes (typically) and encrypts one block at a time. What if the length of the data is not divisible by eight? Then the last block of bytes will be less than eight bytes, but the algorithm will not be able to operate on this block because it is smaller in size.
To address this, a value must be added to the data to make the length exactly divisible by eight; this process is known as padding. The value is removed when the data is decrypted. In most cases, a simple padding with a known value such as zero is acceptable; this process is known as zero-padding. However, padding by zero also makes the data somewhat prone to discovery by an intruder, since he can now guess what zero looks like in encrypted format. Therefore, padding by zero is not considered cryptographically secure. Instead, a value that is less vulnerable to break-ins is added by a standard known as Public Key Cryptography Standard #5, or PKCS#5.
Of course, if you know that the data is in lengths of multiples of eight, then there is no need to pad.
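To see what padding does to the bytes, here is a hand-rolled sketch of PKCS#5-style padding for an eight-byte block size: each pad byte holds the number of pad bytes added, which is how the decrypting side knows how many to strip. In practice the encryption package applies the padding for you; the input value and variable names below are illustrative.

```sql
-- Sketch: manual PKCS#5-style padding to an 8-byte boundary (illustration only).
-- Run with SET SERVEROUTPUT ON.
declare
   l_block_sz   constant pls_integer := 8;
   l_data       raw (100) := utl_raw.cast_to_raw ('Confidential');   -- 12 bytes
   l_pad_len    pls_integer;
   l_padded     raw (100);
begin
   -- Bytes needed to reach the next multiple of the block size.
   -- Data that is already aligned gets a full extra block under this scheme.
   l_pad_len := l_block_sz - mod (utl_raw.length (l_data), l_block_sz);

   -- Every pad byte carries the pad length itself (here, four bytes of 04)
   l_padded := utl_raw.concat (
                  l_data,
                  utl_raw.copies (hextoraw (to_char (l_pad_len, 'FM0X')), l_pad_len));

   dbms_output.put_line ('Original: ' || rawtohex (l_data));
   dbms_output.put_line ('Padded  : ' || rawtohex (l_padded));
end;
/
```

The 12-byte input grows to 16 bytes, and the padded value ends in 04040404.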
Chaining
Continuing along the previous line of discussion, input data is broken into chunks and then encrypted. Since data is separated, wouldn’t it be nice to somehow de-link the encrypted blocks so that even if a single block is cracked, the rest will be intact?
This is where chaining comes in: it specifies how these chunks of data are linked (or de-linked) during encryption. In the most common format, a block is XOR-ed with the encrypted value of the previous block. The resultant value is then passed to the encryption algorithm. The same is repeated for all the blocks. This scheme is known as Cipher Block Chaining (CBC). Since the first block has no previous block, a value known as an initialization vector (IV) is XOR-ed with it instead.
In another form, known as the Electronic Code Book (ECB) method, each block is encrypted independently. Other methods are Cipher Feedback (CFB) and Output Feedback (OFB).
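The practical difference between the two methods shows up when the input contains repeated blocks. The sketch below, which uses the dbms_crypto package introduced later in this article, encrypts two identical 16-byte blocks (AES works on 16-byte blocks) first with ECB and then with CBC. The key and data are illustrative, no IV is supplied, and execute privilege on dbms_crypto is assumed.

```sql
-- Sketch: identical plaintext blocks under ECB versus CBC (illustrative key and data).
-- Run with SET SERVEROUTPUT ON.
declare
   -- Two identical 16-byte blocks
   l_in    raw (32) := utl_raw.cast_to_raw ('ConfidentialDataConfidentialData');
   l_key   raw (16) := utl_raw.cast_to_raw ('1234567890123456');
   l_enc   raw (64);

   -- Encrypts l_in with the given chaining constant and prints both cipher blocks
   procedure show (p_label in varchar2, p_chain in pls_integer) is
   begin
      l_enc := dbms_crypto.encrypt (
                  l_in,
                  dbms_crypto.encrypt_aes128 + p_chain + dbms_crypto.pad_none,
                  l_key);
      dbms_output.put_line (p_label || ' block 1: ' || rawtohex (utl_raw.substr (l_enc, 1, 16)));
      dbms_output.put_line (p_label || ' block 2: ' || rawtohex (utl_raw.substr (l_enc, 17, 16)));
   end;
begin
   show ('ECB', dbms_crypto.chain_ecb);   -- both blocks encrypt to the same value
   show ('CBC', dbms_crypto.chain_cbc);   -- second block differs from the first
end;
/
```

Under ECB the repetition in the input is visible in the output, which gives an intruder a pattern to work with; under CBC each block depends on the one before it, so the repetition disappears.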
Hashing
So far, we have talked about a concept in which the input data is masked to protect it from prying eyes. However, this does not protect the data from being modified by an intruder. For instance, assume that we are transmitting the salary of the CIO and a disgruntled employee decides to change the salary from 10 million to 10 thousand by chopping off a few zeroes at the end. How does the receiving end know that this data has been altered? Or, how can we know that the data has not been altered?
To address the issue, a concept called a checksum is used. It has long been used to ensure filesystem integrity. The idea is for the sender to calculate a checksum value by applying some algorithm to the input value and to send it along with the data. Then, upon receiving the input value, the receiver applies the same algorithm and gets a checksum of his own. He compares the checksum he calculated with the one that came with the data; if they match, the data has not been tampered with. If they differ, then the data has been compromised. This checksum is often the hash value of the input value.
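The salary example can be sketched with the hash function of the dbms_crypto package (introduced later in this article). The message text is made up, SHA-1 is chosen only as an example algorithm, and execute privilege on dbms_crypto is assumed.

```sql
-- Sketch: detecting a change by comparing hash values (illustrative data).
-- Run with SET SERVEROUTPUT ON.
declare
   l_sent       varchar2 (100) := 'CIO salary=10000000';
   l_received   varchar2 (100) := 'CIO salary=10000';   -- zeroes chopped off in transit
   l_checksum   raw (20);                               -- SHA-1 produces 20 bytes
begin
   -- Sender computes the checksum and ships it along with the data
   l_checksum := dbms_crypto.hash (utl_raw.cast_to_raw (l_sent), dbms_crypto.hash_sh1);

   -- Receiver recomputes the hash on what arrived and compares the two
   if dbms_crypto.hash (utl_raw.cast_to_raw (l_received), dbms_crypto.hash_sh1) = l_checksum then
      dbms_output.put_line ('Data intact');
   else
      dbms_output.put_line ('Data has been altered');
   end if;
end;
/
```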
Consider the implications of the checksum concept — the input value itself is not masked, because the intention is not to prevent people from seeing it. The intent is to prevent people from modifying it. Of course, data can be encrypted as well; but the purpose of hashing is not to prevent exposure.
The second important concept to remember is that this hash value is calculated from the input value, but the input value cannot be calculated from the hash value. This is important to understand as it has uses in other applications. For instance, password management functions can store hash values in tables. When a user’s password has to be validated, the function can hash the user’s entry and compare the result to the stored hash, instead of comparing plain text. This way, the actual password is never stored or displayed and is never compromised. Oracle password management works the same way: it stores a one-way hash value in the password column of the USER$ table.
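A hash-based password check might look like the sketch below. It is not how Oracle itself hashes passwords; the password, the choice of SHA-1, and the variable standing in for a table column are all assumptions made for illustration, using the dbms_crypto package introduced later in this article.

```sql
-- Sketch: validate a password by comparing hashes, never plain text.
-- l_stored_hash stands in for a value kept in a table. Run with SET SERVEROUTPUT ON.
declare
   -- Computed once, when the password was set
   l_stored_hash   raw (20) := dbms_crypto.hash (utl_raw.cast_to_raw ('tiger'), dbms_crypto.hash_sh1);
   l_entered       varchar2 (30) := 'tiger';   -- what the user typed at login
begin
   -- Hash the entry and compare it to the stored hash
   if dbms_crypto.hash (utl_raw.cast_to_raw (l_entered), dbms_crypto.hash_sh1) = l_stored_hash then
      dbms_output.put_line ('Password accepted');
   else
      dbms_output.put_line ('Password rejected');
   end if;
end;
/
```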
MAC
Hashing provides an important function — validating the integrity of a piece of data. However, consider this scenario: An intruder might come to know the hashing algorithm used and then use it to calculate the hash value of the input data after changing it. The receiver gets the tampered input data as well as the changed hash value. Since the values match, the receiver will not be able to determine that the data has been compromised.
To prevent this situation, the concept known as a Message Authentication Code (MAC) was developed, which uses a “key” to calculate the hash value (known as the MAC value in this instance). The same key is used by the receiver to calculate the MAC value. Since the key is not likely to be known by the intruder, he cannot produce a matching MAC value for the altered data; the receiver’s calculated value will differ from the one received, and the compromise will be unearthed.
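As a rough sketch, the mac function of the dbms_crypto package (introduced later in this article) computes such a keyed value. The two keys and the message below are illustrative, HMAC with SHA-1 is picked only as an example, and execute privilege on dbms_crypto is assumed.

```sql
-- Sketch: a MAC depends on the key as well as the data (illustrative values).
-- Run with SET SERVEROUTPUT ON.
declare
   l_data        raw (100) := utl_raw.cast_to_raw ('CIO salary=10000000');
   l_real_key    raw (16) := utl_raw.cast_to_raw ('1234567890123456');   -- shared by sender and receiver
   l_guess_key   raw (16) := utl_raw.cast_to_raw ('6543210987654321');   -- intruder's guess
begin
   dbms_output.put_line ('MAC with real key   : '
      || rawtohex (dbms_crypto.mac (l_data, dbms_crypto.hmac_sh1, l_real_key)));
   dbms_output.put_line ('MAC with guessed key: '
      || rawtohex (dbms_crypto.mac (l_data, dbms_crypto.hmac_sh1, l_guess_key)));
end;
/
```

The two values differ even though the data and the algorithm are the same, so an intruder who alters the data cannot supply a MAC value that the receiver will accept.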
Implementing Encryption
With the conceptual foundation in place, let’s delve into the actual task of implementing an encryption system.
In order to implement an encryption system, you do not have to reinvent the wheel and create the algorithms; they are already available in supplied packages. Oracle provides a package called dbms_obfuscation_toolkit (DOTK) that contains several procedures and functions to allow you to achieve encryption. In Oracle 10g, a new package named dbms_crypto offers similar but improved functionality. Even though the older package is still available in Oracle 10g, it’s recommended that the dbms_crypto be used for all encryption related matters for the following reasons:
- Crypto provides AES encryption methods; DOTK does not. AES is now the de facto standard in many circumstances and is preferred, because of its increased security and speed.
- Crypto provides automatic padding. In DOTK, you have to provide padding of data smaller than the block length.
- Crypto provides PKCS#5 padding, which is cryptographically more secure than user-supplied padding (such as padding with zeroes).
And of course, DOTK may be deprecated soon.
Please bear in mind that Crypto (and DOTK, as well) offers symmetric encryption only (i.e., the same key is used to encrypt and decrypt). Asymmetric encryption, in which a public-private key pair is used, is not available in either package. To implement such functionality, you will need to use Oracle Advanced Security (OAS), an extra-cost option for the database. For most common applications, symmetric key encryption is adequate. We will focus on Crypto for the remainder of this article; OAS will be covered in a future article.
Encryption
From our earlier discussion on defining encryption, you know that to encrypt data, you need the following:
- An encryption algorithm
- An encryption key
- A padding method
- A chaining method
What you choose to use for the key depends on your choice of algorithm. Once you have identified the algorithm to use (say, AES128) you have to use a key that is 128 bits long. Crypto provides constants in the package to indicate different types of algorithms. Table 1 shows these constants with the algorithms and the key lengths they need.
Constant Name | Description | Effective Key Length (bits) |
ENCRYPT_DES | Data Encryption Standard. Faster but not secure. | 56 |
ENCRYPT_3DES_2KEY | Modified Triple Data Encryption Standard. Operates on a block 3 times with 2 keys. | 112 |
ENCRYPT_3DES | Triple Data Encryption Standard. Operates on a block 3 times with 3 keys. | 168 |
ENCRYPT_AES128 | Advanced Encryption Standard. | 128 |
ENCRYPT_AES192 | Advanced Encryption Standard. | 192 |
ENCRYPT_AES256 | Advanced Encryption Standard. | 256 |
ENCRYPT_RC4 | This is the only stream cipher. | |
Table 1: Encryption algorithms.
Once you choose your algorithm, you have to decide on what key to use. For example, if you have chosen AES128 as the algorithm, you have to choose a 128-bit key, and this key has to be of a RAW data type.
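Before encrypting, it can be worth confirming that the key really is the length the algorithm needs. The sketch below converts a text key to RAW and checks it against the 128 bits required for AES128; the error number and message are arbitrary choices for illustration.

```sql
-- Sketch: convert a text key to RAW and verify its length for AES128.
-- Error number and message are illustrative. Run with SET SERVEROUTPUT ON.
declare
   l_key_text   varchar2 (2000) := '1234567890123456';
   l_key        raw (2000);
begin
   l_key := utl_i18n.string_to_raw (l_key_text, 'AL32UTF8');
   dbms_output.put_line ('Key length in bits: ' || utl_raw.length (l_key) * 8);

   -- 128 bits = 16 bytes; a multi-byte character in the text would change the count
   if utl_raw.length (l_key) * 8 <> 128 then
      raise_application_error (-20001, 'AES128 needs a 128-bit key');
   end if;
end;
/
```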
Suppose we decide to use the AES scheme with 128-bit keys along with CBC chaining and PKCS#5 padding. The following piece of code performs the encryption of a data value “ConfidentialData” with the key as “1234567890123456.”
1 declare
2 l_key varchar2(2000) := '1234567890123456';
3 l_in_val varchar2(2000) := 'ConfidentialData';
4 l_mod number := dbms_crypto.ENCRYPT_AES128
5 + dbms_crypto.CHAIN_CBC
6 + dbms_crypto.PAD_PKCS5;
7 l_enc raw (2000);
8 begin
9 l_enc := dbms_crypto.encrypt
10 (
11 UTL_I18N.STRING_TO_RAW (l_in_val, 'AL32UTF8'),
12 l_mod,
13 UTL_I18N.STRING_TO_RAW (l_key, 'AL32UTF8')
14 );
15 dbms_output.put_line ('Encrypted='||l_enc);
16* end;
SQL> /
Encrypted=C0777257DFBF8BA9A4C1F724F921C43C70D0C0A94E2950BBB6BA2FE78695A6FC
PL/SQL procedure successfully completed.
SQL>
Let’s analyze the above piece of code line by line.
Line | Description |
2 | The key is defined here. As you can see, the key is exactly 16 characters (16 bytes, or 128 bits), which it must be for AES128 to work. |
3 | The input value, which needs to be encrypted. |
4-6 | These lines need some more explanation, which is provided at the end of this table. |
7 | We define a variable to hold the encrypted value. |
9 | The function encrypt is called. |
11 | The function expects the input value to be in RAW, not varchar2 as ours is. In this line, we convert it to RAW using the function string_to_raw in the supplied package utl_i18n. |
13 | As with the input value, the function also expects the key to be RAW as well. Hence we convert it here. |
15 | Finally, we display the encrypted value. However, in reality you will not display the value as it is meaningless; you will probably use it for something else, such as storing it. |
The encrypted value, also in RAW, is displayed as a hexadecimal string. These are the basic workings of the encrypt function. Now, let’s take a look at the lines we skipped in the previous discussion.
Remember, to encrypt you need the following components:
- An input value
- A key
- An algorithm
- A padding scheme
- A chaining scheme
In the previously cited piece of code, the input value and the key were provided in lines three and two respectively — but where are the rest of the necessary components?
The rest are all crowded into lines four through six. These options are specified in the package dbms_crypto as named constants. For instance, the AES 128-bit algorithm is specified by the constant ENCRYPT_AES128. Table 1 lists these encryption constants. Similarly, the CBC chaining method is specified by CHAIN_CBC and the PKCS#5 padding is specified by PAD_PKCS5. All three constants are added together into the variable l_mod, which is passed as the second argument of the function encrypt(). This is exactly how dbms_crypto expects to get the specifics of the algorithm, padding, and chaining. The combination of these three is known as a modifier to the function.
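Because the constants are ordinary numbers, you can print them to see how the modifier is put together. The sketch below makes no assumption about the actual values; it simply displays each constant and their sum (execute privilege on dbms_crypto is assumed).

```sql
-- Sketch: the modifier is just the sum of three numeric constants.
-- Run with SET SERVEROUTPUT ON.
declare
   l_mod   pls_integer;
begin
   l_mod := dbms_crypto.encrypt_aes128
          + dbms_crypto.chain_cbc
          + dbms_crypto.pad_pkcs5;

   dbms_output.put_line ('ENCRYPT_AES128 = ' || dbms_crypto.encrypt_aes128);
   dbms_output.put_line ('CHAIN_CBC      = ' || dbms_crypto.chain_cbc);
   dbms_output.put_line ('PAD_PKCS5      = ' || dbms_crypto.pad_pkcs5);
   dbms_output.put_line ('Modifier       = ' || l_mod);
end;
/
```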
The following is a list of padding modifiers available:
Constant | Description |
PAD_NONE | No padding is done. If the length of the data is exactly the same as the block size or a multiple thereof, this option is used. |
PAD_ZERO | The data is padded with zeroes to make it a multiple of the block size. |
PAD_PKCS5 | PKCS#5 based padding is done. This is the safest and the recommended approach. |
The following options are available for chaining:
Constant | Description |
CHAIN_ECB | Electronic Code Book standard |
CHAIN_CBC | Cipher Block Chaining method |
CHAIN_CFB | Cipher Feedback method |
CHAIN_OFB | Output Feedback |
If you wanted to use no padding and Electronic Code Book chaining, you could have defined the modifier in lines four through six as:
4 l_mod number := dbms_crypto.ENCRYPT_AES128
5 + dbms_crypto.CHAIN_ECB
6 + dbms_crypto.PAD_NONE;
Similarly you can specify any combination of algorithm, padding and chaining to suit your needs.
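One visible effect of the choice of padding is the size of the result. The sketch below encrypts the same 16-byte value with the same key under two different modifiers and prints the length of each output; the values are the ones used earlier in this article, and execute privilege on dbms_crypto is assumed.

```sql
-- Sketch: same input and key, two modifiers, different output lengths.
-- Run with SET SERVEROUTPUT ON.
declare
   l_in    raw (16) := utl_i18n.string_to_raw ('ConfidentialData', 'AL32UTF8');
   l_key   raw (16) := utl_i18n.string_to_raw ('1234567890123456', 'AL32UTF8');
begin
   -- PKCS#5 always adds at least one pad byte, so aligned input grows by one block
   dbms_output.put_line ('CBC + PKCS#5: '
      || utl_raw.length (dbms_crypto.encrypt (
            l_in,
            dbms_crypto.encrypt_aes128 + dbms_crypto.chain_cbc + dbms_crypto.pad_pkcs5,
            l_key))
      || ' bytes');

   -- With no padding, the output is the same length as the input
   dbms_output.put_line ('ECB + no pad: '
      || utl_raw.length (dbms_crypto.encrypt (
            l_in,
            dbms_crypto.encrypt_aes128 + dbms_crypto.chain_ecb + dbms_crypto.pad_none,
            l_key))
      || ' bytes');
end;
/
```

The first call returns 32 bytes, which matches the 64 hexadecimal characters in the output shown earlier, while the second returns 16.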
Coming next in part 2: Decryption, key management, key storage, and more.
--
Arup Nanda is the Lead DBA at Starwood Hotels & Resorts. He has been an Oracle DBA for more than 11 years, touching all aspects of database management — from modeling to performance tuning to disaster recovery. He is the co-author of the book Oracle Privacy Security Auditing (2003, Rampant Tech Press).
Contributors : Arup Nanda
Last modified 2006-01-06 02:03 PM