May 5, 2014
Protecting the OpenSSL Private Key in a Separate Process
Ever since Heartbleed, I've been thinking of ways to better isolate OpenSSL so that a vulnerability in OpenSSL won't result in the compromise of sensitive information. This blog post will describe how you can protect the private key by isolating OpenSSL private key operations in a dedicated process, a technique I'm using in titus, my open source high-isolation TLS proxy server.
If you're worried about OpenSSL vulnerabilities, then simply terminating TLS in a dedicated process, such as stunnel, is a start, since it isolates sensitive web server memory from OpenSSL, but there's still the tricky issue of your private key. OpenSSL needs access to the private key to perform decryption and signing operations. And it's not sufficient to isolate just the key: you must also isolate all intermediate calculations, as Akamai learned when their patch to store the private key on a "secure heap" was ripped to shreds by security researcher Willem Pinckaers.
Fortunately, OpenSSL's modular nature can be leveraged to outsource RSA private key operations (sign and decrypt) to user-defined functions, without having to modify OpenSSL itself. From these user-defined functions, it's possible to use inter-process communication to transfer the arguments to a different process, where the operation is performed, and then transfer the result back. This provides total isolation: the process talking to the network needs access to neither the private key nor any intermediate value resulting from the RSA calculations.
I'm going to show you how to do this. Note that, for clarity, the code presented here lacks proper error handling and resource management. For production quality code, you should look at the source for titus.
Traditionally, you initialize OpenSSL using code like the following:
SSL_CTX* ctx;
FILE* cert_filehandle;
FILE* key_filehandle;
// ... omitted: initialize CTX, open cert and key files ...
X509* cert = PEM_read_X509_AUX(cert_filehandle, NULL, NULL, NULL);
EVP_PKEY* key = PEM_read_PrivateKey(key_filehandle, NULL, NULL, NULL);
SSL_CTX_use_certificate(ctx, cert);
SSL_CTX_use_PrivateKey(ctx, key);
The first thing we do is replace the call to PEM_read_PrivateKey, which reads the private key into memory, with our own function that creates a shell of a private key with references to our own implementations of the sign and decrypt operations. Let's call that function make_private_key_shell:
EVP_PKEY* make_private_key_shell (X509* cert)
{
    EVP_PKEY* key = EVP_PKEY_new();
    RSA* rsa = RSA_new();

    // It's necessary for our shell to contain the public RSA values (n and e).
    // Grab them out of the certificate:
    RSA* public_rsa = EVP_PKEY_get1_RSA(X509_get_pubkey(cert));
    rsa->n = BN_dup(public_rsa->n);
    rsa->e = BN_dup(public_rsa->e);

    // Copy the default method table, then override the two private key
    // operations. (Initializing a static from a function call requires C++.)
    static RSA_METHOD ops = *RSA_get_default_method();
    ops.rsa_priv_dec = rsa_private_decrypt;
    ops.rsa_priv_enc = rsa_private_encrypt;
    RSA_set_method(rsa, &ops);

    EVP_PKEY_set1_RSA(key, rsa);
    return key;
}
The magic happens with the call to RSA_set_method. We pass it a struct of function pointers from which we reference our own implementations of the private decrypt and private encrypt (sign) operations. These implementations look something like this:
int sockpair[2]; // socket pair connected to the RSA private key process

int do_rsa_operation (char command, int flen, const unsigned char* from, unsigned char* to, RSA* rsa, int padding)
{
    // Send the request: command, padding, flen, then the input buffer
    write(sockpair[0], &command, sizeof(command));
    write(sockpair[0], &padding, sizeof(padding));
    write(sockpair[0], &flen, sizeof(flen));
    write(sockpair[0], from, flen);

    // Read the response: the result length, then the result itself
    int to_len;
    read(sockpair[0], &to_len, sizeof(to_len));
    if (to_len > 0) {
        read(sockpair[0], to, to_len);
    }
    return to_len;
}

int rsa_private_decrypt (int flen, const unsigned char* from, unsigned char* to, RSA* rsa, int padding)
{
    return do_rsa_operation(1, flen, from, to, rsa, padding);
}

int rsa_private_encrypt (int flen, const unsigned char* from, unsigned char* to, RSA* rsa, int padding)
{
    return do_rsa_operation(2, flen, from, to, rsa, padding);
}
The arguments and results are sent to and from the other process over a socket pair that has been previously opened. Our message format is simply:
uint8_t command; // 1 for decrypt, 2 for sign
int padding; // the padding argument
int flen; // the flen argument
unsigned char from[flen]; // the from argument
The response format is:
int to_len; // length of result buffer (to)
unsigned char to[to_len]; // the result buffer
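To make the request format concrete, here is a minimal sketch of packing and unpacking a request in a flat buffer. The names encode_request, decode_request, REQ_HEADER_LEN, and the max_flen bound are assumptions made for illustration and do not come from titus. The integers are copied in host byte order, which is only reasonable because both processes run on the same machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: size of the fixed header (command, padding, flen). */
#define REQ_HEADER_LEN (sizeof(uint8_t) + 2 * sizeof(int))

/* Pack a request into buf. Returns the number of bytes written,
 * or 0 if flen is negative or buf is too small. */
size_t encode_request(unsigned char *buf, size_t buf_len, uint8_t command,
                      int padding, int flen, const unsigned char *from)
{
    if (flen < 0 || buf_len < REQ_HEADER_LEN + (size_t)flen)
        return 0;
    size_t off = 0;
    buf[off++] = command;
    memcpy(buf + off, &padding, sizeof(padding));
    off += sizeof(padding);
    memcpy(buf + off, &flen, sizeof(flen));
    off += sizeof(flen);
    memcpy(buf + off, from, (size_t)flen);
    off += (size_t)flen;
    return off;
}

/* Unpack a request. Returns a pointer to the payload inside buf, or NULL
 * if the command is unknown, flen is negative or larger than max_flen,
 * or the buffer is shorter than the header claims. */
const unsigned char *decode_request(const unsigned char *buf, size_t buf_len,
                                    int max_flen, uint8_t *command,
                                    int *padding, int *flen)
{
    if (buf_len < REQ_HEADER_LEN)
        return NULL;
    size_t off = 0;
    *command = buf[off++];
    memcpy(padding, buf + off, sizeof(*padding));
    off += sizeof(*padding);
    memcpy(flen, buf + off, sizeof(*flen));
    off += sizeof(*flen);
    if (*command != 1 && *command != 2)
        return NULL;
    if (*flen < 0 || *flen > max_flen)
        return NULL;
    if (buf_len - off < (size_t)*flen)
        return NULL;
    return buf + off;
}
```

In the key process, max_flen would plausibly be RSA_size(rsa), so that a compromised parent cannot request an arbitrarily large allocation.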
Here's the code to open the socket pair and run the RSA private key process:
void run_rsa_process (const char* key_path)
{
    socketpair(AF_UNIX, SOCK_STREAM, 0, sockpair);
    if (fork() == 0) {
        // Child (trusted) process: load the key and service requests
        close(sockpair[0]);
        FILE* key_filehandle = fopen(key_path, "r");
        RSA* rsa = PEM_read_RSAPrivateKey(key_filehandle, NULL, NULL, NULL);
        fclose(key_filehandle);

        // The command is a single byte, matching what the parent writes
        char command;
        while (read(sockpair[1], &command, sizeof(command)) == 1) {
            int padding;
            int flen;
            read(sockpair[1], &padding, sizeof(padding));
            read(sockpair[1], &flen, sizeof(flen));
            unsigned char* from = (unsigned char*)malloc(flen);
            read(sockpair[1], from, flen);

            unsigned char* to = (unsigned char*)malloc(RSA_size(rsa));
            int to_len = -1;
            if (command == 1) {
                to_len = RSA_private_decrypt(flen, from, to, rsa, padding);
            } else if (command == 2) {
                to_len = RSA_private_encrypt(flen, from, to, rsa, padding);
            }

            // Send back the result length, then to_len bytes of result
            write(sockpair[1], &to_len, sizeof(to_len));
            if (to_len > 0) {
                write(sockpair[1], to, to_len);
            }
            free(to);
            free(from);
        }
        _exit(0);
    }
    // Parent (untrusted) process keeps only its end of the socket pair
    close(sockpair[1]);
}
In the function above, we first create a socket pair for communicating between the parent (untrusted) process and child (trusted) process. We fork, and in the child process, we load the RSA private key, and then repeatedly service RSA private key operations received over the socket pair from the parent process. Only the child process, which never talks to the network, has the private key in memory. If the memory of the parent process, which does talk to the network, is ever compromised, the private key is safe.
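One piece of error handling the simplified code leaves out: read and write on a stream socket may transfer fewer bytes than requested, so a sturdier version would loop until the whole message has been moved. The following is a minimal sketch of such helpers; read_full and write_full are hypothetical names chosen for illustration, not functions from titus or OpenSSL.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from fd.
 * Returns 0 on success, -1 on error or if EOF arrives first. */
int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = (unsigned char *)buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;          /* error, or peer closed the socket */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: write exactly len bytes to fd.
 * Returns 0 on success, -1 on error. */
int write_full(int fd, const void *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Each bare read or write call in the code above could then be replaced by the matching helper, with a -1 return treated as a failed RSA operation in the parent and as a reason to exit in the child.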
That's the basic idea, and it works. There are other ways to do the interprocess communication that are more complicated but may be more efficient, such as using shared memory to transfer the arguments and results back and forth. But the socket pair implementation is conceptually simple and a good starting point for further improvements.
This is one of the techniques I'm using in titus to achieve total isolation of the part of OpenSSL that talks to the network. However, this is only part of the story. While this technique protects your private key against a memory disclosure bug like Heartbleed, it doesn't prevent other sensitive data from leaking. It also doesn't protect against more severe vulnerabilities, such as remote code execution. Remote code execution could be used to attack the trusted child process (such as by ptracing it and dumping its memory) or your system as a whole. titus protects against this using additional techniques like chrooting and privilege separation.
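As an illustration of the ptrace concern only, and not a description of what titus does, here is a minimal Linux-specific sketch. The helper name harden_key_process is hypothetical. It marks the calling process as non-dumpable with prctl, which disables core dumps and prevents unprivileged processes from attaching to it with ptrace. It does not stop root or a process holding CAP_SYS_PTRACE, so it complements chrooting and privilege separation rather than replacing them.

```c
#include <sys/prctl.h>

/* Hypothetical helper (Linux only): intended to be called in the trusted
 * child process before it loads the private key.
 * Returns 0 on success, -1 on failure. */
int harden_key_process(void)
{
    /* Clear the "dumpable" flag: no core dumps, and unprivileged
     * processes can no longer attach with ptrace. */
    return prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
}
```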
My next blog post will go into detail on titus' other isolation techniques. Follow me on Twitter, or subscribe to my blog's Atom feed, so you know when it's posted.
Update: Read part two of this blog post.