Blog (Page 10) - Andrew Ayer

Blog

July 13, 2014

LibreSSL's PRNG is Unsafe on Linux [Update: LibreSSL fork fix]

The first version of LibreSSL portable, 2.0.0, was released a few days ago (followed soon after by 2.0.1). Despite the 2.0.x version numbers, these are only preview releases and shouldn't be used in production yet, but have been released to solicit testing and feedback. After testing and examining the codebase, my feedback is that the LibreSSL PRNG is not robust on Linux and is less safe than the OpenSSL PRNG that it replaced.

Consider a test program, fork_rand. When linked with OpenSSL, two different calls to RAND_bytes return different data, as expected:

$ cc -o fork_rand fork_rand.c -lcrypto $ ./fork_rand Grandparent (PID = 2735) random bytes = f05a5e107f5ec880adaeead26cfff164e778bab8e5a44bdf521e1445a5758595 Grandchild (PID = 2735) random bytes = 03688e9834f1c020765c8c5ed2e7a50cdd324648ca36652523d1d71ec06199de

When the same program is linked with LibreSSL, two different calls to RAND_bytes return the same data, which is a catastrophic failure of the PRNG:

$ cc -o fork_rand fork_rand.c libressl-2.0.1/crypto/.libs/libcrypto.a -lrt $ ./fork_rand Grandparent (PID = 2728) random bytes = f5093dc49bc9527d6d8c3864be364368780ae1ed190ca0798bf2d39ced29b88c Grandchild (PID = 2728) random bytes = f5093dc49bc9527d6d8c3864be364368780ae1ed190ca0798bf2d39ced29b88c

The problem is that LibreSSL provides no way to safely use the PRNG after a fork. Forking and PRNGs are a thorny issue - since fork() creates a nearly-identical clone of the parent process, a PRNG will generate identical output in the parent and child processes unless it is reseeded. LibreSSL attempts to detect when a fork occurs by checking the PID (see line 122). If it differs from the last PID seen by the PRNG, it knows that a fork has occurred and automatically reseeds.

This works most of the time. Unfortunately, PIDs are typically only 16 bits long and thus wrap around fairly often. And while a process can never have the same PID as its parent, a process can have the same PID as its grandparent. So a program that forks from a fork risks generating the same random data as the grandparent process. This is what happens in the fork_rand program, which repeatedly forks from a fork until it gets the same PID as the grandparent.

OpenSSL faces the same issue. It too attempts to be fork-safe, by mixing the PID into the PRNG's output, which works as long as PIDs don't wrap around. The difference is that OpenSSL provides a way to explicitly reseed the PRNG by calling RAND_poll. LibreSSL, unfortunately, has turned RAND_poll into a no-op (lines 77-81). fork_rand calls RAND_poll after forking, as do all my OpenSSL-using programs in production, which is why fork_rand is safe under OpenSSL but not LibreSSL.

You may think that fork_rand is a contrived example or that it's unlikely in practice for a process to end up with the same PID as its grandparent. You may be right, but for security-critical code this is not a strong enough guarantee. Attackers often find extremely creative ways to manufacture scenarios favorable for attacks, even when those scenarios are unlikely to occur under normal circumstances.

Bad chroot interaction

A separate but related problem is that LibreSSL provides no good way to use the PRNG from a process running inside a chroot jail. Under Linux, the PRNG is seeded by reading from /dev/urandom upon the first use of RAND_bytes. Unfortunately, /dev/urandom usually doesn't exist inside chroot jails. If LibreSSL fails to read entropy from /dev/urandom, it first tries to get random data using the deprecated sysctl syscall, and if that fails (which will start happening once sysctl is finally removed), it falls back to a truly scary-looking function (lines 306-517) that attempts to get entropy from sketchy sources such as the PID, time of day, memory addresses, and other properties of the running process.

OpenSSL is safer for two reasons:

If OpenSSL can't open /dev/urandom, RAND_bytes returns an error code. Of course the programmer has to check the return value, which many probably don't, but at least OpenSSL allows a competent programmer to use it securely, unlike LibreSSL which will silently return sketchy entropy to even the most meticulous programmer.
OpenSSL allows you to explicitly seed the PRNG by calling RAND_poll, which you can do before entering the chroot jail, avoiding the need to open /dev/urandom once in the jail. Indeed, this is how titus ensures it can use the PRNG from inside its highly-isolated chroot jail. Unfortunately, as discussed above, LibreSSL has turned RAND_poll into a no-op.

What should LibreSSL do?

First, LibreSSL should raise an error if it can't get a good source of entropy. It can do better than OpenSSL by killing the process instead of returning an easily-ignored error code. In fact, there is already a disabled code path in LibreSSL (lines 154-156) that does this. It should be enabled.

Second, LibreSSL should make RAND_poll reseed the PRNG as it does under OpenSSL. This will allow the programmer to guarantee safe and reliable operation after a fork and inside a chroot jail. This is especially important as LibreSSL aims to be a drop-in replacement for OpenSSL. Many properly-written programs have come to rely on OpenSSL's RAND_poll behavior for safe operation, and these programs will become less safe when linked with LibreSSL.

Unfortunately, when I suggested the second change on Hacker News, a LibreSSL developer replied:

The presence or need for a [RAND_poll] function should be considered a serious design flaw.

I agree that in a perfect world, RAND_poll would not be necessary, and that its need is evidence of a design flaw. However, it is evidence of a design flaw not in the cryptographic library, but in the operating system. Unfortunately, Linux provides no reliable way to detect that a process has forked, and exposes entropy via a device file instead of a system call. LibreSSL has to work with what it's given, and on Linux that means RAND_poll is an unfortunate necessity.

Workaround

If the LibreSSL developers don't fix RAND_poll, and you want your code to work safely with both LibreSSL and OpenSSL, then I recommend putting the following code after you fork or before you chroot (i.e. anywhere you would currently need RAND_poll):

unsigned char c;
if (RAND_poll() != 1) {
	/* handle error */
}
if (RAND_bytes(&c, 1) != 1) {
	/* handle error */
}

In essence, always follow a call to RAND_poll with a request for one random byte. The RAND_bytes call will force LibreSSL to seed the PRNG if it's not already seeded, ~~making it unnecessary to later open /dev/urandom from inside the chroot jail~~. It will also force LibreSSL to update the last seen PID, fixing the grandchild PID issue. (Edit: the LibreSSL PRNG periodically re-opens and re-reads /dev/urandom to mix in additional entropy, so unfortunately this won't avoid the need to open /dev/urandom from inside the chroot jail. However, as long as you have a good initial source of entropy, mixing in the sketchy entropy later isn't terrible.)

I really hope it doesn't come to this. Programming with OpenSSL already requires dodging numerous traps and pitfalls, often by deploying obscure workarounds. The LibreSSL developers, through their well-intended effort to eliminate the pitfall of forgetting to call RAND_poll, have actually created a whole new pitfall with its own obscure workaround.

Update (2014-07-16 03:33 UTC): LibreSSL releases fix for fork issue

LibreSSL has released a fix for the fork issue! (Still no word on the chroot/sketchy entropy issue.) Their fix is to use pthread_atfork to register a callback that reseeds the PRNG when fork() is called. Thankfully, they've made this work without requiring the program to link with -lpthread.

I have mixed feelings about this solution, which was discussed in a sub-thread on Hacker News. The fix is a huge step in the right direction but is not perfect - a program that invokes the clone syscall directly will bypass the atfork handlers (Hacker News commenter colmmacc suggests some legitimate reasons a program might do this). I still wish that LibreSSL would, in addition to implementing this solution, just expose an explicit way for the programmer to reseed the PRNG when unusual circumstances require it. This is particularly important since OpenSSL provides this facility and LibreSSL is meant to be a drop-in OpenSSL replacement.

Finally, though I was critical in this blog post, I really appreciate the work the LibreSSL devs are doing, especially their willingness to solicit feedback from the community and act on it. (I also appreciate their willingness to make LibreSSL work on Linux, which, despite being a Linux user, I will readily admit is lacking in several ways that make a CSPRNG implementation difficult.) Ultimately their work will lead to better security for everyone.

Comments

June 27, 2014

xbox.com IPv6 Broken, Buggy DNS to Blame

I've previously discussed the problems caused by buggy DNS servers that don't implement IPv6-related queries properly. The worst problem I've faced is that, by default, F5 Network's "BIG-IP GTM" appliance doesn't respond to AAAA queries for certain records, causing lengthy delays as IPv6-capable resolvers continue to send AAAA queries before finally timing out.

Now www.xbox.com is exhibiting broken AAAA behavior, and it looks like an F5 "GTM" appliance may be to blame. www.xbox.com is a CNAME for www.gtm.xbox.com, which is itself a CNAME for wildcard.xbox.com-c.edgekey.net (which is in turn a CNAME for another CNAME, but that's not important here). The nameservers for gtm.xbox.com ({ns1-bn, ns1-qy, ns2-bn, ns2-qy}.gtm.xbox.com), contrary to the DNS spec, do not return the CNAME record when queried for the AAAA record for www.gtm.xbox.com:

$ dig www.gtm.xbox.com. AAAA @ns1-bn.gtm.xbox.com. ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> www.gtm.xbox.com. AAAA @ns1-bn.gtm.xbox.com. ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44411 ;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 ;; WARNING: recursion requested but not available ;; QUESTION SECTION: ;www.gtm.xbox.com. IN AAAA ;; Query time: 137 msec ;; SERVER: 134.170.28.97#53(134.170.28.97) ;; WHEN: Tue Jun 24 14:59:58 2014 ;; MSG SIZE rcvd: 34

Contrast to an A record query, which properly returns the CNAME:

$ dig www.gtm.xbox.com. A @ns1-bn.gtm.xbox.com. ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> www.gtm.xbox.com. A @ns1-bn.gtm.xbox.com. ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 890 ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0 ;; WARNING: recursion requested but not available ;; QUESTION SECTION: ;www.gtm.xbox.com. IN A ;; ANSWER SECTION: www.gtm.xbox.com. 30 IN CNAME wildcard.xbox.com-c.edgekey.net. ;; Query time: 115 msec ;; SERVER: 134.170.28.97#53(134.170.28.97) ;; WHEN: Tue Jun 24 15:04:34 2014 ;; MSG SIZE rcvd: 79

Consequentially, any IPv6-only host attempting to resolve www.xbox.com will fail, even though www.xbox.com has IPv6 connectivity and ultimately (once you follow 4 CNAMEs) has an AAAA record. You can witness this by running ping6 www.xbox.com from a Linux box (but flush your DNS caches first). Fortunately, IPv6-only hosts are rare, and dual-stack or IPv4-only hosts won't have a problem resolving www.xbox.com because they'll try resolving the A record. Nevertheless, this buggy behavior is extremely troubling, especially when the bug is with such simple logic (it doesn't matter what kind of record is being requested: if there's a CNAME, you return it), and is bound to cause headaches during the IPv6 transition.

I can't tell for sure what DNS implementation is powering the gtm.xbox.com nameservers, but considering that "GTM" stands for "Global Traffic Manager," F5's DNS appliance product, which has a history of AAAA record bugginess, I have a hunch...

Comments

June 27, 2014

Titus Isolation Techniques, Continued

In my previous blog post, I discussed the unique way in which titus, my high-security TLS proxy server, isolates the TLS private key in a separate process to protect it against Heartbleed-like vulnerabilities in OpenSSL. In this blog post, I will discuss the other isolation techniques used by titus to guard against OpenSSL vulnerabilities.

A separate process for every connection

The most basic isolation performed by titus is using a new and dedicated process for every TLS connection. This ensures that a vulnerability in OpenSSL can't be used to compromise the memory of another connection. Although most of the attention around Heartbleed focused on extracting the private key, Heartbleed also exposed other sensitive information, such as user passwords contained in buffers from other connections. By giving each TLS connection a new and dedicated process, titus confines an attacker to accessing memory related to only his own connection. Such memory would contain at most the session key material for the connection and buffers from previous packets, which is information the attacker already knows.

Currently titus forks but does not call execve(). This simplifies the code greatly, but it means that the child process has access to all the memory of the parent process at the time of the fork. Therefore, titus is careful to avoid loading anything sensitive into the parent's memory. In particular, the private key is never loaded by the parent process.

However, some low-grade sensitive information may be loaded into the parent's memory before forking. For instance, the parent process calls getpwnam() during initialization, so the contents of /etc/passwd may persist in memory and be accessible by child processes. A future version of titus should probably call execve() to launch the child process so it starts off with a clean slate.

Another important detail is that titus must reinitialize OpenSSL's random number generator by calling RAND_poll() in the child process after forking. Failure to do so could result in multiple children generating the same random numbers, which would have catastrophic consequences.

Privilege separation

To protect against arbitrary code execution vulnerabilities, titus runs as dedicated non-root users. Currently two users are used: one for running the processes that talk to the network, and another for the processes that hold the private key.

Using the same user for all connections has security implications. The most serious problem is that, by default, users can use the ptrace() system call to access the memory of other processes running as the same user. To prevent this, titus disables ptracing using the PR_SET_DUMPABLE option to the prctl() syscall. This is an imperfect security measure: it's Linux-specific, and doesn't prevent attackers from disrupting other processes by sending signals.

Ultimately, titus should use a separate user for every concurrent connection. Modern Unix systems use 32 bit UIDs, making it completely feasible to allocate a range of UIDs to be used by titus, provided that titus reuses UIDs for future connections. To reuse a UID securely, titus would need to first kill off any latent process owned by that user. Otherwise, an attacker could fork a process that lies in wait until the UID is reused. Unfortunately, Unix provides no airtight way to kill all processes owned by a user. It may be necessary to leverage cgroups or PID namespaces, which are unfortunately Linux-specific.

Filesystem isolation

Finally, in order to reduce the attack surface even further, titus chroots into an empty, unwritable directory. Thus, an attacker who can execute arbitrary code is unable to read sensitive files, attack special files or setuid binaries, or download rootkits to the filesystem. Note that titus chroots after reseeding OpenSSL's RNG, so it's not necessary include /dev/urandom in the chroot directory.

Future enhancements

In addition to the enhancements mentioned above, I'd like to investigate using seccomp filtering to limit the system calls which titus is allowed to execute. Limiting titus to a minimal set of syscalls would reduce the attack surface on the kernel, preventing an attacker from breaking out of the sandbox if there's a kernel vulnerability in a particular syscall.

I'd also like to investigate network and process namespaces. Network namespaces would isolate titus from the network, preventing attackers from launching attacks on systems on your internal network or on the Internet. Process namespaces would provide an added layer of isolation and make it easy to kill off latent processes when a connection ends.

Why titus?

The TLS protocol is incredibly complicated, which makes TLS implementations necessarily complex, which makes them inevitably prone to security vulnerabilities. If you're building a simple server application that needs to talk TLS, the complexity of the TLS implementation is going to dwarf the complexity of your own application. Even if your own code is securely written and short and simple enough to be easily audited, your application may nevertheless be vulnerable if you link with a TLS implementation. Titus provides a way to isolate the TLS implementation, so its complexity doesn't affect the security of your own application.

By the way, titus was recently discussed on the Red Hat Security Blog along with some other interesting approaches to OpenSSL privilege separation, such as sslps, a seccomp-based approach. The blog post is definitely worth a read.

Comments

May 5, 2014

Protecting the OpenSSL Private Key in a Separate Process

Ever since Heartbleed, I've been thinking of ways to better isolate OpenSSL so that a vulnerability in OpenSSL won't result in the compromise of sensitive information. This blog post will describe how you can protect the private key by isolating OpenSSL private key operations in a dedicated process, a technique I'm using in titus, my open source high-isolation TLS proxy server.

If you're worried about OpenSSL vulnerabilities, then simply terminating TLS in a dedicated process, such as stunnel, is a start, since it isolates sensitive web server memory from OpenSSL, but there's still the tricky issue of your private key. OpenSSL needs access to the private key to perform decryption and signing operations. And it's not sufficient to isolate just the key: you must also isolate all intermediate calculations, as Akamai learned when their patch to store the private key on a "secure heap" was ripped to shreds by security researcher Willem Pinckaers.

Fortunately, OpenSSL's modular nature can be leveraged to out-source RSA private key operations (sign and decrypt) to user-defined functions, without having to modify OpenSSL itself. From these user-defined functions, it's possible to use inter-process communication to transfer the arguments to a different process, where the operation is performed, and then transfer the result back. This provides total isolation: the process talking to the network needs access to neither the private key nor any intermediate value resulting from the RSA calculations.

I'm going to show you how to do this. Note that, for clarity, the code presented here lacks proper error handling and resource management. For production quality code, you should look at the source for titus.

Traditionally, you initialize OpenSSL using code like the following:

SSL_CTX*	ctx;
FILE*		cert_filehandle;
FILE*		key_filehandle;
// ... omitted: initialize CTX, open cert and key files ...
X509*		cert = PEM_read_X509_AUX(cert_filehandle, NULL, NULL, NULL);
EVP_PKEY*	key = PEM_read_PrivateKey(key_filehandle, NULL, NULL, NULL);
SSL_CTX_use_certificate(ctx, cert);
SSL_CTX_use_PrivateKey(ctx, key);

The first thing we do is replace the call to PEM_read_PrivateKey, which reads the private key into memory, with our own function that creates a shell of a private key with references to our own implementations of the sign and decrypt operations. Let's call that function make_private_key_shell:

EVP_PKEY* make_private_key_shell (X509* cert)
{
	EVP_PKEY* key = EVP_PKEY_new();
	RSA*      rsa = RSA_new();

	// It's necessary for our shell to contain the public RSA values (n and e).
	// Grab them out of the certificate:
	RSA*      public_rsa = EVP_PKEY_get1_RSA(X509_get_pubkey(crt));
	rsa->n = BN_dup(public_rsa->n);
	rsa->e = BN_dup(public_rsa->e);

	static RSA_METHOD  ops = *RSA_get_default_method();
	ops.rsa_priv_dec = rsa_private_decrypt;
	ops.rsa_priv_enc = rsa_private_encrypt;
	RSA_set_method(rsa, &ops);

	EVP_PKEY_set1_RSA(key, rsa);
	return key;
}

The magic happens with the call to RSA_set_method. We pass it a struct of function pointers from which we reference our own implementations of the private decrypt and private encrypt (sign) operations. These implementations look something like this:

int rsa_private_decrypt (int flen, const unsigned char* from, unsigned char* to, RSA* rsa, int padding)
{
	do_rsa_operation(1, flen, from, to, rsa, padding);
}

int rsa_private_encrypt (int flen, const unsigned char* from, unsigned char* to, RSA* rsa, int padding)
{
	do_rsa_operation(2, flen, from, to, rsa, padding);
}

int do_rsa_operation (char command, int flen, const unsigned char* from, unsigned char* to, RSA* rsa, int padding)
{
	write(sockpair[0], &command, sizeof(command));
	write(sockpair[0], &padding, sizeof(padding));
	write(sockpair[0], &flen, sizeof(flen));
	write(sockpair[0], from, flen);

	int to_len;
	read(sockpair[0], &to_len, sizeof(to_len));
	if (to_len > 0) {
		read(sockpair[0], to, to_len);
	}

	return to_len;
}

The arguments and results are sent to and from the other process over a socket pair that has been previously opened. Our message format is simply:

uint8_t command; // 1 for decrypt, 2 for sign
int padding; // the padding argument
int flen; // the flen argument
unsigned char from[flen]; // the from argument

The response format is:

int to_len; // length of result buffer (to)
unsigned char to[to_len]; // the result buffer

Here's the code to open the socket pair and run the RSA private key process:

void run_rsa_process (const char* key_path)
{
	socketpair(AF_UNIX, SOCK_STREAM, 0, sockpair);
	if (fork() == 0) {
		close(sockpair[0]);
		FILE* key_filehandle = fopen(key_path, "r");
		RSA*  rsa = PEM_read_RSAPrivateKey(key_filehandle, NULL, NULL, NULL);
		fclose(key_filehandle);

		int command;
		while (read(sockpair[1], &command, sizeof(command)) == 1) {
			int padding;
			int flen;
			read(sockpair[1], &padding, sizeof(padding));
			read(sockpair[1], &flen, sizeof(flen));
			unsigned char* from = (unsigned char*)malloc(flen);
			read(sockpair[1], from, flen);
			unsigned char* to = (unsigned char*)malloc(RSA_size(rsa));
			int to_len = -1;
			if (command == 1) {
				to_len = RSA_private_decrypt(flen, from, to, rsa, padding);
			} else if (command == 2) {
				to_len = RSA_private_encrypt(flen, from, to, rsa, padding);
			}
			write(sockpair[1], &to_len, sizeof(to_len));
			if (to_len > 0) {
				write(sockpair[1], to, sizeof(to_len));
			}
			free(to);
			free(from);
		}
		_exit(0);
	}
	close(sockpair[1]);
}

In the function above, we first create a socket pair for communicating between the parent (untrusted) process and child (trusted) process. We fork, and in the child process, we load the RSA private key, and then repeatedly service RSA private key operations received over the socket pair from the parent process. Only the child process, which never talks to the network, has the private key in memory. If the memory of the parent process, which does talk to the network, is ever compromised, the private key is safe.

That's the basic idea, and it works. There are other ways to do the interprocess communication that are more complicated but may be more efficient, such as using shared memory to transfer the arguments and results back and forth. But the socket pair implementation is conceptually simple and a good starting point for further improvements.

This is one of the techniques I'm using in titus to achieve total isolation of the part of OpenSSL that talks to the network. However, this is only part of the story. While this technique protects your private key against a memory disclosure bug like Heartbleed, it doesn't prevent other sensitive data from leaking. It also doesn't protect against more severe vulnerabilities, such as remote code execution. Remote code execution could be used to attack the trusted child process (such as by ptracing it and dumping its memory) or your system as a whole. titus protects against this using additional techniques like chrooting and privilege separation.

My next blog post will go into detail on titus' other isolation techniques. Follow me on Twitter, or subscribe to my blog's Atom feed, so you know when it's posted.

Update: Read part two of this blog post.

Comments

April 8, 2014

Responding to Heartbleed: A script to rekey SSL certs en masse

Because of the Heartbleed vulnerability in OpenSSL, I'm treating all of my private SSL keys as compromised and regenerating them. Fortunately, certificate authorities will reissue a certificate for free that signs a new key and is valid for the remaining time on the original certificate.

Unfortunately, using the openssl commands by hand to rekey dozens of SSL certificates is really annoying and is not my idea of a good time. So, I wrote a shell script called openssl-rekey to automate the process. openssl-rekey takes any number of certificate files as arguments, and for each one, generates a new private key of the same length as the original key, and a new CSR with the same common name as the original cert.

If you have a directory full of certificates, it's easy to run openssl-rekey on all of them with find and xargs:

$ find -name '*.crt' -print0 | xargs -0 /path/to/openssl-rekey

Once you've done this, you just need to submit the .csr files to your certificate authority, and then install the new .key and .crt files on your servers.

By the way, if you're like me and hate dealing with openssl commands and cumbersome certificate authority websites, you should check out my side project, SSLMate, which makes buying certificates as easy as running sslmate buy www.example.com 2 and reissuing certificates as easy as running sslmate reissue www.example.com. I was able to reissue each of my SSLMate certs in under a minute. As my old certs expire I'm replacing them with SSLMate certs, and that cannot happen soon enough.

Comments

Older Posts Newer Posts

Andrew Ayer

Sections

Blog

LibreSSL's PRNG is Unsafe on Linux [Update: LibreSSL fork fix]

Bad chroot interaction

What should LibreSSL do?

Workaround

Update (2014-07-16 03:33 UTC): LibreSSL releases fix for fork issue

xbox.com IPv6 Broken, Buggy DNS to Blame

Titus Isolation Techniques, Continued

A separate process for every connection

Privilege separation

Filesystem isolation

Future enhancements

Why titus?

Protecting the OpenSSL Private Key in a Separate Process

Responding to Heartbleed: A script to rekey SSL certs en masse