May 30, 2020
Fixing the Breakage from the AddTrust External CA Root Expiration
A lot of stuff on the Internet is currently broken on account of a Sectigo root certificate expiring at 10:48:38 UTC today. Generally speaking, this is affecting older, non-browser clients (notably OpenSSL 1.0.x) which talk to TLS servers which serve a Sectigo certificate chain ending in the expired certificate. See also this Twitter thread by Ryan Sleevi.
This post is going to explain what you should do to avoid problems, from the perspectives of both server operators (tldr: test your server with What's My Chain Cert? and do what it says) and client operators (tldr: upgrade your TLS libraries if possible, otherwise remove `AddTrust External CA Root` from your trust store).
Quick primer on certificate chains
When you connect to a TLS server, the server sends the client a certificate that proves its identity. The client needs to build a chain of certificates from the server certificate to a root certificate that the client trusts. To help the client build this chain, the server sends back one or more intermediate certificates after its own certificate.
For example, my website sends the following two certificates:
Subject | Issuer | Expiration |
---|---|---|
www.agwa.name | Sectigo RSA Domain Validation Secure Server CA | 2021-04-03 |
Sectigo RSA Domain Validation Secure Server CA | USERTrust RSA Certification Authority | 2030-12-31 |
The first certificate is mine and is issued by `Sectigo RSA Domain Validation Secure Server CA`. The second certificate is `Sectigo RSA Domain Validation Secure Server CA` and is issued by `USERTrust RSA Certification Authority`, which is a root certificate. These two certificates form a complete chain to a trusted root.
However, `USERTrust RSA Certification Authority` is a relatively new root. It was created in 2010, and it took many years for it to become trusted by all clients. As recently as last year I heard reports of clients not trusting this root.
For this reason, some servers send back a chain with an additional intermediate certificate:
Subject | Issuer | Expiration |
---|---|---|
www.agwa.name | Sectigo RSA Domain Validation Secure Server CA | 2021-04-03 |
Sectigo RSA Domain Validation Secure Server CA | USERTrust RSA Certification Authority | 2030-12-31 |
USERTrust RSA Certification Authority | AddTrust External CA Root | 2020-05-30 |
This sequence of certificates forms a chain to another root called `AddTrust External CA Root`, which was created in 2000 and is trusted by many client platforms. Or rather, it was trusted before it expired today.
Fortunately, modern clients with well-written certificate validators (this includes all mainstream web browsers) won't have a problem with the expiration. Since they trust the `USERTrust RSA Certification Authority` root, they will build a chain to that root and ignore the fact that the server sent an expired intermediate certificate.
Other clients, notably anything using OpenSSL 1.0.x or GnuTLS, will have a problem. Even if these clients trust the `USERTrust RSA Certification Authority` root, and could build a chain to it if they wanted, they'll end up building a chain to `AddTrust External CA Root` instead, causing the certificate validation to fail with an expired certificate error.
Fixing this problem as a server operator
Basically, you need to remove the intermediate certificate issued by `AddTrust External CA Root` from your certificate chain.
If you get your certificates from SSLMate, you don't need to worry. I saw this coming over a year ago, and configured SSLMate to start providing a chain without `AddTrust External CA Root`. As certificates renewed, SSLMate customers received the new chain, and since SSLMate has long capped certificate lifetimes at one year, the older chain was cycled out before the intermediate expired.
But if your server is using Sectigo certificates from another source, you might need to worry. You can quickly test if your server is affected using What's My Chain Cert?. If your server is OK, it will say "correct chain". If it's sending the expired intermediate, it will say "trusted chain containing an expired certificate" and provide you with a link to download a correct, non-expired chain.
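For a rough local version of that check, the sketch below flags any certificate in a chain whose validity period has ended. It is only an illustration, not how What's My Chain Cert? works: `expiredSubjects`, `exampleChain`, and `mustTime` are hypothetical names, the chain is built by hand from the table earlier in this post rather than fetched from a server, and the midnight UTC expiration times are assumptions (the table only gives dates).

```go
package main

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"time"
)

// mustTime parses an RFC 3339 timestamp and panics on malformed input.
// It is a hypothetical helper that keeps the example short.
func mustTime(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// exampleChain hand-builds the three-certificate chain from the table above.
// Only names and expiration dates are filled in. The times of day are
// assumptions, and these are not real, signed certificates.
func exampleChain() []*x509.Certificate {
	newCert := func(subject, issuer, notAfter string) *x509.Certificate {
		return &x509.Certificate{
			Subject:  pkix.Name{CommonName: subject},
			Issuer:   pkix.Name{CommonName: issuer},
			NotAfter: mustTime(notAfter),
		}
	}
	return []*x509.Certificate{
		newCert("www.agwa.name", "Sectigo RSA Domain Validation Secure Server CA", "2021-04-03T00:00:00Z"),
		newCert("Sectigo RSA Domain Validation Secure Server CA", "USERTrust RSA Certification Authority", "2030-12-31T00:00:00Z"),
		newCert("USERTrust RSA Certification Authority", "AddTrust External CA Root", "2020-05-30T00:00:00Z"),
	}
}

// expiredSubjects returns a "subject (issued by issuer)" string for every
// certificate in chain whose NotAfter time is earlier than now.
func expiredSubjects(chain []*x509.Certificate, now time.Time) []string {
	var expired []string
	for _, cert := range chain {
		if now.After(cert.NotAfter) {
			expired = append(expired, cert.Subject.CommonName+" (issued by "+cert.Issuer.CommonName+")")
		}
	}
	return expired
}

func main() {
	now := mustTime("2020-05-31T00:00:00Z")
	for _, s := range expiredSubjects(exampleChain(), now) {
		fmt.Println("expired:", s)
	}
}
```

Against a live server, the certificates the server actually sent are available in the `PeerCertificates` field of `tls.ConnectionState` after a handshake, and could be passed to the same function.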
Fixing this problem as a client operator
In a perfect world, all of your libraries would be up-to-date and you wouldn't be using clownish TLS implementations like GnuTLS. But the world isn't perfect. OpenSSL 1.0.x is still common, and curl used it as recently as Debian Stretch. And APT, the package manager used by Debian and Ubuntu, links with GnuTLS.
Fortunately, OpenSSL 1.0.x and GnuTLS (at least on Debian) only choke on the expired intermediate if the `AddTrust External CA Root` root is in the local trust store. If it isn't, they will build a chain to `USERTrust RSA Certification Authority` instead.
On Debian (and probably Ubuntu, but I haven't tested), you can easily remove this root from the trust store as follows:
- Edit `/etc/ca-certificates.conf` and put a bang/exclamation mark (`!`) before `mozilla/AddTrust_External_Root.crt`
- Run `update-ca-certificates`
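If you need to make that edit on many machines, the first step can be scripted. The following is a minimal sketch that works on the file's contents as a string. `disableCA` is a hypothetical helper, `mozilla/Some_Other_Root.crt` is a made-up entry, and the sketch deliberately does not write the file back or run `update-ca-certificates` for you.

```go
package main

import (
	"fmt"
	"strings"
)

// disableCA returns conf with a "!" placed in front of the line that names
// entry. Lines that are already disabled, and all other lines, are unchanged.
func disableCA(conf, entry string) string {
	lines := strings.Split(conf, "\n")
	for i, line := range lines {
		if strings.TrimSpace(line) == entry {
			lines[i] = "!" + entry
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	// Made-up excerpt of /etc/ca-certificates.conf; the real file is longer.
	conf := "# example excerpt\n" +
		"mozilla/Some_Other_Root.crt\n" +
		"mozilla/AddTrust_External_Root.crt\n"
	fmt.Print(disableCA(conf, "mozilla/AddTrust_External_Root.crt"))
}
```

Because an already-disabled line no longer matches, running the helper twice leaves the file unchanged.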
For Fedora and RHEL, see this Tweet by Christian Heimes.
February 8, 2020
Short Take: Why Trust-On-First-Use Doesn't Work (Even for SSH)
Considering all the progress that has been made over the last decade making SSL certificates on the Web easy, free, automated, and transparent, it's a bit jarring to see someone arguing in 2020 that trust-on-first-use (TOFU) would be better for the Web:
Unpopular opinion. Most people would be better off with a Trust On First Use system for accessing sites. Like SSH, perhaps with some unique (per user) OOB addition to it. Would we really design it this way of starting again?
— Nick Hutton @nickdothutton, Feb 6, 2020
First, be wary of any comparison with SSH, because in the grand scheme of things, very few people use SSH. *nix sysadmins do, obviously. Many, but not all, software developers do. Some people in engineering/science fields might. But that's a drop in the bucket compared to the Web, which basically everyone uses. So just because something appears to work for SSH doesn't mean it will work for the Web.
And I would argue that TOFU actually doesn't work very well for SSH, and the only reason we put up with it is because of SSH's low deployment. SSH server host keys rarely change (which is bad for post-compromise security, so this is nothing to celebrate), but when they do, SSH handles it very poorly. The user gets a big scary message about a possible man-in-the-middle attack. And then what do you think they do? They do this:
Hi all,
It appears that as of midnight last night, SSH and login are working. However, there were a couple students last night who were getting errors such as “REMOTE HOST IDENTIFICATION HAS CHANGED!” or “POSSIBLE DNS SPOOFING DETECTED!” when trying to SSH in.
To fix this, you can run `ssh-keygen -R [REDACTED]` then try to SSH in again. I believe someone else mentioned last night that you could also just delete the entire ~/.ssh/known_hosts file as well to fix the issue, but this seems to be a less destructive solution.
That's from a real email that I once received. I would not be at all surprised if TOFU actually devolves to opportunistic encryption in practice, because users just bypass any man-in-the-middle error they receive.
You could make it really hard to bypass man-in-the-middle errors, but then people would brick their servers, as happened with HTTP public key pinning, which is one of the reasons why that technology is now extinct.
Proponents of TOFU might say that even if TOFU devolves to opportunistic encryption, the man-in-the-middle errors at least make attacks noisy. True, but the errors are seen by people who generally don't know what they mean and even if they did, can't evaluate whether an error is a legitimate key change or an actual attack. In contrast, a PKI with Certificate Transparency (i.e. the system currently deployed on the Web) also makes attacks noisy, but alerts about new certificates go to server operators, who actually know whether a new certificate is legitimate or not. They just need to be monitoring Certificate Transparency logs.
So yes, I do believe we would design the Web this way if starting again.
February 3, 2020
When Will Your DNS Record Be Published?
When publishing a DNS record through an API, it's often useful to know when the DNS record has been fully published and is visible to DNS resolvers. A perfect example which comes up at SSLMate is automatically validating a certificate request by publishing a DNS record. SSLMate must be sure that the DNS record is visible before it tells the certificate authority to validate it, or the certificate request may fail.
Unfortunately, I know of only one DNS provider that has an API to tell you when a change is published: Route 53. After submitting a DNS change request to Route 53, the API returns a ChangeInfo object which contains a status of either "PENDING" or "INSYNC". You can poll the change until its status becomes "INSYNC", which means the change has taken effect on all Route 53 servers. SSLMate has published a lot of DNS records through Route 53 and this API has never let me down, which makes me happy.
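The polling pattern looks roughly like the sketch below. To avoid misquoting the AWS SDK, the status lookup is abstracted as a `poll` function that you would implement with your Route 53 client. `waitForInsync` and `fakePoller` are hypothetical names, and the attempt limit is an assumption rather than anything Route 53 prescribes.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForInsync calls poll until it reports "INSYNC", sleeping for interval
// between attempts, and gives up after maxAttempts polls.
func waitForInsync(poll func() (string, error), interval time.Duration, maxAttempts int) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		status, err := poll()
		if err != nil {
			return fmt.Errorf("polling change status: %w", err)
		}
		switch status {
		case "INSYNC":
			return nil
		case "PENDING":
			// Not yet on all servers; fall out of the switch and poll again.
		default:
			return fmt.Errorf("unexpected change status %q", status)
		}
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return errors.New("change still PENDING after the maximum number of polls")
}

// fakePoller stands in for a real status lookup: it reports "PENDING" for the
// first pendingPolls calls and "INSYNC" afterwards.
func fakePoller(pendingPolls int) func() (string, error) {
	remaining := pendingPolls
	return func() (string, error) {
		if remaining > 0 {
			remaining--
			return "PENDING", nil
		}
		return "INSYNC", nil
	}
}

func main() {
	err := waitForInsync(fakePoller(2), 10*time.Millisecond, 5)
	fmt.Println("result:", err)
}
```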
Other DNS providers offer absolutely nothing to help you determine when a DNS change is visible. In these cases, SSLMate can do nothing but sleep for 10-120 seconds (depending on the provider) and hope for the best. Unfortunately, it doesn't help for SSLMate to try to resolve the DNS record to see if the record has been published - modern authoritative DNS services use many different servers, often with anycast or load balancing, so just because SSLMate sees the record doesn't mean that others will.
And then there's Google Cloud DNS, which deserves a special mention because they offer an API that looks very similar to Route 53's: after submitting a change request, the API returns a change object with a status of "pending" that you can poll until the status becomes "done". Sounds perfect! Except if you read the fine print, it says:
A status of "done" means that the request to update the authoritative servers has been sent, but the servers might not be updated yet
Sure enough, I found that it often takes two minutes after a change becomes "done" for it to be fully visible. The change object also contains a very bizarre boolean called "isServing", which is documented as:
If the DNS queries for the zone will be served.
I'm not sure what this means, or why information about the zone's status would be present in a record change object. In my testing I never once saw a value besides false, even long after queries for both the individual record and the zone as a whole were being served.
So the change object API is completely useless, and I don't know why it exists - who cares if the "request to update the authoritative servers has been sent"? That's an internal implementation detail. It only matters to users of the API if the change has been fully applied everywhere. So, SSLMate doesn't use change objects. It sleeps for 2 minutes after adding the record and hopes for the best.
All of this is exacerbated when requesting a certificate using the ACME protocol. With ACME, if you tell the server the DNS record is published, but the server doesn't see the record, your certificate order is invalidated. You have to create a new order, and you're given a different DNS record that you have to publish. That means your ACME client could potentially get in a situation where it never makes forward progress, because on each attempt it fails to wait long enough before telling the ACME server to check the record.
SSLMate has a workaround for this when talking to an ACME-using certificate authority such as Let's Encrypt. Instead of publishing the record returned by the ACME server, SSLMate publishes an NS record that delegates the record to a custom-built authoritative DNS server operated by SSLMate. SSLMate's authoritative server returns the record provided by the ACME server. The NS record never changes, so if checking the record fails and SSLMate has to create a new ACME order, it doesn't need to republish a DNS record in the customer's zone; instead it just has to update the record that SSLMate's authoritative server returns, which can be done instantaneously. Therefore, every retry is more likely to succeed than the previous one since more time has elapsed since publishing the NS record. All of this happens completely automatically and transparently to the user of SSLMate, and is one of the ways that SSLMate provides great dependability. (Another benefit is that if the customer's DNS provider doesn't provide an API, they can publish the NS record manually and never have to touch it again, even for renewals.)
Nevertheless, it would be really nice if more DNS providers offered an API like Route 53 to report when a DNS record has been published.
January 4, 2020
This Is Why You Always Review Your Dependencies, AGPL Edition
Before adding a dependency to one of my software projects, I do some basic vetting of the dependency. Among the things I check are:
- How is the code licensed?
- Who are the authors?
- Are there any serious unresolved issues in the issue tracker?
- Is there a history of serious bugs in the issue tracker?
- What kind of code review process is used for pull requests?
Finally, I do a cursory review of the code. I look for anything blatantly insecure or malicious, and try to get a feel for the quality of the code base. I look for "Brown M&Ms" - minor inattention to detail that might indicate a larger problem.
I repeat the above recursively on transitive dependencies as many times as necessary. I also repeat the cursory code review any time I upgrade a dependency.
This is quite a bit of work, but is necessary to avoid falling victim to attacks like event-stream. I was recently reminded of yet another reason to review dependencies, as I reviewed Duo's highly-publicized Go library for WebAuthn, github.com/duo-labs/webauthn.
It started off poorly when I noticed some Brown M&Ms: despite being a library, it was logging messages to stdout, and there were several code smells which indicated inexperience with Go. Sure enough, these minor issues foreshadowed a far larger problem: when I started reviewing the transitive dependency github.com/katzenpost/core/crypto/eddsa, I was greeted with an AGPLv3 license header.
This was bad news for most people wanting to use Duo's WebAuthn library. Although Duo had licensed their library under a BSD license, when you linked your application with Duo's library, you'd also be linking with the AGPL-licensed library, creating a "modified" work in the eyes of the (A)GPL, thus subjecting your application to section 13 of the AGPL:
Notwithstanding any other provision of this License, if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (if your version supports such interaction) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software.
In other words, if you used github.com/duo-labs/webauthn in a public-facing web app, your web app had to be open source.
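A crude first-pass filter for this particular problem is to look for AGPL wording near the top of each source file in your dependency tree. The sketch below checks a single file's contents. `mentionsAGPL` is a hypothetical helper and the phrases it matches are assumptions about how such headers are usually worded. It will miss projects that state their license only in a separate LICENSE file, so it supplements a manual review rather than replacing one.

```go
package main

import (
	"fmt"
	"strings"
)

// mentionsAGPL reports whether the first headerLines lines of src mention the
// AGPL, either by its full name or by its abbreviation (case-insensitive).
func mentionsAGPL(src string, headerLines int) bool {
	if headerLines <= 0 {
		return false
	}
	lines := strings.SplitN(src, "\n", headerLines+1)
	if len(lines) > headerLines {
		lines = lines[:headerLines] // drop the remainder of the file
	}
	header := strings.ToLower(strings.Join(lines, "\n"))
	return strings.Contains(header, "affero general public license") ||
		strings.Contains(header, "agpl")
}

func main() {
	// Made-up file contents for illustration.
	src := "// This program is free software: you can redistribute it under the\n" +
		"// terms of the GNU Affero General Public License.\n" +
		"package example\n"
	fmt.Println(mentionsAGPL(src, 20))
}
```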
The most galling thing about this dependency is that it's redundant with golang.org/x/crypto/ed25519, which is one of Go's quasi-standard "x" libraries. In fact, github.com/duo-labs/webauthn originally used golang.org/x/crypto/ed25519. That changed during a pull request from an external collaborator titled "Consolidate COSE things to their own area". In the process of moving some code from one file to another, this pull request subtly changed the implementation of `OKPPublicKeyData.Verify`.
Here's the old `OKPPublicKeyData.Verify`, which uses golang.org/x/crypto/ed25519:
    // Verify Octet Key Pair (OKP) Public Key Signature
    func (k *OKPPublicKeyData) Verify(data []byte, sig []byte) (bool, error) {
        f := HasherFromCOSEAlg(COSEAlgorithmIdentifier(k.PublicKeyData.Algorithm))
        h := f()
        h.Write(data)
        return ed25519.Verify(k.XCoord, h.Sum(nil), sig), nil
    }
Here's the new `OKPPublicKeyData.Verify`, which uses the AGPL-licensed github.com/katzenpost/core/crypto/eddsa:
    // Verify Octet Key Pair (OKP) Public Key Signature
    func (k *OKPPublicKeyData) Verify(data []byte, sig []byte) (bool, error) {
        f := HasherFromCOSEAlg(COSEAlgorithmIdentifier(k.PublicKeyData.Algorithm))
        h := f()
        h.Write(data)
        var oKey eddsa.PublicKey
        err := oKey.FromBytes(k.XCoord)
        if err != nil {
            return false, err
        }
        return oKey.Verify(h.Sum(nil), sig), nil
    }
There was zero explanation provided for this change. The pull request was reviewed by two Duo employees, who approved and merged it.
Aside: this is why I don't like to accept pull requests that move code around. Even if the new code organization is better, it's usually not worth the time it takes to ensure the pull request isn't doing anything extra.
I filed an issue about the AGPL-licensed dependency, and the developers switched back to using golang.org/x/crypto/ed25519. Nevertheless, I've decided not to use github.com/duo-labs/webauthn. The bulk of the library and its dependencies are to support a WebAuthn misfeature called attestation, which I have less-than-zero desire to use. I just finished writing a vastly simpler, attestation-free library which is less than one tenth the size (I will open source it soon - watch this space). (There's another lesson here, which is that complicated "features" like attestation that serve a minority's use case shouldn't be added to Web standards.) Developing this library is less costly than the liability of using an existing WebAuthn Go library.
This incident reminded me of why I like programming in Go. Go's extensive standard library, along with its quasi-standard "x" libraries, means that the dependency graph of my projects is typically quite small. The bulk of my trust is consolidated in the Go project, and thanks to their stellar reputation and solid operating procedures, I don't feel a need to review the source code of the Go compiler and standard libraries. Even though I love Rust, I am terrified every time I look at the dependency graph of a typical Rust library: I usually see dozens of transitive dependencies written by Internet randos whom I have zero reason to trust. Vetting all those dependencies takes far too much time, which is why I'm much less productive in Rust than Go.
One final note: as a fan of verifiable data structures like Certificate Transparency, I have to love the new Go checksum database. However, the checksum database does you no good if you don't take the time to review your dependencies. Unfortunately, I've already seen one over-enthusiastic Go user claim that the Go checksum database solves all problems with dependency management. It doesn't. There's no easy way around this basic fact: you have to review your dependencies.
December 20, 2019
Preventing Server Side Request Forgery in Golang
If your application makes requests to URLs provided by untrusted sources (such as users), you must take care to avoid server side request forgery (SSRF) attacks. Otherwise, an attacker might be able to induce your application to make a request to a service on your server's localhost or internal network. Since the service thinks the request is coming from a trusted source, it might perform a privileged action or return sensitive data that gets relayed by your application back to the attacker. This is particularly a problem when running in EC2, which exposes sensitive credentials over its metadata service, which is accessible over HTTP at a private IP address. SSRF attacks can be serious; one was exploited earlier this year to steal more than 100 million credit applications from Capital One.
One way to prevent SSRF attacks is to validate all addresses before connecting to them. However, you must do the validation at a very low layer to be effective. It's not sufficient to simply block URLs that contain "localhost" or an internal IP address, since an attacker could publish a DNS record under a public domain that resolves to an internal IP address. It's also insufficient to do the DNS lookup yourself and block a URL if the hostname resolves to an unsafe address; an attacker could set up a special DNS server that returns a safe address the first time it's queried, and the target address the second time when your application actually connects to the URL.
Instead, you need to hook deep into your HTTP client's networking stack and check for a safe address right before the HTTP client tries to access it.
Fortunately, Go makes it easy to hook in at just the right place, thanks to the `Control` field of `net.Dialer`, introduced in Go 1.11:
    // If Control is not nil, it is called after creating the network
    // connection but before actually dialing.
    //
    // Network and address parameters passed to Control method are not
    // necessarily the ones passed to Dial. For example, passing "tcp" to Dial
    // will cause the Control function to be called with "tcp4" or "tcp6".
    Control func(network, address string, c syscall.RawConn) error // Go 1.11
This function is called by Go's standard library after the address has been resolved, but before connecting. The `network` argument is `tcp4`, `udp4`, `tcp6`, or `udp6`, and the `address` argument is an IP address and port number separated by a colon (e.g. `192.0.2.0:80` or `[2001:db8:f942::3ab2]:443`; split it with `net.SplitHostPort`, not `strings.Split`, to avoid IPv6 breakage). If the control function returns an error, the dial is aborted.
Here's an example control function that returns an error if the address is not safe. It's quite conservative, permitting only TCP connections to ports 80 and 443 on public IP addresses (see here for the implementation of `isPublicIPAddress`). You may want to customize the control function to suit your application's needs.
    func safeSocketControl(network string, address string, conn syscall.RawConn) error {
        if !(network == "tcp4" || network == "tcp6") {
            return fmt.Errorf("%s is not a safe network type", network)
        }
        host, port, err := net.SplitHostPort(address)
        if err != nil {
            return fmt.Errorf("%s is not a valid host/port pair: %s", address, err)
        }
        ipaddress := net.ParseIP(host)
        if ipaddress == nil {
            return fmt.Errorf("%s is not a valid IP address", host)
        }
        if !isPublicIPAddress(ipaddress) {
            return fmt.Errorf("%s is not a public IP address", ipaddress)
        }
        if !(port == "80" || port == "443") {
            return fmt.Errorf("%s is not a safe port number", port)
        }
        return nil
    }
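The linked `isPublicIPAddress` implementation isn't reproduced in this post. As a rough stand-in, here is a minimal sketch of the kind of check it needs to perform. `looksPublic`, `looksPublicString`, `mustParseCIDRs`, and `nonPublicNets` are hypothetical names, the list of extra ranges is an assumption and is not exhaustive, and `net.IP.IsPrivate` requires Go 1.17 or later. Treat it as a starting point and check it against the current IANA special-purpose address registries before relying on it.

```go
package main

import (
	"fmt"
	"net"
)

// nonPublicNets lists extra ranges to refuse beyond what the net.IP helper
// methods below already cover. The list is illustrative, not exhaustive.
var nonPublicNets = mustParseCIDRs(
	"0.0.0.0/8",       // "this network"
	"100.64.0.0/10",   // carrier-grade NAT shared address space
	"192.0.0.0/24",    // IETF protocol assignments
	"192.0.2.0/24",    // documentation (TEST-NET-1)
	"198.18.0.0/15",   // benchmarking
	"198.51.100.0/24", // documentation (TEST-NET-2)
	"203.0.113.0/24",  // documentation (TEST-NET-3)
	"240.0.0.0/4",     // reserved, includes the broadcast address
	"2001:db8::/32",   // IPv6 documentation
)

// mustParseCIDRs parses each CIDR string and panics on malformed input.
func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, cidr := range cidrs {
		_, n, err := net.ParseCIDR(cidr)
		if err != nil {
			panic(err)
		}
		nets = append(nets, n)
	}
	return nets
}

// looksPublic reports whether ip appears to be a publicly routable unicast
// address. A nil ip (for example, the result of a failed parse) is refused.
func looksPublic(ip net.IP) bool {
	if ip == nil || ip.IsUnspecified() || ip.IsLoopback() || ip.IsPrivate() ||
		ip.IsLinkLocalUnicast() || ip.IsMulticast() {
		return false
	}
	for _, n := range nonPublicNets {
		if n.Contains(ip) {
			return false
		}
	}
	return true
}

// looksPublicString is a convenience wrapper for textual addresses.
func looksPublicString(s string) bool {
	return looksPublic(net.ParseIP(s))
}

func main() {
	for _, s := range []string{"8.8.8.8", "127.0.0.1", "169.254.169.254", "::1"} {
		fmt.Println(s, looksPublicString(s))
	}
}
```

The link-local check is what catches the EC2 metadata address mentioned earlier, since it falls inside 169.254.0.0/16.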
Once you have a control function, you can use it to make HTTP requests as follows (the various numbers below match those used by `http.DefaultClient`):
    safeDialer := &net.Dialer{
        Timeout:   30 * time.Second,
        KeepAlive: 30 * time.Second,
        DualStack: true,
        Control:   safeSocketControl,
    }
    safeTransport := &http.Transport{
        Proxy:                 http.ProxyFromEnvironment,
        DialContext:           safeDialer.DialContext,
        ForceAttemptHTTP2:     true,
        MaxIdleConns:          100,
        IdleConnTimeout:       90 * time.Second,
        TLSHandshakeTimeout:   10 * time.Second,
        ExpectContinueTimeout: 1 * time.Second,
    }
    safeClient := &http.Client{
        Transport: safeTransport,
    }
    resp, err := safeClient.Get(untrustedURL)
The above code examples are in the public domain.