November 30, 2020
The Lengths People Go To Just To Avoid DNSSEC
Connecting to a website, say example.com, over TLS is a relatively straightforward affair. The client looks up the DNS A/AAAA record for example.com, connects to the IP address over TLS, and confirms that the presented certificate is valid for example.com.
In contrast, connecting to other services, like XMPP or SMTP, over TLS is less straightforward. That's because clients don't directly look up the A/AAAA record for example.com. Instead they look up an SRV record (for XMPP) or an MX record (for SMTP), which contains the hostname of the XMPP or SMTP server. Then they look up the A/AAAA record of that hostname and connect to it. This layer of indirection makes it easy to delegate the operation of certain services to other hosts. For instance, if example.com wants to use Gmail for their email, their MX record would contain aspmx.l.google.com.
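As a concrete sketch, here's how a Go client might perform those lookups with the standard library (example.com stands in for a real domain):

```go
package main

import (
	"fmt"
	"log"
	"net"
)

func main() {
	// XMPP: the SRV record _xmpp-server._tcp.example.com names the real server.
	_, srvs, err := net.LookupSRV("xmpp-server", "tcp", "example.com")
	if err != nil {
		log.Fatal(err)
	}
	if len(srvs) > 0 {
		fmt.Println("XMPP server:", srvs[0].Target, srvs[0].Port)
	}

	// SMTP: the MX record names the mail server, e.g. aspmx.l.google.com.
	mxs, err := net.LookupMX("example.com")
	if err != nil {
		log.Fatal(err)
	}
	if len(mxs) > 0 {
		fmt.Println("SMTP server:", mxs[0].Host)
	}

	// The A/AAAA lookup and TLS connection then target the returned
	// hostname, not example.com itself.
}
```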
This raises the question of which hostname the certificate should certify: the original hostname (example.com), or the hostname listed in the SRV/MX record (aspmx.l.google.com). Both options have problems.
Approach 1, using the original hostname (example.com), is undesirable because there is no automated way for the operator of a service like SMTP or XMPP to obtain certificates for the hostnames which are delegated to them. Google can automatically get a certificate for aspmx.l.google.com because they own google.com, but they can't for example.com. The admins of example.com would have to request the certificate themselves and give it to their SMTP and XMPP providers. Such a manual approach is bound to cause outages as people forget to renew expiring certificates. But there's another problem: the certificate would permit the SMTP and XMPP providers to impersonate all services for example.com. You probably don't want your instant messaging provider to be able to impersonate your website.
Approach 2, using the SRV/MX hostname (aspmx.l.google.com), doesn't have these problems. It's quite easy for the service operator to automatically obtain a certificate for their own domain. Unfortunately, this approach is not secure. Since the DNS lookup for the SRV/MX record is most likely unauthenticated, a man-in-the-middle attacker could intercept the query and return a rogue record that says the SMTP or XMPP service for example.com is handled by a server operated by the attacker. The attacker would have no problem obtaining a valid certificate for their own domain.
Perhaps the most obvious solution to this problem is to just make the DNS lookup authenticated. The IETF has been trying to do this since 1997 with DNSSEC. More than 20 years later, the results are not promising: fewer than 2% of .com domains support DNSSEC, only 25% of Internet users validate DNSSEC even when it is supported, the use of insecure crypto like 1024 bit RSA and SHA-1 is still rampant, and DNSSEC is so hard to deploy correctly that outages are common.
Unsurprisingly, many people would like to avoid the DNSSEC quagmire, which has led to some very interesting workarounds...
POSH
One workaround is POSH, or "PKIX over Secure HTTP", standardized
in RFC 7711. With POSH, the owner of example.com publishes a JSON document
at https://example.com/.well-known/posh/SERVICE.json containing
a list of certificate fingerprints which are allowed to be used for the
given SERVICE (e.g. xmpp-server) on example.com. To connect to example.com's
XMPP server, a POSH-aware client would first retrieve https://example.com/.well-known/posh/xmpp-server.json,
ensuring that the HTTPS server presents a valid, publicly-trusted certificate for example.com.
It would then connect to the XMPP server indicated in example.com's XMPP SRV record,
and ensure that it presents a certificate whose fingerprint is listed in the JSON document.
POSH solves the problems presented above. It's secure, because the JSON document containing the fingerprints is authenticated by a certificate for example.com, which remains under the control of the owner of example.com. The operator of the XMPP service doesn't need to obtain a publicly-trusted certificate for example.com. To facilitate certificate rotation, POSH supports delegation: the JSON file at example.com can reference the URL of a different JSON file which is hosted by the XMPP operator. This gives the XMPP operator flexibility to rotate certificates at any time without needing to inform their customers to update their JSON documents.
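To make this concrete, here's a rough sketch of what a POSH-aware XMPP client might do. The JSON field names ("fingerprints", "url", "expires") and the fingerprint computation are my reading of RFC 7711, so treat them as assumptions rather than a complete implementation:

```go
package posh

import (
	"crypto/sha256"
	"crypto/x509"
	"encoding/base64"
	"encoding/json"
	"net/http"
)

// poshDocument reflects my reading of the RFC 7711 JSON format.
type poshDocument struct {
	Fingerprints []map[string]string `json:"fingerprints"`
	URL          string              `json:"url"` // set when delegating to another document
	Expires      int                 `json:"expires"`
}

// verifyPOSH reports whether the certificate presented by the domain's
// XMPP server matches a fingerprint published under the source domain.
func verifyPOSH(domain string, presented *x509.Certificate) (bool, error) {
	url := "https://" + domain + "/.well-known/posh/xmpp-server.json"
	var doc poshDocument
	for i := 0; i < 2; i++ { // follow at most one level of delegation
		resp, err := http.Get(url) // http.Get validates the HTTPS certificate for us
		if err != nil {
			return false, err
		}
		doc = poshDocument{} // reset before decoding each document
		err = json.NewDecoder(resp.Body).Decode(&doc)
		resp.Body.Close()
		if err != nil {
			return false, err
		}
		if doc.URL == "" {
			break // this document carries the fingerprints directly
		}
		url = doc.URL // delegated document, hosted by the XMPP operator
	}
	// Assumes the fingerprint is a base64-encoded SHA-256 hash of the
	// DER-encoded certificate.
	sum := sha256.Sum256(presented.Raw)
	want := base64.StdEncoding.EncodeToString(sum[:])
	for _, fp := range doc.Fingerprints {
		if fp["sha-256"] == want {
			return true, nil
		}
	}
	return false, nil
}
```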
Although POSH was designed to be protocol agnostic, it was only ever used with XMPP, and even then I could only find a few XMPP clients which support it. It's fair to say POSH never caught on.
MTA-STS
A more recent workaround, for server-to-server SMTP only, is MTA-STS, standardized in RFC 8461. The underlying concept of MTA-STS is the same as POSH: the owner of example.com publishes a document over HTTPS with information about how to validate secure SMTP connections to example.com's mail servers. Several details are different. One cosmetic difference is that the document is published at mta-sts.example.com rather than example.com, which simplifies deployment for domain owners who can't easily make changes to their main website. A more fundamental difference is that instead of listing certificate fingerprints, the document lists the hostnames which are allowed in the MX record set. When connecting to an SMTP server for example.com, the client verifies both that it's connecting to a server listed in the MTA-STS document at mta-sts.example.com, and that the server presents a publicly-trusted certificate valid for the SMTP server's hostname.
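For reference, an MTA-STS policy is just a small text file, served at https://mta-sts.example.com/.well-known/mta-sts.txt and advertised by a TXT record at _mta-sts.example.com. Per RFC 8461, a policy for a domain that uses Gmail might look like this (the exact MX patterns are illustrative):

```
version: STSv1
mode: enforce
mx: aspmx.l.google.com
mx: *.aspmx.l.google.com
max_age: 604800
```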
The amusing thing about MTA-STS is that it basically boils down to duplicating the contents of the MX record in a document that is published over HTTPS, leveraging the WebPKI to authenticate the MX record rather than DNSSEC. It's kind of incredible that this is considered easier than using DNSSEC, despite having more moving parts and requiring duplication. That says way more about DNSSEC being a failure than about MTA-STS being good, and I've written before about how I think MTA-STS will prove hard to deploy in practice. I suggested how DNS providers could alleviate the problems by automating MTA-STS for their customers, but I'm not aware of any DNS providers that have done so.
I am happy to report
that SSLMate now offers MTA-STS
automation as part of Cert Spotter.
It's not quite as seamless as what a DNS provider could offer, but it's
pretty good: Cert Spotter continuously monitors your domains' MX records
and automatically publishes an appropriate MTA-STS policy for you, obtaining and renewing the
necessary SSL certificates. All you need to do is publish two CNAME records
delegating the MTA-STS-related subdomains (mta-sts and _mta-sts)
to SSLMate-operated servers. Cert Spotter updates the policy automatically any time it detects a change to your
MX records, ensuring the policy never falls out of sync with your DNS.
For transparency, Cert Spotter emails you when this happens, so you
can detect unauthorized MX record changes. Cert Spotter
will also alert you if any of your MX servers have a TLS or certificate
problem that would prevent MTA-STS from working.
Looking forward: SRVName certificates
There is another solution which, if it's ever deployed, will be much nicer than POSH or MTA-STS: SRVName certificates. SRVName certificates authenticate not just a domain name, like a normal certificate, but a particular service running on that domain. For example, you could get a certificate that's valid for only SMTP on example.com, or only XMPP on example.com. This solves the security problem of the first approach above: the owner of example.com can give their mail server operator a SRVName certificate that's valid only for SMTP, allowing the mail server to operate an SMTP service for example.com, but not impersonate any other example.com services. Assuming the validation rules are flexible enough, the SMTP service operator could even obtain the SRVName certificate themselves provided the certificate authority validates that they are listed in the MX record for example.com.
Technically speaking, SRVName is a type of subject alternative name (SAN) which can be placed in a certificate, akin to the DNS SAN which certificates use today for authenticating domain names. It's possible for a certificate to contain both SRVName and DNS SANs, and here I use "SRVName certificate" to mean a certificate containing a SRVName SAN.
Unfortunately, there's a major obstacle blocking SRVName certificates: technically-constrained subordinate certificate authorities. A technically-constrained sub-CA is a certificate authority which is restricted to issuing certificates only for namespaces that are enumerated in the sub-CA certificate's name constraints field. For example, an enterprise that needs to issue a large number of certificates might operate a publicly-trusted, technically-constrained sub-CA that is constrained to the enterprise's domains and IP address ranges. Since their sub-CA can only issue certificates for namespaces that they control, they're allowed to operate it under looser security standards than unconstrained publicly-trusted CAs.
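As an illustrative sketch of what "technically constrained" means in practice, here's how such a sub-CA's constraints could be expressed with Go's crypto/x509. The names and IP range are made up, and note that (as far as I know) crypto/x509 has no support for SRVName constraints at all, which hints at how thin deployed support is:

```go
package ca

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"net"
)

// constrainedSubCATemplate sketches a technically-constrained sub-CA.
func constrainedSubCATemplate() *x509.Certificate {
	_, ipRange, _ := net.ParseCIDR("192.0.2.0/24")
	return &x509.Certificate{
		SerialNumber:          big.NewInt(1), // use a random serial in practice
		Subject:               pkix.Name{CommonName: "Example Corp Issuing CA"},
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
		// Name constraints: certificates from this CA are only valid for
		// names within these namespaces.
		PermittedDNSDomainsCritical: true,
		PermittedDNSDomains:         []string{"example.com"},
		PermittedIPRanges:           []*net.IPNet{ipRange},
	}
}
```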
The problem is that if a particular type of SAN (in this case, the SRVName SAN) isn't listed in the name constraints as either allowed or denied, the standard says that all instances of that SAN type are allowed by default. Allow-by-default is usually a bad idea when security is concerned, and this case is no different. Clients can't accept SRVName certificates because it would be unsafe: every existing technically-constrained sub-CA that doesn't have SRVName in its name constraints field has unconstrained ability to issue SRVName certificates. Unfortunately, the Baseline Requirements (the rules governing public certificate issuance) only require technically-constrained sub-CAs to have DNS, IP, and Directory Name constraints. Consequently, there are many existing technically-constrained sub-CAs out there that would need to be revoked and reissued with SRVName constraints before it's safe to deploy SRVName certificates.
Will that ever happen? Who knows. In the meantime, we're stuck with hacks like MTA-STS.
June 25, 2020
Writing an SNI Proxy in 115 Lines of Go
The very first message sent in a TLS connection is the Client Hello record, in which the client greets the server and tells it, among other things, the server name it wants to connect to. This is called Server Name Indication, or SNI for short, and it's quite handy as it allows many different servers to be co-located on a single IP address.
The server name is sent in plaintext, which is unfortunately really bad for privacy and censorship resistance, but does enable something very useful: a proxy server can read the server name and use it to decide where to route the connection, without having to decrypt the connection. You can leverage this to make many different physical servers accessible from the Internet even if you have only one public IPv4 address: the proxy listens on your public IP address and forwards connections to the appropriate private IP address based on the SNI.
I just finished writing such a proxy server, which I plan to run on my home network's router so that I can easily access my internal servers from anywhere on the Internet, without a VPN or SSH port forwarding. I was pleased by how easy it was to write this proxy server using only Go's standard library. It's a great example of how well-suited Go is for programs involving networking and cryptography.
Let's start with a standard listen/accept loop (right out of the examples for Go's net
package):
```go
func main() {
	l, err := net.Listen("tcp", ":443")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := l.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		go handleConnection(conn)
	}
}
```
Here's a sketch of the handleConnection function, which reads the
Client Hello record from the client, dials the backend server indicated by the
Client Hello, and then proxies the client to and from the backend. (Note that we
dial the backend using the SNI value, which works well with split-horizon DNS where
the proxy sees the backend's private IP address and external clients see the proxy's public
IP address. If that doesn't work for you, you can use more complicated routing logic.)
```go
func handleConnection(clientConn net.Conn) {
	defer clientConn.Close()
	// ... read Client Hello from clientConn ...
	backendConn, err := net.Dial("tcp", net.JoinHostPort(clientHello.ServerName, "443"))
	if err != nil {
		log.Print(err)
		return
	}
	defer backendConn.Close()
	// ... proxy clientConn <==> backendConn ...
}
```
Let's assume for now we have a convenient function to read a Client
Hello record from an io.Reader
and return a tls.ClientHelloInfo:
```go
func readClientHello(reader io.Reader) (*tls.ClientHelloInfo, error)
```
We can't simply call this function from handleConnection,
because once the Client Hello is read, the bytes are gone. We need to
preserve the bytes and forward them along to the backend, which is expecting
a proper TLS connection that starts with a Client Hello record.
What we need to do instead is "peek" at the Client Hello record, and
thanks to some simple but powerful abstractions from Go's io package, this can be
done with just six lines of code:
```go
func peekClientHello(reader io.Reader) (*tls.ClientHelloInfo, io.Reader, error) {
	peekedBytes := new(bytes.Buffer)
	hello, err := readClientHello(io.TeeReader(reader, peekedBytes))
	if err != nil {
		return nil, nil, err
	}
	return hello, io.MultiReader(peekedBytes, reader), nil
}
```
What this code does is create a TeeReader, which
is a reader that wraps another reader and writes everything that is read
to a writer, which in our case is a byte buffer.
We pass the TeeReader to readClientHello, so every byte
read by readClientHello gets saved to our buffer. Finally,
we create a MultiReader which essentially
concatenates our buffer with the original reader. Reads from the
MultiReader initially come out of the buffer, and when that's exhausted,
continue from the original reader. We return the MultiReader to the caller
along with the ClientHelloInfo. When the caller reads from the MultiReader
it will see a full TLS connection stream, starting with the Client Hello.
Now we just need to implement readClientHello. We could open up the TLS
RFCs and learn how to parse a Client Hello record, but it turns out we can
let crypto/tls do the work for us, thanks to a callback function in tls.Config called GetConfigForClient:
```go
// GetConfigForClient, if not nil, is called after a ClientHello is
// received from a client.
GetConfigForClient func(*ClientHelloInfo) (*Config, error) // Go 1.8
```
Roughly, what we need to do is create a TLS server-side
connection with a GetConfigForClient callback
that saves the ClientHelloInfo
passed to it. However, creating a TLS connection requires a full-blown
net.Conn,
and readClientHello is passed merely an io.Reader. So let's
create a type, readOnlyConn, which wraps an io.Reader
and satisfies the net.Conn interface:
```go
type readOnlyConn struct {
	reader io.Reader
}

func (conn readOnlyConn) Read(p []byte) (int, error)  { return conn.reader.Read(p) }
func (conn readOnlyConn) Write(p []byte) (int, error) { return 0, io.ErrClosedPipe }
func (conn readOnlyConn) Close() error                { return nil }
func (conn readOnlyConn) LocalAddr() net.Addr         { return nil }
func (conn readOnlyConn) RemoteAddr() net.Addr        { return nil }
func (conn readOnlyConn) SetDeadline(t time.Time) error      { return nil }
func (conn readOnlyConn) SetReadDeadline(t time.Time) error  { return nil }
func (conn readOnlyConn) SetWriteDeadline(t time.Time) error { return nil }
```
readOnlyConn forwards reads to the reader and simulates a broken pipe when written to
(as if the client closed the connection before the server could reply).
All other operations are a no-op.
Now we're ready to write readClientHello:
```go
func readClientHello(reader io.Reader) (*tls.ClientHelloInfo, error) {
	var hello *tls.ClientHelloInfo

	err := tls.Server(readOnlyConn{reader: reader}, &tls.Config{
		GetConfigForClient: func(argHello *tls.ClientHelloInfo) (*tls.Config, error) {
			hello = new(tls.ClientHelloInfo)
			*hello = *argHello
			return nil, nil
		},
	}).Handshake()

	if hello == nil {
		return nil, err
	}

	return hello, nil
}
```
Note that Handshake always fails because the readOnlyConn
is not a real connection. As long as the Client Hello is successfully read, the failure
should only happen after GetConfigForClient is called, so we only care
about the error if hello was never set.
Let's put everything together to write the full handleConnection function.
I've added deadlines (thanks, Filippo!)
and a check that the SNI value ends with .internal.example.com
to prevent this from being used as an open proxy. When I deploy this, I will
use the DNS suffix of my home network.
```go
func handleConnection(clientConn net.Conn) {
	defer clientConn.Close()

	if err := clientConn.SetReadDeadline(time.Now().Add(5 * time.Second)); err != nil {
		log.Print(err)
		return
	}

	clientHello, clientReader, err := peekClientHello(clientConn)
	if err != nil {
		log.Print(err)
		return
	}

	if err := clientConn.SetReadDeadline(time.Time{}); err != nil {
		log.Print(err)
		return
	}

	if !strings.HasSuffix(clientHello.ServerName, ".internal.example.com") {
		log.Print("Blocking connection to unauthorized backend")
		return
	}

	backendConn, err := net.DialTimeout("tcp", net.JoinHostPort(clientHello.ServerName, "443"), 5*time.Second)
	if err != nil {
		log.Print(err)
		return
	}
	defer backendConn.Close()

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		io.Copy(clientConn, backendConn)
		clientConn.(*net.TCPConn).CloseWrite()
		wg.Done()
	}()
	go func() {
		io.Copy(backendConn, clientReader)
		backendConn.(*net.TCPConn).CloseWrite()
		wg.Done()
	}()
	wg.Wait()
}
```
Here's the complete Go source code - just 115 lines! (Not counting copyright legalese)
June 18, 2020
Security Review of CFSSL Signer Code
Certificate signing is the most security-sensitive task performed by a certificate authority. The CA has to sign values, like DNS names, that are provided by untrusted sources. The CA must rigorously validate these values before signing them. If an attacker can bypass validation and get untrusted data included in a certificate, the results can be dire. For example, if an attacker can trick a CA into including an arbitrary SAN extension, they can get a certificate for domains they don't control.
Unfortunately, there is a history of CAs including unvalidated information in certificates. A common cause is CAs copying information directly from CSRs instead of from more constrained information sources. Since CSRs can contain both subject identity information and arbitrary certificate extensions, directly ingesting CSRs is extremely error-prone for CAs. For this reason, CAs would be well-advised to extract the public key from the CSR very early in the certificate enrollment process and discard everything else. In a perfect world, CAs would accept standalone public keys from subscribers instead of CSRs. (Before you say "proof-of-possession", see the appendix.)
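In code form, the advice above amounts to just a few lines. Here's a hedged sketch in Go, not taken from any particular CA's code:

```go
package ca

import "crypto/x509"

// extractPublicKey ingests a CSR and returns only its public key.
// Everything else in the CSR - subject, extensions - is discarded.
func extractPublicKey(csrDER []byte) (interface{}, error) {
	csr, err := x509.ParseCertificateRequest(csrDER)
	if err != nil {
		return nil, err
	}
	// Signature check = proof-of-possession. See the appendix for why
	// even this is unnecessary for TLS.
	if err := csr.CheckSignature(); err != nil {
		return nil, err
	}
	return csr.PublicKey, nil
}
```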
I decided to review the signing code in CFSSL, the open source PKI toolkit used by Let's Encrypt's Boulder, to see how it stacks up against this advice. Unfortunately, I found that CFSSL copies subject identity information from CSRs by default, has features that are hard to use safely, and uses complicated logic that obfuscates what is included in certificate fields. I recommend that publicly-trusted CAs not use CFSSL.
Update: since publication of this post, Let's Encrypt has begun moving away from CFSSL!
Scope of review
I reviewed the CFSSL Git repository as of commit 6b49beae21ff90a09aea3901741ef02b1057ee65 (the HEAD of master at the time of my review). I reviewed the code in the signer and signer/local packages.
The signing operation
In CFSSL, you sign a certificate by invoking the Sign function on the signer.Signer interface, which has
this signature:
```go
Sign(req SignRequest) (cert []byte, err error)
```
There is only one actual implementation of signer.Signer:
local.Signer. (The other implementations, remote.Signer and
universal.Signer, are ultimately wrappers around local.Signer.)
Inputs to the signing operation
At a high level, inputs to the signing operation come from three to four places:
- The Signer object, which contains:
  - The private key and certificate of the CA
  - A list of named profiles plus a default profile
  - A signature algorithm

  I will refer to this object as Signer.

- The SignRequest argument, whose relevant fields are:

  ```go
  Hosts       []string
  Request     string // The CSR
  Subject     *Subject
  Profile     string
  CRLOverride string
  Serial      *big.Int
  Extensions  []Extension
  NotBefore   time.Time
  NotAfter    time.Time
  ```

  I will refer to this object as SignRequest.

- The Signer's default certificate profile, represented by an instance of the SigningProfile struct. I will refer to the default profile as defaultProfile.

- The effective certificate profile, represented by an instance of the SigningProfile struct. I will refer to the effective profile as profile. If the profile named by SignRequest.Profile exists in Signer, then profile is that profile. If it doesn't exist, then profile equals defaultProfile.
The Sign function takes values from these places and combines them to produce the input to x509.CreateCertificate in Go's standard library.
There is overlap - for instance SANs can be specified in the CSR, SignRequest.Hosts, or
SignRequest.Extensions. How does Sign decide which source to use when constructing the certificate?
Certificate construction logic
To understand how Sign works, I looked at each certificate field and worked backwards to figure out
how Sign decides to populate the field. Below are my findings.
Serial Number
- If profile.ClientProvidesSerialNumbers is true: use SignRequest.Serial (error if not set).
- Else: generate a random 20-byte serial number.
Not Before
- If SignRequest.NotBefore is non-zero: use it.
- Else if profile.NotBefore is non-zero: use it.
- Else if profile.Backdate is non-zero: use current time - profile.Backdate.
- Else: use current time - 5 minutes.
Not After
- If SignRequest.NotAfter is non-zero: use it.
- Else if profile.NotAfter is non-zero: use it.
- Else if profile.Expiry is non-zero: use not before + profile.Expiry.
- Else: use not before + defaultProfile.Expiry.
Signature Algorithm
- If profile.CSRWhitelist is nil or profile.CSRWhitelist.SignatureAlgorithm is true: use Signer's signature algorithm.
- Else: CFSSL leaves the signature algorithm unspecified and the Go standard library picks a sensible algorithm.
Comments: it's weird how something named CSRWhitelist is used to decide whether to use a value that comes not from the CSR, but from Signer. This is probably because CFSSL's ParseCertificateRequest function gets this field from Signer rather than from the CSR that it is parsing. This sort of indirection and misleading naming makes the code hard to understand.
Public Key
- If profile.CSRWhitelist is nil or profile.CSRWhitelist.PublicKey is true: use the CSR's public key.
- Else: the certificate won't have a public key (this probably causes x509.CreateCertificate to return an error).
Comments: it's unclear why you'd ever want profile.CSRWhitelist.PublicKey to be false. The public key is literally the only piece of information that should be taken from the CSR.
SANs
This one's a doozy...
- If profile.CopyExtensions is true and profile.CSRWhitelist is nil and the CSR contains a SAN extension and SignRequest.Extensions contains a SAN extension, and the SAN OID is present in profile.ExtensionWhitelist: add two SAN extensions to the certificate, one from the CSR and one from SignRequest.Extensions. Note that SignRequest.Hosts is ignored and profile.NameWhitelist is bypassed.
- Else if profile.CopyExtensions is true and profile.CSRWhitelist is nil and the CSR contains a SAN extension: use the SAN extension verbatim from the CSR. Note that SignRequest.Hosts is ignored and profile.NameWhitelist is bypassed.
- Else if SignRequest.Extensions contains a SAN extension, and the SAN OID is present in profile.ExtensionWhitelist: use the SAN extension verbatim from SignRequest.Extensions. Note that SignRequest.Hosts is ignored and profile.NameWhitelist is bypassed.
- Else if profile.CAConstraint.IsCA is true: the certificate will not contain a SAN extension.
- Else if SignRequest.Hosts is non-nil:
  - Use each string in SignRequest.Hosts as follows:
    - If the string parses as an IP address: make it an IP Address SAN.
    - Else if the string parses as an email address: make it an email address SAN.
    - Else if the string parses as a URI: make it a URI SAN.
    - Else: make it a DNS SAN.
  - If profile.NameWhitelist is non-nil: return an error unless the string representation of every DNS, email, and URI SAN matches the profile.NameWhitelist regex (IP address SANs are not checked).
- Else if profile.CSRWhitelist is nil and the CSR contains a SAN extension:
  - Copy the DNS names, IP addresses, email addresses, and URIs from the CSR's SAN extension.
  - If profile.NameWhitelist is non-nil: enforce the whitelist as described above.
- Else if profile.CSRWhitelist is non-nil and the CSR contains a SAN extension:
  - If profile.CSRWhitelist.DNSNames is true: use DNS names from the CSR's SAN extension.
  - If profile.CSRWhitelist.IPAddresses is true: use IP addresses from the CSR's SAN extension.
  - If profile.CSRWhitelist.EmailAddresses is true: use email addresses from the CSR's SAN extension.
  - If profile.CSRWhitelist.URIs is true: use URIs from the CSR's SAN extension.
  - If profile.NameWhitelist is non-nil: enforce the whitelist as described above.
Subject
For each supported subject attribute (common name, country, province, locality, organization, organizational unit, serial number):
- If the attribute was specified in SignRequest.Subject: use it.
- Else if profile.CSRWhitelist is nil or profile.CSRWhitelist.Subject is true: use the attribute from the CSR's subject, if present.
Common name only: if profile.NameWhitelist is non-nil: return an error unless the common name matches the profile.NameWhitelist regex.
Note: SignRequest.Hosts does not override the common name.
Basic Constraints
- If SignRequest.Extensions contains a basic constraints extension, and the basic constraints OID is present in profile.ExtensionWhitelist: copy the basic constraints extension verbatim from SignRequest.Extensions.
- Else: use the values from profile.CAConstraint.
Comments: given how security-sensitive this extension is, it's a relief that there's no way for the value to come from the CSR. Despite this, there is code earlier in the signing process that looks at the CSR's Basic Constraints extension. First it's extracted from the CSR in ParseCertificateRequest and then it's validated in Sign. This code ultimately has no effect, but it makes the logic harder to follow (and gave me a mild heart attack when I saw it).
Extensions besides SAN and Basic Constraints
For a given extension FOO:
- If profile.CopyExtensions is true and profile.CSRWhitelist is nil and the CSR contains a FOO extension and SignRequest.Extensions contains a FOO extension, and FOO is present in profile.ExtensionWhitelist: add two FOO extensions to the certificate, one from the CSR and one from SignRequest.Extensions. Note that fields in SignRequest (like CRLOverride) or profile (like OCSP, CRL, etc.) that would normally control the FOO extension are ignored.
- Else if profile.CopyExtensions is true and profile.CSRWhitelist is nil and the CSR contains a FOO extension: copy it verbatim from the CSR. Note that fields in SignRequest (like CRLOverride) or profile (like OCSP, CRL, etc.) that would normally control the FOO extension are ignored.
- Else if SignRequest.Extensions contains a FOO extension, and FOO is present in profile.ExtensionWhitelist: copy it verbatim to the certificate. Note that fields in SignRequest (like CRLOverride) or profile (like OCSP, CRL, etc.) that would normally control the FOO extension are ignored.
- Else: use fields from SignRequest (like CRLOverride) and profile (like OCSP, CRL, etc.) to decide what value the extension should have, if any.
Other comments
By default, CSRWhitelist is nil. This is a bad default, as it means SANs will be copied from the CSR unless SignRequest.Hosts is set. Likewise, any subject attribute not specified in SignRequest.Subject will be copied from the CSR. This is practically impossible to use safely: to avoid including unvalidated subject information you have to specify a value for every attribute in SignRequest.Subject - and if you don't want the attribute included in the final certificate you're out of luck. If CFSSL ever adds support for a new attribute type, you had better update your code to specify a value for the attribute or unvalidated information might slip through. This is exactly the sort of logic that makes it so easy to accidentally issue certificates with "Some-State" in the subject.
If the profile specified by SignRequest.Profile doesn't exist, the default profile is used. This could lead to an unexpected certificate profile being used if a CA deletes a profile from their configuration but there are still references to it elsewhere. Considering the trouble that CAs have with profile management (see the infamous TURKTRUST incident or the CA that discovered they had a whopping 85 buggy profiles), I think it would be much safer if a non-existent profile resulted in an error.
SignRequest.Hosts is untyped - everything is a string and there is no distinction between IP addresses, email addresses, URIs, and DNS names. (Also, Hosts is a misleading name because URIs and email addresses aren't hosts.) CFSSL decides what type of SAN to include based on what the string in Hosts successfully parses as, and assumes it's a DNS name if it doesn't parse as anything else. This could lead to unexpected SAN types in the certificate. Determining if a string was intended to be a URI by trying to parse it is an especially bad idea considering how hellish URIs are to parse, and how much variation there is between different URI parsing implementations. If the user of CFSSL adds a string which they believe to be a valid URI to SignRequest.Hosts, but Go's URI parser rejects it, the URI will end up in a DNS SAN instead.
Variable names are inconsistent and often unhelpful. In Sign, req is used for values from SignRequest and safeTemplate is used for values from the CSR. But in PopulateSubjectFromCSR (which is called by Sign), req is used for values from the CSR, and s is used for values from the SignRequest. This increases the likelihood of accidentally using data from the wrong source.
ParseCertificateRequest blindly and unconditionally copies the extensions from the CSR to the Extensions field of the x509.Certificate template - even if profile.CopyExtensions is false. Fortunately, this field is ignored by x509.CreateCertificate so it's probably harmless. It just means that attacker-controlled input is propagated further through the program, increasing the opportunity for it to be misused.
CopyExtensions is a foot cannon
I am extremely concerned by the presence of the CopyExtensions option. Enabling it practically guarantees misissuance because all extensions (except Basic Constraints) are copied verbatim from the CSR, overriding any value specified in the profile or the SignRequest. In particular, SignRequest.Hosts and profile.NameWhitelist are ignored if the CSR contains a SAN extension. Also, profile.ExtensionWhitelist only applies to extensions specified in SignRequest - not those specified in the CSR. I think it's quite likely that users of CopyExtensions will be surprised when neither of these whitelists are effective.
Lack of documentation
As I showed above, the logic for constructing a certificate is very complicated, and you have to use CFSSL in exactly the right way to avoid copying unvalidated information from CSRs. Unfortunately, documentation is practically non-existent and I could only figure out CFSSL's logic by reading the source code. Obviously, the lack of documentation makes it hard to use CFSSL safely. But the more fundamental problem is that documentation writing wasn't a core part of CFSSL's engineering process. Had documentation been written in tandem with the design and implementation of CFSSL, it would have been evident that incomprehensibility was spiraling out of control. This information could have been fed back into the engineering process and used to redesign or even reject features that made the system too hard to understand. I have personally saved myself many times from releasing overly-complicated software just by writing the documentation for it.
Final thoughts
CFSSL has some nice features, like its friendly command line interface
and its certificate bundler for building optimal certificate chains.
However, I struggle to see the value provided by its signer package.
Its truly useful functionality, like Certificate Transparency submission and pre-issuance
linting, could be extracted into standalone libraries. The rest of the signer
is just a complicated wrapper around Go's x509.CreateCertificate
that obscures what gets included in certificates and will include the wrong thing if you hold it wrong.
A long history of misissuance shows us why we need better.
If you're a CA, just call x509.CreateCertificate directly - it will be much easier to ensure
you are only including validated information in your certificates.
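Here's a minimal sketch of what direct issuance might look like; validatedDNSNames stands in for the output of your own validation pipeline, and the profile values (lifetime, key usages) are illustrative:

```go
package ca

import (
	"crypto"
	"crypto/rand"
	"crypto/x509"
	"math/big"
	"time"
)

// issue signs a leaf certificate using only validated inputs. The CSR
// contributes nothing except the public key (subscriberKey), which should
// have been extracted when the request was first ingested.
func issue(caCert *x509.Certificate, caKey crypto.Signer, subscriberKey interface{}, validatedDNSNames []string) ([]byte, error) {
	// Random serial number, capped below 2^159 so it encodes in at most 20 bytes.
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 159))
	if err != nil {
		return nil, err
	}
	template := &x509.Certificate{
		SerialNumber:          serial,
		DNSNames:              validatedDNSNames, // from your validation pipeline, never from the CSR
		NotBefore:             time.Now().Add(-5 * time.Minute),
		NotAfter:              time.Now().Add(90 * 24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true, // IsCA is left false
	}
	return x509.CreateCertificate(rand.Reader, template, caCert, subscriberKey, caKey)
}
```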
Appendix: Proof-of-Possession and TLS
A common but unfounded objection to discarding everything in a CSR except the public key is that checking the CSR's signature is necessary because it ensures proof-of-possession of the private key. If a CA doesn't verify proof-of-possession, then someone could obtain a certificate for a key which belongs to someone else. (In fact, someone recently got a certificate containing Let's Encrypt's public key.) For TLS, this doesn't matter. (Other protocols, like S/MIME, may be different.) The TLS protocol ensures proof-of-possession every time the certificate is used.
For TLS 1.3, this is easy to see: the server or client has to send a Certificate Verify message which contains a signature from their private key over a transcript of the handshake. The handshake includes their certificate, which is a superset of the information in a CSR. Therefore, the Certificate Verify message proves at least as much as the CSR signature does. In fact it's better, since the proof is fresh and not reliant on a trusted third party doing its job correctly.
In earlier versions of TLS, client certificates are verified in the same way (signing a handshake transcript which includes the certificate). Server certificates are used differently, but ultimately the handshake transcript (which includes the server certificate) is authenticated by a shared secret that is known only to the client and the holder of the certificate private key (provided neither party deliberately sabotages their security). So as with TLS 1.3, private key possession is proven, rendering the CSR signature unnecessary.
May 30, 2020
Fixing the Breakage from the AddTrust External CA Root Expiration
A lot of stuff on the Internet is currently broken on account of a Sectigo root certificate expiring at 10:48:38 UTC today. Generally speaking, this is affecting older, non-browser clients (notably OpenSSL 1.0.x) which talk to TLS servers which serve a Sectigo certificate chain ending in the expired certificate. See also this Twitter thread by Ryan Sleevi.
This post is going to explain what you should do to avoid problems,
from the perspectives of both server operators (tldr: test your server with What's My Chain Cert? and do what it says) and client operators (tldr: upgrade your TLS libraries if possible, otherwise remove AddTrust External CA Root from your trust store).
Quick primer on certificate chains
When you connect to a TLS server, the server sends the client a certificate that proves its identity. The client needs to build a chain of certificates from the server certificate to a root certificate that the client trusts. To help the client build this chain, the server sends back one or more intermediate certificates after its own certificate.
For example, my website sends the following two certificates:
| Subject | Issuer | Expiration |
|---|---|---|
| www.agwa.name | Sectigo RSA Domain Validation Secure Server CA | 2021-04-03 |
| Sectigo RSA Domain Validation Secure Server CA | USERTrust RSA Certification Authority | 2030-12-31 |
The first certificate is mine and is issued by Sectigo RSA Domain Validation Secure Server CA.
The second certificate is Sectigo RSA Domain Validation Secure Server CA and is issued by USERTrust RSA Certification Authority,
which is a root certificate. These two certificates form a complete chain to a trusted root.
However, USERTrust RSA Certification Authority is a relatively new root.
It was created in 2010, and it took many years for it to become trusted
by all clients. As recently as last year I heard reports of clients
not trusting this root.
For this reason, some servers send back a chain with an additional intermediate certificate:
| Subject | Issuer | Expiration |
|---|---|---|
| www.agwa.name | Sectigo RSA Domain Validation Secure Server CA | 2021-04-03 |
| Sectigo RSA Domain Validation Secure Server CA | USERTrust RSA Certification Authority | 2030-12-31 |
| USERTrust RSA Certification Authority | AddTrust External CA Root | 2020-05-30 |
This sequence of certificates forms a chain to another root called
AddTrust External CA Root which was created in 2000 and is trusted by
many client platforms. Or rather, it was trusted before it expired today.
Fortunately, modern clients with well-written certificate validators
(this includes all mainstream web browsers) won't have a problem with the expiration.
Since they trust the USERTrust RSA Certification Authority root, they will build
a chain to that root and ignore the fact that the server sent an expired
intermediate certificate.
Other clients, notably anything using OpenSSL 1.0.x or GnuTLS, will have
a problem. Even if these clients trust the USERTrust RSA Certification
Authority root, and could build a chain to it if they wanted, they'll
end up building a chain to AddTrust External CA Root instead, causing
the certificate validation to fail with an expired certificate error.
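If you want to check exactly which chain a server sends, here's a quick Go sketch that dials a server and prints the presented certificates (verification is deliberately skipped, since we're only inspecting):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"log"
)

func main() {
	conn, err := tls.Dial("tcp", "www.agwa.name:443", &tls.Config{
		InsecureSkipVerify: true, // inspection only - do not trust this connection
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	for i, cert := range conn.ConnectionState().PeerCertificates {
		fmt.Printf("%d: subject=%s\n   issuer=%s\n   expires=%s\n",
			i, cert.Subject, cert.Issuer, cert.NotAfter)
	}
}
```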
Fixing this problem as a server operator
Basically, you need to remove the intermediate certificate issued by AddTrust External CA Root
from your certificate chain.
If you get your certificates from SSLMate,
you don't need to worry. I saw this coming over a year ago, and configured SSLMate to start providing
a chain without AddTrust External CA Root. As certificates renewed,
SSLMate customers received the new chain, and since SSLMate has long
capped certificate lifetimes at one year, the older chain was cycled
out before the intermediate expired.
But if your server is using Sectigo certificates from another source, you might need to worry. You can quickly test if your server is affected using What's My Chain Cert?. If your server is OK, it will say "correct chain". If it's sending the expired intermediate, it will say "trusted chain containing an expired certificate" and provide you with a link to download a correct, non-expired chain.
Fixing this problem as a client operator
In a perfect world, all of your libraries would be up-to-date and you wouldn't be using clownish TLS implementations like GnuTLS. But the world isn't perfect. OpenSSL 1.0.x is still common, and curl used it as recently as Debian Stretch. And APT, the package manager used by Debian and Ubuntu, links with GnuTLS.
Fortunately, OpenSSL 1.0.x and GnuTLS (at least on Debian) only choke on the expired intermediate
if the AddTrust External CA Root root is in the local trust store. If it
isn't, they will build a chain to USERTrust RSA Certification Authority instead.
On Debian (and probably Ubuntu but I haven't tested), you can easily remove this
root from the trust store as follows:
- Edit /etc/ca-certificates.conf and put a bang/exclamation mark (!) before mozilla/AddTrust_External_Root.crt
- Run update-ca-certificates
For Fedora and RHEL, see this Tweet by Christian Heimes.
February 8, 2020
Short Take: Why Trust-On-First-Use Doesn't Work (Even for SSH)
Considering all the progress that has been made over the last decade making SSL certificates on the Web easy, free, automated, and transparent, it's a bit jarring to see someone arguing in 2020 that trust-on-first-use (TOFU) would be better for the Web:
> Unpopular opinion. Most people would be better off with a Trust On First Use system for accessing sites. Like SSH, perhaps with some unique (per user) OOB addition to it. Would we really design it this way of starting again?
>
> — Nick Hutton @nickdothutton, Feb 6, 2020
First, be wary of any comparison with SSH, because in the grand scheme of things, very few people use SSH. *nix sysadmins do, obviously. Many, but not all, software developers do. Some people in engineering/science fields might. But that's a drop in the bucket compared to the Web, which basically everyone uses. So just because something appears to work for SSH doesn't mean it will work for the Web.
And I would argue that TOFU actually doesn't work very well for SSH, and the only reason we put up with it is because of SSH's low deployment. SSH server host keys rarely change (which is bad for post-compromise security, so this is nothing to celebrate), but when they do, SSH handles it very poorly. The user gets a big scary message about a possible man-in-the-middle attack. And then what do you think they do? They do this:
> Hi all,
>
> It appears that as of midnight last night, SSH and login are working. However, there were a couple students last night who were getting errors such as “REMOTE HOST IDENTIFICATION HAS CHANGED!” or “POSSIBLE DNS SPOOFING DETECTED!” when trying to SSH in.
>
> To fix this, you can run `ssh-keygen -R [REDACTED]` then try to SSH in again. I believe someone else mentioned last night that you could also just delete the entire ~/.ssh/known_hosts file as well to fix the issue, but this seems to be a less destructive solution.
That's from a real email that I once received. I would not be at all surprised if TOFU actually devolves to opportunistic encryption in practice, because users just bypass any man-in-the-middle error they receive.
You could make it really hard to bypass man-in-the-middle errors, but then people would brick their servers, as happened with HTTP public key pinning, which is one of the reasons why that technology is now extinct.
Proponents of TOFU might say that even if TOFU devolves to opportunistic encryption, the man-in-the-middle errors at least make attacks noisy. True, but the errors are seen by people who generally don't know what they mean and even if they did, can't evaluate whether an error is a legitimate key change or an actual attack. In contrast, a PKI with Certificate Transparency (i.e. the system currently deployed on the Web) also makes attacks noisy, but alerts about new certificates go to server operators, who actually know whether a new certificate is legitimate or not. They just need to be monitoring Certificate Transparency logs.
So yes, I do believe we would design the Web this way if starting again.