> > The intermediate certs would’ve had Expirations before this month > No. Nobo...

tialaramex · on Sept 21, 2021

You say "parent" but as I keep explaining it's a graph. It's not a tree. It's not even a DAG, it's a full blown "Oh dear that looks like a hard problem" graph.

There actually is a good reason to issue certificates from an expired root in particular, and in fact Let's Encrypt has one, a certificate for ISRG Root X1 signed by the expiring DST Root CA X3 for some few extra years. Why do this? Android clients don't care about expiry in root certificates. In the absence of actual root store updates they will continue to trust this root, which makes some sense.

But that's not what this is about (although it's interesting). The key thing to understand is that it's a graph.

hamburglar · on Sept 21, 2021

Each cert was begotten from a specific signer. That’s all that is meant by “parent”. And the distinction between a tree and a graph is irrelevant for purposes of TLS cert validation. Each “Certificates” message should contain a linear chain, and each thing in the chain has another thing in the chain that signed it (which I am referring to as its parent). Walk that chain and verify that all of the certs in it are valid and eventually end in a valid cert that you trust a priori (and in many stacks, this MUST be a self-signed cert). If a particular platform chooses to continue trusting expired roots, that’s peculiar to that platform. I get it in the case of embedded stuff that may or may not ever get updated, but it’s strange nonetheless.

tialaramex · on Sept 21, 2021

> Each cert was begotten from a specific signer. That’s all that is meant by “parent”.

Then this parent is a key pair, and can't expire. Certificates expire and certificates don't sign anything.

> That’s all that is meant by “parent”. And the distinction between a tree and a graph is irrelevant for purposes of TLS cert validation.

And this, ladies and gentlemen, is how you get bugs of the sort which will trip people up in about a week or so. "I can't see how to solve this hard problem, but I wrote a good solution to a different easier problem, I hope that's enough".

> Each “Certificates” message should contain a linear chain, and each thing in the chain has another thing in the chain that signed it (which I am referring to as its parent).

This is truly how TLS 1.2 and earlier described the message. It hasn't been a good idea for years, and TLS 1.3 finally explains that no, you probably don't want to do this with the Certificates message (a "chain" in the usual parlance).

> Walk that chain and verify that all of the certs in it are valid and eventually end in a valid cert that you trust a priori (and in many stacks, this MUST be a self-signed cert).

Yup. This bad algorithm is exactly what's going to blow up for some people in just over a week. Don't do this.

tialaramex · on Sept 21, 2021

I had to dash off to play D&D, so I stopped once I had written a bare explanation, but let's take an extra moment for why exactly this is going to blow up for some people.

> Walk that chain and verify that all of the certs in it are valid and eventually end in a valid cert that you trust a priori (and in many stacks, this MUST be a self-signed cert).

Imagine we're following this naive algorithm, we trust both ISRG Root X1 (which is many years from expiring) and DST Root CA X3 (which expires next week), and imagine that it's October 1st 2021 and we are seeing a pretty typical HTTPS server with a Let's Encrypt certificate.

Walking the chain we see several certificates, our algorithm tells us to check all of them:

1. An end entity certificate, for example.com signed by R3, this certificate seems fine, hasn't expired, valid, tick, moving on

2. An intermediate certificate, for R3 signed by ISRG Root X1, this certificate seems fine, it has CA:TRUE as necessary, hasn't expired, valid, tick, moving on

3. Another intermediate certificate, for ISRG Root X1, signed by DST Root CA X3, this certificate seems fine, it has CA:TRUE as necessary, hasn't expired, valid, tick, moving on

4. DST Root CA X3 self-signed certificate, which has expired. Invalid, fail, this chain is untrustworthy.

But wait, after step 2 we reached ISRG Root X1, which we know is trustworthy. We were actually done! Why are we looking at these other certificates at all, much less failing the whole chain?

And this is the bug in older OpenSSL versions (and older LibreSSL, and older GnuTLS), which is why upgrading your clients to OpenSSL 1.1.0 or newer, if they can, is key.
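The walk-through above can be sketched as a toy model. This is only an illustration under loud assumptions: the `Cert` class, both function names, and the end-entity and intermediate expiry dates are made up for the example, signatures are not checked at all, and it is not how OpenSSL or any other library is actually written. It only shows why checking every certificate that was sent gives a different answer from stopping at the first trust anchor.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Cert:
    # Hypothetical stand-in for an X.509 certificate: names and expiry only.
    subject: str
    issuer: str
    not_after: date


def naive_validate(chain, today):
    """Check every certificate that was sent, right to the end of the chain."""
    return all(cert.not_after >= today for cert in chain)


def anchor_aware_validate(chain, trust_store, today):
    """Stop as soon as a certificate was issued by an unexpired trust anchor."""
    for cert in chain:
        if cert.not_after < today:
            return False
        anchor_expiry = trust_store.get(cert.issuer)
        if anchor_expiry is not None and anchor_expiry >= today:
            return True  # Done: nothing further up the chain matters.
    return False


today = date(2021, 10, 1)

# Trust store maps anchor name to its expiry date.
trust_store = {
    "ISRG Root X1": date(2035, 6, 4),
    "DST Root CA X3": date(2021, 9, 30),
}

# The chain as sent by the server, leaf first (illustrative expiry dates).
chain = [
    Cert("example.com", "R3", date(2021, 12, 1)),
    Cert("R3", "ISRG Root X1", date(2025, 9, 15)),
    Cert("ISRG Root X1", "DST Root CA X3", date(2024, 9, 30)),
    Cert("DST Root CA X3", "DST Root CA X3", date(2021, 9, 30)),
]

print(naive_validate(chain, today))                      # False
print(anchor_aware_validate(chain, trust_store, today))  # True
```

The naive version fails on certificate 4 even though certificate 2 already led to a trusted root.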

hamburglar · on Sept 22, 2021

In this scenario, most (not all) stacks will stop when they reach ISRG Root X1, which is trusted because it's in your trusted roots store. You're correct, there is no reason to continue validating because we have run into a cert that is explicitly trusted. That should be fine. The chain only needs to continue until it hits an explicitly-trusted cert. My recollection of the TLS RFC is that it will stop there. In fact, the Certificates message doesn't even need to have an ordered chain, and the entire chain doesn't even have to be relevant. You can have your end entity cert, some random irrelevant intermediate, the root cert, and then the intermediate and (again, this is from memory, it's been a while since I actually read it carefully) RFC says that you just must be able to build and verify the cert path up to a trusted root by finding all the intermediates in that list of certs.
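A minimal sketch of that idea, treating the message as one end-entity certificate plus an unordered bag of other certificates and searching for any path to a trusted root. Everything here is hypothetical (the `Cert` class, `build_path`, the "Unrelated CA" entry, most of the dates), signature checks and the rest of RFC 5280 are left out, and the sketch assumes an expired trust anchor should not be accepted, which is a policy choice rather than something every platform does.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Cert:
    # Hypothetical stand-in for an X.509 certificate: names and expiry only.
    subject: str
    issuer: str
    not_after: date


def build_path(cert, pool, trust_store, today, visited=frozenset()):
    """Depth-first search from cert towards any unexpired trust anchor.

    Returns the list of certificates on the path (leaf first), or None.
    """
    if cert in visited or cert.not_after < today:
        return None
    anchor_expiry = trust_store.get(cert.issuer)
    if anchor_expiry is not None and anchor_expiry >= today:
        return [cert]  # Issued by a trusted, unexpired anchor: stop here.
    for candidate in pool:
        if candidate.subject == cert.issuer:
            rest = build_path(candidate, pool, trust_store, today, visited | {cert})
            if rest is not None:
                return [cert] + rest
    return None


today = date(2021, 10, 1)
leaf = Cert("example.com", "R3", date(2021, 12, 1))

# Deliberately out of order, with one irrelevant certificate mixed in.
pool = [
    Cert("Unrelated CA", "Some Other Root", date(2030, 1, 1)),
    Cert("DST Root CA X3", "DST Root CA X3", date(2021, 9, 30)),
    Cert("ISRG Root X1", "DST Root CA X3", date(2024, 9, 30)),
    Cert("R3", "ISRG Root X1", date(2025, 9, 15)),
]

both_roots = {"ISRG Root X1": date(2035, 6, 4), "DST Root CA X3": date(2021, 9, 30)}
old_root_only = {"DST Root CA X3": date(2021, 9, 30)}

path = build_path(leaf, pool, both_roots, today)
print([c.subject for c in path])                      # ['example.com', 'R3']
print(build_path(leaf, pool, old_root_only, today))   # None
```

With ISRG Root X1 in the store the search never looks at the expired root. With only the old root trusted, every path ends at something expired and validation fails.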

The fact that some software stacks also happen to insist that that final step lands on a self-signed cert is the problem that will cause this cert validation to continue up the chain past ISRG Root X1 and blow up at the expired DST Root CA X3. It's not wrong because it's validating the expiration date on the root cert, it's wrong that it's ever reaching that root cert at all.

If you have references that say I'm wrong about this, I'm honestly interested. I just don't think the solution to this type of problem is ignoring the expiration dates, I think it's pushing out a trusted cert that's further down the chain, a la ISRG Root X1.

tialaramex · on Sept 23, 2021

> In this scenario, most (not all) stacks will stop when they reach ISRG Root X1, which is trusted because it's in your trusted roots store.

That isn't what you wrote. Machines don't do "What I really meant", only what you actually wrote. Unfortunately the people who programmed libraries like OpenSSL had the same attitude as you (until e.g. OpenSSL 1.1), and so in about a week a bunch of people are going to regret that.

> My recollection of the TLS RFC is that it will stop there.

The TLS RFCs don't offer any opinion about how you should make trust decisions. Accordingly even before RFC 8446 was finished, the most important clients (the web browsers) treat Certificates as one end-entity certificate plus some number of other documents which might be useful or might be irrelevant.

This also makes them robust against the most common misconfiguration which is failure to provide the intermediates. Since they don't care they will soldier on anyway, either by AIA chasing or, in the case of Firefox, by including the entire set of trusted unconstrained intermediates in every install.

> The fact that some software stacks also happen to insist that that final step lands on a self-signed cert is the problem that will cause this cert validation to continue up the chain past ISRG Root X1

That's not really key. They just don't stop, the self-signed certificates are a coincidence not a necessary element.

> If you have references that say I'm wrong about this, I'm honestly interested.

What does it mean to be "wrong"? I'm sure you will continue to believe you were correct, and yet the exact steps you wrote down are going to blow up for people in just a few days. As a programmer, I call that "wrong" but I'm quite sure you feel you weren't wrong at all, just misunderstood.

hamburglar · on Sept 24, 2021

> That isn't what you wrote. Machines don't do "What I really meant", only what you actually wrote.

This isn't code for a machine, it's a conversation. If you interpreted it in a way that sounds broken to you, please tell me exactly how and I'll either clarify or learn something new. All I'm doing is casually describing the RFC, and yes, I may have stated something ambiguously or incorrectly. Help me out here. Are you saying the RFC describes a bad algorithm or that I've described it badly?

> The TLS RFCs don't offer any opinion about how you should make trust decisions

The TLS RFCs refer you to RFC 5280 for validation, which does explicitly specify the algorithm.

> I'm sure you will continue to believe you were correct, and yet the exact steps you wrote down are going to blow up for people in just a few days

I'm not seeing how my steps will blow up, although the phrase "eventually end in a valid cert that you trust a priori" could be interpreted differently than I meant it. "Eventually end in" here is referring to the validation algorithm ending, not the cert chain. Can you elaborate on how you think this algorithm fails?

> As a programmer, I call that "wrong" but I'm quite sure you feel you weren't wrong at all, just misunderstood.

Could I ask you to be a little less aggressive in this conversation? We aren't having a fight here.

Edit: I think we're actually in violent agreement here. Your statement "But wait, after step 2 we reached ISRG Root X1, which we know is trustworthy. We were actually done!" is exactly my point. You don't need to ignore the expiration date on the old root if you have issued a new trusted authority below that root and the validation can stop there. You aren't ignoring the expiration on the old root, you're ignoring the entire old root cert because you found a new trusted cert that supersedes it. If you did still have an end entity cert that was issued via a path to the old root that did not include the newer, explicitly trusted cert lower down the chain, THAT end entity validation would fail.

withinboredom · on Sept 21, 2021

It still blows my mind that we’ve focused so much on writing robust software that can handle so many edge/failure cases… yet, an expired certificate blows everything up. As though the simple passage of time should dictate what I can and cannot trust or choose to do. Just ask me, the developer, if I should trust something or not, so I can pass that question on to the user, if it makes sense.

thraxil · on Sept 21, 2021

I think that's largely because "blow everything up" is arguably the correct thing to do on an expired certificate, at least compared to "silently do an insecure thing". If you visit a website with an expired certificate, most browsers do what you suggest and ask the user if they want to proceed at their own risk. All the software that doesn't have a direct user to ask, like basically anything running on a server, has to either fail with an error when something is wrong with the certificate, or open up a security hole that will probably eventually be exposed and exploited.

withinboredom · on Sept 21, 2021

All software has a user, otherwise, what is the point of having the software in the first place? A server’s user is an operator, devops, or whatever the org calls it.

thraxil · on Sept 21, 2021

Not all software has a user present when it runs that can make decisions like that. Probably the vast majority doesn't. If an automated update process that runs on thousands of servers encounters an expired certificate, it's not like it can pause and pop up a prompt to ask me if it should continue or not. I do a lot of immutable infrastructure and don't even have shell access on those machines, so even if I wanted to, it wouldn't be possible.

withinboredom · on Sept 21, 2021

This seems pretty easy to mitigate if someone created an IGNORE_EXPIRATION environment var where any matching cert fingerprint would ignore the expiration date. I have a feeling the more paranoid people would hate it, but it’s better than what we have now where it’s all-or-none ignoring cert validity.
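A rough sketch of what such a check might look like, purely as an illustration of the proposal. No TLS library that I know of reads a variable like this; the comma-separated SHA-256 hex format and the function name `is_expiry_exempt` are assumptions made up for the example.

```python
import hashlib
import os


def is_expiry_exempt(cert_der, environ=os.environ):
    """Return True if this certificate's fingerprint is listed in IGNORE_EXPIRATION.

    Assumes (hypothetically) that the variable holds comma-separated
    SHA-256 fingerprints in hex, compared case-insensitively.
    """
    raw = environ.get("IGNORE_EXPIRATION", "")
    allowed = {fp.strip().lower() for fp in raw.split(",") if fp.strip()}
    fingerprint = hashlib.sha256(cert_der).hexdigest()
    return fingerprint in allowed


# Usage with placeholder bytes standing in for a DER-encoded certificate.
fake_der = b"not a real certificate"
env = {"IGNORE_EXPIRATION": hashlib.sha256(fake_der).hexdigest()}
print(is_expiry_exempt(fake_der, env))            # True
print(is_expiry_exempt(b"something else", env))   # False
```

A validator would only consult this when the sole failure is the expiration date, so every other check still applies to the listed certificates.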