Running an update script every day is good. Certbot defaults to running twice a day. Just use something with similar logic, waiting to renew short-lived certificates until halfway through their validity period. That way the actual load is nice and spread out. And you should get that logic by default if you do a normal setup.
If I would use short-lived certs I would make sure to choose an ACME client that has support for ARI (ACME Renewal Information). Then the CA will tell the client when it’s time to renew.
² By the seventh day God had finished the work He had been doing; so on the seventh day He rested from all His work. ³ Then the on-call tech, Lucifer, the Son of Dawn, was awoken at midnight because God did not renew the heavens' and the earths' HTTPS certificate. ⁴ Thusly Lucifer drafted his resignation in a great fury.
I just got home from a stressful day in retail (oh who am I kidding; every day is stress in retail) and this gave me a chuckle I really needed. Thank you.
Standard memory disclosure: the apple when eaten would be freed, but it would still be read, leaking its contents. Luckily its volume was low, so they couldn't exfiltrate all of it. But still, the heavens are closed for maintenance, pending a rewrite in Rust.
The CA/B Forum defines a "short-lived" certificate as 7 days, which has some reduced requirements on revocation that we want. That time, in turn, was chosen based on previous requirements on OCSP responses.
Those are based on a rough idea that responding to any incident (outage, etc) might take a day or two, so (assuming renewal of certificate or OCSP response midway through lifetime) you need at least 2 days for incident response + another day to resign everything, so your lifetime needs to be at least 6 days, and then the requirement is rounded up to another day (to allow the wiggle, as previously mentioned).
Plus, in general, we don't want to align to things like days or weeks or months, or else you can get "resonant frequency" type problems.
We've always struggled with people doing things like renewing on a cronjob at midnight on the 1st monday of the month, which leads to huge traffic surges. I spend more time than I'd like convincing people to update their cronjobs to run at a randomized time.
I have always been a bit puzzled by this. By issuing fixed length certificates you practically guarantee oscillation. If you have a massive traffic spike from, say, a CDN mass reissuing after a data breach - you are guaranteed to have the same spike [160 - $renewal_buffer] hours later.
Fuzzing the lifetime of certificates would smooth out traffic, encourage no hardcoded values, and most importantly statistical analysis from CT logs could add confidence that these validity windows are not carefully selected to further a cryptographic or practical attack.
- 8 is a lucky number and a power of 2
- 8 lets me refresh weekly and have a fixed day of the week to check whether there was some API 429 timeout
- 6 is the value of every digit in the number of the beast
- I just don't like 6!