Weakest part of CI/CD workflows is when code requires corresponding infrastructu...

throwaway20371 · on Sept 13, 2021

Well, that's because it's not a development workflow; it's a live systems change. It's like changing the tire on a moving vehicle.

If you have a very repeatable environment, you can have an entire pipeline that creates new infra from scratch (w/ Terraform), builds and deploys your new app, tests it, and then points traffic at the new infra. It's like blue/green but bigger. You aren't changing the tire; you're moving from one moving vehicle to another one. That works well because there's no chance of unusual problems from trying to re-jigger things on the fly.
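A minimal sketch of that "move to another vehicle" pipeline, simulated in memory in Python. `provision`, `release`, `rollback`, `Environment`, and `Router` are hypothetical names; in a real setup `provision` would be a `terraform apply` plus the build/deploy step, and `Router` would be a DNS record or top-level load balancer. What it shows is that production is only ever changed by repointing traffic, and a candidate that fails its tests is simply discarded.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Environment:
    """A complete stack (network, compute, app) built from scratch for one release."""
    name: str
    app_version: str


@dataclass
class Router:
    """Hypothetical stand-in for whatever points users at an environment (DNS record, top-level LB)."""
    live: Optional[Environment] = None
    previous: List[Environment] = field(default_factory=list)


def provision(app_version: str) -> Environment:
    # Hypothetical: a real pipeline would build fresh infra here (e.g. terraform apply)
    # and deploy the new build onto it.
    return Environment(name=f"env-{app_version}", app_version=app_version)


def release(router: Router, app_version: str,
            test: Callable[[Environment], bool]) -> bool:
    candidate = provision(app_version)       # new infra; the live stack is never touched
    if not test(candidate):
        return False                         # throw the candidate away; users never saw it
    if router.live is not None:
        router.previous.append(router.live)  # keep the old stack around for fast rollback
    router.live = candidate                  # the only change to production: repoint traffic
    return True


def rollback(router: Router) -> None:
    # Rolling back is the same operation in reverse: repoint at the previous stack.
    if not router.previous:
        raise RuntimeError("nothing to roll back to")
    router.live = router.previous.pop()
```

Rollback is the same cheap pointer flip in reverse, which is a large part of the appeal of immutable infrastructure.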

The former is infrastructure managed in place with configuration management; the latter is immutable infrastructure.

The problem comes in with things like changing an S3 bucket or IAM role. Changing those things is like changing the highway... you can't replace the highway. You have to close down a lane of traffic, put up traffic cones, reduce the speed limit, make your changes carefully. Ideally test on a strip of test highway first.

These cloud-managed services cannot be made immutable, so you have to use configuration management. That means you need a change management system in place, and you have to tightly manage the dependency between your app and the change.
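One hedged way to picture "tightly manage the dependency between your app and the change": number each in-place infrastructure change, require it to land on staging (the "test highway") before production, and have each app release declare the infra revision it needs. Everything here (`InfraState`, `InfraChange`, `AppRelease`, `apply_change`, `can_deploy`, and the revision scheme itself) is a hypothetical illustration, not any particular tool's model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InfraState:
    """Mutable shared resources (S3 buckets, IAM roles) for one environment.
    `revision` counts the in-place changes applied so far (hypothetical scheme)."""
    name: str
    revision: int = 0
    settings: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class InfraChange:
    revision: int  # the revision this change produces
    key: str
    value: str


@dataclass(frozen=True)
class AppRelease:
    version: str
    requires_infra_revision: int


def apply_change(target: InfraState, change: InfraChange,
                 must_be_on: Optional[InfraState] = None) -> None:
    """Apply an in-place change, refusing to skip revisions or to touch
    `target` before the same change has landed on `must_be_on` (e.g. staging)."""
    if change.revision != target.revision + 1:
        raise ValueError(f"{target.name}: expected revision {target.revision + 1}, "
                         f"got {change.revision}")
    if must_be_on is not None and must_be_on.revision < change.revision:
        raise RuntimeError(f"revision {change.revision} not yet applied to {must_be_on.name}")
    target.settings[change.key] = change.value
    target.revision = change.revision


def can_deploy(app: AppRelease, env: InfraState) -> bool:
    # The app may only ship once the infrastructure it depends on is in place.
    return env.revision >= app.requires_infra_revision
```

Usage: apply the change to staging, then to production with `must_be_on=staging`, and only then will `can_deploy` allow the app release that needs it.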

rajamaka · on Sept 13, 2021

You've never seen a blue/green deployment or a DNS change without downtime?

nopurpose · on Sept 13, 2021

I've never seen a _workflow_ where a developer can single-handedly create a well-tested PR that performs all the necessary release steps upon merge.

rajamaka · on Sept 13, 2021

At least for cloud environments, I haven't seen a company that doesn't have such a workflow in place for a long time.

nopurpose · on Sept 14, 2021

What exactly goes into the PRs you've seen? How do they codify these steps such that:

- Release steps can be tested while the PR is being developed, i.e. executed somewhere that doesn't interfere with production or, ideally, with other developers' ongoing PRs should it go bad.

- Upon merge, the production system is safely transitioned from one state to another without human intervention.

For simplicity, the steps are:

- Deploy the new version of the code behind a new load balancer.

- Make some requests through it to verify it's working.

- Switch DNS to the new load balancer's IP.

- Wait for new connections to the old load balancer to die down, then remove both the old LB and the old version of the code behind it.
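Those four steps can be sketched as one function, assuming an in-memory dict stands in for DNS and a `LoadBalancer` object stands in for the real LB plus the code behind it (step 1, deploying behind a new LB, is just constructing one here). `cut_over`, `handle`, and the `connection_count` hook are hypothetical; a real version would hit actual health-check URLs and query the LB's connection metrics. `sleep` is injectable so the drain loop can be tested without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class LoadBalancer:
    """Hypothetical stand-in for a load balancer plus the version of the code behind it."""
    ip: str
    app_version: str
    healthy: bool = True

    def handle(self, path: str) -> int:
        # Pretend to serve a request and return an HTTP status code.
        return 200 if self.healthy else 503


def cut_over(dns: Dict[str, LoadBalancer], hostname: str, new_lb: LoadBalancer,
             connection_count: Callable[[LoadBalancer], int],
             timeout_s: float = 300.0, poll_s: float = 5.0,
             sleep: Callable[[float], None] = time.sleep) -> Optional[LoadBalancer]:
    """Run steps 2-4; return the old LB once it is safe to remove (None if there was none)."""
    # Step 2: make a few requests through the new LB before any user traffic reaches it.
    if any(new_lb.handle(path) != 200 for path in ("/healthz", "/")):
        raise RuntimeError(f"{new_lb.ip} failed verification; DNS left unchanged")

    # Step 3: switch DNS. Clients with cached records keep using the old LB for a while.
    old_lb = dns.get(hostname)
    dns[hostname] = new_lb
    if old_lb is None:
        return None

    # Step 4: wait for connections to the old LB to die down before removing it.
    waited = 0.0
    while connection_count(old_lb) > 0:
        if waited >= timeout_s:
            raise TimeoutError(f"{old_lb.ip} still has connections after {timeout_s}s")
        sleep(poll_s)
        waited += poll_s
    return old_lb  # caller tears down this LB and the old code behind it
```

In this simple version the DNS switch is the point of no return: if the drain times out, the old LB is left running rather than torn down, since some clients may still hold cached records pointing at it.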

gonzo41 · on Sept 13, 2021

I agree; I've started to demerge those sorts of changes. Infra code moves slower (over the long term) than app code, so having separate pipelines is IMO better.

Though on your final comment, I'd say doing something like a no-downtime API migration is actually pretty complicated if you're not doing it often.