I would like to live in a world where CrowdStrike is directly liable for every dollar lost and has to pay for all of it. Only then would these companies start to take quality seriously.
Of course, in our real world, they are unlikely to pay anything at all and will just continue operating as-is.
>I would like to live in a world where crowdstrike is directly liable for every dollar lost and has to pay for all of it.
That wouldn't make any sense.
The blame realistically lies with each company that allowed a critical single point of failure just to get checkbox software, rather than expending the effort to make sure the company can actually function on its chosen infrastructure.
Exactly. The customer is to blame here, not the seller. Buyer beware.
If airlines skimp on maintenance and planes crash, well, all those dead passengers should have picked an airline that takes maintenance more seriously. Next time they'll know better.
There is no next time; they're dead, so it's not a good selection criterion.
Agreed that both sides share some liability, but a vendor that claims to create security software with root-level code-injection access must be held to the highest possible standard, and the highest liability.
If someone puts up a shack and sells hamburgers and hundreds of people die from the rotten meat, they most certainly are liable.
But apparently CrowdStrike can brick computers remotely, causing enormous damage including some deaths, and just say oopsie and walk away without any liability.
> I would like to live in a world where crowdstrike is directly liable for every dollar lost and has to pay for all of it.
Or a world where the regulator that required the checkbox (even when a firm can demonstrate a superior way of achieving the same objective) has to pay for it.
It's absolutely ludicrous the US airline industry still has no contingency plan for "a bad software update is pushed by a vendor to our Windows systems."
Acknowledge that it’s a distributed system and build accordingly. The distributed systems literature understands failure modes, including Byzantine failure. Old airline systems were built to work with simple protocols and limited communication. A ticket was an independently verifiable object. Reservations were on a printed list. Check-in and gate agents did not need to make a phone call for each passenger. This was expensive, but it worked.
One can do the same thing with computers. Build small systems that operate independently, with strictly defined inputs and outputs. Make them un-brickable. Distribute software on read-only media, or outright replace the CPU for each update. You roll back by having the staff unplug the new thing and plug in the old thing.
A high-spec CM4 is $90. A small USB stick costs basically nothing. A bespoke CM4-like machine with only ROM could be made in volume for relatively little money. Most of the problem with an approach like this is that the industry doesn’t think this way.
Maybe a big buyer should step up, e.g. a major military. Non-CrowdStrike-able tech for, say, the US military seems like it would be extremely valuable.
That was the de facto state of things up until Windows 10 moved to a forced-update model. In that world we regularly had massive malware waves, and the resulting massive botnets constantly disrupted critical infrastructure.
We moved away from that for a reason. And the real fix is to make updates not take down the whole system. When was the last time every Linux system in the world was unable to boot all at the same time?
Even the CrowdStrike issue on Linux only took down a very limited subset of installs, and the blast radius was far smaller because the Linux kernel is a lot more sane and cautious.
Crowdstrike's contracts and terms of service will have clauses about how much liability they have when things go wrong. I have no idea what Crowdstrike's policy is, but pretty often, the liability is limited to the amount of money you've paid them during the outage.
I've been involved in procurement at a big corporation and one thing we always modified in contracts was making the vendor 100% liable for any damages caused by their outages, but many vendors wouldn't make that modification.
You can. Just be a massive company with a massive contract and demand it as part of your contract negotiations. It’ll probably cost you a pretty penny, but you could make it happen.
Realistically, this doesn’t happen because the returns don’t make sense. If vendors are liable for your business, they need to account for your business in their costs. That drives their costs (and prices) up.
That is not a bad thing. Today all these shaky companies exist because they can take all the profit and externalize all the downsides, laughing all the way to the bank.
If they were made liable, one of two things might happen, both of which are positive outcomes.
One, they might upgrade their engineering and quality processes to the point where they can guarantee they won't be the root cause for taking down large parts of the industry. Sure it'll cost more but if they can do it and still make a profit (just less), all is well.
Two, maybe it simply can't be profitable to build these root backdoor systems with enough layers of safety, in which case the companies will disappear. This is also good; if the product can't be safe, it should not exist.
If a bridge has maintenance issues causing it to be unusable while it is being fixed, the bridge makers are not on the hook for the total downstream economic loss.
Crowdstrike's impact was more than Delta's cancelled flights? How did you arrive at the conclusion it was limited to Delta?
Even still, Delta had a really bad time recovering, and is still cancelling flights days later. It's not just "a day or two". At a billion a week, with 30-40% cancellation rates, that's 300-400M just for this one customer. And that's just lost revenue. Imagine the extra costs: customer service complaints, hardware / IT restoration, extra wages for flight attendants working double duty to keep the remaining flights going. Madness.
Even just in the travel segment, how many hotels, car rentals, Uber/Lyft rides, etc. were cancelled because of missed flights? How much do you think they paid, on top of the lost revenue, to handle customer complaints, IT restoration, and so on?
The repair costs alone at a given airport must be staggering, as every terminal screen is showing a BSOD and needs a tech to manually enter the BitLocker recovery key.
Well, the comment I replied to didn't even try to quantify the total impact, and only mentioned Delta flights being canceled. There might be a bazillion other ways it can add up to $5.4B, but "flights Delta cancelled and will be on the hook to refund" does not come anywhere close to justifying that number.
Delta only started to recover today, so it's more like 4-5 days of canceled flights. Not saying it's a huge difference, but "1-2 days" greatly underplays how bad it was for people flying Delta versus any other airline.
Most of that was due to knock-on impacts. The Delta scheduling software couldn't handle the number of changes being made to staff schedules, routes, and flights due to the impacts of the CrowdStrike outage.
Way more than Delta flights were grounded. If you include all the hours lost by all those people and assign some monetary value, it's probably more than that. Not only that, but all the hospitals. God forbid someone died because of it. I'm surprised CrowdStrike isn't bankrupt.
How does a company this big not have automated tests for their config files, and not have gradual/staggered rollouts for their deployments?
Is there some good reason for this approach (need to get config updates into the wild as quickly as possible to combat zero-days or zero-hours?) or was this just a massive oversight?
Side rant... their postmortem took forever to get to the point, first explaining all their jargon and product names. Makes me really appreciate the Cloudflare ones.
Features get pushed to next quarter all the time. Tag some Jira L2s for [Future Sprint Owner] to worry about.
But you don't release until QA is done, especially if you're touching safety-critical systems. Turns out CS didn't realize their software was running in safety-critical systems at airlines and hospitals.
If you suggested we let potentially machine-bricking changes through without 100% coverage from basic testing, including reboot testing on many different hardware/VM combinations, then I would fire you in that same meeting if you weren't joking with me (and I was in a position to do it).
And because you knew it was coming, you have the fix ready (which you didn't get paid any extra for), the project gets declared a success by the C-suite, and the idiots who devised it and almost destroyed the business get rewarded…
I am wondering if the update passed the test farm just fine, but the issue happened when the file was moved to the update distribution system: the file was copied as all nulls, and there was no validation check that it posted OK. Compounded by no validation check on the file after download by the end system. Compounded by not having a staged rollout process for updates.
>Is there some good reason for this approach (need to get config updates into the wild as quickly as possible to combat zero-days or zero-hours?) or was this just a massive oversight?
I'm curious about this too. Has there ever been a scenario where a zero-day could have caused so much damage that it would warrant this speed in patching? Would a cost-benefit analysis justify x% of patches failing like this as the price of preventing whatever security issues would occur without this type of infrastructure?
My bet is they have some normal process for updates that has testing but that process is only enforced by policy, not code, and somebody simply decided it was a waste of their time.
Yeah, management should sternly declare that all code and config must be tested before deployment to prod. Millions of companies have issued this order, and ever since they have been running problem-free for decades.
Without regulation, best practices are simply opinions and suggestions. “Please do” is insufficient for critical infrastructure. See: financial infra regulatory apparatus.
They very likely have automated tests. However, what if a bug only triggers 90% of the time and you hit the lucky 10% during automated testing? Of course you can run the tests 100 times, but... is that common practice? Moreover, we have both code-level and anecdotal evidence that the bug may indeed trigger randomly. Tavis Ormandy posted a rough analysis of the crash context: https://x.com/taviso/status/1814762302337654829. It looks like the crash is caused by first checking whether an uninitialized pointer is NULL and, if not, dereferencing it. If the uninitialized leftover data just happened to be zero, no crash occurred.
And anecdotally, we saw people reporting that repeatedly rebooting their machines for 15+ times fixed the problem for them - because eventually you got lucky and in a boot it didn't happen and CrowdStrike managed to update itself to not crash.
> not have gradual/staggered rollouts for their deployments
No idea. Maybe their poor reliability guy got overridden by another team, like "How dare you delay our important definition update? We're racing with threat actors!" I hope they learned their lesson.
Often this is not because the team doesn’t know about these things but because they have low staffing or other priorities or deadlines. This event looks to me like company rot that can be laid at the CEO’s feet
Can't wait for the class action lawsuit. The total impact is likely greater than $5.4B. A significant number of people must have died due to the impact this had on hospitals and emergency services.
THERE IS NO WARRANTY THAT THE SOFTWARE OR ANY OTHER CROWDSTRIKE OFFERINGS WILL BE ERROR FREE, OR THAT THEY WILL OPERATE WITHOUT INTERRUPTION OR WILL FULFILL ANY OF SOFTWARE USER’S PARTICULAR PURPOSES OR NEEDS. THE SOFTWARE AND ALL OTHER CROWDSTRIKE OFFERINGS ARE NOT FAULT-TOLERANT AND ARE NOT DESIGNED OR INTENDED FOR USE IN ANY HAZARDOUS ENVIRONMENT REQUIRING FAIL-SAFE PERFORMANCE OR OPERATION. NEITHER THE SOFTWARE OR ANY OTHER CROWDSTRIKE OFFERINGS ARE FOR USE IN THE OPERATION OF AIRCRAFT NAVIGATION, NUCLEAR FACILITIES, COMMUNICATION SYSTEMS, WEAPONS SYSTEMS, DIRECT OR INDIRECT LIFE-SUPPORT SYSTEMS, AIR TRAFFIC CONTROL, OR ANY APPLICATION OR INSTALLATION WHERE FAILURE COULD RESULT IN DEATH, SEVERE PHYSICAL INJURY, OR PROPERTY DAMAGE. SOFTWARE USER AGREES THAT IT IS SOFTWARE USER’S RESPONSIBILITY TO ENSURE SAFE USE OF SOFTWARE AND ANY OTHER CROWDSTRIKE OFFERING IN SUCH APPLICATIONS AND INSTALLATIONS.
The parent is talking about deaths. You can't use terms of service to limit liability regarding death, and a ToS is not law. I'm pretty sure it can be litigated. You can't just say "I am not responsible for death or injury" and skip all regulatory requirements for safety-critical systems.
It seems potentially tricky because they didn't just say they're not responsible for death or injury.
They essentially got the customer to accept a contract that says the software isn't designed for use in systems where failure could cause death, and that the customer accepts responsibility for using it appropriately.
I agree this whole incident was a massive blunder by CrowdStrike, but I'm not sure it makes sense to hold them liable for damage caused by customers using the product in places they explicitly agreed not to use it. In those cases, I think the organization that installed CrowdStrike's software in inappropriate places bears a lot of responsibility for the outcome, and their failure to understand the TOS they agreed to doesn't mean it's not a legally binding contract.
Perhaps, but it specifically says you should not use it for things like air traffic control or where life and limb are on the line. If a rope manufacturer specifically warns that a rope should not be used for climbing, can we hold them responsible when someone climbs with the rope anyway and dies in a fall?
If all the ropes the company sold got an update that caused all the ropes to break around the same time, and people were injured and property was damaged, I would think the rope company would be liable.
Is the position that CrowdStrike should not be used on anything important because their software can not be trusted? I mean, that's where I'm at now, and I bet many others feel the same.
Hmm, I was talking with my little sister about picking up some of their stock, because these things always just blow over and the stock reverts back to basically where it was, in time.
I don't think many enterprises will switch because of the effort required; instead, they'll just yell at the account reps for a while and then go back to paying the invoice. However, a big lawsuit is something I didn't think of.
Can someone with a background in contract negotiation, vendor onboarding, and business continuity risk management share their expertise? We'd love to hear about typical vendor contract provisions that protect customers in situations like this.
If damages can be demonstrated, what are the chances of airlines successfully claiming compensation? Or, in practice, do such cases usually result in significant discounts during the next contract renewal rather than actual damages paid out?
Most contracts have indemnity clauses that protect from, or cap, damages due to vendor issues. You can get a court to overrule that if you can prove something like gross negligence, or that such provisions don't apply to something like safety-critical airline systems.
CS could push back, saying they just offer endpoint protection and it's on your org for where you put it. Kind of like IKEA saying "hey man, we just make end tables" when someone decides to put Lack tables on every airplane and they turn out to be super flammable.
If the liability is capped at the cost of the duration of the incident (70 minutes, from CrowdStrike's PR-messaging perspective) or one month's service charge (both pretty normal in standard contracts), then it's only outside of the contracts that some equity could be achieved. Not holding my breath, though.
Hopefully this will make BigCos think twice about forcing their employees to fill their computers with "security" malware that slows productivity to a crawl.
The value proposition of Crowdstrike is exactly that: something that you can deploy to tick the regulatory checkbox of "we have endpoint protection from a reputable company everywhere" without consuming outrageous system resources.
That's why they have so many enterprise customers. They're the only game in town that won't slow down your servers arbitrarily while still convincing an auditor that you do have an antivirus.
Too bad they also crash your whole system every now and then.
I think this is going to be a huge boon for Dell. We had so many older computers that got completely hosed. Lots of Latitude 5400s died completely. All will need replacements.
Hardware failure hosed or just needing a re-image hosed?
Unless the hardware was already near failure, I don't see how this could cause hardware failure. The worst-case scenario was the machine constantly rebooting, but after 3 failed boots (I think; somewhere around that number) it should have launched into Windows RE.
Does anyone else feel a little sympathy for CrowdStrike? They pushed out something they should not have. OK. That is bad. But a couple days on and the bulk of the difficulties seem to be from how windows handled the situation: The BSODs, the boot loops, the inability to recover from a basic fault. I feel that if this did happen in a linux environment (it could) that it would be easier to isolate and boot systems into some sort of temporary mode. Linux would communicate and offer options. The windows-specific trend of just abandoning all hope, giving up and throwing the BSOD at the user ... CrowdStrike didn't create that.
What do you mean? They wrote the kernel driver. With great power comes great responsibility.
If you're writing a kernel driver that is deployed throughout a great portion of Fortune 500, with the money that that entails, then you should definitely be able to afford to pay people to write defensive code and have proper pipelines in place.
Why should there be any sympathy for them? Their business is shitty boss-ware. I care about boss-ware makers as much as I care about tobacco companies.
According to the discussion in the thread, you're correct. Also, it was a $10 gift card for... Uber Eats, where you can't get anything for less than ten bucks.
These are deeply insulting. My company sometimes sends out $10 cards for DoorDash. To actually get something I would have to add at least another $10 myself.
I wonder if Uber Eats and DoorDash give these out for free to companies as promotion. I bet most people who use the cards spend another $20 or more.
I guess it's easier to buy X number of gift cards and allow that company to deal with the individuals rather than paying each customer $10 individually.
Also, does the IT manager get that gift card? Do they share it with the rest of the team? Does the CTO get the card and share it with the rest of the C-suite? What's the proper way of handling that, other than rejecting the offer with a harsh laugh in their face?
Unlikely. Big enterprises are slow to change and switching provider would be a massive change.
They’re more likely to investigate running two platforms simultaneously or, more likely, talk to competitors purely as a bargaining chip to negotiate a bigger discount for the next renewal.
Yeah, that's my gut feeling. Also, no one at these enterprises is to blame: they can all point fingers at CrowdStrike and abuse their account reps. That way they don't have to put in the work to actually switch vendors while appearing to be "on it". I bet they all get pretty nice discounts on renewal, like you said, but that will be it.
They were valued near $100B prior to this incident and even now their valuation is north of $60B. Even if they were held financially responsible for all $6B in damages - which is definitely not going to happen - that doesn't seem like a company-ending scenario.
Their biggest problem is obviously going to be with customer retention, but there are huge technical and regulatory hurdles their customers would have to go through to switch to a competitor. I'm sure many of their customers will accept this as an isolated incident and be quick to accept CrowdStrike's assurances that this won't happen again.
I think Delta Airlines is probably in more trouble right now than CrowdStrike.
They profited ~$125M last quarter; depending on their profit growth, eating $5B of lawsuit judgments could consume 6-12 years of profits. Even 10% of that is a whole year's profits.
The real problem is their stock crashed, so their best engineers will probably leave, and then it will become increasingly worse in a feedback cycle until possible eventual bankruptcy.
Ideally the stock price should be pushed up to attract better engineers to go in and fix shit. If stockholders agree to bid up the stock 100% YoY, hell, even I'd look for a job there, and help fix shit in return for some juicy RSUs.
If you take away their funding, you can only expect worse in the future.
They just lost a lot, though, so there's less incentive to stay. But they provide a public good, so it would help for the public to bid the stock up to get the employees to stay, fix things, and vest, rather than ditching the company.
I think they'll be a fraction of their current size in just a couple of years. This was just too big and there are too many competitors out there for the space. The market will fix this problem, slowly as usual
I think what's more likely is their stock will drop for a while, this will blow over, and everyone will continue to pay their licence fees as if nothing ever happened. Don't underestimate the inertia, ineptitude, and resistance to change that permeates the upper echelons of large corps.
1) These types of products cause incidents all the time, this was just a very high impact one that happened to affect everybody all at once.
2) Their product is very good compared to the competitors.
All products in this space are black boxes, but CS is one of the least black-boxy: the alerts it produces are decent, the tooling it comes with is especially good (from an operator perspective), and the reporting it produces is exactly the sort of thing decision makers in big enterprises love to see.
I doubt there’s going to be much churn from this, definitely not an existential amount. As much as I personally can’t stand the organisation, I think they absorbed most of the bad press on behalf of all the service providers they took offline.
This was what I kept thinking about when it happened. Every job I've ever had has had days of some system or other being down for some amount of time. Hell, even WFH has put me offline when I've had extended power outage after a major storm. The only thing different about this was that it was all the companies at the same time because of one glitch. For all of the companies that did not use this system, it was just another day of the week.
Yeah, I know using free software isn’t a panacea. Still it would be a step in the right direction, plus I could not refrain from the cheap shot at M$ Windows.
This isn't your personal computer/homelab where you can get away with using common sense antivirus or even windows defender. Software like crowdstrike are often used in industries where they're mandated to install such software for compliance reasons (eg. PCI-DSS). Even if you were using linux you'd still need to install it, and crowdstrike previously had issues with their linux agent. It was just uncommon enough that it didn't hit the news.
Except that seems like a maintenance nightmare day to day. There's bugs in the linux version but not the windows version, not to mention having to write two sets of software. Imagine having to get your app's prod to work on both windows AND linux.
Agreed. It should be deployed entirely on Linux. Rip and rebuild is much easier on Linux. Using Windows as a server should be seen as a dark pattern in 2024.
For EMS, hospitals, Windows makes sense on the server because they don't know any better. For anyone remotely technologically competent, Windows shouldn't even be considered an option other than as workstations. Linux on the server is the only way and no one can convince me otherwise.
>Using Windows as a server should be seen as a dark pattern in 2024.
>Linux on the server is the only way and no one can convince me otherwise.
Now meet the sysadmin that thinks the same, but for windows for clients. At the risk of overgeneralizing, people are only for "diversity" when it means supporting their preferred underdog platform (eg. linux desktop). When they're the dominant incumbent it's suddenly "dark pattern", "they don't know any better" and "no one can convince me otherwise".
Moving Defender to user space is a prerequisite for locking down Windows, from a fair-competition perspective.
Microsoft is currently blaming the EU Commission for not allowing it to lock down Windows; compare https://www.telegraph.co.uk/business/2024/07/22/microsoft-bl...
All previous comments on HN about the incident... I've seen absolutely no one praising the thing as a security solution but a lot of people posting that it's bought to pass audits.
As someone who used CrowdStrike daily and worked as an MDR analyst and engineer at a top-ranked MDR provider, I can say CrowdStrike is a very capable piece of tech.
While the driver for purchase is almost always to pass audits, it's still a good product.
Please tell me you aren’t carelessly repeating a braindead figure you read because an LLM TOLD SOME GUY SO. Because you wouldn’t do that, right? Please.
>one of the greatest problem for them is that they bypassed customers deployment policies
Caveat emptor. Falcon and other similar security products often push updates at-will, and they're fully transparent about this if you actually read the contract terms and understand the vendor's approach to operations. I have worked with many clients that elect not to use such tools in certain sensitive environments, specifically to mitigate the risk of being impacted by something like CrowdStrike's 7/19 event.
To respond to threats faster, and without direct involvement of the owning company, since their GPO or other updaters/control systems may also be compromised.
You're free to choose an EDR vendor that allows you to defer definition updates. Remember, this is enterprise sales for multi-billion dollar companies, so the usual excuse of "take it or leave it" doesn't really apply.
"A few people on twitter are saying this thing happened. We didn't actually talk to them, we didn't look at the emails and verify their authenticity ourselves, we just trusted some twitter screenshots and wrote a blogspam article stating it as truth.
We put absolutely no critical thought into whether this was a likely thing, and we completely ignored the many government and media reports that are credibly sourced which state that there are known phishing scams and other threat actors trying to capitalize on this incident.”
I highly doubt this is something that Crowdstrike actually did.
Edit: Amazingly they did, the article has been updated with a statement. Amazingly stupid all around.
https://www.washingtonpost.com/transportation/2024/07/23/del...