CrowdStrike global outage to cost US Fortune 500 companies $5.4B (theguardian.com)
176 points by Terretta on July 24, 2024 | 166 comments


$5.4B seems way too low given the number of flights Delta cancelled and will be on the hook to refund.

https://www.washingtonpost.com/transportation/2024/07/23/del...


I would like to live in a world where crowdstrike is directly liable for every dollar lost and has to pay for all of it. Only then would these companies start to take quality seriously.

Of course in our real world, they are unlikely to pay anything at all and just continue operating as-is.


>I would like to live in a world where crowdstrike is directly liable for every dollar lost and has to pay for all of it.

That wouldn't make any sense.

The blame realistically lies with each company that allowed a critical single point of failure just to get checkbox software, rather than actually expending the effort to make sure the company can keep functioning with its chosen infrastructure.


Exactly. The customer is to blame here, not the seller. Buyer beware.

If airlines skimp on maintenance and planes crash, well, all those dead passengers should have picked an airline that takes maintenance more seriously. Next time they'll know better.


It shouldn't be one or the other, but both


> Next time they'll know better.

No next time, they're dead, so not a good selection criterion.

Agreed that both sides share some liability, but a vendor that says they create security software with root-level code injection access must be held to the highest possible standard, and to the corresponding liability.

If someone puts up a shack and sells hamburgers and hundreds of people die from the rotten meat, they most certainly are liable.

But apparently CrowdStrike can brick computers remotely, causing enormous damage including some deaths, and just say oopsie and walk away without any liability.


> I would like to live in a world where crowdstrike is directly liable for every dollar lost and has to pay for all of it.

Or a world where the regulator that required the checkbox (even when the firm can demonstrate a superior way of achieving the same objective) has to pay for it.


It's absolutely ludicrous the US airline industry still has no contingency plan for "a bad software update is pushed by a vendor to our Windows systems."


If the computers in a computer reliant massively distributed organization go down, what's the alternative?


Acknowledge that it’s a distributed system and build accordingly. The distributed systems literature understands failure modes, including Byzantine failure. Old airline systems were built to work with simple protocols and limited communication. A ticket was an independently verifiable object. Reservations were on a printed list. Check-in and gate agents did not need to make a phone call for each passenger. This was expensive, but it worked.

One can do the same thing with computers. Build small systems that operate independently, with strictly defined inputs and outputs. Make them un-brickable. Distribute software on read-only media, or outright replace the CPU for each update. You roll back by having the staff unplug the new thing and plug in the old thing.

A high-spec CM4 is $90. A small USB stick costs basically nothing. A bespoke CM4-like machine with only ROM could be made in volume for relatively little money. Most of the problem with an approach like this is that the industry doesn’t think this way.

Maybe a big buyer should step up, e.g. a major military. Non-CrowdStrike-able tech for, say, the US military seems like it would be extremely valuable.


Computers shouldn't auto update. Updates need to be scheduled and monitored and tested.


That was the de facto state of things up until Windows 10 moved to a forced update model. In that world we regularly had massive malware waves and resultant massive botnets constantly disrupting critical infrastructure.

We moved away from that for a reason. And the real fix is to make updates not take down the whole system. When was the last time every Linux system in the world was unable to boot all at the same time?

Even the CrowdStrike issue on Linux only took down a very limited subset of installs, and the blast radius was far smaller because the Linux kernel is a lot more sane and cautious.


Every CTO’s easy answer - cool, we’ll just only do them once every 5 years then.

With modern tech stacks, we’re talking hundreds of updates a quarter. If not more.


Sounds like they shouldn't be using those tech stacks then.


Desktop operating systems, mobile devices, browsers... there's a lot that doesn't pass your approval.


My desktop doesn't auto update. It only updates when I ask it to.

My phone does auto update, and nearly every auto update makes it worse and slower. That's been true for nearly every Android I've had.


The sheer awesomeness of Android 3 is staggering by now.


> If the computers in a computer reliant massively distributed organization go down, what's the alternative?

I feel like the answer should be obvious: if you need reliable systems then you don't create a single point of dependency on such fragile systems.

The latest blog from Bruce Schneier is a good writeup on this brittleness:

https://www.schneier.com/blog/archives/2024/07/the-crowdstri...


Failover?


Failover to what?


Crowdstrike's contracts and terms of service will have clauses about how much liability they have when things go wrong. I have no idea what Crowdstrike's policy is, but pretty often, the liability is limited to the amount of money you've paid them during the outage.

I've been involved in procurement at a big corporation and one thing we always modified in contracts was making the vendor 100% liable for any damages caused by their outages, but many vendors wouldn't make that modification.


I’ve heard of that for civil engineering firms but the amount of damages is capped at the yearly contract amount.

Which in this case is probably a lot less than what these companies are paying in clean up costs.


You can. Just be a massive company with a massive contract and demand it as part of your contract negotiations. It’ll probably cost you a pretty penny, but you could make it happen.

Realistically, this doesn’t happen because the returns don’t make sense. If vendors are liable for your business, they need to account for your business in their costs. That drives their costs (and prices) up.


> That drives their costs (and prices) up.

That is not a bad thing. Today all these shaky companies exist because they can take all the profit and externalize all the downsides, laughing all the way to the bank.

If they were made liable, one of two things might happen, both of which are positive outcomes.

One, they might upgrade their engineering and quality processes to the point where they can guarantee they won't be the root cause for taking down large parts of the industry. Sure it'll cost more but if they can do it and still make a profit (just less), all is well.

Two, maybe it simply can't be profitable to build these root backdoor systems with enough layers of safety, in which case the companies will disappear. This is also good; if the product can't be safe, it should not exist.


That's an excellent way to encourage no one to do anything ever.


We can look outside the software industry to see this is not true.

Building bridges is a very overused analogy, but it's still a good one. Structural engineers still design bridges and they get built.

What they don't do is throw some drawing together, YOLO it to the builder, and see what happens, the way software gets built.


Absolutely not.

If a bridge has maintenance issues causing it to be unusable while it is being fixed, the bridge makers are not on the hook for the total downstream economic loss.


Delta's revenue was only $58.05 billion in 2023. I'm not sure how a day or two of canceled flights is going to be anywhere near $5.4B.


CrowdStrike's impact was more than Delta's cancelled flights. How did you arrive at the conclusion it was limited to Delta?

Even still, Delta had a really bad time recovering, and is still cancelling flights days later. It's not just "a day or two". At a billion a week, with 30-40% cancellation rates, that's 300-400M just for this one customer. And that's just lost revenue. Imagine the extra costs: customer service complaints, hardware / IT restoration, extra wages for flight attendants working double duty to keep the remaining flights going. Madness.

Even just in the travel segment, how many hotels, car rentals, Uber/Lyft rides, etc. were cancelled because of missed flights? How much do you think they paid on top of the lost revenue to handle customer complaints, IT restoration, etc.?

The repair costs alone at a given airport must be staggering, as every terminal screen is stuck on a BSOD and needs a tech to manually restore from BitLocker.

https://www.cbsnews.com/news/delta-flight-cancellations-toda...


Well, the comment I replied to didn't even try to quantify the total impact, and only mentioned Delta flights being canceled. There might be a bazillion other ways it can add up to $5.4B, but "flights Delta cancelled and will be on the hook to refund" does not come anywhere close to justifying that number.


Delta only started to recover today, so it's more like 4-5 days of canceled flights. Not saying it's a huge difference, but 1-2 days greatly underplays how bad it was for people flying Delta vs any other airline.


Most of that was due to knock-on impacts. The Delta scheduling software couldn't handle the number of changes that were being made to staff schedules, routes, and flights due to the impacts of the CrowdStrike outage.


Way more than Delta flights were grounded. If you include all the hours lost by all those people and assign some monetary value, it's probably more than that. Not only that, but all the hospitals. God forbid someone died because of it. Surprised CrowdStrike isn't bankrupt.


the hospitals alone are reasons to excoriate CS, airlines notwithstanding


It's more like a week of downtime, which totals to about a billion based on the data you brought in.

So, if just one of the affected companies brings the total to $1B, wouldn't you say $5B is actually a low estimate?


How does a company this big not have automated tests for their config files, and not have gradual/staggered rollouts for their deployments?

Is there some good reason for this approach (need to get config updates into the wild as quickly as possible to combat zero-days or zero-hours?) or was this just a massive oversight?

Side rant... their postmortem took forever to get to the point, first explaining all their jargon and product names. Makes me really appreciate the Cloudflare ones.


How many conversations have you had with people at work about how something was a bad idea, only for it to go ahead anyway?


And conversely, how many good ideas have you proposed, only to have them put on "next quarter's" schedule?


features get pushed to next quarter all the time. tag some JIRA L2s for [Future Sprint Owner] to worry about.

But you don't release until QA is done, especially if you're touching safety-critical systems. Turns out CS didn't realize their software was running on safety-critical systems in airlines and hospitals.


That's... not quite what I meant. What I meant was,

How many times do you think someone proposed:

    We should harden our code / delivery in case the system is important
Only to hear

    We can add robustness later, we will patch it in Terms of Service "for now"


If you suggested we let through potentially bricking-level changes without 100% coverage by basic testing, including reboot testing on many different hardware/VM combinations, then I would fire you in that same meeting if you weren't joking with me (and I was in a position to do it).


And because you knew it was coming, you have the fix ready that you didn't get paid any extra for, and then the project gets declared a success by the C-suite, and the idiots who devised it and almost destroyed the business get rewarded…


I am wondering if the update passed the test farm just fine, but when the file was moved to the update distribution system that's when the issue happened. The file copied as all nulls and there was no validation check that the file posted ok. Compounded by no validation check on the file after downloading by the end system. Compounded by not having a staged roll-out process for updates.
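
If that guess is right, even a crude post-copy sanity check would have caught it. A minimal sketch of what I mean, in C; the expected size, the magic bytes, and the function name are made up for illustration, nothing here comes from CrowdStrike's report:

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical check run right after the file is copied into the
       distribution system, and again on the endpoint after download. */
    static int channel_file_looks_sane(const char *path, long expected_size,
                                       const unsigned char *magic, size_t magic_len)
    {
        unsigned char head[16];
        FILE *f = fopen(path, "rb");
        if (!f)
            return 0;

        size_t n = fread(head, 1, sizeof head, f);
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        fclose(f);

        if (size != expected_size)
            return 0;   /* truncated or padded during the copy */
        if (n < magic_len || memcmp(head, magic, magic_len) != 0)
            return 0;   /* e.g. a file of all null bytes */
        return 1;       /* only then publish / load it */
    }

A real pipeline would compare a cryptographic hash recorded when the file left the test farm rather than just a size and some magic bytes, but even this much would reject an all-nulls copy.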


>Is there some good reason for this approach (need to get config updates into the wild as quickly as possible to combat zero-days or zero-hours?) or was this just a massive oversight?

I'm curious about this too. Has there ever been a scenario where a zero-day could have caused so much damage that it would warrant this speediness in patching? In a cost-benefit analysis, would it justify some percentage of patches going wrong like this, in exchange for preventing whatever security issues could occur without this type of infrastructure?


My bet is they have some normal process for updates that has testing but that process is only enforced by policy, not code, and somebody simply decided it was a waste of their time.


Yeah, management should sternly declare that all code and config must be tested before deployment to prod. Millions of companies have issued this order, and after that they ran free of problems for decades.

It is so straightforward and it always works.


Without regulation, best practices are simply opinions and suggestions. “Please do” is insufficient for critical infrastructure. See: financial infra regulatory apparatus.

Incentives, outcomes, the usual.


Regulations bad. The invisible foot of Freedom Markets™ will fix exactly these kinds of market failures. Rationally speaking.


Hand, but yes.


> not have automated tests for their config files

They very likely have automated tests. However, what if the bug only triggers 90% of the time and you hit the lucky 10% during automated tests? Of course you can run tests 100 times, but... is this a common practice? Moreover, we have both code and anecdotal evidence that the bug may indeed happen randomly. Tavis Ormandy posted a rough analysis of the crash context: https://x.com/taviso/status/1814762302337654829. It looks like the crash is caused by first checking whether an uninitialized pointer is NULL, and if not, dereferencing it. If the uninitialized leftover data just happened to be zero, no crash happens.

And anecdotally, we saw people reporting that repeatedly rebooting their machines for 15+ times fixed the problem for them - because eventually you got lucky and in a boot it didn't happen and CrowdStrike managed to update itself to not crash.
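
For anyone who hasn't clicked through, the pattern Tavis describes looks roughly like this (illustrative C only; the names and structure are invented, not CrowdStrike's code):

    #include <stddef.h>

    struct entry { int flags; };

    void process(struct entry **table, size_t idx)
    {
        /* Suppose the parser bailed out early and table[idx] was never
           written, so it holds whatever garbage was left in memory. */
        struct entry *e = table[idx];

        if (e != NULL) {     /* passes whenever the leftover garbage is nonzero */
            e->flags = 1;    /* wild dereference -> page fault; in a kernel
                                driver that means a bugcheck (BSOD) */
        }
        /* If the garbage happened to be zero, the NULL check fails and that
           boot survives, which fits the "reboot 15 times" reports above. */
    }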

> not have gradual/staggered rollouts for their deployments

No idea. Maybe their poor reliability guy got overruled by another team, like "how dare you delay our important definition update? we're racing with threat actors!". I hope they learned their lesson.


The crash was triggered by a config file that just contained null bytes as payload


This is a false rumor. Please stop spreading misinformation.


Running tests 100 times only helps if you have sufficiently randomized input data, so that the issue can actually happen.


Often this is not because the team doesn’t know about these things but because they have low staffing or other priorities or deadlines. This event looks to me like company rot that can be laid at the CEO’s feet


hey they're moving fast and breaking things..


Can't wait for the class action lawsuit. The total impact is likely greater than $5.4B. A significant number of people must have died due to the impact this had on hospitals and emergency services.


Everyone signed their terms of use: https://www.crowdstrike.com/software-terms-of-use/

Section 6.1:

THERE IS NO WARRANTY THAT THE SOFTWARE OR ANY OTHER CROWDSTRIKE OFFERINGS WILL BE ERROR FREE, OR THAT THEY WILL OPERATE WITHOUT INTERRUPTION OR WILL FULFILL ANY OF SOFTWARE USER’S PARTICULAR PURPOSES OR NEEDS. THE SOFTWARE AND ALL OTHER CROWDSTRIKE OFFERINGS ARE NOT FAULT-TOLERANT AND ARE NOT DESIGNED OR INTENDED FOR USE IN ANY HAZARDOUS ENVIRONMENT REQUIRING FAIL-SAFE PERFORMANCE OR OPERATION. NEITHER THE SOFTWARE OR ANY OTHER CROWDSTRIKE OFFERINGS ARE FOR USE IN THE OPERATION OF AIRCRAFT NAVIGATION, NUCLEAR FACILITIES, COMMUNICATION SYSTEMS, WEAPONS SYSTEMS, DIRECT OR INDIRECT LIFE-SUPPORT SYSTEMS, AIR TRAFFIC CONTROL, OR ANY APPLICATION OR INSTALLATION WHERE FAILURE COULD RESULT IN DEATH, SEVERE PHYSICAL INJURY, OR PROPERTY DAMAGE. SOFTWARE USER AGREES THAT IT IS SOFTWARE USER’S RESPONSIBILITY TO ENSURE SAFE USE OF SOFTWARE AND ANY OTHER CROWDSTRIKE OFFERING IN SUCH APPLICATIONS AND INSTALLATIONS.


Parent is talking about deaths. You can't use terms of service to limit liability for death, and a ToS is not law. I'm pretty sure it can be litigated. You can't just say you're not responsible for death or injury and skip all regulatory requirements for safety-critical systems.


It seems potentially tricky because they didn't just say they're not responsible for death or injury.

They essentially got the customer to accept a contract that says the software isn't designed for use in systems where failure could cause death, and that the customer accepts responsibility for using it appropriately.

I agree this whole incident was a massive blunder by CrowdStrike, but I'm not sure it makes sense to hold them liable for damage caused by customers using the product in places they explicitly agreed not to use it in. In those cases, I think the organization that installed CrowdStrike's software in inappropriate places bears a lot of responsibility for the outcome, and their failure to understand the TOS they agreed to doesn't mean it's not a legally binding contract.

It'll be interesting to see how it all plays out.


Perhaps, but it specifically says you should not use it for things like air traffic control or where life and limb are on the line. If you use a rope to climb, when specifically warned that the rope should not be used for climbing, then can we hold the rope manufacturer responsible if someone climbs with the rope and dies when they fall?


If all the ropes the company sold got an update that caused all the ropes to break around the same time, and people were injured and property was damaged, I would think the rope company would be liable.

Is the position that CrowdStrike should not be used on anything important because their software can not be trusted? I mean, that's where I'm at now, and I bet many others feel the same.


Your rope climber sounds troubled. I'm not sure the analogy holds.

If the rope climber is the same person who purchased the rope, then they get a Darwin award!

Otherwise need more detail: is the rope on loan? What's the licensing structure of the rope? Is the license still attached to the rope somehow?


The lawyers will get a few hundred million and businesses affected will get $50 and a coupon for a Wendy's frosty.


hmm i was talking with my little sister about picking up some of their stock because these things always just blow over and the stock reverts back to where it basically was in time.

I don't think many enterprises will switch because of the effort required and, instead, they'll just yell at the account reps for a while and then go back to paying the invoice. However, a big lawsuit is something i didn't think of.


Realistically speaking, the liability lies with every individual organization that installed the corporate spyware on their systems.


Can someone with a background in contract negotiation, vendor onboarding, and business continuity risk management share their expertise? We'd love to hear about typical vendor contract provisions that protect customers in situations like this.

If damages can be demonstrated, what are the chances of airlines successfully claiming compensation? Or, in practice, do such cases usually result in significant discounts during the next contract renewal rather than actual damages paid out?


> We'd love

who is 'we'?

Most contracts have indemnity clauses that protect against, or cap, damages due to vendor issues. You can get a court to overrule that if you can prove something like gross negligence, or that such provisions don't apply to something like safety-critical airline systems.

CS could push back saying they just offer endpoint protection, and it's on your org for where you put it. Kind of like Ikea saying "hey man, we just make end tables" when someone decides to put Lack tables on every airplane, and they turn out to be super flammable.


If the liability is capped at the cost of the duration of the incident (70 minutes, from CrowdStrike's PR-messaging perspective) or one month's service charge, both pretty normal in standard contracts, then it's only outside of the contracts that some equity could be achieved. Not holding my breath though.


Don't worry...they're handing out $10 vouchers to make up for it

https://techcrunch.com/2024/07/24/crowdstrike-offers-a-10-ap...


Well, 'security' check boxes have consequences.


Hopefully this will make BigCos think twice about forcing their employees to fill their computers with "security" malware that slows productivity to a crawl.


The value proposition of Crowdstrike is exactly that: something that you can deploy to tick the regulatory checkbox of "we have endpoint protection from a reputable company everywhere" without consuming outrageous system resources.

That's why they have so many enterprise customers. They're the only game in town that won't slow down your servers arbitrarily while still convincing an auditor that you do have an antivirus.

Too bad they also crash your whole system every now and then.


I think this is going to be a huge boon for Dell. We had so many older computers that got completely hosed. Lots of Latitude 5400s died completely. All will need replacements.


How? The fix is to just remove the one bad "Channel File." Are there machines where that does not resolve the problem?


Hardware failure hosed or just needing a re-image hosed?

Unless the hardware was already near failure, I don't see how this could cause hardware failure. The worst-case scenario was the machine just constantly rebooting, but after 3 failed boots (I think, somewhere around that number) it should have launched into Windows RE.


How did it hose them completely? I thought the problem is easily fixed by removing the offending config?


Never heard of servers with bad hardware that never rebooted for years?


Many people here have only ever used cloud servers.


And yet here we are discussing laptops.


The Latitude 5400 is a laptop.


I thought this just required a manual intervention OR re-imaging?


Does anyone else feel a little sympathy for CrowdStrike? They pushed out something they should not have. OK. That is bad. But a couple days on, the bulk of the difficulties seem to be from how Windows handled the situation: the BSODs, the boot loops, the inability to recover from a basic fault. I feel that if this happened in a Linux environment (it could), it would be easier to isolate and boot systems into some sort of temporary mode. Linux would communicate and offer options. The Windows-specific trend of just abandoning all hope, giving up and throwing the BSOD at the user... CrowdStrike didn't create that.


What do you mean? They wrote the kernel driver. With great power comes great responsibility.

If you're writing a kernel driver that is deployed throughout a great portion of Fortune 500, with the money that that entails, then you should definitely be able to afford to pay people to write defensive code and have proper pipelines in place.


And there is only the one kernel. No easy rollback option at boot, no previous version to at least get the system online.



Why should there be any sympathy for them? Their business is shitty boss-ware. I care about boss-ware makers as much as I care about tobacco companies.


But they apologized with a $10 gift card.

Soo. $5.4B - $10


Imagine Apple / Google pushing an update that bricked 2b+ mobile devices.


It's gonna happen someday lol, that's why I keep a backup device (just my previous gen phone) and don't update it except a few times a year.


The preliminary post incident review is here:

https://www.crowdstrike.com/falcon-content-update-remediatio...

It boils down to the "Content Validator" having a bug and giving a false positive.

It's kind of crazy that the 'rapid response content' update was then free to go out direct to production machines with zero actual live testing.

That's either due to C-suite Excel-driven cost-cutting/profit-maximizing or Silicon Valley YOLO.
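
For what it's worth, the staggered rollout everyone keeps asking about is not exotic. A rough sketch of ring-based gating in C, with every number, name, and the telemetry hook being assumptions for illustration rather than anything from CrowdStrike's write-up:

    #include <stdio.h>
    #include <unistd.h>

    /* Fraction of the fleet that receives the update in each ring. */
    static const double rings[] = { 0.001, 0.01, 0.10, 1.0 };

    /* Stubs standing in for real fleet tooling and telemetry. */
    static void   push_update_to_fraction(double f) { (void)f; /* call distribution system */ }
    static double crash_rate_since_push(void)       { return 0.0; /* query crash telemetry */ }

    int rollout(void)
    {
        for (size_t i = 0; i < sizeof rings / sizeof rings[0]; i++) {
            push_update_to_fraction(rings[i]);
            sleep(30 * 60);                       /* soak time before widening */

            if (crash_rate_since_push() > 0.0001) {
                fprintf(stderr, "halting rollout at ring %zu\n", i);
                return -1;                        /* stop, roll back, investigate */
            }
        }
        return 0;                                 /* full fleet updated */
    }

Even one small first ring of real machines would likely have turned this into an internal incident instead of a global one, at the cost of the update reaching the full fleet a few hours later.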


Silicon valley working at midnight? It was a contractor from India.


Is Chaos Engineering an appropriate preventative measure for this sort of thing?


I've never seen chaos testing for broken drivers or boot problems. It's usually taking things down randomly.


"take everything down, randomly"

but to the parent poster's point... maybe. sometimes you gotta throw a wrench in it and see what happens.


That's a lot of $10 (non functional) Uber eats cards


So, here's 10 bucks, we good?


Or a year membership of geek squad.


Worse than that, a $10 gift card.


I heard it had been withdrawn?


CrowdStrike offers a $10 apology gift card to say sorry for outage - https://news.ycombinator.com/item?id=41058261 (129 comments)

According to the discussion in the thread, you're correct. Also, it was a $10 gift card for... Uber Eats. Where you can't get anything for less than ten bucks.


These are deeply insulting. My company sometimes sends out $10 cards for DoorDash. To actually get something I would have to add at least another $10 myself.

I wonder if Uber Eats and DoorDash give these out for free to companies as promotion. I bet most people who use the cards spend another $20 or more.


I guess it's easier to buy X number of gift cards and allow that company to deal with the individuals rather than paying each customer $10 individually.

Also, does the IT manager get that gift card? Do they share it with the rest of the team? Does the CTO get the card and share it with the rest of the C-suite? What's the proper way of handling that, other than rejecting the offer with a harsh laugh in their face?


Even worse, the vouchers got canceled and can't be redeemed.


That's truly some dark comedy.


Bankruptcy coming ....


Unlikely. Big enterprises are slow to change and switching provider would be a massive change.

They’re more likely to investigate running two platforms simultaneously or, more likely, talk to competitors purely as a bargaining chip to negotiate a bigger discount for the next renewal.


You can sue your provider and still continue to use that provider. Look at the Apple vs Samsung lawsuits as an example


The Apple vs Samsung situation is very different to this but I do take your point.

The end result is still the same though, CrowdStrike will lose a lot of income and confidence but they’re not going out of business.

Frankly, even if they were to, I’m certain they’d end up getting bailed out anyway. But I can’t see it getting to the point to begin with.


yeah that's my gut feeling. Also, no one at these enterprises is to blame; they can all point fingers at CrowdStrike and abuse their account reps. That way they don't have to put in the work to actually switch vendors while appearing like they're "on it". I bet they all get pretty nice discounts on renewal like you said, but that will be it.


They were valued near $100B prior to this incident and even now their valuation is north of $60B. Even if they were held financially responsible for all $6B in damages - which is definitely not going to happen - that doesn't seem like a company-ending scenario.

Their biggest problem is obviously going to be with customer retention, but there are huge technical and regulatory hurdles their customers would have to go through to switch to a competitor. I'm sure many of their customers will accept this as an isolated incident and be quick to accept CrowdStrike's assurances that this won't happen again.

I think Delta Airlines is probably in more trouble right now than CrowdStrike.


They profited ~$125M last quarter; depending on their profit growth, it could take 6-12 years of profits if they have to eat $5B of lawsuit judgments. Even 10% of that is the profits from a whole year.


As far as I can tell, CS is still up 75% over 12 months.


The real problem is their stock crashed, so their best engineers will probably leave, and then it will become increasingly worse in a feedback cycle until possible eventual bankruptcy.

Ideally the stock price should be pushed up to attract better engineers to go in and fix shit. If stockholders agree to bid up the stock 100% YoY, hell, even I'd look for a job there, and help fix shit in return for some juicy RSUs.

If you take away their funding, you can only expect worse in the future.


Crashed? Since the start of the year CRWD is up 5%


Since the beginning of time everything is up 10000000%

It crashed in the last week. That's what matters.


But since most engineers probably started before last week, wouldn't they be still up quite a bit? Even with the crash it's up 75% over the past year.


They just lost a lot though, losing incentive to stay. But they provide a public good, so it would help for the public to bid the stock up to get the employees to stay, fix things, and vest it rather than ditching the company.


Is that how stonks work? It sounds quite close to throwing good money after bad.


Lawsuits may crater CrowdStrike. I'll be surprised if they carry adequate insurance against this large a screwup.


I think they'll be a fraction of their current size in just a couple of years. This was just too big and there are too many competitors out there for the space. The market will fix this problem, slowly as usual


I think what's more likely is their stock will drop for a while, this will blow over, and everyone will continue to pay their licence fees as if nothing ever happened. Don't underestimate the inertia, ineptitude, and resistance to change that permeates the upper echelons of large corps.


tho they've taken a hit and lost some business, for sure, there are still a lot of people using SolarWinds...


I'm curious if that will actually happen as these companies need to use something to check their compliance box.


There are competitors in this space. Palo Alto Networks comes to mind. Whether competitors have same issues as ClownStrike remains to be seen.


The two problems I see with this story are

1) These types of products cause incidents all the time, this was just a very high impact one that happened to affect everybody all at once.

2) Their product is very good compared to the competitors.

All products in this space are black boxes, but CS is one of the least black-boxy, the alerts it produces are decent, the tooling it comes with is especially good (from an operator perspective), and the reporting it produces is exactly the sort of thing decision makers in big enterprise love to see.

I doubt there’s going to be much churn from this, definitely not an existential amount. As much as I personally can’t stand the organisation, I think they absorbed most of the bad press on behalf of all the service providers they took offline.


Correct. To move away there have to be alternatives. Who here is building the alternative?


Yeah I think it will be quickly forgotten. Most people already have where I work. It was just another IT fail in a long line of them.


This was what I kept thinking about when it happened. Every job I've ever had has had days of some system or other being down for some amount of time. Hell, even WFH has put me offline when I've had extended power outage after a major storm. The only thing different about this was that it was all the companies at the same time because of one glitch. For all of the companies that did not use this system, it was just another day of the week.


"We can't hold back releases from going into prod.. we have to deliver"

"We don't have enough time to write tests"

"Developers should be able to test their own code"


Free just got cheaper.

Yeah, I know using free software isn’t a panacea. Still it would be a step in the right direction, plus I could not refrain from the cheap shot at M$ Windows.


This isn't your personal computer/homelab where you can get away with using common-sense antivirus or even Windows Defender. Software like CrowdStrike is often used in industries that are mandated to install such software for compliance reasons (e.g. PCI-DSS). Even if you were using Linux you'd still need to install it, and CrowdStrike previously had issues with their Linux agent. It was just uncommon enough that it didn't hit the news.


This is why the important thing is diversity. The more diverse your ecosystem, the less likely you are to suffer a catastrophic failure.

If half your tills are Windows/Defender and half Linux/CrowdStrike, then half your tills are going to be working.


Except that seems like a maintenance nightmare day to day. There are bugs in the Linux version but not the Windows version, not to mention having to write two sets of software. Imagine having to get your app's prod to work on both Windows AND Linux.


Agreed. It should be deployed entirely on Linux. Rip and rebuild is much easier on Linux. Using Windows as a server should be seen as a dark pattern in 2024.

For EMS, hospitals, Windows makes sense on the server because they don't know any better. For anyone remotely technologically competent, Windows shouldn't even be considered an option other than as workstations. Linux on the server is the only way and no one can convince me otherwise.


>Using Windows as a server should be seen as a dark pattern in 2024.

>Linux on the server is the only way and no one can convince me otherwise.

Now meet the sysadmin that thinks the same, but for windows for clients. At the risk of overgeneralizing, people are only for "diversity" when it means supporting their preferred underdog platform (eg. linux desktop). When they're the dominant incumbent it's suddenly "dark pattern", "they don't know any better" and "no one can convince me otherwise".


Two teams. Two systems. Identical design specifications and goals.

If the results match: Everything is largely proven to be working as-designed, and the output is assumed to be valid. This is an advantage.

If one breaks: Nothing is proven to be working, but that's no worse than we have today with just one system. This is not a disadvantage.


CrowdStrike customers voluntarily agreed to allow CrowdStrike to push kernel drivers. What should Microsoft have done to prevent this?


Move Windows Defender into user space and enforce the same for all security software.


This has nothing to do with how Defender works.

Crowdstrike shipped a driver that they marked as a mandatory boot driver. The Windows OS could have had more recovery options otherwise.


Moving Defender to user space is a requirement to lock down Windows from a fair-competition perspective. Microsoft is currently blaming the EU commission for not allowing them to lock down Windows; compare https://www.telegraph.co.uk/business/2024/07/22/microsoft-bl...


It's my understanding that CrowdStrike customers buy that thing to check a box in some security audit, not because it provides any other benefit.

Let's blame bullshit compliance?


>not because it provides any other benefit.

It probably does provide benefits against some clueless intern in accounting downloading a macro-enabled Excel file that has malware embedded.


What’s that understanding based on?


All previous comments on HN about the incident... I've seen absolutely no one praising the thing as a security solution but a lot of people posting that it's bought to pass audits.


As someone who used CrowdStrike daily and worked as an MDR Analyst and Engineer at a top-ranked MDR provider, CrowdStrike is a very capable piece of tech.

While the driver for purchase is almost always to pass audits, it's still a good product.


Actually there were a lot of pen testers that were speaking positively about CrowdStrike


Today in "random number pulled out of someone's ass"


Looks good on a resume:

- Wrote code responsible for $5.4 billion


[flagged]


"...and made an impact in thousands of lives" in resume-speak


That is probably a lowball. Several cities lost 911 service. Hospitals lost critical systems during emergencies.

(Why those systems run windows + crowdstrike and are even connected to the outside world, I don't know)


Source?


How do these sort of estimates get made? Genuinely curious.


In this case a guy asked an LLM. So dumb.

https://news.ycombinator.com/item?id=41054756


https://news.ycombinator.com/item?id=41054756

Please tell me you aren’t carelessly repeating a braindead figure you read because an LLM TOLD SOME GUY SO. Because you wouldn’t do that, right? Please.


Do you have a source for this? I can't find estimates anywhere.


Dude is probably guessing


I’m pretty certain CS has contracts that limit their liabilities in events like this.

Probably a refund is all they’ll be on the hook for.

Sadly, damage done like this is just chalked up to an accident, and swept under the rug.


It depends on the actual root cause; if there was gross negligence, the contracts won't save them, regardless of what they put in them.

From my point of view, one of the greatest problems for them is that they bypassed customers' deployment policies.


>one of the greatest problem for them is that they bypassed customers deployment policies

Caveat emptor. Falcon and other similar security products often push updates at-will, and they're fully transparent about this if you actually read the contract terms and understand the vendor's approach to operations. I have worked with many clients that elect not to use such tools in certain sensitive environments, specifically to mitigate the risk of being impacted by something like CrowdStrike's 7/19 event.


Do we have more insight into the nature or reasons behind the bypassing?


To respond to threats faster, and without direct involvement of the owning company, since their GPO or other updaters/control systems may also be compromised.


>From my point of view, one of the greatest problem for them is that they bypassed customers deployment policies.

Do you really want to wait for the weekly/monthly/quarterly deployment window to deploy a detection update for a 0-day, or a new type of malware?


Well, at least it's up to me to decide, not CS.


You're free to choose an EDR vendor that allows you to defer definition updates. Remember, this is enterprise sales for multi-billion dollar companies, so the usual excuse of "take it or leave it" doesn't really apply.


> contracts that limit their liabilities… refund is all they’ll be on the hook for

By cashing in this $10 Uber Eats coupon you agree to hold harmless...

- https://news.ycombinator.com/item?id=41058261

- https://techcrunch.com/2024/07/24/crowdstrike-offers-a-10-ap...


"A few people on twitter are saying this thing happened. We didn't actually talk to them, we didn't look at the emails and verify their authenticity ourselves, we just trusted some twitter screenshots and wrote a blogspam article stating it as truth.

We put absolutely no critical thought into whether this was a likely thing, and we completely ignored the many government and media reports that are credibly sourced which state that there are known phishing scams and other threat actors trying to capitalize on this incident.”

I highly doubt this is something that Crowdstrike actually did.

Edit: Amazingly they did, the article has been updated with a statement. Amazingly stupid all around.



