We are not using Cloudflare, but our domain is also not accessible. We are using DigitalOcean's DNS service to publish our IP. Does DigitalOcean's DNS service depend on Cloudflare?
An SLA of 100% simply means you agree to compensate your customers (as specified, usually with credit) if your service is down at all, nothing more.
Also, SRE here but not for Cloudflare -- I've never seen SREs directly involved in externally published SLAs; they usually come from legal. We deal with SLOs on more fine-grained SLIs than overall uptime.
SRE - Site Reliability Engineer (a term Google came up with that's been adopted elsewhere). Google defined it approximately as what happens when you apply software engineering practices to what was traditionally an operations function.
SLO - Service Level Objective - the service level you strive for. If your actual service level is higher than the objective, you have room for experimentation, etc.
SLI - Service Level Indicator - the actual metric(s) you use to measure a service level (latency, error rate, throughput, etc.)
SLA - correct. That’s the contract between the operator and the users which describes the penalties for not meeting the agreed-upon SLO.
SLO - service level objective, the stated availability (or latency or durability, etc.) of the service. Usually expressed as a value over a period of time (e.g. 99.9% availability as measured over a moving 30-day average). The SLO is measured by the SLI.
SLI - service level indicator. Simply, the direct measurement of the service (i.e. metrics).
SRE - Site Reliability Engineer, usually a member of a team who is responsible for the continued availability of the service and the poor sap who gets paged when it breaches its SLO or has an outage or other impactful event.
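To put numbers on the SLO definitions above, here's a minimal back-of-envelope sketch in Python; the 30-day window and the target values are just example figures, not anything from the thread:

    # Rough error-budget math for an availability SLO (example numbers only).
    def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
        """Minutes of downtime permitted by an availability SLO over a window."""
        return (1.0 - slo) * window_days * 24 * 60

    for slo in (0.999, 0.9999, 1.0):
        print(f"{slo:.2%} over 30 days -> {allowed_downtime_minutes(slo):.1f} min of budget")

    # 99.90% -> 43.2 min, 99.99% -> 4.3 min, 100% -> 0 min.
    # A 100% target leaves no error budget at all, which is why it only makes
    # sense as a compensation clause, not as an engineering objective.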
I'm not sure you and your parent understand what an SLA means. It's an agreement that, when broken, incurs a penalty.
They aren't saying they guarantee 100% uptime. They're saying they'll pay you for any downtime. It's literally the 3rd paragraph:
> 1.2 Penalties. If the Service fails to meet the above service level, the Customer will receive a credit equal to the result of the Service Credit calculation in Section 6 of this SLA.
(Most people I know consider them meaningless marketing BS that's really just meant to trick people or satisfy some make-work checkbox)
> Cloudflare ("Company") commits to provide a level of service for Business Customers demonstrating: [...] 100% Uptime. The Service will serve Customer Content 100% of the time without qualification.
This is a legal commitment to provide 100% uptime. They are guaranteeing 100% uptime and defining penalties for failing to meet that guarantee. The fact that a penalty is defined does not stop it from being a guarantee.
No, this SLA is a legal commitment to give you credits when Service uptime falls below a certain threshold. The threshold could be anything - 99%, 50%, 100%, etc. Importantly, Cloudflare is not under a legal obligation to provide the Service at or above the agreed threshold, it's under a legal obligation to give you Credits when the Service uptime is below that threshold.
"Service Credits are Customer’s sole and exclusive remedy for any violation of this SLA."
> This is a legal commitment to provide 100% uptime. They are guaranteeing 100% uptime
I don't think you know what a guarantee is.
For example when you buy a new car you get a guarantee that it won't break down. Are they claiming it won't break down? No, of course not. What a guarantee means is that they'll fix it or compensate you if it does.
Looks like it supports the parent's opinion:
commit - bind to a certain course or policy. It's a legal obligation, not a statement about guarantees in the physical world (like "this alloy won't melt below t°C").
I can completely understand your frustration. But even the top CDNs can have outages of some form or other. If site uptime is important, check out https://www.cdnreserve.com/ - it's built on the design principle that the likelihood of two separate platforms having an outage at the same time is close to zero.
Cloudflare going down is one of the things which keeps me awake. My main complaint about Cloudflare is that they are so good at everything they offer that we've become reliant on them for everything.
Exactly, but the likelihood of two networks going down at the same time is close to zero. Check out https://www.cdnreserve.com/ - we rolled it out to complement the top CDNs.
True. They're usually due to issues with BGP routes.
It's common to see CF being the DNS/CDN for applications across AWS, GCP, Azure etc. So perhaps CF being down affects more applications than individual cloud platforms?
Yeah, what's up with the competition to Cloudflare? What's the real barrier to entry?
It's not infrastructure anymore, as there is a new PaaS startup every week offering distributed hosting. So why is bundling in DNS, DDoS detection+mitigation, cloud workers... with it so hard?
This is just my take, but Cloudflare looks to be building a "moat" to make entry hard. This is built around two things: 1. economies of scale, 2. a network effect.
As Cloudflare gets bigger, they can provide services more cheaply. This is because (a) they can more fully utilise their data centres and other physical capital investments, (b) they can divide their fixed software costs over more users and (c) they get process efficiencies and discounts with scale.
A new entrant will struggle to match cost unless they're able to obtain similar scale. The bigger Cloudflare gets, the bigger the scale that a new entrant needs to hit before they can match them on cost.
Second, they're aiming to build a network effect through having a huge number of locations. The more locations, the more appealing to new customers, as they can be close to more users. A competitor will have to build a similar number of locations to match Cloudflare's proposition.
A new entrant cannot provide as much value, and therefore cannot charge as high a price, without building a similar sized network. This again requires the entrant to invest heavily before they can charge a similar price.
-
The combination of these two things means that when Cloudflare is operating at a large scale with a large network it can offer a more valuable service (and charge a higher price) than a new entrant, and earn more profit because it can operate at a lower cost.
Also, Cloudflare has the option of lowering its price and still being profitable due to lower costs at its scale, so it can deter entrants from trying to compete by the threat of being able to lower prices below what is profitable for new entrants.
The only players who can compete may be those who already have comparable size - Amazon, Google, Microsoft, Facebook, CDNs, etc, since they will already have addressed the issues of scale and network effects. However, they may not want to cannibalise their existing markets. It will be hard for other new entrants to compete.
There are many noteworthy players - Akamai, Fastly etc., and edge providers like ourselves (Zycada) who complement top CDNs like Akamai, Cloudflare, Fastly.
The main difference between Cloudflare and the others mentioned is the price: one can start with CF for a side project for free and continue to use it for free until it becomes a viable startup.
Others at best offer a limited trial plan, but most are just 'Speak to an expert / Contact us' for pricing, which means haggling with a sales rep when we could just be building things. Even CF's paid plans are reasonable when compared with others, and with better features.
You can't build a Cloudflare competitor in AWS/Azure/Linode/DO/etc. You need your own data centers. Multiple of them across the country, ideally around the world if you want to serve the whole world.
Thanks for the update. Just curious if we will get a report on what happened? In as much detail as can be shared, of course - morbid curiosity mainly. I love the post-mortem reports these events usually bring.
Sites are gradually reappearing as I type this. Some of my sites, and doordash.com, were returning 500 errors again just a minute ago. They just came back up, followed by the CF dashboard loading again.
DR means "disaster recovery"; it's a formal plan used to respond to and mitigate potential risks to the business. Things like having a communications plan for an incident, or a backup office outside of your main office's natural disaster zone.
I really dislike that they are editing their status messages.
Entry[1] dated "Jun 21, 2022 - 06:43 UTC" has been edited to include more detail after they posted another entry at 06:57 UTC. There seems to be no indication that the message has been altered.
Currently the text on the status page may suggest that they identified the problem immediately, but it actually took about 15 minutes. Previously there was text stating that customers should expect an update within 15 minutes. The next message was posted 14 minutes after that, but the previous message was altered later and nothing indicates this.
Strongly agree. Such whitewashing puts all previous incident reports in doubt - can I trust CF summaries of outages, or did they rewrite that history too?
I understand your point, but Cloudflare generally is very transparent, including root cause analyses and their CTO reaching out directly. It could also be a mistake or something not well thought through, rather than bad intentions.
edit: Not that I necessarily agree with the article even in light of there being an outage, cloudflare has been pretty good for us. Just thought it was interesting.
In the mid-2000s a computer science professor said that internet capacity was not going to match the amount of traffic. Everybody laughed. The world was full of dark fiber after the dot-com bust.
But if you look at his math, it was correct. The era of the heterogeneous, distributed, client-server Internet is just a sideshow today.
The solution has been centralization (clarification: big companies run their own caches and networks near users) and the growth of caches, with Cloudflare taking care of the rest.
> The solution has been centralization and growth of caches.
Centralization and growth of caches are on their face contradictory.
Perhaps you mean organizational centralization but that really has nothing to do with internet capacity demands. Your hot take isn’t so brilliant. What’s fundamentally wrong with edge distribution?
Yes. This is exactly what I mean. Big companies run their own caches and networks near users. Cloudflare takes care of the rest.
>What’s fundamentally wrong with edge distribution?
You incorrectly assume judgement on my part. My point is that things have changed. New problems arise from solutions to old problems: fragility from a small number of organizations running their caches to solve the bandwidth problem.
But web2 is going (really) great once you depend on Cloudflare. And it is certainly not re-centralizing the whole internet with a provider that is a single point of failure. /s
So yeah, how is a CDN centralizing your infra? You could just have your CNAMEs point to a different provider or directly to your gateways. Or you could even go down the multi CDN path, and have someone like ns1 automatically redirect your CNAMEs to an alternate CDN on a per-geo basis to overcome local failures.
It's just another SaaS component in your system. You could self host if you're willing to take on the ownership challenge, and at certain scale it would even be more cost effective.
Not the OP, but CloudFlare is not only a CDN but does everything you mentioned in your comment for you, so it's the load balancer and the DNS as well. When it goes down everything goes down.
Technically you could set up a separate DNS/failover somewhere else and use a backup reverse proxy/TLS terminator/CDN SaaS similar to CloudFlare, but then that somewhere else will be your point of failure.
It's time to start discussing a fail-open option for us CF users. Most of my sites are using CF for global performance rather than DDoS protection and security. I'd be fine with them changing DNS to point to the origin (or any other user defined IPs) in case of issues (even if it would take hours to return to normal).
This is also important for countries with limited connectivity to the Internet. If the PoP in that country loses its connection back to CF it shuts everything down, so even if the origin is in the next rack over from the PoP, it's unreachable.
You can implement your own DNS server that CNAMEs to Cloudflare and falls back to origin IP when there is a problem with Cloudflare.
I think a downstream Cloudflare provider could provide such services if they desire.
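A minimal sketch of the decision logic behind that approach, assuming a health-check URL, a Cloudflare CNAME target and an origin IP that are all placeholders; the actual wiring into an authoritative DNS server (dnslib, CoreDNS, whatever you like) is left out:

    # Failover decision you'd plug into your own authoritative DNS server.
    # Hostnames and IPs below are placeholders, not real values.
    import requests

    CF_TARGET = "www.example.com.cdn.cloudflare.net."  # CNAME target when CF is healthy
    ORIGIN_IP = "203.0.113.10"                          # direct origin A record

    def cloudflare_healthy(probe_url: str = "https://www.example.com/healthz") -> bool:
        """Probe the site through Cloudflare; a 5xx or timeout counts as down."""
        try:
            return requests.get(probe_url, timeout=3).status_code < 500
        except requests.RequestException:
            return False

    def answer_for(qname: str) -> tuple[str, str]:
        """Return (record_type, value) to serve for the proxied hostname."""
        if cloudflare_healthy():
            return ("CNAME", CF_TARGET)
        return ("A", ORIGIN_IP)  # fail open to the origin; keep the TTL short

Short TTLs are the main design choice here: the fallback only helps if resolvers re-ask you quickly after Cloudflare goes down.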
My company was paying $20 a month. We were heavily dependent on CF; we'd have been happy to pay more.
But... the one feature we wanted was for our accounts team to have their own login so the ops team didn't have to download invoices every month. Nope, that one feature required an enterprise plan which they quoted $4,000 a month for.
Companies where you have to log in and download invoices are the worst. If there's a viable alternative to their products I switch immediately. You make it seem like it's not a big deal, but a reasonably sized startup has dozens of service providers. Should we pay every little service $4k/mo just to save the communications and context switching overhead?
You jest, but imagine how time consuming it would be if every app we used was set up like CloudFlare, where only the one super admin can deal with billing.
Also, in these days of remote work, it's a problem if the credit card details need updating - either you have to give the company card details over a Slack call, or you need to give a card holder your root password.
Imagine already paying for a service and then having someone snark at you for wanting things for free.
I tried to exercise some restraint this time, but screw it. Here's another rant:
Beware of Cloudflare's tactic of luring people in to their CDN product with "free" bandwidth, and then locking useful features arbitrarily behind what I can only imagine is a thousands of dollars per month enterprise plan. Just look at their cache-purging page for a super obvious example of this (there are plenty more, way too many to list), everything other than basic purge by URL is enterprise only: https://developers.cloudflare.com/cache/how-to/purge-cache/
These days Cloudflare is literally my last choice for a CDN for my new projects. My new go-to is bunny.net, who charges a reasonable usage-based fee for bandwidth and gives you unfettered access to all the features they've built (and doesn't route your users to farther/closer nodes based on how much you pay: https://cloudflare-test.judge.sh/). Though I'd even reach for Cloudfront with their expensive bandwidth costs these days, because at least their pricing is transparent and scales smoothly with usage, and they don't arbitrarily cut you off from useful features that you might not know you need yet.
Even their bandwidth might not really be "free", since I've heard if you actually use any significant amount, the sales people will come knocking on your door to coerce you to get on the same enterprise plan or have your site taken down.
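For reference, the purge-by-URL call that's left on the free side of that page is a single POST to the zone's purge_cache endpoint - at least as far as I remember the API, so check the current docs. The token and zone ID below are placeholders:

    # Purge-by-URL via the Cloudflare API (verify against the current docs;
    # zone ID and token are placeholders).
    import requests

    ZONE_ID = "your-zone-id"
    API_TOKEN = "your-api-token"

    resp = requests.post(
        f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/purge_cache",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"files": ["https://example.com/static/app.js"]},
        timeout=10,
    )
    print(resp.json())  # expect {"success": true, ...} on a valid purge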
Can I ask out of interest (most of my projects are high perf/low traffic) what kind of traffic you are dealing with at the point you decide you need a CDN?
I don't really use a CDN to manage high traffic volumes. It's more to provide a better, lower-latency experience for my users regardless of where they access my apps from.
You’d need to have TLS certs on the origin ready to go for this scenario to work. Additionally, you’d need to test it and make sure nothing goes wrong when this event actually happens.
On top of that, depending on your scale, can you take all the traffic on origin that Cloudflare currently offloads?
No issue for me. This is obviously a power-user option. It's kind-of implemented for Enterprise users, where you don't have to let CF have full control over the domain.
Probably not many users who need the performance and can handle unexpected failover. There would also be the issue of setting the policy defaults effectively. Most users wouldn’t benefit from this footgun.
If you’re serious, you could probably automate this right now with your DNS provider and uptime monitoring.
I'm so serious that I already have failover after 1 hour at the registrar level, but those changes are not immediate and can take up to 24h to roll in and roll back due to DNS propagation and caching.
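If it helps, the monitoring side of that can be a tiny cron job. The DNS API endpoint below is entirely hypothetical (every registrar/provider differs), and as noted above, low TTLs shorten but don't eliminate the propagation lag:

    # Hypothetical failover cron job: if the site is unreachable through the CDN,
    # point the A record at the origin via the DNS provider's API. The endpoint
    # and payload are made up -- substitute your provider's real API.
    import requests

    SITE_URL = "https://example.com/healthz"
    ORIGIN_IP = "203.0.113.10"
    DNS_API = "https://api.example-dns-provider.com/v1/zones/example.com/records/A/@"

    def site_up() -> bool:
        try:
            return requests.get(SITE_URL, timeout=5).status_code < 500
        except requests.RequestException:
            return False

    if not site_up():
        requests.put(DNS_API, json={"content": ORIGIN_IP, "ttl": 60}, timeout=10)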
I think there is no nesting maximum (or if there is, it's much bigger than this). There's a limit which stops you replying to a comment immediately, to prevent super long quick-fire arguments.
Ah, sorry, misunderstood you. You can’t rely on them to change their host records when they’re down.
If you want CDN-independent automatic failover, look into anycast with two providers. If one of them is Cloudflare, use the tier that lets you manage your DNS elsewhere.
There is no immediate option with DNS changes. CF can’t immediately remove their IP from the route. Sounds like you’ve solved your problem in the sense that you have an automatic failover, though, which is good.
You say that, but there's tons of automated attempts doing the rounds on everything directly connected to the internet; centralized providers like Cloudflare can detect and prevent these patterns, whereas you need to be on the ball yourself if you have a service directly open to the internet. Exploits are exploited quickly, and while I make no assumptions about your particular website / application, a lot cannot push an update on short notice.
That would leak your IP nevertheless. People can figure out that you're serving a specific website by inspecting the certificate you present during the TLS handshake, without ever sending a request.
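Easy to check for yourself - a bare TLS handshake against the IP returns whatever certificate the box presents, and no HTTP request is ever sent. A rough standard-library sketch (the IP is a placeholder):

    # Grab whatever certificate a bare IP presents during the TLS handshake --
    # no HTTP request is made. The IP is a placeholder.
    import ssl

    pem = ssl.get_server_certificate(("203.0.113.10", 443))
    print(pem)
    # Feed the PEM to `openssl x509 -noout -text` to read the CN/SANs and see
    # which hostnames the box is actually serving.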
Wow lots of websites are affected, including Medium. The perils of centralization strike again. Though ironically, I noticed that the IPFS website uses cloudflare as well. The actual IPFS network is working just fine though, and I'm not aware of IPFS ever having any global outages. Though then again, I'm not aware of any on bittorrent either
The concept of "being down" doesn't really apply to protocols. IPFS/BitTorrent never being down is a bit like saying that TCP/HTTP has never been down. Individual servers/clients can have connection issues, but that obviously won't affect clients not connected to them, and it's not because of the protocols themselves.
Not to state the obvious, but... if a big centralized company built a Cloudflare for IPFS to make it easy for the masses to adopt, that company could go down just as easily as Cloudflare.
How so? Somebody links to a webpage, decentralized resolver converts it to an IPFS hash, which the client queries for any providers of that hash, and retrieves directly from them. No central authority necessary
jgrahamc, just some feedback about trying to reach support:
1. I could see my site down, including cloudflare.com with nginx 500 errors, via Sydney AU
2. Logged in to dashboard (via Melbourne AU) that worked; and so was thinking it was an issue with Sydney Cloudflare
My experience with Cloudflare has been in the past sometimes servers in some regions have issues and its a transient thing.
3. Status page showed no problems, so I went to "Contact support" and went around in circles (really frustrating) via the "Contact support" link moving me between Community forums, Support ticket, etc. I then see Chat is an option available with a Business plan, so I upgrade to that, hoping for some real-time support to alert them of the Sydney issue.
4. Return to the "Contact support" page after upgrading the plan, but the Chat option still not present on the support screen (and help articles say to return to support page and click "Chat" but it never shows up).
5. Come across https://community.cloudflare.com/t/cloudflare-for-teams-chat... while searching for why I can't see Chat as an option; people on that thread say they're on paid plans and chat support isn't showing up for them either, so I just give up, assuming it's broken
6. Open HackerNews and see it's at the top. A few moments later the status page reflects the outage.
I still can't see the Chat option so I've down-graded the plan again.
Their whole support experience is really not great. I've used it a few times over the last few years, and I rarely came away satisfied.
For example, they seem to have what I assume is a separate DB for CF users and CF support users, but with one shared login system. If you update your email on CF, it's not reflected in their support system and all your tickets will be refused because of the email mismatch, completely disregarding the fact that you just logged in via your CF account. And there's no way to update it from the support side, of course.
At times like this and the big Fastly outage roughly a year ago, choosing to host on a simple, independent bare-metal box doesn't seem like such a bad strategy (as long as one has backups for disaster recovery, of course). Sure, other things can cause downtime in that kind of infrastructure, but at least my service isn't likely to be taken offline by someone else's configuration error or deployment gone wrong.
I have been running my business on Hetzner bare-metal servers for the last 7 years. During that time there were several brief network outages, on the order of minutes. I think one network outage was 30 minutes. Other than that, no problems.
Given the price and performance difference between bare-metal and everything else, I am puzzled as to why small businesses that do not need scalability do not go with bare metal. And given the speeds of today's hardware, if you are not doing something stupid and you have a B2B SaaS, it's really difficult to need "scalability" beyond several bare-metal servers.
To be clear, I do not consider my bare-metal boxes "reliable", I have a multi-server setup managed by ansible, with a distributed database, and I can take a single-node failure without problems. I also have a staging setup that can be converted to production quickly, and a terraform setup that can quickly spin up a Digital Ocean cluster if needed.
Your box running your web server is far less complicated than using a CDN and worrying about countless additional points of failure. Network problems are only a minor risk.
My Internet goes down at least twice a year and my electricity goes down even more, especially in the winter. So no, this is not more reliable than Cloudflare.
In a discussion about using a CDN, it's implicit that it represents an addition to "professional" hosting with servers in a well managed data center that has, at least, redundant high-bandwidth network connections, not to a domestic network connection.
Note that your home network could be good enough for a personal web site that nobody pays you to respect an SLA on.
No, we're talking about a colocation provider, or a leased dedicated server provider. I went with OVHcloud US for my latest deployment. HN is at m5hosting.com.
You seem to imply that the options are only cloudflare or your apartment. This simply isn't true: there are a plethora of companies that will lease you a dedicated box of some Us in one of their racks, as the sibling commenter replies. Alternatively, you can search for co-location services. Options range from 1U/2U co-location, to half rack units, to full racks, to dedicated areas of the datacentre ranging from cages to whole rooms (I've been in at least one datacentre where an entire room was under separate access control and leased to one customer only).
Usually datacentres are located quite strategically. For example the location of many datacentres in Zürich corresponds with two separate power supply grids that meet (so they can pull from both).
Some of the companies involved are resellers and don't actually operate the datacentres they use. Others actually do. Usually the service is more or less the same, from the point of view of renting a 1U, or co-locating one.
If you want reliability features of a datacentre, e.g. for your office services, but might move, you may find your local city surprising. In Manchester, UK, there's a large amount of dark fibre under the city (fibre that is laid, but not in use), owned by some of the DC companies. Sometimes you can connect your office to said datacentre via dedicated fibre.
We’ve been on Hetzner for several years now. So far the only outages we had were from us moving servers (yeah, we don’t have high availability or load balancing, just a single beefy dedicated server). So, yes?
Last company I worked for, we had many Hetzner servers. We had many drive failures and CPU fan failures. It's fine if you can deal with a relatively high chance of hardware failure.
Perhaps not, but those who want to avoid Cloudflare for technical or ideological reasons won't realistically expect identical performance from smaller alternatives. Same as using Linux. People use it knowing full well it may not support the latest & greatest consumer gadgets like Windows does, but unless people use alternatives despite minor downsides, we shouldn't be distressed when we eventually reach a point of global near-monopoly.
I guess it depends. If you scale up and down via the API and can’t access the API .. you have a pretty good chance of a down scenario if you had a traffic spike you can’t scale for.
Yeah, they'd also be dependent on their ISP still if they're "fully independent". Good luck dealing with massive traffic spikes on a single bare-metal box and good luck maintaining a similar uptime to cloudflare's 98.84% uptime lol
Most (or at least many) colo facilities have multiple transit ISPs, some are big enough to have decent peering as well.
I'm assuming 98.84% uptime is a joke? That works out to over 4 days of downtime a year, and staying under that is something I could manage from a home connection most years, if I had a static IP.
Interestingly enough, I'm already logged in, and the homepage as well as the rest of the Linode dashboard are operational. It seems only the login page is down.
Today's actually the first time my site is down and it's Cloudflare's fault instead of my own. Obviously this outage is huge, but so far I've been really impressed with their reliability.
What is your setup where you are isolated from "another person making a mistake"? Even if you're running a box in a colocation datacenter you're still able to get knocked off the net by some maintenance on the surrounding pipes. Hell, hosting your own box doesn't stop Comcast DNS issues from knocking a bunch of people off either.
I do think there's room for some holistic overview of hosting stuff on the internet, where you could label each extra actor that can break things, the mitigation strategies, and the costs of each. Someone better than me would be able to place relative risk (and I think laying out various providers' uptimes/issues in that model would be great!) and offer a smart way of dealing with the buy vs. build question on this.
If you need fault tolerance/isolation, you want to have a second box in a different colo (preferably in a different city; a different coast/continent if it's important).
If you can live with dns round robin between the two, then you can easily host the DNS with multiple providers and avoid SPOF (could maybe host it on the two boxes you already have, too). You're still at risk of domain registry/registrar failures, and failures of their tld nameservers (very rare for well run tlds) and the root servers (not sure if they ever had a widespread failure). And of course, simultaneous failure of both locations isn't impossible, just less likely.
On Comcast DNS failures... Most of the recent ones I've heard of manifested as users on Comcast can't resolve X, but really X had bad DNSSEC records and Comcast's DNS refused to return records that weren't signed properly. It's easy to avoid that by not using DNSSEC.
In the general case of working despite bad ISP dns, you can't do much (anything?) for web browsers, but if you build apps, you can hard code fallback IPs for when DNS doesn't work... But you need to have IPs that stick around for the lifetime of your app downloads.
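A minimal sketch of that hardcoded-fallback idea (the hostname and pinned IPs are placeholders): resolve normally, and only reach for the pinned addresses when the resolver itself fails.

    # DNS-with-fallback helper for an app: use normal resolution when it works,
    # otherwise fall back to hardcoded IPs that must outlive the app's install base.
    import socket

    FALLBACK_IPS = ["203.0.113.10", "203.0.113.11"]  # placeholder long-lived IPs

    def resolve(host: str, port: int = 443) -> list[str]:
        try:
            infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
            return [info[4][0] for info in infos]
        except socket.gaierror:
            return FALLBACK_IPS  # resolver is down or broken; use the pinned IPs

    print(resolve("api.example.com"))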
Fair point. Still, based on my anecdotal experience using leased dedicated servers, mistakes at that networking layer seem to happen less often than mistakes that take AWS us-east-1 or one of the big CDNs offline.
It does feel like more "hosted" environments are trying to do more fancy stuff inside the network, so have more failure cases. Or perhaps services that do a lot of things, even if you end up just using simple server components.
I still have a fun memory of half of IBM Cloud's servers falling over, meaning that our production app was luckily still up but our staging server fell over. I could get to their website, but their login stuff was all messed up. I believe that one was also a "routing stuff got messed up" issue....
Puhleeze. DNSimple uses Cloudflare's DNS firewall product - this is not a secret. If you don’t like it, use an alternative DNS provider; there are plenty.
Yeah, but if the internet is widely down, the network effect is that people probably aren't using your site anyway, because everything else is down and they'll wait for confirmation from sites like Facebook, their internet banking and Netflix to make sure things are back to normal.
> The internet is an interconnected web of dependencies.
Ironically this is exactly what increasing centralisation weakens. The huge cloud providers have eroded "an interconnected web of dependencies" into a few huge server farms servicing everyone else.
> Their status page is a joke, likely crippled to reduce legal liability, but at this point it's just an outright misrepresentation.
It's just Atlassian Statuspage, which is a manually-updated incident response system. Unlike AWS, Cloudflare actually makes an effort to update it fairly quickly, but it can still be slow-to-update when something is immediately wrong.
> Their status page is a joke, likely crippled to reduce legal liability, but at this point it's just an outright misrepresentation.
It's fairly standard practice these days for status pages to be manually updated. The difficulty with having them be automatically updated is that for it to be useful that system needs to have a greater reliability than the thing it's monitoring. The signal to noise ratio is otherwise a bit ridiculous.
Reddit Status [1] isn't perfect, but it's miles better than a static page saying everything is operational while everything is in-fact inaccessible. That it took 30 minutes for the page saying everything is fine to be updated with a warning that there is an issue (while almost all of the services and regions remained marked as operational) only makes the ineffectiveness of that page more blatant.
It goes without saying that the monitoring system must be separate from what it's monitoring and must be more reliable. Compared to running a CDN for half of the internet, automated monitoring is table-stakes.
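Even a crude external probe running on infrastructure that shares nothing with the monitored stack gets you most of the way there; this is roughly what third-party uptime checkers do. A sketch with placeholder URLs:

    # Dead-simple external availability probe, meant to run somewhere that shares
    # no infrastructure with the thing it monitors. URLs are placeholders.
    import time
    import requests

    TARGETS = ["https://example.com/", "https://dash.example.com/healthz"]

    while True:
        for url in TARGETS:
            try:
                ok = requests.get(url, timeout=5).status_code < 500
            except requests.RequestException:
                ok = False
            if not ok:
                print(f"DOWN: {url}")  # in practice: page someone / flip the status page
        time.sleep(60)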
I wasted a bunch of time debugging the HTTP 500 errors on my site before I realized everything is 100% OK on my end, and that it's Cloudflare returning the error not my servers.
Ditto - I'm sitting here, wtf I'm not running Nginx on my blog, but I'm getting an Nginx response, hit IP directly....oooh.... right that doesn't make sense it's working fine. Cloudflare can't be down, that's next to, wait, status page (to their credit it's got a status note). HN here we go...
Would it be possible to adjust that 500 page to include an indication that it originates from Cloudflare, for the case that an outage like this happens again in the future?
This one seems be due to a hug of death rather than isitdownrightnow.com being behind Cloudflare, probably from too many people checking on all the other sites that are down.
I say this because: (1) it eventually loaded for me after I tried a few times and gave it time to load, and (2) its certificate doesn't report to be from Cloudflare but other sites I've checked that are down do
Yes, I started checking my router and wondering if anyone had managed to install some sort of exploit on it, as I was getting that 500 nginx page from half a dozen websites.
I had tinkered with my network settings just before this to troubleshoot an entirely unrelated problem so for a minute there I thought I broke everything lol
Their core service (DNS and web proxying) should see an outage once every 10 years or less. Much like Google Search (which is a far more complex service).
Yet it seems we get an outage more frequently than once a year. In my opinion, that makes the service too unreliable to base my business off - it's not like I can failover to another provider while they're down.
I'd love to use another company, but there's no one offering the same for the same price tag. Most of their services are free and they charge very little for the rest. Especially if you have a traffic heavy page with little revenue, Cloudflare is pretty much the only solution for CDN, WAF etc. All the others charge for traffic and cost a fortune.
All of my home's Ring cameras have been inaccessible this entire time. It's not that big of a deal for me because I planned for that eventuality, but a lot of people have not.
If you run a critical service (like Ring) and your infra is tied to CloudFlare - you're stuck! There is nothing at all you can do. That's freaking scary man. If I was working infra at Ring I'd much prefer to get paged and start fixing the problem. There are very few problems that can't be fixed in 15 minutes if you plan well for failover...
Explains why the online training course I was part way through stopped working! Amusing that the quickest diagnosis came from skimming the headlines here :)
Several sites I was trying to access all went down at the same time. Came to Hacker News to see what was up - not disappointed!
*Including America's Cardroom, perhaps the biggest "offshore" US poker site. I can promise you that there are a lot of people who were playing in tournaments that are very unhappy right now. New York here.
It's times like these when I'm appreciative of the simplicity of the HN tech stack. Was talking to some people on discord when it went down and then noticed some other websites were down. Came right to HN to see 5 different threads about this. Will be curious to see what the cause of the issue turns out to be
LMAO of course when every single thing I tried to use won't load or gives me a useless default 500 nginx error page, I find out why here. Figured it was CloudFlare. Single point of failure, not once.
The worst part is cloud infrastructure companies like DigitalOcean and Linode are both down simply because for some reason they can't build their own infrastructure to not rely on Cloudflare lol.
I think they rather got overwhelmed by many more requests reaching them that would usually hit Cloudflare. Also, as is widely known, people tend to hammer F5 when something like this happens, additionally increasing the number of requests.
If only we could come up with a globally distributed set of networks and systems that could be run by millions of entities that don't rely on each other to keep working. Oh no wait...
This was very educational, all of a sudden I couldn't reach 60% of all websites I normally visit everyday. I guess this is the cost of laziness under the guise of DDOS protection.
"Investigating - Cloudflare is investigating wide-spread issues with our services and/or network.
Users may experience errors or timeouts reaching Cloudflare’s network or services.
We will update this status page to clarify the scope of impact as we continue the investigation. The next update should be expected within 15 minutes."
Can someone link me to some information that explains what Cloudflare is besides being a CDN?
Like I understand how websites can be served using a CDN and how a lot of the internet depends on that... but I don't see how gaming services like Valorant or cloud providers like AWS or chat room like Discord depend on Cloudflare.
Their WAF is very useful, it makes it very easy to defend against attacks without paying anything. In general, their big plus point is that they offer many services for free, making it easier to onboard.
But by now they offer lots of services, although I believe WAF and CDN are probably still the most important to many.
Sites returning 500 is one thing; people will understand that's an error. A site that can't be found because DNS is out is not something the general public will start to debug - instead they'll walk away from the site, sometimes for good.
Question: how could (temporary) DNS errors be made nicer?
I was setting up some DNS for a site when it suddenly stopped working. After 30 minutes of messing with the settings and googling I gave up, came on here, and saw this.
My sites that are just using DNS are working fine; it's only those with the orange cloud (proxy turned on) that are broken.
Shouldn't have happened in the first place.
Should have had something that worked on their own website to indicate the service is down, not needing to come to a somewhat obscure tech forum to find out the details.
Because of state and federal regulations, the path through back doors, fire exits and water coolers is always shorter from the engineers' desks to the planetary atmosphere than it is through the front door and reception areas.
To be fair my information was not accurate. It was fast but when I said it was a problem with our "backbone" I was wrong (it was a networking problem but not the backbone). I favour speed over accuracy here, but the status page wants to be fast and accurate.
My main interest was that you were aware and that a fix was on the way. That's the difference between having to desperately act myself or just sit tight and placate clients. So, I appreciated your original comment!
The comment on HN had more useful information (that the issue was understood and a fix coming) before that status page then updated. I think that's their point.
Prior to that, it was some time (in the "all my sites are wrecked" timescale) before the status page had any indication of an outage.
The way I read their complaint was that they should have something on their website to indicate they were down. Anyway, at the time they complained, the status page also already said that the issue was identified and a fix was being rolled out.
Their post was saying that the dedicated status domain should be the first place to get useful information. There were multiple new threads on HN before the status page was updated at all. I'm sure there are legal reasons, but it's not ideal.
Then there was the CTO's (appreciated!) comment prior to the status page's second update with information suggesting this would be resolved soon (which IMO is the information everyone needs to report back to clients, bosses, etc).
That the status page was subsequently updated prior to OP's complaint isn't really relevant. It's still a point of discussion, whether someone comments immediately or later, right?
Maybe you should first try actually going to their status page[1]. It has been showing a global service disruption since 06:43 UTC; about 20 minutes since you wrote this.
The entire day yesterday, performance with Cloudflare was extremely sluggish. Pages which relied on it, even if only to load a JS file from the CDN, would hang for tens of seconds.
I cannot access science.org, quora.com, substack.com at the moment. It shows 500 Internal Server Error. Didn't know why but now it is clear. I guess I just wait until it is fixed.
Statuspage seems to be useless; I was just trying to get the status via multiple networks and my mobile network. Ironically, other downdetector services are also down.
Haha... I got pinged on my phone a site I manage is down, trying to figure out what's wrong with it, noticing other sites down, realizing it's Cloudflare
Why do people ask questions like this? You know the answer. This company offered products or services superior to alternatives so people elected to use them.
Wow, for me it looked like the world had gone mad. This is a reminder to not only rely on 1.1.1.1 for DNS resolution in PiHole.
I host most of my services locally, but ironically could not connect to my own homelab. I use a dedicated domain with DynDNS and did not configure the network and DNS to work without reliance on external DNS. Surely it's infinitely more likely for me to make a mistake, right?
yeah, and if cloudflare could make their anti-bot "verification" interoperable with noscript/basic (x)html browsers, and not force those grotesquely and absurdly massive google (blink) and apple (webkit) web engines, that would be less criminal.
Wishing Cloudflare ops teams the best to recover fast from this outage. Meanwhile, we urge customers to check out www.cdnreserve.com , and implement a sound CDN backup strategy (auto-failover) when the primary CDN suffers an outage.
I absolutely agree, and very respectfully so. No one is immune to outages. Well said. CDNReserve is designed so that if an outage occurs on one platform it will map the traffic to the failover CDN, and if the failover/backup CDN suffers an outage, the traffic will be shifted back to the primary CDN. It's built on the premise that the likelihood of two CDNs having an outage at the same time is close to zero.
The likelihood of CDNReserve having an outage on the other hand is 100%.
You aren't the first to come up with the idea of a CDN traffic director (I built one), and you'll soon discover customers recognize you are just another single point of failure and not the solution. Best to focus on the things other companies in the space market on, bill optimization, latency optimization, etc.
Agreed. The likelihood of any platform having an outage is 100%. But the likelihood of two networks having the outage at the same time is close to zero. It's awesome that you built a similar solution in the past. It would be great to jump on a call and learn from your experience if you are open to it.
A lot of my Australian colleagues were saying a lot of things were down, including all of our websites however me being in NZ, I was able to visit them.
So I do think Cloudflare actually is a bit more decentralised than we give them credit for really.
Just the fact that they messaged here in the HN thread about what was happening, what they knew, and how they were gonna fix it. That's just _awesome_
Kudos to them. I can't wait to see their after report.
Someone should make a website that collates the approx. 90M times this sentiment has been expressed on this website (a good chunk of them by me), just as a reminder of how somehow nothing changes on this front the moment it all comes back up and people go back to relying on single SaaSes for everything.
To be fair this is not relying on a single SaaS for everything but many people relying on a single SaaS. I mean if you want to use a reverse proxy/CDN, you must rely on someone.
Our key customer-facing services have a 99.995% uptime SLA (and a total of 2 or fewer incidents per year of "any length"), which means once you start concatenating services with 99.995% SLAs you aren't there.
How that SLA measures a 2 second outage for some customers is a separate thing, and sort of shows how meaningless these things can be on the internet (if you lose service for 10% of your potential customers is that an outage? How about 90%? How do you know how many were lost).
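For anyone following along, the "concatenating SLAs" point is just multiplication: serially dependent services multiply their availabilities, so the chain is always worse than its weakest link. A quick sketch with made-up figures:

    # Composite availability of serially dependent services (example figures only).
    def chained_availability(*availabilities: float) -> float:
        result = 1.0
        for a in availabilities:
            result *= a
        return result

    # e.g. CDN + DNS + our own 99.995% service, each "four-to-five nines" on paper:
    combined = chained_availability(0.99995, 0.9999, 0.99995)
    print(f"combined: {combined:.3%}")                        # ~99.980%
    print(f"downtime/yr: {(1 - combined) * 365 * 24 * 60:.0f} min")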
Measuring outages doesn't seem so meaningless as long as your money seems inaccessible.
Their main site went down for about 20 hours a couple weeks ago because their hosting provider went down. They deployed an HTTPS only static site in its stead, so at first blush it looked like they deployed nothing. Great when you're trying to find contact information hosted on that site.
Their online banking site leveraged Cloudflare, so obviously they just rode that outage out with no notifications, etc.
What if for some reason a single /24 was unreachable from the site (say an errant route for 12.85.25.0/24 somehow got in the path)? How would you even know that was a problem - how many customers are on that /24, and how would I measure their failed attempts to connect?
I have a remote office in India on Tata. The other day it had access to much of the internet, but due to a fibre break in the Mediterranean it didn't have access to endpoints in Europe for a good 20 seconds.
However the other link on a different ISP remained working at that time.
Does that count as an outage? If I wasn't actively monitoring that link at high resolution, would I even know about it?
I'd argue you're starting from a few orders of magnitude more competency than the credit union was. Their non-banking site was hosted by some podunk company in Texas with no sense of redundancy anywhere. Their provider had a near total networking outage and the credit union had no plan to recover from that.
Insofar as proactively monitoring a single /24, you (probably) don't. I don't think it's (usually) a company's job to monitor their customer's ISPs. The failures that "my" credit union had were due to their own choice in infra (Armor, Cloudflare). When Sonic nuked my config on their DSLAM after some maintenance I raised an issue with Sonic not with whatever other companies became inaccessible as a result.
> Does that count as an outage?
My POV may very well differ from whatever contracts and SLAs you have in place, but yeah maybe. If you can't fail over to the alternative ISP then yes that's an outage. Of course a trans-atlantic fiber break would also likely be a lot more noticeable than fat fingering a route for a /24. And sure, I've been stuck at megacorp when the VPN started handing out addresses in a new subnet but our department's networking team hadn't caught up. That's why you listen to your customers instead of throwing out a "someone else screwed up there's nothing we can do" response.
Me personally I don't think that a 20 minute banking outage is a massive problem (I've long since moved my money elsewhere), even the 20 hour outage was relatively minor. It just speaks to the unwillingness of the credit union to be highly available. They knew of the Armor outage and didn't actually test the remediation. I assume they didn't know about the Cloudflare outage. Both worry me. What happens when they're faced with a total failure of their online banking system?
But it isn't an outage. My monitoring point in Singapore could reach both ends; they just couldn't talk to each other, due to a routing issue on a third-party network over the internet.
On my own network, which I control, I accept that if a circuit breaks I'll have a 1, maybe 2 second outage while traffic reroutes. For some of my services that would be a problem, for others it's not. If Facebook loads 2 seconds later, nobody cares. If the winning penalty in the World Cup final blacks out, that's a big problem.
I'm new to this whole thing. Can you point me to how I can avoid depending 100% on CF if its DNS is down? Is there such a service? (kind of like load balancing, but with DNS?)
DNS is easy. You can (and must) have multiple nameservers for your domain. Just use different companies (and different regions) and if one goes down the others will still resolve.
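If you want to sanity-check that your delegated nameservers really are spread across providers, something like this works (assumes the dnspython package; the domain is a placeholder):

    # List a domain's delegated nameservers -- if they all belong to one provider
    # (or sit on one network), that provider is still a single point of failure.
    import dns.resolver

    for rr in dns.resolver.resolve("example.com", "NS"):
        print(rr.target)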
I remember when some key part of AWS EC2--EBS in us-east-1 maybe?--was down for a few days straight. Honestly, the main thing it taught me was "if you are honest with your customers they will mostly just come back later and buy everything they didn't buy today".
Yes and no. Obviously not great when everything goes down, but I find a strange sense of solace and calm when I know there’s a lot of people in the same boat and there’s little I can do but wait.
Yeah same lol. I saw that my SaaS services were not working and I got stressed thinking why all my EC2 instances went down at the same time. I checked down detector for EC2 and it reported that Cloudflare is down. I breathed a sigh of relief thinking that the (almost) whole of internet is down - nothing that I can do here.
It's quite ironic that the Internet was designed to withstand a nuclear attack, yet with how much everyone has started using the "cloud", a stupid configuration mistake at an important company can bring it to its knees.
We should really rethink this constant reliance on single points of failure.
I wonder if it really was, though. I’d think that these centralised services go down less than the self-hosted stuff did previously. Is it better to have more overall uptime where downtime means everything stops, or random downtimes of individual sites that add up to more downtime?
I mean if large websites like Notion or Medium had used IPFS instead, there would be no central point of failure, and web pages would still be available from distributed hosts
Cloudflare just offers great services. It's a straight-up fact that even their free tier is extremely generous. There is no big conspiracy to 'take over the internet'; when the product is good, the product is good.
CloudFlare should be run by the CIA or something - astonishing MITM opportunities. The only clear sign the CIA is not deeply involved is that CloudFlare is far too competent.
It blows my mind how most of the otherwise savvy readers of HN completely gloss over the fact that Cloudflare unwraps TLS on most of their internet traffic.
I trust that the current leadership might not do something evil, but they are publicly traded. At some point a group of investors are going to figure out that merging Cloudflare with an advertising network would create a level of user targeting that Google and Facebook could never dream of.
Governments in Europe and elsewhere are already working on legislation to restrict e2e encryption by law. Regulating things like Cloudflare to hand over data, as they have already done with ISPs, is not even much of an imagination leap. For example, in the UK all time:srcip:destip:user data must be kept for 1 year by every residential ISP and provided to government departments (not even law enforcement) through a standard system.
Not a 'big conspiracy'. It's the business model, isn't it? Or isn't CF going for the biggest marketshare and maximizing profits on that, like all the others?
It's certainly any business' model to grow as big as possible, but it's a hard business model to implement so competition is hard to find. I just can't blame CF for that imo.
Maybe they are more decentralized than what we are giving them credit. I'm having different error messages (nginx, dns, 404) on different websites. Not sure if it's a full breakdown of their systems or a coordinated attack.
Should be back up everywhere.