Cloudflare had a partial outage (cloudflare.com)
723 points by rkwasny on June 21, 2022 | 425 comments


Yes, not worldwide but a lot of places. Problem with our backbone. We know what it is. Rollbacks etc. happening. Bringing it back up in chunks.

Should be back up everywhere.


We are not using Cloudflare, but our domain is also not accessible. We are using DigitalOcean's DNS service for propagating our IP. Does DigitalOcean's DNS service depend on Cloudflare?



That link isn't accessible from where I am right now.

Alanis Morissette agrees that this is ironic.


Yes, this is actual irony, unlike “rain on your wedding day.”


but in regards to singing a whole song about things that are not irony being irony: isn't it ironic?

don't you think?


Yes! IRL Alanis is smart and eloquent, and those that think that song's lyrics are evidence to the contrary are missing the joke.


Moved all my domains to DO specifically to stop donating traffic data to Cloudflare. Absolutely stunned I didn't notice this earlier.

It's bullshit all the way down.

Are there any companies left offering free DNS usable from Terraform that aren't part of the "Internet Five Eyes"?

edit: looks like Linode may be the next best 'not terrible' option


I have bad news for you, Linode's authoritative DNS service also uses Cloudflare DNS Firewall.

  $ dig +short ns1.digitalocean.com aaaa
  2400:cb00:2049:1::adf5:3a33
  $ dig +short ns1.linode.com aaaa
  2400:cb00:2049:1::a29f:1a63
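Both of those resolve into 2400:cb00::/32, which as far as I know is one of Cloudflare's published IPv6 ranges. A quick sketch of cross-checking against the list they publish (exact output from memory, may differ):

  $ curl -s https://www.cloudflare.com/ips-v6 | grep -i '2400:cb00'
  2400:cb00::/32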


So much for whoever downthread said "not all websites use cloud flare"


Same here, except DNSimple.


Yes.


CF's SRE team needs to rethink their published SLA of 100%. This is not reasonable. https://www.cloudflare.com/business-sla/


An SLA of 100% simply means you agree to compensate your customers (as specified, usually with credit) if your service is down at all, nothing more.

Also, SRE here but not for Cloudflare -- I've never seen SREs directly involved in externally published SLAs; they usually come from legal. We deal with SLOs on more fine-grained SLIs than overall uptime.


> SLA […] SRE […] SLOs […] SLIs

I made it to SLA (which I believe stands for service level agreement). What do the other abbreviations stand for?


SRE - Site Reliability Engineer (a term Google came up with that's been adopted elsewhere). Google defined it approximately as what happens when you apply software engineering practices to what was traditionally an operations function.

SLO - Service Level Objective - the service level you strive for. If your actual level is higher, you have room for experimentation, etc.

SLI - Service Level Indicator - the actual metric(s) you use to measure a service level (latency, error rate, throughput, etc.)


SLA - correct. That's the contract between the operator and the users which describes the penalties for not meeting the agreed-upon SLO.

SLO - service level objective, the stated availability (or latency or durability etc.) of the service. Usually expressed as a value over a period of time (e.g. 99.9% availability as measured over a moving 30-day average). The SLO is measured by the SLI.

SLI - service level indicator. Simply, the direct measurement of the service (i.e. metrics).

SRE - Site Reliability Engineer, usually a member of a team who is responsible for the continued availability of the service and the poor sap who gets paged when it breaches its SLO or has an outage or other impactful event.


SRE: Site reliability engineer

SLI: Service level indicator (Metric to measure the health of a service. For example successful requests per interval / total requests per interval.)

SLO: Service level objective (what performance you expect, e.g. the previously mentioned SLI is >= 99.5%)

SLA: Service level agreement (legal agreement that defines what happens if an SLO is not met)
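For a sense of scale, the downtime a given availability SLO allows is just arithmetic (a rough sketch, not specific to any vendor's terms):

  $ # allowed downtime per 30 days under a 99.9% availability SLO
  $ awk 'BEGIN { print (1 - 0.999) * 30 * 24 * 60 " minutes" }'
  43.2 minutes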


Yep, I'd promise 99.95 at a stretch, never 100%.

They are not being honest with themselves here


I'm not sure you and your parent understand what an SLA means. It's an agreement that, when broken, incurs a penalty.

They aren't saying they guarantee 100% uptime. They're saying they'll pay you for any downtime. It's literally the 3rd paragraph:

> 1.2 Penalties. If the Service fails to meet the above service level, the Customer will receive a credit equal to the result of the Service Credit calculation in Section 6 of this SLA.

(Most people I know consider them meaningless marketing BS that's really just meant to trick people or satisfy some make-work checkbox)


> They aren't saying they guarantee 100% uptime

> Cloudflare ("Company") commits to provide a level of service for Business Customers demonstrating: [...] 100% Uptime. The Service will serve Customer Content 100% of the time without qualification.

This is a legal commitment to provide 100% uptime. They are guaranteeing 100% uptime and defining penalties for failing to meet that guarantee. The fact that a penalty is defined does not stop it from being a guarantee.


No, this SLA is a legal commitment to give you credits when Service uptime falls below a certain threshold. The threshold could be anything - 99%, 50%, 100%, etc. Importantly, Cloudflare is not under a legal obligation to provide the Service at or above the agreed threshold, it's under a legal obligation to give you Credits when the Service uptime is below that threshold.

"Service Credits are Customer’s sole and exclusive remedy for any violation of this SLA."


> This is a legal commitment to provide 100% uptime. They are guaranteeing 100% uptime

I don't think you know what a guarantee is.

For example when you buy a new car you get a guarantee that it won't break down. Are they claiming it won't break down? No, of course not. What a guarantee means is that they'll fix it or compensate you if it does.


Looks like it supports the parent's opinion: commit - to bind to a certain course or policy. It's a legal obligation, not a statement about guarantees in the physical world (like "this alloy won't melt below t°C").


I can completely understand your emotion. But even the top CDNs can have outages of some form or another. If site uptime is important, check out https://www.cdnreserve.com/ - it's built on the design principle that the likelihood of two separate platforms having an outage at the same time is close to zero.


that just means they're willing to pay for the marketing number, not that they will actually achieve it


Thanks for being here with timely updates! I knew to come to Hacker News once the alert triggered and a few users started complaining.


Good thing HN doesn’t use them then!


Enterprise support has been useless as is the status page. Got more info here


The status page shows about as much information as the post here.


It didn’t at the time.

Their phone line kept cutting us off and then the people there were not too helpful.


Status page was down for me in Sydney


Just wanted to also reiterate how thankful I am that you took your time to let us know. It speaks volumes.


Agreed- couldn’t figure out what was going on… finally checked here and - ah now I can sleep


Cloudflare going down is one of the things that keeps me awake. My main complaint about Cloudflare is that they are so good at everything they offer that we've become reliant on them for everything.


Happens to everybody sometime. AWS seemed to have a major outage a couple of times a year for a while there.


Exactly, but the likelihood of two networks going down at the same time is close to zero. Check out https://www.cdnreserve.com/ - we rolled it out to complement top CDNs.


True, they're usually due to issues with BGP routes.

It's common to see CF being the DNS/CDN for applications across AWS, GCP, Azure etc. So perhaps CF being down affects more applications than an individual cloud platform going down?


"do no evil" springs to mind -- once burned, etc.


Yeah, what's up with the competition to Cloudflare? What's the real barrier to entry?

It's not infrastructure anymore, as there is a new PaaS startup every week offering distributed hosting. So why is bundling DNS, DDoS detection+mitigation, cloud workers... with it so hard?


This is just my take, but Cloudflare looks to be building a "moat" to make entry hard. This is built around two things: 1. economies of scale, 2. a network effect.

-

https://en.wikipedia.org/wiki/Economies_of_scale

As Cloudflare gets bigger, they can provide services more cheaply. This is because (a) they can more fully utilise their data centres and other physical capital investments, (b) they can divide their fixed software costs over more users and (c) they get process efficiencies and discounts with scale.

A new entrant will struggle to match cost unless they're able to obtain similar scale. The bigger Cloudflare gets, the bigger the scale that a new entrant needs to hit before they can match them on cost.

-

https://en.wikipedia.org/wiki/Network_effect

Second, they're aiming to build a network effect through having a huge number of locations. The more locations, the more appealing to new customers as they can be close to more users. A competitor will have to build a similar number of locations to match Cloudflare's proposition.

A new entrant cannot provide as much value, and therefore cannot charge as high a price, without building a similar sized network. This again requires the entrant to invest heavily before they can charge a similar price.

-

The combination of these two things means that when Cloudflare is operating at a large scale with a large network it can offer a more valuable service (and charge a higher price) than a new entrant, and earn more profit because it can operate at a lower cost.

Also, Cloudflare has the option of lowering its price and still being profitable due to lower costs at its scale, so it can deter entrants from trying to compete by the threat of being able to lower prices below what is profitable for new entrants.

The only players who can compete may be those who already have comparable size - Amazon, Google, Microsoft, Facebook, CDNs, etc, since they will already have addressed the issues of scale and network effects. However, they may not want to cannibalise their existing markets. It will be hard for other new entrants to compete.


There are many noteworthy players - Akamai, Fastly etc., and edge providers like ourselves (Zycada) who complement top CDNs like Akamai, Cloudflare, Fastly.


The main difference between Cloudflare and the others mentioned is the price; one can start with CF for a side project for free and continue to use it for free till it becomes a viable startup.

Others at best offer a limited trial plan, but most are just 'Speak to an expert / Contact us' for pricing, which means haggling with a sales rep when we could just be building things. Even the paid plans of CF are reasonable when compared with others, with better features.


It's hard at scale.


Isn't everything harder at scale? That's not a barrier for entry though.


Building a CDN absolutely is hard to do at scale.

You can't build a Cloudflare competitor in AWS/Azure/Linode/DO/etc. You need your own data centers. Multiple of them across the country, ideally around the world if you want to serve the whole world.

This is insanely hard.


for a global cdn... it quite literally is


Point taken. For a global CDN, scale is the barrier to entry.


hats off to you sir - would not want to be in your shoes right now but thanks for the updates


I don't have shoes on.


That's the point, two people can't fit in them.


In tumultuous times (fix wasn't implemented at this point), the Cloudflare CTO still has time for some wit. Love it!


It's back wooo!

And I'm saying this for the last time: no one type google into google!


Thanks for the update. Just curious if we will get a report on what happened? In as much detail as can be shared, of course - morbid curiosity mainly. I love the post-incident reports these events usually bring.


Cloudflare are usually pretty good with posting post mortems. https://blog.cloudflare.com/tag/postmortem/



Thank you! This little comment just saved me an hour of investigation. Good luck for getting the system back up asap.


All my sites behind Cloudflare had come back up, and have now gone back to serving 500 errors.

The Cloudflare Dashboard is also no longer fully loading.


Where are you located?


Raleigh, NC, USA

Sites are gradually reappearing as I type this. Some of my sites, and doordash.com, were returning 500 errors again just a minute ago. They just came back up, followed by the CF dashboard loading again.


I'm from Turkey, and I also have been seeing intermittent errors for the last 10 minutes. Seems ok now.


Should probably do slower rollouts next time.


Thanks for letting us know


Thank you for your very fast support!


Which undersea cable was cut ;)


Can't really roll back that change


Sure you can, but physically rolling back more cable takes a bit longer :P


DR much ?


I don't know what that means.


DR means "disaster recovery": a formal plan used to respond to and mitigate potential risks to the business. Things like having a communications plan for an incident, or a backup office outside of your main office's natural disaster zone.


Ah. Just one more reason I hate acronyms. They obscure what the person is trying to say.


I really dislike that they are editing their status messages.

Entry[1] dated "Jun 21, 2022 - 06:43 UTC" has been edited to include more detail after they posted another entry at 06:57 UTC. There seems to be no indication that the message has been altered.

Currently the text on the status page may suggest that they identified the problem immediately, but it took about 15 minutes. Previously there was text stating that customers should expect an update within 15 minutes. The next message was posted 14 minutes after that, but the previous message was altered later and nothing indicates this.

Cloudflare, not cool.

[1] https://www.cloudflarestatus.com/incidents/xvs51y9qs9dj


Strongly agree. Such whitewashing puts all previous incident reports in doubt - can I trust CF's summaries of outages, or did they rewrite that history too?


I understand your point, but Cloudflare generally is very transparent, including root cause analyses and their CTO reaching out directly. It could also be a mistake or something not well thought through, rather than bad intentions.


jgrahamc is in this thread. if he wants to, he may say this was a mistake and swear they won't do this anymore

he can also add an [edited] on that entry, among other things


Kind of ironic, there was a big "cloudflare is bad and a central point of failure" article on the front page just a couple days ago.

Found it, https://news.ycombinator.com/item?id=31801947

edit: Not that I necessarily agree with the article even in light of there being an outage, cloudflare has been pretty good for us. Just thought it was interesting.


In the mid 2000s, one computer science professor said that internet capacity was not going to match the amount of traffic. Everybody laughed. The world was full of dark fiber after the dot-com bust.

But if you look at his math, it was correct. The era of the heterogeneous, distributed, client-server Internet is just a side show today.

The solution has been centralization (clarification: big companies run their own caches and networks near users) and the growth of caches, with Cloudflare taking care of the rest.


> The solution has been centralization and growth of caches.

Centralization and growth of caches are on their face contradictory.

Perhaps you mean organizational centralization but that really has nothing to do with internet capacity demands. Your hot take isn’t so brilliant. What’s fundamentally wrong with edge distribution?


> Perhaps you mean organizational centralization

Yes. This is exactly what I mean. Big companies run their own caches and networks near users. Cloudflare takes care of the rest.

>What’s fundamentally wrong with edge distribution?

You incorrectly assume judgement on my part. My point is that things have changed. New problems arise from the solutions to old problems: fragility from a small number of organizations running their caches to solve the bandwidth problem.


Also ironic that so many are blindly helping create "the great firewall of the USA" because it's easy and cheap.


> "the great firewall of the USA"

Not only the US.


Cloudflare usually works great. That doesn't mean they aren't a central point of failure.


But web2 is going (really) great once you depend on Cloudflare. And it is certainly not re-centralizing the whole internet with a provider that is a single point of failure. /s


I can't tell if you're being cynical or actually mean it.

Care to clarify so I could take the mandatory contrarian approach?


/s means "end of sarcasm".


Ah. Obviously.

So yeah, how is a CDN centralizing your infra? You could just have your CNAMEs point to a different provider or directly to your gateways. Or you could even go down the multi CDN path, and have someone like ns1 automatically redirect your CNAMEs to an alternate CDN on a per-geo basis to overcome local failures.

It's just another SaaS component in your system. You could self host if you're willing to take on the ownership challenge, and at certain scale it would even be more cost effective.


Not the OP, but CloudFlare is not only a CDN but does everything you mentioned in your comment for you, so it's the load balancer and the DNS as well. When it goes down everything goes down.

Technically you could set up a separate DNS/failover somewhere else and use a backup reverse proxy/TLS terminator/CDN SaaS similar to CloudFlare, but then that somewhere else will be your point of failure.


Brace yourself for a lot more of those kinds of articles in the next few days.


I welcome them. Perhaps these outages stop all people rushing to host their single HTML page blog through them.


unfortunately their tentacles have penetrated deep into the internet: https://news.ycombinator.com/item?id=31820929


With titles like "Cloudflare considered bad"


It's time to start discussing a fail-open option for us CF users. Most of my sites are using CF for global performance rather than DDoS protection and security. I'd be fine with them changing DNS to point to the origin (or any other user defined IPs) in case of issues (even if it would take hours to return to normal).

This is also important for countries with limited connectivity to the Internet: if the PoP in that country loses its connection back to CF it shuts everything down, so even if the origin is in the next rack over from the PoP, it's unreachable.


You can implement your own DNS server that CNAMEs to Cloudflare and falls back to origin IP when there is a problem with Cloudflare. I think a downstream Cloudflare provider could provide such services if they desire.
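A minimal sketch of that fallback, assuming your DNS provider exposes some API or CLI for record updates (the update command and addresses below are placeholders, not a real provider call):

  # poll the site through Cloudflare; on repeated failure, repoint the record at the origin
  ORIGIN_IP="203.0.113.10"   # hypothetical origin address
  for i in 1 2 3; do
    curl -fsS -o /dev/null --max-time 10 https://example.com/ && exit 0
    sleep 30
  done
  # three consecutive failures: switch the A record via your provider's API/CLI
  update_dns_record example.com A "$ORIGIN_IP"   # placeholder command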


Outages occur to the best of CDNs. But the likelihood of two CDNs suffering an outage at the same time is close to zero. Check out www.cdnreserve.com.


Last time I checked that was limited to the enterprise plan.


Of course it is... Typical Cloudflare.


Imagine having to pay for a service


Cloudflare's pricing has issues.

My company was paying $20 a month. We were heavily dependent on CF; we'd have been happy to pay more.

But... the one feature we wanted was for our accounts team to have their own login so the ops team didn't have to download invoices every month. Nope, that one feature required an enterprise plan which they quoted $4,000 a month for.


Oh dear, your poor Ops team, they had to download a few invoices every month to save $4k a month! My heart bleeds for them.


Companies where you have to log in and download invoices are the worst. If there's a viable alternative to their products I switch immediately. You make it seem like it's not a big deal, but a reasonably sized startup has dozens of service providers. Should we pay every little service $4k/mo just to save the communications and context switching overhead?


You jest, but imagine how time-consuming it would be if every app we used was set up like CloudFlare, where only the one super admin can deal with billing.

Also in these days of remote work, it's a problem if the credit card details need updating - either you have to give the company card details over a slack call, or you need to give a card holder your root password.


I've been impressed with Cloudflare's (non-enterprise) value thus far. What bandwidth and users did the $4k quote cover?


Is there no api for that? It would even save the manual download effort.


Just write a script?


Imagine having to pay for a service... at enterprise prices.

This is not really an extra or a nice-to-have, it might be more a hostage situation.


Imagine already paying for a service and then having someone snark at you for wanting things for free.

I tried to exercise some restraint this time, but screw it. Here's another rant:

Beware of Cloudflare's tactic of luring people into their CDN product with "free" bandwidth, and then locking useful features arbitrarily behind what I can only imagine is a thousands-of-dollars-per-month enterprise plan. Just look at their cache-purging page for a super obvious example of this (there are plenty more, way too many to list); everything other than basic purge by URL is enterprise only: https://developers.cloudflare.com/cache/how-to/purge-cache/

These days Cloudflare is literally my last choice for a CDN for my new projects. My new go-to is bunny.net, who charges a reasonable usage-based fee for bandwidth and gives you unfettered access to all the features they've built (and doesn't route your users to farther/closer nodes based on how much you pay: https://cloudflare-test.judge.sh/). Though I'd even reach for Cloudfront with their expensive bandwidth costs these days, because at least their pricing is transparent and scales smoothly with usage, and they don't arbitrarily cut you off from useful features that you might not know you need yet.

Even their bandwidth might not really be "free", since I've heard if you actually use any significant amount, the sales people will come knocking on your door to coerce you to get on the same enterprise plan or have your site taken down.


Can I ask out of interest (most of my projects are high perf/low traffic) what kind of traffic you are dealing with at the point you decide you need a CDN?


I don't really use a CDN to manage high traffic volumes. It's more to provide a better, lower-latency experience for my users regardless of where they access my apps from.


You’d need to have TLS certs on origin ready to go for this scenario to work. Additionally, you’d need to make sure to test it and ensure that there’s nothing wrong in this event.

On top of that, depending on your scale, can you take all the traffic on origin that Cloudflare currently offloads?


No issue for me. This is obviously a power-user option. It's kind of implemented for Enterprise users, where you don't have to let CF have full control over the domain.


> I'd be fine with them changing

This would be best implemented by you. If the point is to avoid CF as a PoF, why would you rely on their infra to fail safe when something breaks?


Wouldn't this expose the origin IPs to attack?


Yes, but he says in his second sentence that he doesn’t mind and mainly uses CF for performance.


Plenty of commenters also seem to miss the most important word in the first sentence: "option".


I'm talking about this as an option for users like me, that don't have an attack surface, but need the global performance gains of CF.


Probably not many users who need the performance and can handle unexpected failover. There would also be the issue of setting the policy defaults effectively. Most users wouldn’t benefit from this footgun.

If you’re serious, you could probably automate this right now with your DNS provider and uptime monitoring.


I'm so serious that I already have failover after 1 hour at the registrar level, but those changes are not immediate and can take up to 24h to roll in and roll back due to DNS propagation and caching.


We seem to have hit the nesting maximum 1123581321, but to your point "There is no immediate option with DNS changes."

There's a huge difference in changing nameservers for a domain and simply changing host records.


This is a 7th level comment. Synu below[0] has a 9th level comment, and I've seen nested comments go quite a bit further.

[0] https://news.ycombinator.com/item?id=31821497


I think there is no nesting maximum (or if there is, it's much bigger than this). There's a limit which stops you replying to a comment immediately, to prevent super long quick-fire arguments.


Ah, sorry, misunderstood you. You can’t rely on them to change their host records when they’re down.

If you want CDN-independent automatic failover, look into anycast with two providers. If one of them is Cloudflare, use the tier that lets you manage your DNS elsewhere.


There is no immediate option with DNS changes. CF can’t immediately remove their IP from the route. Sounds like you’ve solved your problem in the sense that you have an automatic failover, though, which is good.


Typically CF _is_ the DNS provider though, right?


Only up until the $200/mo tier. This kind of feature would be locked at that level anyway.

As I said to the OP elsewhere, he should be doing something like anycast to multiple CDNs if this is critical.


Yes, but I'd wager that most sites experience cloudflare outages more often than they experience bona fide attacks.


You say that, but there are tons of automated attempts doing the rounds on everything directly connected to the internet; centralized providers like Cloudflare can detect and prevent these patterns, whereas you need to be on the ball yourself if you have a service directly open to the internet. Exploits are exploited quickly, and while I make no assumptions about your particular website / application, a lot of them cannot push an update on short notice.


You would lose your wager.

DDoS attacks are extremely common at just about any scale, even if you only have a few thousand users.

Cloudflare going down like this? Actually first time I remember it happening. There’s been downtime before but nothing so major.


The origin IPs are already open to attack for most users.

It's trivial to scan the whole IPv4 internet to find out which IP you are hosting your site on.


You need to block 443 for any other IP than Cloudflare. The IP list can be found at https://www.cloudflare.com/ips/.
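A rough sketch of that allowlist with iptables, pulling the ranges Cloudflare publishes (adapt for nftables or your cloud firewall as needed):

  # allow HTTPS only from Cloudflare's published IPv4 ranges, drop everything else
  for net in $(curl -s https://www.cloudflare.com/ips-v4); do
    iptables -A INPUT -p tcp --dport 443 -s "$net" -j ACCEPT
  done
  iptables -A INPUT -p tcp --dport 443 -j DROP
  # repeat with ip6tables and https://www.cloudflare.com/ips-v6 for IPv6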


and/or use Authenticated Origin Pulls with a TLS client certificate.


That would leak your IP nevertheless. People can figure out that you're serving a specific website by inspecting your certificate on handshake without actually connecting.


Wow lots of websites are affected, including Medium. The perils of centralization strike again. Though ironically, I noticed that the IPFS website uses cloudflare as well. The actual IPFS network is working just fine though, and I'm not aware of IPFS ever having any global outages. Though then again, I'm not aware of any on bittorrent either


The concept of "being down" doesn't really apply to protocols. IPFS/BitTorrent never being down is a bit like saying that TCP/HTTP has never been down. Individual servers/client can have connection issues, but obviously won't affect clients not connected to those, and is not because of the protocols themselves.


But the infrastructures those protocols provide (the IPFS network, torrent swarms) can be an alternative to Cloudflare. Which is why I brought it up


Not to state the obvious, but... if a big centralized company built a Cloudflare for IPFS to make it easy for the masses to adopt, that company could go down just as easy as Cloudflare.


How so? Somebody links to a webpage, decentralized resolver converts it to an IPFS hash, which the client queries for any providers of that hash, and retrieves directly from them. No central authority necessary
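Roughly, with the ipfs CLI (the hash below is a placeholder; note that in practice the "decentralized resolver" step is often DNSLink, i.e. a DNS TXT record, so it's not entirely free of DNS):

  $ ipfs resolve /ipns/example.com     # resolve the name to a content hash
  /ipfs/bafy...                        # placeholder CID
  $ ipfs get /ipfs/bafy...             # fetch the content from whichever peers provide it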


jgrahamc, just some feedback about trying to reach support:

1. I could see my site down, including cloudflare.com with nginx 500 errors, via Sydney AU

2. Logged in to the dashboard (via Melbourne AU); that worked, so I was thinking it was an issue with Sydney Cloudflare. My experience with Cloudflare in the past has been that sometimes servers in some regions have issues and it's a transient thing.

3. Status page showed no problems, so I went to "Contact support" and went around in circles (really frustrating), with the "Contact support" link moving me between Community forums, Support ticket, etc. I then saw Chat is an option available with a Business plan, so I upgraded to that, hoping for some real-time support to alert them of the Sydney issue.

4. Returned to the "Contact support" page after upgrading the plan, but the Chat option was still not present on the support screen (and help articles say to return to the support page and click "Chat", but it never shows up).

5. Came across https://community.cloudflare.com/t/cloudflare-for-teams-chat... while searching for why I can't see Chat as an option - others on the support forum on paid plans say chat is not showing up for them either, so I just gave up, assuming it's broken.

6. Opened Hacker News and saw this at the top. A few moments later the status page reflected the outage.

I still can't see the Chat option so I've down-graded the plan again.


Their whole support experience is really not great. I've used it a few times these last few years, and I rarely came away satisfied at all.

For example, they seem to have what I assume is a separate DB for CF users and CF support users, but with one shared login system. But if you end up updating your email on CF, it's not reflected on their support system and all your tickets are going to be refused because of the email mismatch, completely disregarding the fact that you just logged in via your CF account. And no way to update it from the support side, of course.


Thanks. I'll feed that back.


At times like this and the big Fastly outage roughly a year ago, choosing to host on a simple, independent bare-metal box doesn't seem like such a bad strategy (as long as one has backups for disaster recovery, of course). Sure, other things can cause downtime in that kind of infrastructure, but at least my service isn't likely to be taken offline by someone else's configuration error or deployment gone wrong.


Are there places to host an independent bare-metal box where the internet provider for that box is more reliable than cloudflare?


I have been running my business on Hetzner bare-metal servers for the last 7 years. During that time there were several brief network outages, on the order of minutes. I think one network outage was 30 minutes. Other than that, no problems.

Given the price and performance difference between bare-metal and everything else, I am puzzled as to why small businesses that do not need scalability do not go with bare metal. And given the speeds of today's hardware, if you are not doing something stupid and you have a B2B SaaS, it's really difficult to need "scalability" beyond several bare-metal servers.

To be clear, I do not consider my bare-metal boxes "reliable", I have a multi-server setup managed by ansible, with a distributed database, and I can take a single-node failure without problems. I also have a staging setup that can be converted to production quickly, and a terraform setup that can quickly spin up a Digital Ocean cluster if needed.


Your box running your web server is far less complicated than using a CDN and worrying about countless additional points of failure. Network problems are only a minor risk.


And where will I host my box? In my apartment?

My Internet goes down at least twice a year and my electricity goes down even more, especially in the winter. So no, this is not more reliable than Cloudflare.


In a discussion about using a CDN, it's implicit that it represents an addition to "professional" hosting with servers in a well managed data center that has, at least, redundant high-bandwidth network connections, not to a domestic network connection.

Note that your home network could be good enough for a personal web site that nobody pays you to respect a SLA on.


Soooo... Cloudflare?


No, we're talking about a colocation provider, or a leased dedicated server provider. I went with OVHcloud US for my latest deployment. HN is at m5hosting.com.


OVH had some server fires that caused some amount of user downtime. I'm not really sure how that's gonna help.

Unless you have fallback with multi cloud deployments.


> And where will I host my box? I'm my apartment?

You seem to imply that the options are only cloudflare or your apartment. This simply isn't true: there are a plethora of companies that will lease you a dedicated box of some Us in one of their racks, as the sibling commenter replies. Alternatively, you can search for co-location services. Options range from 1U/2U co-location, to half rack units, to full racks, to dedicated areas of the datacentre ranging from cages to whole rooms (I've been in at least one datacentre where an entire room was under separate access control and leased to one customer only).

Usually datacentres are located quite strategically. For example the location of many datacentres in Zürich corresponds with two separate power supply grids that meet (so they can pull from both).

Some of the companies involved are resellers and don't actually operate the datacentres they use. Others actually do. Usually the service is more or less the same, from the point of view of renting a 1U, or co-locating one.

If you want reliability features of a datacentre, e.g. for your office services, but might move, you may find your local city surprising. In Manchester, UK, there's a large amount of dark fibre under the city (fibre that is laid, but not in use), owned by some of the DC companies. Sometimes you can connect your office to said datacentre via dedicated fibre.


We’ve been on Hetzner for several years now. So far the only outages we had were from us moving servers (yeah, we don’t have high availability or load balancing, just a single beefy dedicated server). So, yes?


Last company I worked for, we had many Hetzner servers. We had many drive failures and CPU fan failures. It's fine if you can deal with a relatively high chance of hardware failure.


According to my monitoring, yes and by a large difference


Perhaps not, but those who want to avoid Cloudflare for technical or ideological reasons won't realistically expect identical performance from smaller alternatives. Same as using Linux. People use it knowing fully well it may not support the latest & greatest consumer gadgets like Windows, but unless people use alternatives despite minor downsides, we shouldn't be distressed when we eventually reach a point of global near-monopoly.


> People use it knowing fully well it may not support the latest & greatest consumer gadgets like Windows,

Like what exactly?


Any local data center.


I'm on Linode.

Linode is down because Cloudflare is down.

Can't login to their control panel, etc.

You'd need to go fully independent and roll your own, with zero dependencies, to really make this work.


Linode control panel being down doesn't mean that the servers they host are down.

For *most* web facing apps/sites, a site hosted on e.g. Linode like this, but not using Cloudflare, would be unaffected by such an outage.


I guess it depends. If you scale up and down via the API and can’t access the API .. you have a pretty good chance of a down scenario if you had a traffic spike you can’t scale for.


Yeah, they'd also be dependent on their ISP still if they're "fully independent". Good luck dealing with massive traffic spikes on a single bare-metal box and good luck maintaining a similar uptime to cloudflare's 98.84% uptime lol


Most (or at least many) colo facilities have multiple transit ISPs, some are big enough to have decent peering as well.

I'm assuming 98.84% uptime is a joke? Less than 4+ days of downtime is something I could manage from a home connection most years, if I had a static IP.


98.84? that a real number? that's pretty low


Does Linode really use Cloudflare? They were bought by Akamai earlier this year.


Feel free to try https://login.linode.com/login

Whilst the incident is happening you'll see the Cloudflare 522 page.


Interestingly enough, I'm already logged in, and the homepage as well as the rest of the Linode dashboard are operational. It seems only the login page is down.


Google's Firebase uses Fastly even after its acquisition. It's possible for Linode to continue using Cloudflare.


Isn't linode owned by Akamai now?


imo it's messed up that they haven't rebuilt their network infrastructure in the four months since they got acquired


Today's actually the first time my site is down and it's Cloudflare's fault instead of my own. Obviously this outage is huge, but so far I've been really impressed with their reliability.


What is your setup such that you are isolated from "another person making a mistake"? Even if you're a box in a colocated datacenter, you're still able to get knocked off the net by some maintenance on the surrounding pipes. Hell, hosting your own box doesn't stop Comcast DNS issues from knocking a bunch of people off either.

I do think there is room for some holistic overview of hosting stuff on the internet, where you could label each extra actor that can break things, mitigation strategies, and the costs of each. Someone better than me would be able to place relative risk (and I think in that model laying out various providers' uptimes/issues would be great!) and offer a smart way of dealing with the buy vs. build question on this.


If you need fault tolerance/isolation, you want to have a second box in a different colo (preferably in a different city; a different coast/continent if it's important).

If you can live with DNS round robin between the two, then you can easily host the DNS with multiple providers and avoid a SPOF (you could maybe host it on the two boxes you already have, too). You're still at risk of domain registry/registrar failures, and failures of their TLD nameservers (very rare for well-run TLDs) and the root servers (not sure if they ever had a widespread failure). And of course, simultaneous failure of both locations isn't impossible, just less likely.
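For illustration, round robin is just multiple A records on the same name, served by whichever DNS providers you use (addresses here are documentation placeholders):

  $ dig +short example.com a    # one record per colo; resolvers/clients rotate between them
  203.0.113.10
  198.51.100.20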

On Comcast DNS failures... Most of the recent ones I've heard of manifested as "users on Comcast can't resolve X", but really X had bad DNSSEC records and Comcast's DNS refused to return records that weren't signed properly. It's easy to avoid that by not using DNSSEC.

In the general case of working despite bad ISP dns, you can't do much (anything?) for web browsers, but if you build apps, you can hard code fallback IPs for when DNS doesn't work... But you need to have IPs that stick around for the lifetime of your app downloads.


Fair point. Still, based on my anecdotal experience using leased dedicated servers, mistakes at that networking layer seem to happen less often than mistakes that take AWS us-east-1 or one of the big CDNs offline.


> mistakes at that networking layer seem to happen less often than mistakes that take AWS us-east-1 or one of the big CDNs offline

Not in my experience. Things break all the time, the difference is nobody notices because either the colocating ISP is too small or we are.


It does feel like more "hosted" environments are trying to do more fancy stuff inside the network, so have more failure cases. Or perhaps services that do a lot of things, even if you end up just using simple server components.

I still have a fun memory of half of IBM Cloud's servers falling over, meaning that our production app was luckily still up but our staging server fell over. I could get to their website, but their login stuff was all messed up. I believe that one was also a "routing stuff got messed up" issue....


My domains are with DNSimple, and my servers are on Hetzner. There should be no dependency on CloudFlare, and yet they are down too.


Puhleeze. DNSimple uses Cloudflare's DNS firewall product - this is not a secret. If you don't like it, use an alternative DNS provider; there are plenty.


Good to see how decentralised the Internet truly is.


Be enlightened by the truth.


Shouldn't you hit me with a stick first, or break a pot?


> at least my service isn't likely to be taken offline by someone else's configuration error or deployment gone wrong.

It's likely to be taken offline by yourself more often than not though.


Yeah but if the internet is widely down, the network effect is that people probably aren't using your site because everything else is down and they'll wait for confirmation from sites like facebook and their internet banking and netflix to make sure things are back to normal.


Not a useful comment 20 minutes into an outage.

The internet is an interconnected web of dependencies. Unless you are Cloudflare/Akamai/Amazon/Google there is no self-hosted anymore.

You can host in your basement if you like but you're still dependent on your ISP.


> The internet is an interconnected web of dependencies.

Ironically this is exactly what increasing centralisation weakens. The huge cloud providers have eroded "an interconnected web of dependencies" into a few huge server farms servicing everyone else.


Unfortunately, bad actors have made centralisation necessary for many sites to survive against even low-traffic attacks


CF's website is down as well. The CF Status page [0] says everything is working, though.

[0] https://www.cloudflarestatus.com/


Ironically, the cloudflare.com site is a more reliable indicator. If it doesn't load, then cloudflare is down.

Their status page is a joke, likely crippled to reduce legal liability, but at this point it's just an outright misrepresentation.


> Their status page is a joke, likely crippled to reduce legal liability, but at this point it's just an outright misrepresentation.

It's just Atlassian Statuspage, which is a manually-updated incident response system. Unlike AWS, Cloudflare actually makes an effort to update it fairly quickly, but it can still be slow-to-update when something is immediately wrong.


"Fairly quickly" meaning something like 30 mins to get a "we have identified there is an issue".

Given that their status page is broken down into individual services and regions, I get the impression there is some kind of automated monitoring behind it.

The only service I saw get marked non-operational was their API, while their site and dashboard were not available at all yet marked as operational.


> Their status page is a joke, likely crippled to reduce legal liability, but at this point it's just an outright misrepresentation.

It's fairly standard practice these days for status pages to be manually updated. The difficulty with having them be automatically updated is that for it to be useful that system needs to have a greater reliability than the thing it's monitoring. The signal to noise ratio is otherwise a bit ridiculous.


Reddit Status [1] isn't perfect, but it's miles better than a static page saying everything is operational while everything is in fact inaccessible. That it took 30 minutes for the page saying everything is fine to be updated with a warning that there is an issue (while almost all of the services and regions remained marked as operational) only makes the ineffectiveness of that page more blatant.

It goes without saying that the monitoring system must be separate from what it's monitoring and must be more reliable. Compared to running a CDN for half of the internet, automated monitoring is table-stakes.

[1] https://www.redditstatus.com/



Doesn't load for me.


It just updated to show Red a few seconds ago.

Edit: at least it showed there was a problem within <10 minutes, unlike other status pages that sometimes are green the entire time.


I don't think I've ever seen a status page report an issue at the time the issue was occurring. Seems a bit pointless.


Because most of them are manually updated, and not pulling directly from an uptime system (at least for incidents, etc)

(Also it doesn't help that uptime monitoring systems are usually stupid and love triggering on false positives)


In my experience (with the sites I visit and care about), the status pages are usually pretty accurate and helpful.


A bit too early imo. It took it 3-5 minutes to start displaying errors.


Do any of these corporate status pages ever work? AWS doesn't and neither does CF. This website is a more useful status page


It's now just been updated to show the incident.


I wasted a bunch of time debugging the HTTP 500 errors on my site before I realized everything is 100% OK on my end, and that it's Cloudflare returning the error not my servers.


Ditto - I'm sitting here, wtf I'm not running Nginx on my blog, but I'm getting an Nginx response, hit IP directly....oooh.... right that doesn't make sense it's working fine. Cloudflare can't be down, that's next to, wait, status page (to their credit it's got a status note). HN here we go...


Sorry about that.


Would it be possible to adjust that 500 page to include an indication that it originates from Cloudflare, for the case that an outage like this happens again in the future?


I was surprised by that also. Will ask.


As we'd say, no worries mate! One little blog is probably the least of the s*tstorm you guys just had to deal with :)


- Encrypted DNS seems to be having issues (very slow resolution, if any)

- Having issues connecting to GitHub (Could be they are using CF, or could be DNS issue - but I'm able to connect fine to Google services)

- Twitter loads, but all images fail to resolve

- https://www.cloudflarestatus.com/ loads very slowly, and no assets (CSS, images, etc) load

EDIT from CF :: The issue has been identified and a fix is being implemented. Posted Jun 21, 2022 - 06:57 UTC
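A quick way to separate resolver trouble from edge trouble during an event like this (example.com stands in for whichever affected site you care about):

  $ dig @1.1.1.1 +short example.com          # Cloudflare's public resolver
  $ dig @8.8.8.8 +short example.com          # Google's resolver as a cross-check
  $ curl -sI https://example.com | head -1   # a 5xx here points at the proxy/edge rather than DNS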


Here in Japan, Twitter seems to be fully operational, e.g. images load too.


GitHub was working fine for me in NZ. So was Twitter.

Very interesting outage.


Seems to be a lot of stuff flapping right now. I was able to load a client site that was behind Cloudflare, and now not.

cloudflare.com was returning Connection Refused, then error 522. cloudflarestatus.com was returning Connection Refused; now I can't even resolve the IP.

My guess would be that a router misconfiguration is being progressively deployed throughout their infrastructure.

EDIT: Continues to look like a cascading failure across their network. 1.1.1.1 is now unreachable for me.


Ironically isitdownrightnow.com is also down


This one seems be due to a hug of death rather than isitdownrightnow.com being behind Cloudflare, probably from too many people checking on all the other sites that are down.

I say this because: (1) it eventually loaded for me after I tried a few times and gave it time to load, and (2) its certificate doesn't report to be from Cloudflare but other sites I've checked that are down do


downdetector.com :

Downdetector: ಠ_ಠ Well, this is awkward...

5XX Server Error Web server is returning an unknown error

There is an unknown connection issue between Cloudflare and the origin web server. As a result, the web page can not be displayed.

    Ray ID: 71eac4764c149452
    Your IP address: 2600:1700:22e6:2410:2271:5b09:2694:a035
    Error reference number: 520
    Cloudflare Location: San Jose


downdetector.com is also down since it's behind Cloudflare.


Hackernews is my new status page. All others are useless.


I first checked DownDetector which is really good for individual services. Except they are now also down...


I too went to Downdetector to see what was going on and, funnily enough, it was also down. They also had a funny error message.

Edit: Found it from another user:

Downdetector: ಠ_ಠ Well, this is awkward...



I'd say all others are down


Plenty of other status posts get flagged and taken down. Guess CF is special


Clearly YC should start a SaaS web host...


Maybe at least something like [Incident HN] to report on incident status.


unironically true. first thing i checked.


Yep. Came here when I saw two of my sites go down before I even checked Heroku or Cloudflare


They're not down worldwide, we're still seeing traffic from some POPs, but it looks like a majority of their POPs are dead.

This feels like a bad config push.


Google Analytics shows me that I still have quite a lot of traffic, so it is not 100% down. Edit: If I set my VPN to Poland it works.


Cloudflare Warp connects, then prevents anything else from loading. Thought it was my router


Yes, I started checking my router and wondering if anyone had managed to install some sort of exploit on it, as I was getting that 500 nginx page from half a dozen websites.

In the UK, good old bbc.co.uk is still working fine...


I had tinkered with my network settings just before this to troubleshoot an entirely unrelated problem so for a minute there I thought I broke everything lol


Same here. I picked a bad time to mess with my DNS config.


CF status page now showing a widespread incident.

https://www.cloudflarestatus.com/incidents/xvs51y9qs9dj


This is probably the best link for this status instead of the generic cloudflare.com one.


Cloudflare has too many outages.

Their core service (DNS and web proxying) should see an outage no more than once every 10 years. Much like Google Search (which is a far more complex service).

Yet it seems we get an outage more frequently than once a year. In my opinion, that makes the service too unreliable to base my business on - it's not like I can fail over to another provider while they're down.


This is the official Cloudflare incident URL:

https://www.cloudflarestatus.com/incidents/xvs51y9qs9dj


I'm unable to log into Bitwarden Safari extension. That's an alarming detail... Mobile app still works, fortunately


Single Point of Failure. LOL.


While this sucks, at least we'll have a great writeup to look forward to


Uh, that’s not good. The negatives of centralisation really smack you in the face.

I’ll start moving my sites away from Cloudflare soon. Not because it’s bad — in fact it has been amazing, but rather to decentralise.


doesn't help. my site doesn't use CF and I'm still down because digital ocean is down


Am I wrong to use this as an excuse to not use Cloudflare?


Of course not. If you don't need Cloudflare, don't use it :)

For anyone else needing their services, they are a perfectly reliable provider (most of the time). Would make sense to ensure a fail-over, though.


Apparently yes, people in here are quick to defend our new overlord.

"But they're great", they cry, "why should I use anything else?"


I'd love to use another company, but there's no one offering the same for the same price tag. Most of their services are free and they charge very little for the rest. Especially if you have a traffic heavy page with little revenue, Cloudflare is pretty much the only solution for CDN, WAF etc. All the others charge for traffic and cost a fortune.



How about waiting a bit first?

Maybe things will be back up faster than you could recover from any trivial issue yourself.


It's been 30+ minutes.

All of my home's Ring cameras have been inaccessible this entire time. It's not that big of a deal for me because I planned for that eventuality, but a lot of people have not.

If you run a critical service (like Ring) and your infra is tied to CloudFlare - you're stuck! There is nothing at all you can do. That's freaking scary man. If I was working infra at Ring I'd much prefer to get paged and start fixing the problem. There are very few problems that can't be fixed in 15 minutes if you plan well for failover...


Just noticed this morning.. IRCCloud and Discord are affected by this too. Wonder what else.


Explains why the online training course I was part way through stopped working! Amusing that the quickest diagnosis came from skimming the headlines here :)


I jumped around trying to find out what happened... should have realized HN was a better place to start. wink


https://www.cloudflarestatus.com still shows „All Systems Operational“


Several sites I was trying to access all went down at the same time. Came to Hacker News to see what was up - not disappointed!

*Including America's Cardroom, perhaps the biggest "offshore" US poker site. I can promise you that there are a lot of people who were playing in tournaments that are very unhappy right now. New York here.


It's times like these when I'm appreciative of the simplicity of the HN tech stack. Was talking to some people on discord when it went down and then noticed some other websites were down. Came right to HN to see 5 different threads about this. Will be curious to see what the cause of the issue turns out to be


https://www.cloudflarestatus.com/incidents/xvs51y9qs9dj:

Identified

The issue has been identified and a fix is being implemented.

Posted 5 minutes ago. Jun 21, 2022 - 06:57 UTC


All shopify based stores are down


Yep - I have a Shopify app that uses Cloudflare, and was just about to panic when I saw this post.

Turns out no one will notice, as all of Shopify is down anyway ¯\_(ツ)_/¯


LMAO of course when every single thing I tried to use won't load or gives me a useless default 500 nginx error page, I find out why here. Figured it was CloudFlare. Single point of failure, not once.


The worst part is cloud infrastructure companies like DigitalOcean and Linode are both down simply because for some reason they can't build their own infrastructure to not rely on Cloudflare lol.


I think they rather got overwhelmed by many more requests reaching them that would usually hit Cloudflare. Also, as is widely known, people tend to hammer F5 when something like this happens, additionally increasing the number of requests.


Things like Victorops/Splunk On-call are also down because of this.

This means if your alerts are fired through them, you'll peacefully be sleeping through this incident unless your customers wake you up.


Who watches the watchmen?


If only we could come up with a globally distributed set of networks and systems that could be run by millions of entities that don't rely on each other to keep working. Oh no wait...


This was very educational, all of a sudden I couldn't reach 60% of all websites I normally visit everyday. I guess this is the cost of laziness under the guise of DDOS protection.


You may also want to consider that this is how the web looks all the time for those of us blocked by Cloudflare.


I almost thought my internet or DNS had some problem until I opened HN.


I'm placing my bets on a config file with emoji characters


DigitalOcean is down as well.


I host my apps in DigitalOcean and it seems the main domains are still resolving, (e.g., https://nono.ma or https://gettingsimple.com).

Subdomains aren't working though (e.g., https://sketch.nono.ma).

Update: It seems everything is resolving properly now.


No, only their website.


Well my website hosted on DO is down and their console web app is down. That's pretty hard to manage when I'm on my phone.


But our DigitalOcean DNS settings are not working.


DO DNS is down.


Hmm, I am able to access Cloudflare's own website but sites that are proxied through them give me Nginx's default 500 error page.


Yep, Hubspot is down for me: https://app-eu1.hubspot.com/


Hello 8.8.8.8 my old friend.


Discord is back, Kickstarter progressed from 500 to an exotic "Error 1016 Origin DNS error".

EDIT: all flushed, Kickstarter works.


"Investigating - Cloudflare is investigating wide-spread issues with our services and/or network.

Users may experience errors or timeouts reaching Cloudflare’s network or services.

We will update this status page to clarify the scope of impact as we continue the investigation. The next update should be expected within 15 minutes."


Can someone link me to some information that explains what Cloudflare is besides being a CDN?

Like I understand how websites can be served using a CDN and how a lot of the internet depends on that... but I don't see how gaming services like Valorant or cloud providers like AWS or chat room like Discord depend on Cloudflare.

Thanks!


Their WAF is very useful, it makes it very easy to defend against attacks without paying anything. In general, their big plus point is that they offer many services for free, making it easier to onboard.

But by now they offer lots of services, although I believe WAF and CDN are probably still the most important to many.


Sites returning 500 is one thing, people will understand that's an error. Site can't be found because DNS is out is not one that the generic public will start to debug, but instead they'll walk away from the site, sometimes for good.

Question: how could (temporary) DNS errors be made nicer?


Sites are coming back up on my end (NL)


I believe discord was also affected due to this.. However, I did get messages from my friend in Thailand


I was setting up DNS for a site when it suddenly stopped working. After 30 minutes of messing with the settings and googling, I gave up, came here, and saw this.

My sites that are just using DNS are working fine; it's only those with the orange cloud (proxy turned on) that are broken.


I would change the URL to https://www.cloudflarestatus.com/ instead; the cloudflare.com domain looks to be hosted in a different way.


Shouldn't have happened in the first place. They should have had something that worked on their own website to indicate the service is down, rather than people needing to come to a somewhat obscure tech forum to find out the details.


I don't understand why companies simply don't have outages, it seems like it would be a lot less stressful.


They're talking about getting accurate information from an employee posting on HN rather than on the status page, rather than the outage itself.


Because of state and federal regulations, the path through back doors, fire exits, and water coolers is always shorter from an engineer's desk to the planetary atmosphere than it is through the front door and reception areas.


What?


To be fair my information was not accurate. It was fast but when I said it was a problem with our "backbone" I was wrong (it was a networking problem but not the backbone). I favour speed over accuracy here, but the status page wants to be fast and accurate.


My main interest was that you were aware and that a fix was on the way. That's the difference between having to desperately act myself or just sit tight and placate clients. So, I appreciated your original comment!


Ah, well in that case they already have that so no need to complain: https://www.cloudflarestatus.com/


The comment on HN had more useful information (that the issue was understood and a fix coming) before that status page then updated. I think that's their point.

Prior to that, it was some time (in the "all my sites are wrecked" timescale) before the status page had any indication of an outage.


The way I read their complaint was that they should have something on their website to indicate they were down. Anyway, at the time they complained, the status page also already said that the issue was identified and a fix was being rolled out.


Their post was saying that the dedicated status domain should be the first place to get useful information. There were multiple new threads on HN before the status page was updated at all. I'm sure there are legal reasons, but it's not ideal.

Then there was the CTO's (appreciated!) comment prior to the status page's second update with information suggesting this would be resolved soon (which IMO is the information everyone needs to report back to clients, bosses, etc).

That the status page was subsequently updated prior to OP's complaint isn't really relevant. It's still a point of discussion, whether someone comments immediately or later, right?


Possibly.. you seem to have a lot more insight into what GP meant instead of what they said, so I'll defer to you on this one.



I agree.


Maybe you should first try actually going to their status page[1]. It is showing a global service disruption since 06:43 UTC, about 20 minutes since you wrote this.

[1] https://www.cloudflarestatus.com/


I assume they're referring to the fact that it did not show any issues for ~5 or so minutes after the outage began.


Their status page does show the service down as well as them having identified the issue and working on a fix:

https://www.cloudflarestatus.com/


The entire day yesterday, performance with Cloudflare was extremely sluggish. Pages which relied on it, even if only to load a JS file from the CDN, would hang for tens of seconds.


Today a mistake, tomorrow an order from the DOJ. Take heed, Internet.


I cannot access science.org, quora.com, substack.com at the moment. It shows 500 Internal Server Error. Didn't know why but now it is clear. I guess I just wait until it is fixed.


Everything using Cloudflare is down. That tells you how much of the internet Cloudflare is part of.

Web2 works great with Cloudflare, right up until everyone uses it and it goes down.


I look forward to a technical report on this outage on their blog.


Statuspage seems to be useless; I was just trying to get the status via multiple networks and my mobile network. Ironically, other downdetector services are also down.


Haha... I got pinged on my phone that a site I manage is down, tried to figure out what's wrong with it, noticed other sites down, and realized it's Cloudflare.


Auth0 login is also not working. Their website is up though


Our monitoring systems had InfluxDB endpoints behind Cloudflare. Now we have lost not only users but also access to data about the impact of the outage.


Slack seems to be having trouble sending messages as well. Was trying to let my team know I've acked the request but am unable to do so.


I'm getting a '500 Internal Server Error' (nginx) in a regular tab, and everything works fine in an incognito tab. Go figure.


Not having fun right now, but this brings home just how reliant so much of the internet is on a very few very big service providers.


So interesting: the actual edge servers work.

But the key/value store that all of Cloudflare's configuration data lives on is giving 500 errors.


This aligned with me debugging a separate issue in a program that programmatically uses the npm registry... that wasn't fun.


Was just about to post. Many sites returning 500 in the UK, and cloudflare seems to be the point of failure, including itself.


Yep, same issue. All our services are down, this is very bad. Can't even point directly to the app servers to resolve.


Linode seems unreachable too (and related hosted VPS). Is it my problem or is it a general failure on Linode?


I think it has everything to do with CF and not hosting. I have Linode servers without CF, and everything is great.


https://hn.algolia.com also down now.


Up on my phone (AT&T) but down on my main ISP/desktop (AT&T lightspeed). BGP issues too, maybe?


Oh... this explains why discord is down.


I haven't been impacted in Australia at all, but all of my (probably US based) monitors are going off.


Again, why did people decide to centralize like 80% or something of the internet under a single company?


Why do people ask questions like this? You know the answer. This company offered products or services superior to alternatives so people elected to use them.


Peep, anyone there?

Wow, for me it looked like the world had gone mad. This is a reminder to not only rely on 1.1.1.1 for DNS resolution in PiHole.

I host most of my services locally, but ironically could not connect to my own homelab. I use a dedicated domain with DynDNS and did not configure the network and DNS to work without reliance on external DNS. Surely it's infinitely more likely for me to make a mistake, right?


Yeah, and if Cloudflare could make their anti-bot "verification" interoperable with noscript/basic (x)html browsers, and not force those grotesquely and absurdly massive Google (Blink/Gecko) and Apple (WebKit) web engines, that would be less criminal.


My morning started with crunching logs, not finding any errors, and slowly panicking.

But well, it can happen :)


The Cape Town location does not seem to be impacted in any way. Everything works as expected.


This is such a weird outage, cloudflare sites are down on some of my devices but not others.


Hah. I was seeing a lot of 500 errors across the web and came to HN to see what was up.


Very bad start of the day :(

Cloudflare.com is now up, and websites are coming back up; Argo Tunnels are still down.


All sites of mine are down now.


DNS looks to be ok, but sites that I have proxied through them aren't working.


The DNS seems much slower for me, but is working.


Can confirm, no issues on our infra side. Cloudflare took down the web once again.


Down here in Japan, too. Thought it was my connection for a bit. sigh


Another reminder that centralized services like Cloudflare are a bad idea.


Cloudflare is great everyone, let's not forget how awesome they are.


I guess this is probably why I can't log into league of legends atm.


That indicates how much of the Internet depends on Cloudflare's servers.


Sooner or later they will start to give a d..n about resiliency :)


Wishing Cloudflare ops teams the best to recover fast from this outage. Meanwhile, we urge customers to check out www.cdnreserve.com , and implement a sound CDN backup strategy (auto-failover) when the primary CDN suffers an outage.


Never market during another company's outage; offer help instead. Tomorrow it will be you.


I absolutely agree, and very respectfully so. No one is immune to outages. Well said. CDNReserve is designed so that if an outage occurs on one platform it maps the traffic to the failover CDN, and if the failover/backup CDN suffers an outage, the traffic is shifted back to the primary CDN using CDNReserve. It's built on the premise that the likelihood of two CDNs having an outage at the same time is close to ZERO.


The likelihood of CDNReserve having an outage on the other hand is 100%.

You aren't the first to come up with the idea of a CDN traffic director (I built one), and you'll soon discover customers recognize you are just another single point of failure and not the solution. Best to focus on the things other companies in the space market on, bill optimization, latency optimization, etc.


Agreed. The likelihood of any platform having an outage is 100%. But the likelihood of two networks having the outage at the same time is close to zero. It's awesome that you built a similar solution in the past. It would be great to jump on a call and learn from your experience if you are open to it.


GitLab.com is impacted too as they are also behind Cloudflare.


GitLab team member here. Thanks for sharing.

Incident: https://status.gitlab.com/pages/incident/5b36dc6502d06804c08... with the latest update:

[Monitoring] Services seem to be back to normal, and we continue monitoring. Details in https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7...


Maybe we should start suggesting the use of IPFS/Arweave to host critical infra status pages, or similar sites that are occasionally urgently needed.


All things work here? Using cf dns, but even cf.com loads.

Edit: in Belgium


Getting intermittent SERVFAIL from 1.1.1.1 DNS as well


That explains why I couldn't access Phoronix.


Seems to be back up again, albeit a bit slow.


I am guessing this is why substack is down?


Same here! Seems to work with FR proxy tho!


Yeah, using a Mullvad VPN with location set to France lets me access everything


I think unpkg were affected by this too.


Everything is up for me. Short downtime?


Looks like it's mostly good now?


Everything working here in Australia


A lot of my Australian colleagues were saying a lot of things were down, including all of our websites. However, being in NZ, I was able to visit them.

So I do think Cloudflare actually is a bit more decentralised than we give them credit for really.

Just the fact that they messaged here in the HN thread about what was happening, what they knew, and how they were gonna fix it. That's just _awesome_

Kudos to them. I can't wait to see their after-action report.


Discord, Runescape down for me.

And some work related systems.. but that's less important.


Seems fine in Melbourne but not Sydney


I'm in Melbourne and it all seems ok


No it's not.


Back up now as of about 5 min ago.


wow so that's why all of the "is it down for me" sites are down lol


Back up?


Yup.


linear.app -> down
notion.so -> down

Wow, this dependency on cloudflare is wide.


DNS API failing with 500


Cloudflare has a 100% SLA. This needs revision: https://www.cloudflare.com/business-sla/


Looks like a lot of sites are being impacted by this one.


ah fuck


hugops to cf team. <3


is it just DNS?


firefox DOH uses cloudflare and nextdns. nextdns is down as well


turns out having a central failure point for the entire web was a bad idea


Someone should make a website that collates the approx. 90M times this sentiment has been expressed on this website (a good chunk of them by me). Just a reminder that somehow nothing changes on this front: the moment things come back up, people go back to relying on single SaaSes for everything.


To be fair this is not relying on a single SaaS for everything but many people relying on a single SaaS. I mean if you want to use a reverse proxy/CDN, you must rely on someone.


My company uses 3 CDNs, although not Cloudflare. If one (say Akamai) goes down, it gets removed from the pool and life continues.
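
A minimal sketch of the health-check half of such a pool, assuming hypothetical CDN hostnames and a /healthz endpoint (real failover would also need to pull the unhealthy entry out of DNS or a load balancer via its API):

  #!/bin/sh
  # Probe each CDN edge hostname; anything failing the check would be dropped from rotation.
  for cdn in cdn-a.example.net cdn-b.example.net cdn-c.example.net; do
    if curl -sf --max-time 5 "https://$cdn/healthz" >/dev/null; then
      echo "$cdn: healthy"
    else
      echo "$cdn: unhealthy, remove from pool"
    fi
  done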


Well that makes your company more responsible than my credit union.


Our key customer-facing services have a 99.995% uptime SLA (and a total of 2 or fewer incidents per year of "any length"), which means once you start concatenating services with 99.995% SLAs you aren't there.

How that SLA measures a 2-second outage for some customers is a separate thing, and it sort of shows how meaningless these things can be on the internet (if you lose service for 10% of your potential customers, is that an outage? How about 90%? How do you know how many were lost?).
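
A rough worked example of why that concatenation bites, assuming independent failures and made-up numbers:

  0.99995 x 0.99995 = ~0.99990   (two services in series: ~99.99%, already below a 99.995% target)
  0.99995 ^ 4       = ~0.99980   (four services: ~99.98%, roughly 1h45m of allowed downtime per year)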


Measuring outages doesn't seem so meaningless as long as your money seems inaccessible.

Their main site went down for about 20 hours a couple weeks ago because their hosting provider went down. They deployed an HTTPS only static site in its stead, so at first blush it looked like they deployed nothing. Great when you're trying to find contact information hosted on that site.

Their online banking site leveraged Cloudflare, so obviously they just rode that outage out with no notifications, etc.


Sure but that's a total outage for a long time.

What if, for some reason, a single /24 was unreachable from the site (say an errant route for 12.85.25.0/24 somehow got in the path)? How would you even know that was a problem: how many customers are on that /24, and how would I measure their failed attempts to connect?

I have a remote office in India on Tata. The other day it had access to much of the internet, but due to a fibre break in the Mediterranean it didn't have access to endpoints in Europe for a good 20 seconds.

However the other link on a different ISP remained working at that time.

Does that count as an outage? If I wasn't actively monitoring that link at high resolution, would I even know about it?


I'd argue you're starting from a few orders of magnitude more competency than the credit union was. Their non-banking site was hosted by some podunk company in Texas with no sense of redundancy anywhere. Their provider had a near total networking outage and the credit union had no plan to recover from that.

Insofar as proactively monitoring a single /24, you (probably) don't. I don't think it's (usually) a company's job to monitor their customer's ISPs. The failures that "my" credit union had were due to their own choice in infra (Armor, Cloudflare). When Sonic nuked my config on their DSLAM after some maintenance I raised an issue with Sonic not with whatever other companies became inaccessible as a result.

> Does that count as an outage?

My POV may very well differ from whatever contracts and SLAs you have in place, but yeah maybe. If you can't fail over to the alternative ISP then yes that's an outage. Of course a trans-atlantic fiber break would also likely be a lot more noticeable than fat fingering a route for a /24. And sure, I've been stuck at megacorp when the VPN started handing out addresses in a new subnet but our department's networking team hadn't caught up. That's why you listen to your customers instead of throwing out a "someone else screwed up there's nothing we can do" response.

Me personally I don't think that a 20 minute banking outage is a massive problem (I've long since moved my money elsewhere), even the 20 hour outage was relatively minor. It just speaks to the unwillingness of the credit union to be highly available. They knew of the Armor outage and didn't actually test the remediation. I assume they didn't know about the Cloudflare outage. Both worry me. What happens when they're faced with a total failure of their online banking system?


But it isn't an outage. My monitoring point in Singapore could reach both ends; they just couldn't talk to each other, due to a routing issue on a third-party network over the internet.

On my own network, which I control, I accept that if a circuit breaks I'll have a 1-, maybe 2-second outage while traffic reroutes. For some of my services that would be a problem, for others it's not. If Facebook loads 2 seconds later, nobody cares. If the winning penalty in the World Cup final blacks out, that's a big problem.


I'm new to this whole thing. Can you point me to how I can avoid depending 100% on CF if its DNS is down? Is there such a service? (kinda like load balancing, but with DNS?)


DNS is easy. You can (and must) have multiple nameservers for your domain. Just use different companies (and different regions) and if one goes down the others will still resolve.
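
For example, with a placeholder domain and placeholder provider names, checking the delegation would look something like this; a resilient setup shows nameservers run by more than one operator:

  $ dig +short ns yourdomain.example
  ns1.provider-a.example.
  ns1.provider-b.example.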


What are those 3 CDN companies? Cloudflare, Akamai and what else?


> What are those 3 CDN companies? Cloudflare, Akamai and what else?

The OP says "not Cloudflare"; so probably Akamai, Fastly, CloudFront?


Yes, sorry, my rant was more like "everything relies on a single SaaS" rather than the SaaS doing everything; I just mixed up the phrasing.


I wonder how long Cloudflare would have to be down for there to be a noticeable change.


I remember when some key part of AWS EC2--EBS in us-east-1 maybe?--was down for a few days straight. Honestly, the main thing it taught me was "if you are honest with your customers they will mostly just come back later and buy everything they didn't buy today".


14h53m


Yes and no. Obviously not great when everything goes down, but I find a strange sense of solace and calm when I know there’s a lot of people in the same boat and there’s little I can do but wait.


Yeah, same lol. I saw that my SaaS services were not working and I got stressed wondering why all my EC2 instances had gone down at the same time. I checked Downdetector for EC2 and it reported that Cloudflare is down. I breathed a sigh of relief thinking that (almost) the whole internet is down; nothing I can do here.


It's quite ironic that the Internet was designed to withstand nuclear attack, yet with how much everyone has started using the "cloud", a stupid configuration mistake at one important company can bring it to its knees.

We should really rethink this constant reliance on single points of failure.


I wonder if it really was, though. I'd think that these centralised services go down less than the self-hosted stuff did previously. Is it better to have more overall uptime where downtime means everything stops, or random downtimes of individual sites that add up to more total downtime?


I mean if large websites like Notion or Medium had used IPFS instead, there would be no central point of failure, and web pages would still be available from distributed hosts


It's not a central failure point though. Plenty of websites don't use Cloudflare.


If most of the websites, that most people rely on for their day-to-day functionality, use Cloudflare, it's effectively a central point of failure.

Sure, there are alternatives and not everything uses it, but if it's enough to greatly affect a large proportion of internet users, it's a problem.

Just like if Google Mail went away forever. There are plenty of other email providers, right?


Similar to when Facebook/Whatsapp went down earlier in the year and we all reverted back to SMS for an evening.

Fun times.


This should hopefully drive home the idea of why HN shouldn't be cheering on Cloudflare's slow takeover of the internet.


Cloudflare just offers great services. It's a straight-up fact that even their free tier is extremely generous. There is no big conspiracy to 'take over the internet'; when the product is good, the product is good.


> There is no big conspiracy

CloudFlare should be run by the CIA or something - astonishing MITM opportunities. The only clear sign the CIA is not deeply involved is that CloudFlare is far too competent.


It blows my mind how most of the otherwise savvy readers of HN completely gloss over the fact that Cloudflare unwraps TLS on most of their internet traffic.

I trust that the current leadership might not do something evil, but they are publicly traded. At some point a group of investors are going to figure out that merging Cloudflare with an advertising network would create a level of user targeting that Google and Facebook could never dream of.


Governments in Europe and elsewhere are already working on legislation to weaken e2e encryption by law. Regulating things like Cloudflare to hand over data, as they have already done with ISPs, is not even much of a leap of the imagination. For example, in the UK, all time:srcip:destip:user data must be kept for 1 year by every residential ISP and provided to government departments (not even just law enforcement) through a standard system.


Agreed, NSA is much more likely


This is what is commonly called "golden handcuffs".

Of course they offer a great product, that's how you create a monopoly.


Not a 'big conspiracy'. It's the business model, isn't it? Or isn't CF going for the biggest marketshare and maximizing profits on that, like all the others?


It's certainly any business' model to grow as big as possible, but it's a hard business model to implement so competition is hard to find. I just can't blame CF for that imo.


This. I used Cloudflare for 10 years as a free customer. Now I pay them at least $5k a month. Freemium works when the services are good.


Out of interest, what competitors are available? Cloudflare protects against DDoS? Does it do other stuff?


Broadly speaking, any CDN will have similar functionality. Fastly, Akamai, AWS Cloudfront...


DDoS-Guard is the goto if Cloudflare decides it can't take you as a customer.


It's working in Belgium but not in the Netherlands


Maybe they are more decentralized than we give them credit for. I'm getting different error messages (nginx, DNS, 404) on different websites. Not sure if it's a full breakdown of their systems or a coordinated attack.


My sites on Hetzner (Germany) are coming up fine as well.


Still down, here.


[dead]


Yeah, the NZ ones just came back online; https://veb.co.nz/ was returning a 500 nginx error, which confused the shit out of me until I visited HN.

Seems our DNS was working overtime, because a lot of the websites that were down for others were working OK for us.


Out of interest, do you think for sites like those, cloudflare provides any value?


Odd. My client's aus-hosted, nz-customer site was hit by this.


Is this why 4chan is down?


[flagged]


Lol


Had a hard time trusting Cloudflare since they blocked 8chan due to political activism.


Ohhh, downvoted, I assume because 8chan are the bad guys.



