Y'all, this is going to be deeply unsatisfying, but it's what I can report personally:
I have no earthly clue why this thread on our community site is unlisted.
We're looking at the admin UI for it right now, and there's like, a little lock next to the story, but the "unlist story" option is still there for us to click. The best I can say is: I'm reasonably sure there wasn't some top-down edict to hide this thread (the site is public, anybody can sign up for an account and see the thread).
Say what you want about us, but hiding out from stuff like this isn't one of our flaws. When I find out more about what happened with this thread, I'll let you know (or Kurt will reply here and tell me I'm wrong).
I don't know enough about what happened with this Sydney server to be helpful to people who had instances running on it. When I know more about it, I'll be helpful, but I'm just learning about this stuff right now, after getting back in from a night out.
Almost immediately afterwards
It looks like... all the posts in the app-not-working category are "private"? Like it's some setting on the category itself? "Private" here means you need to have signed up for a Discourse account to see them?
Honest advice, probably to Kurt rather than you, is you need better processes, accountability and (probably) communication in your company. The tone of your reply (and other communications from fly.io) is reflective of the lack of those things given the public sentiment regarding fly.io. At 60+ employees and so many issues that tone goes from humanly endearing to indicative of a non-scaling business. Other replies indicate you don't want the things (process, oversight, etc.) that a growing B2B business needs to really succeed which is not a good sign. Sure there's a cost to that corporate-ness and you want to minimize that cost but it's also a necessary evil for the business you're in at the scale you're at.
If something breaks once it's an accident, if it breaks twice it's bad luck but if it breaks down three times it's broken processes. Based on the comment here things break at fly.io a lot more often than three times.
I'm just a person on Hacker News that happens to be at Fly.io; as I've said before, it's probably reasonable to think of me as an HN person first, and a Fly.io person second. My tone is my tone, and has been for the many years I've participated in this community. I got back from an evening out, saw that we were on the front page, poked around a little to find out what the hell was going on, and did my best to add some context. That's all.
If you're reading my comments on HN as some kind of official response from the company, you've misconstrued them.
> If you're reading my comments on HN as some kind of official response from the company, you've misconstrued them.
For what it’s worth, this is the reason most companies eventually restrict their employees from making statements about the company; it doesn’t matter if you thought it was clear that it was unofficial, any statement from an employee in a position of power (such as someone with access to the control panel) will be perceived as a communication from the company.
You may have intended it to be a personal remark about your job, but there are a lot of people in this thread looking for any communication they can get about the company.
When you step in to fill that void as a person who appears to have access and power within the company, you are the official communication whether you intend to be or not.
For the sake of fly.io, you should either restrict yourself and not respond or, if you can't resist, make it crystal clear, that you DO NOT represent fly.io. Your first message can and will be misunderstood and it DOES throw a poor light on fly.io.
I am a paying customer of fly.io, on the Scale plan.
TBH I thought you were replying as the CEO of fly.io since 1) I've seen them post here before, 2) I have no idea how big fly.io's staff is and 3) your post didn't otherwise describe who you were. It doesn't look like I was the only one to be confused.
If you had said "thoughts are my own; I just work there" or something I think it would have been more clear.
It seems you took my comment personally, but it was about not just your comments but the overall tone of the fly.io communication (see recent blog post regarding funding) and approach to issues (three days of silence on a dead instance). You view processes and guidelines as chains versus as a ladder to help you climb a cliff. If the processes and communication were good then you'd know when you should self-restrict and when you shouldn't. You'd be empowered to make decisions within a framework that benefits fly.io the most versus being left to guess yourself. You'd understand why you should do that sometimes and why it's a better option for everyone.
I don't, but that's fine: it's not important that we understand each other all that clearly here, since all I'm talking about is how our public forum works.
For an opposing viewpoint: I don't want HN to become the place where corporate comms comes to bullshit us. I want engineers who work there to talk to us as peers, which seems like what's happening here. I get candor and humility (and playfulness, sure) from Fly's tone, which I appreciate.
I get stuff like this is frustrating. But I bet Fly staff are pretty frustrated too.
From this my take away is that I could get fired for picking Fly.io for work. Not because there was an outage but because days could pass before getting support.
What assurances could you give the community here that the support would be better next time?
This is our public site, for people who don't have support plans with us.
It's difficult for me to say more about what happened here and how you might have handled it, because I don't know what happened with this SYD host, because it's 1AM and the people who worked on it are, I assume, asleep. When I know more, I'll do my best to get you a postmortem.
Try filing a bug with any of the big three cloud vendors when you're on their free plan. It's really not different, the thing that is going to get you fired is not realizing you're not paying a couple hundred bucks per month for premium service on the infrastructure that is mission critical to your company.
Funny story, when I started my current role I researched our hosting provider. I couldn't find the matching invoices in the accounting system. So I called the vendor, a local company. They'd not set our account up correctly, billing was not enabled. Since then we've been billed. I'm glad we sorted it but it wasn't a good look to start my role by increasing our spending.
I feel like starting your role by discovering a crucial service wasn't being paid for and therefore was at risk of suddenly going away should be a pretty positive thing.
However 'should' is pretty load bearing there and actual results are probably heavily dependent on management culture and the current state of office politics.
We had a customer once that our automatic billing system tried to reach for 3 months about failing credit card charges (<$5k/mo). Our system stopped the service. I'm pretty sure their subsequent outage cost their customers millions. Lessons about what it means to have (and be) enterprise customers were learned. Unfortunately the lady who was ignoring our e-mails in her inbox got fired.
> Try filing a bug with any of the big three cloud vendors when you're on their free plan.
A host being down for 3 days isn’t a bug. And you can contact AWS support, even on the free plan, and get a reply. Try it yourself. The great thing about AWS and the other cloud providers? If a host has issues they email all customers with workloads on it so you don’t need to refresh or check a forum.
I understand fly is a community darling. They’re unreliable, with poor support currently. Maybe the dev experience is great and that makes up for it, but pretending like everything else is equally shitty? Not true.
Lots of experience with Fly's paid support here. tl;dr Absurdly good.
FAR better wrt both response times and technical expertise than you'll get with any large public cloud provider.
I was dealing with some annoying cert + app migration stuff (migrating most of an app from AWS to Fly), and Kurt (CEO) was personally sending me haproxy configs bc I'm not smart enough to know how to configure low-level tcp stuff in haproxy. Not to put him on the spot here -- I doubt he'll have time to do that level of support going forward -- but that's my experience of the company's dedication to support and technical expertise.
For instance one of those things I've noticed is that most Discourse instances have those nag banners if you're not logged in begging you to log in – and that's one of the least objectionable things they do IMO. I discovered recently that Discourse also blacklists all but the most recent browsers (because Discourse is designed for the next ten years!) and serves up a plain text version on anything older… but not without a nag banner of its own admonishing you for not using a supported browser.
The infinite scrolling… ugh. I'm not a huge fan of XenForo, but as a successor to vBulletin it seems to be far more user friendly.
My understanding is that it was causing support problems, because people were Googling for solutions to problems with their apps (because of the Heroku diaspora, we have a lot of first-time Docker users), finding old stale threads on our forum that looked related, and then reviving them.
I think we can just `noindex` the category instead of making it private?
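For anyone not familiar with what `noindex` would mean here, a rough sketch, assuming the forum can attach a response header per category. This is not Discourse's actual code or configuration; `NOINDEX_CATEGORIES` and `robots_headers` are hypothetical names made up for illustration. The only real thing in it is the `X-Robots-Tag: noindex` header, which asks search engines not to list a page while leaving it readable by logged-out visitors.

```python
# Hypothetical sketch, not Discourse's real implementation.
# Category slugs whose pages should stay public but out of search results.
NOINDEX_CATEGORIES = {"app-not-working"}  # assumed slug, taken from this thread


def robots_headers(category_slug: str) -> dict[str, str]:
    """Return extra HTTP response headers for a page in the given category."""
    if category_slug in NOINDEX_CATEGORIES:
        # Crawlers that honor X-Robots-Tag drop the page from their index;
        # the page itself is still served to anyone who has the link.
        return {"X-Robots-Tag": "noindex"}
    # Every other category is served with no extra robots directive.
    return {}
```

The point of the design is that stale threads stop turning up in Google, but a direct link from somewhere like HN keeps working without a login.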
After 15 months & more than 100 million requests served by our Phoenix + PostgreSQL app running on Fly.io, I would be hard pressed to find a reason to complain.
- Some deploys failed, and re-running the pipeline fixed it.
- Early July 2023, 9k requests from Frankfurt returned 503s. Issue lasted 10 seconds.
- While experimenting with machines, after many creations & deletions, one volume could not be deleted. Next day, the volume was gone.
That's about it after 15 months of running production workloads on Fly.io.
I'm sorry to hear that many of you didn't have the best experience. I know that things will continue improving at Fly.io. My hope is that one day, all these hard times will make for great stories. This gives me hope: https://community.fly.io/t/reliability-its-not-great/11253
There's also a lock icon next to the "App not working" category in the header, which I took to mean that that entire category is hidden from logged-out users (which experimentally seems to be the case).
I have the impression from this thread that this thread was public (as in, would work if you just linked to it from something like HN) earlier, and now it isn't?
Obviously, deliberately hiding a negative story on our Discourse is a little like deleting a bad tweet; it's just going to guarantee someone captures and boosts it. We have a lot of flaws! But not knowing how the Internet works probably isn't one of them. No idea what's going on here, still trying to work it out.
Yes, from the Google-cached version, it appears that the thread previously didn't have the app-not-working tag; it was only tagged with "rails".
Not going to try and guess why or when that tag change happened. Personally, I'm less concerned with this particular thread than with the apparent decision to systematically hide all potentially-negative threads from search engines.
That category was added after one of our support folks replied, likely for tracking. I don't know why it's private. They may not even know this category is private. Hiding negative shit wasn't a deliberate decision... we're aware of google cache and we don't need to give HN another reason to dunk on us.
> That category was added after one of our support folks replied
FYI, this doesn't appear to be strictly accurate. The OP commented at 23:52 UTC saying that the thread had been made private, and the reply from "Sam-Fly" was not posted until 02:36 UTC.
My point was that the app-not-working category is used in conjunction with support/our team getting involved. I assume this is what Sam meant by "flagged it internally", which was followed by investigation, then a post. I don't see how the timestamps uncover something nefarious.
If you're talking about the comment you're replying to, tbh I found it way more relatable than a more "professional" PR-speak response. Maybe you were talking about something else.
Eh, I like it. It's refreshing to see a company representative communicate like an actual human being instead of the usual meaningless corporate robot-speak.
I'd rather take this response and see that they're working on it than "Oopsie poopsie, our machine elves have messed up!" or corporate newspeak saying nothing.
you have no idea wtf you're writing about; it's been a few hours now and it's become clear that someone tagged the post as 'app-not-working', which made the post 'private' and only available to logged-in users. it's also become apparent that the linked post is on a community forum for users without a support plan.
the dramatic tone and accusations in your reply are not warranted anymore