Hacker News

Databricks is trying hard to get into serverless, but it seems like they refuse to allow it to actually be cheaper, which defeats the purpose of serverless.


I don't think being cheaper is the main value sell of serverless. When I hear "serverless" I think "ease of deployment and automatic scaling".


Serverless is incredibly cheap for endpoints that don't get called too often, and incredibly expensive for endpoints that are.

I guess different people just have different experiences.
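The asymmetry above comes down to simple arithmetic. Pay-per-request pricing scales linearly with traffic, while an always-on instance is a flat monthly cost, so there is a traffic level where the two cross. Here is a minimal sketch. It assumes a single per-request price and a single hourly instance rate, both hypothetical placeholders rather than any vendor's real pricing. It also ignores memory-duration billing, egress and cold starts.

```python
# Hypothetical cost model: pay-per-request serverless vs. a flat-rate instance.
# The prices used below are illustrative placeholders, not real vendor pricing.

HOURS_PER_MONTH = 730.0  # average hours in a month (8760 / 12)


def serverless_monthly_cost(requests_per_month: int, cost_per_request: float) -> float:
    # Billed only for what is used: cost grows linearly with traffic.
    return requests_per_month * cost_per_request


def always_on_monthly_cost(hourly_rate: float) -> float:
    # A provisioned instance costs the same whether it serves 0 or 1M requests.
    return hourly_rate * HOURS_PER_MONTH


def break_even_requests(cost_per_request: float, hourly_rate: float) -> float:
    # Monthly traffic at which both options cost the same.
    return always_on_monthly_cost(hourly_rate) / cost_per_request


if __name__ == "__main__":
    per_request, hourly = 0.000002, 0.10  # hypothetical prices in USD
    print(f"break-even: ~{break_even_requests(per_request, hourly):,.0f} requests/month")
    for n in (100_000, 10_000_000, 100_000_000):
        s = serverless_monthly_cost(n, per_request)
        a = always_on_monthly_cost(hourly)
        winner = "serverless" if s < a else "always-on"
        print(f"{n:>11,} requests: serverless ${s:,.2f} vs always-on ${a:,.2f} -> {winner}")
```

With these placeholder numbers the break-even point is about 36.5 million requests per month. Below that, the rarely called endpoint is cheaper on serverless. Above it, the flat-rate instance wins.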


Right, but ultimately that's a cost thing, isn't it? You can solve those problems through other means, including hiring internally.

Serverless is meant to obviate some of that. But it is less compelling when the vendor tries to gobble up that margin for themselves.


You will all be forced to go serverless because new grads can't use the command line. Running a database is about the hardest thing you can do. If it is serverless, you don't need special skills, and preventing employees from becoming valuable lowers costs across the board.


Have you tried being less jaded? Running a database is NOT about the hardest thing you can do.


When running a service, the database is the hardest part to operate. K8S still doesn't handle databases well (this is by design), so they are the first thing to get outsourced to a managed service.

This is me being less jaded. Support those little wins!


I had an interview with a senior data engineering candidate and we were talking about how expensive Databricks can get. :D I set up specific budget alerts in Azure just for Databricks resources in DEV and PROD environments.


There are so many gotchas. I'm getting so tired of working around it, but my company is all in on serverless so the pain will continue. A lot of it is tied up with Unity Catalog shortcomings, but Serverless and UC are basically joined at the hip.

A few just off the top of my head:

* You can't .persist() DataFrames in serverless. Some of my work involves long pipelines that end with relatively small DFs, but I then need to do several things with each of those DFs. It's nowhere near as easy as just caching them.
* Handling object storage mounted to Unity Catalog can be a nightmare. If you want to support multiple Databricks platforms (AWS, Azure, Google, etc.), you have to deal with the fact that you can't mount one cloud's object storage from another's. If you're on Azure Databricks, you can't access S3 via Unity Catalog.
* There's no API to get metrics such as how much memory or CPU a given job consumed. If you want to handle monitoring and alerting yourself, you're out of luck.
* For some types of serverless compute, cold-start times can be 1 minute or more.

They're getting better, but Databricks is an endless progression of unpleasant surprises and being told "oh no you can't do it that way", especially compared to Snowflake, whose business Databricks has been working to chew away at for a while. Their Variant type is a great example. It's so much more limited than Snowflake's that I'm still learning new and arbitrary ways in which it's incompatible with Snowflake's implementation.


hmm, what is a serverless Pg? I don't quite understand. I thought you needed a database server if you wanted to run Pg.


Basically, they separate compute and storage into different components, whereas traditional PG runs both compute and storage on the same server.

Because of this separation, the compute layer (e.g. SQL parsing, etc.) can be scaled independently, and so can the storage layer, which can be backed by, for example, AWS S3.

So if your SQL query is CPU-heavy, Neon can just add more "compute" nodes while the "storage" cluster remains the same.

To me, this is similar to the usual microservice setup where you have an API service and a DB. The difference is that Neon purposely runs the DB itself on top of that structure.
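The compute/storage split described above can be illustrated with a toy model. Compute nodes hold only a local page cache and fetch everything else from a shared storage tier, so adding or removing compute moves no data. This is a minimal sketch with hypothetical class names, not Neon's code. The real system also has to route writes (WAL) through its own services and keep caches consistent, and this sketch ignores both.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageLayer:
    """Hypothetical stand-in for the shared storage tier (e.g. pages kept in object storage)."""
    pages: dict[int, bytes] = field(default_factory=dict)
    reads: int = 0  # how many times any compute node had to fetch from storage

    def get_page(self, page_id: int) -> bytes:
        self.reads += 1
        return self.pages[page_id]

    def put_page(self, page_id: int, data: bytes) -> None:
        self.pages[page_id] = data


@dataclass
class ComputeNode:
    """Hypothetical stateless query node: its only state is a disposable cache,
    so it can be started, stopped, or multiplied without touching stored data."""
    storage: StorageLayer
    cache: dict[int, bytes] = field(default_factory=dict)

    def read(self, page_id: int) -> bytes:
        if page_id not in self.cache:  # cache miss: fetch from the storage tier
            self.cache[page_id] = self.storage.get_page(page_id)
        return self.cache[page_id]


storage = StorageLayer()
storage.put_page(1, b"row data")

# "Scaling compute" here just means adding nodes that point at the same storage.
nodes = [ComputeNode(storage) for _ in range(3)]
results = [node.read(1) for node in nodes]
```

Each new node pays one cache miss against the shared tier and then serves from memory. The storage tier never needs to know how many compute nodes exist.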


So how is this distributed Postgres still an ACID-compliant database? If you allow multiple nodes to query the same data, isn't this likely just Trino or an OLAP tool using Postgres syntax? Or did they rebuild Postgres and not upstream anything?


They keep the core of Postgres and modify the storage layer to work with S3. You can read more here: https://jack-vanlightly.com/analyses/2023/11/15/neon-serverl...


Thank you, very nice read! (Though from some scanning it looks like it mostly helps reads)


You're welcome. I think for the write part, it always comes back to classic consensus. In the end, there's always a distributed voting mechanism that decides the write order.
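The "voting mechanism" for write order can be sketched as majority-quorum commit. A single leader assigns each write a log position, and a position counts as committed only once a strict majority of replicas has acknowledged it and every earlier position has too. This is a simplified fragment of Paxos/Raft-style replication with hypothetical names. It leaves out leader election, terms and failure recovery, so it is not a full consensus protocol and not a description of Neon's actual implementation.

```python
from __future__ import annotations


def is_committed(acks: set[str], replicas: list[str]) -> bool:
    """A write is durable once a strict majority of known replicas acknowledge it."""
    valid_acks = acks & set(replicas)  # ignore acks from unknown nodes
    return len(valid_acks) > len(replicas) // 2


class ReplicatedLog:
    """Toy leader-side log (hypothetical): the leader fixes the order of writes by
    assigning positions; commit_index is the highest position such that it and all
    earlier positions have majority acknowledgement."""

    def __init__(self, replicas: list[str]):
        self.replicas = list(replicas)
        self.entries: list[tuple[int, str]] = []
        self.acks: dict[int, set[str]] = {}
        self.commit_index = -1  # nothing committed yet

    def append(self, record: str) -> int:
        pos = len(self.entries)
        self.entries.append((pos, record))
        self.acks[pos] = set()
        return pos

    def ack(self, pos: int, replica: str) -> None:
        self.acks[pos].add(replica)
        # Advance over the contiguous prefix of positions that now have a majority,
        # so a later write can never commit ahead of an earlier one.
        nxt = self.commit_index + 1
        while nxt in self.acks and is_committed(self.acks[nxt], self.replicas):
            self.commit_index = nxt
            nxt += 1
```

Requiring a contiguous committed prefix is what turns "enough votes" into a single agreed write order. A write that gathers its majority early still waits for every write before it.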


It's only serverless in the way it commits transactions to cloud storage, which makes the server instance ephemeral; otherwise it has a server process with compute and an in-memory buffer pool almost identical to Pg's, with the same overheads.


Marketing speak.


You shouldn't be getting downvoted. Serverless is nothing more than hype meant to overcharge you, instead of letting you run things on a server you own.


That's a reductionist view of a technology, based on the way it's sold. Serverless runtimes are VMs that launch and shut down so quickly that they open up new ways of using that compute.

You can deploy serverless technologies in a self-hosted setup and not get "overcharged". Is a system thread just bullshit marketing over a system process?
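The point about VMs that launch and turn off extremely quickly is what makes scale-to-zero possible. The platform starts an instance when a request arrives, shuts it down after an idle period, and bills only for the time it ran. Below is a minimal sketch of that controller logic. It assumes a hypothetical idle timeout and cold-start latency, and time is passed in explicitly so the logic can be tested. It is not any platform's actual scheduler, and it does not bill for the cold-start period itself.

```python
class ScaleToZero:
    """Toy scale-to-zero controller (hypothetical): start on demand, stop after
    `idle_timeout` seconds without requests, and accumulate billed running time."""

    def __init__(self, idle_timeout: float, cold_start: float):
        self.idle_timeout = idle_timeout
        self.cold_start = cold_start  # assumed startup latency, in seconds
        self.running = False
        self.started_at = 0.0
        self.last_request = 0.0
        self.billed_seconds = 0.0

    def tick(self, now: float) -> None:
        """Shut the instance down if it has been idle longer than the timeout."""
        if self.running and now - self.last_request > self.idle_timeout:
            stop_time = self.last_request + self.idle_timeout
            self.billed_seconds += stop_time - self.started_at  # bill only running time
            self.running = False

    def handle_request(self, now: float) -> float:
        """Serve a request at time `now` and return the extra latency it saw."""
        self.tick(now)
        latency = 0.0
        if not self.running:  # scaled to zero: pay a cold start
            self.running = True
            self.started_at = now
            latency = self.cold_start
        self.last_request = now
        return latency
```

The faster the cold start, the shorter the idle timeout can be without hurting users, and the closer the billed time gets to the time actually spent serving requests. The same logic works on self-hosted hardware, where the saving shows up as freed capacity rather than a smaller bill.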



