I still don't get why they didn't separate clients on a database level. Sure, pu...

mustardo · on April 14, 2022

Using separate databases or schemas per tenant comes with the following problems:

* Managing schema migrations across every DB

* You can't query across DBs. Want to know some cross-tenant thing for ops? That's now a lot harder

* Connection pooling and resource usage can be harder to manage

Most systems I've worked on use a single DB with a `tenant_id` column on every relevant table. It's easy to have your query builder slap in the auth'd tenant ID. This approach does come with its own issues, like saving and restoring an individual tenant's data.
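A minimal sketch of the "query builder slaps in the tenant ID" idea, using Python's built-in `sqlite3`. The `TenantScopedDB` wrapper, the `invoices` table, and the tenant numbers are all made up for illustration; a real system would more likely hook this into its ORM or use database-level row security.

```python
import sqlite3


class TenantScopedDB:
    """Hypothetical wrapper that adds the tenant filter to every query."""

    def __init__(self, conn, tenant_id):
        self.conn = conn
        self.tenant_id = tenant_id  # assumed to come from the authenticated session

    def select(self, table, columns="*", where="1=1", params=()):
        # table/columns/where must come from trusted code, never from user input.
        sql = f"SELECT {columns} FROM {table} WHERE tenant_id = ? AND ({where})"
        return self.conn.execute(sql, (self.tenant_id, *params)).fetchall()

    def insert(self, table, row):
        # The caller cannot choose the tenant; it is always overwritten here.
        row = {**row, "tenant_id": self.tenant_id}
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        self.conn.execute(
            f"INSERT INTO {table} ({cols}) VALUES ({marks})", tuple(row.values())
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, amount REAL)"
)

acme = TenantScopedDB(conn, tenant_id=1)
globex = TenantScopedDB(conn, tenant_id=2)
acme.insert("invoices", {"amount": 10.0})
acme.insert("invoices", {"amount": 25.0})
globex.insert("invoices", {"amount": 99.0})

# Tenant-facing code only ever sees its own rows.
acme_rows = acme.select("invoices", columns="amount", where="amount > ?", params=(5,))

# Ops can still query across tenants on the raw connection.
ops_total = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
```

The isolation here is only as good as the discipline of never touching the raw connection from tenant-facing code, which is the flip side of the cross-tenant convenience.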

Like a lot of things in life, it's a trade-off.

donavanm · on April 14, 2022

> why not use different databases? They cost nothing and provide perfect separation.

I understand the sentiment, but this is a pretty simplistic take that I very much doubt will hold true for meaningful traffic. Many databases have licensing considerations that aren't amenable. Beyond that you get into density and resource problems as simple as IO, processes, threads, etc. But most of all there's the time and effort burden in supporting migrations, schema updates, etc.

Yes, layered logical separation is a really good idea. It's also really expensive once you start dealing with organic growth and a meaningful number of discrete customers.

Disclaimer: Principal at AWS who has helped build and run services with both multi-tenant and single-tenant architectures.

dx034 · on April 14, 2022

Don't you usually license based on server resources? Or do you now really have to pay per database/schema? At least on-prem licenses tend to be based on resource usage, not on the number of databases or schemas. I'm not talking about different DB processes, just databases/schemas within a database.

And for migrations and schema updates I'd see this as a huge advantage. Migrating customers one by one is much easier than everyone at once. You also never have the issue that operations at one customer could cause a global lock affecting other customers.
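A rough sketch of what "migrating customers one by one" could look like with one SQLite file per tenant. The `MIGRATIONS` list, the file names, and the use of `PRAGMA user_version` as a version marker are assumptions for the example, not anything described in the thread.

```python
import os
import sqlite3
import tempfile

# Hypothetical ordered migrations; position in the list is the schema version.
MIGRATIONS = [
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)",
    "ALTER TABLE invoices ADD COLUMN currency TEXT DEFAULT 'USD'",
]


def migrate(path):
    """Apply pending migrations to one tenant DB and return its version."""
    conn = sqlite3.connect(path)
    try:
        version = conn.execute("PRAGMA user_version").fetchone()[0]
        for new_version, statement in enumerate(
            MIGRATIONS[version:], start=version + 1
        ):
            conn.execute(statement)
            # Not atomic with the statement above; good enough for a sketch.
            conn.execute(f"PRAGMA user_version = {new_version}")
            conn.commit()
        return conn.execute("PRAGMA user_version").fetchone()[0]
    finally:
        conn.close()


def migrate_all(paths):
    """Migrate tenants one at a time; a failure stays with that tenant."""
    results = {}
    for path in paths:
        try:
            results[path] = migrate(path)
        except sqlite3.Error as exc:
            results[path] = exc  # record it and keep going with the others
    return results


workdir = tempfile.mkdtemp()
tenant_paths = [os.path.join(workdir, f"tenant_{n}.db") for n in (1, 2, 3)]

first_pass = migrate_all(tenant_paths[:1])  # canary tenant first
second_pass = migrate_all(tenant_paths)     # then everyone; tenant_1 is a no-op
```

The per-tenant loop gives the fault isolation, but it is also the operational cost: the loop, the version tracking, and the failure handling all have to be built and monitored for every database.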

Of course resource sharing isn't easy in this scenario, but you'd never want to connect data between customers anyway so I don't see the issue with that.

But maybe it's harder in a cloud environment where more is abstracted away.

donavanm · on April 14, 2022

Ah, when you said "database" I assumed you meant a dedicated single-tenant instance of an RDBMS (or similar), and not necessarily something like dedicated tables. I will admit to being a decade out of touch with the vagaries of "processor", server, and client access licensing. In my relevant past I've only worried about (RDS/EMR/Redshift/etc.) instances and tables.

Very fair call-out on having more granular, discrete instances for things like DML/schema updates and expensive queries. I love fault isolation and have had many sad days on call when we exceeded the capabilities of The Database.

I wouldn't say it's harder because it's more abstract. I think the general motivation is to desperately avoid anything that scales cost/effort with the number of users. Even if it's sublinear, a team can really drown under the cost of scaling up a service. And that's a serious consideration when a baseline expectation is to go from 0 to 10,000 or 50,000 active customers in just a few years. The care and feeding of (for example) 10 multi-tenant partitions is just simpler than having to monitor & operate 10,000 independent databases with wildly divergent usage profiles. I will grant this hyper-growth is not a common scenario for the industry, or if it is then it's "one of them good problems."

I'd also say I have worked on a project that did have independent data tables for each customer instance. We spent a meaningful amount of time abstracting away table creation/migration/etc., building a common DAL that hid the multitude of tables, setting up common monitoring, and so on. It has made some things around data migration & management easier, but I honestly don't know if it's more efficient than multi-tenant clusters in the long term. The only way the economics and operational effort have worked is by going "all in" on "serverless" technologies that efficiently scale to zero and have no carrying cost when idle.

abraae · on April 14, 2022

There are lots of downsides to doing absolute partitioning of tenants (along with lots of upsides as you point out).

Really annoying things that slash your velocity. You can't easily run pan-customer queries, can't aggregate data.
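To make the pan-customer query pain concrete, here is a small sketch assuming one SQLite file per tenant (tenant names, the `invoices` table, and the helper functions are invented for the example). With a shared database this would be a single `SELECT tenant_id, COUNT(*) FROM invoices GROUP BY tenant_id`; with absolute partitioning it becomes a fan-out plus aggregation in application code.

```python
import os
import sqlite3
import tempfile


def count_invoices(path):
    """Run the same query against a single tenant's database."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
    finally:
        conn.close()


def invoices_per_tenant(tenant_paths):
    """Fan out over every tenant DB and collect the results by tenant name."""
    return {tenant: count_invoices(path) for tenant, path in tenant_paths.items()}


# Demo setup: two hypothetical tenants, each with its own database file.
workdir = tempfile.mkdtemp()
tenant_paths = {}
for tenant, n_invoices in {"acme": 2, "globex": 3}.items():
    path = os.path.join(workdir, f"{tenant}.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO invoices (amount) VALUES (?)", [(10.0,)] * n_invoices
    )
    conn.commit()
    conn.close()
    tenant_paths[tenant] = path

per_tenant = invoices_per_tenant(tenant_paths)
total = sum(per_tenant.values())
```

Anything beyond a simple count (joins, top-N, percentiles) has to be re-implemented in the aggregation step, which is where the velocity goes.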

Ironically, too, using a single database makes full backup and restore much easier.

dx034 · on April 14, 2022

Wouldn't you extract data anyway to another system for analytics? Running analytics queries on production databases seems a bit risky in any setting?

treis · on April 14, 2022

You can have a read-only replica if you're concerned about performance.