Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In real-world business requirements it often need to read some data then touch other data based on previous read result.

It violates the "every transaction can only be in one shard" constraint.

For a specific business requirement it's possible to design clever sharding to make transaction fit into one shard. However new business requirements can emerge and invalidate it.

"Every transaction can only be in one shard" only works for simple business logics.



I talk about these problems in the "How hard can sharding be?" section of the article — long story short, not all business requirements can be dealt with easily, but surprisingly many can if you choose a smart sharding key.

You can also still do optimistic concurrency across shards! That covers most of the remaining ones. Anything that requires anything more complex — sagas, 2PC, etc. — is relatively rare, and at scale, a traditional SQL OLTP will also struggle with those.


Thanks for reply.

So in my understanding:

- The transactions that only touch one shard is simple

- The transactions that read multiple shards but only write shard can use simple optimistic concurrency control

- The transactions that writes (and reads) multiple shards stay complex. Can be avoided by designing a smart sharding key. (hard to do if business requirement is complex)


The optimistic concurrency control that reads multiple shards cannot use simple CAS. It probably needs to do something like two-phase committing


That's right!

If you anticipate you will encounter the third type a lot, and you don't anticipate that you will need to shard either way, what I'm talking about here makes no sense for you.


Business people have a nasty habit of identifying two independent pieces of data you have and finding ideas to combine them to do something new. They aren’t happy until every piece of data is copied with every other piece and then they still aren’t happy because now everything is horrible because everything is coupled to everything.


in my experience most backends I have worked on people don't use the facilities of their database. They indeed simply hit the database two or more times. But that doesn't mean it's not possible to do better if you actually put more care in your queries. Most of the time multiple transactions can be eliminated. So I don't agree this is a business requirement complexity problem. It's a "it works so it's good enough" problem, or a "lazy developer" problem depending on how you want to frame it.


This (along with n+1) is somewhat encouraged in business applications due to the prevalence of the repository pattern.


Give each business or customer its own schema and you almost never need sharding.


Yes, but you could also flip it the other way around — make the business or customer your sharding key, and you'll only need to manage one schema!




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: