I would prefer Python or Node over JVM for a lightweight microservice. I would ...

jpgvm · on Dec 6, 2020

Python and Node both have highly fragmented ecosystems, low-quality packages and poor tooling. Neither is statically typed or capable of multi-threading in a meaningful way, and other than in their niches (data science for Python and client-side web for JS) they are worse at everything than the JVM or CLR.

I understand their attraction, they are "simple" and "easy" languages. But they are not boring. If anything they create a ton of distinctly un-boring problems like build chains, packaging, and the framework of the week no longer being supported (or having a new incompatible version).

Engineers may like these for whatever reason, especially before they have tried the higher-quality tooling provided by real boring tech, but inevitably they lead to projects that run behind schedule because of technical issues unrelated to business problems and/or rot after development is paused and are hard to resuscitate and maintain afterwards.

Also NoSQL doesn't "scale" better than SQL. NoSQL stores can scale better in certain data access patterns but if your data model is inherently relational and you implement it on top of NoSQL, all you have done is reinvent a relational store in your application model, likely crippling integrity, scalability and performance in one fell swoop.
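
To make the "reinvented a relational store in your application" point concrete, here is a minimal sketch. It assumes two plain dicts standing in for key-value collections; the names `users`, `orders` and `orders_with_user_names` are hypothetical and do not come from any particular NoSQL client.

```python
# In-memory dicts standing in for two key-value collections (hypothetical data).
users = {"u1": {"name": "Ada"}, "u2": {"name": "Lin"}}
orders = {
    "o1": {"user_id": "u1", "total": 30},
    "o2": {"user_id": "u1", "total": 12},
    # Dangling reference: nothing in the store enforces the foreign key.
    "o3": {"user_id": "u9", "total": 5},
}


def orders_with_user_names(users, orders):
    """Hand-written inner join of orders to users on user_id.

    Roughly what a relational store does for:
    SELECT o.id, u.name, o.total FROM orders o JOIN users u ON u.id = o.user_id
    """
    rows = []
    for order_id, order in sorted(orders.items()):
        user = users.get(order["user_id"])
        if user is None:
            # Inner-join semantics: silently skip orders whose user is missing.
            continue
        rows.append((order_id, user["name"], order["total"]))
    return rows


print(orders_with_user_names(users, orders))
```

The join, the ordering and the handling of the orphaned order all live in application code here, which is the integrity and performance burden described above.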

LordHumungous · on Dec 6, 2020

I disagree with pretty much everything in your comment.

> NoSQL stores can scale better in certain data access patterns

Exactly.

donor20 · on Dec 6, 2020

And that's where folks go wrong. NoSQL is not boring tech. As soon as you need to scale, it is MORE likely, not less, that you will end up in a weird state in your app, user enrollment flow etc. NoSQL makes scaling MUCH harder in my view. SQL tools give you a common interface many folks can engage with, and many tools.

I really want to talk to people building piles of spaghetti on NoSQL because "it scales". At some point you've just got to start tearing your hair out.

PostgreSQL can do 1.5M queries (read/write) per second on OLTP loads on one box just to get started. If you really need more you can get extremely high with replicas. Then application design comes into play.

I'm tired of folks picking "NoSQL" so things can scale. Dealing with all the edge cases as these things scale is a nightmare (plus MongoDB and friends seem to fall over MUCH more often, recovery is miserable with them etc).

PeterCorless · on Dec 7, 2020

All of the issues with SQL, including strong consistency, data normalization, table JOINs, etc., mean that any RDBMS is inherently going to be limited in its ability to scale compared to a properly architected NoSQL database.

Disclaimer: I work at ScyllaDB. A LOT of our migrations are from people who got started on MongoDB, and then it fell over. Another group come from DynamoDB, and then they see their monthly bill.

There are also people who have moved to Scylla from PostgreSQL because it fell over, or those who blanched at their Oracle bill.

Scalability is not inherent to SQL or NoSQL. It requires both technical features as well as economical offerings. It is a quality of a product made with users and real-world workloads in mind.

As in all DB work, YMMV.

LordHumungous · on Dec 6, 2020

How do you load balance across replicas? How do you shard across replicas?

donor20 · on Dec 6, 2020

A lot of folks underestimate what one box can do. Memory / core counts have gone crazy on just one box. Local storage also has gone crazy. 4TB of memory on a single-node dual-CPU machine, CPUs with 32+ cores each?

Read-only replicas are pretty trivial as well.

LordHumungous · on Dec 6, 2020

A lot of people underestimate what a high scale workload means.

jpgvm · on Dec 7, 2020

And even more people talk about high scale workloads with no clue what they actually look like. :)

I routinely work with 10TB+ PostgreSQL clusters, 10TB+ BigTable clusters and 500TB+ BigQuery projects all in my current day job. I'm in Data Infra btw so this is sort of my bread and butter.

In the past I have worked with 100TB+ Cassandra clusters, 50TB+ MySQL+Vitess and countless other stores like MongoDB, RethinkDB, Voldemort, TokyoCabinet and probably tons I have forgotten.

It's highly unlikely one actually works with and manipulates these volumes of data on a regular basis and doesn't respect SQL stores and the JVM (the literal king of Big Data).

LordHumungous · on Dec 9, 2020

I don't understand the point you are trying to make.

jpgvm · on Dec 6, 2020

Load balancing across read replicas is usually handled by your connection bouncer, say pgBouncer, pgPool, etc., though you may also do some more complex balancing at both L3 and L7 if you get really big.
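
As a rough illustration of the idea (not of how pgBouncer or pgPool work internally), here is a minimal sketch of read/write splitting with round-robin over replicas. The class `ReplicaRouter` and the host names are hypothetical, and the "starts with SELECT" check is a deliberately naive assumption.

```python
import itertools


class ReplicaRouter:
    """Send plain reads to replicas in round-robin order, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() yields replicas in order forever; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql):
        # Naive classification: only statements starting with SELECT count as reads.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicaRouter("primary.db.internal", ["replica1.db.internal", "replica2.db.internal"])
print(router.target_for("SELECT * FROM users"))
print(router.target_for("UPDATE users SET name = 'x'"))
```

A real setup also has to account for replication lag and for reads that must see their own writes (for example `SELECT ... FOR UPDATE`), which this sketch would wrongly send to a replica.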

Sharding is usually a matter of actually splitting the masters. There are many techniques for achieving this. If you want the database to do all the work you will probably want to use something like Citus for PostgreSQL or Vitess for MySQL.

You can also build bespoke topologies using PostgreSQL logical or MySQL binlog replication.

Failing that you can do application level sharding if you don't want the database doing anything fancy for you and manage each shard as an independent database cluster.
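
A minimal sketch of application-level sharding along those lines, assuming a fixed list of independent clusters and a stable hash of the sharding key. The DSNs and the function names `shard_for` and `dsn_for` are hypothetical.

```python
import hashlib

# Hypothetical connection strings; each shard is managed as its own database cluster.
SHARD_DSNS = [
    "postgresql://shard0.db.internal/app",
    "postgresql://shard1.db.internal/app",
    "postgresql://shard2.db.internal/app",
    "postgresql://shard3.db.internal/app",
]


def shard_for(key, shard_count=len(SHARD_DSNS)):
    """Map a sharding key (e.g. a user id) to a shard index.

    Uses SHA-256 rather than the built-in hash(), which is randomized per
    process for strings and so would not route consistently across restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def dsn_for(key):
    """Return the connection string of the cluster that owns this key."""
    return SHARD_DSNS[shard_for(key)]


print(dsn_for("user-42"))
```

With plain modulo hashing, changing the number of shards remaps most keys, so resharding means moving data; schemes such as consistent hashing or a lookup table of key ranges are common ways to soften that.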

By the time you actually need to do this you will be able to afford one of these options. :)

In the meantime you will save a ton of CPU, storage and development time vs a "NoSQL" store as databases like PostgreSQL are inherently more efficient for all but the simplest of KV access patterns.

LordHumungous · on Dec 6, 2020

> By the time you actually need to do this you will be able to afford one of these options. :)

What if I need to do this now? Why would I build a distributed postgres snowflake that takes 10 hours to spin up a new replica and requires that I implement my own sharding, instead of using a datastore that is designed to handle all of these things at scale?

jpgvm · on Dec 7, 2020

Comes down to your data model. If it's inherently relational it's still the best play. Scale and performance are much more tractable problems than integrity and consistency. The former you can measure and be sure of; for the latter you need a PhD to fully understand all the edge conditions that need covering.

There are some pretty decent NoSQL stores now for simpler access patterns. As long as you stay away from nonsense like MongoDB and stick to real databases like Cassandra/ScyllaDB/BigTable/etc you will do fine.

These stores are a fraction as flexible as PostgreSQL/MySQL but do allow scale-out storage and fast primary key lookups and scans. Good for when the size of your data is well in excess of 1TB and you don't need anything complex, or consistency.

donor20 · on Dec 8, 2020

Reality - these folks don't need to "do this now".

Yes, Visa may need this. Guess what: with 1TB+ of transaction data paying 30 cents + 2% PER LINE, you'll be able to afford to do something reliable and scalable.

Folks don't realize, NoSQL is not actually that scalable except in very narrow ways. And you can spin up pretty good scale SQL stuff with things like AWS RDS, including backups, replicas, snapshots to go back in time etc (NoSQL doesn't support a lot of this).

pc86 · on Dec 6, 2020

Both SQL and NoSQL will scale fine for 99% of apps (and let's be honest, ~90% of apps don't need any scale).

Your data schema/format should be dictated by the data itself more than some handwavey "we might need to scale" requirement that isn't true the majority of the time.

LordHumungous · on Dec 6, 2020

K but what if your service needs to scale?