For me anyone that uses the term "NoSQL" to group completely different databases...

chaostheory · on Feb 14, 2014

> Cassandra and MongoDB are completely fine as general purpose databases.

Whenever I hear this, I feel that the person saying it either doesn't have enough experience or that every problem to them looks like a nail for their particular flavor of NoSQL. That's like saying a street bike is a general purpose bicycle. I'm sure someone out there can make it work on an outdoor offroad trail, but is it ideal?

Cassandra is really awesome at writes but reads are really expensive. Atomicity at the row level only (unless you want that 30% performance hit) and eventually consistency make this not great at real time. CQL3 is an improvement but it's still not as good as ANSI SQL. I still have to do a lot of extra work.

With Mongodb it does really well with tree based data. The second you try to do join queries outside of that doc tree... major performance hit. (Maybe this has changed knowing the speed mongo evolves.) When data is larger than RAM... major performance hit... not so great for something like logging.

Using an RDBMS for everything isn't great, but the pain you suffer is a lot less especially since these days more and more them have some sort of federation or cluster solution. That said, some NoSQL datastores are better for certain applications than an RDBMS, just not for all of them.

Xorlev · on Feb 14, 2014

Reads definitely aren't cheap, not compared to the hyper-optimized write path. Cassandra reads can however be exploited to make a lot of common time-series reads extremely cheap by amortizing the cost of the read over a lot of returned data (think a clustered row with lots of byte-sorted columns). Not that RDBMSes can't do the same thing, but it's a complex challenge to linearly scale and maintain availability.

Cassandra also exploits SSDs quite well by avoiding write-amplification problems, given that all write IO is sequential only.

I'm a huge fan of both traditional RDBMSes and alternative data stores. The most important rule for touring data stores is to be sure not to bring it into work when it's not appropriate and to evaluate the hell out of any solution before you write a single line of code.

Most of the time, PostgreSQL will do 99.9% of what you need and then some.

iamaleksey · on Feb 14, 2014

> Cassandra is really awesome at writes but reads are really expensive

This is getting old, and is just plain untrue. Reads are more expensive than the extremely cheap writes, but they really aren't expensive.

> Atomicity at the row level only

Partition level, really, in modern terminology (apologies if that's what you meant).

chaostheory · on Feb 14, 2014

> This is getting old, and is just plain untrue. Reads are more expensive than the extremely cheap writes, but they really aren't expensive.

It has improved a lot over time but it isn't a myth due to the fact that it's not good out of the box. You have to spend some time with configuration and mulling over the schema to mitigate it. It's something you don't have to really think about as much on other data stores.

jdrobins2000 · on Feb 14, 2014

I agree that Cassandra CAN be used for basically everything a RDBMS can, but to claim it can be used just as easily is a stretch (based on my experience migrating from PostgreSQL to Cassandra). It lacks many administration tools, and for data with complex, non-hierarchical relationships, data modeling is more complicated and you must do any joins yourself. Sure you can get major performance and scalability gains if you use it correctly, but you also give up being fully ACID.

As a developer, using C for all data in any complex application is going to be harder up front than using a RDBMS. From what I have heard and learned about C, though, it is less painful to use C for a highly performant/scalable database than trying to scale most or all RDBMSs. From what I have read, Mongo is also hard to scale.