So this is just an observation and not a judgment.
With only 4 years of professional experience, I have never worked on a MongoDB project in which MongoDB wasn't somehow an issue. The proposed solutions were merely "improve the indices" or "scale the cluster", often without clearly defining what was going on.
So, granted, I have very limited experience, but it's baffling to me mainly in comparison to the PostgreSQL projects I've worked on. They definitely had problems too, but those problems were clearly defined even when the resolution wasn't quick or easy. It was usually an out-of-date or generally messy schema causing the issues, and folks were usually able to clearly articulate the schema problems.
I hear this about MongoDB (that it's a plague and you should stay far away) all the time, but DynamoDB, which is also NoSQL? One of AWS' finest products; just launching an app on AWS? Use DynamoDB! And so on, ad nauseam.
Why is DynamoDB the bee's knees but MongoDB is a thing to be despised?
1. One has a better marketing machine than the other?
2. DynamoDB is pushed in slightly more balanced ways than Mongo was; at the start, Mongo was supposed to be the second coming of the (database) Messiah.
3. I don't believe DynamoDB has defaults that lose your data.
Mongo works really well for backends to test your UI against in coding bootcamps. Dead simple CRUD storage and access model. As a mock React data store, it'd be hard to find simpler.
DynamoDB is a managed service that scales far beyond Mongo and truthfully far beyond what any of us here would be seriously discussing.
With regard to complex schemas and related data, both DynamoDB and Mongo are harder to deal with once you're past a trivial size. If all you need is basic CRUD with limited or no joins, Mongo could suffice up to a large deployment, but if you're mostly accessing by primary key, DynamoDB will soundly eat its lunch performance-wise, and for cheaper.
If you need to join related data (or just find joins convenient), relational models work famously well, usually up to the point where you have 10 million simultaneous users.
Mongo is more and more being relegated to niches where the problem just happens to fit Mongo's feature list exactly, and unfortunately for Mongo, those niches are ever-narrowing gaps in the other options' offerings.
Mongo can't match the scale or economy of DynamoDB and can't match the flexibility of relational. That's why fewer and fewer teams treat it as their go-to database nowadays.
Dynamo seems nice for some very specific use cases, but there are others where it falls short.
There are limits on how big an individual item can be (400 KB), which is a little annoying if you want to use it to store a couple of larger things alongside all your small data.
The C# API is absolutely painful IMO, and it makes it easy for new developers to use DynamoDB in ways where you'd have been better off grabbing a different technology.
I suppose one could argue S3 would be the better choice for large payloads; OTOH, I'm a fan of minimizing potential points of failure, so if we're already using DynamoDB, I'd rather not toss in an additional S3 integration that could break or need maintenance later.
1. Below 25 GB and the read/write volume of most apps, DynamoDB is free.
2. DynamoDB is a 100% managed service. No instances to wrangle. No manual partitioning. Just define your table's name, its partition key, and an optional sort key, and you're off to the races! It just works (see the sketch below).
If you have a serverless environment like lambdas with API Gateway or AppSync, it can scale pretty much to the extent of your business model rather than some fixed limit.
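To make that concrete, here's a minimal sketch of defining a table with boto3 (the Python AWS SDK); the table and attribute names are made up for illustration:

```python
# Minimal sketch: creating an on-demand DynamoDB table with boto3.
# "Orders", "customer_id" and "order_date" are hypothetical names.
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="Orders",
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},  # optional sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand: no capacity planning either
)
```

That's genuinely the whole setup; there's no instance size, storage volume, or replica count to pick.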
BUT for data schemas beyond the most trivial, it can easily be more complex to deal with than a relational database. Whereas a relational database usually aims for normalization, where no data is duplicated and foreign keys keep things straight, DynamoDB works best with a denormalized data set. No joins. Ever. Schema integrity is your problem, not the database's. In the deal, though, you get a database engine that can scale effectively infinitely.
In other words, storage is cheap but access is expensive, so data duplication is pretty much encouraged in DynamoDB for the sake of speed. You aim to get everything you need from a single item or one sequential sweep of adjacent rows.
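As a sketch of that trade (hypothetical table and attribute names): the customer's name gets copied onto every order at write time, so a single key-based Query later returns everything the page needs, with no join:

```python
# Sketch of denormalization in DynamoDB: duplicate on write, one cheap read path.
# Table and attribute names are hypothetical.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Orders")

# Write side: customer_name is deliberately duplicated onto every order.
table.put_item(Item={
    "customer_id": "C42",
    "order_date": "2024-01-15",
    "customer_name": "Ada Lovelace",  # copied here so no join is needed later
    "total": 1999,
})

# Read side: everything arrives in one Query against the partition key.
orders = table.query(KeyConditionExpression=Key("customer_id").eq("C42"))["Items"]
```

Keeping every copy of customer_name consistent when it changes is, of course, your job now; that's exactly the "schema integrity is your problem" part.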
When we use a relational database and run into problems, we run EXPLAIN and EXPLAIN ANALYZE to figure out the query plan so we can optimize. SQL is a 4th-generation, declarative language that describes WHAT data you want, not HOW to get it.
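For instance, a quick sketch of pulling a plan out of Postgres with psycopg2 (the connection string, table, and query are placeholders):

```python
# Sketch: asking Postgres for a query plan. Note that EXPLAIN ANALYZE actually
# runs the query; plain EXPLAIN only estimates. Connection details are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=shop")
with conn.cursor() as cur:
    cur.execute(
        "EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = %s",
        ("C42",),
    )
    for (plan_line,) in cur.fetchall():
        print(plan_line)  # e.g. an index scan vs. a sequential scan
```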
DynamoDB, in the larger sense, is at the level of EXPLAIN output. It is 100% HOW to get data. The WHAT lives at the application level and is implemented by you, in code.
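So what SQL would express declaratively as a join becomes an access path you write by hand. A sketch, again with hypothetical tables and attributes:

```python
# Sketch: the "join" lives in your application code, not the database.
# Roughly equivalent declarative SQL would be:
#   SELECT o.*, c.email FROM orders o JOIN customers c USING (customer_id)
import boto3
from boto3.dynamodb.conditions import Key

ddb = boto3.resource("dynamodb")
orders_table = ddb.Table("Orders")
customers_table = ddb.Table("Customers")

def orders_with_email(customer_id):
    # Step 1: fetch the orders by partition key -- the access path you designed.
    orders = orders_table.query(
        KeyConditionExpression=Key("customer_id").eq(customer_id)
    )["Items"]
    # Step 2: the "join" is just another key lookup, stitched together by hand
    # (assumes the customer record exists).
    customer = customers_table.get_item(Key={"customer_id": customer_id})["Item"]
    for order in orders:
        order["email"] = customer["email"]
    return orders
```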
If EXPLAIN output makes no sense to you, then DynamoDB probably isn't for you either unless it's a trivial app/data set.
But then again, if it's a trivial app/data set, literally anything can work. An O(n!) algorithm is perfectly reasonable given a small/simple enough data corpus and large enough computing resources. It's when the data set gets slightly larger that decisions become important.
But when things are very small/simple, it's often hard to argue with fast+free. Those are the sweet spots for DynamoDB: the very small/simple and the mind-bogglingly humongous. For everything in the middle, relational databases work wonderfully and are much easier to work with, especially for non-trivial data sets.
DynamoDB is a key-value store, so it forces you into an up-front, waterfall-style design model where you can't easily evolve your schemas. This has massively narrowed the scenarios where it can be used, but there's a survivorship bias: if that's all you need, you're really dealing with a simple use case that's easy to encapsulate and reason about.
DynamoDB evolved from Dynamo, described in detail in Amazon's SOSP '07 paper. Amazon uses Dynamo internally for e.g. shopping carts. There's nothing wrong with a distributed key/value store. It's just that I think Dynamo started off being super reliable, and not overpromising …