If you're stuck with Mongo in legacy infrastructure and it doesn't make sense to refactor/architect it away, I suggest TokuMX. It's allowed us to kick the can on this problem for at least another year: almost no lock contention, far more compact on disk (even cheap disk space adds up), and what seems to be a growing set of users.
I'm optimistic that pg9.4 will be our migration path. But regardless, tokumx has given us the breathing room
to defer the decision.
I was considering TokuMX because it seems to scale better than stock MongoDB, but to be honest I'd love to move off this database entirely. I'm somewhat unfamiliar with Postgres: does it have a storage/querying system comparable to Mongo's?
If you want to store & query JSON [1], Postgres 9.3 is great! Plus you can index expressions, meaning you get cool things like fast JSON lookups and case-insensitive searches, which are hard in Mongo: you would either have to do a slow regexp lookup or save a lower-cased version in your application logic.
CREATE INDEX ON members ((lower(my_json_data->>'email')));
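To illustrate how that index gets used (the `members` table and `my_json_data` column here are hypothetical, just matching the index definition above):

```sql
-- Hypothetical schema matching the expression index above
CREATE TABLE members (
    id serial PRIMARY KEY,
    my_json_data json NOT NULL
);

CREATE INDEX ON members ((lower(my_json_data->>'email')));

-- Case-insensitive lookup: because the WHERE clause uses the same
-- expression as the index definition, the planner can use the index
-- instead of a sequential scan.
SELECT id, my_json_data->>'email' AS email
FROM members
WHERE lower(my_json_data->>'email') = 'alice@example.com';
```

Note the WHERE clause has to repeat the exact indexed expression (`lower(my_json_data->>'email')`) for the index to be usable.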
As the communities grow, more people learn about MongoDB's limitations and feel the need to switch. I like your "ruby" example because what MongoDB faces is really similar: easy to start with, hard to go... well... "web-scale" (by the way, is this expression becoming a word, or has it already?)
I think that is the point. I would tweak the definition a little and say "Legacy is anything that's not fashionable anymore, but working and still being used just because it's working."
Yahoo isn't the only large company to instruct its employees to avoid mongodb due to it being AGPL. In my opinion, the only people using mongodb in a commercial setting are those who are paying 10gen for a commercial license or those that don't know they are violating the AGPL license.
Most people in a commercial setting aren't modifying the MongoDB source code. AGPL does not require that any software that communicates with AGPL software be AGPL'd.
But yes, if you're using MongoDB in a commercial endeavor, and you modify the source code, and you're using AGPL version, you do need to share your changes to the MongoDB source code.
As I understand it, the drivers are a point of contention, and the parent's parent explains it well.
Technically, according to the AGPL, MongoDB's database drivers' licenses (Apache) are incompatible with the AGPL, and the drivers technically should be licensed under the AGPL. Now the parent says that should be fine for the official drivers because MongoDB isn't going to sue themselves, but the issue is for community drivers, like the Node driver or the Golang driver. Since the AGPL states that any software built for the exclusive use of the accompanying software must be AGPL, it follows that community drivers should be AGPL as well.
To me that means not only can you not modify the database, you cannot modify the drivers either. And I'm also unsure whether any application that links those drivers must be AGPL as well. And if your web application must be AGPL, it also means that the source of whatever service you are providing must be available as well. So it doesn't just affect corporations that want to modify Mongo; it affects everyone who wants to use Mongo (with a community driver, at least).
>>> The Affero GPL is designed to close the so-called "application service provider loophole" in the GPL, which lets ASPs use GPL code without distributing their changes back to the open source community. Under the AGPL, if you use code in a web service, you are required to open source it.
I have no idea why you were downvoted, and I can't reply to your child post, but I wanted to point out that you're both agreeing with one another: if you're not changing the source, there shouldn't be an issue. Most deployments AFAIK don't change the source, so no issue, but if you do, the AGPL does indeed require you to share your changes back even if you're not distributing the software.
> the only people using mongodb in a commercial setting are those who are paying 10gen for a commercial license or those that don't know they are violating the AGPL license.
It is not clear to me from this message exactly what the problem is. Are we now discouraged from using MongoDB as a database (from startups to universities), from writing a MongoDB driver based on an existing language driver such as pymongo, or from writing on top of MongoDB's database drivers?
Yeah, that definitely helps clarify the issue, I think.
As long as OpenStack only uses Apache-licensed code >>from MongoDB Inc.<< and diligently avoids using any open source contributions from any community contributor to the MongoDB ecosystem, you remain compliant with your CLA.
I wouldn't have known about the conflict between the AGPL-licensed database, MongoDB, Inc.'s licensed code, and community code. I guess every ORM built on pymongo and used in a commercial setting is in that trouble zone then?
This is a huge bummer. Definitely an alert for those looking to use MongoDB in commercial settings, as the parent said.
This is actually really interesting. Due to its nature, using Mongo as a data source can easily become something very deeply ingrained in your application.
Whereas if MySQL shipped with a similar restriction, you could easily flip the connection strings and have it mostly working on Postgres or something else.
I would love to see a use case from a large deployment. MongoDB is trivial at small scale, and it is only when you get to large deployments that it really needs some TLC. If they can somehow make large deployments simple, it might make Mongo a viable contender again for humongous data (if you have a data schema that would play nicely with it).
I would love to see a solid technical reason to choose MongoDB over any of the other NoSQL DBs (CouchDB, Riak, Redis, etc.) other than "it's popular".
Geolocation, specifically GeoJSON. That's the main reason why I chose it (I started working on my app while it was at 2.0). When 2.4 came out with better geospatial indices (albeit basic compared to PostgreSQL+PostGIS) and GeoJSON support, I moved to using GeoJSON, and I am happy so far.
The website/app is at https://rwt.to , and an example route search is: from "Milky Way, Johannesburg" to "O.R. Tambo International Airport".
I should note that I've had a look at GeoCouch and it didn't fit my use case: I'm not doing trivial 'find my 3 places near [y,x]' queries, but traversing a pseudo-network of routes to calculate directions. Neo4j also wouldn't have worked in my case. TokuMX is based on MongoDB 2.2 as far as I'm aware, so it's out too.
That's a very good reason, and the first real one I have heard, thanks man.
Also, I used to work for MapBox, and I know we did one project on mongo which I was not involved in, and afterwards we built everything with CouchDB (which is how I got acquainted with it).
For the geo stuff we actually used a lot of sqlite and to a lesser extent spatialite. We would pre-calculate things and build them into the rendered tiles in mbtiles format, or stream the point/polygon data from the couch database for realtime client-side compositing.
But yeah, routing is pretty high level stuff. I think they are only now putting the finishing touches on their openstreetmap driven routing system many years later.
I would consider "It's easy to get started with" a valid technical reason.
Of all the "We moved from MongoDB to Cassandra/Riak/etc. and gained massively!" posts, I've rarely seen - and it's possible that this is selection bias - companies start with the other NoSQL options.
I want to say that, unlike MongoDB, the others actually force you to think about your data and actively decide how you are going to store it. With MongoDB you can pretty much add an index on anything, but with Cassandra (maybe Riak/Dynamo too) you only get one free index before you have to denormalize and write application code to keep your performance.
Then lastly, MongoDB is good enough for most use cases. We didn't see major performance issues until we started constantly writing data to it (high write/low read) (basically we were wrestling with lock contention). I'd wager for a significant amount of MongoDB deployments, not only is Mongo easy to use, but fast enough too.
So while the other NoSQLs are (probably) more complicated and likely more performant, MongoDB, to me, hits a sweet spot of ease of use and performance that is good enough for most applications out there.
However, considering other "raw" technical aspects like performance, durability and scaling I've never seen anything that has shown MongoDB to be a leader.
"I've rarely seen - and its possible that this is selection bias - companies start with the other NoSQL options."
It seems like everyone starts with Mongo, because everyone starts with Mongo.
This means you don't have a deluge of posts from people moving from the other databases, because:
a) there are much fewer of them
b) they chose them for solid technical reasons (not just because everyone does this)
So as for your perception that other NOSQL databases are "probably" more complicated, you should know that complexity is an objective measure. I think that mongo is definitely a lot more objectively complex than couchdb, and from what I have read around the subject, many of the other NoSQL databases.
What Mongo could well be is 'easier', which is relative. It seems like it's more familiar to certain programmers, which is kind of echoed by the fact that there's an incredibly popular object relational mapper (mongoose), that is being used with what is supposedly a non-relational database.
It's from a very insightful presentation by the creator of the Clojure language, and I only wrote a summary because I got sick of trying to get people to watch an hour-long video before trying to discuss systems on this level.
Mongoose brings a bit to the table: it adds schema validation, which Mongo doesn't inherently have and which should be part of the application anyhow. I feel that's the biggest reason to use Mongoose over the straight Mongo driver in many cases.
I've used Mongo in a couple projects where it was a great fit. The scale wasn't huge, but having pre-shaped data for a mostly read scenario was great. I've found that it works really nicely for a lot of situations, and would definitely be a consideration.
I find that document databases work best when your data is read far more than written to, and when you can shape your data structures for simple key reads in most cases combined with indexed searches. I would consider the use of ElasticSearch or RethinkDB in most cases where you might look at MongoDB. It really depends on your needs here.
Riak and Couch offer other advantages, and like anything it really depends. Cassandra is another nice option for larger scalability, but everything has a cost.
Mongo is very reasonable, and to be honest, if you don't need more than a single server for your needs, it's really easy to get up and running quickly, and development tooling is decent enough, and the concepts are pretty easy to get up to speed with.
I can't speak for everyone, but to me MongoDB is far easier than any other NoSQL engine I've looked into. The reason I said "probably" is that I can't speak for every NoSQL database out there.
We had a 5-node cluster in Mongo that we moved to Cassandra last summer. While our experience with Cassandra has by and large been more performant and cost-effective than with MongoDB, getting set up with Cassandra was not as easy as with MongoDB. With MongoDB you can literally start throwing data into your database, then add an index after the fact. With Cassandra we had to make sure our data was modeled correctly and decide where we would denormalize. Riak, from what I remember, has a similar data model to Cassandra, and Redis isn't something you just "start up and go" with (mainly because it's an in-memory store).
So I know for a fact that Cassandra, Riak, Dynamo, and Redis are far more complex than MongoDB. Cassandra even requires you to run a "repair" command periodically, and that alone makes it more ops work than MongoDB. We can even throw HBase in there too, as it requires ZooKeeper nodes, NameNodes, and all that Hadoop goodness.
Now, none of these databases are hard to use, but compared to them, Mongo is a cakewalk. You literally spin it up, throw JSON inside, and get JSON back. There is no query writing, and in most cases very little ops management. If a query is slow, you can usually fix that by adding an index or moving to SSDs; only once you have exhausted those options do you really have to consider anything else.
FullContact also has a similar story: http://www.fullcontact.com/blog/mongo-to-cassandra-migration...
tl;dr: Mongo was great for getting the product up and iterating quickly, but then they moved once they thought they needed to. It's my opinion that it's far easier to get started with MongoDB than it is with Postgres/MySQL.
Lastly, damn the technical reasons for why it's so popular: Mongo/10gen used to be a huge marketing engine around ~1.6/1.8. They captured a lot of developer mindshare, and I'd attribute that to why it's so popular now as well. It wasn't much longer after that when the naysayers and those hurt by the initial hype came out of the woodwork and we got the now-infamous "MongoDB is webscale" video.
It allows you to query JSON documents in a way similar to SQL. Redis is key/value and sits in RAM; Couch requires complicated design documents to query and is better as a key/value store; I'm unfamiliar with Riak.
I think Trello uses Mongo primarily for production. Technically it's feasible, but I've found it to be more trouble than it's worth to scale: too many machines are required per shard. I'm currently looking into RethinkDB as a replacement now though.
I think it wasn't 'ready' at that point in time, and the json based query language was closer to what we needed.
The real problem was that the data was being imported in bulk by the user, from a many-megabyte CSV. It would grind CouchDB to a halt trying to build views, so having Elasticsearch be a separate process that could work through it made a lot of sense.
Thanks for answering. I have used Elasticsearch before and was very impressed by it. Now I am trying to evaluate couchdb-lucene to see if it can prove to be a good alternative.
I think a lot of the reasons I've seen come down to business reasons, not technical ones. Someone wants an app fast, like now, and MongoDB is fast to set up and get running with.
I guess I'll be able to confirm once I'm forced to build something in it, but I don't think it can really be faster to set up and get running with than CouchDB.
Usually the first thing you need to do is write a REST layer on top of it, and with CouchDB that part is already done for you.
Obviously there's certain kinds of data I wouldn't put in Couch, or any kind of NoSQL database.
You need to know what the right tool for the job is, but I just want to figure out when that tool is Mongo.
Why would that tool ever be Mongo? I think Mongo is a thing because people coming from Rails ORM libraries feel like "wow, I can jump on the NoSQL bandwagon just by using a library that feels kind of like the ActiveRecord I'm used to."
It's only popular because there's less of a conceptual gap between mongo and the relational database tools that a lot of people are used to.
CouchDB, on the other hand, requires you to actually learn and use map/reduce, which is a pain for people who don't feel like learning something new. But CouchDB is MUCH MUCH better in a lot of ways, and Mongo is pretty much fundamentally flawed in my opinion.
I do wish Rackspace luck with their offering, though. I think it was smart of them to create this MongoDB product for one simple reason: a good number of people are already using MongoDB, so it makes sense to help them get the most out of it.
I managed one of the largest MongoDB installations and I can safely say running it at scale is extremely difficult.
They've made a few changes, like not hardcoding the maximum number of connections and shards anymore, which helps but overall the big problems like database-level locking are serious problems even a year later.
The reasons for choosing it were very simple - the lead developer was familiar with JSON and liked using it for queries, and he liked the "schema-less" nature of document storage. No consideration was given to performance or scaling issues, it was purely a comfort level decision.
I'm sorry, but this seems a bit of a lazy question.
There are many large-scale deployments of MongoDB - a simple Google search will yield you results.
Off the top of my head - FourSquare, Stripe, ServerDensity, eBay (non-site) etc.
MongoDB (the company) also uses it for MMS - their cloud-based monitoring system, which probably handles hundreds of thousands of metrics every second from tens of thousands of hosts.
So yes, there is a lot of FUD about "it doesn't scale" etc.
Most of the FUD seems to originate from people not reading the manual, and completely misconfiguring things, and wondering why it doesn't work.
To be fair, most competing products (Riak, Couch etc.) will scale enough for most people. So this is sort of a red herring. (And by the point that you are as big as FourSquare, the assumption is you'll probably hire engineers who will read the manual =) ).
So the decision boils down to other things - how easy is the query language, do you need GeoJSON support, do you need aggregations, how mature is the overall ecosystem etc.
And that's why people are picking MongoDB - not really the "WOAH, LOOK AT THE OPS PER SECOND!" numbers.
> Most of the FUD seems to originate from people not reading the manual, and completely misconfiguring things, and wondering why it doesn't work.
Most of the FUD comes from the deceptive marketing 10gen used to promote MongoDB. It now has a well-deserved bad reputation that will never go away, no matter how many startups choose it.
For example, there was noise before about how MongoDB was allegedly tweaking benchmarks.
The funny thing was, from what I've read, they've always had a policy of never ever publishing official benchmarks. Their line was, read the manual, and try it with your own data.
Maybe you misunderstood what I was asking. I know MongoDB can scale out to very large sizes, but it becomes more of an effort to scale it out correctly (3 config servers and all shards replicated). My question is: can I just turn a dial at Rackspace and have my seven servers all provisioned and configured correctly, since that takes the dead-simple single-node MongoDB deployment and extends it to a sharded, replicated cluster?
Seems a bit expensive assuming each shard gets you three more servers. Based on RS pricing you could build your own instances with 1TB SSDs for like $500 each, so for the same price that you'd get 100GB x 3 shards you could do 1TB x 3 shards on your own. I guess with the RS pricing you also get managed backups but IMO that's not worth the price difference. If you figure you build your own, 3 shards x 3 RS members x 1TB is 9 servers at $500 each or $4500. With this service... if 100GB is $1599 you have to imagine 1TB has to be at least on the order of $10k, so you're looking at $30k total. $25k a month for managed backups and infrastructure just doesn't seem worth it to me. Maybe what they're using is more powerful than the instances we're on but I still have trouble reconciling the pricing. And if that 1TB is 1TB total and not 1TB per shard/replica set member then the pricing looks way, way worse.
That's an awesome link, thanks. Does a lot to clarify where the pricing disconnect is between spinning up my own servers in the RS cloud and using ObjectRocket.
IMO RS's ObjectRocket pages should do a better job showing what you're getting on top of just deploying to a bunch of RS cloud servers with SSDs in them.
Mason55, I work for Rackspace. The ObjectRocket service does not run on the cloud servers you get from RS. It's actually a different architecture just for MongoDB: flash drives all over, containers, pretty much tuned for MongoDB across the stack.
While we don't describe the full details of the architecture, you make a very valid recommendation, which we will take into consideration to make sure it is clear what you are getting for your money. Thanks.
Sort of ... but there are pretty serious limitations.
1. You are locked into Rackspace as your provider. MongoHQ, by contrast, supports multiple cloud providers.
2. They force you to shard. This increases operational and application complexity ... and may not offer any real advantages for the amount of data that you have.