>Basically they don't. Database migration based on frontend deploy doesn't really make sense at facebook scale, because deploy is nowhere close to synchronous; even feature flag changes aren't synchronous.
Our deployments aren't strictly "synchronous" either. We have thousands of database shards which are all migrated one by one (with some degree of parallelism), and new code is deployed only after all the shards have migrated. So there's a large window (sometimes up to an hour) when some shards see the new schema and others see the old schema (while still running old code). It's one click of a button, however, and one logical release; we don't split it into separate releases (so I view them as "automatic"). The problem remains, though, that you can only guard code with feature flags; migrations can't be conditionally disabled. With this setup, if a poorly tested migration goes awry, it's even more difficult to roll back, because it will take another hour to roll back all the shards.
Serious question: are you going to catch "corrupt data"-style migrations in staging in general?
There are of course "locks up the DB"-style migrations where you can then go in and fix it, so staging helps with that. But "oh this data is now wrong"-style errors seem to not really bubble up when you are just working off of test data.
Not to dismiss staging testing that much, but it feels like a tricky class of error where the answer is "be careful and don't delete data if you can avoid it"...
We don't have a staging environment (for the backend) at work either. However, depending on the size of the tables in question, a migration might take days. Thus, we usually ask DBAs for a migration days or weeks before any code goes live. There's usually quite a bit of discussion, and sometimes suggestions for an entirely different table with a join and/or an application-side (in code, multiple queries) join.
Sorry for the silly question, perhaps, but what is the purpose of a db migration? Do schemas in production change that often?
For context, the last couple of services I wrote all have a fixed but implicit schema (built on key-value stores). That is, the DB has no types. Instead, the type system is enforced by the API layer. Any field changes so far are gated via API access, and APIs have backwards-compatibility contracts with API callers.
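A minimal sketch of what that looks like, assuming a plain dict standing in for the untyped key-value store (the `User` type and function names are hypothetical, just for illustration):

```python
# The store itself has no types; the API layer validates shape and types
# on the way in and out.
from dataclasses import dataclass

@dataclass
class User:
    id: str
    email: str

STORE = {}  # stands in for the real key-value store

def put_user(raw: dict) -> None:
    # Enforce the implicit schema here: missing or wrongly-typed fields
    # are rejected before anything reaches the store.
    user = User(id=str(raw["id"]), email=str(raw["email"]))
    STORE[user.id] = {"id": user.id, "email": user.email}

def get_user(user_id: str) -> User:
    row = STORE[user_id]
    return User(id=row["id"], email=row["email"])
```

The store never sees an invalid record, so "schema" changes become API-contract changes rather than storage changes.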
I’m not saying that the way I do it currently is “correct” - far from it. I strongly suspect it’s influenced by my lack of familiarity with relational databases.
There is a lot to be said about enforcing the schema in the database vs doing it in application code, but not doing migrations comes with an additional tradeoff.
If you never change the shape of existing data, you are accumulating obsolete data representations that you have to code around for all eternity. The latest version of your API has to know about every single ancient data model going back years. And any analytics related code that may bypass the API for performance reasons has to do the same.
So I think never migrating data accumulates too much technical debt. An approach that many take in order to get the operational benefits of schemaless without incurring technical debt is to have migrations lag by one version. The API only has to deal with the latest two schema versions rather than every old data representation since the beginning of time.
Variations of this approach can be used regardless of whether or not the schema is enforced by the database or in application code.
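A hedged sketch of the lag-by-one approach described above, assuming each stored record carries a `version` field (the field names and version shapes are invented for illustration):

```python
# The API only understands the latest two schema versions. Storage may
# contain version N (new writes) and N-1 (not yet migrated); anything
# older should already have been migrated away.
CURRENT = 2

def upgrade(record: dict) -> dict:
    """Bring a record read from storage up to the current version."""
    version = record.get("version", 1)
    if version == CURRENT:
        return record
    if version == CURRENT - 1:
        # Hypothetical v1 -> v2 change: split "name" into first/last.
        first, _, last = record["name"].partition(" ")
        return {"version": 2, "first": first, "last": last}
    raise ValueError("record older than N-1; should have been migrated")
```

When version 3 ships, the v1 branch is deleted along with the background migration that rewrote all v1 records, so the read path never grows beyond two cases.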
Relational databases can be very strict. For example, if you use foreign key references, then the database enforces that a row exists in the referenced table for every foreign key in the referring table. This strict enforcement makes it difficult to change the schema.
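A small demonstration of that enforcement using sqlite3 (table names are made up; note SQLite requires opting in to foreign key checks per connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite is opt-in per connection
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY,"
    " user_id INTEGER REFERENCES users(id))"
)
conn.execute("INSERT INTO users (id) VALUES (1)")
conn.execute("INSERT INTO orders (id, user_id) VALUES (10, 1)")  # fine
try:
    # No user with id 99 exists, so the database refuses the row.
    conn.execute("INSERT INTO orders (id, user_id) VALUES (11, 99)")
except sqlite3.IntegrityError:
    pass  # rejected: referenced parent row is missing
```

This is exactly what makes schema changes touchy: you can't drop or restructure `users` without first dealing with every row in `orders` that points at it.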
The way you handle things with API level enforcement is actually a good architecture and it would probably make schema changes easier to deal with even on a relational database backend.
A fairly recent example is a couple of tables for users who are "tagged" for marketing purposes (e.g., we sent them an email and want to display the same messaging in the app). These tags have an expiration date at the tag level, but we wanted the expiration date per-user too. This enables marketing to create static tags. That required a data migration so it could be supported.
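A rough sketch of that kind of migration, using sqlite3 with invented table names and dates: add a per-user expiration column, then backfill it from the tag-level value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, expires_at TEXT)")
conn.execute("CREATE TABLE user_tags (user_id INTEGER, tag_id INTEGER)")
conn.execute("INSERT INTO tags VALUES (1, '2024-06-01')")
conn.execute("INSERT INTO user_tags VALUES (42, 1)")

# The migration: new nullable column, then backfill from the parent tag
# so existing rows behave exactly as before.
conn.execute("ALTER TABLE user_tags ADD COLUMN expires_at TEXT")
conn.execute(
    "UPDATE user_tags SET expires_at ="
    " (SELECT expires_at FROM tags WHERE tags.id = user_tags.tag_id)"
)
```

After the backfill, new code can write per-user dates while old rows keep the tag-level default they effectively had before.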
Schemas don’t change that often, in my experience.