> To prevent deleting data by mistake, we invent the soft-delete pattern by adding an is_deleted column. However, this brings extra complexity around foreign key and unique index enforcement. Ideally, PostgreSQL could allow users to configure an archived table. The removed data is moved to the archived table first and purged after a configured retention period. This simplifies application logic and data compliance work.
You can get the same result by changing your is_deleted boolean field to a date_deleted date field. Then use a cron to purge all the records with dates older than the configured retention period.
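A minimal sketch of that approach, assuming a hypothetical `orders` table and a 90-day retention period (names are placeholders, not from the post):

```sql
-- Soft delete via a timestamp instead of a boolean flag: NULL means "not deleted".
ALTER TABLE orders ADD COLUMN deleted_at timestamptz;

-- "Deleting" a row just stamps it.
UPDATE orders SET deleted_at = now() WHERE id = 42;

-- Keep live-row queries fast with a partial index that excludes deleted rows.
CREATE INDEX orders_live_idx ON orders (customer_id) WHERE deleted_at IS NULL;

-- Nightly cron job: purge rows past the retention period.
DELETE FROM orders
WHERE deleted_at IS NOT NULL
  AND deleted_at < now() - interval '90 days';
```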
> The typical way to do schema migration is to compose a list of ALTER TABLE statements. This makes it hard to track the latest schema state as migrations accumulate. It's more intuitive for developers to specify the desired state. Ideally, PostgreSQL could allow developers to specify the desired CREATE TABLE schema; the engine then reconciles it with the latest schema, figures out the diff, and plans the migration path internally.
This is a terrible idea. The behaviour of engines that do this is unpredictable and suddenly you lose data because the engine deleted a column.
If a tool blindly drops columns, that's just a bad tool! It doesn't mean the concept is flawed.
Thousands of companies successfully use declarative schema management. Google and Facebook are two examples at a large scale, but it's equally beneficial at smaller scales too. As long as the workflow has sufficient guardrails, it's safe and it speeds up development time.
Some companies use it to auto-generate migrations (which are then reviewed/edited), while others use a fully declarative flow (no "migrations", but automated guardrails and human review).
I'm the author of Skeema (https://github.com/skeema/skeema) which has provided declarative flow for MySQL and MariaDB since 2016. Hundreds of companies use it, including GitHub, SendGrid, Cash App, Wix, Etsy, and many others you have likely heard of. Safety is the primary consideration throughout all of Skeema's design: https://www.skeema.io/docs/features/safety/
Meanwhile a few declarative solutions that support Postgres include sqldef, Migra, Tusker (which builds on Migra), and Atlas.
This is exactly how auto migrations in things like Django and Prisma work. Yes you need to check the changes before you apply them but most of the time they are entirely sensible.
Wouldn't it be more interesting to have this as an external tool?
You input your create table statement and it issues you back the migration statements? Then you can check it against your development database or whatever and if you feel fine use it?
This way you could check and modify the migration path without writing the alter statements.
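For illustration, the input and output of such a diffing tool might look roughly like this (a sketch with made-up table and column names, not taken from any particular tool):

```sql
-- Desired state, as written by the developer:
CREATE TABLE users (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email      text NOT NULL UNIQUE,
    full_name  text,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Migration statements the tool could emit after comparing against a live
-- database that is missing full_name and the unique constraint:
ALTER TABLE users ADD COLUMN full_name text;
ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);
```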
This is one of the most frustrating things with SQLite for me. Changing a table doesn't always work with an ALTER statement; sometimes you need to drop and recreate it with the new columns. Why can't it do the magic for me? It's really frustrating and was often enough the sole reason I used Postgres for private projects.
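For context, the drop-and-recreate dance in SQLite looks roughly like this (a sketch with placeholder names):

```sql
-- SQLite: "alter" a table by rebuilding it when ALTER TABLE can't express the change.
BEGIN;
CREATE TABLE items_new (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    price REAL NOT NULL DEFAULT 0   -- the newly added column
);
INSERT INTO items_new (id, name) SELECT id, name FROM items;
DROP TABLE items;
ALTER TABLE items_new RENAME TO items;
COMMIT;
```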
There are actually a bunch of external tools that offer a declarative schema management flow. Personally I agree that the overall flow should be external to the DB, but it would be useful if databases offered more built-in functionality in this area to make these tools easier to write.
We do exactly this, but with our own "ground truth" schema in the form of an XML file defining the various tables, views, triggers and so on.
We then have a program which compares the latest schema from XML to a given database, and performs a series of CREATE, ALTER and so on to update the database so it conforms.
Since we've written it ourselves we have full control over what it does and how it does it, for example it never issues DROP on non-empty tables/columns or similar destructive actions.
We've had it for a long time now and it's worked very well for us, allowing for painless autonomous upgrades of our customers' on-prem databases.
I find this interesting! Is there anything similar in the open source world? I have built projects with Supabase in the past, and one of my gripes about it is that it becomes obnoxious to track what happens to the schema over time in version control with dozens of migration files. For example, you have no idea what the current state of a table or trigger is because there might be several updates that occurred across four dozen migration files, and you may not even be sure which files those changes are in. You have to just look at the database directly rather than the code.
I haven't stumbled across anything, but I wouldn't be surprised if it exists. It's not magic; my colleague wrote the first version in a few days, and we've iterated on it since.
It just requires that there are some system views or similar that you can use to extract the current database schema, so you have something to compare against.
Our tool goes through the XML file and for each table runs a query to find the current columns, and for each column find the current configuration. Then compare with the columns in the XML file and decide what to do for each, ALTER, DROP or ignore (because possible data loss) etc. Datatype changed from "int" to "varchar(50)"? Not a problem since 50 chars are enough to store the largest possible int, so issue ALTER TABLE. Column no longer present? Check if existing column has any data, if not we can safely DROP the column, otherwise keep it and issue warning.
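As an illustration of the kind of system-catalog lookup this relies on, a Postgres equivalent might query information_schema (a sketch, not the commenter's actual code; the table name is a placeholder):

```sql
-- Read the live column definitions so they can be compared against the desired schema.
SELECT column_name, data_type, character_maximum_length, is_nullable, column_default
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name   = 'orders'
ORDER BY ordinal_position;
```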
Views, triggers and stored procs are replaced if different. We minimize logic in the database, so our triggers and stored procs are few and minimal.
Materialized views require a bit of extra handling with the database we use, in that we can't alter but have to drop and recreate. So we need to keep track of this.
As you say it's very nice to use as a developer, as you only have to care about what the database should look like at the end of the day, not how it got there. Especially since almost all of our customers skip some versions (we release monthly).
Why would you think so?
The PostgreSQL developers haven't exactly stood out for delivering bad software or features so far. If they implemented this, I would actually expect a pretty good approach with even better documentation.
The creation of the backing migration plan would have to be interactive, certainly. But that doesn't mean the concept as a whole is destined for failure.
For me, it’d be more control over the query planner, ideally down to the level of submitting my own physical query plan.
I was looking at a query yesterday that joined 3 tables and we were unable to convince Postgres to use the optimal plan, which was to scan an index on Table1 that matched its ORDER BY order, and filter by joining against the other tables.
With a CTE for Table1 that had a limit on it, Postgres would pick the optimal plan, but then might underflow the desired limit after filtering. However, the query plan is optimal and it finishes in 3 ms.
Without the limited CTE, Postgres would run the Table2 and Table3 join in parallel with Table1, and then try to hash join something like 3,000,000 rows to produce the final tuple set. This query plan takes 40 seconds.
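For readers who haven't used the trick, the limited-CTE workaround described here looks roughly like this (a hypothetical sketch; the actual tables, columns, and limits aren't given in the comment, and MATERIALIZED requires Postgres 12+):

```sql
-- Force Postgres to walk Table1's ORDER BY index first by materializing a
-- limited CTE, then filter by joining against the other tables.
WITH t1 AS MATERIALIZED (
    SELECT id, created_at
    FROM table1
    ORDER BY created_at DESC      -- matches an index on table1 (created_at)
    LIMIT 1000                    -- makes the index scan attractive, but the final
)                                 -- result may underflow this limit after filtering
SELECT t1.id
FROM t1
JOIN table2 t2 ON t2.table1_id = t1.id
JOIN table3 t3 ON t3.table2_id = t2.id
ORDER BY t1.created_at DESC
LIMIT 100;
```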
It’s just so frustrating knowing the system has the capability to serve a query in fractions of a second but you can’t extract that performance directly. Instead you need to write some truly bizarre code you hope the query planner will like, and then live the rest of your life in fear the query planner changes its mind.
A few days late here (joys of opening multiple tabs then only getting to them when bored), here's hoping you look back and see this:
The difference between using a LIMIT and not hints that this is happening because of a low correlation statistic between that index and the order on disk. Postgres is avoiding random-access lookups in what looks to you like the optimal plan, because this blows away the disk cache (paging cache) and would result in a much slower overall time than you think it will, given other configuration values.
The CLUSTER command will reorder the table data to match a chosen index, after which postgres will use that index with an index scan the way you want it to, even if you retrieved the whole table and have a slow disk, because now with a high correlation statistic the query planner knows it'll be working with the disk cache instead of against it.
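Concretely, checking the statistic and re-clustering would look something like this (a sketch with placeholder table and index names):

```sql
-- Inspect the correlation statistic the planner uses for this table.
-- Values near +1/-1 mean on-disk order tracks the column; near 0 means random.
SELECT attname, correlation
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'table1';

-- Physically reorder the table to match the chosen index, then refresh statistics.
CLUSTER table1 USING table1_created_at_idx;
ANALYZE table1;
```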
Yes this is completely my experience too. I spent most of yesterday battling with the query planner. Sometimes you can't express with table statistics something that you know to be true for the exact query that you're making.
There are a lot of arguments about query hints etc becoming stale and the performance changing as the table grows but I'm less worried about that - it would be a gradual degradation of performance. What worries me is that the table stats cross a threshold at 3am and suddenly the query planner chooses something crazy.
I've also wondered if an approach of trying a bunch of query plans could be fun. Get it to pick the top 10 possibilities and just run them all and record stats on which was fastest. Doubly so if it ever decides to run a seq scan where there is an index. Please. Just try the index! That said yesterday I was definitely at the point where I just wanted to express the exact query plan myself.
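For what it's worth, you can at least test the "just try the index" case per session with the planner's enable_* settings; not a production fix, but handy when comparing plans (a sketch with a placeholder query):

```sql
-- Discourage sequential scans for this session only and compare the plans.
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM table1 WHERE customer_id = 42;
RESET enable_seqscan;
```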
Very memorable post about this happening at Bluesky, about being utterly unable to trust Postgres to keep doing what it's doing.
> Bluesky scaling lesson: Postgres is not a great choice if your data can be irregular.
> The query planner will switch over to a new plan that consumes 100% CPU in the middle of the night whenever its table stats flip the heuristics the wrong way.
> Postgres badly needs query hints like MySQL has!
I'd like to have a way of disabling certain (legacy?) functionality. They even list some of it on their wiki [1]. Maybe we could even get speedups by not covering the legacy cases.
A plan to phase out some of these legacy features would be even better. Don't know if they do this in general.
As a developer who is not a DB expert, I always wonder why indexes can't be added automatically based on the queries, maybe? So let's say I have a table, and once it starts filling with data and sees the typical queries coming in (say based on user id or email etc.), it just adds the index?
Adding an index means the write path suddenly becomes slower. Sometimes, a covering index will be very beneficial. « It depends » is usually the right answer. I’m fine with the engine telling me « this query is usually slow » and I investigate why. I’m not ok with the engine adding indices willy-nilly and suddenly writing to the DB is 10x slower.
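Postgres can already get partway to the "tell me this query is usually slow" part via the pg_stat_statements extension; a rough sketch (column names as in recent Postgres versions):

```sql
-- Requires shared_preload_libraries = 'pg_stat_statements' and the extension installed.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Slowest statements on average: candidates for manual index review.
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```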
Indexes aren't free. They take up storage space and reduce the performance of inserts. On top of that, if an index gets too big it can be really slow; at that point the table needs partitioning instead.
In some applications you might value the performance of insert a lot more than select, and not want to pay for extra storage of the index.
> The largest real-world query that we are aware of accesses more than 4,000 relations.
And you have a lot of constraints:
- More indexes means more data, less RAM, less space.
- More indexes means more query optimization paths, and more complex query optimization.
- Adding more indexes could create suboptimal access patterns.
- And that changes with time.
- And those changes can happen within a second.
- And then you can create an unintentional delay that WILL hit somebody's pocket.
Doing this stuff "silently" is a sure way to add problems that will be very hard to solve or debug. Sure, without this some of the problems still remain, but at least will be possible to see why and to know when it get solved.
If anyone manages to create a solid implementation of this, they will deserve a Nobel Prize, an Oscar, a place in the hall of fame, and they will be rich.
I've always found that this is a great idea that doesn't hold up to the reality of DB tuning. There's no "typical" query, in the sense that: should it optimize for the more common but pretty performant query, or the rarer, very slow query? What type of client are you most concerned about? Both? Is that possible? Which queries are worth the (total) cost of the index? Does a specific index hurt some other workload like automated integrations, or is it OK to have slow updates in the middle of the night? You can definitely profile over time, and there are tools that help you capture data and make recommendations, but auto-indexing would be a limited-to-poor solution a lot of the time.
> Archived Table
> To prevent deleting data by mistake, we invent the soft-delete pattern by adding an is_deleted column. However, this brings extra complexity around foreign key and unique index enforcement. Ideally, PostgreSQL could allow users to configure an archived table. The removed data is moved to the archived table first and purged after a configured retention period. This simplifies application logic and data compliance work.
Also of interest: you can use the CLUSTER command to reorder rows physically on disk according to an index. This helps you cluster all active records together and all deleted records together. This way you can keep the records in one table but query more efficiently, as you need to read fewer database blocks.
Interesting list. I think the items related to schema migrations are probably bad ideas, as the logic and process around even simple schema migrations can be extremely complex and varied. It seems much better to keep this as an external component than to build features into the db engine around a narrow set of approaches.
I like several of the other ideas, though. Branching is at least partially implemented in other RDBMSes. MS SQL Server has been able to do instant read-only database snapshots (which capture a point in time and can be queried just like the original DB) for decades.
A built in connection pooler would be nice! My first database tech was Microsoft SQL Server, and ADO.NET supports it out of the box, so when I made the move it was the most obvious missing feature to me.
As someone who's never used those technologies - is that not a client-side concern? What's different about it being "built in" vs adopting something like pgpool or Supavisor when you have those requirements?
>Rollback. If a new schema change introduces issues, versioning allows for safer rollbacks to previous stable states, minimizing downtime and impact on users.
Wonderful intention, but not always easy or possible to pull off, because you're not just managing the stateless structure but also the state stored by that structure. Migrations that drop columns by merging their data into another may well lose information, and may do so in a way that makes the original data unrecoverable... not to mention how to decide what the information should be for any records added after the dropping migration but prior to running the rollback.
Sure, there are ways to handle these kinds of scenarios, and sometimes in reversible ways... but there comes a point where a rollback looks more like moving forward by adding a feature/capability than like a return to a previous state.
Defining the "one true way" for migrations may cause more harm than good despite how convenient it may be. How migrators help you manage scenarios like rollback can vary from tool to tool and is part of the selection process when finding the right tool for your project. I've built a database migrator for my project which implements a non-traditional approach to migrations which works very well for my project, but would cause others to cringe with all sorts of objections. Sometimes an external tool is a better answer than an internal one.
I'd like to extend the idea and ask for a more ingrained concept of versions and version migration.
Basically what Flyway does, but at the DB level.
You could ask your DB from any backend whether it's in a migration state or not; it would be a first-class citizen where it's absolutely clear that a DB migration should NOT disrupt production, and it could also run much longer.
For example, let me schedule a migration over 1 or 2 days through the DB, and thanks to statistics and knowledge it knows to do the big chunks at night.
I wish it had logical locks where you didn’t need to lock a particular row. Instead you’d just specify a string and lock on it and then execute your application code. Would be useful for distributed locks, or for transactions that may conflict with another so you take the logical lock in the application code before proceeding.
That is the way, but the UX is pretty ass because the lock ID is a 64-bit number instead of a string. How the heck are you supposed to keep track of which lock ID you should be checking in a given situation across multiple client apps?
You get the same problem for strings. How do you know that you should lock "user_update" and not "update_user" for example? And how do you avoid name collision when client A wants to check for a lock that is used by client B for other purposes?
The solution to both cases is to define them as either static constants or use an Enum. Then you would not care if the end result is a string or a number.
At my work place we simply have a static class with lock names that we use.
Good point, that’s a better approach. I guess for a multi-repo situation at work, you would need to create a base project like “postgres-lock-ids” so you can synchronize the lock across everything.
Have you passed a string into Postgres and then hashed it into bigint with another function? If yes, what function did you use?
I assume that if you do it this way, then you see a string key in logs, views of current/locked queries, etc. Which should immensely help when debugging any kind of problems.
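For illustration, one commonly seen pattern (a sketch, not necessarily what anyone in this thread uses) is to hash the string key inside Postgres itself, so the readable key still shows up in your application code; hashtextextended is available in recent Postgres versions:

```sql
-- Take a transaction-scoped advisory lock keyed by a string. hashtextextended()
-- returns a bigint, which is what the advisory-lock functions expect; the lock
-- is released automatically at commit or rollback.
BEGIN;
SELECT pg_advisory_xact_lock(hashtextextended('user_update:42', 0));
-- ... application work that must not run concurrently for this key ...
COMMIT;
```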
I have a long-time humble wish: an auto-updated timestamp for row updates.
On MySQL you can just declare it on the column:
`ON UPDATE CURRENT_TIMESTAMP`
On PostgreSQL you have to set up triggers on each table individually, and trying to automate that with DDL commands requires admin privileges, which you don't always have when applying migrations.
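The per-table trigger workaround looks like this (a sketch; table and column names are placeholders):

```sql
-- Reusable trigger function that stamps updated_at on every row update.
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Has to be repeated for each table you want covered.
CREATE TRIGGER trg_set_updated_at
BEFORE UPDATE ON my_table
FOR EACH ROW
EXECUTE FUNCTION set_updated_at();
```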
Just annoying when the attributes aren't in a logical order because they have been added to incrementally.
I've got a table that contains the columns length, width, height and cube (which is length x width x height), but cube is like 5 columns down next to the 'short description' field (which would ideally be next to the description field!)
The order can also impact postgres query performance, but for me it's mostly a look & feel thing.
Thanks, that was an interesting read. I'm curious how much real-world storage it would save for your average developer. Wonder if there's a nice little script out there you can run against an existing instance to get some kind of stats.
It would be easy to create a script like that. You'd save quite a significant amount of storage space, especially when you have tables with hundreds of millions of rows. The real problem, though, is with further migration of your schema down the line, where you're going to add/remove columns, as is almost always the case.
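To make the padding effect concrete, here's a small illustrative sketch (made-up tables) comparing the same columns in two orders:

```sql
-- Same columns, different order: putting the 8-byte columns first avoids
-- alignment padding between the narrow and wide columns.
CREATE TABLE padded  (a boolean, b bigint, c boolean, d bigint);
CREATE TABLE compact (b bigint, d bigint, a boolean, c boolean);

INSERT INTO padded  VALUES (true, 1, true, 2);
INSERT INTO compact VALUES (1, 2, true, true);

-- Compare per-row sizes; the padded layout reports more bytes per row.
SELECT pg_column_size(p.*) AS padded_row_bytes,
       pg_column_size(c.*) AS compact_row_bytes
FROM padded p, compact c;
```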