ants_a's comments

What you are describing here does not match how Postgres works. A read on the replica does not generate temporary tables, nor can anything on the replica create locks on the primary. The only two things a replica can do are hold back transaction log removal and the vacuum cleanup horizon. I think you may have misdiagnosed your problem.
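
If you want to verify that, the primary itself shows what a replica can hold back. A quick check, assuming streaming replication (backend_xmin is only populated when hot_standby_feedback is on):

    -- Run on the primary: per-standby xmin horizon and how far behind it is.
    SELECT application_name, state, backend_xmin, replay_lag
      FROM pg_stat_replication;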

There is no backpressure from replication, and streaming replication is asynchronous by default. Replicas can ask the primary to hold back garbage collection (off by default), which will eventually cause a slowdown, but not blocking. Lagging replicas can also ask the primary to hold onto the WAL needed to catch up (again, off by default), which will eventually cause the disk to fill up, which I guess is blocking if you squint hard enough. Both take a considerable amount of time to develop and are easily averted by monitoring and kicking out unhealthy replicas.
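
Both failure modes are also visible from the primary well before they hurt. A sketch of what that can look like (the 50GB cap is just an example value):

    -- hot_standby_feedback (off by default) is what lets a standby hold back
    -- vacuum; max_slot_wal_keep_size (PG 13+) bounds how much WAL a lagging
    -- replication slot can pin before it gets invalidated.
    ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
    SELECT pg_reload_conf();

    -- How much WAL each slot is retaining, and whether it is still usable
    -- (wal_status becomes 'lost' once the cap is exceeded).
    SELECT slot_name, active, wal_status,
           pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
      FROM pg_replication_slots;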

These are limitations in the current PostgreSQL implementation. It's quite possible to have consistent commits and snapshots across sharded databases. Hopefully some day in PostgreSQL too.

Postgres setups are typically based on physical replication, which is not an option on MySQL. My testing shows the limit to be about 177k tps with each transaction consisting of 3 updates and 1 insert.
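
That's not the original benchmark setup, but for a sense of how such a number can be measured, a custom pgbench script of roughly that shape is enough (table and column names here are made up):

    -- txn.sql
    \set id1 random(1, 1000000)
    \set id2 random(1, 1000000)
    \set id3 random(1, 1000000)
    BEGIN;
    UPDATE t1 SET v = v + 1 WHERE id = :id1;
    UPDATE t2 SET v = v + 1 WHERE id = :id2;
    UPDATE t3 SET v = v + 1 WHERE id = :id3;
    INSERT INTO t_log (ref_id) VALUES (:id1);
    COMMIT;

run with something like pgbench -n -f txn.sql -c 64 -j 8 -T 60 mydb.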

Be careful. During consulting I ran into a similar magnitude of writes for a mostly CRUD workload.

They had huge problems with VACUUM at high tps. Basically the database never had space to breathe and clean up.


For updating a single resource where the order of updates matters, the best throughput one can hope for is the inverse of the locking duration. Typical Postgres-using applications follow a pattern where a transaction involves multiple round trips between the application and the database, to make decisions in the code running on the application server.

But this pattern is not required by PostgreSQL; it's possible to run arbitrarily complex transactions entirely on the server side using more complex query patterns and/or stored procedures. In this case the locking time will be determined mainly by time-to-durability, which, depending on infrastructure specifics, might be one or two orders of magnitude shorter. Or, in the case of fast networks and slow disks, it might not have a huge effect.
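
A minimal illustration of the difference, with a hypothetical accounts table: instead of SELECT-ing the balance, deciding in application code and then issuing an UPDATE over several round trips, the whole decision rides in one statement, so the row lock is held for little more than the commit:

    UPDATE accounts
       SET balance = balance - 100
     WHERE id = 42
       AND balance >= 100
    RETURNING balance;  -- zero rows back means the withdrawal was refused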

One can also use batching in PostgreSQL to update the resource multiple times per durability cycle. This requires some extra care from the application writer to avoid getting totally bogged down by deadlocks and serializability conflicts.
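
A sketch of that batching idea, again with a made-up counters table: the application queues increments and folds a whole batch into one statement, so many logical updates share a single commit and durability wait. Pre-aggregating in the subquery matters, because UPDATE ... FROM applies at most one joined row per target row:

    BEGIN;
    UPDATE counters AS c
       SET value = c.value + b.total
      FROM (SELECT id, sum(delta) AS total
              FROM (VALUES (1, 10), (1, 5), (2, 7)) AS v(id, delta)
             GROUP BY id) AS b
     WHERE c.id = b.id;
    COMMIT;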

What will absolutely kill you on PostgreSQL is high contention combined with repeatable read or higher isolation levels. PostgreSQL handles update conflicts with optimistic concurrency control, and high contention totally invalidates all of that optimism. So you need to be clever enough to achieve the necessary correctness guarantees with read committed and the funky semantics it has for update visibility, or use some external locking to get rid of the contention inside the database. An option for pessimistic locking would be very helpful for these workloads.
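
One shape that external locking can take, while staying at read committed, is an advisory lock that serializes writers of a hot resource before they ever touch the row (the key scheme here is made up):

    BEGIN;
    -- Every writer of resource 42 agrees to take this transaction-scoped
    -- advisory lock first, so the contended row is updated by one
    -- transaction at a time.
    SELECT pg_advisory_xact_lock(42);
    UPDATE counters SET value = value + 1 WHERE id = 42;
    COMMIT;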

What would also help is a different kind of optimism, one that removes the durability requirement from the lock hold time, at the cost of readers having to wait for durability. Postgres can do tens of thousands of contended updates per second with this model. See the Eventual Durability paper for details.


I'm wondering if it would make sense to integrate the rim, motor and wheel bearing into a single assembly to save weight and cost. That, combined with the weight and packaging benefits of not having half shafts and differentials, might make it worth it. Plus there can be additional benefits, like the extra maneuverability that ZF Easy Turn and Hyundai's e-Corner have demonstrated.

30 kW sustained / 60 kW peak per wheel is easily enough even for large passenger vehicles. The sustained rating could take a 3-tonne vehicle up a 10% grade at 120 km/h.
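
Rough numbers behind that, assuming about 3000 kg, four driven wheels, and ignoring rolling and aero losses:

    120 km/h ≈ 33.3 m/s
    climb power ≈ m · g · grade · v ≈ 3000 kg · 9.81 m/s² · 0.10 · 33.3 m/s ≈ 98 kW
    available   ≈ 4 wheels · 30 kW = 120 kW sustained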


Things like that do exist, though. There's an expensive Renault with motors like this, and there's also the MW Motors Luka.

MW Motors eventually made a version where the electric motors were moved from the wheel hubs to a more conventional arrangement, so presumably they felt the hub motors were some sort of problem. But they still make the original version, and I've never been in one, so I can't be sure.


In that snippet are links to Postgres docs and two blog posts, one being the blog post under discussion. None of those contain the information needed to make the presented claims about throughput.

To make those claims it's necessary to know what work is being done while the lock is held. This includes various bits of resource cleanup, which should be cheap, and RecordTransactionCommit(), which will grab a lock to insert a WAL record, wait for it to get flushed to disk, and potentially also wait for it to get acknowledged by a synchronous replica. So the expected throughput is somewhere between hundreds and tens of thousands of notifies per second. But as far as I can tell, this conclusion is only available from the PostgreSQL source code plus some assumptions about typical storage and network performance.
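
For a rough empirical bound on a given setup, a pgbench script that does nothing but commit a NOTIFY is enough (channel and payload are arbitrary):

    -- notify_bench.sql
    BEGIN;
    SELECT pg_notify('bench_channel', 'x');
    COMMIT;

run with something like pgbench -n -f notify_bench.sql -c 16 -j 4 -T 30 mydb. With synchronous_commit = off the result mostly reflects the lock and queue handling; with it on, the fsync and replica round trips dominate, which is where the hundreds-to-tens-of-thousands range comes from.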


> In that snippet are links to Postgres docs and two blog posts

Yes, that's what a snippet generally is. The generated document from my very basic research prompt is over 300k in length. There are also sources from the official mailing lists, graphile, and various community discussions.

I'm not going to post the entire output because it is completely beside the point. In my original post, I explicitly asked "What is the qualitative and quantitative nature of relevant workloads?" exactly because it's not clear from the blog post. If, for example, they only started hitting these issues at 10k simultaneous reads/writes, then it's reasonable to assume that many people who don't have such high workloads won't really care.

The ChatGPT snippet was included to show what ChatGPT's research told me. Nothing more. I basically typed a two-line prompt and asked it to include the original article. Anyone who thinks that what I posted is authoritative in any way shouldn't be considering doing this type of work.


Triggers are not even particularly slow. They just hide the extra work that is being done and thus sometimes come back to bite programmers by adding a ton of work to statements that look like they should be quick.
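
EXPLAIN ANALYZE makes that hidden work visible, since it reports per-trigger time for DML statements. A toy example (table and trigger made up for illustration):

    CREATE TABLE items (id int PRIMARY KEY, qty int);
    CREATE TABLE items_audit (id int, qty int, changed_at timestamptz);

    CREATE FUNCTION audit_items() RETURNS trigger LANGUAGE plpgsql AS $$
    BEGIN
      INSERT INTO items_audit VALUES (NEW.id, NEW.qty, now());
      RETURN NEW;
    END;
    $$;

    CREATE TRIGGER items_audit_trg AFTER UPDATE ON items
      FOR EACH ROW EXECUTE FUNCTION audit_items();

    -- The output includes a "Trigger items_audit_trg: time=... calls=..." line,
    -- which is where the extra work shows up.
    EXPLAIN ANALYZE UPDATE items SET qty = qty + 1;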


So my display aspect ratio is 2.5 dB. Or is it 5 dB because it's not measuring power?


That thread is indeed about the same issue. I don't think anyone has done a more concise writeup on it.

The core of the issue is that on the primary, a commit inserts a WAL record, waits for durability (local and/or replicated), and then grabs a lock (ProcArrayLock) to mark itself as no longer running. Taking a snapshot takes that same lock and builds a list of running transactions. WAL insertion and becoming visible can therefore happen in a different order for different transactions. This causes an issue on the secondary, which has no knowledge of the apparent visibility order on the primary, so visibility order on the secondary is strictly based on the order of commit records in the WAL.

The obvious fix would be to make visibility happen in WAL order on the primary too. However, there is one feature that makes that complicated: clients can change the desired durability on a transaction-by-transaction basis. The settings range from confirming the transaction immediately after it is inserted into the WAL stream, through waiting for local durability, all the way up to waiting for it to be visible on synchronous replicas. If visibility happens in WAL order, then an async transaction either has to wait on every higher-durability transaction that comes before it in the WAL stream, or give up on read-your-writes. That's basically where the discussion got stuck, without achieving a consensus on which breakage to accept. This same problem is also the main blocker for adopting a logical (or physical) clock based snapshot mechanism.
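
That per-transaction knob is synchronous_commit, so concretely the spread a WAL-order visibility rule would have to reconcile looks like this (the events table is made up):

    BEGIN;
    -- 'off' acknowledges the commit before any flush, 'local' waits for the
    -- local fsync, 'remote_apply' waits until synchronous standbys have
    -- applied the commit and made it visible.
    SET LOCAL synchronous_commit = off;
    INSERT INTO events (payload) VALUES ('low-value write');
    COMMIT;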

By now I'm partial to the option of giving up on read-your-writes, with an opt-in way to see non-durable transactions as an escape hatch for backwards compatibility. Re-purposing the SQL READ UNCOMMITTED isolation level for this sounds appealing, but I haven't checked whether there is some language in the standard that would make that a bad idea.

A somewhat related idea is Eventual Durability, where write transactions become visible before they are durable, but read transactions wait for all observed transactions to become durable before committing.


Thanks to you both! I've updated the article to discuss this, and we've got an update on the AWS blog too. :-)

https://jepsen.io/analyses/amazon-rds-for-postgresql-17.4

