But usually, in spaces where you need speed Python is just an orchestrator or gl...

logicchains · on Aug 12, 2024

Yes, pandas/NumPy call into C/C++ to do the calculations efficiently, but the "glue" can still introduce significant slowdown relative to that, for example when it's unnecessarily copying tens of gigabytes of dataframes between processes. Of course, that slow part could also be moved to C++, but that's much more effort than just parallel-mapping over the dataset in Python without copying or multiprocessing, as will be possible with no-GIL Python.
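A minimal sketch of the "parallel map with no copying" idea, using only the standard library (all function names here are hypothetical). Threads receive a reference to the same list, so nothing is pickled or copied the way it would be when shipping data to a `multiprocessing.Pool`. On a free-threaded CPython build (3.13+), the pure-Python loop can run on several cores at once; on a regular GIL build the result is identical but the threads effectively take turns.

```python
import os
import sys
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, parts):
    """Split range(n) into at most `parts` contiguous (start, stop) pairs."""
    if n == 0:
        return []
    step = -(-n // parts)  # ceiling division
    return [(i, min(i + step, n)) for i in range(0, n, step)]


def sum_of_squares(data, start, stop):
    """Pure-Python loop over one index range; reads `data` in place, no copy."""
    total = 0
    for i in range(start, stop):
        total += data[i] * data[i]
    return total


def parallel_sum_of_squares(data, workers=None):
    """Map sum_of_squares over index ranges using threads that share `data`."""
    workers = workers or os.cpu_count() or 1
    bounds = chunk_bounds(len(data), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Every task gets a reference to the same list object, not a copy.
        futures = [pool.submit(sum_of_squares, data, a, b) for a, b in bounds]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    # sys._is_gil_enabled() exists on CPython 3.13+; assume a GIL on older versions.
    gil = getattr(sys, "_is_gil_enabled", lambda: True)()
    print("GIL enabled:", gil)
    print(parallel_sum_of_squares(list(range(1_000_000))))
```

The same code runs unchanged on both builds; only the amount of real parallelism differs.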

aragilar · on Aug 12, 2024

Bad code/quick hacks will always be slow (but can be great for prototypes), and sometimes it's worth planning how you're going to process something rather than piling on multiprocessing. Once you reach the point of multigigabyte IPC, it's worth spending the time doing it right.

robertlagrant · on Aug 12, 2024

Building libraries on a GIL-less Python would let people access that power without each of them building it from scratch.

aragilar · on Aug 12, 2024

GIL-less Python isn't magic pixie dust; the same users who write slow, poorly structured code will, at best, run into deadlocks. GIL-less Python can be used by well-designed libraries to achieve speedups, but that's not code written by the aforementioned pandas users. Speaking from experience, there's a lot more room for order-of-magnitude speedups from fixing quick hacks than from running things in parallel, and fixing them is usually a lot easier than managing multithreaded code.
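To make the deadlock point concrete, here is a hedged sketch of the classic lock-ordering hazard and its usual fix, using a hypothetical `Account` class. The hazard is not specific to free-threaded Python; threads under the GIL can deadlock the same way.

```python
import threading


class Account:
    """Hypothetical account type used only to illustrate lock ordering."""

    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Naively locking src then dst can deadlock: one thread holds a.lock and
    # waits for b.lock while another holds b.lock and waits for a.lock.
    # Acquiring locks in a single global order (here, by ident) rules that out.
    if src is dst:
        return  # threading.Lock is not reentrant; locking it twice would hang
    first, second = (src, dst) if src.ident < dst.ident else (dst, src)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def repeat_transfer(src, dst, n):
    for _ in range(n):
        transfer(src, dst, 1)


def run_opposing_transfers(n=10_000):
    """Two threads move money in opposite directions at the same time."""
    a, b = Account(1, 100), Account(2, 100)
    threads = [
        threading.Thread(target=repeat_transfer, args=(a, b, n)),
        threading.Thread(target=repeat_transfer, args=(b, a, n)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(run_opposing_transfers())
```

The fix is easy to state here, but in a real codebase every function that takes more than one lock has to follow the same rule, which is exactly the kind of discipline quick-hack code tends not to have.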

robertlagrant · on Aug 12, 2024

> GIL-less Python can be used by well-designed libraries to achieve speedups, but that's not code written by the aforementioned pandas users

Yes, that's why having something like pandas use it would be better than getting every user to write their own version.

aragilar · on Aug 12, 2024

I would be shocked if pandas wasn't already using multithreading where it could. Naturally, free-threaded Python (to use its official name) gives libraries like pandas more options (which I think is a good thing, even if I think things aren't going to be as smooth as people would like), but there's only so much pandas can do for badly written code. It would be like PostgreSQL moving from multiple processes to multiple threads: sure, some users might see speedups, but users who haven't added any indices are still leaving a lot of performance on the table.
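The missing-index point has a rough pure-Python analogue, sketched here with hypothetical record shapes: a quick-hack join that rescans a list for every row, versus one that first builds a dict keyed on the join column. The first does on the order of len(orders) × len(customers) comparisons, the second roughly len(orders) + len(customers), so spreading the first across threads still leaves most of the performance on the table.

```python
def join_by_scan(orders, customers):
    """Quick hack: linear scan of `customers` for every order."""
    out = []
    for order in orders:
        for cust in customers:
            if cust["id"] == order["customer_id"]:
                out.append((order["order_id"], cust["name"]))
                break  # first match wins
    return out


def join_by_index(orders, customers):
    """Build a dict 'index' once, then do average O(1) lookups per order."""
    by_id = {}
    for cust in customers:
        by_id.setdefault(cust["id"], cust)  # keep the first match, like the scan
    out = []
    for order in orders:
        cust = by_id.get(order["customer_id"])
        if cust is not None:
            out.append((order["order_id"], cust["name"]))
    return out


if __name__ == "__main__":
    customers = [{"id": i, "name": f"c{i}"} for i in range(5_000)]
    orders = [{"order_id": j, "customer_id": j % 6_000} for j in range(5_000)]
    assert join_by_scan(orders, customers) == join_by_index(orders, customers)
    print(len(join_by_index(orders, customers)))
```

Both functions return the same rows; only the amount of work differs.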