I also love Python the language, but it’s hard to keep using it when it’s so slow. Parallel processing helps, but it’s still slow. Definitely dumping pandas the first chance I get. It’s one of my two major bottlenecks, the other being Anaconda for Windows. Maybe the culprit is running Python on Windows, since it relies on so many parts of *nix?
Try updating your Python environment and installing ALL the recommended dependencies for Pandas (and for Geopandas, if you use it). I had an old env in Anaconda (Python 3.9.12, Pandas 1.4.2) and then created a new one in Mambaforge (Python 3.10.6 with Pandas 1.5.0). That sped up one of my experimental projects from ≈30 min to ≈5 min. Use Python code only for glue and leave extensive calculations for C/C++ code. Pandas and Geopandas have recommended C/C++ dependencies that dramatically speed up calculations, and my guess is that newer versions integrate better with those dependencies.
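A quick way to see whether an environment actually has those extras is a check like the one below. It is only a sketch: the package list is an assumption based on what the pandas install docs call "recommended" (`numexpr` and `bottleneck`), and `missing_packages` is a made-up helper name, so adjust both for your own stack.

```python
import importlib.util

# Assumption: numexpr and bottleneck are the packages the pandas install
# docs list as recommended; extend the list (e.g. for Geopandas) as needed.
RECOMMENDED = ["numexpr", "bottleneck"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this env."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(RECOMMENDED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All recommended packages are installed.")
```

If pandas is already installed, `pd.show_versions()` prints a similar overview, including the versions of the optional dependencies it finds.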
PS: I also recommend that everyone use Mambaforge instead of Anaconda, because it uses the Mamba dependency solver, which is written in C++ and is orders of magnitude faster.
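To make the "Python as glue" point concrete, here is a rough sketch comparing a row-by-row Python loop with the equivalent column expression, where the arithmetic runs inside compiled NumPy code. The column names `price` and `qty` and both function names are hypothetical, and the timings will depend on your machine and library versions.

```python
import time

import numpy as np
import pandas as pd


def total_python_loop(df):
    """Row by row: every multiply and add runs in the Python interpreter."""
    total = 0.0
    for price, qty in zip(df["price"], df["qty"]):
        total += price * qty
    return total


def total_vectorized(df):
    """One column expression: the multiply and the sum run in compiled code."""
    return float((df["price"] * df["qty"]).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1_000_000
    # Hypothetical data, generated only for the comparison.
    df = pd.DataFrame({
        "price": rng.random(n),
        "qty": rng.integers(1, 10, size=n),
    })
    for func in (total_python_loop, total_vectorized):
        start = time.perf_counter()
        func(df)
        print(f"{func.__name__}: {time.perf_counter() - start:.3f} s")
```

Both functions compute the same total; the only difference is where the loop runs.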
> Use Python code only for glue and leave extensive calculations for C/C++ code.
That would completely defeat the point of Python for me. I’d rather switch to TypeScript or C# / Java before I code in C again, but you’re right. Fast Python is an oxymoron. However, in my case the bottleneck is the pandas library. I’ll have to see whether workarounds like Dask work.
The problem is when you need 30 servers instead of one to manage your load, though, lol.