I don't care so much about the memory and CPU stuff; I mostly leave the heavy lifting to an SQL engine.
Although the null handling seems very compelling, I guess it comes at the cost of incompatibility with existing libraries; otherwise Pandas would have implemented it as well?
If you mean whether I run it in a distributed fashion à la Spark, then no. If you mean whether I test it on various machines with different RAM sizes, then yes.
> I don't care so much about the memory and CPU stuff; I mostly leave the heavy lifting to an SQL engine.
Well, I care. Both pandas and polars are, in my view, single-machine dataframe libraries, so the memory and CPU constraints are rather stringent.
My comparison is based solely on my own experience: reading CSV files that are 20% to 50% of the size of RAM, pandas takes 2 to 10 minutes (or errors out), while polars finishes in 20 seconds. Queries in pandas are almost always slower than in polars.
But reading your comment, it seems you and I have different use cases for dataframe libraries, which is fine. I mostly use them for exploratory analysis, so the SQL API is not much of a plus to me, but the performance is.
When using Pandas appropriately, that is, with method chaining, lambda expressions (instead of intermediate assignments), and pyarrow dtypes, you also get much better speed and proper null handling.
This irritates me. What was the point of THEIR comment? To be cunty? It absolutely was NOT a productive comment. They were being an ass. And you’re defending them being an ass, asking me who I am like I’m somehow not allowed to point out someone being a cunt unless I have some type of status symbol of which you need to approve. At this point I can only fathom you’re a coworker or friend of theirs, hence the defense. Nothing else makes sense.
- Consistent Expression and SQL-like API.
- Lazy execution mode, where queries are compiled and optimized before running.
- Sane NaN and null handling.
- Much faster.
- Much more memory efficient.