Pandas is cancer. Please stop teaching it to people.
Everything it does can be done reasonably well with list comprehensions and objects that support type annotations and runtime type checking (if needed).
Pandas code is untestable, unreadable, hard to refactor and impossible to reuse.
Trillions of dollars are wasted every year by people having to rewrite pandas code.
> Everything it does can be done reasonably well with list comprehensions and objects that support type annotations and runtime type checking (if needed).
I see this take somewhat often, and usually with a similar lack of nuance. How do you come to this? In other cases where I've seen it, it's from people who haven't worked in any context where performance or interoperability with the scientific computing ecosystem matters - missing a massive part of the picture. I've struggled to get through to them before. Genuine question.
Code using pandas is testable and reusable in much the same way as any other code: make functions that take and return data.
That said, the polars/narwhals-style API is better than the pandas API for sure: more readable and composable, simpler (no index), and a bit less weird overall.
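For what it's worth, a minimal sketch of the "functions that take and return data" approach - the column names here are made up for illustration:

```python
import pandas as pd

def add_total(df: pd.DataFrame) -> pd.DataFrame:
    # Pure function: takes a frame, returns a new frame, mutates nothing.
    # "price" and "qty" are hypothetical column names.
    return df.assign(total=df["price"] * df["qty"])

# Easy to test with a tiny in-memory frame:
df = pd.DataFrame({"price": [2.0, 3.0], "qty": [4, 5]})
out = add_total(df)
print(out["total"].tolist())  # [8.0, 15.0]
```

Because the function neither reads globals nor mutates its input, a unit test is just a two-row frame and an equality check.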
Polars made the mistake of not maintaining row order for all operations, via the false-by-default `maintain_order` argument. This is basically the billion-dollar null mistake for data frames.
Yeah, that really should have been the default. It's a very big footgun, especially when preserving order is the default in pandas, numpy, etc. And especially since there is no ingrained index concept in polars, people might very well forget that they need some natural keys rather than relying on ordering. You need to bring more of an SQL mindset.
I've recently had to migrate over to Python from Matlab. Pandas has been doing my head in. The syntax is so unintuitive. In Matlab, everything begins with a `for` loop. Inelegant and slow, yes, but easy to reason about. Easy to see the scope and domain of the problem, to visualise the data wrangling.
Pandas insists you never use a for loop. So I feel guilty if I ever need a throwaway variable on the way to creating a new column. Sometimes methods are attached to objects, other times they aren't. And if you need to use a function that isn't vectorised, you've got to use df.apply anyway - and remember to change the 'axis' too. Plotting is another thing I can't get my head around. Am I supposed to use pandas' helpers like df.plot() all the time? Or ditch them and use the lower-level matplotlib directly? What is idiomatic? I can't find answers to much of this, even with ChatGPT. Worse, I can't seem to build a mental model of what pandas expects me to do in a given situation.
Pandas has disabused me of the notion that Python syntax is self-explanatory and executable-pseudocode. I find it terrible to look at. Matlab was infinitely more enjoyable.
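For anyone hitting the same wall, a small illustration of the `apply`/`axis` gotcha described above (the data here is made up):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# Vectorised: no loop, no apply - the idiomatic path.
df["c"] = df["a"] + df["b"]

# A non-vectorised function forces df.apply, and you must remember
# axis=1 so the function receives rows rather than columns:
def label(row) -> str:
    return f"{row['a']}-{row['b']}"

df["label"] = df.apply(label, axis=1)
print(df["label"].tolist())  # ['1-10', '2-20']
```

Forgetting `axis=1` hands `label` whole columns instead of rows, which is the kind of silent mode-switch that makes the mental model hard to form.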
Yeah, pandas is truly awful. After working with things like R, ggplot, and data.table, you soon realize pandas is the worst dataframe analysis and plotting library out there.
I pretty much consider anyone who likes it to have Stockholm syndrome.
Can you write more about this? A lot of people use pandas where I work, whereas I'm completely fluent in list comprehensions, dataclasses, etc. I had the impression it was doing something "more", like using numpy arrays/matrices for columns.
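Your impression is right - pandas columns are backed by numpy arrays, so operations run as C-level loops over contiguous buffers rather than Python-level loops over boxed objects. A rough sketch of the difference:

```python
import numpy as np

n = 100_000
xs = list(range(n))
arr = np.arange(n)

# List comprehension: a Python-level loop, boxing every int.
doubled_list = [x * 2 for x in xs]

# NumPy (what pandas columns are built on): one C-level loop
# over a contiguous typed buffer - typically an order of
# magnitude faster on numeric data.
doubled_arr = arr * 2
print(doubled_arr[:3])  # [0 2 4]
```

That performance gap, plus interop with the rest of the scientific stack (scipy, scikit-learn, matplotlib all consume numpy arrays), is the "more" that list comprehensions don't give you.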
I found Pandera quite good for wrapping input/output expectations around Pandas. At the end of the day, the vectorisation of operations in it and other table-based formats means they're not easy to replace performantly.
> what's the consequences for HN of a user having their password compromised
HN does not enforce anonymity, so some accounts (many belonging to startup founders, btw) are tied to their users' real identities.
A compromised password could allow a bad actor to impersonate those users. That could be used to scam others or to kickstart some social engineering that could be used to compromise other systems.
Indeed, a consequence for the individual user could be spam posted from their account, but for scams, I'd guess that HN would fall back on its standard moderation process.
The question, though, was what the consequences are for HN rather than for individual users, as it's HN that would bear the cost of implementation.
Now, if a lot of prominent HN users started getting their passwords compromised and that led to a hit on HN's reputation, you could easily see that tipping the balance in favour of implementing MFA, but (AFAIK at least) that hasn't happened.
Now, ofc you might expect orgs to be proactive about these things, but having seen companies with actual financial data and transactions on the line drag their feet on MFA implementations in the past, I kind of don't expect that :)
I think this conversation would benefit from introducing scale and audience into the equation.
Individual breaches don't really scale (e.g. device compromise, phishing, credential reuse, etc.), but at scale everything scales. At scale then, you get problems like hijacked accounts being used for spam and scams (e.g. you can spam in comment sections, or replace a user's contact info with something malicious), and sentiment manipulation (including vote manipulation, flagging manipulation, propaganda, etc.).
HN, compared to something like Reddit, is a fairly small-scale operation. Its users are also more on the technically involved side. It makes sense, then, that due to the lower velocity and unconventional userbase, they might still have this under control via other means, or can dynamically adjust to the challenge. But on its own, this is not a durable technical trait. There's no hard and fast rule to tell when they cross the boundary into territory where adding manpower is less effective than just spending the days or weeks needed to implement better account controls.
I guess if I really needed to put this into some framework, I'd weigh the amount of time spent on chasing the aforementioned abuse vectors compared to the estimated time required to implement MFA. The forum has been operating for more than 18 years. I think they can find an argument there for spending even a whole 2 week sprint on implementing MFA, though obviously, I have no way of knowing.
And this is really turning the bean counting up to the maximum. I'm really surprised that one has to argue tooth and nail for the rationality of implementing basic account controls, like MFA, in 2025. Along with session management (the ability to review all past and current sessions, to retrieve an immutable activity log for them, and a way to clear all other active sessions), it should be the bare minimum these days.

But then, even deleting users is not possible on here. And yes, I did read the FAQ entry about this [0]; it misses the point hard. Deleting a user doesn't necessarily have to mean the deletion of their submissions, and no, not deleting submissions doesn't render the action useless, because as described, user hijacking can and I'm sure does happen. A disabled user account "wouldn't be possible" to hijack, however. I guess one could reasonably take issue with calling this user deletion, though.
It's interesting you suggest a two-week sprint for this. How large do you think HN's development team is? Do you know if they even have a single full-time developer?
I don't, but the lack of changes in the basic functionality of the site over the years I've used it makes me feel that they may not have any/many full-time devs working on it...
I really don't think the site is like this because they lack capacity. It's pretty clearly an intentional design choice in my view, like with Craigslist.
But no, I do not have any information on their staffing situation. I presume you don't either though, do you?
Indeed I don't. However, if we examine the pace of new features over the last several years (I can't think of a single way this site has changed in that time), it's reasonable to surmise that there isn't a lot of development of the user-accessible/visible portions of the site, and that leads me to guess they don't have much in the way of dev resources.
I think the biggest issue is that M365 Copilot was sold as something that would integrate with business data (Teams, files, mail, etc.), and that never quite worked out.
So you end up with a worse ChatGPT that also doesn't have work context.
Before that, Wall Street ran on Yahoo Messenger! They only stopped because Yahoo's new brand owners didn't understand the value of it and shut it down because there weren't enough teens signing up.
I never found Athena expensive. Compared to employment costs it will be minuscule.
And sometimes, if your query is CPU-intensive but the queried data size is not huge, you can get ridiculous value for money - many CPU-days of work in 10 minutes for just $5, if your query covers 1TB after partitioning.
Query size limits are also configurable.
Obviously it depends on what data you are working on, but not having to set up and pay for a computational cluster is a huge cost saving.
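Back-of-envelope on that pricing model - Athena bills per TB of data scanned ($5/TB is the list price at the time of writing), independent of how much compute the query burns:

```python
# Athena list price: $5 per TB of data scanned by the query.
# Compute time is not billed separately, which is why a
# CPU-heavy query over a small, well-partitioned dataset is cheap.
PRICE_PER_TB = 5.00

def athena_cost(tb_scanned: float) -> float:
    return tb_scanned * PRICE_PER_TB

print(athena_cost(1.0))   # 5.0  - 1 TB scanned after partition pruning
print(athena_cost(10.0))  # 50.0 - same query over 10 TB unpartitioned
```

This is also why partitioning matters so much: it directly shrinks the scanned-bytes number you're billed on.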
I bought MS Copilot licenses for my company around February 2023. They were sold on a yearly commitment and offered no trials. A bad deal, but I was afraid of missing out on AI productivity gains.
I'm definitely not renewing those. Times are hard and the value provided does not justify the cost.
I recently personally resubscribed to copilot after cancelling my subscription a couple months ago since it was not providing value. But now with the new/beta “Edit” mode and being able to specify to use o1, o1 mini, and sonnet 3.5, the $10 a month feels a lot more worth it. The edit mode has outperformed aider for me.
I've found Copilot to be really good at generating funny memes and haikus for specific issues/tasks where I work. The productivity gains come from not having to use Photoshop, leaving more time to browse websites like this one.
In some cases it is useful, like in Excel, where it can generate formulas or describe an approach to a problem - not unlike GH Copilot. The same goes for the AI assistant in the MS Power Automate editor.
But in other cases, like Word or Outlook, it is just a lousy summarizer and doesn't add much value.
Interesting, because I pay for Copilot and Supermaven, although most of my coworkers don't. To be fair, it took twisting their arms to get them to use linters, formatters, and other tools, so I think asking them to use AI auto-complete is a bit much right now.
Right, but Gitlab does have the excellent built-in pipeline editor that will visualize and validate your pipelines for you.
It can also render the complete pipeline config (making it easy to run and debug the problematic parts locally just by copying the relevant parts, even if they're hidden in an include somewhere).