Pandas is cancer. Please stop teaching it to people.
Everything it does can be done reasonably well with list comprehensions and objects that support type annotations and runtime type checking (if needed).
Pandas code is untestable, unreadable, hard to refactor and impossible to reuse.
Trillions of dollars are wasted every year by people having to rewrite pandas code.
> Everything it does can be done reasonably well with list comprehensions and objects that support type annotations and runtime type checking (if needed).
I see this take somewhat often, and usually with a similar lack of nuance. How do you come to this? In other cases where I've seen it, it's from people who haven't worked in any context where performance or interoperability with the scientific computing ecosystem matters - missing a massive part of the picture. I've struggled to get through to them before. Genuine question.
Code using pandas is testable and reusable in much the same way as any other code: make functions that take and return data.
That said, the polars/narwhals-style API is better than the pandas API for sure: more readable and composable, simpler (no index), and a bit less weird overall.
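For what it's worth, a minimal sketch of the "functions that take and return data" approach - the column names here are made up for illustration:

```python
import pandas as pd

def add_total(df: pd.DataFrame) -> pd.DataFrame:
    # Pure function: takes a frame, returns a new frame, mutates nothing.
    # "price" and "qty" are hypothetical column names.
    return df.assign(total=df["price"] * df["qty"])

# Easy to test with a tiny in-memory frame:
df = pd.DataFrame({"price": [2.0, 3.0], "qty": [4, 5]})
out = add_total(df)
print(out["total"].tolist())  # [8.0, 15.0]
```

Because the function neither reads globals nor mutates its input, a unit test is just a two-row frame and an equality check.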
Polars made the mistake of not maintaining row order for all operations, via the false-by-default `maintain_order` argument. This is basically the billion-dollar null mistake for data frames.
Yeah, that really should have been the default. It's a very big footgun, especially when preserving order is the default in pandas, numpy, etc. And especially since there is no ingrained index concept in polars, people might very well forget that they need some natural keys rather than relying on ordering. You need to bring more of an SQL mindset.
I've recently had to migrate over to Python from Matlab. Pandas has been doing my head in. The syntax is so unintuitive. In Matlab, everything begins with a `for` loop. Inelegant and slow, yes, but easy to reason about. Easy to see the scope and domain of the problem, to visualise the data wrangling.
Pandas insists you never use a for loop. So I feel guilty if I ever need a throwaway variable on the way to creating a new column. Sometimes methods are attached to objects, other times they aren't. And if you need to use a function that isn't vectorised, you've got to use df.apply anyway - and remember to change the 'axis' too. Plotting is another thing I can't get my head around. Am I supposed to use pandas' helpers like df.plot() all the time? Or ditch them and use the lower-level matplotlib directly? What is idiomatic? I can't find answers to much of this, even with ChatGPT. Worse, I can't seem to build a mental model of what pandas expects me to do in a given situation.
Pandas has disabused me of the notion that Python syntax is self-explanatory and executable-pseudocode. I find it terrible to look at. Matlab was infinitely more enjoyable.
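For anyone hitting the same wall, a small illustration of the `apply`/`axis` gotcha described above (the data here is made up):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# Vectorised: no loop, no apply - the idiomatic path.
df["c"] = df["a"] + df["b"]

# A non-vectorised function forces df.apply, and you must remember
# axis=1 so the function receives rows rather than columns:
def label(row) -> str:
    return f"{row['a']}-{row['b']}"

df["label"] = df.apply(label, axis=1)
print(df["label"].tolist())  # ['1-10', '2-20']
```

Forgetting `axis=1` hands `label` whole columns instead of rows, which is the kind of silent mode-switch that makes the mental model hard to form.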
Yeah, pandas is truly awful. After working with things like R, ggplot, and data.table, you soon realize pandas is the worst dataframe analysis and plotting library out there.
I pretty much consider anyone who likes it to have Stockholm syndrome.
Can you write more about this? A lot of people use pandas where I work, whereas I'm completely fluent in list comprehensions, dataclasses, etc. I had the impression it was doing something "more", like using numpy arrays/matrices for columns.
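Your impression is right - pandas columns are backed by numpy arrays, so operations run as C-level loops over contiguous buffers rather than Python-level loops over boxed objects. A rough sketch of the difference:

```python
import numpy as np

n = 100_000
xs = list(range(n))
arr = np.arange(n)

# List comprehension: a Python-level loop, boxing every int.
doubled_list = [x * 2 for x in xs]

# NumPy (what pandas columns are built on): one C-level loop
# over a contiguous typed buffer - typically an order of
# magnitude faster on numeric data.
doubled_arr = arr * 2
print(doubled_arr[:3])  # [0 2 4]
```

That performance gap, plus interop with the rest of the scientific stack (scipy, scikit-learn, matplotlib all consume numpy arrays), is the "more" that list comprehensions don't give you.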
I found Pandera quite good for wrapping input/output expectations around Pandas. At the end of the day, the vectorisation of operations in it and other table-based formats means they're not easy to replace performantly.
> what's the consequences for HN of a user having their password compromised
HN does not enforce anonymity, so some accounts (many belonging to startup founders, btw) are tied to their users' real identities.
A compromised password could allow a bad actor to impersonate those users. That could be used to scam others or to kickstart some social engineering that could be used to compromise other systems.
Indeed, a consequence for the individual user could be spam posted from their account, but for scams, I'd guess that HN would fall back on its standard moderation process.
The question, though, was what the consequences are for HN rather than for individual users, as it's HN that would bear the cost of implementation.
Now, if a lot of prominent HN users started getting their passwords compromised and that led to a hit on HN's reputation, you could easily see that tipping the balance in favour of implementing MFA, but (AFAIK at least) that hasn't happened.
Now, ofc you might expect orgs to be proactive about these things, but having seen companies with actual financial data and transactions on the line drag their feet on MFA implementations in the past, I kind of don't expect that :)
I think this conversation would benefit from introducing scale and audience into the equation.
Individual breaches don't really scale (e.g. device compromise, phishing, credential reuse, etc.), but at scale everything scales. At scale then, you get problems like hijacked accounts being used for spam and scams (e.g. you can spam in comment sections, or replace a user's contact info with something malicious), and sentiment manipulation (including vote manipulation, flagging manipulation, propaganda, etc.).
HN, compared to something like Reddit, is a fairly small-scale operation. Its users are also more on the technically involved side. It makes sense, then, that due to the lower velocity and unconventional userbase, they might still have this under control via other means, or can dynamically adjust to the challenge. But on its own, this is not a durable technical trait. There's no hard and fast rule to tell when they cross the boundary into territory where adding manpower is less effective than just spending the days or weeks needed to implement better account controls.
I guess if I really needed to put this into some framework, I'd weigh the amount of time spent on chasing the aforementioned abuse vectors compared to the estimated time required to implement MFA. The forum has been operating for more than 18 years. I think they can find an argument there for spending even a whole 2 week sprint on implementing MFA, though obviously, I have no way of knowing.
And this is really turning the bean counting up to the maximum. I'm really surprised that one has to argue tooth and nail for the rationality of implementing basic account controls, like MFA, in 2025. Along with session management (the ability to review all past and current sessions, to retrieve an immutable activity log for them, and a way to clear all other active sessions), it should be the bare minimum these days.

But then, even deleting users is not possible on here. And yes, I did read the FAQ entry about this [0]; it misses the point hard. Deleting a user doesn't necessarily have to mean the deletion of their submissions, and no, not deleting submissions doesn't render the action useless, because as described, user hijacking can and I'm sure does happen. A disabled user account "wouldn't be possible" to hijack, however. I guess one could reasonably take issue with calling this user deletion, though.
It's interesting you suggest a two-week sprint for this. How large do you think HN's development team is? Do you know if they even have a single full-time developer?
I don't, but the lack of changes in the basic functionality of the site over the years I've used it makes me feel that they may not have any/many full-time devs working on it...
I really don't think the site is like this because they lack capacity. It's pretty clearly an intentional design choice in my view, like with Craigslist.
But no, I do not have any information on their staffing situation. I presume you don't either though, do you?
Indeed I don't. However, if we examine the pace of new features over the last several years (I can't think of a single way this site has changed in that time), it's reasonable to surmise that there isn't a lot of development of the user-accessible/visible portions of the site, and that leads me to guess they don't have much in the way of dev resources.
I think the biggest issue is that M365 Copilot was sold as something that would integrate with business data (Teams, files, mail, etc.), and that never quite worked out.
So you end up with a worse ChatGPT that also doesn't have work context.
Before that, Wall Street ran on Yahoo Messenger! They only stopped because Yahoo's new brand owners didn't understand the value of it and shut it down because there weren't enough teens signing up.
I never found Athena expensive. Compared to employment costs it will be minuscule.
And sometimes, if your query is CPU-intensive but the queried data size is not huge, you can get ridiculous value for money - many CPU-days of work in 10 minutes for just $5, if your query covers 1TB after partitioning.
Query size limits are also configurable.
Obviously it depends on what data you are working on, but not having to set up and pay for a computational cluster is a huge cost saving.
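Back-of-envelope on that pricing model - Athena bills per TB of data scanned ($5/TB is the list price at the time of writing), independent of how much compute the query burns:

```python
# Athena list price: $5 per TB of data scanned by the query.
# Compute time is not billed separately, which is why a
# CPU-heavy query over a small, well-partitioned dataset is cheap.
PRICE_PER_TB = 5.00

def athena_cost(tb_scanned: float) -> float:
    return tb_scanned * PRICE_PER_TB

print(athena_cost(1.0))   # 5.0  - 1 TB scanned after partition pruning
print(athena_cost(10.0))  # 50.0 - same query over 10 TB unpartitioned
```

This is also why partitioning matters so much: it directly shrinks the scanned-bytes number you're billed on.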
I bought MS Copilot licenses for my company around February 2023. They were sold on a yearly commitment and offered no trials. A bad deal, but I was afraid of missing out on AI productivity gains.
I'm definitely not renewing those. Times are hard and the value provided does not justify the cost.
I recently personally resubscribed to copilot after cancelling my subscription a couple months ago since it was not providing value. But now with the new/beta “Edit” mode and being able to specify to use o1, o1 mini, and sonnet 3.5, the $10 a month feels a lot more worth it. The edit mode has outperformed aider for me.
I've found Copilot to be really good at generating funny memes and haikus for specific issues/tasks where I work. The productivity gains come from not having to use Photoshop, leaving more time to browse websites like this one.
In some cases it is useful, like in Excel, where it can generate formulas or describe an approach to a problem - not unlike GH Copilot. The same goes for the AI assistant in the MS Power Automate editor.
But in other cases, like Word or Outlook, it is just a lousy summarizer and doesn't add much value.
Interesting, because I pay for Copilot and Supermaven, although most of my coworkers don't. To be fair, it took twisting their arms to get them to use linters, formatters, and other tools, so I think asking them to use AI auto-complete is a bit much right now.
Right, but Gitlab does have the excellent built-in pipeline editor that will visualize and validate your pipelines for you.
It can also render the complete pipeline config (making it easy to run and debug the problematic parts locally just by copying the relevant parts, even if they're hidden in an include somewhere).