Hacker News | jimmar's comments

I don't know that I'd trust IBM when they are pitching their own stuff. But if anybody has experience with the difficulty of making money off of cutting-edge technology, it's IBM. They were early to AI, early to cloud computing, etc. And yet they failed to capture market share and grow revenues sufficiently in those areas. Cool tech demos (like the Watson Jeopardy) mimic some AI demos today (6-second videos). Yeah, it's cool tech, but what's the product that people will actually pay money for?

I attended a presentation in the early 2000s where an IBM executive was trying to explain to us how big software-as-a-service was going to be and how IBM was investing hundreds of millions into it. IBM was right, but it just wasn't IBM's software that people ended up buying.


Xerox was also famously early with a lot of things but failed to create proper products out of it.

Google falls somewhere in the middle. They have great R&D but just can’t make products. It took OpenAI to show them how to do it, and they managed to catch up fast.


"They have great R&D but just can’t make products"

Is this just something you repeat without thinking? It seems to be a popular sentiment here on Hacker News, but really makes no sense if you think about it.

Products: Search, Gmail, Chrome, Android, Maps, Youtube, Workspace (Drive, Docs, Sheets, Calendar, Meet), Photos, Play Store, Chromebook, Pixel ... not to mention Cloud, Waymo, and Gemini ...

So many widely adopted products. How many other companies can say the same?

What am I missing?


I don't think Google is bad at building products. They definitely are excellent at scaling products.

But I reckon part of the sentiment stems from many of the more famous Google products being acquisitions originally (Android, YouTube, Maps, Docs, Sheets, DeepMind) or originally built by individual contributors internally (Gmail).

Then there were also several times where Google came out with multiple different products with similar names replacing each other. Like when they had I don't know how many variants of chat and meeting apps replacing each other in a short period of time. And now the same thing with all the different confusing Gemini offerings. Which leads to the impression that they don't know what they are doing product-wise.


Starting with an acquisition is a cheap way of accelerating once your company reaches a certain size.

Look at Microsoft - PowerPoint was an acquisition. They bought most of the team that designed and built Windows NT from DEC. FrontPage was an acquisition. Azure came after AWS and was led by a series of people brought in through acquisitions (Ray Ozzie, Mark Russinovich, etc.). It's how things happen when you're that big.


I think it's a little unfair to give DEC credit for NT. Sure, they may have bought the team, but they did most (all?) of the work on NT at Microsoft.

That's not like Google buying Android when they already had a functioning (albeit not at all polished) smartphone OS.


Why wouldn't you count things initially made by individual contributors at Google?


Because those were "free time" projects. Nobody was directed by the company to build them; somebody at the company, using their flex time, just thought it was a good idea and did it. Googlers don't get this benefit any more for some reason.


Because they're not a good measure of the company's ability to develop products based on the direction from leadership.


Leadership's direction at the time was to use 20% of your time in unstructured exploration and cool ideas like that, though good point of the other poster that that is no longer a policy.


Those are all free products, some of them are pretty good. But free is the best business strategy to get a product to the top of the market. Are others better, are you willing to spend money to find out? Clearly, most people are not interested. The fact that they can destroy the market for many different types of software by giving it away and still stay profitable is amazing. But that's all they are doing. If they started charging for everything there would be better competition and innovation. You could move a whole lot of okay-but-not-great cars, top every market segment you want, if you gave them away for free. Only enthusiasts would remain to pay for slightly more interesting and specific features. Literally no business model can survive when their primary product is competing with good-enough free products.


They come up with tons and tons of products like Google Glass and Google+ and so on and immediately abandon them. It is easy to see that there is no real vision. They make money off AdSense and their cloud services. That's about it.


Google does abandon a lot of stuff, but their core technologies usually make their way into other, more profitable things (collaborative editing from Wave into Docs; loads of stuff from Google+; tagging and categorizing in Photos from Picasa (I'm guessing); etc)


It annoyed me recently that they dropped support for some Nest/Google Home thermostats. Of course, they politely offered to let me buy a replacement for $150.


> Products: Search, Gmail, Chrome, Android, Maps, Youtube, Workspace (Drive, Docs, Sheets, Calendar, Meet), Photos, Play Store, Chromebook, Pixel ... not to mention Cloud, Waymo, and Gemini ...

Many of those are acquisitions. In-house developed ones tend to be the most marginal on that list, and many of their most visibly high-effort in-house products have been dramatic failures (e.g. Google+, Glass, Fiber).


I was extremely surprised that Google+ didn't catch on. The week before Google+ launched, me and all my friends agreed that Facebook is toast, Google will do the same thing but better, and everyone has a Gmail account so there will be basically zero barrier to entry. Obviously, we were wrong; Google+ managed to snatch defeat out of the jaws of victory, Google+ never got significant traction, and Facebook managed to keep growing and now they're yet another Big Evil Tech Corporation.

Honestly, I still don't really know how Google managed to mess that up.


I got early access to Google+ because of where I worked at the time. The invite-only thing had worked great for GMail but unfortunately a social network is useless if no-one else is on it. Then the real names thing and the resulting drumbeat of horror stories like "Google doxxed me to my violent ex-husband" killed what little momentum they had stone dead. I still don't know why they went so hard on that, honestly.


I think the sentiment is usually paired with discussion about those products as long-lasting, revenue-generating things. Many of those ended up feeding back into Search and Ads. As an exercise, out of the list you described, how many of those are meaningfully-revenue-generating, without ads?

A phrasing I've heard is "Google regularly kills billion-dollar businesses because that doesn't move the needle compared to an extra 1% of revenue on ads."

And, to be super pedantic about it, Android and YouTube were not products that Google built but acquired.


They bought YouTube but you have to give Google a hell of a lot of credit for turning it into what it is today. Taking ownership of YouTube at the time was seen by many as taking ownership of an endless string of copyright lawsuits that would sue them into oblivion.


YouTube maintains an independent campus from the Google/Alphabet mothership. I'm curious how much direction they get, as they (outwardly, at least) appear to run semi-autonomously.


Before Google touched Android it was a cool concept but not what we think of today. Apparently it didn't even run on Linux. That concept came after the acquisition.


That is because the DoubleClick parasite has long infected the host.


Notably all other than Gemini are from a decade or more ago. They used to know how to make products, but then they apparently took an arrow in the knee.


Didn't they buy lots of those actually?


And to my point, it took Apple to make the iPhone for Google to make Android.

It took OpenAI for Google to finally understand how to make a product out of their years if not decades of AI research.

YouTube and Maps are both acquisitions indeed.


Search was the only mostly original product. With the exception of YouTube (which was a purchase), Android, and ChromeOS, all the other products were initially clones.


Google had less incentive. Their incentive was to keep AI bottled up and brewing as long as possible so their existing moats in search and YouTube could extend into other areas. With OpenAI they are forced to compete or perish.

Even with Gemini in the lead, that lasts only until they extinguish ChatGPT or make it unviable for OpenAI as a business. OpenAI may lose the talent war and cease to be the leader in this domain against Google (or Facebook), but in the longer term its incentive to break fresh ground aligns with average user requirements. With Chinese AI just behind, maybe Google and Microsoft have no choice either.


Google was especially well positioned to catch up because they have a lot of the hardware and expertise and they have a captive audience in gsuite and at google.com.


The original statistical machine translation models of the 1990s, which were still used well into the 2010s, were famously called the "IBM models": https://en.wikipedia.org/wiki/IBM_alignment_models These were not just cool tech demos, they were the state of the art for decades. (They just didn't make IBM any money.)


Neither cloud computing nor AI are good long term businesses. Yes, there's money to be made in the short term but only because there's more demand than there is supply for high-end chips and bleeding edge AI models. Once supply chains catch up and the open models get good enough to do everything we need them for, everyone will be able to afford to compute on prem. It could be well over a decade before that happens but it won't be forever.


This is my thinking too. Local is going to be huge when it happens.

Once we have sufficient VRAM and speed, we're going to fly - not run - to a whole new class of applications. Things that just don't work in the cloud for one reason or another.

- The true power of a "World Model" like Genie 2 will never happen with latency. That will have to run locally. We want local AI game engines [1] we can step into like holodecks.

- Nobody is going to want to call OpenAI or Grok with personal matters. People want a local AI "girlfriend" or whatever. That shit needs to stay private for people.

- Image and video gen is a never ending cycle of "Our Content Filters Have Detected Harmful Prompts". You can't make totally safe for work images or videos of kids, men in atypical roles (men with their children = abuse!), women in atypical roles (woman in danger = abuse!), LGBT relationships, world leaders, celebs, popular IPs, etc. Everyone I interact with constantly brings these issues up.

- Robots will have to be local. You can't solve 6+DOF, dance routines, cutting food, etc. with 500ms latency.

- The RIAA is going door to door taking down each major music AI service. Suno just recently had two Billboard chart-topping songs? Congrats - now the RIAA lawyers have sued them and reached a settlement. Suno now won't let you download the music you create. They're going to remove the existing models and replace them with "officially licensed" musicians like Katy Perry® and Travis Scott™. You won't retain rights to anything you mix. This totally sucks and music models need to be 100% local and outside of their reach.

[1] Also, you have to see this mind-blowing interactive browser demo from 2022. It still makes my jaw drop: https://madebyoll.in/posts/game_emulation_via_dnn/


> You can't solve 6+DOF, dance routines, cutting food, etc. with 500ms latency.

Hopefully it's just network propagation that creates that latency, otherwise local models will never beat the fanout in a massive datacenter.


What you are saying is true. But IBM failing to see a way to make money off a new technology isn't actually news worth updating on in this case?


They were selling software as a service in the IBM 360 days. Relabeling a concept and buying Red Hat don't count as investments.


What is your reason for believing that IBM was selling software as a service in the IBM 360 days?

What hardware did the users of this service use to connect to the service?


Hardware was part of the service, obviously.


It is very misleading or outright perverse to write "they were selling software as a service in the IBM 360 days" when there was no public network that could be used to deliver the service. (There were wide-area networks, but each one was used by a single organization and possibly a few of its most important customers and suppliers, hence the qualifier "public" above.)

But anyways, my question to you is, was there any software that IBM charged money for as opposed to providing the software at no additional cost with the purchase or rental of a computer?

I do know that no one sold software software (i.e., commercial off-the-shelf software) in the 1960s: the legal framework that allowed software owners to bring lawsuits for copyright violations appeared in the early 1980s.

There was an organization named SHARE composed of customers of IBM whereby one customer could obtain software written by other customers (much like the open-source ecosystem) but I don't recall money ever changing hands for any of this software except a very minimal fee (orders of magnitude lower than the rental or purchase price of a System/360, which started at about $660,000 in 2025 dollars).

Also, IIUC most owners or renters of a System/360 had to employ programmers to adapt the software IBM provided. There is software with that quality these days, too (e.g., ERP software for large enterprises) but no one calls that software as a service.


Replying to myself:

>except a very minimal fee

the fee would be for membership in SHARE. The fee (if it even existed) would not have been passed on to the entity that paid to create the software.


> but it just wasn't IBM's software that people ended up buying.

Well, I mean, WebSphere was pretty big at the time; and IBM VisualAge became Eclipse.

And I know there were a bunch of LoB applications built on AS/400 (now called "System i") that had "real" web-frontends (though in practice, they were only suitable for LAN and VPN access, not public web; and were absolutely horrible on the inside, e.g. Progress OpenEdge).

...had IBM kept up the pretense of investment, and offered a real migration path to Java instead of a rewrite, then perhaps today might be slightly different?


Oh wow I didn’t know Eclipse was an IBM product originally. IDEs have come so far since Eclipse 15 years ago.

And while I’m writing this I just finished up today’s advent of code using vim instead of a “real IDE” haha


Websphere is still big at loads of banks and government agencies, just like Z. They make loads on both!


Markdown is the minimum viable product. It’s easy to learn and still readable if not rendered in an alternate format. It’s great.

For making PDFs, I’ve recently moved from AsciiDoc to Typst. I couldn’t find a good way to get AsciiDoc to make accessible PDFs, and I found myself struggling to control the output. Typst solves all of AsciiDoc’s problems for me.

But in the end, no markup language will make you write better. It’s kind of like saying that ballpoint pens are limiting your writing, so you should switch to mechanical pencils.


Yes, the author conflates two different use-cases.

Markdown is the answer for "how do we enable people that don't want to invest a lot of time into producing content that's somewhat better than plain text?".

It's not trying to solve the problem of "how do we enable people that are willing to invest time into learning to produce the best possible and most structured content possible?" and I doubt that there will be language that will serve both of those use-cases very well.


The problem in practice is that quickly one merges into the other. You start with a markdown readme, then you have markdown documentation for a small project. But then one day you need full documentation for your project with cross links, translations, accessibility. With Markdown you end up bolting these things on and each flavor does it a bit differently.

Perhaps some of the blame can be laid with the poor UX of technically superior systems. reStructuredText (apart from the terrible name) built with Sphinx can do impressive things but becomes a huge pain to configure. All the XML-based tools like DocBook are very complete but try to get started actually building something - apart from having to author them in XML (which is already a kind of punishment), then you have to figure out XSLT stylesheets and 2000s-era Java tools for processing them. And just look at the DocBook landing page! AsciiDoc has improved their onboarding recently but does have the issue of feeling like a markdown-ish alternative that's just a bit different for no clear reason.


> does have the issue of feeling like a markdown-ish alternative that's just a bit different for no clear reason.

AsciiDoc is older than Markdown. Kind of hard to design something to be the same as something that hasn't been invented yet.


One downside here is that as more and more tools focus on the first use-case, people start using those tools by default when they actually fall into the second use-case. And there's often a pretty high barrier to switching once you've produced a lot of content, so a bunch of projects are using the wrong one long-term.


Arguably having a ton of hard to write, hard to maintain docs is waaay worse than Markdown that gets attention in PRs (MRs).

Especially since the things in the article seem irrelevant compared to actually adding and handling non-text content IMHO. (Mermaid diagrams for example.)

Sure a validator would be nice, but that's why a simple preview is available in most collaboration platforms.


Djot is another interesting alternative that tries to make Markdown more parsable and coherent: https://github.com/jgm/djot#rationale


I hadn't heard of that before but it looks like it solves a lot of my complaints about markdown.

I hope it gains more momentum.


Unfortunately it doesn’t seem to have a formal spec.


typst looks interesting -- but how are you writing it? from what I looked at, it looks like there's an official web editor and a vscode plugin with limited support. this feels pretty limited, as someone who came in expecting something like obsidian.


I've started experimenting with Typst for a few documents, and here's my stack:

- Zed editor with Typst plugin

- Tinymist LSP settings turned on to render on save in Zed, see https://code.millironx.com/millironx/nix-dotfiles/src/commit...

- Okular open to the output document. Okular refreshes the document when changed on disk.

It's not as polished as say, LaTeX Workshop in VSCode, but it gets the job done.


I'm not aware of any limitations in the Tinymist plugin.

And you can just write it in the plain text editor of your choice, and keep an eye on the PDF with typst watch.


> I'm not aware of any limitations in the Tinymist plugin.

I looked into this a while ago, and couldn't find a workflow I could live with. Have things improved? What's the workflow like for working on an image in, say, OmniGraffle to include in the document? Does text search in embedded PDFs work these days? LinkBack so I can edit the images easily inline?


you can just install the typst compiler yourself and let it run in the cli

    typst watch file.typ  # recompiles automatically on file changes


You can write Typst in any editor you like, and the Typst compiler is available as FOSS.

I write Typst code from emacs personally


Typst really does look good. Can one get an editor with live PDF preview? It would be useful mainly for immediate feedback on markup correctness; then an HTML output ought to be "close enough".


Tinymist in VS Code does this out of the box (and looks like it can be set up in other editors). That or you can configure it to save out a new PDF automatically on save or as you edit the document and just open it in a PDF viewer that'll reload when the file changes.


LaTeX made me write better because of commenting above every paragraph.


What LaTeX helped me with was in taking more care of the content than the form/appearance.


AWS already documents a solution to self-host a NAT instance: https://docs.aws.amazon.com/vpc/latest/userguide/work-with-n...


I always find these discussions about AWS NAT gateways interesting because I recall way back in the day, before AWS had a managed NAT gateway, the recommendation was to roll your own anyway. Or at least that's what I heard. I took an A Cloud Guru course and one of the first EC2 lessons was to create a simple NAT gateway in your VPC so that your other instances could reach the Internet.


People might be fleeing public schooling because lawmakers are dictating what happens in the classroom. There are lots of good teachers who struggle with the resources given to them and the constraints imposed on them.

At home, parents can be flexible. They can let their kids use AI when appropriate or discourage its use. They don't have to wait for legislators to get involved. If there is a great math book, parents can just buy it instead of waiting for some committee to evaluate it.


> If there is a great math book, parents can just buy it

How do you know if the math book is great if there hasn’t been consensus about it? The problem isn’t the committee; that will always be there in some form. The problem is the politics the committee is used for. If the committee were to prioritize and offload their specific requirements for review instead of requiring substantial analysis twice then the school system would be just as quick.


> IBM anticipates that the first cases of verified quantum advantage will be confirmed by the wider community by the end of 2026.

In 2019, Google claimed quantum supremacy [1]. I'm truly confused about what quantum computing can do today, or what it's likely to be able to do in the next decade.

[1] https://www.nasa.gov/technology/computing/google-and-nasa-ac...


There's legitimately interesting research in using it to accelerate certain calculations. For example, usually you see a few talks at chemistry conferences on how it's gotten marginally faster at (very basic) electronic structure calculations. Also some neat stuff in the optimization space. Stuff you keep your eye on hoping it's useful in 10 years.

The most similar comparison is AI stuff, except even that has found some practical applications. Unlike AI, there isn't really much practicality for quantum computers right now beyond bumping up your h-index

Well, maybe there is one. As a joke with some friends after a particularly bad string of natural 1's in D&D, I used IBM's free tier (IIRC it's 10 minutes per month) and wrote a dice roller to achieve maximum randomness.
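For the curious, the dice-roller idea can be sketched without any quantum hardware. This is only a sketch under stated assumptions, not the code I actually ran: `classical_bit` is a hypothetical stand-in for one measurement of a qubit in equal superposition, and `roll` is an illustrative name. The interesting part is the rejection sampling, which turns fair bits into a fair d20 without modulo bias.

```python
import secrets
from typing import Callable


def classical_bit() -> int:
    """Stand-in bit source (assumption for this sketch).

    On real hardware this would be a single measurement of a qubit
    prepared in an equal superposition; here the OS random source is used.
    """
    return secrets.randbits(1)


def roll(sides: int = 20, bit_source: Callable[[], int] = classical_bit) -> int:
    """Roll a fair die using only fair bits, via rejection sampling."""
    if sides < 1:
        raise ValueError("a die needs at least one side")
    # Smallest number of bits that can represent 0 .. sides-1.
    n_bits = max(1, (sides - 1).bit_length())
    while True:
        value = 0
        for _ in range(n_bits):
            value = (value << 1) | bit_source()
        # Discard out-of-range draws instead of wrapping them, so every
        # face stays equally likely.
        if value < sides:
            return value + 1


if __name__ == "__main__":
    print("d20:", roll(20))
```

Swapping `bit_source` for a function that returns measured bits from a quantum backend would leave the rest unchanged.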


that was my understanding too - in the fields of chemistry, materials science, pharmaceutical development, etc... quantum tech is somewhat promising and might be pretty viable in those specific niche fields within the decade.


A decade from now quantum computing will be in the same place it was a decade ago: on the cusp of proving a quantum advantage for tailor-made problems in comparison to commonly available supercomputers. Classical compute will advance in that time period to keep the quantum computers always on the cusp.

The major non-compute related engineering breakthroughs needed for quantum computing to actually be advantageous in a way that would be revolutionary are themselves so revolutionary that the advancements of quantum computing would be vastly overshadowed. Again it's a case where those breakthroughs would so greatly enhance classic compute in terms of processing and reduction in costs that it still probably wouldn't be economically viable to produce general purpose quantum computers.


The trouble with quantum supremacy results is they disappear as soon as you observe them (carefully).

Sorry for that, but seriously, I'd treat this kind of claim like any other putative breakthrough (room-temperature superconductors spring to mind): until it's independently verified, it's worthless. The punishment for crying wolf is minimal and by the time you're shown to be bullshitting the headlines have moved on.

The other method, of course, is to just obsessively check Scott Aaronson's blog.


IBM countered that the 2019 case could be handled by a classical supercomputer [1].

The main issue is that these algorithms where today's early quantum computers have an advantage were specifically designed to be demonstration problems. All of the tasks that people previously wanted a quantum computer to do are still impractical with today's hardware.

[1] https://www.quantamagazine.org/google-and-ibm-clash-over-qua...


I installed it, entered one prompt, clicked the "Proceed" button, and got "Model quota limit exceeded."

Those quota limits brought me back down to earth quickly.


Especially since:

    There is currently no support for:

    Paid tiers with guaranteed quotas and rate limits
    Bring-your-own-key or bring-your-own-endpoint for additional rate limits
    Organizational tiers (self-serve or via contract)

So basically just another case of vendor lock-in. No matter whether the IDE is any good - this kills it for me.


I respect Troy Hunt's work. I searched for my email address on https://haveibeenpwned.com/, and my email was in the latest breach data set. But the site does not give me any way to take action. haveibeenpwned knows what passwords were breached, the people who breached the data know what passwords were breached, but there does not seem to be any way for _me_, the person affected, to know which passwords were breached. The takeaway message is basically, "Yeah, you're at risk. Use good password practices."

There is no perfect solution. Obviously, we don't want to give everybody an easy form where you can enter an email address and see all of the passwords it found. But I'm not going to reset 500+ passwords because one of them might have been compromised. It seems like we must rely on our password managers (BitWarden, 1Password, Chrome's built-in manager, etc.) to tell us if individual passwords have been compromised.


> there does not seem to be any way for _me_, the person affected, to know which passwords were breached

You should be using a unique randomly-generated password for each website. That way, one breach doesn't lead to multiple accounts getting hijacked AND you'll know which passwords were breached solely based on the website list. The only passwords I still keep in my head are:

  1. The password to my password manager
  2. The password to my gmail account
  3. The passwords for my full disk encryption

All of those passwords are unique and not used anywhere else. Everything else is in my password manager with a unique randomly generated password for each account. And for extra protection, I enable 2fa on any site that supports u2f/webauthn.

I used to reuse the same password for everything, and that led to a pretty miserable month where suddenly ALL of my accounts were compromised. I'd log in to one account and see pizzas I never ordered. Then I'd open uber and see a ride actively in-progress on the other side of the country. It was not fun.
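If it helps, here is roughly what "a unique randomly-generated password for each website" amounts to. A password manager does this for you; this is only a minimal sketch, and the names `ALPHABET`, `generate_password`, and `passwords_for` are made up for illustration. It uses Python's `secrets` module, which draws from the operating system's cryptographic random source.

```python
import secrets
import string

# Character set for generated passwords (an arbitrary choice for this sketch;
# some sites reject certain punctuation).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 24) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def passwords_for(sites):
    """Generate an independent password for every site name given."""
    return {site: generate_password() for site in sites}


if __name__ == "__main__":
    for site, password in passwords_for(["shop.example", "mail.example"]).items():
        print(site, password)
```

Because each password is generated independently, a breach at one site reveals nothing about the credentials used anywhere else.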


Yes! Me too. Not adding anything here except a confirmation on the above approach. You kind of need your email password as a "break glass" scenario. But mostly, you just need your password manager.


and root disk encryption, unless you have some alternative method set up.


That's the default in this day and age, no?


I mean, probably should be. But for me, no. Well, not my personal computer anyway. That's a mistake, I know. But corporate computer yes.

So no, I don't think "in this day and age" necessarily. And I believe that the vast majority of "normal" users don't do full drive encryption either. But yes, we should.


Last I looked, Windows and Mac installs both push the user to set up BitLocker or FileVault, respectively. You have to actively say no if you don’t want it.


I deliberately dodged there, as you noted. I do not have full disk encryption set up. I know that I'd probably have a very bad day if I were to lose my laptop, etc. I should do this, no doubt.

But I'm not sure. While maybe good password management is starting to soak into common computer usage, I don't think disk encryption is all that common just yet across the average user. It should be. But the average user is just moving to their phone anyway, with face id and encryption by default, instead of maintaining their own personal device.

Corporate devices seem to be a bit better in this regard, though.


Nice. Now I'd like to know WHICH password got leaked.

That way the breach impact can quickly be limited.

Troy probably would share that information for a price. Not sure whom to pay though - the "good" guy who won't say a word, or a criminal who will happily share it with me?

It's possible the latter would be cheaper too.


They don’t store email addresses with passwords in the database. That would be way too risky. These are separate databases, so you can look up your email address, and separately check a password.


I think for passwords they only store a hashed version.


Also if possible, use a unique email address for each site. I know that's not feasible for most people, and some sites (e.g. LinkedIn) are structured so that email addresses become linked, but it does provide useful isolation.


> It seems like we must rely on our password managers (BitWarden, 1Password, Chrome's built-in manager, etc.) to tell us if individual passwords have been compromised.

Yes.


If you read the instructions, you will discover https://haveibeenpwned.com/Passwords which will let you enter a password and securely check if it has been published in a breach.

If it has, it is either a simple password that multiple people are using, or a complex secure password that can make you pretty confident it is your password that has been published.

1Password just does the same thing for all of your passwords - it doesn’t check against your account name either. That information isn’t stored so they can’t become a new source of breached accounts (as explained at the site).


Letting me check my passwords one at a time is like letting me check my grains of rice individually for poison before eating.



There is also an API
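As far as I understand the public Pwned Passwords range endpoint, you send only the first five hex characters of the password's SHA-1 hash and get back every known suffix with a count, so the password itself never leaves your machine. Below is a minimal sketch under that assumption; the function names and the User-Agent string are my own, and `pwned_count` needs network access. Looping it over a password export would cover the "one at a time" complaint.

```python
import hashlib
import urllib.request
from typing import Tuple

# Range endpoint of the Pwned Passwords API (as I understand the public docs).
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str) -> Tuple[str, str]:
    """Return (first 5 hex chars, remaining 35) of the uppercase SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix: str, range_body: str) -> int:
    """Find our suffix in a response made of 'SUFFIX:COUNT' lines."""
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Ask the range endpoint how often this password appears in breaches."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    # The match is done locally; only the 5-character prefix was sent.
    return count_in_range(suffix, body)


if __name__ == "__main__":
    print(pwned_count("password"))
```

A result of 0 only means the password is not in that data set; it says nothing about which account or site a hit came from.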


The problem with breaches like the latest data set is that there's no source on where the breach came from, it's an aggregate from multiple breaches. They can't tell you that info because it's not in the initial data set.


> But the site does not give me any way to take action.

It gives you as much information as you should be given. Any more information would just be spreading around the hacked dataset.

It does give you an awful lot of information about the specific hacks that exposed your information, and what was the content of that exposure. You may have been owned, but the way you were owned doesn't really matter e.g. I don't care that my firstname.lastname@gmail.com was exposed as being me. I may not care that my username@yahoo.com account was exposed as being username at archive.org. If that's it, I can keep using them. But a lot of hacks are a lot worse, and you might have to rearrange things or close them down. haveibeenpwned gives you enough information to make all those decisions.

Also, your second paragraph seems to imply that the site doesn't tell you if passwords were compromised for an email address. It definitely does by identifying the hack and describing its extent. You don't need the actual password to know that you need to change it. Likely, the hacked site forced you to change it anyway.


Change the password for what account though? The dashboard doesn’t seem to list the actual website(s) linked to the email/password breached, so how am I to know which password to rotate?

If I follow the recommended best practice, I have a different password for every website or service. That could be hundreds of them. Am I supposed to rotate all of them every time there’s a breach?


You put your email in and the result is a website that got breached. Together this should give you enough information.


> It does give you an awful lot of information about the specific hacks

No it doesn't. Enter <old email address> → 5 data breaches → first one says:

> During 2025, the threat-intelligence firm Synthient aggregated 2 billion unique email addresses disclosed in credential-stuffing lists found across multiple malicious internet sources

It doesn't tell me which site or which of the many passwords used together with that address. Just that it has been in a generic data dump.


So it gives me the information that my email has been exposed.

Where? In what service? Did my password get leaked too? I can't change password / delete the account if I don't know where.

Did any other data get leaked? Anything sensitive? Do I have to cancel my credit card? Were any files leaked as well? My home location?

At this point HIBP is next to useless.

And how would showing me WHAT is in the database about the email I proved I own be spreading it? At this point if I want to learn it I need to either try to find the torrent with it (spreading it further!) or pay the criminals.


Btw they are not storing more info alongside the email address, because that would be way too risky. Just imagine the HIBP database being leaked.

Also, they don’t always know where your info has leaked. Some datasets are aggregates.


This information is given for each of the leaked incidents. Troy also explains this in his blog post.


At one point I responded to a haveibeenpwned notice by immediately having the user reset a password.

I've got over 200 users in a domain search (edit: for this particular incident), and nearly all of them were in previous credential breaches that were probably stuffed into this one. I'm not going to put them through a forced annoyance given how likely it is the breached password is not their current one, and I'm urging people to start moving in this direction unless you obtain a more concrete piece of advice.


Same here: reset on first breach (ROFB), but on subsequent ones only if it is not a collection, e.g. a new infostealer breach.



This doesn't help. If the email address check says the address has been exposed, it doesn't tell you which password used together with it has been exposed. Was it one from 10 years ago you don't even remember? Or one that's still actively in use? Which one of my hundreds of passwords?


You can use the API to check all of your passwords. Then you'll know the security state of all of your passwords.

https://haveibeenpwned.com/API/v3


Doesn't help. Some accounts are old and may not be in my current PW DB. Or they were memorized, or forgotten.

If the thing suggests the EMAIL (+ associated password) has been compromised for some unknown account then to do a risk assessment I would have to find which account it belongs to, not which currently-in-use passwords match the same datasets.

Those are different queries, providing different bits of information.


Here's what I'm suggesting: query all your current passwords against the password API. Then you'll know which of your current passwords are compromised. Change them.

You don't need to query old passwords, only current passwords. If you're talking about accounts that you've forgotten the password to: then do you care about those accounts? If yes, probably best to do a password reset and set a new password. If you don't care about the account, then why bother?

As for why HIBP doesn't provide an API linking passwords to emails: HIBP has no database that links passwords and emails. So they can't provide any way to query that. They don't want to be in the business of linking passwords to emails.


Of course it helps.

How's this for making it actionable:

Regardless of whether or not someone can associate it with your email, if your password has been seen in the wild, change it.

There you go.


It doesn't matter, don't use passwords that have been compromised. Period.


my password: 2,408

password: 46,628,605

your password: 609

good password: 22

long password: 2

secure password: 317

safe password: 29

bad password: 86

this password sucks: 1

i hate this website: 16

username: 83,569

my username: 4

your username: 1

let me login: 0

admin: 41,072,830

abcdef: 873,564

abcdef1: 147,103

abcdef!: 4,109

abcdef1!: 1,401

123456: 179,863,340

hunter2: 50,474

correct horse battery staple: 384

Correct Horse Battery Staple: 19

to be or not to be: 709

all your base are belong to us: 1


Spaces are skewing the numbers lower. Remove them from any of those and see the number increase at least an order of magnitude. That “let me login” goes from 0 to 4,714 just by removing spaces (“letmelogin”).


I guess this means passwords with spaces are safer!


correcthorsebatterystaple (no spaces) 4,163


Password2020: 109,729

Edit:

louvre: 7,219


> all your base are belong to us: 1

Only 1, really?


Because of the spaces.

Without spaces, it's 681.


I was trying random phrases just out of curiosity, and couldn't help but chuckle when it said "epsteinfiles" wasn't found :-)


[flagged]


You can check against the API with just the first characters of your hashed password (SHA-1 or NTLM), for example: https://api.pwnedpasswords.com/range/21BD1 or you can download the entire dataset.
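A minimal Python sketch of that range lookup, assuming the SHA-1 variant and assuming the response is plain text with one `SUFFIX:COUNT` line per hash (the same shape as the line quoted further down the thread). The function names and the `User-Agent` string are made up for illustration; only the endpoint URL comes from the comment above.

```python
import hashlib
import urllib.request

# Endpoint quoted in the comment above; everything else here is illustrative.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_sha1(password: str) -> tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(body: str, suffix: str) -> int:
    """Look for `suffix` in a response made of SUFFIX:COUNT lines; 0 if absent."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Query the range endpoint; only the 5-character prefix leaves the machine."""
    prefix, suffix = split_sha1(password)
    # Assumption: a User-Agent header is sent in case the server expects one.
    request = urllib.request.Request(
        RANGE_URL + prefix, headers={"User-Agent": "hibp-range-sketch"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_in_range(response.read().decode("utf-8"), suffix)
```

Calling `pwned_count("hunter2")` would make one HTTPS request; the comparison against the returned suffixes happens locally, so the full hash is never sent.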


How can you download the entire dataset?


You can download the entire dataset using curl (will be 40+ GB)

    curl -s --retry 10 --retry-all-errors --remote-name-all --parallel --parallel-max 150 "https://api.pwnedpasswords.com/range/{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}"


It's not that I couldn't have written that oneliner, it's that I assumed you'd get blocked very quickly.


It is officially recommended by Troy Hunt: https://github.com/HaveIBeenPwned/PwnedPasswordsDownloader/i...


That speaks to a certain confidence in one's servers' ability to hold up under load, doesn't it?

"Oh you want your own copy? Sure, just thrash seven shades of shit out of the database. Here's how."


It's not a database, it's just files. And they are hosted by Cloudflare so they can cope with a lot of downloads.

I think he should make the files smaller by removing the second half of the hashes, i.e. reduce them from 40 hex digits to 20. This increases the chance of a false positive (i.e. I enter my password, it says it was compromised but it wasn't, it just has the same hash as one that was) from 1 in 10^48 to 1 in 10^24 (per password), but that's still a huge number. (There are fewer than 10^10 people in the world, and they only have a few passwords each.) This would approximately halve the download, maybe more, because the first half of each hash is more compressible (when sorted) while the second half is totally random.
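The orders of magnitude quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it treats hashes as uniformly random, and the list size `N_HASHES` is a hypothetical number picked for illustration, not a figure from the dataset.

```python
def hash_space(hex_digits: int) -> int:
    """Number of distinct values a hash of this many hex digits can take."""
    return 16 ** hex_digits


def false_positive_chance(n_hashes: int, hex_digits: int) -> float:
    """Rough chance that a password not in the list still matches one of
    n_hashes stored hashes, treating hashes as uniformly random."""
    return n_hashes / hash_space(hex_digits)


print(f"{hash_space(40):.2e}")  # 1.46e+48, i.e. about 10^48
print(f"{hash_space(20):.2e}")  # 1.21e+24, i.e. about 10^24

# HYPOTHETICAL list size, chosen only for illustration.
N_HASHES = 10 ** 9
print(false_positive_chance(N_HASHES, 20))
```

Note that the "1 in 10^24" figure is the chance of colliding with one particular stored hash; the chance of colliding with any entry scales with how many hashes the list holds, which is what the second function approximates.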


> It's not a database, it's just files. And they are hosted by Cloudflare so they can cope with a lot of downloads.

Database: a usually large collection of data organized especially for rapid search and retrieval (as by a computer) [1]

It is a database. Stop nitpicking.

[1] https://www.merriam-webster.com/dictionary/database


Confidence in Cloudflare, for sure.


That's crazy, thank you.


You are being purposefully obtuse here. HIBP is a very, very well established site with a long history of operating in good faith.


> > It's not that I couldn't have written that oneliner, it's that I assumed you'd get blocked very quickly.

> junon https://news.ycombinator.com/user?id=junon

> You are being purposefully obtuse here. HIBP is a very, very well established site with a long history of operating in good faith.

Allowing people to query and someone downloading the entire dataset is normally considered abuse, so being blocked is the expectation here. You're so dense you're bending light around you.


Several open source tools can be found on GitHub, but here’s the “official” one https://github.com/HaveIBeenPwned/PwnedPasswordsDownloader


Second line I already notice:

> 000F6468C6E4D09C0C239A4C2769501B3DD:5894

... Does the 5894 mean what I think it does?


I remember when I was searching the file for some passwords my friends and family use, it took me a while to work out that number too. There are some passwords that many people seem to independently come up with and think must be reasonably secure. I suppose they are to the most basic of attacks.


5894 means that the password appeared 5894 times in the dataset.

5894 is not the password associated with the hash.


Yes, it did mean what I thought, then.

But I guess some passwords appear far more often than that in the dataset.


Some passwords are far more commonly used than others; that isn't surprising.


HaveIBeenPwned has been around for ages and it does not send your password to the server - you can check it with the browser console. It hashes it, sends a range of the hash to the server, server replies with a list of hashes that match that range and it's checked locally for a match.


Still, I would not trust that. The password could be leaked through other means, for example by setting a timer, and exfiltrating fragments of it across future requests.

The website loads some external fonts and spits out many warnings in the console by default. Does not instill confidence in the truly paranoid hacker.


You can hash yourself and check against the api with 5 lines of python


That level of care is warranted, but you'll find that you are given the tools to audit and it will pass.


You can check it yourself by looking up the hash prefix and searching for your hashed password.


Man, there's a ton of non-obvious ways they could exfiltrate that. I'm not going to read their code.


If I were going to provide my passwords to any random person on the internet, Troy Hunt might be close to the top of the list, but I think your sentiment is sensible.

I remember searching the dataset being fairly straightforward. It's been a while since I've done it, but I think I just downloaded the text file and then grepped it for hashes of my passwords, but I see people doing much more useful things:

https://medium.com/analytics-vidhya/creating-a-local-version...
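The "download and grep" approach could look roughly like this in Python. It is a sketch under one loud assumption: that the downloaded text file has one `FULLSHA1HASH:COUNT` entry per line. The function names and file path are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_upper(password: str) -> str:
    """Uppercase SHA-1 hex digest of the password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def find_pwned(hash_file: Path, passwords: list[str]) -> dict[str, int]:
    """Stream the file once and return {password: count} for every match.

    Assumes each line looks like FULLSHA1HASH:COUNT.
    """
    wanted = {sha1_upper(p): p for p in passwords}
    found: dict[str, int] = {}
    with hash_file.open("r", encoding="ascii") as handle:
        for line in handle:
            digest, _, count = line.strip().partition(":")
            if digest.upper() in wanted:
                found[wanted[digest.upper()]] = int(count)
    return found
```

Nothing leaves the machine here, so this works offline once the file is on disk. A linear scan of tens of gigabytes is slow; if the file is sorted by hash, a binary search would be the obvious next step.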


You can download all the hashes and check against them locally. https://github.com/HaveIBeenPwned/PwnedPasswordsDownloader


> Passwords are protected with an anonymity model, so we never see them (it's processed in the browser itself), but if you're wary, just check old ones you may suspect.

That could mean one might be able to disconnect from the internet while checking.


No, it doesn't mean that, that's ridiculous. How would that work? Magic?


Download all the hashes first - not practical.


It's more practical than you may think. Just needs about 40 GBs right now. I did it a couple years back in a fit of peculiar paranoia, downloaded the full hash list and checked all my KeePass-stored passwords at that time against it.

https://github.com/HaveIBeenPwned/PwnedPasswordsDownloader


The above post https://news.ycombinator.com/item?id=45840724 links to 71.3 KiB of data; since it's a 5-nybble prefix (20 bits) we may easily estimate a size of 71.3 GiB assuming that's a representative sample. Not unfeasible nowadays, but it seems you do have to make separate requests and would presumably be rate-limited on them.
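The estimate above is simple enough to spell out. A small Python sketch, assuming (as the comment does) that the one 71.3 KiB range file is representative of all the others:

```python
PREFIX_NYBBLES = 5   # hex characters in each range prefix
SAMPLE_KIB = 71.3    # size of the one range file cited above

# Every 5-hex-digit prefix is its own range file: 16^5 = 2^20 of them.
range_files = 16 ** PREFIX_NYBBLES

# KiB -> GiB is a division by 1024^2, which equals the number of files,
# so the total in GiB matches the per-file size in KiB.
total_gib = SAMPLE_KIB * range_files / 1024 ** 2

print(range_files, total_gib)
```

Other comments in the thread put the full download at around 40 GB, so the sampled range may simply be larger than average.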

If you only download the hash pages corresponding to passwords you hold, even supposing that everything else is fully compromised, an attacker would have to reverse a couple thousand SHA-1 hashes, dodge hash collisions, and brute-force with the results (yes, yes: arson, murder and jaywalking) to pwn you.


One possible solution could be to give you an option to send the affected password as a list to the mail address you specify, then only people with access to that mail address will see them


Hash of the affected password? People share these things and don't always run their own mail servers.


That would be a great idea!


The details about the “Stealer Logs” on the dashboard even state:

> The websites the stealer logs were captured against are searchable via the HIBP dashboard.

There is no way to use the HIBP dashboard to figure out what domains my email address appears against.

Am I meant to change all passwords associated with that email address? Or do I need to get a paid subscription to query the API to figure out exactly what password(s) to change?

This has always confused me. On the one hand, HIBP is an invaluable service, but, on the other, it does nothing more than stating you’re in trouble, with no clear way forward.


It's quite certainly an upselling attempt. I once spent a couple of hours trying to see what was actually exposed in the infostealer breach my email appeared in (e.g. payment data? Physical address? Government ID?), to no avail.

This service is toxic tbh.



Respectfully, in context of my claim (that this is upselling attempt), your answer is untrue.

"You need an active subscription in order to provision an API key".

This is minimum $4.50 pm. Of course it's not a lot but let's not move the goalposts by discussing whether it's a fair price or not.

I don't want to say it's a lie, because I assume you didn't know.

API is a paid service, not free.

Separately, if I open the dashboard link while being logged out, the Web page promises:

"viewing stealer log entries that captured your email address"

Needless to say, this is also false (maybe true with a paid subscription?). If I click on the Stealer Logs in the dashboard it only shows "discord.com" (old account I used with this email was deleted years ago), and nothing else. Even though Breaches suggests there's something else.

This is not "logs" by any stretch of imagination.


You don't need a paid subscription. The API is free.

https://haveibeenpwned.com/API/v3



Only if you want to search by account. If you want to search by password, it's free. You can query all your passwords to see which ones are breached, and change those.

> Authorisation is required for all APIs that enable searching HIBP by email address or domain, namely retrieving all breaches for an account, retrieving all pastes for an account, retrieving all breached email addresses for a domain and retrieving all stealer log domains for a breached email addresses. There is no authorisation required for the free Pwned Passwords API.

And searching by account wouldn't tell you anything useful. It would just say "Synthient Credential Stuffing Threat Data". It wouldn't tell you what password to change, because HIBP doesn't know what site the password(s) that it found in "Synthient Credential Stuffing Threat Data" were associated with, and HIBP doesn't maintain a database linking passwords to emails.


The only part of the API that is free is the passwords API, which would not help for this use case.

Every other endpoint requires a subscription. This is very far from “The API is free”.

> searching by account wouldn't tell you anything useful

The API can return the domains listed in stealer logs for a specific email address: https://haveibeenpwned.com/API/v3#StealerLogsForEmail


Sorry, I missed that you were talking about stealer logs. This specific credential dump of 2B emails wasn't a stealer log, so stealer log info will not tell you anything about this specific credential dump.

You're right that the API for stealer log info isn't free.

However, the dashboard can provide you information about stealer logs for free.

https://haveibeenpwned.com/Dashboard#StealerLogs


Yeah and I am confused by his new setup private vs business. I got that mail too but can simply not see what addresses were affected by that breach.


What? You expect the guy to tell you your password? Lol, lmao even.

I know roughly what passwords were exposed because either I remember it, or the date of the leak or the associated email.

I know simple passwords are almost public and that leaks of say linkedin will be properly hashed, while a vb forum from 2006 might not be.


Per https://www.truthwave.com/legal/terms-of-service

> 8.6 Indemnification. If you behave in a way that gets us in legal trouble, we may exercise legal recourse against you. You agree to indemnify, defend (if we so request), and hold harmless TruthWave and our officers, directors, suppliers, partners, and agents from and against any third-party claims, demands, losses, damages, or expenses (including reasonable attorney fees) arising from (a) the content you post or submit, (b) your use of the Services (c) your violation of these Terms, or (d) your violation of any rights of a third party. Your indemnification obligation will survive the termination of these Terms and your use of the Services.

So if I submit a tip to TruthWave, and they get sued, I'm on the line to pay for TruthWave's legal defense? Yeah...no.


Conservatives want to cut taxes to force liberals to cut spending. Liberals want to increase spending to force the conservatives to increase taxes. But voters don't like increased taxes (on themselves) or decreased spending. So we end up with the worst of both worlds--higher spending without the tax revenue to afford it. Then we get to watch each side win the war of opinion on cable news. They throw each other under the bus, eventually come to a deal, and both sides claim victory. Rinse. Repeat.


> Conservatives want to cut taxes to force liberals to cut spending. Liberals want to increase spending to force the conservatives to increase taxes.

This is straight propaganda. Both parties increase spending. One party cuts taxes for the rich, the other increases taxes on the rich.

Conservatives haven't been "small government" and haven't cut taxes for folks other than the rich in a long, long time. Even calling them "small government" is a misnomer, because that propaganda is about privatization, not reducing costs (private services are less efficient and more costly than the government service they replace!)

Let's stop calling the problem "both-sides". One side is considerably worse for the economy, and you and I know it's the conservatives.


A 'cost' paid for by a private service directly translates into prices and salaries. Whereas a government wealth transfer program appears as taxation and handouts. Everyone likes the former. No one likes the latter. Why this is still confusing in 2025 is beyond me


This is why I say it's propaganda.

The same underlying service is being done. It hasn't been cut. It's been shifted from a government service, to a private industry. The government pays roughly the same amount they were previously paying (or in a lot of cases slightly more, with the promise of paying less in the future), the private company provides the service as cheaply as they can, and takes a cut of the cost for profit, benefiting a small number of people, while providing a worse service level for tax payers.

This isn't a capitalism vs socialism thing. My issue with it is that privatization is blatant corruption sold as "capitalism". There's no capitalism here, because there's no competition, outside of the bidding on the contract. Competing on who can provide the cheapest service doesn't improve the service; in fact, it reduces the service quality to maximize profit. Reducing service quality to maximize profit would be fine, from a capitalism point of view, if others were offering the same service to the customer.


It really depends on the industry, but 'bidding for a contract' does not entail a lack of competition. Yes, for certain industries, like utilities, there is no competition based on the way things are set up (and really based on the foreseeable way in which things could be set up). For things such as requisitions of commodity items, then competition is not only possible but preferable. So I disagree with your blanket characterization of things. People really need to have more nuance when discussing these things. Privatizing railroads is different than privatizing food processing.


I'm saying it's not competitive because they're not competing to provide a better service, they're competing to provide the same service at a lower cost, but the quality of the service is effectively always worse than the original government provided service. That isn't capitalism.

Capitalism would be to provide multiple options to the users of the service, and have the providers compete against each other in a proper market.

I used to work for the government (Naval Oceanographic Office), and I worked with the contracting agencies on areas that had been privatized and it was a nightmare. Every few years you'd have multiple companies bid to run the service, but for the most part the same contractor would win the bid because they wrote the software in such a way that only they could run it. It had virtually no documentation, had piss-poor processes wrapping it, and the subject matter experts worked for the contracting agency. When the contract did change, everything would grind to a halt. For sure, that was more expensive than the original government provided service, but once something is privatized, it can never go back.

I agree we need to have more nuance here. You for some reason think I'm suggesting that "things such as requisitions of commodity items" shouldn't be private, which is not at all what I'm saying. I'm saying that existing government provided services, like the post office, for example, are run cheaper and more effectively by the government, and turning services like these private is for the sake of corruption.


> Conservatives... haven't cut taxes for folks other than the rich in a long, long time.

This is a lie:

https://www.nytimes.com/2019/04/14/business/economy/income-t...

> private services are less efficient and more costly than the government service they replace!

Yes, this is why capitalism famously collapsed in the 90s, and all of the formerly capitalist countries socialized their economies!


> The tax savings were relatively small for many families, however. The middle fifth of earners got about a $780 tax cut last year on average, according to the Tax Policy Center.

> The top 20 percent of earners received more than 60 percent of the total tax savings, according to the Tax Policy Center; the top 1 percent received nearly 17 percent of the total benefit, and got an average tax cut of more than $30,000. And that’s not even factoring in the law’s huge cut to corporate taxes, which disproportionately benefit the wealthy households that own the most stock.

Don’t be part of the problem Marcus. The reality is cutting taxes for the poor by $2/day, for the rich by $80/day, and telling everyone they got a tax break with a straight face… while you simultaneously cut services, issue policies that cause inflation, and levy taxes domestically on the poor through tariffs is the republican way!


This makes me laugh.

> The tax savings were relatively small for many families, however. The middle fifth of earners got about a $780 tax cut last year on average, according to the Tax Policy Center.

If you take someone who pays a small amount of taxes (the middle fifth paid $2170 in taxes in 2017), and give them a big tax cut ($780 in savings would mean they got a ~30% cut), the number is still small. Pretending that this is insignificant is just goofy.

> The top 20 percent of earners received more than 60 percent of the total tax savings.

People who pay the most taxes get the most out of tax cuts? Scandalous! Income taxes paid by quintile:

Lowest: $-476

Fourth: $-677

Third: $2170

Second: $6952

First : $31,132

https://fred.stlouisfed.org/series/CXUFEDTAXESLB0102M

https://fred.stlouisfed.org/series/CXUFEDTAXESLB0103M

https://fred.stlouisfed.org/series/CXUFEDTAXESLB0104M

https://fred.stlouisfed.org/series/CXUFEDTAXESLB0105M

https://fred.stlouisfed.org/series/CXUFEDTAXESLB0106M


Lowest and fourth got a tax increase. Middle class got a minor tax cut, the middle-upper class got a decent cut, and the upper class got a large cut.

It's also worth noting that the cut in question also had temporary provisions that expired, causing a tax increase for nearly all brackets except the upper class.

There has also been a massive cut in social services, which primarily affect the lowest/fourth brackets, which means they're paying more taxes for fewer services.

It's hard to interpret that "tax cut" in a way that doesn't scream "we're increasing taxes on most, and cutting services, to give the wealthy a tax cut".


It’s your link buddy, just quoting your own source.

$780 is insignificant.

Tax policy is written by humans, and they can do what they like with it. If you want to cut taxes for the poor, you do. If you want to cut taxes for the ultra wealthy, but make sure the statistics say poor people got a tax cut, you can do that too.

If you paid 1M in taxes, a 30% cut is 300k, and if you paid 1k it’s $300. One person will buy some bitcoin or a Porsche, the other will be lucky to buy some gas and groceries.

Both got a 30% tax cut, but it would be goofy to claim they have equivalent value.

If you wanted to be an honest person, maybe you correct the poster that there were tax cuts for the poor, but also point out that the cuts heavily favored the wealthy. Which by the way, was their argument.


> $780 is insignificant.

Yeah, just stating your opinion isn't an argument.

> Tax policy is written by humans, and they can do what they like with it.

Insightful!

> If you want to cut taxes for the poor, you do.

What taxes? The poor pay negative income taxes. Did you read the post you're responding to?

> If you want to cut taxes for the ultra wealthy, but make sure the statistics say poor people got a tax cut, you can do that too.

I would love for the poor to pay zero taxes. It would be an improvement over the amount they "pay" now!

> If you paid 1M in taxes, a 30% cut is 300k, and if you paid 1k it’s $300. One person will buy some bitcoin or a Porsche, the other will be lucky to buy some gas and groceries.

Okay.

> If you wanted to be an honest person

You don't have to seethe, you know. You can be wrong without letting everyone know that you're miserable and angry about an internet post.

> ... maybe you correct the poster that there were tax cuts for the poor, but also point out that the cuts heavily favored the wealthy. Which by the way, was their argument.

You're almost caught up! Now what argument was I making in response? If you tried to understand instead of trying to misunderstand (or worse, just vomiting angry words without any substance to them), you might learn something!


As an outside observer: it's your comment that seems seething and out of place on HN, not theirs.


The USA has no negative income tax. There are programs like the EITC which provide benefits to the poor and can be larger than their tax burden depending on the specific circumstances.

The EITC was initially signed into law by Ford (R) and expanded by Reagan (R). Reagan apparently called it "the best anti-poverty, the best pro-family, the best job creation measure to come out of Congress".

I'm sure you knew all this, so thanks for being honest in this post about the fact that you would like to dismantle this particular social safety net.

Seething comment sounds like projection btw, I'm not mad. The whole point of HN is to have the discussion expand in detail. Seems like it's working:

- Someone generalized
- You called them a liar
- We found out the generalization wasn't strictly correct, but basically true in spirit: the wealthy received the majority of the benefit, the poor got a small token for the sake of statistics / sound bites.


> The USA has no negative income tax.

They're called refundable tax credits. They result in people being net recipients of the income tax after refunds are paid out. This is a negative income tax.

> Someone generalized - You called them a liar

They didn't "generalize", they made a claim which is literally and undeniably untrue. That is a lie.

> We found out the generalization wasn't strictly correct, but basically true in spirit.

We found that the people he claimed didn't get a tax cut actually got a 30% tax cut. That's an obvious, blatant lie.


Putting aside for a moment that 30% of nothing is nothing, where are you even getting this idea that someone got a 30% tax reduction?

Look at the plot you shared "Personal Taxes: Federal Income Taxes by Quintiles of Income Before Taxes: Third 20 Percent (41st to 60th Percentile)"

2015: 1854

2016: 1954 (+100)

2017: 2170 (+216)

2018: 2676 (+506)

2019: 2519 (-157)

What about these numbers makes you think that the third quintile on average got a 30% tax cut?


The underlying belief of both is that America is exceptional thus will magically be saved from debt.

Now America is exceptional for many reasons, but if we don't fix our debt we will meet the same unexceptional fate as many empires before us.


U.S. government debt is the safest investment you can make today. If you "write it off" by simply telling debt holders you won't pay them, you'd make U.S. government debt the most unsafe investment possible. Nobody would ever buy U.S. government debt. That, and peoples' pensions and other retirement accounts that hold U.S. government debt would be hammered.

