Hacker News

> When I am searching for something, I usually want to find primary sources.

And therein lies the rub: for years now, Google's search results have been useless SEO garbage. For now, it definitely seems like an LLM answer is better than what was being returned, and I guess this is the reason why Google ripped it out.


An LLM answer is not "better", it's in a completely different category. LLM answers can be useful for topics where you can easily verify a fact (e.g., if you ask for a Linux command and it gives you one, you can run it and see if it did what you wanted), or for topics which are more opinion than pure fact ("list some trade-offs between decision A and decision B"). But when you want information that's provided by some authoritative source, you want to see it from that source.

Google Search has been terrible for a long time. But you could still dig through it and find those primary sources. That is, in my opinion, the primary purpose of a search engine. Replacing it with what an LLM has invented based on ingesting both reliable and unreliable sources is not viable for a large category of things. The main way we can judge the reliability of something is to look at where it comes from. If I'm looking for, say, official US job market statistics, whether I trust the numbers I find depends on whether I find them published on a US government website or on a random person's blog. A number presented to me by a chatbot gives me no way to judge its source, so it's useless.

The best a language model could possibly do, by definition, is to find websites and link me to them, letting me judge their credibility. But then it's just a worse search engine.


> But you could still dig through it and find those primary sources. That is, in my opinion, the primary purpose of a search engine.

And you are in a small minority. People go to Google to get answers, not to look for articles and then hunt for the answers in those articles.


Yeah, all you need is answer-shaped text. Why would the truth of that answer matter at all?

Personally I think I've developed a pretty good sense of when a question is easy enough that I can just trust the AI overview, and when I need to dig deeper. Google's original AI overviews were not reliable enough to ever trust, but now they are usually accurate summaries of the cited sources.

Job market statistics are actually probably a strong point for the AI overview. I just Googled 'us job market last month' and got an AI overview that accurately summarized a New York Times article for qualitative information ("surprisingly strong 115,000 jobs", "no-hire, no-fire"), followed by accurately summarizing the official Bureau of Labor Statistics website for raw stats, followed by some other stuff I didn't check. Not everyone would prefer The New York Times' take, but the citation prominently displays their name and logo, so you can tell what you're getting.

Weak points are when the topic is obscure enough that the AI overview conflates two different things or overgeneralizes, or trusts the wrong sources.


If Google can't filter out the SEO spam from their results, why do you think they did it for the LLM training data?

The training process literally ingests the majority of text on the internet, including a huge volume of SEO garbage, and seeks to create a self-consistent compressed model of that. This is far from perfect, of course, but the result is also likely more truthful than the median Google result, because the reward function, as well as the RL stage, creates an incentive for self-consistency and coherence.

Imagine that you had 1,000 years to read every Google result on a particular topic, and literally infinite patience. You would read a lot of rubbish, but ultimately you are a smart person: you would figure out the underlying truth and likely produce something more valuable than the average, or even the sum, of the parts.


Honestly, this feels like wishful thinking. If they could do it at all, they could do it to fix search.

Why are you assuming that they want to filter out the SEO spam?

It's a new frontier and people have not targeted it yet?


