Basically, it's the same math as modern automated manufacturing: a super expensive and complex build-out, then a money printer once it's running and optimized.
I know there's a lot of bearish sentiment here. Lots of people correctly point out that this is not the same math as FAANG products, then jump to the conclusion that it must be bad.
But my guess is these companies end up with margins better than Tesla's (a modern manufacturer), yet below the 80-90% of "pure" software. Somewhere in the middle, which is still pretty good.
Also, once the Nvidia monopoly gets broken, the initial build-out becomes a lot cheaper as well.
And if you ever step off the treadmill and jack up prices to reach profitability, a new upstart without your sunk costs will immediately create a 99% solution and start competing with you. Or, more likely, hundreds of competitors. As we've seen with Karpathy and Murati, any engineer with pedigree working on the frontier models can easily raise billions to compete.
Expect the trend to pick up as the pool of engineers who can create usable LLMs from scratch increases through knowledge/talent diffusion.
The LLM scene is an insane economic bloodbath right now. The tech aside, the financial moves here are historic. It's the ultimate wet dream for consumers: many competitors, face-ripping capex, any misstep quickly punished, and a total inability to hold anything back from the market. Companies are spending hundreds of billions to put the best tech in your hands as fast and as cheaply as possible.
If OpenAI didn't come along with ChatGPT, we would probably just now be getting Google Bard 1.0 with an ability level of GPT-3.5 and censorship so heavy it would make it useless for anything beyond "Tell me who the first president was".
We have been running this playbook for the last 2 years in healthcare, and we have been super successful. Doubling every quarter over the last year. 70%+ profitability, almost 7 figures of revenue. 100% bootstrapped.
People are still mentally locked into the world where code was expensive. Code now is extremely cheap. And if it is cheap, then it makes sense that every customer gets their own.
Before, we built factories to give people heavy machinery. Now, we run a 3D printer.
Every day I thank the SV product-led-growth cargo cults for telling, and sometimes even forcing, our competition not to go there.
One of the most pleasant experiences I've had writing code was in the early AI days, when we did hyperscript SSE. Super locality of behavior, and a super interesting way of writing Server-Sent Events code. For example (the eventsource name and URL here are placeholders):
eventsource ChatUpdates from /chat-updates
  on message as string
    put it into #div
  end
  on open
    log "connection opened."
  end
  on close
    log "connection closed."
  end
  on error
    log "handle error here..."
  end
end
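For anyone who wants to try it, the server side can be tiny: any endpoint that streams text/event-stream will feed those handlers. Here's a minimal sketch in Python with Flask (the route and payloads are illustrative, not what we ran):

    # Minimal SSE endpoint sketch; route and payloads are made up for illustration.
    import time

    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/chat-updates")
    def chat_updates():
        def stream():
            # Unnamed "data:" lines arrive client-side as the default "message"
            # event; "open" and "error" come from the browser's EventSource itself.
            for i in range(10):
                yield f"data: update {i}\n\n"
                time.sleep(1)
        return Response(stream(), mimetype="text/event-stream")

    if __name__ == "__main__":
        app.run(threaded=True)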
Lots of YC companies copy each other's processes and selection criteria. Basically, they all have the same blind spots and look for the same type of engineer.
So it's super easy to scam all of them with the same skill set and mannerisms.
I send this article as part of onboarding for all new devs we hire. It's super helpful for keeping a fast-growing team from falling into the typical cycle of more people, more complexity.
Thanks for the link to the ColPali implementation - interesting! I am specifically interested in evaluation benchmarks for different image embedding models.
I see the ColiVara-Eval repo in your link. If I understand correctly, ColQwen2 is the current leader, followed closely by ColPali, when applying those models to RAG over documents.
But how do those models compare to each other and to the llama3.2-vision embeddings when applied to, for example, sentiment analysis for photos? Do benchmarks like that exist?
The “equivalent” here would be Jina-Clip (architecture-wise, not necessarily performance-wise).
The ColPali paper(1) does a good job explaining why you don't really want to use vision embeddings directly, and why you are much better off optimizing for RAG with a ColPali-like setup. Basically, a plain vision embedding is not optimized for textual understanding: it works if you are searching for the word "bird" and images of birds, but it doesn't work well for pulling up a document that is a paper about birds.
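To make "a ColPali-like setup" concrete: instead of one vector per image, you keep one vector per page patch and per query token, and score with late interaction (MaxSim). A minimal numpy sketch, with shapes and names assumed for illustration:

    # Late-interaction (MaxSim) scoring, ColBERT/ColPali-style. Illustrative only.
    import numpy as np

    def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
        # query_emb: (n_query_tokens, dim); page_emb: (n_patches, dim).
        # Assumes both are L2-normalized, so dot products are cosine similarities.
        sim = query_emb @ page_emb.T          # (n_query_tokens, n_patches)
        return float(sim.max(axis=1).sum())   # best patch per query token, summed

    # Rank pages by score; the page whose patches best cover the query tokens wins.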
Makes sense. My main takeaway from the ColPali paper (and your comments) is that ColPali works best for document RAG, whereas vision model embeddings are best used for image similarity search or sentiment analysis. So to answer my own question: The best model to use depends on the application.
I would like to throw our project in the ring. We use ColQwen2 over a ColPali implementation. Basically, a search-and-extract pipeline: https://docs.colivara.com/guide/markdown
Here is a nice use case: put this in a pharmacy, have people hit a button, and ask questions about over-the-counter medications.
Really, any physical place where people are easily overwhelmed would benefit from something like that.
With some work, you can probably even run RAG on the questions and answer esoteric things like where the food court is in an airport or where the ATM is in a hotel.
> Put this in a pharmacy, have people hit a button, and ask questions about over-the-counter medications.
Even if you trust OpenAI's models more than your trained, certified, and insured pharmacist -- the pharmacists, their regulators, and their insurers sure won't!
They've got a century of sunk costs to consider (and maybe even some valid concern over the answers a model might give on their behalf...)
Don't expect anything like that in a traditional, regulated medical setting any time soon.
The last few doctor's appointments I've had, the clinician used a service to record and summarize the visit. It was using some sort of speech-to-text and an LLM to do so. It's already in medical settings.
Thanks for digging that out. Yes, that makes sense to me as someone who made a fully local speech-to-speech prototype with Electron, including VAD (voice activity detection) and AEC (acoustic echo cancellation). It was responsive but taxing: I had to use a mix of specialty models over onnx/wasm in the renderer and llama.cpp in the main process. One day, a single multimodal model will just do it all.
We benchmarked two ways to improve latency in RAG workflows with a multi-vector setup: hybrid search using Postgres-native capabilities, and a relatively new method called token pooling. Token pooling delivered up to 70% lower latency at <1% retrieval-quality cost.
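For anyone wondering what token pooling means mechanically: you cluster similar token/patch vectors within each document and keep only the cluster means, so every document carries roughly pool_factor times fewer vectors into scoring. A rough sketch in Python (the pool factor and clustering choice are assumptions for illustration, not our exact pipeline):

    # Token pooling sketch: shrink a multi-vector embedding by mean-pooling
    # clusters of similar vectors. Illustrative, not production code.
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    def pool_tokens(emb: np.ndarray, pool_factor: int = 3) -> np.ndarray:
        # emb: (n_tokens, dim) -> roughly (n_tokens / pool_factor, dim)
        n_clusters = max(1, emb.shape[0] // pool_factor)
        Z = linkage(emb, method="ward")  # hierarchical clustering of token vectors
        labels = fcluster(Z, t=n_clusters, criterion="maxclust")
        pooled = np.stack([emb[labels == c].mean(axis=0) for c in np.unique(labels)])
        # Re-normalize so cosine/MaxSim scoring still behaves.
        return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

Fewer vectors per document means fewer similarity computations and a smaller index, which is where the latency win comes from.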