If we expand this to three years, the single biggest shift in LLM development has been the growth of context windows: from 4,000 to 16,000 to 128,000 to 256,000 tokens.
When we were at 4,000- and 16,000-token context windows, a lot of effort went into nailing down text splitting, chunking, and reduction.
For all intents and purposes, the size of current context windows obviates all of that work.
What else changed?
- Multimodal LLMs - Text extraction from PDFs was a major issue for RAG/document intelligence. A lot of time was wasted trying to figure out custom text-extraction strategies for documents. Now you can just feed the image of a PDF page into an LLM and get back a better transcription (a minimal sketch follows after this list).
- Reduced emphasis on vector search. People have found that for most purposes, having an agent grep your documents is cheaper and better than using a more complex RAG pipeline. Boris Cherny created a stir when he talked about Claude Code doing it that way. [0]

[0] https://news.ycombinator.com/item?id=43163011#43164253
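Since the PDF point comes up a lot, here's roughly what feeding a page image to a vision-capable model looks like. A minimal sketch, assuming the OpenAI Python SDK and pdf2image (which needs poppler installed); the model name and file path are illustrative, not prescriptive:

    # Transcribe one PDF page with a vision-capable LLM instead of a
    # bespoke text-extraction pipeline.
    import base64, io
    from pdf2image import convert_from_path  # requires poppler
    from openai import OpenAI

    client = OpenAI()
    page = convert_from_path("report.pdf", first_page=1, last_page=1)[0]
    buf = io.BytesIO()
    page.save(buf, format="PNG")
    image_b64 = base64.b64encode(buf.getvalue()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this page to plain text."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)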
>For all intents and purposes, the size of current context windows obviates all of that work.
Large context windows can make some problems easier or make them go away entirely, sure. But you may still have the same issue of getting the right information to the model: if your data is much larger than, say, 256k tokens, you still need to filter it. Either way, it can still be beneficial (cost, performance, etc.) to filter out most of the irrelevant information.
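To make the "either way" point concrete, a minimal sketch of such a pre-filter: rank chunks by keyword overlap with the query and keep only what fits a budget far below the window. The overlap scoring and the 4-chars-per-token estimate are illustrative assumptions:

    def filter_chunks(chunks, query, budget_tokens=8_000):
        """Keep the highest-overlap chunks that fit the token budget."""
        terms = set(query.lower().split())
        ranked = sorted(chunks,
                        key=lambda c: len(terms & set(c.lower().split())),
                        reverse=True)
        kept, used = [], 0
        for chunk in ranked:
            cost = len(chunk) // 4  # rough chars-per-token estimate
            if used + cost > budget_tokens:
                break
            kept.append(chunk)
            used += cost
        return kept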
>Reduced emphasis on vector search. People have found that for most purposes, having an agent grep your documents is cheaper and better than using a more complex RAG pipeline
This has been obvious from the beginning to anyone familiar with information retrieval (the R in RAG). It's very common that search queries are looking for exact matches, not just anything with similar meaning. Your linked example is code search, where exact-match/regex-style searches are generally what you want.
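To make that concrete, a minimal sketch of exact-match retrieval as a tool an agent could call; the helper name, file glob, and example pattern are illustrative assumptions, not how Claude Code actually implements it:

    import re
    from pathlib import Path

    def grep(pattern, root=".", glob="*.py"):
        """Return (path, line_no, line) for every regex match under root."""
        rx = re.compile(pattern)
        hits = []
        for path in Path(root).rglob(glob):
            if not path.is_file():
                continue
            text = path.read_text(errors="ignore")
            for i, line in enumerate(text.splitlines(), 1):
                if rx.search(line):
                    hits.append((str(path), i, line.strip()))
        return hits

    # e.g. grep(r"def load_config\b") finds the exact definition that a
    # semantic search might bury under loosely related matches.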