They seem to be highly pragmatic. Rather than chasing AGI, they are more interested in what can be done with today's technology. Any breakthrough towards AGI will inevitably leak quickly, so they'll be able to catch up as long as the foundation is ready. In a bicycle race, it can be quite beneficial to ride behind the leader and take advantage of the reduced drag. Perhaps that's their guiding principle.
I do like "context engineering" better, and I also agree that there's a lot that goes into getting good answers out of LLMs; "GPT wrapper" is a gross oversimplification for many of the products being built on top of them. Just putting good evals in place is often a complicated task.
That's true. We have been helping customers with evals for ages now, and it's super hard for everyone to build a really good dataset and define great quality metrics.
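To make "dataset plus quality metrics" concrete, here's a minimal sketch of the classic eval setup people find hard to do well: a small labeled dataset and a simple exact-match metric. `generate_answer` is a placeholder for whatever LLM pipeline your product calls, not a real API.

```python
def generate_answer(question: str) -> str:
    # Placeholder: in a real eval this would call your LLM pipeline.
    canned = {
        "What is the capital of France?": "Paris",
        "What is 2 + 2?": "4",
    }
    return canned.get(question, "I don't know")

# A tiny labeled dataset -- in practice, curating hundreds of these
# (and keeping them representative) is where the real difficulty is.
dataset = [
    {"input": "What is the capital of France?", "expected": "Paris"},
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "Who wrote Hamlet?", "expected": "Shakespeare"},
]

def exact_match_accuracy(examples) -> float:
    # The simplest possible quality metric; real products usually need
    # fuzzier criteria (LLM-as-judge, rubric scoring, etc.).
    hits = sum(
        1 for ex in examples
        if generate_answer(ex["input"]).strip() == ex["expected"]
    )
    return hits / len(examples)

print(exact_match_accuracy(dataset))
```

Exact match is deliberately naive here; the point is just the shape of the problem: examples in, score out.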
Just wanted to shamelessly plug this lib I built recently for this very topic, because it's been much easier to sell to our clients than evals, since it's closer to e2e tests: https://github.com/langwatch/scenario
Instead of 100 examples, it's easier for people to think of just the anecdotal example where the problem happens and let AI expand on it, or to replicate a situation from prod and describe the criteria in simple terms or code.
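The "replicate a situation from prod, criteria as code" idea can be sketched in plain Python. This is a hand-rolled illustration of the pattern, not the actual langwatch/scenario API; `agent_reply` is a hypothetical stand-in for your real agent.

```python
def agent_reply(history: list[str]) -> str:
    # Placeholder agent: a real test would call your actual agent here.
    last = history[-1].lower()
    if "refund" in last:
        return "I can help with that refund. Could you share your order number?"
    return "How can I help you today?"

def test_refund_scenario():
    # The one anecdotal situation where the problem happened in prod.
    history = ["Hi", "I want a refund for my last order"]
    reply = agent_reply(history)
    # Criteria described in simple terms, expressed as code:
    assert "refund" in reply.lower()        # acknowledges the request
    assert "order number" in reply.lower()  # asks for the info it needs

test_refund_scenario()
```

One concrete scenario with readable pass criteria is much easier for a team to agree on than a 100-example dataset, which is why this sells as "e2e tests" rather than "evals".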