
That would be reinforcement learning. The juice is quite hard to squeeze.


Agreed for most cases.

Each Cursor rule is the byproduct of a ton of work and probably contains a lot that could be unpacked. Any research on that?


Yeah, at a very high level it's similar to an actor-critic reinforcement learning algorithm. The rule text is a value function and one could build a critic model that takes as input the rule text and the main model's (the actor's) output to produce a reward.

This is easier said than done, though, because this value function is so noisy it's often hard to learn from. And whether or not a response (the model output) matches the value function (the Cursor rules) is not even easy to grade. It's been easier to train chain-of-thought style reasoning, since one can directly score it via the length of thinking.
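To make the actor-critic framing concrete, here is a minimal sketch. The "actor" is the main model producing an output; the rule text plays the role of a value-function-like signal; a critic scores how well the output matches the rule. Everything here is hypothetical: the critic is a toy token-overlap scorer standing in for a learned model, which is exactly the part the comment says is hard (the real signal is noisy and hard to grade).

```python
# Toy stand-in for a learned critic: reward = how well the actor's
# output reflects the rule text. A real critic would be a trained
# model, not token overlap; this only illustrates the interface
# (rule_text, actor_output) -> scalar reward described above.

def toy_critic(rule_text: str, actor_output: str) -> float:
    """Crude reward: fraction of rule tokens echoed in the output."""
    rule_tokens = set(rule_text.lower().split())
    out_tokens = set(actor_output.lower().split())
    if not rule_tokens:
        return 0.0
    return len(rule_tokens & out_tokens) / len(rule_tokens)

rule = "always use type hints in python functions"
good = "def add(a: int, b: int) -> int: use type hints in functions"
bad = "def add(a, b): return a + b"

# An output that follows the rule should score higher than one that doesn't.
assert toy_critic(rule, good) > toy_critic(rule, bad)
```

The noisiness the comment points at lives entirely inside that scoring function: two valid responses to the same rule can overlap with it very differently, which is why the reward is hard to learn from.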

This new paper covers some of the difficulties of language-based critic models: https://openreview.net/pdf?id=0tXmtd0vZG

Generally speaking, the algorithm and approach are not new. Being able to do it with a reasonable amount of compute is the new part.


The suggestion was even simpler: feed a reasoning model a prompt like “tell me a few reasons a user might’ve created this Cursor rule: {RULE_TEXT}”.

Do that for a bunch of rules scraped from a bunch of repos, and you've got yourself a dataset for training a new model, or maybe for fine-tuning.
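The scrape-and-prompt loop above can be sketched in a few lines. The model call here is a hypothetical stub (`query_reasoning_model`); a real pipeline would swap in an actual LLM API call and a real scraper for the rules list.

```python
# Sketch of the dataset-generation idea: for each scraped Cursor rule,
# ask a reasoning model why a user might have written it, and keep
# (rule, reasoning) pairs as fine-tuning data. The model call is a
# placeholder stub, not a real API.

import json

PROMPT = "Tell me a few reasons a user might've created this Cursor rule: {rule}"

def query_reasoning_model(prompt: str) -> str:
    # Hypothetical stand-in; replace with a real LLM API call.
    return "Stub reasoning for: " + prompt

def build_dataset(rules: list[str]) -> list[dict]:
    dataset = []
    for rule in rules:
        reasoning = query_reasoning_model(PROMPT.format(rule=rule))
        dataset.append({"rule": rule, "reasoning": reasoning})
    return dataset

rules = ["Prefer functional React components", "Never commit secrets"]
records = build_dataset(rules)
print(json.dumps(records[0], indent=2))
```

Each record pairs the observed artifact (the rule) with the model's reconstruction of the intent behind it, which is the supervision signal the comment proposes training or fine-tuning on.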


Yeah, go for it.





