
That would be reinforcement learning. The juice is quite hard to squeeze.


Agreed for most cases.

Each Cursor rule is the byproduct of a ton of work and probably contains a lot that could be unpacked. Any research on that?


Yeah, at a very high level it's similar to an actor-critic reinforcement learning algorithm. The rule text is a value function and one could build a critic model that takes as input the rule text and the main model's (the actor's) output to produce a reward.

This is easier said than done, though, because this value function is so noisy it's often hard to learn from. And whether or not a response (the model output) matches the value function (the Cursor rules) is not even easy to grade. It's been easier to train chain-of-thought style reasoning, since one can directly score it via the length of thinking.
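To make the actor-critic framing concrete, here is a minimal sketch. The "actor" is the main model producing an output; the rule text plays the role of a value-function-like signal; a critic scores how well the output matches the rule. Everything here is hypothetical: the critic is a toy token-overlap scorer standing in for a learned model, which is exactly the part the comment says is hard (the real signal is noisy and hard to grade).

```python
# Toy stand-in for a learned critic: reward = how well the actor's
# output reflects the rule text. A real critic would be a trained
# model, not token overlap; this only illustrates the interface
# (rule_text, actor_output) -> scalar reward described above.

def toy_critic(rule_text: str, actor_output: str) -> float:
    """Crude reward: fraction of rule tokens echoed in the output."""
    rule_tokens = set(rule_text.lower().split())
    out_tokens = set(actor_output.lower().split())
    if not rule_tokens:
        return 0.0
    return len(rule_tokens & out_tokens) / len(rule_tokens)

rule = "always use type hints in python functions"
good = "def add(a: int, b: int) -> int: use type hints in functions"
bad = "def add(a, b): return a + b"

# An output that follows the rule should score higher than one that doesn't.
assert toy_critic(rule, good) > toy_critic(rule, bad)
```

The noisiness the comment points at lives entirely inside that scoring function: two valid responses to the same rule can overlap with it very differently, which is why the reward is hard to learn from.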

This new paper covers some of the difficulties of language-based critic models: https://openreview.net/pdf?id=0tXmtd0vZG

Generally speaking, the algorithm and approach are not new. Being able to do it with a reasonable amount of compute is the new part.


The suggestion was even simpler: feed a reasoning model a prompt like “tell me a few reasons a user might’ve created this Cursor rule: {RULE_TEXT}”.

Do that for a bunch of rules scraped from a bunch of repos, and you've got yourself a dataset for training a new model, or maybe for fine-tuning.
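The scrape-and-prompt loop above can be sketched in a few lines. The model call here is a hypothetical stub (`query_reasoning_model`); a real pipeline would swap in an actual LLM API call and a real scraper for the rules list.

```python
# Sketch of the dataset-generation idea: for each scraped Cursor rule,
# ask a reasoning model why a user might have written it, and keep
# (rule, reasoning) pairs as fine-tuning data. The model call is a
# placeholder stub, not a real API.

import json

PROMPT = "Tell me a few reasons a user might've created this Cursor rule: {rule}"

def query_reasoning_model(prompt: str) -> str:
    # Hypothetical stand-in; replace with a real LLM API call.
    return "Stub reasoning for: " + prompt

def build_dataset(rules: list[str]) -> list[dict]:
    dataset = []
    for rule in rules:
        reasoning = query_reasoning_model(PROMPT.format(rule=rule))
        dataset.append({"rule": rule, "reasoning": reasoning})
    return dataset

rules = ["Prefer functional React components", "Never commit secrets"]
records = build_dataset(rules)
print(json.dumps(records[0], indent=2))
```

Each record pairs the observed artifact (the rule) with the model's reconstruction of the intent behind it, which is the supervision signal the comment proposes training or fine-tuning on.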


Yeah, go for it.





