One thing that confused me is that you introduce the "preference" for an action, but don't really define it. Some exposition about what that means would have helped.
Ah yes, I didn't want the reader to fuss too much about the idea of a preference, since it's really just an analog of un-normalized pi(modulo the exponentiation part). It doesn't have a strict definition nor is it a formal term. I will un-bold it and italicize it instead. Thank you :)