Your intuition is spot on. With this line of argument you will end up with Thomp...

Your intuition is spot on. With this line of argument you will end up with Thompson Sampling [0]. TS is an old algorithm from the 1930s. It is disconcertingly effective, even when the underlying Bayesian assumptions are not correct. It is also extremely hard to analyze. This led to a weird state of affairs - TS gave excellent empirical performance but we could not make any theoretical statements about its efficacy. It is only very recently that this has changed.

[0] https://en.wikipedia.org/wiki/Thompson_sampling