As other commenters have pointed out in one way or another, the real problem seems to be that this simplistic model of voter choice can't capture the structure of the real world that humans quickly infer from the setup: state elections have millions of voters, 55/45 is a decisive win rather than a narrow one, etc.
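To make the "millions of voters" point concrete, here's a toy sketch (names and numbers are mine, not from the thread): if each voter independently picks side A with probability 0.55, the chance that A wins a majority rushes toward 1 as the electorate grows, which is why 55/45 reads as decisive at state scale.

```python
from math import comb

def p_majority(n, p):
    """Probability that more than half of n independent voters pick side A,
    when each votes A with probability p (use odd n to avoid ties)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (11, 101, 1001):
    print(n, round(p_majority(n, 0.55), 4))
```

At n = 11 the 55% side loses fairly often; by n = 1001 it essentially never does, and real electorates are orders of magnitude larger still.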
In a generic setup, imagine a binary classifier whose output probabilities all fall in the .45-.55 range: it likely isn't a very strong classifier. Ideally you'd want polarized predictions, not values hovering around .5.
Come to think of it, could this be an issue of non-ergodicity too (I hope I'm using the term right)? i.e. the state-level prior is not that informative with respect to an individual vote?
This is not so much a matter of class balance. If you want to predict which of two parties somebody will vote for, the most natural framing is binary classification.
For that you need to threshold your predictions. Ideally your model would produce a bimodal score distribution, so you can pick a threshold without incurring many false positives and false negatives.
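A toy sketch of that point (synthetic data and made-up model names, purely illustrative): a "polarized" scorer that pushes outputs toward 0 and 1 thresholds cleanly at 0.5, while a "mushy" scorer stuck in the .45-.55 band barely beats a coin flip at the same threshold.

```python
import random

random.seed(0)

def accuracy_at_threshold(scores_and_labels, t=0.5):
    """Fraction of examples where thresholding the score at t recovers the label."""
    return sum((s > t) == y for s, y in scores_and_labels) / len(scores_and_labels)

# Synthetic scores from two hypothetical models for the same 50/50 labels:
# "polarized" centers scores at 0.1 / 0.9, "mushy" at 0.47 / 0.53.
data_polarized, data_mushy = [], []
for _ in range(10_000):
    y = random.random() < 0.5
    noise = random.gauss(0, 0.15)
    clip = lambda x: min(max(x, 0.0), 1.0)
    data_polarized.append((clip((0.9 if y else 0.1) + noise), y))
    data_mushy.append((clip((0.53 if y else 0.47) + noise), y))

print("polarized:", accuracy_at_threshold(data_polarized))
print("mushy:    ", accuracy_at_threshold(data_mushy))
```

With identical noise, the polarized scorer is near-perfect after thresholding while the mushy one hovers around 60% accuracy: the gap between the modes, not the scores themselves, is what makes thresholding work.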