The leading approach to automatic music transcription is to train a neural network on supervised data, i.e., paired audio-MIDI recordings. For piano, there is an excellent dataset for this task, MAESTRO, which was released by Google in 2018:
https://magenta.tensorflow.org/datasets/maestro
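To make the supervised setup concrete, here is a minimal sketch (not from any particular codebase) of how paired audio-MIDI data is commonly turned into training targets: MIDI note events are rasterized into a binary frame-wise piano roll, which the network is then trained to predict from the audio. The function name, frame rate, and 88-key layout are illustrative assumptions.

```python
import numpy as np

def notes_to_piano_roll(notes, n_frames, fps=100, n_pitches=88, pitch_offset=21):
    """Hypothetical helper: convert (onset_sec, offset_sec, midi_pitch)
    note events into a binary frame-wise piano-roll target.

    fps: frames per second of the target grid (assumed 100 here).
    pitch_offset: MIDI pitch of the lowest piano key (A0 = 21),
    so the 88 piano keys map to columns 0..87.
    """
    roll = np.zeros((n_frames, n_pitches), dtype=np.float32)
    for onset, offset, pitch in notes:
        start = int(round(onset * fps))
        end = int(round(offset * fps))
        # Mark every frame during which the note is sounding.
        roll[start:end, pitch - pitch_offset] = 1.0
    return roll

# Example: middle C (MIDI 60) held from 0.0s to 0.5s on a 1-second grid.
roll = notes_to_piano_roll([(0.0, 0.5, 60)], n_frames=100)
```

A model trained this way typically consumes a spectrogram of the audio and is optimized with a per-frame binary cross-entropy loss against targets like `roll`.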
Most current research focuses on refining deep-learning-based approaches to this task. When I worked on this problem earlier this year, I was interested in making these models more robust by training a sort of musical awareness into them. You can see a good example of this in this tweet:
https://x.com/loubbrad/status/1794747652191777049