Me too. I've also found that even when I try to restrict models meant for these tasks, they tend to go on tangents and waste tremendous amounts of tokens without producing meaningfully better output. I'm not yet sold on these models for anything beyond fuzzy tasks like "does this logic seem sound?" They tend to be good at that (though they often want to over-elaborate or propose too many solutions).
The inspector at the end was a neat surprise! I had some issues trying to build the examples on Windows, but I think that's an opportunity to contribute to the project.