The tooling has become fairly mature, and the community is growing, so there are more experts to help out when you get stuck. For example, this project relied heavily on the tooling developed to decompile the Pokémon GBA games.
This may be true, but the game was published in the West by THQ, which later went bankrupt, so no one knows who owns the rights to it and no one is likely to ever release the source. Even if the source were released, we still wouldn't use it in the decompilation, since that would become much more of a legal issue.
Dimps developed a "game engine" of sorts for their GBA games. As far as I understand, SA1 was the first implementation of this engine, and they iterated on it for their later games. It's an extremely minimal engine: it implements some helpers for rendering sprites and backgrounds, plus a task system (since the GBA doesn't have threading or any task system in its SDK).
You can see this engine code in the root of the src; everything in game was written specifically for the Sonic Advance trilogy.
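To give a rough idea of what a task system like this looks like on hardware with no threads, here is a minimal sketch of a cooperative per-frame scheduler. This is not the code from the repo: `struct Task`, `task_create`, `task_destroy`, `tasks_run_frame`, `MAX_TASKS`, and the priority field are all names and design choices I made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 8 /* hypothetical pool size */

struct Task;
typedef void (*TaskFn)(struct Task *self);

struct Task {
    TaskFn   fn;       /* called once per frame; NULL marks a free slot */
    uint16_t priority; /* lower values run earlier in the frame */
    void    *data;     /* per-task state owned by the caller */
};

static struct Task g_tasks[MAX_TASKS];

/* Claim the first free slot; returns NULL when the pool is exhausted. */
struct Task *task_create(TaskFn fn, uint16_t priority, void *data)
{
    assert(fn != NULL);
    for (size_t i = 0; i < MAX_TASKS; i++) {
        if (g_tasks[i].fn == NULL) {
            g_tasks[i].fn = fn;
            g_tasks[i].priority = priority;
            g_tasks[i].data = data;
            return &g_tasks[i];
        }
    }
    return NULL;
}

/* Free the slot; the task will not be called again. */
void task_destroy(struct Task *t)
{
    t->fn = NULL;
}

/* Call every live task once, in ascending priority order.
 * Meant to be called once per frame from the main loop. */
void tasks_run_frame(void)
{
    size_t order[MAX_TASKS];
    size_t count = 0;

    /* Insertion-sort the live slot indices by priority. */
    for (size_t i = 0; i < MAX_TASKS; i++) {
        if (g_tasks[i].fn == NULL)
            continue;
        size_t j = count++;
        while (j > 0 && g_tasks[order[j - 1]].priority > g_tasks[i].priority) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = i;
    }

    for (size_t k = 0; k < count; k++) {
        struct Task *t = &g_tasks[order[k]];
        if (t->fn != NULL) /* an earlier task may have destroyed this one */
            t->fn(t);
    }
}

/* Example task: bumps a counter each frame and removes itself at 3. */
static void count_to_three(struct Task *self)
{
    int *counter = self->data;
    if (++*counter == 3)
        task_destroy(self);
}
```

Each "task" is just a function pointer plus some state that gets called once per frame, so things like the player, enemies, and menus can each run their own logic without any real concurrency. Creating `count_to_three` with a pointer to an `int` and calling `tasks_run_frame()` in a loop runs it for three frames, after which it frees its own slot.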
Do you wanna develop that tool? The sprite and MIDI extraction is pretty complex in this game, so it was much easier for us to extract them by hand. In the future we will produce something that extracts them automatically. Until we have that tooling in place, we don't plan to release any binaries.
If that happens, we will keep the repo private and develop the tooling to extract these in an automated way. As others have said, SEGA doesn't police their IP in this way. Check out this game, which has existed since the early 2000s: https://www.srb2.org/
Just omit the problematic parts from the public repo? And perhaps you want to migrate to a more resilient and friendly host than github.com while you still have a chance to redirect people. All it takes is one form submission to nuke your repo.
(Assuming you care about your work and sharing it. Sure, "Sega is chill" so far, but you are really pushing it here and giving them a great display of why they might want to reconsider.)
For matching decompilations like this, pretty terrible. It can give a rough layout of a function with some branching, but it fails to create reasonable, human-like structs (which have to be inferred from the assembly), and getting what it has created to match can be the majority of the work. We did this without any AI assistance, but I relied on the Ghidra decompilation feature for outlining a function's layout (though even that had its limitations).
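For anyone unfamiliar with the term, "matching" means the rewritten C has to compile to exactly the same bytes as the original. A minimal sketch of that check is below; `first_mismatch` is a hypothetical helper written for this comment, not something from our tooling.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare the original function bytes against the bytes produced by
 * compiling the decompiled C. Returns the offset of the first differing
 * byte, or len when the two buffers are identical (a "match"). */
size_t first_mismatch(const uint8_t *original, const uint8_t *rebuilt, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (original[i] != rebuilt[i])
            return i;
    }
    return len;
}
```

Code that behaves identically but uses a different register or instruction order still fails this check, which is why output that is only "roughly right" leaves most of the work still to do.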
The training data is just too minimal for this sort of thing. The decomp.me database would probably be really good to train a model on.