This looks similar to Triton; I wonder what it does differently. In any case, it would be awesome if any of these libraries could output object files containing PTX or SASS code. Those could then be linked into a binary instead of needing a Python environment at runtime.
Warp outputs its intermediate source files (CUDA for GPU, C++ for CPU), which can be compiled and linked into a binary. Here is an old example of mine calling Warp kernels from C++: https://github.com/erwincoumans/warp_cpp
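To give a rough idea of what "linking a kernel into a binary" means on the CPU side, here is a minimal sketch. It is not Warp's actual generated code: the names `saxpy_kernel_cpu` and `launch_saxpy` and their signatures are hypothetical stand-ins, assuming only that a kernel body ends up as a plain C++ function taking a thread index.

```cpp
// Hypothetical stand-in for a generated CPU kernel; NOT Warp's real output.
#include <cassert>
#include <cstddef>
#include <vector>

// Per-element kernel body: y[tid] = a * x[tid] + y[tid].
// "tid" plays the role of the thread index a GPU launch would provide.
extern "C" void saxpy_kernel_cpu(std::size_t tid, float a,
                                 const float* x, float* y) {
    y[tid] = a * x[tid] + y[tid];
}

// Host-side launcher: on the CPU, a "launch" is just a loop over thread indices.
void launch_saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());  // both arrays must cover the same index range
    for (std::size_t tid = 0; tid < x.size(); ++tid) {
        saxpy_kernel_cpu(tid, a, x.data(), y.data());
    }
}
```

Once the kernel is ordinary C++ like this, any host program can call `launch_saxpy` after compiling and linking the file, with no Python interpreter involved at runtime. The GPU path follows the same idea, except the kernel source is compiled with a CUDA compiler and launched through the CUDA runtime or driver API.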
Triton offers broad GPU support for writing high-throughput kernels. Some higher-level ML/AI tools, such as PyTorch, can use Triton internally. I don’t know off the top of my head if any simulation libraries do.