It used to be that DLLs loaded by the .exe at startup (i.e. those listed in its import table) would get their thread-local variables (TLS) initialized correctly, but DLLs loaded later (via /DELAYLOAD or LoadLibrary) would not. The workaround was to manage these slots yourself with TlsAlloc/TlsFree and add a hook in DllMain to clean up.
Note that MinGW uses libwinpthread, which is known to have slow TLS anyway (I've observed a 100% overhead compared to running the same program under WSL with a Linux-native GCC). cf. https://github.com/msys2/MINGW-packages/discussions/13259
Then last year clang-cl also added ways to disable this (if you need to); probably this hit some internal issue that had to be resolved. Maybe "thread_local" has become more widely used (unlike the OS-specific "TlsAlloc").
But then Microsoft added /Zc:tlsGuards - https://learn.microsoft.com/en-us/cpp/build/reference/zc-tls... - which is now the default and fixes the issue, but with a significant performance penalty (e.g. the "bug" that I've listed).
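To make the cost concrete, here is a minimal portable sketch (not MSVC-specific, and it does not reproduce the late-loaded DLL case) of a thread_local with a dynamic initializer. Each thread has to run the initializer for its own copy, and that "has this thread initialized it yet?" question is the kind of check the guards answer. The names g_init_count, make_value, t_value and run_threads are made up for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical counter: how many times the initializer below has run.
std::atomic<int> g_init_count{0};

int make_value() {
    ++g_init_count;  // one increment per thread that initializes its copy
    return 42;
}

// Dynamic initializer: the value cannot be baked into the TLS image,
// so make_value() has to run once for every thread that uses t_value.
thread_local int t_value = make_value();

struct Result {
    int inits;  // initializer runs observed during the call
    int sum;    // sum of all values read by the worker threads
};

// Starts n threads, has each one read t_value twice, and reports
// how many initializations happened and what was read.
Result run_threads(int n) {
    g_init_count = 0;
    std::atomic<int> sum{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&sum] {
            sum += t_value;  // first use in this thread needs an initialized copy
            sum += t_value;  // second use reuses the same copy
        });
    }
    for (auto& t : threads) t.join();
    return Result{g_init_count.load(), sum.load()};
}
```

Calling run_threads(4) should report one initialization per thread even though each thread reads the variable twice. Whether the check happens on every access or only at thread start depends on the compiler and platform, so treat this as an illustration of the semantics, not of MSVC's exact codegen.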
I guess you can't easily have it both ways...
On the clang/clang-cl side, there is https://clang.llvm.org/docs/ClangCommandLineReference.html#c... to support this.
So check your compiler version and options :)
Also the notes posted here about CRT mixing might apply to you (not sure though) - https://learn.microsoft.com/en-us/cpp/porting/binary-compat-...
I work in the gamedev world, where plugins, FFI, delay-loaded DLLs etc. are a constant pain that you have to keep an eye on and work around.