Apex GPU: Run CUDA Apps on AMD GPUs Without Recompilation

throwaway2027 · 2025-12-04T02:24:22 1764815062

Holy AI Slop

ArchitectAI · 2025-12-04T02:03:30 1764813810

I built a lightweight (93KB) CUDA→AMD translation layer using LD_PRELOAD.

It intercepts CUDA API calls at runtime and translates them to HIP/rocBLAS/MIOpen.

No source code needed. No recompilation. Just:

  LD_PRELOAD=./libapex_hip_bridge.so ./your_cuda_app

Currently supports:

- 38 CUDA Runtime functions

- 15+ cuBLAS operations (matrix multiply, etc.)

- 8+ cuDNN operations (convolutions, pooling, batch norm)

- PyTorch training and inference

Built in ~10 hours using dlopen/dlsym for dynamic loading. 100% test pass rate.

The goal: break NVIDIA's CUDA vendor lock-in and make AMD GPUs viable for existing CUDA workloads without months of porting effort.
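For readers unfamiliar with the technique, here is a minimal sketch of how LD_PRELOAD interposition with dlopen/dlsym generally works. It is not taken from the project's source. It assumes the HIP runtime is installed as libamdhip64.so and forwards a single call, cudaMalloc, to hipMalloc (the two share the same argument list). The helper name resolve_symbol and the error constant SHIM_ERR_BACKEND_MISSING are hypothetical, made up for illustration.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Both cudaError_t and hipError_t use 0 for success, so a plain int is
 * enough for this sketch. */
typedef int gpu_error_t;

/* Hypothetical nonzero code returned when the HIP runtime cannot be loaded. */
#define SHIM_ERR_BACKEND_MISSING 1

/* hipMalloc has the same argument list as cudaMalloc: (void **ptr, size_t size). */
typedef gpu_error_t (*hip_malloc_fn)(void **, size_t);

/* Hypothetical helper: open a shared library and look up one symbol.
 * Returns NULL if either the library or the symbol is missing. */
void *resolve_symbol(const char *library, const char *symbol) {
    void *handle = dlopen(library, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        return NULL;
    }
    return dlsym(handle, symbol);
}

/* Because the preloaded library is searched first, the application's calls
 * to cudaMalloc bind to this definition instead of the one in libcudart. */
gpu_error_t cudaMalloc(void **dev_ptr, size_t size) {
    static hip_malloc_fn real_hip_malloc = NULL;

    if (real_hip_malloc == NULL) {
        /* Assumption: the HIP runtime is installed under this name. */
        real_hip_malloc =
            (hip_malloc_fn)resolve_symbol("libamdhip64.so", "hipMalloc");
    }
    if (real_hip_malloc == NULL) {
        fprintf(stderr, "shim: hipMalloc not found, is ROCm installed?\n");
        return SHIM_ERR_BACKEND_MISSING;
    }
    return real_hip_malloc(dev_ptr, size);
}
```

Something like `gcc -shared -fPIC shim.c -o libshim.so -ldl` would build it (the file names are placeholders). Note that this only works when the application links the CUDA runtime dynamically, and that a real bridge also has to translate error codes, streams, and kernel launches, which this sketch ignores.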

bigyabai · 2025-12-04T02:10:53 1764814253

> ## First Comment (Expand on technical details)

> Post this as your first comment after submitting:

lmfao

ArchitectAI · 2025-12-04T02:20:23 1764814823

[flagged]

tomhow · 2025-12-04T02:40:19 1764816019

We detached this comment from https://news.ycombinator.com/item?id=46142959 and marked it off topic, and banned the account.

throwaway2027 · 2025-12-04T02:34:17 1764815657

"Wow i make a bridge that allows CUDA on AMD. What have you EVER done in your pathetic life? Oh you gave your sisters herpes, Thats sad." - ArchitectAI

He deleted this comment in response to bigyabai after getting flagged.

tomhow · 2025-12-04T02:45:07 1764816307

Please don't give oxygen to trolls. We detached and banned the account. Any time you see this kind of thing, flag the comment, and if you want to be extra-helpful, email us – hn@ycombinator.com.