arpadav's comments

this might be an extremely stupid question, but is this just a demo project of https://github.com/ffmpegwasm/ffmpeg.wasm? or is this bringing forth some other utility that im not seeing?

Location: SLC, Utah (UTC-6)

Remote: US Friendly

Willing to relocate: No (after 2 years, yes)

Languages: Rust, CUDA, C++, Python, MATLAB

Misc tech: Linux, JAX, PyTorch, TensorRT, ONNX, Git

Resume/CV: https://arpadvoros.com/cv.pdf

Github: https://github.com/arpadav

Linkedin: https://linkedin.com/in/arpadav

Website: https://arpadvoros.com/

Email: arpadav@gmail.com

Hey, I'm Arpad. I have a background in signal processing, deep learning (mainly computer vision), and embedded systems. I have 5+ years of experience in research and end-to-end edge AI development. My strong suit is GPU + embedded development, optimization, and architecture design.

Best fit: fast-moving teams, end-to-end ownership, collaborative.


looks like a cool project, but id say keep working on it since there seems to be some confusion on why someone would want to use this: no benchmarks and overall pretty vibe-codey (which id personally be very hesitant to use in production)

another comment already mentioned comparison to vortex, which is the same compression ratio and same speeds as youre claiming - but your compression is half of parquet. and if speed is the main goal youre going for, python is an interesting choice. no hate, but def keep working on it, and would love to see more concrete benchmarks with various columnar store types


> “is a mess”

then cites two examples where you have to write a couple extra args..

better title: “QOL changes i wish UV had”


That phrase and "Who designed this command line interface" are probably written for attention and clicks. The feedback content is useful and I agree with most of it but using such phrases diminishes the value of that feedback and invites defensiveness. I find uv's command line interface cumbersome for me too but I understand why it was written this way.


is this even comparable? lol


the main 4 i see are:

1. use-after-free, drop semantics vs manual cudaFree

2. kernel args enforced using `cuda_launch!` whereas CPP void* args is just an array of pointers, validating count only

3. aliased mutable writes. e.g. CPP can have more than one thread writing out[i] with same i and this will compile. but DisjointSlice<T> with ThreadIndex doesnt have any public constructor (see: https://github.com/NVlabs/cuda-oxide/blob/2a03dfd9d5f3ecba52...), so the only way to get an index is through the `index_1d`, `index_2d`, and `index_2d_runtime` APIs

4. im pretty sure you can cuda memcpy a std::string (or literally any other non-trivially-copyable type) and "corrupt" its state making it unusable. here it ONLY accepts DisjointSlice<T>, scalars, and closures (https://nvlabs.github.io/cuda-oxide/gpu-programming/memory-a...)

but most of the nitty gritty is in these sections

* https://nvlabs.github.io/cuda-oxide/gpu-safety/the-safety-mo...

* https://nvlabs.github.io/cuda-oxide/gpu-programming/memory-a...

edit: that being said, not like this catches everything, it just looks to give many more guardrails against UB than raw .cu files
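To make points 1 and 3 a bit more concrete, here is a minimal host-only sketch of the two Rust patterns involved: a `Drop` impl that releases a resource exactly once, and an index type whose only constructor is a checked function. Everything in it (`DeviceBuffer`, `Token`, `token_for_thread`, `write_at`) is a hypothetical name made up for illustration, not cuda-oxide's API, and no GPU is touched. Unlike a real per-thread index, nothing here stops a caller from asking for the same token twice, so it only shows the private-constructor idea, not a full disjointness guarantee.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a device allocation (plain host memory here).
/// `frees` counts how often the release ran, playing the role of cudaFree.
pub struct DeviceBuffer {
    data: Vec<f32>,
    frees: Rc<Cell<usize>>,
}

impl DeviceBuffer {
    pub fn new(len: usize, frees: Rc<Cell<usize>>) -> Self {
        Self { data: vec![0.0; len], frees }
    }

    pub fn as_mut_slice(&mut self) -> &mut [f32] {
        &mut self.data
    }
}

impl Drop for DeviceBuffer {
    // Runs once when the owner goes out of scope. The compiler rejects any
    // use of the buffer after it has been dropped or moved.
    fn drop(&mut self) {
        self.frees.set(self.frees.get() + 1);
    }
}

mod index {
    /// Index token with a private field: code outside this module cannot
    /// build one from an arbitrary integer.
    pub struct Token(usize);

    impl Token {
        pub fn get(&self) -> usize {
            self.0
        }
    }

    /// The only way to obtain a token; returns None when out of bounds.
    pub fn token_for_thread(thread_id: usize, len: usize) -> Option<Token> {
        if thread_id < len {
            Some(Token(thread_id))
        } else {
            None
        }
    }
}

/// Writes through a token, consuming it so the same token cannot be reused.
pub fn write_at(out: &mut [f32], idx: index::Token, value: f32) {
    out[idx.get()] = value;
}
```

Creating a `DeviceBuffer` in an inner scope and checking the counter after the scope ends shows the release running once, with no explicit free call anywhere.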


This is amazing.. ive been working with custom CUDA kernels and https://crates.io/crates/cudarc for a long time, and this honestly looks like it could be a near drop-in replacement.

im especially curious how build times would compare? Most Rust CUDA crates obv rely on calling CMake or nvcc, which can make compilation painfully slow. coincidentally, just last week i was profiling build times and found that tools like sccache can dramatically reduce rebuild times by caching artifacts - but you still end up paying for expensive custom nvcc invocations (e.g. candle by hugging face calls custom nvcc command in their kernel compilation): https://arpadvoros.com/posts/2026/05/05/speeding-up-rust-whi...


Cudarc slaps!

> Most Rust CUDA crates obv rely on calling CMake or nvcc, which can make compilation painfully slow.

I anecdotally haven't hit this; see the `cuda_setup` crate I made to handle the build scripts; it is a simple `build.rs` which only recompiles if the file changes, and it's a tiny compile time (compared to the rust CPU-side code)
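For anyone who hasn't written one, a rough sketch of what a "only recompile if the file changes" build script boils down to. This is not the `cuda_setup` crate's actual code or API; the helper names `rerun_directives` and `nvcc_args` and the `kernels/add.cu` path in the comment are made up for illustration. The only real interfaces assumed are Cargo's `cargo:rerun-if-changed=` directive and nvcc's `--ptx` and `-o` flags.

```rust
use std::path::Path;

/// Builds the directive lines a build script prints so that Cargo reruns it
/// only when one of the listed kernel sources changes.
pub fn rerun_directives(kernels: &[&str]) -> Vec<String> {
    kernels
        .iter()
        .map(|k| format!("cargo:rerun-if-changed={k}"))
        .collect()
}

/// Assembles an nvcc argument list that compiles one `.cu` file to PTX
/// inside `out_dir`, naming the output after the source file stem.
pub fn nvcc_args(src: &str, out_dir: &str) -> Vec<String> {
    let stem = Path::new(src)
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("kernel");
    vec![
        "--ptx".to_string(),
        src.to_string(),
        "-o".to_string(),
        format!("{out_dir}/{stem}.ptx"),
    ]
}

// In a real build.rs, main() would print each directive with println! and
// pass nvcc_args("kernels/add.cu", &out_dir) to
// std::process::Command::new("nvcc"), with out_dir read from the OUT_DIR
// environment variable that Cargo sets for build scripts.
```

With the directives in place, touching only the Rust sources leaves the kernels alone, which is why the kernel step can stay a small part of the total build time.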


i'll have to check this out, thanks!


Do other people agree cuda-oxide looks like a near drop-in replacement for cudarc?

That would be amazing, but probably not; imo the two are more complementary.

I am curious what distinguishes cuda-oxide, beyond it being totally under nv control.


perhaps not drop-in, but all my workflows with cudarc have always been "i make cuda kernel, i use cudarc for ffi to said kernels, i call via rust" - which for this case is pretty analogous

briefly looking at the repo, looks like the main workflow is using rustc-codegen-cuda to convert rust -> MIR -> pliron IR -> LLVM IR -> PTX, which is embedded in the host binary, where then cuda-core loads embedded PTX at runtime onto the GPU

but, if you arent directly making cuda kernels and just want cudarc for either calling existing kernels or other cuda driver api access, then cudarc is the lighter-weight option? or just use one of the sub-crates in this repo like cuda-core for those apis


Hi, author of cuda-oxide here. Yes, I think that’s basically the right framing: cudarc and cuda-oxide sit at different points in the stack.

cudarc is a host-side CUDA API for Rust: loading modules, managing contexts/streams/events/memory, launching kernels, and accessing CUDA libraries/driver APIs. If your workflow is “I already have CUDA C++/PTX/CUBIN kernels and want to call them from Rust”, cudarc is a very natural fit.

cuda-oxide is focused on the other side of the problem: writing the GPU kernel itself in Rust and compiling it through rustc/MIR into GPU code. The generated PTX is then embedded in the host binary and loaded at runtime by our host-side pieces.

We include cuda-core/cuda-host because we need an end-to-end path for “write Rust kernel, build it, launch it”, but that doesn’t mean the generated PTX is tied forever to our launcher. We’d like the PTX from cuda-oxide to be usable from other host-side CUDA APIs too, including cudarc, and we’re exploring ways to make that interop smoother.

So the short version is: cudarc is about driving CUDA from Rust; cuda-oxide is about generating CUDA device code from Rust. They’re complementary rather than replacements for each other.

We also have a short ecosystem note in the book that talks about cudarc: https://nvlabs.github.io/cuda-oxide/appendix/ecosystem.html#...


I am observing the same from the article... is it heavily inspired by Cudarc, i.e. is this intentional, or are we reading too much into this, given Cudarc is a light abstraction over the CUDA api?


daily driver has been zed ever since they introduced helix mode. still super excited to see how far it can go. congrats to them


They added "gw" (amp jump) in 1.1.2 preview! It's just amazing. It was the last thing I needed before totally switching over lol


this looks awesome


Thanks! Please try it out. Stop by in the Discord or Github Issues if you have any questions!


i've had my shot at sycamore a number of times. IMO leptos (leptos.dev) has far more fine-grained capabilities, and dioxus (dioxuslabs.com) is overall more hand-holdy but also powerful. it comes with a tradeoff in speed, though. wasm still isnt there (yet..) but a lot more web frameworks (including smaller rust ones) can be tracked here: https://krausest.github.io/js-framework-benchmark/current.ht...

