
G/A can't be used for large scale training, nobody is going to give their data to them. Major trust issues there.

Apple is Apple. Not public. Let's also not mix consumer needs with enterprise.

You're correct: from about eight years ago up until recently, AMD only cared about gamers. They are waking up fast though.

ROCm 5.6 is a visible first step in that regard. MI300 will blow A100s/H100s out of the water.

But again, hardware/software isn't the problem here. The problem is much deeper than that... even if you have those things resolved, you can't put them anywhere.



Nobody is going to give OpenAI or X or Meta access to their models but frankly Google/Amazon are at a scale where they’ve already bypassed this trust issue. People already give their code, their operations, etc to large cloud providers, it’s been that way for 10+ years now.

Your shit isn’t so good that google is going to peek under the covers and steal your shit, because that would actually implode their business when they got caught doing it. The net present value of all of google’s future decades of operation is a lot higher than your hot dog detector app, or even critical F500 business operations.


What you're saying is logical, but the perceived reality is different. There are large-scale AI customers out there who absolutely refuse to use the large public cloud providers for training on the grounds of protecting their data. They want 100% control over it and they want their own segregated data centers.


> Let's also not mix consumer needs with enterprise.

NVidia mixed them and now everything is written in CUDA. Lol.


And we now have hipcc to go back to AMD. Sweet!


Have fun with that. I burned my hand badly enough on OpenCL that I now know to wait for proof, not promises.


People are doing benchmarks on older ROCm releases and it is looking pretty good.

https://www.mosaicml.com/blog/amd-mi250

Waiting on the updates.

I'll add that I have learned over time not to discount motivation. If AMD is motivated, they can do it. This has been proven already with their dominance over the server CPU market.


hipcc is a joke, it doesn't handle everything CUDA is capable of, especially not the polyglot capabilities.


What do you mean by polyglot? As in multiple hardware, or mixed-source? HIP is mostly API-compatible with CUDA, so you can just mix host code and device code with it.

That said, ROCm does indeed work with machine code instead of IR. You can compile fat binaries with more than one type of machine code and they'll work on any of the chips you compiled for, at the cost of the binaries becoming, well, basically obese if you want a decent range of hardware supported.
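To make the fat-binary point concrete, here's a rough sketch of how that looks with hipcc's offload flags (the source file name is made up, and the exact arch list is just an illustration):

```shell
# One HIP source, three GPU ISAs baked into a single "fat" binary:
# gfx90a = MI200-series, gfx942 = MI300-series, gfx1030 = RDNA2 consumer.
# Every extra --offload-arch embeds another full copy of the device
# machine code, which is why broad hardware coverage makes the binary balloon.
hipcc kernel.cpp -o kernel \
    --offload-arch=gfx90a \
    --offload-arch=gfx942 \
    --offload-arch=gfx1030
```

At load time the runtime picks the code object matching the installed GPU; there's no IR to JIT-compile the way CUDA can fall back to PTX.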


> G/A can't be used for large scale training, nobody is going to give their data to them. Major trust issues there.

And yet, that's what a lot of the big AI startups are doing. Granted, it's not what everyday businesses are doing (yet). But TPUs offer a pretty impressive perf/cost ratio, so I'd be surprised if it's actually "nobody".


> that's what a lot of the big AI startups are doing

They don't have any other choice or they are just dumb...

https://www.popsci.com/technology/google-ai-lawsuit/


The fact that this lawsuit exists doesn’t prove anything.

Real evidence that Google or Amazon actually introspected the contents of their cloud platform customers' VMs, databases, GPUs, disks, blob storage buckets etc. would be far more convincing, but such evidence doesn't exist - because it doesn't happen.


It is enough to scare people away and that is all that matters in the grand scheme of things. I know this for a fact.



