You are missing key points here. "reproduce" means produce the same. Not just tr...

Sayrus · on Sept 27, 2024

Reproducible builds are not a requirement for open source software, why is it one for open source models?

wrs · on Sept 27, 2024

I would say that functionally reproducible builds are sort of inherent in the concept of “source”. When builds are “not reproducible” that typically just means they’re not bit-for-bit identical, not that they don’t produce the same output for a given input.

prophesi · on Sept 28, 2024

Once neural networks enter the scene, I don't think giving the same output for a given input is possible in the field currently. I believe this is as open as language models can be, and what people mean when they say it's a "fully open source" model.

Zamiel_Snawley · on Sept 28, 2024

If they provide the training code, and data set, how is that not enough to reproduce functionally equivalent weights? I don’t have any experience in the AI field, what else would they need to provide/define?

As others have mentioned, reproducible builds can be quite difficult to achieve even with regular software.

Compiler versions, build system versions, system library versions, time stamps, file paths, and more often contribute to getting non-identical yet functionally equivalent binaries, but the software is still open source.

worewood · on Sept 27, 2024

How often do people expect to compile open-source code and get _exactly_ the same binary as the distributed one? I've seen this kind of restriction only on decompilation projects e.g. the SM64 decompilation -- where they deliberately compare the hashes of original vs. compiled binaries, as a way to verify the decompilation is correct.

It's an unreasonable request with ordinary code, even more with ML where very few ones have access to the necessary hardware, and where in practice, it is not deterministic.

bubaumba · on Sept 29, 2024

> How often do people expect to compile open-source code and get _exactly_ the same binary

_Always_, with the right options. And that's the key point. If distributed code is different it means it may be infected or altered in other way. In other words it cannot be trusted.

The same with models, if they are not reproducible or verifiable they cannot be trusted. Trust is the main feature of open source. Calling black box with attached data 'open source', even 'the first' is a bit of a stretch. It's not reproducible and not verifiable. And it's definitely not the first model with open data.

To be correct you should add 'untrusted' if you want to call this thing 'open source'. Like with Meta's models who knows what it holds.

PS: finally I'm negative, fanboys don't like it ;-)

e12e · on Sept 27, 2024

I expect that if I compile your 3d renderer, and feed it the same scene file you did - I get the same image?

TylerE · on Sept 28, 2024

Why would you expect that? 3D renderers are not generally deterministic. Many will incorporate, for instance, noise algorithms. They will frequently not produce byte-identical renders on the same hardware using the same binary.

e12e · on Sept 29, 2024

Same recognizable image? Like if you look at the povray benchmark image on Linux and Windows?

For an llm that should (ideally) translate to similar answers?

bavell · on Sept 28, 2024

I think you are erroneously conflating open source with deterministic builds.

Yes, there is a random element when "producing the binary" but that doesn't mean it isn't open source.