
I mean, I'm sure cramming in synthetic data and scaling models to improve in-model arithmetic, memory, etc. makes "alignment" appear more complex and model behavior more non-Newtonian, so to speak, but it's going to boil down to censorship one way or another. Or an NSP-style approach where you enforce a policy over activations using a separate model (rough sketch of that below), and so on.
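
To make the activation-policy idea concrete, here's a minimal PyTorch sketch: a small, separately trained probe scores a hidden layer's activations through a forward hook and zeroes out the ones it flags. The layer sizes, the probe, and the 0.5 threshold are all invented for illustration, not anyone's actual method.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Stand-in for one hidden layer of a larger model.
    base_layer = nn.Linear(16, 16)

    # Separate "policy" probe, trained independently of the base model.
    policy_probe = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())

    THRESHOLD = 0.5  # hypothetical cutoff for "disallowed" activations

    def policy_hook(module, inputs, output):
        # Score each activation vector; zero out the ones the probe flags.
        score = policy_probe(output)        # shape (batch, 1), values in [0, 1]
        keep = (score < THRESHOLD).float()  # 1 = allowed, 0 = vetoed
        return output * keep                # returned value replaces the layer output

    base_layer.register_forward_hook(policy_hook)

    x = torch.randn(4, 16)
    filtered = base_layer(x)     # the policy now rides along on every forward pass
    print(filtered.norm(dim=1))  # vetoed rows come out as all zeros

The mechanics don't matter much; the point is that the policy lives in a second model bolted onto the first, which is where the "and so on" comes from.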

Which is the bigger problem: trying to apply qualitative policies to training data, activations, and outputs, rather than the approach ML folks consider primarily appropriate (i.e., NN training), or scaling hardware and exploring activation architectures with more effective representation[0] to just make a better model? If you go after the data and cascade a model in to rewrite history, that's obviously going to be expensive, but easy. Going after outputs is cheap and easy but not terrifically effective (toy sketch below)... but do we leave the gears rusty? Probably we shouldn't.
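
For the output route, the cheap version really can be as dumb as a post-hoc check over the generated text; the blocklist and refusal string below are invented for the example (real output filters are usually learned classifiers, but the trade-off is the same):

    BLOCKED_TERMS = {"secret_token", "internal_only"}  # hypothetical policy terms

    def filter_output(generated_text: str) -> str:
        lowered = generated_text.lower()
        if any(term in lowered for term in BLOCKED_TERMS):
            return "[response withheld by output policy]"
        return generated_text

    print(filter_output("Here is the secret_token you asked for"))  # withheld
    print(filter_output("Nothing to see here"))                     # passes through

Cheap and easy, and trivially routed around by rephrasing, which is exactly why it's not terrifically effective.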

It's obfuscation to assert that some greater policy must be applied to models beyond the modeling that happens automatically, unless there's a specific outcome you intend to prevent; at this point that mostly means censorship, though optimistically maybe you can prevent the model from lying. So far, such applications of policy have primarily produced solutions that reduce model efficacy and universality.

[0] https://news.ycombinator.com/item?id=35703367


