Hacker News

This only works until it doesn't. Start with a model that simply hasn't been trained on anything your shareholders find objectionable, and there will be nothing to reveal with abliteration.
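For context on why that follows: abliteration removes a "refusal direction" from a model's weights rather than adding any knowledge, so a model that never learned the content has nothing to recover. A minimal numpy sketch of the idea, with toy shapes and random data standing in for a real model's activations and weights:

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Estimate a 'refusal direction' as the normalized difference of
    mean activations on harmful vs. harmless prompts."""
    d = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def abliterate(W, r):
    """Orthogonalize a weight matrix against direction r, so the layer
    can no longer write the refusal component into its output:
    W' = (I - r r^T) W."""
    r = r / np.linalg.norm(r)
    return W - np.outer(r, r) @ W

# Toy demo (hypothetical shapes, not a real model):
rng = np.random.default_rng(0)
harmful = rng.normal(size=(32, 8)) + np.array([3.0] + [0.0] * 7)
harmless = rng.normal(size=(32, 8))
r = refusal_direction(harmful, harmless)
W = rng.normal(size=(8, 8))  # stand-in for an output-projection matrix
W_abl = abliterate(W, r)
# The refusal component of every output is now (numerically) zero:
print(np.allclose(r @ W_abl, 0.0))  # True
```

The key point is that the edit only projects out a direction already present in the weights; it cannot surface capabilities the training data never put there.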


Maybe there exists a dataset consisting entirely of objectionable content, so people can fine-tune neutered models on it?


PH maybe?


More like Literotica.


I mean not only sex but also swearing, drugs, violence, etc. Basically everything R-rated (but not illegal) that usually gets censored.


PH is not porn-only. A significant portion of its content is not porn either.


Such models would actually run against these companies' long-term interest in automating away the work currently done by humans.



