Hacker News

We need living datasets for computer vision.

There is currently no mechanism for updating something like ImageNet. There is no place to go to point out problems or contribute changes that keep it accurate.

We really need to make datasets social. This comes with a whole host of challenges like copyright, ownership, versioning, and even hosting costs. Even with all that, it is a tractable problem.
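
To make the versioning part concrete, here is a minimal sketch of what a correctable, versioned label set could look like. Everything in it (`LivingDataset`, `Correction`, the item ids and labels) is made up for illustration and is not an existing tool or the way any real dataset is maintained. The idea is just that every accepted correction produces a new content hash, so anyone can say exactly which version they trained on.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Correction:
    """A proposed label fix (hypothetical structure)."""
    item_id: str
    new_label: str
    author: str
    reason: str


@dataclass
class LivingDataset:
    """Toy label store: item_id -> label, plus a log of accepted corrections."""
    labels: dict
    history: list = field(default_factory=list)  # (version_id, Correction) pairs

    def version(self) -> str:
        # Content hash of the labels; it changes whenever any label changes.
        blob = json.dumps(self.labels, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()[:12]

    def apply(self, c: Correction) -> str:
        # Accept a correction and return the id of the resulting version.
        if c.item_id not in self.labels:
            raise KeyError(f"unknown item: {c.item_id}")
        self.labels[c.item_id] = c.new_label
        new_version = self.version()
        self.history.append((new_version, c))
        return new_version


ds = LivingDataset(labels={"img_001": "cat", "img_002": "dog"})
v0 = ds.version()
v1 = ds.apply(Correction("img_002", "wolf", "alice", "mislabeled"))
print(v0, "->", v1)
```

This leaves out the hard parts (review, copyright, hosting), but it shows that the bookkeeping itself is small.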



> the data drifts

In fact, the data itself is the closest thing to reality, not the ML model...

For me, the conundrum is in the flow from reality to decision:

Reality -> data that reflects reality (hopefully as best it can) -> train model based on data (typically takes a non-trivial time scale!) -> make decision / do stuff based on algo with newest data

If you ask me, even with all the technology we have, it seems impossible to simultaneously acquire real-time data and train a model to operate and/or make decisions on exactly that data. It's a catch-22: there will always be a lag.

Even as humans with super big brains, we can't hope to do this outside of extremely simple tasks like "throw and catch the ball".
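
To put rough numbers on that lag, here is a back-of-the-envelope sketch. The durations are hypothetical, and it assumes a simple setup where a snapshot of the data is taken, a model is trained and deployed, and that model then serves until the next one replaces it.

```python
def staleness_hours(train_hours, deploy_hours, serve_hours):
    """Age of the newest training data at decision time, as (best, worst) case.

    Best case: a decision made the moment the model goes live.
    Worst case: a decision made just before the next model replaces it.
    """
    freshest = train_hours + deploy_hours
    stalest = freshest + serve_hours
    return freshest, stalest


# Hypothetical numbers: 6 h to train, 1 h to deploy, retrained once a day.
best, worst = staleness_hours(train_hours=6, deploy_hours=1, serve_hours=24)
print(best, worst)  # 7 31
```

Shrinking any of the three terms helps, but none of them can reach zero, which is the catch-22.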


Continuously training a model is not exactly a hard problem, although it may be costly. You can even train the model on every interaction it has, but this quickly leads to degradation because users feed it low-quality data, for example when they intentionally try to make chatbots say racist things.
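
A toy sketch of that failure mode, and of the obvious mitigation of gating updates behind a quality check. The "model" here is just a bigram counter, and `BLOCKLIST`, `is_acceptable` and `OnlineBigramModel` are hypothetical names with placeholder tokens; a real system would need a far better filter than a word list.

```python
from collections import Counter, defaultdict

BLOCKLIST = {"slur1", "slur2"}  # placeholder tokens, purely illustrative


def is_acceptable(text):
    # Crude quality gate: at least two tokens and no blocklisted token.
    tokens = text.lower().split()
    return len(tokens) >= 2 and not BLOCKLIST.intersection(tokens)


class OnlineBigramModel:
    """Learns next-word counts from every interaction it is shown."""

    def __init__(self):
        self.next_counts = defaultdict(Counter)
        self.seen = 0
        self.rejected = 0

    def update(self, text, gate=is_acceptable):
        # Train on one interaction; skip it if the gate rejects it.
        self.seen += 1
        if gate is not None and not gate(text):
            self.rejected += 1
            return False
        tokens = text.lower().split()
        for a, b in zip(tokens, tokens[1:]):
            self.next_counts[a][b] += 1
        return True

    def predict_next(self, word):
        counts = self.next_counts.get(word.lower())
        return counts.most_common(1)[0][0] if counts else None


stream = [
    "cats are cute",
    "cats are slur1",
    "cats are slur1",
    "cats are slur1",
    "cats are fluffy",
    "cats are cute",
]

gated, ungated = OnlineBigramModel(), OnlineBigramModel()
for message in stream:
    gated.update(message)               # filtered updates
    ungated.update(message, gate=None)  # trains on everything

print(gated.predict_next("are"), ungated.predict_next("are"))
```

The ungated model ends up repeating whatever the most persistent users fed it, while the gated one ignores those three messages.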


HuggingFace is almost there. Uploading a re-trained or fine-tuned model is trivial. Same with datasets.



