
There aren’t any YOLO models for captioning, and the other models aren’t robust enough to make good embedding models.


You can get labels out of the classifier and bounding box models.

They are super fast.

It's just an alternative I'm mentioning. I'm assuming someone who knows a little bit of that domain.

Otherwise the first option would be CLIP, I assume. Vision-language LLMs (llm-vl) are just super slow and compute-intensive.
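
To make the "labels out of the bounding box models" idea concrete, here is a rough sketch of collapsing per-box detections into image-level tags. The input shape (a list of `(class_name, confidence)` pairs), the function name `detections_to_tags`, and the `min_conf` threshold are all assumptions for illustration, not the output format of any particular YOLO release.

```python
from collections import Counter


def detections_to_tags(detections, min_conf=0.5):
    """Collapse per-box detections into (label, count) tags.

    `detections` is a hypothetical list of (class_name, confidence) pairs,
    one per bounding box. Real detectors return richer result objects.
    """
    # Keep only boxes at or above the confidence threshold, then count labels.
    counts = Counter(name for name, conf in detections if conf >= min_conf)
    # Most frequent labels first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up example boxes, not real model output.
    boxes = [("person", 0.91), ("person", 0.78), ("dog", 0.66), ("kite", 0.31)]
    print(detections_to_tags(boxes))
```

The resulting tags could be indexed as plain keywords, which is much cheaper than captioning, but it only covers the classes the detector was trained on.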



