throwaway314155 23 days ago | on: Meta Segment Anything Model 3

There aren’t any YOLO models for captioning, and the other models aren’t robust enough to make for good embedding models.

Glemkloksdjf 22 days ago

You can get labels out of the classifier and bounding box models.

They are super fast.
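As a rough sketch of what "getting labels out" could look like: assume a detector has already returned (class label, confidence, box) tuples. The detections below are made-up values and `labels_from_detections` is a hypothetical helper, not part of any library.

```python
from collections import Counter

# Hypothetical detector output: (class label, confidence, (x1, y1, x2, y2)).
# A real bounding box model would produce these; the values here are made up.
detections = [
    ("dog", 0.91, (34, 50, 210, 300)),
    ("dog", 0.78, (220, 60, 400, 310)),
    ("frisbee", 0.66, (150, 20, 190, 55)),
    ("person", 0.22, (0, 0, 40, 120)),
]


def labels_from_detections(dets, min_conf=0.5):
    """Count confident detections per class and return tags, most frequent first."""
    counts = Counter(label for label, conf, _box in dets if conf >= min_conf)
    return [f"{label} x{n}" if n > 1 else label for label, n in counts.most_common()]


tags = labels_from_detections(detections)
print(tags)  # ['dog x2', 'frisbee']
```

The result is a bag of tags rather than a sentence, so it is not a caption, but it is cheap to compute and easy to index or search.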

It's just an alternative I'm mentioning. I'm assuming the person knows a little bit about that domain.

Otherwise the first option would be CLIP, I assume. An LLM-VL is just super slow and compute-intensive.
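For context, CLIP-style labeling boils down to comparing one image embedding against the embeddings of a few candidate text prompts. Here is a minimal sketch of just that scoring step, assuming the embeddings already exist: the three-dimensional vectors are toy values, `zero_shot_label` is a hypothetical helper, and the temperature is an assumed constant rather than a value taken from any real model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_label(image_emb, text_embs, temperature=0.01):
    """Pick the prompt whose embedding is closest to the image embedding.

    Returns the best prompt and a softmax distribution over all prompts.
    """
    sims = {label: cosine(image_emb, emb) for label, emb in text_embs.items()}
    top = max(sims.values())
    # Subtract the max before exponentiating for numerical stability.
    exps = {label: math.exp((s - top) / temperature) for label, s in sims.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return max(probs, key=probs.get), probs


# Toy embeddings standing in for real image and text encoder outputs.
image_emb = [0.9, 0.1, 0.2]
text_embs = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a cat": [0.1, 1.0, 0.0],
    "a photo of a car": [0.0, 0.2, 1.0],
}

best, probs = zero_shot_label(image_emb, text_embs)
print(best)  # a photo of a dog
```

The scoring itself is trivial; the cost is in the encoder forward passes, and the text embeddings can be computed once and reused for every image.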
