
We benchmarked Gemini 2.5 on 100 open source object detection datasets in our paper: https://arxiv.org/abs/2505.20612 (see table 2)

Notably, performance on out-of-distribution data like that in RF100VL is substantially degraded.

Zero-shot, it worked really well relative to the rest of the foundation-model field, achieving 13.3 average mAP. Counterintuitively, though, performance degraded when it was given visual examples to ground its detections, and again when it was given textual instructions on how to find the objects as additional context. So it seems the model has had some zero-shot object detection training, probably on a few standard datasets, but isn't smart enough to incorporate additional context or its general world knowledge into those detection abilities.
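For anyone unfamiliar with the metric: mAP is average precision (area under the precision-recall curve, with predictions matched to ground truth by IoU) averaged over classes and datasets. Here's a minimal sketch of AP at a single IoU threshold for one class; real benchmarks like COCO use interpolated precision and average over multiple IoU thresholds, so treat this as illustrative only:

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, iou_thresh=0.5):
    # preds: list of (confidence, box); gts: list of ground-truth boxes.
    # Greedy one-to-one matching in descending confidence order.
    preds = sorted(preds, key=lambda p: -p[0])
    matched = [False] * len(gts)
    ap, tp_cum, fp_cum, prev_recall = 0.0, 0, 0, 0.0
    for score, box in preds:
        best, best_i = 0.0, -1
        for i, g in enumerate(gts):
            if not matched[i] and iou(box, g) > best:
                best, best_i = iou(box, g), i
        if best >= iou_thresh:
            matched[best_i] = True
            tp_cum += 1
        else:
            fp_cum += 1
        # Accumulate area under the precision-recall curve.
        recall = tp_cum / len(gts)
        precision = tp_cum / (tp_cum + fp_cum)
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

The "average" in mAP then comes from averaging this over every class, and in the paper's setup, over all 100 datasets.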
