> Can someone smarter than me explain what this is about?
I think you can find the answer under point 3:
> In this work, our primary goal is to show that pretrained text-to-image diffusion models can be repurposed as object trackers without task-specific finetuning.
Meaning that you can track objects in videos without using specialised ML models trained for video object tracking.
All of these emergent properties of image and video models lead me to believe that the evolution of animal intelligence around motility and visually understanding the physical environment might be "easy" relative to other "hard steps".
As an eye gets more complex, the brain evolves not just around the physics and chemistry of optics, but also around rich feature sets: predator/prey labels, tracking, movement, self-localization, distance, etc.
These might not be separate things. These things might just come "for free".
So the brain does not necessarily receive 'raw' images to process in the first place; a lot of high-level data, such as optical flow for detecting moving objects, has already been extracted at that point.
And the occipital lobe is organized around extraordinary levels of image separation: the input broken down into tiny areas, scattered and woven together for details of motion, gradient, contrast, etc.
If you train a system to memorize A-B pairs and then normally use it to find B when given A, it's not surprising that finding A when given B also works: you trained it in an almost symmetrical fashion on A-B pairs, which are, obviously, also B-A pairs.
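A toy sketch of that symmetry argument (all names and the setup are hypothetical, not from any cited paper): if the learned A-B association is modeled as a symmetric similarity score between embeddings, then the A→B and B→A directions are literally the same quantity.

```python
import numpy as np

# Hypothetical illustration: stand in for the learned A-B association
# with a cosine-similarity score between two embedding vectors.
rng = np.random.default_rng(0)
a = rng.normal(size=16)  # embedding of some item A
b = rng.normal(size=16)  # embedding of its paired item B

def sim(u, v):
    # Cosine similarity; elementwise products commute,
    # so sim(u, v) == sim(v, u) exactly.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(sim(a, b) == sim(b, a))  # prints True: both directions score identically
```

So any retrieval rule built on such a score inherits the B→A direction "for free" from training on A→B pairs.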
I got the same impression from the title, but it seems it's just a compilation of different breaches.
The line "scraped out of thousands of Telegram channels" means "someone downloaded data-breach files shared on Telegram", not "scraped public channels for e-mail addresses" as I first interpreted it.
I noticed that it tends to forward the user to external sources more (answering the query, then "For further info, just ask an expert"), or tries to get the user to do the work ("Here is a nice overview of the program, now you do the rest of the coding").
If I don't want the RLHF to get in my way, I switch over to the API (sadly not the 4.0 one).
I also noticed a decline in instruction following. I have a primer I pre-seed my chats with.
The primer ends with "Do you understand? [Y|N]". ChatGPT 3.5 usually answered with a summary; ChatGPT 4.0 in the beginning just wrote "Y".
Now it behaves like 3.5, answering with a summary instead of a "Y". I adjusted the prompt to "Confirm instructions with a short and precise "Ok".", which seems to work.
It's still running, currently with 393 users and a total of 250,000 clicks. The organic per-user average seems to be around 50 clicks, so I think someone is trying to stress-test the system.
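The arithmetic behind that suspicion (using only the numbers above; the factor-of-ten threshold is an arbitrary choice for illustration):

```python
# Sanity check: 393 users producing 250,000 clicks is far above
# an organic baseline of ~50 clicks per user.
users, clicks, organic_avg = 393, 250_000, 50
avg = clicks / users
print(round(avg))               # prints 636, i.e. ~636 clicks per user
print(avg > 10 * organic_avg)   # prints True: an order of magnitude over baseline
```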