Almost feel like this was intentionally framed this way to build more engagement (via comments where it's posted). It's pretty well known that DALL-E and Stable Diffusion are bad at text and precise vector-style graphics. Do this on a professional art piece and let's see how much $10 gets you.
I think it's just that it's such a strange comparison to make, like making an article entitled, "Who's better at doing donuts in the parking lot: helicopters or planes?"
The image generation models weren't trained on chart images, everyone already knows they're gonna be bad at that. Fiverr artists will obviously be better, though even then, who the hell is paying people on Fiverr to draw generic charts?
If you wanted to compare them, it would make more sense to compare them based on how they're actually used (especially in the case of the AI models): to make art.
Though if your title were more specific, à la "DALL-E 2 vs $10 Fiverr Commissions: Who's Better at Charts?", you'd probably get somewhat fewer complaints. Having a generic title implies that you're going to be looking at common/primary use cases.
It's still stupid. This is like asking DALL-E to generate an image that solves a math equation step by step. Of course this is easier for a human to do.
Try getting a landscape in the style of Vincent van Gogh for $10 on Fiverr though. AI will give you that in seconds easily, and that's what's amazing about it.
I was in a meeting on Cognitive AI at the Royal Society in London last week where a gentleman from Stanford presented work where GPT-3 was prompted to solve math equations step-by-step and did well (better than I would have expected). Point being, if GPT-3 can do it, DALL-E should also be able to do it, and testing whether that is the case is not stupid, but interesting.
The big question with systems like those image generation models is to what extent their generation can be controlled, and how much sense it makes. This is exactly the kind of testing that has to be done to answer such questions. Just flooding social media with cherry-picked successes doesn't help answer any questions at all. Because cherry-picking never does.
To be honest, I don't get the defensiveness of the comments in this thread. Half the comments are trying to call foul by invoking some rule they made up on the spot, according to which "that's not how you should use it". The other half pretend they knew all along what the result would be, and yet they're still upset that someone went and tried it, and posted about it. That kind of reaction is not coming from a place of inquisitiveness, or curiosity, that is for sure. It's just some kind of sclerotic reaction to novelty, people throwing their toys because someone went and did something they hadn't thought about.
> Try getting a landscape in the style of Vincent van Gogh for $10 on Fiverr though.
In another comment posted in this thread I tried to get Stable Diffusion to give me a graph with three lines in the style of van Gogh and other famous artists. I'd be very curious to see what that would look like, and I can't easily imagine it, but I'm left wondering, because Stable Diffusion can't do it. Maybe I should ask someone on Fiverr.
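If anyone wants to repeat that kind of experiment, here's a minimal sketch of how you might generate the prompt variations systematically before feeding them to a model. The chart description and artist list are purely illustrative, not the exact prompts from my comment:

```python
# Build prompt variations: one fixed chart description crossed with
# a few artist styles. Swap in whatever subject/styles you want to test.
CHART = "a line graph with three lines, one red, one green, one blue"
STYLES = [
    "in the style of Vincent van Gogh",
    "in the style of Claude Monet",
    "as a flat vector illustration",
]

def make_prompts(subject: str, styles: list[str]) -> list[str]:
    """Return one prompt per style, appended to the subject."""
    return [f"{subject}, {style}" for style in styles]

prompts = make_prompts(CHART, STYLES)
for p in prompts:
    print(p)
```

Running each prompt through Stable Diffusion (e.g. via Hugging Face's `StableDiffusionPipeline` from the `diffusers` library) makes it easy to compare how much of the "chart" part survives each style.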
What you said was that they weren't trained on chart images, not that they weren't the focus:
> The image generation models weren't trained on chart images, everyone already knows they're gonna be bad at that.
I have no idea how you could even know what was, or wasn't, in those models' training sets. Yet you posted with conviction as if you were sure you knew. What's the point of that?
Edit - Also, what do you mean "it obviously wasn't the focus"? The focus of what? The focus of training, or the focus of presenting the results on social media?
This is absurdly silly. These data sets, built from web crawls, contain millions of images at a bare minimum, often billions, so of course there will be a non-zero number of charts in them. If you want to be pedantic about it, be my guest, I guess.
You could probably find a few driver's ed teachers who taught their students to do doughnuts too, but saying "driver's ed teachers don't teach their students to do doughnuts" would nonetheless be largely accurate.
Silly yourself. If there were simply a "non-zero" number of charts in them, the model wouldn't have, you know, modelled them. That the model can reproduce graphs is clear evidence that it saw enough graphs to reproduce them.
And don't call me silly just because you used imprecise language to try to make a vague point with great conviction as if you absolutely knew what you're talking about, when you absolutely didn't. Show some respect to the intellect of your interlocutor, will you?
And, seriously, you haven't answered my question: the focus of what? What do you mean by "it obviously wasn't the focus"?
I think you were emboldened by the downvoting of my comment and assumed you don't need to make sense, but I think the downvoters were downvoting something other than the question you refuse to answer.
My man, just look at the title. I clicked wondering if DALL-E made better anime characters than $10 Fiverr artists. But all I got was plots. Who in their right mind asks for plots on Fiverr?
Am I being engaged right now? Was your comment also meant to generate engagement? Hm.
I disagree and I downvoted you because I think you're being condescending and uncharitable.
I don't think OP chose graphs because they're "obviously" going to make AI look bad; I think he chose it because it's an incredibly simple image - extremely so. If the AI can't do this, how can you trust it to generate something complex? If it literally can't yet draw basic lines as described, how can it illustrate a story or any form of media where specifics matter?
And I don't think his post title implies that he was going to use some complex art prompt, either. Not in any way.
I agree that it's complex relative to what AI can currently handle (clearly) but I don't agree that it's complex in general. For a human, it's a simple description. You or I could draw it freehand correctly given 5 minutes, with no training or preparation.
I don't see how this isn't the task these AIs are supposed to solve. They are meant to take a text description and output a corresponding visual result. This just demonstrates the narrow limits on the complexity of the input they can take.
If you're saying they're not designed to deal with inputs more complex than one sentence, then sure, I guess I agree. But this post goes to show that if you require specificity in your desired visual output, then you need more than one sentence's worth of complexity, and therefore the current generation of AIs are not yet broadly usable.
It's about illustrating the current limitations. This post is not implying that the technology is a failure or that it isn't enormous progress.
> For a human, it's a simple description. You or I could draw it freehand correctly given 5 minutes, with no training or preparation.
In the blog post, the humans drew it incorrectly as well (although they got closer). If it were as simple as you say, I would not expect the humans to err either.
> If you're saying they're not designed to deal with inputs more complex than one sentence, then sure, I guess I agree.
Indeed. I would further say it's not designed for someone to use it as a text-directed paintbrush. This is not surprising, since human graphic artists don't work that way either, or at least get very pissed off when they're micromanaged in that fashion.
That said, I think it's also fairly obvious that these systems are not replacements for graphic artists in general. The human element is important for a lot of reasons; graphic artists don't just "draw pictures". I don't think people seriously familiar with these systems have ever seriously suggested they're a full replacement for graphic artists, although in fairness random internet commentators certainly have been having a moral panic over it.
Not to mention it's entirely possible that an AI designed more specifically for this task would do better.
> But this post goes to show that if you require specificity in your desired visual output, then you need more than one sentence's worth of complexity, and therefore the current generation of AIs are not yet broadly usable.
I don't really agree that this post showed that, but I would agree that these AIs are not the best tools if you have very specific, objective requirements.
AIs are tools, not magic. There are things they're good at, but they aren't good at everything, and they still need to be used with thought.
> It's about illustrating the current limitations. This post is not implying that the technology is a failure or that it isn't enormous progress.
I think the objection is that this article doesn't really demonstrate a meaningful limitation that wasn't obvious. It feels like a strawman. If DALL-E or Stable Diffusion actually succeeded at the task, I would be very impressed, and consider it much more impressive than most of the pretty pictures everyone shows off.
If he didn't do it deliberately to make DALL-E look bad, then he did it out of ignorance of what DALL-E's strengths and weaknesses are. Your evaluation of what is "simple" and what is "more complex" isn't in line with what DALL-E is capable of.
DALL-E isn't good with symbols like letters and numbers. It can't do much logical or mathematical reasoning either. So a graph is one of the worst possible choices.
What it can do is make aesthetically pleasing images that match basic descriptions. So there are more "complex" images that DALL-E can produce than basic graphs.
> Okay, well maybe it was impossible for anyone to deduce what I was trying to convey.
Don't know if this is evidence of "framing for more engagement," but this line irks me. The latent diffusion models are pretty powerful, but I don't think there's anyone claiming that today's diffusion models are able to interpret complicated queries better than humans. The interesting part of diffusion models is that they can produce good results at all, not that they are better than humans. We're not in AGI territory. Even text models are still limited in many ways, and latent diffusion is highly reliant on the text model to produce good results. Even simpler queries can run into quite a lot of problems, that's exactly why a lot of people have been trying to figure out the best prompts to improve results.
You could have asked those tools to create images like the ones found in prompt galleries like https://lexica.art/ and https://www.krea.ai/ and then compared them with what you can get for $10. That would be a comparison more favorable to the AI.