
Midjourney is by far the most popular Discord server, with 19.5M+ members, and made $200M in revenue in 2023 with zero external investment and only 40 employees.

The problem has nothing to do with commercializing image gen AI and all to do with Emad/Stability having seemingly 0 sensible business plans.

Seriously this seemed to be the plan:

Step 1: Release SD for free

Step 2: ???

Step 3: Profit

The vast majority of users couldn't be bothered to take the steps necessary to get it running locally so I don't even think the open sourcing philosophy would have been a serious hurdle to wider commercial adoption.

In my opinion, a paid, easy to use, robust UI around Stability's models should have been the number one priority and they waited far too long to even begin.

There have been a lot of amazing augmentations of the Stable Diffusion models (ControlNet, Dreambooth, etc.) cropping up, and lots of free research and implementations because the research community has latched onto Stability's models, and I feel they failed to capitalize on any of it.



Leonardo.ai have basically done exactly this and seem to be doing OK.

It’s a shame because they’re literally just using Stable Diffusion for all their tech, but they built a nicer front end and incorporated ControlNet. No one else has done this.

Controlnet / instantID etc are the really killer things about SD and make it way more powerful than Midjourney, but they aren’t even available via the stability API. They just don’t seem to care.


InstantID uses a non-commercial licensed model (from insightface) as part of its pipeline so I think that makes it a no-go for being part of Stability's commercial service.


Yes, and I'm grateful to them for sticking to that plan. :-)

But for the individuals involved, it might also be

Step 2: Leverage fame in AI space for massive VC injection on favorable terms.


Yes, and MJ has no public API either. Same for Ideogram; I imagine they have at least $10M in the bank, and they aren't even bothering to make an API despite being SoTA in lots of areas.


> Seriously this seemed to be the plan:

> Step 1: Release SD for free

> Step 2: ???

> Step 3: Profit

That’s not true. He was pretty open about the business plan. The plan was to have open foundational models and provide services to governments and corporations that wanted custom models trained on private data, tailored to their specific jurisdictions and problem domains.


Was there any traction on this? I cannot imagine government services being early customers. What models would they want? Military, maybe, for simulation or training, but that requires focus, dedicated effort, and a lot of time. My 2c.


I've heard this pitch from a few AI labs. I suspect that they will fail, customers just want a model that works in the shortest amount of time and effort. The vast majority of companies do not have useful fine tuning data or skills. Consultancy businesses are low margin and hard to scale.


Here's a Stable Diffusion business idea: sign up all the celebrities and artists who are cool with AI, and provide end users / fans with an AI image generation interface trained on their exclusive likenesses / artwork (LoRAs).

You know, the old tried and true licensed merchandise model. Everybody gets paid.


> sign up all the celebrities and artists who are cool with AI

"Cool with AI" and "sell my likeness so nobody ever needs to hire me again" are too close for comfort on this one.


And also, here's a way to just make so much pornography of me.

I'm betting the list of folks who would sign the AI license are pretty small, and mostly irrelevant.


I think the following isn't said often enough: there must be a reason why extremely few celebrities and artists are cool with AI, and it cannot be something as abstract and bureaucratic as copyright concerns, although those are problematic.

It's just not there yet. GenAI outputs aren't something audiences want to hang on a wall. They're something that evokes a sense of distress. Otherwise everyone would at least be tracing them.


Most people mix up all the different kinds of intellectual property basically all the time[0], so while people say it's about copyright, I (currently) think it's more likely to be a mixture of "moral rights" (the right to be named as the creator of a work) and trademarks (registered or otherwise), and in the case of celebrities, "personality rights": https://en.wikipedia.org/wiki/Personality_rights

> It's just not there yet. GenAI outputs aren't something audiences wants to hang on a wall.

People have a wide range of standards. Last summer I attended the We Are Developers event in Berlin, and there were huge posters that I could easily tell were from AI due to the eyes not matching; more recently, I've used (a better version) to convert a photo of a friend's dog into a renaissance oil painting, and it was beyond my skill to find the flaws with it… yet my friend noticed instantly.

Also, even with "real art", Der Kuss (by Klimt) is widely regarded as being good art, beautiful, romantic, etc. — yet to me, the man looks like he has a broken neck, while the woman looks like she's been decapitated at the shoulder then had her head rotated 90° and reattached via her ear.

[0] This is also why people look at a Google street view image with a ©2017 Google[1] tiled over on a blue sky and say "LOL, Google's trying to own the sky", or why people even on this very forum ask how some new company can trademark a descriptive term like "GPT"[2], seemingly surprised by this being possible even though there's already a very convenient example of e.g. Hasbro already having "Transformers".

[1] https://www.google.com/maps/@33.7319434,10.8655264,3a,77.2y,...

[2] https://news.ycombinator.com/item?id=35692476


> Der Kuss (by Klimt) is widely regarded as being good art,

The point is, generative AI images are not widely regarded as good art. They're often seen as passable for some filler use cases and hard to tell apart from human generations, but not "good".

It's not not-there-yet because AI sometimes generates a sixth finger; it's something on another level from Gustav Klimt, Damien Hirst, Kusama Yayoi, or the likes[0]. It could be that genAI is leaving in something a human artist would filter out, or that the images are so disorganized that they appear to us to be encoding malice or other negative emotions, or maybe I'm just wrong and it's all about anatomy.

But whatever the reason is, IMO, it's way too rarely considered good, gaining too few supportive celebrities and artists and audiences, to work.

0: I admit I'm not well versed with contemporary art, or art in general for that matter


> The point is, generative AI images are not widely regarded as good art. They're often seen as passable for some filler use cases and hard to tell apart from human generations, but not "good".

> It's not not-there-yet because AI sometimes generates sixth fingers, it's something another level from Gustav Klimt

My point is: yes AI is different — it's better. (Or, less provocatively: better by my specific standards).

Always? No. But I chose Der Kuss specifically because of the high regard in which it is held, and yet to my eye it messes with anatomy as badly as if he had put six fingers on one of the hands (indeed, my first impression when I look closely at the hand of the man behind the head of the woman is that the fingers are too long and the thumb looks like a finger).


> and yet to my eye it messes with anatomy

wait what? Isn't that missing the point of expressionism? Klimt's Judith I is basically a photo; surely he could draw sh*t straight if he wanted to?

But myriad predecessors such as Vermeer, Rembrandt, Van Gogh, da Vinci, et al. had done enough in realism, and photography was becoming more viable and more prevalent, so artists basically started diversifying? Isn't that what led to the various early 20th century movements like surrealism (super-real-ism), cubism, etc.?

I don't mean offense, but surely that level of understanding can't be the basis of policy decisions when it comes to moral rights, licensing discussions, "artists should just use AI", and such???


I think you're conflating "good" in the sense of "competent" with "good" in the sense of "ethical" or "legal".

I am asserting here that the AI is (at its best) more competent, not any of the other things.

I suspect that the law will follow the economics, just as it often has done for everything else before — you're communicating with me via a device named after the job that the device made redundant ("computer").

But I said "often" not "always", because the business leaders ignoring the workers they were displacing 200 years ago led to riots, and eventually to the Communist Manifesto. I wouldn't discount this repeating.

--

I've just looked up "Judith I" (I recognise the art, just not the name), and I don't even understand why you're holding this up as an example of "basically a photo".

As for the other artists demonstrating realism: photography made realism redundant despite being initially dismissed as "not real art". Artists were forced to diversify, because a small box of chemistry was allowing unskilled people to do their old job faster, cheaper, and better. Photography only became an art in its own right when people found ways to make it hard, for example by travelling the world and using it to document their travels, or with increasingly complex motion pictures.

I suspect that art fulfils the same role in humans as tails fulfil in peacocks: an expensive signal to demonstrate power, such that the difficulty is the entire point and anything which makes it easy is seen as worse than not even trying. This is also why forgeries are a big deal, instead of being "that's a nice picture", and why an original painting can retain a high price despite (or perhaps because of) a large number of extremely cheap prints being plastered onto everything from dorm rooms to chocolate wrappers.


Why would those celebs pay Stability any significant money for this, given they can get it for a one-off payment of at most a few hundred dollars in salary/opportunity cost by paying an intern to gather the images and feed them into the existing free tools for training a LoRA?


I think in this case the celebs are getting paid for using their likeness.


That sounds like the "lose money on every sale" philosophy of the first dot-com bubble, only without even the "but make it up in volume" second half.


You can already do that with reference images, and even for inpainting. No training required. Also no need to pay actors outrageous sums to use their likeness in perpetuity for as long as you do business. The licensing is still tricky anyway, because even if the face is approved and certified, the entire body and surroundings would also have to be. Otherwise you've basically re-invented the celebrity deepfake porn movement. I don't see any A-lister signing up for that.


What's insane to me is the fact that the best interfaces to utilize any of these models, from open source LLMs to open source diffusion models, are still random gradio webUIs made by the 4chan/discord anime profile picture crowd.

Automatic1111, ComfyUI, Oobabooga. There's more value within these 3 projects than within at least 1 billion dollars worth of money thrown around on yet another podunk VC backed firm with no product.

It appears that no one is even trying to seriously compete with them on the two primary things that they excel at - 1. Developer/prosumer focus and 2. extension ecosystem.

Also, if you're a VC/Angel reading my comments about this, I would very much love to talk to you.


> $200M in revenue in 2023 with 0 external investments and only 40 employees.

The dream


Only if your costs are a lot lower than $200M, which given the price of GPU compute right now, is not guaranteed.


For a founder, maybe; definitely not for employees.

AI startups need a not-insignificant amount of startup capital; you cannot just spend weekends building like you would with a SaaS app. Model training is expensive, so only wealthy individuals can even consider this route.

Companies like that have no oversight or control mechanisms when management inevitably goes down crazy paths. Also, without external valuations, it's hard to ascertain the value of option vesting structures.


Sometimes you need to say fuck the money, I’ve already got enough, and I just want to do what I enjoy. It may not be an ideal model for HN but damn not everything in life is about grinding, P/E ratios, and vesting schedules


Yeah, that's easier to say when you have enough. A lot of employees might not be in that privileged position. The reality for some of these folks might be paying off student loans, families to take care of, tuition for kids, medical bills, etc.


Also oversize houses, expensive cars, etc...


I’m grateful to be debt free


As a counter-point, with no VCs there's more equity left for employees.


If valued at $2bn, then even 0.1% is $2m. Not bad.


Early employees may have gotten 2-3% and are completely undiluted.


Yes indeed! Probably furiously vesting now!


As a counter-counter-point that gets rarely discussed on HN, VCs aren't taking as much of the pie as people think. In a 2-founder, 4-engineer company, it wouldn't be unusual to have equity be roughly:

- 20% investors
- 70% founders
- 2-3% employees (1% emp1, 1% emp2, 0.5% emp3, 0.25% emp4)
- 7% reserved for future employees before the next funding round


This is not a fair comparison because you are not taking into account liquidation preferences. Those investors don't have the same class of equity as everyone else. That doesn't matter in the case of lights out success but it matters a great deal in many other scenarios.
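To make the liquidation-preference point concrete, here's a toy waterfall sketch (the 1x non-participating structure and all dollar figures are illustrative assumptions, not any real company's terms):

```python
# Toy liquidation waterfall: investors hold a 1x non-participating preference,
# so at exit they take the greater of their preference or their pro-rata share.

def waterfall(exit_value: float, invested: float, investor_pct: float,
              pref_multiple: float = 1.0) -> tuple:
    """Return (investor payout, common payout) in the same units as exit_value."""
    preference = invested * pref_multiple
    pro_rata = exit_value * investor_pct
    investor_take = min(exit_value, max(preference, pro_rata))
    return investor_take, exit_value - investor_take

# $20M invested for 20% of the company:
# at a $500M exit, pro-rata wins and common does fine;
# at a $25M exit, the preference eats most of the proceeds.
big = waterfall(500, 20, 0.20)    # (100.0, 400.0)
small = waterfall(25, 20, 0.20)   # (20.0, 5.0)
```

Same 20% on paper in both cases, but at the small exit the investors actually took 80% of the money, which is the parent's point.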


Sure. My point was that most employees think that VCs take 80+%, and especially the first few employees usually have no idea just how little equity they have compared to the founders.


You can't run a sustainable business if you take VC money. VCs need an exit.


There's money to be made for sure, and Stability's sloppy execution and strategy definitely didn't help them. But I think there are also industry-wide factors at play that make AI companies quite brittle for now.


Are we sure that Midjourney is still on that trajectory?

I was a heavy user since the beginning but my usage has dropped to almost 0


I think you're a sample size of one


I mean of course I am a sample size of one when I am speaking about my own experience?

That's why I asked that question to see if others notice something similar or if that's just me


>The vast majority of users couldn't be bothered to take the steps necessary to get it running locally so I don't even think the open sourcing philosophy would have been a serious hurdle to wider commercial adoption.

The more I think about the AI space the more I realize that open sourcing large models is pointless now.

Until you can reasonably buy a rig to run the model, there is simply no point in doing this. It's not like you will be edified by inspecting the weights either.

I think an ethical business model for these businesses is to release whatever model can fit on a $10,000 machine, and to keep the rest closed source until such a machine is able to run them.


The released image generation models run on consumer GPUs. Even the big LLMs will run on a $3500 Mac with reasonable performance, and the CPU of a dirt cheap machine if you don't care about it being slow, which is sometimes important and sometimes isn't.

Also, things like this are in the works:

https://news.ycombinator.com/item?id=39794864

Which will put the system RAM of the new 24-channel PC servers in range of the Nvidia H100 on memory bandwidth, while using commodity DDR5.


The "big" AI models are trillion-parameter models.

The medium-sized models like GPT-3 and Grok are 175B and 314B parameters respectively.

There is no way for _anyone_ to run these on a sub $50k machine in 2024, and even if you can the token generation speed on CPU is under 0.1 tokens per second.


It's just semantic gymnastics. I'm sure most people will consider LLaMa 70B a big model. Of course if you define big = trillion then sure big = trillion[1].

[1]: https://en.wikipedia.org/wiki/No_true_Scotsman


Yes, you are engaging in the no true scotsman fallacy, please stop.


You can get registered DDR4 for ~$1/GB. A trillion-parameter model in FP16 would need ~2TB. Servers that support that much are actually cheap (~$200); the main cost would be the ~$2000 in memory itself. That is going to be dog slow, but you can certainly do it if you want to, and it doesn't cost $50,000.
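The memory arithmetic here is easy to sketch (precision sizes are standard; the comparison figures are illustrative):

```python
# Weight memory = parameter count x bytes per parameter.
# FP16 = 2 bytes, q8 = 1 byte, q4 = 0.5 bytes per parameter
# (ignoring small per-block quantization overhead).

def weights_gb(params_b: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB for params_b billion parameters."""
    return params_b * bytes_per_param

print(weights_gb(1000, 2.0))   # 1T params in FP16 -> 2000 GB (~2 TB)
print(weights_gb(1000, 0.5))   # same model at q4  -> 500 GB
print(weights_gb(314, 2.0))    # Grok-1 in FP16    -> 628 GB
```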


Even looking on Amazon, DDR4 still seems to be $2/GB or more:

2 x 32GB: $142

2 x 64GB: $318

8GB: $16

2 x 16GB: $64

2TB of 128GB DDR4 ECC: $9,600 (https://www.amazon.com/NEMIX-RAM-Registered-Compatible-Mothe...)

> Servers that support that much are actually cheap (~$200)

What does this mean? What motherboards support 2TB of RAM at $200? Most of them are pushing $1,000. With no CPU.

It may not hit $50K, but it's definitely not going to be $2K.


Here's a server that supports 3TB of memory for $130, you get 3TB by filling all 24 memory slots with 128GB LRDIMMs, 2TB with 16:

https://www.ebay.com/itm/176298520843

Here are 128GB LRDIMMs for $98:

https://www.ebay.com/itm/196305803969

For 2TB and the server you're at $1698. You can get a drive bracket for a few bucks and a 2TB SSD for $100 and have almost $200 left over to put faster CPUs in it if you want to.

That's stinking Optane; it would work if you're desperate. Normal 128GB LRDIMMs cost more than other DDR4 DIMMs. You can, however, get DDR4 RDIMMs for ~$1/GB:

https://www.ebay.com/itm/186345903230

With 32GB RDIMMs that machine would max out at 768GB, which could still run a 1T model at q4 or grok at FP16. And then it would cost less than $1000.

Or find a quad-socket system with 48 memory slots and then use 64GB LRDIMMs ($1.12/GB):

https://www.ebay.com/itm/176299295509

The quad socket systems aren't $200, but you can find them for $550 or so:

https://www.newegg.com/hp-proliant-rack-mount/p/2NS-0006-3E5...

Maybe less if you shop around (they're not as common).


How slow? Depending on the task I fear it could be too slow to be useful.

I believe there is some research on how to distribute large models across multiple GPUs, which could make the cost less lumpy.


You can get a decent approximation for LLM performance in tokens/second by dividing the model size in GB by the system's memory bandwidth. That's assuming it's well-optimized and memory rather than compute bound, but those are often both true or pretty close.

And "depending on the task" is the point. There are systems that would be uselessly slow for real-time interaction but if your concern is to have it process confidential data you don't want to upload to a third party you can just let it run and come back whenever it finishes. And releasing the model allows people to do the latter even if machines necessary to do the former are still prohibitively expensive.

Also, hardware gets cheaper over time and it's useful to have the model out there so it's well-optimized and stable by the time fast hardware becomes affordable instead of waiting for the hardware and only then getting to work on the code.


Why would increasing memory bandwidth reduce performance? You said "You can get a decent approximation for LLM performance in tokens/second by dividing the model size in GB by the system's memory bandwidth"


Yeah, the sentence is backwards; you divide the system's memory bandwidth by the size of the model.


Mixtral 8x7b is better than both of those and runs on a top spec M3 Max wonderfully.


Indeed! Also, Mixtral 8x7b runs just as well on older M1 Max and M2 Max Macs, since LLM inference is memory bandwidth bound and memory bandwidth hasn't significantly changed between M1 and M3.


It didn't change at all, rather was reduced in certain configurations.


I will make a 2 trillion parameter model just so your comment becomes outdated and wrong.


I approve this comment.


ChatGPT is 20B according to Microsoft researchers; also, the claim that the big AI models are trillion-parameter models is mostly speculation. The GPT-4 figure was spread by geohot.


To be precise, ChatGPT 3.5 Turbo being 20B is officially a mistake by a Microsoft researcher, who quoted a wrong source published before the release of ChatGPT 3.5 Turbo. Up to you whether to believe it or not. But I wouldn't claim it's 20B according to Microsoft researchers.

The withdrawn paper: https://arxiv.org/abs/2310.17680

The wrong source: https://www.forbes.com/sites/forbestechcouncil/2023/02/17/is...

The discussion: https://www.reddit.com/r/LocalLLaMA/comments/17jrj82/new_mic...


It's interesting how the paper was completely retracted instead of just being corrected.


Yep. It feels like a 20B parameter model.


GPT-3 was 175B, so it'd be a bit odd if GPT-4 wasn't at least 5x larger (1T), especially since it's apparently a mixture of experts.


I think it became apparent when mixtral came out. I've noticed too during training that my model overwrites useful information so it makes sense for these types of models to have emerged.
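The mixture-of-experts tradeoff being described can be sketched numerically (the parameter split below is a rough illustration in the spirit of an 8-expert, top-2 model like Mixtral, not its exact architecture):

```python
# Mixture-of-experts: only the top-k experts run per token, so active
# parameters (what you compute with per token) are far below total
# parameters (what you must hold in memory).

def moe_params(shared_b: float, expert_b: float,
               n_experts: int, top_k: int) -> tuple:
    """Return (total, active) parameter counts in billions."""
    total = shared_b + expert_b * n_experts
    active = shared_b + expert_b * top_k
    return total, active

# Assumed split: 5B shared (attention etc.) + 5.25B per expert, 8 experts, top-2.
total, active = moe_params(shared_b=5.0, expert_b=5.25, n_experts=8, top_k=2)
print(total, active)   # 47.0 total vs 15.5 active billion parameters
```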


Disagree. A few weeks ago, I followed a step-by-step tutorial to download Ollama, which in turn can download various models. On my not-special laptop with a so-so graphics card, Mixtral runs just fine.

As models advance, they will become not just larger but also more efficient. Hardware advances. Large models will run just fine on affordable hardware in just a few years.


I’ve come to the opposite conclusion personally - AI model inference requires burst compute, which particularly suits cloud deployment (for these sort of applications).

And while AIs may become more compute-efficient in some respects, the tasks we ask AIs to do will grow larger and more complex.

Sure, you might get a good image locally, but what about when the market moves to video? Sure, ChatGPT might give good responses locally, but how long will it take when you want it to refactor an entire codebase?

Not saying that local compute won’t have its use-cases though… and this is just a prediction that may turn out to be spectacularly wrong!


Ok but yesterday I was on a plane coding and I wouldn’t have minded having GPT4 as it is today available to me.


Thanks to Starlink, planes should have good internet soon.


Huge models are the best type to open source.

You get all the benefits of academics and open source folks pushing your model forward, and a vastly improved hiring pool.

But it doesn't stop you launching a commercial offering, because 99.99% of the world's population doesn't have 48GB+ of VRAM.


First paragraph: These are wild stats, thanks for sharing. How did they fund themselves, if there was no external funding?


I wonder if MidJourney is still ripping. I’m actually curious if it’s superior to ChatGPT’s Dall-E images… I switched and cancelled my subscription when ChatGPT added images, but I think I was mostly focused on convenience.


If you have a particular style in mind then results may vary, but aesthetically Midjourney is generally still the best; however, DALL-E 3 has every other model beat in terms of prompt adherence.


Image quality, stylistic variety, and resolution are much better than ChatGPT. Prompt following is a little better with ChatGPT, but MJ v6 has narrowed the gap.



