
Wouldn't it be better to generate multiple tracks that can be mixed / tweaked together, rather than a single track? That way you can also keep the parts you like and continue iterating on the parts you dislike.

If the sound is already being generated to match specific points in time, surely you can make it produce output that existing audio mixing tools can consume for further refinement.

The problem with doing these all-in-one integrated solutions is that you're kinda giving people an all-or-nothing option, which doesn't seem that useful. Maybe I'll end up being proven wrong.
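
For illustration, a minimal sketch of the workflow that would enable, assuming the generator simply exported one WAV file per stem (the file names and gain values here are made up):

    # Mix generated stems with ordinary tools (numpy + soundfile).
    # Assumes one WAV per stem with the same length and sample rate.
    import numpy as np
    import soundfile as sf

    stems = {
        "drums.wav": 1.0,      # keep as generated
        "bass.wav": 0.8,       # pull back a little
        "melody_v3.wav": 1.0,  # a regenerated part, swapped in without touching the rest
    }

    mix, samplerate = None, None
    for path, gain in stems.items():
        audio, sr = sf.read(path, dtype="float32", always_2d=True)
        samplerate = samplerate or sr
        mix = audio * gain if mix is None else mix + audio * gain

    # Normalize to avoid clipping, then write a file any DAW can import.
    mix /= max(1.0, float(np.max(np.abs(mix))))
    sf.write("mix.wav", mix, samplerate)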



Yes, same problem as with commercial AI music products not providing stems or MIDI. The engineers on these products are too full of themselves to actually ask anyone in the field what they want, so we just keep getting these stupid magic 8-ball efforts.

This one is particularly annoying, as I worked for years as a sound engineer and have recorded or produced the soundtrack for 10 feature films and a large number of shorts. What's going to happen with this is that directors or producers are gonna do this at home for every scene in a burst of over-enthusiasm, realize the totality is Not Great, and then demand someone like me fix it, but for 1/4 of what the job used to pay, arguing 'but most of the work is already done'. It's all so tiresome.


Same reason you don't see AI making images in layers, etc.: it's just much easier to train an AI that generates everything in one layer. Training a model that generates multiple layers with the same quality of output is much, much harder, and of course companies and users prefer the higher quality over having layers, especially since the quality you get with a single layer is still barely passable.


The samples they used for training are mixed.

Unless they have enough raw, unmixed samples, this depends on how well they can "unmix" them.
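
As a rough illustration of what "unmixing" involves, a sketch using an open-source source-separation tool (Demucs here; treat the model name and output layout as assumptions about its current defaults):

    # Sketch: recover stems from a mixed recording with an off-the-shelf
    # source-separation tool (Demucs CLI, installed separately).
    import subprocess
    from pathlib import Path

    track = Path("mixed_sample.wav")  # hypothetical mixed training sample

    # Separates into drums / bass / vocals / other.
    subprocess.run(["demucs", "-n", "htdemucs", str(track)], check=True)

    # By default Demucs writes results under separated/<model>/<track name>/.
    stems_dir = Path("separated") / "htdemucs" / track.stem
    for stem in sorted(stems_dir.glob("*.wav")):
        print("recovered stem:", stem.name)

Whether stems recovered this way are clean enough to train on is another question.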


Yes...that's the problem. A problem that could be easily avoided by asking existing professionals what matters and what tools they actually want.


Most ML engineers know that many people want more fine-grained control. But the straightforward way to train such models is incredibly data-demanding. The datasets used for whole-image generation consist of several billion images, and I do not think anyone has compiled a dataset of DAW projects / stems that comes anywhere close to that size. So that is a limiting factor right now. But we will find ways to get there; probably a lot of progress over the next 5 years, maybe even the next 2.


It sounds like, between the two of you (and the person who mentioned generating images in layers for image-editing software), you've stumbled upon an obvious gap in the market.


I’ve tried to explain this to several friends. Until these tools can generate output that can be mixed properly, they’re going to be very niche.


> Wouldn't it be better to generate multiple tracks that can be mixed / tweaked together, rather than a single track? That way you can also keep the parts you like and continue iterating on the parts you dislike.

That'd interest me (a musical hobbyist) more than the "whole track" generators, for sure.

I imagine it's a harder task tho'. Presumably, if you give the same source material (video, prompt) to the AI multiple times, it will generate different pieces of music. So if you do a series of prompts, each one specifying a different instrument or group/bus, then you (or the AI) need to arrange for the parts to blend correctly, follow the same cues, and assemble into a coherent arrangement. Is that one pass with multiple outputs, or multiple passes/prompts with one output each?

I have got the impression (from casual reading) that the music generators don't inherently "know" about different parts of a piece of music. They just know about the final output.
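
To make the "multiple passes" option concrete anyway, a purely hypothetical sketch; generate_plan and generate_stem don't correspond to any real model API, they just show where the shared cues would have to live:

    # Hypothetical sketch of "multiple passes, one stem per pass".
    # Neither function is a real API; the point is only that every pass must be
    # conditioned on the same shared plan (tempo, key, sections) or the stems
    # won't blend into a coherent arrangement.
    from typing import Dict, List

    def generate_plan(prompt: str) -> Dict:
        # Hypothetical: one pass that decides global structure only.
        return {"tempo": 92, "key": "D minor", "sections": ["intro", "verse", "chorus"]}

    def generate_stem(instrument: str, plan: Dict) -> List[float]:
        # Hypothetical: one pass per instrument, conditioned on the shared plan.
        return []  # audio samples would go here

    plan = generate_plan("moody underscore for a night driving scene")
    stems = {inst: generate_stem(inst, plan) for inst in ["drums", "bass", "pads", "lead"]}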


> Wouldn't it be better to generate multiple tracks that can be mixed / tweaked together, rather than a single track? That way you can also keep the parts you like and continue iterating on the parts you dislike.

Totally, and that is 100% what is coming. For a great many pictures too: why generate a picture full of lighting issues / approximations when you'll soon be able to generate an entire 3D scene and render it properly?

We've mastered 3D rendering and audio engineering.

I want the 3D models and the 3D scenes. I want the individual tracks (and to combine them in Dolby Atmos or whatever is cool by then).

And that is coming, no question about it.


ElevenLabs just released something that is more controllable:

https://news.ycombinator.com/item?id=40736536


Step 2 of the AI musical "If This Then That": https://www.lalal.ai/ ("Extract vocal, accompaniment and various instruments from any audio and video")


It's limited by the mechanism of diffusion.



