There's an art to subtitling that goes beyond mere speech-to-text processing. Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read. Sometimes you need to name a voice as unknown, to avoid spoilers. Sometimes the positioning on the screen matters. I hope the model can be made to understand all this.
> Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read
Please no. Some subtitle companies do think like this, and it's really weird, like when they try to "convert" cultural jokes, and then add in a bunch of more assumptions regarding what cultures you're aware of depending on the subtitle language, making it even harder to understand...
Just because I want my subtitles in English, doesn't mean I want all the typically discussed Spanish food names to be replaced by "kind of the same" British names, yet something like that is something I've come across before. Horrible.
I totally get this. When I'm watching videos for the purpose of learning a language, I want all the actual words in the subtitles. But if I'm watching just to enjoy, say in a language I don't care to learn, I don't mind someone creatively changing the dialog to how it probably would have been written in English. This happens with translations of novels all the time. People even seek out specific translators who they feel are especially talented at this kind of thing.
I know a little Spanish and even I get annoyed when the English subtitles don’t match what they said in Spanish. Of course I expect grammatically correct Spanish to be translated into grammatically correct English.
It depends on the context! Trying to Americanize Godzilla, for instance, has largely failed because Godzilla is an allegory for the unique horror of nuclear bombing which Japan experienced. Making him just a lizard that walks through New York is kind of stupid.
Jokes are an example of something translators can do really well - things like puns don't work 1:1 across languages. A good translator will find a corresponding, appropriate line of dialogue and basically keep the intent without literally translating the words.
Food is kind of silly because it's tied to place - if a setting is clearly Spanish, or a character is Spanish, why wouldn't they talk about Spanish food? Their nationality ostensibly informs something about their character (like Godzilla) and can't just be find/replaced.
> Jokes are an example of something translators can do really well - things like puns don't work 1:1 across languages. A good translator will find a corresponding, appropriate line of dialogue and basically keep the intent without literally translating the words.
Again, those aren't "cultural translations" but "idioms translations", which I do agree should be translated into something understandable in the language, otherwise you wouldn't understand it.
What I was aiming at in my original comment was examples like these:
> Family Guy original voice-overs + subtitles making a joke about some typical father figure in Hollywood for example. Then the Latin Spanish subtitles will have translated that joke but replaced the specific actor with some typical father figure from the Mexican movie industry, while the Castilian subtitles would have replaced it with someone from the Spanish movie industry.
More precisely speaking, there are two kinds of subtly different subtitles with different audiences: those with auditory impairments and those with less understanding of the given language. The former will benefit from paraphrasing while the latter will be actively disadvantaged due to the mismatch.
Not all of his translations were far from the original. For example, his translation of Guy Ritchie's Snatch was excellent (in my opinion of course) and is still quoted to this day. I'd say it's the only one that absolutely nails it and then some.
On the other hand, his Lord of The Rings was an "alternative" dub as you described. Didn't watch that one though.
> Spanish food names to be replaced by "kind of the same" British names
The purpose of a translation is after all to convey the meaning of what was said. So for example you'd want the English "so so" to be translated in Spanish as "más o menos" instead of repeating the translation of "so" twice. You don't want to just translate word for word, venir infierno o alta agua.
A lot of dialog needs language specific context, many expressions don't lend themselves to literal translation, or the translation in that language is long and cumbersome so paraphrasing is an improvement.
Like with anything else, the secret is using it sparingly, only when it adds value.
> But for example you'd want the English "so so" to be translated in Spanish as "más o menos" instead of doubling down on whatever literal translation for "so" they choose.
Agree, but I don't think those are "cultural translations" but more like "idioms translations", which mostly makes sense to do.
What I originally wrote about are things like Family Guy original voice-overs + subtitles making a joke about some typical father figure in Hollywood for example. Then the Latin Spanish subtitles will have translated that joke but replaced the specific actor with some typical father figure from the Mexican movie industry, while the Castilian subtitles would have replaced it with someone from the Spanish movie industry.
I think I've encountered cases like this watching Netflix movies originally in my native language but subtitled in English. But in every case I can remember the substitution made perfect sense.
Without adapting the translation, the natives will immediately understand the reference while you, the non-native with no sense of who they're talking about, are left wondering what's the real message. Calling someone a "Mother Teresa" might miss the mark somewhere in China. Same if an Italian movie made references to food like Casu Marzu and the average American would probably miss a lot of the context.
Just recently I saw this in a series where some workers in the oil extraction industry were staring at a pan of paella asking what it is and calling it jambalaya. Paella is world famous, how about khash?
That's why I said I understand the usage but it should only be done if it really helps the comprehension, not just the principle of gratuitously adapting to one language.
> Without adapting the translation, the natives will immediately understand the reference while you, the non-native with no sense of who they're talking about, are left wondering what's the real message. Calling someone a "Mother Teresa" might miss the mark somewhere in China. Same if an Italian movie made references to food like Casu Marzu and the average American would probably miss a lot of the context.
Right, but isn't that why the American is watching this Italian movie anyways, to get a wider understanding of Italian culture? I don't watch foreign movies with the expectation that they're adapted to my local culture, then there wouldn't be much point in watching it.
> isn't that why the American is watching this Italian movie anyways
I don't know. I always thought people would prefer to hear the original voice of the actor, since speech is a big part of the acting. But movies are dubbed in a lot of countries. And most people from those countries I spoke to said that they find the idea of subtitles very odd because it's distracting them from the movie.
I am sure many if not most people simply want to understand what's the message behind the conversation on the screen first and foremost. Learning something new only works if while you try to keep up with the dialogue, you also keep track of all the expressions you heard for the first time, to look them up later.
This is why I think it's the translator's job to balance translation and adaptation. Directly translate the original where context helps you understand the meaning so you get the "original" experience, and adapt where leaving just the 1:1 translation will make you lose the thread or miss some details.
That's a good example of translation where there's only really so many ways to do it. A bad example like people are talking about is the original 4Kids Pokemon where every time someone brought out an Onigiri (rice ball), they would call them jelly donuts.
There is the art of subtitling, and then there is the technical reality that sometimes you have some content with no subtitles and just want a solution now, but the content didn't come with an SRT or better yet VTT and OpenSubtitles has no match.
They're using Whisper for speech to text, and some other small model for basic translation where necessary. It will not do speaker identification (diarization), and certainly isn't going to probe into narrative plot points to figure out if naming a character is a reveal. It isn't going to place text on the screen according to the speaker's position in the frame, nor for least intrusion. It's just going to have a fixed area where a best effort at speech to text is performed, as a last resort where the alternative is nothing.
Obviously it would be preferred to have carefully crafted subtitles from the content creator, translating if the desired language isn't available but still using all the cues and positions. Secondly to have some carefully crafted community subtitles from opensubtitles or the like, maybe where someone used "AI" and then hand positioned/corrected/updated. Failing all that, you fall to this.
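To make the "last resort" path concrete: once a speech-to-text step has produced timed text segments, writing them out as an SRT file is mostly timestamp formatting. Below is a minimal sketch of that step only, not how VLC actually does it. The `(start, end, text)` tuple shape and the names `format_timestamp` and `segments_to_srt` are assumptions made up for illustration; a real transcriber's output would need to be adapted to that shape first.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Serialise (start, end, text) tuples into SRT text.

    The tuple layout is a hypothetical stand-in for whatever the
    speech-to-text step returns. There is no speaker labelling and no
    positioning here: every cue lands in the player's default caption area.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [(0.0, 1.5, " Hello "), (2.0, 3.25, "World")]
    print(segments_to_srt(demo))
```

Everything the thread is arguing about (speaker labels, spoiler-aware naming, on-screen placement) is exactly what this kind of flat conversion leaves out.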
I've first hand encountered several situations with subtitles where it would have been ambiguous who was speaking without speaker annotations, despite the voices being distinctive, me being able to hear them clearly etc. Just think of a rapid exchange with neither speaker on-screen for more than two or three sentences and replies.
You can probably get 99% there without that for a lot of content, but I'd challenge the notion that this is somehow only important for hearing impaired viewers (or people just watching without clearly audible sound for other reasons).
I guess you can have that in a real life situation as well, where you don't have subtitles at all (hello AR) and still can handle. Do you want your subtitles full of metadata the whole movie for every movie and every day, for those several situations when the image director made a mess? You can always play that confusing scene again.
I definitely prefer subtitles to be as helpful as possible, yes. That includes having situationally appropriate metadata (which is different from "all the metadata, all the time").
I don't think that's an unrealistic goal to have for AIs; they're already extremely good at semantic scene description after all. By looking at the image in addition to just the audio track they probably also get a lot more metadata, which a refined world model will eventually be able to use just like a human subtitle editor can today.
So you mean, the AI should figure out when a problematic scene is coming, and only then add labels and whatnot? Not impossible, just somebody must teach them, same with subtitles positioning.
Like other commenters pointed out, we are talking about two different types of subtitles - those for the hearing impaired have very different requirements. I'm not sure which one VLC is gonna cover. Best would be both, just don't mix them up please.
AI subtitles are just text representation of the sound track.
There is no need for artistic interpretation, substituting words, or hiding information. If it’s in the audio, there’s no reason to keep it out of the subtitle.
An AI subtitle generator that takes artistic license with the conversion is not what anyone wants.
This is horrible for people who learn languages using TV Shows and Movies. One of the most frustrating things I've encountered while learning German is the "paraphrase" thing, it makes practicing listening very hard, because my purpose wasn't to understand what was being said, but rather familiarizing my ear with spoken German.
So, knowing exactly the words being said is of utter importance.
> Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read
NO!
I speak and understand 90% of English but I still use subtitles because sometimes I don't understand a word, or the sound sucks, or the actor thought speaking in a very low voice was a good idea. When the subtitles don't match what's being said, it's a terrible experience.
> Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read.
Pretty sure this is a violation of the Americans with Disabilities Act, so illegal in the U.S. at least. Being Deaf doesn't mean you need "reduced" dialogue.
As long as they're synced properly I don't care much, some movies/shows have really bad sound mix and it's not always possible to find good subs in the first place
I suppose this feature should have been termed closed captioning and not subtitling. It seems you're not going to get much sympathy for human translation here.
> There's an art to subtitling that goes beyond mere speech-to-text processing.
Agreed.
> Sometimes it's better to paraphrase dialog to reduce the amount of text that needs to be read.
Hard no. If it’s the same language, the text you read should match the text you listen to. Having those not match makes parsing confusing and slow.
> Sometimes you need to name a voice as unknown, to avoid spoilers.
Subtitles don’t usually mention who’s talking, because you can see that. Tagging the source of a voice is uncommon and not something I expect these systems to get right anyway.