
"Each time you make a Voice Call on Telegram, a neural network learns from your and your device‘s feedback (naturally, it doesn’t have access to the contents of the conversation, it has only technical information such as network speed, ping times, packet loss percentage, etc.). The machine optimizes dozens of parameters based on this input, improving the quality of future calls on the given device and network."

What sort of parameters are adjusted?



I have the feeling that's there solely for the "AI-Powered" hype-train benefit, versus actually being used as a genuine feedback loop to improve the service.

Logs of speed, ping times, and packet loss would likely be more useful in good old non-AI reports... to identify regional issues, peering opportunities, etc.

I don't think you need AI for VoIP calls that upgrade/degrade quality based on network health.
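
A dumb rule-based loop gets you most of the way. Rough sketch in Python, with thresholds and bitrate steps that are purely illustrative (not anyone's real implementation):

    # Step the codec bitrate down when the network struggles,
    # probe back up when it's clean. All constants are made up.
    BITRATES = [8_000, 16_000, 24_000, 32_000, 64_000]  # bits/sec

    def next_bitrate(current: int, loss_pct: float, rtt_ms: float) -> int:
        i = BITRATES.index(current)
        if loss_pct > 5.0 or rtt_ms > 400:      # struggling: back off
            return BITRATES[max(i - 1, 0)]
        if loss_pct < 0.5 and rtt_ms < 150:     # healthy: probe upward
            return BITRATES[min(i + 1, len(BITRATES) - 1)]
        return current                          # otherwise hold steady

    rate = 32_000
    for loss, rtt in [(0.1, 90), (7.2, 300), (6.0, 450), (0.2, 100)]:
        rate = next_bitrate(rate, loss, rtt)
        print(rate)  # 64000, 32000, 24000, 32000

No neural network required; it's a thermostat.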


Sounds like marketing-speak for variable bitrate encoding and/or adjusting compression aggressiveness.


> variable bitrate encoding

I hope not... VBR can easily leak all sorts of information (including the actual content of the conversation).


Whoa, I had never thought of monitoring VBR as an attack vector for recovering audio.

Do you have a link discussing this?


Sure, here are a couple papers on the topic:

https://www.cs.jhu.edu/~cwright/oakland08.pdf

https://www.cs.jhu.edu/~cwright/voip-vbr.pdf

It's fundamentally very similar to the sorts of issues you end up with if you compress then encrypt. If the attacker can make some educated guesses about the plaintext prior to the compression, the compression ratio can be a very powerful tool in their arsenal.
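
Toy demo of the same size side channel, with zlib standing in for the codec and an invented secret. Only the output length matters; a length-preserving cipher wouldn't hide it:

    import zlib

    SECRET = b"session_token=hunter2;"

    def observed_length(guess: bytes) -> int:
        # The attacker sees len(encrypt(compress(secret || injected guess)));
        # under a length-preserving cipher that's just the compressed length.
        return len(zlib.compress(SECRET + b"session_token=" + guess))

    for guess in (b"hunter", b"aaaaaa", b"qwerty"):
        print(guess, observed_length(guess))
    # The correct guess extends zlib's back-reference into the secret, so it
    # typically compresses a few bytes shorter. (Six-byte guesses here just to
    # make the gap obvious; the real attack extends one byte at a time.)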


Wire implemented CBR for their encrypted calls, upstreamed it to WebRTC, and submitted a patch to Signal: https://medium.com/wire-news/call-security-constant-bit-rate...
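
For anyone who wants the same on a stock WebRTC stack: Opus exposes this as the "cbr" fmtp parameter (RFC 7587), so you can force it by munging the SDP before it's sent. Sketch below assumes payload type 111 for Opus rather than parsing the rtpmap, which real code should do:

    def force_opus_cbr(sdp: str) -> str:
        # Append or flip cbr=1 on the Opus fmtp line (payload type assumed).
        out = []
        for line in sdp.splitlines():
            if line.startswith("a=fmtp:111"):
                if "cbr=" not in line:
                    line += ";cbr=1"
                else:
                    line = line.replace("cbr=0", "cbr=1")
            out.append(line)
        return "\r\n".join(out) + "\r\n"

    example = ("a=rtpmap:111 opus/48000/2\r\n"
               "a=fmtp:111 minptime=10;useinbandfec=1\r\n")
    print(force_opus_cbr(example))
    # fmtp line becomes: a=fmtp:111 minptime=10;useinbandfec=1;cbr=1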


Silent Phone has used CBR since day 1.


Correct; the last article I recall reading about deciphering VBR from packet sizes alone reported something in the neighborhood of a 50% success rate.


Then why not compress really efficiently by just transmitting packet sizes?


Because 50% of them won't be understood?


On the other hand, if you could do it, you'd probably have invented a convoluted speech-to-text system (where the text is an index into a dictionary of words). Note that you would also likely lose things like inflection, voice, accent, etc., so while it might work as a texting system with voice input, it would be a poor substitute for voice chat.


Those codecs have tons of parameters to tweak (source: private conversation with Pavel Durov).


But what sort of parameters are adjusted?

This is HN. A link to an example would be appreciated.

Edit: To clarify, I work with audio codecs too, and can't really think of parameters (other than the compression level?) that would make much sense to adjust on the fly.

If "AI" is used for more than just a buzzword here, then I imagine the answer must be quite interesting.


They probably adjust the incoming / outgoing buffer sizes (and therefore the audio delay, since it's live) to account for packet loss.

They might also prioritize traffic depending on how full your buffers are.

I can only assume YouTube and Netflix do similar parameter tweaks to optimize their video delivery based on the connection (totally filling the buffer to a max size all the time would waste bandwidth, but a client with lots of packet loss needs a larger safety net).
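
The jitter-buffer half of that is easy to caricature. Purely illustrative constants, not from any real client:

    # Grow the buffer (accepting more latency) when loss/jitter is high,
    # shrink it back when the network is stable.
    def target_buffer_ms(current_ms: float, loss_pct: float,
                         jitter_ms: float) -> float:
        if loss_pct > 2.0 or jitter_ms > 30:
            return min(current_ms * 1.5, 400)   # bigger safety net
        if loss_pct < 0.2 and jitter_ms < 10:
            return max(current_ms * 0.9, 40)    # claw back latency
        return current_ms

    buf = 80.0
    for loss, jitter in [(3.0, 35), (3.5, 40), (0.1, 5), (0.1, 5)]:
        buf = target_buffer_ms(buf, loss, jitter)
        print(round(buf))  # 120, 180, 162, 146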


Right, so it looks like we're up to maybe six parameters. The claim was "dozens", which I take to mean at least 24, possibly 36, as the lower bound.



