
I think you're asking the wrong question. E2E is actively harmful to videoconferencing beyond two participants. "A waste of time" is a euphemism; I'd bet it would actively hurt many metrics/teams if done, and those teams would push to have it reverted.

A videoconference has to be able to adjust audio and video quality in real time between participants. E2E is a technical barrier to that because the streams are encrypted and harder to work with. It is possible to have E2E, but it's hard work, it's resource-intensive, and it affects reliability.



Zoom's real-time codec adjustments happen on the sending clients, not in the cloud, so E2E doesn't impact quality.


Is that true? Certainly the client changes the quality of what it's sending based on what its uplink supports, but what if a client has a very good uplink and sends a high quality stream, while one of the other participants doesn't have the downlink to receive it? Does Zoom actually instruct the sender to degrade the quality of what it uploads, so everyone gets a worse experience? That seems unlikely.


It's called scalable video coding. The source sends multiple streams of packets depending on its upstream bandwidth, and the more streams you receive, the higher quality the resulting video. Each client can tell the server which streams it wants to subscribe to, and the server picks them apart and multiplexes them per the needs of each receiving client.
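To make the subscription idea concrete, here's a minimal sketch (all names and bitrates are made up, not Zoom's actual implementation): the sender uploads several layers at once, and the server forwards to each receiver only the layers it subscribed to, with no transcoding needed.

```python
# Hypothetical SVC-style layer selection. The sender always uploads all
# layers; each receiver subscribes up to the highest layer its downlink
# can handle, and the server forwards exactly that subset.

SENT_LAYERS = {
    "base": 150,   # kbps -- always forwarded
    "mid": 500,    # enhancement on top of base
    "high": 1500,  # enhancement on top of mid
}

def layers_for(subscription: str) -> dict:
    """Return the layers a receiver gets for its subscription level."""
    order = ["base", "mid", "high"]
    wanted = order[: order.index(subscription) + 1]
    return {name: SENT_LAYERS[name] for name in wanted}

# A receiver on a weak downlink gets only the base layer:
assert layers_for("base") == {"base": 150}
# A well-connected receiver gets all three, totalling 2150 kbps:
assert sum(layers_for("high").values()) == 2150
```

The key property is that the server only routes opaque packets; it never needs to decode the video, which is what makes this compatible with E2E.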


I haven't read before that E2E is a barrier to adaptive audio & video quality. Would you care to elaborate, or is there a source I could read up on?


The most straightforward way to handle adaptive audio/video quality in a multipoint video call is to send the highest quality the sender can to the server, and have the server transcode that to participants in varying qualities.

If it's E2E, the server can't transcode the stream. Realistic options are the sender sending multiple quality streams and the server picking the right one for each receiver, or sending to the server at the quality of the least capable receiver.

There's a concept of bitrate peeling, which would be great for this --- send the highest quality to the server, and the server sends a truncated stream to receivers with poor bandwidth etc. The transport stream would have to be designed so that the truncation points are known to the server, and so that the receiver can verify integrity at any chosen truncation point. The real problem is that this isn't a productionized concept; AFAIK, it's only an experimental feature in Vorbis, and in that case, quality at a given bitrate is inferior when truncating to that bitrate vs directly encoding at that bitrate.
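A toy sketch of the peeling idea (purely illustrative, not any real codec's format): the sender marks truncation points in each packet, and the relay drops the tail beyond the receiver's budget without ever understanding the payload.

```python
# Hypothetical bitrate peeling at a relay. `cut_points` are byte offsets
# the sender marked as safe truncation points; the relay picks the
# largest one that fits the receiver's byte budget. The payload itself
# stays opaque (it could be encrypted end-to-end).

def peel(packet: bytes, cut_points: list, budget: int) -> bytes:
    """Truncate `packet` at the largest marked cut point within `budget`."""
    valid = [p for p in cut_points if p <= budget]
    if not valid:
        # Budget below the smallest cut point: fall back to the base layer.
        return packet[: cut_points[0]]
    return packet[: max(valid)]

pkt = bytes(range(100))
assert len(peel(pkt, [20, 50, 80], budget=60)) == 50
assert len(peel(pkt, [20, 50, 80], budget=10)) == 20
```

The integrity-verification part the comment mentions (a receiver checking a truncated stream) is the hard piece this sketch omits; it would need something like per-layer authentication tags.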


I wonder if it's possible to design a practical stream format + encryption algorithm combo that allows an intermediary to downsample the stream without requiring knowledge of the unencrypted contents of the stream. Sounds like a really cool topic for a PhD thesis.


Sender sends video as a base quality stream, plus a delta to the next quality level, plus a third stream which is the delta to top quality. The server passes on either 1, 2, or 3 streams according to bandwidth.
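A minimal sketch of that forwarding decision (the bitrates are invented for illustration): the server counts how many cumulative layers fit in a receiver's bandwidth and forwards that many, always at least the base layer.

```python
# Hypothetical layered forwarding: base stream plus two deltas.
LAYER_KBPS = [150, 350, 1000]  # base, delta to mid, delta to top (illustrative)

def layers_to_forward(bandwidth_kbps: int) -> int:
    """How many of the 3 streams to pass on to this receiver."""
    total, count = 0, 0
    for kbps in LAYER_KBPS:
        if total + kbps > bandwidth_kbps:
            break
        total += kbps
        count += 1
    return max(count, 1)  # always forward at least the base stream

assert layers_to_forward(200) == 1    # base only
assert layers_to_forward(600) == 2    # base + first delta
assert layers_to_forward(2000) == 3   # full quality
```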


I'm not an expert but I believe Homomorphic Encryption may make this possible: https://en.wikipedia.org/wiki/Homomorphic_encryption


I'm even less of an expert but here is an example: https://pdfs.semanticscholar.org/a42e/022f812a7b5c464cf83454...


Without E2EE, the server can directly access and manage how the video and audio streams are sent to each client. It can downgrade video for participants who are struggling, prioritize throughput of audio data for certain clients, prioritize higher res video only for the user who's currently speaking, drop individual frames for clients that are getting behind, etc etc.

With E2EE, by definition, the server becomes a dumb relay, and all control moves into the clients. Depending on how E2EE is implemented, you then lose the ability to do some or all of these optimizations.

There are potential workarounds, but each comes with further tradeoffs in performance, privacy, and complexity. That's not to say it's completely impossible to optimize given E2EE: you can move many optimizations to each client, or have clients expose just enough metadata for servers to optimize traffic without being able to read the contents (e.g. by splitting the audio & video streams so the server can manage them independently, or having each client upload separate high & low res versions of its own video stream). It definitely makes many optimizations dramatically harder though, creates some serious engineering problems, and in practice rules out some optimizations completely.
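To illustrate the kind of per-receiver decision a non-E2EE server can make directly (this is a hedged sketch with invented names and thresholds, not Zoom's actual logic): prioritize the active speaker's video, downgrade everyone else, and drop to audio-only for struggling receivers.

```python
# Hypothetical server-side quality policy, only possible when the server
# can see and manage the individual streams (i.e. no E2EE).

def choose_quality(sender: str, active_speaker: str,
                   receiver_downlink_kbps: int) -> str:
    """Pick what the server forwards from `sender` to one receiver."""
    if receiver_downlink_kbps < 300:
        return "audio-only"   # protect audio for struggling receivers
    if sender == active_speaker:
        return "high"         # spend bandwidth on whoever is talking
    return "low"              # thumbnails for everyone else

assert choose_quality("alice", "alice", 5000) == "high"
assert choose_quality("bob", "alice", 5000) == "low"
assert choose_quality("bob", "alice", 100) == "audio-only"
```

With E2EE, a decision like this has to move into the clients, or rely on the clients exposing metadata (speaker activity, layer boundaries) the server is allowed to see.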


Very simplified:

Client sends a video stream to Zoom. Zoom re-encodes the video on the fly in different formats and forwards it to the 10 participants separately.

Participants have different devices and network connectivity --- that's how the dirty real world is --- so Zoom has to do that. It's development work and it's compute-intensive, but it's a hard requirement for "good" videoconferencing.

Imagine the same thing with E2E. Client sends a video stream to Zoom. Zoom can't do anything with it because it's encrypted: it can't be decoded, so it can't be re-encoded.


Whether E2E introduces technical barriers or difficulties wasn't my angle. If it makes things impossible or hard, then this should be factually stated in the documentation, right after the statement that the service is not (entirely) E2E.



