The Catch 22 of Base64: Attacker Dilemma from a Defender Point of View

zrm · on May 1, 2018

> based on the assumption that legitimate users have no practical need to do multiple encoding of the same text.

Things can legitimately be encoded multiple times because encapsulation is a thing and multiple independent stages may each be configured to accept possibly binary input and produce base64 output.

You also don't need to re-encode something an unreasonable number of times to get "Vm0wd"; all you have to do is start with "Vm0" and base64 encode it twice. "Vm0" only has 24 bits of entropy, which means it will regularly occur at random in legitimate data.

And then nobody can figure out why "Vm0-Edge-West" isn't working.
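For anyone who wants to check the "encode it twice" claim, here is a minimal sketch using Python's standard `base64` module. The helper name `encode_rounds` is made up for illustration; it is not from the article.

```python
import base64

def encode_rounds(data: bytes, rounds: int) -> bytes:
    """Apply standard base64 encoding `rounds` times (hypothetical helper)."""
    for _ in range(rounds):
        data = base64.b64encode(data)
    return data

print(encode_rounds(b"Vm0", 1))  # b'Vm0w'
print(encode_rounds(b"Vm0", 2))  # b'Vm0wdw=='

# Any input that starts with "Vm0" lands on the same prefix after two rounds,
# because the first five output characters depend only on the first bytes.
print(encode_rounds(b"Vm0-Edge-West", 2).startswith(b"Vm0wd"))  # True
```

So a filter keyed on "Vm0wd" would presumably flag any legitimate value that begins with "Vm0" and passes through two base64 stages.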

benchaney · on May 1, 2018

I'm not really sure what the threat model is here. If the attacker can control what encoding scheme you use, surely you have much more serious problems than the possibility of wasting space.

jessaustin · on May 1, 2018

TFA is about attackers using Base64 encoding.

loup-vaillant · on May 1, 2018

> While Base64 encoding is very useful to transfer binary data over the web

This part I cannot fathom. The era of 7-bit bytes is over. What can possibly justify the need for a "printable characters" encoding now? Something stupid like putting data in a JSON string? What's the next step, base-64 encode the JSON containing that string and put it in an XML tag?

dylz · on May 1, 2018

I see crap like:

  <data><![CDATA[PD94bWw+PHdoeS8+PHRoaXNpc2F3ZnVsPjwvcm9vdD4=]]></data>

constantly in real life in 2018
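Decoding that payload shows what is going on: it is markup wrapped in base64, wrapped in CDATA, wrapped in XML. A quick sketch with Python's standard library (the variable names are illustrative only):

```python
import base64

# The base64 text from inside the CDATA section above.
payload = "PD94bWw+PHdoeS8+PHRoaXNpc2F3ZnVsPjwvcm9vdD4="

decoded = base64.b64decode(payload).decode("ascii")
print(decoded)  # <?xml><why/><thisisawful></root>
```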

jancsika · on May 1, 2018

Isn't this likely due to a second-order effect?

For example, there is (was?) a bug in nw.js with using @font-face to specify a font url that is in the same (local) location as the manifest. As a hack I tried specifying the font as a base64-encoded data url. It worked!

I had no use for base64, other than that there exists an interface for specifying data and it takes a base64-encoded string.

loup-vaillant · on May 1, 2018

Yes, but why? There's a reason for everything, so I guess this serves some purpose, somehow? I guess the ultimate reason is stupidity or bad planning or cut corners, but I'm still curious.

Bonus point if there's a situation where avoiding something like base64 is actually difficult, even in 2018.

jessaustin · on May 1, 2018

If one wanted to embed a random token in an email, what would one use instead of base64?

loup-vaillant · on May 1, 2018

Attachments?

jessaustin · on May 1, 2018

ISTM that amounts to the same thing? MIME is Base64.

loup-vaillant · on May 1, 2018

0culus · on May 1, 2018

I've seen base64 used to implement "security through obscurity" in web applications. Many times in situations where they could have just used a cryptographic hash and been done with it.

boneitis · on May 1, 2018

To really crank up the pedant-O-meter, wouldn't it be accurate to instead describe the output growth in point #1 as polynomial, even sub-quadratic?

boneitis · on May 2, 2018

Perhaps I can blame the booze, as I am admittedly very drunk, but I'm still unconvinced and/or confused.

I asked my question coming from the angle of big-O analysis, and all three (as of the time of this reply) responses in the thread seem to describe growth by a constant factor of 1.3 repeating (1.3 with a bar over the 3), otherwise known as polynomial growth, yes?

Absolutely, the growth can be mathematically expressed with an exponent, but we verbally characterize the actual growth as polynomial if the exponent stays constant (a.k.a. a literal number in the exponent's place) rather than increasing as the amount of input increases (independent variable lying within the actual exponent), no?

boneitis · on May 2, 2018

Man, I'm dyslexic.

benchaney · on May 1, 2018

No. It is exponential, not polynomial. That is, it is (4/3) ^ x rather than x^(4/3).

tempodox · on May 1, 2018

The least one could say is: The growth factor is not 1.3333, but 4/3.

matharmin · on May 1, 2018

The output size is O((4/3)^n), where n is the number of encoding rounds, which is exponential.
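A rough way to see the exponential-in-rounds behaviour is to re-encode a buffer repeatedly and watch the length. This is only a sketch; the function name `lengths_after_rounds` and the 300-byte starting input are arbitrary choices for illustration.

```python
import base64

def lengths_after_rounds(data: bytes, rounds: int) -> list:
    """Return the byte length before encoding and after each base64 round."""
    sizes = [len(data)]
    for _ in range(rounds):
        data = base64.b64encode(data)
        sizes.append(len(data))
    return sizes

sizes = lengths_after_rounds(b"x" * 300, 10)

# Each round multiplies the length by roughly 4/3 (slightly more when padding
# is needed), so after n rounds the size is about len(input) * (4/3) ** n.
ratios = [after / before for before, after in zip(sizes, sizes[1:])]

print(sizes[:3])  # [300, 400, 536]
print(all(r >= 4 / 3 - 1e-9 for r in ratios))  # True
```

The variable in the exponent is the number of rounds, not the input length, which is why the growth is exponential in n. For a fixed number of rounds, the output is only a constant multiple of the input size.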