Just note: you're throwing away 58% of your results! The original coin flip runs...

typicalset · on Dec 24, 2020

In the general case, you can compress a sequence of N biased coin flips with arithmetic coding. For a coin with known bias, this compression is (essentially) optimal and will therefore produce ~N*(Shannon information) unbiased bits.

dragontamer · on Dec 24, 2020

In the "even more general case", the Shannon Theorem suggests that random_code(message) -> reaches the Shannon Limit. :-)

The problem is that perfectly-random codes can't be decoded very easily... so this little factoid is kind of worthless in most cases. Fortunately, we don't actually care what the information looks like when flipping coins. We usually just want "some random message", so no need to decode in this application!

------

Cryptographers assume hash-functions are perfect random codes (they aren't in theory, but they are in practice... at least until they're cracked. Kinda funny how that works out). As such, the cryptohash(message) methodology should also reach the Shannon-limit, as long as your cryptohash remains secure... and you stay within the blocksize restrictions of the hash.

SideQuark · on Dec 25, 2020

>assuming you have a perfect cryptographic hash function

Except there is no such thing (not in the sense you need it) - it too would be a perfect source of randomness, and you're just pushing the problem down the road.

As an example, NIST publication 800-90B recommends multiple randomness extractors, one of which is SHA. They recommend using twice the amount of entropy in as the entropy out to get random "enough" bits. Thus even SHA is not going to mix bits well enough as you want.

(The things called perfect hash functions in the literature map N items into N slots with no collisions, which is not what is needed here).

As a perhaps surprising counterexample to any simple solution, here [1] is a math paper with a proof that there can be no optimal algorithm that is best for all values of coin bias.

Here's [2] a cool walkthough on some of the ideas in theory that have been investigated.

[1] https://web.eecs.umich.edu/~qstout/abs/AnnProb84.html [2] http://www.eecs.harvard.edu/~michaelm/coinflipext.pdf

dragontamer · on Dec 25, 2020

> Except there is no such thing (not in the sense you need it)

???

Okay, lets choose the simplest hash function of them all: a 1-bit XOR (resulting in a 1-bit "hash").

Lets say I flip the 75% biased coin 100 times, count heads as 1 and tails as 0. I XOR all the results together. What's the probability of XOR(all) == 1, and what's the probability that XOR(all) == 0?

You'll find that its quite close to 50/50, which is the limit to the amount of entropy you can pull from a 1-bit hash function.

---------

Now lets see for smaller values:

1 flip: 75% chance of 1, 25% of 0. Which is the limit of information so far.

2 flips: ... you know what? Imma write a program and get back to ya a bit later. I probably will edit my post.

------

2 flips: 62% chance of Parity0 3 flips: 43% chance of Parity0 4 flips: 53% chance of Parity0

Etc. etc. etc.

By 16 flips, its 49.9985% chance of Parity0, pretty close to unbiased. It gets pretty hard to distinguish this coin from an unbiased one.

--------

I'm not sure if you need a cryptographic hash actually. Its just that cryptographic hashes are close enough to random that its easier to use.

Now I'm curious if a CRC32 would efficiently extract 16-bits of entropy from biased coins. CRC32 is clearly not a cryptographic hash, but its pretty good as an "avalanche" due to the nature of GF-arithmetic.

SideQuark · on Dec 28, 2020

>It gets pretty hard to distinguish this coin from an unbiased one.

All you're doing is trying to reinvent the Law of Large Numbers - nothing in this post shows a perfect hash function, and in fact is demonstrating my point - by you putting biased data into a hash you're getting biased data out.

To completely remove the bias you need infinite flips. And thus you have improved nothing.

Next, if you're going to flip 100 times to get 1 bit out, simply use the Neumann method - it does work, uses only a few flips on average.

Here's a proof why your method doesn't work: Let p = prob of getting a 1, q = 1-p is prob of getting a 0. Then XORing n of these together is a 1 iff there are an odd number of 1 bits. This happens Sum_{j=1 to N by odd} nCj p^j (1-q)^(n-j) of the time. This simplifies to (1-(1-2p)^n)/2. For this to be exactly 1/2, which is no bias, p must be exactly 1/2. There is no other value that doesn't fail.

This argument can be extended to whatever hash functions you want. You're not going to get the outcome you desire no matter how you structure it.

>I'm not sure if you need a cryptographic hash actually.

It matters not the hash. None will do what you want.

This is why when crpyto uses weak PRNGs at the front, even after all the powerful crypto in the middle, they are weakened. You're trying to do what decades of cryptographers know is impossible.

rini17 · on Dec 25, 2020

The entropy is -0.7log2(0.7)-0.3log2(0.3) = 0.88 .

The post is very confusing, in no case can deterministic hashing increase the entropy. And if it outputs only half of input entropy then it's only 8% more efficient than von neumann in this case.