January 30, 2010

Random Seeds, Coded Hints, and Quintillions

Here is a seedable random number generator in Javascript that you can set up to produce a determinstic sequence of pseudorandom numbers. Browsers do not provide a built-in way to seed Math.random(), so this solution is handy both when you need a repeatable pseudorandom sequence that is completely predictable, and when you need a robust seed that is much more unpredictable than your browser's built-in random number generator.

Many games that use weak random number generators have been cracked by exploiting their lack of randomness, and recently it has even been shown that it is possible to guess your 'random' Social Security Number given information about the time and location of your birth. To resist this type of attack, you want do better than a linear congruential PRNG seeded with the current time. Explanations below.

A Math.seedrandom Function

This script defines a function Math.seedrandom() that replaces Math.random with a seeded sequence of your choice. You use it by including seedrandom.js and then calling

Math.seedrandom('any string you like');

Next time you call Math.random(), you will get a deterministic sequence of results that can be reproduced at any time on any browser by calling seedrandom with the same string (with the example seed above, you will always get 0.4514661562021821, 0.06749172707294095, 0.8393296727715214, etc).

Seed:

The code uses RC4 as the pseudorandom number generator, so the randomness is a bit better than what you get from most browsers' built-in Math.random. But since it does all the computation in javascript, it is also is 3-10x slower.

A minified version of seedrandom.js (using the Google Closure Compiler) is about 1K.

The code also supports automatic seeding...

Local Entropy Seeding

If you don't want to think of a seed but just need an arbitrary unpredictable seed, seedrandom.js can also do automatic seeding from local entropy that is good enough for non-adversarial use.

If you call Math.seedrandom() without an argument, a seed is derived from available local data. The DOM contains several candidate sources of entropy such as the cookie, the window size and position, the scroll position, the history length, the current event, the clock, the native random number generator, and javascript variable state.

Much of this information is collected by seedrandom using a recursive traversal of the window object:

function flatten(obj, depth) {
  var result = [];
  if (depth && typeof(obj) == 'object')
    for (var prop in obj)
      try { result.push(prop, flatten(obj[prop], depth - 1)); } catch (e) {}
  return result.length ? result : '' + obj;
}
 
seed = '' + flatten([new Date().getTime(), entropy, window], 3);

Since the underlying RC4 generator uses a 256-byte seed, any explicit and DOM-derived seeds are smeared into a 256-byte array like this:

var smear = 0;
for (var j = 0; j < seed.length; ++j)
    key[j & 255] = ((smear ^= key[j & 255] * 19) +
        seed.charCodeAt(j)) & 255;
shortseed = '';
for (j in key) shortseed += String.fromCharCode(key[j]);
return shortseed;

The shortened seed string is returned from Math.seedrandom() so that you can record and replay an autogenerated seed if needed.

Entropy Accumulation

Unfortunately, while local seeding is pretty fast, the amount of unpredictability that can be collected quickly from the DOM ends up being pretty thin and not enough to protect you against a determined hacker who is trying to beat your online poker game by using brute force to deduce your seed.

You can add entropy by calling Math.seedrandom with sources of unpredictable bits such as mouse positions over time. The script maintains an "entropy" variable that accumulates new bits of information every time Math.seedrandom() is called, so if you call Math.seedrandom(data) with explicit seeds several times and then finish with a call to Math.seedrandom() without arguments, all the previously-supplied explicit entropy will be mixed together with DOM state to build a now-more-unpredictable automatic seed.

Available local browser entropy has been studied, and because of the coarseness of timers and limited access to physical devices, it is probably necessary to collect about a minute of user-interface event observations to build an attack-proof pool of local entropy.

If you need robust unpredictability and you do not have a whole minute to wait, you need to find a source of entropy that is not local.

Network Entropy Seeding

One option for quick robust entropy is to use an online source of random bits like random.org. Random.org provides a high-volume online stream of unpredictable bits that are derived from atmospheric noise detected by an array of radio receivers in Dublin and Copenhagen, all built and run by Trinity College professor Mads Haahr. His service will happily ship a few of these physically generated bits to you over https for free.

Ideally after loading seedrandom.js, you would like be able to run some code like the following that supplies random strings that come from random.org:

Math.seedrandom({'results':['pcPKzEptrHN3wkgTZJJe', '9uM6b5j5vKZzpVVds2ZT', 'zpHSar2s2a7kLnewGFFh']});

You can get very close to this by using YQL to reformat data from a random.org download page. The query you want is this:

select * from html
  where url = "https://www.random.org/passwords/?num=50&len=9"
  and xpath = "//ul[@class='data']//p/text()"

The YQL above can be invoked as a URL that returns the results as JSONP for convenient use as a cross-domain script include. The format of the random data can be cleaned up a bit by using YQL execute scripts, which I have done using this xml file. To make usage easy, I have posted a shortened url to http://bit.ly/srandom-256. So to make a random.org-seeded PRNG in Javascript, you can just write:

<script src="http://davidbau.com/encode/seedrandom-min.js"></script>
<script src="http://bit.ly/srandom-256"></script>
<p>The following number is very hard to guess:
<script>document.write(Math.random());</script>

There are two main disadvantages when seeding using a network source. Frst, it introduces a series of intermediaries that you must trust to pass untampered random bits to you, including in this case, bit.ly, Yahoo, davidbau.com, and random.org. Second, there will be a delay of a few hundred milliseconds to load the data.

It is possible to seed from several independent sources (e.g., see HotBits) and load asynchronously to avoid these issues. That is left as an exercise for the determined reader.

A Secret Decoder Ring

Sometimes an unpredictable seed is not what you want - sometimes you want to specify an explicit nonrandom seed.

What is an explicitly seeded PRNG good for? It is good for knowing the future with certainty while everybody else is left guessing! For example, you can use a PRNG for a really easy-to-implement javascript secret decoder ring.

The one I have linked here is useful for pencil-and-paper written clues, the sort that you might want to hide in a kid's treasure hunt. In our family we have used it for exactly that purpose many times.

The secret encoder encrypts letters as normal letters and preserves spaces, capitalization and so on, so you can easily transcribe the ciphertext. It is fully symmetric, by which I mean that the algorithm for going from ciphertext to cleartext is exactly the same as the one in the other direction, so that kids won't get frustrated when they mix up encryption and decryption. The case-insensitivity and space-insensitivity of the code is helpful for the same reason.

davidbau.com/encode your treasure hunt hints here.

Wichmann-Hill

The RC4 algorithm generates discrete random integers. But Math.random() is supposed to return a uniformly distributed floating-point value in the range [0, 1). What is the right way to get from integers to the floats?

Most floating-point PRNG code that I have seen fails to do this correctly, in my opinion.

For example, the widely-used 1982 Wichman-Hill algorithm (that used to be used by python) goes something like this:

x = (171 * x) % 30269
y = (172 * y) % 30307
z = (170 * z) % 30323
return (x/30269.0 + y/30307.0 + z/30323.0) % 1.0

The three generators have period 30268, 30306, and 30322, which have LCM about 6.9 trillion. This corresponds to a gap between numbers that can be generated on the order of 10-13. This gap is about 1000 times larger than the gap between numbers near 1.0 in the IEEE 754 double representation which is about 10-16.

In other words, Wichman-Hill can only generate about 0.1% of the 52-bit mantissas that are possible in a double-precision floating point value. Generating the whole set of 252 different numbers is not a particularly impressive problem by modern standards, and so Wichman-Hill, with its relatively short period and coarse resolution, is no longer the right algorithm to use.

Generating Quadrillions of Uniform Doubles

The more modern approach to generating a double precision floating point value (used by python since 2.3 and Boost C++ libraries) is to simply use a long-period integer generator to directly build an integer with 52 bits of randomness, and then divide the number by 252 in order to fill up the 52-bit mantissa of an IEEE 754 double, something like this:

function random() { return get_randint_up_to(1 << 52) / (double)(1 << 52); }

But when we are using an underlying PRNG like RC4 that has a period around 10100, it seems like a shame to limit the the granularity at the low end of the sample set to 2-52. After all, if we can generate randomness that does not repeat in a sextillion sextillion sextillion sextillion cycles, why do we limit ourselves to a mere quadrillion different values?

A 64-bit floating-point value naturally cannot represent more than 264 different values. But it has very high resolution near zero, and that should be exploited.

Noticing One Event in a Quintillion

The following code is a natural way to simulate a one-in-a-quintillion event:

if (random() * (1.0e+18) < 1.0) { one_in_a_quintillion = true; }

Unfortunately, using the current regime of random number generators, the code above will not do what it appears to promise, because the coarse 2-52 rounding will ensure that the number will actually fire about 220 times too frequently.

This is not just a one-in-a-quintillion issue. If a typical modern PRNG happens to generate a population of numbers smaller than one billionth (which it can do in a few seconds on a modern CPU) then more than half of the random-looking digits in the floating-point representation of those small random numbers are a deception. The very smallest digits are not random at all: they are determined by the leading digits. If you were to choose to use those least-significant digits for hashing or subsampling, you would run into trouble.

IEEE 754 floating-point representations are specifically designed to provide telescoping high resolution near zero so that things like very small probabilities can be dealt with accurately. A double precision value has no problems representing accurate values with full precision down to about 10-96, which, we might notice, is a nice match for the power of the 10100 period of RC4.

I have not seen a previous PRNG that takes advantage of that power.

Building an Exponentially Correct IEEE 754 Double

So to scratch this pet peeve, the uniform [0, 1) generator included in seedrandom.js has the following code to smoothly utilize all the available exponents in the floating point representation (note that in javascript, all those large integers bigger than 232 are implicitly interpreted as IEEE 754 doubles):

math['random'] = function random() {  // Closure to return a random double:
  var n = arc4.g(6);                  // Start with a numerator <= 2 ^ 48
  var d = denom;                      //   and denominator = 2 ^ 48.
  var x = 0;                          //   and no 'extra last byte'.
  while (n < significance) {          // Fill up all significant digits by
    n = (n + x) * 256;                //   shifting numerator and
    d *= 256;                         //   denominator and generating a
    x = arc4.g(1);                    //   new least-significant-byte.
  }
  while (n >= overflow) {             // To avoid rounding up, before adding
    n /= 2;                           //   last byte, shift everything
    d /= 2;                           //   right using integer math until
    x >>>= 1;                         //   we have exactly the desired bits.
  }
  return (n + x) / d;                 // Form the number within [0, 1).
}

The code above scales up the exponent by increasing d as long as the mantissa does not fill the full 52 bits available in the floating-point representation ("significance" is set to 252 and "overflow" is set to 253). If the least-significant byte is partially significant, the only the topmost bits are used.

By adding the scaling loop for small numbers, this uniform variate generator uses not only the whole mantissa, but the whole exponent as well. One-in-quintillion simulations are now possible using the ordinary uniform distribution.

Some Limits

At the limits of resolution, there are a couple things to note. There are only 53 significant digits, so if you multiply 254 by any number more than 0.5, obviously you will get an even number 100% of the time.

In Javascript IEEE 754 round-to-even mode is used, so we need to take care to avoid rounding at the limits of resolution by shifting off any bits beyond the least significant digit. This achieves a last digit that is not the result of rounding which is equally likely to be even or odd when you multiply by 253. Update 2/9: note that the change to seedrandom.js to keep this last digit fair has changed the PRNG sequence. The old seedrandom sequence, which suffers from a slight bias-towards-even in the last digit, is still available here.

Once you have gone to all this trouble, the other set of issues to recognize are limits of the underlying RC4 algorithm. For example, the period of RC4 is recognized to be about 10100, one googol. This sounds like a big number, but keep in mind that the number of arrangments of a simple 52-card deck is about 8.1×1067. A two-deck shuffle has about 2.2×10150 arrangements, and so the RC4 generator would not be able to produce every possible permutation of two decks shuffled together. Reseeding occasionally could help solve this problem: the seed space of RC4 is about 10507.

Shuffling an Eight Deck Shoe

Never assume you have enough randomness without looking at the numbers. In Las Vegas it is common to deal blackjack out of an eight-deck shoe to make it harder for card counters to beat the house. But there are about 10671 distinguishable ways to shuffle eight decks of cards together, so matter how much you reseed an 8-bit RC4 generator with its 10507 states, there is no way you can achieve every possible shuffle.

With ordinary RC4, you will be limited to a subset of shuffles that represent less than one quintillionth quintillionth quintillionth quintillionth of the possible shuffles that you could do with a physical shoe of eight decks! In other words, after you have dealt out one full shoe of eight decks, you will have provided a distinctive fingerprint about your shuffling algorithm.

To shuffle several decks together using a PRNG without revealing a lot of information about the shuffle algorithm you happen to be using, you would need to switch to a generator with a much larger state space than 8-bit RC4. For example, the 10-bit version of RC4 has 102639 states. If you seed with a sufficiently large number of physically random bits it could pass as an unpredictable shuffler for an eight-deck shoe of cards.

It is hard to be a good card dealer.

Posted by David at January 30, 2010 08:06 PM
Comments

What is the licensing on your code: seedrandom.js

There is no copyright or license information (e.g. MIT, BSD, LGPL, GPL, Public Domain, etc). I would like to use it, but without any licensing info its not very usable.

Thanks
Brad

Posted by: Brad at March 9, 2010 07:31 PM

I've posted a version under BSD license now. This version also:

- adds an optional second argument, which, if set, lets you mix an explicit seed with the entropy pool without the slowness of other local entropy.
- changes the smearing multiplier to 19 which loses fewer bits of information if you initialize it with a long hexidecimal ASCII string.

On initializing without losing bits - a good way to initialize with a truly random string seed is to not limit yourself to printable characters but to use the full range of character values, i.e., Math.seedrandom('\xF9\x7F\x00\x1E...').

Posted by: David at March 11, 2010 06:40 AM
Post a comment









Remember personal info?