Menu - Top - Home - Donate to Me

Random Number Generator Recommendations for Applications

Peter Occil

Begun on Mar. 5, 2016; last updated on June 17, 2018.

Most apps that use random numbers care about either unpredictability or speed/high quality.

Introduction and Summary

Many applications rely on random number generators (RNGs); these RNGs include—

This document covers:

This document does not cover:

The following table summarizes the kinds of RNGs covered in this document:

Kind of RNG When to Use This RNG Examples
Cryptographic RNG In information security cases, or when speed is not a concern. /dev/urandom, BCryptGenRandom
Statistical RNG When information security is not a concern, but speed is. See also "Shuffling". xoroshiro128+, xorshift128+
Seeded PRNG When generating reproducible results in a way not practical otherwise. High-quality PRNG with custom seed

Contents

Definitions

The following definitions are helpful in better understanding this document.

Cryptographic RNGs

Cryptographic RNGs (also known as "cryptographically strong" or "cryptographically secure" RNGs) seek to generate random numbers that are cost-prohibitive to predict. Such RNGs are indispensable in information security contexts, such as—

They are also useful when the application generates random numbers so infrequently that the RNG's speed is not a concern.

Quality

A cryptographic RNG implementation generates uniformly distributed random bits such that it would be at least cost-prohibitive for an outside party to guess either prior or future unseen bits of the random sequence correctly with more than a 50% chance per bit, even with knowledge of the randomness-generating procedure, the implementation's internal state at the given point in time, and/or extremely many outputs of the RNG. (If the sequence was generated directly by a PRNG, ensuring future bits are unguessable this way should be done wherever the implementation finds it feasible; for example, see "Seeding and Reseeding".)

Seeding and Reseeding

If a cryptographic RNG implementation uses a PRNG, the following requirements apply.

The PRNG's state length must be at least 128 bits and should be at least 256 bits. The security strength used by the RNG must be at least 112 bits, should be at least 128 bits, and may equal the PRNG's state length.

Before an instance of the RNG generates a random number, it must have been initialized ("seeded") with a seed defined as follows. The seed—

The RNG should be reseeded, using a newly generated seed as described earlier, to help ensure the unguessability of its output. If the implementation reseeds, it should do so as often as feasible (whenever doing so would not slow down applications undesirably). If the RNG reseeds if it would generate more than a threshold number of bits without reseeding, that threshold should be 267 or less.

Nondeterministic Sources

A cryptographic RNG ultimately relies on one or more nondeterministic sources for random number generation.(2) Examples of nondeterministic sources are—

A value called entropy measures how hard it is to predict a nondeterministic source's output, compared to ideal random data; this is generally the size in bits of the ideal random data. (For example, a 64-bit output with 32 bits of entropy is as hard to predict as an ideal random 32-bit data block.) NIST SP 800-90B recommends min-entropy as the entropy measure and also details how nondeterministic sources can be used for information security.

If a cryptographic RNG implementation uses a PRNG, the output of the strongest nondeterministic source used to derive a seed ought to have as many bits of entropy as the security strength. If the implementation does not use a PRNG, the output of the strongest nondeterministic source used to derive an RNG output ought to have as many bits of entropy as the RNG output's size in bits.

Examples

Examples of cryptographic RNG implementations include the following:

Statistical RNGs

Statistical RNGs are used, for example, in simulations, numerical integration, and many games to bring an element of chance and variation to the application, with the goal that each possible outcome is equally likely. However, statistical RNGs are generally suitable only if—

If more than 20 items are being shuffled, a concerned application would be well advised to use alternatives to this kind of implementation (see "Shuffling").

A statistical RNG is usually implemented with a PRNG, but can also be implemented in a similar way as a cryptographic RNG provided it remains reasonably fast.

Quality

A statistical RNG generates random bits, each of which is uniformly distributed independently of the other bits, at least for nearly all practical purposes. If the implementation uses a PRNG, that PRNG algorithm must either satisfy the one-way property or be significantly more likely than not to pass all tests (other than MatrixRank and LinearComp) of BigCrush, part of L'Ecuyer and Simard's "TestU01". The RNG need not be perfectly equidistributed.

Seeding and Reseeding

If a statistical RNG implementation uses a PRNG, the following requirements apply.

The PRNG's state length must be at least 64 bits, should be at least 128 bits, and is encouraged to be as high as the implementation can go to remain reasonably fast for most applications.

Before an instance of the RNG generates a random number, it must have been initialized ("seeded") with a seed described as follows. The seed—

The implementation is encouraged to reseed itself from time to time (using a newly generated seed as described earlier), especially if the PRNG has a state length less than 238 bits. If the RNG reseeds if it would generate more than a threshold number of values without reseeding, that threshold should be the PRNG's period's square root or less.

Examples and Non-Examples

Examples of statistical RNGs include the following:

The following also count as statistical RNGs, but are not preferred:

Non-examples include the following:

Seeded PRNGs

In addition, some applications use pseudorandom number generators (PRNGs) to generate results based on apparent randomness, starting from a known initial state, or "seed". Such applications usually care about reproducible results. (Note that in the definitions for cryptographic and statistical RNGs given earlier, the PRNGs involved are automatically seeded before use.)

Seeding Recommendations

An application should use a PRNG with a seed it specifies (rather than an automatically-initialized PRNG or another kind of RNG) only if—

  1. the initial state (the seed) which the "random" result will be generated from—
    • is hard-coded,
    • is derived from user-entered data,
    • is known to the application and was generated using a cryptographic or statistical RNG (as defined earlier),
    • is a verifiable random number (as defined later), or
    • is based on a timestamp (but only if the reproducible result is not intended to vary during the time specified on the timestamp and within the timestamp's granularity; for example, a year/month/day timestamp for a result that varies only daily),
  2. the application might need to generate the same "random" result multiple times,
  3. the application either—
    • makes the seed (or a "code" or "password" based on the seed) accessible to the user, or
    • finds it impractical to store or distribute the "random" numbers or results (rather than the seed) for later use, such as—
      • by saving the result to a file,
      • by storing the random numbers for the feature generating the result to "replay" later, or
      • by distributing the results or the random numbers to networked users as they are generated, and
  4. any feature using that random number generation method to generate that "random" result will remain backward compatible with respect to the "random" results it generates, for as long as that feature is still in use by the application.

Meeting recommendation 4 is aided by using stable PRNGs; see "Definitions" and the following examples:

Recommendations for Seeded PRNGs

Which PRNG to use for generating reproducible results depends on the application. But as recommendations, any PRNG algorithm selected for producing reproducible results—

Examples

Custom seeds can come into play in the following situations, among others.

Games

Many kinds of games generate game content based on apparent randomness, such as—

where the game might need to generate the same content of that kind multiple times.

In general, such a game should use a PRNG with a custom seed for such purposes only if—

  1. generating the random content uses relatively many random numbers (say, more than a few thousand), and the application finds it impractical to store or distribute the content or the numbers for later use (see recommendations 2 and 3), or
  2. the game makes the seed (or a "code" or "password" based on the seed, such as a barcode or a string of letters and digits) accessible to the player, to allow the player to regenerate the content (see recommendations 2 and 3).

Option 1 often applies to games that generate procedural terrain for game levels, since the terrain often exhibits random variations over an extended space. Option 1 is less suitable for puzzle game boards or card shuffling, since much less data needs to be stored.

Example: Suppose a game generates a map with random terrain and shows the player a "code" to generate that map. Under recommendation 4, the game—

Unit Testing

A custom seed is appropriate when unit testing a method that uses a seeded PRNG in place of another kind of RNG for the purpose of the test (provided the method meets recommendation 4).

Verifiable Random Numbers

Verifiable random numbers are random numbers (such as seeds for PRNGs) that are disclosed along with all the information necessary to verify their generation. Usually, of the information used to derive such numbers—

One process to generate verifiable random numbers is described in RFC 3797 (to the extent its advice is not specific to the Internet Engineering Task Force or its Nominations Committee). Although the source code given in that RFC uses the MD5 algorithm, the process does not preclude the use of hash functions stronger than MD5 (see the last paragraph of section 3.3 of that RFC).

Noise

Randomly generated numbers can serve as noise, that is, a randomized variation in images and sound. (See also Red Blob Games, "Noise Functions and Map Generation")(4). In general, the same considerations apply to any RNGs the noise implementation uses as in other cases.

However, a noise implementation that implements cellular noise, value noise, or gradient noise (such as Perlin noise) should, wherever feasible, use an RNG to initialize a table of gradients or hash values in advance, to be used later by the noise function (a function that outputs seemingly random numbers given an n-dimensional point). Those gradients or hash values may be "hard-coded" instead if the seeding recommendations apply to the noise generation (treating the hard-coded values as the seed).

If a cellular, value, or gradient noise implementation incorporates a hash function

Programming Language APIs

The following table lists application programming interfaces (APIs) for cryptographic and statistical RNGs for popular programming languages. Note the following:

Language Cryptographic Statistical Other
C/C++ (G) (C) xoroshiro128plus.c (128-bit nonzero seed); xorshift128plus.c (128-bit nonzero seed)
Python secrets.SystemRandom (since Python 3.6); os.urandom() ihaque/xorshift library (128-bit nonzero seed; default seed uses os.urandom()) random.getrandbits() (A); random.seed() (19,936-bit seed) (A)
Java (D) (C); java.security.SecureRandom (F) grunka/xorshift (XORShift1024Star or XORShift128Plus)
JavaScript crypto.randomBytes(byteCount) (node.js only) xorshift library Math.random() (ranges from 0 through 1) (B)
Ruby (C); SecureRandom class (require 'securerandom') Random#rand() (ranges from 0 through 1) (A) (E); Random#rand(N) (integer) (A) (E); Random.new(seed) (default seed uses nondeterministic data)
PHP random_int() (since PHP 7) mt_rand() (A)

(A) General RNG implements Mersenne Twister, which is not preferred for a statistical RNG. PHP's mt_rand() implements or implemented a flawed version of Mersenne Twister.

(B) JavaScript's Math.random is implemented using xorshift128+ in the latest V8 engine, Firefox, and certain other modern browsers as of late 2017; the exact algorithm to be used by JavaScript's Math.random is "implementation-dependent", though, according to the ECMAScript specification.

(C) See "Advice for New Programming Language APIs" for implementation notes for cryptographic RNG implementations.

(D) Java's java.util.Random class uses a 48-bit seed, so doesn't meet the statistical RNG requirements. However, a subclass of java.util.Random might be implemented to meet those requirements.

(E) In my opinion, Ruby's Random#rand method presents a beautiful and simple API for random number generation.

(F) At least in Unix-based systems, calling the SecureRandom constructor that takes a byte array is recommended. The byte array should be data described in note (C).

(G) std::random_device, introduced in C++11, is not recommended because its specification leaves considerably much to be desired. For example, std::random_device can fall back to a pseudorandom number generator of unspecified quality without much warning.

Advice for New Programming Language APIs

Wherever possible, applications should use existing libraries and techniques that already meet the requirements for cryptographic and statistical RNGs. For example—

If existing solutions are inadequate, a programming language API could implement cryptographic and statistical RNGs by filling an output byte buffer with random bytes, where each bit in each byte will be randomly set to 0 or 1. For instance, a C language API for such RNGs could look like the following: int random(uint8_t[] bytes, size_t size);, where "bytes" is a pointer to a byte array, and "size" is the number of random bytes to generate, and where 0 is returned if the method succeeds and nonzero otherwise. Any programming language API that implements such RNGs by filling a byte buffer ought to run in amortized linear time on the number of random bytes the API will generate.

Cryptographic and statistical RNG implementations—

My document on random number generation methods includes details on ten uniform random number methods; in my opinion, a new programming language's standard library ought to include those ten methods separately for cryptographic and for statistical RNGs. That document also discusses how to implement other methods to generate random numbers or integers that follow a given distribution (such as a normal, geometric, binomial, or discrete weighted distribution) or fall within a given range.

Shuffling

There are special considerations in play when applications use RNGs to shuffle a list of items.

Shuffling Method

The first consideration touches on the shuffling method. The Fisher–Yates shuffle method does a substantially unbiased shuffle of a list, assuming the RNG it uses can choose from among all permutations of that list. However, that method is also easy to mess up (see also Jeff Atwood, "The danger of naïveté"); I give a correct implementation in another document.

Choosing from Among All Permutations

The second consideration is present if the application uses PRNGs for shuffling. If the PRNG's period is less than the number of distinct permutations (arrangements) of a list, then there are some permutations that PRNG can't choose when it shuffles that list. (This is not the same as generating all permutations of a list, which, for a sufficiently large list size, can't be done by any computer in a reasonable time.)

The number of distinct permutations is the multinomial coefficient m! / (w1! × w2! × ... × wn!), where m is the list's size, n is the number of different items in the list, x! means "x factorial", and wi is the number of times the item identified by i appears in the list. (This reduces to n!, if the list consists of n different items.)

Formulas suggesting state lengths for PRNGs are implemented below in Python. For example, to shuffle a 52-item list, a PRNG with state length 226 or more is suggested, and to shuffle two 52-item lists of identical contents together, a PRNG with state length 500 or more is suggested.

def fac(x):
    """ Calculates factorial of x. """
    if x<=1: return 1
    ret=1
    for i in range(x): ret=ret*(i+1)
    return ret

def ceillog2(x):
    """ Calculates base-2 logarithm of x, rounded up. """
    ret=0
    needCeil=True
    while x>1:
       one=needCeil and ((x&1)!=0)
       x=x>>1
       if one:
         ret+=1; needCeil=False
       ret+=1
    return ret

def stateLengthN(n):
  """ Suggested state length for PRNGs that shuffle
    a list of n items. """
  return ceillog2(fac(n))

def stateLengthNChooseK(n, k):
  """ Suggested state length for PRNGs that choose k
   different items randomly from a list of n items
   (see RFC 3797, sec. 3.3) """
  return ceillog2(fac(n)/(fac(k)*fac(n-k)))

def stateLengthDecks(numDecks, numCards):
  """ Suggested state length for PRNGs that shuffle
    multiple decks of cards in one. """
  return ceillog2(fac(numDecks*numCards)/ \
      (fac(numDecks)**numCards))

Whenever a statistical RNG implementation or seeded PRNG is otherwise called for, an application is encouraged to choose a PRNG with a state length suggested by the formulas above (and with the highest feasible period for that state length), where the choice of PRNG is based on—

(Practically speaking, for sufficiently large list sizes, any given PRNG will not be able to randomly choose some permutations of the list. See also "Lack of randomness" in the BigDeal document by van Staveren.)

The PRNG chosen this way should meet or exceed the quality requirements of a statistical RNG implementation.

Hash Functions

A seemingly random number can be generated from arbitrary data using a hash function.

A hash function is a function that takes an arbitrary input of any size (such as a sequence of bytes or a sequence of characters) and returns an output with a fixed size. That output is also known as a hash code. (By definition, hash functions are deterministic. The definition includes a PRNG that takes the input as a seed and outputs a random number of fixed size(5).)

A hash code can be used as follows:

For such purposes, applications should choose hash functions designed such that every bit of the input affects every bit of the output without a clear preference for 0 or 1 (the so-called avalanche property). Hash functions used in information security contexts should be designed such that finding an unknown second input that leads to the same output as that of a given input is cost-prohibitive (the one-way property) and so is finding an unknown input that leads to a given output (collision resistance), and should have other information security properties depending on the application.

GPU Programming Environments

Because, in general, GL Shading Language (GLSL) and other programming environments designed for execution on a graphics processing unit (GPU)—

random number generators for such environments are often designed as hash functions, because their output is determined solely by the input rather than both the input and state (as with PRNGs). Moreover, some of the hash functions which have been written in GLSL give undesirable results in computers whose GPUs support only 16-bit binary floating point numbers and no other kinds of numbers, which makes such GPUs an important consideration when choosing a hash function.

Motivation

In this document, I made the distinction between statistical and cryptographic RNGs because that is how programming languages often present random number generators — they usually offer a general-purpose RNG (such as C's rand or Java's java.util.Random) and sometimes an RNG intended for information security purposes (such as java.security.SecureRandom).

What has motivated me to write a more rigorous definition of random number generators is the fact that many applications still use weak RNGs. In my opinion, this is largely because most popular programming languages today—

Conclusion

Random numbers that merely "look random" are not enough for most applications. That is why this document defines cryptographic RNGs and statistical RNGs; I believe RNGs that meet either category will fulfill the expectations of many applications as regards random numbers. In general:

In addition, this document recommends using cryptographic RNGs in many cases, especially in information security contexts, and recommends easier programming interfaces for both cryptographic and statistical RNGs in new programming languages.

I acknowledge—

Request for Comments

Feel free to send comments. They could help improve this page.

Comments on any aspect of the document are welcome, but answers to the following would be particularly appreciated.

Notes

(1) If the software and/or hardware uses a nonuniform distribution, but otherwise meets this definition, it can be converted to use a uniform distribution, at least in theory, using unbiasing or randomness extraction methods that it is outside the scope of this document to describe.

(2) Nondeterministic sources that are reasonably fast for most applications (for instance, by enabling very many seeds to be generated per second), especially sources implemented in hardware, are highly advantageous in a cryptographic RNG.

(3) Timestamps with millisecond or coarser granularity are not encouraged, however, because multiple instances of a PRNG automatically seeded with a timestamp, when they are created at about the same time, run the risk of starting with the same seed and therefore generating the same sequence of random numbers.

(4) Noise implementations include cellular noise, value noise, gradient noise, colored noise (including white noise and pink noise), and noise following a Gaussian or other probability distribution. A noise implementation can use fractional Brownian motion to combine several layers of cellular, value, or gradient noise by calling the underlying noise function several times.

Note that usual implementations of noise (other than cellular, value, or gradient noise) don't sample each point of the sample space more than once; rather, all the samples are generated (e.g., with an RNG), then, for colored noise, a filter is applied to the samples.

(5) Note that some PRNGs (such as xorshift128+) are not well suited to serve as hash functions, because they don't mix their state before generating a random number from that state.

License

This page is licensed under Creative Commons Zero.