Menu - Top - Home - Donate to Me

Random Number Generator Recommendations for Applications

Peter Occil

Most apps that use random numbers care about either unpredictability, high quality, or repeatability. This article explains the three kinds of RNGs and gives recommendations on each kind.

Introduction

Many applications rely on random number generators (RNGs); however, it's not enough for random numbers to merely "look random". But unfortunately, most popular programming languages today—

so that as a result, many applications use RNGs, especially built-in RNGs, that have little assurance of high quality or security. That is why this document discusses high-quality RNGs and suggests existing implementations of them.

This document covers:

This document does not cover:

About This Document

This is an open-source document; for an updated version, see the source code or its rendering on GitHub. You can send comments on this document either on CodeProject or on the GitHub issues page.

Contents

Definitions

In this document:

Summary

Cryptographic RNGs

Cryptographic RNGs (also known as "cryptographically strong" or "cryptographically secure" RNGs) seek to generate random numbers that not only "look random", but are cost-prohibitive to guess. An application should use a cryptographic RNG whenever the application—

See "Cryptographic RNGs: Requirements" for requirements.
See "Existing RNG APIs in Programming Languages" for existing APIs.
For cryptographic RNGs, an application should use only one thread-safe instance of the RNG for the entire application to use.

Examples: A cryptographic RNG is recommended—

Noncryptographic PRNGs

Noncryptographic PRNGs vary widely in the quality of randomness of the numbers they generate. For this reason, a noncryptographic PRNG should not be used—

Noncryptographic PRNGs can be automatically seeded (a new seed is generated upon PRNG creation) or manually seeded (the PRNG uses a predetermined seed).

Manually-Seeded PRNGs

A given pseudorandom number generator (PRNG) generates the same sequence of "random" numbers for the same "seed". Some applications care about reproducible "randomness" and thus could set a PRNG's seed manually for reproducible "random" numbers.

When to Use a Manually-Seeded PRNG

By seeding a PRNG manually for reproducible "randomness", an application will be tied to that PRNG or its implementation. For this reason, an application should not use a manually-seeded PRNG (rather than a cryptographic or automatically-seeded RNG) unless—

  1. the application might need to generate the same "random" result multiple times,
  2. the application either—
    • makes the seed (or a "code" or "password" based on the seed) accessible to the user, or
    • finds it impractical to store or distribute the "random" numbers or "random" content, rather than the seed, for later use (e.g., to store those numbers to "replay" later, to store that content in a "save file", or to distribute that content rather than a seed to networked users), and
  3. any feature that uses such a PRNG to generate that "random" result is reproducible, in that it produces the same "random" result for the same seed for as long as the feature is still in use by the application.

Manually-Seeded PRNG Recommendations

If an application chooses to use a manually-seeded PRNG for reproducible "randomness", the application—

For advice on generating seeds for the PRNG, see "Seed Generation for Noncryptographic PRNGs").

Example: An application could implement a manually-seeded PRNG using a third-party library that specifically says it implements a high-quality PRNG algorithm, and could initialize that PRNG using a bit sequence from a cryptographic RNG. The developers could also mention the use of the specific PRNG chosen on any code that uses it, to alert other developers that the PRNG needs to remain unchanged.

Manually-Seeded PRNG Use Cases

Use cases for manually-seeded PRNGs include the following:

Manually-Seeded PRNGs in Games

Many kinds of game software generate seemingly "random" game content that might need to be repeatedly regenerated, such as—

In general, the bigger that "random" content is, the greater the justification to use a manually-seeded PRNG and a custom seed to generate that content. The following are special cases:

  1. If the game needs reproducible "random" content only at the start of the game session (e.g., a "random" game board or a "random" order of virtual cards) and that content is small (say, no more than a hundred numbers):
    • The game should not use a manually-seeded PRNG unless the seed is based on a "code" or "password" entered by the user. This is a good sign that the game ought to store the "random" content instead of a seed.
  2. In a networked game where multiple computers (e.g., multiple players, or a client and server) have a shared view of the game state and random numbers are used to update that game state:
    • The game should not use a manually-seeded PRNG where predicting a random outcome could give a player a significant and unfair advantage (e.g., the random number is the result of a die roll, or the top card of the draw pile, for a board or card game). The game may use such a PRNG in other cases to ensure the game state is consistent among computers, including in physics simulations and AI.

Examples:

  1. Suppose a game generates a map with random terrain (which uses lots of random numbers) and shows the player a "code" to generate that map (such as a barcode or a string of letters and digits). In this case, the game—

    • may change the algorithm it uses to generate random maps, but
    • should use, in connection with the new algorithm, "codes" that can't be confused with "codes" it used for previous algorithms, and
    • should continue to generate the same random map using an old "code" when the player enters it, even after the change to a new algorithm.
  2. Suppose a game implements a chapter that involves navigating a randomly generated dungeon with randomly scattered monsters and items. If the layout of the dungeon, monsters, and items has to be the same for a given week and for all players, the game can seed a PRNG with a hash code generated from the current week, the current month, the current year, and, optionally, a constant sequence of bits.

Single Random Value

If an application requires only one random value, with a fixed number of bits, then the application can pass the seed to a hash function rather than a PRNG. Examples of this include the following:

Ensuring Reproducibility

To ensure that a manually-seeded PRNG delivers reproducible "random" numbers across computers, across runs, and across application versions, an application needs to take special care. Reproducibility is often not achievable if the application relies on features or behavior outside the application's control, including any of the following:

Thus, an application ought to use manually-seeded PRNGs only when necessary, to minimize the need for reproducible "randomness". Where reproducibility is required, the application ought to avoid floating-point numbers, nondeterministic features, and other behavior outside its control, and ought to stick to the same versions of algorithms it uses.

As for reproducible PRNGs, java.util.Random is one example of a PRNG with consistent behavior, but none of the following is such a PRNG:

Nondeterministic Sources and Seed Generation

RNGs ultimately rely on so-called nondeterministic sources; without such sources, no computer can produce random numbers.

What Is a Nondeterministic Source?

A nondeterministic source is a source that doesn't give the same output for the same input each time (for example, a clock that doesn't always give the same time). There are many kinds of them, but sources useful for random number generation have hard-to-guess output (that is, they have high entropy; see the next section). They include—

RFC 4086, "Randomness Requirements for Security", section 3, contains a survey of nondeterministic sources.

Note: Online services that make random numbers available to applications, as well as the noise registered by microphone and camera recordings (see RFC 4086 sec. 3.2.1, (Liebow-Feeser 2017a)(12), and (Liebow-Feeser 2017b)(13)), are additional nondeterministic sources. However, online services require Internet or other network access, and some of them require access credentials. Also, many mobile operating systems require applications to declare network, camera, and microphone access to users upon installation. For these reasons, these kinds of sources are not recommended if other approaches are adequate.

Example: A program could ask users to flip coins or roll dice and type in their results. If users do so, the results typed this way will have come from nondeterministic sources (here, coins or dice).

What Is Entropy?

Entropy is a value that describes how hard it is to guess a nondeterministic source's output, compared to ideal random data; this is generally the size in bits of the ideal random data. (For example, a 64-bit output with 32 bits of entropy is as hard to guess as an ideal random 32-bit data block.) NIST SP 800-90B recommends min-entropy as the entropy measure. Characterizing a nondeterministic source's entropy is nontrivial and beyond the scope of this document. See also RFC 4086 section 2.

Seed Generation

In general, there are two steps to generate an N-bit seed for a PRNG(14):

  1. Gather enough data from independent nondeterministic sources to reach N bits of entropy or more.
  2. Then, condense the data into an N-bit number, a process called randomness extraction.(15)

Randomness extraction is discussed in NIST SP 800-90B sec. 3.1.5.1, and RFC 4086 sec. 4.2 and 5.2. Examples of extractors include keyed hash functions (see, e.g., (Cliff et al., 2009)(16)), von Neumann unbiasing (von Neumann 1951)(17), and a streaming version by (Zhou and Bruck 2012)(18). In information security applications, extractors other than keyed cryptographic hash functions should not be used by themselves in randomness extraction.

Example: The Cliff reference reviewed the use of HMAC (hash-based message authentication code) algorithms, and implies that one way to generate a seed is as follows:

  1. Gather data with at least 512 bits of entropy.
  2. Run HMAC-SHA-512 with that data to generate a 512-bit HMAC.
  3. Take the first 170 (or fewer) bits as the seed (512 divided by 3, rounded down).

Seed Generation for Noncryptographic PRNGs

In general, to generate a seed allowed by a noncryptographic PRNG, an application ought to use a cryptographic RNG or a method described in the previous section.

It is not recommended to seed PRNGs with timestamps, since they can carry the risk of generating the same "random" number sequence accidentally.(19)

Seeding Multiple Processes

Some applications require multiple processes (including threads, tasks, or subtasks) to use reproducible "random" numbers for the same purpose. An example is multiple instances of a simulation with random starting conditions. However, noncryptographic PRNGs tend to produce number sequences that are correlated to each other, which is undesirable for simulations in particular.

To reduce this correlation risk, the application can choose a high-quality PRNG that supports streams of uncorrelated sequences (sequences that behave like independent random number sequences and don't overlap) and has an efficient way to assign a different stream to each process. For example, in some PRNGs, these streams can be formed—

Multiple processes can be seeded for random number generation as follows.(21)

  1. Stream case. If the PRNG supports streams as described above: Generate a seed (or use a predetermined seed), then:

    1. Create a PRNG instance for each process.
    2. Hash the seed and a fixed identifier to generate a new seed allowed by the PRNG.
    3. For each process, advance the PRNG to the next stream (unless it's the first process), then give that process a copy of the PRNG's current internal state.
  2. General case. For other PRNGs, or if each process uses a different PRNG design, the following is a way to seed multiple processes for random number generation, but it carries the risk of generating seeds that lead to overlapping, correlated, or even identical number sequences, especially if the processes use the same PRNG.(22) Generate a seed (or use a predetermined seed), then:

    1. Create a PRNG instance for each process. The instances need not all use the same PRNG design or the same parameters; for example, some can be SFC64 and others xoroshiro128**.
    2. For each process, hash the seed, a unique number for that process, and a fixed identifier to generate a new seed allowed by the process's PRNG, and initialize that PRNG with the new seed.
  3. Leapfrogging (Bauke and Mertens 2007)(23). The following is an alternative way to initialize a PRNG for each process if the number of processes (N) is small. Generate a seed (or use a predetermined seed), then:

    1. Create one PRNG instance. Hash the seed and a fixed identifier to generate a new seed allowed by the PRNG.
    2. Give each process a copy of the PRNG's state. Then, for the second process, discard 1 output from its PRNG; for the third process, discard 2 outputs from its PRNG; and so on.
    3. Now, whenever a PRNG created this way produces an output, it then discards the next N minus 1 outputs before finishing.

Note: The steps above include hashing several things to generate a new seed. This has to be done with either a hash function of N or more bits (where N is the PRNG's maximum seed size), or a so-called "seed sequence generator" like C++'s std::seed_seq.(24)

Examples:

  1. Philox4×64-7 is a counter-based PRNG that supports one stream per seed. To seed two processes based on the seed "seed" and this PRNG, an application can—

    • take the SHA2-256 hash of "seed-mysimulation" as a new seed,
    • initialize the first process's PRNG with the new seed and a counter of 0, and
    • initialize the second process's PRNG with 1 plus the new seed and a counter of 0.
  2. Some dynamic threading (task-parallel) platforms employ task schedulers where tasks or subtasks (sometimes called strands or fibers) are not assigned to a particular operating system process or thread. To ensure reproducible "randomness" in these platforms, PRNGs have to be assigned to tasks (rather than system processes or threads) and are not shared between tasks, and each task's PRNG can be initialized as given in the "general case" steps above (where the task's unique number is also known as a pedigree) (Leierson et al., 2012)(10).

Existing RNG APIs in Programming Languages

As much as possible, applications should use existing libraries and techniques for cryptographic and high-quality RNGs. The following table lists application programming interfaces (APIs) for such RNGs for popular programming languages.

Language Cryptographic High-Quality
.NET (incl. C# and VB.NET) (H) RandomNumberGenerator.Create() in System.Security.Cryptography namespace; airbreather/AirBreather.Common library (CryptographicRandomGenerator) XoshiroPRNG.Net package (XoRoShiRo128starstar, XoShiRo256plus, XoShiRo256starstar); Data.HashFunction.MurmurHash or Data.HashFunction.CityHash package (hash the string seed + "_" + counter)
C/C++ (G) (C) xoroshiro128plusplus.c; xoshiro256starstar.c
Python (A) secrets.SystemRandom (since Python 3.6); os.urandom() ihaque/xorshift library (default seed uses os.urandom()); numpy.random.Generator with Philox or SFC64 (since ver. 1.7); hashlib.md5(b"%d_%d" % (seed, counter)).digest(), hashlib.sha1(b"%d_%d" % (seed, counter)).digest()
Java (A) (D) (C); java.security.SecureRandom (F) it.unimi.dsi/dsiutils artifact (XoRoShiRo128PlusPlusRandom, XoRoShiRo128StarStarRandom, XoShiRo256StarStarRandom, XorShift1024StarPhiRandom); org.apache.commons/commons-rng-simple artifact (RandomSource of SFC_64, XO_RO_SHI_RO_128_PP, XO_RO_SHI_RO_128_SS, XO_SHI_RO_256_PP, or XO_SHI_RO_256_SS)
JavaScript (B) crypto.randomBytes(byteCount) (node.js only); random-number-csprng package (node.js only); crypto.getRandomValues() (Web) xoroshiro128starstar package; md5 package (md5(seed+"_"+counter, {asBytes: true})); murmurhash3js package (murmurhash3js.x86.hash32(seed+"_"+counter)); crypto.createHash("sha1") (node.js only)
Ruby (A) (E) (C); SecureRandom.rand() (0 or greater and less than 1) (E); SecureRandom.rand(N) (integer) (E) (for both, require 'securerandom'); sysrandom gem Digest::MD5.digest("#{seed}_#{counter}"), Digest::SHA1.digest("#{seed}_#{counter}") (for both, require 'digest')
PHP (A) random_int(), random_bytes() (both since PHP 7) md5($seed.'_'.$counter, true); sha1($seed.'_'.$counter, true)
Go crypto/rand package md5.Sum in crypto/md5 package or sha1.Sum in crypto/sha1 package (for both, hash the byte array seed + "_" + counter)
Rust (C) rand_xoshiro crate (Xoroshiro128PlusPlus, Xoshiro256PlusPlus, Xoshiro256StarStar, Xoshiro512StarStar)
Perl Crypt::URandom module Crypt::Digest::MD5 module (md5($seed.'_'.$counter)); Digest::SHA module (sha1($seed.'_'.$counter)); Digest::MurmurHash3 module (murmurhash3($seed.'_'.$counter))
Other Languages (C) Hash the string seed + "_" + counter with MurmurHash3, xxHash64, CityHash, MD5, or SHA-1

(A) The general RNGs of recent versions of Python and Ruby implement Mersenne Twister, which is not preferred for a high-quality RNG. PHP's mt_rand() implements or implemented a flawed version of Mersenne Twister.

(B) JavaScript's Math.random() (which ranges 0 or greater and less than 1) is implemented using xorshift128+ (or a variant) in the V8 engine, Firefox, and certain other modern browsers as of late 2017; Math.random() uses an "implementation-dependent algorithm or strategy", though (see ECMAScript sec. 20.2.2.27).

(C) A cryptographic RNG implementation can—

and only use other techniques if the existing ones are inadequate for the application. But unfortunately, resource-constrained devices ("embedded" devices) are much less likely to have a cryptographic RNG available compared to general-purpose computing devices such as desktop computers and smartphones (Wetzels 2017)(26), although methods exist for implementing a cryptographic RNG on the Arduino (Peng 2017)(27).

(D) Java's java.util.Random class uses a 48-bit seed, so is not considered a high-quality RNG. However, a subclass of java.util.Random might be implemented as a high-quality RNG.

(E) Ruby's SecureRandom.rand method presents a beautiful and simple API for random number generation, in my opinion. Namely, rand() returns a number 0 or greater and less than 1, and rand(N) returns an integer 0 or greater and less than N.

(F) In Java 8 and later, use SecureRandom.getInstanceStrong(). In Java earlier than 8, call SecureRandom.getInstance("NativePRNGNonBlocking") or, if that fails, SecureRandom.getInstance("NativePRNG"). For Android, especially versions 4.3 and earlier, see (Klyubin 2013)(28). Using the SecureRandom implementation "SHA1PRNG" is not recommended, because of weaknesses in seeding and RNG quality in implementations as of 2013 (Michaelis et al., 2013)(29).

(G) std::random_device was introduced in C++11, but its specification leaves considerably much to be desired. For example, std::random_device can fall back to a PRNG of unspecified quality without much warning. At best, std::random_device should not be used except to supplement other techniques for random number generation.

(H) The .NET Framework's System.Random class uses a seed of at most 32 bits, so is not considered a high-quality RNG. However, a subclass of System.Random might be implemented as a high-quality RNG.

Hash Functions

A hash function is a function that takes an arbitrary input of any size (such as an array of 8-bit bytes or a sequence of characters) and returns an output with a fixed number of bits. That output is also known as a hash code.

For random number generation purposes:

The use of hash functions for other purposes (such as data lookup and data integrity) is beyond the scope of this document. See my note on hash functions.

Procedural Noise Functions

Noise is a randomized variation in images, sound, and other data.(30)

A noise function is similar to a hash function; it takes an n-dimensional point and, optionally, additional data, and outputs a seemingly random number.(31) Noise functions generate procedural noise such as cellular noise, value noise, and gradient noise (including Perlin noise). If the noise function takes additional data, that data—

Pseudorandom Functions

A pseudorandom function is a kind of hash function that takes—

and outputs a seemingly random number. (If the output is encryption keys, the function is also called a key derivation function; see NIST SP 800-108.) Some pseudorandom functions deliberately take time to compute their output; these are designed above all for cases in which the secret is a password or is otherwise easy to guess — examples of such functions include PBKDF2 (RFC 2898), scrypt (RFC 7914), and Ethash. Pseudorandom functions are also used in proofs of work such as the one described in RFC 8019 sec. 4.4.

RNG Topics

This section discusses several important points on the use and selection of RNGs, including things to consider when shuffling or generating "unique" random numbers.

Shuffling

In a list with N different items, there are N factorial (that is, 1 * 2 * ... * N, or N!) ways to arrange the items in that list. These ways are called permutations(32).

In practice, an application can shuffle a list by doing a Fisher–Yates shuffle, which is unfortunately easy to mess up — see (Atwood 2007)(33) — and is implemented correctly in another document of mine.

However, if a PRNG admits fewer seeds than the number of permutations, then there are some permutations that that PRNG can't choose when it shuffles that list. (This is not the same as generating all permutations of a list, which, for a list big enough, can't be done by any computer in a reasonable time.)

On the other hand, for a list big enough, it's generally more important to have shuffles act random than to choose from among all permutations.

An application that shuffles a list can do the shuffling—

  1. using a cryptographic RNG, preferably one with a security strength of b bits or greater, or
  2. if a noncryptographic RNG is otherwise appropriate, using a high-quality PRNG that—
    • admits b-bit seeds without shortening or compressing those seeds, and
    • is initialized with a seed derived from data with at least b bits of entropy, or "randomness".

For shuffling purposes, b can usually be calculated by taking n factorial minus 1 (where n is the list's size) and calculating its bit length. A Python example is b = (math.factorial(n)-1).bit_length(). See also (van Staveren 2000, "Lack of randomness")(34). For shuffling purposes, an application may limit b to 256 or greater, in cases when variety of permutations is not important. For other sampling tasks, the following Python examples show how to calculate b:

Unique Random Identifiers

Some applications require generating unique identifiers, especially to identify database records or other shared resources. Examples of unique values include auto-incremented numbers, sequentially assigned numbers, primary keys of a database table, and combinations of these. Applications have also generated unique values at random.

The following are some questions to consider when generating unique identifiers:

  1. Can the application easily check identifiers for uniqueness within the desired scope and range (e.g., check whether a file or database record with that identifier already exists)(35)?
  2. Can the application tolerate the risk of generating the same identifier for different resources(36)?
  3. Do identifiers have to be hard to guess, be simply "random-looking", or be neither?
  4. Do identifiers have to be typed in or otherwise relayed by end users(37)?
  5. Is the resource an identifier identifies available to anyone who knows that identifier (even without being logged in or authorized in some way)?(38)
  6. Do identifiers have to be memorable?

Some applications may also care about "unique random" values. Generally, however, values that are both unique and random are impossible. Thus, applications that want "unique random" values have to either settle for numbers that merely "look random"; or check for or tolerate possible duplicates; or pair random numbers with unique ones.

If the application can settle for "random-looking" unique integers, it can produce a unique N-bit integer and apply any of the following to that integer:

  1. A function that maps N-bit integers to N-bit integers in a reversible way (also called a mixing function with reversible operations; see "Hash functions" by B. Mulvey). This includes using the unique integer as the seed for a "full-period" linear PRNG that cycles through all N-bit integers exactly once(39).
  2. If unique integers 0 or greater, but less than K, are desired, choose an N-bit function described earlier, where N is the number of bits needed to store the number K-minus-1, and discard all outputs that are K or greater.

An application that generates unique identifiers should do so as follows:

This section doesn't discuss how to format a unique value into a text string (such as a hexadecimal or alphanumeric string), because ultimately, doing so is the same as mapping unique values one-to-one with formatted strings (which will likewise be unique).

Verifiable Random Numbers

Verifiable random numbers are random numbers (such as seeds for PRNGs) that are disclosed along with all the information necessary to verify their generation. Usually, such information includes random numbers and/or uncertain data to be determined and publicly disclosed in the future. Techniques to generate verifiable random numbers (as opposed to cryptographic RNGs alone) are used whenever one party alone can't be trusted to produce a number at random. Verifiable random numbers that are disclosed publicly should not be used as encryption keys or other secret parameters.

Examples:

  1. Generating verifiable randomness has been described in RFC 3797, which describes the selection process for the Nominations Committee (NomCom) of the Internet Engineering Task Force.
  2. Verifiable delay functions calculate an output as well as a proof that the output was correctly calculated; these functions deliberately take much more time to calculate the output (e.g., to generate a seemingly random number from public data) than to verify its correctness.(40) In many cases, such a function deliberately takes much more time than the time allowed to contribute randomness to that function.(41)
  3. In a so-called commitment scheme, one computer generates data to be committed (e.g. a random number or a chess move), then reveals its hash code or digital signature (commitment), and only later reveals to all participants the committed data (along with other information needed, if any, to verify that the data wasn't changed in between). Examples of commitment schemes are hash-based commitments.(41)
  4. So-called mental card game (mental poker) schemes can be used in networked games where a deck of cards has to be shuffled and dealt to players, so that the identity of some cards is known to some but not all players.(41)

Guidelines for New RNG APIs

This section contains guidelines for those seeking to implement RNGs designed for wide reuse (such as in a programming language's standard library). As mentioned earlier, an application should use existing RNG implementations whenever possible.

This section contains suggested requirements on cryptographic and high-quality RNGs that a new programming language can choose to adopt.

Cryptographic RNGs: Requirements

A cryptographic RNG generates random bits that behave like independent uniform random bits, such that an outside party has no more than negligible advantage in correctly guessing prior or future unseen output bits of that RNG even after knowing how the RNG works and/or extremely many outputs of the RNG, or prior unseen output bits of that RNG after compromising its security, such as reading its internal state.(42)

If a cryptographic RNG implementation uses a PRNG:

A cryptographic RNG is not required to reseed itself.

Examples: The following are examples of cryptographic RNGs:

High-Quality RNGs: Requirements

A PRNG is a high-quality RNG if—

The high-quality PRNG should admit any of 2127 or more seeds.

Every cryptographic RNG is also a high-quality RNG.

Examples: Examples of high-quality PRNGs include xoshiro256**, xoroshiro128**, xoroshiro128++, Philox4×64-7, and SFC64. I give additional examples in a separate page.

Designs for PRNGs

The following are some ways a PRNG can be implemented:

Implementing New RNG APIs

A programming language API designed for reuse by applications could implement RNGs using the following guidelines:

  1. The RNG API can include a method that fills one or more memory units (such as 8-bit bytes) completely with random bits. See example 1.
  2. If the API implements an automatically-seeded RNG, it should not allow applications to initialize that same RNG with a seed for reproducible "randomness"(46) (it may provide a separate PRNG to accept such a seed). See example 2.
  3. If the API provides a PRNG that an application can seed for reproducible "randomness", it should document that PRNG and any methods the API provides that use that PRNG (such as shuffling and Gaussian number generation), and should not change that PRNG or those methods in a way that would change the "random" numbers they deliver for a given seed. See example 2.
  4. A new programming language's standard library ought to include the following methods for generating numbers that behave like independent uniform random numbers (see my document on random number generation methods for details).
    • Four methods for random integers: 0 to n including n, 0 to n excluding n, a to b including b, and a to b excluding b.
    • Four methods for random numbers bounded by 0 and 1, with and without the endpoints.
    • Four methods for random numbers bounded by a and b, with and without the endpoints.

 

Examples:

  1. A C language RNG method for filling memory could look like the following: int random(uint8_t[] bytes, size_t size);, where bytes is a pointer to an array of 8-bit bytes, and size is the number of random 8-bit bytes to generate, and where 0 is returned if the method succeeds and nonzero otherwise.
  2. A Java API that follows these guidelines can contain two classes: a RandomGen class that implements an unspecified but general-purpose RNG, and a RandomStable class that implements an SFC64 PRNG that is documented and will not change in the future. RandomStable includes a constructor that takes a seed for reproducible "randomness", while RandomGen does not. Both classes include methods described in point 4, but RandomStable specifies the exact algorithms to those methods and RandomGen does not. At any time in the future, RandomGen can change its implementation to use a different RNG while remaining backward compatible, while RandomStable has to use the same algorithms for all time to remain backward compatible, especially because it takes a seed for reproducible "randomness".

Acknowledgments

I acknowledge—

Notes

(1) See also the question titled "Matlab rand and c++ rand()" on Stack Overflow.

(2) A distinction between cryptographic and noncryptographic RNGs seems natural, because many programming languages offer a general-purpose RNG (such as C's rand or Java's java.util.Random) and sometimes an RNG intended for information security purposes (such as Ruby's SecureRandom).

(3) For example, see F. Dörre and V. Klebanov, "Practical Detection of Entropy Loss in Pseudo-Random Number Generators", 2016.

(4) The definition of an RNG does not include items that generate non-uniform random numbers or signals, even if they otherwise meet the criteria of an RNG as defined here. (For example, RNGs exclude Gaussian and similar noise generators.) Many of these items, however, typically serve as sources from which uniform random numbers can be derived through techniques collectively known as randomness extraction (see "Seed Generation").

(5) See also the FIPS 200 definition ("The protection of information and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction in order to provide confidentiality, integrity, and availability") and ISO/IEC 27000.

(6) However, some versions of GLSL (notably GLSL ES 1.0, as used by WebGL 1.0) might support integers with a restricted range (as low as -1024 to 1024) rather than 32-bit or bigger integers as are otherwise common, making it difficult to write hash functions for random number generation. An application ought to choose hash functions that deliver acceptable "random" numbers regardless of the kinds of numbers supported.

An alternative for GLSL and other fragment or pixel shaders to support randomness is to have the shader sample a "noise texture" with random data in each pixel; for example, C. Peters, "Free blue noise textures", Moments in Graphics, Dec. 22, 2016, discusses how so-called "blue noise" can be sampled this way.

See also N. Reed, "Quick And Easy GPU Random Numbers In D3D11", Nathan Reed's coding blog, Jan. 12, 2013.

(7) For more information, see "Floating-Point Determinism" by Bruce Dawson, the white paper "Floating Point and IEEE 754 Compliance for NVIDIA GPUs", and an Intel webinar.

(8) For integers, this problem also occurs, but is generally limited to the question of rounding after an integer division or remainder, which different programming languages answer differently.

(9) Fixed-point numbers are integers that store multiples of 1/n (e.g. 1/10000, 1/256, or 1/65536). Their resolution doesn't vary depending on the number, unlike with floating-point numbers. "The Butterfly Effect - Deterministic Physics in The Incredible Machine and Contraption Maker" is one use case showing how fixed-point numbers aid reproducibility. I have written a sample Python implementation of fixed-point numbers.

(10) Leierson, C.E., et al., "Deterministic Parallel Random-Number Generation for Dynamic Multithreading Platforms", 2012.

(11) Müller, S. "CPU Time Jitter Based Non-Physical True Random Number Generator".

(12) Liebow-Feeser, J., "Randomness 101: LavaRand in Production", blog.cloudflare.com, Nov. 6, 2017.

(13) Liebow-Feeser, J., "LavaRand in Production: The Nitty-Gritty Technical Details", blog.cloudflare.com, Nov. 6, 2017.

(14) These steps could also generate random numbers, rather than a seed, but this is generally slower than using PRNGs to do so.

(15) Also known as entropy extraction, deskewing, whitening, or unbiasing.

(16) Cliff, Y., Boyd, C., Gonzalez Nieto, J. "How to Extract and Expand Randomness: A Summary and Explanation of Existing Results", 2009.

(17) von Neumann, J., "Various techniques used in connection with random digits", 1951.

(18) Zhou, H. and Bruck, J., "Streaming algorithms for optimal generation of random bits", arXiv:1209.0730 [cs.IT], 2012. See source code of my implementation.

(19) For example, many questions on Stack Overflow highlight the pitfalls of creating a new instance of the .NET Framework's System.Random each time a random number is needed, rather than only once in the application. See also Johansen, R. S., "A Primer on Repeatable Random Numbers", Unity Blog, Jan. 7, 2015.

(20) Salmon, J.K.; Moraes, M.A.; et al., "Parallel Random Numbers: As Easy as 1, 2, 3", 2011.

(21) P. L'Ecuyer, D. Munger, et al. "Random Numbers for Parallel Computers: Requirements and Methods, With Emphasis on GPUs", April 17, 2015, section 4, goes in greater detail on ways to initialize PRNGs for generating random numbers in parallel, including how to ensure reproducible "randomness" this way if that is desired.

(22) For single-cycle PRNGs, the probability of overlap for N processes each generating L numbers with a PRNG whose cycle length is P is at most N*N*L/P (S. Vigna, "On the probability of overlap of random subsequences of pseudorandom number generators", Information Processing Letters 158 (2020)). Using two or more PRNG designs can reduce correlation risks due to a particular PRNG's design. For further discussion and an example of a PRNG combining two different PRNG designs, see Agner Fog, "Pseudo-Random Number Generators for Vector Processors and Multicore Processors", Journal of Modern Applied Statistical Methods 14(1), article 23 (2015).

(23) Bauke and Mertens, "Random numbers for large-scale distributed Monte Carlo simulations", 2007.

(24) Besides the seed, other things are hashed that together serve as a domain separation tag (see, e.g., the work-in-progress document "draft-irtf-cfrg-hash-to-curve"). Note the following:

(25) Using the similar /dev/random is not recommended, since in some implementations it can block for seconds at a time, especially if not enough randomness is available. See also "Myths about /dev/urandom".

(26) Wetzels, J., "33C3: Analyzing Embedded Operating System Random Number Generators", samvartaka.github.io, Jan. 3, 2017.

(27) B. Peng, "Two Fast Methods of Generating True Random Numbers on the Arduino", GitHub Gist, December 2017.

(28) A. Klyubin, "Some SecureRandom Thoughts", Android Developers Blog, Aug. 14, 2013.

(29) Michaelis, K., Meyer, C., and Schwenk, J. "Randomly Failed! The State of Randomness in Current Java Implementations", 2013.

(30) There are many kinds of noise, such as procedural noise (including Perlin noise, cellular noise, and value noise), colored noise (including white noise and pink noise), periodic noise, and noise following a Gaussian or other probability distribution. See also two articles by Red Blob Games: "Noise Functions and Map Generation" and "Making maps from noise functions".

(31) Noise functions include functions that combine several outputs of a noise function, including by fractional Brownian motion. By definition, noise functions deliver the same output for the same input.

(32) More generally, a list has N! / (W_1! * W_2! * ... * W_K!) permutations (a multinomial coefficient), where N is the list's size, K is the number of different items in the list, and W_i is the number of times the item identified by i appears in the list. However, this number is never more than N! and suggests using less randomness, so an application need not use this more complicated formula and may assume that a list has N! permutations even if some of its items occur more than once.

(33) Atwood, Jeff. "The danger of naïveté", Dec. 7, 2007.

(34) van Staveren, Hans. "Big Deal: A new program for dealing bridge hands", Sep. 8, 2000

(35) For applications distributed across multiple computers (e.g., servers), this check is made easier if each computer is assigned a unique value from a central database, because then the computer can use that unique value as part of unique identifiers it generates and ensure that the identifiers are unique across the application without further contacting other computers or the central database. An example is Twitter's Snowflake service.

(36) In theory, generating two or more random integers of the same size runs the risk of producing a duplicate number this way. However, this risk decreases as that size increases (see "Birthday problem"). For example, in theory, an application has a 50% chance for duplicate numbers after generating—

(37) If an application expects end users to type in a unique identifier, it could find that very long unique identifiers are unsuitable for it (e.g. 128-bit numbers take up 32 base-16 characters). There are ways to deal with these and other long identifiers, including (1) separating memorable chunks of the identifier with a hyphen, space, or another character (e.g., "ABCDEF" becomes "ABC-DEF"); (2) generating the identifier from a sequence of memorable words (as in Electrum or in Bitcoin's BIP39); or (3) adding a so-called "checksum digit" at the end of the identifier to guard against typing mistakes. The application ought to consider trying (1) or (2) before deciding to use shorter identifiers than what this document recommends.

(38) Note that the insecure direct object references problem can occur if an application enables access to a sensitive resource via an easy-to-guess identifier, but without any access control checks.

(39) For suggested "full-period" LCGs, see tables 3, 5, 7, and 8 of Steele and Vigna, "Computationally easy, spectrally good multipliers for congruential pseudorandom number generators", arXiv:2001.05304 [cs.DS].

(40) Verifiable delay functions are different from proofs of work, in which there can be multiple correct answers. These functions were first formally defined in Boneh, D., Bonneau, J., et al., "Verifiable Delay Functions", 2018, but such functions appeared earlier in Lenstra, A.K., Wesolowski, B., "A random zoo: sloth, unicorn, and trx", 2015.

(41) It is outside the scope of this page to explain how to build a protocol using verifiable delay functions, commitment schemes, or mental card game schemes, especially because such protocols are not yet standardized for general use and few implementations of them are used in production.

(42) Implementing a cryptographic RNG involves many security considerations, including these:

  1. If an application runs code from untrusted sources in the same operating system process in which a cryptographic RNG's state is stored, it's possible for malicious code to read out that state via side-channel attacks. A cryptographic RNG should not be implemented in such a process. See (A) and see also (B).
  2. A cryptographic RNG's state could be reused due to process forking or virtual machine snapshot resets. See (C) and (D), for example.
  3. If a cryptographic RNG is not "constant-time" (the RNG is data-dependent), its timing differences could be exploited in a security attack.

(A) "Post-Spectre Threat Model Re-Think" in the Chromium source code repository (May 29, 2018).
(B) Bernstein, D.J. "Entropy Attacks!", Feb. 5, 2014.
(C) Everspaugh, A., Zhai, Y., et al. "Not-So-Random Numbers in Virtualized Linux and the Whirlwind RNG", 2014.
(D) Ristenpart, T., Yilek, S. "When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities and Hedging Deployed Cryptography", 2010.
For a detailed notion of a secure RNG, see Coretti, Dodis, et al., "Seedless Fruit is the Sweetest: Random Number Generation, Revisited", 2019.

(43) This data can come from nondeterministic sources, and also include process identifiers, time stamps, environment variables, random numbers, virtual machine guest identifiers, and/or other data specific to the session or to the instance of the RNG. See also NIST SP 800-90A and the previous note.

(44) Bernstein, D.J. "Fast-key-erasure random number generators", Jun. 23, 2017.

(45) An example is the "shrinking generator" technique to combine two RNGs; see J. D. Cook, "Using one RNG to sample another", June 4, 2019, for more.

(46) Allowing applications to do so would hamper forward compatibility — the API would then be less free to change how the RNG is implemented in the future (e.g., to use a cryptographic or otherwise "better" RNG), or to make improvements or bug fixes in methods that use that RNG (such as shuffling and Gaussian number generation). (As a notable example, the V8 JavaScript engine recently changed its Math.random() implementation to use a variant of xorshift128+, which is backward compatible because nothing in JavaScript allows Math.random() to be seeded.) Nevertheless, APIs can still allow applications to provide additional input ("entropy") to the RNG in order to increase its randomness rather than to ensure repeatability.

License

Any copyright to this page is released to the Public Domain. In case this is not possible, this page is also licensed under Creative Commons Zero.