# Random Number Generation and Sampling Methods

Begun on June 4, 2017; last updated on Oct. 12, 2017.

Discusses many ways applications can do random number generation and sampling from an underlying RNG and includes pseudocode for many of them.

## Introduction

This page discusses many ways applications can do random number generation and sampling from an underlying random number generator (RNG), often with pseudocode. Those methods include—

- ways to generate uniform random numbers from an underlying RNG (such as the core method,
`RNDINT(N)`

), - ways to generate randomized content and conditions, such as Boolean conditions, shuffling, and sampling unique items from a list, and
- generating non-uniform random numbers, including weighted choice, the normal distribution, and other probability distributions.

Sample Python code that implements many of the methods in this document is available.

All the random number methods presented on this page—

- assume the existence of an underlying RNG,
- make no assumptions on the underlying RNG's implementation (e.g., whether that RNG is a deterministic RNG or some other kind), and
- make no assumptions on the statistical quality or predictability of the underlying RNG.

In general, security, performance, quality, and other considerations will determine what underlying RNG to use in a particular application; I have written more on RNG recommendations in another document.

In general, the following are outside the scope of this document:

- Techniques that are specific to an application programming interface.
- Techniques that are specific to certain kinds of RNGs.
- Generating sequences of unique integers using specific kinds of deterministic RNGs.
- Seemingly random numbers that are specifically generated using hash functions or similar pseudorandom functions (as opposed to RNGs).

## Contents

- Introduction
- Contents
- Notation and Definitions
- Uniform Random Numbers
`RNDINT`

: Random Integers in [0, N]`RNDINTRANGE`

: Random Integers in [N, M]`RNDU01`

: Random Numbers in [0, 1]`RNDNUMRANGE`

: Random Numbers in [X, Y]`RNDINTEXC`

: Random Integers in [0, N)`RNDINTEXCRANGE`

: Random Integers in [N, M)`RNDU01OneExc`

,`RNDU01ZeroExc`

, and`RNDU01ZeroOneExc`

: Random Numbers in [0, 1), (0, 1], or (0, 1)`RNDNUMEXCRANGE`

: Random Numbers in [X, Y)`RNDBITS`

: Random N-Bit Integers- Special Programming Environments

- Randomization Techniques
- General Non-Uniform Distributions
- Specific Non-Uniform Distributions
- Dice
- Normal (Gaussian) Distribution
- Binomial Distribution
- Poisson Distribution
- Gamma Distribution
- Negative Binomial Distribution
- von Mises Distribution
- Stable Distribution
- Hypergeometric Distribution
- Multivariate Normal Distribution
- Dirichlet Distribution: Random Numbers with a Given Positive Sum
- Multinomial Distribution
- Gaussian Copula
- Other Non-Uniform Distributions

- Geometric Sampling
- Conclusion
- Notes
- License

## Notation and Definitions

In this document:

- The
**pseudocode conventions**apply to this document. **Intervals**: The following notation is used for intervals:- [
`a`

,`b`

) means "`a`

or greater, but less than`b`

". - (
`a`

,`b`

) means "greater than`a`

, but less than`b`

". - (
`a`

,`b`

] means "greater than`a`

and less than or equal to`b`

". - [
`a`

,`b`

] means "`a`

or greater and`b`

or less".

- [
- The term
*random number generator*, or*RNG*, means a number generator that seeks to generate independent numbers that seem to occur by chance and that are approximately uniformly distributed.^{(1)} - The
*norm*of one or more numbers is the square root of the sum of squares of those numbers, that is,`sqrt(num1 * num1 + num2 * num2 + ... + numN * numN)`

. - The term
*significand permutations*, with respect to a floating-point format, means the format's radix (number base) raised to the power of the format's precision (the maximum number of significant radix digits that the format can represent without loss). For example—- the 64-bit IEEE 754 binary floating-point format (e.g., Java
`double`

) has 2^{53}(9007199254740992) significand permutations, - the 64-bit IEEE 754 decimal floating-point format has 10
^{16}significand permutations, - the 32-bit IEEE 754 binary floating-point format (e.g., Java
`float`

) has 2^{24}(16777216) significand permutations, - the .NET Framework decimal format (
`System.Decimal`

) has a precision of 28 (since it can represent up to that many significant digits without loss, even though it can represent some numbers greater than 10^{28}), so it has 10^{28}significand permutations, and - arbitrary-precision floating point numbers (e.g., Java
`BigDecimal`

) can have a theoretically arbitrary number of significand permutations.

- the 64-bit IEEE 754 binary floating-point format (e.g., Java

## Uniform Random Numbers

This section describes how an underlying RNG can be used to generate uniformly-distributed random numbers. Here is an overview of the methods described in this section.

- Random Integers:
`RNDINT`

,`RNDINTEXC`

,`RNDINTRANGE`

,`RNDINTRANGEEXC`

. - Random Bits:
`RNDBITS`

. - Random Numbers in 0-1 Bounded Interval:
`RNDU01`

,`RNDU01ZeroExc`

,`RNDU01OneExc`

,`RNDU01ZeroOneExc`

. - Other Random Numbers:
`RNDNUMRANGE`

,`RNDNUMRANGEEXC`

.

One method, `RNDINT`

, described next, can serve as the basis for the remaining methods.

`RNDINT`

: Random Integers in [0, N]

In this document, ** RNDINT(maxInclusive)** is the core method for generating uniform random integers from an underlying RNG, which is called

**in this section. The random integer is**

`RNG()`

**in the interval [0,**. This section explains how

`maxInclusive`

]`RNDINT`

can be implemented for two kinds of underlying RNGs.**Method 1**: If`RNG()`

outputs**integers in the interval [0, positive**(for example, less than 1,000,000 or less than 6), then`MODULUS`

)`RNDINT(maxInclusive)`

can be implemented as in the pseudocode below.^{(2)}**Method 2**: If`RNG()`

outputs**floating-point numbers in the interval [0, 1)**, then find`s`

, where`s`

is the number of*significand permutations*for the floating-point format, and use Method 1 above, where`MODULUS`

is`s`

and`RNG()`

is`floor(RNG() * s)`

instead. (If the RNG outputs arbitrary-precision floating-point numbers,`s`

should be set to the number of different values that are possible from the underlying RNG.)**Other RNGs:**A detailed`RNDINT(maxInclusive)`

implementation for other kinds of RNGs is not given here, since they seem to be lesser seen in practice. Readers who know of such an RNG (provided it's in wide use) should send me a comment.

METHOD RndIntHelperNonPowerOfTwo(maxInclusive) cx = floor(maxInclusive / MODULUS) + 1 while true ret = cx * RNG() // NOTE: If this method is implemented using a fixed- // precision type, the addition operation below should // check for overflow and should reject the number // if overflow would result. ret = ret + RNDINT(cx - 1) if ret <= maxInclusive: return ret end END METHOD METHOD RndIntHelperPowerOfTwo(maxInclusive) // NOTE: Finds the number of bits minus 1 needed // to represent MODULUS. This will be a constant // here, though. modBits = ln(MODULUS)/ln(2) // Calculate the bit count of maxInclusive bitCount = 0 tempnumber = maxInclusive while tempnumber > 0 // NOTE: If the programming language implements // division with two integers by truncating to an // integer, the division can be used as is without // using a "floor" function. tempnumber = floor(tempnumber / 2) bitCount = bitCount + 1 end while true // Build a number with `bitCount` bits tempnumber = 0 while bitCount > 0 wordBits = modBits rngNumber = RNG() if wordBits > bitCount wordBits = bitCount // Truncate number to 'wordBits' bits // NOTE: If the programming language supports a bitwise // AND operator, the mod operation can be implemented // as "rndNumber AND ((1 << wordBits) - 1)" rngNumber = mod(rngNumber, (1 << wordBits)) end tempnumber = tempnumber << wordBits // NOTE: In programming languages that // support the OR operator between two // integers, that operator can replace the // plus operator below. tempnumber = tempnumber + rngNumber bitCount = bitCount - wordBits end // Accept the number if allowed if tempnumber <= maxInclusive: return tempnumber end END METHOD METHOD RNDINT(maxInclusive) // maxInclusive must be 0 or greater if maxInclusive < 0: return error if maxInclusive == 0: return 0 // N equals modulus if maxInclusive == MODULUS - 1: return RNG() // NOTE: Finds the number of bits minus 1 needed // to represent MODULUS (if it's a power of 2). // This will be a constant here, though. modBits=ln(MODULUS)/ln(2) // NOTE: The following condition checks if MODULUS // is a power of 2. This will be a constant here, though. isPowerOfTwo=floor(modBits) == modBits // Special cases if MODULUS is a power of 2 if isPowerOfTwo if maxInclusive == 1: return mod(RNG(), 2) if maxInclusive == 3 and modBits >= 2: return mod(RNG(), 4) if maxInclusive == 255 and modBits >= 8: return mod(RNG(), 256) if maxInclusive == 65535 and modBits >=16: return mod(RNG(), 65535) end if maxInclusive > MODULUS - 1: if isPowerOfTwo return RndIntHelperPowerOfTwo(maxInclusive) else return RndIntHelperNonPowerOfTwo(maxInclusive) end else // NOTE: If the programming language implements // division with two integers by truncating to an // integer, the division can be used as is without // using a "floor" function. nPlusOne = maxInclusive + 1 maxexc = floor((MODULUS - 1) / nPlusOne) * nPlusOne while true ret = RNG() if ret < nPlusOne: return ret if ret < maxexc: return mod(ret, nPlusOne) end end END METHOD

**Notes:**

- To generate a random number that's either -1 or 1, the following idiom can be used:
`(RNDINT(1) * 2 - 1)`

. - To generate a random integer that's divisible by a positive integer (
`DIV`

), generate the integer with any method (such as`RNDINT`

), let`X`

be that integer, then generate`X - mod(X, DIV)`

if`X >= 0`

, or`X - (DIV - mod(abs(X), DIV))`

otherwise. (Depending on the method, the resulting integer may be out of range, in which case this procedure is to be repeated.) - A random 2-dimensional point on an NxM grid can be expressed as a single integer as follows:
- To generate a random NxM point
`P`

, generate`P = RNDINT(N * M - 1)`

(`P`

is thus in the interval [0,`N * M`

)). - To convert a point
`P`

to its 2D coordinates, generate`[mod(P, N), floor(P / N)]`

. (Each coordinate starts at 0.) - To convert 2D coordinates
`coord`

to an NxM point, generate`P = coord[1] * N + coord[0]`

.

- To generate a random NxM point

`RNDINTRANGE`

: Random Integers in [N, M]

The naïve way of generating a **random integer in the interval [ minInclusive, maxInclusive]**, shown below, works well for unsigned integers and arbitrary-precision integers.

METHOD RNDINTRANGE(minInclusive, maxInclusive) // minInclusive must not be greater than maxInclusive if minInclusive > maxInclusive: return error return minInclusive + RNDINT(maxInclusive - minInclusive) END METHOD

The naïve approach won't work as well, though, for signed integer formats if the difference between `maxInclusive`

and `minInclusive`

exceeds the highest possible integer for the format. For fixed-length signed integer formats ^{[(12)]}(#Note12), such random integers can be generated using the following pseudocode. In the pseudocode below, `INT_MAX`

is the highest possible integer in the integer format.

METHOD RNDINTRANGE(minInclusive, maxInclusive) // minInclusive must not be greater than maxInclusive if minInclusive > maxInclusive: return error if minInclusive == maxInclusive: return minInclusive if minInclusive==0: return RNDINT(maxInclusive) // Difference does not exceed maxInclusive if minInclusive > 0 or minInclusive + INT_MAX >= maxInclusive return minInclusive + RNDINT(maxInclusive - minInclusive) end while true ret = RNDINT(INT_MAX) // NOTE: If the signed integer format uses two's-complement // form, use the following line: if RNDINT(1) == 0: ret = -1 - ret // NOTE: If the signed integer format uses sign-magnitude // form (such as .NET's `System.Decimal`) or one's-complement // form, use the following three lines instead of the preceding line; // here, zero will be rejected at a 50% chance because zero occurs // twice in both forms. // negative = RNDINT(1) == 0 // if negative: ret = 0 - ret // if negative and ret == 0: continue if ret >= minInclusive and ret <= maxInclusive: return ret end END METHOD

**Note:**

- To simulate rolling an N-sided die (N greater than 1), generate a random number in the interval [1, N] by
`RNDINTRANGE(1, N)`

. - Generating a random integer with one base-10 digit is equivalent to generating
`RNDINTRANGE(0, 9)`

. - Generating a random integer with N base-10 digits (where N is 2 or greater) is equivalent to generating
`RNDINTRANGE(pow(10, N-1), pow(10, N) - 1)`

.

`RNDU01`

: Random Numbers in [0, 1]

The idiom `RNDINT(X) / X`

(called ** RNDU01()** in this document), generates a

**random number in the interval [0, 1]**, where

`X`

is the number of fractional parts between 0 and 1. (For fixed-precision floating-point number formats, `X`

should equal the number of *significand permutations*for that format. See "Generating uniform doubles in the unit interval" in the

`xoroshiro+`

remarks page for further discussion.)For fixed-precision binary floating-point numbers with fixed exponent range (such as Java's `double`

and `float`

), the following pseudocode for `RNDU01()`

can be used instead. It's based on a technique devised by Allen Downey, who found that dividing a random number by a constant usually does not yield all representable binary floating-point numbers in the desired range. In the pseudocode below, `SIGBITS`

is the binary floating-point format's precision (for examples, see the note for "significand permutations").

METHOD RNDU01() e=-SIGBITS while true if RNDINT(1)==0: e = e - 1 else: break end sig = RNDINT((1 << (SIGBITS - 1)) - 1) if sig==0 and RNDINT(1)==0: e = e + 1 sig = sig + (1 << (SIGBITS - 1)) // NOTE: This multiplication should result in // a floating-point number; if `e` is sufficiently // small, the number will underflow to 0 return sig * pow(2, e) END METHOD

`RNDNUMRANGE`

: Random Numbers in [X, Y]

The naïve way of generating a **random number in the interval [ minInclusive, maxInclusive]**, is shown in the following pseudocode, which generally works well only if the number format can't be negative or that format uses arbitrary precision.

METHOD RNDNUMRANGE(minInclusive, maxInclusive) if minInclusive > maxInclusive: return error return minInclusive + (maxInclusive - minInclusive) * RNDU01() END

For fixed-point or floating-point number formats with fixed precision (such as Java's `double`

and `float`

), the pseudocode above can overflow if the difference between `maxInclusive`

and `minInclusive`

exceeds the maximum possible value for the format. For such formats, the following pseudocode for `RNDU01()`

can be used instead. In the pseudocode below, `NUM_MAX`

is the highest possible value for the number format. The pseudocode assumes that the highest possible value is positive and the lowest possible value is negative.

METHOD RNDNUMRANGE(minInclusive, maxInclusive) if minInclusive > maxInclusive: return error if minInclusive == maxInclusive: return minInclusive // Difference does not exceed maxInclusive if minInclusive >= 0 or minInclusive + NUM_MAX >= maxInclusive return minInclusive + (maxInclusive - minInclusive) * RNDU01() end while true ret = RNDU01() * NUM_MAX // NOTE: If the number format uses sign-magnitude // representation, as is the case for Java `float` and // `double` and .NET's implementation of `System.Decimal`, // for example, use the following: negative = RNDINT(1) == 0 if negative: ret = 0 - ret if negative and ret == 0: continue // NOTE: For fixed-precision fixed-point numbers implemented // using two's complement numbers (note 1), use the following line // instead of the preceding three lines, where `QUANTUM` is the // smallest representable positive number in the fixed-point format: // if RNDINT(1) == 0: ret = (0 - QUANTUM) - ret if ret >= minInclusive and ret <= maxInclusive: return ret end END

`RNDINTEXC`

: Random Integers in [0, N)

`RNDINTEXC(maxExclusive)`

, which generates a **random number in the interval [0, maxExclusive)**,
can be implemented as follows

^{(3)}:

METHOD RNDINTEXC(maxExclusive) if maxExclusive <= 0: return error return RNDINT(maxExclusive - 1) END METHOD

**Note:** The following are alternative ways of generating a random integer in the interval [0, `maxExclusive`

):

`floor(RNDNUMEXCRANGE(0, maxExclusive))`

.- Generate
`N = floor(RNDU01OneExc()*(maxExclusive))`

until`N < maxExclusive`

. (The loop is needed because otherwise, rounding error due to the nature of certain floating-point formats can result in`maxExclusive`

being returned in rare cases.^{(4)})

These approaches, though, are recommended only if the programming language—

- supports floating-point number types and no other number types (an example is JavaScript),
- is a dialect of SQL, or
- doesn't support an integer type that is big enough to fit the number
`maxExclusive - 1`

.

`RNDINTEXCRANGE`

: Random Integers in [N, M)

** RNDINTEXCRANGE** returns a **random integer in the interval [

`minInclusive`

, `maxExclusive`

)**. It can be implemented using `RNDINTRANGE`

, as the following pseudocode demonstrates.METHOD RNDINTEXCRANGE(minInclusive, maxExclusive) if minInclusive >= maxExclusive: return error // NOTE: For signed integer formats, replace the following line // with "if minInclusive >=0 or minInclusive + INT_MAX >= // maxExclusive", where `INT_MAX` has the same meaning // as the pseudocode for `RNDINTRANGE`. if minInclusive >=0 return RNDINTRANGE(minInclusive, maxExclusive - 1) end while true ret = RNDINTRANGE(minInclusive, maxExclusive) if ret < maxExclusive: return ret end END METHOD

`RNDU01OneExc`

, `RNDU01ZeroExc`

, and `RNDU01ZeroOneExc`

: Random Numbers in [0, 1), (0, 1], or (0, 1)

Three methods related to `RNDU01()`

can be implemented as follows, where
`X`

is the number of fractional parts between 0 and 1 (see `RNDU01()`

section).

, can be implemented in one of the following ways:`RNDU01OneExc()`

, interval [0, 1)- Generate
`RNDU01()`

in a loop until a number other than 1.0 is generated this way. This is preferred. `RNDINT(X - 1) / X`

.`RNDINTEXC(X) / X`

.

Note that

`RNDU01OneExc()`

corresponds to`Math.random()`

in Java and JavaScript.- Generate
, can be implemented in one of the following ways:`RNDU01ZeroExc()`

, interval (0, 1]- Generate
`RNDU01()`

in a loop until a number other than 0.0 is generated this way. This is preferred. `(RNDINT(X - 1) + 1) / X`

.`(RNDINTEXC(X) + 1) / X`

.`1.0 - RNDU01OneExc()`

(but this is recommended only if the set of numbers`RNDU01OneExc()`

could return — as opposed to their probability — is evenly distributed).

- Generate
, can be implemented in one of the following ways:`RNDU01ZeroOneExc()`

, interval (0, 1)- Generate
`RNDU01()`

in a loop until a number other than 0.0 or 1.0 is generated this way. This is preferred. `(RNDINT(X - 2) + 1) / X`

.`(RNDINTEXC(X - 1) + 1) / X`

.

- Generate

`RNDNUMEXCRANGE`

: Random Numbers in [X, Y)

** RNDNUMEXCRANGE** returns a **random number in the interval [

`minInclusive`

, `maxExclusive`

)**.
It can be implemented using `RNDNUMRANGE`

, as the following pseudocode demonstrates.METHOD RNDNUMEXCRANGE(minInclusive, maxExclusive) if minInclusive >= maxExclusive: return error while true ret = RNDNUMRANGE(minInclusive, maxExclusive) if ret < maxExclusive: return ret end END METHOD

`RNDBITS`

: Random N-Bit Integers

The idiom `RNDINT((1 << b) - 1)`

, called ** RNDBITS(b)** in this document, is a naïve way of generating a

**uniform random**(with maximum 2

`N`

-bit integer^{b - 1}).

Although this idiom works well for arbitrary-precision integers, it won't work well for the much more popular integer types called *fixed-length two's-complement signed integers* ^{(5)}. For such signed integers as well as fixed-length unsigned integers, `RNDBITS(bits)`

can be implemented using the pseudocode below. In the pseudocode below, `BITCOUNT`

is the number of bits used in the format. Note that for such signed integers, `RNDBITS(bits)`

can return a sequence of bits that resolves to a negative number.

METHOD RNDBITS(bits) if bits<0 or bits > BITCOUNT: return error if bits==0: return 0 if bits==1: return RNDINT(1) if bits==2 and BITCOUNT == 2 return (RNDINT(1) << 1) | RNDINT(1) end if bits==2: return RNDINT(3) bitsMinus2 = bits - 2 // NOTE: The "|" below is the OR operator between // two integers. The following line is implemented this // way to accommodate implementations that use // fixed-length two's-complement signed integers. ret = (RNDINT((1<<bitsMinus2) - 1) << bitsMinus2) | RNDINT(3) // NOTE: If the implementation uses fixed-length // unsigned integers, the following line can replace // the preceding line. Note that the implementation // avoids shifting an integer by BITCOUNT bits or more, // because such behavior is undefined in C and C++. // ret = RNDINT( (((1 << (bits - 1)) - 1) << 1) | 1 ) // NOTE: Alternatively, a list containing powers-of-two // minus 1 can be generated (calculating `floor(pow(2,i)) - 1` // for each relevant index `i` of the list, assuming unlimited // precision) and the following line used instead of the // preceding (assuming `list` is // the list generated this way): // ret = RNDINT( list[bits] ) return ret END METHOD

### Special Programming Environments

In certain programming environments it's often impractical to implement the uniform random number generation methods just described without recurring to other programming languages. These include the following:

- Microsoft Windows batch files (newer versions of which, at least, include a
`%RANDOM%`

variable which returns a random integer in the interval [0, 65535]). `bash`

and other shell scripts (some of which include a`$RANDOM`

variable which returns a random integer in the interval [0, 65535]).SQL dialects, such as—

- MySQL (which has a
`RAND()`

akin to`RNDU01OneExc()`

), - T-SQL (which also has a
`RAND()`

akin to`RNDU01OneExc()`

), - PL/SQL (which often has
`DBMS_RANDOM.VALUE`

akin to either`RNDU01OneExc()`

or`RNDNUMEXCRANGE`

), - PostgreSQL (which has
`RANDOM()`

), and - SQLite (which sometimes has
`RANDOM()`

which returns a random integer in the interval [-2^{63}, 2^{63})),

especially within a single query.

- MySQL (which has a

Readers aware of how these environments can support those uniform random number methods should send me a comment.

Moreover, in functional programming languages such as Haskell, it may be important to separate code that directly uses RNGs from other code, usually by rewriting certain functions to take one or more pregenerated random numbers rather than directly using `RNDINT`

, `RNDNUMRANGE`

, or another random number generation method presented earlier in this document.

## Randomization Techniques

This section describes commonly used randomization techniques, such as shuffling, selection of several unique items, and creating random strings of text.

### Boolean Conditions

To generate a condition that is true at the specified probabilities, use
the following idioms in an `if`

condition:

- True or false with equal probability:
`RNDINT(1) == 0`

. - True with X percent probability:
`RNDINTEXC(100) < X`

. - True with probability X/Y:
`RNDINTEXC(Y) < X`

. - True with odds of X to Y:
`RNDINTEXC(X + Y) < X`

. - True with probability X, where X is from 0 through 1 (a
*Bernoulli trial*):`RNDU01OneExc() < X`

. **Examples:**- True with probability 3/8:
`RNDINTEXC(8) < 3`

. - True with odds of 100 to 1:
`RNDINTEXC(101) < 1`

. - True with 20% probability:
`RNDINTEXC(100) < 20`

.

- True with probability 3/8:

### Shuffling

The Fisher–Yates shuffle method shuffles a list (puts its items in a random order) such that all permutations (arrangements) of that list are equally likely to occur, assuming the RNG it uses can choose from among all permutations of that list. However, that method is also easy to write incorrectly (see also Jeff Atwood, "The danger of naïveté"). The following pseudocode is designed to shuffle a list's contents.

METHOD Shuffle(list) // NOTE: Check size of the list early to prevent // `i` from being less than 0 if the list's size is 0 and // `i` is implemented using an unsigned type available // in certain programming languages. if size(list) >= 2 // Set i to the last item's index i = size(list) - 1 while i > 0 // Choose an item ranging from the first item // up to the item given in `i`. Note that the item // at i+1 is excluded. k = RNDINTEXC(i + 1) // The following is wrong since it introduces biases: // "k = RNDINTEXC(size(list))" // The following is wrong since the algorithm won't // choose from among all possible permutations: // "k = RNDINTEXC(i)" // Swap item at index i with item at index k; // in this case, i and k may be the same tmp = list[i] list[i] = list[k] list[k] = tmp // Move i so it points to the previous item i = i - 1 end end // NOTE: An implementation can return the // shuffled list, as is done here, but this is not required. return list END METHOD

An important consideration with respect to shuffling is the nature of the underlying RNG, as I discuss in further detail in my RNG recommendation document on shuffling.^{(6)}

**Note:** In simulation testing, shuffling is used to relabel items from a dataset at random, where each item in the dataset is assigned one of several labels. In such testing—

- one or more statistics that involve the specific labeling of the original dataset's groups is calculated (such as the difference, maximum, or minimum of means or variances between groups), then
- multiple simulated datasets are generated, where each dataset is generated by—
- merging the groups,
- shuffling the merged dataset, and
- relabeling each item in order such that the number of items in each group for the simulated dataset is the same as for the original dataset, then

- for each simulated dataset, the same statistics are calculated as for the original dataset, then
- the statistics for the simulated datasets are compared with those of the original.

### Creating a Random Character String

To generate a random string of characters (usually a random *alphanumeric string*, or string of letters and digits):

Generate a list of the letters, digits, and/or other characters the string can have. For example, those characters can be—

- the basic digits "0" to "9" (U+0030-U+0039, nos. 48-57),
- the basic upper case letters "A" to "Z" (U+0041-U+005A, nos. 65-90), and
- the basic lower case letters "a" to "z" (U+0061-U+007A, nos. 96-122),

as found in the Basic Latin block of the Unicode Standard.

Build a new string whose characters are chosen from that character list. The pseudocode below demonstrates this by creating a list, rather than a string, where the random characters will be held. It also takes the number of characters as a parameter named

`size`

. (Converting this list to a text string is programming-language-dependent, and the details of the conversion are outside the scope of this page.)METHOD RandomString(characterList, stringSize) i = 0 newString = NewList() while i < stringSize // Choose a character from the list randomChar = characterList[RNDINTEXC(size(characterList))] // Add the character to the string AddItem(newString, randomChar) i = i + 1 end return newString END METHOD

**Notes:**

- If the list of characters is fixed, the list can be statically created at runtime or compile time, or a string type as provided in the programming language can be used to store the list as a string.
- Instead of individual characters, the list can consist of strings of one or more characters each (e.g., words or syllables). In that case, storing the list of strings as a single string is usually not a clean way to store those strings.
- Often applications need to generate a string of characters that's not only random, but also unique. The best way to ensure uniqueness in this case is to store a list (such as a hash table) of strings already generated and to check newly generated strings against the list (or table). Random number generators alone should not be relied on to deliver unique results. Special considerations apply if the strings identify database records, file system paths, or other shared resources; such special considerations include the need to synchronize access, but are not discussed further in this document.
- Generating a random hexadecimal string is equivalent to generating
`RandomString(characterList, stringSize)`

, where`characterList`

is`["0", "1", ..., "9", "A", ..., "F"]`

or`["0", "1", ..., "9", "a", ..., "f"]`

(with ellipses used to save space), and`stringSize`

is the desired size. - For generating a random base-10 digit string, the list of characters passed to
`RandomString`

consists of the basic digits only. - Ways to generate "pronounceable" words or words similar to natural-language words
^{(7)}, or to generate strings that match a regular expression, are too complicated to discuss in this document.

### Sampling With Replacement: Choosing a Random Item from a List

To choose a random item from a list—

- whose size is known in advance, use the idiom
`list[RNDINTEXC(size(list))]`

; or - whose size is not known in advance, generate
`RandomKItemsFromFile(file, 1)`

, in pseudocode given in a later section (the result will be a 1-item list or be an empty list if there are no items).

Choosing an item this way is also known as *sampling with replacement*.

**Notes:**

- Generating a random number in the interval [
`mn`

,`mx`

) in increments equal to`step`

is equivalent to—- generating a list of all numbers in the interval [
`mn`

,`mx`

) of the form`mn + step * x`

, where`x >= 0`

is an integer, then - choosing a random item from the list generated this way.

- generating a list of all numbers in the interval [
*Bootstrapping*is a method of creating a simulated dataset by choosing random items with replacement from an existing dataset until both datasets have the same size. (The simulated dataset can contain duplicates this way.) Usually, multiple simulated datasets are generated this way, one or more statistics, such as the mean, are calculated for each simulated dataset as well as the original dataset, and the statistics for the simulated datasets are compared with those of the original.

### Sampling Without Replacement: Choosing Several Unique Items

There are several techniques for choosing `k`

unique items or values uniformly at random from among `n`

available items or values, depending on whether `n`

is known, how big `n`

and `k`

are, and other considerations.

**If**Use the`n`

is not known in advance:*reservoir sampling*method; see the`RandomKItemsFromFile`

method in the pseudocode below. Although the pseudocode refers to files and lines, the technique applies to any situation when items are retrieved one at a time from a dataset or list whose size is not known in advance.**If items are to be chosen in order:****If**then the`n`

is relatively small,`RandomKItemsInOrder`

method, in the pseudocode below, demonstrates a solution (based on a technique presented in Devroye 1986, p. 620).**If**see the item "If`n`

is relatively large,`n`

is relatively large", later.

**If**Do one of the following:`n`

is relatively small (for example, if there are 200 available items, or there is a range of numbers from 0 to 200 to choose from):- Store all the items in a list, shuffle that list, then choose the first
`k`

items from that list. - If the items are already stored in a list and the list's order can be changed, then shuffle that list and choose the first
`k`

items from the shuffled list. - If the items are already stored in a list and the list's order can't be changed, then store the indices to those items in another list, shuffle the latter list, then choose the first
`k`

indices (or the items corresponding to those indices) from the latter list.

- Store all the items in a list, shuffle that list, then choose the first
**If**Do a`k`

is much smaller than`n`

and the items are stored in a list whose order can be changed:*partial shuffle*of that list, then choose the*last*`k`

items from that list. A*partial shuffle*proceeds as given in the section "Shuffling", except the partial shuffle stops after`k`

swaps have been made (where swapping one item with itself counts as a swap).**If**Do one of the following:`k`

is much smaller than`n`

and`n`

is not very large (for example, less than 5000):- Store all the items in a list, do a
*partial shuffle*of that list, then choose the*last*`k`

items from that list. - If the items are already stored in a list and the list's order can't be changed, then store the indices to those items in another list, do a
*partial shuffle*of the latter list, then choose the*last*`k`

indices (or the items corresponding to those indices) from the latter list.

- Store all the items in a list, do a
**If**Create a hash table storing the indices to items already chosen. When a new index to an item is randomly chosen, check the hash table to see if it's there already. If it's not there already, add it to the hash table. Otherwise, choose a new random index. Repeat this process until`n`

is relatively large (for example, if 32-bit or larger integers will be chosen so that`n`

is 2^{32}or is a greater power of 2):`k`

indices were added to the hash table this way. If the items are to be chosen in order, then a red–black tree, rather than a hash table, can be used to store the indices this way; after`k`

indices are added to the tree, the indices (and the items corresponding to them) can be retrieved in sorted order. Performance considerations involved in storing data in hash tables or red-black trees, and in retrieving data from them, are outside the scope of this document.

Choosing several unique items as just described is also known as *sampling without replacement*.

The following pseudocode implements the `RandomKItemsFromFile`

and `RandomKItemsInOrder`

methods referred to in this section.

METHOD RandomKItemsFromFile(file, k) list = NewList() j = 0 endOfFile = false while j < k // Get the next line from the file item = GetNextLine(file) // The end of the file was reached, break if item == nothing: endOfFile = true break end AddItem(list, item) j = j + 1 end i = 1 + k while endOfFile == false // Get the next line from the file item = GetNextLine(file) // The end of the file was reached, break if item == nothing: break j = RNDINTEXC(i) if j < k: list[j] = item i = i + 1 end // We shuffle at the end in case k or fewer // lines were in the file, since in that // case the items would appear in the same // order as they appeared in the file // if the list weren't shuffled. This line // can be removed, however, if the items // in the returned list need not appear // in random order. Shuffle(list) return list end METHOD RandomKItemsInOrder(list, k) i = 0 kk = k ret = NewList() n = size(list) while i < n and size(ret) < k u = RNDINTEXC(n - i) if u <= kk AddItem(ret, list[i]) kk = kk - 1 end i = i + 1 end return ret END METHOD

**Note:** Removing `k`

random items from a list of `n`

items (`list`

) is equivalent to generating a new
list by `RandomKItemsInOrder(list, n - k)`

.

### Almost-Random Sampling

Some applications (particularly some games) may find it important to control which random numbers appear, to make the random outcomes appear fairer to users. Without this control, a user may experience long streaks of good outcomes or long streaks of bad outcomes, both of which are theoretically possible with a random number generator. To implement this kind of "almost-random" sampling, do one of the following:

- Generate a list of possible outcomes (for example, the list can contain 10 items labeled "good" and three labeled "bad") and shuffle that list. Each time an outcome must be generated, choose the next unchosen outcome from the shuffled list. Once all those outcomes are chosen, shuffle the list and continue.
- Create two lists: one list with the different possible outcomes, and another list of the same size containing an integer weight 0 or greater for each outcome (for example, one list can contain the items "good" and "bad", and the other list can contain the weights 10 and 3, respectively). Each time an outcome must be generated, choose one outcome using the weighted choice without replacement technique. Once all of the weights are 0, re-fill the list of weights with the same weights the list had at the start, and continue.

However, "almost-random" sampling techniques are not recommended—

- whenever computer or information security is involved, or
- in cases (such as in multiplayer networked games) when predicting future random numbers would give a player or user a significant and unfair advantage.

**Note:** Monte Carlo integration uses randomization to estimate a multidimensional integral. It involves evaluating a function at N random points in the domain, adding them up, then dividing the sum by N. The "Variance" MathWorld article gives methods for calculating the estimate's variance. (After calculating the error, or square root of variance, and the estimated integral, both can be multiplied by the volume of the domain.) Often *quasirandom sequences* (also known as *low-discrepancy sequences*, such as Sobol and Halton sequences), often together with an RNG, provide the "random" numbers to sample the function more efficiently. Unfortunately, the methods to produce such sequences are too complicated to show here.

### Choosing a Random Date/Time

Choosing a random date/time at or between two others is equivalent to—

- converting the two input date/times to an integer or number (here called
`date1`

and`date2`

, where`date1`

represents the earlier date/time and`date2`

the other) at the required granularity, for instance, month, day, or hour granularity (the details of such conversion depend on the date/time format and are outside the scope of this document), - generating
`newDate = RNDINTRANGE(date1, date2)`

or`newDate = RNDNUMRANGE(date1, date2)`

, respectively, and - converting
`newDate`

to a date/time.

If either input date/time was generated as the random date, but that is not desired, the process just given can be repeated until such a date/time is not generated this way.

### Generating Random Numbers in Sorted Order

The following pseudocode describes a method that generates random numbers in the interval [0, 1] in sorted order. `count`

is the number of random numbers to generate this way. The method is based on an algorithm from Bentley and Saxe 1979.

METHOD SortedRandom(count) list = NewList() k = count c = 1.0 while k > 0 c = pow(RNDU01(), 1.0 / k) * c AddItem(list, c) end return list END METHOD

Alternatively, random numbers can be generated (using any method and where the numbers have any distribution and range) and stored in a list, and the list then sorted using a sorting algorithm. Details on sorting algorithms, however, are beyond the scope of this document.

### Rejection Sampling

*Rejection sampling* is a simple and flexible technique for generating random content that
meets certain requirements. To implement rejection sampling—

- Generate the random content (such as a random number) by any method and with any distribution and range.
- If the content doesn't meet predetermined criteria, go to step 1.

Example criteria include checking—

- whether a random number is prime,
- whether a random number is divisible or not by certain numbers,
- whether a random number is not among recently chosen random numbers,
- whether a random number was not already chosen (with the aid of a hash table, red-black tree, or similar structure),
- whether a random point is sufficiently distant from previous random points (with the aid of a KD-tree or similar structure), and/or
- whether that number is not included in a "blacklist" of numbers.

(KD-trees, hash tables, red-black trees, and prime-number testing algorithms are outside the scope of this document.)

## General Non-Uniform Distributions

Some applications need to choose random items or numbers such that some of them are more likely to be chosen than others (a *non-uniform* distribution). Most of the techniques in this section show how to use the uniform random number methods to generate such random items or numbers.

### Discrete Weighted Choice

The discrete weighted choice method generates a random item from among a collection of them with separate probabilities of each item being chosen.

The following pseudocode takes a single list `weights`

, and returns the index of a weight from that list. The greater the weight, the more likely its index will be chosen. (Note that there are two possible ways to generate the random number depending on whether the weights are all integers or can be fractional numbers.) Each weight should be 0 or greater.

METHOD DiscreteWeightedChoice(weights) if size(weights) == 0: return error sum = 0 // Get the sum of all weights i = 0 while i < size(weights) sum = sum + weights[i] i = i + 1 end // Choose a random integer/number from 0 to less than // the sum of weights. value = RNDINTEXC(sum) // NOTE: If the weights can be fractional numbers, // use this instead: // value = RNDNUMEXCRANGE(0, sum) // Choose the object according to the given value i = 0 lastItem = size(weights) - 1 runningValue = 0 while i < size(weights) if weights[i] > 0 newValue = runningValue + weights[i] // NOTE: Includes start, excludes end if value < newValue: return i runningValue = newValue lastItem = i end i = i + 1 end // Last resort (might happen because rounding // error happened somehow) return lastItem END METHOD

**Example**:

Assume we have the following list: `["apples", "oranges", "bananas", "grapes"]`

, and `weights`

is the following: `[3, 15, 1, 2]`

. The weight for "apples" is 3, and the weight for "oranges" is 15. Since "oranges" has a higher weight than "apples", the index for "oranges" (1) is more likely to be chosen than the index for "apples" (0) with the `DiscreteWeightedChoice`

method. The following pseudocode implements how to get a randomly chosen item from the list with that method.

index = DiscreteWeightedChoice(weights) // Get the actual item item = list[index]

In the example above, the weights sum to 21. However, the weights do not mean that when 21 items are selected, the index for "apples" will be chosen exactly 3 times, or the index for "oranges" exactly 15 times, for example. Each number generated by `DiscreteWeightedChoice`

is independent from the others, and each weight indicates only a *likelihood* that the corresponding index will be chosen rather than the other indices. And this likelihood doesn't change no matter how many times `DiscreteWeightedChoice`

is given the same weights. This is called a weighted choice *with replacement*, which can be thought of as drawing a ball, then putting it back.

#### Weighted Choice Without Replacement

To implement weighted choice *without replacement* (which can be thought of as drawing a ball *without* putting it back), generate an index by `DiscreteWeightedChoice`

, and then decrease the weight for the chosen index by 1. In this way, when items are selected repeatedly, each weight behaves like the number of "copies" of each item. This technique, though, will only work properly if all the weights are integers 0 or greater. The pseudocode below is an example of this.

// Get the sum of weights // (NOTE: This code assumes that `weights` is // a list that can be modified. If the original weights // are needed for something else, a copy of that // list should be made first, but the copying process // is not shown here. This code also assumes that `list`, // a list of items, was already declared earlier and // has at least as many items as `weights`.) totalWeight = 0 i = 0 while i < size(weights) totalWeight = totalWeight + weights[i] i = i + 1 end // Choose as many items as the sum of weights i = 0 items = NewList() while i < totalWeight index = DiscreteWeightedChoice(weights) // Decrease weight by 1 to implement selection // without replacement. weights[index] = weights[index] - 1 AddItem(items, list[index]) i = i + 1 end

Alternatively, if all the weights are integers 0 or greater and their sum is relatively small, create a list with as many copies of each item as its weight, then shuffle that list. The resulting list will be ordered in a way that corresponds to a weighted random choice without replacement.

#### Choosing Multiple Items

The discrete weighted choice method can also be used for choosing multiple items from a list, whether or not the items have the same probability of being chosen. In this case, after choosing a random index, set the weight for that index to 0 to keep it from being chosen again. The pseudocode below is an example of this.

// (NOTE: This code assumes that `weights` is // a list that can be modified. If the original weights // are needed for something else, a copy of that // list should be made first, but the copying process // is not shown here. This code also assumes that `list`, // a list of items, was already declared earlier and // has at least as many items as `weights`.) chosenItems = NewList() i = 0 // Choose k items from the list while i < k or i < size(weights) index = DiscreteWeightedChoice(weights) // Set the weight for the chosen index to 0 // so it won't be chosen again weights[index] = 0 // Add the item at the chosen index AddItem(chosenItems, list[index]) end // `chosenItems` now contains the items chosen

Sorting a list of items such that higher-weighted items are more likely to appear first is equivalent to the technique presented in this section.

#### Piecewise Constant Distribution

The discrete weighted choice method can also be used to implement a *piecewise constant distribution*, as in the following example. Assume we have the following list: `[0, 5, 10, 11, 15]`

, and `weights`

is the following: `[3, 15, 1, 2]`

. Note that the weight list has one fewer item than the number list. The weight for "0 to 5" (0 or greater, less than 5) is 3, and the weight for "5 to 10" is 15. Since "5 to 10" has a higher weight than "0 to 5", this distribution will choose a number from 5 to 10 more often than a number from 0 to 5. The following pseudocode implements the piecewise constant distribution.

// Choose a random index index = DiscreteWeightedChoice(weights) // Choose a random number in the chosen interval on the list number = RNDNUMEXCRANGE(list[index], list[index + 1])

The code above implements the distribution *with replacement*. Implementing the distribution *without replacement* is similar to implementing discrete weighted choice without replacement; the only change is to replace `AddItem(items, list[index])`

with `AddItem(items, RNDNUMEXCRANGE(list[index], list[index + 1]))`

in the pseudocode.

### Continuous Weighted Choice

The continuous weighted choice method generates a random number that follows a continuous probability distribution (here, a *piecewise linear distribution*).

The pseudocode below takes two lists as follows:

`list`

is a list of numbers (which need not be integers). If the numbers are arranged in ascending order, which they should, the first number in this list can be returned exactly, but not the last number.`weights`

is a list of weights for the given numbers (where each number and its weight have the same index in both lists). The greater a number's weight, the more likely it is that a number close to that number will be chosen. Each weight should be 0 or greater.

METHOD ContinuousWeightedChoice(list, weights) if size(list) <= 0 or size(weights) < size(list): return error if size(list) == 1: return list[0] // Get the sum of all areas between weights sum = 0 areas = NewList() i = 0 while i < size(list) - 1 weightArea = abs((weights[i] + weights[i + 1]) * 0.5 * (list[i + 1] - list[i])) AddItem(areas, weightArea) sum = sum + weightArea i = i + 1 end // Choose a random number value = RNDNUMEXCRANGE(0, sum) // Interpolate a number according to the given value i=0 // Get the number corresponding to the random number runningValue = 0 while i < size(list) - 1 area = areas[i] if area > 0 newValue = runningValue + area // NOTE: Includes start, excludes end if value < newValue // NOTE: The following line can also read // "interp = RNDU01OneExc()", that is, a new number is generated // within the chosen area rather than using the point // already generated. interp = (value - runningValue) / (newValue - runningValue) retValue = list[i] + (list[i + 1] - list[i]) * interp return retValue end runningValue = newValue end i = i + 1 end // Last resort (might happen because rounding // error happened somehow) return list[size(list) - 1] END METHOD

**Example**: Assume `list`

is the following: `[0, 1, 2, 2.5, 3]`

, and `weights`

is the following: `[0.2, 0.8, 0.5, 0.3, 0.1]`

. The weight for 2 is 0.5, and that for 2.5 is 0.3. Since 2 has a higher weight than 2.5, numbers near 2 are more likely to be chosen than numbers near 2.5 with the `ContinuousWeightedChoice`

method.

### Random Numbers from a Distribution of Data Points

To generate a random number (or data point) based on the distribution of a list of numbers (or data points)—

- choose one of the numbers or points at random (see, for example, Choosing a Random Item from a List), and
- add a randomized "jitter" to the chosen number or point; for example—
- add
`Normal(0, sigma)`

to the chosen number, where`sigma`

is the*bandwidth*(which should be as small as allows the estimated distribution to fit the data and remain smooth), or - add a separately generated
`Normal(0, sigma)`

to each component of the chosen point, where`sigma`

is the*bandwidth*^{(8)}.

- add

A detailed discussion on how to calculate bandwidth or on other possible ways to add randomized "jitter" (whose distribution is formally called a *kernel*) is outside the scope of this document. For further information on *kernel density estimation*, which the random number generation technique here is related to, see the Wikipedia articles on single-variable and multiple-variable estimation, or a blog post by M. Kay.

### Random Numbers from an Arbitrary Distribution

Many probability distributions can be defined in terms of any of the following:

- The
*cumulative distribution function*, or*CDF*, returns, for each number, the probability for a randomly generated variable to be equal to or less than that number; the probability is in the interval [0, 1]. - The
*probability density function*, or*PDF*, is the derivative (instantaneous rate of change) of the distribution's CDF (that is, PDF(x) = CDF′(x)). The CDF is also defined as the*integral*of the PDF. Each value of the PDF must be 0 or greater.

If a probability distribution's **PDF is known**, one of the following techniques, among others, can be used to generate random numbers that follow that distribution.

- Use the PDF to calculate the weights for a number of sample points (usually regularly spaced). Create one list with the sampled points in ascending order (the
`list`

) and another list of the same size with the PDF's values at those points (the`weights`

). Finally generate`ContinuousWeightedChoice(list, weights)`

to generate a random number bounded by the lowest and highest sampled point. This technique can be used even if the area under the PDF isn't 1.**OR** - Use
*inverse transform sampling*. Generate`ICDF(RNDU01ZeroOneExc())`

, where`ICDF(X)`

is the distribution's*inverse cumulative distribution function*(*inverse CDF*, or inverse of the CDF) assuming the area under the PDF is 1.**OR** Use

*rejection sampling*. Choose the lowest and highest random number to generate (`minValue`

and`maxValue`

, respectively) and find the maximum value of the PDF at or between those points (`maxDensity`

). The rejection sampling approach is then illustrated with the following pseudocode, where`PDF(X)`

is the distribution's PDF (see also Saucier 2000, p. 39). This technique can be used even if the area under the PDF isn't 1.METHOD ArbitraryDist(minValue, maxValue, maxDensity) if minValue >= maxValue: return error while True: x=RNDNUMEXCRANGE(minValue, maxValue) y=RNDNUMEXCRANGE(0, maxDensity) if y < PDF(x): return x end END METHOD

If both **a PDF and a uniform random variable in the interval [0, 1) ( randomVariable)** are given, then one of the following techniques can be used to generate a random number that follows that distribution:

- Do the same process as method 1, given earlier, except—
- divide the weights in the
`weights`

list by the sum of all weights, and - use a modified version of
`ContinuousWeightedChoice`

that uses`randomVariable`

rather than generating a new random number.**OR**

- divide the weights in the
- Generate
`ICDF(randomVariable)`

, where`ICDF(X)`

is the distribution's inverse CDF (see method 2, given earlier).

If the distribution's **CDF is known**, generate `ICDF(RNDU01ZeroOneExc())`

, where `ICDF(X)`

is the inverse of that CDF.

**Note:** Further details on inverse transform sampling or on how to find integrals or inverses, as well as lists of PDFs and CDFs, are outside the scope of this page.

### Mixtures of Distributions

A *mixture* consists of two or more sampling distributions with separate probabilities of being sampled.
To generate random content from a mixture—

- generate
`index = DiscreteWeightedChoice(weights)`

, where`weights`

is a list of relative probabilities that each distribution in the mixture will be sampled, then - based on the value of
`index`

, generate the random content from the corresponding distribution.

**Examples:**

One mixture consists of two normal distributions with two different means: 1 and -1, but the mean 1 normal will be sampled 80% of the time. The following pseudocode shows how this mixture can be sampled:

index = DiscreteWeightedChoice([80, 20]) number = 0 // If index 0 was chosen, sample from the mean 1 normal if index==0: number = Normal(1, 1) // Else index 1 was chosen, sample from the mean -1 normal else: number = Normal(-1, 1)

Choosing a point uniformly at random from a complex shape (in any number of dimensions) is equivalent to sampling uniformly from a mixture of simpler shapes that make up the complex shape (here, the

`weights`

list holds the area of each simpler shape). For example, a simple closed 2D polygon can be*triangulated*, or decomposed into triangles, and a mixture of those triangles can be sampled.^{(9)}For generating a random integer from multiple nonoverlapping ranges of integers—

- each range has a weight of
`(mx - mn) + 1`

, where`mn`

is that range's minimum and`mx`

is its maximum, and - the chosen range is sampled by generating
`RNDINTRANGE(mn, mx)`

, where`mn`

is the that range's minimum and`mx`

is its maximum.

For generating random numbers, that may or may not be integers, from nonoverlapping number ranges, each weight is

`mx - mn`

instead and the number is sampled by`RNDNUMEXCRANGE(mn, mx)`

instead.- each range has a weight of

### Censored and Truncated Distributions

To sample from a *censored* probability distribution, generate a random number from that distribution and—

- if that number is less than a minimum threshold, use the minimum threshold instead, and/or
- if that number is greater than a maximum threshold, use the maximum threshold instead.

To sample from a *truncated* probability distribution, generate a random number from that distribution and, if that number is less than a minimum threshold and/or higher than a maximum threshold, repeat this process.

### Correlated Random Numbers

According to (Saucier 2000), sec. 3.8, to generate two correlated (dependent) random variables—

- generate two independent and identically distributed random variables
`x`

and`y`

(for example, two`Normal(0, 1)`

variables or two`RNDU01()`

variables), and - calculate
`[x, y*sqrt(1 - rho * rho) + rho * x]`

, where`rho`

is a*correlation coefficient*in the interval -1, 1.

Another way to generate correlated random numbers is explained in the section "Gaussian Copula".

## Specific Non-Uniform Distributions

This section contains information on some of the most common non-uniform sampling distributions.

### Dice

The following method generates a random result of rolling virtual dice.^{(10)} It takes three parameters: the number of dice (`dice`

), the number of sides in each die (`sides`

), and a number to add to the result (`bonus`

) (which can be negative, but the result of the subtraction is 0 if that result is greater).

METHOD DiceRoll(dice, sides, bonus) if dice < 0 or sides < 1: return error if dice == 0: return 0 if sides == 1: return dice ret = 0 if dice > 50 // If there are many dice to roll, // use a faster approach, noting that // the dice-roll distribution approaches // a "discrete" normal distribution as the // number of dice increases. mean = dice * (sides + 1) * 0.5 sigma = sqrt(dice * (sides * sides - 1) / 12) ret = -1 while ret < dice or ret > dice * sides ret = floor(Normal(mean, sigma) + 0.5) end else i = 0 while i < dice ret = ret + RNDINTRANGE(1, sides) i = i + 1 end end ret = ret + bonus if ret < 0: ret = 0 return ret END METHOD

As examples, the result of rolling—

- four six-sided virtual dice ("4d6") is
`DiceRoll(4,6,0)`

, - three ten-sided virtual dice, with 4 added ("3d10 + 4"), is
`DiceRoll(3,10,4)`

, and - two six-sided virtual dice, with 2 subtracted ("2d6 - 2"), is
`DiceRoll(2,6,-2)`

.

### Normal (Gaussian) Distribution

The normal distribution (also called the Gaussian distribution) can model many kinds of measurements or scores whose values are most likely around a given average and are less likely the farther away from that average on either side.

In the pseudocode below, which uses the polar method ^{(11)} to generate two normally-distributed random numbers:

`mu`

(μ) is the mean (average), or the peak of the distribution's "bell curve".`sigma`

(σ), the standard deviation, affects how wide the "bell curve" appears. The probability that a normally-distributed random number will be within one standard deviation from the mean is about 68.3%; within two standard deviations (2 times`sigma`

), about 95.4%; and within three standard deviations, about 99.7%.

METHOD Normal2(mu, sigma) while true a = RNDU01() b = RNDU01() if a != 0 and RNDINT(1) == 0: a = 0 - a if b != 0 and RNDINT(1) == 0: b = 0 - b c = a * a + b * b if c != 0 and c <= 1 c = sqrt(-2 * ln(c) / c) return [c * a, c * b] end end END METHOD

Since `Normal2`

returns two numbers instead of one, but many applications require only one number at a time, a problem arises on how to return one number while storing the other for later retrieval. Ways to solve this problem are outside the scope of this page, however. The name `Normal`

will be used in this document to represent a method that returns only one normally-distributed random number rather than two.

Alternatively, or in addition, the following method (implementing a ratio-of-uniforms technique) can be used to generate normally distributed random numbers.

METHOD Normal(mu, sigma) bmp = 0.8577638849607068 // sqrt(2/exp(1)) while true a=RNDU01ZeroExc() b=RNDNUMRANGE(-bmp,bmp) if b*b <= -a * a * 4 * ln(a) return (b * sigma / a) + mu end end END METHOD

**Notes:**

- In a
*standard normal distribution*,`mu`

= 0 and`sigma`

= 1. - Note that if variance is given, rather than standard deviation, the standard deviation (
`sigma`

) is the variance's square root.

### Binomial Distribution

The following method generates a random integer that follows a binomial distribution. This number—

- expresses the number of successes that have happened after a given number of independently performed trials
(expressed as
`trials`

below), where the probability of a success in each trial is`p`

(which ranges from 0, never, to 1, always, and which can be 0.5, meaning an equal chance of success or failure), and - is also known as
*Hamming distance*, if each trial is treated as a "bit" that's set to 1 for a success and 0 for a failure, and if`p`

is 0.5.

METHOD Binomial(trials, p) if trials < 0: return error if trials == 0: return 0 // Always succeeds if p >= 1.0: return trials // Always fails if p <= 0.0: return 0 i = 0 count = 0 // Suggested by Saucier, R. in "Computer // generation of probability distributions", 2000, p. 49 tp = trials * p if tp > 25 or (tp > 5 and p > 0.1 and p < 0.9) countval = -1 // "countval while countval < 0 or countval > trials countval = floor(Normal(tp, tp) + 0.5) end return countval end if p == 0.5 while i < trials if RNDINT(1) == 0 // Success count = count + 1 end i = i + 1 end else while i < trials if RNDU01OneExc() < p // Success count = count + 1 end i = i + 1 end end return count END METHOD

**Examples:**

- If
`p`

is 0.5, the binomial distribution models the task "Flip N coins, then count the number of heads." - The idiom
`Binomial(N, 0.5) >= C`

is true if at least C coins, among N coins flipped, show the successful outcome (for example, heads if heads is the successful outcome). - The idiom
`Binomial(N, 1/S)`

models the task "Roll N S-sided dice, then count the number of dice that show the number S."

### Poisson Distribution

In the following method, which generates a random integer that follows a *Poisson distribution*—

`mean`

is the average number of independent events of a certain kind per fixed unit of time or space (for example, per day, hour, or square kilometer), and can be an integer or a non-integer, and- the method's return value—
- gives a random number of such events during one such unit, and
- is such that the average of the return values approaches
`mean`

when this method is repeatedly given the same value for`mean`

.

The method given here is based on Knuth's method from 1969.

METHOD Poisson(mean) if mean < 0: return error if mean == 0: return 0 p = 1.0 // Suggested by Saucier, R. in "Computer // generation of probability distributions", 2000, p. 49 if mean > 9 p = -1.0 while p < 0: p = floor(Normal(mean, mean) + 0.5) return p end pn = exp(-mean) count = 0 while true count = count + 1 p = p * RNDU01OneExc() if p <= pn return count - 1 end end END METHOD

### Gamma Distribution

The gamma distribution models expected lifetimes. The following method, which generates a random number that follows a gamma distribution, is based on Marsaglia and Tsang's method from 2000.

METHOD GammaDist(meanLifetime) // Must be greater than 0 if meanLifetime <= 0: return error // Exponential distribution special case if // `meanLifetime` is 1 (see also // Devroye 1986, p. 405) if meanLifetime == 1: return -ln(RNDU01ZeroOneExc()) d = meanLifetime v = 0 if meanLifetime < 1: d = d + 1 d = d - (1.0 / 3) // NOTE: 1.0 / 3 must be a fractional number c = 1.0 / sqrt(9 * d) while true x = 0 while true x = Normal(0, 1) v = c * x + 1; v = v * v * v if v > 0: break end u = RNDU01ZeroExc() x2 = x * x if u < 1 - (0.0331 * x2 * x2): break if ln(u) < (0.5 * x2) + (d * (1 - v + ln(v))): break end ret = d * v if meanLifetime < 1 ret = ret * exp(ln(RNDU01ZeroExc()) / meanLifetime) end return ret end

Extended versions of the gamma distribution:

- The two-parameter gamma distribution (
`GammaDist2(a, b)`

), where`b`

is the scale, is`GammaDist(a) * b`

. Here,`a`

can be seen as the mean lifetime in unspecified units of time, and`b`

indicates the size of each unit of time. - The three-parameter gamma distribution (
`GammaDist3(a, b, c)`

), where`c`

is another shape parameter, is`pow(GammaDist(a), 1.0 / c) * b`

. - The four-parameter gamma distribution (
`GammaDist4(a, b, c, d)`

), where`d`

is the minimum value, is`pow(GammaDist(a), 1.0 / c) * b + d`

.

### Negative Binomial Distribution

A random integer that follows a *negative binomial distribution* expresses the number of failures that have happened after seeing a given number of successes (expressed as `successes`

below), where the probability of a success in each case is `p`

(where `p <= 0`

means never, `p >= 1`

means always, and `p = 0.5`

means an equal chance of success or failure).

METHOD NegativeBinomial(successes, p) // Must be 0 or greater if successes < 0: return error // No failures if no successes or if always succeeds if successes == 0 or p >= 1.0: return 0 // Always fails (NOTE: infinity can be the maximum possible // integer value if NegativeBinomial is implemented to return // an integer) if p <= 0.0: return infinity // NOTE: If 'successes' can be an integer only, // omit the following three lines: if floor(successes) != successes return Poisson(GammaDist(successes) * (1 - p) / p) end count = 0 total = 0 if successes == 1 if p == 0.5 while RNDINT(1) == 0: count = count + 1 return count end // Geometric distribution special case (see Saucier 2000) return floor(ln(RNDU01ZeroExc()) / ln(1.0 - p)) end while true if RNDU01OneExc() < p // Success total = total + 1 if total >= successes return count end else // Failure count = count + 1 end end END METHOD

**Example:** If `p`

is 0.5 and `successes`

is 1, the negative binomial distribution models the task "Flip a coin until you get tails, then count the number of heads."

### von Mises Distribution

In the following method, which generates a random number that follows a *von Mises distribution*, which describes a distribution of circular angles—

`mean`

is the mean angle,`kappa`

is a shape parameter, and- the method can return a number within π of that mean.

The algorithm below is the Best–Fisher algorithm from 1979 (as described in Devroye 1986 with errata incorporated).

METHOD VonMises(mean, kappa) if kappa < 0: return error if kappa == 0 return RNDNUMEXCRANGE(mean-pi, mean+pi) end r = 1.0 + sqrt(4 * kappa * kappa + 1) rho = (r - sqrt(2 * r)) / (kappa * 2) s = (1 + rho * rho) / (2 * rho) while true u = RNDNUMEXCRANGE(-1, 1) v = RNDU01ZeroOneExc() z = cos(pi * u) w = (1 + s*z) / (s + z) y = kappa * (s - w) if y*(2 - y) - v >=0 or ln(y / v) + 1 - y >= 0 if angle<-1: angle=-1 if angle>1: angle=1 // NOTE: Inverse cosine replaced here // with `atan2` equivalent angle = atan2(sqrt(1-w*w),w) if u < 0: angle = -angle return mean + angle end end END METHOD

### Stable Distribution

As more and more independent and identically distributed random variables are added
together, their distribution tends to a *stable distribution*,
which resembles a curve with a single peak, but with generally "fatter" tails than the normal distribution. The pseudocode below uses the Chambers–Mallows–Stuck algorithm. The `Stable`

method, implemented below, takes two parameters:

`alpha`

is a stability index in the interval (0, 2].`beta`

is a skewness in the interval [-1, 1]); if`beta`

is 0, the curve is symmetric.

METHOD Stable(alpha, beta) if alpha <=0 or alpha > 2: return error if beta < -1 or beta > 1: return error halfpi = pi * 0.5 unif=RNDNUMEXCRANGE(-halfpi, halfpi) while unif==-halfpi: unif=RNDNUMEXCRANGE(-halfpi, halfpi) // Cauchy special case if alpha == 1 and beta == 0: return tan(unif) expo=-ln(RNDU01ZeroExc()) c=cos(unif) if alpha == 1 s=sin(unif) return 2.0*((unif*beta+halfpi)*s/c - beta * ln(halfpi*expo*c/(unif*beta+halfpi)))/pi end z=-tan(alpha*halfpi)*beta ug=unif+atan2(-z, 1)/alpha cpow=pow(c, -1.0 / alpha) return pow(1.0+z*z, 1.0 / (2*alpha))* (sin(alpha*ug)*cpow)* pow(cos(unif-alpha*ug)/expo, (1.0 - alpha) / alpha) END METHOD

Extended versions of the stable distribution:

- The four-parameter stable distribution (
`Stable4(alpha, beta, mu, sigma)`

), where`mu`

is the mean and`sigma`

is the scale, is`Stable(alpha, beta) * sigma + mu`

. - The "type 0" stable distribution (
`StableType0(alpha, beta, mu, sigma)`

) is`Stable(alpha, beta) * sigma + (mu - sigma * beta * x)`

, where`x`

is`ln(sigma)*2.0/pi`

if`alpha`

is 1, and`tan(pi*0.5*alpha)`

otherwise.

### Hypergeometric Distribution

The following method generates a random integer that follows a hypergeometric distribution.
When a given number of items are drawn at random without replacement from a collection of items
each labeled either `1`

or `0`

, the random integer expresses the number of items drawn
this way that are labeled `1`

. In the method below, `trials`

is the number of items
drawn at random, `ones`

is the number of items labeled `1`

in the set, and `count`

is
the number of items labeled `1`

or `0`

in that set.

METHOD Hypergeometric(trials, ones, count) if ones < 0 or count < 0 or trials < 0 or ones > count or trials > count return error end if ones == 0: return 0 successes = 0 i = 0 currentCount = count currentOnes = ones while i < trials and currentOnes > 0 if RNDINTEXC(currentCount) < currentOnes currentOnes = currentOnes - 1 successes = successes + 1 end currentCount = currentCount - 1 i = i + 1 end return successes END METHOD

**Example:** In a 52-card deck of Anglo-American playing cards, 12 of the cards are face
cards (jacks, queens, or kings). After the deck is shuffled and seven cards are drawn, the number
of face cards drawn this way follows a hypergeometric distribution where `trials`

is 7, `ones`

is
12, and `count`

is 52.

### Multivariate Normal Distribution

The following pseudocode calculates a random point in space that follows a multivariate normal distribution. The method `MultivariateNormal`

takes the following parameters:

- A list,
`mu`

, which indicates the means to add to each component of the random point.`mu`

can be`nothing`

, in which case each component will have a mean of zero. - A list of lists
`cov`

, that specifies a*covariance matrix*(a symmetric positive definite NxN matrix with as many rows and as many columns as components of the random point.)

For conciseness, the following pseudocode uses `for`

loops, defined as follows. `for X=Y to Z; [Statements] ; end`

is shorthand for `X = Y; while X <= Z; [Statements]; X = X + 1; end`

.

METHOD Decompose(matrix) numrows = size(matrix) if matrix[0]!=numrows: return error // Does a Cholesky decomposition of a matrix // assuming it's positive definite and invertible ret=NewList() for i = 0 to numrows - 1 submat = NewList() for j = 0 to numrows - 1 AddItem(submat, 0) end AddItem(ret, submat) end s1 = sqrt(matrix[0][0]) if s1==0: return ret // For robustness for i = 1 to numrows - 1 ret[0][i]=matrix[0][i]*1.0/s1 end for i = 1 to numrows - 1 sum=0.0 for j = 0 to i - 1 sum = sum + ret[j][i]*ret[j][i] end sq=matrix[i][i]-sum if sq<0: sq=0 // For robustness ret[i][i]=math.sqrt(sq) end for j = 1 to numrows - 1 for i = j + 1 to numrows - 1 // For robustness if ret[j][j]==0: ret[j][i]=0 if ret[j][j]!=0 sum=0 for k = 0 to j - 1 sum = sumret[k][i]*ret[k][j] end ret[j][i]=(matrix[j][i]-sum)*1.0/ret[j][j] end end end return ret END METHOD METHOD MultivariateNormal(mu, cov) mulen=size(cov) if mu != nothing mulen = size(mu) if mulen!=size(cov): return error if mulen!=size(cov[0]): return error end // NOTE: If multiple random points will // be generated using the same covariance // matrix, an implementation can consider // precalculating the decomposed matrix // in advance rather than calculating it here. cho=Decompose(cov) i=0 ret=NewList() while i<mulen nv=Normal(0,1) if mulen == nothing: sum = 0 else: sum=mu[i] j=0 while j<mulen sum=sum+nv*cho[j][i] j=j+1 end AddItem(ret, sum) i=i+1 end return ret end

### Dirichlet Distribution: Random Numbers with a Given Positive Sum

The *Dirichlet distribution* models a distribution of N numbers that sum to a given positive number, `total`

. Generating N `GammaDist(total)`

numbers and dividing them by their sum will result in N random numbers that (approximately) sum to `total`

(see the Wikipedia article). For example, if `total`

is 1, the numbers will (approximately) sum to 1. Note that in the exceptional case that all numbers are 0, the process should repeat. (A more general version of the Dirichlet distribution allows the parameter in `GammaDist`

to vary for each random number.)

The following pseudocode shows how to generate random integers with a given positive sum. (The algorithm for this was presented in Smith and Tromble, "Sampling Uniformly from the Unit Simplex", 2004.) In the pseudocode below—

- the method
`NonzeroIntegersWithSum`

returns`n`

positive integers that sum to`total`

, - the method
`IntegersWithSum`

returns`n`

nonnegative integers that sum to`total`

, and `Sort(list)`

sorts the items in`list`

in ascending order (note that details on sort algorithms are outside the scope of this document).

METHOD NonzeroIntegersWithSum(n, total) if n <= 0 or total <=0: return error ls = NewList() ret = NewList() AddItem(ls, 0) while size(ls) < n c = RNDINTEXCRANGE(1, total) found = false j = 1 while j < size(ls) if ls[j] == c found = true break end j = j + 1 end if found == false: AddItem(ls, c) end Sort(ls) AddItem(ls, total) i = 1 while i < size(ls): AddItem(ret, list[i] - list[i - 1]) i = i + 1 end return ret END METHOD METHOD IntegersWithSum(n, total) if n <= 0 or total <=0: return error ret = NonzeroIntegersWithSum(n, total + n) i = 0 while i < size(ret): ret[i] = ret[i] - 1 i = i + 1 end return ret END METHOD

**Notes:**

- The problem of generating N random numbers with a given positive sum
`sum`

is equivalent to the problem of generating a uniformly distributed point inside an N-dimensional simplex (simplest convex figure) whose edges all have a length of`sum`

units. - Generating
`N`

random numbers with a given positive average`avg`

is equivalent to generating`N`

random numbers with the sum`N * avg`

. - Generating
`N`

random numbers`min`

or greater and with a given positive sum`sum`

is equivalent to generating`N`

random numbers with the sum`sum - n * min`

, then adding`min`

to each number generated this way.

### Multinomial Distribution

The *multinomial distribution* models the number of times each of several mutually exclusive events happens among a given number of trials, where each event can have a separate probability of happening. The pseudocode below is of a method that takes two parameters: `trials`

, which is the number of trials, and `weights`

, which are the relative probabilities of each event. The method tallies the events as they happen and returns a list (with the same size as `weights`

) containing the number of successes for each event.

METHOD Multinomial(trials, weights) if trials < 0: return error // create a list of successes list = NewList() i = 0 while i < size(weights) AddItem(list, 0) i = i + 1 end i = 0 while i < trials // Choose an index index = DiscreteWeightedChoice(weights) // Tally the event at the chosen index list[index] = list[index] + 1 i = i + 1 end return list END METHOD

### Gaussian Copula

Correlated random numbers can be generated by sampling from a multivariate normal distribution, then converting the resulting numbers to uniformly-distributed numbers. In the following pseudocode, which generates correlated uniformly-distributed random numbers this way:

- The parameter
`covar`

is the covariance matrix for the multivariate normal distribution. `erf(v)`

is the error function of the variable`v`

. It's provided here because some popular programming languages, such as JavaScript at the time of this writing, don't include a built-in version of`erf`

. In the method,`EPSILON`

is a very small number to end the iterative calculation.

The pseudocode below is one example of a *copula* (a distribution of groups of two or more correlated uniform random numbers), namely, a *Gaussian copula*.

METHOD erf(v) if v==0: return 0 if v<0: return erf(-v) if v==infinity: return 1 // NOTE: For Java `double`, the following // line can be added: // if v>=6: return 1 i=1 ret=0 zp=-(v*v) zval=1.0 den=1.0 while i < 100 r=v*zval/den den=den+2 ret=ret+r // NOTE: EPSILON can be 10^14, // for example. if abs(r)<EPSILON: break if i==1: zval=zp else: zval = zval*zp/i i = i + 1 end return ret*2/sqrt(pi) END METHOD METHOD GaussianCopula(covar) mvn=MultivariateNormal(nothing, covar) i = 0 sqrt2=sqrt(2) while i < size(covar) stdev=sqrt(covar[i][i]) // Apply the standard normal distribution's CDF // function to get uniform variables mvn[i] = (erf(mvn[i]/(sqrt2))+1)*0.5 i = i + 1 end return mvn END METHOD

Each of the resulting uniform variables will be in the interval [0, 1], and each one can be further transformed to any other probability distribution (which is called a *marginal distribution* here) by one of the methods given in "Random Numbers from an Arbitrary Distribution". (See also Cario and Nelson 1997.)

**Example**: To generate two correlated uniform variables by this method, generate `GaussianCopula([[1, rho], [rho, 1]])`

, where `rho`

is the Pearson correlation coefficient, in the interval [-1, 1]. (Note that *rank correlation* parameters, which can be converted to `rho`

, can better describe the correlation than `rho`

itself. For example, for a bivariate normal distribution, the Spearman coefficient `srho`

can be converted to `rho`

by `rho = sin(srho * pi / 6) * 2`

. Rank correlation parameters are not further discussed in this document.)

### Other Non-Uniform Distributions

Most commonly used:

**Beta distribution (**:`BetaDist(a, b)`

)`x / (x + GammaDist(b))`

, where`x`

is`GammaDist(a)`

and`a`

and`b`

are the two parameters of the beta distribution. The range of the beta distribution is [0, 1).**Cauchy (Lorentz) distribution**:`scale * tan(pi * (RNDU01OneExc()-0.5)) + mu`

, where`mu`

and`scale`

are the two parameters of the Cauchy distribution. This distribution is similar to the normal distribution, but with "fatter" tails.**Chi-squared distribution**:`GammaDist(df * 0.5 + Poisson(sms * 0.5)) * 2`

, where`df`

is the number of degrees of freedom and`sms`

is the sum of mean squares (where`sms`

other than 0 indicates a*noncentral*distribution).**Exponential distribution**:`-ln(RNDU01ZeroExc()) / lamda`

, where`lamda`

is the inverse scale. The`lamda`

is usually the probability that an independent event of a given kind will occur in a given span of time (such as in a given day or year). (This distribution is thus useful for modeling a*Poisson process*.)`1.0 / lamda`

is the scale (mean), which is usually the average waiting time between two independent events of the same kind.**Extreme value distribution**:`a - ln(-ln(RNDU01ZeroOneExc())) * b`

, where`b`

is the scale and`a`

is the location of the distribution's curve peak (mode). This expresses a distribution of maximum values.**Geometric distribution**:`NegativeBinomial(1, p)`

, where`p`

has the same meaning as in the negative binomial distribution. As used here, this is the number of failures that have happened before a success happens. (Saucier 2000, p. 44, also mentions an alternative definition that includes the success.)**Gumbel distribution**:`a + ln(-ln(RNDU01ZeroOneExc())) * b`

, where`b`

is the scale and`a`

is the location of the distribution's curve peak (mode). This expresses a distribution of minimum values.**Inverse gamma distribution**:`b / GammaDist(a)`

, where`a`

and`b`

have the same meaning as in the two-parameter gamma distribution.**Laplace (double exponential) distribution**:`(ln(RNDU01ZeroExc()) - ln(RNDU01ZeroExc())) * beta + mu`

, where`beta`

is the scale and`mu`

is the mean.**Logarithmic distribution**:`min + (max - min) * RNDU01OneExc() * RNDU01OneExc()`

, where`min`

is the minimum value and`max`

is the maximum value (Saucier 2000, p. 26). In this distribution, numbers closer to`min`

are exponentially more likely than numbers closer to`max`

.**Logarithmic normal distribution**:`exp(Normal(mu, sigma))`

, where`mu`

and`sigma`

have the same meaning as in the normal distribution.**Pareto distribution**:`pow(RNDU01ZeroOneExc(), -1.0 / alpha) * minimum`

, where`alpha`

is the shape and`minimum`

is the minimum.**Rayleigh distribution**:`a * sqrt(-2 * ln(RNDU01ZeroExc()))`

, where`a`

is the scale and is greater than 0.**Student's**:*t*-distribution`Normal(cent, 1) / sqrt(GammaDist(df * 0.5) * 2 / df)`

, where`df`

is the number of degrees of freedom, and*cent*is the mean of the normally-distributed random number. A`cent`

other than 0 indicates a*noncentral*distribution.**Triangular distribution**:`ContinuousWeightedChoice([startpt, midpt, endpt], [0, 1, 0])`

. The distribution starts at`startpt`

, peaks at`midpt`

, and ends at`endpt`

.**Weibull distribution**:`b * pow(-ln(RNDU01ZeroExc()),1.0 / a)`

, where`a`

is the shape,`b`

is the scale, and`a`

and`b`

are greater than 0.

Miscellaneous:

**Arcsine distribution**:`min + (max - min) * BetaDist(0.5, 0.5)`

, where`min`

is the minimum value and`max`

is the maximum value (Saucier 2000, p. 14).**Beta binomial distribution**:`Binomial(trials, BetaDist(a, b))`

, where`a`

and`b`

are the two parameters of the beta distribution, and`trials`

is a parameter of the binomial distribution.**Beta-PERT distribution**:`startpt + size * BetaDist(1.0 + (midpt - startpt) * shape / size, 1.0 + (endpt - midpt) * shape / size)`

. The distribution starts at`startpt`

, peaks at`midpt`

, and ends at`endpt`

,`size`

is`endpt - startpt`

, and`shape`

is a shape parameter that's 0 or greater, but usually 4. If the mean (`mean`

) is known rather than the peak,`midpt = 3 * mean / 2 - (startpt + endpt) / 4`

.**Beta prime distribution**:`x / (1 - x)`

, where`x`

is`BetaDist(a, b)`

and`a`

and`b`

are the two parameters of the beta distribution.**Beta negative binomial distribution**:`NegativeBinomial(successes, BetaDist(a, b))`

, where`a`

and`b`

are the two parameters of the beta distribution, and`successes`

is a parameter of the negative binomial distribution. If*successes*is 1, the result is a*Waring–Yule distribution*.**Binormal distribution**:`MultivariateNormal([mu1, mu2], [[s1*s1, s1*s2*rho], [rho*s1*s2, s2*s2]])`

, where`mu1`

and`mu2`

are the means of the two random variables,`s1`

and`s2`

are their standard deviations, and`rho`

is a*correlation coefficient*greater than -1 and less than 1.**Chi distribution**:`sqrt(GammaDist(df * 0.5) * 2)`

, where`df`

is the number of degrees of freedom.**Cosine distribution**:`min + (max - min) * atan2(x, sqrt(1 - x * x)) / pi`

, where`x = RNDNUMRANGE(-1, 1)`

and`min`

is the minimum value and`max`

is the maximum value (Saucier 2000, p. 17; inverse sine replaced with`atan2`

equivalent).**Double logarithmic distribution**:`min + (max - min) * (0.5 + (RNDINT(1) * 2 - 1) * 0.5 * RNDU01OneExc() * RNDU01OneExc())`

, where`min`

is the minimum value and`max`

is the maximum value (see also Saucier 2000, p. 15, which shows the wrong X axes).**Erlang distribution**:`GammaDist(shape) / rate`

, where`shape`

and`rate`

are the two parameters of the Erlang distribution.**Generalized extreme value (Fisher–Tippett) distribution**:`a - (pow(-ln(RNDU01ZeroExc()), -c) - 1) * b / c`

if`c != 0`

, or`a - ln(-ln(RNDU01ZeroOneExc())) * b`

otherwise, where`b`

is the scale,`a`

is the location of the distribution's curve peak (mode), and`c`

is a shape parameter. This expresses a distribution of maximum values.**Half-normal distribution**:`abs(Normal(0, sqrt(pi * 0.5) / invscale)))`

, where`invscale`

is a parameter of the half-normal distribution.**Inverse chi-squared distribution**:`df * scale / (GammaDist(df * 0.5) * 2)`

, where`df`

is the number of degrees of freedom and`scale`

is the scale, usually`1.0 / df`

.**Inverse Gaussian distribution (Wald distribution)**: Generate`n = mu + (mu*mu*y/(2*lamda)) - mu * sqrt(4 * mu * lamda * y + mu * mu * y * y) / (2 * lamda)`

, where`y = pow(Normal(0, 1), 2)`

, then return`n`

if`RNDU01OneExc() <= mu / (mu + n)`

, or`mu * mu / n`

otherwise.`mu`

is the mean and`lamda`

is the scale; both parameters are greater than 0. Based on method published in Devroye 1986.**Kumaraswamy distribution**:`min + (max - min) * pow(1-pow(RNDU01ZeroExc(),1.0/b),1.0/a)`

, where`a`

and`b`

are shape parameters,`min`

is the minimum value, and`max`

is the maximum value.**Lévy distribution**:`sigma * 0.5 / GammaDist(0.5) + mu`

, where`mu`

is the location and`sigma`

is the dispersion.**Logarithmic series distribution**:`floor(1.0 + ln(RNDU01ZeroExc()) / ln(1.0 - pow(1.0 - param, RNDU01ZeroOneExc())))`

, where`param`

is a number greater than 0 and less than 1. Based on method described in Devroye 1986.**Logistic distribution**:`(ln(x/(1.0 - x)) * scale + mean`

, where`x`

is`RNDU01ZeroOneExc()`

and`mean`

and`scale`

are the two parameters of the logistic distribution.**Maxwell distribution**:`scale * sqrt(GammaDist(1.5) * 2)`

, where`scale`

is the scale.**Parabolic distribution**:`min + (max - min) * BetaDist(2, 2)`

, where`min`

is the minimum value and`max`

is the maximum value (Saucier 2000, p. 30).**Pascal distribution**:`NegativeBinomial(successes, p) + successes`

, where`successes`

and`p`

have the same meaning as in the negative binomial distribution, except`successes`

must be an integer.**Pearson VI distribution**:`GammaDist(v) / (GammaDist(w))`

, where`v`

and`w`

are shape parameters greater than 0 (Saucier 2000, p. 33; there, an additional`b`

parameter is defined, but that parameter is canceled out in the source code).**Power distribution**:`pow(RNDU01ZeroOneExc(), 1.0 / alpha)`

, where`alpha`

is the shape. Nominally in the interval (0, 1).**Power law distribution**:`pow(pow(mn,n+1) + (pow(mx,n+1) - pow(mn,n+1)) * RNDU01(), 1.0 / (n+1))`

, where`n`

is the exponent,`mn`

is the minimum, and`mx`

is the maximum. Reference.**Skellam distribution**:`Poisson(mean1) - Poisson(mean2)`

, where`mean1`

and`mean2`

are the means of the two Poisson variables.**Skewed normal distribution**:`Normal(0, x) + mu + alpha * abs(Normal(0, x))`

, where`x`

is`sigma / sqrt(alpha * alpha + 1.0)`

,`mu`

and`sigma`

have the same meaning as in the normal distribution, and`alpha`

is a shape parameter.**Snedecor's (Fisher's)**:*F*-distribution`GammaDist(m * 0.5) * n / (GammaDist(n * 0.5 + Poisson(sms * 0.5)) * m)`

, where`m`

and`n`

are the numbers of degrees of freedom of two random numbers with a chi-squared distribution, and if`sms`

is other than 0, one of those distributions is*noncentral*with sum of mean squares equal to`sms`

.**Zeta distribution**: Generate`n = floor(pow(RNDU01ZeroOneExc(), -1.0 / r))`

, and if`d / pow(2, r) < (d - 1) * RNDU01OneExc() * n / (pow(2, r) - 1.0)`

, where`d = pow((1.0 / n) + 1, r)`

, repeat this process. The parameter`r`

is greater than 0. Based on method described in Devroye 1986. A zeta distribution truncated by rejecting random values greater than some positive integer is called a*Zipf distribution*or*Estoup distribution*. (Note that Devroye uses "Zipf distribution" to refer to the untruncated zeta distribution.)

## Geometric Sampling

This section contains various geometric sampling techniques.

### Random Point Inside a Triangle

The following pseudocode, which
generates a uniformly random point inside a 2-dimensional triangle,
takes three parameters, `p0`

, `p1`

, and `p2`

, each of which is a 2-item list containing the X and Y
coordinates, respectively, of one vertex of the triangle.

METHOD RandomPointInTriangle( x1=p1[0]-p0[0] y1=p1[1]-p0[1] x2=p2[0]-p0[0] y2=p2[1]-p0[1] den=(x1*y2-x2*y1) // Avoid division by zero if den==0: den=0.0000001 r=RNDU01() s=RNDU01() xv=r*x1 + s*x2 yv=r*y1 + s*y2 a=(xv*y2 - x2*yv)/den b=(x1*yv - xv*y1)/den if a<=0 or b<=0 or a+b>=1 return [x1+x2+p0[0]-xv,y1+y2+p0[1]-yv] else return [p0[0]+xv, p0[1]+yv] end end

### Random Points on the Surface of a Hypersphere

To generate an N-dimensional point on the surface of an N-dimensional hypersphere of radius R, generate N `Normal(0, 1)`

random numbers, then divide them by `R / X`

, where `X`

is those numbers' *norm* (if `X`

is 0, the process should repeat). A hypersphere's surface is formed by all points lying 1 unit away from a common point in N-dimensional space. Based on a technique described in MathWorld.

This problem is equivalent to generating a random unit vector (vector with length 1) in N-dimensional space.

### Random Points Inside a Ball

To generate an N-dimensional point inside an N-dimensional ball of radius R, either—

- generate N
`Normal(0, 1)`

random numbers, generate`X = sqrt( S - ln(RNDU01ZeroExc()))`

, where`S`

is the sum of squares of the random numbers, and multiply each random number by`R / X`

(if`X`

is 0, the process should repeat), or - generate N
`RNDNUMRANGE(-R, R)`

random numbers^{(12)}until their*norm*is R or less,

although the former method "may ... be slower" "in practice", according to a MathWorld article, which was the inspiration for the two methods given here.

If the ball is hollow, that is, only points within a range of distances from the center of the ball are allowed, then use either method given earlier to generate a random point for a ball of radius equal to the maximum allowed distance, until the *norm* of the numbers making up that point is within the desired range of distances.

### Random Latitude and Longitude

To generate a random latitude and longitude on a sphere (in radians with west and south coordinates negative) such that the resulting point is (practically) uniformly distributed on the surface of a sphere—

- generate the longitude
`RNDNUMEXCRANGE(-pi, pi)`

, where the longitude ranges from -π to π, and - generate the latitude
`atan2(sqrt(1 - x * x), x) - pi / 2`

, where—`x = RNDNUMRANGE(-1, 1)`

and the latitude ranges from -π/2 to π/2 (the range includes the poles, which have many equivalent forms), or`x = 2 * RNDU01ZeroOneExc() - 1`

and the latitude ranges from -π/2 to π/2 (the range excludes the poles).

Reference: "Sphere Point Picking" in MathWorld (replacing inverse cosine with `atan2`

equivalent).

## Conclusion

This page discussed many ways applications can extract random numbers from random number generators.

Feel free to send comments. They may help improve this page. In particular, corrections to any method given on this page are welcome.

I acknowledge the commenters to the CodeProject version of this page, including George Swan, who referred me to the reservoir sampling method.

Currently, the following are not covered in this document, but may be added based on reader interest:

- Techniques to generate random mazes, graphs, matrices, or paths.
- Brownian motion and other random movement and stochastic processes.

## Notes

^{(1)} This definition includes RNGs that—

- seek to generate random numbers that are at least cost-prohibitive (but not necessarily
*impossible*) to predict, - merely seek to generate number sequences likely to pass statistical tests of randomness,
- are initialized automatically before use,
- are initialized with an application-specified "seed",
- use a deterministic algorithm for random number generation,
- rely, at least primarily, on one or more nondeterministic sources for random number generation (including by extracting uniformly distributed bits from two or more such sources), or
- have a combination of the foregoing properties.

If a number generator uses a nonuniform distribution, but otherwise meets this definition, then it can be converted to one with a uniform distribution, at least in theory, by applying the nonuniform distribution's *cumulative distribution function* (CDF) to each generated number (see also "Random Numbers from an Arbitrary Distribution"). Further details on this kind of conversion, as well a list of CDFs, are outside the scope of this document.

^{(2)} For an exercise solved by this method, see A. Koenig and B. E. Moo, *Accelerated C++*, 2000; see also a blog post by Johnny Chan.

Note that if `MODULUS`

is a power of 2 (for example, 256 or 2^{32}), the `RNDINT`

implementation given may leave unused bits (for example, when truncating a random number to `wordBits`

bits or in the special cases at the start of the method). How a more sophisticated implementation may save those bits for later reuse is beyond this page's scope.

^{(3)} `RNDINTEXC`

is not given as the core random generation method because it's harder to fill integers in popular integer formats with random bits with this method.

^{(4)} In situations where loops are not possible, such as within an SQL query, the idiom `min(floor(RNDU01OneExc() * maxExclusive, maxExclusive - 1))`

, where `min(a,b)`

is the smaller of `a`

and `b`

, returns an integer in the interval [0, `maxExclusive`

); however, such an idiom can have a slight, but for most purposes negligible, bias toward `maxExclusive - 1`

.

^{(5)} This number format describes B-bit signed integers with minimum value -2^{B-1} and maximum value 2^{B-1} - 1, where B is a positive even number of bits; examples include Java's `short`

, `int`

, and `long`

, with 16, 32, and 64 bits, respectively. A *signed integer* is an integer that can be positive, zero, or negative. In *two's-complement form*, nonnegative numbers have the highest (most significant) bit set to zero, and negative numbers have that bit (and all bits beyond) set to one, and a negative number is stored in such form by decreasing its absolute value by 1 and swapping the bits of the resulting number.

^{(6)} It suffices to say here that in general, whenever a deterministic RNG is otherwise called for, such an RNG is good enough for shuffling a 52-item list if its period is 2^{226} or greater. (The *period* is the maximum number of values in a generated sequence for a deterministic RNG before that sequence repeats.)

^{(7)} Such techniques usually involve *Markov chains*, which are outside this page's scope.

^{(8)} A third kind of randomized "jitter" (for multi-component data points) consists of a point generated from a multivariate normal distribution with all the means equal to 0 and a *covariance matrix* that, in this context, serves as a *bandwidth matrix*. The second kind of "jitter" given here is an easy special case of the multivariate normal distribution, where the *bandwidth* corresponds to a bandwidth matrix with diagonal elements equal to *bandwidth*-squared and with zeros everywhere else.

^{(9)} A convex polygon can be trivially decomposed into triangles that have one vertex in common and each have two other adjacent vertices of the original polygon. Triangulation of other polygons is nontrivial and outside the scope of this document.

^{(10)} The "Dice" section used the following sources:

- Red Blob Games, "Probability and Games: Dice Rolls" was the main source for the dice-roll distribution. The method
`random(N)`

in that document corresponds to`RNDINTEXC(N)`

in this document. - The MathWorld article "Dice" provided the mean of the dice roll distribution.
- S. Eger, "Stirling's approximation for central extended binomial coefficients", 2014, helped suggest the variance of the dice roll distribution.

^{(11)} The method that formerly appeared here is the *Box-Muller transformation*: `mu + radius * cos(angle)`

and `mu + radius * sin(angle)`

, where `angle = 2 * pi * RNDU01OneExc()`

and `radius = sqrt(-2 * ln(RNDU01ZeroExc())) * sigma`

, are two independent normally-distributed random numbers. A method of generating approximate standard normal random numbers, which consists of summing twelve `RNDU01OneExc()`

numbers and subtracting by 6 (see also "Irwin–Hall distribution" on Wikipedia), results in values not less than -6 or greater than 6; on the other hand, in a standard normal distribution, results less than -6 or greater than 6 will occur only with a generally negligible probability.

^{(12)} The N numbers generated this way will form a point inside an N-dimensional *hypercube* with length `2 * R`

in each dimension and centered at the origin of space.

## License

This page is licensed under Creative Commons Zero.