Menu - Top - Home - Donate to Me

Random Number Generation and Sampling Methods

Peter Occil

Begun on June 4, 2017; last updated on Sep. 13, 2017.

Discusses many ways in which applications can do random number generation and sampling from an underlying RNG and includes pseudocode for many of them.

Introduction

This page discusses many ways applications can do random number generation and sampling from an underlying random number generator (RNG) and includes pseudocode for many of them.

This methods described in this document can be categorized as follows:

Sample Python code that implements many of the methods in this document is available.

All the random number methods presented on this page—

In general, security, performance, quality, and other considerations will determine what underlying RNG to use in a particular application; I have written more on RNG recommendations in another document.

In general, the following are outside the scope of this document:

Contents

Notation and Definitions

In this document:

Uniform Random Numbers

This section describes how an underlying RNG can be used to generate uniformly-distributed random numbers. Here is an overview of the methods described in this section.

One method, RNDINT, described next, can serve as the basis for the remaining methods.

RNDINT: Random Integers in [0, N]

In this document, RNDINT(maxInclusive) is the core method for generating uniform random integers from an underlying RNG, which is called RNG() in this section. The random integer is in the interval [0, maxInclusive]. This section explains how RNDINT(maxInclusive) can be implemented for two kinds of underlying RNGs.

Method 1: If RNG() outputs integers in the interval [0, positive MODULUS) (for example, less than 1,000,000 or less than 6), then RNDINT(maxInclusive) can be implemented as in the pseudocode below.(7) Note that:

Method 2: If RNG() outputs floating-point numbers in the interval [0, 1), then find s, where s is the number of significand permutations for the floating-point format, and use Method 1 above, where MODULUS is s and RNG() is floor(RNG() * s) instead. (If the RNG outputs arbitrary-precision floating-point numbers, s should be set to the number of different values that are possible from the underlying RNG.)

Other RNGs: A detailed RNDINT(maxInclusive) implementation for other kinds of RNGs is not given here, since they seem to be lesser seen in practice. Readers who know of an RNG that is in wide use and outputs numbers of a kind other than already described in this section should send me a comment.

METHOD RndIntHelperNonPowerOfTwo(maxInclusive)
    cx = floor(maxInclusive / MODULUS) + 1
    while true
       ret = cx * RNG()
       // NOTE: If this method is implemented using a fixed-
       // precision type, the addition operation below should
       // check for overflow and should reject the number
       // if overflow would result.
       ret = ret + RNDINT(cx - 1)
       if ret <= maxInclusive: return ret
    end
END METHOD

METHOD RndIntHelperPowerOfTwo(maxInclusive)
        // NOTE: Finds the number of bits minus 1 needed
        // to represent MODULUS. This will be a constant
        // here, though.
        modBits = ln(MODULUS)/ln(2)
        // Calculate the bit count of maxInclusive
        bitCount = 0
        tempnumber = maxInclusive
        while tempnumber > 0
               // NOTE: If the programming language implements
               // division with two integers by truncating to an
               // integer, the division can be used as is without
               // using a "floor" function.
               tempnumber = floor(tempnumber / 2)
               bitCount = bitCount + 1
        end
        while true
               // Build a number with `bitCount` bits
                tempnumber = 0
                while bitCount > 0
                     wordBits = modBits
                     rngNumber = RNG()
                     if wordBits > bitCount
                        wordBits = bitCount
                        // Truncate number to 'wordBits' bits
                        // NOTE: If the programming language supports a bitwise
                        // AND operator, the mod operation can be implemented
                        // as "rndNumber AND ((1 << wordBits) - 1)"
                        rngNumber = mod(rngNumber, (1 << wordBits))
                     end
                     tempnumber = tempnumber << wordBits
                     // NOTE: In programming languages that
                     // support the OR operator between two
                     // integers, that operator can replace the
                     // plus operator below.
                     tempnumber = tempnumber + rngNumber
                     bitCount = bitCount - wordBits
                end
                // Accept the number if allowed
                if tempnumber <= maxInclusive: return tempnumber
         end
END METHOD

METHOD RNDINT(maxInclusive)
  // maxInclusive must be 0 or greater
  if maxInclusive < 0: return error
  if maxInclusive == 0: return 0
  // N equals modulus
  if maxInclusive == MODULUS - 1: return RNG()
  // NOTE: Finds the number of bits minus 1 needed
  // to represent MODULUS (if it's a power of 2).
  // This will be a constant here, though.
  modBits=ln(MODULUS)/ln(2)
  // NOTE: The following condition checks if MODULUS
  // is a power of 2.  This will be a constant here, though.
  isPowerOfTwo=floor(modBits) == modBits
  // Special cases if MODULUS is a power of 2
  if isPowerOfTwo
       if maxInclusive == 1: return mod(RNG(), 2)
       if maxInclusive == 3 and modBits >= 2: return mod(RNG(), 4)
       if maxInclusive == 255 and modBits >= 8: return mod(RNG(), 256)
       if maxInclusive == 65535 and modBits >=16: return mod(RNG(), 65535)
   end
  if maxInclusive > MODULUS - 1:
     if isPowerOfTwo
       return RndIntHelperPowerOfTwo(maxInclusive)
     else
       return RndIntHelperNonPowerOfTwo(maxInclusive)
     end
  else
    // NOTE: If the programming language implements
    // division with two integers by truncating to an
    // integer, the division can be used as is without
    // using a "floor" function.
          nPlusOne = maxInclusive + 1
          maxexc = floor((MODULUS - 1) / nPlusOne) * nPlusOne
          while true
                    ret = RNG()
                    if ret < nPlusOne: return ret
                    if ret < maxexc: return mod(ret, nPlusOne)
          end
  end
END METHOD

Notes:

RNDINTRANGE: Random Integers in [N, M]

The naïve way of generating a random integer in the interval [minInclusive, maxInclusive], shown below, works well for unsigned integers and arbitrary-precision integers.

 METHOD RNDINTRANGE(minInclusive, maxInclusive)
   // minInclusive must not be greater than maxInclusive
   if minInclusive > maxInclusive: return error
   return minInclusive + RNDINT(maxInclusive - minInclusive)
 END METHOD

The naïve approach won't work as well, though, for signed integer formats if the difference between maxInclusive and minInclusive exceeds the highest possible integer for the format. For fixed-length signed integer formats (1), such random integers can be generated using the following pseudocode. In the pseudocode below, INT_MAX is the highest possible integer in the integer format.

METHOD RNDINTRANGE(minInclusive, maxInclusive)
   // minInclusive must not be greater than maxInclusive
   if minInclusive > maxInclusive: return error
   if minInclusive == maxInclusive: return minInclusive
   if minInclusive==0: return RNDINT(maxInclusive)
   // Difference does not exceed maxInclusive
   if minInclusive > 0 or minInclusive + INT_MAX >= maxInclusive
       return minInclusive + RNDINT(maxInclusive - minInclusive)
   end
   while true
     ret = RNDINT(INT_MAX)
     // NOTE: If the signed integer format uses two's-complement
     // form, use the following line:
     if RNDINT(1) == 0: ret = -1 - ret
     // NOTE: If the signed integer format uses sign-magnitude
     // form (such as .NET's `System.Decimal`) or one's-complement
     // form,  use the following three lines instead of the preceding line;
     // here, zero will be rejected at a 50% chance because zero occurs
     // twice in both forms.
     // negative = RNDINT(1) == 0
     // if negative: ret = 0 - ret
     // if negative and ret == 0: continue
     if ret >= minInclusive and ret <= maxInclusive: return ret
   end
END METHOD

Note:

RNDU01: Random Numbers in [0, 1]

The idiom RNDINT(X) / X (called RNDU01() in this document), generates a random number in the interval [0, 1], where X is the number of fractional parts between 0 and 1. (For fixed-precision floating-point number formats, X should equal the number of significand permutations for that format. See "Generating uniform doubles in the unit interval" in the xoroshiro+ remarks page for further discussion.)

For fixed-precision binary floating-point numbers with fixed exponent range (such as Java's double and float), the following pseudocode for RNDU01() can be used instead. It's based on a technique devised by Allen Downey, who found that dividing a random number by a constant usually does not yield all representable binary floating-point numbers in the desired range. In the pseudocode below, SIGBITS is the binary floating-point format's precision (for examples, see the note for "significand permutations").

METHOD RNDU01()
    e=-SIGBITS
    while true
        if RNDINT(1)==0: e = e - 1
      else: break
    end
    sig = RNDINT((1 << (SIGBITS - 1)) - 1)
    if sig==0 and RNDINT(1)==0: e = e + 1
    sig = sig + (1 << (SIGBITS - 1))
    // NOTE: This multiplication should result in
    // a floating-point number; if `e` is sufficiently
    // small, the number will underflow to 0
    return sig * pow(2, e)
END METHOD

RNDNUMRANGE: Random Numbers in [X, Y]

The naïve way of generating a random number in the interval [minInclusive, maxInclusive], is shown in the following pseudocode, which generally works well only if the number format can't be negative or that format uses arbitrary precision.

METHOD RNDNUMRANGE(minInclusive, maxInclusive)
    if minInclusive > maxInclusive: return error
    return minInclusive + (maxInclusive - minInclusive) * RNDU01()
END

For fixed-point or floating-point number formats with fixed precision (such as Java's double and float), the pseudocode above can overflow if the difference between maxInclusive and minInclusive exceeds the maximum possible value for the format. For such formats, the following pseudocode for RNDU01() can be used instead. In the pseudocode below, NUM_MAX is the highest possible value for the number format. The pseudocode assumes that the highest possible value is positive and the lowest possible value is negative.

METHOD RNDNUMRANGE(minInclusive, maxInclusive)
   if minInclusive > maxInclusive: return error
   if minInclusive == maxInclusive: return minInclusive
   // Difference does not exceed maxInclusive
   if minInclusive >= 0 or minInclusive + NUM_MAX >= maxInclusive
       return minInclusive + (maxInclusive - minInclusive) * RNDU01()
   end
   while true
     ret = RNDU01() * NUM_MAX
     // NOTE: If the number format uses sign-magnitude
     // representation, as is the case for Java `float` and
     // `double` and .NET's implementation of `System.Decimal`,
     // for example, use the following:
     negative = RNDINT(1) == 0
     if negative: ret = 0 - ret
     if negative and ret == 0: continue
     // NOTE: For fixed-precision fixed-point numbers implemented
     // using two's complement numbers (note 1), use the following line
     // instead of the preceding three lines, where `QUANTUM` is the
     // smallest representable positive number in the fixed-point format:
     // if RNDINT(1) == 0: ret = (0 - QUANTUM) - ret
     if ret >= minInclusive and ret <= maxInclusive: return ret
   end
END

RNDINTEXC: Random Integers in [0, N)

RNDINTEXC(maxExclusive), which generates a random number in the interval [0, maxExclusive), can be implemented as follows(8):

 METHOD RNDINTEXC(maxExclusive)
    if maxExclusive <= 0: return error
    return RNDINT(maxExclusive - 1)
 END METHOD

Note: The following are alternative ways of generating a random integer in the interval [0, maxExclusive):

These approaches, though, are recommended only if the programming language—

RNDINTEXCRANGE: Random Integers in [N, M)

RNDINTEXCRANGE returns a **random integer in the interval [minInclusive, maxExclusive)**. It can be implemented using RNDINTRANGE, as the following pseudocode demonstrates.

METHOD RNDINTEXCRANGE(minInclusive, maxExclusive)
   if minInclusive >= maxExclusive: return error
   // NOTE: For signed integer formats, replace the following line
   // with "if minInclusive >=0 or minInclusive + INT_MAX >=
   // maxExclusive", where `INT_MAX` has the same meaning
   // as the pseudocode for `RNDINTRANGE`.
   if minInclusive >=0
     return RNDINTRANGE(minInclusive, maxExclusive - 1)
   end
   while true
     ret = RNDINTRANGE(minInclusive, maxExclusive)
     if ret < maxExclusive: return ret
   end
END METHOD

RNDU01OneExc, RNDU01ZeroExc, and RNDU01ZeroOneExc: Random Numbers in [0, 1), (0, 1], or (0, 1)

Three methods related to RNDU01() can be implemented as follows, where X is the number of fractional parts between 0 and 1 (see RNDU01() section).

RNDNUMEXCRANGE: Random Numbers in [X, Y)

RNDNUMEXCRANGE returns a **random number in the interval [minInclusive, maxExclusive)**. It can be implemented using RNDNUMRANGE, as the following pseudocode demonstrates.

METHOD RNDNUMEXCRANGE(minInclusive, maxExclusive)
   if minInclusive >= maxExclusive: return error
   while true
     ret = RNDNUMRANGE(minInclusive, maxExclusive)
     if ret < maxExclusive: return ret
   end
END METHOD

RNDBITS: Random N-Bit Integers

The idiom RNDINT((1 << b) - 1), called RNDBITS(b) in this document, is a naïve way of generating a uniformly distributed random N-bit integer (with maximum 2b - 1).

Although this idiom works well for arbitrary-precision integers, it won't work well for the much more popular integer types called fixed-length two's-complement signed integers (1). For such signed integers as well as fixed-length unsigned integers, RNDBITS(bits) can be implemented using the pseudocode below. In the pseudocode below, BITCOUNT is the number of bits used in the format. Note that for such signed integers, RNDBITS(bits) can return a sequence of bits that resolves to a negative number.

METHOD RNDBITS(bits)
     if bits<0 or bits > BITCOUNT: return error
     if bits==0: return 0
     if bits==1: return RNDINT(1)
     if bits==2 and BITCOUNT == 2
         return (RNDINT(1) << 1) | RNDINT(1)
     end
     if bits==2: return RNDINT(3)
     bitsMinus2 = bits - 2
     // NOTE: The "|" below is the OR operator between
     // two integers.  The following line is implemented this
     // way to accommodate implementations that use
     // fixed-length two's-complement signed integers.
     ret = (RNDINT((1<<bitsMinus2) - 1) << bitsMinus2) | RNDINT(3)
     // NOTE: If the implementation uses fixed-length
     // unsigned integers, the following line can replace
     // the preceding line.  Note that the implementation
     // avoids shifting an integer by BITCOUNT bits or more,
     // because such behavior is undefined in C and C++.
     // ret = RNDINT( (((1 << (bits - 1)) - 1) << 1) | 1 )
     // NOTE: Alternatively, a list containing powers-of-two
     // minus 1 can be generated (calculating `floor(pow(2,i)) - 1`
     // for each relevant index `i` of the list, assuming unlimited
     // precision) and the following line used instead of the
     // preceding (assuming `list` is
     // the list generated this way):
     // ret = RNDINT( list[bits] )
     return ret
END METHOD

Special Programming Environments

In certain programming environments it's often impractical to implement the uniform random number generation methods just described without recurring to other programming languages. These include the following:

Readers aware of how these environments can support those uniform random number methods should send me a comment.

Moreover, in functional programming languages such as Haskell, it may be important to separate code that directly uses RNGs from other code, usually by rewriting certain functions to take one or more pregenerated random numbers rather than directly using RNDINT, RNDNUMRANGE, or another random number generation method presented earlier in this document.

Randomization Techniques

This section describes commonly used randomization techniques, such as shuffling, selection of several unique items, and creating random strings of text.

Boolean Conditions

To generate a condition that is true at the specified probabilities, use the following idioms in an if condition:

Shuffling

The Fisher–Yates shuffle method shuffles a list (puts its items in a random order) such that all permutations (arrangements) of that list are equally likely to occur, assuming the RNG it uses can choose from among all permutations of that list. However, that method is also easy to write incorrectly (see also Jeff Atwood, "The danger of naïveté"). The following pseudocode is designed to shuffle a list's contents.

METHOD Shuffle(list)
   // NOTE: Check size of the list early to prevent
   // `i` from being less than 0 if the list's size is 0 and
   // `i` is implemented using an unsigned type available
   // in certain programming languages.
   if size(list) >= 2
      // Set i to the last item's index
      i = size(list) - 1
      while i > 0
         // Choose an item ranging from the first item
         // up to the item given in `i`. Note that the item
         // at i+1 is excluded.
         k = RNDINTEXC(i + 1)
         // The following is wrong since it introduces biases:
         // "k = RNDINTEXC(size(list))"
         // The following is wrong since the algorithm won't
         // choose from among all possible permutations:
         // "k = RNDINTEXC(i)"
         // Swap item at index i with item at index k;
         // in this case, i and k may be the same
         tmp = list[i]
         list[i] = list[k]
         list[k] = tmp
         // Move i so it points to the previous item
         i = i - 1
      end
   end
   // NOTE: An implementation can return the
   // shuffled list, as is done here, but this is not required.
   return list
END METHOD

An important consideration with respect to shuffling is the nature of the underlying RNG, as I discuss in further detail in my RNG recommendation document on shuffling. It suffices to say here that in general, whenever a deterministic RNG is otherwise called for, such an RNG is good enough for shuffling a 52-item list if its period is 2226 or greater. (The period is the maximum number of values in a generated sequence for a deterministic RNG before that sequence repeats.)

Note: In simulation testing, shuffling is used to relabel items from a dataset at random, where each item in the dataset is assigned one of several labels. In such testing—

Creating a Random Character String

To generate a random string of characters (usually a random alphanumeric string, or string of letters and digits):

  1. Generate a list of the letters, digits, and/or other characters the string can have. For example, those characters can be—

    • the basic digits "0" to "9" (U+0030-U+0039, nos. 48-57),
    • the basic upper case letters "A" to "Z" (U+0041-U+005A, nos. 65-90), and
    • the basic lower case letters "a" to "z" (U+0061-U+007A, nos. 96-122),

    as found in the Basic Latin block of the Unicode Standard.

  2. Build a new string whose characters are chosen from that character list. The pseudocode below demonstrates this by creating a list, rather than a string, where the random characters will be held. It also takes the number of characters as a parameter named size. (Converting this list to a text string is programming-language-dependent, and the details of the conversion are outside the scope of this page.)

    METHOD RandomString(characterList, stringSize)
            i = 0
            newString = NewList()
            while i < stringSize
                    // Choose a character from the list
                    randomChar = characterList[RNDINTEXC(size(characterList))]
                    // Add the character to the string
                    AddItem(newString, randomChar)
                    i = i + 1
            end
            return newString
    END METHOD
    

Notes:

Sampling With Replacement: Choosing a Random Item from a List

To choose a random item from a list—

Choosing an item this way is also known as sampling with replacement.

Notes:

Sampling Without Replacement: Choosing Several Unique Items

There are several techniques for choosing k unique items or values uniformly at random from among n available items or values, depending on whether n is known, how big n and k are, and other considerations.

Choosing several unique items as just described is also known as sampling without replacement.

The following pseudocode implements the RandomKItemsFromFile and RandomKItemsInOrder methods referred to in this section.

    METHOD RandomKItemsFromFile(file, k)
       list = NewList()
       j = 0
       endOfFile = false
       while j < k
          // Get the next line from the file
          item = GetNextLine(file)
          // The end of the file was reached, break
          if item == nothing:
             endOfFile = true
             break
          end
          AddItem(list, item)
          j = j + 1
       end
       i = 1 + k
       while endOfFile == false
          // Get the next line from the file
          item = GetNextLine(file)
          // The end of the file was reached, break
          if item == nothing: break
          j = RNDINTEXC(i)
          if j < k: list[j] = item
          i = i + 1
       end
       // We shuffle at the end in case k or fewer
       // lines were in the file, since in that
       // case the items would appear in the same
       // order as they appeared in the file
       // if the list weren't shuffled.  This line
       // can be removed, however, if the items
       // in the returned list need not appear
       // in random order.
       Shuffle(list)
       return list
    end

   METHOD RandomKItemsInOrder(list, k)
           i = 0
           kk = k
           ret = NewList()
           n = size(list)
           while i  < n and size(ret) < k
             u = RNDINTEXC(n - i)
             if u <= kk
              AddItem(ret, list[i])
              kk = kk - 1
             end
             i = i + 1
          end
          return ret
    END METHOD

Note: Removing k random items from a list of n items (list) is equivalent to generating a new list by RandomKItemsInOrder(list, n - k).

Almost-Random Sampling

Some applications (particularly some games) may find it important to control which random numbers appear, to make the random outcomes appear fairer to users. Without this control, a user may experience long streaks of good outcomes or long streaks of bad outcomes, both of which are theoretically possible with a random number generator. To implement this kind of "almost-random" sampling, do one of the following:

However, "almost-random" sampling techniques are not recommended—

Note: Monte Carlo integration uses randomization to estimate a multidimensional integral. It involves evaluating a function at N random points in the domain, adding them up, then dividing the sum by N. The "Variance" MathWorld article gives methods for calculating the estimate's variance. (After calculating the error, or square root of variance, and the estimated integral, both can be multiplied by the volume of the domain.) Often quasirandom sequences (also known as low-discrepancy sequences, such as Sobol and Halton sequences), often together with an RNG, provide the "random" numbers to sample the function more efficiently. Unfortunately, the methods to produce such sequences are too complicated to show here.

Choosing a Random Date/Time

Choosing a random date/time at or between two others is equivalent to—

If either input date/time was generated as the random date, but that is not desired, the process just given can be repeated until such a date/time is not generated this way.

Generating Random Numbers in Sorted Order

The following pseudocode describes a method that generates random numbers in the interval [0, 1] in sorted order. count is the number of random numbers to generate this way. The method is based on an algorithm from Bentley and Saxe 1979.

 METHOD SortedRandom(count)
    list = NewList()
    k = count
    c = 1.0
    while k > 0
        c = pow(RNDU01(), 1.0 / k) * c
        AddItem(list, c)
    end
    return list
 END METHOD

Alternatively, random numbers can be generated (using any method and where the numbers have any distribution and range) and stored in a list, and the list then sorted using a sorting algorithm. Details on sorting algorithms, however, are beyond the scope of this document.

Rejection Sampling

Rejection sampling is a simple and flexible technique for generating random content that meets certain requirements. To implement rejection sampling—

  1. Generate the random content (such as a random number) by any method and with any distribution and range.
  2. If the content doesn't meet predetermined criteria, go to step 1.

Example criteria include checking—

(KD-trees, hash tables, red-black trees, and prime-number testing algorithms are outside the scope of this document.)

General Non-Uniform Distributions

Some applications need to choose random items or numbers such that some of them are more likely to be chosen than others (a non-uniform distribution). Most of the techniques in this section show how to use the uniform random number methods to generate such random items or numbers.

Discrete Weighted Choice

The discrete weighted choice method generates a random item from among a collection of them with separate probabilities of each item being chosen.

The following pseudocode takes a single list weights, and returns the index of a weight from that list. The greater the weight, the more likely its index will be chosen. (Note that there are two possible ways to generate the random number depending on whether the weights are all integers or can be fractional numbers.) Each weight should be 0 or greater.

METHOD DiscreteWeightedChoice(weights)
    if size(weights) == 0: return error
    sum = 0
    // Get the sum of all weights
    i = 0
    while i < size(weights)
        sum = sum + weights[i]
        i = i + 1
    end
    // Choose a random integer/number from 0 to less than
    // the sum of weights.
    value = RNDINTEXC(sum)
    // NOTE: If the weights can be fractional numbers,
    // use this instead:
    // value = RNDNUMEXCRANGE(0, sum)
    // Choose the object according to the given value
    i = 0
    lastItem = size(weights) - 1
    runningValue = 0
    while i < size(weights)
       if weights[i] > 0
          newValue = runningValue + weights[i]
          // NOTE: Includes start, excludes end
          if value < newValue: return i
          runningValue = newValue
          lastItem = i
       end
       i = i + 1
    end
    // Last resort (might happen because rounding
    // error happened somehow)
    return lastItem
END METHOD

Example:

Assume we have the following list: ["apples", "oranges", "bananas", "grapes"], and weights is the following: [3, 15, 1, 2]. The weight for "apples" is 3, and the weight for "oranges" is 15. Since "oranges" has a higher weight than "apples", the index for "oranges" (1) is more likely to be chosen than the index for "apples" (0) with the DiscreteWeightedChoice method. The following pseudocode implements how to get a randomly chosen item from the list with that method.

    index = DiscreteWeightedChoice(weights)
    // Get the actual item
    item = list[index]

In the example above, the weights sum to 21. However, the weights do not mean that when 21 items are selected, the index for "apples" will be chosen exactly 3 times, or the index for "oranges" exactly 15 times, for example. Each number generated by DiscreteWeightedChoice is independent from the others, and each weight indicates only a likelihood that the corresponding index will be chosen rather than the other indices. And this likelihood doesn't change no matter how many times DiscreteWeightedChoice is given the same weights. This is called a weighted choice with replacement, which can be thought of as drawing a ball, then putting it back.

Weighted Choice Without Replacement

To implement weighted choice without replacement (which can be thought of as drawing a ball without putting it back), generate an index by DiscreteWeightedChoice, and then decrease the weight for the chosen index by 1. In this way, when items are selected repeatedly, each weight behaves like the number of "copies" of each item. This technique, though, will only work properly if all the weights are integers 0 or greater. The pseudocode below is an example of this.

// Get the sum of weights
// (NOTE: This code assumes that `weights` is
// a list that can be modified.  If the original weights
// are needed for something else, a copy of that
// list should be made first, but the copying process
// is not shown here.  This code also assumes that `list`,
// a list of items, was already declared earlier and
// has at least as many items as `weights`.)
totalWeight = 0
i = 0
while i < size(weights)
    totalWeight = totalWeight + weights[i]
    i = i + 1
end
// Choose as many items as the sum of weights
i = 0
items = NewList()
while i < totalWeight
    index = DiscreteWeightedChoice(weights)
    // Decrease weight by 1 to implement selection
    // without replacement.
    weights[index] = weights[index] - 1
    AddItem(items, list[index])
    i = i + 1
end

Alternatively, if all the weights are integers 0 or greater and their sum is relatively small, create a list with as many copies of each item as its weight, then shuffle that list. The resulting list will be ordered in a way that corresponds to a weighted random choice without replacement.

Choosing Multiple Items

The discrete weighted choice method can also be used for choosing multiple items from a list, whether or not the items have the same probability of being chosen. In this case, after choosing a random index, set the weight for that index to 0 to keep it from being chosen again. The pseudocode below is an example of this.

// (NOTE: This code assumes that `weights` is
// a list that can be modified.  If the original weights
// are needed for something else, a copy of that
// list should be made first, but the copying process
// is not shown here.  This code also assumes that `list`,
// a list of items, was already declared earlier and
// has at least as many items as `weights`.)
chosenItems = NewList()
i = 0
// Choose k items from the list
while i < k or i < size(weights)
    index = DiscreteWeightedChoice(weights)
    // Set the weight for the chosen index to 0
    // so it won't be chosen again
    weights[index] = 0
    // Add the item at the chosen index
    AddItem(chosenItems, list[index])
end
// `chosenItems` now contains the items chosen

Sorting a list of items such that higher-weighted items are more likely to appear first is equivalent to the technique presented in this section.

Piecewise Constant Distribution

The discrete weighted choice method can also be used to implement a piecewise constant distribution, as in the following example. Assume we have the following list: [0, 5, 10, 11, 15], and weights is the following: [3, 15, 1, 2]. Note that the weight list has one fewer item than the number list. The weight for "0 to 5" (0 or greater, less than 5) is 3, and the weight for "5 to 10" is 15. Since "5 to 10" has a higher weight than "0 to 5", this distribution will choose a number from 5 to 10 more often than a number from 0 to 5. The following pseudocode implements the piecewise constant distribution.

// Choose a random index
index = DiscreteWeightedChoice(weights)
// Choose a random number in the chosen interval on the list
number = RNDNUMEXCRANGE(list[index], list[index + 1])

The code above implements the distribution with replacement. Implementing the distribution without replacement is similar to implementing discrete weighted choice without replacement; the only change is to replace AddItem(items, list[index]) with AddItem(items, RNDNUMEXCRANGE(list[index], list[index + 1])) in the pseudocode.

Continuous Weighted Choice

The continuous weighted choice method generates a random number that follows a continuous probability distribution (here, a piecewise linear distribution).

The pseudocode below takes two lists as follows:


METHOD ContinuousWeightedChoice(list, weights)
    if size(list) <= 0 or size(weights) < size(list): return error
    if size(list) == 1: return list[0]
    // Get the sum of all areas between weights
    sum = 0
    areas = NewList()
    i = 0
    while i < size(list) - 1
      weightArea = abs((weights[i] + weights[i + 1]) * 0.5 *
            (list[i + 1] - list[i]))
      AddItem(areas, weightArea)
      sum = sum + weightArea
       i = i + 1
    end
    // Choose a random number
    value = RNDNUMEXCRANGE(0, sum)
    // Interpolate a number according to the given value
    i=0
    // Get the number corresponding to the random number
    runningValue = 0
    while i < size(list) - 1
     area = areas[i]
     if area > 0
      newValue = runningValue + area
      // NOTE: Includes start, excludes end
      if value < newValue
       // NOTE: The following line can also read
       // "interp = RNDU01OneExc()", that is, a new number is generated
       // within the chosen area rather than using the point
       // already generated.
       interp = (value - runningValue) / (newValue - runningValue)
       retValue = list[i] + (list[i + 1] - list[i]) * interp
       return retValue
      end
      runningValue = newValue
     end
     i = i + 1
    end
    // Last resort (might happen because rounding
    // error happened somehow)
    return list[size(list) - 1]
END METHOD

Example: Assume list is the following: [0, 1, 2, 2.5, 3], and weights is the following: [0.2, 0.8, 0.5, 0.3, 0.1]. The weight for 2 is 0.5, and that for 2.5 is 0.3. Since 2 has a higher weight than 2.5, numbers near 2 are more likely to be chosen than numbers near 2.5 with the ContinuousWeightedChoice method.

Random Numbers from a Distribution of Data Points

To generate a random number (or data point) based on the distribution of a list of numbers (or data points)—

A detailed discussion on how to calculate bandwidth or on other possible ways to add randomized "jitter" (whose distribution is formally called a kernel) is outside the scope of this document. For further information on kernel density estimation, which the random number generation technique here is related to, see the Wikipedia articles on single-variable and multiple-variable estimation, or a blog post by M. Kay.

Random Numbers from an Arbitrary Distribution

Many probability distributions can be defined in terms of any of the following:

If a probability distribution's PDF is known, one of the following techniques, among others, can be used to generate random numbers that follow that distribution.

  1. Use the PDF to calculate the weights for a number of sample points (usually regularly spaced). Create one list with the sampled points in ascending order (the list) and another list of the same size with the PDF's values at those points (the weights). Finally generate ContinuousWeightedChoice(list, weights) to generate a random number bounded by the lowest and highest sampled point. This technique can be used even if the area under the PDF isn't 1. OR
  2. Use inverse transform sampling. Generate ICDF(RNDU01ZeroOneExc()), where ICDF(X) is the distribution's inverse cumulative distribution function (inverse CDF, or inverse of the CDF) assuming the area under the PDF is 1. OR
  3. Use rejection sampling. Choose the lowest and highest random number to generate (minValue and maxValue, respectively) and find the maximum value of the PDF at or between those points (maxDensity). The rejection sampling approach is then illustrated with the following pseudocode, where PDF(X) is the distribution's PDF (see also Saucier 2000, p. 39). This technique can be used even if the area under the PDF isn't 1.

    METHOD ArbitraryDist(minValue, maxValue, maxDensity)
         if minValue >= maxValue: return error
         while True:
             x=RNDNUMEXCRANGE(minValue, maxValue)
             y=RNDNUMEXCRANGE(0, maxDensity)
             if y < PDF(x): return x
         end
    END METHOD
    

If both a PDF and a uniform random variable in the interval [0, 1) (randomVariable) are given, then one of the following techniques can be used to generate a random number that follows that distribution:

  1. Do the same process as method 1, given earlier, except—
    • divide the weights in the weights list by the sum of all weights, and
    • use a modified version of ContinuousWeightedChoice that uses randomVariable rather than generating a new random number. OR
  2. GenerateICDF(randomVariable), where ICDF(X) is the distribution's inverse CDF (see method 2, given earlier).

If the distribution's CDF is known, generate ICDF(RNDU01ZeroOneExc()), where ICDF(X) is the inverse of that CDF.

Note: Further details on inverse transform sampling or on how to find integrals or inverses, as well as lists of PDFs and CDFs, are outside the scope of this page.

Mixtures of Distributions

A mixture consists of two or more sampling distributions with separate probabilities of being sampled. To generate random content from a mixture—

  1. generate index = DiscreteWeightedChoice(weights), where weights is a list of relative probabilities that each distribution in the mixture will be sampled, then
  2. based on the value of index, generate the random content from the corresponding distribution.

Examples:

Censored and Truncated Distributions

To sample from a censored probability distribution, generate a random number from that distribution and—

To sample from a truncated probability distribution, generate a random number from that distribution and, if that number is less than a minimum threshold and/or higher than a maximum threshold, repeat this process.

Correlated Random Numbers

According to (Saucier 2000), sec. 3.8, to generate two correlated (dependent) random variables—

Another way to generate correlated random numbers is explained in the section "Gaussian Copula".

Specific Non-Uniform Distributions

This section contains information on some of the most common non-uniform sampling distributions.

Dice

The following method generates a random result of rolling virtual dice.(9) It takes three parameters: the number of dice (dice), the number of sides in each die (sides), and a number to add to the result (bonus) (which can be negative, but the result of the subtraction is 0 if that result is greater).

METHOD DiceRoll(dice, sides, bonus)
    if dice < 0 or sides < 1: return error
    if dice == 0: return 0
    if sides == 1: return dice
    ret = 0
    if dice > 50
        // If there are many dice to roll,
        // use a faster approach, noting that
        // the dice-roll distribution approaches
        // a "discrete" normal distribution as the
        // number of dice increases.
        mean = dice * (sides + 1) * 0.5
        sigma = sqrt(dice * (sides * sides - 1) / 12)
        ret = -1
        while ret < dice or ret > dice * sides
            ret = floor(Normal(mean, sigma) + 0.5)
        end
     else
         i = 0
         while i < dice
              ret = ret + RNDINTRANGE(1, sides)
              i = i + 1
          end
     end
     ret = ret + bonus
     if ret < 0: ret = 0
     return ret
END METHOD

As examples, the result of rolling—

Normal (Gaussian) Distribution

The normal distribution (also called the Gaussian distribution) can model many kinds of measurements or scores whose values are most likely around a given average and are less likely the farther away from that average on either side.

In the pseudocode below, which uses the polar method (2) to generate two normally-distributed random numbers:


METHOD Normal2(mu, sigma)
  while true
    a = RNDU01()
    b = RNDU01()
    if a != 0 and RNDINT(1) == 0: a = 0 - a
    if b != 0 and RNDINT(1) == 0: b = 0 - b
    c = a * a + b * b
    if c != 0 and c <= 1
       c = sqrt(-2 * ln(c) / c)
       return [c * a, c * b]
    end
  end
END METHOD

Since Normal2 returns two numbers instead of one, but many applications require only one number at a time, a problem arises on how to return one number while storing the other for later retrieval. Ways to solve this problem are outside the scope of this page, however. The name Normal will be used in this document to represent a method that returns only one normally-distributed random number rather than two.

Alternatively, or in addition, the following method (implementing a ratio-of-uniforms technique) can be used to generate normally distributed random numbers.

METHOD Normal(mu, sigma)
    bmp = 0.8577638849607068 // sqrt(2/exp(1))
    while true
        a=RNDU01ZeroExc()
        b=RNDNUMRANGE(-bmp,bmp)
        if b*b <= -a * a * 4 * ln(a)
            return (b * sigma / a) + mu
        end
    end
END METHOD

Notes:

Binomial Distribution

The following method generates a random integer that follows a binomial distribution. This number—


METHOD Binomial(trials, p)
    if trials < 0: return error
    if trials == 0: return 0
    // Always succeeds
    if p >= 1.0: return trials
    // Always fails
    if p <= 0.0: return 0
    i = 0
    count = 0
    // Suggested by Saucier, R. in "Computer
    // generation of probability distributions", 2000, p. 49
    tp = trials * p
    if tp > 25 or (tp > 5 and p > 0.1 and p < 0.9)
         countval = -1
         // "countval
         while countval < 0 or countval > trials
              countval = floor(Normal(tp, tp) + 0.5)
         end
         return countval
    end
    if p == 0.5
        while i < trials
            if RNDINT(1) == 0
                // Success
                count = count + 1
            end
            i = i + 1
        end
    else
        while i < trials
            if RNDU01OneExc() < p
                // Success
                count = count + 1
            end
            i = i + 1
        end
    end
    return count
END METHOD

Examples:

Poisson Distribution

In the following method, which generates a random integer that follows a Poisson distribution

The method given here is based on Knuth's method from 1969.

METHOD Poisson(mean)
    if mean < 0: return error
    if mean == 0: return 0
    p = 1.0
    // Suggested by Saucier, R. in "Computer
    // generation of probability distributions", 2000, p. 49
    if mean > 9
        p = -1.0
        while p < 0: p = floor(Normal(mean, mean) + 0.5)
        return p
    end
    pn = exp(-mean)
    count = 0
    while true
        count = count + 1
        p = p * RNDU01OneExc()
        if p <= pn
                return count - 1
        end
    end
END METHOD

Gamma Distribution

The gamma distribution models expected lifetimes. The following method, which generates a random number that follows a gamma distribution, is based on Marsaglia and Tsang's method from 2000.

METHOD GammaDist(meanLifetime)
    // Must be greater than 0
    if meanLifetime <= 0: return error
    // Exponential distribution special case if
    // `meanLifetime` is 1 (see also
    // Devroye 1986, p. 405)
    if meanLifetime == 1: return -ln(RNDU01ZeroOneExc())
    d = meanLifetime
    v = 0
    if meanLifetime < 1: d = d + 1
    d = d - (1.0 / 3) // NOTE: 1.0 / 3 must be a fractional number
    c = 1.0 / sqrt(9 * d)
    while true
        x = 0
        while true
           x = Normal(0, 1)
           v = c * x + 1;
           v = v * v * v
           if v > 0: break
        end
        u = RNDU01ZeroExc()
        x2 = x * x
        if u < 1 - (0.0331 * x2 * x2): break
        if ln(u) < (0.5 * x2) + (d * (1 - v + ln(v))): break
    end
    ret = d * v
    if meanLifetime < 1
       ret = ret * exp(ln(RNDU01ZeroExc()) / meanLifetime)
    end
    return ret
end

Extended versions of the gamma distribution:

Negative Binomial Distribution

A random integer that follows a negative binomial distribution expresses the number of failures that have happened after seeing a given number of successes (expressed as successes below), where the probability of a success in each case is p (where p <= 0 means never, p >= 1 means always, and p = 0.5 means an equal chance of success or failure).

METHOD NegativeBinomialInt(successes, p)
    // Must be 0 or greater
    if successes < 0: return error
    // No failures if no successes or if always succeeds
    if successes == 0 or p >= 1.0: return 0
    // Always fails (NOTE: infinity can be the maximum possible
    // integer value if NegativeBinomialInt is implemented to return
    // an integer)
    if p <= 0.0: return infinity
    // NOTE: If 'successes' can be an integer only,
    // omit the following three lines:
    if floor(successes) != successes
        return Poisson(GammaDist(successes) * (1 - p) / p)
    end
    count = 0
    total = 0
    if successes == 1
        if p == 0.5
          while RNDINT(1) == 0: count = count + 1
           return count
        end
        // Geometric distribution special case (see Saucier 2000)
        return floor(ln(RNDU01ZeroExc()) / ln(1.0 - p))
    end
    while true
        if RNDU01OneExc() < p
            // Success
            total = total + 1
            if total >= successes
                    return count
            end
        else
            // Failure
            count = count + 1
        end
    end
END METHOD

Example: If p is 0.5 and successes is 1, the negative binomial distribution models the task "Flip a coin until you get tails, then count the number of heads."

von Mises Distribution

In the following method, which generates a random number that follows a von Mises distribution, which describes a distribution of circular angles—

The algorithm below is the Best–Fisher algorithm from 1979 (as described in Devroye 1986 with errata incorporated).

METHOD VonMises(mean, kappa)
    if kappa < 0: return error
    if kappa == 0
        return RNDNUMEXCRANGE(mean-pi, mean+pi)
    end
    r = 1.0 + sqrt(4 * kappa * kappa + 1)
    rho = (r - sqrt(2 * r)) / (kappa * 2)
    s = (1 + rho * rho) / (2 * rho)
    while true
        u = RNDNUMEXCRANGE(-1, 1)
        v = RNDU01ZeroOneExc()
        z = cos(pi * u)
        w = (1 + s*z) / (s + z)
        y = kappa * (s - w)
        if y*(2 - y) - v >=0 or ln(y / v) + 1 - y >= 0
           if angle<-1: angle=-1
           if angle>1: angle=1
           angle = acos(w)
           if u < 0: angle = -angle
           return mean + angle
        end
    end
END METHOD

Stable Distribution

As more and more independent and identically distributed random variables are added together, their distribution tends to a stable distribution, which resembles a curve with a single peak, but with generally "fatter" tails than the normal distribution. The pseudocode below uses the Chambers–Mallows–Stuck algorithm. The Stable method, implemented below, takes two parameters:


METHOD Stable(alpha, beta)
     if alpha <=0 or alpha > 2: return error
     if beta < -1 or beta > 1: return error
    halfpi = pi * 0.5
    unif=RNDNUMEXCRANGE(-halfpi, halfpi)
    while unif==-halfpi: unif=RNDNUMEXCRANGE(-halfpi, halfpi)
    // Cauchy special case
    if alpha == 1 and beta == 0: return tan(unif)
    expo=-ln(RNDU01ZeroExc())
    c=cos(unif)
    if alpha == 1
            s=sin(unif)
            return 2.0*((unif*beta+halfpi)*s/c -
                beta * ln(halfpi*expo*c/(unif*beta+halfpi)))/pi
    end
    z=-tan(alpha*halfpi)*beta
    ug=unif+atan(-z)/alpha
    cpow=pow(c, -1.0 / alpha)
    return pow(1.0+z*z, 1.0 / (2*alpha))*
        (sin(alpha*ug)*cpow)*
        pow(cos(unif-alpha*ug)/expo, (1.0 - alpha) / alpha)
END METHOD

Extended versions of the stable distribution:

Hypergeometric Distribution

The following method generates a random integer that follows a hypergeometric distribution. When a given number of items are drawn at random without replacement from a collection of items each labeled either 1 or 0, the random integer expresses the number of items drawn this way that are labeled 1. In the method below, trials is the number of items drawn at random, ones is the number of items labeled 1 in the set, and count is the number of items labeled 1 or 0 in that set.

METHOD Hypergeometric(trials, ones, count)
    if ones < 0 or count < 0 or trials < 0 or ones > count or trials > count
            return error
    end
    if ones == 0: return 0
   successes = 0
    i = 0
    currentCount = count
    currentOnes = ones
    while i < trials and currentOnes > 0
            if RNDINTEXC(currentCount) < currentOnes
                    currentOnes = currentOnes - 1
                    successes = successes + 1
            end
            currentCount = currentCount - 1
            i = i + 1
    end
    return successes
END METHOD

Example: In a 52-card deck of Anglo-American playing cards, 12 of the cards are face cards (jacks, queens, or kings). After the deck is shuffled and seven cards are drawn, the number of face cards drawn this way follows a hypergeometric distribution where trials is 7, ones is 12, and count is 52.

Multivariate Normal Distribution

The following pseudocode calculates a random point in space that follows a multivariate normal distribution. The method MultivariateNormal takes the following parameters:

For conciseness, the following pseudocode uses for loops, defined as follows. for X=Y to Z; [Statements] ; end is shorthand for X = Y; while X <= Z; [Statements]; X = X + 1; end.

METHOD Decompose(matrix)
  numrows = size(matrix)
  if matrix[0]!=numrows: return error
  // Does a Cholesky decomposition of a matrix
  // assuming it's positive definite and invertible
  ret=NewList()
  for i = 0 to numrows - 1
    submat = NewList()
    for j = 0 to numrows - 1
      AddItem(submat, 0)
    end
    AddItem(ret, submat)
  end
  s1 = sqrt(matrix[0][0])
  if s1==0: return ret // For robustness
  for i = 1 to numrows - 1
    ret[0][i]=matrix[0][i]*1.0/s1
  end
  for i = 1 to numrows - 1
    sum=0.0
    for j = 0 to i - 1
      sum = sum + ret[j][i]*ret[j][i]
    end
    sq=matrix[i][i]-sum
    if sq<0: sq=0 // For robustness
    ret[i][i]=math.sqrt(sq)
  end
  for j = 1 to numrows - 1
    for i = j + 1 to numrows - 1
      // For robustness
      if ret[j][j]==0: ret[j][i]=0
      if ret[j][j]!=0
        sum=0
        for k = 0 to j - 1
          sum = sumret[k][i]*ret[k][j]
        end
        ret[j][i]=(matrix[j][i]-sum)*1.0/ret[j][j]
      end
    end
  end
  return ret
END METHOD

METHOD MultivariateNormal(mu, cov)
  mulen=size(cov)
  if mu != nothing
   mulen = size(mu)
    if mulen!=size(cov): return error
    if mulen!=size(cov[0]): return error
  end
  // NOTE: If multiple random points will
  // be generated using the same covariance
  // matrix, an implementation can consider
  // precalculating the decomposed matrix
  // in advance rather than calculating it here.
  cho=Decompose(cov)
  i=0
  ret=NewList()
  while i<mulen
    nv=Normal(0,1)
    if mulen == nothing: sum = 0
    else: sum=mu[i]
    j=0
    while j<mulen
      sum=sum+nv*cho[j][i]
      j=j+1
    end
    AddItem(ret, sum)
    i=i+1
  end
  return ret
end

Dirichlet Distribution: Random Numbers with a Given Positive Sum

The Dirichlet distribution models a distribution of N numbers that sum to a given positive number, total. Generating N GammaDist(total) numbers and dividing them by their sum will result in N random numbers that (approximately) sum to total (see the Wikipedia article). For example, if total is 1, the numbers will (approximately) sum to 1. Note that in the exceptional case that all numbers are 0, the process should repeat. (A more general version of the Dirichlet distribution allows the parameter in GammaDist to vary for each random number.)

The following pseudocode shows how to generate random integers with a given positive sum. (The algorithm for this was presented in Smith and Tromble, "Sampling Uniformly from the Unit Simplex", 2004.) In the pseudocode below—


METHOD NonzeroIntegersWithSum(n, total)
    if n <= 0 or total <=0: return error
    ls = NewList()
    ret = NewList()
    AddItem(ls, 0)
    while size(ls) < n
            c = RNDINTEXCRANGE(1, total)
            found = false
            j = 1
            while j < size(ls)
                    if ls[j] == c
                            found = true
                            break
                    end
                    j = j + 1
            end
            if found == false: AddItem(ls, c)
    end
    Sort(ls)
    AddItem(ls, total)
     i = 1
    while i < size(ls):
            AddItem(ret, list[i] - list[i - 1])
            i = i + 1
    end
    return ret
END METHOD

METHOD IntegersWithSum(n, total)
    if n <= 0 or total <=0: return error
    ret = NonzeroIntegersWithSum(n, total + n)
     i = 0
    while i < size(ret):
            ret[i] = ret[i] - 1
            i = i + 1
    end
    return ret
END METHOD

Notes:

Multinomial Distribution

The multinomial distribution models the number of times each of several mutually exclusive events happens among a given number of trials, where each event can have a separate probability of happening. The pseudocode below is of a method that takes two parameters: trials, which is the number of trials, and weights, which are the relative probabilities of each event. The method tallies the events as they happen and returns a list (with the same size as weights) containing the number of successes for each event.

METHOD Multinomial(trials, weights)
    if trials < 0: return error
    // create a list of successes
    list = NewList()
    i = 0
    while i < size(weights)
       AddItem(list, 0)
       i = i + 1
    end
    i = 0
    while i < trials
        // Choose an index
        index = DiscreteWeightedChoice(weights)
        // Tally the event at the chosen index
        list[index] = list[index] + 1
        i = i + 1
    end
    return list
END METHOD

Gaussian Copula

Correlated random numbers can be generated by sampling from a multivariate normal distribution, then converting the resulting numbers to uniformly-distributed numbers. In the following pseudocode, which generates correlated uniformly-distributed random numbers this way:

The pseudocode below is one example of a copula (a distribution of groups of two or more correlated uniform random numbers), namely, a Gaussian copula.

METHOD erf(v)
    if v==0: return 0
    if v<0: return erf(-v)
    if v==infinity: return 1
    // NOTE: For IEEE 64-bit IEEE 754 binary
    // floating-point (Java `double`), the following
    // line can be added:
    // if v>=6: return 1
    i=1
    ret=0
    zp=-(v*v)
    zval=1.0
    den=1.0
    while i < 100
        r=v*zval/den
        den=den+2
        ret=ret+r
        // NOTE: EPSILON can be 10^14,
        // for example.
        if abs(r)<EPSILON: break
        if i==1: zval=zp
        else: zval = zval*zp/i
        i = i + 1
    end
    return ret*2/sqrt(pi)
END METHOD

METHOD GaussianCopula(covar)
    mvn=MultivariateNormal(nothing, covar)
    i = 0
   sqrt2=sqrt(2)
   while i < size(covar)
      stdev=sqrt(covar[i][i])
      // Apply the standard normal distribution's CDF
      // function to get uniform variables
      mvn[i] = (erf(mvn[i]/(sqrt2))+1)*0.5
       i = i + 1
   end
   return mvn
END METHOD

Each of the resulting uniform variables will be in the interval [0, 1], and each one can be further transformed to any other probability distribution (which is called a marginal distribution here) by one of the methods given in "Random Numbers from an Arbitrary Distribution". (See also Cario and Nelson 1997.)

Example: To generate two correlated uniform variables by this method, generate GaussianCopula([[1, rho], [rho, 1]]), where rho is the Pearson correlation coefficient, in the interval [-1, 1]. (Note that rank correlation parameters, which can be converted to rho, can better describe the correlation than rho itself. For example, for a bivariate normal distribution, the Spearman coefficient srho can be converted to rho by rho = sin(srho * pi / 6) * 2. Rank correlation parameters are not further discussed in this document.)

Other Non-Uniform Distributions

Most commonly used:

Miscellaneous:

Geometric Sampling

This section contains various geometric sampling techniques.

Random Point Inside a Triangle

The following pseudocode, which generates a uniformly random point inside a 2-dimensional triangle, takes three parameters, p0, p1, and p2, each of which is a 2-item list containing the X and Y coordinates, respectively, of one vertex of the triangle.

    METHOD RandomPointInTriangle(
            x1=p1[0]-p0[0]
            y1=p1[1]-p0[1]
            x2=p2[0]-p0[0]
            y2=p2[1]-p0[1]
            den=(x1*y2-x2*y1)
            // Avoid division by zero
            if den==0: den=0.0000001
            r=RNDU01()
            s=RNDU01()
            xv=r*x1 + s*x2
            yv=r*y1 + s*y2
            a=(xv*y2 - x2*yv)/den
            b=(x1*yv - xv*y1)/den
            if a<=0 or b<=0 or a+b>=1
                    return [x1+x2+p0[0]-xv,y1+y2+p0[1]-yv]
            else
                    return [p0[0]+xv, p0[1]+yv]
            end
    end

Random Points on the Surface of a Hypersphere

To generate an N-dimensional point on the surface of an N-dimensional hypersphere of radius R, generate N Normal(0, 1) random numbers, then divide them by R / X, where X is those numbers' norm (if X is 0, the process should repeat). A hypersphere's surface is formed by all points lying 1 unit away from a common point in N-dimensional space. Based on a technique described in MathWorld.

This problem is equivalent to generating a random unit vector (vector with length 1) in N-dimensional space.

Random Points Inside a Ball

To generate an N-dimensional point inside an N-dimensional ball of radius R, either—

although the former method "may ... be slower" "in practice", according to a MathWorld article, which was the inspiration for the two methods given here.

If the ball is hollow, that is, only points within a range of distances from the center of the ball are allowed, then use either method given earlier to generate a random point for a ball of radius equal to the maximum allowed distance, until the norm of the numbers making up that point is within the desired range of distances.

Random Latitude and Longitude

To generate a random latitude and longitude on a sphere (in radians with west and south coordinates negative) such that the resulting point is (practically) uniformly distributed on the surface of a sphere—

Reference: "Sphere Point Picking" in MathWorld.

Conclusion

This page discussed many ways applications can extract random numbers from random number generators.

Feel free to send comments. They may help improve this page. In particular, corrections to any method given on this page are welcome.

I acknowledge the commenters to the CodeProject version of this page, including George Swan, who referred me to the reservoir sampling method.

Currently, the following are not covered in this document, but may be added based on reader interest:

Notes

(1) This number format describes B-bit signed integers with minimum value -2B-1 and maximum value 2B-1 - 1, where B is a positive even number of bits; examples include Java's short, int, and long, with 16, 32, and 64 bits, respectively. A signed integer is an integer that can be positive, zero, or negative. In two' s-complement form, nonnegative numbers have the highest (most significant) bit set to zero, and negative numbers have that bit (and all bits beyond) set to one, and a negative number is stored in such form by decreasing its absolute value by 1 and swapping the bits of the resulting number.

(2) The method that formerly appeared here is the Box-Muller transformation: mu + radius * cos(angle) and mu + radius * sin(angle), where angle = 2 * pi * RNDU01OneExc() and radius = sqrt(-2 * ln(RNDU01ZeroExc())) * sigma, are two independent normally-distributed random numbers. A method of generating approximate standard normal random numbers, which consists of summing twelve RNDU01OneExc() numbers and subtracting by 6 (see also "Irwin–Hall distribution" on Wikipedia), results in values not less than -6 or greater than 6; on the other hand, in a standard normal distribution, results less than -6 or greater than 6 will occur only with a generally negligible probability.

(3) The N numbers generated this way will form a point inside an N-dimensional hypercube with length 2 * R in each dimension and centered at the origin of space.

(4) A third kind of randomized "jitter" (for multi-component data points) consists of a point generated from a multivariate normal distribution with all the means equal to 0 and a covariance matrix that, in this context, serves as a bandwidth matrix. The second kind of "jitter" given here is an easy special case of the multivariate normal distribution, where the bandwidth corresponds to a bandwidth matrix with diagonal elements equal to bandwidth-squared and with zeros everywhere else.

(5) In situations where loops are not possible, such as within an SQL query, the idiom min(floor(RNDU01OneExc() * maxExclusive, maxExclusive - 1)), where min(a,b) is the smaller of a and b, returns an integer in the interval [0, maxExclusive); however, such an idiom can have a slight, but for most purposes negligible, bias toward maxExclusive - 1.

(6) This definition includes RNGs that—

If a number generator uses a nonuniform distribution, but otherwise meets this definition, then it can be converted to one with a uniform distribution, at least in theory, by applying the nonuniform distribution's cumulative distribution function (CDF) to each generated number (see also "Random Numbers from an Arbitrary Distribution"). Further details on this kind of conversion, as well a list of CDFs, are outside the scope of this document.

(7) For an exercise solved by this method, see A. Koenig and B. E. Moo, Accelerated C++, 2000; see also a blog post by Johnny Chan.

(8) RNDINTEXC is not given as the core random generation method because it's harder to fill integers in popular integer formats with random bits with this method.

(9)The "Dice" section used the following sources:

(10) Such techniques usually involve Markov chains, which are outside this page's scope.

(11) A convex polygon can be trivially decomposed into triangles that have one vertex in common and each have two other adjacent vertices of the original polygon. Triangulation of other polygons is nontrivial and outside the scope of this document.

License

This page is licensed under Creative Commons Zero.