Menu - Top - Home - Donate to Me

Random Number Generation and Sampling Methods

Peter Occil

Begun on June 4, 2017; last updated on Dec. 15, 2018.

Discusses many ways applications can do random number generation and sampling from an underlying RNG and includes pseudocode for many of them.

Introduction

This page discusses many ways applications can do random number generation and sampling from an underlying random number generator (RNG), often with pseudocode. Those methods include—

Sample Python code that implements many of the methods in this document is available.

All the random number methods presented on this page are ultimately based on an underlying RNG; however, the methods make no assumptions on that RNG's implementation (e.g., whether that RNG is a deterministic RNG or some other kind) or on that RNG's statistical quality or predictability.

In general, this document does not cover how to choose an underlying RNG for a particular application, including in terms of security, performance, and quality. Hash functions are also not specifically covered in this document. I have written more on RNG recommendations in another document.

About This Document

This is an open-source document; for an updated version, see the source code or its rendering on GitHub. You can send comments on this document either on CodeProject or on the GitHub issues page.

Comments on any aspect of this document are welcome, but especially comments on the following:

Contents

Notation and Definitions

Uniform Random Numbers

This section describes how an underlying RNG can be used to generate independent uniformly-distributed random numbers. Here is an overview of the methods described in this section.

One method, RNDINT, described next, can serve as the basis for the remaining methods.

RNDINT: Random Integers in [0, N]

In this document, RNDINT(maxInclusive) is the core method for generating independent uniform random integers from an underlying RNG, which is called RNG() in this section. The random integer is in the interval [0, maxInclusive]. If RNG() outputs integers in the interval [0, positive MODULUS) (examples of MODULUS include 1,000,000 and 6), then RNDINT(maxInclusive) can be implemented as in the pseudocode below.(2) (RNGs that output numbers in the interval [0, 1), such as Wichmann–Hill and dSFMT, are also seen in practice, but building RNDINT for those RNGs is more complicated unless the set of numbers they could return, as opposed to their probability, is evenly distributed, so that they can be transformed into an RNG that outputs integers instead.)

METHOD RndIntHelperNonPowerOfTwo(maxInclusive)
    cx = floor(maxInclusive / MODULUS) + 1
    while true
       ret = cx * RNG()
       // NOTE: If this method is implemented using a fixed-
       // precision type, the addition operation below should
       // check for overflow and should reject the number
       // if overflow would result.
       ret = ret + RNDINT(cx - 1)
       if ret <= maxInclusive: return ret
    end
END METHOD

METHOD RndIntHelperPowerOfTwo(maxInclusive)
        // NOTE: Finds the number of bits minus 1 needed
        // to represent MODULUS (in other words, the number
        // of random bits returned by RNG() ). This will
        // be a constant here, though.
        modBits = ln(MODULUS)/ln(2)
        // Calculate the bit count of maxInclusive
        bitCount = 0
        tempnumber = maxInclusive
        while tempnumber > 0
               // NOTE: If the programming language implements
               // division with two integers by truncating to an
               // integer, the division can be used as is without
               // using a "floor" function.
               tempnumber = floor(tempnumber / 2)
               bitCount = bitCount + 1
        end
        while true
               // Build a number with `bitCount` bits
                tempnumber = 0
                while bitCount > 0
                     wordBits = modBits
                     rngNumber = RNG()
                     if wordBits > bitCount
                        wordBits = bitCount
                        // Truncate number to 'wordBits' bits
                        // NOTE: If the programming language supports a bitwise
                        // AND operator, the mod operation can be implemented
                        // as "rndNumber AND ((1 << wordBits) - 1)"
                        rngNumber = rem(rngNumber, (1 << wordBits))
                     end
                     tempnumber = tempnumber << wordBits
                     // NOTE: In programming languages that
                     // support the OR operator between two
                     // integers, that operator can replace the
                     // plus operator below.
                     tempnumber = tempnumber + rngNumber
                     bitCount = bitCount - wordBits
                end
                // Accept the number if allowed
                if tempnumber <= maxInclusive: return tempnumber
         end
END METHOD

METHOD RNDINT(maxInclusive)
  // maxInclusive must be 0 or greater
  if maxInclusive < 0: return error
  if maxInclusive == 0: return 0
  // N equals modulus
  if maxInclusive == MODULUS - 1: return RNG()
  // NOTE: Finds the number of bits minus 1 needed
  // to represent MODULUS (if it's a power of 2).
  // This will be a constant here, though.
  modBits=ln(MODULUS)/ln(2)
  // NOTE: The following condition checks if MODULUS
  // is a power of 2.  This will be a constant here, though.
  isPowerOfTwo=floor(modBits) == modBits
  // Special cases if MODULUS is a power of 2
  if isPowerOfTwo
       if maxInclusive == 1: return rem(RNG(), 2)
       if maxInclusive == 3 and modBits >= 2: return rem(RNG(), 4)
       if maxInclusive == 255 and modBits >= 8: return rem(RNG(), 256)
       if maxInclusive == 65535 and modBits >=16: return rem(RNG(), 65535)
   end
  if maxInclusive > MODULUS - 1:
     if isPowerOfTwo
       return RndIntHelperPowerOfTwo(maxInclusive)
     else
       return RndIntHelperNonPowerOfTwo(maxInclusive)
     end
  else
    // NOTE: If the programming language implements
    // division with two integers by truncating to an
    // integer, the division can be used as is without
    // using a "floor" function.
          nPlusOne = maxInclusive + 1
          maxexc = floor((MODULUS - 1) / nPlusOne) * nPlusOne
          while true
                    ret = RNG()
                    if ret < nPlusOne: return ret
                    if ret < maxexc: return rem(ret, nPlusOne)
          end
  end
END METHOD

Notes:

  1. To generate a random number that's either -1 or 1, the following idiom can be used: (RNDINT(1) * 2 - 1).
  2. To generate a random integer that's divisible by a positive integer (DIV), generate the integer with any method (such as RNDINT), let X be that integer, then generate X - rem(X, DIV) if X >= 0, or X - (DIV - rem(abs(X), DIV)) otherwise. (Depending on the method, the resulting integer may be out of range, in which case this procedure is to be repeated.)
  3. A random 2-dimensional point on an NxM grid can be expressed as a single integer as follows:
    • To generate a random NxM point P, generate P = RNDINT(N * M - 1) (P is thus in the interval [0, N * M)).
    • To convert a point P to its 2D coordinates, generate [rem(P, N), floor(P / N)]. (Each coordinate starts at 0.)
    • To convert 2D coordinates coord to an NxM point, generate P = coord[1] * N + coord[0].
  4. In functional programming languages such as Haskell, RNDINT(), as well as RNG() itself and other random-number-generating methods in this document, can be implemented by taking a seed as an additional parameter, and returning a list of two items — the random number and a new seed (as in the Haskell package AC-Random). This works only if the underlying RNG is deterministic.

RNDINTRANGE: Random Integers in [N, M]

The naïve way of generating a random integer in the interval [minInclusive, maxInclusive], shown below, works well for unsigned integers and arbitrary-precision integers.

 METHOD RNDINTRANGE(minInclusive, maxInclusive)
   // minInclusive must not be greater than maxInclusive
   if minInclusive > maxInclusive: return error
   return minInclusive + RNDINT(maxInclusive - minInclusive)
 END METHOD

The naïve approach won't work as well, though, for signed integer formats if the difference between maxInclusive and minInclusive exceeds the highest possible integer for the format. For fixed-length signed integer formats (3), such random integers can be generated using the following pseudocode. In the pseudocode below, INT_MAX is the highest possible integer in the integer format.

METHOD RNDINTRANGE(minInclusive, maxInclusive)
   // minInclusive must not be greater than maxInclusive
   if minInclusive > maxInclusive: return error
   if minInclusive == maxInclusive: return minInclusive
   if minInclusive==0: return RNDINT(maxInclusive)
   // Difference does not exceed maxInclusive
   if minInclusive > 0 or minInclusive + INT_MAX >= maxInclusive
       return minInclusive + RNDINT(maxInclusive - minInclusive)
   end
   while true
     ret = RNDINT(INT_MAX)
     // NOTE: If the signed integer format uses two's-complement
     // form, use the following line:
     if RNDINT(1) == 0: ret = -1 - ret
     // NOTE: If the signed integer format has positive and negative
     // zero, as is the case, e.g., for Java `float` and
     // `double` and .NET's implementation of `System.Decimal`,
     // use the following three lines instead of the preceding line;
     // here, zero will be rejected at a 50% chance because zero occurs
     // twice in both forms.
     // negative = RNDINT(1) == 0
     // if negative: ret = 0 - ret
     // if negative and ret == 0: continue
     if ret >= minInclusive and ret <= maxInclusive: return ret
   end
END METHOD

Examples:

  1. To simulate rolling an N-sided die (N greater than 1), generate a random number in the interval [1, N] by RNDINTRANGE(1, N).
  2. Generating a random integer with one base-10 digit is equivalent to generating RNDINTRANGE(0, 9).
  3. Generating a random integer with N base-10 digits (where N is 2 or greater), where the first digit can't be 0, is equivalent to generating RNDINTRANGE(pow(10, N-1), pow(10, N) - 1).
  4. Pseudocode like the following can be used to choose a random date-and-time bounded by two others (date1, date2). In the following pseudocode, DATETIME_TO_NUMBER and NUMBER_TO_DATETIME convert a date-and-time to or from a number, respectively, at the required granularity, for instance, month, day, or hour granularity (the details of such conversion depend on the date-and-time format and are outside the scope of this document).

    dtnum1 = DATETIME_TO_NUMBER(date1)
    dtnum2 = DATETIME_TO_NUMBER(date2)
    // Choose a random date-and-time
    // in [dtnum1, dtnum2].  Any other
    // random selection strategy can be
    // used here.
    num = RNDINTRANGE(date1, date2)
    result = NUMBER_TO_DATETIME(num)
    

RNDU01, RNDU01OneExc, RNDU01ZeroExc, and RNDU01ZeroOneExc: Random Numbers Bounded by 0 and 1

This section defines four methods that generate a random number bounded by 0 and 1. For each method below, the alternatives are ordered from most preferred to least preferred, and X and INVX are defined later.

Definitions of Constants

X is the highest integer p such that all multiples of 1/p in the interval [0, 1] are representable in the number format in question. For example—

INVX is the constant 1 divided by X.

Alternative Implementation for RNDU01

For Java's double and float (or generally, any fixed-precision binary floating-point format with fixed exponent range), the following pseudocode for RNDU01() can be used instead. See also (Downey 2007)(4). In the pseudocode below, SIGBITS is the binary floating-point format's precision (the number of binary digits the format can represent without loss; e.g., 53 for Java's double).

METHOD RNDU01()
    e=-SIGBITS
    while true
        if RNDINT(1)==0: e = e - 1
      else: break
    end
    sig = RNDINT((1 << (SIGBITS - 1)) - 1)
    if sig==0 and RNDINT(1)==0: e = e + 1
    sig = sig + (1 << (SIGBITS - 1))
    // NOTE: This multiplication should result in
    // a floating-point number; if `e` is sufficiently
    // small, the number might underflow to 0
    return sig * pow(2, e)
END METHOD

RNDINTEXC: Random Integers in [0, N)

RNDINTEXC(maxExclusive), which generates a random integer in the interval [0, maxExclusive), can be implemented as follows(5):

 METHOD RNDINTEXC(maxExclusive)
    if maxExclusive <= 0: return error
    return RNDINT(maxExclusive - 1)
 END METHOD

RNDINTEXC can also be implemented in terms of RNDU01OneExc() as follows (this can be better for some programming languages, such as JavaScript, but not for others that readily allow the approach above)(6):

 METHOD RNDINTEXC(maxExclusive)
   if maxExclusive <= 0: return error
   while true
     // The loop is needed because otherwise, rounding
     // error due to the nature of certain floating-point
     // formats can result in `maxExclusive` being returned
     // in rare cases.
     n = floor(RNDU01OneExc()*(maxExclusive));
     if n < maxExclusive: return n
   end
 END METHOD

Example: Generating a random number in the interval [mn, mx) in increments equal to step is equivalent to generating mn+step*RNDINTEXC(ceil((mx-mn)/(1.0*step)))

RNDINTEXCRANGE: Random Integers in [N, M)

RNDINTEXCRANGE returns a random integer in the interval [minInclusive, maxExclusive). It can be implemented using RNDINTRANGE, as the following pseudocode demonstrates.

METHOD RNDINTEXCRANGE(minInclusive, maxExclusive)
   if minInclusive >= maxExclusive: return error
   // NOTE: For signed integer formats, replace the following line
   // with "if minInclusive >=0 or minInclusive + INT_MAX >=
   // maxExclusive", where `INT_MAX` has the same meaning
   // as in the pseudocode for `RNDINTRANGE`.
   if minInclusive >=0
     return RNDINTRANGE(minInclusive, maxExclusive - 1)
   end
   while true
     ret = RNDINTRANGE(minInclusive, maxExclusive)
     if ret < maxExclusive: return ret
   end
END METHOD

RNDNUMRANGE: Random Numbers in [X, Y]

The naïve way of generating a random number in the interval [minInclusive, maxInclusive], is shown in the following pseudocode, which generally works well only if the number format can't be negative or that format uses arbitrary precision.

METHOD RNDNUMRANGE(minInclusive, maxInclusive)
    if minInclusive > maxInclusive: return error
    return minInclusive + (maxInclusive - minInclusive) * RNDU01()
END METHOD

For other number formats (including Java's double and float), the pseudocode above can overflow if the difference between maxInclusive and minInclusive exceeds the maximum possible value for the format. For such formats, the following pseudocode for RNDNUMRANGE() can be used instead. In the pseudocode below, NUM_MAX is the highest possible finite number for the number format. The pseudocode assumes that the highest possible value is positive and the lowest possible value is negative.

METHOD RNDNUMRANGE(minInclusive, maxInclusive)
   if minInclusive > maxInclusive: return error
   if minInclusive == maxInclusive: return minInclusive
   // usual: Difference does not exceed maxInclusive
   usual=minInclusive >= 0 or
       minInclusive + NUM_MAX >= maxInclusive
   rng=NUM_MAX
   if usual: rng = (maxInclusive - minInclusive)
   while true
     ret = rng * RNDU01()
     if usual: return minInclusive + ret
     // NOTE: If the number format has positive and negative
     // zero, as is the case for Java `float` and
     // `double` and .NET's implementation of `System.Decimal`,
     // for example, use the following:
     negative = RNDINT(1) == 0
     if negative: ret = 0 - ret
     if negative and ret == 0: continue
     // NOTE: For fixed-precision fixed-point numbers implemented
     // using two's complement numbers (note 1), use the following line
     // instead of the preceding three lines, where `QUANTUM` is the
     // smallest representable positive number in the fixed-point format:
     // if RNDINT(1) == 0: ret = (0 - QUANTUM) - ret
     if ret >= minInclusive and ret <= maxInclusive: return ret
   end
END METHOD

REMARK: Multiplying by RNDU01() in both cases above is not ideal, since doing so merely stretches that number to fit the range if the range is greater than 1. There may be more sophisticated ways to fill the gaps that result this way in RNDNUMRANGE.(7)

RNDNUMEXCRANGE: Random Numbers in [X, Y)

RNDNUMEXCRANGE returns a random number in the interval [minInclusive, maxExclusive). It can be implemented using RNDNUMRANGE, as the following pseudocode demonstrates.

METHOD RNDNUMEXCRANGE(minInclusive, maxExclusive)
   if minInclusive >= maxExclusive: return error
   while true
     ret = RNDNUMRANGE(minInclusive, maxExclusive)
     if ret < maxExclusive: return ret
   end
END METHOD

Uniform Random Bits

The idiom RNDINT((1 << b) - 1) is a naïve way of generating a uniform random b-bit integer (with maximum 2b - 1).

In practice, memory is usually divided into bytes, or 8-bit unsigned integers in the interval [0, 255]. In this case, a byte array or a block of memory can be filled with random bits by setting each byte to RNDINT(255). (There may be faster, RNG-specific ways to fill memory with random bytes, such as with RNGs that generate random numbers in parallel. These ways are not detailed in this document.)

Certain Programming Environments

For certain programming environments, there are special considerations:

Whenever possible, the methods in this document should be implemented in a more general-purpose programming language than query languages, shell scripts, and batch files, especially if information security is a goal.

Randomization Techniques

This section describes commonly used randomization techniques, such as shuffling, selection of several unique items, and creating random strings of text.

Boolean (True/False) Conditions

To generate a condition that is true at the specified probabilities, use the following idioms in an if condition:

Examples:

Random Sampling

This section contains ways to choose one or more items from among a collection of them, where each item has the same chance to be chosen as any other. This is called random sampling and can be done with replacement or without replacement.

Sampling With Replacement: Choosing a Random Item from a List

To choose a random item from a list—

Choosing an item this way is also known as sampling with replacement.

Sampling Without Replacement: Choosing Several Unique Items

There are several techniques (each a sampling without replacement) for choosing k unique items or values uniformly at random from among n available items or values, depending on such things as whether n is known and how big n and k are.

  1. If n is not known in advance: Use the reservoir sampling method; see the RandomKItemsFromFile method, in pseudocode given later.
  2. If n is relatively small (for example, if there are 200 available items, or there is a range of numbers from 0 to 200 to choose from): If items are to be chosen from a list in relative order, then the RandomKItemsInOrder method, in pseudocode given later, demonstrates a solution. Otherwise, one of the following will choose k items in random order:
    • Store all the items in a list, shuffle that list, then choose the first k items from that list.
    • If the items are already stored in a list and the list's order can be changed, then shuffle that list and choose the first k items from the shuffled list.
    • If the items are already stored in a list and the list's order can't be changed, then store the indices to those items in another list, shuffle the latter list, then choose the first k indices (or the items corresponding to those indices) from the latter list.
    • If k is much smaller than n, proceed as in item 3 instead.
  3. If k is much smaller than n: The first three cases below will choose k items in random order:
    • If the items are stored in a list whose order can be changed: Do a partial shuffle of that list, then choose the last k items from that list. A partial shuffle proceeds as given in the section "Shuffling", except the partial shuffle stops after k swaps have been made (where swapping one item with itself counts as a swap).
    • Otherwise, if the items are stored in a list and n is not very large (for example, less than 5000): Store the indices to those items in another list, do a partial shuffle of the latter list, then choose the last k indices (or the items corresponding to those indices) from the latter list.
    • Otherwise, if n is not very large: Store all the items in a list, do a partial shuffle of that list, then choose the last k items from that list.
    • Otherwise, see item 5.
  4. If n - k is much smaller than n and the sampled items need not be in random order: Proceed as in step 3, except the partial shuffle involves n - k swaps and the first k items are chosen rather than the last k.
  5. Otherwise (for example, if 32-bit or larger integers will be chosen so that n is 232, or if n is otherwise very large): Create a data structure to store the indices to items already chosen. When a new index to an item is randomly chosen, add it to the data structure if it's not already there, or if it is, choose a new random index. Repeat this process until k indices were added to the data structure this way. Examples of suitable data structures are—

    • a hash table,
    • a compressed bit set (e.g, "roaring bitmap", EWAH), and
    • a self-sorting data structure such as a red–black tree, if the random items are to be retrieved in sorted order or in index order.

    An alternative approach is to use a linear congruential generator with full period and with modulus greater than n. When using this approach, generate a seed with RNDINTRANGE(1, n - 1), initialize the generator with the seed, then take the first k integers less than or equal to n with that generator, then subtract 1 from each such integer. These will be the randomly sampled indices (starting at 0) to items in the list.(9)

Shuffling

The Fisher–Yates shuffle method shuffles a list (puts its items in a random order) such that all permutations (arrangements) of that list are equally likely to occur, assuming the RNG it uses can choose any one of those permutations. However, that method is also easy to write incorrectly — see also (Atwood 2007)(10). The following pseudocode is designed to shuffle a list's contents.

METHOD Shuffle(list)
   // NOTE: Check size of the list early to prevent
   // `i` from being less than 0 if the list's size is 0 and
   // `i` is implemented using an unsigned type available
   // in certain programming languages.
   if size(list) >= 2
      // Set i to the last item's index
      i = size(list) - 1
      while i > 0
         // Choose an item ranging from the first item
         // up to the item given in `i`. Note that the item
         // at i+1 is excluded.
         k = RNDINTEXC(i + 1)
         // The following is wrong since it introduces biases:
         // "k = RNDINTEXC(size(list))"
         // The following is wrong since the algorithm won't
         // choose from among all possible permutations:
         // "k = RNDINTEXC(i)"
         // Swap item at index i with item at index k;
         // in this case, i and k may be the same
         tmp = list[i]
         list[i] = list[k]
         list[k] = tmp
         // Move i so it points to the previous item
         i = i - 1
      end
   end
   // NOTE: An implementation can return the
   // shuffled list, as is done here, but this is not required.
   return list
END METHOD

An important consideration with respect to shuffling is the nature of the underlying RNG, as I discuss in further detail in my RNG recommendation document on shuffling.(11)

Random Character Strings

To generate a random string of characters:

  1. Generate a list of the letters, digits, and/or other characters the string can have. Examples are given later in this section.
  2. Build a new string whose characters are chosen from that character list. The pseudocode below demonstrates this by creating a list, rather than a string, where the random characters will be held. It also takes the number of characters as a parameter named stringSize. (How to convert this list to a text string depends on the programming language and is outside the scope of this page.)

 

METHOD RandomString(characterList, stringSize)
  i = 0
  newString = NewList()
  while i < stringSize
    // Choose a character from the list
    randomChar = characterList[RNDINTEXC(size(characterList))]
    // Add the character to the string
    AddItem(newString, randomChar)
    i = i + 1
  end
  return newString
END METHOD

The following are examples of character lists:

  1. For an alphanumeric string, or string of letters and digits, the characters can be the basic digits "0" to "9" (U+0030-U+0039, nos. 48-57), the basic upper case letters "A" to "Z" (U+0041-U+005A, nos. 65-90), and the basic lower case letters "a" to "z" (U+0061-U+007A, nos. 96-122), as given in the Unicode Standard.
  2. For a base-10 digit string, the characters can be the basic digits only.
  3. For a base-16 digit (hexadecimal) string, the characters can be the basic digits as well as the basic letters "A" to "F" or "a" to "f" (not both).

Notes:

  1. If the list of characters is fixed, the list can be created in advance at runtime or compile time, or (if every character takes up the same number of code units) a string type as provided in the programming language can be used to store the list as a string.
  2. Unique random strings: Often applications need to generate a string of characters that's not only random, but also unique. This can be done by storing a list (such as a hash table) of strings already generated and checking newly generated strings against that list.(12)
  3. Word generation: This technique could also be used to generate "pronounceable" words, but this is less flexible than other approaches; see also "Weighted Choice With Replacement".

Pseudocode for Random Sampling

The following pseudocode implements two methods:

  1. RandomKItemsFromFile implements reservoir sampling; it chooses up to k random items from a file of indefinite size (file). Although the pseudocode refers to files and lines, the technique applies to any situation when items are retrieved one at a time from a data set or list whose size is not known in advance. See the comments to find out how RandomKItemsFromFile can be used to choose an item at random only if it meets certain criteria (see "Rejection Sampling" for example criteria).
  2. RandomKItemsInOrder returns a list of up to k random items from the given list (list), in the order in which they appeared in the list. It is based on a technique presented in Devroye 1986, p. 620.

 

METHOD RandomKItemsFromFile(file, k)
  list = NewList()
  j = 0
  index = 0
  while true
    // Get the next line from the file
    item = GetNextLine(file)
    thisIndex = index
    index = index + 1
    // If the end of the file was reached, break
    if item == nothing: break
    // NOTE 1: The following line is OPTIONAL
    // and can be used to choose only random lines
    // in the file that meet certain criteria,
    // expressed as MEETS_CRITERIA below.
    // ------
    // if not MEETS_CRITERIA(item): continue
    // ------
    if j < k // phase 1 (fewer than k items)
      AddItem(list, item)
      // NOTE 2: To add the line number (starting at
      // 0) rather than the item, use the following
      // line instead of the previous one:
      // AddItem(list, thisIndex)
      j = j + 1
    else // phase 2
      j = RNDINT(thisIndex)
      if j < k: list[j] = item
      // NOTE 3: To add the line number (starting at
      // 0) rather than the item, use the following
      // line instead of the previous one:
      // if j < k: list[j] = thisIndex
    end
  end
  // NOTE 4: We shuffle at the end in case k or
  // fewer lines were in the file, since in that
  // case the items would appear in the same
  // order as they appeared in the file
  // if the list weren't shuffled.  This line
  // can be removed, however, if the items
  // in the returned list need not appear
  // in random order.
  if size(list)>=2: Shuffle(list)
  return list
end

METHOD RandomKItemsInOrder(list, k)
  i = 0
  kk = k
  ret = NewList()
  n = size(list)
  while i < n and size(ret) < k
    u = RNDINTEXC(n - i)
    if u <= kk
      AddItem(ret, list[i])
      kk = kk - 1
    end
    i = i + 1
  end
  return ret
END METHOD

Examples:

  1. Assume a file (file) has the lines "f", "o", "o", "d", in that order. If we modify RandomKItemsFromFile as given in notes 2 and 3 there, and treat MEETS_CRITERIA(item) above as item == "o" (in note 1 of that method), then we can choose a random line number of an "o" line by RandomKItemsFromFile(file, 1).
  2. Removing k random items from a list of n items (list) is equivalent to generating a new list by RandomKItemsInOrder(list, n - k).
  3. Filtering: If an application needs to sample the same list (with or without replacement) repeatedly, but only from among a selection of that list's items, it can create a list of items it wants to sample from (or a list of indices to those items), and sample from the new list instead.(13) This won't work well, though, for lists of indefinite or very large size.

Rejection Sampling

Rejection sampling is a simple and flexible approach for generating random content that meets certain requirements. To implement rejection sampling:

  1. Generate the random content (such as a random number) by any method and with any distribution and range.
  2. If the content doesn't meet predetermined criteria, go to step 1.

Example criteria include checking—

(KD-trees, hash tables, red-black trees, prime-number testing algorithms, and regular expressions are outside the scope of this document.)

Generating Random Numbers in Sorted Order

The following pseudocode describes a method that generates random numbers in the interval [0, 1] in descending order; see (Bentley and Saxe 1980)(14). count is the number of random numbers to generate this way.

 METHOD SortedRandom(count)
    list = NewList()
    k = count
    c = 1.0
    while k > 0
        c = pow(RNDU01(), 1.0 / k) * c
        AddItem(list, c)
        k = k - 1
    end
    return list
 END METHOD

Alternatively, random numbers can be generated (using any method and where the numbers have any distribution and range) and stored in a list, and the list then sorted using a sorting algorithm. Details on sorting algorithms, however, are beyond the scope of this document.

Random Walks

A random walk is a process with random behavior over time. A simple form of random walk involves generating a random number that changes the state of the walk. The pseudocode below generates a random walk of n random numbers, where STATEJUMP() is the next number to add to the current state (see examples later in this section).

METHOD RandomWalk(n)
  // Create a new list with an initial state
  list=[0]
  // Add 'n' new numbers to the list.
  for i in 0...n: AddItem(list, list[i] + STATEJUMP())
  return list
END METHOD

Examples:

  1. If STATEJUMP() is RNDINT(1) * 2 - 1, the random walk generates numbers that differ by -1 or 1, chosen at random.
  2. If STATEJUMP() is RNDNUMRANGE(-1, 1), the random state is advanced by a random real number in the interval [-1, 1].
  3. If STATEJUMP() is Binomial(1, p), the random walk models a binomial process, where the state is advanced with probability p.
  4. If STATEJUMP() is Binomial(1, p) * 2 - 1, the random walk generates numbers that each differ from the last by -1 or 1 depending on the probability p.

Notes:

  1. A random process can also be simulated by creating a list of random numbers generated the same way. Such a process generally models behavior over time that does not depend on the time or the current state. Examples of this include Normal(0, 1) (for modeling Gaussian white noise) and Binomial(1, p) (for modeling a Bernoulli process, where each number is 0 or 1 depending on the probability p).
  2. Some random walks model random behavior at every moment, not just at discrete times. One example is a Wiener process, with random states and jumps that are normally distributed (a process of this kind is also known as Brownian motion). (For a random walk that follows a Wiener process, STATEJUMP() is Normal(mu * timediff, sigma * sqrt(timediff)), where mu is the average value per time unit, sigma is the volatility, and timediff is the time difference between samples.)
  3. Some random walks model state changes happening at random times. One example is a Poisson process, in which the time between each event is a random exponential variable (the random time is -ln(RNDU01ZeroOneExc()) / rate, where rate is the average number of events per time unit; an inhomogeneous Poisson process results if rate can vary with the "timestamp" before each event jump).

Low-Discrepancy Sequences

A low-discrepancy sequence (or quasirandom sequence) is a sequence of numbers that follow a uniform distribution, but are less likely to form "clumps" than independent uniform random numbers are. The following are examples:

 

METHOD MLCG(seed) // m = 262139
  if seed<=0: return error
  return rem(92717*seed,262139)/262139.0
END METHOD

In most cases, RNGs can be used to generate a "seed" to start the low-discrepancy sequence at.

Randomization in Simulations

Simulation testing uses shuffling and bootstrapping to help draw conclusions on data through randomization.

After creating the simulated data sets, one or more statistics, such as the mean, are calculated for each simulated data set as well as the original data set, then the statistics for the simulated data sets are compared with those of the original (such comparisons are outside the scope of this document).

Randomization also occurs in numerical calculation.

General Non-Uniform Distributions

Some applications need to choose random items or numbers such that some of them are more likely to be chosen than others (a non-uniform distribution). Most of the techniques in this section show how to use the uniform random number methods to generate such random items or numbers.

Weighted Choice

The weighted choice method generates a random item or number from among a collection of them with separate probabilities of each item or number being chosen. There are several kinds of weighted choice.

Weighted Choice With Replacement

The first kind is called weighted choice with replacement (which can be thought of as drawing a ball, then putting it back), where the probability of choosing each item doesn't change as items are chosen.

The following pseudocode implements a method WeightedChoice that takes a single list weights of weights (numbers 0 or greater), and returns the index of a weight from that list. The greater the weight, the more likely its index will be chosen. (Note that there are two possible ways to generate the random number depending on whether the weights are all integers or can be fractional numbers.)

METHOD WeightedChoice(weights)
    if size(weights) == 0: return error
    sum = 0
    // Get the sum of all weights
    i = 0
    while i < size(weights)
        sum = sum + weights[i]
        i = i + 1
    end
    // Choose a random integer/number from 0 to less than
    // the sum of weights.
    value = RNDINTEXC(sum)
    // NOTE: If the weights can be fractional numbers,
    // use this instead:
    // value = RNDNUMEXCRANGE(0, sum)
    // Choose the object according to the given value
    i = 0
    lastItem = size(weights) - 1
    runningValue = 0
    while i < size(weights)
       if weights[i] > 0
          newValue = runningValue + weights[i]
          // NOTE: Includes start, excludes end
          if value < newValue: return i
          runningValue = newValue
          lastItem = i
       end
       i = i + 1
    end
    // Last resort (might happen because rounding
    // error happened somehow)
    return lastItem
END METHOD

Examples:

  1. Assume we have the following list: ["apples", "oranges", "bananas", "grapes"], and weights is the following: [3, 15, 1, 2]. The weight for "apples" is 3, and the weight for "oranges" is 15. Since "oranges" has a higher weight than "apples", the index for "oranges" (1) is more likely to be chosen than the index for "apples" (0) with the WeightedChoice method. The following pseudocode implements how to get a randomly chosen item from the list with that method.

    index = WeightedChoice(weights) // Get the actual item item = list[index]

  2. Assume the weights from example 1 are used and the list contains ranges of numbers instead of strings: [[0, 5], [5, 10], [10, 11], [11, 13]]. If a random range is chosen, a random number can be chosen from that range using code like the following: number = RNDNUMEXCRANGE(item[0], item[1]). (See also "Mixtures of Distributions".)

  3. Assume the weights from example 1 are used and the list contains the following: [0, 5, 10, 11, 13] (one more item than the weights). This expresses four ranges, the same as in example 2. After a random index is chosen with index = WeightedChoice(weights), a random number can be chosen from the corresponding range using code like the following: number = RNDNUMEXCRANGE(list[index], list[index + 1]). (This is how the C++ library expresses a piecewise constant distribution.)
  4. A Markov chain models one or more states (for example, individual letters or syllables), and stores the probabilities to transition from one state to another (e.g., "b" to "e" with a probability of 0.2, or "b" to "b" with a probability of 0.01). Thus, each state can be seen as having its own list of weights for each relevant state transition. For example, a Markov chain for generating "pronounceable" words, or words similar to natural-language words, can include "start" and "stop" states for the start and end of the word, respectively.

Weighted Choice Without Replacement (Multiple Copies)

To implement weighted choice without replacement (which can be thought of as drawing a ball without putting it back), generate an index by WeightedChoice, and then decrease the weight for the chosen index by 1. In this way, each weight behaves like the number of "copies" of each item. This technique, though, will only work properly if all the weights are integers 0 or greater. The pseudocode below is an example of this.

// Get the sum of weights
// (NOTE: This code assumes that `weights` is
// a list that can be modified.  If the original weights
// are needed for something else, a copy of that
// list should be made first, but the copying process
// is not shown here.  This code also assumes that `list`,
// a list of items, was already declared earlier and
// has at least as many items as `weights`.)
totalWeight = 0
i = 0
while i < size(weights)
    totalWeight = totalWeight + weights[i]
    i = i + 1
end
// Choose as many items as the sum of weights
i = 0
items = NewList()
while i < totalWeight
    index = WeightedChoice(weights)
    // Decrease weight by 1 to implement selection
    // without replacement.
    weights[index] = weights[index] - 1
    AddItem(items, list[index])
    i = i + 1
end

Alternatively, if all the weights are integers 0 or greater and their sum is relatively small, create a list with as many copies of each item as its weight, then shuffle that list. The resulting list will be ordered in a way that corresponds to a weighted random choice without replacement.

Note: The weighted sampling described in this section can be useful to some applications (particularly some games) that wish to control which random numbers appear, to make the random outcomes appear fairer to users (e.g., to avoid long streaks of good outcomes or of bad outcomes). When used for this purpose, each item represents a different outcome (e.g., "good" or "bad"), and the lists are replenished once no further items can be chosen. However, this kind of sampling should not be used for this purpose whenever information security (ISO/IEC 27000) is involved, including when predicting future random numbers would give a player or user a significant and unfair advantage.

Weighted Choice Without Replacement (Single Copies)

Weighted choice can also choose items from a list, where each item has a separate probability of being chosen and can be chosen no more than once. In this case, after choosing a random index, set the weight for that index to 0 to keep it from being chosen again. The pseudocode below is an example of this.

// (NOTE: This code assumes that `weights` is
// a list that can be modified.  If the original weights
// are needed for something else, a copy of that
// list should be made first, but the copying process
// is not shown here.  This code also assumes that `list`,
// a list of items, was already declared earlier and
// has at least as many items as `weights`.)
chosenItems = NewList()
i = 0
// Choose k items from the list
while i < k or i < size(weights)
    index = WeightedChoice(weights)
    // Set the weight for the chosen index to 0
    // so it won't be chosen again
    weights[index] = 0
    // Add the item at the chosen index
    AddItem(chosenItems, list[index])
end
// `chosenItems` now contains the items chosen

The technique presented here can solve the problem of sorting a list of items such that higher-weighted items are more likely to appear first.

Weighted Choice Without Replacement (Indefinite-Size List)

If the number of items in a list is not known in advance, then the following pseudocode implements a RandomKItemsFromFileWeighted that selects up to k random items from a file (file) of indefinite size (similarly to RandomKItemsFromFile). See (Efraimidis and Spirakis 2005)(17), and see also (Efraimidis 2015)(18). In the pseudocode below, WEIGHT_OF_ITEM(item, thisIndex) is an arbitrary function that calculates the weight of an individual item based on its value and its index (starting at 0); the item is ignored if its weight is 0 or less.

METHOD RandomKItemsFromFileWeighted(file, k)
  list = NewList()
  j = 0
  index = 0
  skIndex = 0
  smallestKey = 0
  t = 0
  while true
    // Get the next line from the file
    item = GetNextLine(file)
    thisIndex = index
    index = index + 1
    // If the end of the file was reached, break
    if item == nothing: break
    weight = WEIGHT_OF_ITEM(item, thisIndex)
    // Ignore if item's weight is 0 or less
    if weight <= 0: continue
    key = pow(RNDU01(),1.0/weight)
    t = smallestKey
    if index == 0 or key < smallestKey
      skIndex = index
      smallestKey = key
    end
    if j < k // phase 1 (fewer than k items)
      AddItem(list, item)
      // To add the line number (starting at
      // 0) rather than the item, use the following
      // line instead of the previous one:
      // AddItem(list, thisIndex)
      j = j + 1
    else // phase 2
      if t < key: list[skIndex] = item
      // To add the line number (starting at
      // 0) rather than the item, use the following
      // line instead of the previous one:
      // if t < key: list[skIndex] = thisIndex
    end
  end
  // Optional shuffling here.
  // See NOTE 4 in RandomKItemsFromFile code.
  if size(list)>=2: Shuffle(list)
  return list
end

Note: As (Efraimidis 2015)(18) points out, weighted choice with replacement on an indefinite-length data set is equivalent to doing one or more concurrent runs of weighted choice without replacement on that data set with k = 1. (In the algorithm above, each run starts at the same position in the data set, but file is treated as a view of that data set that traverses that data set independently of any other view.)

Continuous Weighted Choice

The continuous weighted choice method generates a random number that follows a continuous probability distribution (here, a piecewise linear distribution).

The pseudocode below takes two lists as follows:

 

METHOD ContinuousWeightedChoice(list, weights)
    if size(list) <= 0 or size(weights) < size(list): return error
    if size(list) == 1: return list[0]
    // Get the sum of all areas between weights
    sum = 0
    areas = NewList()
    i = 0
    while i < size(list) - 1
      weightArea = abs((weights[i] + weights[i + 1]) * 0.5 *
            (list[i + 1] - list[i]))
      AddItem(areas, weightArea)
      sum = sum + weightArea
       i = i + 1
    end
    // Choose a random number
    value = RNDNUMEXCRANGE(0, sum)
    // Interpolate a number according to the given value
    i=0
    // Get the number corresponding to the random number
    runningValue = 0
    while i < size(list) - 1
     area = areas[i]
     if area > 0
      newValue = runningValue + area
      // NOTE: Includes start, excludes end
      if value < newValue
       // NOTE: The following line can also read
       // "interp = RNDU01OneExc()", that is, a new number is generated
       // within the chosen area rather than using the point
       // already generated.
       interp = (value - runningValue) / (newValue - runningValue)
       retValue = list[i] + (list[i + 1] - list[i]) * interp
       return retValue
      end
      runningValue = newValue
     end
     i = i + 1
    end
    // Last resort (might happen because rounding
    // error happened somehow)
    return list[size(list) - 1]
END METHOD

Example: Assume list is the following: [0, 1, 2, 2.5, 3], and weights is the following: [0.2, 0.8, 0.5, 0.3, 0.1]. The weight for 2 is 0.5, and that for 2.5 is 0.3. Since 2 has a higher weight than 2.5, numbers near 2 are more likely to be chosen than numbers near 2.5 with the ContinuousWeightedChoice method.

Mixtures of Distributions

A mixture consists of two or more probability distributions with separate probabilities of being sampled. To generate random content from a mixture—

  1. generate index = WeightedChoice(weights), where weights is a list of relative probabilities that each distribution in the mixture will be sampled, then
  2. based on the value of index, generate the random content from the corresponding distribution.

Examples:

  1. One mixture consists of two normal distributions with two different means: 1 and -1, but the mean 1 normal will be sampled 80% of the time. The following pseudocode shows how this mixture can be sampled:

    index = WeightedChoice([80, 20])
    number = 0
    // If index 0 was chosen, sample from the mean 1 normal
    if index==0: number = Normal(1, 1)
    // Else index 1 was chosen, so sample from the mean -1 normal
    else: number = Normal(-1, 1)
    
  2. A hyperexponential distribution is a mixture of exponential distributions, each one with a separate weight and separate rate. An example is below.

    index = WeightedChoice([0.6, 0.3, 0.1])
    // Rates of the three exponential distributions
    rates = [0.3, 0.1, 0.05]
    // Generate an exponential random number with chosen rate
    number = -ln(RNDU01ZeroOneExc()) / rates[index]
    
  3. Choosing a point uniformly at random from a complex shape (in any number of dimensions) is equivalent to sampling uniformly from a mixture of simpler shapes that make up the complex shape (here, the weights list holds the content of each simpler shape). (Content is called area in 2D and volume in 3D.) For example, a simple closed 2D polygon can be triangulated, or decomposed into triangles, and a mixture of those triangles can be sampled.(19)

  4. For generating a random integer from multiple nonoverlapping ranges of integers—

    • each range has a weight of (mx - mn) + 1, where mn is that range's minimum and mx is its maximum, and
    • the chosen range is sampled by generating RNDINTRANGE(mn, mx), where mn is the that range's minimum and mx is its maximum.

    For generating random numbers, that may or may not be integers, from nonoverlapping number ranges, each weight is mx - mn instead and the number is sampled by RNDNUMEXCRANGE(mn, mx) instead.

Transformations of Random Numbers

Random numbers can be generated by combining and/or transforming one or more random numbers and/or discarding some of them.

As an example, "Probability and Games: Damage Rolls" by Red Blob Games includes interactive graphics showing score distributions for lowest-of, highest-of, drop-the-lowest, and reroll game mechanics.(20) These and similar distributions can be generalized as follows.

Generate two or more random numbers, each with a separate probability distribution, then:

  1. Highest-of: Choose the highest generated number.
  2. Drop-the-lowest: Add all generated numbers except the lowest.
  3. Reroll-the-lowest: Add all generated numbers except the lowest, then add a number generated randomly by a separate probability distribution.
  4. Lowest-of: Choose the lowest generated number.
  5. Drop-the-highest: Add all generated numbers except the highest.
  6. Reroll-the-highest: Add all generated numbers except the highest, then add a number generated randomly by a separate probability distribution.
  7. Sum: Add all generated numbers.
  8. Mean: Find the mean of all generated numbers (see the appendix).
  9. Geometric transformation: Treat the numbers as an n dimensional point, then apply a geometric transformation, such as a rotation or other affine transformation(21), to that point.

If the probability distributions are the same, then strategies 1 to 3 make higher numbers more likely, and strategies 4 to 6, lower numbers.

Notes:

  1. Variants of strategy 4 — e.g., choosing the second-, third-, or nth-lowest number — are formally called second-, third-, or nth-order statistics distributions, respectively.
  2. As an extension of strategy 9 (geometric transformations), a random point (x, y) can be rotated to derive a point with correlated random coordinates (old x, new x) as follows (see (Saucier 2000)(22), sec. 3.8): [x, y*sqrt(1 - rho * rho) + rho * x], where x and y are independent random numbers from the same distribution, and rho is a correlation coefficient in the interval [-1, 1] (if rho is 0, the variables are uncorrelated).

Examples:

  1. The idiom min(RNDINTRANGE(1, 6), RNDINTRANGE(1, 6)) takes the lowest of two six-sided die results. Due to this approach, 1 is more likely to occur than 6.
  2. The idiom RNDINTRANGE(1, 6) + RNDINTRANGE(1, 6) takes the result of two six-sided dice (see also "Dice").
  3. Sampling a Bates distribution involves sampling n random numbers by RNDNUMRANGE(minimum, maximum), then finding the mean of those numbers (see the appendix).
  4. A compound Poisson distribution models the sum of n random numbers each generated the same way, where n follows a Poisson distribution (e.g., n = Poisson(10) for an average of 10 numbers).
  5. A hypoexponential distribution models the sum of n random numbers following an exponential distribution, each with a separate lamda parameter (see "Gamma Distribution").

Random Numbers from a Distribution of Data Points

Generating random numbers (or data points) based on how a list of numbers (or data points) is distributed involves a family of techniques called density estimation, which include histograms, kernel density estimation, and Gaussian mixture models. These techniques seek to model the distribution of data points in a given data set, where areas with more points are more likely to be sampled.

  1. Histograms are sets of one or more bins, which are generally of equal size. Histograms are mixtures, where each bin's weight is the number of data points in that bin. After a bin is randomly chosen, a random data point that could fit in that bin is generated (that point need not be an existing data point).
  2. Gaussian mixture models are also mixtures, in this case, mixtures of one or more Gaussian (normal) distributions.
  3. Kernel distributions are mixtures of sampling distributions, one for each data point. Estimating a kernel distribution is called kernel density estimation. To sample from a kernel distribution:
    1. Choose one of the numbers or points in the list at random with replacement.
    2. Add a randomized "jitter" to the chosen number or point; for example, add a separately generated Normal(0, sigma) to the chosen number or each component of the chosen point, where sigma is the bandwidth(23).

This document doesn't detail how to build a density estimation model.(24)

Another way to generate random numbers based on a distribution of data points is known as stochastic interpolation, described in (Saucier 2000)(22), sec. 5.3.4.

Random Numbers from an Arbitrary Distribution

Many probability distributions can be defined in terms of any of the following:

If a probability distribution's PDF is known, one of the following techniques, among others, can be used to generate random numbers that follow that distribution.

  1. Use the PDF to calculate the weights for a number of sample points (usually regularly spaced). Create one list with the sampled points in ascending order (the list) and another list of the same size with the PDF's values at those points (the weights). Finally generate ContinuousWeightedChoice(list, weights) to generate a random number bounded by the lowest and highest sampled point. This technique can be used even if the area under the PDF isn't 1. OR
  2. Use rejection sampling. Choose the lowest and highest random number to generate (minValue and maxValue, respectively) and find the maximum value of the PDF at or between those points (maxDensity). The rejection sampling approach is then illustrated with the following pseudocode, where PDF(X) is the distribution's PDF (see also Saucier 2000, p. 39). This technique can be used even if the area under the PDF isn't 1.

    METHOD ArbitraryDist(minValue, maxValue, maxDensity)
         if minValue >= maxValue: return error
         while True
             x=RNDNUMEXCRANGE(minValue, maxValue)
             y=RNDNUMEXCRANGE(0, maxDensity)
             if y < PDF(x): return x
         end
    END METHOD
    

If both a PDF and a uniform random variable in the interval [0, 1) (randomVariable) are given, then the following technique, among other possible techniques, can be used: Create list and weights as given in method 1, then divide each item in weights by the sum of weights's items, then generate ContinuousWeightedChoice(list, weights) (except that method is modified to use value = randomVariable rather than value = RNDNUMEXCRANGE(0, sum)).

If the distribution's CDF is known, an approach called inverse transform sampling can be used: Generate ICDF(RNDU01ZeroOneExc()), where ICDF(X) is the distribution's inverse CDF. The Python sample code includes from_interp and numbers_from_cdf methods that implement this approach numerically.

Note: Further details on inverse transform sampling or on how to find inverses, as well as lists of PDFs and CDFs, are outside the scope of this page.

Censored and Truncated Distributions

To sample from a censored probability distribution, generate a random number from that distribution and—

To sample from a truncated probability distribution, generate a random number from that distribution and, if that number is less than a minimum threshold and/or higher than a maximum threshold, repeat this process.

Specific Non-Uniform Distributions

This section contains information on some of the most common non-uniform probability distributions.

Dice

The following method generates a random result of rolling virtual dice.(26) It takes three parameters: the number of dice (dice), the number of sides in each die (sides), and a number to add to the result (bonus) (which can be negative, but the result of the subtraction is 0 if that result is greater).

METHOD DiceRoll(dice, sides, bonus)
    if dice < 0 or sides < 1: return error
    if dice == 0: return 0
    if sides == 1: return dice
    ret = 0
    if dice > 50
        // If there are many dice to roll,
        // use a faster approach, noting that
        // the dice-roll distribution approaches
        // a "discrete" normal distribution as the
        // number of dice increases.
        mean = dice * (sides + 1) * 0.5
        sigma = sqrt(dice * (sides * sides - 1) / 12)
        ret = -1
        while ret < dice or ret > dice * sides
            ret = round(Normal(mean, sigma))
        end
    else
         i = 0
         while i < dice
              ret = ret + RNDINTRANGE(1, sides)
              i = i + 1
          end
    end
    ret = ret + bonus
    if ret < 0: ret = 0
    return ret
END METHOD

Examples: The result of rolling—

Normal (Gaussian) Distribution

The normal distribution (also called the Gaussian distribution) can be implemented using the pseudocode below, which uses the polar method (27) to generate two normally-distributed random numbers:

 

METHOD Normal2(mu, sigma)
  while true
    a = RNDU01()
    b = RNDU01()
    if a != 0 and RNDINT(1) == 0: a = 0 - a
    if b != 0 and RNDINT(1) == 0: b = 0 - b
    c = a * a + b * b
    if c != 0 and c <= 1
       c = sqrt(-2 * ln(c) / c)
       return [a * mu * c + sigma, b * mu * c + sigma]
    end
  end
END METHOD

Since Normal2 returns two numbers instead of one, but many applications require only one number at a time, a problem arises on how to return one number while storing the other for later retrieval. Ways to solve this problem are outside the scope of this page, however. The name Normal will be used in this document to represent a method that returns only one normally-distributed random number rather than two.

Alternatively, or in addition, the following method (implementing a ratio-of-uniforms technique) can be used to generate normally distributed random numbers.

METHOD Normal(mu, sigma)
    bmp = sqrt(2.0/exp(1.0)) // about 0.8577638849607068
    while true
        a=RNDU01ZeroExc()
        b=RNDNUMRANGE(-bmp,bmp)
        if b*b <= -a * a * 4 * ln(a)
            return (b * sigma / a) + mu
        end
    end
END METHOD

Notes:

Binomial Distribution

A random integer that follows a binomial distribution

 

METHOD Binomial(trials, p)
    if trials < 0: return error
    if trials == 0: return 0
    // Always succeeds
    if p >= 1.0: return trials
    // Always fails
    if p <= 0.0: return 0
    i = 0
    count = 0
    // Suggested by Saucier, R. in "Computer
    // generation of probability distributions", 2000, p. 49
    tp = trials * p
    if tp > 25 or (tp > 5 and p > 0.1 and p < 0.9)
         countval = -1
         while countval < 0 or countval > trials
              countval = round(Normal(tp, tp))
         end
         return countval
    end
    if p == 0.5
        while i < trials
            if RNDINT(1) == 0
                // Success
                count = count + 1
            end
            i = i + 1
        end
    else
        while i < trials
            if RNDU01OneExc() < p
                // Success
                count = count + 1
            end
            i = i + 1
        end
    end
    return count
END METHOD

Examples:

Poisson Distribution

The following method generates a random integer that follows a Poisson distribution and is based on Knuth's method from 1969. In the method—

 

METHOD Poisson(mean)
    if mean < 0: return error
    if mean == 0: return 0
    p = 1.0
    // Suggested by Saucier, R. in "Computer
    // generation of probability distributions", 2000, p. 49
    if mean > 9
        p = -1.0
        while p < 0: p = round(Normal(mean, mean))
        return p
    end
    pn = exp(-mean)
    count = 0
    while true
        p = p * RNDU01OneExc()
        if p <= pn: return count
        count = count + 1
    end
END METHOD

Gamma Distribution

The following method generates a random number that follows a gamma distribution and is based on Marsaglia and Tsang's method from 2000. Usually, the number expresses either—

Here, meanLifetime must be an integer or noninteger greater than 0, and scale is a scaling parameter that is greater than 0, but usually 1.

METHOD GammaDist(meanLifetime, scale)
    // Needs to be greater than 0
    if meanLifetime <= 0 or scale <= 0: return error
    // Exponential distribution special case if
    // `meanLifetime` is 1 (see also Devroye 1986, p. 405)
    if meanLifetime == 1: return -ln(RNDU01ZeroOneExc()) * scale
    d = meanLifetime
    v = 0
    if meanLifetime < 1: d = d + 1
    d = d - (1.0 / 3) // NOTE: 1.0 / 3 must be a fractional number
    c = 1.0 / sqrt(9 * d)
    while true
        x = 0
        while true
           x = Normal(0, 1)
           v = c * x + 1;
           v = v * v * v
           if v > 0: break
        end
        u = RNDU01ZeroExc()
        x2 = x * x
        if u < 1 - (0.0331 * x2 * x2): break
        if ln(u) < (0.5 * x2) + (d * (1 - v + ln(v))): break
    end
    ret = d * v
    if meanLifetime < 1
       ret = ret * exp(ln(RNDU01ZeroExc()) / meanLifetime)
    end
    return ret * scale
end

Distributions based on the gamma distribution:

Beta Distribution

In the following method, which generates a random number that follows a beta distribution, a and b are two parameters each greater than 0. The range of the beta distribution is [0, 1).

METHOD BetaDist(self, a, b, nc)
  if b==1 and a==1: return RNDU01()
  if a==1: return 1.0-pow(RNDU01(),1.0/b)
  if b==1: return pow(RNDU01(),1.0/a)
  x=GammaDist(a,1)
  return x/(x+GammaDist(b,1))
END METHOD

Note: A noncentral beta distribution is sampled by generating BetaDist(a + Poisson(nc), b), where nc is greater than 0.

Negative Binomial Distribution

A random integer that follows a negative binomial distribution expresses the number of failures that have happened after seeing a given number of successes (expressed as successes below), where the probability of a success in each case is p (where p <= 0 means never, p >= 1 means always, and p = 0.5 means an equal chance of success or failure).

METHOD NegativeBinomial(successes, p)
    // Needs to be 0 or greater
    if successes < 0: return error
    // No failures if no successes or if always succeeds
    if successes == 0 or p >= 1.0: return 0
    // Always fails (NOTE: infinity can be the maximum possible
    // integer value if NegativeBinomial is implemented to return
    // an integer)
    if p <= 0.0: return infinity
    // NOTE: If 'successes' can be an integer only,
    // omit the following three lines:
    if floor(successes) != successes
        return Poisson(GammaDist(successes, (1 - p) / p))
    end
    count = 0
    total = 0
    if successes == 1
        if p == 0.5
          while RNDINT(1) == 0: count = count + 1
           return count
        end
        // Geometric distribution special case (see Saucier 2000)
        return floor(ln(RNDU01ZeroExc()) / ln(1.0 - p))
    end
    while true
        if RNDU01OneExc() < p
            // Success
            total = total + 1
            if total >= successes
                    return count
            end
        else
            // Failure
            count = count + 1
        end
    end
END METHOD

Example: If p is 0.5 and successes is 1, the negative binomial distribution models the task "Flip a coin until you get tails, then count the number of heads."

von Mises Distribution

In the following method, which generates a random number that follows a von Mises distribution, which describes a distribution of circular angles—

The algorithm below is the Best–Fisher algorithm from 1979 (as described in Devroye 1986 with errata incorporated).

METHOD VonMises(mean, kappa)
    if kappa < 0: return error
    if kappa == 0
        return RNDNUMEXCRANGE(mean-pi, mean+pi)
    end
    r = 1.0 + sqrt(4 * kappa * kappa + 1)
    rho = (r - sqrt(2 * r)) / (kappa * 2)
    s = (1 + rho * rho) / (2 * rho)
    while true
        u = RNDNUMEXCRANGE(-1, 1)
        v = RNDU01ZeroOneExc()
        z = cos(pi * u)
        w = (1 + s*z) / (s + z)
        y = kappa * (s - w)
        if y*(2 - y) - v >=0 or ln(y / v) + 1 - y >= 0
           if angle<-1: angle=-1
           if angle>1: angle=1
           // NOTE: Inverse cosine replaced here
           // with `atan2` equivalent
           angle = atan2(sqrt(1-w*w),w)
           if u < 0: angle = -angle
           return mean + angle
        end
    end
END METHOD

Stable Distribution

As more and more independent random numbers from the same distribution are added together, their distribution tends to a stable distribution, which resembles a curve with a single peak, but with generally "fatter" tails than the normal distribution. The pseudocode below uses the Chambers–Mallows–Stuck algorithm. The Stable method, implemented below, takes two parameters:

 

METHOD Stable(alpha, beta)
    if alpha <=0 or alpha > 2: return error
    if beta < -1 or beta > 1: return error
    halfpi = pi * 0.5
    unif=RNDNUMEXCRANGE(-halfpi, halfpi)
    while unif==-halfpi: unif=RNDNUMEXCRANGE(-halfpi, halfpi)
    // Cauchy special case
    if alpha == 1 and beta == 0: return tan(unif)
    expo=-ln(RNDU01ZeroExc())
    c=cos(unif)
    if alpha == 1
            s=sin(unif)
            return 2.0*((unif*beta+halfpi)*s/c -
                beta * ln(halfpi*expo*c/(unif*beta+halfpi)))/pi
    end
    z=-tan(alpha*halfpi)*beta
    ug=unif+atan2(-z, 1)/alpha
    cpow=pow(c, -1.0 / alpha)
    return pow(1.0+z*z, 1.0 / (2*alpha))*
        (sin(alpha*ug)*cpow)*
        pow(cos(unif-alpha*ug)/expo, (1.0 - alpha) / alpha)
END METHOD

Extended versions of the stable distribution:

Hypergeometric Distribution

The following method generates a random integer that follows a hypergeometric distribution. When a given number of items are drawn at random without replacement from a collection of items each labeled either 1 or 0, the random integer expresses the number of items drawn this way that are labeled 1. In the method below, trials is the number of items drawn at random, ones is the number of items labeled 1 in the set, and count is the number of items labeled 1 or 0 in that set.

METHOD Hypergeometric(trials, ones, count)
    if ones < 0 or count < 0 or trials < 0 or ones > count or trials > count
            return error
    end
    if ones == 0: return 0
   successes = 0
    i = 0
    currentCount = count
    currentOnes = ones
    while i < trials and currentOnes > 0
            if RNDINTEXC(currentCount) < currentOnes
                    currentOnes = currentOnes - 1
                    successes = successes + 1
            end
            currentCount = currentCount - 1
            i = i + 1
    end
    return successes
END METHOD

Example: In a 52-card deck of Anglo-American playing cards, 12 of the cards are face cards (jacks, queens, or kings). After the deck is shuffled and seven cards are drawn, the number of face cards drawn this way follows a hypergeometric distribution where trials is 7, ones is 12, and count is 52.

Multivariate Normal (Multinormal) Distribution

The following pseudocode calculates a random point in space that follows a multivariate normal (multinormal) distribution. The method MultivariateNormal takes the following parameters:

 

METHOD Decompose(matrix)
  numrows = size(matrix)
  if size(matrix[0])!=numrows: return error
  // Does a Cholesky decomposition of a matrix
  // assuming it's positive definite and invertible
  ret=NewList()
  for i in 0...numrows
    submat = NewList()
    for j in 0...numrows: AddItem(submat, 0)
    AddItem(ret, submat)
  end
  s1 = sqrt(matrix[0][0])
  if s1==0: return ret // For robustness
  for i in 0...numrows
    ret[0][i]=matrix[0][i]*1.0/s1
  end
  for i in 0...numrows
    sum=0.0
    for j in 0...i: sum = sum + ret[j][i]*ret[j][i]
    sq=matrix[i][i]-sum
    if sq<0: sq=0 // For robustness
    ret[i][i]=math.sqrt(sq)
  end
  for j in 0...numrows
    for i in (j + 1)...numrows
      // For robustness
      if ret[j][j]==0: ret[j][i]=0
      if ret[j][j]!=0
        sum=0
        for k in 0...j: sum = sum + ret[k][i]*ret[k][j]
        ret[j][i]=(matrix[j][i]-sum)*1.0/ret[j][j]
      end
    end
  end
  return ret
END METHOD

METHOD MultivariateNormal(mu, cov)
  mulen=size(cov)
  if mu != nothing
    mulen = size(mu)
    if mulen!=size(cov): return error
    if mulen!=size(cov[0]): return error
  end
  // NOTE: If multiple random points will
  // be generated using the same covariance
  // matrix, an implementation can consider
  // precalculating the decomposed matrix
  // in advance rather than calculating it here.
  cho=Decompose(cov)
  i=0
  ret=NewList()
  variables=NewList()
  for j in 0...mulen: AddItem(variables, Normal(0, 1))
  while i<mulen
    nv=Normal(0,1)
    sum = 0
    if mu == nothing: sum=mu[i]
    for j in 0...mulen: sum=sum+variables[j]*cho[j][i]
    AddItem(ret, sum)
    i=i+1
  end
  return ret
end

Examples:

  1. A binormal distribution (two-variable multinormal distribution) can be sampled using the following idiom: MultivariateNormal([mu1, mu2], [[s1*s1, s1*s2*rho], [rho*s1*s2, s2*s2]]), where mu1 and mu2 are the means of the two random variables, s1 and s2 are their standard deviations, and rho is a correlation coefficient greater than -1 and less than 1 (0 means no correlation).
  2. A log-multinormal distribution can be sampled by generating numbers from a multinormal distribution, then applying exp(n) to the resulting numbers, where n is each number generated this way.
  3. A Beckmann distribution can be sampled by calculating sqrt(x*x+y*y), where x and y are the two numbers in a binormal random pair (see example 1).

Random Numbers with a Given Positive Sum

Generating N GammaDist(total, 1) numbers and dividing them by their sum will result in N random numbers that (approximately) sum to total (see a Wikipedia article). For example, if total is 1, the numbers will (approximately) sum to 1. Note that in the exceptional case that all numbers are 0, the process should repeat.

The following pseudocode shows how to generate random integers with a given positive sum. (The algorithm for this was presented in (Smith and Tromble 2004)(28).) In the pseudocode below—

 

METHOD NonzeroIntegersWithSum(n, total)
    if n <= 0 or total <=0: return error
    ls = NewList()
    ret = NewList()
    AddItem(ls, 0)
    while size(ls) < n
            c = RNDINTEXCRANGE(1, total)
            found = false
            j = 1
            while j < size(ls)
                    if ls[j] == c
                            found = true
                            break
                    end
                    j = j + 1
            end
            if found == false: AddItem(ls, c)
    end
    Sort(ls)
    AddItem(ls, total)
    for i in 1...size(ls): AddItem(ret, list[i] - list[i - 1])
    return ret
END METHOD

METHOD IntegersWithSum(n, total)
    if n <= 0 or total <=0: return error
    ret = NonzeroIntegersWithSum(n, total + n)
    for i in 0...size(ret): ret[i] = ret[i] - 1
    return ret
END METHOD

Notes:

Multinomial Distribution

The multinomial distribution models the number of times each of several mutually exclusive events happens among a given number of trials, where each event can have a separate probability of happening. The pseudocode below is of a method that takes two parameters: trials, which is the number of trials, and weights, which are the relative probabilities of each event. The method tallies the events as they happen and returns a list (with the same size as weights) containing the number of successes for each event.

METHOD Multinomial(trials, weights)
    if trials < 0: return error
    // create a list of successes
    list = NewList()
    for i in 0...size(weights): AddItem(list, 0)
    for i in 0...trials
        // Choose an index
        index = WeightedChoice(weights)
        // Tally the event at the chosen index
        list[index] = list[index] + 1
    end
    return list
END METHOD

Gaussian and Other Copulas

A copula is a distribution describing the correlation (dependence) between random numbers.

One example is a Gaussian copula; this copula is sampled by sampling from a multinormal distribution, then converting the resulting numbers to uniformly-distributed, but correlated, numbers. In the following pseudocode, which implements a Gaussian copula:

 

METHOD GaussianCopula(covar)
   mvn=MultivariateNormal(nothing, covar)
   for i in 0...size(covar)
      // Apply the normal distribution's CDF
      // to get uniform variables
      mvn[i] = (erf(mvn[i]/(sqrt(2)*sqrt(covar[i][i])))+1)*0.5
   end
   return mvn
END METHOD

Each of the resulting uniform numbers will be in the interval [0, 1], and each one can be further transformed to any other probability distribution (which is called a marginal distribution here) by one of the methods given in "Random Numbers from an Arbitrary Distribution". (See also Cario and Nelson 1997.)

Examples:

  1. To generate two correlated uniform variables with a Gaussian copula, generate GaussianCopula([[1, rho], [rho, 1]]), where rho is the Pearson correlation coefficient, in the interval [-1, 1]. (Note that rank correlation parameters, which can be converted to rho, can better describe the correlation than rho itself. For example, for a two-variable Gaussian copula, the Spearman coefficient srho can be converted to rho by rho = sin(srho * pi / 6) * 2. Rank correlation parameters are not further discussed in this document.)
  2. The following example generates two random numbers that follow a Gaussian copula with exponential marginals (rho is the Pearson correlation coefficient, and rate1 and rate2 are the rates of the two exponential marginals).

    METHOD CorrelatedExpo(rho, rate1, rate2)
       copula = GaussianCopula([[1, rho], [rho, 1]])
       // Transform to exponentials using that
       // distribution's inverse CDF
       return [-ln(copula[0]) / rate1, -ln(copula[1]) / rate2]
    END METHOD
    

Other kinds of copulas describe different kinds of correlation between random numbers. Examples of other copulas are—

Other Non-Uniform Distributions

Most commonly used:

Miscellaneous:

The Python sample code also contains implementations of the power normal distribution, the power lognormal distribution, the negative multinomial distribution, the multivariate t-distribution, the multivariate t-copula, and the multivariate Poisson distribution.

Geometric Sampling

This section contains various geometric sampling techniques.

Random Points Inside a Simplex

The following pseudocode generates, uniformly at random, a point inside an n-dimensional simplex (simplest convex figure, such as a line segment, triangle, or tetrahedron). It takes an array points, a list consisting of the n plus one vertices of the simplex, all of a single dimension n or greater.

METHOD RandomPointInSimplex(points):
   ret=NewList()
   if size(points) > size(points[0])+1: return error
   if size(points)==1 // Return a copy of the point
     for i in 0...size(points[0]): AddItem(ret,points[0][i])
     return ret
   end
   gammas=NewList()
   // Sample from a Dirichlet distribution
   simplexDims=size(points)-1
   for i in 0...size(points): AddItem(gammas, -ln(RNDU01ZeroOneExc()))
   tsum=0
   for i in 0...size(gammas): tsum = tsum + gammas[i]
   tot = 0
   for i in 0...size(gammas) - 1
       gammas[i] = gammas[i] / tsum
       tot = tot + gammas[i]
   end
   tot = 1.0 - tot
   for i in 0...size(points[0]): AddItem(ret, points[0][i]*tot)
   for i in 1...size(points)
      for j in 0...size(points[0])
         ret[j]=ret[j]+points[i][j]*gammas[i-1]
      end
   end
   return ret
END METHOD

Random Points on the Surface of a Hypersphere

The following pseudocode shows how to generate, uniformly at random, an N-dimensional point on the surface of an N-dimensional hypersphere of radius radius (if radius is 1, the result can also serve as a unit vector in N-dimensional space). Here, Norm is given in the appendix. See also (Weisstein)(30).

METHOD RandomPointInHypersphere(dims, radius)
  ret=[]
  for i in 0...dims: AddItem(ret, Normal(0, 1))
  invnorm=1.0/Norm(ret)
  for i in 0...dims: ret[i]=ret[i]*invnorm
  return ret
END METHOD

Random Points Inside a Ball or Shell

To generate, uniformly at random, an N-dimensional point inside an N-dimensional ball of radius R, either—

although the former method "may ... be slower" "in practice", according to a MathWorld article, which was the inspiration for the two methods given here.

To generate, uniformly at random, a point inside an N-dimensional spherical shell (a hollow ball) with inner radius A and outer radius B (where A is less than B), either—

Random Latitude and Longitude

To generate, uniformly at random, a point on the surface of a sphere in the form of a latitude and longitude (in radians with west and south coordinates negative)—

Reference: "Sphere Point Picking" in MathWorld (replacing inverse cosine with atan2 equivalent).

Conclusion

This page discussed many ways applications can extract random numbers from random number generators.

I acknowledge the commenters to the CodeProject version of this page, including George Swan, who referred me to the reservoir sampling method.

Notes

(1) An RNG meeting this definition can—

If the software and/or hardware uses a nonuniform distribution, but otherwise meets this definition, it can be converted to use a uniform distribution, at least in theory, using unbiasing, deskewing, or randomness extraction, which are outside the scope of this document.

(2) For an exercise solved by this method, see A. Koenig and B. E. Moo, Accelerated C++, 2000; see also a blog post by Johnny Chan. In addition, M. O'Neill discusses various methods, both biased and unbiased, for generating random integers in a range with an RNG in a blog post from July 2018.

Note that if MODULUS is a power of 2 (for example, 256 or 232), the RNDINT implementation given in the pseudocode may leave unused bits (for example, when truncating a random number to wordBits bits or in the special cases at the start of the method). How a more sophisticated implementation may save those bits for later reuse is beyond this page's scope.

(3) This number format describes B-bit signed integers with minimum value -2B-1 and maximum value 2B-1 - 1, where B is a positive even number of bits; examples include Java's short, int, and long, with 16, 32, and 64 bits, respectively. A signed integer is an integer that can be positive, zero, or negative. In two's-complement form, nonnegative numbers have the highest (most significant) bit set to zero, and negative numbers have that bit (and all bits beyond) set to one, and a negative number is stored in such form by swapping the bits of a number equal to that number's absolute value minus 1.

(4) Downey, A. B. "Generating Pseudo-random Floating Point Values", 2007

(5) RNDINTEXC is not given as the core random generation method because it's harder to fill integers in popular integer formats with random bits with this method.

(6) Depending on how RNDU01*() methods are implemented, some integers can never occur with this implementation if maxExclusive is very large. For example, if RNDU01OneExc() is implemented as RNDINT(255)/256, the resulting number will have no more than 8 bits set to 1, so that not all numbers can "randomly" occur with maxExclusive greater than 256.

In situations where loops are not possible, such as within an SQL query, the idiom min(floor(RNDU01OneExc() * maxExclusive, maxExclusive - 1)) returns an integer in the interval [0, maxExclusive); however, such an idiom can have a slight, but for most purposes negligible, bias toward maxExclusive - 1. It should be used only in cases outside of information security.

(7) See, for example, the Stack Overflow question "How to generate a number in arbitrary range using random()={0..1} preserving uniformness and density?", questions/8019589.

(8) Describing differences between SQL dialects is outside the scope of this document, but Flourish SQL describes many such differences, including those concerning RNGs.

(9) P. L'Ecuyer suggests parameters for full-period linear congruential generators in "Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure", January 1999.

(10) Jeff Atwood, "The danger of naïveté", Dec. 7, 2007.

(11) It suffices to say here that in general, whenever a deterministic RNG is otherwise called for, such an RNG is good enough for shuffling a 52-item list if its period is 2226 or greater. (The period is the maximum number of values in a generated sequence for a deterministic RNG before that sequence repeats.)

(12) If the strings identify database records, file system paths, or other shared resources, special considerations apply, including the need to synchronize access to those resources. If the string will uniquely identify database records (e.g., Web site users) and is not secret, an application should consider using an auto-incrementing row number instead, if supported by the database.

(13) See also the Stack Overflow question "Random index of a non zero value in a numpy array".

(14) Bentley, Jon Louis and James B. Saxe. "Generating Sorted Lists of Random Numbers." ACM Trans. Math. Softw. 6 (1980): 359-364.

(15) P. L'Ecuyer, "Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure", January 1999.

(16) Brownlee, J. "A Gentle Introduction to the Bootstrap Method", Machine Learning Mastery, May 25, 2018.

(17) Efraimidis, P. and Spirakis, P. "Weighted Random Sampling (2005; Efraimidis, Spirakis)", 2005.

(18) Efraimidis, P. "Weighted Random Sampling over Data Streams". arXiv:1012.0256v2 [cs.DS], 2015.

(19) A convex polygon can be trivially decomposed into triangles that have one vertex in common and each have two other adjacent vertices of the original polygon. Triangulation of other polygons is nontrivial and outside the scope of this document.

(20) That article also mentions a critical-hit distribution, which is actually a mixture of two distributions: one roll of dice and the sum of two rolls of dice.

(21) An affine transformation is one that keeps parallel lines parallel.

(22) Saucier, R. "Computer Generation of Statistical Distributions", March 2000.

(23) "Jitter", as used in this step, follows a distribution formally called a kernel, of which the normal distribution is one example. Bandwidth should be as low or as high as allows the estimated distribution to fit the data and remain smooth. A more complex kind of "jitter" (for multi-component data points) consists of a point generated from a multinormal distribution with all the means equal to 0 and a covariance matrix that, in this context, serves as a bandwidth matrix. "Jitter" and bandwidth are not further discussed in this document.

(24) Other references on density estimation include a Wikipedia article on multiple-variable kernel density estimation, and a blog post by M. Kay.

(25) More formally—

provided the PDF's values are all 0 or greater and the area under the PDF's curve is 1.

(26) The "Dice" section used the following sources:

(27) The method that formerly appeared here is the Box‐Muller transformation: mu + radius * cos(angle) and mu + radius * sin(angle), where angle = 2 * pi * RNDU01OneExc() and radius = sqrt(-2 * ln(RNDU01ZeroExc())) * sigma, are two independent normally-distributed random numbers. A method of generating approximate standard normal random numbers, which consists of summing twelve RNDU01OneExc() numbers and subtracting by 6 (see also "Irwin–Hall distribution" on Wikipedia), results in values not less than -6 or greater than 6; on the other hand, in a standard normal distribution, results less than -6 or greater than 6 will occur only with a generally negligible probability.

(28) Smith and Tromble, "Sampling Uniformly from the Unit Simplex", 2004.

(29) Hofert, M., and Maechler, M. "Nested Archimedean Copulas Meet R: The nacopula Package". Journal of Statistical Software 39(9), 2011, pp. 1-20.

(30) Weisstein, Eric W. "Hypersphere Point Picking". From MathWorld—A Wolfram Web Resource.

(31) The N numbers generated this way will form a point inside an N-dimensional hypercube with length 2 * R in each dimension and centered at the origin of space.

Appendix

 

Implementation of erf

The pseudocode below shows how the error function erf can be implemented, in case the programming language used doesn't include a built-in version of erf (such as JavaScript at the time of this writing). In the pseudocode, EPSILON is a very small number to end the iterative calculation.

METHOD erf(v)
    if v==0: return 0
    if v<0: return -erf(-v)
    if v==infinity: return 1
    // NOTE: For Java `double`, the following
    // line can be added:
    // if v>=6: return 1
    i=1
    ret=0
    zp=-(v*v)
    zval=1.0
    den=1.0
    while i < 100
        r=v*zval/den
        den=den+2
        ret=ret+r
        // NOTE: EPSILON can be pow(10,14),
        // for example.
        if abs(r)<EPSILON: break
        if i==1: zval=zp
        else: zval = zval*zp/i
        i = i + 1
    end
    return ret*2/sqrt(pi)
END METHOD

Mean and Variance Calculation

The following method calculates the mean and the bias-corrected sample variance of a list of real numbers. It returns a two-item list containing the mean and that kind of variance in that order. It uses the Welford method presented by J. D. Cook. (Sample variance is the estimated variance of a population or distribution based on a random sample of that population or distribution.)

METHOD MeanAndVariance(list)
    if size(list)==0: return [0, 0]
    if size(list)==1: return [list[0], 0]
    xm=list[0]
    xs=0
    i=1
    while i < size(list)
        c = list[i]
        i = i + 1
        cxm = (c - xm)
        xm = xm + cxm *1.0/ i
        xs = xs + cxm * (c - xm)
    end
    return [xm, xs*1.0/(size(list)-1)]
END METHOD

Norm Calculation

The following method calculates the norm of a vector (list of numbers).

METHOD Norm(vec)
  ret=0
  for i in 0...size(vec): ret=ret+vec[i]*vec[i]
  return sqrt(ret)
END METHOD

License

This page is licensed under Creative Commons Zero.