# More Algorithms for Arbitrary-Precision Sampling

**Abstract:** This page contains additional algorithms for arbitrary-precision sampling of continuous distributions, Bernoulli factory algorithms (biased-coin to biased-coin algorithms), and algorithms to produce heads with an irrational probability. They supplement my pages on Bernoulli factory algorithms and partially-sampled random numbers.

**2020 Mathematics Subject Classification:** 68W20, 60-08, 60-04.

## Introduction

This page contains additional algorithms for arbitrary-precision sampling of continuous distributions, Bernoulli factory algorithms (biased-coin to biased-coin algorithms), and algorithms to produce heads with an irrational probability. These samplers are designed to not rely on floating-point arithmetic. They may depend on algorithms given in the following pages:

**Partially-Sampled Random Numbers for Accurate Sampling of Continuous Distributions****Bernoulli Factory Algorithms**

## Contents

**Introduction****Contents****Bernoulli Factories and Irrational Probability Simulation****Certain Numbers Based on the Golden Ratio****Ratio of Lower Gamma Functions (γ(***m*,*x*)/γ(*m*, 1)).**Derivative (slope) of arctan(***λ*)**cosh(***λ*) − 1**exp(***λ*/4)/2**sinh(***λ*)/2**1/(exp(1) +***c*− 2)**exp(1) − 2****tanh(***λ*)**Euler–Mascheroni Constant***γ**π*/4*π*/4 − 1/2 or (*π*− 2)/4**(***π*− 3)/4*π*− 3**4/(3****π*)**(1 + exp(***k*)) / (1 + exp(*k*+ 1))**Other Probabilities and Factory Functions****Certain Piecewise Linear Functions****Sampling Distributions Using Incomplete Information****Pushdown Automata for Square-Root-Like Functions**

**General Arbitrary-Precision Samplers****Specific Arbitrary-Precision Samplers****Rayleigh Distribution****Hyperbolic Secant Distribution****Reciprocal of Power of Uniform****Distribution of***U*/(1−*U*)**Arc-Cosine Distribution****Logistic Distribution****Cauchy Distribution**- Exponential Distribution with Unknown Rate
*λ*, Lying in (0, 1] **Exponential Distribution with Rate ln(***x*)**Symmetric Geometric Distribution****Lindley Distribution and Lindley-Like Mixtures****Gamma Distribution****One-Dimensional Epanechnikov Kernel****Uniform Distribution Inside Rectellipse**

**Requests and Open Questions****Notes****Appendix****Ratio of Uniforms****Implementation Notes for Box/Shape Intersection****Probability Transformations****SymPy Code for Piecewise Linear Factory Functions****Derivation of My Algorithm for min(***λ*, 1/2)**Sampling Distributions Using Incomplete Information: Omitted Algorithms****Pushdown Automata and Algebraic Functions**

**License**

## Bernoulli Factories and Irrational Probability Simulation

In the methods below, *λ* is the unknown probability of heads of the coin involved in the Bernoulli Factory problem.

### Certain Numbers Based on the Golden Ratio

The following algorithm given by Fishman and Miller (2013)^{(1)} finds the continued fraction expansion of certain numbers described as—

*G*(*m*,*ℓ*) = (*m*+ sqrt(*m*^{2}+ 4 **ℓ*))/2

or (*m*− sqrt(*m*^{2}+ 4 **ℓ*))/2,

whichever results in a real number greater than 1, where *m* is a positive integer and *ℓ* is either 1 or −1. In this case, *G*(1, 1) is the golden ratio.

First, define the following operations:

**Get the previous and next Fibonacci-based number given**:*k*,*m*, and*ℓ*- If
*k*is 0 or less, return an error. - Set
*g0*to 0,*g1*to 1,*x*to 0, and*y*to 0. - Do the following
*k*times: Set*y*to*m***g1*+*ℓ***g0*, then set*x*to*g0*, then set*g0*to*g1*, then set*g1*to*y*. - Return
*x*and*y*, in that order.

- If
**Get the partial denominator given**(this partial denominator is part of the continued fraction expansion found by Fishman and Miller):*pos*,*k*,*m*, and*ℓ***Get the previous and next Fibonacci-based number given**, call them*k*,*m*, and*ℓ**p*and*n*, respectively.- If
*ℓ*is 1 and*k*is odd, return*p*+*n*. - If
*ℓ*is −1 and*pos*is 0, return*n*−*p*− 1. - If
*ℓ*is 1 and*pos*is 0, return (*n*+*p*) − 1. - If
*ℓ*is −1 and*pos*is even, return*n*−*p*− 2. (The paper had an error here; the correction given here was verified by Miller via personal communication.) - If
*ℓ*is 1 and*pos*is even, return (*n*+*p*) − 2. - Return 1.

An application of the continued fraction algorithm is the following algorithm that generates 1 with probability *G*(*m*, *ℓ*)^{−k} and 0 otherwise, where *k* is an integer that is 1 or greater (see "Continued Fractions" in my page on Bernoulli factory algorithms). The algorithm starts with *pos* = 0, then the following steps are taken:

**Get the partial denominator given**, call it*pos*,*k*,*m*, and*ℓ**kp*.- Do the following process repeatedly, until this run of the algorithm returns a value:
- With probability
*kp*/(1 +*kp*), return a number that is 1 with probability 1/*kp*and 0 otherwise. - Do a separate run of the currently running algorithm, but with
*pos*=*pos*+ 1. If the separate run returns 1, return 0.

- With probability

### Ratio of Lower Gamma Functions (γ(*m*, *x*)/γ(*m*, 1)).

- Set
*ret*to the result of**kthsmallest**with the two parameters*m*and*m*. (Thus,*ret*is distributed as*u*^{1/m}where*u*is a uniform random variate in [0, 1]; although**kthsmallest**accepts only integers, this formula works for any*m*greater than 0.) - Set
*k*to 1, then set*u*to point to the same value as*ret*. - Generate a uniform(0, 1) random variate
*v*. - If
*v*is less than*u*: Set*u*to*v*, then add 1 to*k*, then go to step 3. - If
*k*is odd, return a number that is 1 if*ret*is less than*x*and 0 otherwise. (If*ret*is implemented as a uniform partially-sampled random number (PSRN), this comparison should be done via**URandLessThanReal**.) If*k*is even, go to step 1.

Derivation: See Formula 1 in the section "**Probabilities Arising from Certain Permutations**", where:

`ECDF(x)`

is the cumulative distribution function for the uniform distribution on the interval [0, 1], namely*x*if*x*is in [0, 1], 0 if*x*is less than 0, and 1 otherwise.`DPDF(x)`

is the probability density function for the maximum of*m*uniform(0,1) random variates, namely*m***x*^{m−1}if*x*is in [0, 1], and 0 otherwise.

### Derivative (slope) of arctan(*λ*)

This algorithm involves the series expansion of this function (1 − *λ*^{2} + *λ*^{4} − ...) and involves the general martingale algorithm.

- Set
*u*to 1, set*w*to 1, set*ℓ*to 0, and set*n*to 1. - Generate a uniform(0, 1) random variate
*ret*. - (The remaining steps are done repeatedly, until the algorithm returns a value.) If
*w*is not 0, flip the input coin and multiply*w*by the result of the flip. Do this step again. - If
*n*is even, set*u*to*ℓ*+*w*. Otherwise, set*ℓ*to*u*−*w*. - If
*ret*is less than (or equal to)*ℓ*, return 1. If*ret*is less than*u*, go to the next step. If neither is the case, return 0. (If*ret*is a uniform PSRN, these comparisons should be done via the**URandLessThanReal algorithm**, which is described in my**article on PSRNs**.) - Add 1 to
*n*and go to step 3.

### cosh(*λ*) − 1

There are two algorithms.

The first algorithm involves an application of the general martingale algorithm to a power series that equals cosh(*λ*)−1. Specifically, the power series is the function's *Taylor series* at 0, which is *λ*^{2}/(2!) + *λ*^{4}/(4!) + .... See (Łatuszyński et al. 2009/2011, algorithm 3)^{(2)}. (In this document, *n*! = 1*2*3*...**n* is known as *n* factorial.)

- Set
*u*to 0, set*w*to 1, set*ℓ*to 0, and set*n*to 1. - Generate a uniform(0, 1) random variate
*ret*. - If
*w*is not 0, flip the input coin and multiply*w*by the result of the flip. Do this step again. - If
*w*is 0, set*u*to*ℓ*and go to step 6. (The estimate*λ*^{n*2}is 0, so no more terms are added and we use*ℓ*as the final estimate for cosh(*λ*)−1.) - Let
*m*be (*n**2), let*α*be 1/(*m*!) (a term of the Taylor series at 0), and let*err*be 2/((*m*+1)!) (the error term). Add*α*to*ℓ*, then set*u*to*ℓ*+*err*. - If
*ret*is less than (or equal to)*ℓ*, return 1. If*ret*is less than*u*, go to the next step. If neither is the case, return 0. (If*ret*is a uniform PSRN, these comparisons should be done via the**URandLessThanReal algorithm**, which is described in my**article on PSRNs**.) - Add 1 to
*n*and go to step 3.

In this algorithm, the error term, which follows from the *Lagrange remainder* for Taylor series, has a numerator of 2 because 2 is higher than the maximum value that the function's slope, slope-of-slope, etc. functions can achieve anywhere in the interval [0, 1].

The second algorithm is one I found that takes advantage of the convex combination method.

- ("Geometric" random variate
*n*.) Generate unbiased random bits until a zero is generated this way. Set*n*to 2 plus the number of ones generated this way. (The number*n*is generated with probability*g*(*n*), as given below.) - (The next two steps succeed with probability
*w*_{n}(*λ*)/*g*(*n*).) If*n*is odd, return 0. Otherwise, with probability 2^{n−1}/(*n*!), go to the next step. Otherwise, return 0. - Flip the input coin
*n*times or until a flip returns 0, whichever happens first. Return 1 if all the flips, including the last, returned 1. Otherwise, return 0.

Derivation: Follows from rewriting cosh(*λ*)−1 as the following series: ∑_{n=0,1,...} *w*_{n}(*λ*) = ∑_{n=0,1,...} *g*(*n*)*(*w*_{n}(*λ*)/*g*(*n*)), where—

*g*(*n*) is (1/2)*(1/2)^{n−2}if*n*≥2, or 0 otherwise, and*w*_{n}(*λ*) is*λ*^{n}/(*n*!) if*n*≥2 and*n*is even, or 0 otherwise.

### exp(*λ*/4)/2

- ("Geometric" random variate
*n*.) Let*c*be 0. Generate unbiased random bits until a zero is generated this way. Set*n*to*c*plus the number of ones generated this way. (The number*n*is generated with probability*g*(*n*), as given below.) - (The next two steps succeed with probability
*w*_{n}(*λ*)/*g*(*n*).) With probability 1/(2^{n}*(*n*!)), go to the next step. Otherwise, return 0. - Flip the input coin
*n*times or until a flip returns 0, whichever happens first. Return 1 if all the flips, including the last, returned 1. Otherwise, return 0.

Derivation: Follows from rewriting exp(*λ*/4)/2 in a similar manner to cosh(*λ*)−1, where this time—

*g*(*n*) is (1/2)*(1/2)^{n−c}if*n*≥*c*, or 0 otherwise (the "geometric" probabilities"), and*w*_{n}(*λ*) is the appropriate term for*n*in the target function's Taylor series expansion at 0, starting with*n*=*c*.

Additional functions:

To simulate: | Follow this algorithm, except the probability in step 2 is: | And c is: |
---|---|---|

exp(λ)/4. |
2^{n−1}/(n!). |
0. |

exp(λ)/6. |
2^{n}/(3*(n!)). |
0. |

exp(λ/2)/2. |
1/(n!). |
0. |

(exp(λ)−1)/2. |
2^{n−1}/(n!). |
1. |

Note:All these functions are infinite series that map the interval [0, 1] to [0, 1] and can be written asf(x) =a[0]*x^{0}+ ... +a[i]*x^{i}+ ..., where thecoefficientsa[i] are 0 or greater.

This kind of function has three properties: (1) it's non-negative for everyx; (2) it's either constant or monotone increasing; (3) it'sconvex(its "slope" or "velocity" doesn't decrease asxincreases).

To show the function is convex, find the "slope-of-slope" function offand show it's non-negative for everyxin the domain. To do so, first find the "slope": omit the first term and for each remaining term, replacea[i]*x^{i}witha[i]*i*x^{i−1}. The resulting "slope" function is still an infinite series with coefficients 0 or greater. Hence, so will the "slope" of this "slope" function, so the result follows by induction.

### sinh(*λ*)/2

This algorithm involves an application of the general martingale algorithm to the Taylor series at 0 for sinh(*λ*)/2, which is *λ*^{1}/(1!*2) + *λ*^{3}/(3!*2) + ..., or as used here, *λ**(1/2 + *λ*^{2}/(3!*2) + *λ*^{4}/(5!*2) + ...).

- Flip the input coin. If it returns 0, return 0.
- Set
*u*to 0, set*w*to 1, set*ℓ*to 1/2 (the first term is added already), and set*n*to 1. - Generate a uniform(0, 1) random variate
*ret*. - Do the following process repeatedly, until this algorithm returns a value:
- If
*w*is not 0, flip the input coin and multiply*w*by the result of the flip. Do this substep again. - If
*w*is 0, set*u*to*ℓ*and go to the fourth substep. (No more terms are added here.) - Let
*m*be (*n**2+1), let*α*be 1/(*m*!*2) (a term of the Taylor series at 0), and let*err*be 1/((*m*+1)!) (the error term). Add*α*to*ℓ*, then set*u*to*ℓ*+*err*. - If
*ret*is less than (or equal to)*ℓ*, return 1. If*ret*is less than*u*, go to the next substep. If neither is the case, return 0. (If*ret*is a uniform PSRN, these comparisons should be done via the**URandLessThanReal algorithm**, which is described in my**article on PSRNs**.) - Add 1 to
*n*.

- If

### 1/(exp(1) + *c* − 2)

Involves the continued fraction expansion and Bernoulli Factory algorithm 3 for continued fractions. In this algorithm, *c*≥1 is a rational number.
The algorithm begins with *pos* equal to 1. Then the following steps are taken.

- Do the following process repeatedly until this run of the algorithm returns a value:
- If
*pos*is divisible by 3 (that is, if rem(*pos*, 3) equals 0): Let*k*be (*pos*/3)*2. With probability*k*/(1+*k*), return a number that is 1 with probability 1/*k*and 0 otherwise. - If
*pos*is 1: With probability*c*/(1+*c*), return a number that is 1 with probability 1/*c*and 0 otherwise. - If
*pos*is greater than 1 and not divisible by 3: Generate an unbiased random bit. If that bit is 1 (which happens with probability 1/2), return 1. - Do a separate run of the currently running algorithm, but with
*pos*=*pos*+ 1. If the separate run returns 1, return 0.

- If

### exp(1) − 2

Involves the continued fraction expansion and Bernoulli Factory algorithm 3 for continued fractions. Run the algorithm for **1/(exp(1)+ c−2)** above with

*c*= 1, except the algorithm begins with

*pos*equal to 2 rather than 1 (because the continued fractions are almost the same).

### tanh(*λ*)

There are two algorithms.

The first takes advantage of the so-called Lambert's continued fraction for tanh(.), as well as Bernoulli Factory algorithm 3 for continued fractions. The algorithm begins with *k* equal to 1. Then the following steps are taken.

- If
*k*is 1: Generate an unbiased random bit. If that bit is 1 (which happens with probability 1/2), flip the input coin and return the result. - Do the following process repeatedly, until this run of the algorithm returns a value:
- If
*k*is greater than 1, then do the following with probability*k*/(1+*k*):- Flip the input coin twice. If any of these flips returns 0, return 0. Otherwise, return a number that is 1 with probability 1/
*k*and 0 otherwise.

- Flip the input coin twice. If any of these flips returns 0, return 0. Otherwise, return a number that is 1 with probability 1/
- Do a separate run of the currently running algorithm, but with
*k*=*k*+ 2. If the separate run returns 1, return 0.

- If

The second algorithm involves an alternating series expansion of tanh(.) and involves the general martingale algorithm.

First, define the following operation:

**Get the**:*m*^{th}Bernoulli number- If
*m*is 0, 1, 2, 3, or 4, return 1, −1/2, 1/6, 0, or −1/30, respectively. Otherwise, if*m*is odd, return 0. - Set
*i*to 2 and*v*to 1 − (*m*+1)/2. - While
*i*is less than*m*:**Get the**, call it*i*^{th}Bernoulli number*b*. Add*b**choose(*m*+1,*i*) to*v*.^{(3)}- Add 2 to
*i*.

- Return −
*v*/(*m*+1).

- If

The algorithm is then as follows:

- Flip the input coin. If it returns 0, return 0.
- Set
*u*to 1, set*w*to 1, set*ℓ*to 0, and set*n*to 1. - Generate a uniform(0, 1) random variate
*ret*. - Do the following process repeatedly, until the algorithm returns a value:
- If
*w*is not 0, flip the input coin. If the flip returns 0, set*w*to 0. Do this substep again. - (Calculate the next term of the alternating series for tanh.) Let
*m*be 2*(*n*+1).**Get the**, call it*m*^{th}Bernoulli number*b*. Let*t*be abs(*b*)*2^{m}*(2^{m}−1)/(*m*!). - If
*n*is even, set*u*to*ℓ*+*w***t*. Otherwise, set*ℓ*to*u*−*w***t*. - If
*ret*is less than (or equal to)*ℓ*, return 1. If*ret*is less than*u*, go to the next step. If neither is the case, return 0. (If*ret*is a uniform PSRN, these comparisons should be done via the**URandLessThanReal algorithm**, which is described in my**article on PSRNs**.) - Add 1 to
*n*.

- If

### Euler–Mascheroni Constant *γ*

As **I learned**, the fractional part of 1/*U*, where *U* is a uniform random variate in (0, 1), has a mean equal to 1 minus the Euler–Mascheroni constant *γ*.^{(4)} This leads to the following algorithm to sample a probability equal to *γ*:

- Generate a PSRN for the reciprocal of a uniform random variate, as described in
**another page of mine**. - Set the PSRN's integer part to 0, then run
**SampleGeometricBag**on that PSRN. Return 0 if the run returns 1, or 1 otherwise.

*π*/4

The following algorithm to sample the probability *π*/4 is based on the section "**Uniform Distribution Inside N-Dimensional Shapes**", especially its Note 5.

- Set
*S*to 2. Then set*c1*and*c2*to 0. - Do the following process repeatedly, until the algorithm returns a value:
- Set
*c1*to 2**c1*plus an unbiased random bit (either 0 or 1 with equal probability). Then, set*c2*to 2**c2*plus an unbiased random bit. - If ((
*c1*+1)^{2}+ (*c2*+1)^{2}) <*S*^{2}, return 1. (Point is inside the quarter disk, whose area is*π*/4.) - If ((
*c1*)^{2}+ (*c2*)^{2}) >*S*^{2}, return 0. (Point is outside the quarter disk.) - Multiply
*S*by 2.

- Set

*π*/4 − 1/2 or (*π* − 2)/4

Follows the *π*/4 algorithm, except it samples from a quarter disk with an area equal to 1/2 removed.

- Set
*S*to 2. Then set*c1*and*c2*to 0. - Do the following process repeatedly, until the algorithm returns a value:
- Set
*c1*to 2**c1*plus an unbiased random bit (either 0 or 1 with equal probability). Then, set*c2*to 2**c2*plus an unbiased random bit. - Set
*diamond*to*MAYBE*and*disk*to*MAYBE*. - If ((
*c1*+1) + (*c2*+1)) <*S*, set*diamond*to*YES*. - If ((
*c1*) + (*c2*)) >*S*, set*diamond*to*NO*. - If ((
*c1*+1)^{2}+ (*c2*+1)^{2}) <*S*^{2}, set*disk*to*YES*. - If ((
*c1*)^{2}+ (*c2*)^{2}) >*S*^{2}, set*disk*to*NO*. - If
*disk*is*YES*and*diamond*is*NO*, return 1. Otherwise, if*diamond*is*YES*or*disk*is*NO*, return 0. - Multiply
*S*by 2.

- Set

### (*π* − 3)/4

Follows the *π*/4 algorithm, except it samples from a quarter disk with enough boxes removed from it to total an area equal to 3/4.

- Set
*S*to 32. Then set*c1*to a uniform random integer in the half-open interval [0,*S*) and*c2*to another uniform random integer in [0,*S*). - (Retained boxes.) If
*c1*is 0 and*c2*is 0, or if*c1*is 0 and*c2*is 1, return 1. - (Removed boxes.) If ((
*c1*+1)^{2}+ (*c2*+1)^{2}) < 1024, return 0. - Multiply
*S*by 2. - (Sample the modified quarter disk.) Do the following process repeatedly, until the algorithm returns a value:
- Set
*c1*to 2**c1*plus an unbiased random bit (either 0 or 1 with equal probability). Then, set*c2*to 2**c2*plus an unbiased random bit. - If ((
*c1*+1)^{2}+ (*c2*+1)^{2}) <*S*^{2}, return 1. (Point is inside the quarter disk, whose area is*π*/4.) - If ((
*c1*)^{2}+ (*c2*)^{2}) >*S*^{2}, return 0. (Point is outside the quarter disk.) - Multiply
*S*by 2.

- Set

*π* − 3

Similar to the *π*/4 algorithm. First it samples a point inside an area covering 1/4 of the unit square, then inside that area, it determines whether that point is inside another area covering (*π* − 3)/4 of the unit square. Thus, the algorithm acts as though it samples ((*π* − 3)/4) / (1/4) = *π* − 3.

- Set
*S*to 2. Then set*c1*and*c2*to 0. - Do the following process repeatedly, until the algorithm aborts it or returns a value:
- Set
*S*to 32. Then set*c1*to a uniform random integer in the half-open interval [0,*S*) and*c2*to another uniform random integer in [0,*S*). - (Return 1 if in retained boxes.) If
*c1*is 0 and*c2*is 0, or if*c1*is 0 and*c2*is 1, return 1. - (Check if outside removed boxes.) If ((
*c1*+1)^{2}+ (*c2*+1)^{2}) >= 1024, abort this process and go to step 3. (Otherwise,*c1*and*c2*are rejected and this process continues.)

- Set
- Set
*S*to 64. - (Sample the modified quarter disk.) Do the following process repeatedly, until the algorithm returns a value:
- Set
*c1*to 2**c1*plus an unbiased random bit (either 0 or 1 with equal probability). Then, set*c2*to 2**c2*plus an unbiased random bit. - If ((
*c1*+1)^{2}+ (*c2*+1)^{2}) <*S*^{2}, return 1. (Point is inside the quarter disk, whose area is*π*/4.) - If ((
*c1*)^{2}+ (*c2*)^{2}) >*S*^{2}, return 0. (Point is outside the quarter disk.) - Multiply
*S*by 2.

- Set

Note:Only a limited set of (c1,c2) pairs, including (0, 0) and (0, 1), will pass step 2 of this algorithm. Thus it may be more efficient to choose one of them uniformly at random, rather than do step 2 as shown. If (0, 0) or (0, 1) is chosen this way, the algorithm returns 1.

### 4/(3**π*)

Given that the point (*x*, *y*) has positive coordinates and lies inside a disk of radius 1 centered at (0, 0), the mean value of *x* is 4/(3**π*). This leads to the following algorithm to sample that probability:

- Generate two PSRNs in the form of a uniformly chosen point inside a 2-dimensional quarter hypersphere (see "
**Uniform Distribution Inside N-Dimensional Shapes**" below, as well as the examples). - Let
*x*be one of those PSRNs. Run**SampleGeometricBag**on that PSRN and return the result (which will be either 0 or 1).

Note:The mean value 4/(3*π) can be derived as follows. The relative probability thatxis "close" tozisp(z) = sqrt(1 −z*z), wherezis in the interval [0, 1]. Now find the area under the graph ofz*p(z)/c(wherec=π/4 is the area under the graph ofp(z)). The result is the mean value 4/(3*π). The following Python code prints this mean value using the SymPy computer algebra library:`p=sqrt(1-z*z); c=integrate(p,(z,0,1)); print(integrate(z*p/c,(z,0,1)));`

.

### (1 + exp(*k*)) / (1 + exp(*k* + 1))

This algorithm simulates this probability by computing lower and upper bounds of exp(1), which improve as more and more digits are calculated. These bounds are calculated through an algorithm by Citterio and Pavani (2016)^{(5)}. Note the use of the methodology in Łatuszyński et al. (2009/2011, algorithm 2)^{(6)} in this algorithm. In this algorithm, *k* must be an integer 0 or greater.

- If
*k*is 0, run the**algorithm for 2 / (1 + exp(2))**and return the result. If*k*is 1, run the**algorithm for (1 + exp(1)) / (1 + exp(2))**and return the result. - Generate a uniform(0, 1) random variate, call it
*ret*. - If
*k*is 3 or greater, return 0 if*ret*is greater than 38/100, or 1 if*ret*is less than 36/100. (This is an early return step. If*ret*is implemented as a uniform PSRN, these comparisons should be done via the**URandLessThanReal algorithm**, which is described in my**article on PSRNs**.) - Set
*d*to 2. - Calculate a lower and upper bound of exp(1) (
*LB*and*UB*, respectively) in the form of rational numbers whose numerator has at most*d*digits, using the Citterio and Pavani algorithm. For details, see later. - Set
*rl*to (1+*LB*^{k}) / (1+*UB*^{k + 1}), and set*ru*to (1+*UB*^{k}) / (1+*LB*^{k + 1}); both these numbers should be calculated using rational arithmetic. - If
*ret*is greater than*ru*, return 0. If*ret*is less than*rl*, return 1. (If*ret*is implemented as a uniform PSRN, these comparisons should be done via**URandLessThanReal**.) - Add 1 to
*d*and go to step 5.

The following implements the parts of Citterio and Pavani's algorithm needed to calculate lower and upper bounds for exp(1) in the form of rational numbers.

Define the following operations:

**Setup:**Set*p*to the list`[0, 1]`

, set*q*to the list`[1, 0]`

, set*a*to the list`[0, 0, 2]`

(two zeros, followed by the integer part for exp(1)), set*v*to 0, and set*av*to 0.**Ensure**While*n*:*v*is less than or equal to*n*:- (Ensure partial denominator
*v*, starting from 0, is available.) If*v*+ 2 is greater than or equal to the size of*a*, append 1,*av*, and 1, in that order, to the list*a*, then add 2 to*av*. - (Calculate convergent
*v*, starting from 0.) Append*a*[*n*+2] **p*[*n*+1]+*p*[*n*] to the list*p*, and append*a*[*n*+2] **q*[*n*+1]+*q*[*n*] to the list*q*. (Positions in lists start at 0. For example,*p*[0] means the first item in*p*;*p*[1] means the second; and so on.) - Add 1 to
*v*.

- (Ensure partial denominator
**Get the numerator for convergent**Ensure*n*:*n*, then return*p*[*n*+2].**Get convergent**Ensure*n*:*n*, then return*p*[*n*+2]/*q*[*n*+2].**Get semiconvergent***n*given*d*:- Ensure
*n*, then set*m*to floor(((10^{d})−1−*p*[*n*+1])/*p*[*n*+2]). - Return (
*p*[*n*+2] **m*+*p*[*n*+1]) / (*q*[*n*+2] **m*+*q*[*n*+1]).

- Ensure

Then the algorithm to calculate lower and upper bounds for exp(1), given *d*, is as follows:

- Set
*i*to 0, then run the**setup**. **Get the numerator for convergent**, call it*i**c*. If*c*is less than 10^{d}, add 1 to*i*and repeat this step. Otherwise, go to the next step.**Get convergent**and*i*− 1**get semiconvergent**, call them*i*− 1 given*d**conv*and*semi*, respectively.- If (
*i*− 1) is odd, return*semi*as the lower bound and*conv*as the upper bound. Otherwise, return*conv*as the lower bound and*semi*as the upper bound.

### Other Probabilities and Factory Functions

Algorithms in bold are given either in this page or in the "**Bernoulli Factory Algorithms**" page.

To simulate: | Follow this algorithm: |
---|---|

3 − exp(1) | Run the algorithm for exp(1) − 2, then return 1 minus the result. |

1/(exp(1)−1) | Run the algorithm for 1/(exp(1)+ with c−2)c = 1. |

1/(1+exp(1)) | Run the algorithm for 1/(exp(1)+ with c−2)c = 3. |

expit(λ)*2−1 |
(Equals tanh(λ/2). expit(x) = 1−1/(1+exp(x)) = exp(x)/(1+exp(x)). λ is unknown heads probability of a coin.)Create μ coin that does the following: "Generate an unbiased random bit. If that bit is 0, return 0. Otherwise, flip the input coin and return the result."Run algorithm for tanh( with λ)λ being the μ coin. |

n/π |
(n is 1, 2, or 3.)Create λ coin for algorithm .π − 3Run algorithm for with d / (c + λ)d=n and c=3. |

r/π |
(r is a rational number in open interval (0, 3).)Create λ coin for algorithm .π − 3Create μ coin that does: "With probability r − floor(r), return 1; otherwise return 0."Run algorithm for ( with d + μ) / (c + λ)d=floor(r) and c=3. |

exp(1)/π |
Create μ coin for algorithm exp(1) − 2.Create λ coin for algorithm .π − 3Run algorithm for ( with d + μ) / (c + λ)d=2 and c=3. |

exp(1)/4 | Follow the algorithm for exp(, except the probability in step 2 is 2λ/4)/2^{n−1}/(n!), c is 0, and step 3 is replaced with "Return 1." |

### Certain Piecewise Linear Functions

Let *f*(*λ*) be a function of the form min(*λ***mult*, 1−*ε*). This is a piecewise linear function with two pieces: a rising linear part and a constant part.

This section describes how to calculate the Bernstein coefficients for polynomials that converge from above and below to *f*, based on Thomas and Blanchet (2012)^{(7)}. These polynomials can then be used to generate heads with probability *f*(*λ*) using the algorithms given in "**General Factory Functions**".

In this section, **fbelow( n, k)** and

**fabove(**are the

*n*,*k*)*k*

^{th}coefficients (with

*k*starting at 0) of the lower and upper polynomials, respectively, in Bernstein form of degree

*n*.

The code in the **appendix** uses the computer algebra library SymPy to calculate a list of parameters for a sequence of polynomials converging from above. The method to do so is called `calc_linear_func(eps, mult, count)`

, where `eps`

is *ε*, `mult`

= *mult*, and `count`

is the number of polynomials to generate. Each item returned by `calc_linear_func`

is a list of two items: the degree of the polynomial, and a *Y parameter*. The procedure to calculate the required polynomials is then logically as follows (as written, it runs very slowly, though):

- Set
*i*to 1. - Run
`calc_linear_func(eps, mult, i)`

and get the degree and*Y parameter*for the last listed item, call them*n*and*y*, respectively. - Set
*x*to −((*y*−(1−*ε*))/*ε*)^{5}/*mult*+*y*/*mult*. (This exact formula doesn't appear in the Thomas and Blanchet paper; rather it comes from the**supplemental source code**uploaded by A. C. Thomas at my request. - For degree
*n*,**fbelow(**is min((*n*,*k*)*k*/*n*)**mult*, 1−*ε*), and**fabove(**is min((*n*,*k*)*k*/*n*)**y*/*x*,*y*). (**fbelow**matches*f*because*f*is*concave*in the interval [0, 1], which roughly means that its rate of growth there never goes up.) - Add 1 to
*i*and go to step 2.

It would be interesting to find general formulas to find the appropriate polynomials (degrees and *Y parameters*) given only the values for *mult* and *ε*, rather than find them "the hard way" via `calc_linear_func`

. For this procedure, the degrees and *Y parameters* can be upper bounds, as long as the sequence of degrees is monotonically increasing and the sequence of Y parameters is nonincreasing.

Note:In Nacu and Peres (2005)^{(8)}, the following polynomial sequences were suggested to simulate min(λ*2, 1 − 2*ε), providedε< 1/8, wherenis a power of 2. However, with these sequences, an extraordinary number of input coin flips is required to simulate this function each time.

fbelow(= min((n,k)k/n)*2, 1 − 2*ε).fabove(= min((n,k)k/n)*2, 1 − 2*ε) +

(max(0,k/n+3*ε −1/2)/(ε/(1−sqrt(2)/2)))*sqrt(2/n) +

(72*max(0,k/n−1/9)/(1−exp(−2*ε*ε)))*exp(−2*ε*ε*n).

My own algorithm for min(*λ*, 1/2) is as follows. See the **appendix** for the derivation of this algorithm.

- Generate an unbiased random bit. If that bit is 1 (which happens with probability 1/2), flip the input coin and return the result.
- Run the algorithm for min(
*λ*, 1−*λ*) given later, and return the result of that run.

And the algorithm for min(*λ*, 1−*λ*) is as follows:

- (Random walk.) Generate unbiased random bits until more zeros than ones are generated this way for the first time. Then set
*m*to (*n*−1)/2+1, where*n*is the number of bits generated this way. - (Build a degree-
*m**2 polynomial equivalent to (4**λ**(1−*λ*))^{m}/2.) Let*z*be (4^{m}/2)/choose(*m**2,*m*). Define a polynomial of degree*m**2 whose (*m**2)+1 Bernstein coefficients are all zero except the*m*^{th}coefficient (starting at 0), whose value is*z*. Elevate the degree of this polynomial enough times so that all its coefficients are 1 or less (degree elevation increases the polynomial's degree without changing its shape or position; see the derivation in the appendix). Let*d*be the new polynomial's degree. - (Simulate the polynomial, whose degree is
*d*(Goyal and Sigman 2012)^{(9)}.) Flip the input coin*d*times and set*h*to the number of ones generated this way. Let*a*be the*h*^{th}Bernstein coefficient (starting at 0) of the new polynomial. With probability*a*, return 1. Otherwise, return 0.

I suspected that the required degree *d* would be floor(*m**2/3)+1, as described in the appendix. With help from the **MathOverflow community**, steps 2 and 3 of the algorithm above can be described more efficiently as follows:

- (3.) Let
*r*be floor(*m**2/3)+1, and let*d*be*m**2+*r*. - (4.) (Simulate the polynomial, whose degree is
*d*.) Flip the input coin*d*times and set*h*to the number of ones generated this way. Let*a*be (1/2) * 2^{m*2}*choose(*r*,*h*−*m*)/choose(*d*,*h*) (the polynomial's*h*^{th}Bernstein coefficient starting at 0; the first term is 1/2 because the polynomial being simulated has the value 1/2 at the point 1/2). With probability*a*, return 1. Otherwise, return 0.

The min(*λ*, 1−*λ*) algorithm can be used to simulate certain other piecewise linear functions with three breakpoints, and algorithms for those functions are shown in the following table. In the table, *μ* is the unknown probability of heads of a second input coin.

Breakpoints | Algorithm |
---|---|

0 at 0; 1/2 at 1/2; and μ at 1. |
Flip the μ input coin. If it returns 1, flip the λ input coin and return the result. Otherwise, run the algorithm for min(λ, 1−λ) using the λ input coin, and return the result of that run. |

(1−μ)/2 at 0; 1/2 at 1/2; and μ/2 at 1. |
Generate an unbiased random bit. If that bit is 1, run the algorithm for min(λ, 1−λ) using the λ input coin, and return the result of that run. Otherwise, flip the μ input coin. If it returns 1, flip the λ input coin and return the result. Otherwise, flip the λ input coin and return 1 minus the result. |

0 at 0; μ/2 at 1/2; and μ/2 at 1. |
Flip the μ input coin. If it returns 0, return 0. Otherwise, generate an unbiased random bit. If that bit is 1 (which happens with probability 1/2), flip the λ input coin and return the result. Otherwise, run the algorithm for min(λ, 1−λ) using the λ input coin, and return the result of that run. |

μ at 0; 1/2 at 1/2; and 0 at 1. |
Flip the μ input coin. If it returns 1, flip the λ input coin and return 1 minus the result. Otherwise, run the algorithm for min(λ, 1−λ) using the λ input coin, and return the result of that run. |

1 at 0; 1/2 at 1/2; and μ at 1. |
Flip the μ input coin. If it returns 0, flip the λ input coin and return 1 minus the result. Otherwise, run the algorithm for min(λ, 1−λ) using the λ input coin, and return 1 minus the result of that run. |

μ at 0; 1/2 at 1/2; and 1 at 1. |
Flip the μ input coin. If it returns 0, flip the λ input coin and return the result. Otherwise, run the algorithm for min(λ, 1−λ) using the λ input coin, and return 1 minus the result of that run. |

B at 0; B+(A/2) at 1/2; and B+(A/2) at 1. |
(A≤1 and B≤1−A are rational numbers.) With probability 1−A, return a number that is 1 with probability B/(1−A) and 0 otherwise. Otherwise, generate an unbiased random bit. If that bit is 1, flip the λ input coin and return the result. Otherwise, run the algorithm for min(λ, 1−λ) using the λ input coin, and return the result of that run. |

### Sampling Distributions Using Incomplete Information

The Bernoulli factory is a special case of the problem of **sampling a probability distribution with unknown parameters**. This problem can be described as sampling from a new distribution using an *oracle* (black box) that produces numbers of an incompletely known distribution. In the Bernoulli factory problem, this oracle is a *coin that shows heads or tails where the probability of heads is unknown*. The rest of this section deals with oracles that go beyond coins.

**Algorithm 1.** Say we have an oracle that produces independent random variates in the interval [*a*, *b*], and these numbers have an unknown mean of *μ*. The goal is now to produce non-negative random variates whose expected ("average") value is *f*(*μ*). Unless *f* is constant, this is possible if and only if—

*f*is continuous on [*a*,*b*], and*f*(*μ*) is bounded from below by*ε**min((*μ*−*a*)^{n}, (*b*−*μ*)^{n}) for some integer*n*and some*ε*greater than 0 (loosely speaking,*f*is non-negative and neither touches 0 inside (*a*,*b*) nor moves away from 0 more slowly than a polynomial)

(Jacob and Thiery 2015)^{(10)}. (Here, *a* and *b* are both rational numbers and may be less than 0.)

In the algorithm below, let *K* be a rational number greater than the maximum value of *f* in the interval [*a*, *b*], and let *g*(*λ*) = *f*(*a* + (*b*−*a*)**λ*)/*K*.

- Create a
*λ*input coin that does the following: "Take a number from the oracle, call it*x*. With probability (*x*−*a*)/(*b*−*a*) (see note below), return 1. Otherwise, return 0." - Run a Bernoulli factory algorithm for
*g*(*λ*), using the*λ*input coin. Then return*K*times the result.

Note:The check "With probability (x−a)/(b−a)" is exact if the oracle produces only rational numbers. If the oracle can produce irrational numbers (such as numbers that follow a beta distribution or another continuous distribution), then the code for the oracle should use uniformpartially-sampled random numbers (PSRNs). In that case, the check can be implemented as follows. Letxbe a uniform PSRN representing a number generated by the oracle. SetytoRandUniformFromReal(b−a), then the check succeeds ifURandLess(y,UniformAddRational(x, −a)) returns 1, and fails otherwise.

Example:Suppose an oracle produces random variates in the interval [3, 13] with unknown meanμ, and we seek to use the oracle to produce non-negative random variates with meanf(μ) = −319/100 +μ*103/50 −μ^{2}*11/100, which is a polynomial with Bernstein coefficients [2, 9, 5] in the given interval. Then since 8 is greater than the maximum offin that interval,g(λ) is a degree-2 polynomial with Bernstein coefficients [2/8, 9/8, 5/8] in the interval [0, 1].gcan't be simulated as is, though, but by increasingg's degree to 3 we get the Bernstein coefficients [1/4, 5/6, 23/24, 5/8], which are all less than 1 so we can proceed with the following algorithm (see "Certain Polynomials"):

- Set
headsto 0.- Generate three random variates from the oracle (which must produce random variates in the interval [3, 13]). For each number
x: With probability (x−3)/(10−3), add 1 toheads.- Depending on
heads, return 8 (that is, 1 times the upper bound) with the given probability, or 0 otherwise:heads=0 → probability 1/4; 1 → 5/6; 2 → 23/24; 3 → 5/8.

**Algorithm 2.** This algorithm takes an oracle and produces non-negative random variates whose expected ("average") value is the mean of *f*(*X*), where *X* is a number produced by the oracle. The algorithm appears in the appendix, however, because it requires applying an arbitrary function (here, *f*) to a potentially irrational number.

**Algorithm 3.** For this algorithm, see the appendix.

**Algorithm 4.** Say there is an oracle in the form of an *n*-sided fair die (*n*≥2) with an unknown number of faces, where each face shows a different integer in the interval [0, *n*). The question arises: Which probability distributions based on *n* can be sampled with this oracle? This question was studied in the French-language dissertation of R. Duvignau (2015, section 5.2)^{(11)}, and the following are four of these distributions.

** Bernoulli 1/n.** It's trivial to generate a Bernoulli variate that is 1 with probability 1/

*n*and 0 otherwise: just take a number from the oracle and return either 1 if that number is 0, or 0 otherwise. Alternatively, take two numbers from the oracle and return either 1 if both are the same, or 0 otherwise (Duvignau 2015, p. 153)

^{(11)}.

** Random variate with mean n.** Likewise, it's trivial to generate variates with a mean of

*n*: Do "Bernoulli 1/n" trials as described above until a trial returns 0, then return the number of trials done this way. (This is often called 1 plus a "geometric" random variate, and has a mean of

*n*.)

** Binomial with parameters n and 1/n.** Using the oracle, the following algorithm generates a binomial variate of this kind (Duvignau 2015, Algorithm 20)

^{(11)}:

- Take items from the oracle until the same item is taken twice.
- Create a list consisting of the items taken in step 1, except for the last item taken, then shuffle that list.
- In the shuffled list, count the number of items that didn't change position after being shuffled, then return that number.

** Binomial with parameters n and k/n.** Duvignau 2015 also includes an algorithm (Algorithm 25) to generate a binomial variate of this kind using the oracle (where

*k*is a known integer such that 0 <

*k*and

*k*≤

*n*):

- Take items from the oracle until
*k*different items were taken this way. Let*U*be a list of these*k*items, in the order in which they were first taken. - Create an empty list
*L*. - For each integer
*i*in [0,*k*):- Create an empty list
*M*. - Take an item from the oracle. If the item is in
*U*at a position**less than**(positions start at 0), repeat this substep. Otherwise, if the item is not in*i**M*, add it to*M*and repeat this substep. Otherwise, go to the next substep. - Shuffle the list
*M*, then add to*L*each item that didn't change position after being shuffled (if not already present in*L*).

- Create an empty list
- For each integer
*i*in [0,*k*):- Let
*P*be the item at position*i*in*U*. - Take an item from the oracle. If the item is in
*U*at position(positions start at 0), repeat this substep.*i*or less - If the last item taken in the previous substep is in
*U*at a position**greater than**, add*i**P*to*L*(if not already present).

- Let
- Return the number of items in
*L*.

Note:Duvignau proved a result (Theorem 5.2) that answers the question: Which probability distributions based on the unknownncan be sampled with the oracle?^{(12)}The result applies to a family of (discrete) distributions with the same unknown parametern, starting with either 1 or a greater integer. Let Supp(m) be the set of values taken on by the distribution with parameter equal tom. Then that family can be sampled using the oracle if and only if:

- There is a computable function
f(k) that outputs a positive number.- For each
n, Supp(n) is included in Supp(n+1).- For every
kand for everynstarting with the greater of 2 or the firstnfor whichkis in Supp(n), the probability of seeingkgiven parameternis at least (1/n)^{f(k)}(roughly speaking, the probability doesn't decay at a faster than polynomial rate asnincreases).

### Pushdown Automata for Square-Root-Like Functions

A *pushdown automaton* is a state machine that keeps a stack of symbols. In this document, the input for this automaton is a stream of flips of a coin that shows heads with probability *λ*, and the output is 0 or 1 depending on which state the automaton ends up in when it empties the stack (Mossel and Peres 2005)^{(13)}. That paper shows that a pushdown automaton, as defined here, can simulate only *algebraic functions*, that is, functions that can be a solution of a system of polynomial equations. The **appendix** defines these machines in more detail and has proofs on which algebraic functions are possible with pushdown automata.

The following algorithm extends the square-root construction of Flajolet et al. (2010)^{(14)}, takes an input coin with probability of heads *λ*, and returns 1 with probability—

*f*(*λ*) = (1 −*λ*)/sqrt(1 + 4**λ***g*(*λ*)*(*g*(*λ*) − 1)), or equivalently,*f*(*λ*) = (1 −*λ*) * ∑_{n=0,1,...}*λ*^{n}**g*(*λ*)^{n}*(1 −*g*(*λ*))^{n}*choose(2**n*,*n*) = (1 −*λ*) * ∑_{n=0,1,...}(*λ***g*(*λ*)*(1 −*g*(*λ*)))^{n}*choose(2**n*,*n*), or equivalently,*f*(*λ*) = (1 −*λ*) * OGF(*λ***g*(*λ*)*(1 −*g*(*λ*))),

and 0 otherwise, where:

*g*(*λ*) is a continuous function that maps the half-open interval [0, 1) to the closed interval [0, 1] and admits a Bernoulli factory. If*g*is a rational function (a ratio of two polynomials) with rational coefficients, then*f*is algebraic and can be simulated by a*pushdown automaton*, as in the algorithm below. But this algorithm will still work even if*g*is not a rational function.OGF(

*x*) = ∑_{n=0,1,...}*x*^{n}*choose(2**n*,*n*) is the algorithm's ordinary generating function (also known as counting generating function).

- Set
*d*to 0. - Do the following process repeatedly until this run of the algorithm returns a value:
- Flip the input coin. If it returns 1, go to the next substep. Otherwise, return either 1 if
*d*is 0, or 0 otherwise. - Run a Bernoulli factory algorithm for
*g*(*λ*). If the run returns 1, add 1 to*d*. Otherwise, subtract 1 from*d*. Do this substep again.

- Flip the input coin. If it returns 1, go to the next substep. Otherwise, return either 1 if

As a pushdown automaton, this algorithm (except the "Do this substep again" part) can be expressed as follows. Let the stack have the single symbol EMPTY, and start at the state POS-S1. Based on the current state, the last coin flip (HEADS or TAILS), and the symbol on the top of the stack, set the new state and replace the top stack symbol with zero, one, or two symbols. These *transition rules* can be written as follows:

- (POS-S1, HEADS,
*topsymbol*) → (POS-S2, {*topsymbol*}) (set state to POS-S2, keep*topsymbol*on the stack). - (NEG-S1, HEADS,
*topsymbol*) → (NEG-S2, {*topsymbol*}). - (POS-S1, TAILS, EMPTY) → (ONE, {}) (set state to ONE, pop the top symbol from the stack).
- (NEG-S1, TAILS, EMPTY) → (ONE, {}).
- (POS-S1, TAILS, X) → (ZERO, {}).
- (NEG-S1, TAILS, X) → (ZERO, {}).
- (ZERO,
*flip*,*topsymbol*) → (ZERO, {}). - (POS-S2,
*flip*,*topsymbol*) → Add enough transition rules to the automaton to simulate*g*(*λ*) by a finite-state machine (only possible if*g*is rational with rational coefficients (Mossel and Peres 2005)^{(13)}). Transition to POS-S2-ZERO if the machine outputs 0, or POS-S2-ONE if the machine outputs 1. - (NEG-S2,
*flip*,*topsymbol*) → Same as before, but the transitioning states are NEG-S2-ZERO and NEG-S2-ONE, respectively. - (POS-S2-ONE,
*flip*,*topsymbol*) → (POS-S1, {*topsymbol*, X}) (replace top stack symbol with*topsymbol*, then push X to the stack). - (POS-S2-ZERO,
*flip*, EMPTY) → (NEG-S1, {EMPTY, X}). - (POS-S2-ZERO,
*flip*, X) → (POS-S1, {}). - (NEG-S2-ZERO,
*flip*,*topsymbol*) → (NEG-S1, {*topsymbol*, X}). - (NEG-S2-ONE,
*flip*, EMPTY) → (POS-S1, {EMPTY, X}). - (NEG-S2-ONE,
*flip*, X) → (NEG-S1, {}).

The machine stops when it removes EMPTY from the stack, and the result is either ZERO (0) or ONE (1).

For the following algorithm, which extends the end of Note 1 of the Flajolet paper, the probability is—

*f*(*λ*) = (1 −*λ*) * ∑_{n=0,1,...}*λ*^{H*n}**g*(*λ*)^{n}*(1 −*g*(*λ*))^{(H−1)*n}*choose(*H***n*,*n*),

where *H* ≥ 2 is an integer, and *g* has the same meaning as earlier.

- Set
*d*to 0. - Do the following process repeatedly until this run of the algorithm returns a value:
- Flip the input coin. If it returns 1, go to the next substep. Otherwise, return either 1 if
*d*is 0, or 0 otherwise. - Run a Bernoulli factory algorithm for
*g*(*λ*). If the run returns 1, add (*H*−1) to*d*. Otherwise, subtract 1 from*d*. (Note: This substep is not done again.)

- Flip the input coin. If it returns 1, go to the next substep. Otherwise, return either 1 if

The following algorithm simulates the probability—

*f*(*λ*) = (1 −*λ*) * ∑_{n=0,1,...}*λ*^{n}* (∑_{m=0,1,...,n}*W*(*n*,*m*)**g*(*λ*)^{m}*(1 −*g*(*λ*))^{n−m}*choose(*n*,*m*))

= (1 −*λ*) * ∑_{n=0,1,...}*λ*^{n}* (∑_{m=0,1,...,n}*V*(*n*,*m*)**g*(*λ*)^{m}*(1 −*g*(*λ*))^{n−m}),

where *g* has the same meaning as earlier; *W*(*n*, *m*) is 1 if *m***H* equals (*n*−*m*)**T*, or 0 otherwise; and *H*≥1 and *T*≥1 are integers. (In the first formula, the sum in parentheses is a polynomial in Bernstein form, in the variable *g*(*λ*) and with only zeros and ones as coefficients. Because of the *λ*^{n}, the polynomial gets smaller as *n* gets larger. *V*(*n*, *m*) is the number of *n*-letter words that have *m* heads *and* describe a walk that ends at the beginning.)

- Set
*d*to 0. - Do the following process repeatedly until this run of the algorithm returns a value:
- Flip the input coin. If it returns 1, go to the next substep. Otherwise, return either 1 if
*d*is 0, or 0 otherwise. - Run a Bernoulli factory algorithm for
*g*(*λ*). If the run returns 1 ("heads"), add*H*to*d*. Otherwise ("tails"), subtract*T*from*d*. (Note: This substep is not done again.)

- Flip the input coin. If it returns 1, go to the next substep. Otherwise, return either 1 if

## General Arbitrary-Precision Samplers

### Uniform Distribution Inside N-Dimensional Shapes

The following is a general way to describe an arbitrary-precision sampler for generating a point uniformly at random inside a geometric shape located entirely in the hypercube [0, *d1*]×[0, *d2*]×...×[0,*dN*] in *N*-dimensional space, where *d1*, ..., *dN* are integers greater than 0. The algorithm will generally work if the shape is reasonably defined; the technical requirements are that the shape must have a zero-volume (Lebesgue measure zero) boundary and a nonzero finite volume, and must assign zero probability to any zero-volume subset of it (such as a set of individual points).

The sampler's description has the following skeleton.

- Generate
*N*empty uniform partially-sampled random numbers (PSRNs), with a positive sign, an integer part of 0, and an empty fractional part. Call the PSRNs*p1*,*p2*, ...,*pN*. - Set
*S*to*base*, where*base*is the base of digits to be stored by the PSRNs (such as 2 for binary or 10 for decimal). Then set*N*coordinates to 0, call the coordinates*c1*,*c2*, ...,*cN*. Then set*d*to 1. Then, for each coordinate (*c1*, ...,*cN*), set that coordinate to an integer in [0,*dX*), chosen uniformly at random, where*dX*is the corresponding dimension's size. - For each coordinate (
*c1*, ...,*cN*), multiply that coordinate by*base*and add a digit chosen uniformly at random to that coordinate. - This step uses a function known as
**InShape**, which takes the coordinates of a box and returns one of three values:*YES*if the box is entirely inside the shape;*NO*if the box is entirely outside the shape; and*MAYBE*if the box is partly inside and partly outside the shape, or if the function is unsure.**InShape**, as well as the divisions of the coordinates by*S*, should be implemented using rational arithmetic. Instead of dividing those coordinates this way, an implementation can pass*S*as a separate parameter to**InShape**. See the**appendix**for further implementation notes. In this step, run**InShape**using the current box, whose coordinates in this case are ((*c1*/*S*,*c2*/*S*, ...,*cN*/*S*), ((*c1*+1)/*S*, (*c2*+1)/*S*, ..., (*cN*+1)/*S*)). - If the result of
**InShape**is*YES*, then the current box was accepted. If the box is accepted this way, then at this point,*c1*,*c2*, etc., will each store the*d*digits of a coordinate in the shape, expressed as a number in the interval [0, 1], or more precisely, a range of numbers. (For example, if*base*is 10,*d*is 3, and*c1*is 342, then the first coordinate is 0.342..., or more precisely, a number in the interval [0.342, 0.343].) In this case, do the following:- For each coordinate (
*c1*, ...,*cN*), transfer that coordinate's least significant digits to the corresponding PSRN's fractional part. The variable*d*tells how many digits to transfer to each PSRN this way. Then, for each coordinate (*c1*, ...,*cN*), set the corresponding PSRN's integer part to floor(*cX*/*base*^{d}), where*cX*is that coordinate. (For example, if*base*is 10,*d*is 3, and*c1*is 7342, set*p1*'s fractional part to [3, 4, 2] and*p1*'s integer part to 7.) - For each PSRN (
*p1*, ...,*pN*), optionally fill that PSRN with uniform random digits as necessary to give its fractional part the desired number of digits (similarly to**FillGeometricBag**). - For each PSRN, optionally do the following: Generate an unbiased random bit. If that bit is 1 (which happens with probability 1/2), set that PSRN's sign to negative. (This will result in a symmetric shape in the corresponding dimension. This step can be done for some PSRNs and not others.)
- Return the PSRNs
*p1*, ...,*pN*, in that order.

- For each coordinate (
- If the result of
**InShape**is*NO*, then the current box lies outside the shape and is rejected. In this case, go to step 2. - If the result of
**InShape**is*MAYBE*, it is not known whether the current box lies fully inside or fully outside the shape, so multiply*S*by*base*, then add 1 to*d*, then go to step 3.

Notes:

- See (Li and El Gamal 2016)
^{(15)}and (Oberhoff 2018)^{(16)}for related work on encoding random points uniformly distributed in a shape.- Rejection sampling on a shape is subject to the "curse of dimensionality", since typical shapes of high dimension will tend to cover much less volume than their bounding boxes, so that it would take a lot of time on average to accept a high-dimensional box. Moreover, the more area the shape takes up in the bounding box, the higher the acceptance rate.
- Devroye (1986, chapter 8, section 3)
^{(17)}describes grid-based methods to optimize random point generation. In this case, the space is divided into a grid of boxes each with size 1/base^{k}in all dimensions; the result ofInShapeis calculated for each such box and that box labeled with the result; all boxes labeledNOare discarded; and the algorithm is modified by adding the following after step 2: "2a. Choose a precalculated box uniformly at random, then setc1, ...,cNto that box's coordinates, then setdtokand setStobase^{k}. If a box labeledYESwas chosen, follow the substeps in step 5. If a box labeledMAYBEwas chosen, multiplySbybaseand add 1 tod." (For example, ifbaseis 10,kis 1,Nis 2, andd1=d2= 1, the space could be divided into a 10×10 grid, made up of 100 boxes each of size (1/10)×(1/10). Then,InShapeis precalculated for the box with coordinates ((0, 0), (1, 1)), the box ((0, 1), (1, 2)), and so on [the boxes' coordinates are stored as just given, butInShapeinstead uses those coordinates divided bybase^{k}, or 10^{1}in this case], each such box is labeled with the result, and boxes labeledNOare discarded. Finally the algorithm above is modified as just given.)- Besides a grid, another useful data structure is a
mapped regular paving(Harlow et al. 2012)^{(18)}, which can be described as a binary tree with nodes each consisting of zero or two child nodes and a marking value. Start with a box that entirely covers the desired shape. CalculateInShapefor the box. If it returnsYESorNOthen mark the box withYESorNO, respectively; otherwise it returnsMAYBE, so divide the box along its first widest coordinate into two sub-boxes, set the parent box's children to those sub-boxes, then repeat this process for each sub-box (or if the nesting level is too deep, instead mark each sub-box withMAYBE). Then, to generate a random point (with a base-2 fractional part), start from the root, then: (1) If the box is markedYES, return a uniform random point between the given coordinates using theRandUniformInRangealgorithm; or (2) if the box is markedNO, start over from the root; or (3) if the box is markedMAYBE, get the two child boxes bisected from the box, choose one of them with equal probability (e.g., choose the left child if an unbiased random bit is 0, or the right child otherwise), mark the chosen child with the result ofInShapefor that child, and repeat this process with that child; or (4) the box has two child boxes, so choose one of them with equal probability and repeat this process with that child.- The algorithm can be adapted to return 1 with probability equal to its acceptance rate (which equals the shape's volume divided by the hyperrectangle's volume), and return 0 with the opposite probability. In this case, replace steps 5 and 6 with the following: "5. If the result of
InShapeisYES, return 1.; 6. If the result ofInShapeisNO, return 0." (I thank BruceET of the Cross Validated community for leading me to this insight.)

Examples:

- The following example generates a point inside a quarter diamond (centered at (0, ..., 0), "radius"
kwherekis an integer greater than 0): Letd1, ...,dNbek. LetInShapereturnYESif ((c1+1) + ... + (cN+1)) <S*k;NOif (c1+ ... +cN) >S*k; andMAYBEotherwise. ForN=2, the acceptance rate (see note 5) is 1/2. For a full diamond, step 5.3 in the algorithm is done for each of theNdimensions.- The following example generates a point inside a quarter hypersphere (centered at (0, ..., 0), radius
kwherekis an integer greater than 0): Letd1, ...,dNbek. LetInShapereturnYESif ((c1+1)^{2}+ ... + (cN+1)^{2}) < (S*k)^{2};NOif (c1^{2}+ ... +cN^{2}) > (S*k)^{2}; andMAYBEotherwise. ForN=2, the acceptance rate (see note 5) isπ/4. For a full hypersphere with radius 1, step 5.3 in the algorithm is done for each of theNdimensions. In the case of a 2-dimensional disk, this algorithm thus adapts the well-known rejection technique of generating X and Y coordinates until X^{2}+Y^{2}< 1 (e.g., (Devroye 1986, p. 230 et seq.)^{(17)}).- The following example generates a point inside a quarter
astroid(centered at (0, ..., 0), radiuskwherekis an integer greater than 0): Letd1, ...,dNbek. LetInShapereturnYESif ((sk−c1−1)^{2}+ ... + (sk−cN−1)^{2}) >sk^{2};NOif ((sk−c1)^{2}+ ... + (sk−cN)^{2}) <sk^{2}; andMAYBEotherwise, wheresk=S*k. ForN=2, the acceptance rate (see note 5) is 1 −π/4. For a full astroid, step 5.3 in the algorithm is done for each of theNdimensions.

### Building an Arbitrary-Precision Sampler

If a continuous distribution—

- has a probability density function (PDF), or a function proportional to the PDF, with a known symbolic form,
- has a cumulative distribution function (CDF) with a known symbolic form,
- takes on only values 0 or greater, and
- has a PDF that has an infinite tail to the right, is bounded from above (that is,
*PDF(0)*is other than infinity), and decreases monotonically,

it may be possible to describe an arbitrary-precision sampler for that distribution. Such a description has the following skeleton.

- With probability
*A*, set*intval*to 0, then set*size*to 1, then go to step 4.*A*is calculated as (*CDF*(1) −*CDF*(0)) / (1−*CDF*(0)), where*CDF*is the distribution's CDF. This should be found analytically using a computer algebra system such as SymPy.- The symbolic form of
*A*will help determine which Bernoulli factory algorithm, if any, will simulate the probability; if a Bernoulli factory exists, it should be used.

- Set
*intval*to 1 and set*size*to 1. - With probability
*B*(*size*,*intval*), go to step 4. Otherwise, add*size*to*intval*, then multiply*size*by 2, then repeat this step.- This step chooses an interval beyond 1, and grows this interval by geometric steps, so that an appropriate interval is chosen with the correct probability.
- The probability
*B*(*size*,*intval*) is the probability that the interval is chosen given that the previous intervals weren't chosen, and is calculated as (*CDF*(*size*+*intval*) −*CDF*(*intval*)) / (1−*CDF*(*intval*)). This should be found analytically using a computer algebra system such as SymPy. - The symbolic form of
*B*will help determine which Bernoulli factory algorithm, if any, will simulate the probability; if a Bernoulli factory exists, it should be used.

- Generate an integer in the interval [
*intval*,*intval*+*size*) uniformly at random, call it*i*. - Create a positive-sign zero-integer-part uniform PSRN,
*ret*. - Create an input coin that calls
**SampleGeometricBag**on the PSRN*ret*. Run a Bernoulli factory algorithm that simulates the probability*C*(*i*,*λ*), using the input coin (here,*λ*is the probability built up in*ret*via**SampleGeometricBag**, and lies in the interval [0, 1]). If the call returns 0, go to step 4.- The probability
*C*(*i*,*λ*) is calculated as*PDF*(*i*+*λ*) /*M*, where*PDF*is the distribution's PDF or a function proportional to the PDF, and should be found analytically using a computer algebra system such as SymPy. - In this formula,
*M*is any convenient number in the interval [*PDF*(*intval*), max(1,*PDF*(*intval*))], and should be as low as feasible.*M*serves to ensure that*C*is as close as feasible to 1 (to improve acceptance rates), but no higher than 1. The choice of*M*can vary for each interval (each value of*intval*, which can only be 0, 1, or a power of 2). Any such choice for*M*preserves the algorithm's correctness because the PDF has to be monotonically decreasing and a new interval isn't chosen when*λ*is rejected. - The symbolic form of
*C*will help determine which Bernoulli factory algorithm, if any, will simulate the probability; if a Bernoulli factory exists, it should be used.

- The probability
- The PSRN
*ret*was accepted, so set*ret*'s integer part to*i*, then optionally fill*ret*with uniform random digits as necessary to give its fractional part the desired number of digits (similarly to**FillGeometricBag**), then return*ret*.

Examples of algorithms that use this skeleton are the algorithm for the **ratio of two uniform random variates**, as well as the algorithms for the Rayleigh distribution and for the reciprocal of power of uniform, both given later.

Perhaps the most difficult part of describing an arbitrary-precision sampler with this skeleton is finding the appropriate Bernoulli factory for the probabilities *A*, *B*, and *C*, especially when these probabilities have a non-trivial symbolic form.

Note:The algorithm skeleton uses ideas similar to the inversion-rejection method described in (Devroye 1986, ch. 7, sec. 4.6)^{(17)}; an exception is that instead of generating a uniform random variate and comparing it to calculations of a CDF, this algorithm uses conditional probabilities of choosing a given piece, probabilities labeledAandB. This approach was taken so that the CDF of the distribution in question is never directly calculated in the course of the algorithm, which furthers the goal of sampling with arbitrary precision and without using floating-point arithmetic.

### Mixtures

A *mixture* involves sampling one of several distributions, where each distribution has a separate probability of being sampled. In general, an arbitrary-precision sampler is possible if all of the following conditions are met:

- There is a finite number of distributions to choose from.
- The probability of sampling each distribution is a rational number, or it can be expressed as a function for which a
**Bernoulli factory algorithm**exists. - For each distribution, an arbitrary-precision sampler exists.

Example:One example of a mixture is two beta distributions, with separate parameters. One beta distribution is chosen with probability exp(−3) (a probability for which a Bernoulli factory algorithm exists) and the other is chosen with the opposite probability. For the two beta distributions, an arbitrary-precision sampling algorithm exists (see my article onpartially-sampled random numbers (PSRNs)for details).

### Weighted Choice Involving PSRNs

Given *n* uniform PSRNs, called *weights*, with labels starting from 0 and ending at *n*−1, the following algorithm chooses an integer in [0, *n*) with probability proportional to its weight. Each weight's sign must be positive.

- Create an empty list, then for each weight starting with weight 0, add the weight's integer part plus 1 to that list. For example, if the weights are [2.22...,0.001...,1.3...], in that order, the list will be [3, 1, 2], corresponding to integers 0, 1, and 2, in that order. Call the list just created the
*rounded weights list*. - Choose an integer
*i*with probability proportional to the weights in the rounded weights list. This can be done, for example, by taking the result of**WeightedChoice**(*list*), where*list*is the rounded weights list and**WeightedChoice**is given in "**Randomization and Samping Methods**". - Run
**URandLessThanReal**(*w*,*rw*), where*w*is the original weight for integer*i*, and*rw*is the rounded weight for integer*i*in the rounded weights list. That algorithm returns 1 if*w*turns out to be less than*rw*. If the result is 1, return*i*. Otherwise, go to step 2.

### Cumulative Distribution Functions

Suppose we have a real number *z* (which might be a uniform PSRN or a rational number). If a continuous distribution—

- has a probability density function (PDF) (as with the normal or exponential distribution), and
- has an arbitrary-precision sampler that returns a uniform PSRN
*X*,

then it's possible to generate 1 with the same probability as the sampler returns an *X* that is less than or equal to *z*, as follows:

- Run the arbitrary-precision sampler to generate
*X*, a uniform PSRN. - Run
**URandLess**(if*z*is a uniform PSRN) or**URandLessThanReal**(if*z*is a real number) with parameters*X*and*z*, in that order, and return the result.

Specifically, the probability of returning 1 is the *cumulative distribution function* (CDF) for the distribution of *X*.

Notes:

- Although step 2 of the algorithm checks whether
Xis merely less thanz, this is still correct; because the distribution ofXhas a PDF,Xis less thanzwith the same probability asXis less than or equal toz.- All probability distributions have a CDF, not just those with a PDF, but also discrete ones such as Poisson or binomial.

## Specific Arbitrary-Precision Samplers

### Rayleigh Distribution

The following is an arbitrary-precision sampler for the Rayleigh distribution with parameter *s*, which is a rational number greater than 0.

- Set
*k*to 0, and set*y*to 2 **s***s*. - With probability exp(−(
*k** 2 + 1)/*y*), go to step 3. Otherwise, add 1 to*k*and repeat this step. (The probability check should be done with the**exp(−**in "*x*/*y*) algorithm**Bernoulli Factory Algorithms**", with*x*/*y*= (*k** 2 + 1)/*y*.) - (Now we sample the piece located at [
*k*,*k*+ 1).) Create a positive-sign zero-integer-part uniform PSRN, and create an input coin that returns the result of**SampleGeometricBag**on that uniform PSRN. - Set
*ky*to*k***k*/*y*. - (At this point, we simulate exp(−
*U*^{2}/*y*), exp(−*k*^{2}/*y*) , exp(−*U***k**2/*y*), as well as a scaled-down version of*U*+*k*, where*U*is the number built up by the uniform PSRN.) Call the**exp(−**with*x*/*y*) algorithm*x*/*y*=*ky*, then call the**exp(−(**using the input coin from step 2,*λ*^{k}**x*)) algorithm*x*= 1/*y*, and*k*= 2, then call the first or third algorithm for**exp(−(**using the same input coin,*λ*^{k}**c*))*c*= floor(*k** 2 /*y*), and*k*= 1, then call the**sub-algorithm**given later with the uniform PSRN and*k*=*k*. If all of these calls return 1, the uniform PSRN was accepted. Otherwise, remove all digits from the uniform PSRN's fractional part and go to step 4. - If the uniform PSRN, call it
*ret*, was accepted by step 5, set*ret*'s integer part to*k*, then optionally fill*ret*with uniform random digits as necessary to give its fractional part the desired number of digits (similarly to**FillGeometricBag**), and return*ret*.

The sub-algorithm below simulates a probability equal to (*U*+*k*)/*base*^{z}, where *U* is the number built by the uniform PSRN, *base* is the base (radix) of digits stored by that PSRN, *k* is an integer 0 or greater, and *z* is the number of significant digits in *k* (for this purpose, *z* is 0 if *k* is 0).

For base 2:

- Set
*N*to 0. - Generate an unbiased random bit. If that bit is 1 (which happens with probability 1/2), go to the next step. Otherwise, add 1 to
*N*and repeat this step. - If
*N*is less than*z*, return rem(*k*/ 2^{z − 1 − N}, 2). (Alternatively, shift*k*to the right, by*z*− 1 −*N*bits, then return*k**AND*1, where "*AND*" is a bitwise AND-operation.) - Subtract
*z*from*N*. Then, if the item at position*N*in the uniform PSRN's fractional part (positions start at 0) is not set to a digit (e.g., 0 or 1 for base 2), set the item at that position to a digit chosen uniformly at random (e.g., either 0 or 1 for base 2), increasing the capacity of the uniform PSRN's fractional part as necessary. - Return the item at position
*N*.

For bases other than 2, such as 10 for decimal, this can be implemented as follows (based on **URandLess**):

- Set
*i*to 0. - If
*i*is less than*z*:- Set
*da*to rem(*k*/ 2^{z − 1 − i},*base*), and set*db*to a digit chosen uniformly at random (that is, an integer in the interval [0,*base*)). - Return 1 if
*da*is less than*db*, or 0 if*da*is greater than*db*.

- Set
- If
*i*is*z*or greater:- If the digit at position (
*i*−*z*) in the uniform PSRN's fractional part is not set, set the item at that position to a digit chosen uniformly at random (positions start at 0 where 0 is the most significant digit after the point, 1 is the second most significant, etc.). - Set
*da*to the item at that position, and set*db*to a digit chosen uniformly at random (that is, an integer in the interval [0,*base*)). - Return 1 if
*da*is less than*db*, or 0 if*da*is greater than*db*.

- If the digit at position (
- Add 1 to
*i*and go to step 3.

### Hyperbolic Secant Distribution

The following algorithm adapts the rejection algorithm from p. 472 in (Devroye 1986)^{(17)} for arbitrary-precision sampling.

- Generate
*ret*, an exponential random variate with a rate of 1 via the**ExpRand**or**ExpRand2**algorithm described in my article on**PSRNs**. This number will be a uniform PSRN. - Set
*ip*to 1 plus*ret*'s integer part. - (The rest of the algorithm accepts
*ret*with probability 1/(1+*ret*).) With probability*ip*/(1+*ip*), generate a number that is 1 with probability 1/*ip*and 0 otherwise. If that number is 1,*ret*was accepted, in which case optionally fill it with uniform random digits as necessary to give its fractional part the desired number of digits (similarly to**FillGeometricBag**), then set*ret*'s sign to positive or negative with equal probability, then return*ret*. - Call
**SampleGeometricBag**on*ret*'s fractional part (ignore*ret*'s integer part and sign). If the call returns 1, go to step 1. Otherwise, go to step 3.

### Reciprocal of Power of Uniform

The following algorithm generates a PSRN of the form 1/*U*^{1/x}, where *U* is a uniform random variate in [0, 1] and *x* is an integer greater than 0.

- Set
*intval*to 1 and set*size*to 1. - With probability (4
^{x}−2^{x})/4^{x}, go to step 3. Otherwise, add*size*to*intval*, then multiply*size*by 2, then repeat this step. - Generate an integer in the interval [
*intval*,*intval*+*size*) uniformly at random, call it*i*. - Create a positive-sign zero-integer-part uniform PSRN,
*ret*. - Create an input coin that calls
**SampleGeometricBag**on the PSRN*ret*. Call the**algorithm for**in "*d*^{k}/ (*c*+*λ*)^{k}**Bernoulli Factory Algorithms**", using the input coin, where*d*=*intval*,*c*=*i*, and*k*=*x*+ 1 (here,*λ*is the probability built up in*ret*via**SampleGeometricBag**, and lies in the interval [0, 1]). If the call returns 0, go to step 3. - The PSRN
*ret*was accepted, so set*ret*'s integer part to*i*, then optionally fill*ret*with uniform random digits as necessary to give its fractional part the desired number of digits (similarly to**FillGeometricBag**), then return*ret*.

This algorithm uses the skeleton described earlier in "Building an Arbitrary-Precision Sampler". Here, the probabilities *A*, *B*, and *C* are as follows:

*A*= 0, since the random variate can't lie in the interval [0, 1).*B*= (4^{x}−2^{x})/4^{x}.*C*= (*x*/(*i*+*λ*)^{x+1}) /*M*. Ideally,*M*is either*x*if*intval*is 1, or*x*/*intval*^{x+1}otherwise. Thus, the ideal form for*C*is*intval*^{x+1}/(*i*+*λ*)^{x+1}.

### Distribution of *U*/(1−*U*)

The following algorithm generates a PSRN distributed as *U*/(1−*U*), where *U* is a uniform random variate in [0, 1].

- Generate an unbiased random bit. If that bit is 1 (which happens with probability 1/2), set
*intval*to 0, then set*size*to 1, then go to step 4. - Set
*intval*to 1 and set*size*to 1. - With probability
*size*/(*size*+*intval*+ 1), go to step 4. Otherwise, add*size*to*intval*, then multiply*size*by 2, then repeat this step. - Generate an integer in the interval [
*intval*,*intval*+*size*) uniformly at random, call it*i*. - Create a positive-sign zero-integer-part uniform PSRN,
*ret*. - Create an input coin that calls
**SampleGeometricBag**on the PSRN*ret*. Call the**algorithm for**in "*d*^{k}/ (*c*+*λ*)^{k}**Bernoulli Factory Algorithms**", using the input coin, where*d*=*intval*+ 1,*c*=*i*+ 1, and*k*= 2 (here,*λ*is the probability built up in*ret*via**SampleGeometricBag**, and lies in the interval [0, 1]). If the call returns 0, go to step 4. - The PSRN
*ret*was accepted, so set*ret*'s integer part to*i*, then optionally fill*ret*with uniform random digits as necessary to give its fractional part the desired number of digits (similarly to**FillGeometricBag**), then return*ret*.

This algorithm uses the skeleton described earlier in "Building an Arbitrary-Precision Sampler". Here, the probabilities *A*, *B*, and *C* are as follows:

*A*= 1/2.*B*=*size*/(*size*+*intval*+ 1).*C*= (1/(*i*+*λ*+1)^{2}) /*M*. Ideally,*M*is 1/(*intval*+1)^{2}. Thus, the ideal form for*C*is (*intval*+1)^{2}/(*i*+*λ*+1)^{2}.

### Arc-Cosine Distribution

Here we reimplement an example from Devroye's book *Non-Uniform Random Variate Generation* (Devroye 1986, pp. 128–129)^{(17)}. The following arbitrary-precision sampler generates a random variate from a distribution with the following cumulative distribution function (CDF): `1 - cos(pi*x/2).`

The random variate will be in the interval [0, 1]. This algorithm's result is the same as applying acos(*U*)*2/π, where *U* is a uniform [0, 1] random variate, as pointed out by Devroye. The algorithm follows.

- Call the
**kthsmallest**algorithm with`n = 2`

and`k = 2`

, but without filling it with digits at the last step. Let*ret*be the result. - Set
*m*to 1. - Call the
**kthsmallest**algorithm with`n = 2`

and`k = 2`

, but without filling it with digits at the last step. Let*u*be the result. - With probability 4/(4*
*m***m*+ 2**m*), call the**URandLess**algorithm with parameters*u*and*ret*in that order, and if that call returns 1, call the**algorithm for π / 4**, described in "**Bernoulli Factory Algorithms**", twice, and if both of these calls return 1, add 1 to*m*and go to step 3. (Here, we incorporate an erratum in the algorithm on page 129 of the book.) - If
*m*is odd, optionally fill*ret*with uniform random digits as necessary to give its fractional part the desired number of digits (similarly to**FillGeometricBag**), and return*ret*. - If
*m*is even, go to step 1.

And here is Python code that implements this algorithm. The code uses floating-point arithmetic only at the end, to convert the result to a convenient form, and it relies on methods from *randomgen.py* and *bernoulli.py*.

def example_4_2_1(rg, bern, precision=53): while True: ret=rg.kthsmallest_psrn(2,2) k=1 while True: u=rg.kthsmallest_psrn(2,2) kden=4*k*k+2*k # erratum incorporated if randomgen.urandless(rg,u, ret) and \ rg.zero_or_one(4, kden)==1 and \ bern.zero_or_one_pi_div_4()==1 and \ bern.zero_or_one_pi_div_4()==1: k+=1 elif (k&1)==1: return randomgen.urandfill(rg,ret,precision)/(1<<precision) else: break

### Logistic Distribution

The following new algorithm generates a partially-sampled random number that follows the logistic distribution.

- Set
*k*to 0. - (Choose a 1-unit-wide piece of the logistic density.) Run the
**algorithm for (1+exp(**described in "*k*))/(1+exp(*k*+1))**Bernoulli Factory Algorithms**"). If the call returns 0, add 1 to*k*and repeat this step. Otherwise, go to step 3. - (The rest of the algorithm samples from the chosen piece.) Generate a uniform(0, 1) random variate, call it
*f*. - (Steps 4 through 7 succeed with probability exp(−(
*f*+*k*))/(1+exp(−(*f*+*k*)))^{2}.) Generate an unbiased random bit. If that bit is 1 (which happens with probability 1/2), go to step 3. - Run the
**algorithm for exp(−**(described in "Bernoulli Factory Algorithms"), then*k*/1)**sample from the number**(e.g., call*f***SampleGeometricBag**on*f*if*f*is implemented as a uniform PSRN). If any of these calls returns 0, go to step 4. - Generate an unbiased random bit. If that bit is 1 (which happens with probability 1/2), accept
*f*. If*f*is accepted this way, set*f*'s integer part to*k*, then optionally fill*f*with uniform random digits as necessary to give its fractional part the desired number of digits (similarly to**FillGeometricBag**), then set*f*'s sign to positive or negative with equal probability, then return*f*. - Run the
**algorithm for exp(−**and*k*/1)**sample from the number**(e.g., call*f***SampleGeometricBag**on*f*if*f*is implemented as a uniform PSRN). If both calls return 1, go to step 3. Otherwise, go to step 6.

### Cauchy Distribution

Uses the skeleton for the uniform distribution inside N-dimensional shapes.

- Generate two empty PSRNs, with a positive sign, an integer part of 0, and an empty fractional part. Call the PSRNs
*p1*and*p2*. - Set
*S*to*base*, where*base*is the base of digits to be stored by the PSRNs (such as 2 for binary or 10 for decimal). Then set*c1*and*c2*each to 0. Then set*d*to 1. - Multiply
*c1*and*c2*each by*base*and add a digit chosen uniformly at random to that coordinate. - If ((
*c1*+1)^{2}+ (*c2*+1)^{2}) <*S*^{2}, then do the following:- Transfer
*c1*'s least significant digits to*p1*'s fractional part, and transfer*c2*'s least significant digits to*p2*'s fractional part. The variable*d*tells how many digits to transfer to each PSRN this way. (For example, if*base*is 10,*d*is 3, and*c1*is 342, set*p1*'s fractional part to [3, 4, 2].) - Run the
**UniformDivide**algorithm (described in the article on PSRNs) on*p1*and*p2*, in that order, then set the resulting PSRN's sign to positive or negative with equal probability, then return that PSRN.

- Transfer
- If (
*c1*^{2}+*c2*^{2}) >*S*^{2}, then go to step 2. - Multiply
*S*by*base*, then add 1 to*d*, then go to step 3.

### Exponential Distribution with Unknown Rate *λ*, Lying in (0, 1]

Exponential random variates can be generated using an input coin of unknown probability of heads of *λ* (which can be in the interval (0, 1]), by generating arrival times in a *Poisson process* of rate 1, then *thinning* the process using the coin. The arrival times that result will be exponentially distributed with rate *λ*. I found the basic idea in the answer to a **Mathematics Stack Exchange question**, and thinning of Poisson processes is discussed, for example, in Devroye (1986, chapter six)^{(17)}. The algorithm follows:

- Generate an exponential(1) random variate using the
**ExpRand**or**ExpRand2**algorithm (with*λ*= 1), call it*ex*. - (Thinning step.) Flip the input coin. If it returns 1, return
*ex*. - Generate another exponential(1) random variate using the
**ExpRand**or**ExpRand2**algorithm (with*λ*= 1), call it*ex2*. Then run**UniformAdd**on*ex*and*ex2*and set*ex*to the result. Then go to step 2.

Notice that the algorithm's average running time increases as *λ* decreases.

### Exponential Distribution with Rate ln(*x*)

The following new algorithm generates a partially-sampled random number that follows the exponential distribution with rate ln(*x*). This is useful for generating a base-*x* logarithm of a uniform(0,1) random variate. This algorithm has two supported cases:

*x*is a rational number that's greater than 1. In that case, let*b*be floor(ln(*x*)/ln(2)).*x*is a uniform PSRN with a positive sign and an integer part of 1 or greater. In that case, let*b*be floor(ln(*i*)/ln(2)), where*i*is*x*'s integer part.

The algorithm follows.

- (Samples the integer part of the random variate.) Generate a number that is 1 with probability 1/
*x*and 0 otherwise, repeatedly until a zero is generated this way. Set*k*to the number of ones generated this way. (This is also known as a "geometric random variate", but this terminology is avoided because it has conflicting meanings in academic works.)- If
*x*is a rational number and a power of 2, this step can be implemented by generating blocks of*b*unbiased random bits until a**non-zero**block of bits is generated this way, then setting*k*to the number of**all-zero**blocks of bits generated this way. - If
*x*is a uniform PSRN, this step is implemented as follows: Run the first subalgorithm (later in this section) repeatedly until a run returns 0. Set*k*to the number of runs that returned 1 this way.

- If
- (The rest of the algorithm samples the fractional part.) Create
*f*, a uniform PSRN with a positive sign, an empty fractional part, and an integer part of 0. - Create a
*μ*input coin that does the following: "**Sample from the number**(e.g., call*f***SampleGeometricBag**on*f*if*f*is implemented as a uniform PSRN), then run the**algorithm for ln(1+**(given in "Bernoulli Factory Algorithms") with*y*/*z*)*y*/*z*= 1/1. If both calls return 1, return 1. Otherwise, return 0." (This simulates the probability*λ*=*f**ln(2).) Then:- If
*x*is a rational number, but not a power of 2, also create a*ν*input coin that does the following: "**Sample from the number**, then run the*f***algorithm for ln(1 +**with*y*/*z*)*y*/*z*= (*x*−2^{b})/2^{b}. If both calls return 1, return 1. Otherwise, return 0." - If
*x*is a uniform PSRN, also create a*ρ*input coin that does the following: "Return the result of the second subalgorithm (later in this section), given*x*and*b*", and a*ν*input coin that does the following: "**Sample from the number**, then run the*f***algorithm for ln(1 +**, using the*λ*)*ρ*input coin. If both calls return 1, return 1. Otherwise, return 0."

- If
- Run the
**algorithm for exp(−**(described in "Bernoulli Factory Algorithms")*λ*)*b*times, using the*μ*input coin. If a*ν*input coin was created in step 3, run the same algorithm once, using the*ν*input coin. If all these calls return 1, accept*f*. If*f*is accepted this way, set*f*'s integer part to*k*, then optionally fill*f*with uniform random digits as necessary to give its fractional part the desired number of digits (similarly to**FillGeometricBag**), then return*f*. - If
*f*was not accepted by the previous step, go to step 2.

Note: Abounded exponentialrandom variate with rate ln(x) and bounded bymhas a similar algorithm to this one. Step 1 is changed to read as follows: "Do the followingmtimes or until a zero is generated, whichever happens first: 'Generate a number that is 1 with probability 1/xand 0 otherwise'. Then setkto the number of ones generated this way. (kis a so-called bounded-geometric(1−1/x,m) random variate, which an algorithm of Bringmann and Friedrich (2013)^{(19)}can generate as well. Ifxis a power of 2, this can be implemented by generating blocks ofbunbiased random bits until anon-zeroblock of bits ormblocks of bits are generated this way, whichever comes first, then settingkto the number ofall-zeroblocks of bits generated this way.) Ifkism, returnm(thismis a constant, not a uniform PSRN; if the algorithm would otherwise return a uniform PSRN, it can return something else in order to distinguish this constant from a uniform PSRN)." Additionally, instead of generating a uniform(0,1) random variate in step 2, a uniform(0,μ) random variate can be generated instead, such as a uniform PSRN generated viaRandUniformFromReal, to implement an exponential distribution bounded bym+μ(whereμis a real number in the interval (0, 1)).

The following generator for the **rate ln(2)** is a special case of the previous algorithm and is useful for generating a base-2 logarithm of a uniform(0,1) random variate. Unlike the similar algorithm of Ahrens and Dieter (1972)^{(20)}, this one doesn't require a table of probability values.

- (Samples the integer part of the random variate. This will be geometrically distributed with parameter 1/2.) Generate unbiased random bits until a zero is generated this way. Set
*k*to the number of ones generated this way. - (The rest of the algorithm samples the fractional part.) Generate a uniform (0, 1) random variate, call it
*f*. - Create an input coin that does the following: "
**Sample from the number**(e.g., call*f***SampleGeometricBag**on*f*if*f*is implemented as a uniform PSRN), then run the**algorithm for ln(1+**(given in "Bernoulli Factory Algorithms") with*y*/*z*)*y*/*z*= 1/1. If both calls return 1, return 1. Otherwise, return 0." (This simulates the probability*λ*=*f**ln(2).) - Run the
**algorithm for exp(−**(described in "Bernoulli Factory Algorithms"), using the input coin from the previous step. If the call returns 1, accept*λ*)*f*. If*f*is accepted this way, set*f*'s integer part to*k*, then optionally fill*f*with uniform random digits as necessary to give its fractional part the desired number of digits (similarly to**FillGeometricBag**), then return*f*. - If
*f*was not accepted by the previous step, go to step 2.

The first subalgorithm samples the probability 1/*x*, where *x*≥1 is a uniform PSRN:

- Set
*c*to*x*'s integer part. With probability*c*/ (1 +*c*), return a number that is 1 with probability 1/*c*and 0 otherwise. - Run
**SampleGeometricBag**on*x*(which ignores*x*'s integer part and sign). If the run returns 1, return 0. Otherwise, go to step 1.

The second subalgorithm samples the probability (*x*−2^{b})/2^{b}, where *x*≥1 is a uniform PSRN and *b*≥0 is an integer:

- Subtract 2
^{b}from*x*'s integer part, then create*y*as**RandUniformFromReal**(2^{b}), then run**URandLessThanReal**(*x*,*y*), then add 2^{b}back to*x*'s integer part. - Return the result of
**URandLessThanReal**from step 1.

### Symmetric Geometric Distribution

Samples from the symmetric geometric distribution from (Ghosh et al. 2012)^{(21)}, with parameter *λ*, in the form of an input coin with unknown probability of heads of *λ*.

- Flip the input coin until it returns 0. Set
*n*to the number of times the coin returned 1 this way. - Run a
**Bernoulli factory algorithm for 1/(2−**, using the input coin. If the run returns 1, return*λ*)*n*. Otherwise, return −1 −*n*.

This is similar to an algorithm mentioned in an appendix in Li (2021)^{(22)}, in which the input coin—

- has
*λ*= 1−exp(−*ε*), and - can be built as follows using another input coin with probability of heads
*ε*: "Run a**Bernoulli factory algorithm for exp(−**using the*λ*)*ε*input coin, then return 1 minus the result."

### Lindley Distribution and Lindley-Like Mixtures

A random variate that follows the Lindley distribution (Lindley 1958)^{(23)} with parameter *θ* (a real number greater than 0) can be generated as follows:

- With probability
*w*=*θ*/(1+*θ*), generate an exponential random variate with a rate of*θ*via**ExpRand**or**ExpRand2**(described in my article on PSRNs) and return that number. - Otherwise, generate two exponential random variates with a rate of
*θ*via**ExpRand**or**ExpRand2**, then generate their sum by applying the**UniformAdd**algorithm, then return that sum.

For the Garima distribution (Shanker 2016)^{(24)}, *w* = (1+*θ*)/(2+*θ*).

For the i-Garima distribution (Singh and Das 2020)^{(25)}, *w* = (2+*θ*)/(3+*θ*).

For the mixture-of-weighted-exponential-and-weighted-gamma distribution in (Iqbal and Iqbal 2020)^{(26)}, two exponential random variates (rather than one) are generated in step 1, and three (rather than two) are generated in step 2.

Note:Ifθis a uniform PSRN, then the check "With probabilityw=θ/(1+θ)" can be implemented by running the Bernoulli factory algorithm for(, whered+μ) / ((d+μ) + (c+λ))cis 1;λrepresents an input coin that always returns 0;disθ's integer part, andμis an input coin that runsSampleGeometricBagonθ's fractional part. The check succeeds if the Bernoulli factory algorithm returns 1.

### Gamma Distribution

The path to building an arbitrary-precision gamma sampler makes use of two algorithms: one for integer parameters of *a*, and another for rational parameters of *a* in [0, 1). Both algorithms can be combined into an arbitrary-precision gamma generator for rational parameters *a*>0.

First is an arbitrary-precision sampler for the sum of *n* independent exponential random variates (also known as the Erlang(*n*) or gamma(*n*) distribution), implemented via partially-sampled uniform random variates. Obviously, this algorithm is inefficient for large values of *n*.

- Generate
*n*exponential random variates with a rate of 1 via the**ExpRand**or**ExpRand2**algorithm described in my article on**partially-sampled random numbers (PSRNs)**. These numbers will be uniform PSRNs; this algorithm won't work for exponential PSRNs (e-rands), described in the same article, because the sum of two e-rands may follow a subtly wrong distribution. By contrast, generating exponential random variates via rejection from the uniform distribution will allow unsampled digits to be sampled uniformly at random without deviating from the exponential distribution. - Generate the sum of the random variates generated in step 1 by applying the
**UniformAdd**algorithm given in another document.

The second algorithm takes a parameter *a*, which must be a rational number in the interval (0, 1]. Adapted from Berman's gamma generator, as given in Devroye 1986, p. 419. Because of the power-of-uniform sub-algorithm this algorithm works only if the PSRN's fractional digits are binary (zeros or ones).

- Create a positive-sign zero-integer-part uniform PSRN,
*ret*. If*a*is 1, instead generate an exponential random variate with a rate of 1 via the**ExpRand**or**ExpRand2**algorithm and return that variate. - Generate a PSRN
*ret*using the**power-of-uniform sub-algorithm**(in the page on PSRNs) with*px*/*py*= 1/*a*. - (The following two steps succeed with probability (1−
*ret*)^{1−a}.) Create an input coin that does the following: "Flip the input coin and return 1 minus the result." - Run the
**algorithm for**with*λ*^{x/y}*x*/*y*= 1−*a*, using the input coin from step 3. If the run returns 0, go to step 1. - (At this point,
*ret*is distributed as beta(*a*, 2 −*a*).) Generate two exponential random variates with a rate of 1 via**ExpRand**or**ExpRand2**, then generate their sum by applying the**UniformAdd**algorithm. Call the sum*z*. - Run the
**UniformMultiply**algorithm on*ret*and*z*, and return the result of that algorithm.

The third algorithm combines both algorithms and works for any rational parameter *a*>0.

- Let
*n*= floor(*a*). Generate*ret*, a sum of*n*exponential variates generated via the first algorithm in this section. If*n*=*a*, return*ret*. - Let
*frac*be*a*− floor(*a*). Generate*ret2*, a gamma variate generated via the second algorithm in this section, with*a*=*frac*. - Run the
**UniformAdd**algorithm on*ret*and*ret2*and return the result of that algorithm.

### One-Dimensional Epanechnikov Kernel

Adapted from Devroye and Györfi (1985, p. 236)^{(27)}.

- Generate three empty PSRNs
*a*,*b*, and*c*, with a positive sign, an integer part of 0, and an empty fractional part. - Run
**URandLess**on*a*and*c*in that order, then run**URandLess**on*b*and*c*in that order. If both runs return 1, set*c*to point to*b*. - Generate an unbiased random bit. If that bit is 1 (which will happen with probability 1/2), set
*c*'s sign to negative. - Return
*c*.

### Uniform Distribution Inside Rectellipse

The following example generates a point inside a quarter ** rectellipse** centered at (0, 0) with—

- horizontal "radius"
*d1*, which is an integer greater than 0, - vertical "radius"
*d2*, which is an integer greater than 0, and - squareness parameter
*σ*, which is a rational number in [0, 1].

Use the algorithm in "**Uniform Distribution Inside N-Dimensional Shapes**", except:

Let

**InShape**return—*YES*if (*σ***x1***y1*)^{2}/(*d1***d2*)^{2}− (*x2*/*d1*)^{2}− (*y2*/*d2*)^{2}+ 1 > 0,*NO*if (*σ***x2***y2*)^{2}/(*d1***d2*)^{2}− (*x1*/*d1*)^{2}− (*y1*/*d2*)^{2}+ 1 < 0, and*MAYBE*otherwise,

where

*x1*=*C1*/*S*,*x2*= (*C1*+1)/*S*,*y1*=*C2*/*S*, and*y2*= (*C2*+1)/*S*(these four values define the bounds, along the X and Y dimensions, of the box passed to**InShape**).

For a full rectellipse, step 5.3 in the algorithm is done for each of the two dimensions.

## Requests and Open Questions

- We would like to see new implementations of the following:
- Algorithms that implement
**InShape**for specific closed curves, specific closed surfaces, and specific signed distance functions. Recall that**InShape**determines whether a box lies inside, outside, or partly inside or outside a given curve or surface. - Descriptions of new arbitrary-precision algorithms that use the skeleton given in the section "Building an Arbitrary-Precision Sampler".

- Algorithms that implement
- The appendix contains implementation notes for
**InShape**, which determines whether a box is outside or partially or fully inside a shape. However, practical implementations of**InShape**will generally only be able to evaluate a shape pointwise. What are necessary and/or sufficient conditions that allow an implementation to correctly classify a box just by evaluating the shape pointwise? - Take a polynomial
*f*(*λ*) of even degree*n*of the form choose(*n*,*n*/2)**λ*^{n/2}*(1−*λ*)^{n/2}**k*, where*k*is greater than 1 (thus all*f*'s Bernstein coefficients are 0 except for the middle one, which equals*k*). Suppose*f*(1/2) lies in the interval (0, 1). If we do the degree elevation, described in the appendix, enough times (at least*r*times), then*f*'s Bernstein coefficients will all lie in [0, 1]. The question is: how many degree elevations are enough? A**MathOverflow answer**showed that*r*is at least*m*= (*n*/*f*(1/2)^{2})/(1−*f*(1/2)^{2}), but is it true that floor(*m*)+1 elevations are enough? A

is a finite-state machine that generates a real number's base-2 expansion such as 0.110101100..., driven by flips of a coin. A*finite-state generator**pushdown generator*is a finite-state generator with a stack of memory. Both generators produce real numbers with a given probability distribution. For example, a generator with a loop that outputs 0 or 1 at an equal chance produces a*uniform distribution*. The following questions ask what kinds of distributions are possible with these generators.- Of the probability distributions that a finite-state generator can generate, what is the exact class of:
*Discrete distributions*(those that cover a finite or countably infinite set of values)?*Absolutely continuous distributions*(those with a probability density function such as the uniform or triangular distribution)?*Singular distributions*(covering an uncountable but zero-volume set)?- Absolutely continuous distributions with
*continuous*density functions?

- Same question as 1, but for pushdown generators.
- Of the probability distributions that a pushdown generator can generate, what is the exact class of distributions with piecewise density functions whose pieces have infinitely many "slope" functions? (The answer is known for finite-state generators.)

- Of the probability distributions that a finite-state generator can generate, what is the exact class of:

## Notes

^{(1)}Fishman, D., Miller, S.J., "Closed Form Continued Fraction Expansions of Special Quadratic Irrationals", ISRN Combinatorics Vol. 2013, Article ID 414623 (2013).^{(2)}Łatuszyński, K., Kosmidis, I., Papaspiliopoulos, O., Roberts, G.O., "**Simulating events of unknown probabilities via reverse time martingales**", arXiv:0907.4018v2 [stat.CO], 2009/2011.^{(3)}choose(*n*,*k*) = (1*2*3*...**n*)/((1*...**k*)*(1*...*(*n*−*k*))) =*n*!/(*k*! * (*n*−*k*)!) is a*binomial coefficient*, or the number of ways to choose*k*out of*n*labeled items. It can be calculated, for example, by calculating*i*/(*n*−*i*+1) for each integer*i*in the interval [*n*−*k*+1,*n*], then multiplying the results (Yannis Manolopoulos. 2002. "**Binomial coefficient computation: recursion or iteration?**", SIGCSE Bull. 34, 4 (December 2002), 65–67). For every*m*>0, choose(*m*, 0) = choose(*m*,*m*) = 1 and choose(*m*, 1) = choose(*m*,*m*−1) =*m*; also, in this document, choose(*n*,*k*) is 0 when*k*is less than 0 or greater than*n*.^{(4)}It can also be said that the area under the graph of*x*− floor(1/*x*), where*x*is in the closed interval [0, 1], equals 1 minus*γ*. See, for example, Havil, J.,*Gamma: Exploring Euler's Constant*, 2003.^{(5)}Citterio, M., Pavani, R., "A Fast Computation of the Best k-Digit Rational Approximation to a Real Number", Mediterranean Journal of Mathematics 13 (2016).^{(6)}Łatuszyński, K., Kosmidis, I., Papaspiliopoulos, O., Roberts, G.O., "**Simulating events of unknown probabilities via reverse time martingales**", arXiv:0907.4018v2 [stat.CO], 2009/2011.^{(7)}Thomas, A.C., Blanchet, J., "**A Practical Implementation of the Bernoulli Factory**", arXiv:1106.2508v3 [stat.AP], 2012.^{(8)}Nacu, Şerban, and Yuval Peres. "**Fast simulation of new coins from old**", The Annals of Applied Probability 15, no. 1A (2005): 93-115.^{(9)}Goyal, V. and Sigman, K., 2012. On simulating a class of Bernstein polynomials. ACM Transactions on Modeling and Computer Simulation (TOMACS), 22(2), pp.1-5.^{(10)}Jacob, P.E., Thiery, A.H., "On nonnegative unbiased estimators", Ann. Statist., Volume 43, Number 2 (2015), 769-784.^{(11)}Duvignau, R., 2015. Maintenance et simulation de graphes aléatoires dynamiques (Doctoral dissertation, Université de Bordeaux).^{(12)}There are many distributions that can be sampled using the oracle, by first generating unbiased random bits via randomness extraction methods, but then these distributions won't use the unknown*n*in general. Duvignau proved Theorem 5.2 for an oracle that outputs*arbitrary*but still distinct items, as opposed to integers, but this case can be reduced to the integer case (see section 4.1.3).^{(13)}Mossel, Elchanan, and Yuval Peres. New coins from old: computing with unknown bias. Combinatorica, 25(6), pp.707-724, 2005.^{(14)}Flajolet, P., Pelletier, M., Soria, M., "**On Buffon machines and numbers**", arXiv:0906.5560 [math.PR], 2010^{(15)}C.T. Li, A. El Gamal, "**A Universal Coding Scheme for Remote Generation of Continuous Random Variables**", arXiv:1603.05238v1 [cs.IT], 2016^{(16)}Oberhoff, Sebastian, "**Exact Sampling and Prefix Distributions**",*Theses and Dissertations*, University of Wisconsin Milwaukee, 2018.^{(17)}Devroye, L.,, 1986.*Non-Uniform Random Variate Generation*^{(18)}Harlow, J., Sainudiin, R., Tucker, W., "Mapped Regular Pavings",*Reliable Computing*16 (2012).^{(19)}Bringmann, K. and Friedrich, T., 2013, July. "Exact and efficient generation of geometric random variates and random graphs", in*International Colloquium on Automata, Languages, and Programming*(pp. 267-278).^{(20)}Ahrens, J.H., and Dieter, U., "Computer methods for sampling from the exponential and normal distributions",*Communications of the ACM*15, 1972.^{(21)}Ghosh, A., Roughgarden, T., and Sundararajan, M., "Universally Utility-Maximizing Privacy Mechanisms",*SIAM Journal on Computing*41(6), 2012.^{(22)}Li, L., 2021. Bayesian Inference on Ratios Subject to Differentially Private Noise (Doctoral dissertation, Duke University).^{(23)}Lindley, D.V., "Fiducial distributions and Bayes' theorem",*Journal of the Royal Statistical Society Series B*, 1958.^{(24)}Shanker, R., "Garima distribution and its application to model behavioral science data",*Biom Biostat Int J.*4(7), 2016.^{(25)}Singh, B.P., Das, U.D., "**On an Induced Distribution and its Statistical Properties**", arXiv:2010.15078 [stat.ME], 2020.^{(26)}Iqbal, T. and Iqbal, M.Z., 2020. On the Mixture Of Weighted Exponential and Weighted Gamma Distribution. International Journal of Analysis and Applications, 18(3), pp.396-408.^{(27)}Devroye, L., Györfi, L.,*Nonparametric Density Estimation: The L1 View*, 1985.^{(28)}Kinderman, A.J., Monahan, J.F., "Computer generation of random variables using the ratio of uniform deviates",*ACM Transactions on Mathematical Software*3(3), pp. 257-260, 1977.^{(29)}Daumas, M., Lester, D., Muñoz, C., "**Verified Real Number Calculations: A Library for Interval Arithmetic**", arXiv:0708.3721 [cs.MS], 2007.^{(30)}Karney, C.F.F., "**Sampling exactly from the normal distribution**", arXiv:1303.6257v2 [physics.comp-ph], 2014.^{(31)}I thank D. Eisenstat from the*Stack Overflow*community for leading me to this insight.^{(32)}Brassard, G., Devroye, L., Gravel, C., "Remote Sampling with Applications to General Entanglement Simulation",*Entropy*2019(21)(92),**https://doi.org/10.3390/e21010092**.^{(33)}Devroye, L., Gravel, C., "**Random variate generation using only finitely many unbiased, independently and identically distributed random bits**", arXiv:1502.02539v6 [cs.IT], 2020.^{(34)}Keane, M. S., and O'Brien, G. L., "A Bernoulli factory",*ACM Transactions on Modeling and Computer Simulation*4(2), 1994.^{(35)}Wästlund, J., "**Functions arising by coin flipping**", 1999.^{(36)}Dale, H., Jennings, D. and Rudolph, T., 2015, "Provable quantum advantage in randomness processing",*Nature communications*6(1), pp. 1-4.^{(37)}Tsai, Yi-Feng, Farouki, R.T., "Algorithm 812: BPOLY: An Object-Oriented Library of Numerical Algorithms for Polynomials in Bernstein Form",*ACM Trans. Math. Softw.*27(2), 2001.^{(38)}Lee, A., Doucet, A. and Łatuszyński, K., 2014. "**Perfect simulation using atomic regeneration with application to Sequential Monte Carlo**", arXiv:1407.5770v1 [stat.CO].^{(39)}Icard, Thomas F., "Calibrating generative models: The probabilistic Chomsky–Schützenberger hierarchy",*Journal of Mathematical Psychology*95 (2020): 102308.^{(40)}Etessami, K. and Yannakakis, M., "Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations",*Journal of the ACM*56(1), pp.1-66, 2009.^{(41)}Dughmi, Shaddin, Jason Hartline, Robert D. Kleinberg, and Rad Niazadeh. "Bernoulli Factories and Black-box Reductions in Mechanism Design." Journal of the ACM (JACM) 68, no. 2 (2021): 1-30.^{(42)}Esparza, J., Kučera, A. and Mayr, R., 2004, July. Model checking probabilistic pushdown automata. In*Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science*, 2004. (pp. 12-21). IEEE.^{(43)}Elder, Murray, Geoffrey Lee, and Andrew Rechnitzer. "Permutations generated by a depth 2 stack and an infinite stack in series are algebraic."*Electronic Journal of Combinatorics*22(1), 2015.^{(44)}Knuth, Donald E. and Andrew Chi-Chih Yao. "The complexity of nonuniform random variate generation", in*Algorithms and Complexity: New Directions and Recent Results*, 1976.^{(45)}Vatan, F., "Distribution functions of probabilistic automata", in*Proceedings of the thirty-third annual ACM symposium on Theory of computing (STOC '01)*, pp. 684-693, 2001.^{(46)}Kindler, Guy and D. Romik, "On distributions computable by random walks on graphs,"*SIAM Journal on Discrete Mathematics*17 (2004): 624-633.^{(47)}Vatan (2001) claims that a finite-state generator has a continuous`CDF`

(unless it produces a single value with probability 1), but this is not necessarily true if the generator has a state that outputs 0 forever.^{(48)}Adamczewski, B., Cassaigne, J. and Le Gonidec, M., 2020. On the computational complexity of algebraic numbers: the Hartmanis–Stearns problem revisited. Transactions of the American Mathematical Society, 373(5), pp.3085-3115.^{(49)}Cobham, A., "On the Hartmanis-Stearns problem for a class of tag machines", in*IEEE Conference Record of 1968 Ninth Annual Symposium on Switching and Automata Theory*1968.^{(50)}Adamczewski, B., Bugeaud, Y., "On the complexity of algebraic numbers I. Expansions in integer bases",*Annals of Mathematics*165 (2007).^{(51)}Richman, F. (2012). Algebraic functions, calculus style. Communications in Algebra, 40(7), 2671-2683.

## Appendix

### Ratio of Uniforms

The Cauchy sampler given earlier demonstrates the *ratio-of-uniforms* technique for sampling a distribution (Kinderman and Monahan 1977)^{(28)}. It involves transforming the distribution's density function (PDF) into a compact shape. The ratio-of-uniforms method appears here in the appendix, particularly since it can involve calculating upper and lower bounds of transcendental functions which, while it's possible to achieve in rational arithmetic (Daumas et al., 2007)^{(29)}, is less elegant than, say, the normal distribution sampler by Karney (2014)^{(30)}, which doesn't require calculating logarithms or other transcendental functions.

This algorithm works for any univariate (one-variable) distribution as long as—

- for every
*x*,*PDF*(*x*) < ∞ and*PDF*(*x*)**x*^{2}< ∞, where*PDF*is the distribution's PDF or a function proportional to the PDF, *PDF*is continuous almost everywhere, and- either—
- the distribution's ratio-of-uniforms shape (the transformed PDF) is covered entirely by the rectangle [0, ceil(
*d1*)]×[0, ceil(*d2*)], where*d1*is not less than the highest value of*x**sqrt(*PDF*(*x*)) anywhere, and*d2*is not less than the highest value of sqrt(*PDF*(*x*)) anywhere, or - half of that shape is covered this way and the shape is symmetric about the
*v*-axis.

- the distribution's ratio-of-uniforms shape (the transformed PDF) is covered entirely by the rectangle [0, ceil(

The algorithm follows.

- Generate two empty PSRNs, with a positive sign, an integer part of 0, and an empty fractional part. Call the PSRNs
*p1*and*p2*. - Set
*S*to*base*, where*base*is the base of digits to be stored by the PSRNs (such as 2 for binary or 10 for decimal). Then set*c1*to an integer in the interval [0,*d1*), chosen uniformly at random, then set*c2*to an integer in [0,*d2*), chosen uniformly at random, then set*d*to 1. - Multiply
*c1*and*c2*each by*base*and add a digit chosen uniformly at random to that coordinate. - Run an
**InShape**function that determines whether the transformed PDF is covered by the current box. In principle, this is the case when*z*≤ 0 everywhere in the box, where*u*lies in [*c1*/*S*, (*c1*+1)/*S*],*v*lies in [*c2*/*S*, (*c2*+1)/*S*], and*z*is*v*^{2}−*PDF*(*u*/*v*).**InShape**returns*YES*if the box is fully inside the transformed PDF,*NO*if the box is fully outside it, and*MAYBE*in any other case, or if evaluating*z*fails for a given box (e.g., because ln(0) would be calculated or*v*is 0). See the next section for implementation notes. - If
**InShape**as described in step 4 returns*YES*, then do the following:- Transfer
*c1*'s least significant digits to*p1*'s fractional part, and transfer*c2*'s least significant digits to*p2*'s fractional part. The variable*d*tells how many digits to transfer to each PSRN this way. Then set*p1*'s integer part to floor(*c1*/*base*^{d}) and*p2*'s integer part to floor(*c2*/*base*^{d}). (For example, if*base*is 10,*d*is 3, and*c1*is 7342, set*p1*'s fractional part to [3, 4, 2] and*p1*'s integer part to 7.) - Run the
**UniformDivide**algorithm (described in the article on PSRNs) on*p1*and*p2*, in that order. - If the transformed PDF is symmetric about the
*v*-axis, set the resulting PSRN's sign to positive or negative with equal probability. Otherwise, set the PSRN's sign to positive. - Return the PSRN.

- Transfer
- If
**InShape**as described in step 4 returns*NO*, then go to step 2. - Multiply
*S*by*base*, then add 1 to*d*, then go to step 3.

Examples:

- For the normal distribution,
x^{2}/2), so thatzafter a logarithmic transformation (see next section) becomes 4*ln(v) + (u/v)^{2}, and since the distribution's ratio-of-uniforms shape is symmetric about thev-axis, the return value's sign is positive or negative with equal probability.- For the standard lognormal distribution (
Gibrat's distribution),x) is proportional to exp(−(ln(x))^{2}/2)/x, so thatzafter a logarithmic transformation becomes 2*ln(v)−(−ln(u/v)^{2}/2 − ln(u/v)), and the returned PSRN has a positive sign.- For the gamma distribution with shape parameter
a> 1,x) is proportional tox^{a−1}*exp(−x), so thatzafter a logarithmic transformation becomes 2*ln(v)−(a−1)*ln(u/v)−(u/v), or 0 ifuorvis 0, and the returned PSRN has a positive sign.

### Implementation Notes for Box/Shape Intersection

The "**Uniform Distribution Inside N-Dimensional Shapes**" algorithm uses a function called **InShape** to determine whether an axis-aligned box is either outside a shape, fully inside the shape, or partially inside the shape. The following are notes that will aid in developing a robust implementation of **InShape** for a particular shape, especially because the boxes being tested can be arbitrarily small.

**InShape**, as well as the divisions of the coordinates by*S*, should be implemented using rational arithmetic. Instead of dividing those coordinates this way, an implementation can pass*S*as a separate parameter to**InShape**.If the shape is convex, and the point (0, 0, ..., 0) is on or inside that shape,

**InShape**can return—*YES*if all the box's corners are in the shape;*NO*if none of the box's corners are in the shape and if the shape's boundary does not intersect with the box's boundary; and*MAYBE*in any other case, or if the function is unsure.

In the case of two-dimensional shapes, the shape's corners are (

*c1*/*S*,*c2*/*S*), ((*c1*+1)/*S*,*c2*/*S*), (*c1*,(*c2*+1)/*S*), and ((*c1*+1)/*S*, (*c2*+1)/*S*). However, checking for box/shape intersections this way is non-trivial to implement robustly, especially if interval arithmetic is not used.If the shape is given as an inequality of the form

*f*(*t1*, ...,*tN*) ≤ 0,**InShape**should use rational interval arithmetic (such as the one given in (Daumas et al., 2007)^{(29)}), where the two bounds of each interval are rational numbers with arbitrary-precision numerators and denominators. Then,**InShape**should build one interval for each dimension of the box and evaluate*f*using those intervals^{(31)}with an accuracy that increases as*S*increases. Then,**InShape**can return—*YES*if the interval result of*f*has an upper bound less than or equal to 0;*NO*if the interval result of*f*has a lower bound greater than 0; and*MAYBE*in any other case.

For example, if

*f*is (*t1*^{2}+*t2*^{2}−1), which describes a quarter disk,**InShape**should build two intervals, namely*t1*= [*c1*/*S*, (*c1*+1)/*S*] and*t2*= [*c2*/*S*, (*c2*+1)/*S*], and evaluate*f*(*t1*,*t2*) using interval arithmetic.One thing to point out, though: If

*f*calls the exp(*x*) function where*x*can potentially have a high absolute value, say 10000 or higher, the exp function can run a very long time in order to calculate proper bounds for the result, since the number of digits in exp(*x*) grows linearly with*x*. In this case, it may help to transform the inequality to its logarithmic version. For example, by applying ln(.) to each side of the inequality*y*^{2}≤ exp(−(*x*/*y*)^{2}/2), the inequality becomes 2*ln(*y*) ≤ −(*x*/*y*)^{2}/2 and thus becomes 2*ln(*y*) + (*x*/*y*)^{2}/2 ≤ 0 and thus becomes 4*ln(*y*) + (*x*/*y*)^{2}≤ 0.If the shape is such that every axis-aligned line segment that begins in one face of the hypercube and ends in another face crosses the shape at most once, ignoring the segment's endpoints (an example is an axis-aligned quarter of a circular disk where the disk's center is (0, 0)), then

**InShape**can return—*YES*if all the box's corners are in the shape;*NO*if none of the box's corners are in the shape; and*MAYBE*in any other case, or if the function is unsure.

If

**InShape**uses rational interval arithmetic, it can build an interval per dimension*per corner*, evaluate the shape for each corner individually and with an accuracy that increases as*S*increases, and treat a corner as inside or outside the shape only if the result of the evaluation clearly indicates that. Using the example of a quarter disk,**InShape**can build eight intervals, namely an*x*- and*y*-interval for each of the four corners; evaluate (*x*^{2}+*y*^{2}−1) for each corner; and return*YES*only if all four results have upper bounds less than or equal to 0,*NO*only if all four results have lower bounds greater than 0, and*MAYBE*in any other case.- If
**InShape**expresses a shape in the form of a, namely a function that describes the closest distance from any point in space to the shape's boundary, it can return—*signed distance function**YES*if the signed distance (or an upper bound of such distance) at each of the box's corners, after dividing their coordinates by*S*, is less than or equal to −*σ*(where*σ*is an upper bound for sqrt(*N*)/(*S**2), such as 1/*S*);*NO*if the signed distance (or a lower bound of such distance) at each of the box's corners is greater than*σ*; and*MAYBE*in any other case, or if the function is unsure.

**InShape**implementations can also involve a shape's*implicit curve*or*algebraic curve*equation (for closed curves), its*implicit surface*equation (for closed surfaces), or its*signed distance field*(a quantized version of a signed distance function).- An
**InShape**function can implement a set operation (such as a union, intersection, or difference) of several simpler shapes, each with its own**InShape**function. The final result depends on the set operation (such as union or intersection) as well as the result returned by each component for a given box. The following are examples of set operations:- For unions, the final result is
*YES*if any component returns*YES*;*NO*if all components return*NO*; and*MAYBE*otherwise. - For intersections, the final result is
*YES*if all components return*YES*;*NO*if any component returns*NO*; and*MAYBE*otherwise. - For differences between two shapes, the final result is
*YES*if the first shape returns*YES*and the second returns*NO*;*NO*if the first shape returns*NO*or if both shapes return*YES*; and*MAYBE*otherwise. - For the exclusive OR of two shapes, the final result is
*YES*if one shape returns*YES*and the other returns*NO*;*NO*if both shapes return*NO*or both return*YES*; and*MAYBE*otherwise.

- For unions, the final result is

### Probability Transformations

The following algorithm takes a uniform partially-sampled random number (PSRN) as a "coin" and flips that "coin" using **SampleGeometricBag** (a method described in my **article on PSRNs**). Given that "coin" and a function *f* as described below, the algorithm returns 1 with probability *f*(*U*), where *U* is the number built up by the uniform PSRN (see also (Brassard et al., 2019)^{(32)}, (Devroye 1986, p. 769)^{(17)}, (Devroye and Gravel 2020)^{(33)}. In the algorithm:

- The uniform PSRN's sign must be positive and its integer part must be 0.
For correctness,

*f*(*U*) must meet the following conditions:- If the algorithm will be run multiple times with the same PSRN,
*f*(*U*) must be the constant 0 or 1, or be continuous and polynomially bounded on the open interval (0, 1) (polynomially bounded means that both*f*(*U*) and 1−*f*(*U*) are bounded from below by min(*U*^{n}, (1−*U*)^{n}) for some integer*n*(Keane and O'Brien 1994)^{(34)}). - Otherwise,
*f*(*U*) must map the interval [0, 1] to [0, 1] and be continuous everywhere or "almost everywhere".

The first set of conditions is the same as those for the Bernoulli factory problem (see "

**About Bernoulli Factories**) and ensure this algorithm is unbiased (see also Łatuszyński et al. 2009/2011)^{(2)}.- If the algorithm will be run multiple times with the same PSRN,

The algorithm follows.

- Set
*v*to 0 and*k*to 1. - (
*v*acts as a uniform random variate in [0, 1] to compare with*f*(*U*).) Set*v*to*b***v*+*d*, where*b*is the base (or radix) of the uniform PSRN's digits, and*d*is a digit chosen uniformly at random. - Calculate an approximation of
*f*(*U*) as follows:- Set
*n*to the number of items (sampled and unsampled digits) in the uniform PSRN's fractional part. - Of the first
*n*digits (sampled and unsampled) in the PSRN's fractional part, sample each of the unsampled digits uniformly at random. Then let*uk*be the PSRN's digit expansion up to the first*n*digits after the point. - Calculate the lowest and highest values of
*f*in the interval [*uk*,*uk*+*b*^{−n}], call them*fmin*and*fmax*. If abs(*fmin*−*fmax*) ≤ 2 **b*^{−k}, calculate (*fmax*+*fmin*) / 2 as the approximation. Otherwise, add 1 to*n*and go to the previous substep.

- Set
- Let
*pk*be the approximation's digit expansion up to the*k*digits after the point. For example, if*f*(*U*) is*π*/5,*b*is 10, and*k*is 3,*pk*is 628. - If
*pk*+ 1 ≤*v*, return 0. If*pk*− 2 ≥*v*, return 1. If neither is the case, add 1 to*k*and go to step 2.

Notes:

- This algorithm is related to the Bernoulli factory problem, where the input probability is unknown. However, the algorithm doesn't exactly solve that problem because it has access to the input probability's value to some extent.
- This section appears in the appendix because this article is focused on algorithms that don't rely on calculations of irrational numbers.

### SymPy Code for Piecewise Linear Factory Functions

def bernstein_n(func, x, n, pt=None): # Bernstein operator. # Create a polynomial that approximates func, which in turn uses # the symbol x. The polynomial's degree is n and is evaluated # at the point pt (or at x if not given). if pt==None: pt=x ret=0 v=[binomial(n,j) for j in range(n//2+1)] for i in range(0, n+1): oldret=ret bino=v[i] if i<len(v) else v[n-i] ret+=func.subs(x,S(i)/n)*bino*pt**i*(1-pt)**(n-i) if pt!=x and ret==oldret and ret>0: break return ret def inflec(y,eps=S(2)/10,mult=2): # Calculate the inflection point (x) given y, eps, and mult. # The formula is not found in the paper by Thomas and # Blanchet 2012, but in # the supplemental source code uploaded by # A.C. Thomas. po=5 # Degree of y-to-x polynomial curve eps=S(eps) mult=S(mult) x=-((y-(1-eps))/eps)**po/mult + y/mult return x def xfunc(y,sym,eps=S(2)/10,mult=2): # Calculate Bernstein "control polygon" given y, # eps, and mult. return Min(sym*y/inflec(y,eps,mult),y) def calc_linear_func(eps=S(5)/10, mult=1, count=10): # Calculates the degrees and Y parameters # of a sequence of polynomials that converge # from above to min(x*mult, 1-eps). # eps must be in the interval (0, 1). # Default is 10 polynomials. polys=[] eps=S(eps) mult=S(mult) count=S(count) bs=20 ypt=1-(eps/4) x=symbols('x') tfunc=Min(x*mult,1-eps) tfn=tfunc.subs(x,(1-eps)/mult).n() xpt=xfunc(ypt,x,eps=eps,mult=mult) bits=5 i=0 lastbxn = 1 diffs=[] while i<count: bx=bernstein_n(xpt,x,bits,(1-eps)/mult) bxn=bx.n() if bxn > tfn and bxn < lastbxn: # Dominates target function #if oldbx!=None: # diffs.append(bx) # diffs.append(oldbx-bx) #oldbx=bx oldxpt=xpt lastbxn = bxn polys.append([bits,ypt]) print(" [%d,%s]," % (bits,ypt)) # Find y2 such that y2 < ypt and # bernstein_n(oldxpt,x,bits,inflec(y2, ...)) >= y2, # so that next Bernstein expansion will go # underneath the previous one while True: ypt-=(ypt-(1-eps))/4 xpt=inflec(ypt,eps=eps,mult=mult).n() bxs=bernstein_n(oldxpt,x,bits,xpt).n() if bxs>=ypt.n(): break xpt=xfunc(ypt,x,eps=eps,mult=mult) bits+=20 i+=1 else: bits=int(bits*200/100) return polys calc_linear_func(count=8)

### Derivation of My Algorithm for min(*λ*, 1/2)

The following explains how the algorithm is derived.

The function min(*λ*, 1/2) can be rewritten as *A* + *B* where—

*A*= (1/2) **λ*, and*B*= (1/2) * min(*λ*, 1−*λ*)

= (1/2) * ((1−sqrt(1−4**λ**(1−*λ*)))/2)

= (1/2) * ∑_{k = 1, 2, ...}*g*(*k*) **h*_{k}(*λ*),

revealing that the function is a **convex combination**, and *B* is itself a convex combination where—

*g*(*k*) = choose(2**k*,*k*)/((2**k*−1)*2^{2*k}), and*h*_{k}(*λ*) = (4**λ**(1−*λ*))^{k}/ 2 = (*λ**(1−*λ*))^{k}* 4^{k}/ 2

(see also Wästlund (1999)^{(35)}; Dale et al. (2015)^{(36)}). The right-hand side of *h*, which is the polynomial built in step 3 of the algorithm, is a polynomial of degree *k**2 with Bernstein coefficients—

*z*= (4^{v}/2) / choose(*v**2,*v*) at*v*=*k*, and- 0 elsewhere.

Unfortunately, *z* is generally greater than 1, so that the polynomial can't be simulated, as is, using the Bernoulli factory algorithm for **polynomials in Bernstein form**. Fortunately, the polynomial's degree can be elevated to bring the Bernstein coefficients to 1 or less (for degree elevation and other algorithms, see (Tsai and Farouki 2001)^{(37)}). Moreover, due to the special form of the Bernstein coefficients in this case, the degree elevation process can be greatly simplified. Given an even degree *d* as well as *z* (as defined above), the degree elevation is as follows:

- Set
*r*to floor(*d*/3) + 1. (This starting value is because when this routine finishes,*r*/*d*appears to converge to 1/3 as*d*gets large, for the polynomial in question.) Let*c*be choose(*d*,*d*/2). - Create a list of
*d*+*r*+1 Bernstein coefficients, all zeros. - For each integer
*i*in the interval [0,*d*+*r*]:- If
*d*/2 is in the interval [max(0,*i*−*r*), min(*d*,*i*)], set the*i*^{th}Bernstein coefficient (starting at 0) to*z***c**choose(*r*,*i*−*d*/2)* / choose(*d*+*r*,*i*).

- If
- If all the Bernstein coefficients are 1 or less, return them. Otherwise, add
*d*/2 to*r*and go to step 2.

### Sampling Distributions Using Incomplete Information: Omitted Algorithms

**Algorithm 2.** Say we have an *oracle* that produces independent random real numbers whose expected ("average") value is a known or unknown mean. The goal is now to produce non-negative random variates whose expected value is the mean of *f*(*X*), where *X* is a number produced by the oracle. This is possible whenever *f* has a finite minimum and maximum and the mean of *f*(*X*) is not less than *δ*, where *δ* is a known rational number greater than 0. The algorithm to do so follows (see Lee et al. 2014)^{(38)}:

- Let
*m*be a rational number equal to or greater than the maximum value of abs(*f*(*μ*)) anywhere. Create a*ν*input coin that does the following: "Take a number from the oracle, call it*x*. With probability abs(*f*(*x*))/*m*, return a number that is 1 if*f*(*x*) < 0 and 0 otherwise. Otherwise, repeat this process." - Use one of the
**linear Bernoulli factories**to simulate 2**ν*(2 times the*ν*coin's probability of heads), using the*ν*input coin, with*ϵ*=*δ*/*m*. If the factory returns 1, return 0. Otherwise, take a number from the oracle, call it*ξ*, and return abs(*f*(*ξ*)).

Example:An example from Lee et al. (2014)^{(38)}. Say the oracle produces uniform random variates in [0, 3*π], and letf(ν) = sin(ν). Then the mean off(X) is 2/(3*π), which is greater than 0 and found in SymPy by`sympy.stats.E(sin(sympy.stats.Uniform('U',0,3*pi)))`

, so the algorithm can produce non-negative random variates whose expected ("average") value is that mean.

Notes:

- Averaging to the mean of
f(X) (that is,E[f(X)] whereE[.] means expected or average value) is not the same as averaging tof(μ) whereμis the mean of the oracle's numbers (that is,f(E[X])). For example, ifXis 0 or 1 with equal probability, andf(ν) = exp(−ν), thenE[f(X)] = exp(0) + (exp(−1) − exp(0))*(1/2), andf(E[X]) =f(1/2) = exp(−1/2).(Lee et al. 2014, Corollary 4)

^{(38)}: Iff(μ) is known to return only values in the interval [a,c], the mean off(X) is not less thanδ,δ>b, andδandbare known numbers, then Algorithm 2 can be modified as follows:

- Use
f(ν) =f(ν) −b, and useδ=δ−b.mis taken as max(b−a,c−b).- When Algorithm 2 finishes, add
bto its return value.- The check "With probability abs(
f(x))/m" is exact if the oracle produces only rational numbersandiff(x) outputs only rational numbers. If the oracle orfcan produce irrational numbers (such as numbers that follow a beta distribution or another continuous distribution), then this check should be implemented using uniformpartially-sampled random numbers (PSRNs).

**Algorithm 3.** Say we have an *oracle* that produces independent random real numbers that are all greater than or equal to *a* (which is a known rational number), whose mean (*μ*) is unknown. The goal is to use the oracle to produce non-negative random variates with mean *f*(*μ*). This is possible only if *f* is 0 or greater everywhere in the interval [*a*, *∞*) and is nondecreasing in that interval (Jacob and Thiery 2015)^{(10)}. This can be done using the algorithm below. In the algorithm:

*f*(*μ*) must be a function that can be written as the following infinite series expansion:*c*[0]**z*^{0}+*c*[1]**z*^{1}+ ..., where*z*=*μ*−*a*and all*c*[*i*] are 0 or greater.*ψ*is a rational number close to 1, such as 95/100. (The exact choice is arbitrary and can be less or greater for efficiency purposes, but must be greater than 0 and less than 1.)

The algorithm follows.

- Set
*ret*to 0,*prod*to 1,*k*to 0, and*w*to 1. (*w*is the probability of generating*k*or more random variates in a single run of the algorithm.) - If
*k*is greater than 0: Take a number from the oracle, call it*x*, and multiply*prod*by*x*−*a*. - Add
*c*[*k*]**prod*/*w*to*ret*. - Multiply
*w*by*ψ*and add 1 to*k*. - With probability
*ψ*, go to step 2. Otherwise, return*ret*.

Now, assume the oracle's numbers are all less than or equal to *b* (rather than greater than or equal to *a*), where *b* is a known rational number. Then *f* must be 0 or greater everywhere in (−*∞*, *b*] and be nonincreasing there (Jacob and Thiery 2015)^{(10)}, and the algorithm above can be used with the following modifications: (1) In the note on the infinite series, *z* = *b* −*μ*; (2) in step 2, multiply *prod* by *b* − *x* rather than *x* − *a*.

Note:This algorithm is exact if the oracle produces only rational numbersandif allc[i] are rational numbers. If the oracle can produce irrational numbers, then they should be implemented using uniform PSRNs. See also note 3 on Algorithm 2.

### Pushdown Automata and Algebraic Functions

This section has mathematical proofs showing which kinds of algebraic functions (functions that can be a solution of a system of polynomial equations) can be simulated with a pushdown automaton (a state machine with a stack).

The following summarizes what can be established about these algebraic functions:

- sqrt(
*λ*) can be simulated. - Every rational function with rational coefficients that maps the open interval (0, 1) to (0, 1) can be simulated.
- If
*f*(*λ*) can be simulated, so can any Bernstein-form polynomial in the variable*f*(*λ*) with coefficients that can be simulated. - If
*f*(*λ*) and*g*(*λ*) can be simulated, so can*f*(*λ*)**g*(*λ*),*f*(*g*(*λ*)), and*g*(*f*(*λ*)). - If a full-domain pushdown automaton (defined later) can generate words of a given length with a given probability (a
*probability distribution*of word lengths), then the probability generating function for that distribution can be simulated, as well as for that distribution conditioned on a finite set or periodic infinite set of word lengths (e.g., all odd word lengths only). - If a stochastic context-free grammar (defined later) can generate a probability distribution of word lengths, and terminates with probability 1, then the probability generating function for that distribution can be simulated.
- Every quadratic irrational in (0, 1) can be simulated.

It is not yet known whether the following functions can be simulated: *λ*^{1/p} for prime numbers *p* greater than 2, or min(*λ*, 1−*λ*).

The following definitions are used in this section:

A

*pushdown automaton*has a finite set of*states*and a finite set of*stack symbols*, one of which is called EMPTY, and takes a biased coin. It starts at a given state and its stack starts with EMPTY. On each iteration:- The automaton flips the coin.
- Based on the coin flip (HEADS or TAILS), the current state, and the top stack symbol, it moves to a new state (or keeps it unchanged) and replaces the top stack symbol with zero, one or two symbols. Thus, there are three kinds of
*transition rules*:- (
*state*,*flip*,*symbol*) → (*state2*, {*symbol2*}): move to*state2*, replace top stack symbol with same or different one. - (
*state*,*flip*,*symbol*) → (*state2*, {*symbol2*,*new*}): move to*state2*, replace top stack symbol with*symbol2*, then*push*a new symbol (*new*) onto the stack. - (
*state*,*flip*,*symbol*) → (*state2*, {}): move to*state2*,*pop*the top symbol from the stack.

- (

When the stack is empty, the machine stops, and returns either 0 or 1 depending on the state it ends up at. (Because each left-hand side has no more than one possible transition, the automaton is

*deterministic*.)A

*full-domain pushdown automaton*means a pushdown automaton that terminates with probability 1 given a coin with probability of heads*λ*, for every*λ*in the open interval (0, 1).**PDA**is the class of functions that map the open interval (0, 1) to (0, 1) and can be simulated by a full-domain pushdown automaton.**PDA**also includes the constant functions 0 and 1.**ALGRAT**is the class of functions that map the open interval (0, 1) to (0, 1), are continuous, and are algebraic over the rational numbers (they satisfy a nonzero polynomial system whose coefficients are rational numbers).**ALGRAT**also includes the constant functions 0 and 1.- A
*probability generating function*has the form*p*_{0}**λ*^{0}+*p*_{1}**λ*^{1}+ ..., where*p*_{i}(a*coefficient*) is the probability of getting*i*.

Notes:

- Mossel and Peres (2005)
^{(13)}defined pushdown automata to start with a non-empty stack ofarbitrarysize, and to allow each rule to replace the top symbol with anarbitrarynumber of symbols. Both cases can be reduced to the definition in this section.- Pushdown automata, as defined here, are very similar to so-called
probabilistic right-linear indexed grammars(Icard 2020)^{(39)}and can be translated to those grammars as well as toprobabilistic pushdown systems(Etessami and Yannakakis 2009)^{(40)}, as long as those grammars and systems use only transition probabilities that are rational numbers.

**Proposition 0** (Mossel and Peres 2005^{(39)}, Theorem 1.2): *A full-domain pushdown automaton can simulate a function that maps (0, 1) to (0, 1) only if the function is in class ALGRAT.*

It is not known whether **ALGRAT** and **PDA** are equal, but the following can be established about **PDA**:

**Lemma 1A:** *Let g(λ) be a function in the class PDA, and suppose a pushdown automaton F has two rules of the form ( state, HEADS, stacksymbol) → RHS1 and (state, TAILS, stacksymbol) → RHS2, where state and stacksymbol are a specific state/symbol pair among the left-hand sides of F's rules. Then there is a pushdown automaton that transitions to RHS1 with probability g(λ) and to RHS2 with probability 1−g(λ) instead.*

*Proof:* If RHS1 and RHS2 are the same, then the conclusion holds and nothing has to be done. Thus assume RHS1 and RHS2 are different.

Let *G* be the full-domain pushdown automaton for *g*. Assume that machines *F* and *G* stop when they pop EMPTY from the stack. If this is not the case, transform both machines by renaming the symbol EMPTY to EMPTY′′, adding a new symbol EMPTY′′ and new starting state X0, and adding rules (X0, *flip*, EMPTY) → (*start*, {EMPTY′′}) and rule (*state*, *flip*, EMPTY) → (*state*, {}) for all states other than X0, where *start* is the starting state of *F* or *G*, as the case may be.

Now, rename each state of *G* as necessary so that the sets of states of *F* and of *G* are disjoint. Then, add to *F* a new stack symbol EMPTY′ (or a name not found in the stack symbols of G, as the case may be). Then, for the following two pairs of rules in *F*, namely—

(*state*, HEADS, *stacksymbol*) → (*state2heads*, *stackheads*), and

(*state*, TAILS, *stacksymbol*) → (*state2tails*, *stacktails*),

add two new states *state*_{0} and *state*_{1} that correspond to *state* and have names different from all other states, and replace that rule with the following rules:

(*state*, HEADS, *stacksymbol*) → (*gstart*, {*stacksymbol*, EMPTY′}),

(*state*, TAILS, *stacksymbol*) → (*gstart*, {*stacksymbol*, EMPTY′}),

(*state*_{0}, HEADS, *stacksymbol*) → (*state2heads*, *stackheads*),

(*state*_{0}, TAILS, *stacksymbol*) → (*state2heads*, *stackheads*),

(*state*_{1}, HEADS, *stacksymbol*) → (*state2tails*, *stacktails*), and

(*state*_{1}, TAILS, *stacksymbol*) → (*state2tails*, *stacktails*),

where *gstart* is the starting state for *G*, and copy the rules of the automaton for *G* onto *F*, but with the following modifications:

- Replace the symbol EMPTY in
*G*with EMPTY′. - Replace each state in
*G*with a name distinct from all other states in*F*. - Replace each rule in
*G*of the form (*state*,*flip*, EMPTY′) → (*state2*, {}), where*state2*is a final state of*G*associated with output 1, with the rule (*state*,*flip*, EMPTY′) → (*state*_{1}, {}). - Replace each rule in
*G*of the form (*state*,*flip*, EMPTY′) → (*state2*, {}), where*state2*is a final state of*G*associated with output 0, with the rule (*state*,*flip*, EMPTY′) → (*state*_{0}, {}).

Then, the final states of the new machine are the same as those for the original machine *F*. □

**Lemma 1B:** *There are pushdown automata that simulate the probabilities 0 and 1.*

*Proof:* The probability 0 automaton has the rules (START, HEADS, EMPTY) → (START, {}) and (START, TAILS, EMPTY) → (START, {}), and its only state START is associated with output 0. The probability 1 automaton is the same, except START is associated with output 1. Both automata obviously terminate with probability 1. □

Because of Lemma 1A, it's possible to label each left-hand side of a pushdown automaton's rules with not just HEADS or TAILS, but also a rational number or another function in **PDA**, as long as for each state/symbol pair, the probabilities for that pair sum to 1. For example, rules like the following are now allowed:

(START, 1/2, EMPTY) → ..., (START, sqrt(*λ*)/2, EMPTY) → ..., (START, (1 − sqrt(*λ*))/2, EMPTY) → ....

**Proposition 1A:** *If f(λ) is in the class PDA, then so is every polynomial written as—*

∑_{i = 0, ..., n} choose(*n*, *i*) * *f*(*λ*)^{i} * (1 − *f*(*λ*))^{n − i} * *a*[*i*],

*where n is the polynomial's degree and a[i] is a function in the class PDA.*

*Proof Sketch*: This corresponds to a two-stage pushdown automaton that follows the algorithm of Goyal and Sigman (2012)^{(9)}: The first stage counts the number of "heads" shown when flipping the f(λ) coin, and the second stage flips another coin that has success probability *a*[*i*], where *i* is the number of "heads". The automaton's transitions take advantage of Lemma 1A. □

**Proposition 1:** *If f(λ) and g(λ) are functions in the class PDA, then so is their product, namely f(λ)*g(λ).*

*Proof:* Special case of Proposition 1A with *n*=1, *f*(*λ*)=*f*(*λ*), *a*[0]=0 (using Lemma 1B), and *a*[1]=*g*(*λ*). □

**Corollary 1A:** *If f(λ), g(λ), and h(λ) are functions in the class PDA, then so is f(λ)*g(λ) + (1−f(λ))*h(λ).*

*Proof:* Special case of Proposition 1A with *n*=1, *f*(*λ*)=*f*(*λ*), *a*[0]=*h*(*λ*), and *a*[1]=*g*(*λ*). □

**Proposition 2:** *If f(λ) and g(λ) are functions in the class PDA, then so is their composition, namely f(g(λ)) or f∘g(λ).*

*Proof:* Let *F* be the full-domain pushdown automaton for *f*. For each state/symbol pair among the left-hand sides of *F*'s rules, apply Lemma 1A to the automaton *F*, using the function *g*. Then the new machine *F* terminates with probability 1 because the original *F* and the original automaton for *g* do for every *λ* in (0, 1), and because the automaton for *g* maps to (0, 1) where *F* terminates with probability 1. Moreover, *f* is in class **PDA** by Theorem 1.2 of (Mossel and Peres 2005)^{(13)} because the machine is a full-domain pushdown automaton. □

**Proposition 3:** *Every rational function with rational coefficients that maps (0, 1) to (0, 1) is in class PDA.*

*Proof:* These functions can be simulated by a finite-state machine (Mossel and Peres 2005)^{(13)}. This corresponds to a full-domain pushdown automaton that has no stack symbols other than EMPTY, never pushes symbols onto the stack, and pops the only symbol EMPTY from the stack whenever it transitions to a final state of the finite-state machine. □

Note:An unbounded stack size is necessary for a pushdown automaton to simulate functions that a finite-state machine can't. With a bounded stack size, there is a finite-state machine where each state not only holds the pushdown automaton's original state, but also encodes the contents of the stack (which is possible because the stack's size is bounded); each operation that would push, pop, or change the top symbol transitions to a state with the appropriate encoding of the stack instead.

**Proposition 4:** *If a full-domain pushdown automaton can generate words with the same letter such that the length of each word follows a probability distribution, then that distribution's probability generating function is in class PDA.*

*Proof:* Let *F* be a full-domain pushdown automaton. Add one state FAILURE, then augment *F* with a special "letter-generating" operation as follows. Add the following rule that pops all symbols from the stack:

(FAILURE, *flip*, *stacksymbol*) → (FAILURE, {}),

and for each rule of the following form that transitions to a letter-generating operation (where S and T are arbitrary states):

(S, *flip*, *stacksymbol*) → (T, *newstack*),

add another state S′ (with a name that differs from all other states) and replace that rule with the following rules:

(S, *flip*, *stacksymbol*) → (S′, {*stacksymbol*}),

(S′, HEADS, *stacksymbol*) → (T, *newstack*), and

(S′, TAILS, *stacksymbol*) → (FAILURE, {}).

Then if the stack is empty upon reaching the FAILURE state, the result is 0, and if the stack is empty upon reaching any other state, the result is 1. By (Dughmi et al. 2021)^{(41)}, the machine now simulates the distribution's probability generating function. Moreover, the function is in class **PDA** by Theorem 1.2 of (Mossel and Peres 2005)^{(13)} because the machine is a full-domain pushdown automaton. □

Define a *stochastic context-free grammar* as follows. The grammar consists of a finite set of *nonterminals* and a finite set of *letters*, and rewrites one nonterminal (the starting nonterminal) into a word. The grammar has three kinds of rules (in generalized Chomsky Normal Form (Etessami and Yannakakis 2009)^{(40)}):

*X*→*a*(rewrite*X*to the letter*a*).*X*→_{p}(*a*,*Y*) (with rational probability*p*, rewrite*X*to the letter*a*followed by the nonterminal*Y*). For the same left-hand side, all the*p*must sum to 1.*X*→ (*Y*,*Z*) (rewrite*X*to the nonterminals*Y*and*Z*in that order).

Instead of a letter (such as *a*), a rule can use *ε* (the empty string). (The grammar is *context-free* because the left-hand side has only a single nonterminal, so that no context from the word is needed to parse it.)

**Proposition 5:** *Every stochastic context-free grammar can be transformed into a pushdown automaton. If the automaton is a full-domain pushdown automaton and the grammar has a one-letter alphabet, the automaton can generate words such that the length of each word follows the same distribution as the grammar, and that distribution's probability generating function is in class PDA.*

*Proof Sketch:* In the equivalent pushdown automaton:

*X*→*a*becomes the two rules—

(START, HEADS,*X*) → (*letter*, {}), and

(START, TAILS,*X*) → (*letter*, {}).

Here,*letter*is either START or a unique state in*F*that "detours" to a letter-generating operation for*a*and sets the state back to START when finished (see Proposition 4). If*a*is*ε*,*letter*is START and no letter-generating operation is done.*X*→_{pi}(*a*_{i},*Y*_{i}) (all rules with the same nonterminal*X*) are rewritten to enough rules to transition to a letter-generating operation for*a*_{i}, and swap the top stack symbol with*Y*_{i}, with probability*p*_{i}, which is possible with just a finite-state machine (see Proposition 4) because all the probabilities are rational numbers (Mossel and Peres 2005)^{(13)}. If*a*_{i}is*ε*, no letter-generating operation is done.*X*→ (*Y*,*Z*) becomes the two rules—

(START, HEADS,*X*) → (START, {*Z*,*Y*}), and

(START, TAILS,*X*) → (START, {*Z*,*Y*}).

Here, *X* is the stack symbol EMPTY if *X* is the grammar's starting nonterminal. Now, assuming the automaton is full-domain, the rest of the result follows easily. For a single-letter alphabet, the grammar corresponds to a system of polynomial equations, one for each rule in the grammar, as follows:

*X*→*a*becomes*X*= 1 if*a*is the empty string (*ε*), or*X*=*λ*otherwise.- For each nonterminal
*X*, all*n*rules of the form*X*→_{pi}(*a*_{i},*Y*_{i}) become the equation*X*=*p*_{1}**λ*_{1}**Y*_{1}+*p*_{2}**λ*_{2}**Y*_{2}+ ... +*p*_{n}**λ*_{n}**Y*_{n}, where*λ*_{i}is either 1 if*a*_{i}is*ε*, or*λ*otherwise. *X*→ (*Y*,*Z*) becomes*X*=*Y***Z*.

Solving this system for the grammar's starting nonterminal, and applying Proposition 4, leads to the *probability generating function* for the grammar's word distribution. (See also Flajolet et al. 2010^{(14)}, Icard 2020^{(39)}.) □

Example:The stochastic context-free grammar—X→_{1/2}(a,X1),X1→ (X,X2),X2→ (X,X),X→_{1/2}(a,X3),X3→ε,

which encodes ternary trees (Flajolet et al. 2010)^{(14)}, corresponds to the equationX= (1/2) *λ*X*X*X+ (1/2)*λ*1, and solving this equation forXleads to the probability generating function for such trees, which is a complicated expression.

Notes:

A stochastic context-free grammar in which all the probabilities are 1/2 is called a

binary stochastic grammar(Flajolet et al. 2010^{(14)}). If the starting nonterminal hasnrules of probability 1/n, then the grammar can be called an "n-ary stochastic grammar". It is even possible for a nonterminal to have two rules of probabilityλand (1−λ), which are used when the input coin returns 1 (HEADS) or 0 (TAILS), respectively.If a pushdown automaton simulates the function

f(λ), thenfcorresponds to a special system of equations, built as follows (Mossel and Peres 2005)^{(13)}; see also (Esparza et al. 2004)^{(42)}. For each state of the automaton (call the stateen), include the following equations in the system based on the automaton's transition rules:

- (
st,p,sy) → (s2, {}) becomes eitherα_{st,sy,en}=pifs2isen, orα_{st,sy,en}= 0 otherwise.- (
st,p,sy) → (s2, {sy1}) becomesα_{st,sy,en}=p*α_{s2,sy1,en}.- (
st,p,sy) → (s2, {sy1,sy2}) becomesα_{st,sy,en}=p*α_{s2,sy2,σ[1]}*α_{σ[1],sy1,en}+ ... +p*α_{s2,sy2,σ[n]}*α_{σ[n],sy1,en}, whereσ[i]is one of the machine'snstates.(Here,

pis the probability of using the given transition rule; the special value HEADS becomesλ, and the special value TAILS becomes 1−λ.) Now, each time multiple equations have the same left-hand side, combine them into one equation with the same left-hand side, but with the sum of their right-hand sides. Then, for any variable of the formα_{a,b,c}not yet present in the system, include the equationα_{a,b,c}= 0. Then, for each final statefsthat returns 1, solve the system for the variableα_{START,EMPTY,fs}(where START is the automaton's starting state) to get a solution (a function) that maps (0, 1) to (0, 1). (Each solve can produce multiple solutions, but only one of them will map (0, 1) to (0, 1) assuming everypis either HEADS or TAILS.) Finally, add all the solutions to getf(λ).Assume there is a pushdown automaton (

F) that follows Definition 1 except it uses a set ofNinput letters (and not simply HEADS or TAILS), accepts an input word if the stack is empty, and rejects the word if the machine reaches a configuration without a transition rule. Then a pushdown automaton in the full sense of Definition 1 (G) can be built. In essence:

- Add a new FAILURE state, which when reached, pops all symbols from the stack.
- For each pair (
state,stacksymbol) forF, add a set of rules that generate one of the input letters (each letterigenerated with probabilityf_{ i}(λ), which must be a function inPDA), then use the generated letter to perform the transition stated in the corresponding rule forF. If there is no such transition, transition to the FAILURE state instead.- When the stack is empty, output 0 if
Gis in the FAILURE state, or 1 otherwise.Then

Greturns 1 with the same probability asFaccepts an input word with letters randomly generated as in the second step. Also, one of theNletters can be a so-called "end-of-string" symbol, so that a pushdown automaton can be built that accepts "empty strings"; an example is (Elder et al. 2015)^{(43)}.

**Proposition 6:** *If a full-domain pushdown automaton can generate a distribution of words with the same letter, there is a full-domain pushdown automaton that can generate a distribution of such words conditioned on—*

*a finite set of word lengths, or**a periodic infinite set of word lengths.*

One example of a finite set of word lengths is {1, 3, 5, 6}, where only words of length 1, 3, 5, or 6 are allowed. A *periodic infinite set* is defined by a finite set of integers such as {1}, as well as an integer modulus such as 2, so that in this example, all integers congruent to 1 modulo 2 (that is, all odd integers) are allowed word lengths and belong to the set.

*Proof Sketch:*

As in Lemma 1A, assume that the automaton stops when it pops EMPTY from the stack. Let

*S*be the finite set (e.g., {1, 3, 5, 6}), and let*M*be the maximum value in the finite set. For each integer*i*in [0,*M*], make a copy of the automaton and append the integer*i*to the name of each of its states. Combine the copies into a new automaton*F*, and let its start state be the start state for copy 0. Now, whenever*F*generates a letter, instead of transitioning to the next state after the letter-generating operation (see Proposition 4), transition to the corresponding state for the next copy (e.g., if the operation would transition to copy 2's version of "XYZ", namely "2_XYZ", transition to "3_XYZ" instead), or if the last copy is reached, transition to the last copy's FAILURE state. If*F*would transition to a failure state corresponding to a copy not in*S*(e.g., "0_FAILURE", "2_FAILURE", "3_FAILURE" in this example), first all symbols other than EMPTY are popped from the stack and then*F*transitions to its start state (this is a so-called "rejection" operation). Now, all the final states (except FAILURE states) for the copies corresponding to the values in*S*(e.g., copies 1, 3, 5, 6 in the example) are treated as returning 1, and all other states are treated as returning 0.Follow (1), except as follows: (A)

*M*is equal to the integer modulus minus 1. (B) For the last copy of the automaton, instead of transitioning to the next state after the letter-generating operation (see Proposition 4), transition to the corresponding state for copy 0 of the automaton. □

**Proposition 7:** *Every constant function equal to a quadratic irrational number in the interval (0, 1) is in class PDA.*

A *continued fraction* is one way to write a real number. For purposes of the following proof, every real number in (0, 1) has the following *continued fraction expansion*: 0 + 1 / (*a*[1] + 1 / (*a*[2] + 1 / (*a*[3] + ... ))), where each *a*[*i*], a *partial denominator*, is an integer greater than 0. A *quadratic irrational number* is an irrational number of the form (*b*+sqrt(*c*))/*d*, where *b*, *c*, and *d* are rational numbers.

*Proof:* By Lagrange's continued fraction theorem, every quadratic irrational number has a continued fraction expansion that is eventually periodic; the expansion can be described using a finite number of partial denominators, the last "few" of which repeat forever. The following example describes a periodic continued fraction expansion: [0; 1, 2, (5, 4, 3)], which is the same as [0; 1, 2, 5, 4, 3, 5, 4, 3, 5, 4, 3, ...]. In this example, the partial denominators are the numbers after the semicolon; the size of the period (`(5, 4, 3)`

) is 3; and the size of the non-period (`1, 2`

) is 2.

Given a periodic expansion, and with the aid of an algorithm for simulating **continued fractions**, a recursive Markov chain for the expansion (Etessami and Yannakakis 2009)^{(40)} can be described as follows. The chain's components are all built on the following template. The template component has one entry E, one inner node N, one box, and two exits X0 and X1. The box has one *call port* as well as two *return ports* B0 and B1.

- From E: Go to N with probability
*x*, or to the box's call port with probability 1 −*x*. - From N: Go to X1 with probability
*y*, or to X0 with probability 1 −*y*. - From B0: Go to E with probability 1.
- From B1: Go to X0 with probability 1.

Let *p* be the period size, and let *n* be the non-period size. Now the recursive Markov chain to be built has *n*+*p* components:

- For each
*i*in [1,*n*+1], there is a component labeled*i*. It is the same as the template component, except*x*=*a*[*i*]/(1 +*a*[*i*]), and*y*= 1/*a*[*i*]. The component's single box goes to the component labeled*i*+1,*except*that for component*n*+*p*, the component's single box goes to the component labeled*n*+1.

According to Etessami and Yannakakis (2009)^{(40)}, the recursive Markov chain can be translated to a pushdown automaton of the kind used in this section. Now all that's left is to argue that the recursive Markov chain terminates with probability 1. For every component in the chain, it goes from its entry to its box with probability 1/2 or less (because each partial numerator must be 1 or greater). Thus, the component is never more likely than not to recurse, and there are otherwise no probability-1 loops in each component, so the overall chain terminates with probability 1. □

**Lemma 1:** *The square root function sqrt(λ) is in class PDA.*

*Proof:* See (Mossel and Peres 2005)^{(13)}. □

**Corollary 1:** *The function f(λ) = λ ^{m/(2n)}, where n ≥ 1 is an integer and where m ≥ 1 is an integer, is in class PDA.*

*Proof:* Start with the case *m*=1. If *n* is 1, write *f* as sqrt(*λ*); if *n* is 2, write *f* as sqrt∘sqrt(*λ*); and for general *n*, write *f* as sqrt∘sqrt∘...∘sqrt(*λ*), with *n* instances of sqrt. Because this is a composition and sqrt can be simulated by a full-domain pushdown automaton, so can *f*.

For general *m* and *n*, write *f* as (sqrt∘sqrt∘...∘sqrt(*λ*))^{m}, with *n* instances of sqrt. This involves doing *m* multiplications of sqrt∘sqrt∘...∘sqrt, and because this is an integer power of a function that can be simulated by a full-domain pushdown automaton, so can *f*.

Moreover, *f* is in class **PDA** by Theorem 1.2 of (Mossel and Peres 2005)^{(13)} because the machine is a full-domain pushdown automaton. □

#### Finite-State and Pushdown Generators

Another interesting class of machines (called *pushdown generators* here) are similar to pushdown automata (see above), with the following exceptions:

- Each transition rule can also, optionally, output a base-
*N*digit in its right-hand side. An example is: (*state*,*flip*,*sy*) → (*digit*,*state2*, {*sy2*}). - The machine must output infinitely many digits if allowed to run forever.
- Rules that would pop the last symbol from the stack are not allowed.

The "output" of the machine is now a real number *X* in the interval [0, 1], in the form of the base-*N* digit expansion `0.dddddd...`

, where `dddddd...`

are the digits produced by the machine from left to right. In the rest of this section:

`CDF(z)`

is the cumulative distribution function of*X*, or the probability that*X*is*z*or less.`PDF(z)`

is the probability density function of*X*, or the "slope" function of`CDF(z)`

, or the relative probability of choosing a number "close" to*z*at random.

A *finite-state generator* (Knuth and Yao 1976)^{(44)} is the special case where the probability of heads is 1/2, each digit is either 0 or 1, rules can't push stack symbols, and only one stack symbol is used. Then if `PDF(z)`

has infinitely many "slope" functions on the open interval (0, 1), it must be a polynomial with rational coefficients and not equal 0 at any irrational point on (0, 1) (Vatan 2001)^{(45)}, (Kindler and Romik 2004)^{(46)}, and it can be shown that the mean of *X* must be a rational number. ^{(47)}

**Proposition 8.** *Suppose a finite-state generator can generate a probability distribution that takes on finitely many values. Then:*

*Each value occurs with a rational probability.**Each value is either rational or transcendental.*

A real number is *transcendental* if it can't be a root of a nonzero polynomial with integer coefficients. Thus, part 2 means, for example, that irrational, non-transcendental numbers such as 1/sqrt(2) and the golden ratio minus 1 can't be generated exactly.

Proving this proposition involves the following lemma, which shows that a finite-state generator is related to a machine with a one-way read-only input and a one-way write-only output:

**Lemma 2.** *A finite-state generator can fit the model of a one-way transducer-like k-machine (as defined in Adamczewski et al. (2020) ^{(48)} section 5.3), for some k equal to 2 or greater.*

*Proof Sketch:* There are two cases.

Case 1: If every transition rule of the generator outputs a digit, then *k* is the number of unique inputs among the generator's transition rules (usually, there are two unique inputs, namely HEADS and TAILS), and the model of a finite-state generator is modified as follows:

- A
*configuration*of the finite-state generator consists of its current state together with either the last coin flip result or, if the coin wasn't flipped yet, the empty string. - The
*output function*takes a configuration described above and returns a digit. If the coin wasn't flipped yet, the function returns an arbitrary digit (which is not used in proposition 4.6 of the Adamczewski paper).

Case 2: If at least one transition rule does not output a digit, then the finite-state generator can be transformed to a machine where HEADS/TAILS is replaced with 50% probabilities, then transformed to an equivalent machine whose rules always output one or more digits, as claimed in Lemma 5.2 of Vatan (2001)^{(45)}. In case the resulting generator has rules that output more than one digit, additional states and rules can be added so that the generator's rules output only one digit as desired. Now at this point the generator's probabilities will be rational numbers. Now transform the generator from probabilities to inputs of size *k*, where *k* is the product of those probabilities, by adding additional rules as desired. □

*Proof of Proposition 8:* Let *n* be an integer greater than 0. Take a finite-state generator that starts at state START and branches to one of *n* finite-state generators (sub-generators) with some probability, which must be rational because the overall generator is a finite-state machine (Icard 2020, Proposition 13)^{(39)}. The branching process outputs no digit, and part 3 of the proposition follows from Corollary 9 of Icard (2020)^{(39)}. The *n* sub-generators are special; each of them generates the binary expansion of a single real number in [0, 1] with probability 1.

To prove part 2 of the proposition, translate an arbitrary finite-state generator to a machine described in Lemma 2. Once that is done, all that must be shown is that there are two different non-empty sequences of coin flips that end up at the same configuration. This is easy using the pigeonhole principle, since the finite-state generator has a finite number of configurations. Thus, by propositions 5.11, 4.6, and AB of Adamczewski et al. (2020)^{(48)}, the generator can generate a real number's binary expansion only if that number is rational or transcendental (see also Cobham (1968)^{(49)}; Adamczewski and Bugeaud (2007)^{(50)}). □

**Proposition 9.** *If the distribution function generated by a finite-state generator is continuous and algebraic on the open interval (0, 1), then that function is a piecewise polynomial function.*

The proof follows from combining Kindler and Romik (2004, Theorem 2)^{(46)} and Knuth and Yao (1976)^{(44)} with Richman (2012)^{(51)}, who proved that a continuous algebraic function on an open interval is piecewise analytic ("analytic" is a stronger statement than having infinitely many "slope" functions).

## License

Any copyright to this page is released to the Public Domain. In case this is not possible, this page is also licensed under **Creative Commons Zero**.