# Supplemental Notes for Bernoulli Factory Algorithms

Peter Occil

## General Factory Functions

As a reminder, the Bernoulli factory problem is: We're given a coin that shows heads with an unknown probability, λ, and the goal is to use that coin (and possibly also a fair coin) to build a "new" coin that shows heads with a probability that depends on λ, call it f(λ).

The algorithm for general factory functions, described in my main article on Bernoulli factory algorithms, works by building randomized upper and lower bounds for a function f(λ), based on flips of the input coin. Roughly speaking, the algorithm works as follows:

1. Generate a uniform(0, 1) random variate, U.
2. Flip the input coin, then build an upper and lower bound for f(λ), based on the outcomes of the flips so far.
3. If U is less than or equal to the lower bound, return 1. If U is greater than the upper bound, return 0. Otherwise, go to step 2.

These randomized upper and lower bounds come from two sequences of polynomials as follows:

1. One sequence approaches the function f(λ) from above, the other from below, and both sequences must converge to f.
2. For each sequence, the first polynomial has degree 1 (so is a linear function), and each other polynomial's degree is 1 higher than the previous.
3. The consistency requirement must be met: The difference—

• between the degree-(n−1) upper polynomial and the degree-n upper polynomial, and
• between the degree-n lower polynomial and the degree-(n−1) lower polynomial,

must have non-negative coefficients, once the polynomials are rewritten in Bernstein form and elevated to degree n.

The consistency requirement ensures that the upper polynomials "decrease" and the lower polynomials "increase". Unfortunately, the reverse is not true in general; even if the upper polynomials "decrease" and the lower polynomials "increase" to f, this does not ensure the consistency requirement by itself. Examples of this fact are shown in the section "Schemes That Don't Work" later in this document.

In this document, fbelow(n, k) and fabove(n, k) mean the kth coefficient for the lower or upper degree-n polynomial in Bernstein form, respectively, where k is an integer in the interval [0, n].

### Approximation Schemes

A factory function f(λ) is a function for which the Bernoulli factory problem can be solved (see "About Bernoulli Factories"). The following are approximation schemes for f if it belongs to one of certain classes of factory functions. It would be helpful to plot the desired function f using a computer algebra system to see if it belongs to any of the classes of functions described below.

Concave functions. If f is known to be concave in the interval [0, 1] (which roughly means that its rate of growth there never goes up), then fbelow(n, k) can equal f(k/n), thanks to Jensen's inequality.

Convex functions. If f is known to be convex in the interval [0, 1] (which roughly means that its rate of growth there never goes down), then fabove(n, k) can equal f(k/n), thanks to Jensen's inequality. One example is f(λ) = exp(−λ/4).

Twice differentiable functions. The following method, proved in the appendix, implements fabove and fbelow if f(λ)—

• has a defined "slope-of-slope" everywhere in [0, 1] (that is, the function is twice differentiable there), and
• in the interval [0, 1]—
• has a minimum of greater than 0 and a maximum of less than 1, or
• is convex and has a minimum of greater than 0, or
• is concave and has a maximum of less than 1.

Let m be an upper bound of the highest value of abs(f′′(x)) for any x in [0, 1], where f′′ is the "slope-of-slope" function of f. Then for every integer n that's a power of 2:

• fbelow(n, k) = f(k/n) if f is concave; otherwise, min(fbelow(4,0), fbelow(4,1), ..., fbelow(4,4)) if n < 4; otherwise, f(k/n) − m/(7*n).
• fabove(n, k) = f(k/n) if f is convex; otherwise, max(fabove(4,0), fabove(4,1), ..., fabove(4,4)) if n < 4; otherwise, f(k/n) + m/(7*n).

My GitHub repository includes SymPy code for a method, c2params, to calculate the necessary values for m and the bounds of these polynomials, given f.

Note: For this method, the "slope-of-slope" function need not be continuous (Y. Peres, pers. comm., 2021).

Example: Take f(λ) = exp(−λ). This is a convex and twice differentiable function, and bounded below by 3321/10000. Then it can be shown that the following scheme for f is valid:

• fbelow(n, k) = 3321/10000 if n<4; otherwise, f(k/n) − 1/(7*n). (Observe that f(k/4) − 1/(7*4) ≥ 3321/10000.)
• fabove(n, k) = f(k/n) (because f is convex).
• fbound(n) = [0, 1].

Hölder and Lipschitz continuous functions. I have found a way to extend the results of Nacu and Peres (2005)[^1] to certain functions with a slope that tends to a vertical slope. The following scheme, proved in the appendix, implements fabove and fbelow if f(λ)—

• is α-Hölder continuous on [0, 1], meaning its vertical slopes there, if any, are no "steeper" than that of m*λα, for some number m greater than 0 (the Hölder constant) and for some α in the interval (0, 1], and
• in the interval [0, 1]—
• has a minimum of greater than 0 and a maximum of less than 1, or
• is convex and has a minimum of greater than 0, or
• is concave and has a maximum of less than 1.

If f is 1-Hölder continuous, then f is Lipschitz continuous[^2] and m is the highest absolute value of the function's "slope". Otherwise, finding m for a given α is non-trivial and it requires knowing where f's vertical slopes are, among other things.[^3] But assuming m and α are known, then for every integer n that's a power of 2:

• D(n) = m*(2/7)α/2/((2α/2−1)*nα/2).
• fbelow(n, k) = f(k/n) if f is concave; otherwise, min(fbelow(4,0), fbelow(4,1), ..., fbelow(4,4)) if n < 4; otherwise, f(k/n) − D(n).
• fabove(n, k) = f(k/n) if f is convex; otherwise, max(fabove(4,0), fabove(4,1), ..., fabove(4,4)) if n < 4; otherwise, f(k/n) + D(n).

Note:

1. Some functions f are not α-Hölder continuous for any α greater than 0. These functions have a slope that's steeper than every "nth" root, and can't be handled by this method. One example is f(λ) = 1/10 if λ is 0 and −1/(2*ln(λ/2)) + 1/10 otherwise, which has a slope near 0 that's steeper than every "nth" root.
2. In the Lipschitz case (α = 1), D(n) can be m*322613/(250000*sqrt(n)), which is an upper bound.
3. In the case α = 1/2, D(n) can be m*154563/(40000*n1/4), which is an upper bound.

Certain functions that equal 0 at 0. This approach involves transforming the function f so that it no longer equals 0 at the point 0. This can be done by dividing f by a function (h(λ)) that "dominates" f at every point in the interval [0, 1]. Unlike for the original function, there might be an approximation scheme described earlier in this section for the transformed function.

More specifically, h(λ) must meet the following requirements:

• h(λ) is continuous on the closed interval [0, 1].
• h(0) = 0. (This is required to ensure correctness in case λ is 0.)
• 1 ≥ h(1) ≥ f(1) ≥ 0.
• 1 > h(λ) > f(λ) > 0 for every λ in the open interval (0, 1).
• If f(1) = 0, then h(1) = 0. (This is required to ensure correctness in case λ is 1.)

Also, h should be a function with a simple Bernoulli factory algorithm. For example, h can be a polynomial in Bernstein form of degree n whose n plus one coefficients are [0, 1, 1, ..., 1]. This polynomial is easy to simulate using the algorithms from the section "Certain Polynomials".

The algorithm is now described.

Let g(λ) = limνλ f(ν)/h(ν) (in other words, the value that f(ν)/h(ν) approaches as ν approaches λ.) If—

• f(0) = 0 and f(1) < 1, and
• g(λ) is continuous on [0, 1] and belongs in one of the classes of functions given earlier,

then f can be simulated using the following algorithm:

1. Run a Bernoulli factory algorithm for h. If the call returns 0, return 0. (For example, if h(λ) = λ, then this step amounts to the following: "Flip the input coin. If it returns 0, return 0.")
2. Run a Bernoulli factory algorithm for g(.) and return the result of that algorithm. This can be one of the general factory function algorithms if there is a way to calculate polynomials that converge to g(.) in a manner needed for that algorithm (for example, if g is described earlier in this section).

Notes:

1. It may happen that g(0) = 0. In this case, step 2 of this algorithm can involve running this algorithm again, but with new g and h functions that are found based on the current g function. See the second example below.
2. If—

• f is monotonically increasing,
• h(λ) = λ, and
• f′(λ), the "slope" function of f, is continuous on [0, 1], maps (0, 1) to (0, 1), and belongs in one of the classes of functions given earlier,

then step 2 can be implemented by taking g as f′, except: (A) a uniform(0, 1) random variate u is generated at the start of the step; (B) instead of flipping the input coin as normal during that step, a different coin is flipped that does the following: "Flip the input coin, then sample from the number u. Return 1 if both the call and the flip return 1, and return 0 otherwise."
This is the "integral method" of Flajolet et al. (2010)[^4] (the modified step 2 simulates 1/λ times the integral of f.).

Examples:

1. If f(λ) = (sinh(λ)+cosh(λ)−1)/4, then f is bounded from above by h(λ) = λ, so g(λ) is 1/4 if λ = 0, and (exp(λ) − 1)/(4*λ) otherwise. The following code in Python that uses the SymPy computer algebra library computes this example: fx = (sinh(x)+cosh(x)-1)/4; h = x; pprint(Piecewise((limit(fx/h,x,0), Eq(x,0)), ((fx/h).simplify(), True))).
2. If f(λ) = cosh(λ) − 1, then f is bounded from above by h(λ) = λ, so g(λ) is 0 if λ = 0, and (cosh(λ)−1)/λ otherwise. Now, since g(0) = 0, find new functions g and h based on the current g. The current g is bounded from above by H(λ) = λ*3*(2−λ)/5 (a degree-2 polynomial that in Bernstein form has coefficients [0, 6/10, 6/10]), so G(λ) = 5/12 if λ = 0, and −(5*cosh(λ) − 5)/(3*λ2*(λ−2)) otherwise. G is bounded away from 0 and 1, resulting in the following algorithm:

1. (Simulate h.) Flip the input coin. If it returns 0, return 0.
2. (Simulate H.) Flip the input coin twice. If both flips return 0, return 0. Otherwise, with probability 4/10 (that is, 1 minus 6/10), return 0.
3. Run a Bernoulli factory algorithm for G (which might involve building polynomials that converge to G, noticing that G is twice differentiable) and return the result of that algorithm.

Certain functions that equal 0 at 0 and 1 at 1. Let f, g, and h be functions as defined earlier, except that f(0) = 0 and f(1) = 1. Define the following additional functions:

• ω(λ) is a function that meets the following requirements:
• ω(λ) is continuous on the closed interval [0, 1].
• ω(0) = 0 and ω(1) = 1.
• 1 > f(λ) > ω(λ) > 0 for every λ in the open interval (0, 1).
• q(λ) = limνλ ω(ν)/h(ν).
• r(λ) = limνλ (1−g(ν))/(1−q(ν)).

Roughly speaking, ω is a function that bounds f from below, just as h bounds f from above. ω should be a function with a simple Bernoulli factory algorithm, such as a polynomial in Bernstein form. If both ω and h are polynomials of the same degree, q will be a rational function with a relatively simple Bernoulli factory algorithm (see "Certain Rational Functions").

Now, if r(λ) is continuous on [0, 1], then f can be simulated using the following algorithm:

1. Run a Bernoulli factory algorithm for h. If the call returns 0, return 0. (For example, if h(λ) = λ, then this step amounts to the following: "Flip the input coin. If it returns 0, return 0.")
2. Run a Bernoulli factory algorithm for q(.). If the call returns 1, return 1.
3. Run a Bernoulli factory algorithm for r(.), and return 1 minus the result of that call. The Bernoulli factory algorithm can be one of the general factory function algorithms if there is a way to calculate polynomials that converge to r(.) in a manner needed for that algorithm (for example, if r is described earlier in this section).

Note: Quick proof: Rewrite $f=h\cdot(q\cdot1+(1-q)\cdot(1-r))+(1-h)\cdot0$.

Example: If f(λ) = (1−exp(λ))/(1−exp(1)), then f is bounded from above by h(λ) = λ, and from below by ω(λ) = λ2. As a result, q(λ) = λ, and r(λ) = (2 − exp(1))/(1 − exp(1)) if λ = 0; 1/(exp(1)−1) if λ = 1; and (−λ*(1 − exp(1)) − exp(λ) + 1)/(λ*(1 − exp(1))*(λ − 1)) otherwise. This can be computed using the following code in Python that uses the SymPy computer algebra library: fx=(1-exp(x))/(1-exp(1)); h=x; omega=x**2; q=(omega/h); r=(1-fx/h)/(1-q); r=Piecewise((limit(r, x, 0), Eq(x,0)), (limit(r,x,1),Eq(x,1)), (r,True)).simplify(); pprint(r).

Other functions that equal 0 or 1 at the endpoints 0 and/or 1. If f does not fully admit an approximation scheme under the convex, concave, twice differentiable, and Hölder classes:

If f(0) = And f(1) = Method
> 0 and < 1 1 Use the algorithm for certain functions that equal 0 at 0, but with f(λ) = 1 − f(1−λ).
Inverted coin: Instead of the usual input coin, use a coin that does the following: "Flip the input coin and return 1 minus the result."
Inverted result: If the overall algorithm would return 0, it returns 1 instead, and vice versa.
> 0 and < 1 0 Algorithm for certain functions that equal 0 at 0, but with f(λ) = f(1−λ). (For example, cosh(λ)−1 becomes cosh(1−λ)−1.)
Inverted coin.
1 0 Algorithm for certain functions that equal 0 at 0 and 1 at 1, but with f(λ) = 1−f(λ).
Inverted result.
1 > 0 and ≤ 1 Algorithm for certain functions that equal 0 at 0, but with f(λ) = 1−f(λ).
Inverted result.

Specific functions. My GitHub repository includes SymPy code for a method, approxscheme2, to build a polynomial approximation scheme for certain factory functions.

### Schemes That Don't Work

In the academic literature (papers and books), there are many approximation methods that involve polynomials that converge from above and below to a function. Unfortunately, most of them cannot be used as is to simulate a function f in the Bernoulli Factory setting, because they don't ensure the consistency requirement described earlier.

The following are approximation schemes with counterexamples to consistency.

First scheme. In this scheme (Powell 1981)[^5], let f be a C2 continuous function (a function with continuous "slope" and "slope-of-slope" functions) in [0, 1]. Then for every n≥1:

• fabove(n, k) = f(k/n) + M / (8*n).
• fbelow(n, k) = f(k/n) − M / (8*n).

Where M is not less than the maximum absolute value of f's slope-of-slope function (second derivative), and where k is an integer in the interval [0, n].

The counterexample involves the C2 continuous function g(λ) = sin(π*λ)/4 + 1/2.

For g, the coefficients for—

• the degree-2 upper polynomial in Bernstein form (fabove(5, k)) are [0.6542..., 0.9042..., 0.6542...], and
• the degree-4 upper polynomial in Bernstein form (fabove(6, k)) are [0.5771..., 0.7538..., 0.8271..., 0.7538..., 0.5771...].

The degree-2 polynomial lies above the degree-4 polynomial everywhere in [0, 1]. However, to ensure consistency, the degree-2 polynomial, once elevated to degree 4 and rewritten in Bernstein form, must have coefficients that are greater than or equal to those of the degree-4 polynomial.

• Once elevated to degree 4, the degree-2 polynomial's coefficients are [0.6542..., 0.7792..., 0.8208..., 0.7792..., 0.6542...].

As can be seen, the elevated polynomial's coefficient 0.8208... is less than the corresponding coefficient 0.8271... for the degree-4 polynomial.

The rest of this section will note counterexamples involving other functions and schemes, without demonstrating them in detail.

Second scheme. In this scheme, let f be a Lipschitz continuous function in [0, 1][^2] Then for every n≥2:

• fabove(n, k) = f(k/n) + L*(5/4) / sqrt(n).
• fbelow(n, k) = f(k/n) − L*(5/4) / sqrt(n).

Where L is the maximum absolute "slope", also known as the Lipschitz constant, and (5/4) is the so-called Popoviciu constant, and where k is an integer in the interval [0, n] (Lorentz 1986)[^6], (Popoviciu 1935)[^7].

There are two counterexamples here; together they show that this scheme can fail to ensure consistency, even if the set of functions is restricted to "smooth" functions (not just Lipschitz continuous functions):

1. The function f(λ) = min(λ, 1−λ)/2 is Lipschitz continuous with Lipschitz constant 1/2. (In addition, f has a kink at 1/2, so that it's not differentiable, but this is not essential for the counterexample.) The counterexample involves the degree-5 and degree-6 upper polynomials (fabove(5, k) and fabove(6, k)).
2. The function f = sin(4*π*λ)/4 + 1/2, a "smooth" function with Lipschitz constant π. The counterexample involves the degree-3 and degree-4 lower polynomials (fbelow(3, k) and fbelow(4, k)).

Third scheme. Same as the second scheme, but replacing (5/4) with the Sikkema constant, S = (4306+837*sqrt(6))/5832 (Lorentz 1986)[^6], (Sikkema 1961)[^8], which is slightly less than 1.09. In fact, the same counterexamples for the second scheme apply to this one, since this scheme merely multiplies the offset to bring the approximating polynomials closer to f.

Note on "clamping". For any approximation scheme, "clamping" the values of fbelow and fabove to fit the interval [0, 1] won't necessarily preserve the consistency requirement, even if the original scheme met that requirement.

Here is a counterexample that applies to any approximation scheme.

Let g and h be two polynomials in Bernstein form as follows:

• g has degree 5 and coefficients [10179/10000, 2653/2500, 9387/10000, 5049/5000, 499/500, 9339/10000].
• h has degree 6 and coefficients [10083/10000, 593/625, 9633/10000, 4513/5000, 4947/5000, 9473/10000, 4519/5000].

After elevating g's degree, g's coefficients are no less than h's, as required by the consistency property.

However, by clamping coefficients above 1 to equal 1, so that g is now g′ with [1, 1, 9387/10000, 1, 499/500, 9339/10000] and h is now h′ with [1, 593/625, 9633/10000, 4513/5000, 4947/5000, 9473/10000, 4519/5000], and elevate g′ for coefficients [1, 1, 14387/15000, 19387/20000, 1499/1500, 59239/60000, 9339/10000], some of the coefficients of g′ are less than those of h′. Thus, for this pair of polynomials, clamping the coefficients will destroy the consistent approximation property.

## Approximate Bernoulli Factories

Although the schemes in the previous section don't work when building a family of polynomials that converge to a function f(λ), they are still useful for building an approximation to that function, in the form of a single polynomial, leading to an approximate Bernoulli factory for f.

Here, f(λ) can be any function that maps the closed interval [0, 1] to [0, 1], even if it isn't continuous or a factory function (examples include the "step function" 0 if λ < 1/2 and 1 otherwise, or the function 2*min(λ, 1 − λ)). Continuous functions can be approximated arbitrarily well by an approximate Bernoulli factory (as a result of the so-called "Weierstrass approximation theorem"), but this is not the case in general for discontinuous functions.

To build an approximate Bernoulli factory:

1. First, find a polynomial in Bernstein form of degree n that is close to the desired function f.

The simplest choice for this polynomial has n+1 coefficients and its jth coefficient (starting at 0) is found as f(j/n), and this is used in the examples below. For this choice, if f is continuous, the polynomial can be brought arbitrarily close to f by choosing n high enough.

Whatever polynomial is used, the polynomial's coefficients must all lie in [0, 1].

2. Then, use one of the algorithms in the section "Certain Polynomials" to toss heads with probability equal to that polynomial, given its coefficients.

Examples of approximate Bernoulli factories. The schemes in the previous section give an upper bound on the error on approximating f with a degree-n polynomial in Bernstein form. For example, the third scheme does this when f is a Lipschitz continuous function (with Lipschitz constant L). To find a degree n such that f is approximated with a maximum error of ε, the following equation is solved for n:

• ε = L*((4306+837*sqrt(6))/5832) / sqrt(n).

This has the following solution:

• n = L2*(3604122*sqrt(6) + 11372525)/(17006112*ε2).

This is generally not an integer, so use n = ceil(n) to get the solution if it's an integer, or the nearest integer that's bigger than the solution. This solution can be simplified further to n = ceil(59393*L2/(50000*ε2)), which bounds the previous solution from above.

Now, if f is a Lipschitz continuous factory function with Lipschitz constant L, the following algorithm (adapted from "Certain Polynomials") simulates a polynomial that approximates f with a maximum error of ε:

1. Calculate n as ceil(59393*L2/(50000*ε2)).
2. Flip the input coin n times, and let j be the number of times the coin returned 1 this way.
3. With probability f(j/n), return 1. Otherwise, return 0. (If f(j/n) can be an irrational number, see "Algorithms for General Irrational Constants" for ways to sample this irrational probability exactly.)

As another example, the first scheme in the previous section is used to get the following approximate algorithm for C2 continuous functions with maximum "slope-of-slope" of M:

1. Calculate n as ceil(M/(8*ε)) (upper bound of the solution to the equation ε = M/(8*n)).
2. Flip the input coin n times, and let j be the number of times the coin returned 1 this way.
3. With probability f(j/n), return 1. Otherwise, return 0.

Other methods similar to this one can give an upper bound on the Bernstein-form polynomial approximation error, if they apply to the function f to be approximated.

Approximate methods for linear functions. There are a number of approximate methods to simulate λ*c, where c > 1 and λ lies in [0, 1/c). ("Approximate" because this function touches 1 at 1/c, so it can't be a factory function.) Since the methods use only up to n flips, where n is an integer greater than 0, the approximation will be a polynomial of degree n.

• Henderson and Glynn (2003, Remark 4)[^9] approximates the function λ*2 using a polynomial where the jth coefficient (starting at 0) is min((j/n)*2, 1−1/n). If g(λ) is that polynomial, then the error in approximating f is no greater than 1−g(1/2). g can be computed with the SymPy computer algebra library as follows: from sympy.stats import *; g=2*E( Min(sum(Bernoulli(("B%d" % (i)),z) for i in range(n))/n,(S(1)-S(1)/n)/2)).

• I found the following approximation for λ*c[^10]: "(1.) Set j to 0 and i to 0; (2.) If in, return 0; (3.) Flip the input coin, and if it returns 1, add 1 to j; (4.) (Estimate the probability and return 1 if it 'went over'.) If (j/(i+1)) ≥ 1/c, return 1; (5.) Add 1 to i and go to step 2." Here, λ*c is approximated by a polynomial where the jth coefficient (starting at 0) is min((j/n)*c, 1). If g(λ) is that polynomial, then the error in approximating f is no greater than 1−g(1/c).

• The previous approximation generalizes the one given in section 6 of Nacu and Peres (2005)[^1], which approximates λ*2.

Note: Bias and variance are the two sources of error in a randomized estimation algorithm. Let g(λ) be an approximation of f(λ). The original Bernoulli factory for f, if it exists, has bias 0 and variance f(λ)*(1−f(λ)), but the approximate Bernoulli factory has bias g(λ) − f(λ) and variance g(λ)*(1−g(λ)). Unlike with bias, there are ways to reduce variance, which are outside the scope of this document. An estimation algorithm's mean squared error equals variance plus square of bias.

## Achievable Simulation Rates

In general, the number of input coin flips needed by any Bernoulli factory algorithm for a factory function f(λ) depends on how "smooth" the function f is.

The following table summarizes the rate of simulation (in terms of the number of input coin flips needed) that can be achieved in theory depending on f(λ), assuming the unknown probability of heads. In the table below:

• λ, the unknown probability of heads, lies in the interval [ε, 1−ε] for some ε > 0.
• The simulation makes use of unbiased random bits in addition to input coin flips.
• Δ(n, r, λ) = O(max(sqrt(λ*(1−λ)/n),1/n)r), that is, O((1/n)r) near λ = 0 or 1, and O((1/n)r/2) elsewhere. (O(h(n)) roughly means "bounded from above by h(n) times a constant, for large enough n".)
Property of simulation Property of f
Requires no more than n input coin flips. If and only if f can be written as a polynomial in Bernstein form of degree n with coefficients in [0, 1] (Goyal and Sigman 2012)[^11].
Requires a finite number of flips on average. Also known as "realizable" by Flajolet et al. (2010)[^4]. Only if f is Lipschitz continuous (Nacu and Peres 2005)[^1].
Whenever f admits a fast simulation (Mendo 2019)[^12].
Number of flips required, raised to power of r, is finite on average and has a tail that drops off uniformly for every λ. Only if f is Cr continuous (has r or more continuous derivatives, or "slope" functions) (Nacu and Peres 2005)[^1].
Requires more than n flips with probability Δ(n, r + 1, λ), for integer r ≥ 0 and every λ. (The greater r is, the faster the simulation.) Only if f is Cr continuous and the rth derivative is in the Zygmund class (has no vertical slope) (Holtz et al. 2011)[^13].
Requires more than n flips with probability Δ(n, α, λ), for non-integer α > 0 and every λ. (The greater α is, the faster the simulation.) If and only if f is Cr continuous and the rth derivative is (αr)-Hölder continuous, where r = floor(α) (Holtz et al. 2011)[^13]. Assumes f is bounded away from 0 and 1.
"Fast simulation" (requires more than n flips with a probability that decays exponentially as n gets large). Also known as "strongly realizable" by Flajolet et al. (2010)[^4]. If and only if f is real analytic (is C continuous, or has continuous kth derivative for every k, and agrees with its Taylor series at every point) (Nacu and Peres 2005)[^1].
Average number of flips bounded from below by (f′(λ))2*λ*(1−λ)/(f(λ)*(1−f(λ))), where f′ is the first derivative of f. Whenever f admits a fast simulation (Mendo 2019)[^12].

Notes:

1. By the results of Holtz et al., it is suspected that the target function f can't be simulated using a finite number of flips on average for every probability of heads unless f's fourth derivative is Hölder continuous.
2. If a function is constant on some non-empty open interval in its domain, but is not constant on the whole domain, then it can't be real analytic.

## Complexity

The following note shows the complexity of the algorithm for 1/φ in the main article, where φ is the golden ratio.

Let E[N] be the expected ("long-run average") number of unbiased random bits (fair coin flips) generated by the algorithm.

Then, since each bit is independent, E[N] = 2*φ as shown below.

• Each iteration stops the algorithm with probability p = (1/2) + (1−(1/2)) * (1/φ) (1/2 for the initial bit and 1/φ for the recursive run; (1−(1/2)) because we're subtracting the (1/2) earlier on the right-hand side from 1).
• Thus, the expected number of iterations is E[T] = 1/p by a well-known rejection sampling argument, since the algorithm doesn't depend on iteration counts.
• Each iteration uses 1 * (1/2) + (1 + E[N]) * (1/2) bits on average, so the whole algorithm uses E[N] = (1 * (1/2) + (1 + E[N]) * (1/2)) * E[T] bits on average (each iteration consumes either 1 bit with probability 1/2, or (1 + E[N]) bits with probability 1/2). This equation has the solution E[N] = 1 + sqrt(5) = 2*φ.

Also, on average, half of these flips (φ) show 1 and half show 0, since the bits are unbiased (the coin is fair).

A similar analysis to the one above can be used to find the expected ("long-run average") time complexity of many Bernoulli factory algorithms.

## Examples of Bernoulli Factory Approximation Schemes

The following are approximation schemes and hints to simulate a coin of probability f(λ) given an input coin with probability of heads of λ. The schemes were generated automatically using approxscheme2 and have not been rigorously verified for correctness.

• Let f(λ) = min(1/2, λ). Then, for every integer n that's a power of 2, starting from 1:
• Detected to be concave and Lipschitz continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 9563/10000 if n<8; otherwise, f(k/n) + 322613/(250000*sqrt(n)).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 371/500 if n<4; otherwise, f(k/n) + 967839/(2000000*sqrt(n)).
• fbound(n) = [0, 1].
• Let f(λ) = cosh(λ) − 3/4. Then, for every integer n that's a power of 2, starting from 1:
• The function was detected to be convex and twice differentiable, leading to:
• fbelow(n, k) = 487/2500 if n<4; otherwise, f(k/n) − 154309/(700000*n).
• fabove(n, k) = f(k/n).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = 1043/5000 if n<4; otherwise, f(k/n) − 462927/(2800000*n).
• fabove(n, k) = f(k/n).
• fbound(n) = [0, 1].
• Let f(λ) = 3/4 − sqrt(−λ*(λ − 1)). Then, for every integer n that's a power of 2, starting from 1:
• Detected to be convex and (1/2)-Hölder continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 1545784563/(400000000*n1/4).
• fabove(n, k) = f(k/n).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n) − 26278337571/(25600000000*n1/4).
• fabove(n, k) = f(k/n).
• Let f(λ) = 3*sin(sqrt(3)*sqrt(sin(2*λ)))/4 + 1/50. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be (1/2)-Hölder continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 709907859/(100000000*n1/4).
• fabove(n, k) = f(k/n) + 709907859/(100000000*n1/4).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n) − 6389170731/(3200000000*n1/4).
• fabove(n, k) = f(k/n) + 6389170731/(3200000000*n1/4).
• Let f(λ) = 3/4 − sqrt(−λ*(λ − 1)). Then, for every integer n that's a power of 2, starting from 1:
• Detected to be convex and (1/2)-Hölder continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 1545784563/(400000000*n1/4).
• fabove(n, k) = f(k/n).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n) − 26278337571/(25600000000*n1/4).
• fabove(n, k) = f(k/n).
• Let f(λ) = λ*sin(7*π*λ)/4 + 1/2. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be twice differentiable using numerical methods, which may be inaccurate:
• fbelow(n, k) = 523/10000 if n<64; otherwise, f(k/n) − 11346621/(700000*n).
• fabove(n, k) = 1229/1250 if n<64; otherwise, f(k/n) + 11346621/(700000*n).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = 681/10000 if n<32; otherwise, f(k/n) − 34039863/(4480000*n).
• fabove(n, k) = 4837/5000 if n<32; otherwise, f(k/n) + 34039863/(4480000*n).
• fbound(n) = [0, 1].
• Let f(λ) = sin(4*π*λ)/4 + 1/2. Then, for every integer n that's a power of 2, starting from 1:
• The function was detected to be twice differentiable, leading to:
• fbelow(n, k) = 737/10000 if n<32; otherwise, f(k/n) − 1973921/(350000*n).
• fabove(n, k) = 9263/10000 if n<32; otherwise, f(k/n) + 1973921/(350000*n).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = 1123/10000 if n<32; otherwise, f(k/n) − 1973921/(448000*n).
• fabove(n, k) = 8877/10000 if n<32; otherwise, f(k/n) + 1973921/(448000*n).
• fbound(n) = [0, 1].
• Let f(λ) = sin(6*π*λ)/4 + 1/2. Then, for every integer n that's a power of 2, starting from 1:
• The function was detected to be twice differentiable, leading to:
• fbelow(n, k) = 517/10000 if n<64; otherwise, f(k/n) − 2220661/(175000*n).
• fabove(n, k) = 9483/10000 if n<64; otherwise, f(k/n) + 2220661/(175000*n).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = 1043/10000 if n<64; otherwise, f(k/n) − 104371067/(11200000*n).
• fabove(n, k) = 8957/10000 if n<64; otherwise, f(k/n) + 104371067/(11200000*n).
• fbound(n) = [0, 1].
• Let f(λ) = sin(4*π*λ)/4 + 1/2. Then, for every integer n that's a power of 2, starting from 1:
• The function was detected to be twice differentiable, leading to:
• fbelow(n, k) = 737/10000 if n<32; otherwise, f(k/n) − 1973921/(350000*n).
• fabove(n, k) = 9263/10000 if n<32; otherwise, f(k/n) + 1973921/(350000*n).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = 1123/10000 if n<32; otherwise, f(k/n) − 1973921/(448000*n).
• fabove(n, k) = 8877/10000 if n<32; otherwise, f(k/n) + 1973921/(448000*n).
• fbound(n) = [0, 1].
• Let f(λ) = λ2/2 + 1/10 if λ ≤ 1/2; λ/2 − 1/40 otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be convex and twice differentiable using numerical methods, which may be inaccurate:
• fbelow(n, k) = 321/5000 if n<4; otherwise, f(k/n) − 1/(7*n).
• fabove(n, k) = f(k/n).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = 693/10000 if n<4; otherwise, f(k/n) − 55/(448*n).
• fabove(n, k) = f(k/n).
• fbound(n) = [0, 1].
• Let f(λ) = λ/2 if λ ≤ 1/2; (4*λ − 1)/(8*λ) otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be concave and twice differentiable using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 893/2000 if n<4; otherwise, f(k/n) + 2/(7*n).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 4197/10000 if n<4; otherwise, f(k/n) + 5/(28*n).
• fbound(n) = [0, 1].
• Let f(λ) = 1/2 − sqrt(1 − 2*λ)/2 if λ < 1/2; sqrt(2*λ − 1)/2 + 1/2 otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be (1/2)-Hölder continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 1545784563/(400000000*n1/4).
• fabove(n, k) = f(k/n) + 1545784563/(400000000*n1/4).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n) − 10820491941/(12800000000*n1/4).
• fabove(n, k) = f(k/n) + 10820491941/(12800000000*n1/4).
• Let f(λ) = 1/2 − sqrt(1 − 2*λ)/4 if λ < 1/2; sqrt(2*λ − 1)/4 + 1/2 otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be (1/2)-Hölder continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 772969563/(400000000*n1/4).
• fabove(n, k) = f(k/n) + 772969563/(400000000*n1/4).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = 193/5000 if n<16; otherwise, f(k/n) − 5410786941/(12800000000*n1/4).
• fabove(n, k) = 4807/5000 if n<16; otherwise, f(k/n) + 5410786941/(12800000000*n1/4).
• fbound(n) = [0, 1].
• Let f(λ) = λ/2 + (1 − 2*λ)3/2/12 − 1/12 if λ < 0; λ/2 + (2*λ − 1)3/2/12 − 1/12 if λ ≥ 1/2; λ/2 + (1 − 2*λ)3/2/12 − 1/12 otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be convex and Lipschitz continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 322613/(500000*sqrt(n)).
• fabove(n, k) = f(k/n).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n) − 3548743/(32000000*sqrt(n)).
• fabove(n, k) = f(k/n).
• Let f(λ) = 1/2 − sqrt(1 − 2*λ)/4 if λ < 1/2; sqrt(2*λ − 1)/4 + 1/2 otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be (1/2)-Hölder continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 772969563/(400000000*n1/4).
• fabove(n, k) = f(k/n) + 772969563/(400000000*n1/4).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = 193/5000 if n<16; otherwise, f(k/n) − 5410786941/(12800000000*n1/4).
• fabove(n, k) = 4807/5000 if n<16; otherwise, f(k/n) + 5410786941/(12800000000*n1/4).
• fbound(n) = [0, 1].
• Let f(λ) = 1/2 − sqrt(1 − 2*λ)/8 if λ < 1/2; sqrt(2*λ − 1)/8 + 1/2 otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be (1/2)-Hölder continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = 333/10000 if n<64; otherwise, f(k/n) − 386562063/(400000000*n1/4).
• fabove(n, k) = 9667/10000 if n<64; otherwise, f(k/n) + 386562063/(400000000*n1/4).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = 451/2000 if n<4; otherwise, f(k/n) − 2705934441/(12800000000*n1/4).
• fabove(n, k) = 1549/2000 if n<4; otherwise, f(k/n) + 2705934441/(12800000000*n1/4).
• fbound(n) = [0, 1].
• Let f(λ) = λ/4 + (1 − 2*λ)3/2/24 + 5/24 if λ < 0; λ/4 + (2*λ − 1)3/2/24 + 5/24 if λ ≥ 1/2; λ/4 + (1 − 2*λ)3/2/24 + 5/24 otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be convex and Lipschitz continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = 443/5000 if n<4; otherwise, f(k/n) − 322613/(1000000*sqrt(n)).
• fabove(n, k) = f(k/n).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = 1111/5000 if n<4; otherwise, f(k/n) − 3548743/(64000000*sqrt(n)).
• fabove(n, k) = f(k/n).
• fbound(n) = [0, 1].
• Let f(λ) = 3*λ/2 if λ ≤ 1 − λ; 3/2 − 3*λ/2 otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be concave and Lipschitz continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 124/125 if n<64; otherwise, f(k/n) + 967839/(500000*sqrt(n)).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 1863/2000 if n<64; otherwise, f(k/n) + 2903517/(2000000*sqrt(n)).
• fbound(n) = [0, 1].
• Let f(λ) = 9*λ/5 if λ ≤ 1 − λ; 9/5 − 9*λ/5 otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be concave and Lipschitz continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = f(k/n) + 2903517/(1250000*sqrt(n)).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = f(k/n) + 8710551/(5000000*sqrt(n)).
• Let f(λ) = min(λ, 1 − λ). Then, for every integer n that's a power of 2, starting from 1:
• Detected to be concave and Lipschitz continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 9563/10000 if n<8; otherwise, f(k/n) + 322613/(250000*sqrt(n)).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 123/125 if n<4; otherwise, f(k/n) + 967839/(1000000*sqrt(n)).
• fbound(n) = [0, 1].
• Let f(λ) = λ/2 if λ ≤ 1 − λ; 1/2 − λ/2 otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be concave and Lipschitz continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 5727/10000 if n<4; otherwise, f(k/n) + 322613/(500000*sqrt(n)).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 123/250 if n<4; otherwise, f(k/n) + 967839/(2000000*sqrt(n)).
• fbound(n) = [0, 1].
• Let f(λ) = 19*λ/20 if λ ≤ 1 − λ; 19/20 − 19*λ/20 otherwise. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be concave and Lipschitz continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 1817/2000 if n<8; otherwise, f(k/n) + 6129647/(5000000*sqrt(n)).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 2337/2500 if n<4; otherwise, f(k/n) + 18388941/(20000000*sqrt(n)).
• fbound(n) = [0, 1].
• Let f(λ) = min(1/8, 3*λ). Then, for every integer n that's a power of 2, starting from 1:
• Detected to be concave and Lipschitz continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 4047/5000 if n<32; otherwise, f(k/n) + 967839/(250000*sqrt(n)).
• fbound(n) = [0, 1].
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = 171/400 if n<4; otherwise, f(k/n) + 967839/(1600000*sqrt(n)).
• fbound(n) = [0, 1].

## Notes

[^1]: Nacu, Şerban, and Yuval Peres. "Fast simulation of new coins from old", The Annals of Applied Probability 15, no. 1A (2005): 93-115.

[^2]: A Lipschitz continuous function on [0, 1], with constant L is a continuous function such that f(x) and f(y) are no more than L*ε apart whenever x and y are in [0, 1] and no more than ε apart. Roughly speaking, the function has a defined slope at all points or "almost everywhere" in [0, 1], and that slope is bounded wherever it's defined.

[^3]: Specifically, the constant m is a number equal to or greater than abs(f(x)−f(y))/(abs(xy)α) for every x in [0, 1] and every y in [0, 1] such that x != y. However, m can't directly be calculated as it would involve checking an infinite number of x, y pairs.

[^4]: Flajolet, P., Pelletier, M., Soria, M., "On Buffon machines and numbers", arXiv:0906.5560 [math.PR], 2010.

[^5]: Powell, M.J.D., Approximation Theory and Methods, 1981

[^6]: G. G. Lorentz. Bernstein polynomials. 1986.

[^7]: Popoviciu, T., "Sur l'approximation des fonctions convexes d'ordre supérieur", Mathematica (Cluj), 1935.

[^8]: Sikkema, P.C., "Der Wert einiger Konstanten in der Theorie der Approximation mit Bernstein-Polynomen", Numer. Math. 3 (1961).

[^9]: Henderson, S.G., Glynn, P.W., "Nonexistence of a class of variate generation schemes", Operations Research Letters 31 (2003).

[^10]: For this approximation, if n were infinity, the method would return 1 with probability 1 and so would not approximate λ*c, of course.

[^11]: Goyal, V. and Sigman, K., 2012. On simulating a class of Bernstein polynomials. ACM Transactions on Modeling and Computer Simulation (TOMACS), 22(2), pp.1-5.

[^12]: Mendo, Luis. "An asymptotically optimal Bernoulli factory for certain functions that can be expressed as power series." Stochastic Processes and their Applications 129, no. 11 (2019): 4366-4384.

[^13]: Holtz, O., Nazarov, F., Peres, Y., "New Coins from Old, Smoothly", Constructive Approximation 33 (2011).

[^14]: Keane, M. S., and O'Brien, G. L., "A Bernoulli factory", ACM Transactions on Modeling and Computer Simulation 4(2), 1994.

[^15]: von Neumann, J., "Various techniques used in connection with random digits", 1951.

[^16]: Peres, Y., "Iterating von Neumann's procedure for extracting random bits", Annals of Statistics 1992,20,1, p. 590-597.

[^17]: Knuth, Donald E. and Andrew Chi-Chih Yao. "The complexity of nonuniform random number generation", in Algorithms and Complexity: New Directions and Recent Results, 1976.

[^18]: Mossel, Elchanan, and Yuval Peres. New coins from old: computing with unknown bias. Combinatorica, 25(6), pp.707-724.

[^19]: S. Pae, "Binarization Trees and Random Number Generation", arXiv:1602.06058v2 [cs.DS], 2018.

[^20]: Levy, H., Stochastic dominance, 1998.

[^21]: Henry (https://math.stackexchange.com/users/6460/henry), Proving stochastic dominance for hypergeometric random variables, URL (version: 2021-02-20): https://math.stackexchange.com/q/4033573 .

[^22]: Gal, S.G., "Calculus of the modulus of continuity for nonconcave functions and applications", Calcolo 27 (1990)

[^23]: Gal, S.G., 1995. Properties of the modulus of continuity for monotonous convex functions and applications. International Journal of Mathematics and Mathematical Sciences 18(3), pp.443-446.

[^24]: Anastassiou, G.A., Gal, S.G., Approximation Theory: Moduli of Continuity and Global Smoothness Preservation, Birkhäuser, 2012.

## Appendix

### Which functions admit a Bernoulli factory?

Let f(λ) be a function whose domain is the closed interval [0, 1] or a subset of it, and that maps its domain to [0, 1]. The domain of f gives the allowable values of λ, which is the input coin's probability of heads.

f admits a Bernoulli factory if and only if f is constant on its domain, or is continuous and polynomially bounded on its domain, as defined in the section "Proofs for Function Approximation Schemes" (Keane and O'Brien 1994)[^14].

If f(λ) meets these sufficient conditions, it admits a Bernoulli factory:

• f(λ) is continuous on the closed interval [0, 1].
• f(λ) has a minimum of greater than 0 and a maximum of less than 1.

If f(λ) meets these sufficient conditions, it admits a Bernoulli factory and is Hölder continuous (has no slope steeper than an nth root's):

• f(λ) is continuous.
• f(λ) maps the closed interval [0, 1] to [0, 1].
• f(λ) equals neither 0 nor 1 on the open interval (0, 1).
• f(λ) is algebraic over rational numbers (that is, there is a nonzero polynomial P(x, y) in two variables and whose coefficients are rational numbers, such that P(x, f(x)) = 0 for every x in the domain of f).

A proof by Reid Barton begins by showing that f is a semialgebraic function, so that by a known inequality and the other conditions, it meets the definitions of being Hölder continuous and polynomially bounded.

### Which functions don't require outside randomness to simulate?

The function f(λ) is strongly simulable if it admits a Bernoulli factory algorithm that uses nothing but the input coin as a source of randomness (Keane and O'Brien 1994)[^14]. See "Randomized vs. Non-Randomized Algorithms".

Strong Simulability Statement. A function f(λ) is strongly simulable only if—

1. f is constant on its domain, or is continuous and polynomially bounded on its domain, and
2. f maps the closed interval [0, 1] or a subset of it to [0, 1], and
3. f(0) equals 0 or 1 whenever 0 is in the domain of f, and
4. f(1) equals 0 or 1 whenever 1 is in the domain of f.

Keane and O'Brien already showed that f is strongly simulable if conditions 1 and 2 are true and neither 0 nor 1 are included in the domain of f. Conditions 3 and 4 are required because λ (the probability of heads) can be 0 or 1 so that the input coin returns 0 or 1, respectively, every time. This is called a "degenerate" coin. When given just a degenerate coin, no algorithm can produce one value with probability greater than 0, and another value with the opposite probability. Rather, the algorithm can only produce a constant value with probability 1. In the Bernoulli factory problem, that constant is either 0 or 1, so a Bernoulli factory algorithm for f must return 1 with probability 1, or 0 with probability 1, when given just a degenerate coin and no outside randomness, resulting in conditions 3 and 4.

To show that f is strongly simulable, it's enough to show that there is a Bernoulli factory for f that must flip the input coin and get 0 and 1 before it uses any outside randomness.

Proposition 1. If f(λ) is described in the strong simulability statement and is a polynomial with computable coefficients, it is strongly simulable.

Proof: If f is the constant 0 or 1, the proof is trivial: simply return 0 or 1, respectively.

Otherwise: Let a[j] be the jth coefficient of the polynomial in Bernstein form. Consider the following algorithm, modified from (Goyal and Sigman 2012)[^11].

1. Flip the input coin n times, and let j be the number of times the coin returned 1 this way.
2. If 0 is in the domain of f and if j is 0, return f(0). (By condition 3, f(0) must be either 0 or 1.)
3. If 1 is in the domain of f and if j is n, return f(1). (By condition 4, f(1) must be either 0 or 1.)
4. Generate a uniform(0, 1) random variate, then return 1 if that variate is less than a[j] (a[j] is the coefficient j of the polynomial written in Bernstein form), or 0 otherwise.

(By the properties of the Bernstein form, a[0] will equal f(0) and a[n] will equal f(1) whenever 0 or 1 is in the domain of f, respectively.)

Step 4 is done by first generating unbiased bits (such as with the von Neumann trick of flipping the input coin twice until the flip returns 0 then 1 or 1 then 0 this way, then taking the result as 0 or 1, respectively (von Neumann 1951)[^15]), then using the algorithm in "Digit Expansions" to produce the probability a[j]. The algorithm computes a[j] bit by bit and compares the computed value with the generated bits. Since the coin returned both 0 and 1 in step 1 earlier in the algorithm, we know the coin isn't degenerate, so that step 4 will finish with probability 1. Now, since the Bernoulli factory used only the input coin for randomness, this shows that f is strongly simulable. □

Proposition 2. If f(λ) is described in the strong simulability statement, and if either f is constant on its domain or f meets the additional conditions below, then f is strongly simulable.

1. If f(0) = 0 or f(1) = 0 or both, then there is a polynomial g(λ) in Bernstein form whose coefficients are computable and in the interval [0, 1], such that g(0) = f(0) and g(1) = f(1) whenever 0 or 1, respectively, is in the domain of f, and such that g(λ) > f(λ) for every λ in the domain of f, except at 0 and 1.
2. If f(0) = 1 or f(1) = 1 or both, then there is a polynomial h(λ) in Bernstein form whose coefficients are computable and in the interval [0, 1], such that h(0) = f(0) and h(1) = f(1) whenever 0 or 1, respectively, is in the domain of f, and such that h(λ) < f(λ) for every λ in the domain of f, except at 0 and 1.

Lemma 1. If f(λ) is described in the strong simulability statement and meets the additional condition below, then f is strongly simulable.

• There is a polynomial g(λ) in Bernstein form whose coefficients are computable and in the interval [0, 1], such that g(0) = f(0) and g(1) = f(1) whenever 0 or 1, respectively, is in the domain of f, and such that g(λ) > f(λ) for every λ in the domain of f, except at 0 and 1.

Proof: Consider the following algorithm.

1. If f is 0 everywhere in its domain or 1 everywhere in its domain, return 0 or 1, respectively.
2. Otherwise, use the algorithm given in Proposition 1 to simulate g(λ). If the algorithm returns 0, return 0. By the additional condition in the lemma, 0 will be returned if λ is either 0 or 1.

Now, we know that the input coin's probability of heads is neither 0 nor 1.

By the conditions in the lemma, both f and g will be positive on the open interval (0, 1) (wherever f is defined).

Now let h(λ) = f(λ)/g(λ). By the conditions in the lemma, h will be positive everywhere in that interval.

3. If h equals 1 everywhere in the interval (0, 1) (wherever f is defined), return 1.

4. Otherwise, we run a Bernoulli factory algorithm for h(λ) that uses the input coin (and possibly outside randomness). Since h is continuous and polynomially bounded and the input coin's probability of heads is neither 0 nor 1, h is strongly simulable; we can replace the outside randomness in the algorithm with unbiased random bits via the von Neumann trick.

Thus, f admits an algorithm that uses nothing but the input coin as a source of randomness, and so is strongly simulable. □

Lemma 2. If f(λ) is described in the strong simulability statement and meets the additional conditions below, then f is strongly simulable.

1. There are two polynomials g(λ) and ω(λ) in Bernstein form, such that both polynomials' coefficients are computable and all in the interval [0, 1].
2. g(0) = ω(0) = f(0) = 0 (so that 0 is in the domain of f).
3. g(1) = ω(1) = f(1) = 1 (so that 1 is in the domain of f).
4. For every λ in the domain of f, except at 0 and 1, g(λ) > f(λ).
5. For every λ in the domain of f, except at 0 and 1, ω(λ) < f(λ).

Proof: First, assume g and ω have the same degree. If not, elevate the degree of the polynomial with lesser degree to have the same degree as the other.

Now, let g[j] and ω[j] be the jth coefficient of the polynomial g or ω, respectively, in Bernstein form. Consider the following algorithm, which is similar to the algorithm in Proposition 1.

1. Flip the input coin n times, and let j be the number of times the coin returned 1 this way.
2. If 0 is in the domain of f and if j is 0, return g(0) = ω(0) = 0.
3. If 1 is in the domain of f and if j is n, return g(1) = ω(1) = 0.
4. Generate a uniform(0, 1) random variate, then return 1 if that variate is less than ω[j], or return 0 if that variate is greater than g[j]. This step is carried out via the von Neumann method, as in Proposition 1.

If the algorithm didn't return a value, then by now we know that the input coin's probability of heads is neither 0 nor 1, since step 2 returned a value (either 0 or 1), which can only happen if the input coin didn't return all zeros or all ones.

Now let r(λ) = (f(λ) − ω(λ)) / (g(λ) − ω(λ)). By the conditions in the lemma, h will be positive everywhere in the interval (0, 1), wherever f is defined.

Now, run a Bernoulli factory algorithm for r(λ) that uses the input coin (and possibly outside randomness). Since r is continuous and polynomially bounded and the input coin's probability of heads is neither 0 nor 1, r is strongly simulable; we can replace the outside randomness in the algorithm with unbiased random bits via the von Neumann trick.

Thus, f admits an algorithm that uses nothing but the input coin as a source of randomness, and so is strongly simulable. □

Proof of Proposition 2: The following cases can occur:

1. If neither 0 nor 1 are in the domain of f, then f is strongly simulable by the discussion above.
2. If f is 0 everywhere in its domain or 1 everywhere in its domain: Return 0 or 1, respectively.
3. If 0 but not 1 is in the domain of f: If f(0) = 0, apply Lemma 1. If f(0) = 1, apply Lemma 1, but take f = 1 − f and return 1 minus the output of the lemma's algorithm (this will bring f(0) = 0 and satisfy the lemma.)
4. If 1 but not 0 is in the domain of f: If f(1) = 0, apply Lemma 1. If f(1) = 1, apply Lemma 1, but take f = 1 − f and return 1 minus the output of the lemma's algorithm (this will bring f(1) = 0 and satisfy the lemma.)
5. f(0) = f(1) = 0: Apply Lemma 1.
6. f(0) = f(1) = 1: Apply Lemma 1, but take f = 1 − f and return 1 minus the output of the lemma's algorithm.
7. f(0) = 0 and f(1) = 1: Apply Lemma 2.
8. f(0) = 1 and f(1) = 0: Apply Lemma 2, but take f = 1 − f and return 1 minus the output of the lemma's algorithm.

Proposition 3. If f(λ) is described in the strong simulability statement and is Lipschitz continuous, then f is strongly simulable.

Lemma 3. If f(λ) is described in the strong simulability statement, is Lipschitz continuous, and is such that f(0) = 0 and f(1) = 0 whenever 0 or 1, respectively, is in the domain of f, then f is strongly simulable.

Proof: If f is 0 everywhere in its domain or 1 everywhere in its domain: Return 0 or 1, respectively. Otherwise, let—

• M be the Lipschitz constant of f (for example, its "slope" function's maximum absolute value), or a computable number greater than this.
• l be either 0 if 0 is in the domain of f, or 1 otherwise, and
• u be either 0 if 1 is in the domain of f, or 1 otherwise.

To build g, take its degree as ceil(M)+1 or greater (so that g's Lipschitz constant is greater than M and g has ceil(M) + 2 coefficients), then set the first coefficient as l, the last coefficient as u, and the remaining coefficients as 1. (As a result, the polynomial g will have computable coefficients.) Then g will meet the additional condition for Lemma 1 and the result follows from that lemma. □

Lemma 4. If f(λ) is described in the strong simulability statement, is Lipschitz continuous, and is such that f(0) = 0 and f(1) = 1 (so that 0 and 1 are in the domain of f), then f is strongly simulable.

Proof: Let M and l be as in Lemma 3.

To build g and ω, take their degree as ceil(M)+1 or greater (so that their Lipschitz constant is greater than M and each polynomial has ceil(M) + 2 coefficients), then for each polynomial, set its first coefficient as l and the last coefficient as 1. The remaining coefficients of g are set as 1 and the remaining coefficients of ω are set as 0. (As a result, the polynomial g will have computable coefficients.) Then g and ω will meet the additional conditions for Lemma 2 and the result follows from that lemma. □

Proof of Proposition 3: In the proof of proposition 2, replace Lemma 1 and Lemma 2 with Lemma 3 and Lemma 4, respectively. □

It is suspected that the conditions in Proposition 2 are necessary and sufficient for f(λ) to be strongly simulable.

### Multiple-Output Bernoulli Factory

A related topic is a Bernoulli factory that takes a coin with unknown probability of heads λ and produces one or more samples of the probability f(λ). This section calls it a multiple-output Bernoulli factory.

Obviously, any single-output Bernoulli factory can produce multiple outputs by running itself multiple times. But for some functions f, it may be that producing multiple outputs at a time may use fewer input coin flips than producing one output multiple times.

Let J be a closed interval on (0, 1), such as [1/100, 99/100]. Define the entropy bound as h(f(λ))/h(λ) where h(x) = −x*ln(x)−(1−x)*ln(1−x) is related to the Shannon entropy function. The question is:

When the probability λ can be any value in J, is there a multiple-output Bernoulli factory for f(λ) with an expected ("long-run average") number of input coin flips per sample that is arbitrarily close to the entropy bound? Call such a Bernoulli factory an optimal factory.

(See Nacu and Peres (2005, Question 2)[^1].)

So far, the following functions do admit an optimal factory:

• The functions λ and 1 − λ.
• Constants in [0, 1]. As Nacu and Peres (2005)[^1] already showed, any such constant c admits an optimal factory: generate unbiased random bits using Peres's iterated von Neumann extractor (Peres 1992)[^16], then build a binary tree that generates 1 with probability c and 0 otherwise (Knuth and Yao 1976)[^17].

It is easy to see that if an optimal factory exists for f(λ), then one also exists for 1 − f(λ): simply change all ones returned by the f(λ) factory into zeros and vice versa.

Also, as Yuval Peres (Jun. 24, 2021) told me, there is an efficient multiple-output Bernoulli factory for f(λ) = λ/2: the key is to flip the input coin enough times to produce unbiased random bits using his extractor (Peres 1992)[^12], then multiply each unbiased bit with another input coin flip to get a sample from λ/2. Given that the sample is equal to 0, there are three possibilities that can "be extracted to produce more fair bits": either the unbiased bit is 0, or the coin flip is 0, or both are 0.

This algorithm, though, doesn't count as an optimal factory, and Peres described this algorithm only incompletely. By simulation and trial and error I found an improved version of the algorithm. It uses two randomness extractors (extractor 1 and extractor 2) that produce unbiased random bits from biased data (which is done using a method given later in this section). The extractors must be asymptotically optimal (they must approach the entropy limit as closely as desired); one example is the iterated von Neumann construction in Peres (1992)[^16]. The algorithm consists of doing the following in a loop until the desired number of outputs is generated.

1. If the number of outputs generated so far is divisible by 20, do the following:
• Generate an unbiased random bit (see below). If that bit is zero, output 0, then repeat this step unless the desired number of outputs has been generated. If the bit is 1, flip the input coin and output the result.
2. Otherwise, do the following:
1. Generate an unbiased random bit (see below), call it fc. Then flip the input coin and call the result bc.
2. Output fc*bc.
3. (The following steps pass "unused" randomness to the extractor in a specific way to ensure correctness.) If fc is 0, and bc is 1, append 0 to extractor 2's input bits.
4. If fc and bc are both 0, append 1 then 1 to extractor 2's input bits.
5. If fc is 1 and bc is 0, append 1 then 0 to extractor 2's input bits.

Inspired by Peres's result with λ/2, the following algorithm is proposed. It works for any rational function of the form D(λ)/E(λ), where—

• D(λ) = ∑i = 0, ..., k λi * (1 − λ)ki * d[i],
• E(λ) = ∑i = 0, ..., k λi * (1 − λ)ki * e[i],
• every d[i] is less than or equal to the corresponding e[i], and
• each d[i] and each e[i] is a non-negative integer.

The algorithm is a modified version of the "block simulation" in Mossel and Peres (2005, Proposition 2.5)[^18], which also "extracts" residual randomness with the help of six asymptotically optimal randomness extractors. In the algorithm, let r be an integer such that, for every integer i in [0, k], e[i] < choose(k, i)*choose(2*r, r).

1. Set iter to 0.
2. Flip the input coin k times. Then build a bitstring B1 consisting of the coin flip results in the order they occurred. Let i be the number of ones in B1.
3. Generate 2*r unbiased random bits (see below). (Rather than flipping the input coin 2*r times, as in the algorithm of Proposition 2.5.) Then build a bitstring B2 consisting of the coin flip results in the order they occurred.
4. If the number of ones in B2 is other than r: Translate B1 + B2 to an integer under mapping 1, then pass that number to extractor 2, then add 1 to iter, then go to step 2.
5. Translate B1 + B2 to an integer under mapping 2, call the integer β. If β < d[i], pass β to extractor 3, then pass iter to extractor 6, then output a 1. Otherwise, if β < e[i], pass βd[i] to extractor 4, then pass iter to extractor 6, then output a 0. Otherwise, pass βe[i] to extractor 5, then add 1 to iter, then go to step 2.

The mappings used in this algorithm are as follows:

1. A one-to-one mapping between—
• bitstrings of length k + 2*r with fewer or greater than r ones among the last 2*r bits, and
• the integers in [0, 2k * (22*r − choose(2*r, r))).
2. A one-to-one mapping between—
• bitstrings of length k + 2*r with exactly i ones among the first k bits and exactly r ones among the remaining bits, and
• the integers in [0, choose(k, i)*choose(2*r, r)).

In this algorithm, an unbiased random bit is generated as follows. Let m be an even integer that is 32 or greater (in general, the greater m is, the more efficient the overall algorithm is in terms of coin flips).

1. Use extractor 1 to extract outputs from floor(n/m)*m inputs, where n is the number of input bits available to that extractor. Do the same for the remaining extractors.
2. If extractor 2 has at least one unused output bit, take an output and stop. Otherwise, repeat this step for the remaining extractors.
3. Flip the input coin at least m times, append the coin results to extractor 1's inputs, and go to step 1.

Now consider the last paragraph of Proposition 2.5. If the input coin were flipped in step 2, the probability of—

• outputting 1 in the algorithm's last step would be P1 = λr*(1−λ)r*D(λ).
• outputting either 0 or 1 in that step would be P01 = λr*(1−λ)r*E(λ),

so that the algorithm would simulate f(λ) = P1 / P01. Observe that the λr*(1−λ)r cancels out in the division. Thus, we could replace the input coin with unbiased random bits and still simulate f(λ); the λr*(1−λ)r above would then be (1/2)2*r.

While this algorithm is coin-flip-efficient, it is not believed to be an optimal factory, at least not without more work. In particular, a bigger savings of input coin flips could occur if f(λ) maps the interval J to a small range of values, so that the algorithm could, for example, generate a uniform random variate in [0, 1] using unbiased random bits and see whether it lies outside that range of values — and thus produce a sample from f(λ) without flipping the input coin again.

For example, by translating the number to input bits via Pae's entropy-preserving binarization (Pae 2018)[^19]. But correctness might depend on how this is done; after all, the number of coin flips per sample must equal or exceed the entropy bound for every λ.

### Proofs for Function Approximation Schemes

This section shows mathematical proofs for some of the approximation schemes of this page.

In the following results:

• A strictly bounded factory function means a continuous function on the closed interval [0, 1], with a minimum of greater than 0 and a maximum of less than 1.
• A function f(λ) is polynomially bounded if both f(λ) and 1−f(λ) are bounded from below by min(λn, (1−λ)n) for some integer n (Keane and O'Brien 1994)[^14].
• A modulus of continuity of a function f means a non-negative and nondecreasing function ω on the interval [0, 1], for which ω(0) = 0, and for which abs(f(x) − f(y)) ≤ ω(abs(xy)) for every x in [0, 1] and every y in [0, 1]. Loosely speaking, a modulus of continuity ω(δ) is bounded below by f's maximum range in a window of size δ.

Lemma 1. Let f(λ) be a continuous and nondecreasing function, and let Xk be a hypergeometric(2*n, k, n) random variable, where n≥1 is a constant integer and k is an integer in [0, 2*n] . Then the expected value of f(Xk/n) is nondecreasing as k increases.

Proof. This is equivalent to verifying whether Xm+1/n "dominates" Xm/n (and, obviously by extension, Xm+1 "dominates" Xm) in terms of first-degree stochastic dominance (Levy 1998)[^20]. This means that the probability that (Xm+1j) is less than or equal to that for Xm for each j in the interval [0, n]. A proof of this was given by the user "Henry" of the Mathematics Stack Exchange community[^21]. □

Lemma 6(i) of Nacu and Peres (2005)[^1] can be applied to continuous functions beyond just Lipschitz continuous functions. This includes Hölder continuous functions, namely continuous functions with no slope that's "steeper" than every "nth" root.

Lemma 2. Let f(λ) be a continuous function that maps [0, 1] to [0, 1], and let X be a hypergeometric(2*n, k, n) random variable.

1. Let ω(x) be a modulus of continuity of f. If ω is continuous and concave on [0, 1], then the expression—
abs(E[f(X/n)] − f(k/(2*n))),   (1)
is bounded from above by—
• ω(sqrt(1/(8*n−4))), for every integer n≥1 that's a power of 2,
• ω(sqrt(1/(7*n))), for every integer n≥4 that's a power of 2,
• ω(sqrt(1/(2*n))), for every integer n≥1 that's a power of 2, and
• ω(sqrt( (k/(2*n)) * (1−k/(2*n)) / (2*n−1) )), for every n≥1 that's a power of 2.
2. If f is α-Hölder continuous with Hölder constant M and with α in the interval (0, 1], then the expression (1) is bounded from above by—
• M*(1/(2*n))α/2, for every integer n≥1 that's a power of 2,
• M*(1/(7*n))α/2, for every integer n≥4 that's a power of 2, and
• M*(1/(8*n−4))α/2, for every integer n≥1 that's a power of 2.
3. If f has a second derivative whose absolute value is defined in all of [0, 1] and bounded from above by M, then the expression (1) is bounded from above by—
• (M/2)*(1/(7*n)), for every integer n≥4 that's a power of 2, and
• (M/2)*(1/(8*n−4)), for every integer n≥1 that's a power of 2.
4. If f is convex, nondecreasing, and bounded from below by 0, then the expression (1) is bounded from above by E[f(Y/n)] for every integer n≥1 that's a power of 2, where Y is a hypergeometric(2*n, n, n) random variable.

Proof.

1. ω is assumed to be non-negative because absolute values are non-negative. To prove the first and second bounds: abs(E[f(X/n)] − f(k/(2 * n))) ≤ E[abs(f(X/n) − f(k/(2 * n))] ≤ E[ω(abs(X/nk/(2 * n))] ≤ ω(E[abs(X/nk/(2 * n))]) (by Jensen's inequality and because ω is concave) ≤ ω(sqrt(E[abs(X/nk/(2 * n))]2)) = ω(sqrt(Var[X/n])) = ω(sqrt((k*(2 * nk)/(4*(2 * n−1)*n2)))) ≤ ω(sqrt((n2/(4*(2 * n−1)*n2)))) = ω(sqrt((1/(8*n−4)))) = ρ, and for every n≥4 that's an integer power of 2, ρω(sqrt(1/(7*n))). To prove the third bound: abs(E[f(X/n)] − f(k/(2 * n))) ≤ ω(sqrt(Var[X/n])) ≤ ω(sqrt(1/(2*n))). To prove the fourth bound: abs(E[f(X/n)] − f(k/(2 * n))) ≤ ω(sqrt((n2/(4*(2 * n−1)*n2)))) = ω(sqrt( (k/(2*n)) * (1−k/(2*n)) / (2*n−1) )).
2. By the definition of Hölder continuous functions, take ω(x) = M*xα. Because ω is a concave modulus of continuity on [0,1], the result follows from part 1.
3. abs(E[f(X/n)] − f(k/(2 * n))) ≤ (M/2)*Var[X/n] = (M/2)*(k*(2 * nk)/(4*(2 * n−1)*n2)) ≤ (M/2)*(n2/(4*(2 * n−1)*n2)) = (M/2)*(1/(8*n−4)) = ρ. For every integer n≥4 that's a power of 2, ρ ≤ (M/2)*(1/(7*n)).
4. Let Xm be a hypergeometric(2 * n, m, n) random variable. By Lemma 1 and the assumption that f is nondecreasing, E[f(Xk/n)] is nondecreasing as k increases, so take E[f(Xn/n)] = E[f(Y/n)] as the upper bound. Then, abs(E[f(X/n)] − f(k/(2 * n))) = abs(E[f(X/n)] − f(E[X/n])) = E[f(X/n)] − f(E[X/n]) (by Jensen's inequality, because f is convex and bounded by 0) = E[f(X/n)] − f(k/(2 * n)) ≤ E[f(X/n)] (because f is bounded by 0) ≤ E[f(Y/n)]. □

Notes:

1. E[.] means expected value ("long-run average"), and Var[.] means variance. A hypergeometric(2 * n, k, n) random variable is the number of "good" balls out of n balls taken uniformly at random, all at once, from a bag containing 2 * n balls, k of which are "good".
2. f is α-Hölder continuous if its vertical slopes, if any, are no "steeper" than that of M*λα, where α is in the interval (0, 1] and M is greater than 0. An α-Hölder continuous function on the closed interval [0, 1] is also β-Hölder continuous for any β less than α.
3. Parts 1 and 2 exploit a tighter bound on Var[X/n] than the bound given in Nacu and Peres (2005, Lemma 6(i) and 6(ii), respectively)[^1]. However, for technical reasons, different bounds are proved for different ranges of integers n.
4. For part 3, as in Lemma 6(ii) of Nacu and Peres 2005, the second derivative need not be continuous (Y. Peres, pers. comm., 2021).
5. All continuous functions that map the closed interval [0, 1] to [0, 1], including all of them that admit a Bernoulli factory, have a modulus of continuity. The proof of part 1 remains valid even if ω(0) > 0, because the bounds proved remain correct even if ω is overestimated. The following functions have a simple ω that satisfies the lemma:
1. If f is monotone increasing and convex, ω(x) can equal f(1) − f(1−x) (Gal 1990)[^22]; (Gal 1995)[^23].
2. If f is monotone decreasing and convex, ω(x) can equal f(0) − f(x) (Gal 1990)[^22]; (Gal 1995)[^23].
3. If f is monotone increasing and concave, ω(x) can equal f(x) − f(0) (by symmetry with 2).
4. If f is monotone decreasing and concave, ω(x) can equal f(1−x) − f(1) (by symmetry with 1).
5. If f is concave and is monotone increasing then monotone decreasing, then ω(h) can equal (f(min(h, σ))+(f(1−min(h, 1−σ))−f(1)), where σ is the point where f stops increasing and starts decreasing (Anastassiou and Gal 2012)[^24].

Theorem 1. Let ω(x) be as described in part 1 of Lemma 2, and let f(λ) be a strictly bounded factory function. Let—

η(n) = η(2m) = ∑k=m, m+1,... φ(2k),

for every integer n≥1 that's a power of 2 (with n=2m), where φ(n) is either—

• ω(sqrt(1/(8*n−4))), or
• ω(sqrt(1/(2*n))).

If the infinite series η(n) converges, then the following approximation scheme for f(λ) is valid in the following sense: By forming two sequences of polynomials in Bernstein form with coefficients fabove(n, k) for the upper polynomials, and fbelow(n, k) for the lower polynomials, then those polynomials meet conditions (i), (iii), and (iv) of Proposition 3 of Nacu and Peres (2005)[^1], for every integer n≥1 that's a power of 2, by defining fabove and fbelow as follows:

• fbelow(n, k) = f(k/n) − η(n).
• fabove(n, k) = f(k/n) + η(n).

Except that the following bounding note applies: If fabove(n, k) > 1 for a given n and some k, fabove(n, k) = 1 instead for that n, and if fbelow(n, k) < 0 for a given n and some k, fbelow(n, k) = 0 instead for that n.

Proof. Follows from part 1 of Lemma 2 above as well as Remark B and the proof of Proposition 10 of Nacu and Peres (2005)[^1].

For the series η(n) in the theorem, each term of the series is nonnegative making the series nonnegative and, by the assumption that the series converges, η(n) is nonincreasing with increasing n.

Condition (i) says that the coefficients fbelow and fabove must be bounded by 0 and 1. This is ensured starting with a large enough value of n greater than 0, call it n0, as shown next.

Let ε be a positive distance between 0 and the minimum or between 1 and the maximum of f, whichever is smaller. This ε exists by the assumption that f is bounded away from 0 and 1. Because the series η converges, η(n) will eventually stay less than ε. And the f(k/n) term is bounded by the minimum and maximum of f by construction. This combined means that the coefficients fbelow and fabove will eventually be bounded by 0 and 1 for every integer n starting with n0.

For n less than n0, condition (i) is ensured by setting fbelow and fabove to 0 or 1, respectively, whenever a coefficient of the degree-n polynomial would otherwise be outside the bounds.

Condition (iii) says that the upper polynomials must converge to f and so must the lower polynomials. This is ensured in a similar way as in Proposition 10, as well as by the assumption that the series converges: as n goes to infinity, η(n) goes to 0 so that the coefficients, and thus the polynomials, converge to f. For n less than n0, the values of fbelow and fabove can be 0 or 1 without affecting convergence.

Condition (iv) is the consistency requirement described earlier in this page. This is ensured as in Proposition 10 by bounding, from below, the offset by which to shift the approximating polynomials. This lower bound is η(n), a solution to the equation 0 = η(n) − η(2 * n) − φ(n) (see note below), where φ can take on either form given in the theorem. The solution given in the theorem is easy to prove by noting that this is a recursive process: we start by calculating the series for n = 2*n, then adding φ(n) to it, in effect working backwards and recursively, and we can easily see that we can calculate the series for n = 2*n only if the series converges, hence the assumption of a converging series. For n0, the consistency requirement is maintained by noting that the degree-n0 polynomial's coefficients must be bounded by 0 and 1 by condition (i) so they will likewise be bounded by those of the lower and upper polynomials of degree less than n0, and those polynomials are the constant 0 and the constant 1, respectively, as are their coefficients. □

Note: There is only one solution η(n) in the case at hand. Unlike so-called functional equations and linear recurrences, with a solution that varies depending on the starting value, there is only one solution in the case at hand, namely the solution that makes the series converge, if it exists at all. Alternatively, the equation can be expanded to 0 = η(n) − η(4 * n) − φ(2*n) − φ(n) = η(n) − η(8 * n) − φ(4*n) − φ(2*n) − φ(n) = ...

Corollary 1. Let f(λ) be a strictly bounded factory function. If f is α-Hölder continuous with Hölder constant M and with α in the interval (0, 1], then the following approximation scheme determined by fbelow and fabove is valid in the sense of Theorem 1, subject to the bounding note:

• fbelow(n, k) = f(k/n) − δ(n).
• fabove(n, k) = f(k/n) + δ(n).

Where D(n) = M/((2α/2−1)*nα/2).

Proof. Follows from Theorem 1 by using the ω given in part 2 of Lemma 2, and by using φ(n) = ω(sqrt(1/(2*n))). □

Note: For specific values of α, the equation D(n) = D(2 * n) + φ(n) can be solved via linear recurrences; an example for α = 1/2 is the following code in Python that uses the SymPy computer algebra library: rsolve(Eq(f(n),f(n+1)+z*(1/(2*2**n))**((S(1)/2)/2)),f(n)).subs(n,ln(n,2)).simplify(). Trying different values of α suggested the following formula for Hölder continuous functions with α of 1/j or greater: (M* ∑i = 0,...,(j*2)−1 2i/(2*j))/n1/(2*j) = M / ((21/(2*j)−1)*n1/(2*j)); and generalizing the latter expression led to the term in the theorem.

Corollary 2. Let f(λ) be a strictly bounded factory function. If f is Lipschitz continuous with Lipschitz constant M, then the following approximation scheme determined by fbelow and fabove is valid in the sense of Theorem 1, subject to the bounding note:

• fbelow(n, k) = f(k/n) − M/((sqrt(2)−1)*sqrt(n)).
• fabove(n, k) = f(k/n) + M/((sqrt(2)−1)*sqrt(n)).

Proof. Because Lipschitz continuous functions are 1-Hölder continuous with Hölder constant M, the result follows from Corollary 1. □

Note: This special case of Theorem 1 was already found by Nacu and Peres (2005)[^1].

Theorem 2. Let f(λ) be a strictly bounded factory function, and let ω(x) be as described in Theorem 1. Theorem 1 remains valid with the following versions of φ(n), fbelow, and fabove, rather than as given in that theorem, subject to the bounding note:

• φ(n) = ω(sqrt(1/(7*n))).
• fbelow(n, k) = min(fbelow(4,0), fbelow(4,1), ..., fbelow(4,4)) if n < 4; otherwise, f(k/n) − η(n).
• fabove(n, k) = max(fabove(4,0), fabove(4,1), ..., fabove(4,4)) if n < 4; otherwise, f(k/n) + η(n).

Proof. Follows from Theorem 1 and part 1 of Lemma 2 above, as well as Remark B and the proof of Proposition 10 of Nacu and Peres, including the observation in Remark B of the paper that we can start the algorithm from n = 4; in that case, the upper and lower polynomials of degree 1 through 3 above would be constant functions, so that as polynomials in Bernstein form, the coefficients of each one would be equal. With the φ given in this theorem, the series η(n) in Theorem 1 remains nonnegative; also, this theorem adopts Theorem 1's assumption that the series converges, so that η(n) still decreases with increasing n. □

Corollary 3. Let f(λ) be a strictly bounded factory function. If f is α-Hölder continuous with Hölder constant M and with α in the interval (0, 1], then the following approximation scheme is valid in the sense of Theorem 1, subject to the bounding note:

• fbelow(n, k) = min(fbelow(4,0), fbelow(4,1), ..., fbelow(4,4)) if n < 4; otherwise, f(k/n) − η(n).
• fabove(n, k) = max(fabove(4,0), fabove(4,1), ..., fabove(4,4)) if n < 4; otherwise, f(k/n) + η(n).

Where η(n) = M*(2/7)α/2/((2α/2−1)*nα/2).

Proof. Follows from Theorem 2 by using the ω given in part 2 of Lemma 2. □

Theorem 3. Let f(λ) be a strictly bounded factory function. If f has a second derivative whose absolute value is defined in all of [0, 1] and bounded from above by M, then the following approximation scheme is valid in the sense of Theorem 1, subject to the bounding note:

• fbelow(n, k) = min(fbelow(4,0), fbelow(4,1), ..., fbelow(4,4)) if n < 4; otherwise, f(k/n) − M/(7*n).
• fabove(n, k) = max(fabove(4,0), fabove(4,1), ..., fabove(4,4)) if n < 4; otherwise, f(k/n) + M/(7*n).

Proof. Because (M/2)*(1/(7*n)) in part 3 of Lemma 2 is bounded the same way as the statement ω(sqrt(1/(7*n))) in Theorem 2 and part 1 of Lemma 2, take ω(n) as (M/2)*(1/(7*n)). Then the result follows from Theorem 2. □

Theorem 4. Let f(λ) be a strictly bounded factory function. If f is convex and nondecreasing, then Theorem 1 remains valid with φ(n) = E[f(Y/n)] (where Y is a hypergeometric(2*n, n, n) random variable), rather than as given in that theorem.

Proof. Follows from Theorem 1 and part 4 of Lemma 2 above. With the φ given in this theorem, the series η(n) in Theorem 1 remains nonnegative; also, this theorem adopts Theorem 1's assumption that the series converges, so that η(n) still decreases with increasing n. □

Proposition 1.

1. Let f be as given in any of Theorems 1 through 4, except that f must be concave and polynomially bounded and may have a minimum of 0. Then the approximation scheme of that theorem remains valid if fbelow(n, k) = f(k/n), rather than as given in that theorem.
2. Let f be as given in any of Theorems 1 through 4, except that f must be convex and polynomially bounded and may have a maximum of 1. Then the approximation scheme of that theorem remains valid if fabove(n, k) = f(k/n), rather than as given in that theorem.
3. Theorems 1 through 4 can be extended to all integers n≥1, not just those that are powers of 2, by defining—

• fbelow(n, k) = (k/n)*fbelow(n−1, max(0, k−1)) + ((n−k)/n)*fbelow(n−1, min(n−1, k)), and
• fabove(n, k) = (k/n)*fabove(n−1, max(0, k−1)) + ((n−k)/n)*fabove(n−1, min(n−1, k)),

for every integer n≥1 other than a power of 2. Parts 1 and 2 of this proposition still apply to the modified scheme.

Proof. Parts 1 and 2 follow from Theorems 1 through 4, as the case may be. For part 1, the lower polynomials are replaced by the degree-n Bernstein approximations of f, and they meet the conditions in those theorems by Jensen's inequality. For part 2, the upper polynomials are involved instead of the lower polynomials. Part 3 also follows from Remark B of Nacu and Peres (2005)[^1]. □

### Example of Approximation Scheme

The following example uses the results above to build an approximation scheme for a factory function.

Let f(λ) = 0 if λ is 0, and (ln(λ/exp(3)))−2 otherwise. Then the following approximation scheme is valid in the sense of Theorem 1:

• η(k) = Φ(1, 2, (ln(k)+ln(7)+6)/ln(2))*4/ln(2)2.
• fbelow(n, k) = f(k/n).
• fabove(n, k) = max(fabove(4,0), fabove(4,1), ..., fabove(4,4)) if n < 4; otherwise, f(k/n) + η(n).

Where Φ(.) is a function called the Lerch transcendent, and fabove is subject to Theorem 1's bounding note.

Notice that the function f is not Hölder continuous; its slope is exponentially steep at the point 0.

The first step is to find a concave modulus of continuity of f (called ω(h)). Because f is monotone increasing and concave, and because f(0) = 0, we can take ω(h) = f(h).

Now, by plugging sqrt(1/(7*n)) into ω, we get the following for Theorem 2 (assuming n≥0):

• φ(n) = 1/(ln(sqrt(7/n)/7)−3)2.

Now, by applying Theorem 1, we compute η(k) by substituting n with 2n, summing over [k, ∞), and substituting k with log2(k). η converges, resulting in:

• η(k) = Φ(1, 2, (ln(k)+ln(7)+6)/ln(2))*4/ln(2)2,

where Φ(.) is the Lerch transcendent. This η matches the η given in the scheme above. That scheme then follows from Theorems 1 and 2, as well as from part 1 of Proposition 1 because f is concave.

the following code in Python that uses the SymPy computer algebra library is an example of finding the parameters for this approximation scheme.

px=Piecewise((0,Eq(x,0)),((ln(x/exp(3))**-2),True))
# omega is modulus of continuity.  Since
# px is monotone increasing, concave, and px(0)=0,
# we can take omega as px
omega=px
omega=piecewise_fold(omega.rewrite(Piecewise)).simplify()
# compute omega
phi=omega.subs(x,sqrt(1/(7*n)))
pprint(phi)
# compute eta
eta=summation(phi.subs(n,2**n),(n,k,oo)).simplify()
pprint(eta)
for i in range(20):
# Calculate upper bounds for eta at certain points.
try:
print("eta(2^%d) ~= %s" % (i,ceiling(eta.subs(k,i)*10000000).n()/10000000))
except:
print("eta(2^%d) ~= [FAILED]" % (i))