# Supplemental Notes for Bernoulli Factory Algorithms

Peter Occil

## Definitions

This section describes certain mathematical terms used on this page, to help programmers understand the rest of the document.

The following terms can describe a function $f(x)$, specifically how "well-behaved" $f$ is — which can be important when designing Bernoulli factory algorithms. This page is mostly concerned with how $f$ behaves when its domain is the closed interval [0, 1], that is, when $0 \le x \le 1$.

• If $f$ is continuous, its derivative is, roughly speaking, its "slope" or "velocity" or "instantaneous-rate-of-change" function. The derivative (or first derivative) is denoted as $f'$. The second derivative ("slope-of-slope") of $f$, denoted $f''$, is the derivative of $f'$; the third derivative is the derivative of $f''$; and so on.
• A Hölder continuous function (with M being the Hölder constant and α being the Hölder exponent) is a continuous function f such that f(x) and f(y) are no more than M*δ^α apart whenever x and y are in the function's domain and no more than δ apart.
Roughly speaking, the function's "steepness" is no greater than that of M*x^α.
Such a function also admits any smaller Hölder exponent β with 0 < β < α (possibly with a different Hölder constant).
• A Lipschitz continuous function with constant L (the Lipschitz constant) is Hölder continuous with Hölder exponent 1 and Hölder constant L.
Roughly speaking, the function's "steepness" is no greater than that of L*x.
If the function has a derivative on its domain, L is the maximum absolute value of that derivative.
• A convex function $f$ has the property that $f((x+y)/2) \le (f(x)+f(y))/2$ whenever $x$, $y$, and $(x+y)/2$ are in the function's domain.
Roughly speaking, if the function's "slope" never goes down, then it's convex.
• A concave function $f$ has the property that $f((x+y)/2) \ge (f(x)+f(y))/2$ whenever $x$, $y$, and $(x+y)/2$ are in the function's domain.
Roughly speaking, if the function's "slope" never goes up, then it's concave.

This is an open-source document; for an updated version, see the source code or its rendering on GitHub. You can send comments on this document on the GitHub issues page. See "Open Questions on the Bernoulli Factory Problem" for a list of things about this document that I seek answers to.

My audience for this article is computer programmers with mathematics knowledge, but little or no familiarity with calculus. It should be read in conjunction with the article "Bernoulli Factory Algorithms".

I encourage readers to implement any of the algorithms given in this page, and report their implementation experiences. In particular, I seek comments on the following aspects:

• Are the algorithms in this article (in conjunction with "Bernoulli Factory Algorithms") easy to implement? Is each algorithm written so that someone could write code for that algorithm after reading the article?
• Does this article have errors that should be corrected?
• Are there ways to make this article more useful to the target audience?

Comments on other aspects of this document are welcome.

## General Factory Functions

As a reminder, the Bernoulli factory problem is: We're given a coin that shows heads with an unknown probability, λ, and the goal is to use that coin (and possibly also a fair coin) to build a "new" coin that shows heads with a probability that depends on λ, call it f(λ).

The algorithm for general factory functions, described in my main article on Bernoulli factory algorithms, works by building randomized upper and lower bounds for a function f(λ), based on flips of the input coin. Roughly speaking, the algorithm works as follows:

1. Generate a random variate, U, uniformly distributed, greater than 0 and less than 1.
2. Flip the input coin, then build an upper and lower bound for f(λ), based on the outcomes of the flips so far.
3. If U is less than or equal to the lower bound, return 1. If U is greater than the upper bound, return 0. Otherwise, go to step 2.

These randomized upper and lower bounds come from two sequences of polynomials as follows:

1. One sequence approaches the function f(λ) from above, the other from below, and both sequences must converge to f.
2. For each sequence, the first polynomial has degree 1 (so is a linear function), and each other polynomial's degree is 1 higher than the previous.
3. The consistency requirement must be met: The difference—

• between the degree-(n−1) upper polynomial and the degree-n upper polynomial, and
• between the degree-n lower polynomial and the degree-(n−1) lower polynomial,

must have nonnegative coefficients, once the polynomials are rewritten in Bernstein form and elevated to degree n.

The consistency requirement ensures that the upper polynomials "decrease" and the lower polynomials "increase". Unfortunately, the reverse is not true in general; even if the upper polynomials "decrease" and the lower polynomials "increase" to f, this does not ensure the consistency requirement by itself. Examples of this fact are shown in the section "Failures of the Consistency Requirement" in the appendix.
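To make the elevation step concrete, the following Python sketch (the helper names are mine, not from the literature) degree-elevates a list of Bernstein coefficients and checks the consistency requirement for a pair of lower polynomials, using exact rational arithmetic to avoid rounding artifacts:

```python
from fractions import Fraction

def elevate_once(b):
    # Degree-elevate Bernstein coefficients by one degree
    # (same polynomial, one more coefficient).
    m = len(b) - 1
    return ([b[0]] +
            [Fraction(k, m + 1) * b[k - 1] + (1 - Fraction(k, m + 1)) * b[k]
             for k in range(1, m + 1)] +
            [b[m]])

def elevate(b, degree):
    # Elevate repeatedly until the list represents the given degree.
    while len(b) - 1 < degree:
        b = elevate_once(b)
    return b

def lower_consistent(prev, cur):
    # Consistency for lower polynomials: each degree-n coefficient must be
    # at least the corresponding elevated lower-degree coefficient, so the
    # difference has nonnegative Bernstein coefficients.
    return all(c >= e for c, e in zip(cur, elevate(prev, len(cur) - 1)))
```

For upper polynomials, the inequality is reversed.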

In this document, fbelow(n, k) and fabove(n, k) mean the kth coefficient for the lower or upper degree-n polynomial in Bernstein form, respectively, where 0 ≤ k ≤ n is an integer.

### Building the Lower and Upper Polynomials

A factory function f(λ) is a function for which the Bernoulli factory problem can be solved (see "About Bernoulli Factories"). The following are ways to build sequences of polynomials that appropriately converge to a factory function if that function meets certain conditions. It would be helpful to plot that factory function using a computer algebra system to see if it belongs to any of the classes of functions described below.

Concave functions. If f is concave on the interval [0, 1], then fbelow(n, k) can equal f(k/n), thanks to Jensen's inequality. One example is f(λ) = 1 − λ^2.

Convex functions. If f is convex on the interval [0, 1], then fabove(n, k) can equal f(k/n), thanks to Jensen's inequality. One example is f(λ) = exp(−λ/4).

Hölder and Lipschitz continuous functions. I have found a way to extend the results of Nacu and Peres (2005)[^1] to certain functions with a slope that tends to a vertical slope. The following scheme, proved in the appendix, implements fabove and fbelow if f(λ)—

• is Hölder continuous on [0, 1], with Hölder constant m and Hölder exponent α (see "Definitions"), and
• in the interval [0, 1]—
• has a minimum of greater than 0 and a maximum of less than 1, or
• is convex and has a minimum of greater than 0, or
• is concave and has a maximum of less than 1.

Finding m and α is non-trivial in general. But assuming m and α are known, then for every integer n that's a power of 2:

• D(n) = m*(2/7)^(α/2)/((2^(α/2)−1)*n^(α/2)).
• fbelow(n, k) = f(k/n) if f is concave; otherwise, min(fbelow(4,0), fbelow(4,1), ..., fbelow(4,4)) if n < 4; otherwise, f(k/n) − D(n).
• fabove(n, k) = f(k/n) if f is convex; otherwise, max(fabove(4,0), fabove(4,1), ..., fabove(4,4)) if n < 4; otherwise, f(k/n) + D(n).

Note:

1. Some factory functions are not Hölder continuous for any Hölder exponent greater than 0. These functions have a slope that's steeper than every "nth" root, and can't be handled by this method. One example is f(λ) = 1/10 if λ is 0 and −1/(2*ln(λ/2)) + 1/10 otherwise, which has a slope near 0 that's steeper than every "nth" root.
2. If the factory function has a Hölder exponent of 1 (and so is Lipschitz continuous), D(n) can be m*322613/(250000*sqrt(n)), which is an upper bound.
3. If the factory function's Hölder exponent is 1/2 or greater, D(n) can be m*154563/(40000*n^(1/4)), which is an upper bound.
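As a sanity check on notes 2 and 3, the following Python snippet (the function name is mine) computes D(n) and can be compared against the simplified upper bounds given there:

```python
import math

def holder_D(n, m, alpha):
    # D(n) = m*(2/7)^(alpha/2)/((2^(alpha/2)-1)*n^(alpha/2)),
    # where m is the Hölder constant and alpha the Hölder exponent.
    return m * (2 / 7) ** (alpha / 2) / ((2 ** (alpha / 2) - 1) * n ** (alpha / 2))
```

For α = 1, holder_D(n, m, 1) is slightly less than m\*322613/(250000\*sqrt(n)); for α = 1/2, holder_D(n, m, 1/2) is slightly less than m\*154563/(40000\*n^(1/4)).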

Functions with a Lipschitz continuous derivative. The following method, proved in the appendix, implements fabove and fbelow if f(λ)—

• has a Lipschitz continuous derivative (see "Definitions"), and
• in the interval [0, 1]—
• has a minimum of greater than 0 and a maximum of less than 1, or
• is convex and has a minimum of greater than 0, or
• is concave and has a maximum of less than 1.

Let m be the Lipschitz constant of f's derivative, or a greater number than that constant (if f has a second derivative on the domain [0, 1], then m can be the maximum absolute value of that second derivative). Then for every integer n that's a power of 2:

• fbelow(n, k) = f(k/n) if f is concave; otherwise, min(fbelow(4,0), fbelow(4,1), ..., fbelow(4,4)) if n < 4; otherwise, f(k/n) − m/(7*n).
• fabove(n, k) = f(k/n) if f is convex; otherwise, max(fabove(4,0), fabove(4,1), ..., fabove(4,4)) if n < 4; otherwise, f(k/n) + m/(7*n).

My GitHub repository includes SymPy code for a method, c2params, to calculate the necessary values for m and the bounds of these polynomials, given a factory function.

Examples:

1. Take f(λ) = exp(−λ). This is a convex function, and its derivative is Lipschitz continuous with Lipschitz constant 1. Then it can be shown that the following scheme for f is valid (the value 3321/10000 is slightly less than M − 1/(7*4), where M is the minimum of f on its domain):

• fbelow(n, k) = 3321/10000 if n<4; otherwise, f(k/n) − 1/(7*n). (Observe that f(k/4) − 1/(7*4) ≥ 3321/10000.)
• fabove(n, k) = f(k/n) (because f is convex).
2. Take f(λ) = λ/2 if λ ≤ 1/2; (4*λ − 1)/(8*λ) otherwise. This function is concave, and its derivative is Lipschitz continuous with Lipschitz constant 2. Then it can be shown that the following scheme for f is valid (the value 893/2000 is slightly greater than M + 2/(7*4), where M is the maximum of f on its domain):

• fbelow(n, k) = f(k/n).
• fabove(n, k) = 893/2000 if n<4; otherwise, f(k/n) + 2/(7*n).

Certain functions that equal 0 at 0. This approach involves transforming the function f so that it no longer equals 0 at the point 0. This can be done by dividing f by a function (h(λ)) that "dominates" f at every point in the interval [0, 1]. Unlike for the original function, there might be a polynomial-building scheme described earlier in this section for the transformed function.

More specifically, h(λ) must meet the following requirements:

• h(λ) is continuous on the closed interval [0, 1].
• h(0) = 0. (This is required to ensure correctness in case λ is 0.)
• 1 ≥ h(1) ≥ f(1) ≥ 0.
• 1 > h(λ) > f(λ) > 0 for every λ in the open interval (0, 1).
• If f(1) = 0, then h(1) = 0. (This is required to ensure correctness in case λ is 1.)

Also, h should be a function with a simple Bernoulli factory algorithm. For example, h can be a polynomial in Bernstein form of degree n whose n plus one coefficients are [0, 1, 1, ..., 1]. This polynomial is easy to simulate using the algorithms from the section "Certain Polynomials".

The algorithm is now described.

Let $g(\lambda) = \lim_{\nu\to\lambda} f(\nu)/h(\nu)$ (roughly speaking, the value that f(ν)/h(ν) approaches as ν approaches λ). If—

• f(0) = 0 and f(1) < 1, and
• g(λ) is continuous on [0, 1] and belongs in one of the classes of functions given earlier,

then f can be simulated using the following algorithm:

1. Run a Bernoulli factory algorithm for h. If the call returns 0, return 0. (For example, if h(λ) = λ, then this step amounts to the following: "Flip the input coin. If it returns 0, return 0.")
2. Run a Bernoulli factory algorithm for g(.) and return the result of that algorithm. This can be one of the general factory function algorithms if there is a way to calculate polynomials that converge to g(.) in a manner needed for that algorithm (for example, if g is described earlier in this section).

Notes:

1. It may happen that g(0) = 0. In this case, step 2 of this algorithm can involve running this algorithm again, but with new g and h functions that are found based on the current g function. See the second example below.
2. If—

• f is strictly increasing,
• h(λ) = λ, and
• f′(λ), the first derivative of f, is continuous on [0, 1], maps (0, 1) to (0, 1), and belongs in one of the classes of functions given earlier,

then step 2 can be implemented by taking g as f′, except: (A) a uniform random variate, greater than 0 and less than 1, is generated at the start of the step; (B) instead of flipping the input coin as normal during that step, a different coin is flipped that does the following: "Flip the input coin, then sample from the number u. Return 1 if both the call and the flip return 1, and return 0 otherwise."
This is the "integral method" of Flajolet et al. (2010)[^2] (the modified step 2 simulates 1/λ times the integral of f′).

Examples:

1. If f(λ) = (sinh(λ)+cosh(λ)−1)/4, then f is less than or equal to h(λ) = λ, so g(λ) is 1/4 if λ = 0, and (exp(λ) − 1)/(4*λ) otherwise. The following Python code, which uses the SymPy computer algebra library, computes this example: `from sympy import *; x = symbols('x'); fx = (sinh(x)+cosh(x)-1)/4; h = x; pprint(Piecewise((limit(fx/h, x, 0), Eq(x, 0)), ((fx/h).simplify(), True)))`.
2. If f(λ) = cosh(λ) − 1, then f is less than or equal to h(λ) = λ, so g(λ) is 0 if λ = 0, and (cosh(λ)−1)/λ otherwise. Now, since g(0) = 0, find new functions g and h based on the current g. The current g is less than or equal to H(λ) = λ*3*(2−λ)/5 (a degree-2 polynomial that in Bernstein form has coefficients [0, 6/10, 6/10]), so G(λ) = 5/12 if λ = 0, and −(5*cosh(λ) − 5)/(3*λ^2*(λ−2)) otherwise. G is bounded away from 0 and 1, resulting in the following algorithm:

1. (Simulate h.) Flip the input coin. If it returns 0, return 0.
2. (Simulate H.) Flip the input coin twice. If both flips return 0, return 0. Otherwise, with probability 4/10 (that is, 1 minus 6/10), return 0.
3. Run a Bernoulli factory algorithm for G (which might involve building polynomials that converge to G, noticing that G's derivative is Lipschitz continuous) and return the result of that algorithm.
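The three steps above can be sketched in Python. Here, `G_factory` is a hypothetical placeholder for a Bernoulli factory for G (for example, one built from polynomials converging to G), and is not part of the original description:

```python
import random

def cosh_minus_one_factory(coin, G_factory):
    # Sketch of the algorithm above for f(lambda) = cosh(lambda) - 1,
    # writing f = h*H*G with h = lambda and H = 3*lambda*(2-lambda)/5.
    # Step 1: simulate h(lambda) = lambda.
    if coin() == 0:
        return 0
    # Step 2: simulate H (degree-2 Bernstein coefficients [0, 6/10, 6/10]).
    a, b = coin(), coin()
    if a == 0 and b == 0:
        return 0
    if random.random() < 4 / 10:  # 1 minus 6/10
        return 0
    # Step 3: simulate G with the supplied factory.
    return G_factory(coin)
```

With a stub for `G_factory` that always returns 1, the overall probability of 1 is λ\*H(λ), which helps check steps 1 and 2 in isolation.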

Certain functions that equal 0 at 0 and 1 at 1. Let f, g, and h be functions as defined earlier, except that f(0) = 0 and f(1) = 1. Define the following additional functions:

• ω(λ) is a function that meets the following requirements:
• ω(λ) is continuous on the closed interval [0, 1].
• ω(0) = 0 and ω(1) = 1.
• 1 > f(λ) > ω(λ) > 0 for every λ in the open interval (0, 1).
• q(λ) = $\lim_{\nu\to\lambda} \omega(\nu)/h(\nu)$.
• r(λ) = $\lim_{\nu\to\lambda} (1-g(\nu))/(1-q(\nu))$.

Roughly speaking, ω is a function that bounds f from below, just as h bounds f from above. ω should be a function with a simple Bernoulli factory algorithm, such as a polynomial in Bernstein form. If both ω and h are polynomials of the same degree, q will be a rational function with a relatively simple Bernoulli factory algorithm (see "Certain Rational Functions").

Now, if r(λ) is continuous on [0, 1], then f can be simulated using the following algorithm:

1. Run a Bernoulli factory algorithm for h. If the call returns 0, return 0. (For example, if h(λ) = λ, then this step amounts to the following: "Flip the input coin. If it returns 0, return 0.")
2. Run a Bernoulli factory algorithm for q(.). If the call returns 1, return 1.
3. Run a Bernoulli factory algorithm for r(.), and return 1 minus the result of that call. The Bernoulli factory algorithm can be one of the general factory function algorithms if there is a way to calculate polynomials that converge to r(.) in a manner needed for that algorithm (for example, if r is described earlier in this section).

Note: Quick proof: Rewrite $f=h\cdot(q\cdot1+(1-q)\cdot(1-r))+(1-h)\cdot0$.

Example: If f(λ) = (1−exp(λ))/(1−exp(1)), then f is less than or equal to h(λ) = λ, and greater than or equal to ω(λ) = λ^2. As a result, q(λ) = λ, and r(λ) = (2 − exp(1))/(1 − exp(1)) if λ = 0; 1/(exp(1)−1) if λ = 1; and (−λ*(1 − exp(1)) − exp(λ) + 1)/(λ*(1 − exp(1))*(λ − 1)) otherwise. This can be computed using the following Python code, which uses the SymPy computer algebra library: `from sympy import *; x = symbols('x'); fx=(1-exp(x))/(1-exp(1)); h=x; omega=x**2; q=(omega/h); r=(1-fx/h)/(1-q); r=Piecewise((limit(r, x, 0), Eq(x,0)), (limit(r,x,1),Eq(x,1)), (r,True)).simplify(); pprint(r)`.

Other functions that equal 0 or 1 at the endpoints 0 and/or 1. If f does not fully admit a polynomial-building scheme under the convex, concave, Lipschitz derivative, and Hölder classes:

| If f(0) = | And f(1) = | Method |
| --- | --- | --- |
| > 0 and < 1 | 1 | Use the algorithm for certain functions that equal 0 at 0, but with f(λ) = 1 − f(1−λ), the inverted coin, and the inverted result. |
| > 0 and < 1 | 0 | Use the algorithm for certain functions that equal 0 at 0, but with f(λ) = f(1−λ) and the inverted coin. (For example, cosh(λ)−1 becomes cosh(1−λ)−1.) |
| 1 | 0 | Use the algorithm for certain functions that equal 0 at 0 and 1 at 1, but with f(λ) = 1−f(λ) and the inverted result. |
| 1 | > 0 and ≤ 1 | Use the algorithm for certain functions that equal 0 at 0, but with f(λ) = 1−f(λ) and the inverted result. |

In this table:

• Inverted coin: Instead of the usual input coin, use a coin that does the following: "Flip the input coin and return 1 minus the result."
• Inverted result: If the overall algorithm would return 0, it returns 1 instead, and vice versa.

Specific functions. My GitHub repository includes SymPy code for a method, approxscheme2, to build a polynomial-building scheme for certain factory functions.

## Approximate Bernoulli Factories

An approximate Bernoulli factory for a function f(λ) is a Bernoulli factory algorithm that simulates another function, g(λ), that approximates f in some sense.

Usually g is a polynomial, but can also be a rational function (ratio of polynomials) or another function with an easy-to-implement Bernoulli factory algorithm.

Meanwhile, f(λ) can be any function that maps the closed interval [0, 1] to itself, even if it isn't continuous or a factory function (examples include the "step function" 0 if λ < 1/2 and 1 otherwise, or the function 2*min(λ, 1 − λ)). Continuous functions on [0, 1] can be approximated arbitrarily well by an approximate Bernoulli factory (as a result of the so-called "Weierstrass approximation theorem"), but this is not the case in general for discontinuous functions.

To build an approximate Bernoulli factory with a polynomial:

1. First, find a polynomial in Bernstein form of degree n that is close to the desired function f(λ).

The simplest choice for this polynomial, known simply as a Bernstein polynomial, has n+1 coefficients and its jth coefficient (starting at 0) is found as f(j/n). For this choice, if f is continuous, the polynomial can be brought arbitrarily close to f by choosing n high enough.

Whatever polynomial is used, the polynomial's coefficients must all lie in [0, 1].

2. Then, use one of the algorithms in the section "Certain Polynomials" to toss heads with probability equal to that polynomial, given its coefficients.

Note: Bias and variance are the two sources of error in a randomized estimation algorithm. Let g(λ) be an approximation of f(λ). The original Bernoulli factory for f, if it exists, has bias 0 and variance f(λ)*(1−f(λ)), but the approximate Bernoulli factory has bias g(λ) − f(λ) and variance g(λ)*(1−g(λ)). ("Variance reduction" methods are outside the scope of this document.) An estimation algorithm's mean squared error equals variance plus square of bias.

### Approximate Bernoulli Factories for Certain Functions

This section first discusses approximating $f$ with a Bernstein polynomial (a degree-$n$ polynomial in Bernstein form with coefficients $f(k/n)$ with $0\le k\le n$). The advantage is only one Bernstein coefficient has to be found per run; the disadvantage is that Bernstein polynomials approach $f$ slowly in general, in the order of $1/n$ (Voronovskaya 1932)[^3].

There are results that give an upper bound on the error of approximating f with a degree-n Bernstein polynomial. To find a degree n such that f is approximated with a maximum error of ε, solve the error bound's equation for n, then round the solution up to the nearest integer.

For example:

| If f(λ): | Then the degree-n Bernstein polynomial is close to $f$ with the following error bound: | Where n is: | Notes |
| --- | --- | --- | --- |
| Has a Lipschitz continuous derivative (see "Definitions"). | ε = M/(8*n). | n = ceil(M/(8*ε)). | Lorentz (1966)[^4]. M is the derivative's Lipschitz constant. |
| Is Hölder continuous with constant M and exponent α. | ε = M*(1/(4*n))^(α/2). | n = ceil(1/(4^α*ε^2/M^2)^(1/α)). | Mathé (1999)[^5]. 0 < α ≤ 1. |
| Is Lipschitz continuous with constant L. | ε = L*sqrt(1/(4*n)). | n = ceil(L^2/(4*ε^2)). | Special case of previous entry. |

Now, if f belongs to any of the classes given above, the following algorithm (adapted from "Certain Polynomials") simulates a polynomial that approximates f with a maximum error of ε:

1. Calculate n as described in the table above for the given class.
2. Flip the input coin n times, and let j be the number of times the coin returned 1 this way.
3. With probability f(j/n), return 1. Otherwise, return 0. (If f(j/n) can be an irrational number, see "Algorithms for General Irrational Constants" for ways to sample this irrational probability exactly.)
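As a sketch, steps 1 to 3 can be written in Python for a Lipschitz continuous f, using the table's formula n = ceil(L^2/(4\*ε^2)); here the probability f(j/n) is sampled with ordinary floating point rather than exactly:

```python
import math
import random

def approx_factory(coin, f, lipschitz, eps):
    # Approximate Bernoulli factory via a Bernstein polynomial.
    # Step 1: degree for a Lipschitz continuous f with constant L.
    n = math.ceil(lipschitz ** 2 / (4 * eps ** 2))
    # Step 2: flip the input coin n times and count heads.
    j = sum(coin() for _ in range(n))
    # Step 3: return 1 with probability f(j/n).
    return 1 if random.random() < f(j / n) else 0
```

For example, with f(λ) = 2\*min(λ, 1−λ) (Lipschitz constant 2) and ε = 0.1, the degree is n = 100.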

Alternatively, polynomials other than Bernstein polynomials, but written in Bernstein form, can be used to approximate $f$ with an error no more than $\epsilon$, as long as an explicit upper bound on the approximation error is available. A ratio of two such polynomials can also approximate $f$ this way. See my question on MathOverflow.

An example is given by the iterated Bernstein polynomial construction discussed in Micchelli (1973)[^6] and Guan (2009)[^7]. Let $B_n(f(\lambda))$ be the ordinary Bernstein polynomial for $f(\lambda)$. Then—

• the order-2 iterated Bernstein polynomial of degree $n$ is $U_{n,2} = B_n(W_{n,2})$, where $W_{n,2} = 2 f(\lambda) - B_n(f(\lambda))$, and
• the order-3 iterated Bernstein polynomial of degree $n$ is $U_{n,3} = B_n(W_{n,3})$, where $W_{n,3} = B_n(B_n(f(\lambda))) + 3 (f(\lambda) - B_n(f(\lambda)))$

(Güntürk and Li 2021, sec. 3.3)[^8]. The goal is now to find a degree $n$ such that—

1. the iterated polynomial is within $\epsilon$ of $f(\lambda)$, and
2. the polynomial $W_{n,i}$ is not less than 0 or greater than 1.

By analyzing the proof of Theorem 3.3 of the paper just cited, the following error bounds appear to be true. In the table below, $M_n$ is not less than the so-called $C^n$ norm. Unfortunately, the $C^n$ norm is defined differently in different academic works, and the bounds are sensitive to how that norm is defined.[^9]

| If f(λ): | Then the following polynomial: | Is close to f with the following error bound: | Where n is: |
| --- | --- | --- | --- |
| Has continuous third derivative. | $U_{n,2}$ | $\epsilon = 0.3489 M_3/n^{3/2}$. | $n=\lceil (0.3489)^{2/3} (M_3/\epsilon)^{2/3}\rceil \lt \lceil (49561/100000) (M_3/\epsilon)^{2/3}\rceil$. |
| Has continuous fourth derivative. | $U_{n,2}$ | $\epsilon = 0.275 M_4/n^{2}$. | $n=\lceil \sqrt{0.275} \sqrt{M_4/\epsilon}\rceil \lt \lceil (52441/100000) \sqrt{M_4/\epsilon}\rceil$. |
| Has continuous fifth derivative. | $U_{n,3}$ | $\epsilon = 0.7284 M_5/n^{5/2}$. | $n=\lceil (0.7284)^{2/5} (M_5/\epsilon)^{2/5}\rceil \lt \lceil (88095/100000) (M_5/\epsilon)^{2/5}\rceil$. |
| Has continuous sixth derivative. | $U_{n,3}$ | $\epsilon = 0.9961 M_6/n^{3}$. | $n=\lceil (0.9961)^{1/3} (M_6/\epsilon)^{1/3}\rceil \lt \lceil (99870/100000) (M_6/\epsilon)^{1/3}\rceil$. |

However, unlike with ordinary Bernstein polynomials, the polynomial $W$ (and thus $U$) is not necessarily bounded by 0 and 1. The following process can be used to calculate the required degree $n$, given an error tolerance of $\epsilon$.

1. Determine whether $f$ is described in the table above. Let A be the minimum of $f$ on [0, 1] and let B be the maximum of $f$ there.
2. If 0 < AB < 1, calculate $n$ as given in the table above, but with $\epsilon=\min(\epsilon, A, 1-B)$, and stop.
3. Propositions B1, B2, and B3 in the appendix give conditions on $f$ so that $W_{n,2}$ or $W_{n,3}$ (as the case may be) will be nonnegative. If B is less than 1 and any of those conditions is met, calculate $n$ as given in the table above, but with $\epsilon=\min(\epsilon, 1-B)$. (For B3, set $n$ to max($n$, $m$), where $m$ is given in that proposition.) Then stop; $W$ will now be bounded by 0 and 1.
4. Calculate $n$ as given in the table above. Then, if $W_{n,i}(j/n)\lt 0$ or $W_{n,i}(j/n)\gt 1$ for some $0\le j\le n$, double the value of $n$ until this condition is no longer true.

Once n is found, simulating the iterated polynomial is as follows:

1. Flip the input coin n times, and let j be the number of times the coin returned 1 this way.
2. With probability $W_{n,2}(j/n)$ or $W_{n,3}(j/n)$ (as the case may be), return 1. Otherwise, return 0.
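The two steps above can be sketched in Python, with $W_{n,2}(j/n)$ computed on the fly from the ordinary Bernstein polynomial (this assumes the chosen n already keeps $W_{n,2}$ within [0, 1]):

```python
import math
import random

def bernstein(f, n, x):
    # Evaluate the ordinary Bernstein polynomial B_n(f) at x.
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def simulate_U_n2(coin, f, n):
    # Simulate U_{n,2} = B_n(W_{n,2}), where W_{n,2}(x) = 2*f(x) - B_n(f)(x).
    j = sum(coin() for _ in range(n))       # step 1: count heads
    w = 2 * f(j / n) - bernstein(f, n, j / n)
    return 1 if random.random() < w else 0  # step 2
```

For a linear f, $B_n(f) = f$, so $W_{n,2} = f$ and the simulation reproduces f exactly; that gives a quick check of the code.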

Notes:

1. Providing the full proof for the error bounds shown in the table is a bit tedious, so here is a sketch. The proof was found by analyzing Theorem 3.3 of Güntürk and Li (2021)[^8], finding upper bounds for so-called "central moments" of the binomial distribution (see B4 to B7 in the appendix), then plugging them in to various estimates mentioned in that theorem's proof. The most significant estimate in that theorem is denoted $(B_n-I)^{\lceil (r+1)/2 \rceil}(f)$, which in this case is the error when approximating $f$ using an iterated Bernstein polynomial, when $f$ has a continuous $(r+1)$-th derivative.
2. A polynomial's Bernstein coefficients can be rounded to multiples of $\delta$ (where $0 \lt\delta\le 1$) by setting $c$=floor($c/\delta$) * $\delta$ for each coefficient $c$. The new polynomial will differ from the old one by at most $\delta$. (Thus, to find a polynomial with multiple-of-$\delta$ coefficients that approximates $f$ with error $\epsilon$ [which must be greater than $\delta$], first find a polynomial with error $\epsilon - \delta$, then round that polynomial's coefficients as given here.)
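Note 2 amounts to a couple of lines of Python:

```python
import math

def round_coeffs(coeffs, delta):
    # Round each Bernstein coefficient down to a multiple of delta.
    # The new polynomial differs from the old one by at most delta.
    return [math.floor(c / delta) * delta for c in coeffs]
```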

### Approximate Bernoulli Factories for Power Series

Some functions can be rewritten as a power series, namely: $$f(\lambda) = a_0 \lambda^0 + a_1 \lambda^1 + ... + a_i \lambda^i + ...,$$ where $a_i$, the coefficients, are constant rational numbers[^10].

To simulate an approximation of $f$ that comes within $\epsilon$ of $f$:

1. Find the first $n$+1 coefficients such that the polynomial $P(\lambda) = a_0 \lambda^0 + ... + a_n\lambda^n$ is within $\epsilon$ of $f$ on the interval [0, 1].

If $f$'s coefficients are each greater than 0, form a nonincreasing sequence (example: (1/4, 1/8, 1/8, 1/16, ...)), and meet the so-called "ratio test", the algorithms in Carvalho and Moreira (2022)[^11] can be used here (see also "Proofs on Cutting Off a Power Series" in the appendix).

Alternatively, if bounds on the derivatives of $f$ are known, then thanks to Taylor's theorem, $P(\lambda)$ will be close enough if $M/((n+1)!) \le \epsilon$, where $M$ is equal to or greater than the maximum absolute value of $f$'s ($n$+1)-th derivative on [0, 1].

2. Rewrite $P(\lambda)$ as a polynomial in Bernstein form. (One way to transform a polynomial to Bernstein form, given the "power" coefficients $a_0, ..., a_n$, is the so-called "matrix method" from Ray and Nataraj (2012)[^12].) Let $b_0, ..., b_n$ be the Bernstein-form polynomial's coefficients.
3. Flip the input coin n times, then let j be the number of times the coin returned 1 this way, then return either 1 with probability $b_j$, or 0 otherwise.
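For step 2, one way to carry out the conversion (equivalent in effect to the matrix method) uses the identity $b_j = \sum_{i\le j} ({j\choose i}/{n\choose i}) a_i$; a Python sketch with exact rationals:

```python
from fractions import Fraction
from math import comb

def power_to_bernstein(a):
    # Convert "power" coefficients a[0..n] of p(x) = sum a[i]*x^i into
    # the Bernstein-form coefficients of the same degree n, using
    # b[j] = sum over i<=j of (C(j,i)/C(n,i)) * a[i].
    n = len(a) - 1
    return [sum(Fraction(comb(j, i), comb(n, i)) * a[i]
                for i in range(j + 1))
            for j in range(n + 1)]
```

For example, the polynomial x (power coefficients [0, 1, 0] at degree 2) has Bernstein coefficients [0, 1/2, 1].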

In fact, if $f(\lambda)$ belongs in Gevrey's hierarchy (there are $B\ge 1, l\ge 1, \gamma\ge 1$ such that its $n$-th derivative's absolute value is not greater than $Bl^n n^{\gamma n}$ for every $n$), which includes functions equaling power series as a special case ($\gamma=1$), it's possible to bound the derivatives and find the appropriate degree for the approximating polynomial (for details, see (Kawamura et al. 2015)[^13]; see also (Gevrey 1918)[^14]).

### Approximate Bernoulli Factories for Linear Functions

There are a number of approximate methods to simulate λ*c, where c > 1 and λ lies in [0, 1/c). ("Approximate" because this function touches 1 at 1/c, so it can't be a factory function.) Since the methods use only up to n flips, where n is an integer greater than 0, the approximation will be a polynomial of degree n.

• Henderson and Glynn (2003, Remark 4)[^15] approximates the function λ*2 using a polynomial where the jth coefficient (starting at 0) is min((j/n)*2, 1−1/n). If g(λ) is that polynomial, then the error in approximating f is no greater than 1−g(1/2). g can be computed with the SymPy computer algebra library as follows (shown here with n = 10 as an example value): `from sympy import *; from sympy.stats import *; z = symbols('z'); n = 10; g = 2*E(Min(sum(Bernoulli(("B%d" % (i)), z) for i in range(n))/n, (S(1)-S(1)/n)/2))`.

• I found the following approximation for λ*c[^16]: "(1.) Set j to 0 and i to 0; (2.) If i ≥ n, return 0; (3.) Flip the input coin, and if it returns 1, add 1 to j; (4.) (Estimate the probability and return 1 if it 'went over'.) If (j/(i+1)) ≥ 1/c, return 1; (5.) Add 1 to i and go to step 2." Here, λ*c is approximated by a polynomial where the jth coefficient (starting at 0) is min((j/n)*c, 1). If g(λ) is that polynomial, then the error in approximating f is no greater than 1−g(1/c).

• The previous approximation generalizes the one given in section 6 of Nacu and Peres (2005)[^1], which approximates λ*2.
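The second approximation above, written as a Python sketch:

```python
def approx_linear_factory(coin, c, n):
    # Approximates lambda*c (c > 1) using up to n input coin flips:
    # returns 1 as soon as the running heads estimate j/(i+1)
    # reaches 1/c, and 0 if that never happens within n flips.
    j = 0
    for i in range(n):
        if coin() == 1:
            j += 1
        if j / (i + 1) >= 1 / c:
            return 1
    return 0
```

Two degenerate checks: a coin that always shows heads makes the algorithm return 1 on the first flip, and a coin that always shows tails makes it return 0 after n flips.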

## Achievable Simulation Rates

In general, the number of input coin flips needed by any Bernoulli factory algorithm for a factory function f(λ) depends on how "smooth" the function f is.

The following table summarizes the rate of simulation (in terms of the number of input coin flips needed) that can be achieved in theory depending on f(λ), assuming the input coin's probability of heads is unknown. In the table below:

• λ, the unknown probability of heads, lies in the interval [ε, 1−ε] for some ε > 0.
• The simulation makes use of unbiased random bits in addition to input coin flips.
• Δ(n, r, λ) = O(max(sqrt(λ*(1−λ)/n), 1/n)^r), that is, O((1/n)^r) near λ = 0 or 1, and O((1/n)^(r/2)) elsewhere. (O(h(n)) roughly means "less than or equal to h(n) times a constant, for every n large enough".)
| Property of simulation | Property of f |
| --- | --- |
| Requires no more than n input coin flips. | If and only if f can be written as a polynomial in Bernstein form of degree n with coefficients in [0, 1] (Goyal and Sigman 2012)[^17]. |
| Requires a finite number of flips on average. Also known as "realizable" by Flajolet et al. (2010)[^2]. | Only if f is Lipschitz continuous (Nacu and Peres 2005)[^1].<br>Whenever f admits a fast simulation (Mendo 2019)[^18]. |
| Number of flips required, raised to power of r, is bounded by a finite number on average and has a tail that drops off uniformly over f's domain. | Only if f has continuous r-th derivative (Nacu and Peres 2005)[^1]. |
| Requires more than n flips with probability Δ(n, r + 1, λ), for integer r ≥ 0 and every λ. (The greater r is, the faster the simulation.) | Only if f has an r-th derivative that is continuous and in the Zygmund class (see note 3) (Holtz et al. 2011)[^19]. |
| Requires more than n flips with probability Δ(n, α, λ), for noninteger α > 0 and every λ. (The greater α is, the faster the simulation.) | If and only if f has an r-th derivative that is Hölder continuous with exponent (α − r), where r = floor(α) (Holtz et al. 2011)[^19]. Assumes f is bounded away from 0 and 1. |
| "Fast simulation" (requires more than n flips with a probability that decays exponentially as n gets large). Also known as "strongly realizable" by Flajolet et al. (2010)[^2]. | If and only if f is real analytic (writable as $f(\lambda)=a_0 \lambda^0 + a_1 \lambda^1 + ...$ for real constants $a_i$) (Nacu and Peres 2005)[^1]. |
| Average number of flips greater than or equal to (f′(λ))^2*λ*(1−λ)/(f(λ)*(1−f(λ))), where f′ is the first derivative of f. | Whenever f admits a fast simulation (Mendo 2019)[^18]. |

Notes:

1. By the results of Holtz et al., it is suspected that the target function f can't be simulated using a finite number of flips on average for every probability of heads unless f's fourth derivative is Hölder continuous.
2. If a function is constant on some non-empty open interval in its domain, but is not constant on the whole domain, then it can't be real analytic.
3. A function in the Zygmund class, roughly speaking, has no vertical slope. The Zygmund class includes the smaller class of Lipschitz continuous functions.

## Complexity

The following note shows the complexity of the algorithm for 1/φ in the main article, where φ is the golden ratio.

Let E[N] be the expected ("long-run average") number of unbiased random bits (fair coin flips) generated by the algorithm.

Then, since each bit is independent, E[N] = 2*φ as shown below.

• Each iteration stops the algorithm with probability p = (1/2) + (1−(1/2)) * (1/φ) (1/2 for the initial bit and 1/φ for the recursive run; (1−(1/2)) because we're subtracting the (1/2) earlier on the right-hand side from 1).
• Thus, the expected number of iterations is E[T] = 1/p by a well-known rejection sampling argument, since the algorithm doesn't depend on iteration counts.
• Each iteration uses 1 * (1/2) + (1 + E[N]) * (1/2) bits on average, so the whole algorithm uses E[N] = (1 * (1/2) + (1 + E[N]) * (1/2)) * E[T] bits on average (each iteration consumes either 1 bit with probability 1/2, or (1 + E[N]) bits with probability 1/2). This equation has the solution E[N] = 1 + sqrt(5) = 2*φ.

Also, on average, half of these flips (φ) show 1 and half show 0, since the bits are unbiased (the coin is fair).
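The analysis can be checked empirically. The sketch below simulates an algorithm with the structure the derivation implies (the initial fair-coin flip stops an iteration with probability 1/2; otherwise a recursive run stops it with probability 1/φ); see the main article for the exact algorithm. The counter and seed are illustration choices, not part of the original algorithm:

```python
import random

rng = random.Random(12345)
flip_count = 0

def flip():
    # One unbiased (fair) coin flip, counted globally.
    global flip_count
    flip_count += 1
    return rng.randrange(2)

def one_over_phi():
    # Returns 1 with probability 1/phi = (sqrt(5) - 1)/2 ~= 0.618.
    while True:
        if flip():           # stop with probability 1/2
            return 1
        if one_over_phi():   # otherwise a recursive run, which
            return 0         # returns 1 with probability 1/phi

trials = 20000
ones = sum(one_over_phi() for _ in range(trials))
avg_flips = flip_count / trials  # expected to be near 2*phi ~= 3.236
```

With this seed the empirical frequency of 1s lands near 1/φ ≈ 0.618 and the average number of flips per run lands near 2φ ≈ 3.236, matching the derivation above.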

A similar analysis to the one above can be used to find the expected ("long-run average") time complexity of many Bernoulli factory algorithms.

## Examples of Bernoulli Factory Polynomial-Building Schemes

The following are polynomial-building schemes and hints to simulate a coin of probability f(λ) given an input coin with probability of heads of λ. The schemes were generated automatically using approxscheme2 and have not been rigorously verified for correctness.

• Let f(λ) = cosh(λ) − 3/4. Then, for every integer n that's a power of 2, starting from 1:

• The function was detected to be convex and twice differentiable, leading to:

• fbelow(n, k) = 487/2500 if n<4; otherwise, f(k/n) − 154309/(700000*n).
• fabove(n, k) = f(k/n).
• Generated using tighter bounds than necessarily proven:

• fbelow(n, k) = 1043/5000 if n<4; otherwise, f(k/n) − 462927/(2800000*n).
• fabove(n, k) = f(k/n).
• Let f(λ) = 3/4 − sqrt(−λ*(λ − 1)). Then, for every integer n that's a power of 2, starting from 1:

• Detected to be convex and (1/2)-Hölder continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 1545784563/(400000000*n^(1/4)).
• fabove(n, k) = f(k/n).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n) − 26278337571/(25600000000*n^(1/4)).
• fabove(n, k) = f(k/n).
• Let f(λ) = 3*sin(sqrt(3)*sqrt(sin(2*λ)))/4 + 1/50. Then, for every integer n that's a power of 2, starting from 1:
• Detected to be (1/2)-Hölder continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 709907859/(100000000*n^(1/4)).
• fabove(n, k) = f(k/n) + 709907859/(100000000*n^(1/4)).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n) − 6389170731/(3200000000*n^(1/4)).
• fabove(n, k) = f(k/n) + 6389170731/(3200000000*n^(1/4)).
• Let f(λ) = λ*sin(7*π*λ)/4 + 1/2. Then, for every integer n that's a power of 2, starting from 1:

• Detected to be twice differentiable using numerical methods, which may be inaccurate:

• fbelow(n, k) = 523/10000 if n<64; otherwise, f(k/n) − 11346621/(700000*n).
• fabove(n, k) = 1229/1250 if n<64; otherwise, f(k/n) + 11346621/(700000*n).
• Generated using tighter bounds than necessarily proven:

• fbelow(n, k) = 681/10000 if n<32; otherwise, f(k/n) − 34039863/(4480000*n).
• fabove(n, k) = 4837/5000 if n<32; otherwise, f(k/n) + 34039863/(4480000*n).
• Let f(λ) = sin(4*π*λ)/4 + 1/2. Then, for every integer n that's a power of 2, starting from 1:

• The function was detected to be twice differentiable, leading to:

• fbelow(n, k) = 737/10000 if n<32; otherwise, f(k/n) − 1973921/(350000*n).
• fabove(n, k) = 9263/10000 if n<32; otherwise, f(k/n) + 1973921/(350000*n).
• Generated using tighter bounds than necessarily proven:

• fbelow(n, k) = 1123/10000 if n<32; otherwise, f(k/n) − 1973921/(448000*n).
• fabove(n, k) = 8877/10000 if n<32; otherwise, f(k/n) + 1973921/(448000*n).
• Let f(λ) = sin(6*π*λ)/4 + 1/2. Then, for every integer n that's a power of 2, starting from 1:

• The function was detected to be twice differentiable, leading to:

• fbelow(n, k) = 517/10000 if n<64; otherwise, f(k/n) − 2220661/(175000*n).
• fabove(n, k) = 9483/10000 if n<64; otherwise, f(k/n) + 2220661/(175000*n).
• Generated using tighter bounds than necessarily proven:

• fbelow(n, k) = 1043/10000 if n<64; otherwise, f(k/n) − 104371067/(11200000*n).
• fabove(n, k) = 8957/10000 if n<64; otherwise, f(k/n) + 104371067/(11200000*n).
• Let f(λ) = λ^2/2 + 1/10 if λ ≤ 1/2; λ/2 − 1/40 otherwise. Then, for every integer n that's a power of 2, starting from 1:

• Detected to be convex and twice differentiable using numerical methods, which may be inaccurate:
• fbelow(n, k) = 321/5000 if n<4; otherwise, f(k/n) − 1/(7*n).
• fabove(n, k) = f(k/n).
• Let f(λ) = 1/2 − sqrt(1 − 2*λ)/2 if λ < 1/2; sqrt(2*λ − 1)/2 + 1/2 otherwise. Then, for every integer n that's a power of 2, starting from 1:

• Detected to be (1/2)-Hölder continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 1545784563/(400000000*n^(1/4)).
• fabove(n, k) = f(k/n) + 1545784563/(400000000*n^(1/4)).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n) − 10820491941/(12800000000*n^(1/4)).
• fabove(n, k) = f(k/n) + 10820491941/(12800000000*n^(1/4)).
• Let f(λ) = 1/2 − sqrt(1 − 2*λ)/4 if λ < 1/2; sqrt(2*λ − 1)/4 + 1/2 otherwise. Then, for every integer n that's a power of 2, starting from 1:

• Detected to be (1/2)-Hölder continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 772969563/(400000000*n^(1/4)).
• fabove(n, k) = f(k/n) + 772969563/(400000000*n^(1/4)).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = 193/5000 if n<16; otherwise, f(k/n) − 5410786941/(12800000000*n^(1/4)).
• fabove(n, k) = 4807/5000 if n<16; otherwise, f(k/n) + 5410786941/(12800000000*n^(1/4)).
• Let f(λ) = λ/2 + (1 − 2*λ)^(3/2)/12 − 1/12 if λ < 1/2; λ/2 + (2*λ − 1)^(3/2)/12 − 1/12 otherwise. Then, for every integer n that's a power of 2, starting from 1:

• Detected to be convex and Lipschitz continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n) − 322613/(500000*sqrt(n)).
• fabove(n, k) = f(k/n).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n) − 3548743/(32000000*sqrt(n)).
• fabove(n, k) = f(k/n).
• Let f(λ) = 1/2 − sqrt(1 − 2*λ)/8 if λ < 1/2; sqrt(2*λ − 1)/8 + 1/2 otherwise. Then, for every integer n that's a power of 2, starting from 1:

• Detected to be (1/2)-Hölder continuous using numerical methods, which may be inaccurate:

• fbelow(n, k) = 333/10000 if n<64; otherwise, f(k/n) − 386562063/(400000000*n^(1/4)).
• fabove(n, k) = 9667/10000 if n<64; otherwise, f(k/n) + 386562063/(400000000*n^(1/4)).
• Generated using tighter bounds than necessarily proven:

• fbelow(n, k) = 451/2000 if n<4; otherwise, f(k/n) − 2705934441/(12800000000*n^(1/4)).
• fabove(n, k) = 1549/2000 if n<4; otherwise, f(k/n) + 2705934441/(12800000000*n^(1/4)).
• Let f(λ) = λ/4 + (1 − 2*λ)^(3/2)/24 + 5/24 if λ < 1/2; λ/4 + (2*λ − 1)^(3/2)/24 + 5/24 otherwise. Then, for every integer n that's a power of 2, starting from 1:

• Detected to be convex and Lipschitz continuous using numerical methods, which may be inaccurate:

• fbelow(n, k) = 443/5000 if n<4; otherwise, f(k/n) − 322613/(1000000*sqrt(n)).
• fabove(n, k) = f(k/n).
• Generated using tighter bounds than necessarily proven:

• fbelow(n, k) = 1111/5000 if n<4; otherwise, f(k/n) − 3548743/(64000000*sqrt(n)).
• fabove(n, k) = f(k/n).
• Let f(λ) = 3*λ/2 if λ ≤ 1 − λ; 3/2 − 3*λ/2 otherwise. Then, for every integer n that's a power of 2, starting from 1:

• Detected to be concave and Lipschitz continuous using numerical methods, which may be inaccurate:

• fbelow(n, k) = f(k/n).
• fabove(n, k) = 124/125 if n<64; otherwise, f(k/n) + 967839/(500000*sqrt(n)).
• Generated using tighter bounds than necessarily proven:

• fbelow(n, k) = f(k/n).
• fabove(n, k) = 1863/2000 if n<64; otherwise, f(k/n) + 2903517/(2000000*sqrt(n)).
• Let f(λ) = 9*λ/5 if λ ≤ 1 − λ; 9/5 − 9*λ/5 otherwise. Then, for every integer n that's a power of 2, starting from 1:

• Detected to be concave and Lipschitz continuous using numerical methods, which may be inaccurate:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = f(k/n) + 2903517/(1250000*sqrt(n)).
• Generated using tighter bounds than necessarily proven:
• fbelow(n, k) = f(k/n).
• fabove(n, k) = f(k/n) + 8710551/(5000000*sqrt(n)).
• Let f(λ) = 19*λ/20 if λ ≤ 1 − λ; 19/20 − 19*λ/20 otherwise. Then, for every integer n that's a power of 2, starting from 1:

• Detected to be concave and Lipschitz continuous using numerical methods, which may be inaccurate:

• fbelow(n, k) = f(k/n).
• fabove(n, k) = 1817/2000 if n<8; otherwise, f(k/n) + 6129647/(5000000*sqrt(n)).
• Generated using tighter bounds than necessarily proven:

• fbelow(n, k) = f(k/n).
• fabove(n, k) = 2337/2500 if n<4; otherwise, f(k/n) + 18388941/(20000000*sqrt(n)).
• Let f(λ) = min(1/8, 3*λ). Then, for every integer n that's a power of 2, starting from 1:

• Detected to be concave and Lipschitz continuous using numerical methods, which may be inaccurate:

• fbelow(n, k) = f(k/n).
• fabove(n, k) = 4047/5000 if n<32; otherwise, f(k/n) + 967839/(250000*sqrt(n)).
• Generated using tighter bounds than necessarily proven:

• fbelow(n, k) = f(k/n).
• fabove(n, k) = 171/400 if n<4; otherwise, f(k/n) + 967839/(1600000*sqrt(n)).
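As a light numerical sanity check (not a verification of any scheme's correctness, which would require checking the consistency requirements on the full polynomial sequences), the bracketing property fbelow(n, k) ≤ f(k/n) ≤ fabove(n, k) can be tested directly. The sketch below does so for the first scheme above, f(λ) = cosh(λ) − 3/4:

```python
import math

def f(x):
    # Target function of the first scheme: cosh(x) - 3/4.
    return math.cosh(x) - 0.75

def fbelow(n, k):
    # Lower coefficients from the scheme above.
    if n < 4:
        return 487 / 2500
    return f(k / n) - 154309 / (700000 * n)

def fabove(n, k):
    # Upper coefficients: f itself, since f was detected to be convex.
    return f(k / n)

# Check that the coefficients bracket f at the grid points k/n
# for several powers of two.
ok = all(fbelow(n, k) <= f(k / n) <= fabove(n, k)
         for n in (4, 8, 16, 32, 64)
         for k in range(n + 1))
```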