Open Questions on the Bernoulli Factory Problem

Peter Occil


We're given a coin that shows heads with an unknown probability, $\lambda$. The goal is to use that coin (and possibly also a fair coin) to build a "new" coin that shows heads with a probability that depends on $\lambda$, call it $f(\lambda)$. This is the Bernoulli factory problem, and it can be solved only for certain functions $f$. (For example, flipping the coin twice and taking heads only if exactly one coin shows heads, we can simulate the probability $2\lambda(1-\lambda)$.)

Specifically, the only functions that can be simulated this way are continuous and polynomially bounded on their domain, and map $[0, 1]$ or a subset thereof to $[0, 1]$, as well as $f=0$ and $f=1$. These functions are called factory functions in this page. (A function $f(x)$ is polynomially bounded if both $f$ and $1-f$ are greater than or equal to min($x^n$, $(1-x)^n$) for some integer $n$ (Keane and O'Brien 1994). This implies that $f$ admits no roots on (0, 1) and can't take on the value 0 or 1 except possibly at 0 and/or 1.)

This page contains several questions about the Bernoulli factory problem. Answers to them will greatly improve my pages on this site about Bernoulli factories. If you can answer any of them, post an issue in the GitHub issues page.


Key Problems

The following summarizes most of the problems raised by these open questions.

  1. Given $\epsilon > 0$, and given certain assumptions on $f(\lambda)$ (such as those given later), find:
    • A reasonably small integer $n>0$ such that there is a polynomial $P(\lambda)$ of degree $n$ in Bernstein form such that $|P(\lambda) - f(\lambda)| \le \epsilon$ whenever $0\le\lambda\le 1$. Preferably, $P$'s coefficients are rational numbers in $[0,1]$ and $|P - f| = O(1/n^{r/2})$ whenever $f$ has a continuous $r$-th derivative.
    • The Bernstein coefficients of $P(\lambda)$.
  2. Same as problem 1, except that $P(\lambda)$ is a rational function (ratio of two polynomials) rather than a polynomial.
  3. Given certain assumptions on $f(\lambda)$ (such as those given later), find two sequences of polynomials in Bernstein form meeting the Formal Statement given later, together with their Bernstein coefficients.
  4. Characterize the factory functions $f(\lambda)$ that can be simulated using nothing but the biased coin, when the biased coin can show heads every time and/or tails every time.
  5. Characterize the factory functions $f(\lambda)$ with a Bernoulli factory that can come arbitrarily close to the entropy limit if it produces multiple $f$-coin flips at a time, rather than just one. Describe those Bernoulli factories.
  6. Characterize the algebraic factory functions $f(\lambda)$ that can be simulated by a finite-state machine with an unbounded stack.
  7. Describe how Algorithm 4 of Łatuszyński et al. (2009/2011) works when the randomized lower and upper bounds of $f(\lambda)$ given the number of heads are formed by functions other than polynomials. (Proposition 3.1 there describes the case of polynomials.)

Assumptions on $f(\lambda)$ can include any combination of the following:

Polynomials that approach a factory function "fast"

In this question, a polynomial $P(x)$ is written in Bernstein form of degree $n$ if it is written as— $$P(x)=\sum_{k=0}^n a_k {n \choose k} x^k (1-x)^{n-k},$$ where $a_0, ..., a_n$ are the polynomial's Bernstein coefficients.

The degree-$n$ Bernstein polynomial of an arbitrary function $f(x)$ has Bernstein coefficients $a_k = f(k/n)$. In general, this Bernstein polynomial differs from $f$ even if $f$ is a polynomial.

An algorithm simulates a factory function $f(\lambda)$ via two sequences of polynomials that converge from above and below to that function. Roughly speaking, the algorithm works as follows:

  1. Generate U, a uniform random variate in $[0, 1]$.
  2. Flip the input coin (with a probability of heads of $\lambda$), then build an upper and lower bound for $f(\lambda)$, based on the outcomes of the flips so far. In this case, these bounds come from two degree-$n$ polynomials that approach $f$ as $n$ gets large, where $n$ is the number of coin flips so far in the algorithm.
  3. If U is less than or equal to the lower bound, return 1. If U is greater than the upper bound, return 0. Otherwise, go to step 2.

The result of the algorithm is 1 with probability exactly equal to $f(\lambda)$, or 0 otherwise.

However, the algorithm requires the polynomial sequences to meet certain requirements; among them, the sequences must be of Bernstein-form polynomials that converge from above and below to a factory function. See the formal statement, next.

Formal Statement

More formally, there two sequences of polynomials, namely—

for every integer $n\ge1$, such that—

  1. $0\le a(n, k)\le b(n, k)\le1$,
  2. $\lim_{n}g_{n}(\lambda)=\lim_{n}h_{n}(\lambda)=f(\lambda)$ for every $\lambda\in[0,1]$, and
  3. $(g_{n+1}-g_{n})$ and $(h_{n}-h_{n+1})$ are polynomials with nonnegative Bernstein coefficients once they are rewritten to polynomials in Bernstein form of degree exactly $n+1$,

where $f(\lambda)$ is continuous on $[0, 1]$ (Nacu and Peres 2005; Holtz et al. 2011), and the goal is to find the appropriate values for $a(n, k)$ and $b(n, k)$.

It is allowed for $a(n, k)\lt 0$ for a given $n$ and some $k$, in which case all $a(n, k)$ for that $n$ are taken to be 0 instead. It is allowed for $b(n, k)\gt 1$ for a given $n$ and some $k$, in which case all $b(n, k)$ for that $n$ are taken to be 1 instead.

Alternatively, find a way to rewrite $f(\lambda)$ as— $$f(\lambda) = \sum_{n\ge 1} P_n(\lambda) = 1 - \sum_{n\ge 1} Q_n(\lambda),$$ where $P_n$ and $Q_n$ are polynomials of degree $n$ with nonnegative Bernstein coefficients.

On Condition 3

Condition 3 is also known as a "consistency requirement"; it ensures that not only the upper and lower polynomials "decrease" and "increase" to $f(\lambda)$, but also their Bernstein coefficients do as well. This requirement is crucial in the algorithm I mentioned above.

Condition 3 is equivalent in practice to the following statement (Nacu & Peres 2005). For every integer $k\in[0,2n]$ and every integer $n\ge 1$ that's a power of 2, $a(2n, k)\ge\mathbb{E}[a(n, X_{n,k})]= \left(\sum_{i=0}^k a(n,i) {n\choose i}{n\choose {k-i}}/{2n\choose k}\right)$ and $b(2n, k)\le\mathbb{E}[b(n, X_{n,k})]$, where $X_{n,k}$ is a hypergeometric($2n$, $k$, $n$) random variable. A hypergeometric($2n$, $k$, $n$) random variable is the number of "good" balls out of $n$ balls taken uniformly at random, all at once, from a bag containing $2n$ balls, $k$ of which are "good". See also my MathOverflow question on finding bounds for hypergeometric variables.

A Matter of Efficiency

However, ordinary Bernstein polynomials can't in general converge to a function faster than $O(1/n)$, a result known since Voronovskaya (1932) and a rate that will lead to an infinite expected number of coin flips in general. (See also my supplemental notes.)

But Lorentz (1966) showed that if the function is positive and has a continuous $k$-th derivative, there are polynomials with nonnegative Bernstein coefficients that converge at the rate $O(1/n^{k/2})$ (and thus can enable a finite expected number of coin flips if the function is "smooth" enough).

Thus, people have developed alternatives, including iterated Bernstein polynomials, to improve the convergence rate. These include Micchelli (1973), Guan (2009), Güntürk and Li (2021a, 2021b), the "Lorentz operator" in Holtz et al. (2011) (see also "New coins from old, smoothly"), and Draganov (2014).

These alternative polynomials usually include results where the error bound is the desired $O(1/n^{k/2})$, but nearly all those results (e.g., Theorem 4.4 in Micchelli; Theorem 5 in Güntürk and Li) have hidden constants with no upper bounds given, making them unimplementable (that is, it can't be known beforehand whether a given polynomial will come close to the target function within a user-specified error tolerance). (See note 4 in "End Notes".)


Thus the questions are:

  1. Are there practical formulas to compute polynomials that—

    • meet the formal statement above, and
    • meet the following error bound? $$|f(x) - P_n(f)(x)| \le \epsilon(f,n,x) = O(1/n^{k/2}),$$ for every $n\ge 1$, where $P_n(f)(x)$ is an approximating degree-$n$ polynomial that can be readily rewritten to Bernstein form (ideally with coefficients in $[0,1]$); $k$ is the number of continuous derivatives; and $\epsilon(f,n,x)$ is a fully determined formula with all constants in the formula having a known exact value or upper bound.
  2. The following is a technical question that could help reduce this problem to the problem of finding explicit error bounds when approximating a function by polynomials.

    Let $f(\lambda):[0,1]\to(0,1)$ have $r\ge 1$ continuous derivatives, let $M$ be the maximum absolute value of $f$ and its derivatives up to the $r$-th derivative, and denote the Bernstein polynomial of degree $n$ of a function $g$ as $B_n(g)$. Let $W_{2^0}(\lambda), W_{2^1}(\lambda), ..., W_{2^i}(\lambda),...$ be a sequence of functions on [0, 1] that converge uniformly to $f$.

    For each integer $n\ge 1$ that's a power of 2, suppose that there is $D>0$ such that— $$|f(\lambda)-B_n(W_n(\lambda))| \le DM/n^{r/2},$$ whenever $0\le \lambda\le 1$.

    Then, a conjecture is that there is $C_0\ge D$ such that for every $C\ge C_0$, there are polynomials $g_n$ and $h_n$ (for each $n\ge 1$) as follows: (A) $g_n$ and $h_n$ have Bernstein coefficients $W_n(k/n) - CM/n^{r/2}$ and $W_n(k/n) + CM/n^{r/2}$, respectively ($0\le k\le n$), if $n$ is a power of 2, and $g_n=g_{n-1}$ and $h_n=h_{n-1}$ otherwise; (B) $\lim_n g_n =\lim_n h_n=f$; (C) $(g_{n+1}-g_{n})$ and $(h_{n}-h_{n+1})$ are polynomials with nonnegative Bernstein coefficients once they are rewritten to polynomials in Bernstein form of degree exactly $n+1$.

    Equivalently (see also Nacu and Peres 2005), there is $C_1>0$ such that, for each integer $n\ge 1$ that's a power of 2— $$\max_{0\le k\le 2n}\left|\left(\sum_{i=0}^k \left(W_n\left(\frac{i}{n}\right)\right) {n\choose i}{n\choose {k-i}}/{2n \choose k}\right)-W_{2n}\left(\frac{k}{2n}\right)\right|\le \frac{C_1 M}{n^{r/2}}.$$

    1. For what value of $C_0$ (or $C_1$) is the statement true in each of the following cases?

      • $W_n$ is arbitrary.
      • $W_n = 2 f - B_n(f)$(***) and $r$ is 3 or 4. I conjecture $C_0=0.3$. Interesting functions $f$ to test are quadratic polynomials.
      • $W_n = B_n(B_n(f))+3(f-B_n(f))$(***) and $r$ is 5 or 6.
    2. Does the same value of $C_0$ (or $C_1$) suffice when $f$ has a Lipschitz continuous $(r-1)$-th derivative and $M$ is the maximum absolute value of $f$ and the Lipschitz constants of $f$ and its derivatives up to the $(r-1)$-th derivative?

  3. Are there practical formulas to compute polynomials that satisfy the Formal Statement and/or meet the error bound given in question 1 and can be readily rewritten to Bernstein form with coefficients in [0, 1]?

(***) Corresponds to the iterated Bernstein polynomials of order 2 and 3 [Güntürk and Li 2021].

Examples of Functions to Ponder

The following are examples of functions worth pondering for these questions:

New coins from old, smoothly

Now, we focus on a specific approximation scheme, the one presented by Holtz et al. 2011, in the paper "New coins from old, smoothly".

The scheme involves building polynomials that are shifted upward and downward to approximate $f$ from above and below so that the polynomials meet the Formal Statement given earlier.

The scheme achieves a convergence rate that generally depends on the smoothness of $f$; in fact, it can achieve the highest convergence rate possible for functions with that smoothness.

Specifically, Holtz et al. proved the following results:

  1. A function $f(\lambda):[0,1]\to(0,1)$ can be approximated, in a manner that solves the Bernoulli factory problem, at the rate $O((\Delta_n(\lambda))^\beta)$ if and only if $f$ is $\lfloor\beta\rfloor$ times differentiable and has a Hölder continuous $\lfloor\beta\rfloor$-th derivative with Hölder exponent $\beta-\lfloor\beta\rfloor$, where $\beta>0$ is a non-integer and $\Delta_n(\lambda) = \max((\lambda(1-\lambda)/n)^{1/2}, 1/n)$. (Roughly speaking, the rate is $O((1/n)^{\beta})$ when $\lambda$ is close to 0 or 1, and $O((1/n)^{\beta/2})$ elsewhere.)

  2. A function $f(\lambda):[0,1]\to(0,1)$ can be approximated, in a manner that solves the Bernoulli factory problem, at the rate $O((\Delta_n(\lambda))^{r+1})$ only if the $r$th derivative of $f$ is in the Zygmund class, where $r\ge 0$ is an integer.

The scheme is as follows:

Let $f$ be a function—

Let $\alpha = r+\beta$, let $b = 2^s$, and let $s\ge0$ be an integer. Let $Q_{n, r}f$ be a degree $n+r$ approximating polynomial called a Lorentz operator (see the paper for details on the Lorentz operator). Let $n_0$ be the smallest $n$ such that $Q_{n_0, r}f$ has coefficients within [0, 1]. Define the following for every integer $n \ge n_0$ divisible by $n_{0}b$:

Let $B_{n, k, F}$ be the $k$th coefficient of the degree-$n$ Bernstein polynomial of $F$.

Let $C(\lambda)$ be a polynomial as follows: Find the degree-$n$ Bernstein polynomial of $\phi(n, r+\beta, \lambda)$, then elevate it to a degree-$n+r$ Bernstein polynomial.

Then the coefficients for the degree $n+r$ polynomial that approximates $f$ are—

However, the Holtz method is not yet implementable, for the following reasons among others:

And I seek ways to make this solution implementable.


  1. What are practical upper bounds for $s$, $\theta_{\alpha}$, and $D$ for the "New coins from old, smoothly" method, given a factory function $f$, with or without additional assumptions on $f$ (such as assumptions given in the section "Key Problems")?
  2. Given a continuous function $f$ that maps $[0,1]$ to $(0,1)$, is the "New coins from old, smoothly" method valid in the following cases? (Note that the method as written doesn't apply to non-integer $\alpha$; see also Conjecture 34 of Holtz et al., 2011, which claims the converse of the second result given above.)

    • With $\alpha=1, r=0$, when $f$ is Lipschitz continuous and/or differentiable.
    • With $\alpha=2, r=1$, when $f$ has a Lipschitz continuous first derivative.
    • With $\alpha=2, r=2$, when $f$ is twice differentiable.
    • With $\alpha=4, r=3$, when $f$ has a Lipschitz continuous third derivative.
    • With $\alpha=4, r=4$, when $f$ is four times differentiable.
    • With $\alpha=5, r=4$, when $f$ has a Lipschitz continuous fourth derivative.

Reverse-time martingales

This section is withdrawn. For the Bernoulli factory problem, rational functions are probably not much better than polynomials when approximating functions with low smoothness (e.g., those with only three continuous derivatives). This follows from Borwein (1979, theorem 29) and Holtz et al. (2011) (which disproved a theorem of Lorentz relied on by Borwein but maintained it with an extra assumption used in the Bernoulli factory setting).

One way to toss heads with probability $f(\lambda)$ given a coin that shows heads with probability $\lambda$ is to build randomized upper and lower bounds that converge to $f$ on average. These bounds serve as an unbiased estimator of $f(\lambda)$; the algorithm returns 1 with probability equal to the estimate, and 0 otherwise.

Part of the reverse-time martingale algorithm of Łatuszyński et al. (2009/2011) (see "General Factory Functions") to simulate a factory function $f(\lambda)$ is as follows. For each n starting with 1:

  1. Flip the input coin, and compute the nth upper and lower bounds of f given the number of heads so far, call them L and U.

  2. Compute the $(n-1)$th upper and lower bounds of f given the number of heads so far, call them L* and U*. (These bounds must be the same regardless of the outcomes of future coin flips, and the interval [L*, U*] must equal or entirely contain the interval [L, U].)

More technically (Algorithm 4):

  1. Obtain $L_n$ and $U_n$ given $\mathcal{F}_{0, n-1}$,
  2. Compute $L_n^s$ = $E(L_{n-1}$ | $\mathcal{F}n)$ and $U_n^s$ = $E(U{n-1}$ | $\mathcal{F}_{n})$,

where $\mathcal{F}_n$ is a filtration that depends on $L_n$ and $U_n$.

Though the paper as well as the section on general factory functions that I linked to above shows how this algorithm can be implemented for polynomials, these parts of the algorithm appear to work for any two sequences of functions that converge to $f$, where $L$ or $L^s$ and $U$ or $U^s$ are their lower and upper bound approximations. An example for polynomials follows:

  1. Given the number of heads $H_n$, $L_n$ is the $H_n$th Bernstein coefficient of the $n$th lower approximating polynomial, and $U_n$ is the $H_n$th Bernstein coefficient of the $n$th upper approximating polynomial.
  2. $L^s_n$ is the $H_n$th Bernstein coefficient of the $(n-1)$th lower approximating polynomial, and $U^s_n$ is the $H_n$th Bernstein coefficient of the $(n-1)$th upper approximating polynomial, after elevating both polynomials to degree $n$.

But how do these steps work when the approximating functions (the functions that converge to f) are other than polynomials? Specifically, what if the approximating functions are rational functions with integer coefficients? rational functions with rational coefficients? arbitrary approximating functions?

Tossing Heads According to a Concave Function

Using Two Polynomial Sequences

If $f$ is a concave factory function, there is a simple way to approximate that function from below with polynomials in Bernstein form. Define—

$$a(n,k) = f(k/n).$$

Then the lower polynomials $g_{n}(\lambda)$ will meet all the requirements of the formal statement above. Indeed:

$$g_{n}(\lambda)=\sum_{k=0}^{n}f(k/n){n \choose k}\lambda^{k}(1-\lambda)^{n-k}.$$

The trickier part is thus to approximate $f(\lambda)$ from above (that is, to find the upper polynomials $h_n$). In this sense I am aware of attempts to do so for specific functions: Nacu & Peres (2005) for linear functions, and Flegal & Herbei (2012) for a twice differentiable elbow function. I am also aware of a result by Holtz et al. 2011, given in the section "New coins from old, smoothly".

I believe a more general solution is to somehow find the maximum difference between $g_{n}(\lambda)$ and $f(\lambda)$, for each $n$, and to shift the lower polynomial $g_n$ upward by that difference to get the upper polynomial $h_{n}(\lambda)$. I believe this solution works for the function $\min(\lambda,1-\lambda)$ (by getting the maximum difference at $\lambda=1/2$), but I don't know if it works for more general concave functions.

Using a Series Expansion

There is another way to simulate a concave $f$. This involves rewriting the concave function as a series expansion as follows:

$$f(\lambda)=\sum_{a\ge 0} \gamma_a(\lambda) = \sum_{a\ge 0} \frac{\gamma_a(\lambda)}{\pi(a, p)} \pi(a, p), \tag{1}$$


Then an algorithm to toss heads with probability equal to $f$ would be:

  1. Flip a coin that shows heads with probability $p$ until that coin shows heads. Set $a$ to the number of tails.
  2. Write $\frac{\gamma_a(\lambda)}{\pi(a, p)}$ as a polynomial in Bernstein form of degree $n_{a}$ (or a higher degree such that the Bernstein coefficients are all in [0, 1]). Flip the biased coin (with probability of heads $\lambda$) $n$ times, where $n$ is the polynomial's degree, and let $j$ be the number of heads.
  3. Return 1 with probability equal to the polynomial's $j$th Bernstein coefficient, or 0 otherwise (see also Goyal and Sigman 2012 for an algorithm to simulate polynomials).

(This algorithm has similarities to the one used in the proof in Keane and O'Brien 1994.)

However, using this technique for a given concave $f$ requires finding the appropriate sequence for $n_a$ and the appropriate value of $p$ so that the series expansion can be formed. Here is an example for $\min(\lambda, 1-\lambda)$ which appears to be correct:

Finding these parameters was far from rigorous, though, and the process will have to be repeated for each concave function.

It's also possible for $a$ to be generated in step 1 differently, perhaps with a Poisson distribution. In that case, $\pi(a, p)$ will have to be changed to the corresponding probability of getting $a$ under the new distribution.

Note: Some concave functions can be rewritten as—

$$f(\lambda)=g_{n_k}(\lambda) + \sum_{a\gt k} \frac{\gamma_a(\lambda)}{\pi(a, p)} \pi(a, p), \tag{2}$$

for some integer $k\ge 0$, if they satisfy the series expansion $(1)$ except that $\frac{\gamma_a(\lambda)}{\pi(a, p)}$ is allowed to equal 1 or greater for some $p$ in $(0, 1)$ and some $a\le k$. This way of writing $f$ is acceptable for my purposes.


  1. Given that a factory function $f(\lambda)$ is concave and has a continuous $\alpha$-th derivative, is there a formula to find the amount by which to shift the lower polynomials $g_n$ upward so that the upper polynomials $h_n$ meet the formal statement above (or to otherwise convert the lower polynomials to upper polynomials that meet that statement)? By Holtz's results, this formula would have to behave asymptotically like $O((\Delta_n(\lambda))^\alpha)$, but I am looking for nonasymptotic results that achieve this rate of convergence.
  2. Given that a factory function $f(\lambda):[0, 1] \to (0, 1)$ is concave and continuous, is it enough to shift $g_{n}(\lambda)$ upward by the maximum difference between $g_{n}(\lambda)$ and $f(\lambda)$, for each $n$, to get the corresponding upper polynomial $h_{n}(\lambda)$? If not, for which concave functions does this work?
  3. Given that a factory function $f(\lambda):[0, 1] \to [0, 1)$ is concave and continuous, what values of $n_a$ and $p$ will allow that function to have the series expansion $(1)$ or $(2)$? I suspect that a formula for this question will depend on the smoothness of $f$, due to Holtz's results.

See also Note 1.

Simulable and strongly simulable functions

There are two kinds of factory functions:

Every strongly simulable function is simulable, but not vice versa.

In fact, Keane and O'Brien (1994) showed already that $f(\lambda)$ is strongly simulable if $f$ is simulable and neither 0 nor 1 is in $f$'s domain (that is, if the biased coin doesn't show heads every time or tails every time). And it's also easy to show that if $f$ is strongly simulable, then $f(0)$ and $f(1)$ must each be 0, 1, or undefined.

However, it's not so trivial to find the exact class of strongly simulable functions when $f$'s domain includes 0, 1, or both.

As one illustration of this, the proof of Keane and O'Brien relies on generating a geometric random variate and using that variate to control which "part" of the target function $f(\lambda)$ to simulate. This obviously works on all of [0, 1] if the algorithm uses both the biased coin and a separate fair coin. However, if only the biased coin is used in the algorithm, the geometric random variate is generated using fair bits via the von Neumann method, which however will never terminate if $\lambda$ is either 0 or 1.

As another illustration, I managed to find the following result:

And the proof proceeds by showing, among other things, that the Bernoulli factory for $f$ must flip the input coin and get 0 and 1 before it simulates any fair coin flips via the von Neumann trick.

Question: Does the result just given describe all the functions that are strongly simulable (using nothing but the biased coin) when the biased coin can show heads every time and/or tails every time? If not, what is the exact class of strongly simulable functions? Examples of functions to ponder are those that are continuous but not Lipschitz continuous at 0 (and $\lambda$ can equal 0), such as $\lambda^\alpha$ where $\alpha \in (0, 1)$, or $\lim_{z\to\lambda} z-z \ln(z)$, or $\lim_{z\to\lambda} -1/(2 \ln(z/2))$.

Multiple-Output Bernoulli Factories

Let $J$ be a closed interval on $(0, 1)$, and let $f(\lambda):J \to (0, 1)$ be continuous.

Then by Keane and O'Brien, $f$ admits an algorithm that solves the Bernoulli factory problem for $f$ (using only the biased coin, in fact). A related problem is a Bernoulli factory that takes a coin with unknown probability of heads $\lambda \in J$ and produces one or more samples, at a time, of the probability $f(\lambda)$. This question calls it a multiple-output Bernoulli factory.

Obviously, any single-output Bernoulli factory can produce multiple outputs by running itself multiple times. But for some functions $f$, it may be that producing multiple outputs at a time may use fewer coin flips than producing one output multiple times.

Define the entropy bound as— $$h(f(\lambda))/h(\lambda),$$ where— $$h(x)=-x \ln(x)-(1-x) \ln(1-x),$$ is related to the Shannon entropy function.


  1. Given that a function $f(\lambda)$ is continuous and maps a closed interval in (0, 1) to (0, 1), is there a multiple-output Bernoulli factory algorithm for $f$ with an expected number of coin flips per sample that is arbitrarily close to the entropy bound, uniformly for every $\lambda$ in $f$'s domain? Call such a Bernoulli factory an optimal factory. (See Nacu and Peres 2005, Question 1.)
  2. Does the answer to question 1 change if the algorithm can also use a fair coin in addition to the biased coin?

Functions with Optimal Factories

So far, the following functions do admit an optimal factory:

It is easy to see that if an optimal factory exists for $f(\lambda)$, then one also exists for $1-f(\lambda)$: simply change all ones returned by the $f(\lambda)$ factory into zeros and vice versa.

Also, as Yuval Peres (Jun. 24, 2021) told me, there is an efficient multiple-output Bernoulli factory for $f(\lambda) = \lambda/2$: the key is to flip the input coin enough times to produce unbiased random bits using his extractor (Peres 1992), then multiply each unbiased bit with another input coin flip to get a sample from $\lambda/2$. Given that the sample is equal to 0, there are three possibilities that can "be extracted to produce more fair bits": either the unbiased bit is 0, or the coin flip is 0, or both are 0. This algorithm, though, might not count as an optimal factory, and Peres described this algorithm only incompletely. Indeed, the correctness might depend on how the three possibilities are "extracted to produce more fair bits"; after all, the number of coin flips per sample, for every $\lambda$, must not surpass the entropy bound.

In any case, I believe that not all factory functions admit an optimal factory described here; especially because—

See an appendix in one of my articles for more information on my progress on the problem.

From coin flips to algebraic functions via pushdown automata

This section is about solving the Bernoulli factory problem on a restricted computing model, namely the model of pushdown automata (finite-state machines with a stack) that are driven by flips of a coin and produce new probabilities.

Pushdown Automata

A pushdown automaton has a finite set of states and a finite set of stack symbols, one of which is called EMPTY and takes a biased coin with an unknown probability of heads. It starts with a given state and its stack starts with EMPTY. On each iteration:

When the stack is empty, the machine stops and returns either 0 or 1 depending on the state it ends up at. (For the questions below, let flip be HEADS, TAILS, or a rational number in [0, 1]; this likewise reduces to the definition above. The rest of this question assumes the pushdown automaton terminates with probability 1.)

Algebraic Functions

Let $f: (0, 1) \to (0, 1)$ be continuous. Mossel and Peres (2005) showed that a pushdown automaton can simulate $f$ only if $f$ is algebraic over the rational numbers (there is a nonzero polynomial $P(x, y)$ in two variables and whose coefficients are rational numbers, such that $P(x, f(x)) = 0$ for every $x$ in the domain of $f$). The algebraic function generated by pushdown automata corresponds to a system of polynomial equations, as described by Mossel and Peres (2005) and Esparza et al. 2004.

Let $\mathcal{C}$ be the class of continuous functions that map (0, 1) to (0, 1) and are algebraic over rationals. The constants 0 and 1 are also in $\mathcal{C}$.

Let $\mathcal{D} \subseteq \mathcal{C}$ be the class of functions that a pushdown automaton can simulate.

I don't yet know whether $\mathcal{D}=\mathcal{C}$ (and that was also a question of Mossel and Peres).

The following section of my open-source page,, contains information on the question. As that section shows, I could establish the following about the class $\mathcal{D}$:


  1. For every function in class $\mathcal{C}$, is there a pushdown automaton that can simulate that function? (In other words, is $\mathcal{D}=\mathcal{C}$?).
  2. In particular, is min($\lambda$, $1-\lambda$) in class $\mathcal{D}$? What about $\lambda^{1/p}$ for some prime $p\ge 3$?

See also Notes 2 and 3.

Other Questions

Simple simulation algorithms:

End Notes

Note 1: Besides the questions on concave functions given above, there is also the question of whether the solution terminates with a finite expected running time. In this sense, Nacu & Peres showed that a finite expected time is possible only if $f$ is Lipschitz continuous, and I strongly suspect it's not possible either unless $f$ has a Hölder continuous fourth derivative, in view of the results by Holtz given earlier.

Note 2: On pushdown automata: Etessami and Yannakakis (2009) showed that pushdown automata with rational probabilities are equivalent to recursive Markov chains (with rational transition probabilities), and that for every recursive Markov chain, the system of polynomial equations has nonnegative coefficients. But this paper doesn't deal with the case of recursive Markov chains where the transition probabilities cannot just be rational, but can also be $\lambda$ and $1-\lambda$ where $\lambda$ is an unknown rational or irrational probability of heads.

Note 3: On pushdown automata: Banderier and Drmota (2014) showed the asymptotic behavior of power series solutions $f(\lambda)$ of a polynomial system, where both the series and the system have nonnegative real coefficients. Notably, functions of the form $\lambda^{1/p}$ where $p\ge 3$ is not a power of 2, are not possible solutions, because their so-called "critical exponent" is not dyadic. But the result seems not to apply to piecewise power series such as $\min(\lambda,1-\lambda)$, which are likewise algebraic functions.

Note 4: An exception is Chebyshev interpolants, but my implementation experience shows that Chebyshev interpolants are far from being readily convertible to Bernstein form without using transcendental functions or paying attention to the difference between first vs. second kind, Chebyshev points vs. coefficients, and the interval [-1, 1] vs. [0, 1]. By contrast, other schemes (which are of greater interest to me) involve polynomials that are already in Bernstein form or that use only rational arithmetic to transform to Bernstein form (these include so-called "iterated Bernstein" polynomials and "one-bit" polynomials). Indeed, unlike with rational arithmetic (where arbitrary precision is trivial), transcendental functions require special measures to support arbitrary accuracy, such as constructive/recursive reals — floating-point numbers won't do for purposes of these open questions.

My Attempt

The Python script shows my attempt to implement the Holtz approximation scheme. It implements an algorithm to toss heads with probability equal to a piecewise polynomial factory function with continuous second or fourth derivative. However, it relies on an unproven conjecture (Conjecture 34) in the Holtz paper.

Based on this attempt, the case of four continuous derivatives is efficient enough for my purposes, but the case of functions with lesser regularity is not so efficient (such as Lipschitz or two continuous derivatives).

Here is my current progress for the Lorentz operator for $\alpha=2$, so $r=2$ (which applies to twice-differentiable functions with Hölder continuous second derivative, even though the paper appears not to apply when $\alpha$ is an integer). Is the following work correct?

The Lorentz operator for $r=2$ finds the degree-n Bernstein polynomial for the target function $f$, elevates it to degree $n+r$, then shifts the coefficient at $k+1$ by $-f\prime\prime(k/n) A(n,k)$ (but the coefficients at 0 and $n+r$ are not shifted this way), where:

$$A(n,k) = (1/(4n)) \times 2 \times (n+2-k)/((n+1)\times (n+2)),$$

where $k$ is an integer in $[0,n+r]$. Observing that $A(n,k)$ equals 0 at 0 and at $n+r$, and has a peak at $(n+r)/2$, the shift will be no greater (in absolute value) than $A(n,(n+r)/2)*F$, where $F$ is the maximum absolute value of the second derivative of $f(\lambda)$. $A(n,(n+r)/2)$ is bounded above by $(3/16)/n$ for $n\ge 1$, and by $(3/22)/n$ for $n\ge 10$.

Now, find $\theta_\alpha$ for $\alpha=2$, according to Lemma 25:

Let $0<\gamma<2^{\alpha/2}-1$ ($\gamma<1$ if $\alpha=2$).

Solve for $K$: $$(1-(1+\gamma)/2^{\alpha/2} - 4/(4*K)) = 0.$$ The solution for $\alpha=2$ is $K = 2/(1-\gamma)$.

Now find: $$\theta_a = ((4/4) K^{\alpha/2}/n^\alpha) / ((1-(1+\gamma)/2^\alpha)/n^\alpha)$$ The solution for $\alpha=2$ is $\theta_a = 8/((\gamma-3) (\gamma-1))$.

For $\gamma=1/100$ and $\alpha=2$, $\theta_a = 80000/29601 < 2.703$.

There's no need to check whether the output polynomials have Bernstein coefficients in $(0, 1)$, since out-of-bounds polynomials will be replaced with 0 or 1 as necessary — which is more practical and convenient.

Now all that remains is to find $D$ given $\alpha=2$. I believe this will involve the following:

  1. Find upper bounds for the constants $C_{19}$, $C_{21}$, and $C_{24}$ (used in Lemmas 19, 21, and 24, respectively) given $\alpha$ and $r$ (and for $C_{19}$ at least, the best I can do is a visual inspection of the plot).
  2. Calculate $t = C_{24} C_{19} (1+2 C_{21}) b^{\alpha/2} M$, where $M$ is the Hölder constant of $f$'s $r$th derivative (see "Step 4" in "The Iterative Construction").
  3. Find the maximum value of $(t (x+(1-x))^r B_n \Delta_n{\alpha}) / ((x+(1-x))^r B_n\phi_n)$ (the best I can do is a visual inspection), and set $D$ to that value.

Moreover, there remains to find the parameters for the Lorentz operator when $r$ is 0, 1, 2, or 4. (When $r=0$ or $r=1$, the Lorentz operator is simply the Bernstein polynomial of degree $n$, elevated $r$ degrees to degree $n+r$.)