Open Questions on the Bernoulli Factory Problem

Peter Occil

Background

Suppose there is a coin that shows heads with an unknown probability, $\lambda$. The goal is to use that coin (and possibly also a fair coin) to build a "new" coin that shows heads with a probability that depends on $\lambda$, call it $f(\lambda)$. This is the Bernoulli factory problem, and it can be solved only for certain functions $f$. (For example, flipping the coin twice and taking heads only if exactly one coin shows heads, the probability $2\lambda(1-\lambda)$ can be simulated.)

Specifically, the only functions that can be simulated this way are continuous and polynomially bounded on their domain, and map $[0, 1]$ or a subset thereof to $[0, 1]$, as well as $f=0$ and $f=1$. These functions are called factory functions in this page. (A function $f(x)$ is polynomially bounded if both $f$ and $1-f$ are greater than or equal to min($x^n$, $(1-x)^n$) for some integer $n$ (Keane and O'Brien 1994)[^1]. This implies that $f$ admits no roots on (0, 1) and can't take on the value 0 or 1 except possibly at 0, 1, or both.)

This page contains several questions about the Bernoulli factory problem. Answers to them will greatly improve my pages on this site about Bernoulli factories. If you can answer any of them, post an issue in the GitHub issues page.

Note: The Bernoulli factory problem is a special case of a more general mathematical problem that I call "The Sampling Problem".

Contents

Polynomials that approach a factory function "fast"

This question involves solving the Bernoulli factory problem with polynomials.[^2]

In this question, a polynomial $P(x)$ is written in Bernstein form of degree $n$ if it is written as— $$P(x)=\sum_{k=0}^n a_k {n \choose k} x^k (1-x)^{n-k},$$ where $a_0, ..., a_n$ are the polynomial's Bernstein coefficients.

The degree-$n$ Bernstein polynomial of an arbitrary function $f(x)$ has Bernstein coefficients $a_k = f(k/n)$. In general, this Bernstein polynomial differs from $f$ even if $f$ is a polynomial.

Main Question

Suppose $f:[0,1]\to [0,1]$ is continuous and belongs to a large class of functions (for example, the $k$-th derivative, $k\ge 0$, is continuous, Lipschitz continuous, concave, strictly increasing, or bounded variation, or $f$ is real analytic).

  1. (Exact Bernoulli factory): Compute the Bernstein coefficients of a sequence of polynomials ($g_n$) of degree 2, 4, 8, ..., $2^i$, ... that converge to $f$ from below and satisfy: $(g_{2n}-g_{n})$ is a polynomial with non-negative Bernstein coefficients once it's rewritten to a polynomial in Bernstein form of degree exactly $2n$. (See Note 3 in "End Notes".) Assume $0\lt f(\lambda)\lt 1$ or $f$ is polynomially bounded.
  2. (Approximate Bernoulli factory): Given $\epsilon > 0$, compute the Bernstein coefficients of a polynomial or rational function (of some degree $n$) that is within $\epsilon$ of $f$.
  3. (Series expansion of simple functions): Find a non-negative random variable $X$ and a series $f(\lambda)=\sum_{a\ge 0}\gamma_a(\lambda)$ such that $\gamma_a(\lambda)/\mathbb{P}(X=a)$ (letting 0/0 equal 0) is a polynomial or rational function with rational Bernstein coefficients lying in $[0, 1]$. (See note 1 in "End Notes".)

The convergence rate must be $O(1/n^{r/2})$ if the class has only functions with Lipschitz-continuous $(r-1)$-th derivative. The method may not introduce transcendental or trigonometric functions (as with Chebyshev interpolants).

Solving the Bernoulli factory problem with polynomials

An algorithm (Łatuszyński et al. 2009/2011)[^3] simulates a factory function $f(\lambda)$ via two sequences of polynomials that converge from above and below to that function. Roughly speaking, the algorithm works as follows:

  1. Generate U, a uniform random variate in $[0, 1]$.
  2. Flip the input coin (with a probability of heads of $\lambda$), then build an upper and lower bound for $f(\lambda)$, based on the outcomes of the flips so far. In this case, these bounds come from two degree-$n$ polynomials that approach $f$ as $n$ gets large, where $n$ is the number of coin flips so far in the algorithm.
  3. If U is less than or equal to the lower bound, return 1. If U is greater than the upper bound, return 0. Otherwise, go to step 2.

The result of the algorithm is 1 with probability exactly equal to $f(\lambda)$, or 0 otherwise.

However, the algorithm requires the polynomial sequences to meet certain requirements; among them, the sequences must be of Bernstein-form polynomials that converge from above and below to a factory function. Specifically:

For $f(\lambda)$ there must be a sequence of polynomials ($g_n$) in Bernstein form of degree 1, 2, 3, ... that converge to $f$ from below and satisfy: $(g_{n+1}-g_{n})$ is a polynomial with non-negative Bernstein coefficients once it's rewritten to a polynomial in Bernstein form of degree exactly $n+1$ (see Note 3 in "End Notes"; Nacu and Peres (2005)[^4]; Holtz et al. (2011)[^5]). For $f(\lambda)=1-f(\lambda)$ there must likewise be a sequence of this kind.

A Matter of Efficiency

However, ordinary Bernstein polynomials converge to a function at the rate $\Omega(1/n)$ in general, a result known since Voronovskaya (1932)[^6] and a rate that will lead to an infinite expected number of coin flips in general. (See also my supplemental notes.)

But Lorentz (1966)[^7] showed that if the function is positive and has a continuous $k$-th derivative, there are polynomials with nonnegative Bernstein coefficients that converge at the rate $O(1/n^{k/2})$ (and thus can enable a finite expected number of coin flips if the function is "smooth" enough).

Thus, people have developed alternatives, including linear combinations and iterated Boolean sums of Bernstein polynomials, to improve the convergence rate. These include Micchelli (1973)[^8], Guan (2009)[^9], Güntürk and Li (2021a)[^10], (2021b)[^11], the "Lorentz operator" in Holtz et al. (2011)[^5], Draganov (2014), and Tachev (2022)[^12].

These alternative polynomials usually include results where the error bound is the desired $O(1/n^{k/2})$, but most of those results (e.g., Theorem 4.4 in Micchelli; Theorem 5 in Güntürk and Li) have hidden constants with no upper bounds given, making them unimplementable (that is, it can't be known beforehand whether a given polynomial will come close to the target function within a user-specified error tolerance).

A Conjecture on Polynomial Approximation

The following is a conjecture that could help reduce this problem to the problem of finding explicit error bounds when approximating a function by polynomials.

Let $f(\lambda):[0,1]\to(0,1)$ have $r\ge 1$ continuous derivatives, let $M$ be the maximum of the absolute value of $f$ and its derivatives up to the $r$-th derivative, and denote the Bernstein polynomial of degree $n$ of a function $g$ as $B_n(g)$. Let $W_{2^0}(\lambda), W_{2^1}(\lambda), ..., W_{2^i}(\lambda),...$ be a sequence of functions on [0, 1] that converge uniformly to $f$.

For each integer $n\ge 1$ that's a power of 2, suppose that there is $D>0$ such that— $$|f(\lambda)-B_n(W_n(\lambda))| \le DM/n^{r/2},$$ whenever $0\le \lambda\le 1$. Then there is $C_0\ge D$ such that for every $C\ge C_0$, the polynomials $(g_n)$ in Bernstein form of degree 2, 4, 8, ..., $2^i$, ..., defined as $g_n=B_n(W_n(\lambda) - CM/n^{r/2})$, converge from below to $f$ and satisfy: $(g_{2n}-g_{n})$ is a polynomial with non-negative Bernstein coefficients once it's rewritten to a polynomial in Bernstein form of degree exactly $2n$. (See Note 3 in "End Notes".)

Equivalently (see also Nacu and Peres (2005)[^4]), there is $C_1>0$ such that the inequality $(PB)$ (see below) holds true for each integer $n\ge 1$ that's a power of 2 (see "Strategies", below).

My goal is to see not just whether this conjecture is true, but also which value of $C_0$ (or $C_1$) suffices for the conjecture, especially for any combination of the special cases mentioned at the end of "Main Question", above.

Strategies

The following are some strategies for answering these questions:

Other Questions

Prove or disprove:

End Notes

Note 1: An example of $X$ is $\mathbb{P}(X=a) = p (1-p)^a$ where $0 < p < 1$ is a known rational. This question's requirements imply that $\sum_{a\ge 0}\max_\lambda |\gamma_a(\lambda)| \le 1$. The proof of Keane and O'Brien (1994)[^1] produces a convex combination of polynomials with 0 and 1 as Bernstein coefficients, but the combination is difficult to construct (it requires finding maximums, for example) and so this proof does not appropriately answer this question.

Note 2: On pushdown automata: Etessami and Yannakakis (2009)[^16] showed that pushdown automata with rational probabilities are equivalent to recursive Markov chains (with rational transition probabilities), and that for every recursive Markov chain, the system of polynomial equations has nonnegative coefficients. But this paper doesn't deal with the case of recursive Markov chains where the transition probabilities cannot just be rational, but can also be $\lambda$ and $1-\lambda$ where $\lambda$ is an unknown rational or irrational probability of heads. Also, Banderier and Drmota (2014)[^17] showed the asymptotic behavior of power series solutions $f(\lambda)$ of a polynomial system, where both the series and the system have nonnegative real coefficients. Notably, functions of the form $\lambda^{1/p}$ where $p\ge 3$ is not a power of 2, are not possible solutions, because their so-called "critical exponent" is not dyadic. But the result seems not to apply to piecewise power series such as $\min(\lambda,1-\lambda)$, which are likewise algebraic functions.

Note 3: This condition is also known as a "consistency requirement"; it ensures that not only the polynomials "increase" to $f(\lambda)$, but also their Bernstein coefficients do as well. This condition is equivalent in practice to the following statement (Nacu & Peres 2005)[^4]. For every integer $n\ge 1$ that's a power of 2, $a(2n, k)\ge\mathbb{E}[a(n, X_{n,k})]= \left(\sum_{i=0}^k a(n,i) {n\choose i}{n\choose {k-i}}/{2n\choose k}\right)$, where $a(n,k)$ is the degree-$n$ polynomial's $k$-th Bernstein coefficient, where $0\le k\le 2n$ is an integer, and where $X_{n,k}$ is a hypergeometric($2n$, $k$, $n$) random variable. A hypergeometric($2n$, $k$, $n$) random variable is the number of "good" balls out of $n$ balls taken uniformly at random, all at once, from a bag containing $2n$ balls, $k$ of which are "good". See also my MathOverflow question on finding bounds for hypergeometric variables.

Note 4: If $W_n(0)=f(0)$ and $W_n(1)=f(1)$ for every $n$, then the inequality $(PB)$ is automatically true when $k=0$ and $k=2n$, so that the statement has to be checked only for $0\lt k\lt 2n$. If, in addition, $W_n$ is symmetric about 1/2, so that $W_n(\lambda)=W_n(1-\lambda)$ whenever $0\le \lambda\le 1$, then the statement has to be checked only for $0\lt k\le n$ (since the values $\sigma_{n,k,i} = {n\choose i}{n\choose {k-i}}/{2n \choose k}$ are symmetric in that they satisfy $\sigma_{n,k,i}=\sigma_{n,k,k-i}$).
This question is a problem of finding the Jensen gap of $W_n$ for certain kinds of hypergeometric random variables (see Note 3). Lee et al. (2021)[^18] deal with a problem very similar to this one and find results that take advantage of $f$'s (here, $W_n$'s) smoothness, but unfortunately assume the variable is supported on an open interval, rather than a closed one (namely $[0,1]$) as in this question.
Special cases for this question are if $W_n = 2 f - B_n(f)$ and $r$ is 3 or 4, or $W_n = B_n(B_n(f))+3(f-B_n(f))$ and $r$ is 5 or 6.

Particularly for the case $W_n=2f-B_n(f)$, the right-hand side of $(PB)$ is believed to be $O(1/n^{3/2})$ when $f$ has a Lipschitz continuous second derivative on $[0,1]$, but I have been unable to find a bound better than $O(1/n)$, especially because in one form or another my attempts at the bound seem to require an estimate of $|B_{2n}(f)-B_{n}(f)|$, which in general is no better than $O(1/n)$. Thus, a proof or counterexample of a bound of $O(1/n^{3/2})$ in this case would be appreciated.

Notes

[^1]: Keane, M. S., and O'Brien, G. L., "A Bernoulli factory", ACM Transactions on Modeling and Computer Simulation 4(2), 1994.

[^2]: See also the following questions on Mathematics Stack Exchange and MathOverflow: Converging polynomials, Error bounds, A conjecture, Hypergeometric random variable, Lorentz operators, Derivatives of moments, Series representations.

[^3]: Łatuszyński, K., Kosmidis, I., Papaspiliopoulos, O., Roberts, G.O., "Simulating events of unknown probabilities via reverse time martingales", arXiv:0907.4018v2 [stat.CO], 2009/2011.

[^4]: Nacu, Şerban, and Yuval Peres. "Fast simulation of new coins from old", The Annals of Applied Probability 15, no. 1A (2005): 93-115.

[^5]: Holtz, O., Nazarov, F., Peres, Y., "New Coins from Old, Smoothly", Constructive Approximation 33 (2011).

[^6]: E. Voronovskaya, "Détermination de la forme asymptotique d'approximation des fonctions par les polynômes de M. Bernstein", 1932.

[^7]: G.G. Lorentz, "The degree of approximation by polynomials with positive coefficients", 1966.

[^8]: Micchelli, Charles. "The saturation class and iterates of the Bernstein polynomials", Journal of Approximation Theory 8, no. 1 (1973): 1-18.

[^9]: Guan, Zhong. "Iterated Bernstein polynomial approximations." arXiv preprint arXiv:0909.0684 (2009).

[^10]: Güntürk, C. Sinan, and Weilin Li. "Approximation with one-bit polynomials in Bernstein form", arXiv:2112.09183 (2021); Constructive Approximation, pp.1-30 (2022).

[^11]: Güntürk, C. Sinan, and Weilin Li. "Approximation of functions with one-bit neural networks", arXiv:2112.09181 (2021).

[^12]: Tachev, Gancho. "Linear combinations of two Bernstein polynomials", Mathematical Foundations of Computing, 2022.

[^13]: Butzer, P.L., "Linear combinations of Bernstein polynomials", Canadian Journal of Mathematics 15 (1953).

[^14]: Mossel, Elchanan, and Yuval Peres. New coins from old: computing with unknown bias. Combinatorica, 25(6), pp.707-724, 2005.

[^15]: Wästlund, J., "Functions arising by coin flipping", 1999.

[^16]: Etessami, K. And Yannakakis, M., "Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations", Journal of the ACM 56(1), pp.1-66, 2009.

[^17]: Banderier, C. And Drmota, M., 2015. Formulae and asymptotics for coefficients of algebraic functions. Combinatorics, Probability and Computing, 24(1), pp.1-53.

[^18]: Lee, Sang Kyu, Jae Ho Chang, and Hyoung-Moon Kim. "Further sharpening of Jensen's inequality." Statistics 55, no. 5 (2021): 1154-1168.