# Randomized Estimation Algorithms

## Introduction

This page presents general-purpose algorithms for estimating the mean value of a stream of random variates, or estimating the mean value of a function of those numbers. The estimates are either *unbiased* (they have no systematic bias from the true mean value), or they come close to the true value with a user-specified error tolerance.

The algorithms are described to make them easy to implement by programmers.

## Concepts

The following concepts are used in this document.

Each algorithm takes a stream of independent random variates (numbers). These numbers follow a *probability distribution* or simply *distribution*, or a rule that says which kinds of numbers are more likely to occur than others. A distribution has the following properties.

- The *expectation*, *expected value*, or *mean* is the average value of the distribution. It is expressed as **E**[*X*], where *X* is a number taken from the stream. In other words, take random samples and then take their average; the average will approach the expected value as the number of samples gets large (the *law of large numbers*).
- An *n*^{th} *moment* is the expected value of *X*^{n}. In other words, take random samples, raise them to the power *n*, then take their average. Then with probability 1, the average approaches the *n*^{th} moment as the number of samples gets large.
- An *n*^{th} *central moment* (about the mean) is the expected value of (*X* − *μ*)^{n}, where *μ* is the distribution's mean. The 2nd central moment is called the *variance*.
- An *n*^{th} *central absolute moment* (c.a.m.) is the expected value of abs(*X* − *μ*)^{n}, where *μ* is the distribution's mean. This is the same as the central moment when *n* is even.

Some distributions don't have an *n*^{th} moment for a particular *n*. This usually means the *n*^{th} power of the stream's numbers varies so wildly that it can't be estimated accurately. If a distribution has an *n*^{th} moment, it also has a *k*^{th} moment for any *k* in the interval [1, *n*).

For any estimation algorithm, the *relative error* is abs(*est*/*trueval* − 1), where *est* is the estimate and *trueval* is the true expected value.
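To make these definitions concrete, the following Python sketch (an illustration added here, not part of any referenced algorithm) estimates the mean, the 2nd central moment, and the 2nd c.a.m. of an exponential stream whose true mean and variance are both 1:

```python
import random

random.seed(12345)

# Draw samples from an exponential distribution whose mean and
# variance are both 1.
n = 100000
samples = [random.expovariate(1.0) for _ in range(n)]

# Sample mean: approaches E[X] by the law of large numbers.
mean = sum(samples) / n

# 2nd central moment (the variance): average of (X - mu)^2.
var = sum((x - mean) ** 2 for x in samples) / n

# 2nd central absolute moment: equals the central moment for even n.
cam2 = sum(abs(x - mean) ** 2 for x in samples) / n

print(mean, var, cam2)  # all three are close to 1
```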

## A Relative-Error Algorithm for a Bernoulli Stream

The following algorithm from Huber (2017)^{(1)} estimates the probability that a stream of random zeros and ones produces the number 1. The algorithm's relative error is independent of that probability, however, and the algorithm produces *unbiased* estimates. Specifically, the stream of numbers has the following properties:

- The stream produces only zeros and ones (that is, the stream follows the **Bernoulli distribution**).
- The stream of numbers can't take on the value 0 with probability 1.
- The stream's mean (expected value) is unknown.

The algorithm, also known as *Gamma Bernoulli Approximation Scheme*, has the following parameters:

- *ε*, *δ*: Both parameters must be greater than 0; in addition, *ε* must be 3/4 or less, and *δ* must be less than 1.

With this algorithm, the relative error will be no greater than *ε* with probability 1 − *δ* or greater. However, the estimate can be higher than 1 with probability greater than 0.

The algorithm, called **Algorithm A** in this document, follows.

1. Calculate the minimum number of samples *k*. There are two suggestions. The simpler one is *k* = ceil(−6\*ln(2/*δ*)/(*ε*^{2}\*(4\**ε*−3))). A more complicated one is the smallest integer *k* such that gammainc(*k*, (*k*−1)/(1+*ε*)) + (1 − gammainc(*k*, (*k*−1)/(1−*ε*))) ≤ *δ*, where gammainc is the regularized lower incomplete gamma function.
2. Take samples from the stream until *k* 1's are taken this way. Let *r* be the total number of samples taken this way.
3. Generate *g*, a gamma random variate with shape parameter *r* and scale 1, then return (*k*−1)/*g*.
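The steps above can be sketched in Python as follows. This is a minimal illustration, not a vetted implementation; it uses the simpler choice of *k*, and the Bernoulli stream is modeled as a zero-argument function (the coin bias 0.3 in the usage line is an arbitrary test value):

```python
import math
import random

def algorithm_a(stream, eps, delta):
    """Gamma Bernoulli Approximation Scheme (Algorithm A), a sketch.

    stream: zero-argument function returning 0 or 1.
    Requires 0 < eps <= 3/4 and 0 < delta < 1.
    """
    # Simpler suggestion for the minimum number of 1's to collect.
    k = math.ceil(-6 * math.log(2 / delta) / (eps * eps * (4 * eps - 3)))
    ones = 0
    r = 0  # total number of samples taken
    while ones < k:
        r += 1
        ones += stream()
    # Divide k - 1 by a gamma(r, 1) variate; the result is unbiased.
    return (k - 1) / random.gammavariate(r, 1)

# Example: a coin with (nominally unknown) heads probability 0.3.
random.seed(1)
est = algorithm_a(lambda: 1 if random.random() < 0.3 else 0, eps=0.1, delta=0.05)
```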

Notes:

1. As noted in Huber 2017, if we have a stream of random variates that take on values in the interval [0, 1], but have unknown mean, we can transform each number by:

    - generating a uniform(0, 1) random variate *u*, then
    - changing that number to 1 if *u* is less than that number, or 0 otherwise,

    and we can use the new stream of zeros and ones in the algorithm to get an unbiased estimate of the unknown mean.
2. As can be seen in Feng et al. (2016)^{(2)}, the following is equivalent to steps 2 and 3 of *Algorithm A*: "Let G be 0. Do this *k* times: 'Flip a coin until it shows heads, let *r* be the number of flips (including the last), generate a gamma random variate with shape parameter *r* and scale 1, and add that variate to G.' The estimated probability of heads is then (*k*−1)/G." The following is likewise equivalent if the stream of random variates follows a (zero-truncated) "geometric" distribution with unknown mean: "Let G be 0. Do this *k* times: 'Take a sample from the stream, call it *r*, generate a gamma random variate with shape parameter *r* and scale 1, and add that variate to G.' The estimated mean is then (*k*−1)/G." (This is with the understanding that the geometric distribution is defined differently in different academic works.) The geometric algorithm produces unbiased estimates just like *Algorithm A*.
3. The generation of a gamma random variate and the division by that variate can cause numerical errors in practice, such as rounding and cancellations, unless care is taken.

## A Relative-Error Algorithm for a Bounded Stream

The following algorithm comes from Huber and Jones (2019)^{(3)}; see also Huber (2017)^{(4)}. It estimates the expected value of a stream of random variates with the following properties:

- The numbers in the stream lie in the closed interval [0, 1].
- The stream of numbers can't take on the value 0 with probability 1.
- The stream's mean (expected value) is unknown.

The algorithm has the following parameters:

- *ε*, *δ*: Both parameters must be greater than 0; in addition, *ε* must be 1/8 or less, and *δ* must be less than 1.

With this algorithm, the relative error will be no greater than *ε* with probability 1 − *δ* or greater. However, the estimate has a nonzero probability of being higher than 1.

This algorithm is not guaranteed to produce unbiased estimates.

The algorithm, called **Algorithm B** in this document, follows.

1. Set *k* to ceil(2\*ln(6/*δ*)/*ε*^{2/3}).
2. Set *b* to 0 and *n* to 0.
3. (Stage 1: Modified gamma Bernoulli approximation scheme.) While *b* is less than *k*:
    1. Add 1 to *n*.
    2. Take a sample from the stream, call it *s*.
    3. With probability *s* (for example, if a newly generated uniform(0, 1) random variate is less than *s*), add 1 to *b*.
4. Set *gb* to *k* + 2, then generate a gamma random variate with shape parameter *n* and scale 1, then divide *gb* by that variate.
5. (Find the sample size for the next stage.) Set *c1* to 2\*ln(3/*δ*).
6. Generate a Poisson random variate with mean *c1*/(*ε*\**gb*), call it *n*.
7. Run the standard deviation sub-algorithm (given later) *n* times. Set *A* to the number of 1's returned by that sub-algorithm this way.
8. Set *csquared* to (*A*/*c1* + 1/2 + sqrt(*A*/*c1* + 1/4)) \* (1 + *ε*^{1/3})^{2} \* *ε*/*gb*.
9. Set *n* to ceil((2\*ln(6/*δ*)/*ε*^{2})/(1−*ε*^{1/3})), or an integer greater than this.
10. (Stage 2: Light-tailed sample average.) Set *e0* to *ε*^{1/3}.
11. Set *mu0* to *gb*/(1−*e0*^{2}).
12. Set *alpha* to *ε*/(*csquared*\**mu0*).
13. Set *w* to *n*\**mu0*.
14. Do the following *n* times:
    1. Get a sample from the stream, call it *g*. Set *s* to *alpha*\*(*g*−*mu0*).
    2. If *s* ≥ 0, add ln(1+*s*+*s*\**s*/2)/*alpha* to *w*. Otherwise, subtract ln(1−*s*+*s*\**s*/2)/*alpha* from *w*.
15. Return *w*/*n*.

The standard deviation sub-algorithm follows.

1. Generate an unbiased random bit. If that bit is 1 (which happens with probability 1/2), return 0.
2. Get two samples from the stream, call them *x* and *y*.
3. Generate a uniform(0, 1) random variate, call it *u*.
4. If *u* is less than (*x*−*y*)^{2}, return 1. Otherwise, return 0.
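The two stages and the sub-algorithm above can be sketched in Python as follows. This is an illustration only; the Poisson sampler is a simple helper (the Python standard library has no built-in Poisson generator), and the uniform test stream on [0, 0.6] is an arbitrary choice:

```python
import math
import random

def std_dev_subalgorithm(stream):
    # Return 1 with probability equal to half the expected squared
    # difference of two samples (used to gauge the variance).
    if random.random() < 0.5:
        return 0
    x, y = stream(), stream()
    return 1 if random.random() < (x - y) ** 2 else 0

def poisson(mean):
    # Illustrative Poisson sampler via unit-rate exponential interarrivals.
    count = 0
    t = random.expovariate(1.0)
    while t < mean:
        count += 1
        t += random.expovariate(1.0)
    return count

def algorithm_b(stream, eps, delta):
    """Relative-error mean estimate for a [0, 1] stream (Algorithm B), a sketch."""
    k = math.ceil(2 * math.log(6 / delta) / eps ** (2 / 3))
    b = n = 0
    # Stage 1: modified gamma Bernoulli approximation scheme.
    while b < k:
        n += 1
        if random.random() < stream():
            b += 1
    gb = (k + 2) / random.gammavariate(n, 1)
    # Find the sample size for stage 2.
    c1 = 2 * math.log(3 / delta)
    trials = poisson(c1 / (eps * gb))
    a = sum(std_dev_subalgorithm(stream) for _ in range(trials))
    csquared = ((a / c1 + 1 / 2 + math.sqrt(a / c1 + 1 / 4))
                * (1 + eps ** (1 / 3)) ** 2 * eps / gb)
    n = math.ceil((2 * math.log(6 / delta) / eps ** 2) / (1 - eps ** (1 / 3)))
    # Stage 2: light-tailed sample average.
    e0 = eps ** (1 / 3)
    mu0 = gb / (1 - e0 * e0)
    alpha = eps / (csquared * mu0)
    w = n * mu0
    for _ in range(n):
        s = alpha * (stream() - mu0)
        if s >= 0:
            w += math.log1p(s + s * s / 2) / alpha   # ln(1 + s + s*s/2)
        else:
            w -= math.log1p(-s + s * s / 2) / alpha  # ln(1 - s + s*s/2)
    return w / n

# Example: uniform stream on [0, 0.6] with true mean 0.3.
random.seed(7)
est = algorithm_b(lambda: random.random() * 0.6, eps=0.12, delta=0.1)
```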

Notes:

1. As noted in Huber and Jones, if the stream of random variates takes on values in the interval [0, *m*], where *m* is a known number, we can divide the stream's numbers by *m* before using them in *Algorithm B*, and the algorithm will still work.
2. While this algorithm is exact in theory (assuming computers can store real numbers of any precision), practical implementations of it can cause numerical errors, such as rounding and cancellations, unless care is taken.

## An Absolute-Error Adaptive Algorithm

The following algorithm comes from Kunsch et al. (2019)^{(5)}. It estimates the mean of a stream of random variates with the following properties:

- The distribution of numbers in the stream has a finite *q*^{th} central absolute moment and a finite *p*^{th} central absolute moment (called the *q*^{th} c.a.m. and *p*^{th} c.a.m., respectively, in this section).
- The exact *q*^{th} c.a.m. and *p*^{th} c.a.m. need not be known in advance.
- The *q*^{th} c.a.m.'s *q*^{th} root divided by the *p*^{th} c.a.m.'s *p*^{th} root is no more than *κ*, where *κ* is 1 or greater. (Note that the *q*^{th} c.a.m.'s *q*^{th} root is also known as the *standard deviation* if *q* = 2, and the *mean deviation* if *q* = 1; similarly for *p*.)

The algorithm works by first estimating the *p*^{th} c.a.m. of the stream, then using the estimate to determine a sample size for the next step, which actually estimates the stream's mean.

This algorithm is not guaranteed to produce unbiased estimates.

The algorithm has the following parameters:

- *ε*, *δ*: Both parameters must be greater than 0, and *δ* must be less than 1. The algorithm will return an estimate within *ε* of the true expected value with probability 1 − *δ* or greater, and the estimate will not go beyond the bounds of the stream's numbers. The algorithm is not guaranteed to maintain a finite mean squared error or expected error in its estimates.
- *p*: The degree of the *p*^{th} c.a.m. that the algorithm will estimate to determine the mean.
- *q*: The degree of the *q*^{th} c.a.m. *q* must be greater than *p*.
- *κ*: The maximum value allowed for the following ratio: the *q*^{th} c.a.m.'s *q*^{th} root divided by the *p*^{th} c.a.m.'s *p*^{th} root. (If *p* = 2 and *q* = 4, this is the maximum value allowed for the kurtosis's 4th root (Hickernell et al. 2012)^{(6)} ^{(7)}.) *κ* may not be less than 1.

Both *p* and *q* must be 1 or greater and are usually integers.

For example:

- With parameters *p* = 2, *q* = 4, *ε* = 1/10, *δ* = 1/16, *κ* = 1.1, the algorithm assumes the stream's numbers are distributed so that the kurtosis's 4th root, that is, the 4th c.a.m.'s 4th root (*q* = 4) divided by the standard deviation (*p* = 2), is no more than 1.1 (or alternatively, the kurtosis is no more than 1.1^{4} = 1.4641), and will return an estimate that's within 1/10 of the true mean with probability at least (1 − 1/16) or 15/16.
- With parameters *p* = 1, *q* = 2, *ε* = 1/10, *δ* = 1/16, *κ* = 2, the algorithm assumes the stream's numbers are distributed so that the standard deviation (*q* = 2) divided by the mean deviation (*p* = 1) is no more than 2, and will return an estimate that's within 1/10 of the true mean with probability at least (1 − 1/16) or 15/16.

The algorithm, called **Algorithm C** in this document, follows.

1. If *κ* is 1:
    1. Set *n* to ceil(ln(1/*δ*)/ln(2))+1 (or an integer greater than this).
    2. Get *n* samples from the stream and return (*mn*+*mx*)/2, where *mx* is the highest sample and *mn* is the lowest.
2. Set *k* to ceil((2\*ln(1/*δ*))/ln(4/3)). If *k* is even, add 1 to *k*.
3. Set *kp* to *k*.
4. Set *κ* to *κ*^{(p\*q/(q−p))}.
5. If *q* is 2 or less:
    1. Set *m* to ceil(3\**κ*\*48^{1/(q−1)}) (or an integer greater than this); set *s* to 1+1/(*q*−1); set *h* to 16^{1/(q−1)}\**κ*/*ε*^{s}.
6. If *q* is greater than 2:
    1. Set *m* to ceil(144\**κ*); set *s* to 2; set *h* to 16\**κ*/*ε*^{s}.
7. (Stage 1: Estimate the *p*^{th} c.a.m. to determine the number of samples for stage 2.) Create *k* many blocks. For each block:
    1. Get *m* samples from the stream.
    2. Add the samples and divide by *m* to get this block's sample mean, *mean*.
    3. Calculate the estimate of the *p*^{th} c.a.m. for this block, which is (*∑*_{i = 0, ..., m−1} abs(*block*[*i*] − *mean*)^{p})/*m*, where *block*[*i*] is the sample at position *i* of the block (positions start at 0).
8. (Find the median of the *p*^{th} c.a.m. estimates.) Sort the estimates calculated by step 7 in ascending order, and set *median* to the value in the middle of the sorted list (at position floor(*k*/2) with positions starting at 0); this works because *k* is odd.
9. (Calculate the sample size for the next stage.) Set *mp* to max(1, ceil(*h*\**median*^{s})), or an integer greater than this.
10. (Stage 2: Estimate the sample mean.) Create *kp* many blocks. For each block:
    1. Get *mp* samples from the stream.
    2. Add the samples and divide by *mp* to get this block's sample mean.
11. (Find the median of the sample means. This is definitely an unbiased estimate of the mean when *kp* is 1 or 2, but unfortunately, it isn't one for any *kp* > 2.) Sort the sample means from step 10 in ascending order, and return the value in the middle of the sorted list (at position floor(*kp*/2) with positions starting at 0); this works because *kp* is odd.
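The steps above can be sketched in Python as follows. This is an illustration, not a vetted implementation; the test stream of exponential variates, for which *κ* = 9^{1/4} = sqrt(3) when *p* = 2 and *q* = 4, is taken from the examples in this section:

```python
import math
import random
from statistics import median

def algorithm_c(stream, eps, delta, p, q, kappa):
    """Kunsch et al. absolute-error mean estimate (Algorithm C), a sketch."""
    if kappa == 1:
        n = math.ceil(math.log(1 / delta) / math.log(2)) + 1
        xs = [stream() for _ in range(n)]
        return (min(xs) + max(xs)) / 2
    k = math.ceil(2 * math.log(1 / delta) / math.log(4 / 3))
    if k % 2 == 0:
        k += 1
    kp = k
    kappa = kappa ** (p * q / (q - p))
    if q <= 2:
        m = math.ceil(3 * kappa * 48 ** (1 / (q - 1)))
        s = 1 + 1 / (q - 1)
        h = 16 ** (1 / (q - 1)) * kappa / eps ** s
    else:
        m = math.ceil(144 * kappa)
        s = 2
        h = 16 * kappa / eps ** 2
    # Stage 1: median of k block estimates of the p-th c.a.m.
    cams = []
    for _ in range(k):
        block = [stream() for _ in range(m)]
        mean = sum(block) / m
        cams.append(sum(abs(x - mean) ** p for x in block) / m)
    med = median(cams)
    # Sample size for stage 2, from the c.a.m. estimate.
    mp = max(1, math.ceil(h * med ** s))
    # Stage 2: median of kp block means.
    means = []
    for _ in range(kp):
        blk = [stream() for _ in range(mp)]
        means.append(sum(blk) / mp)
    return median(means)

# Example: exponential stream with (nominally unknown) mean 1; the
# section's examples give kappa = 9**(1/4) = sqrt(3) for p = 2, q = 4.
random.seed(3)
est = algorithm_c(lambda: random.expovariate(1.0), eps=0.2, delta=0.1,
                  p=2, q=4, kappa=3 ** 0.5)
```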

Notes:

1. If the stream of random variates meets the condition for *Algorithm C* for a given *q*, *p*, and *κ*, then it still meets that condition when those numbers are multiplied by a constant or when a constant is added to them.
2. Theorem 3.4 of Kunsch et al. (2019)^{(5)} shows that there is no mean estimation algorithm that:

    - produces an estimate within a user-specified error tolerance (in terms of *absolute error*, as opposed to *relative error*) with probability greater than a user-specified value, and
    - works for all streams whose distribution is known only to have finite moments (the moments are bounded but the bounds are unknown).

Examples:

1. To estimate the probability of heads of a coin that produces either 1 with an unknown probability in the interval [*μ*, 1−*μ*], or 0 otherwise, we can take *q* = 4, *p* = 2, and *κ* ≥ (1/min(*μ*, 1−*μ*))^{1/4} (Kunsch et al. 2019, Lemma 3.6).
2. The kurtosis of a Poisson distribution with mean *μ* is (3 + 1/*μ*). Thus, for example, to estimate the mean of a stream of Poisson variates with mean *ν* or greater but otherwise unknown, we can take *q* = 4, *p* = 2, and *κ* ≥ (3 + 1/*ν*)^{1/4}.
3. The kurtosis of an exponential distribution is 9 regardless of its rate. Thus, to estimate the mean of a stream of exponential variates with unknown mean, we can take *q* = 4, *p* = 2, and *κ* ≥ 9^{1/4} = sqrt(3).

## Estimating a Function of the Mean

*Algorithm C* can be used to estimate a function of the mean of a stream of random variates with unknown mean. Specifically, the goal is to estimate *f*(**E**[**z**]), where:

- **z** is a number produced by the stream. Each number produced by the stream must lie in the interval [0, 1].
- *f* is a known continuous function that maps the closed interval [0, 1] to [0, 1].
- The stream's numbers can take on a single value with probability 1.

The following algorithm takes the following parameters:

- *p*, *q*, and *κ* are as defined in *Algorithm C*.
- *ε*, *δ*: The algorithm will return an estimate within *ε* of *f*(**E**[**z**]) with probability 1 − *δ* or greater, and the estimate will be in the interval [0, 1].

The algorithm, like *Algorithm C*, works only if the stream's distribution has the following technical property: The *q*^{th} c.a.m.'s *q*^{th} root divided by the *p*^{th} c.a.m.'s *p*^{th} root is no more than *κ*, where *κ* is 1 or greater. The algorithm, called **Algorithm D** in this document, follows.

1. Calculate *γ* as a number equal to or less than *ψ*(*ε*), the *inverse modulus of continuity*, which is found by taking the so-called *modulus of continuity* of *f*(*x*), call it *ω*(*h*), and solving the equation *ω*(*h*) = *ε* for *h*.
    - Loosely speaking, a modulus of continuity *ω*(*h*) gives the maximum range of *f* in a window of size *h*.
    - For example, if *f*'s slope is never vertical and *f* has only finitely many "sharp turns", then *f* is *Lipschitz continuous* and its modulus of continuity is *ω*(*h*) = *M*\**h*, where *M* is the Lipschitz constant, which in this case is the maximum absolute value of *f*'s "slope function". The solution for *ψ* is then *ψ*(*ε*) = *ε*/*M*.
    - Because *f* is continuous on a closed interval, it's guaranteed to have a modulus of continuity (by the Heine–Cantor theorem; see also a **related question**).
2. Run *Algorithm C* with the given parameters *p*, *q*, *κ*, and *δ*, but with *ε* = *γ*. Let *μ* be the result.
3. Return *f*(*μ*).

A simpler version of *Algorithm D* was given as an answer to the linked-to question; see also Jiang and Hickernell (2014)^{(8)}. As with *Algorithm D*, this algorithm will return an estimate within *ε* of *f*(**E**[**z**]) with probability 1 − *δ* or greater, and the estimate will be in the interval [0, 1]. The algorithm, called **Algorithm E** in this document, follows.

1. Calculate *γ* as given in step 1 of *Algorithm D*.
2. (Calculate the sample size.) Set *n* to ceil(ln(2/*δ*)/(2\**γ*^{2})). (As the answer notes, this sample size is based on Hoeffding's inequality.)
3. (Calculate the sample mean.) Get *n* samples from the stream, sum them, then divide the sum by *n*, and call the result *μ*. Return *f*(*μ*).
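A minimal Python sketch of these three steps follows. The choice *f*(*x*) = *x*\**x* is an illustrative assumption; it is Lipschitz continuous on [0, 1] with constant 2, so *ψ*(*ε*) = *ε*/2:

```python
import math
import random

def algorithm_e(stream, f, eps, delta, psi):
    """Hoeffding-based estimate of f(E[z]) (Algorithm E), a sketch.

    psi: the inverse modulus of continuity of f (step 1 of Algorithm D).
    """
    gamma = psi(eps)
    n = math.ceil(math.log(2 / delta) / (2 * gamma * gamma))
    mu = sum(stream() for _ in range(n)) / n
    return f(mu)

# Example (illustrative): f(x) = x*x is Lipschitz on [0, 1] with
# constant 2, so psi(eps) = eps/2. True value: f(1/2) = 1/4.
random.seed(4)
est = algorithm_e(lambda: random.random(), lambda x: x * x,
                  eps=0.05, delta=0.05, psi=lambda e: e / 2)
```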

If the stream is **unbounded** (can take on any real number) and its distribution has a **known upper bound on the standard deviation** *σ* (**or the variance** *σ*^{2}), then a similar algorithm follows from Chebyshev's inequality. This was mentioned as Equation 14 in Hickernell et al. (2012/2013)^{(6)}, but is adapted to find the mean for *f*(*x*), which must be bounded and continuous on every closed interval of the real line. The algorithm will return an estimate within *ε* of *f*(**E**[**z**]) with probability 1 − *δ* or greater, and the estimate will not go beyond the bounds of the stream's numbers. The algorithm, called **Algorithm F** in this document, follows.

1. Calculate *γ* as given in step 1 of *Algorithm D*.
2. (Calculate the sample size.) Set *n* to ceil(*σ*^{2}/(*δ*\**γ*^{2})).
3. (Calculate the sample mean.) Get *n* samples from the stream, sum them, then divide the sum by *n*, and call the result *μ*. Return *f*(*μ*).
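A minimal sketch of *Algorithm F* follows; the normal test stream and the choice *f*(*x*) = *x* (for which *ψ*(*ε*) = *ε*) are illustrative assumptions:

```python
import math
import random

def algorithm_f(stream, f, eps, delta, sigma, psi):
    """Chebyshev-based estimate of f(E[z]) for an unbounded stream with
    known standard-deviation bound sigma (Algorithm F), a sketch."""
    gamma = psi(eps)
    n = math.ceil(sigma * sigma / (delta * gamma * gamma))
    mu = sum(stream() for _ in range(n)) / n
    return f(mu)

# Example (illustrative): a normal stream with mean 2 and standard
# deviation 1, with f(x) = x. True value: 2.
random.seed(5)
est = algorithm_f(lambda: random.gauss(2, 1), lambda x: x,
                  eps=0.1, delta=0.1, sigma=1, psi=lambda e: e)
```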

Notes:

1. *Algorithm D* and *Algorithm E* won't work in general when *f*(*x*) has jump discontinuities (this happens in general when *f* is piecewise continuous, or made up of independent continuous pieces that cover all of [0, 1]), at least when *ε* is equal to or less than the maximum jump among all the jump discontinuities (see also a related question).
2. *Algorithm E* can be used to build so-called "approximate Bernoulli factories", or algorithms that approximately sample the probability *f*(*λ*) given a coin with probability of heads of *λ*. In this case, the stream of numbers should produce only zeros and ones (and thus follow the **Bernoulli distribution**), and *f* should be a continuous function. The approximate Bernoulli factory would work as follows: After running *Algorithm E* and getting an estimate, generate a uniform random variate in [0, 1), then return 1 if that variate is less than the estimate, or 0 otherwise.
3. *Algorithm D* and *Algorithm E* can be adapted to streams outputting numbers in a bounded interval [*a*, *b*] (where *a* and *b* are known rational numbers), but with unknown mean, and with *f* being a continuous function that maps [*a*, *b*] to [*a*, *b*], as follows:

    - For each number in the stream, subtract *a* from it, then divide it by (*b*−*a*).
    - Instead of *ε*, take *ε*/(*b*−*a*).
    - If the algorithm would return *f*(*μ*), instead return *g*(*μ*) where *g*(*μ*) = *f*(*a* + (*μ*\*(*b*−*a*))).
4. *Algorithm E* and *Algorithm F* are not unbiased estimators in general. However, when *f*(*x*) = *x*, the sample mean used by both algorithms is an unbiased estimator of the mean as long as the sample size *n* is unchanged.

Examples:

1. Take *f*(*x*) = sin(*π*\**x*\*4)/2 + 1/2. This is a Lipschitz continuous function with Lipschitz constant 2\**π*, so for this *f*, *ψ*(*ε*) = *ε*/(2\**π*). Now, if we have a coin that produces heads with an unknown probability in the interval [*μ*, 1−*μ*], or 0 otherwise, we can run *Algorithm D* or *Algorithm E* with *q* = 4, *p* = 2, and *κ* ≥ (1/min(*μ*, 1−*μ*))^{1/4} (see the section on *Algorithm C*).
2. Take *f*(*x*) = *x*. This is a Lipschitz continuous function with Lipschitz constant 1, so for this *f*, *ψ*(*ε*) = *ε*/1.
3. The variance of a Poisson distribution with mean *μ* is *μ*. Thus, for example, to estimate the mean of a stream of Poisson variates with mean *ν* or less but otherwise unknown, we can take *σ* = sqrt(*ν*) so that the sample size *n* is ceil(*σ*^{2}/(*δ*\**ε*^{2})), in accordance with *Algorithm F*.

## Randomized Integration

Monte Carlo integration is a randomized way to estimate the integral ("area under the graph") of a function. *Algorithm C* can be used to estimate an integral of a function *h*(**z**), with the following properties:

- *h*(**z**) is a multidimensional function that takes an *n*-dimensional vector and returns a real number.
- *h*(**z**) is usually a function that's easy to evaluate but whose integral is hard to calculate.
- **z** is an *n*-dimensional vector chosen at random in the sampling domain.

The estimate will come within *ε* of the true integral with probability 1 − *δ* or greater, as long as the following conditions are met:

- The *q*^{th} c.a.m. of *h*(**z**) is finite. That is, **E**[abs(*h*(**z**)−**E**[*h*(**z**)])^{q}] is finite.
- The *q*^{th} c.a.m.'s *q*^{th} root divided by the *p*^{th} c.a.m.'s *p*^{th} root is no more than *κ*, where *κ* is 1 or greater.

Unfortunately, these conditions may be hard to verify in practice, especially when the distribution of *h*(**z**) is not known. (In fact, **E**[*h*(**z**)], as seen above, is the unknown integral that we seek to estimate.)

For this purpose, each number in the stream of random variates is generated as follows (see also Kunsch et al.):

1. Set **z** to an *n*-dimensional vector (list of *n* numbers) chosen at random in the sampling domain, independently of any other choice. Usually, **z** is chosen *uniformly* at random this way (see the note later in this section).
2. Calculate *h*(**z**), and set the next number in the stream to that value.

Alternatively, if *h*(**z**) can take on only numbers in the closed interval [0, 1], the much simpler *Algorithm E* can be used on the newly generated stream (taking *f*(*x*) = *x*), rather than *Algorithm C*.

The following example (coded in Python for the SymPy computer algebra library) shows how to find parameter *κ* for estimating the integral of min(*Z1*, *Z2*) where *Z1* and *Z2* are each uniformly chosen at random in the interval [0, 1]. It assumes *p* = 2 and *q* = 4. (This is a trivial example because we can calculate the integral directly — 1/3 — but it shows how to proceed for more complicated cases.)

```python
from sympy import S, Min, Max, Abs, pprint
from sympy.stats import Uniform, E

# Distribution of Z1 and Z2
u1 = Uniform('U1', 0, 1)
u2 = Uniform('U2', 0, 1)
# Function to estimate
func = Min(u1, u2)
emean = E(func)
p = S(2)  # Degree of p-th moment
q = S(4)  # Degree of q-th moment
# Calculate value for kappa
kappa = E(Abs(func - emean)**q)**(1/q) / E(Abs(func - emean)**p)**(1/p)
pprint(Max(1, kappa))
```
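Because min(*Z1*, *Z2*) takes values in [0, 1], the simpler route noted above, *Algorithm E* with *f*(*x*) = *x* (so that *γ* = *ε*), can also estimate this integral directly. A minimal sketch:

```python
import math
import random

def mc_integral(h, dim, eps, delta):
    """Estimate the integral of h over [0, 1]^dim by the sample mean
    (Algorithm E with f(x) = x, so gamma = eps), a sketch; h must map
    the domain into [0, 1]."""
    n = math.ceil(math.log(2 / delta) / (2 * eps * eps))
    total = 0.0
    for _ in range(n):
        z = [random.random() for _ in range(dim)]  # uniform point
        total += h(z)
    return total / n

# Example: integral of min(Z1, Z2) over the unit square (true value 1/3).
random.seed(6)
est = mc_integral(lambda z: min(z), dim=2, eps=0.02, delta=0.05)
```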

Note: As an alternative to the usual process of choosing a point uniformly in the *whole* sampling domain, *stratified sampling* (Kunsch and Rudolf 2018)^{(9)}, which divides the sampling domain into equally sized boxes and finds the mean of random points in those boxes, can be described as follows (assuming the sampling domain is the *d*-dimensional hypercube [0, 1]^{d}):

1. For a sample size *n*, set *m* to floor(*n*^{1/d}), where *d* is the number of dimensions in the sampling domain (the number of components of each point). Set *s* to 0.
2. For each *i*[1] in [0, *m*), do: For each *i*[2] in [0, *m*), do: ..., For each *i*[*d*] in [0, *m*), do:
    1. For each dimension *j* in [1, *d*], set *p*[*j*] to a number in the half-open interval [*i*[*j*]/*m*, (*i*[*j*]+1)/*m*) chosen uniformly at random.
    2. Add *f*((*p*[1], *p*[2], ..., *p*[*d*])) to *s*.
3. Return *s*/*m*^{d}.

The paper also implied a sample size *n* for use in stratified sampling when *f* is *β*-Hölder continuous (is continuous and no "steeper" than *z*^{β}) and is defined on [0, 1]^{d}, namely *n* = ceil((ln(2/*δ*)/(2\**ε*^{2}))^{d/(2\**β*+d)}).

## Finding Biased Coins

Given *m* coins with unknown probability of heads, the following algorithm finds the *k* coins that are most likely to show heads, such that the algorithm correctly finds them with probability at least 1 − *δ*. It uses the following parameters:

- *k* is the number of coins to return.
- *δ* is the confidence level; the algorithm correctly finds the most biased coins with probability at least 1 − *δ*.
- *D* is the *gap parameter* or a lesser number, but must be 0 or greater. The *gap parameter* is the difference between the probability of heads of the *k*^{th} most biased coin and that of the (*k*+1)^{th} most biased coin. Practically speaking, *D* is the smallest possible difference between one probability of heads and another.
- *r* is the number of rounds to run the algorithm and must be an integer 1 or greater.

In this section, ilog(*a*, *r*) means either *a* if *r* is 0, or max(ln(ilog(*a*, *r*−1)), 1) otherwise.

Agarwal et al. (2017)^{(10)} called this algorithm "aggressive elimination", and it can be described as follows.

1. Let *t* be ceil((ilog(*m*, *r*) + ln(8\**k*/*δ*)) \* 2/(*D*\**D*)).
2. For each integer *i* in [1, *m*], flip the coin labeled *i*, *t* many times, then set *P*[*i*] to a list of two items: first, the number of times coin *i* showed heads; second, the label *i*.
3. Sort the *P*[*i*] in decreasing order by their values.
4. If *r* is 1, return the labels of the first *k* items in the list *P*, and the algorithm is done.
5. Set *μ* to ceil(*k* + *m*/ilog(*m*, *r* − 1)).
6. Let *C* be the coins whose labels are given in the first *μ* items in the list (these are the *μ* many coins found to be the most biased by this algorithm).
7. If *μ* ≤ 2\**k*, do a recursive run of this algorithm, using only the coins in *C* and with *δ* = *δ*/2 and *r* = 1.
8. If *μ* > 2\**k*, do a recursive run of this algorithm, using only the coins in *C* and with *δ* = *δ*/2 and *r* = *r* − 1.
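A minimal Python sketch of aggressive elimination follows. Coins are modeled as zero-argument functions rather than labels, and the flip count *t* uses an illustrative constant in the paper's scaling of (ilog(*m*, *r*) + ln(*k*/*δ*) terms)/*D*^{2}:

```python
import math
import random

def ilog(a, r):
    # Iterated logarithm as defined in this section.
    return a if r == 0 else max(math.log(ilog(a, r - 1)), 1)

def aggressive_elimination(coins, k, delta, d, r):
    """Return the k coin objects most likely to show heads (a sketch).

    coins: list of zero-argument functions returning 0 or 1 when called.
    d: the gap parameter (or less); r: the number of rounds.
    The constant in the flip count t is an illustrative assumption.
    """
    m = len(coins)
    t = math.ceil((ilog(m, r) + math.log(8 * k / delta)) * 2 / (d * d))
    # Flip each coin t times; pair each head count with the coin's position.
    counts = [(sum(coin() for _ in range(t)), i) for i, coin in enumerate(coins)]
    counts.sort(reverse=True)  # most heads first
    if r == 1:
        return [coins[i] for _, i in counts[:k]]
    mu = math.ceil(k + m / ilog(m, r - 1))
    chosen = [coins[i] for _, i in counts[:mu]]  # mu most biased so far
    next_r = 1 if mu <= 2 * k else r - 1
    return aggressive_elimination(chosen, k, delta / 2, d, next_r)

# Example: five coins; the two most biased have heads probabilities 0.9 and 0.8.
random.seed(9)
probs = [0.2, 0.5, 0.8, 0.3, 0.9]
coins = [(lambda p=p: 1 if random.random() < p else 0) for p in probs]
best = aggressive_elimination(coins, k=2, delta=0.05, d=0.1, r=2)
```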

## Requests and Open Questions

Let *X* be an endless stream of random variates and let *f*(*x*) be a known continuous function.

1. Is there an algorithm, besides *Algorithm C* or *Algorithm F*, that can find **E**[*X*] (or *f*(**E**[*X*])) with either a high probability of a "small" absolute error or one of a "small" relative error, when the distribution of *X* is unbounded, and additional assumptions on the distribution of *X* apply, such as:

    - being unimodal (having one peak) and symmetric (mirrored on each side of the peak), and/or
    - following a geometric distribution, and/or
    - having decreasing or nonincreasing probabilities?

    Notice that merely having finite moments is not enough (Theorem 3.4, Kunsch et al. 2019^{(5)}). Here, the accuracy tolerances for small error and high probability are user-specified. A relative-error algorithm for **E**[*X*] for the geometric distribution was already given in a note.
2. How can *Algorithm D* or *Algorithm E* be adapted to a known discontinuous function *g*, so that the algorithm finds *g*(**E**[*X*]) with either a high probability of a "small" absolute error or one of a "small" relative error at all points in [0, 1] except at a "negligible" area around *g*'s discontinuities? Is it enough to replace *g* with a continuous function *f* that equals *g* everywhere except at that "negligible" area? Here, the accuracy tolerances for small error, high probability, and "negligible" area are user-specified. Perhaps the tolerance could be defined as the integral (area under the graph) of absolute differences between *f* and *g* instead of a "negligible area"; in that case, how should the continuous *f* be built?
3. Is it true that *Algorithm F* remains valid when the sample size *n* is ceil(abs(*M*)/(*δ*\**γ*^{k})), given that the stream's distribution is known to have a maximum *k*^{th} central absolute moment of *M*?

## Notes

- ^{(1)} Huber, M., 2017. A Bernoulli mean estimate with known relative error distribution. Random Structures & Algorithms, 50(2), pp. 173-182. (Preprint in arXiv:1309.5413v2 [math.ST], 2015.)
- ^{(2)} Feng, J., et al. "Monte Carlo with User-Specified Relative Error." (2016).
- ^{(3)} Huber, Mark, and Bo Jones. "Faster estimates of the mean of bounded random variables." Mathematics and Computers in Simulation 161 (2019): 93-101.
- ^{(4)} Huber, Mark, "An optimal (*ε*, *δ*)-approximation scheme for the mean of random variables with bounded relative variance", arXiv:1706.01478, 2017.
- ^{(5)} Kunsch, Robert J., Erich Novak, and Daniel Rudolf. "Solvable integration problems and optimal sample size selection." Journal of Complexity 53 (2019): 40-67. Also at **https://arxiv.org/pdf/1805.08637.pdf**.
- ^{(6)} Hickernell, F.J., Jiang, L., et al., "Guaranteed Conservative Fixed Width Intervals via Monte Carlo Sampling", arXiv:1208.4318v3 [math.ST], 2012/2013.
- ^{(7)} As used here, kurtosis is the 4th c.a.m. divided by the square of the 2nd c.a.m.
- ^{(8)} Jiang, L., Hickernell, F.J., "Guaranteed Monte Carlo Methods for Bernoulli Random Variables", arXiv:1411.1151 [math.NA], 2014.
- ^{(9)} Kunsch, R.J., Rudolf, D., "Optimal confidence for Monte Carlo integration of smooth functions", arXiv:1809.09890, 2018.
- ^{(10)} Agarwal, A., Agarwal, S., et al., "Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons", *Proceedings of Machine Learning Research* 65 (2017).

## License

Any copyright to this page is released to the Public Domain. In case this is not possible, this page is also licensed under **Creative Commons Zero**.