The Sampling Problem

Peter Occil

This page is about a mathematical problem of sampling a probability distribution with unknown parameters. This problem can be described as sampling from a new distribution using an endless stream of random variates from an incompletely known distribution.

Suppose there is an endless stream of numbers, each generated at random and independently from each other, and as many numbers can be sampled from the stream as desired. Let $(X_0, X_1, X_2, X_3, …)$ be that endless stream, and call the numbers input values.

Let InDist be the probability distribution of these input values, and let $\lambda$ be an unknown parameter that determines the distribution InDist, such as its expected value (or mean or “long-run average”). Suppose the problem is to produce a random variate with a distribution OutDist that depends on the unknown parameter $\lambda$. Then, of the algorithms in the section “Sampling Distributions Using Incomplete Information”:

In all cases given above, each input value is independent of everything else.

There are numerous other cases of interest that are not covered in the algorithms above. An example is the case of Algorithm 5 except InDist is any discrete distribution, not just Bernoulli. 5 An interesting topic is to answer the following: In which cases (and for which functions $f$) can the problem be solved…

The answers to these questions will depend on—

An additional question is to find lower bounds on the input/output ratio that an algorithm can achieve as the number of inputs taken increases (e.g., Nacu and Peres (2005, Question 2)7).

My interest on the problem is in the existence and construction of simple-to-implement algorithms that solve the sampling problem given here. In addition, the cases that most interest me are when—

with or without other conditions.

Results

It should be noted that many special cases of the sampling problem have been studied and resolved in academic papers and books.

The problem here is one of bringing all these results together in one place.

The following are examples of results for this problem. The estimators are allowed to be randomized (to use outside randomness) unless specified otherwise.

There are also three other results on the existence of fixed-size and asymptotically unbiased estimators, but they are relatively hard to translate to this problem in a simple way: Liu and Brown (1993)16, Hirano and Porter (2012)17, Bickel and Lehmann (1969)18. Other results include Gajek (1995)19 (which has a result on building unbiased estimators from asymptotically unbiased ones), Rychlik (1995)20.

In a result closely related to the sampling problem, given a stream of independent random variates each distributed as $\varphi$ with probability $\lambda$ or as $Q$ otherwise (where $\varphi$ and $Q$ are probability distributions, $\varphi$ and $\lambda$ are known, and $Q$ is unknown), there is no way in general to generate a variate distributed as $Q$, even if values from $Q$ and $\varphi$ must come from the same set of numbers 21.

Question

For any case of the sampling problem, suppose the number of input values taken is random. If the number of inputs is allowed to depend on previously taken inputs, do more sequential unbiased estimators exist than otherwise?

Notes

License

Any copyright to this page is released to the Public Domain. In case this is not possible, this page is also licensed under Creative Commons Zero.

  1. Jacob, P.E., Thiery, A.H., “On nonnegative unbiased estimators”, Ann. Statist., Volume 43, Number 2 (2015), 769-784.  2 3 4 5

  2. Duvignau, R., “Maintenance et simulation de graphes aléatoires dynamiques”, Doctoral dissertation, Université de Bordeaux, 2015. 

  3. Lee, A., Doucet, A. and Łatuszyński, K., 2014. “Perfect simulation using atomic regeneration with application to Sequential Monte Carlo”, arXiv:1407.5770v1 [stat.CO]. 

  4. AKAHIRA, Masafumi, Kei TAKEUCHI, and Ken-ichi KOIKE. “Unbiased estimation in sequential binomial sampling”, Rep. Stat. Appl. Res., JUSE 39 1-13, 1992.  2

  5. Singh (1964, “Existence of unbiased estimates”, Sankhyā A 26) claimed that an estimation algorithm with expected value $f(\lambda)$ exists for a more general class of InDist distributions than the Bernoulli distribution, as long as there are polynomials that converge pointwise to $f$, and Bhandari and Bose (1990, “Existence of unbiased estimates in sequential binomial experiments”, Sankhyā A 52) claimed necessary conditions for those algorithms. However, Akahira et al. (1992) questioned the claims of both papers, and the latter paper underwent a correction, which I haven’t seen (Sankhyā A 55, 1993). 

  6. An algorithm that takes a finite number of inputs with probability 1 is also known as a closed sampling plan in papers and books about sequential estimation. 

  7. Nacu, Şerban, and Yuval Peres. “Fast simulation of new coins from old”, The Annals of Applied Probability 15, no. 1A (2005): 93-115. 

  8. Christman, M.C., Nayak, T.K., “Sequential unbiased estimation of the number of classes in a population”, Statistica Sinica 4(1), 1994.  2 3

  9. Christman and Nayak (1994) did not study the case when the estimator can use outside randomness or the case when $n$ is known to have a minimum of 2 or greater. Duvignau (2015) studied a closely related problem. 

  10. Lehmann, E.L., Theory of Point Estimation, 1983. 

  11. Paninski, Liam. “Estimation of Entropy and Mutual Information.” Neural Computation 15 (2003): 1191-1253. 

  12. R. Singh, “Existence of unbiased estimates”, Sankhyā A 26, 1964.  2

  13. In addition, Jacob and Thiery (2015) conjecture that this estimator exists if and only if $f$ is writable as $f(\lambda)=c_0 (\lambda-a)^0 + c_1 (\lambda-a)^1 + …$, where $c_0, c_1, …$ are all nonnegative. In that case, they showed that the random number of inputs need not depend on inputs already taken. 

  14. Keane, M. S., and O’Brien, G. L., “A Bernoulli factory”, ACM Transactions on Modeling and Computer Simulation 4(2), 1994.  2

  15. Goyal, V. and Sigman, K., 2012. On simulating a class of Bernstein polynomials. ACM Transactions on Modeling and Computer Simulation (TOMACS), 22(2), pp.1-5. 

  16. Liu., R.C., Brown, L.D., “Nonexistence of informative unbiased estimators in singular problems”, Annals of Statistics 21(1), 1993. 

  17. Hirano, Keisuke, and Jack R. Porter. “Impossibility results for nondifferentiable functionals.” Econometrica 80, no. 4 (2012): 1769-1790. 

  18. P. J. Bickel. E. L. Lehmann. “Unbiased Estimation in Convex Families.” Ann. Math. Statist. 40 (5) 1523 - 1535, October, 1969. https://doi.org/10.1214/aoms/1177697370 

  19. Gajek, L. (1995). Note on unbiased estimability of the larger of two mean values. Applicationes Mathematicae, 23(2), 239-245. 

  20. Rychlik, Tomasz. “A class of unbiased kernel estimates of a probability density function.” Applicationes Mathematicae 22, no. 4 (1995): 485-497. 

  21. Henderson, S.G., Glynn, P.W., “Nonexistence of a class of variate generation schemes”, Operations Research Letters 31 (2003). It is also believed that the paper’s Theorem 2 remains true even if $Q$ must be a polynomial.