Notes on Approximation Theory

This version of the document is dated 2026-06-01.

Peter Occil

The notes on this page generally relate to finding bounds on how close a polynomial is to a single-variable function on a compact interval.

The aim is to help find error bounds that are explicit, with no hidden constants and without introducing transcendental or trigonometric functions. An explicit error bound can be computed offline, without performing an approximation first, so that it is known in advance, for example, which degree of polynomial is needed to come within a given accuracy of a function.

The mapping from a function to a function (in this case, from a single-variable function to a polynomial “close” to it) is called an operator, and operators involved in these bounds are often linear operators, whose behavior is relatively simple to examine.

Notation and Definitions

For definitions of continuous, derivative, convex, concave, Hölder continuous, and Lipschitz continuous, see the definitions section in “Supplemental Notes for Bernoulli Factory Algorithms”.

Bernstein Form and Bernstein Polynomials

Among the best known examples of linear operators are the Bernstein polynomials.

In this document, a polynomial $P(x)$ is written in Bernstein form of degree $n$ if it is written as—

\[P(x)=\sum_{k=0}^n a_k \frac{n!}{(k!)((n-k)!)} x^k (1-x)^{n-k},\]

where $0\le x\le 1$ and the real numbers $a_0, …, a_n$ are the polynomial’s Bernstein coefficients.3

The degree-$n$ Bernstein polynomial of an arbitrary function $f(x)$ has Bernstein coefficients $a_k = f(k/n)$. In general, this Bernstein polynomial differs from $f$ even if $f$ is a polynomial. In this document, the degree-$n$ Bernstein polynomial of $f$ is denoted $B_n(f)$.

$B_n(f)$ is a positive linear operator. It maps a bounded function on the closed unit interval to a polynomial of degree $n$ or less on that interval.
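The definition above can be sketched in a few lines of Python. This is only an illustration; the function name `bernstein` is made up here and is not part of the original notes.

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate B_n(f)(x): the polynomial in Bernstein form of degree n
    whose Bernstein coefficients are a_k = f(k/n), for 0 <= x <= 1."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

# B_n reproduces linear functions exactly ...
linear_value = bernstein(lambda t: 3 * t + 1, 10, 0.3)
# ... but in general differs from f even if f is a polynomial:
# for f(t) = t^2 the result is x^2 + x(1-x)/n rather than x^2.
square_value = bernstein(lambda t: t * t, 10, 0.3)
```

Here `square_value` differs from $0.3^2$ by $0.3\cdot 0.7/10$, which illustrates the remark that $B_n(f)$ generally differs from $f$ even for polynomials.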

This page’s emphasis is on methods that produce polynomials in Bernstein form, ratios of such polynomials, or functions like $f(g(x))$ where $f$ and $g$ are such polynomials or ratios.

“Moments” of Linear Operators

To examine the approximation behavior of linear operators, it is helpful to find the so-called “moments” of those operators, that is, the functions they map certain functions to.

For a linear operator $L$, they are:

Because $L$ is linear, if $L(e_i) = e_i$ for each $i$ from 0 through $j$ ($j$ is zero or a positive integer), then:

Also, because $L$ is linear, the “moments” of degree up to $m$, say, lead to easy ways to find the mapping by $L$ of any polynomial of degree up to $m$, when the polynomial is written in “power” form.

Example: Let $f(x)$ be the polynomial $4x^3 - 6x^2 + 8x^1 - 10$. Then:

\[L(f) = 4L(e_3) - 6L(e_2) + 8L(e_1) - 10L(e_0).\]
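The example can be checked numerically for one particular linear operator. The sketch below takes $L=B_n$ and assumes that $e_i$ denotes the monomial $e_i(t)=t^i$ (its usual meaning); the helper names are illustrative only.

```python
from math import comb

def bernstein(f, n, x):
    """B_n(f)(x), used here as a concrete linear operator L."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def moment(i, n, x):
    """L(e_i)(x) for L = B_n, assuming e_i(t) = t**i."""
    return bernstein(lambda t: t**i, n, x)

def via_moments(n, x):
    """4 L(e_3) - 6 L(e_2) + 8 L(e_1) - 10 L(e_0), evaluated at x."""
    return (4 * moment(3, n, x) - 6 * moment(2, n, x)
            + 8 * moment(1, n, x) - 10 * moment(0, n, x))

def direct(n, x):
    """L applied directly to f(t) = 4t^3 - 6t^2 + 8t - 10."""
    return bernstein(lambda t: 4 * t**3 - 6 * t**2 + 8 * t - 10, n, x)
```

Both routes give the same value (up to rounding), which is all that linearity claims.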

“Moments” of Bernstein Polynomials

The following results deal with useful quantities when discussing the error in approximating a function by Bernstein polynomials.

Suppose a coin shows heads with probability $p$, and $n$ independent tosses of the coin are made, where $n$ is 1 or greater. Then the total number of heads $X$ follows a binomial distribution. The following are useful quantities of this distribution.

The following gives bounds on $M_{n,r}$; some results in approximation theory rely on bounds like these.6

Proposition 1: Let $r\ge 0$, and let $\sigma(r,t) = (r!)/(((r/2)!)t^{r/2})$. Then for real numbers $r$ and integers $n$ described in the following table, $M_{n, r}(p)\le \mu_{n,r}/n^{r/2}$, where $\mu_{n,r}$ is as given in the table.

| If $r$… | Then $\mu_{n,r}$ is… |
| --- | --- |
| Is an even integer. | $\sigma(r,6)$, for positive $n$. |
| Is an even integer, but not greater than 44. | $\sigma(r,8)$, for positive $n$. |
| Is 1. | $1/2$, for positive $n$. |
| Is odd, and $3\le r\le 43$. | $\sqrt{\sigma(r-1,8)\sigma(r+1,8)} = r^{1/2}(r-1)! / (2\cdot 8^{(r-1)/2}((r-1)/2)!)$, for $n\ge 2$. |
| Is odd and greater than 43. | $\sqrt{\sigma(r-1,6)\sigma(r+1,6)}$, for $n\ge 2$. |

Proof: The first row comes from a result of Adell and Cárdenas-Morales (2018)7. The second row is an improvement of the first, from Molteni (2022)8. The third row follows from Cheng (1983)9. The fourth and fifth rows follow from the second and first rows, respectively, together with the fact that, for odd $r$ and every integer $n\ge 2$, the absolute central moment of order $r$ is no greater than the geometric mean of those of orders $r-1$ and $r+1$, by Schwarz’s inequality (Weisstein)10 (see also Bojanić and Shisha 197511 for the case $r=4$). □
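The table in Proposition 1 can be turned into a small calculator. The sketch below returns the smallest $\mu_{n,r}$ that the table offers for an integer $r\ge 1$. It also includes a numerical spot check which assumes that $M_{n,r}(p)$ is the absolute central moment $\mathbb{E}[\text{abs}(X/n-p)^r]$ of the binomial variable $X$ described earlier; that reading is an assumption of this sketch, as are all function names.

```python
from math import comb, factorial, sqrt

def sigma(r, t):
    """sigma(r, t) = r! / ((r/2)! * t**(r/2)), for an even integer r >= 0."""
    return factorial(r) / (factorial(r // 2) * t ** (r // 2))

def mu(r):
    """Smallest mu_{n,r} given by the table, for an integer r >= 1.
    (The table states the rows for odd r >= 3 only for n >= 2.)"""
    if r == 1:
        return 0.5
    if r % 2 == 0:
        return sigma(r, 8) if r <= 44 else sigma(r, 6)
    t = 8 if r <= 43 else 6
    return sqrt(sigma(r - 1, t) * sigma(r + 1, t))

def abs_central_moment(n, r, p):
    """E[abs(X/n - p)**r] for X binomial(n, p); assumed meaning of M_{n,r}(p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * abs(k / n - p)**r
               for k in range(n + 1))
```

For instance, `mu(2)` is $1/4$, matching the familiar bound $p(1-p)/n\le 1/(4n)$ on the variance of $X/n$.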

Taylor Expansion of Linear Operators

A function with enough continuous derivatives can be “unwrapped” into a Taylor expansion (a Taylor polynomial plus a remainder). The linear mapping of such a function also has a Taylor expansion of sorts, which is described next.

Let $f(\lambda)$ have a continuous $s$-th derivative on a compact interval, where $s$ is zero or a positive integer, and let $L(f)$ be a linear operator that maps continuous functions on that interval to functions of that kind. Then:

\[L(f)(\lambda) = L(R_s(f, \lambda)) + \sum_{i=0}^s L((e_1-\lambda)^i)(\lambda)\frac{f^{(i)}(\lambda)}{i!}, \tag{1}\]

where $R_s(f,\lambda)$ is the remainder after subtracting from $f$ the degree-$s$ Taylor polynomial of $f$ centered at $\lambda$. (See also Piţul (2007, proof of theorem 5.8)12.) $R_s(f,\lambda)$ is 0 if $f$ is a polynomial of degree $s$ or less.

If $L$ reproduces constants, so that $L(e_0)=1$, this becomes:

\[L(f)(\lambda) - f(\lambda) = L(R_s(f, \lambda)) + \sum_{i=1}^s L((e_1-\lambda)^i)(\lambda)\frac{f^{(i)}(\lambda)}{i!}.\tag{2}\]

If $L$ reproduces polynomials up to degree $s$, this even reduces to $L(f)(\lambda) - f(\lambda) = L(R_s(f, \lambda))$.
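Expansion (1) can be checked numerically in a case where the remainder vanishes. The sketch below takes $L=B_n$, $s=3$, and the cubic $f(t)=4t^3-6t^2+8t-10$, so that $R_3(f,\lambda)=0$ and (1) reduces to the sum over the “central moments” $L((e_1-\lambda)^i)(\lambda)$. The helper names are illustrative.

```python
from math import comb, factorial

def bernstein(f, n, x):
    """B_n(f)(x), used here as the linear operator L."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def central_moment(i, n, lam):
    """L((e_1 - lam)**i)(lam): L applied to t -> (t - lam)**i, evaluated at lam."""
    return bernstein(lambda t: (t - lam)**i, n, lam)

def cubic(t):
    return 4 * t**3 - 6 * t**2 + 8 * t - 10

# Derivatives of the cubic of orders 0 through 3, worked out by hand.
DERIVATIVES = [cubic,
               lambda t: 12 * t**2 - 12 * t + 8,
               lambda t: 24 * t - 12,
               lambda t: 24.0]

def taylor_side(n, lam):
    """Right-hand side of (1) with s = 3 and a zero remainder."""
    return sum(central_moment(i, n, lam) * DERIVATIVES[i](lam) / factorial(i)
               for i in range(4))
```

The value `taylor_side(n, lam)` agrees with `bernstein(cubic, n, lam)` up to rounding, as (1) predicts.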

It can be seen from the expansions just given that finding upper bounds for $L_n(f)(\lambda)$ involves:

Meanwhile, bounds for the derivatives of $f$ (here, $f^{(i)}$) are often assumed to be known beforehand.

Results on Error Bounds

This section gives some results on error bounds for certain classes of operators.

Bounds for General Positive Linear Operators

The following results give bounds that apply to large classes of positive linear operators. While these operators have relatively simple approximation behavior, in general they do not approximate functions with more than two continuous derivatives “better” than functions with only two.

In this section:

Lemma 1. Let $f(\lambda)$ be continuous on a compact interval, and let $L$ be a positive linear operator that maps continuous functions on that interval to functions of that kind and reproduces all constants (so that $L(e_0) = 1$ ). Let $h>0$ be a real number. Then:

| No. | $\text{abs}(L(f)(\lambda)-f(\lambda))\le …$ |
| --- | --- |
| 1 | $\tilde\omega_1(f, \tau_1)$. |
| 2 | $2 \omega_1(f, (\sigma_2)^{1/2})$. |
| 3 | $(1 + (\sigma_2)^{1/2}/h) \omega_1(f, h)$. |
| 4 | $(1 + (\sigma_2)/h^2) \omega_1(f, h)$. |
| 5 | (Use ineq. 3 if $h<(\sigma_2)^{1/2}$, or ineq. 4 otherwise.) |
| 6 | $\tilde\omega_1(f, (\sigma_2)^{1/2})$. |

Proof: Inequality 1 follows from a result of Gonska and Meier (1985, theorem 3.1)14. Inequality 2 follows from a result of Shisha and Mond (1968, theorem 1)15; inequality 4 comes from another result in the same paper (see also Mamedov (1959)16); inequality 3 follows from a result of Mond (1978)17; inequality 5, a result of Păltănea (2004, corollary 1.2.2)18; inequality 6, a result of Peetre (1969)19 (also mentioned in Gonska (1998/2023)20, which has an extensive discussion on error bounds for linear operators). □

Remark 1: The moduli of continuity $\omega_1(f, \delta)$ and $\tilde\omega_1(f, \delta)$ offer concise ways to express different error bounds depending on how “regular” $f$ is. Properties of these moduli are given in Sevy 199121, sec. 2.0.2; Gonska 198522. For example, let $f$ be continuous on a compact interval. Then:

Example: Let $f$ and $L$ be as in Lemma 1. If $f$ is Lipschitz continuous with Lipschitz constant $M$ or less, or has a continuous derivative with maximum absolute value $M$ or less, $\text{abs}(L(f)(\lambda)-f(\lambda))\le M (\sigma_2)^{1/2}$; this follows from the combination of Remark 1 and inequality 6 of Lemma 1.
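As an illustration of this example, the sketch below takes $L=B_n$ on the closed unit interval and the Lipschitz function $f(t)=\text{abs}(t-1/2)$ (with $M=1$). It assumes that $\sigma_2$ stands for the second central moment $L((e_1-\lambda)^2)(\lambda)$, which for $B_n$ equals $\lambda(1-\lambda)/n$; both that reading and the function names are assumptions of the sketch.

```python
from math import comb, sqrt

def bernstein(f, n, x):
    """B_n(f)(x), a positive linear operator that reproduces constants."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def lipschitz_bound(M, n, lam):
    """M * sqrt(sigma_2), with sigma_2 = lam*(1-lam)/n assumed for B_n."""
    return M * sqrt(lam * (1 - lam) / n)

def kink(t):
    """A Lipschitz function with Lipschitz constant 1."""
    return abs(t - 0.5)

def actual_error(n, lam):
    return abs(bernstein(kink, n, lam) - kink(lam))
```

For example, at $\lambda=1/2$ and $n=50$ the bound is about 0.0707, and `actual_error(50, 0.5)` stays below it.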

Lemma 2. Let $f(\lambda)$ be continuous on a compact interval, and let $L$ be a positive linear operator that maps continuous functions on that interval to functions of that kind and reproduces all polynomials up to degree 1 (constants and linear functions). Let $h>0$ be a real number. Then:

| No. | If $f$ … | Then $\text{abs}(L(f)(\lambda)-f(\lambda))\le … $ |
| --- | --- | --- |
| 1 | Has a continuous derivative. | $((h+2)^2/(8h))\cdot \omega_1(f^{(1)}, h\cdot\sqrt{\sigma_2}) \cdot\sqrt{\sigma_2}$. |
| 2 | Has a continuous derivative. | $\frac{1}{2}(\sigma_2)^{1/2} \tilde\omega_1(f^{(1)}, (\sigma_2)^{1/2})$. |
| 3 | Has a Hölder-continuous derivative with Hölder exponent $\alpha$ ($0\lt\alpha\le 1$) and Hölder constant $M$ or less. | $\frac{M}{2}(\sigma_2)^{(1+\alpha)/2}$. |
| 4 | Has a Lipschitz-continuous derivative with Lipschitz constant $M$ or less, or has a continuous second derivative with maximum absolute value $M$ or less. | $\frac{M}{2} (\sigma_2)$. |

Proof: Inequality 1 is a special case of Theorem 2.19 (in conjunction with Remark 2.21) of Anastassiou (1985), with the interval $[a, b]$, $m=1$ (since the function is defined on all of $[a, b]$), $r=h$, and $x_0$ equal to $\lambda$. Inequality 2 follows from a result of Gonska and Meier (1985, theorem 4.1)14; see also Păltănea and Dimitriu (2016, remark 3)24. Inequalities 3 and 4 follow from inequality 2 because of Remark 1. □
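Inequality 4 of Lemma 2 can be tried out with $L=B_n$, which reproduces constants and linear functions. The sketch below again assumes $\sigma_2=\lambda(1-\lambda)/n$ for $B_n$, and uses $f(t)=e^t$ on the closed unit interval, whose second derivative is at most $M=e$ there. Names are illustrative.

```python
from math import comb, e, exp

def bernstein(f, n, x):
    """B_n(f)(x); reproduces all polynomials of degree 1 or less."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def second_derivative_bound(M, n, lam):
    """(M/2) * sigma_2, with sigma_2 = lam*(1-lam)/n assumed for B_n."""
    return M / 2 * lam * (1 - lam) / n

def exp_error(n, lam):
    return abs(bernstein(exp, n, lam) - exp(lam))

def square(t):
    return t * t
```

For $f(t)=t^2$ (with $M=2$) the bound is attained exactly, since $B_n(f)(\lambda)-\lambda^2=\lambda(1-\lambda)/n$.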

Lemma 3. Let $f(\lambda)$ have a continuous $k$-th derivative on a compact interval, and let $L$ be a positive linear operator that maps continuous functions on that interval to functions of that kind. Let $h>0$ be a real number. Then $L(f)(\lambda) = L(Q_k(f,\lambda))(\lambda) + L(R_k(f,\lambda))(\lambda)$, where:

\[\text{abs}(L(Q_k(f,\lambda)) - f(\lambda))\le \left(\sum_{i=0}^k \frac{\max(\text{abs}(f^{(i)})) \text{abs}(\sigma_i)}{i!}\right),\] \[\text{abs}(L(R_k(f,\lambda)))\le\left(\frac{\tau_k}{k!}+\frac{\tau_{k+1}}{(k+1)!\cdot h}\right)\cdot\omega_1(f^{(k)}, h),\] \[\text{and }\text{abs}(L(R_k(f,\lambda)))\le\max\left(\frac{\tau_k}{k!}, \frac{\tau_{k+1}}{(k+1)!\cdot 2h}\right)\cdot\tilde\omega_1(f^{(k)}, 2h),\] \[\text{and }\text{abs}(L(R_k(f,\lambda)))\le\frac{\tau_k}{k!}\cdot\tilde\omega_1(f^{(k)}, \frac{\tau_{k+1}}{(k+1)\tau_k}),\]

and where:

Proof: The second to fourth bounds given relate to the Taylor remainder. The second bound comes from Păltănea and Smuc (2019, Theorem 1)25; the third bound comes from corollary 3.2 of Dimitriu (2010)26 and Brudnyĭ’s lemma; and the fourth bound follows from the third with $h=\tau_{k+1}/(2(k+1)\tau_k)$ and comes from Gonska et al. (2006)27, where the compact interval assumed was the closed unit interval; see also Gonska (2007)28, Piţul (2007)12. See also Anastassiou (1985, theorem 2.31)29.30

Lemma 4. Let $k$ be zero or a positive integer. Let $f(\lambda)$—

  1. have a Lipschitz-continuous $k$-th derivative on a compact interval, with Lipschitz constant $M$ or less, or
  2. have a continuous $(k+1)$-th derivative on that interval, with maximum absolute value $M$ or less,

and let $L$ be a positive linear operator that maps continuous functions on that interval to functions of that kind. Then $\text{abs}(L(R_k(f,\lambda)))\le M \tau_{k+1}/((k+1)!)$, where $R_k(f,\lambda)$ is as in Lemma 3.

Proof: Follows from the third bound for $L(R_k(f,\lambda))$ in Lemma 3 in the same manner as inequality 4 of Lemma 2, using Remark 1. □

The following two lemmas are more general, but not as easy to use.

Lemma 4A (special case of Theorem 3.4 in Gonska (1998/2023)20). Let $f(\lambda)$ be continuous on a compact interval or a closed subset thereof, and let $L$ be a positive linear operator that maps continuous functions on $f$’s domain to bounded functions on that domain. Let $h>0$ be a real number. Then:

\[\text{abs}(L(f)(\lambda)-f(\lambda))\le\max(\Vert L(e_0)\Vert ,L(\text{abs}(e_1-\lambda))(\lambda)/h)\cdot\tilde\omega_1(f,h)\] \[+\text{abs}(L(e_0)(\lambda)-1)\cdot\text{abs}(f(\lambda))\] \[\le(\Vert L(e_0)\Vert +L(\text{abs}(e_1-\lambda))(\lambda)/h)\cdot\tilde\omega_1(f,h)+\text{abs}(L(e_0)(\lambda)-1)\cdot\text{abs}(f(\lambda)),\]

where $\Vert L(e_0)\Vert$ is the maximum of $L(e_0)$ over $f$’s domain.

Lemma 4B (special case of Theorem 4.7 in Gonska (1998/2023)20). Let $f(\lambda)$ be continuous on a compact interval, and let $L$ be a positive linear operator that maps bounded functions on $f$’s domain to bounded functions on that domain. Let $h>0$ be a real number. Then:

\[\text{abs}(L(f)(\lambda)-f(\lambda))\le(L(e_0)(\lambda)+L(\text{ceil}((e_1-\lambda)/h-1))(\lambda))\cdot\omega_1(f,h)\] \[+\text{abs}(L(e_0)(\lambda)-1)\cdot\text{abs}(f(\lambda)),\] \[\text{abs}(L(f)(\lambda)-f(\lambda))\le(L(e_0)(\lambda)+L(\text{abs}(e_1-\lambda))(\lambda)/h)\cdot\omega_1(f,h)\] \[+\text{abs}(L(e_0)(\lambda)-1)\cdot\text{abs}(f(\lambda)).\]

The second inequality also works if $L$ maps from continuous functions instead of from bounded functions.

Notes:

  1. Using Lemma 4A requires calculating $L(e_0)$, $\Vert L(e_0)\Vert$, and $L(\text{abs}(e_1-\lambda))$, or finding upper bounds for these.
  2. Using Lemma 4B requires calculating $L(e_0)$ and either $L(\text{abs}(e_1-\lambda))$ or $L(\text{ceil}((e_1-\lambda)/h-1))$, or finding upper bounds for these values.
  3. Unlike Lemma 4A, Lemma 4B is not guaranteed to work if $f$’s domain is a closed subset of an interval (see Remark 2.5 in Gonska (1998/2023)20).

The following lemma adapts the previous lemmas to the setting of random variables.

Lemma 5. Let $f(\lambda)$ be continuous on a compact interval, and let $Y$ be a random variable taking only values in that interval. Then Lemmas 1 through 4A apply as appropriate to $f$ meeting their conditions, with $L(f)=\mathbb{E}[f(Y)]$ and $\lambda =\mathbb{E}[Y]$.

Proof: With these assumptions there is a positive linear operator $L(f) = \mathbb{E}[f(Y)]$ for $Y$ and $f$, according to Theorem 3.1.1 of Frantz (1984)31, letting $x_o = \lambda$. Then $L(e_0)$ = $\mathbb{E}[e_0(Y)]$ = $\mathbb{E}[1]$ = 1 regardless of $Y$, and $L(e_1)$ = $\mathbb{E}[e_1(Y)]$ = $\mathbb{E}[Y]$ = $\lambda$, so $L$ reproduces all polynomials of degree up to 1. □

Bounds for Remainder of Bernstein Polynomials

The following results specialize the previous ones to the case of Bernstein polynomials $B_n$. They apply to the Bernstein polynomial of the result of subtracting a Taylor polynomial from a function, and are useful when a linear operator contains $B_n(f)$ in its definition and reproduces all polynomials of degree $r$ or less.

Lemma 6: Let $k$ be zero or a positive integer. Let $f(\lambda)$—

  1. have a Lipschitz-continuous $k$-th derivative on the closed unit interval, with Lipschitz constant $M$ or less, or
  2. have a continuous $(k+1)$-th derivative on that interval, with maximum absolute value $M$ or less.

Then the following bound holds true: $\text{abs}(B_n(R_k(f, \lambda))) \le (M \mu_{k+1})/ ( ((k+1)!) n^{(k+1)/2})$ for every integer $n\ge 2$ (and also for $n=1$ if $k$ is zero or odd), where $\mu_{k+1}$ stands for $\mu_{n,k+1}$ as defined in Proposition 1.

Proof: Follows from Lemma 4, with $L(f)=B_n(f)$, and from Proposition 1. □

Corollary 1: Let $f(\lambda)$, $k$, and $M$ be as in Lemma 6. Then, for every $0\le\lambda\le 1$:

| If $k$ is: | Then $\text{abs}(B_n(R_k(f, \lambda))) \le$ … |
| --- | --- |
| 0. | $M(1/2)/n^{1/2}$ for every integer $n\ge 1$. |
| 1. | $M(1/8)/n = 0.125M/n$ for every integer $n\ge 1$. |
| 2. | $M(\sqrt{3}/48)/n^{3/2} < 0.03609M/n^{3/2}$ for every integer $n\ge 2$. |
| 3. | $M(1/128)/n^{2} = 0.0078125M/n^{2}$ for every integer $n\ge 1$. |
| 4. | $M(\sqrt{5}/1280)/n^{5/2} < 0.001747M/n^{5/2}$ for every integer $n\ge 2$. |
| 5. | $M(1/3072)/n^{3} < 0.0003256M/n^{3}$ for every integer $n\ge 1$. |
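The constants in Corollary 1 are $\mu_{k+1}/((k+1)!)$, using the $\sigma(r,8)$ rows of Proposition 1. The following sketch recomputes them; the function names are illustrative.

```python
from math import factorial, sqrt

def sigma8(r):
    """sigma(r, 8) = r! / ((r/2)! * 8**(r/2)), for an even integer r >= 0."""
    return factorial(r) / (factorial(r // 2) * 8 ** (r // 2))

def corollary_constant(k):
    """mu_{k+1} / (k+1)! for 0 <= k <= 42, using the sigma(r, 8) rows."""
    r = k + 1
    if r == 1:
        mu = 0.5
    elif r % 2 == 0:
        mu = sigma8(r)
    else:
        mu = sqrt(sigma8(r - 1) * sigma8(r + 1))
    return mu / factorial(r)

# Constants for k = 0, 1, ..., 5, in the order of the table.
constants = [corollary_constant(k) for k in range(6)]
```

The entries of `constants` are $1/2$, $1/8$, $\sqrt{3}/48$, $1/128$, $\sqrt{5}/1280$, and $1/3072$, up to rounding.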

Bounds for General Linear Operators

The results in this section give error bounds for important classes of linear operators (not necessarily positive ones). But, in general, they are harder to use than the ones for positive linear operators, because more has to be computed for nonpositive operators than just the “moments”.

Roughly speaking, the integral of $f(\lambda)$ on the compact interval $[a,b]$ is the “area under the graph” of that function when the function is restricted to that interval. If $f$ is continuous there, this is the value that—

\[\frac{b-a}{n} \sum_{i=1}^n f\left(a+(b-a)(i-\frac{1}{2})/n\right),\tag{2A}\]

approaches as $n$ gets larger and larger. The integral of $f(\lambda)$ on $[a,b]$ is denoted $\int_a^b f(\lambda) d\lambda$.
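A sum of this kind (a midpoint sum) is easy to compute. In the sketch below, each of the $n$ samples is weighted by the subinterval width $(b-a)/n$, so that the sum approaches the integral rather than the average value of $f$; the function name is illustrative.

```python
def midpoint_sum(f, a, b, n):
    """Midpoint sum of f on [a, b] with n equal subintervals.
    Each sample is taken at the middle of a subinterval and
    weighted by the subinterval width (b - a) / n."""
    width = (b - a) / n
    return width * sum(f(a + (b - a) * (i - 0.5) / n) for i in range(1, n + 1))

# The integral of t^2 on [0, 3] is 9; the sum gets close as n grows.
approx = midpoint_sum(lambda t: t * t, 0.0, 3.0, 1000)
```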

The next two lemmas rely on the so-called Peano kernel theorem, which was originally developed to assess the error in estimating the integral of a function from samples of it32 (for more on this theory, see Brass and Förster 199833; Waldron 199934).

Lemma 7. Let $k$ be zero or a positive integer, let $f(\lambda)$ have a continuous $(k+1)$-th derivative on the compact interval $[a, b]$. Let $C$ and $c$ be real numbers such that $c\le f^{(k+1)}\le C$ over that interval. Let $L$ be a bounded linear operator that—

Then:

\[\text{abs}(LF(f)(\lambda)) = \text{abs}(f(\lambda) - L(f)(\lambda))\] \[\le \frac{C - c}{2} \int_a^b \text{abs}\left(LF((e_1-t)_+^k)(\lambda)/(k!)\right) dt\tag{3}\] \[= \frac{C - c}{2(k!)} \int_a^b \text{abs}\left(LF((e_1-t)_+^k)(\lambda)\right) dt,\tag{4}\] \[\text{abs}(LF(f)(\lambda))\le \frac{\Vert f^{(k+1)}\Vert}{k!} \int_a^b \text{abs}\left(LF((e_1-t)_+^k)(\lambda)\right) dt,\tag{5}\]

where $LF(f) = f - L(f)$, and the notation $\lgroup x\rgroup^k_+$ is as follows. If $k\gt 0$, this equals $((x+\text{abs}(x))/2)^k$, or $\max(0, x)^k$, and if $k$ is 0, this equals either 1 if $x\ge 0$ or 0 otherwise.

Formulas (3) and (4) hold because, in this case, the operator $LF$ equals 0 for every polynomial of degree $k$ or less (that is, $LF(e_i)=0$ whenever $0\le i\le k$), so that $LF$ satisfies theorem 3 of Gavrea and Ivan (2015)35. Formula (5) is an easy consequence of (4); see also Brass and Förster (1998, theorem 5)33.36

Note: The operator—

\[\frac{LF(\lgroup e_1-t\rgroup_+^k)}{k!} = \frac{\lgroup e_1-t\rgroup_+^k - L(\lgroup e_1-t\rgroup_+^k)}{k!},\]

where $k\ge 0$, is called the Peano kernel of order $k+1$ of $LF$ (Brass and Förster 1998)33, with a fixed value of $t$ such that $a\le t\le b$. But finding a “closed form” of Peano kernels is relatively hard compared to “raw moments” and “central moments”. Luckily, only an upper bound of the Peano kernel is needed to use the formulas in this lemma.
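To make the Peano kernel concrete, the sketch below evaluates it numerically for one operator. It assumes $L=B_n$ on the closed unit interval with $k=1$ (so that $LF$ is 0 on polynomials of degree 1 or less), and integrates the kernel’s absolute value over $t$ with a midpoint sum. The function names and the choice of operator are assumptions of this sketch.

```python
from math import comb

def bernstein(f, n, x):
    """B_n(f)(x), playing the role of L."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def peano_kernel(n, lam, t):
    """LF((e_1 - t)_+^1)(lam) / 1!  with LF(f) = f - B_n(f) and k = 1."""
    def truncated(u):
        return max(0.0, u - t)        # (u - t)_+
    return truncated(lam) - bernstein(truncated, n, lam)

def kernel_integral(n, lam, steps=2000):
    """Midpoint sum for the integral of abs(kernel) over 0 <= t <= 1."""
    h = 1.0 / steps
    return h * sum(abs(peano_kernel(n, lam, (j + 0.5) * h))
                   for j in range(steps))
```

For this operator the integral works out to $\lambda(1-\lambda)/(2n)$, so formula (5) gives $\text{abs}(f(\lambda)-B_n(f)(\lambda))\le\Vert f^{(2)}\Vert\lambda(1-\lambda)/(2n)$, in line with inequality 4 of Lemma 2.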

Lemma 8 (see Theorem 4 of Gavrea and Ivan (2015)35). With the assumptions in Lemma 7, if $LF$ is the difference of two positive linear operators $LA$ and $LB$, so that $LF(f)=LA(f)-LB(f)$ (or $L(f)=f-LA(f)+LB(f)$), and $LA$ and $LB$ both map continuous functions on that interval to functions of that kind, then:

\[\text{abs}(L(f)(\lambda) - f(\lambda))\le \frac{C - c}{(k+1)!} \text{abs}(LA(e_{k+1})(\lambda)),\] \[\text{abs}(L(f)(\lambda) - f(\lambda))\le \frac{2\Vert f^{(k+1)}\Vert}{(k+1)!} \text{abs}(LA(e_{k+1})(\lambda)).\]

Lemma 9 (special case of Theorem 3.2 in Gonska (1998/2023)20). Let $f(\lambda)$ be continuous on a compact interval or a closed subset thereof, and let $L$ be a bounded linear operator that maps continuous functions on $f$’s domain to bounded functions on that domain. Let $h>0$ be a real number. Then for each $\lambda$ in $f$’s domain:

\[\text{abs}(L(f)(\lambda)-f(\lambda))\le\max((\Vert L\Vert+\alpha)/2, (\gamma(\beta(\lambda)-L(e_0)(\lambda))+\text{abs}(L(\text{abs}(e_1-\lambda))(\lambda)))/h)\] \[\cdot\tilde\omega_1(f,h)+\text{abs}(L(e_0)(\lambda)-1)\cdot\text{abs}(f(\lambda)),\]

where $\Vert L\Vert$ is the operator norm of $L$, $\alpha$ is the maximum of $\text{abs}(L(e_0))$ over $f$’s domain; $\beta(\lambda)$ is the maximum of $\text{abs}(L(g)(\lambda))$ over all continuous functions $g$ on $f$’s domain with a maximum absolute value of 1 or less; and $\gamma$ is the difference between the highest and lowest value of $\lambda$ in $f$’s domain.

Lemma 10. With the assumptions in Lemma 9, if $L$ reproduces constants, so that $L(e_0)=1$, the inequality in that lemma becomes:

\[\text{abs}(L(f)(\lambda)-f(\lambda))\le\max((1+\Vert L\Vert)/2, (\gamma(\beta(\lambda)-1)+\text{abs}(L(\text{abs}(e_1-\lambda))(\lambda)))/h)\cdot\tilde\omega_1(f,h).\]

Lemma 11 (special case of Theorem 4.4 and Corollary 4.5 in Gonska (1998/2023)20). Let $f(\lambda)$ be continuous on a compact interval $[a, b]$, and let $L$ be a bounded linear operator that maps continuous functions on $f$’s domain to bounded functions on that domain. Let $h>0$ be a real number. Then for each $\lambda$ in $f$’s domain:

\[\text{abs}(L(f)(\lambda)-f(\lambda))\le\big((\beta(\lambda)-\text{abs}(L(e_0)(\lambda)))\cdot(1+(b-a)/h)\] \[+\text{abs}(L(e_0)(\lambda))+\text{abs}(L(\text{abs}(e_1-\lambda))(\lambda))/h\big)\cdot\omega_1(f,h)\] \[+\text{abs}(L(e_0)(\lambda)-1)\cdot\text{abs}(f(\lambda)),\]

where $\beta(\lambda)$ is as in Lemma 9.

Lemma 12. With the assumptions in Lemma 11, if $L$ reproduces constants, so that $L(e_0)=1$, the inequality in that lemma becomes:

\[\text{abs}(L(f)(\lambda)-f(\lambda))\le\big((\beta(\lambda)-1)\cdot(1+(b-a)/h)\] \[+1+\text{abs}(L(\text{abs}(e_1-\lambda))(\lambda))/h\big)\cdot\omega_1(f,h)\]

Whitney’s Inequality on Polynomial Errors

The following inequality gives a bound on the “best possible” error that a polynomial of degree $n$ can achieve in approximating a function.

Let $n$ be zero or a positive integer, let $f(\lambda)$ be continuous on a compact interval $[a, b]$, and let $P$ be a polynomial of degree $n$ or less with the least maximum absolute difference between $f$ and the polynomial on that interval. Then the error of $P$ in approximating $f$ is bounded as follows (see Babenko and Kryakin 201937):

\[\Vert f-P\Vert \le W \cdot \omega_{n+1}(f,\frac{b-a}{n+1}),\]

where—

Using properties of moduli of continuity (see Sevy 199121, sec. 2.0.2; Gonska 198522), if $f$ has a continuous $(n+1)$-th derivative on $[a, b]$:

\[\Vert f-P\Vert \le W \cdot \left(\frac{b-a}{n+1}\right)^{n+1}\Vert f^{(n+1)}\Vert,\]

and if $f$ has a continuous $n$-th derivative on that interval:

\[\Vert f-P\Vert \le W \cdot \left(\frac{b-a}{n+1}\right)^n\omega_1(f^{(n)}, \frac{b-a}{n+1}).\]

Another Inequality on Polynomial Errors

Like Whitney’s inequality, the following gives a bound on the “best possible” error between a polynomial and a function.

Let $n$ be zero or a positive integer, let $f(\lambda)$ have a continuous $(n+1)$-th derivative on the compact interval $[-1, 1]$,39 and let $P$ be a polynomial of degree $n$ or less with the least maximum absolute difference between $f$ and the polynomial on that interval. Then the error of $P$ in approximating $f$ is bounded as follows (Phillips 2003, theorem 2.4.6):40

\[\Vert f-P\Vert \le\frac{1}{2^n}\frac{\Vert f^{(n+1)}\Vert }{(n+1)!}.\]
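Because this bound is explicit, it can be used offline to choose a degree before building any approximation. The sketch below does so for $f(\lambda)=e^\lambda$ on $[-1,1]$, every derivative of which has maximum absolute value $e$ there. The function names are illustrative.

```python
from math import e, factorial

def best_error_bound(n, deriv_max):
    """Bound on the best degree-n error on [-1, 1], where deriv_max
    bounds the absolute value of the (n+1)-th derivative there."""
    return deriv_max / (2**n * factorial(n + 1))

def min_degree(tol, deriv_max_of):
    """Smallest n whose bound is tol or less.  deriv_max_of(order) must
    return a bound on the derivative of the given order on [-1, 1]."""
    n = 0
    while best_error_bound(n, deriv_max_of(n + 1)) > tol:
        n += 1
    return n

# For exp, every derivative is bounded by e on [-1, 1].
degree_for_exp = min_degree(1e-6, lambda order: e)
```

Here `degree_for_exp` is 7: the bound is about $8.4\times 10^{-6}$ for $n=6$ and about $5.3\times 10^{-7}$ for $n=7$.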

Lebesgue Inequality for Certain Linear Operators

Let $f(\lambda)$ be a continuous function on a compact interval. Let $L$ be a linear operator that—

Then the following error bound (also known as Lebesgue’s lemma or the Lebesgue inequality) holds true for every function $P$ mapped to by $L$:

\[\text{abs}(L(f)(x) - f(x))\le(1+\Vert L\Vert )\cdot\max_t(\text{abs}(f(t)-P(t))),\]

where $\Vert L\Vert$ is the operator norm of $L$ (see also DeVore and Lorentz (1993, p. 30; ch. 5)43, Cheney (1996, chapter 6)44, Powell (1981, theorem 3.1)45).

If every function mapped to by $L$ is a polynomial, though, this error bound will generally be crude or trivial unless $L_n$ are nonpositive operators. Indeed, the only positive linear operator $L$ that reproduces all polynomials up to degree 2 is the identity operator $I(f)=f$.46

Examples:

  1. Let $f$ have a continuous third derivative on the closed unit interval. Combining the previous inequality with the Whitney-type inequalities given earlier leads to the following error bound for linear operators $L$ that map continuous functions on that interval to polynomials up to degree 2 and reproduce all polynomials up to degree 2:

    \[\text{abs}(L(f)(x) - f(x))\le(1+\Vert L\Vert )\cdot 1\cdot \left(\frac{1}{3}\right)^{3}\Vert f^{(3)}\Vert\] \[= (1+\Vert L\Vert )\Vert f^{(3)}\Vert /27.\]
  2. Let $f$ again have a continuous third derivative on the closed unit interval. Suppose $L_n$ is a spline operator that maps to a continuous function (a spline) that equals a degree-2 polynomial at each of the subintervals $[0,1/n]$, $[1/n, 2/n]$, …, $[(n-1)/n, n/n]$ (for example, Sablonnière 200747). On each of these subintervals, $L_n(f)$ is always a polynomial up to degree 2 and $L_n$ reproduces each polynomial up to degree 2. By contrast, on the whole closed unit interval, $L_n(f)$ is generally not a polynomial. Thus, the inequality in this section holds true for every degree-2 polynomial on each subinterval (as opposed to the whole closed unit interval). In turn, for each subinterval the Whitney-type inequality reads the following, since each subinterval has the same length:

    \[\Vert f-P\Vert \le W \cdot \left(\frac{b-a}{3}\right)^{3}\max(\text{abs}(f^{(3)}))\] \[= 1\cdot\left(\frac{1}{n}\frac{1}{3}\right)^3\max(\text{abs}(f^{(3)}))=\frac{1}{27n^3}\max(\text{abs}(f^{(3)})),\]

    where the maximums are taken over the corresponding subinterval. Thus, the error bound for $L_n$ reads $\text{abs}(L_n(f)(x) - f(x))$ $\le (1+\Vert L_n\Vert )\frac{1}{27n^3}\Vert f^{(3)}\Vert$.48

  3. The Bernstein polynomial $B_n$ maps continuous functions to polynomials up to degree $n$, but if $n$ is 2 or greater it is not idempotent because it does not reproduce all polynomials up to degree $n$. Thus, this inequality does not apply to Bernstein polynomials of degree 2 or greater.

  4. The identity operator $I(f)=f$ has operator norm 1 and maps every function to itself. For this operator, the inequality becomes:

    \[\text{abs}(I(f)(x) - f(x))\le(1+\Vert I\Vert)\cdot 0 = (1 + 1)\cdot 0 = 0.\]

Bounds for Certain Nonlinear Operators

The following comes from a result in Bede and Gal (2010)49; see also Bede et al. (2009)50.

Let $f(\lambda)$ be continuous, bounded, and nonnegative on an interval. Let $L$ be an operator that maps functions of that kind to functions of that kind and also has the following properties:

  1. (Monotone.) For every pair of allowed functions $g$ and $h$, if $g\le h$, then $L(g)\le L(h)$.
  2. (Subadditive.) For every pair of allowed functions $g$ and $h$, $L(g+h)\le L(g)+L(h)$.
  3. (Positively homogeneous.) $xL(g)=L(xg)$ for every allowed function $g$ and every $x\ge 0$.

If $L(e_0)=1$, then for every $h>0$:

\[\text{abs}(f(x)-L(f)(x))\le(1+L(\text{abs}(e_1-x))(x)/h)\cdot\omega_1(f, h),\]

provided $L(\text{abs}(e_1-x))(x)$ (the “absolute moment” of $L$) exists (and is finite or infinite).

Notes: An operator meeting conditions 2 and 3 is also called a sublinear operator. Every linear operator is also sublinear. A linear operator is monotone if and only if it is positive. For more on nonlinear operators, see Gal and Niculescu (2023)51.
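One operator of this kind is a “max-product” variant of the Bernstein polynomial, in which the sum over the Bernstein basis is replaced by a maximum. The sketch below is an illustration only: the choice of this operator and all names are assumptions of the sketch, not something stated above. It spot-checks the three properties and $L(e_0)=1$ on sample nonnegative functions.

```python
from math import comb

def weights(n, x):
    """Bernstein basis values for k = 0..n at the point x."""
    return [comb(n, k) * x**k * (1 - x)**(n - k) for k in range(n + 1)]

def max_product(f, n, x):
    """Max-product Bernstein-type operator (assumed example), for f >= 0."""
    w = weights(n, x)
    return max(wk * f(k / n) for k, wk in enumerate(w)) / max(w)

def properties_hold(n, points, tol=1e-12):
    """Spot-check monotonicity, subadditivity, positive homogeneity,
    and L(e_0) = 1 at the given points."""
    g = lambda t: t * t
    h = lambda t: t * t + 0.1              # g <= h everywhere
    for x in points:
        lg, lh = max_product(g, n, x), max_product(h, n, x)
        if lg > lh + tol:                                        # monotone
            return False
        if max_product(lambda t: g(t) + h(t), n, x) > lg + lh + tol:  # subadditive
            return False
        if abs(max_product(lambda t: 3 * g(t), n, x) - 3 * lg) > tol:  # homogeneous
            return False
        if abs(max_product(lambda t: 1.0, n, x) - 1.0) > tol:    # L(e_0) = 1
            return False
    return True
```

A spot check like this does not prove the properties; it only shows that they are plausible for the sampled functions and points.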

Example

This example shows how to find a linear operator’s bounds.

Let $L_n(f)$ be a linear operator inspired by a conjecture I have on polynomial approximation. It is described as follows:

\[L_n(f)(\lambda) = \sum_{i=0}^n \left( W_{2n}\left(f\right)\left(\frac{k}{2n}\right) - W_n\left(f\right)\left(\frac{i}{n}\right)\right)\sigma_{n,k,i}\] \[=\mathbb{E}\left[W_{2n}\left(f\right)\left(\frac{k}{2n}\right) - W_n\left(f\right)\left(\frac{X_k}{n}\right)\right],\]

where:

$L_n$ and $W_n$ are generally nonpositive operators. As an example, take $W_n=2f-B_n(f)$. Then $B_n(W_n(f))$ is a linear operator that is the iterated Boolean sum of degree-$n$ Bernstein polynomials, with one iteration; see Güntürk and Li (2021a, Theorem 5)53. That paper, among others (for example, Micchelli 197354), showed that this operator approaches $f$ at the rate $O(1/n^{3/2})$ if $f$ has a continuous third derivative. (“$O(1/n^{3/2})$” means the error is no greater than a constant times $1/n^{3/2}$ for all values of $n$.)

With this choice of $W_n$, $L_n$ becomes:

\[L_n(f)(\lambda) = \sum_{i=0}^n\left((2f\left(\frac{k}{2n}\right) - B_{2n}(f)\left(\frac{k}{2n}\right)) - (2f\left(\frac{i}{n}\right) - B_n(f)\left(\frac{i}{n}\right))\right) \sigma_{n,k,i}\] \[= \mathbb{E}\left[(2f\left(\frac{k}{2n}\right) - B_{2n}(f)\left(\frac{k}{2n}\right)) - (2f\left(\frac{X_k}{n}\right) - B_n(f)\left(\frac{X_k}{n}\right))\right]\] \[= \sum_{i=0}^n\left((2f\left(\frac{k}{2n}\right) + B_{n}(f)\left(\frac{i}{n}\right))\right)\sigma_{n,k,i} - \sum_{i=0}^n \left((2f\left(\frac{i}{n}\right) + B_{2n}(f)\left(\frac{k}{2n}\right))\right) \sigma_{n,k,i}\] \[= LA_n(f)(\lambda) - LB_n(f)(\lambda).\]

Here, $LA_n$ and $LB_n$ are positive linear operators, making it easier to assess their approximation properties.

It will be shown that, if $f$ has a continuous third derivative, the rate of $L_n$ towards zero is $O(M/n^{3/2})$, where $M$ is the maximum absolute value of $f$ and its derivatives up to the third derivative. The proof of this relies on exact expressions of $L_n$’s “raw moments” and “central moments”, and those for the combined operator $(LA_n+LB_n)$.

The following are some of these values and those for related operators:

To find values like those just listed, it is useful to calculate raw moments (Wang et al. 2023)55 and central moments (Weisstein)56 of hypergeometric random variables (such as $X_k$). Indeed, if $g(y)=W_{2n}(e_r;k/(2n))-W_n(e_r;y)$ is a polynomial in $y$ of degree $r$ or less, then $L_n(e_r)$ can be found using a Taylor expansion, namely as—

\[L_n(e_r) = \sum_{i=0}^r \mathbb{E}[(X_k/n-\mathbb{E}[X_k/n])^i]\frac{g^{(i)}(\mathbb{E}[X_k/n])}{i!}\] \[= \sum_{i=0}^r \frac{\mathbb{E}[(X_k-\mathbb{E}[X_k])^i]}{n^i}\frac{g^{(i)}(k/(2n))}{i!},\]

where the derivatives are taken with respect to $y$, and where $\mathbb{E}[(X_k-\mathbb{E}[X_k])^i]$ is the $i$-th central moment of $X_k$.

The first step is to find the Taylor expansion of $L_n(f)(\lambda)$. Given that $L_n((e_1-x)^0)(x)$ = $L_n((e_1-x)^1)(x)$ = 0, this becomes:

\[L_n(f)(\lambda) = L_n(R_3(f, \lambda)) + \sum_{i=2}^3 L_n((e_1-\lambda)^i)(\lambda)\frac{f^{(i)}(\lambda)}{i!},\] \[\text{abs}(L_n(f)(\lambda)) \le \Vert L_n(R_3(f, \lambda))\Vert + \Vert L_n((e_1-\lambda)^2)\Vert\cdot\Vert f^{(2)}\Vert /2\] \[+ \Vert L_n((e_1-\lambda)^3)\Vert\cdot\Vert f^{(3)}\Vert /6.\]

The function $\text{abs}(L_n((e_1-x)^3)(x))$ has its maximum at $x=1/2-\sqrt{3}/6$; and $\text{abs}(L_n((e_1-x)^2)(x))$ has its maximum at $x=1/2$, so:

\[\text{abs}(L_n(f)(\lambda)) \le \Vert L_n(R_3(f, \lambda))\Vert + \text{abs}(\frac{3\lambda(\lambda - 1)}{2n(2n-1)})\Vert f^{(2)}\Vert /2\] \[+ \Vert L_n((e_1-\lambda)^3)\Vert\cdot\Vert f^{(3)}\Vert /6\] \[\le \Vert L_n(R_3(f, \lambda))\Vert + \frac{3}{8n(2n-1)}\Vert f^{(2)}\Vert /2\] \[+ \frac{\sqrt{3} (6 n - 5)}{24 n^{2} (2 n - 1)}\Vert f^{(3)}\Vert /6.\]

Meanwhile the remainder is estimated as follows, using the proof of corollary 2.3 of Gonska et al. (2006)4:

\[\Vert L_n(R_3(f, \lambda))\Vert \le \frac{1}{6} \Vert f^{(3)}\Vert\cdot\Vert (LA_n+LB_n)(\text{abs}(e_1-\lambda)^3)\Vert .\]

In turn, using Schwarz’s inequality (see proof of the same paper’s corollary 2.1):

\[\Vert (LA_n+LB_n)(\text{abs}(e_1-\lambda)^3)\Vert \le (\Vert (LA_n+LB_n)((e_1-\lambda)^4)\Vert )^{1/2}\] \[\times (\Vert (LA_n+LB_n)((e_1-\lambda)^2)\Vert )^{1/2} \le \frac{3\sqrt{3}}{8n^{3/2}}.\]

(The expression in the middle takes its maximum at $\lambda = 1/2$; the right-hand side is an upper bound of that expression for all positive integers $n$.) Altogether:

\[\Vert L_n(f)\Vert \le \frac{3}{8n(2n-1)}\frac{1}{2}\Vert f^{(2)}\Vert\] \[+ \left(\frac{3\sqrt{3}}{8n^{3/2}} + \frac{\sqrt{3} (6 n - 5)}{24 n^{2} (2 n - 1)}\right)\frac{1}{6}\Vert f^{(3)}\Vert = LC_n(f)\] \[\le 0.1875 \frac{\Vert f^{(2)}\Vert }{n^{3/2}} + \frac{5\sqrt{3}}{72} \frac{\Vert f^{(3)}\Vert }{n^{3/2}} \le \frac{0.3078 M}{n^{3/2}} = O(1/n^{3/2}).\]

If $n\ge 2$ is an integer, $LC_n(f)\le 0.2165 M/n^{3/2}$.
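The constants 0.3078 and 0.2165 can be double-checked numerically. The sketch below evaluates the two coefficients of $LC_n(f)$ (those multiplying $\Vert f^{(2)}\Vert$ and $\Vert f^{(3)}\Vert$), bounds both norms by $M$, and rescales by $n^{3/2}$. The function names are illustrative.

```python
from math import sqrt

def lc_coefficients(n):
    """Coefficients of the second- and third-derivative norms in LC_n(f)."""
    c2 = 3 / (8 * n * (2 * n - 1)) / 2
    c3 = (3 * sqrt(3) / (8 * n**1.5)
          + sqrt(3) * (6 * n - 5) / (24 * n**2 * (2 * n - 1))) / 6
    return c2, c3

def scaled_total(n):
    """(c2 + c3) * n^(3/2): the constant in front of M / n^(3/2)."""
    c2, c3 = lc_coefficients(n)
    return (c2 + c3) * n**1.5

worst_all = max(scaled_total(n) for n in range(1, 2000))
worst_from_2 = max(scaled_total(n) for n in range(2, 2000))
```

Over the range checked, the largest value occurs at $n=1$ (about 0.30778), and the largest value for $n\ge 2$ occurs at $n=2$ (about 0.21649).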

Example: An Interesting Linear Operator

For a continuous function $f$ on the closed unit interval and for nonnegative integers $m$ and $n$, let $H_{n,m}$ be a linear operator as follows:

\[H_{n,m}(f)=B_n(f) + \text{Lag}_m(f) - B_n(\text{Lag}_m(f)),\]

where $B_n$ is the degree-$n$ Bernstein polynomial (see “Bernstein Form and Bernstein Polynomials”, earlier) and $\text{Lag}_m$ is the polynomial of degree up to $m$ that equals $f$ at “$m+1$ distinct points on” the closed unit interval. This operator was mentioned in Remark 2 of Gavrea and Ivan (2018)57, but appears not to have been studied elsewhere.

It is known that $\text{Lag}_m$ is a linear operator and reproduces all polynomials of degree $m$ or less, so that $\text{Lag}_m(e_i) = e_i$ whenever $0\le i\le m$ is an integer. Thus, if $f$ is such a polynomial, $B_n(f)=B_n(\text{Lag}_m(f))$ and therefore $H_{n,m}(f) = \text{Lag}_m(f)=f$; in particular, $H_{n,m}(e_i)=e_i$ whenever $0\le i\le m$ is an integer.

(The foregoing sentence would remain true if $B_n$ were replaced with any other operator mapping to and from the same functions.)
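The reproduction property can be illustrated with exact rational arithmetic. The sketch below assumes equally spaced interpolation nodes $0, 1/m, …, 1$ (the text requires only that the nodes be distinct) and assumes $n\ge 1$ and $m\ge 1$; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def bernstein(f, n, x):
    """Value at x of the degree-n Bernstein polynomial of f (n >= 1)."""
    return sum(f(Fraction(k, n)) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def lagrange(f, nodes, x):
    """Value at x of the polynomial that equals f at the distinct nodes."""
    total = Fraction(0)
    for i, xi in enumerate(nodes):
        term = Fraction(f(xi))
        for j, xj in enumerate(nodes):
            if j != i:
                term *= Fraction(x - xj) / (xi - xj)
        total += term
    return total

def h_operator(f, n, m, x):
    """H_{n,m}(f)(x) = B_n(f)(x) + Lag_m(f)(x) - B_n(Lag_m(f))(x)."""
    # Assumption: equally spaced nodes 0, 1/m, ..., 1 (requires m >= 1).
    nodes = [Fraction(i, m) for i in range(m + 1)]
    lag = lambda t: lagrange(f, nodes, t)
    return bernstein(f, n, x) + lag(x) - bernstein(lag, n, x)

cubic = lambda t: t**3 - 2 * t + 1
x = Fraction(1, 3)
print(h_operator(cubic, 5, 3, x) == cubic(x))  # degree 3 <= m: reproduced
print(h_operator(cubic, 5, 2, x) == cubic(x))  # degree 3 > m: not reproduced here
```

Using `Fraction` avoids rounding error, so the equality test is exact rather than approximate.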

Because $H_{n,m}$ is linear and reproduces all polynomials up to degree $m$, the following holds if $f$ has a continuous $m$-th derivative:

\[H_{n,m}(f)(\lambda) - f(\lambda) = H_{n,m}(R_m(f, \lambda))(\lambda)\] \[=B_n(R_m(f,\lambda)) + \text{Lag}_m(R_m(f,\lambda)) - B_n(\text{Lag}_m(R_m(f,\lambda))).\]

With the help of Lemma 6, the following holds if $n$ is also 2 or greater and $m$ is a positive integer:

\[\text{abs}(H_{n,m}(f)(\lambda) - f(\lambda)) \le \frac{\Vert f^{(m)}\Vert \mu_{m}}{ (m!) n^{m/2}} + \Vert \text{Lag}_m(R_m(f,\lambda)) - B_n(\text{Lag}_m(R_m(f,\lambda)))\Vert\]

where $\mu_r$ is as in Proposition 1.

It will be helpful to estimate the second derivative of $\text{Lag}_m(f)$.

Given that that function is a polynomial of degree $m$ or less, this can be estimated as:

\[\text{abs}(\text{Lag}_m(f)^{(2)}(\lambda))\le \Vert \text{Lag}_m\Vert\cdot\Vert f\Vert\cdot M(m),\tag{E}\]

where:

Lemma: Let $p(\lambda)=c_0 \lambda^0 + … + c_n \lambda^n$ be a polynomial on the interval $[a,b]$, where $c_0$, …, $c_n$ are real numbers and $c_n$ is not zero. If $\Vert p\Vert\le 1$, then $\Vert p^{(2)}\Vert\le 4 n^2 (n^2-1)/(3(b-a)^2)$.

An error bound for $H_{n,m}$ can also be written as:

\[H_{n,m}(f) - f=B_n(f) + \text{Lag}_m(f) - B_n(\text{Lag}_m(f)) - f\] \[=(B_n(f) - f) + (\text{Lag}_m(f) - B_n(\text{Lag}_m(f))),\] \[\text{abs}((H_{n,m}(f) - f)(\lambda))\le\text{abs}(B_n(f) - f) + \text{abs}(B_n(\text{Lag}_m(f)) - \text{Lag}_m(f)),\]

so now there are two error bounds to find: one for $f$ and the other for $\text{Lag}_m(f)$. And, if $f$ has a continuous second derivative, both have the same form:

\[\text{abs}(B_n(g)(\lambda) - g(\lambda))\le \Vert g^{(2)}\Vert/(8n),\]

where $g$ is either $f$ or $\text{Lag}_m(f)$.

(This follows from Lorentz (1963)59 and the well-known fact that $\Vert g^{(2)}\Vert$, the maximum absolute value of $g$’s second derivative, is an upper bound of $g$’s first derivative’s smallest Lipschitz constant.)
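As a quick check of the bound $\Vert g^{(2)}\Vert/(8n)$ on the Bernstein approximation error, the sketch below uses the illustrative choice $g(\lambda)=\lambda^2$, whose second derivative is 2 everywhere. The grid of test points and the degree $n=10$ are assumptions made only for this example.

```python
from fractions import Fraction
from math import comb

def bernstein(f, n, x):
    """Value at x of the degree-n Bernstein polynomial of f (n >= 1)."""
    return sum(f(Fraction(k, n)) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

g = lambda t: t * t            # second derivative is 2 everywhere
n = 10
bound = Fraction(2, 8 * n)     # ||g''|| / (8n)

# Largest error over a grid of rational points that includes 1/2.
grid = [Fraction(i, 100) for i in range(101)]
worst = max(abs(bernstein(g, n, x) - g(x)) for x in grid)
print(worst, bound, worst <= bound)
```

For this $g$ the error is exactly $\lambda(1-\lambda)/n$, so the bound is attained at $\lambda=1/2$.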

Altogether, if $f$ has a continuous second derivative and $m$ is fixed:

\[\text{abs}((H_{n,m}(f) - f)(\lambda))\le \frac{\Vert f^{(2)}\Vert}{8n} + \frac{\Vert \text{Lag}_m\Vert\cdot\Vert f\Vert\cdot M(m)}{8n}.\]

It is suspected that, using $(E)$, the following is true if $f$ has a continuous $m$-th derivative and $m$ is a positive integer:

\[H_{n,m}(f)(\lambda) - f(\lambda) = B_n(R_m(f,\lambda)) + \lgroup\text{Lag}_m(f) - B_n(\text{Lag}_m(f))\rgroup,\]

so that, using Lemma 6 again, there is a better error bound:

\[\text{abs}((H_{n,m}(f) - f)(\lambda))\le \frac{\Vert f^{(m)}\Vert \mu_{m}}{ (m!) n^{m/2}} + (\Vert \text{Lag}_m\Vert\cdot\Vert f\Vert\cdot M(m)) \frac{\mu_{m}}{ (m!) n^{m/2}}.\]

Example: The Lorentz Operators

The Lorentz operators were introduced by Lorentz (1963)60 and studied by Holtz et al. (2011)61.

This section touches on the Lorentz operator of order 2, defined as—

\[Q_{n,2}(f)(\lambda)=B_n(f)(\lambda)-\frac{\lambda(1-\lambda)}{2n} B_n(f^{(2)})(\lambda)\] \[=B_n\left(f(e_1)-\frac{\lambda(1-\lambda)}{2n}f^{(2)}(e_1)\right)(\lambda),\]

where $0\le\lambda\le 1$.62

This operator is a nonpositive linear operator that maps continuous functions to polynomials of degree $n+2$. Because $Q_{n,2}(e_i) = e_i$ if $i$ is 0, 1, or 2, the operator reproduces all polynomials of degree 2 or less (for another proof, see Lemma 14 of Holtz et al. 201161). (The Lorentz operators of order 0 and 1 are simply the Bernstein polynomials.)
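The claim that $Q_{n,2}$ reproduces $e_0$, $e_1$, and $e_2$ can be verified exactly. In this sketch the second derivative is passed in explicitly as `f2`, and the degree $n=7$ and the test points are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb

def bernstein(f, n, x):
    """Value at x of the degree-n Bernstein polynomial of f (n >= 1)."""
    return sum(f(Fraction(k, n)) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def lorentz2(f, f2, n, x):
    """Q_{n,2}(f)(x), where f2 is the second derivative of f."""
    return bernstein(f, n, x) - x * (1 - x) / (2 * n) * bernstein(f2, n, x)

# Each pair is (e_i, second derivative of e_i) for i = 0, 1, 2.
monomials = [
    (lambda t: Fraction(1), lambda t: Fraction(0)),
    (lambda t: t, lambda t: Fraction(0)),
    (lambda t: t * t, lambda t: Fraction(2)),
]
n = 7
points = [Fraction(i, 10) for i in range(11)]
reproduced = all(lorentz2(f, f2, n, x) == f(x)
                 for f, f2 in monomials for x in points)
print(reproduced)
```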

$Q_{n,2}$ can be bounded as follows:

\[\Vert Q_{n,2}(f)\Vert\le\Vert B\Vert\cdot\Vert f\Vert+\frac{1}{8n}\Vert B\Vert\cdot\Vert f^{(2)}\Vert = \Vert f\Vert+\Vert f^{(2)}\Vert/(8n),\]

where $\Vert B\Vert$ is the operator norm of the Bernstein polynomials.

However, this operator has an infinite operator norm and so is unbounded. Indeed, a function can have a maximum absolute value of 1 or less yet have an arbitrarily large and continuous second derivative. Take, for example, the family of functions $g_m(\lambda)=\cos(m\cos(\lambda\pi))$: as $m$ goes to infinity, $g_m^{(2)}(1/2)$ diverges; in turn, the expression $(B_n(g_m^{(2)})(1/2))\frac{1}{8n}$, based on a degree-$n$ Bernstein polynomial, likewise diverges, at least if $n$ is 7 or less.
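The growth of the term $(B_n(g_m^{(2)})(1/2))\frac{1}{8n}$ can be observed numerically. The sketch below differentiates $g_m$ by the chain rule and uses floating-point arithmetic; the choice $n=2$ and the values of $m$ are assumptions made for illustration.

```python
from math import comb, cos, sin, pi

def g2(m, x):
    """Second derivative of g_m(x) = cos(m*cos(pi*x)), by the chain rule."""
    u = m * cos(pi * x)
    du = -m * pi * sin(pi * x)
    d2u = -m * pi**2 * cos(pi * x)
    return -cos(u) * du**2 - sin(u) * d2u

def correction_at_half(m, n):
    """(B_n(g_m'')(1/2)) / (8n), the second term of Q_{n,2}(g_m) at 1/2."""
    b = sum(g2(m, k / n) * comb(n, k) for k in range(n + 1)) / 2**n
    return b / (8 * n)

# The magnitude grows roughly like m^2 even though abs(g_m) <= 1.
for m in (1, 10, 100, 1000):
    print(m, correction_at_half(m, 2))
```

For $n=2$ the term works out to $m\pi^2(\sin(m)-m)/32$, which grows in magnitude without bound as $m$ increases.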

Because $Q_{n,2}$ is unbounded, lemmas 7 and 8 (see “Bounds for General Linear Operators”), both of which relate to Peano kernels and apply only to bounded operators, can’t be used. For the same reason, $Q_{n,2}$ cannot be written as a difference of two positive linear operators that map continuous functions to continuous functions; every positive operator of that kind is bounded (Piţul 2007, Corollary 1.5)63.

Some of the “moments” of this operator are:

Let $LF(f) = f - Q_{n,2}(f)$ be the error in approximating $f$ with $Q_{n,2}(f)$.

It is suspected that—

\[\Vert LF(f)\Vert \le \frac{C \Vert f^{(3)}\Vert}{n^{3/2}},\]

for some $C>0$, and it is of interest to find an explicit upper bound for $C$, especially a tight one.

Probabilistic Interpretations of Linear Operators

The Bernstein polynomials featured in a proof in 1912 of the result that any continuous function on a compact interval can be approximated as well as desired by polynomials (Bernstein 1912)64. That proof used probability theory. In a series of papers, Adell and De la Cal use probability theory to interpret a number of linear operators in addition to those polynomials (Adell and De la Cal 199665, 199566).
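One standard probabilistic reading is that $B_n(f)(\lambda)$ equals the expected value of $f(S/n)$, where $S$ is the number of heads in $n$ independent tosses of a coin that shows heads with probability $\lambda$. The sketch below compares the Bernstein form with a simulation; the function names, the seed, and the number of trials are assumptions made for illustration.

```python
import random
from math import comb

def bernstein(f, n, x):
    """Value at x of the degree-n Bernstein polynomial of f (n >= 1)."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def bernstein_by_simulation(f, n, x, trials, rng):
    """Estimate E[f(S/n)], where S counts heads in n tosses with heads probability x."""
    total = 0.0
    for _ in range(trials):
        successes = sum(rng.random() < x for _ in range(n))
        total += f(successes / n)
    return total / trials

f = lambda t: t * t
rng = random.Random(12345)   # fixed seed so the run is repeatable
exact = bernstein(f, 10, 0.3)
estimate = bernstein_by_simulation(f, 10, 0.3, 20000, rng)
print(exact, estimate)
```

The two printed values should agree to within the sampling error of the simulation.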

License

Any copyright to this page is released to the Public Domain. In case this is not possible, this page is also licensed under Creative Commons Zero.

Notes

  1. This term is used instead of “closed interval”, a term which can also encompass infinitely long intervals, which is not the intent of this document (Weisstein, Eric W. “Closed Interval.” From MathWorld–A Wolfram Resource. https://mathworld.wolfram.com/ClosedInterval.html). 

  2. A better term for positive operators is probably nonnegativity-preserving operators. 

  3. n! = 1*2*3*…*n is also known as n factorial; in this document, (0!) = 1.
    Summation notation, involving the Greek capital sigma (Σ), is a way to write the sum of one or more terms of similar form. For example, $\sum_{k=0}^n g(k)$ means $g(0)+g(1)+…+g(n)$, and $\sum_{k\ge 0} g(k)$ means $g(0)+g(1)+…$. 

  4. Gonska, Heiner, Paula Piţul, and Ioan Raşa. “On differences of positive linear operators.” Carpathian Journal of Mathematics (2006): 65-78.

  5. Skorski, Maciej. “Handy formulas for binomial moments.” Modern Stochastics: Theory and Applications 12.1 (2024): 27-41. 

  6. It is also possible to bound the “absolute moment” as $M_{n,r}(p)\le C(r)(\max(1/n, (p(1-p)/n)^{1/2})^r$ or $M_{n,r}(p)\le D(r)(1/n + (p(1-p)/n)^{1/2})^r$ (G.G. Lorentz, “The degree of approximation by polynomials with positive coefficients”, 1966), but the constants $C(r)$ and $D(r)$ seem to be higher (and less favorable) than the $E(r)$ in $M_{n,r}(p)\le E(r)/n^{r/2}$. 

  7. Adell, J.A., Cárdenas-Morales, D., “Quantitative generalized Voronovskaja’s formulae for Bernstein polynomials”, Journal of Approximation Theory 231, July 2018. https://doi.org/10.1016/j.jat.2018.04.007 

  8. Molteni, Giuseppe. “Explicit bounds for even moments of Bernstein’s polynomials.” Journal of Approximation Theory 273 (2022): 105658. 

  9. Cheng, F., “On the rate of convergence of Bernstein polynomials of functions of bounded variation”, Journal of Approximation Theory 39 (1983). 

  10. Weisstein, Eric W. “Schwarz’s Inequality.” From MathWorld–A Wolfram Resource. https://mathworld.wolfram.com/SchwarzsInequality.html 

  11. R. Bojanić, O. Shisha, “Degree of $L^1$ approximation to integrable functions by modified Bernstein polynomials”, Journal of Approximation Theory 13, 66–72 (1975). 

  12. Piţul, P., “Evaluation of the Approximation Order by Positive Linear Operators”, dissertation, Universität Duisburg-Essen, 2007.

  13. I suspect that, whenever $L$ is a bounded linear operator that maps continuous functions on a compact interval to functions of that kind, $L$ can be written as a difference between two positive linear operators. But I have not seen a proof of that statement; Acu et al. (“Grüss-type and Ostrowski-type inequalities in approximation theory”, Ukr Math J 63, 843–864, 2011) give a similar statement but without proof. 

  14. Gonska, H.H., Meier, J., “On approximation by Bernstein-type operators: best constants”, Studia Sci. Math. Hungar. 22, 1987.

  15. Shisha, O., Mond, B., “The degree of convergence of linear positive operators”, 1968.

  16. R. G. Mamedov, “On the order of approximation of functions by linear positive operators” (Russian), Dokl. Akad. Nauk SSSR 128 (1959). 

  17. Mond, B., “On the degree of approximation by linear positive operators”, Journal of Approximation Theory 18 (1976). 

  18. Păltănea, R., Approximation Theory Using Positive Linear Operators, Birkhäuser, 2004. https://doi.org/10.1007/978-1-4612-2058-9 

  19. Peetre, J., “On the connection between the theory of interpolation spaces and approximation theory”, in Approximation Theory, 1969. 

  20. Gonska, Heiner. “The rate of convergence of bounded linear processes on spaces of continuous functions.” Journal of Numerical Analysis and Approximation Theory 52.2 (2023): 182-232. https://doi.org/10.33993/jnaat522-1326

  21. Sevy, J., “Acceleration of convergence of sequences of simultaneous approximants”, dissertation, Drexel University, 1991. https://doi.org/10.17918/00010296

  22. H. H. Gonska, Quantitative Approximation in C(X), Habilitationsschrift, Universität Duisburg, 1985.

  23. Peetre, J., “Exact interpolation theorems for Lipschitz-continuous functions”, Ricerche Mat. 18 (1969). 

  24. Păltănea, R, Dimitriu, M.T., “On some second order moduli of smoothness.” General Mathematics 24 (2016) 

  25. Păltănea, R., Smuc, M. “Sharp Estimates of Asymptotic Error of Approximation by General Positive Linear Operators in Terms of the First and the Second Moduli of Continuity”, Results in Mathematics 74, 70 (2019). https://doi.org/10.1007/s00025-019-0997-8 

  26. Dimitriu, M.T., “Estimates with optimal constants using Peetre’s K-functionals”, Carpathian Journal of Mathematics 26 (2010). 

  27. Gonska, Heiner, Paula Piţul, and Ioan Raşa. “On Peano’s form of the Taylor remainder, Voronovskaja’s theorem and the commutator of positive linear operators”. In Proceedings of the International Conference on Numerical Analysis and Approximation Theory, Cluj-Napoca. Romania, July 2006. 

  28. Gonska, Heiner. “On the degree of approximation in Voronovskaja’s theorem”, Studia Univ. Babeş-Bolyai, Math., September 2007. 

  29. Anastassiou, George A. “A study of positive linear operators by the method of moments, one-dimensional case.” Journal of Approximation Theory 45.3 (1985): 247-270. 

  30. The paper Cichoń et al., “On delta-method of moments and probabilistic sums”, ANALCO 2013, has very similar results, but they assume the function $f$ has a $k$-th derivative defined on an open interval (say, $0\lt\lambda\lt 1$), rather than a compact interval, making those results harder to use if $Y$ is a random variable that can take a value equal to either endpoint of the interval (in this example, 0 or 1). 

  31. Frantz, Deborah A. Summability methods, probability distributions, and associated positive linear operators. Lehigh University, 1984. 

  32. This kind of estimation is called quadrature or numerical integration, and methods for such estimation, such as the one given in (2A), are called quadrature rules.

  33. Brass, H., Förster, KJ. (1998). On the Application of the Peano Representation of Linear Functionals in Numerical Analysis. In: Milovanović, G.V. (eds) Recent Progress in Inequalities. Mathematics and Its Applications, vol 430. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9086-0_10

  34. Waldron, Shayne. “Refinements of the Peano kernel theorem.” Numerical functional analysis and optimization 20.1-2 (1999): 147-161. https://doi.org/10.1080/01630569908816885 

  35. Gavrea, I., Ivan, M., “A sharp estimate for the Peano error representation”, Applied Mathematics and Computation 252 (2015). https://doi.org/10.1016/j.amc.2014.12.017

  36. Note that for formulas (3) to (5), $(e_1-t)_+^0$ is discontinuous and so is not accepted by $LF$ and $L$ if they map from only continuous functions; thus the results in this section suppose both operators map from bounded functions for $k=0$. Brass and Förster 1998 adequately provides for the case $k=0$, but not Gavrea and Ivan 2015, unfortunately. 

  37. Babenko, Alexander G., and Yuriy V. Kryakin. “Special difference operators and the constants in the classical Jackson-type theorems.” Topics in Classical and Modern Analysis: In Memory of Yingkang Hu. Cham: Springer International Publishing, 2019. 35-46. 

  38. Jaskaran Singh Kaire and Andriy Prymak, “Whitney-type estimates for convex functions”, arXiv:2311.00912 (2023).

  39. It would be interesting to find a version of this inequality that works for any compact interval $[a, b]$. 

  40. Phillips, G.M., Interpolation and Approximation by Polynomials, Springer New York, NY, 2003. https://doi.org/10.1007/b97417 

  41. More generally, any family of functions that is a finite-dimensional linear subspace of continuous or bounded functions. A linear subspace of functions has the following property: if functions $f$ and $g$ are in the family, so is $f+g$, and if $f$ is in the family, so is $f\cdot\alpha$ for every real number $\alpha$. 

  42. This includes the case that $L$ reproduces all functions it maps to (for example, if $L$ maps to polynomials up to degree 5, it reproduces all such polynomials). 

  43. R.A. DeVore and G.G. Lorentz, Constructive Approximation, 1993. https://link.springer.com/book/9783540506270 

  44. E. W. Cheney, Introduction to Approximation Theory, 1998. 

  45. Powell, Michael James David. Approximation theory and methods. Cambridge University Press, 1981. 

  46. Guessab, A., Nouisser, O. & Schmeisser, G. Enhancement of the algebraic precision of a linear operator and consequences under positivity. Positivity 13, 693–707 (2009). https://doi.org/10.1007/s11117-008-2253-4. However, Gavrea and Ivan (“A note on the fixed points of positive linear operators”, Journal of Approximation Theory (227), 2018) pointed out that there are positive linear operators besides the identity that reproduce all polynomials of the form $x^i$ where $i>0$. 

  47. Sablonniere, Paul. “A quadrature formula associated with a univariate spline quasi interpolant.” BIT Numerical Mathematics 47.4 (2007): 825-837. https://doi.org/10.1007/s10543-007-0146-8 

  48. It has been argued that the inequality in this section applies to spline operators that reproduce every polynomial up to degree $d$, say, on subintervals of their domain even though they map to more functions than polynomials on the whole domain. For example, compare—
    Sablonnière, P. “Univariate spline quasi-interpolants and applications to numerical analysis.” Rend. Sem. Mat. Univ. Pol. Torino 63.3 (2005), section 2, and
    Sablonnière, P. “Quadratic spline quasi-interpolants on bounded domains of Rd, d= 1, 2, 3.” Rend. Sem. Mat. Univ. Pol. Torino 61.3 (2003), matter after Remark 1,
    with the operators in Lee, B-G., et al., “Some examples of quasi-interpolants constructed from local spline projectors”, Mathematical methods for curves and surfaces, Oslo 2000 (2000), section 2, which were shown to reproduce all functions they map to.
    But what stops us from subdividing the intervals further into, say, $10n$ subintervals of length $1/(10n)$, and inferring a much smaller error bound? Indeed, in this example, $L(f)$ then still equals a degree-2-or-less polynomial at each of the new subintervals and reproduces polynomials of that kind at each of them.

  49. Bede, Barnabás, and Sorin G. Gal. “Approximation by Nonlinear Bernstein and Favard-Szász-Mirakjan Operators of Max-Product Kind.” Journal of Concrete & Applicable Mathematics 8.1 (2010). 

  50. Bede, Barnabás, Coroianu, Lucian, Gal, Sorin G., Approximation and Shape Preserving Properties of the Bernstein Operator of Max-Product Kind, International Journal of Mathematics and Mathematical Sciences, 2009, 590589, 26 pages, 2009. https://doi.org/10.1155/2009/590589 

  51. Gal, Sorin G., and Constantin P. Niculescu. “Korovkin-type theorems for weakly nonlinear and monotone operators”, arXiv:2206.14102v1 [math.FA], also in Mediterranean Journal of Mathematics 20.2 (2023): 56. https://doi.org/10.1007/s00009-023-02271-y 

  52. $W_n$ can, in principle, be nonlinear instead, but this would require a totally different approach to finding the approximation error, and $L_n$ would then be nonlinear in general. 

  53. Güntürk, C. Sinan, and Weilin Li. “Approximation with one-bit polynomials in Bernstein form”, arXiv:2112.09183 (2021); Constr Approx 57, 601–630 (2023). https://doi.org/10.1007/s00365-022-09608-y 

  54. Micchelli, Charles. “The saturation class and iterates of the Bernstein polynomials”, Journal of Approximation Theory 8, no. 1 (1973): 1-18. 

  55. Wang, Y.Q., Zhang, Y.Y, Liu, J.L., “Expectation identity of the hypergeometric distribution and its application in the calculations of high-order origin moments”, Communications in Statistics–Theory and Methods 52(17), 2023. https://doi.org/10.1080/03610926.2021.2024235 

  56. Weisstein, Eric W. “Central Moment.” From MathWorld–A Wolfram Resource. https://mathworld.wolfram.com/CentralMoment.html 

  57. Ioan Gavrea, Mircea Ivan, “A note on the fixed points of positive linear operators”, Journal of Approximation Theory (227), 2018, https://doi.org/10.1016/j.jat.2017.12.001.

  58. Schaeffer, A. C., and R. J. Duffin. “On some inequalities of S. Bernstein and W. Markoff for derivatives of polynomials.” Bulletin of the American Mathematical Society 44.4 (1938): 289-297. 

  59. G.G. Lorentz, “Inequalities and saturation classes for Bernstein polynomials”, 1963. 

  60. Lorentz, G.G. The degree of approximation by polynomials with positive coefficients. Math. Ann. 151, 239–251 (1963). https://doi.org/10.1007/BF01398235 

  61. Holtz, O., Nazarov, F. & Peres, Y. New Coins from Old, Smoothly. Constr Approx 33, 331–363 (2011). https://doi.org/10.1007/s00365-010-9108-5

  62. $Q_{n,2}$ can also be seen as the Bernstein polynomial of a so-called linear differential operator: $1\cdot f^{(0)} + 0\cdot f^{(1)} - (\lambda(1-\lambda)/(2n))\cdot f^{(2)}$.

  63. Piţul, P., “Evaluation of the Approximation Order by Positive Linear Operators”, dissertation, Universität Duisburg-Essen, 2007.

  64. S.N. Bernstein, “Démonstration du théorème de Weierstrass fondée sur le calcul des probabilités”, Comm. Kharkov Math. Soc. 13, 1-2, 1912. 

  65. Adell, J. A., and J. De la Cal. “Bernstein-type operators diminish the φ-variation.” Constructive Approximation 12.4 (1996): 489-507. https://doi.org/10.1007/BF02437505 

  66. Adell, J. A., and J. De la Cal. “Bernstein-Durrmeyer operators.” Computers & Mathematics with Applications 30.3-6 (1995): 1-14. https://doi.org/10.1016/0898-1221%2895%2900081-X