
208. central limit theorem

Central Limit Theorem


The LLNs state that the sample mean $\bar{X}_n = n^{-1}\Sigma_{k=1}^{n}X_k$, consisting of i.i.d. realisations $X_1, \dots, X_n$ with $\operatorname{E}X_k = \mu$ and $\operatorname{Var}X_k = \sigma^2 < \infty$, converges in prob. and a.s. to $\mu$, respectively. The CLTs, in contrast, state that the centred and scaled sum $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ converges in dist. to the standard normal. Galton’s quincunx ingeniously illustrates the fundamental idea behind the CLT.

I


Let $X \sim \operatorname{Bin}(n,p)$ be a binomially distributed random variable with $\operatorname{E}X = np$ and $\operatorname{Var}X = npq$, where $p>0$ and $p+q=1$. The de Moivre–Laplace theorem says that a normal approximates a binomial: $\binom{n}{h}p^{h}q^{n-h} \simeq \exp(-(h-np)^{2}/(2npq))/\sqrt{2\pi{npq}}$ as $n \to \infty$. One can show that $Z_n = (X-np)/\sqrt{npq} \xrightarrow{d} Z$, and thus $P(Z_n \leq z) \to \Phi(z)$ as $n\to\infty$, where $Z \sim \mathcal{N}(0,1)$ is continuous, and $\Phi(z) = \int_{-\infty}^{z} f_Z(v)\,\mathrm{d}v$ and $f_Z(z) = \exp(-z^2/2)/\sqrt{2\pi}$ are the distribution and the density function of $Z$, respectively. de Moivre initially applied a normal to approximate the number of heads occurring in fair coin tosses (#1), and Laplace continued his legacy.
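
A minimal numerical sketch of the approximation, assuming `numpy` and `scipy` are available (the names `n`, `p`, `h` mirror the formula above):

```python
# Compare the exact binomial pmf with its normal approximation
# (de Moivre–Laplace) for every h = 0, ..., n at once.
import numpy as np
from scipy.stats import binom

n, p = 1_000, 0.3
q = 1 - p
h = np.arange(n + 1)
exact = binom.pmf(h, n, p)  # C(n,h) p^h q^(n-h)
approx = np.exp(-(h - n * p) ** 2 / (2 * n * p * q)) / np.sqrt(2 * np.pi * n * p * q)
print(np.abs(exact - approx).max())  # small, and shrinks as n grows
```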

By definition, for any $Y \sim \mathcal{N}(\mu,\sigma^2)$, the first-order derivative of its density function is given by $f_{Y}^{\prime}(y) = -\sigma^{-2}(y-\mu) \cdot f_{Y}(y)$ such that $\int_{-\infty}^{\infty} f_{Y} = 1$ and $[f^{\prime}_Y(y) / f_{Y}(y)] \cdot [\sigma^{2} / (\mu-y)] = 1$. One can prove the statement by instead computing the analogous discrete derivative: $[(p_{X}(n,h+1)-p_{X}(n,h))/p_{X}(n,h)] \cdot [npq / (np-h)] \to 1$ as $n\to\infty$, where $c>0$ and $h = np + c\sqrt{npq}$ (#2). Although recent computational advancements ease the direct handling of binomial distributions, the theorem endures as the prototype of the CLT, which applies across many different probability distributions and, over numerous trials, even relaxes the i.i.d. assumption.
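
A quick check of that limit under the same assumptions (with a hypothetical choice of `c = 1`; `numpy`/`scipy` assumed):

```python
# The discrete derivative of the binomial pmf at h = np + c*sqrt(npq),
# rescaled by npq/(np - h), should tend to 1 as n grows.
import numpy as np
from scipy.stats import binom

p, q, c = 0.3, 0.7, 1.0
for n in (10 ** 3, 10 ** 5, 10 ** 7):
    h = int(n * p + c * np.sqrt(n * p * q))
    disc = (binom.pmf(h + 1, n, p) - binom.pmf(h, n, p)) / binom.pmf(h, n, p)
    print(n, disc * (n * p * q) / (n * p - h))  # -> 1
```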

Let $(X_n)_{n \in \mathbb{N}}$ consist of ind. random variables $X_{k}$ such that $\operatorname{E}X_k = \mu_k$ and $\operatorname{Var}X_k = \sigma_{k}^2 < \infty$, let $S_n = \Sigma_{k=1}^{n}X_k$ be a partial sum, and let $\operatorname{Var}S_n = \Sigma_{k=1}^{n}\sigma^{2}_k = s_n^2$. If $(X_n)_{n \in \mathbb{N}}$ satisfies Lyapunov’s condition: $\lim_{n\to\infty} s^{-(2+\delta)}_n \Sigma_{k=1}^{n}\operatorname{E}{\vert X_k - \mu_k \vert}^{2+\delta} = 0$ for some $\delta > 0$, then Lyapunov’s CLT states that $Z_n = s^{-1}_n \Sigma_{k=1}^{n} (X_k-\mu_k) = (S_n-\operatorname{E}S_n)/\sqrt{\operatorname{Var}S_n} \xrightarrow{d} Z$ as $n\to\infty$. His study emphasises that the overall behaviour is paramount, permitting a few exceptionally large $(2+\delta)$-moments. That is, the condition in essence ensures that the influence of each $\sigma^{2}_k$ on $s^2_n$ becomes negligible as $n$ grows, mitigating distortions from a few undesirable summands $X_k$.
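
A small simulation sketch with assumed summands $X_k \sim \operatorname{Uniform}(-a_k, a_k)$ and cycling bounds $a_k \in \lbrace 1,\dots,5 \rbrace$, so that $\mu_k = 0$, $\sigma_k^2 = a_k^2/3$ and $\operatorname{E}{\vert X_k \vert}^3 = a_k^3/4$ (`numpy`/`scipy` assumed):

```python
# Check Lyapunov's condition (delta = 1) and simulate Z_n = S_n / s_n.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
n, reps = 500, 10_000
a = 1 + (np.arange(1, n + 1) % 5)  # bounds a_k in {1, ..., 5}
s_n = np.sqrt(np.sum(a ** 2 / 3))  # s_n^2 = sum of sigma_k^2

# Lyapunov ratio sum E|X_k|^3 / s_n^3: small, and -> 0 as n grows.
print(np.sum(a ** 3 / 4) / s_n ** 3)

X = rng.uniform(-a, a, size=(reps, n))  # each row: one realisation of X_1..X_n
Z = X.sum(axis=1) / s_n
print(kstest(Z, "norm"))  # typically a large p-value: consistent with N(0,1)
```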

II


Lyapunov’s CLT remains true as long as the $\sigma^{2}_k$ are not excessively large compared to $s^2_n$. Accordingly, if we assume Lindeberg’s condition: $\lim_{n\to\infty} s_n^{-2} \Sigma_{k=1}^{n}\operatorname{E}[(X_k - \mu_k)^{2} I_{\vert X_k - \mu_k \vert > \varepsilon{s_n}}] = 0$ for all $\varepsilon>0$, then the Lindeberg–Feller CLT guarantees that $Z_n = s_n^{-1}\Sigma_{k=1}^{n}(X_k - \mu_k) \xrightarrow{d} Z$ as $n \to \infty$. While the theorem merely holds the sufficiency ($\Rightarrow$: if), Feller’s condition supplies the partial necessity ($\Leftarrow$: only if) by observing two facts: (i) Lindeberg’s condition implies $\lim_{n\to\infty} \max_k \sigma_k^2/s_n^{2} = 0$ (Feller’s condition); (ii) for a sequence of ind. random variables $X_k$, if $Z_n$ converges in dist. to $Z$ and $\lim_{n\to\infty} \max_k \sigma_k^2/s_n^{2} = 0$ holds, then Lindeberg’s condition follows.
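
For the bounded uniform family assumed in the sketch above, the Lindeberg sum has a closed form: $\operatorname{E}[X_k^2 I_{\vert X_k \vert > t}] = (a_k^3 - t^3)/(3a_k)$ for $t \leq a_k$ and $0$ otherwise, so the sum vanishes exactly once $\varepsilon s_n$ exceeds the uniform bound (cf. #5):

```python
# Evaluate the Lindeberg sum s_n^{-2} sum_k E[X_k^2 I(|X_k| > eps*s_n)]
# exactly for X_k ~ Uniform(-a_k, a_k); it hits 0 once eps*s_n > max a_k.
import numpy as np

eps = 0.1
for n in (10, 100, 1000):
    a = 1 + (np.arange(1, n + 1) % 5)
    s_n = np.sqrt(np.sum(a ** 2 / 3))
    t = np.minimum(eps * s_n, a)  # truncation point, capped at a_k
    trunc = (a ** 3 - t ** 3) / (3 * a)  # E[X_k^2 I(|X_k| > eps*s_n)]
    print(n, trunc.sum() / s_n ** 2)  # -> 0 (exactly 0 for n = 1000)
```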

Suppose $\mu_{k}=0$ w.l.o.g. and let $\varphi_{Z_n}(t) = \operatorname{E}e^{itZ_n}$ for $Z_{n} = S_{n}/s_{n}$ to study the convergence. If $S^{\prime}_{n} = \Sigma_{k=1}^{n}X^{\prime}_{k}$ with $X^{\prime}_{k} = X_{k} - \mu_{k}$, then $Z^{\prime}_{n} = S^{\prime}_{n} / s^{\prime}_{n} = S_{n} / s_{n} = Z_{n}$, where $s^{\prime}_{n} = s_{n}$. If Lindeberg’s condition holds, we have $Z^{\prime}_{n} \xrightarrow{d} Z$ as $n\to\infty$ and hence $Z_{n} \xrightarrow{d} Z$, because $\operatorname{E}[(X_{k})^{2} I_{\vert X_{k} \vert > \varepsilon{s_{n}}}] = \operatorname{E}[(X^{\prime}_{k})^{2} I_{\vert X^{\prime}_{k} \vert > \varepsilon{s^{\prime}_{n}}}]$. We shall concentrate on the following as usual: $\varphi_{Z_{n}}(t) = \prod_{k=1}^{n}\varphi_{X_{k}}(t/s_{n}) = \exp(\Sigma_{k=1}^{n}\log(\varphi_{X_{k}}(t/s_{n}))) \to \exp(-t^2/2) = \varphi_{Z}(t)$ as $n\to\infty$ (#4). It may be worth expanding $(S_{n})_{n\in\mathbb{N}}$ as a row-wise ind. triangular array $S_{n} = X_{n,1} + X_{n,2} + \dots + X_{n,n}$ to make the generality and capability of the theorems manifest.
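
The product form can be traced numerically: for the uniform family assumed earlier, $\varphi_{X_k}(t) = \sin(a_k t)/(a_k t)$ is available in closed form (`numpy` assumed):

```python
# prod_k phi_{X_k}(t / s_n) should approach exp(-t^2 / 2) pointwise in t.
import numpy as np

t = np.linspace(-3, 3, 7)
for n in (10, 100, 1000):
    a = 1 + (np.arange(1, n + 1) % 5)
    s_n = np.sqrt(np.sum(a ** 2 / 3))
    u = np.outer(a, t) / s_n  # u[k, j] = a_k * t_j / s_n
    prod = np.prod(np.sinc(u / np.pi), axis=0)  # np.sinc(x) = sin(pi x)/(pi x)
    print(n, np.abs(prod - np.exp(-t ** 2 / 2)).max())  # -> 0
```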

// For the other half, we bound $C_{1}(n) = \max_k s_{n}^{-2}\sigma_{k}^{2} = \max_k s_{n}^{-2}[\operatorname{E}(X^{2}_{k} I_{\vert X_{k} \vert \leq\varepsilon{s_{n}}}) + \operatorname{E}(X^{2}_k I_{\vert X_k \vert >\varepsilon{s_n}})] \leq \varepsilon^2 + C_{2}(n)$ for any $\varepsilon > 0$, where $C_{2}(n) = s_{n}^{-2}\Sigma_{k=1}^{n}\operatorname{E}(X^{2}_{k} I_{\vert X_{k} \vert >\varepsilon{s_{n}}})$ is the Lindeberg sum. Lindeberg’s condition implies $C_{2}(n) \to 0$ as $n\to\infty$, and since $\varepsilon$ is arbitrary, $C_{1}(n) \to 0$. In particular, $\mu_{k} = 0$ guarantees the existence of $k < n$ with $0 < \sigma^2_{k}$, and each term $s^{-2}_{n} \sigma^2_{k} \to 0$ if $C_{1}(n) \to 0$. The numerator in Lindeberg’s condition aggregates the truncated variances over all $X_{k}$, whereas Feller’s condition controls only the single largest variance $\max_{k} \operatorname{Var}X_{k}$ (#5). At last, suppose $C_{1}(n) \to 0$ and $\varphi_{S_{n}/s_{n}}(t) \to \exp(-t^2/2)$ as $n\to\infty$; then one shows that $C_2(n) \to 0$ as $n\to\infty$.
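
A numerical trace of the bound $C_1(n) \leq \varepsilon^2 + C_2(n)$, again for the assumed uniform family:

```python
# Both C1(n) = max_k sigma_k^2 / s_n^2 and the bound eps^2 + C2(n)
# are computed exactly; C1 stays below the bound and tends to 0.
import numpy as np

eps = 0.1
for n in (10, 100, 1000, 10_000):
    a = 1 + (np.arange(1, n + 1) % 5)
    var = a ** 2 / 3  # sigma_k^2
    s2 = var.sum()  # s_n^2
    C1 = var.max() / s2
    t = np.minimum(eps * np.sqrt(s2), a)
    C2 = ((a ** 3 - t ** 3) / (3 * a)).sum() / s2  # Lindeberg sum
    print(n, C1, eps ** 2 + C2)  # C1 <= eps^2 + C2; C1 -> 0
```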

III


The Lindeberg–Lévy CLT gains popularity due to its simplicity. Suppose $(X_n)_{n \in \mathbb{N}}$ is a sequence of i.i.d. random variables with $\operatorname{E}X_n = \mu$ and $\operatorname{Var}X_n = \sigma^2 < \infty$. The CLT states that $\bar{X}_n$ is approximately $\mathcal{N}(\mu, \sigma^2/n)$ for large $n$, in the sense that $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$, or equivalently, $Z_n = \sigma^{-1}[\sqrt{n}(\bar{X}_n - \mu)] \xrightarrow{d} Z$ as $n\to\infty$. It is fundamental to statistical hypothesis tests whereby we examine an assumption regarding a population parameter such as $\operatorname{E}X$. We can assume that the dist. of the sum of squared residuals $\Sigma_{k=1}^{n}\varepsilon^2_k$ is roughly normal when scaled by $1/n$, even if the residuals $\varepsilon_k = X_k - \hat{X}_k \not\sim \mathcal{N}(0,\sigma^{2}_k)$. A Q–Q plot and/or the Shapiro–Wilk test assesses the normality.
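
A minimal simulation sketch (exponential summands are a hypothetical choice; `numpy`/`scipy` assumed), using the Shapiro–Wilk test mentioned above:

```python
# Standardised means of i.i.d. Exp(1) samples (mu = sigma = 1) should
# look normal; Shapiro–Wilk should raise no alarm for large n.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(1)
n, reps = 1_000, 500
X = rng.exponential(scale=1.0, size=(reps, n))
Z = np.sqrt(n) * (X.mean(axis=1) - 1.0) / 1.0  # Z_n = sqrt(n)(mean - mu)/sigma
print(shapiro(Z))  # typically a large p-value: no evidence against normality
```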

As before, let $Z_n = (S_n - n\mu)/\sqrt{n\sigma^2} = \Sigma_{k=1}^{n}(X_k - \mu)/\sqrt{n\sigma^2} = \Sigma_{k=1}^{n}Y_{k}/\sqrt{n}$, where $Y_{k} = (X_{k} - \mu)/\sigma$ with $\operatorname{E}Y_{k}=0$ and $\operatorname{Var}Y_{k} = 1$. Thus the characteristic function is given by $\varphi_{Z_n}(t) = \varphi_{\Sigma_{k=1}^{n}Y_{k}/\sqrt{n}}(t) = \prod_{k=1}^{n}\varphi_{Y_k}(t/\sqrt{n}) = [\,\varphi_{Y_1}(t/\sqrt{n})\,]^n$. For all $t\in\mathbb{R}$, we can write $\varphi_{Z_n}(t) =[\, 1 - t^{2}/(2n) + o(t^2/n) \,]^n \to \exp(-t^2/2)$ as $n \to \infty$, and the continuity theorem completes the proof. The convergence is uniform in $z \in \mathbb{R}$ in the sense that $\lim_{n\to\infty}\sup_{z\in\mathbb{R}}{\vert P(\sqrt{n}(\bar{X}_n-\mu) \leq z) - \Phi(z/\sigma) \vert} = 0$, while $\sqrt{n}(\bar{X}_n-\mu) = 0$ a.s. if $X$ is degenerate (i.e. $\sigma=0$). Lévy is renowned for employing characteristic functions and the continuity theorem to facilitate proofs.
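
The i.i.d. computation made concrete, assuming $X \sim \operatorname{Exp}(1)$ so that $Y = X - 1$ has the closed form $\varphi_{Y}(t) = e^{-it}/(1-it)$:

```python
# [phi_Y(t / sqrt(n))]^n should approach exp(-t^2 / 2) pointwise in t.
import numpy as np

t = np.linspace(-3, 3, 7)
for n in (10, 100, 10_000):
    u = t / np.sqrt(n)
    phi = (np.exp(-1j * u) / (1 - 1j * u)) ** n  # phi_{Z_n}(t)
    print(n, np.abs(phi - np.exp(-t ** 2 / 2)).max())  # -> 0
```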

The Lindeberg–Lévy CLT is a special case of the above two. Under the i.i.d. assumption, whenever $\sigma^2 < \infty$, Feller’s condition yields $\sigma^2 / n\sigma^2 = 1/n \to 0$ as $n\to\infty$. In fact, if each realisation $\mathbf{X}_{k} = [X_{1}, \dots, X_{m}]^\top$ is an element of $\mathbb{R}^m$, then a sample mean is given by $\mathbf{\bar{X}}_{n} = n^{-1}\Sigma_{k=1}^{n}\mathbf{X}_{k}$, where the summation is component-wise. Provided that $\mathbf{\mu} = \operatorname{E}\mathbf{X} = [\operatorname{E}X_{1}, \dots, \operatorname{E}X_{m}]^{\top}$ and $\operatorname{K}_{\mathbf{X}\mathbf{X}} = \operatorname{Var}\mathbf{X} = \operatorname{Cov}(\mathbf{X},\mathbf{X}) = \operatorname{E}[(\mathbf{X} - \mathbf{\mu})(\mathbf{X} - \mathbf{\mu})^{\top}] < \infty$, the multivariate CLT yields $\sqrt{n}(\mathbf{\bar{X}}_{n}-\mathbf{\mu}) \xrightarrow{d} \mathcal{N}(\mathbf{0}, \operatorname{K}_{\mathbf{X}\mathbf{X}})$ as $n\to\infty$. The Berry–Esseen theorem asserts a convergence rate of order $n^{-1/2}$ given a finite 3rd absolute central moment (#6).
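
A rough empirical look at that rate, with assumed Exp(1) summands and the Kolmogorov distance estimated via `scipy.stats.kstest`:

```python
# sup_z |P(Z_n <= z) - Phi(z)| should shrink roughly like n^{-1/2},
# i.e. d * sqrt(n) stays roughly constant across n.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(2)
reps = 50_000
for n in (4, 16, 64, 256):
    X = rng.exponential(size=(reps, n))  # mu = sigma = 1
    Z = np.sqrt(n) * (X.mean(axis=1) - 1.0)
    d = kstest(Z, "norm").statistic  # Kolmogorov distance estimate
    print(n, d, d * np.sqrt(n))
```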

**


(#1) For $n$ coin tosses, we increment $h$ by one for each $k=1,\dots,n$ with $X_k=\text{Head}$. (#2) Another proof uses Stirling’s formula. (#3) Recall that the skewness $\mu_3 / \sigma^3$ and the kurtosis $\mu_4 / \sigma^4$ are shape measures of the dist. function (i.e. $\mu_r$ is the $r$-th central moment). (#4) The whole proof is in here. (#5) E.g. let $(X_n)_{n\in\mathbb{N}}$ be a set of uniformly bounded $X_k$ such that ${\vert X_k \vert} \leq M$ for all $k \leq n$. If $s_n \to \infty$, then $\operatorname{E}[(X_k - \mu_k)^{2} I_{\vert X_k - \mu_k \vert > \varepsilon{s_n}}] = \int_{\lbrace \vert x - \mu_k \vert > \varepsilon{s_n} \rbrace} (x - \mu_k)^{2}\,\mathrm{d}F_k(x) \leq (2M)^2P(\vert X_k - \mu_k \vert > \varepsilon{s_n}) \leq (2M)^2\sigma^2_k / \varepsilon^2{s^2_n}$, and thus Lindeberg’s condition is met. (#6) Shevtsova (2011) and Tyurin (2010) reduced the constant to 0.56 from the 7.59 of Esseen (1942). Besides, Stein’s method is a general method for obtaining bounds on the distance between two probability distributions.




I gathered these words solely for my own purposes, without any intention to break the rigour of the subjects.
Well, I prefer eating corn in a spiral.