Hikikomori


202. random variable

Random Variable


A random variable $X$ maps an outcome of an experiment into a measurable space on which its distribution function lives. The symbol $X$ is distinguished from an ordinary algebraic variable $x$, which is typically assigned a fixed constant. That is, drawing samples $X_1, X_2, \dots, X_n$ amounts to gathering elements from randomness, as each value remains indeterminate until the experiment is executed.

I


Given $(\Omega, \mathcal{F})$ and $(\mathbb{S}, \mathcal{S})$, a map $X: \Omega \to \mathbb{S}$ is an $\mathbb{S}$-valued random variable if $X^{-1}(A) = \lbrace X \in A \rbrace = \lbrace \omega: X(\omega) \in A \rbrace \in \mathcal{F}$ for all $A \in \mathcal{S}$, and we simply call $X: \Omega \to \mathbb{R}$ a random variable. $X$ is discrete if $P(X \in A) = 1$ for some finite or countable $A \subset \mathbb{R}$, and continuous if $P(X = x) = 0$ for all $x \in \mathbb{R}$. Every $X$ has a distribution function (cdf) $F_{X}(x) = P(X \leq x) = \mathbb{P}((-\infty, x])$, where $\mathbb{P}$ is the probability measure induced on $\mathbb{R}$ by $X$ (#1). If $X$ is discrete, then $F_{X}(x) = \sum_{x_{j} \leq x} p_{x_{j}}$, where $p_{x_{j}} = P(X = x_j)$ is a probability mass function (pmf). If continuous, we have $F_X(x) = \int_{-\infty}^{x} f_X(t)\, \mathrm{d}t$, where $f_X = {\mathrm{d} \over \mathrm{d}x}F_X$ is a density function (pdf).
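As a quick sanity check, here is a sketch of mine (assuming Python with numpy/scipy; the Binomial(5, 0.3) and N(0, 1) choices are arbitrary) that recovers the cdf from a pmf and from a pdf:

```python
# Sketch: recover the cdf from a pmf (discrete) and from a pdf (continuous).
# Assumes numpy/scipy; Binomial(5, 0.3) and N(0, 1) are arbitrary choices.
import numpy as np
from scipy import stats, integrate

# Discrete: F_X(x) = sum of p_{x_j} over {x_j <= x}.
x_vals = np.arange(6)
pmf = stats.binom.pmf(x_vals, n=5, p=0.3)
assert np.allclose(np.cumsum(pmf), stats.binom.cdf(x_vals, n=5, p=0.3))

# Continuous: F_X(x) = integral of f_X from -infinity to x.
x = 1.25
cdf_from_pdf, _ = integrate.quad(stats.norm.pdf, -np.inf, x)
assert np.isclose(cdf_from_pdf, stats.norm.cdf(x))
```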

If $X: \Omega \to \mathbb{R}$ is a random variable and $g: \mathbb{R} \to \mathbb{R}$ is a Borel measurable function, then the composition $Y = g \circ X: \Omega \to \mathbb{R}$ is a random variable. Namely, $P(Y \in B) = P(X \in g^{-1}(B))$ for every Borel $B \subset \mathbb{R}$, where $g^{-1}(B) = \lbrace x: g(x) \in B \rbrace \in \mathcal{B}_\mathbb{R}$ and $\lbrace \omega: g(X(\omega)) \in B \rbrace = X^{-1}(g^{-1}(B)) \in \mathcal{F}$; this change of variable allows us to focus on $X$. Moreover, if $X$ is discrete, then $Y$ is discrete, and thus $p_Y(y) = \sum_{\lbrace x\,:\, g(x)=y \rbrace} p_X(x)$. Note that if $g$ is injective (a one-to-one map), the sum reduces to a single term, $p_Y(y) = p_X(g^{-1}(y))$. Whereas, if $X$ is continuous, then $Y$ can be discrete, continuous, or neither. In the case where $Y$ turns out to be discrete, its pmf takes the analogous form $p_Y(y) = \int_{\lbrace x\,:\, g(x) = y \rbrace} f_X(x)\, \mathrm{d}x$.
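A minimal sketch of the discrete change of variable, assuming Python and a hypothetical pmf pushed through the non-injective $g(x) = x^2$:

```python
# Sketch: push a discrete pmf through g(x) = x**2 (a hypothetical choice),
# so p_Y(y) = sum over {x : g(x) = y} of p_X(x).
from collections import defaultdict

p_X = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}   # pmf of X (made-up numbers)
g = lambda x: x * x                                 # not injective on this support

p_Y = defaultdict(float)
for x, p in p_X.items():
    p_Y[g(x)] += p                                  # collapse all x with the same g(x)

print(dict(p_Y))   # {4: 0.2, 1: 0.4, 0: 0.4} up to floating point
```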

In $n$-dimensional spaces, we use a random vector $\boldsymbol{X}: \Omega \to \mathbb{R}^n$ as a multivariate function, and it also has a joint distribution function $F_{\boldsymbol{X}}(\boldsymbol{x}) = P(\boldsymbol{X} \leq \boldsymbol{x})$, where $\boldsymbol{x} = (x_1, \dots, x_n)$ and $\lbrace \boldsymbol{X} \leq \boldsymbol{x} \rbrace = \bigcap_{k=1}^{n} \lbrace X_k \leq x_k \rbrace = \lbrace X_1 \leq x_1, \dots, X_n \leq x_n \rbrace$ (#2). For example, if $\boldsymbol{X} = (X_1, X_2)$, the distribution of $X_1$ is obtained by summing (or integrating) the joint distribution over the other variable $X_2$. Both $F_{X_1}(x_1) = P(X_1 \leq x_1, X_2 < \infty)$ and $F_{X_2}(x_2) = P(X_1 < \infty, X_2 \leq x_2)$ are called marginal distribution functions of $\boldsymbol{X}$. We then want to ask whether knowing that one event has occurred tells us anything about whether other events occur.
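A toy sketch of a joint pmf table, its marginals, and the joint cdf (the numbers are made up; Python/numpy assumed):

```python
# Sketch: a toy joint pmf for (X1, X2), with marginals obtained by summing out
# the other coordinate, and the joint cdf via cumulative sums along both axes.
import numpy as np

joint = np.array([[0.125, 0.125],    # rows index x1 in {0, 1}, columns index x2 in {0, 1}
                  [0.250, 0.500]])
assert np.isclose(joint.sum(), 1.0)

p_x1 = joint.sum(axis=1)             # marginal pmf of X1: sum over x2
p_x2 = joint.sum(axis=0)             # marginal pmf of X2: sum over x1

F = joint.cumsum(axis=0).cumsum(axis=1)   # F(x1, x2) = P(X1 <= x1, X2 <= x2)
print(p_x1, p_x2, F[1, 1])           # [0.25 0.75] [0.375 0.625] 1.0
```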

II


We say a sequence $(A_n)_{n\in\mathbb{N}}$ of events is independent (ind.) if $P(\,\bigcap_{k \in I} A_k) = \prod_{k \in I} P(A_k)$ for every finite index set $I \subset \mathbb{N}$ (#3). A sequence $(X_n)_{n\in\mathbb{N}}$ of random variables is ind. if $P(\,\bigcap_{k=1}^{n} \lbrace X_k \in B_k \rbrace ) = \prod_{k=1}^{n} P(X_k \in B_k)$ for every $n$, where each $B_k \subset \mathbb{R}$ is Borel measurable. Desirably, by virtue of independence, the joint cdf factorises as $F_{\mathbf{X}}(\mathbf{x}) = \prod_{k=1}^{n} F_{X_k}(x_k)$ for any $\mathbf{x} \in \mathbb{R}^n$. If, in addition, ind. $X$ and $Y$ admit pdfs, then Fubini's theorem yields $F_{X,Y}(x,y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{X,Y}(t,s)\, \mathrm{d}s\, \mathrm{d}t = \int_{-\infty}^{x} f_{X}(t)\, \mathrm{d}t \int_{-\infty}^{y} f_{Y}(s)\, \mathrm{d}s = F_{X}(x)F_{Y}(y)$, and thus $f_{X,Y}(x,y) = f_{X}(x)f_{Y}(y)$. Likewise, if ind. random variables $X$ and $Y$ admit pmfs, then we have $p_{X,Y}(x,y) = p_{X}(x)p_{Y}(y)$.
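A small sketch of the factorisation under independence, assuming Python/numpy and arbitrary marginal pmfs:

```python
# Sketch: for independent discrete X and Y, the joint pmf is the outer product of
# the marginals, and the joint cdf factorises as F_{X,Y}(x, y) = F_X(x) F_Y(y).
import numpy as np

p_X = np.array([0.25, 0.75])                 # pmf of X on {0, 1} (made-up)
p_Y = np.array([0.5, 0.25, 0.25])            # pmf of Y on {0, 1, 2} (made-up)
joint = np.outer(p_X, p_Y)                   # p_{X,Y}(x, y) = p_X(x) p_Y(y)

F_X, F_Y = p_X.cumsum(), p_Y.cumsum()
F_joint = joint.cumsum(axis=0).cumsum(axis=1)
assert np.allclose(F_joint, np.outer(F_X, F_Y))   # F_{X,Y} = F_X * F_Y everywhere
```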

However, if $A$ and $B$ are dependent (jointly distributed), then we work with the conditional probability $P(B \vert A) = P(A\cap{B}) / P(A)$, obtained from the multiplication law $P(A \cap B) = P(A \vert B)P(B) = P(B \vert A)P(A)$, where $P(A) > 0$ is the normalising factor. Bayes' theorem gives the reverse conditional probability $P(A \vert B) = P(B \vert A)P(A)/P(B)$, in which $P(B \vert A)$ plays the role of the likelihood; both conditionals collapse, $P(A \vert B) = P(A)$ and $P(B \vert A) = P(B)$, whenever $A$ and $B$ are ind. Assume that $(A_n)_{n\in\mathbb{N}}$ is a (countable) partition of $\Omega$ (#4); then the law of total probability allows us to compute the marginal $P(B) = \sum_{k}P(A_k \cap B)= \sum_{k}P(B \vert A_k)P(A_k)$.
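A sketch of Bayes' theorem together with the law of total probability, with made-up priors and likelihoods (Python assumed):

```python
# Sketch: Bayes' theorem with the law of total probability, using made-up numbers:
# a partition {A1, A2, A3} of Omega with priors P(A_k) and likelihoods P(B | A_k).
prior = {"A1": 0.5, "A2": 0.3, "A3": 0.2}           # P(A_k); sums to 1
lik   = {"A1": 0.10, "A2": 0.40, "A3": 0.80}        # P(B | A_k)

# Marginal (total) probability: P(B) = sum_k P(B | A_k) P(A_k).
p_B = sum(lik[k] * prior[k] for k in prior)

# Reverse conditionals: P(A_k | B) = P(B | A_k) P(A_k) / P(B).
posterior = {k: lik[k] * prior[k] / p_B for k in prior}
print(p_B, posterior)   # P(B) is about 0.33; the posteriors sum to 1
```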

For instance, if $X$ and $Y$ are discrete and dep., the conditional cdf of $Y$ given $X=x$ is given by $F_{Y \vert X=x}(y) = \sum_{y_j\leq{y}}p_{Y \vert X=x}(y_j)$, where $p_{Y \vert X=x}(y) = p_{X,Y}(x,y)/p_X(x)$ is a conditional pmf. One may read the marginal pmf $p_Y(y) = \sum_{x} p_{Y \vert X=x}(y)\,p_X(x)$ as an average of the conditionals weighted by $p_X$. If, however, $X$ and $Y$ are continuous, then $F_{Y \vert X=x}(y) = \int_{-\infty}^{y} f_{Y \vert X=x}(t)\, \mathrm{d}t$, where $f_{Y \vert X=x}(y) = f_{X,Y}(x,y)/f_X(x)$ is a conditional pdf of $Y$, interpreted as conditioning on $X \in (x-\varepsilon, x+\varepsilon)$ in the limit $\varepsilon \to 0$. In general, a joint pmf directly produces marginals: $\boldsymbol{Z} = (X, Y)$ with a joint pmf $p_{\boldsymbol{Z}}(x,y)$ has marginal pmfs $p_X(x) = \sum_{y} p_{\boldsymbol{Z}}(x,y)$ and $p_Y(y) = \sum_{x}p_{\boldsymbol{Z}}(x,y)$.
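The conditional pmf and the averaging back to the marginal, sketched on a hypothetical joint table (Python/numpy assumed):

```python
# Sketch: conditional pmf p_{Y|X=x}(y) = p_{X,Y}(x, y) / p_X(x) from a toy joint table
# (rows index x, columns index y; the numbers are illustrative only).
import numpy as np

joint = np.array([[0.125, 0.375],    # p_{X,Y}(x, y) for x in {0, 1}, y in {0, 1}
                  [0.250, 0.250]])
p_X = joint.sum(axis=1)              # marginal pmf of X

cond_Y_given_X = joint / p_X[:, None]          # each row is p_{Y | X=x}(.)
assert np.allclose(cond_Y_given_X.sum(axis=1), 1.0)

# Averaging the conditionals recovers the marginal: p_Y(y) = sum_x p_{Y|X=x}(y) p_X(x).
p_Y = (cond_Y_given_X * p_X[:, None]).sum(axis=0)
assert np.allclose(p_Y, joint.sum(axis=0))
```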

III


Suppose $(\mathcal{F}_n)_{n\in\mathbb{N}}$ is the sequence of sub-$\sigma$-algebras $\mathcal{F}_n = \sigma(A_k: k \leq n)$ generated by a sequence of events $(A_n)_{n\in\mathbb{N}}$. If $\mathcal{G}_n = \sigma(A_k: k \geq n)$, then $\mathcal{G} = \bigcap_{n=1}^{\infty} \mathcal{G}_n$ is the tail $\sigma$-algebra of the sequence. Namely, $\mathcal{G}$ contains only information beyond time $n$ for every $n \in \mathbb{N}$, since each $\mathcal{G}_n$ contains the information beyond time $n$. Furthermore, $\mathcal{F}_n \subseteq \mathcal{F}_{n+1}$ for all $n\in\mathbb{N}$, and $\mathcal{G} \subset \mathcal{F} = \sigma(A_k: k \geq 1) = \sigma(\bigcup_{k \geq 1}\mathcal{F}_k)$. The Kolmogorov zero-one law states that, if the $A_n$ are independent, the probability of any tail event $A \in \mathcal{G}$ is either $0$ or $1$; thus whether tail events occur is already determined even if a finite number of the $\mathcal{F}_n$ at the front of the row are effaced. We refine this zero-one law below.

The same occurs in $\mathbb{R}$. Suppose $(X_n)_{n\in\mathbb{N}}$ are ind. random variables and $\mathcal{S}_{n} = \sigma(X_{k} : k \geq n)$. If $B \in \bigcap_{n=1}^{\infty}\mathcal{S}_{n}$ is a tail event, then $B$ belongs to every one of the $\sigma$-algebras $\lbrace \mathcal{S}_{n}, \mathcal{S}_{n+1}, \dots \rbrace$. That is, $B$ is independent of any finite subcollection $\lbrace X_{k} : 1 \leq k \leq n \rbrace$, or equivalently, omitting the front of the sequence has no impact on the investigation (i.e. the measure) of a tail event. The zero-one law implies that the event $\lbrace \lim_{n\to\infty} X_n \ \text{exists} \rbrace$ has probability $0$ or $1$ (which one depends on the $F_{X_n}$), and if it has probability $1$, then the limit is almost surely equal to some constant $c$, since a tail-measurable random variable is a.s. constant. Indeed, the event that the sequence converges, or that its series $\sum_n X_n$ converges, is a tail event (#5).
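A crude finite-horizon illustration (not a proof, Python/numpy assumed) that a tail quantity ignores the front of the sequence: the limiting sample mean of iid coin tosses is unchanged when the first tosses are overwritten.

```python
# Sketch: the limiting sample mean is tail-measurable, so effacing/replacing any
# finite prefix does not move it; here we only approximate the limit at large n.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
tosses = rng.integers(0, 2, size=n)          # iid Bernoulli(1/2)

tampered = tosses.copy()
tampered[:100] = 1                            # overwrite the front of the sequence

print(tosses.mean(), tampered.mean())         # both are close to 1/2 for large n
```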

Note that $\limsup_{n\to\infty}A_n = \bigcap_{n\geq1}\bigcup_{k \geq n} A_k \in \mathcal{G}$ is a tail event, because for any $N \in \mathbb{N}$ it coincides with $\limsup_{n\to\infty}A_{n+N}$ and hence lies in every $\mathcal{G}_N$. We recall the Borel-Cantelli lemmas, which decide whether $P(\limsup_{n\to\infty}A_n)$ is $0$ or $1$. The 1st lemma merely uses the fact that the probability of a union of tail segments is at most the sum of the probabilities of the segments: if $\sum_n P(A_n) < \infty$, then $P(\limsup_{n\to\infty} A_n) = 0$. In the 2nd lemma, the independence assumption makes the tail $\sigma$-algebra $P$-trivial (every tail event has probability $0$ or $1$); and whereas the Kolmogorov zero-one law does not by itself tell us which of $0$ or $1$ is correct, the 2nd lemma lets us conclude that the event occurs almost surely under its sufficient condition $\sum_n P(A_n) = \infty$.
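A simulation sketch of the two Borel-Cantelli regimes, assuming Python/numpy and the arbitrary choices $p_n = 1/n^2$ and $p_n = 1/n$:

```python
# Sketch: contrast the two Borel-Cantelli regimes with independent uniforms U_n and
# A_n = {U_n < p_n}. If sum p_n < infinity (p_n = 1/n^2), only finitely many A_n occur
# a.s.; if the A_n are independent and sum p_n = infinity (p_n = 1/n), infinitely many do.
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
U = rng.random(N)
n = np.arange(1, N + 1)

count_summable = np.count_nonzero(U < 1.0 / n**2)   # typically only a few, at small n
count_divergent = np.count_nonzero(U < 1.0 / n)     # grows like log N, without bound

print(count_summable, count_divergent)
```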

**


(#1) If $H_0$ is tested on a continuous $X$, then the one-sided p-value is given by the upper tail $\overline{F}_X(x) = 1 - F_X(x) = \operatorname{P}(X > x)$.
(#2) $X_1, \dots, X_n$ may or may not be defined on a common $\Omega$. If $F_{X_i} \neq F_{X_j}$ for $i \neq j$, then $(x_1, \dots, x_n) \in \mathcal{S}_1 \times \dots \times \mathcal{S}_n$.
(#3) If $A$ and $B$ are ind., then so are $A$ and $B^c$, $A^c$ and $B$, and $A^c$ and $B^c$; if $X_1$ and $X_2$ are ind., then so are $g(X_1)$ and $h(X_2)$ for Borel measurable $g, h$.
(#4) $\lbrace A_n \rbrace_{n\in\mathbb{N}}$ is a partition of $\Omega$ iff (i) $\bigcup_{k}A_k = \Omega$ and (ii) $A_i \cap A_j = \emptyset$ for all $i \neq j$; we may assume each $A_k$ is measurable.
(#5) Given an infinite number of coin tosses, $X_n \in \lbrace H, T \rbrace$ and $\mathcal{B} = \lbrace X_{n}=H, \dots, X_{n+99}=H \; \text{i.o.} \rbrace$ are a generator and a tail event, respectively.
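Footnote (#1) in code, assuming scipy and a standard normal test statistic purely for illustration:

```python
# Sketch: a one-sided p-value as the upper tail 1 - F_X(x), here for a N(0, 1) statistic.
from scipy import stats

x = 1.96                                  # observed test statistic (made-up)
p_value = 1 - stats.norm.cdf(x)           # P(X > x)
assert abs(p_value - stats.norm.sf(x)) < 1e-12   # sf is the survival function 1 - cdf
print(p_value)                            # about 0.025
```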




I gathered these words solely for my own purposes, without any intention to break the rigour of the subjects.
Well, I prefer eating corn in a spiral.