Probability Theory
Stein’s Real Analysis III outlined an outer measure, browsed its limitations, and adopted measurable sets with sigma algebras. I have perused Amir Dembo’s Stanford STAT310 and Allan Gut’s Probability: A Graduate Course along with $\lbrace$W1, W2, W3$\rbrace$ to acquire the modern measure-theoretic foundations of probability theory.
I
Probability theory is grounded in the framework of random experiments (i.e. trials) whose outcome is inherently uncertain prior to execution. In practice, an experiment can be repeated under temporally or spatially analogous (or even identical) conditions (#1). Confidence gained from such repetitions can be further strengthened by, or weighed against, our prior knowledge. The subject matter pertains to fully specified mathematical objects, in contrast to statistics, which deals with incompletely specified ones. In the latter, we engage in the drawing, analysis, and visualisation of samples to determine an unbiased estimator for one or more unknown parameters.
We call a set $\Omega$ a sample space, an element $\omega \in \Omega$ an elementary event, and a subset $A \subseteq \Omega$ an event. $A$ is said to occur (not occur) if the realised outcome satisfies $\omega \in A$ ($\omega \notin A$). An event $A$ can serve as a generator of a $\sigma$-algebra $\mathcal{F} = \sigma(A)$, the smallest collection of subsets of $\Omega$ containing $A$ such that (i) $\Omega \in \mathcal{F}$; (ii) if $A \in \mathcal{F}$, then $A^{c} \in \mathcal{F}$; (iii) if $A_{n} \in \mathcal{F}$ for all $n \in \mathbb{N}$, then $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{F}$. More generally, a $\sigma$-algebra $\mathcal{F}$ is said to be countably generated if $\mathcal{F} = \sigma(\lbrace A_0, A_1, \dots \rbrace)$ for some countable collection of generators $A_n \subseteq \Omega$ (#2). The fundamental idea of a $\sigma$-algebra is to encode the information obtainable from its generators, and the pair $(\Omega, \mathcal{F})$, called a measurable space, turns out to be extremely fruitful.
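On a finite sample space the generated $\sigma$-algebra can be computed by brute force, which makes the example in (#2) easy to check. The following is only a minimal sketch: the helper name `generate_sigma_algebra` is my own, and it relies on the assumption that $\Omega$ is finite, so that closure under complements and pairwise unions already gives closure under countable unions.

```python
from itertools import combinations

def generate_sigma_algebra(omega, generators):
    """Smallest family of subsets of a finite `omega` that contains
    `generators` and is closed under complement and union."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        # Close under complement.
        for a in current:
            c = omega - a
            if c not in family:
                family.add(c)
                changed = True
        # Close under pairwise union (sufficient on a finite space).
        for a, b in combinations(current, 2):
            u = a | b
            if u not in family:
                family.add(u)
                changed = True
    return family

omega = {1, 2, 3}
f0 = generate_sigma_algebra(omega, [{1}])      # sigma(A_0) with A_0 = {1}
f1 = generate_sigma_algebra(omega, [{2, 3}])   # sigma(A_1) with A_1 = {2, 3}
print(sorted(sorted(s) for s in f0))
print(f0 == f1)
```

Both generators produce the same four-element family, while generating from $\lbrace 1 \rbrace$ and $\lbrace 2 \rbrace$ together already yields the whole power set of $\Omega$.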
We are free to take the power set $\mathcal{F} = 2^{\Omega}$ for a finite or countable $\Omega$, but the same choice on an uncountable $\Omega$ (e.g. $\mathbb{R}$) is problematic: if every elementary event carried a positive probability $p_\omega > 0$, the sum over uncountably many of them would give $P(\Omega) = \sum_{\omega\in\Omega} p_\omega = \infty$, which violates the axioms. Hence we take the Borel $\sigma$-algebra $\mathcal{B}_{\mathbb{R}} = \sigma(\lbrace \mathcal{O} \subseteq \mathbb{R} : \mathcal{O} \text{ open} \rbrace)$, the smallest $\sigma$-algebra containing every open subset (all of which are measurable), and build the Euclidean measurable space $(\mathbb{R}^n, \mathcal{B}_{\mathbb{R}^n})$ in the same way. Besides, the monotone class theorem and the $\pi$-$\lambda$ theorem are available for an algebra and a $\pi$-system, respectively. If $\mathcal{A}$ is an algebra, then the smallest $\sigma$-algebra $\sigma(\mathcal{A})$ is precisely equal to the smallest monotone class $\mathcal{M}(\mathcal{A})$ containing it (#3); consequently, any monotone class $\mathcal{M} \supseteq \mathcal{A}$ also contains $\sigma(\mathcal{A})$.
II
A probability measure $P: \mathcal{F} \to [0,1]$ defined on a measurable space $(\Omega, \mathcal{F})$ must satisfy the Kolmogorov axioms: (i) $P(A) \geq 0$ for any $A \in \mathcal{F}$; (ii) $P(\Omega) = 1$; (iii) if $(A_{n})_{n\in\mathbb{N}}$ is a sequence of pairwise disjoint events, then $P(\bigcup_{n=1}^{\infty} A_{n}) = \sum_{n=1}^{\infty} P(A_n)$ (#4). It inherits the useful properties of a measure, including (i) zero probability: $P\left(\emptyset\right) = 0$; (ii) complement: $P(A^c) = 1 - P(A)$; (iii) monotonicity: if $A \subseteq B$, then $P(A) \leq P(B)$; (iv) sub-additivity: if $A \subseteq \bigcup_n A_n$, then $P(A) \leq \sum_n P(A_n)$. In particular, monotonicity guarantees that $P(A_n)$ converges along any monotone sequence $(A_n)_{n\in\mathbb{N}}$ of events, so $P$ behaves well on the limit of such a sequence whenever it exists. Note that non-empty sets with zero probability may exist.
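The listed properties can be checked by hand on a small finite space. The sketch below assumes a fair six-sided die as the sample space (my own choice of example, as are the event names) and uses exact fractions so that the equalities hold without rounding.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))            # assumed example: a fair six-sided die
p = {w: Fraction(1, 6) for w in omega}    # point masses p_omega

def prob(event):
    """P(A) as the sum of the point masses of the outcomes in A."""
    return sum((p[w] for w in event), Fraction(0))

even = frozenset({2, 4, 6})
low = frozenset({1, 2})
two = frozenset({2})
five = frozenset({5})

normalised = prob(omega) == 1                                 # axiom (ii)
additive_ok = prob(two | five) == prob(two) + prob(five)      # axiom (iii), disjoint events
empty_ok = prob(frozenset()) == 0                             # zero probability
complement_ok = prob(omega - even) == 1 - prob(even)          # complement
monotone_ok = two <= even and prob(two) <= prob(even)         # monotonicity
subadditive_ok = prob(even | low) <= prob(even) + prob(low)   # sub-additivity

print(normalised, additive_ok, empty_ok, complement_ok, monotone_ok, subadditive_ok)
```

Sub-additivity is strict here because `even` and `low` overlap in the outcome 2, whereas additivity holds with equality for the disjoint pair `two` and `five`.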
By construction, if $P\left(\Omega\right) = \sum_{\omega\in\Omega}\, p_{\omega} = 1$ with $p_\omega \geq 0$ for all $\omega$ in a countable $\Omega$, then the probability of $A$ occurring is given by $P(A) = \sum_{\omega\in{A}}\, p_\omega$. For instance, if the uniform probability $p_\omega = {\lvert \Omega \rvert}^{-1}$ is assigned to every element of a finite $\Omega \subset \mathbb{R}$, then $P$ is the normalised counting measure $P(A) = \lvert A \rvert / \lvert \Omega \rvert$. In the continuous sense, the Gaussian density $f(x) = {1 \over \sigma\sqrt{2\pi}} e^{-{1 \over 2}({x - \mu \over \sigma})^2}$ assigns unequal weights to subsets of $\mathbb{R}$ through $P(A) = \int_A f(x)\,dx$. In the discrete sense, given the set of non-negative integers $\Omega = \mathbb{N}_{0}$ and an expected rate of occurrences $\lambda > 0$, the Poisson probability $p_k = {\lambda^k\over{k!}}e^{-\lambda}$, which can be derived as a limit of the binomial distribution, is the probability that an event occurs $k$ times.
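As a numerical sanity check of the Poisson case, the sketch below sums the point masses $p_k$ and compares them with binomial probabilities for $n$ trials and success probability $\lambda / n$. The rate $\lambda = 2$, the truncation at 60 terms, and the helper names are my own assumptions for illustration.

```python
import math

def poisson_pmf(k, lam):
    """p_k = lam^k e^{-lam} / k!"""
    return lam**k * math.exp(-lam) / math.factorial(k)

def binomial_pmf(k, n, q):
    """Probability of k successes in n independent trials with success probability q."""
    return math.comb(n, k) * q**k * (1 - q)**(n - k)

lam = 2.0

# The point masses sum to 1 (truncated series; the remaining tail is negligible).
total = sum(poisson_pmf(k, lam) for k in range(60))

# P(A) for the event A = {0, 1, 2}, i.e. at most two occurrences.
prob_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))

def gap(n):
    """Largest difference between Binomial(n, lam/n) and Poisson(lam) over k = 0..10."""
    return max(abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(11))

print(total, prob_at_most_two)
print(gap(10), gap(100), gap(10_000))
```

The gap shrinks as $n$ grows, which is the sense in which the Poisson probability arises from the binomial distribution with $n \to \infty$ and $n q = \lambda$ held fixed.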
If there exists $B \in \mathcal{F}$ (i.e. $B$ is measurable) such that $A \subseteq B$ and $P(B) = 0$, then $A \subseteq \Omega$, which may or may not be an element of $\mathcal{F}$, is a null set. We often restrict a probability space $(\Omega, \mathcal{F}, P)$ to a complete probability space for the sake of measurability of null sets: the completion treats $A$ as measurable, with $P(A) = 0$, whenever such a $B$ exists. Besides, an event $A \in \mathcal{F}$ with $P(A) > 0$ is an atom if every measurable $B \subseteq A$ satisfies $P(B) = 0$ or $P(B) = P(A)$, i.e. $A$ cannot be split into smaller pieces of positive probability (#5). This hints that only an uncountable $\Omega$ can carry a non-atomic probability space. E.g. the counting measure $\mu$ on $\Omega = \lbrace 1,2,\dots, 10 \rbrace$ has every singleton $\lbrace j \rbrace$ as an atom, whereas the Lebesgue measure $\lambda$ on $\Omega = \mathbb{R}$ has no atoms.
III
Let $(A_{n})_{n \in \mathbb{N}}$ be a sequence of sets with $A_n \subseteq \Omega$. If $(A_{n})_{n\in\mathbb{N}}$ is increasing (decreasing), i.e. if $A_{n} \subseteq A_{n+1}$ ($A_{n} \supseteq A_{n+1}$) for all $n$, then $\lim_{n \to \infty} A_{n} = \bigcup_{n=1}^{\infty} A_{n}$ ($\lim_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} A_{n}$). For an arbitrary sequence, the unions of front segments are increasing, $\bigcup_{k=1}^{n} A_{k} \subseteq \bigcup_{k=1}^{n+1} A_k$, so their limit is $\lim_{n \to \infty} \bigcup_{k=1}^{n} A_k = \bigcup_{k=1}^{\infty} A_k$; likewise, the unions of tail segments are decreasing, $\bigcup_{k=n}^{\infty} A_{k} \supseteq \bigcup_{k=n+1}^{\infty} A_{k}$, so their limit is $\lim_{n\to\infty} \bigcup_{k=n}^{\infty} A_{k} = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_{k} = \limsup_{n\to\infty} A_{n}$. Similarly, the intersections of tail segments are increasing, with limit $\lim_{n\to\infty} \bigcap_{k=n}^{\infty}A_{k} = \bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty} A_{k} = \liminf_{n\to\infty} A_{n}$. If this is equal to $\limsup_{n\to\infty} A_{n}$, then the sequence converges and $\liminf_{n\to\infty} A_{n} = \limsup_{n\to\infty} A_{n} = \lim_{n\to\infty} A_{n} = A$. Notice that the first equality is not true in general; only $\liminf_{n\to\infty} A_{n} \subseteq \limsup_{n\to\infty} A_{n}$ always holds.
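To see that $\liminf$ and $\limsup$ can differ, it helps to compute both for a concrete sequence. The sketch below assumes a periodic sequence of finite sets, so that each infinite tail union or intersection is determined by one full period; the function names and the alternating example are my own.

```python
from functools import reduce

def tail_union(seq, n, period):
    """Union of A_k over k >= n for a sequence with the given period."""
    return frozenset().union(*(seq(k) for k in range(n, n + period)))

def tail_intersection(seq, n, period):
    """Intersection of A_k over k >= n for a sequence with the given period."""
    return reduce(frozenset.intersection, (seq(k) for k in range(n, n + period)))

def lim_sup(seq, period):
    """Intersection over n of the tail unions; n = 1..period suffices by periodicity."""
    return reduce(frozenset.intersection,
                  (tail_union(seq, n, period) for n in range(1, period + 1)))

def lim_inf(seq, period):
    """Union over n of the tail intersections; n = 1..period suffices by periodicity."""
    return frozenset().union(*(tail_intersection(seq, n, period)
                               for n in range(1, period + 1)))

def alternating(n):
    """A_n = {0, 1} for odd n and {1, 2} for even n (period 2)."""
    return frozenset({0, 1}) if n % 2 == 1 else frozenset({1, 2})

print(sorted(lim_inf(alternating, 2)), sorted(lim_sup(alternating, 2)))
```

Here only the outcome 1 lies in all but finitely many $A_n$, while 0, 1 and 2 each lie in infinitely many of them, so the sequence has no limit.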
$P$ is continuous along monotone sequences of events (#6), in analogy with a real-valued function with well-defined limits. That is, if $A_n \nearrow A$ ($A_n \searrow A$) as $n \to \infty$, then $P(A_n) \nearrow P(A)$ ($P(A_n) \searrow P(A)$) as $n \to \infty$. More generally, based on the chain of probabilistic inequalities $P\left(\liminf_{n \to \infty}A_{n}\right) \leq \liminf_{n \to \infty} P\left(A_{n} \right) \leq \limsup_{n \to \infty} P\left(A_{n} \right) \leq P\left(\limsup_{n \to \infty} A_{n}\right)$, if $A_{n} \to A$ as $n \to \infty$, then $P\left(A_{n} \right) \to P(A)$ as $n \to \infty$. The interchange of a probability measure and a limit of unions is also allowed, thus we can write $\lim_{n \to \infty} P(\bigcup_{k=1}^{n} A_{k}) = P(\bigcup_{k=1}^{\infty} A_{k})$ and $\lim_{n \to \infty} P(\bigcup_{k=n}^{\infty} A_{k}) = P(\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_{k}) = P(\limsup_{n\to\infty} A_{n})$.
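Continuity along monotone sequences can be watched in a toy model. The sketch below assumes $\Omega = \lbrace 1, 2, 3, \dots \rbrace$ with point masses $p_k = 2^{-k}$ (my own choice, as are the function names), so that $A_n = \lbrace 1, \dots, n \rbrace \nearrow \Omega$ and $B_n = \lbrace n, n+1, \dots \rbrace \searrow \emptyset$.

```python
from fractions import Fraction

def point_mass(k):
    """p_k = 2^{-k} on Omega = {1, 2, 3, ...}; the masses sum to 1."""
    return Fraction(1, 2**k)

def prob_initial_segment(n):
    """P(A_n) for A_n = {1, ..., n}, an increasing sequence with union Omega."""
    return sum((point_mass(k) for k in range(1, n + 1)), Fraction(0))

def prob_tail(n):
    """P(B_n) for B_n = {n, n+1, ...}, the complement of A_{n-1}."""
    return 1 - prob_initial_segment(n - 1)

increasing = [prob_initial_segment(n) for n in range(1, 11)]
decreasing = [prob_tail(n) for n in range(1, 11)]
print(increasing[-1], decreasing[-1])
```

The values $P(A_n) = 1 - 2^{-n}$ climb towards $P(\Omega) = 1$ and the values $P(B_n) = 2^{-(n-1)}$ fall towards $P(\emptyset) = 0$, as continuity from below and from above predict.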
The Borel–Cantelli lemmas give sufficient conditions for the events of a sequence $(A_n)_{n\in\mathbb{N}}$ to occur infinitely often (i.o.) with probability zero or one. The 1st lemma states that if $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(A_n \text{ i.o.}) = P(\limsup_{n \to \infty} A_n) \leq \lim_{n\to\infty}\sum_{k=n}^{\infty}P(A_k) = 0$ (#7), signifying that almost every $\omega\in\Omega$ belongs to at most finitely many of the events $A_n$. The 2nd lemma states that if $\sum_{n=1}^{\infty} P(A_n) = \infty$ and the events $A_n$ are independent, then $P(A_n \text{ i.o.}) = 1$. Nevertheless, despite their usefulness in mathematical proofs, the infinite monkey theorem, which Émile Borel brought to the public, met with sharp criticism, and the 2nd lemma was considered to be quite nonsensical.
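The mechanism behind both lemmas can be traced with exact arithmetic. The sketch below rests on two assumed choices of my own: $P(A_k) = 2^{-k}$ for the summable case, and independent events with $P(A_k) = 1/k$ for the divergent case; the function names are likewise hypothetical.

```python
from fractions import Fraction

def tail_sum(n, m):
    """Sum of P(A_k) = 2^{-k} over k = n..m; this bounds P(union of A_k, n <= k <= m)."""
    return sum((Fraction(1, 2**k) for k in range(n, m + 1)), Fraction(0))

def prob_none_occur(n, m):
    """P(no A_k occurs for n <= k <= m) for independent events with P(A_k) = 1/k."""
    result = Fraction(1)
    for k in range(n, m + 1):
        result *= 1 - Fraction(1, k)   # independence: probabilities of complements multiply
    return result

# 1st lemma: the tail sums stay below 2^{-(n-1)}, which vanishes as n grows.
print(tail_sum(5, 50), tail_sum(20, 50))

# 2nd lemma: the product telescopes to (n - 1) / m, which vanishes as m grows.
print(prob_none_occur(5, 100), prob_none_occur(5, 10_000))
```

In the first case the union bound forces $P(\bigcup_{k \geq n} A_k) \to 0$, hence $P(A_n \text{ i.o.}) = 0$. In the second case $\sum_k 1/k = \infty$ and, for every fixed $n$, the probability that none of $A_n, \dots, A_m$ occurs tends to $0$ as $m \to \infty$, so $P(\bigcup_{k \geq n} A_k) = 1$ for all $n$ and $P(A_n \text{ i.o.}) = 1$.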
**
(#1) In time: consecutive coin tosses; in space: throwing multiple coins at once. (#2) Let $\Omega = \lbrace 1,2,3 \rbrace$, $A_{0} = \lbrace 1 \rbrace$, $A_{1} = \lbrace 2,3 \rbrace$, then $\mathcal{F} := \sigma(A_0) = \sigma(A_1) = \lbrace \emptyset, \lbrace 1 \rbrace, \lbrace 2,3 \rbrace, \lbrace 1,2,3 \rbrace \rbrace$. (#3) $\mathcal{M}$ is a collection of sets that is closed under countable monotone unions and intersections. (#4) $A$ and $B$ are independent: $P(A \cap B) = P(A)P(B)$; $A$ and $B$ are disjoint (mutually exclusive): $A \cap B = \emptyset$. (#5) $P$ is non-atomic (has no atoms) if every $B \in \mathcal{F}$ with $P(B) > 0$ contains a measurable $A \subset B$ s.t. $0 < P(A) < P(B)$. (#6) $f:D \to \mathbb{R}$ is continuous at $x_0 \in D$ iff $\forall\varepsilon>0\, \exists\delta>0\, \forall{x}\in{D}$: $|x-x_0|<\delta \implies |f(x)-f(x_0)|<\varepsilon$. (#7) Let $X = [0,1]$, $E_k = [0, 1/2^k]$, and $\lambda$ be the Lebesgue measure, then $\sum_{k=1}^{\infty}\lambda(E_k)=1 < \infty$, and so $\lambda\left(\lbrace x \in X: x \in E_{k} \text{ for infinitely many } k \rbrace \right) = \lambda(\lbrace 0 \rbrace) = 0$.
I gathered these notes solely for my own purposes, without any intention of compromising the rigour of the subjects.
Well, I prefer eating corn in a spiral.