Stochastic Process
The next three notes borrow from Billingsley’s Probability and Measure and Doob’s Stochastic Processes. A stochastic process is a vast object that can easily be sub-grouped. It takes time to comprehend its intricacies, navigate its components, and unveil the underlying patterns or behaviors that govern its evolution over time or across different dimensions.
I
Let $T$ be a totally ordered set, $(\Omega, \mathcal{F}, P)$ be a probability space, and $(\mathbb{S}, \mathcal{S})$ be a state space. A stochastic process $X = (X_{t})_{t \in T}$ is a collection of random variables $X_{t}: \Omega \to \mathbb{S}$, where both the index set $T$ and the state space $\mathbb{S} \subseteq \mathbb{R}$ may be discrete or continuous (#1). One can index a discrete and a continuous process by $T = \mathbb{N}$ and $T = [0,\infty)$, respectively. Because $X:T\times\Omega\to\mathbb{R}$ is a function of $t \in T$ and $\omega \in \Omega$, we can write $X_{t}$, $X(t)$, $X_{t}(\omega)$, or $X(t,\omega)$, interchangeably. In addition, a function $X(\,\cdot, \omega): T \to \mathbb{R}$ with a fixed $\omega \in \Omega$ is called a sample path or realisation of $X$, and the law of each $X_{t}$ (described by a pdf and/or pmf) is a measure on the Borel $\sigma$-algebra of the state space.
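As a small illustration of the two-argument view $X(t,\omega)$ (a sketch assuming NumPy, not part of the original notes), the snippet below stores a discrete-time process as a matrix whose rows index sampled $\omega$'s and whose columns index $t$: a row is a sample path, and a column is the random variable $X_t$ evaluated across outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)

n_paths, n_times = 5, 100          # finitely many omegas and time points, for illustration
increments = rng.normal(size=(n_paths, n_times))
X = np.cumsum(increments, axis=1)  # X[i, t] ~ X(t, omega_i): a Gaussian partial-sum process

sample_path = X[0, :]   # X(., omega_0): one realisation, a function of t
slice_at_t  = X[:, 10]  # X_10(.): a random variable, viewed across the sampled omegas
```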
Full information about $X$ is accessible from the $\sigma$-algebra $\sigma\bigl(\bigcup_{t=1}^{\infty}\sigma(X_t)\bigr)$, assuming $X$ is defined over infinite time. If $X$ is on a finite time interval $T = \lbrace 1, 2, \dots, n \rbrace$, then useful information about $X$ is conveyed by the $\sigma$-algebra generated by the sets $\bigcap_{t=1}^{n}\lbrace \omega: X_{t}(\omega) \leq x_{t} \rbrace$ over all thresholds $x_t$. In particular, for all $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$, the joint cdf of $X$ is given by $F_X(x) = \mathbb{P}(X_1 \leq x_1, \dots, X_n \leq x_n)$, i.e. the law of the process evaluated on the rectangle $(-\infty, x]$, and the collection of such joint cdfs over all finite choices of time points is called the family of finite-dimensional distributions (f.d.d.) of $X$. The joint cdf can be interpreted as the cdf of a random vector $\boldsymbol{X} = (X_{1}, \dots, X_{n}) \in \mathbb{R}^n$, whose law is a product measure whenever the selected timestamps yield independent random variables.
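For example (a worked special case for illustration, not from the source texts): if the selected time points give independent coordinates, the joint cdf factorises into marginals, $F_{\boldsymbol{X}}(x_1, \dots, x_n) = \mathbb{P}(X_{1} \leq x_1, \dots, X_{n} \leq x_n) = \prod_{k=1}^{n} F_{X_{k}}(x_k)$, which is exactly the statement that the law of $\boldsymbol{X}$ is the product of the marginal laws.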
// https://www.stat.cmu.edu/~cshalizi/754/notes/lecture-02.pdf // The marginal distribution of $X_t$ can depend on the marginals of its neighbours, so we should work with joint distributions. What causes a problem is that these relationships can change along with $t$.
II
If the f.d.d. of $X$ are time-shift invariant, $F_{(X_{t_1}, \dots, X_{t_n})}(x_1, \dots, x_n) = F_{(X_{t_1+\tau}, \dots, X_{t_n+\tau})}(x_1, \dots, x_n)$ for all $n \in \mathbb{N}$, all time points $t_1, \dots, t_n \in T$, and all shifts $\tau$, then $X$ is said to be strict-sense stationary (sss). If the invariance holds only for $n \in \lbrace 1, 2, \dots, N \rbrace$, then $X$ is $N$-th order stationary (#2). Whereas, given that the 1st moment and the 2nd central moment of $X$ are defined by $\mu_X(t) = \operatorname{E}X_t$ and $\gamma_{XX}(t,s) = \operatorname{Cov}(X_{t}, X_{s})$ for all $t,s \in T$, respectively, if $\mu_X(t) = \mu_X$ and $\gamma_{XX}(t, s) = \gamma_{XX}(\tau, 0)$ with $\tau = t - s$ for $s \leq t$, then $X$ is weak-sense stationary (wss). Note that the autocovariance yields the autocorrelation $\rho_{XX}(t,s) = \gamma_{XX}(t,s)[\sigma_{X}(t)\sigma_{X}(s)]^{-1}$, and in practice we usually work with wss because sss is highly restrictive.
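A minimal empirical check of the wss conditions, assuming NumPy and a hypothetical stationary AR(1) process (my example, not from the notes): the ensemble mean should be roughly constant in $t$, and the lag-$\tau$ autocovariance should not depend on $t$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_times, phi = 20_000, 200, 0.6

# Stationary AR(1): X_t = phi * X_{t-1} + e_t, started from its stationary distribution.
X = np.empty((n_paths, n_times))
X[:, 0] = rng.normal(scale=np.sqrt(1 / (1 - phi**2)), size=n_paths)
for t in range(1, n_times):
    X[:, t] = phi * X[:, t - 1] + rng.normal(size=n_paths)

mu_t = X.mean(axis=0)      # ensemble mean at each t: roughly 0 for all t
tau = 5                    # check one fixed lag
gamma_t = np.mean((X[:, tau:] - mu_t[tau:]) * (X[:, :-tau] - mu_t[:-tau]), axis=0)
print(mu_t[:5], gamma_t[:5])   # mean ~ 0; lag-5 autocovariance ~ phi^5 / (1 - phi^2), not depending on t
```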
Let $X$ be stationary, $\bar{x} = {1\over{2T}}\int_{-T}^{T} X(t)\, \mathrm{d}t$ be the time average, and $\mu_{X} = \operatorname{E}X(t) = \int_{-\infty}^{\infty} zf_{X_t}(z)\,\mathrm{d}z$ be the ensemble average. If $\lim_{T\to\infty}\bar{x} = \mu_X$, then $X$ is called mean-ergodic, and we can evaluate the statistical properties of $X$ using only a single long sample path (observing all possible samples at some fixed time $t$ may be difficult or even impossible). On the other hand, let $\bar{\gamma}_{XX}(\tau) = {1\over{2T}}\int_{-T}^{T}\,(X(t+\tau)-\mu_X)(X(t)-\mu_X) \, \mathrm{d}t$ be the time-average estimate of the autocovariance; if $\lim_{T\to\infty}\bar{\gamma}_{XX}(\tau) = \gamma_{XX}(\tau)$, then $X$ is autocovariance-ergodic. A process that is both mean-ergodic and autocovariance-ergodic is wide-sense ergodic (#3).
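A sketch of the mean-ergodicity check under the same hypothetical AR(1) assumption: the time average along one long path should be close to the ensemble average over many independent paths at a fixed time.

```python
import numpy as np

rng = np.random.default_rng(2)
phi = 0.6

def ar1_path(n, rng):
    """Simulate one stationary AR(1) path of length n."""
    x = np.empty(n)
    x[0] = rng.normal(scale=np.sqrt(1 / (1 - phi**2)))
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

time_avg = ar1_path(200_000, rng).mean()                               # one long sample path
ensemble_avg = np.mean([ar1_path(50, rng)[-1] for _ in range(5_000)])  # many paths, fixed t
print(time_avg, ensemble_avg)   # both should be close to the true mean 0
```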
For all $s, t \in T$ with $s \leq t$, a stochastic process $X$ has independent increments (i.e. differences) if $X_t - X_s$ is independent of $\mathcal{F}_s$, and stationary increments if $F_{X_t - X_s} = F_{X_{t-s} - X_0}$. Moreover, $X$ has independent and stationary increments if and only if $X$ is a partial sum process built from a sequence $U = (U_{t})_{t \in T}$ of i.i.d. random variables such that $X_n = \sum_{t=1}^{n}U_{t}$. A pdf (f.d.d.) of such a process $X$ on $\mathbb{R}^n$ is given by $f_{X}(x) = f_{X_1}(x_{1})f_{X_2-X_1}(x_{2}-x_{1}) \dots f_{X_n - X_{n-1}}(x_{n}-x_{n-1})$ when the distribution of $X_{t}$ on $\mathbb{R}$ is known for all $t \in T$. In particular, if $X \in L^2$, there exist $\mu \in \mathbb{R}$ and $\sigma \in [0,\infty)$ such that $\mu_{X}(t) = \mu{t}$ and $\gamma_{XX}(t,s) = \sigma^{2}\min(s,t)$ for all $s,t \in T$ (#4).
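A quick Monte Carlo check of the second-moment claim (assuming NumPy; the Gaussian increments are just one convenient choice): for a partial-sum process with i.i.d. increments of mean $\mu$ and variance $\sigma^2$, the sample covariance of $(X_s, X_t)$ should approach $\sigma^2\min(s,t)$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 0.5, 2.0
n_paths, n_times = 50_000, 30

U = rng.normal(loc=mu, scale=sigma, size=(n_paths, n_times))  # i.i.d. increments U_t
X = np.cumsum(U, axis=1)                                      # X_n = sum_{t<=n} U_t

s, t = 10, 25   # 1-based times -> columns s-1, t-1
print(X[:, t - 1].mean(), mu * t)                             # E X_t = mu * t
cov = np.cov(X[:, s - 1], X[:, t - 1])[0, 1]
print(cov, sigma**2 * min(s, t))                              # Cov(X_s, X_t) = sigma^2 * min(s, t)
```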
III
A random walk on $\mathbb{Z}$, introduced by Karl Pearson in 1905, is a fundamental example of a process with stationary and independent increments. Specifically, the partial sum process $(S_{t})_{t \in T}$, consisting of $S_0 = 0$ and $S_t = \sum_{k=1}^{t}X_k$ with independent random variables $X_{k}$ uniform on $\lbrace -1, +1 \rbrace$, is known as a simple random walk, with $\operatorname{E}S_t = 0$ and $\operatorname{Var}S_{t} = t$ (#5). More generally, suppose $(I_{t})_{t \in T}$ consists of Bernoulli trials $I_{t} = (X_{t} + 1)/2$ with success probability $p$; then $R_{t} = \sum_{k=1}^{t} I_{k} \sim \operatorname{Binom}(t,p)$. In particular, if $X_{t} = 1$ with probability $p$ and $X_{t} = -1$ with probability $q=1-p$, then $\operatorname{E}X_{t} = 2p-1$ and $\operatorname{Var}X_{t} = 4pq$, so $p$ determines the random variable $Y_{t} = 2R_{t}-t$ through $\operatorname{E}Y_{t} = t(2p-1)$ and $\operatorname{Var}Y_{t} = 4tpq$ (#6).
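A short simulation of $Y_t = 2R_t - t$ (assuming NumPy; the values of $p$ and $t$ are arbitrary choices), checking $\operatorname{E}Y_{t} = t(2p-1)$ and $\operatorname{Var}Y_{t} = 4tpq$.

```python
import numpy as np

rng = np.random.default_rng(4)
p, t, n_paths = 0.7, 100, 100_000

I = rng.random((n_paths, t)) < p        # Bernoulli(p) steps I_k
R_t = I.sum(axis=1)                     # R_t ~ Binom(t, p)
Y_t = 2 * R_t - t                       # the +/-1 walk evaluated at time t

print(Y_t.mean(), t * (2 * p - 1))      # ~ t(2p - 1)
print(Y_t.var(), 4 * t * p * (1 - p))   # ~ 4tpq
```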
A Poisson process, whose inter-arrival times are i.i.d. $\operatorname{Exp}(\lambda)$, is another 2nd order process with stationary and independent increments. The counting process $X_t$ has $\operatorname{E}X_{t} = \operatorname{Var}X_{t} = \lambda{t}$ for all $t \in [0,\infty)$, and its sample paths are monotonically increasing (non-decreasing); Remark. given a set of $\mathbb{C}$-valued random variables $X_t \in L^2$ with $\operatorname{E}X_t = 0$, if the inner product on the $L^2$-space is defined by $\langle X_{t_1}, X_{t_2} \rangle = \operatorname{E}[X_{t_1}\overline{X}_{t_2}]$ (and thus ${\left\lVert X_t \right\rVert}_{2} = \sqrt{\langle X_t, X_t \rangle}$) for all $t \in T$, then the closed linear span of $(X_t)_{t \in T} \subseteq L^2$ forms a Hilbert space $H$. If the index set $T$ consists of non-negative real numbers, in turn, $(X_t)_{t \in T}$ can be regarded as a curve $C$ existing in $H$;
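A sketch of the Poisson counting process built from i.i.d. $\operatorname{Exp}(\lambda)$ inter-arrival times (assuming NumPy; $\lambda$, $t$, and the path count are arbitrary choices), checking that the mean and variance of the count at time $t$ are both close to $\lambda t$.

```python
import numpy as np

rng = np.random.default_rng(5)
lam, t, n_paths = 2.0, 10.0, 50_000

# Generate enough Exp(lam) inter-arrival times so their cumulative sum exceeds t on every path.
n_max = int(3 * lam * t) + 50
arrivals = np.cumsum(rng.exponential(scale=1 / lam, size=(n_paths, n_max)), axis=1)
N_t = (arrivals <= t).sum(axis=1)       # number of arrivals in [0, t]

print(N_t.mean(), N_t.var(), lam * t)   # both ~ lambda * t
```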
OLS can also be studied in the time-series literature, provided the stochastic process (e.g. time-series or panel data) is stationary and the usual linear-regression assumptions (stated for cross-sectional data) hold (#7). Non-stationarity usually hides in the trend, seasonality, and/or heteroskedasticity. One can use log-differencing to stabilise both the variance (via the log) and the mean (via differencing), and use the acf / pacf plots to inspect the autocorrelations of a time series. Refer to the STL decomposition $y_{t} = S_{t} + T_{t} + R_{t}$ (or, in the multiplicative case, $\log{y_{t}} = \log{S_{t}} + \log{T_{t}} + \log{R_{t}}$), where $S_{t}$, $T_{t}$, $R_{t}$ are the seasonal, trend, and remainder parts, respectively.
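As an illustration (assuming pandas and statsmodels are installed; the monthly series below is synthetic, invented for the example), the snippet differences a trending series to stabilise its mean and runs an additive STL decomposition $y_t = S_t + T_t + R_t$.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(6)
idx = pd.date_range("2015-01", periods=120, freq="MS")      # 10 years of monthly data
trend = 0.5 * np.arange(120)
seasonal = 3 * np.sin(2 * np.pi * np.arange(120) / 12)
y = pd.Series(trend + seasonal + rng.normal(scale=0.5, size=120), index=idx)

dy = y.diff().dropna()           # first difference: removes the linear trend (stabilises the mean)
res = STL(y, period=12).fit()    # additive decomposition y_t = S_t + T_t + R_t
print(res.trend.head(), res.seasonal.head(), res.resid.head())
```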
IV
Let $(\Omega, \mathcal{F}, P)$ be a probability space, $X = (X_t)_{t \in T}$ be a stochastic process, and $F = (\mathcal{F}_t)_{t \in T}$ be a filtration, i.e. a family of sub-$\sigma$-algebras $\mathcal{F}_{0} \subseteq \mathcal{F}_1 \subseteq \dots \subseteq \mathcal{F}_{s} \subseteq \mathcal{F}_{t} \subseteq \dots \subseteq \mathcal{F}$. We say $X$ is adapted to the filtration $F$ if $X_{t}:\Omega \to \mathbb{R}$ is $\mathcal{F}_{t}$-measurable, and predictable with respect to $F$ if $X_{t+1}$ is $\mathcal{F}_{t}$-measurable, for all $t \in T$. The natural filtration $\mathcal{F}_{t} = \sigma(X_{s} \,\vert\, s \leq t)$ is often used, being the smallest filtration to which $X$ is adapted; it contains all information about $X$ up to time $t$. In addition, given two filtrations $F$ and $G$ on the same measurable space $(\Omega, \mathcal{F})$, we say $F$ is coarser than $G$ and $G$ is finer than $F$, denoted by $F \preceq G$, if $\mathcal{F}_t \subseteq \mathcal{G}_t$ for all $t \in T$.
We call a random variable $\tau: \Omega \to T$ on a filtered probability space $(\Omega, \mathcal{F}, F, P)$ a stopping time with respect to $F$ if $\lbrace \tau \leq t \rbrace \in \mathcal{F}_t$ for all $t \in T$, and all information accumulated up to $\tau$ is encoded by the stopped $\sigma$-algebra $\mathcal{F}_{\tau} = \lbrace A \in \mathcal{F}_{\infty}: A \cap \lbrace \tau \leq t \rbrace \in \mathcal{F}_t, \,\forall t \in T \rbrace$, where $\mathcal{F}_{\infty} = \sigma(\bigcup_{t \in T}\mathcal{F}_{t})$. Equivalently, $\tau$ is a stopping time if and only if $\lbrace \tau > t \rbrace \in \mathcal{F}_t$ for all $t \in T$. If $\tau$ is a stopping time w.r.t. $F$, and $F$ is coarser than $G$, then $\tau$ is also a stopping time with respect to $G$. Moreover, if $\tau_1, \tau_2$ are stopping times w.r.t. $F$, then (i) $\tau_1 \vee \tau_2 = \max \lbrace \tau_1, \tau_2 \rbrace$; (ii) $\tau_1 \wedge \tau_2 = \min \lbrace \tau_1, \tau_2 \rbrace$; (iii) $\tau_1 + \tau_2$; are also stopping times w.r.t. $F$.
Loosely speaking, $\tau$ decides whether or not to stop evolving $X$ at time $t$ based only on information up to $t$ (i.e. no future information). For instance, if $\mathcal{F}_t = \mathcal{F}$ for all $t \in T$, then every random time $\tau$ is a stopping time. Similarly, if $\mathcal{F}_t = \lbrace \Omega, \emptyset \rbrace$ for all $t \in T$, then a stopping time must satisfy $\tau(\omega) = t$ for all $\omega \in \Omega$ with some $t \in T_{\infty}$ (#8). Whilst these examples are impractical, the first hitting time $\tau_B = \inf \lbrace t: X_t \in B \rbrace$ is a useful stopping time, since $\lbrace \tau_B = t \rbrace = \lbrace X_0 \notin B, \dots, X_{t-1} \notin B, X_t \in B \rbrace = \bigcap_{r=0}^{t-1} {\lbrace X_r \in B \rbrace}^c \cap \lbrace X_t \in B \rbrace \in \mathcal{F}_{t}$. Whereas $\tau_{B}^{\prime} = \sup \lbrace t: X_t \in B \rbrace$, the last exit time, is not a stopping time, since the full trajectory of $X$ throughout $T$ must be known in order to have a stopping strategy.
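A small simulation of the first hitting time $\tau_B$ for a simple symmetric walk, with $B = \lbrace x : x \geq b \rbrace$ (assuming NumPy; the threshold $b$ and horizon are arbitrary choices): whether to stop at time $t$ depends only on the path up to $t$.

```python
import numpy as np

rng = np.random.default_rng(7)
b, horizon, n_paths = 10, 2_000, 5_000

steps = rng.choice((-1, 1), size=(n_paths, horizon))   # +/-1 steps with p = 1/2
S = np.cumsum(steps, axis=1)                           # S_t for t = 1..horizon
hit_mask = (S >= b)
tau = np.where(hit_mask.any(axis=1),                   # first t with S_t >= b, else infinity
               hit_mask.argmax(axis=1) + 1, np.inf)

hit = tau[np.isfinite(tau)]
print(len(hit) / n_paths, hit.mean())   # fraction of paths that reach b, and their average tau_B
```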
**
(#1) Let $D = \text{discrete}, C = \text{continuous}, T = \text{time}, S = \text{space}$, so that we get the four combinations $(DT,DS), (DT,CS), (CT,DS), (CT,CS)$. (#2) If $X$ is $N$-th order stationary, then it is $M$-th order stationary for every $M<N$. (#3) Wide-sense ergodicity presupposes wss; recall that sss (with finite second moments) implies wss, not conversely. (#4) This holds because the mean and variance functions of $X$ satisfy Cauchy’s functional equation. (#5) White noise is an example of a sequence of i.i.d. (continuous-valued) random variables. (#6) If a simple symmetric random walk is permitted to continue walking forever on $\mathbb{Z}$, it will cross every point an infinite number of times. (#7) Otherwise, the model will capture the non-stationarity within the processes, turning it into a spurious regression. (#8) I.e. $\tau$ is a constant.
http://www.scieng.net/tech/12904?page=96
I gathered these words solely for my own purposes, without any intention to break the rigour of the subjects.
Well, I prefer eating corn in a spiral.