Markov Chain
The Markov property formally states that, given the present, the future is independent of the past. Weather forecasting is an example: tomorrow's weather depends primarily on the current weather rather than on the entire recorded history. This link gives rise to many sampling algorithms built upon Monte Carlo methods and the Markov property.
I
We call a stochastic process $X = (X_t)_{t \in T}$ consisting of $\mathbb{S}$-valued random variables $X_{t}:\Omega \to \mathbb{S}$ a Markov process if it satisfies the Markov property $P(X_{t} = x_{t} \,\vert\, X_{t-1} = x_{t-1}, \dots, X_{1} = x_1) = P(X_{t} = x_{t} \,\vert\, X_{t-1} = x_{t-1})$ for all $t \in T$. We primarily work with a discrete-time Markov process on a countable state space $\mathbb{S}$, called a Markov chain, so the joint pmf of $t$ consecutive realisations factorises as $P(X_{t}, X_{t-1}, \dots, X_{1}) = P(X_{t}\,\vert\,X_{t-1}) P(X_{t-1}\,\vert\,X_{t-2}) \cdots P(X_{1}) = \prod_{s=1}^{t-1}P(X_{s+1}\,\vert\,X_{s}) P(X_{1})$. We focus on the initial state probabilities $\pi_{j}(1) = P(X_{1} = j)$ and the $1$-step transition probabilities $p_{ij}(s) = P(X_{s+1} = j \,\vert\, X_{s} = i)$, where $i, j \in \mathbb{S}$ (#1).
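As a concrete illustration, here is a minimal sketch (in Python with NumPy) of simulating a chain from an initial pmf $\pi(1)$ and a transition matrix. The two weather states and their probabilities are made-up illustrative values, not taken from any real data.

```python
# A minimal sketch of simulating a Markov chain; the weather states and
# the transition probabilities below are made-up illustrative values.
import numpy as np

rng = np.random.default_rng(0)

states = ["sunny", "rainy"]          # S = {0, 1}
pi1 = np.array([0.9, 0.1])           # initial state pmf, pi_j(1)
P = np.array([[0.8, 0.2],            # p_ij = P(X_{s+1} = j | X_s = i)
              [0.4, 0.6]])

def simulate(pi1, P, t):
    """Draw X_1, ..., X_t using only the previous state (Markov property)."""
    x = [rng.choice(len(pi1), p=pi1)]
    for _ in range(t - 1):
        x.append(rng.choice(P.shape[1], p=P[x[-1]]))
    return x

print([states[i] for i in simulate(pi1, P, 10)])
```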
If $\mathbb{S}$ contains $n$ distinct elements (i.e. it is finite), then we can write a transition matrix $\mathbf{P} = (p_{ij})$ with $\Sigma_{j \in \mathbb{S}}p_{ij} = 1$, where $\mathbf{P}$ is an $n \times n$ row-stochastic matrix (non-negative entries, each row summing to $1$). If, in addition, the chain is time-homogeneous, i.e. $P(X_{t} = j \,\vert\, X_{t-1} = i) = P(X_{t-1} = j \,\vert\, X_{t-2} = i)$ for all $t > 2$, then the Chapman-Kolmogorov equation yields $\mathbf{P}(k)\mathbf{P}(s) = \mathbf{P}(k+s) = [p_{ij}(k+s)]$. That is, the $k$-step transition matrix equals the $k$-th power $\mathbf{P}^k = \mathbf{P}(k) = [p^{(k)}_{ij}(1)]$ (#2). For instance, let $t=3$ and $h \in \mathbb{S}$ be an intermediate state between $i$ and $j$; then $p^{(2)}_{ij}(1) = P(X_{3} = j \,\vert\, X_{1} = i) = \Sigma_{h}P(X_{3}=j \,\vert\, X_{2}=h) P(X_{2}=h \,\vert\, X_{1}=i) = \Sigma_{h}p_{ih}(1)p_{hj}(1)$, which marginalises the intermediate state out of the joint pmf.
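The following sketch checks the Chapman-Kolmogorov relation numerically, reusing the illustrative weather matrix from the previous snippet.

```python
# Chapman-Kolmogorov in code: the k-step transition matrix is the k-th
# matrix power of P (sketch reusing the illustrative matrix from above).
import numpy as np

P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

P2 = P @ P                                # p^(2)_ij = sum_h p_ih * p_hj
assert np.allclose(P2, np.linalg.matrix_power(P, 2))
print(P2[0, 1])                           # P(X_3 = 1 | X_1 = 0)
```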
We now consider a state pmf $\pi(t) = [\pi_{j}(t)]$, where $\pi_{j}(t) = \Sigma_{i}p_{ij}\pi_{i}(t-1)$. If $\pi_{j}(t)\to\pi_{j}$ and $\pi_{i}(t-1)\to\pi_{i}$ as $t\to\infty$, then $\lim_{t\to\infty}\pi(t) = \pi$ is a stationary state pmf or a steady state, where $\pi$ is a $(1 \times n)$ row vector with $\Sigma_{j}\pi_{j} = 1$ and $\pi_{j} = \Sigma_{i}p_{ij}\pi_{i}$. In fact, for an irreducible and aperiodic chain, the Perron-Frobenius theorem orders the eigenvalues of $\mathbf{P}$ as $1 = \vert \lambda_{1} \vert > \vert \lambda_{2} \vert \geq\cdots\geq \vert \lambda_{n} \vert$, and the matrix form $\pi\mathbf{P}=\pi$ of the stationarity condition is of the left-eigenvector form $v^{\top}A = \lambda v^{\top}$. Thus, the left eigenvector $q_{1}$ satisfying $q_{1}^{\top}\mathbf{P} = \lambda_{1} q^{\top}_{1}$ is, after normalisation, a steady state, and the rate of convergence of $\pi(t)$ is of the order $\vert\lambda_2\vert/\vert\lambda_1\vert = \vert\lambda_2\vert$. A limiting distribution $\pi(1)\mathbf{P}^{\infty}=\pi$ of a time-homogeneous Markov chain, if it exists, is unique.
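A minimal sketch of recovering the steady state as the normalised left eigenvector of $\mathbf{P}$ for $\lambda_{1} = 1$ (equivalently, a right eigenvector of $\mathbf{P}^{\top}$), again with the illustrative matrix from above.

```python
# Sketch: the steady state is the left eigenvector of P for lambda_1 = 1,
# i.e. a right eigenvector of P.T, normalised so that its entries sum to 1.
import numpy as np

P = np.array([[0.8, 0.2],
              [0.4, 0.6]])

vals, vecs = np.linalg.eig(P.T)
q1 = np.real(vecs[:, np.argmax(np.real(vals))])  # eigenvector for lambda = 1
pi = q1 / q1.sum()                               # normalise: sum_j pi_j = 1
assert np.allclose(pi @ P, pi)                   # global balance: pi P = pi
print(pi)                                        # [2/3, 1/3]
```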
II
We introduce a few terms for reading the formal statements that describe the behaviour of a Markov chain and the conditions under which a state pmf becomes steady. A state $j$ is accessible from a state $i$, denoted by $i \rightarrow j$, if $p^{(s)}_{ij} > 0$ for some $s\geq{1}$. In contrast, $i$ is absorbing if $p_{ii} = 1$, so that $p_{ij} = 0$ for all $j \neq i$ and the chain never leaves $i$. If both $i \rightarrow j$ and $j \rightarrow i$, written $i \leftrightarrow j$, we say $i$ and $j$ communicate with each other (#3) and belong to the same class. Intuitively, a Markov chain consists of one or more disjoint communicating classes, and a system with a single class is said to be irreducible, in which case all states in the chain communicate with each other.
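Communicating classes are exactly the strongly connected components of the directed graph with an edge $i \to j$ whenever $p_{ij} > 0$, so one way to find them is via SciPy's graph routines. The three-state chain below is a made-up example with one absorbing state.

```python
# Sketch: communicating classes are the strongly connected components of the
# directed graph with an edge i -> j whenever p_ij > 0.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

P = np.array([[0.5, 0.5, 0.0],   # states 0 and 1 communicate;
              [0.5, 0.5, 0.0],   # state 2 is absorbing (p_22 = 1)
              [0.0, 0.0, 1.0]])

n_classes, labels = connected_components(csr_matrix((P > 0).astype(float)),
                                         connection="strong")
print(n_classes, labels)         # 2 classes; irreducible iff n_classes == 1
```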
Suppose $f_{i} = P(X_{k} = i \;\text{for some}\; k>1 \,\vert\, X_{1}=i)$ is the probability that a Markov chain departing from $i$ returns to $i$ at least once; then $i$ is recurrent if $f_{i} = 1$, and transient if $f_{i} < 1$. That is, a recurrent state occurs infinitely often, and a transient state occurs only finitely often. Each excursion from a transient state $i$ escapes for good with probability $1-f_{i}$, so whether a given excursion escapes follows $\operatorname{Bernoulli}(1-f_{i})$ and the total number of returns to $i$ follows $\operatorname{Geom}(1-f_{i})$. Let $p^{(k)}_{ii}$ be the probability that a Markov chain departing from $i$ is back at $i$ after $k$ steps; then $i$ is recurrent if and only if $\Sigma_{k=1}^{\infty}p^{(k)}_{ii} = \infty$, and transient if and only if $\Sigma_{k=1}^{\infty}p^{(k)}_{ii} < \infty$, where by time-homogeneity $p^{(k)}_{ii}$ is independent of the conditioning time $s$.
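A quick Monte Carlo sketch of estimating the return probability $f_{i}$: the chain below is a made-up example in which states $0$ and $1$ leak into an absorbing state, so $f_{0} < 1$ and state $0$ is transient.

```python
# Sketch: Monte Carlo estimate of the return probability f_i for a chain
# where state 0 can leak into the absorbing state 2 and never come back.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.4, 0.4, 0.2],
              [0.4, 0.4, 0.2],
              [0.0, 0.0, 1.0]])

def returned(i, max_steps=1000):
    """Run from i; report whether the chain revisits i (truncated horizon)."""
    x = i
    for _ in range(max_steps):
        x = rng.choice(3, p=P[x])
        if x == i:
            return True
        if x == 2:           # absorbed: can never return
            return False
    return False

f0 = np.mean([returned(0) for _ in range(10_000)])
print(f0)                    # approx 2/3 < 1, so state 0 is transient
```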
Let $t_{ii}$ be the elapsed time of returning to $i$ from $i$, so that $\mu_{ii} = \operatorname{E}t_{ii}$ is the mean recurrence time. We call $i$ positive recurrent if $\mu_{ii} < \infty$, and null recurrent if $\mu_{ii} = \infty$. If, in addition, $i$ is the initial state and $\lbrace \tau_{ii}(l) \rbrace_{1 \leq l \leq s}$ is a set of i.i.d. elapsed return times, then the proportion of time spent in $i$ up to the $s$-th return is $s/\Sigma_{l=1}^{s}\tau_{ii}(l)$, and $s/\Sigma_{l=1}^{s}\tau_{ii}(l) \to 1/\mu_{ii} = \pi_i$ as $s\to\infty$ by the WLLN. A state $i$ has period $d = \gcd \lbrace k\geq 1:p^{(k)}_{ii}>0 \rbrace$ (#4). We say $i$ is periodic if $d>1$, and aperiodic if $d=1$. All states in a class share the same $d$, as periodicity, recurrence, and transience are class properties. An irreducible Markov chain with $d=1$ is aperiodic.
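The period can be computed directly from its definition by scanning the diagonal of the matrix powers up to a finite horizon, as in this sketch with a deterministic two-cycle.

```python
# Sketch: compute the period of state i as gcd{k >= 1 : p^(k)_ii > 0},
# checking matrix powers up to a finite horizon.
from math import gcd
import numpy as np

P = np.array([[0.0, 1.0],     # deterministic 2-cycle: period d = 2
              [1.0, 0.0]])

def period(P, i, horizon=50):
    d, Pk = 0, np.eye(P.shape[0])
    for k in range(1, horizon + 1):
        Pk = Pk @ P
        if Pk[i, i] > 0:
            d = gcd(d, k)     # gcd(0, k) == k seeds the first return time
    return d

print(period(P, 0))           # 2
```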
III
A state $i$ is ergodic if it is aperiodic and positive recurrent. An irreducible Markov chain is ergodic if all of its states are ergodic. Moreover, a finite system is ergodic if there exists $N$ such that any state $j$ is reachable from any other state $i$ in exactly $n$ steps for every $n \geq N$; equivalently, $\mathbf{P}^{N}$ has all positive entries. For example, a fully connected transition matrix (i.e. $p_{ij} > 0$ for all $i,j\in\mathbb{S}$) fulfils this sufficient condition with $N = 1$. A system with more than one state and just one outgoing transition per state cannot be ergodic, as it is either not irreducible or not aperiodic. Ergodicity matters because it guarantees a unique steady state that does not depend on the initial state pmf, whereas in general a limit may vary with the initial state pmf.
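For a finite chain, the sufficient condition above can be checked mechanically: the chain is regular (hence ergodic) if some power $\mathbf{P}^{N}$ is entrywise positive. A sketch, with a made-up three-state example that is not fully connected yet still regular:

```python
# Sketch: a finite chain is ergodic iff some power P^N is entrywise positive
# (a "regular" matrix); then pi(t) -> pi from any initial state pmf.
import numpy as np

P = np.array([[0.0, 1.0, 0.0],   # not fully connected, but still regular:
              [0.0, 0.0, 1.0],   # returns of length 2 and 3 give gcd 1
              [0.5, 0.5, 0.0]])

def is_regular(P, max_power=64):
    Pk = np.eye(P.shape[0])
    for _ in range(max_power):
        Pk = Pk @ P
        if np.all(Pk > 0):
            return True
    return False

print(is_regular(P))             # True
```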
A Markov chain is reversible with respect to $\pi$ if it satisfies the detailed balance condition $\pi_{i}p_{ij} = \pi_{j}p_{ji}$ for all $i,j\in\mathbb{S}$. In particular, if $X$ is a general system and there exists $\pi$ satisfying this condition, then $\Sigma_{i}\pi_{i}p_{ij} = \Sigma_{i}\pi_{j}p_{ji} = \pi_{j}\Sigma_{i}p_{ji} = \pi_{j}$, and so $X$ satisfies the global balance condition $\pi = \pi\mathbf{P}$. These ideas yield the sampling framework of Markov chain Monte Carlo (MCMC), which overcomes the limitations of rejection sampling (#5). The Metropolis algorithm is a popular instance: it uses a symmetric candidate (proposal) density to return, after a burn-in period, a sequence of draws that we hope reflects a target density, as in the sketch below.
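A minimal Metropolis sketch, assuming a standard normal target and a Gaussian random-walk proposal; the step size $1.0$ and the burn-in length are arbitrary choices for illustration.

```python
# A minimal Metropolis sketch targeting an (unnormalised) standard normal
# density; the symmetric proposal is a Gaussian random walk.
import numpy as np

rng = np.random.default_rng(0)

def target(x):                       # unnormalised N(0, 1) density
    return np.exp(-0.5 * x * x)

def metropolis(n, burn_in=1_000):
    x, draws = 0.0, []
    for t in range(n + burn_in):
        y = x + rng.normal(scale=1.0)            # symmetric candidate q(y|x)
        if rng.uniform() < min(1.0, target(y) / target(x)):
            x = y                                # accept; otherwise stay at x
        if t >= burn_in:
            draws.append(x)
    return np.array(draws)

s = metropolis(50_000)
print(s.mean(), s.std())             # approx 0 and 1
```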
MCMC under a Bayesian setting can generate observations from the posterior (to estimate parameters $\theta$ given data $\mathbf{x}$) (#6). MCMC may suffer from the curse of dimensionality, whereby regions of higher probability stretch thin and get lost in an exponentially growing volume of space. A well-designed model that can learn all available information from the entire history may be more sophisticated than a partly conditioned model, but we naively assume the Markov property because computing the joint distribution of such a model is impractical. PageRank, developed by Larry Page and Sergey Brin in 1996, can be understood as a Markov chain, as in the sketch below.
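A sketch of PageRank as power iteration towards the steady state of a damped random-surfer chain; the three-page link structure is made up, and the damping factor $d = 0.85$ is the value commonly quoted for the original algorithm.

```python
# Sketch: PageRank as the steady state of a Markov chain over pages, with
# damping d = 0.85; the link matrix below is a made-up 3-page web.
import numpy as np

links = np.array([[0, 1, 1],     # page 0 links to pages 1 and 2, etc.
                  [1, 0, 0],
                  [1, 1, 0]], dtype=float)
P = links / links.sum(axis=1, keepdims=True)   # random-surfer transitions

d, n = 0.85, P.shape[0]
G = d * P + (1 - d) / n                        # teleporting makes G ergodic
pi = np.full(n, 1.0 / n)
for _ in range(100):                           # power iteration: pi <- pi G
    pi = pi @ G
print(pi)                                      # PageRank scores, sum to 1
```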
**
(#1) Many results for Markov chains can be extended to chains with an uncountable state space, so-called Harris chains. (#2) We may eigendecompose a transition matrix, $\mathbf{P}^k = \mathbf{Q}\mathbf{\Lambda}^k\mathbf{Q}^{-1}$, so that a diagonal operation makes the matrix power more efficient. (#3) If $i \rightarrow h$ and $h \rightarrow j$, then $i \rightarrow j$. (#4) That is, $p^{(k)}_{ii} = 0$ whenever $k \bmod d \neq 0$. (#5) Rejection sampling merely uses independent observations. (#6) P. Marjoram et al. (2003) attempted the same without the use of likelihoods.
I gathered these words solely for my own purposes, with no intention of breaking the rigour of the subject.
Well, I prefer eating corn in a spiral.