## Stat 295, 3/17/09

Is my face red! I apologize to each and every one of you for being late to class today.

Today we introduced the idea of Markov chains. Recall that a stochastic process is a sequence $\{X_1,X_2,...,X_n,...\}$ of random variables on some state space $S$ indexed by a discrete index set (e.g., the integers). A Markov chain is a stochastic process with the property that $P(X_{n+1}|X_1,X_2,...,X_n)=P(X_{n+1}|X_n)$, that is, each state depends only on the immediately preceding state and is independent of all of the states before that; the Markov chain has no “memory” of exactly how it got to the last state.

A Markov chain is stationary if the transition probabilities are independent of the index, that is, if $P(X_{n+1}=j|X_n=k)$ does not depend on the value of $n$. A stationary Markov chain is characterized by a transition matrix $M_j^k=P(X_{n+1}=j|X_n=k)$, also known as the Markov matrix. The entries of the Markov matrix satisfy $0\le M_j^k \le 1$ and, since every state $k$ must go somewhere, probability must be conserved and $\sum_j M_j^k=1$.
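To make the conventions concrete, here is a small numerical sketch (the 3-state matrix is a made-up example) using the column convention above, in which $M_j^k$ is entry $(j,k)$ and each column sums to 1:

```python
import numpy as np

# A toy 3-state Markov matrix in the column convention used above:
# M[j, k] = P(X_{n+1} = j | X_n = k), so each *column* sums to 1.
M = np.array([
    [0.5, 0.2, 0.3],
    [0.3, 0.5, 0.3],
    [0.2, 0.3, 0.4],
])
assert np.allclose(M.sum(axis=0), 1.0)  # probability conserved for each state k

# Simulate the chain: the next state depends only on the current one.
rng = np.random.default_rng(0)
state = 0
path = [state]
for _ in range(10):
    state = rng.choice(3, p=M[:, state])  # column `state` holds its transition probs
    path.append(state)
print(path)
```

Because the same matrix `M` is used at every step, this chain is stationary in the sense defined above: the transition probabilities do not depend on the index $n$.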

A stationary distribution of a Markov chain is a distribution $\pi$ over the states such that $\pi_j \geq 0$ for all $j$, $\sum_j \pi_j = 1$, and $\pi = M \pi$.
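In the usual “forward” direction, finding $\pi$ from $M$ is an eigenvector problem: $\pi$ is the eigenvector of $M$ with eigenvalue 1, normalized to sum to 1. A sketch (the 3-state matrix is the same made-up example, not anything from class):

```python
import numpy as np

# A hypothetical column-stochastic matrix; M[j, k] = P(X_{n+1}=j | X_n=k).
M = np.array([
    [0.5, 0.2, 0.3],
    [0.3, 0.5, 0.3],
    [0.2, 0.3, 0.4],
])

# The stationary distribution satisfies M pi = pi, i.e. pi is an
# eigenvector of M with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(M)
i = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue closest to 1
pi = np.real(eigvecs[:, i])
pi = pi / pi.sum()                     # normalize so the entries sum to 1

assert np.allclose(M @ pi, pi)         # pi is indeed stationary
```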

We remarked that we are going to study Markov chains in a somewhat “backwards” way from the way they are usually studied. Usually, we are given the Markov matrix $M$ and asked to determine the stationary distribution $\pi$. But we know the stationary distribution we’re interested in: It’s usually our desired posterior distribution. We want to generate a Markov matrix with the property that our posterior distribution is the stationary distribution corresponding to that Markov matrix.

The nice thing about this approach is that we don’t need to know the normalizing constant of the posterior distribution (hard to compute). All we need is the unnormalized joint distribution (easy to compute).

Basically the only way that this is done in practice is to use the principle of detailed balance, which means that transitions between any pair of states must maintain the equilibrium distribution, i.e., if we have a Markov chain $(M,\phi)$ with transition matrix $M$ and (known to us) stationary distribution $\phi$, then $M_j^i\phi_i=M_i^j\phi_j$. Obviously if the equilibrium distribution is conserved for each pair of states, it will be conserved overall.

We showed that if $(M,\phi)$ satisfies the detailed balance condition, then $M\phi=\phi$ and $\phi$ is the stationary distribution of $M$.
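The verification is one line: summing the detailed-balance condition over $i$ and using the column-sum constraint $\sum_i M_i^j = 1$,

$$(M\phi)_j = \sum_i M_j^i \phi_i = \sum_i M_i^j \phi_j = \phi_j \sum_i M_i^j = \phi_j.$$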

We skipped Chart 36, on sufficient conditions for the $M$ so constructed to converge to the desired distribution. I noted that these conditions aren’t hard to satisfy and in practice are invariably satisfied if we use the standard methods to construct the chain.

I outlined the basic idea (which we will continue on Thursday). As with SIR, importance sampling, and rejection sampling, we will require a proposal distribution $q(j|i)$, which gives the probability that, being in state $i$, we propose a move to state $j$. We will prescribe rules that use the proposal distribution and the target distribution to calculate a quantity $\alpha_j^i$ such that $M_j^i=q(j|i)\alpha^i_j$ is the required Markov matrix.
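As a preview of where this is headed, here is a minimal sketch of one standard choice of $\alpha$, the Metropolis acceptance rule with a symmetric Gaussian proposal (the function name `log_target` and the parameter `proposal_sd` are illustrative; Thursday's lecture will give the general construction). Note that only the unnormalized log target is needed, since the normalizing constant cancels in the acceptance ratio:

```python
import numpy as np

def metropolis(log_target, x0, proposal_sd=1.0, n_steps=5000, seed=0):
    """Sketch of a random-walk Metropolis sampler: a symmetric Gaussian
    proposal q(j|i) combined with the standard acceptance probability
    alpha = min(1, target(proposal) / target(current)).  The q terms
    cancel because the proposal is symmetric."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        proposal = x + proposal_sd * rng.standard_normal()
        # Accept with probability min(1, exp(log_alpha)); done on the
        # log scale for numerical stability.
        log_alpha = log_target(proposal) - log_target(x)
        if np.log(rng.uniform()) < log_alpha:
            x = proposal
        samples.append(x)
    return np.array(samples)

# Example: sample from an *unnormalized* standard normal density exp(-x^2/2).
samples = metropolis(lambda x: -0.5 * x**2, x0=0.0, n_steps=20000)
```

The resulting chain has the target density as its stationary distribution, which is exactly the “backwards” construction described above: we chose $\alpha$ so that detailed balance holds with respect to the target.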

The proposal distribution is quite arbitrary, and so there are infinitely many ways that we can construct our transition matrix. Some are more efficient than others, and we will study ways to choose a sampling scheme that is efficient.

…to be continued on Thursday.