33. AR(1) Processes#

33.1. Overview#

In this lecture we are going to study a very simple class of stochastic models called AR(1) processes.

These simple models are used again and again in economic research to represent the dynamics of series such as

  • labor income

  • dividends

  • productivity, etc.

We are going to study AR(1) processes partly because they are useful and partly because they help us understand important concepts.

Let’s start with some imports:

import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5)  # set default figure size

33.2. The AR(1) model#

The AR(1) model (autoregressive model of order 1) takes the form

$$
X_{t+1} = a X_t + b + c W_{t+1}
\tag{33.1}
$$

where $a, b, c$ are scalar-valued parameters.

(Equation (33.1) is sometimes called a stochastic difference equation.)

Example 33.1

For example, $X_t$ might be

  • the log of labor income for a given household, or

  • the log of money demand in a given economy.

In either case, (33.1) shows that the current value evolves as a linear function of the previous value and an IID shock $W_{t+1}$.

(We use $t+1$ for the subscript of $W_{t+1}$ because this random variable is not observed at time $t$.)

The specification (33.1) generates a time series $\{X_t\}$ as soon as we specify an initial condition $X_0$.

To make things even simpler, we will assume that

  • the process $\{W_t\}$ is IID and standard normal,

  • the initial condition $X_0$ is drawn from the normal distribution $N(\mu_0, v_0)$, and

  • the initial condition $X_0$ is independent of $\{W_t\}$.
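
Before turning to distributions, it may help to see what a single realization of (33.1) looks like. Here is a minimal simulation sketch; the parameter values, starting point, and seed are illustrative choices, not part of the model:

def simulate_ar1(a=0.9, b=0.1, c=0.5, x0=0.0, T=200, seed=1234):
    "Simulate one path of the AR(1) process (33.1) for T periods."
    rng = np.random.default_rng(seed)
    x = np.empty(T + 1)
    x[0] = x0
    for t in range(T):
        x[t + 1] = a * x[t] + b + c * rng.standard_normal()
    return x

fig, ax = plt.subplots()
ax.plot(simulate_ar1(), label="$X_t$")
ax.legend()
plt.show()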

33.2.1. Moving average representation#

Iterating backwards from time $t$, we obtain

$$
\begin{aligned}
X_t &= a X_{t-1} + b + c W_t \\
    &= a^2 X_{t-2} + ab + ac W_{t-1} + b + c W_t \\
    &= a^3 X_{t-3} + a^2 b + a^2 c W_{t-2} + ab + ac W_{t-1} + b + c W_t \\
    &= \cdots
\end{aligned}
$$

If we work all the way back to time zero, we get

$$
X_t = a^t X_0 + b \sum_{j=0}^{t-1} a^j + c \sum_{j=0}^{t-1} a^j W_{t-j}
\tag{33.2}
$$

Equation (33.2) shows that $X_t$ is a well-defined random variable, the value of which depends on

  • the parameters,

  • the initial condition $X_0$, and

  • the shocks $W_1, \ldots, W_t$ from time $t=1$ to the present.

Throughout, the symbol $\psi_t$ will be used to refer to the density of this random variable $X_t$.
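
As a quick sanity check on (33.2), we can feed the same shocks into the recursion (33.1) and into the moving average formula and confirm that both produce the same value of $X_t$. A sketch, with illustrative parameters and seed:

a, b, c = 0.9, 0.1, 0.5
t = 25
rng = np.random.default_rng(42)
x0 = rng.normal(-3.0, np.sqrt(0.6))   # X_0 drawn from N(mu_0, v_0)
w = rng.standard_normal(t + 1)        # shocks; w[1], ..., w[t] are used

# Iterate the recursion (33.1) forward from X_0
x = x0
for s in range(1, t + 1):
    x = a * x + b + c * w[s]

# Evaluate the moving average formula (33.2) directly
j = np.arange(t)
x_ma = a**t * x0 + b * np.sum(a**j) + c * np.sum(a**j * w[t - j])

print(np.allclose(x, x_ma))  # True: the two computations agree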

33.2.2. Distribution dynamics#

One of the nice things about this model is that it’s so easy to trace out the sequence of distributions $\{\psi_t\}$ corresponding to the time series $\{X_t\}$.

To see this, we first note that $X_t$ is normally distributed for each $t$.

This is immediate from (33.2), since linear combinations of independent normal random variables are normal.

Given that $X_t$ is normally distributed, we will know the full distribution $\psi_t$ if we can pin down its first two moments.

Let $\mu_t$ and $v_t$ denote the mean and variance of $X_t$ respectively.

We can pin down these values from (33.2) or we can use the following recursive expressions:

$$
\mu_{t+1} = a \mu_t + b
\quad \text{and} \quad
v_{t+1} = a^2 v_t + c^2
\tag{33.3}
$$

These expressions are obtained from (33.1) by taking, respectively, the expectation and variance of both sides of the equality.

In calculating the second expression, we are using the fact that $X_t$ and $W_{t+1}$ are independent.

(This follows from our assumptions and (33.2).)

Given the dynamics in (33.3) and initial conditions $\mu_0, v_0$, we obtain $\mu_t, v_t$ and hence

$$
\psi_t = N(\mu_t, v_t)
$$

The following code uses these facts to track the sequence of marginal distributions $\{\psi_t\}$.

The parameters are

a, b, c = 0.9, 0.1, 0.5

mu, v = -3.0, 0.6  # initial conditions mu_0, v_0

Here’s the sequence of distributions:

from scipy.stats import norm

sim_length = 10
grid = np.linspace(-5, 7, 120)

fig, ax = plt.subplots()

for t in range(sim_length):
    mu = a * mu + b
    v = a**2 * v + c**2
    ax.plot(grid, norm.pdf(grid, loc=mu, scale=np.sqrt(v)),
            label=fr"$\psi_{t}$",
            alpha=0.7)

ax.legend(bbox_to_anchor=[1.05, 1], loc=2, borderaxespad=1)

plt.show()

33.3. Stationarity and asymptotic stability#

When we use models to study the real world, it is generally preferable that our models have clear, sharp predictions.

For dynamic problems, sharp predictions are related to stability.

For example, if a dynamic model predicts that inflation always converges to some kind of steady state, then the model gives a sharp prediction.

(The prediction might be wrong, but even this is helpful, because we can judge the quality of the model.)

Notice that, in the figure above, the sequence $\{\psi_t\}$ seems to be converging to a limiting distribution, suggesting some kind of stability.

This is even clearer if we project forward further into the future:

def plot_density_seq(ax, mu_0=-3.0, v_0=0.6, sim_length=40):
    mu, v = mu_0, v_0
    for t in range(sim_length):
        mu = a * mu + b
        v = a**2 * v + c**2
        ax.plot(grid,
                norm.pdf(grid, loc=mu, scale=np.sqrt(v)),
                alpha=0.5)

fig, ax = plt.subplots()
plot_density_seq(ax)
plt.show()

Moreover, the limit does not depend on the initial condition.

For example, this alternative density sequence also converges to the same limit.

fig, ax = plt.subplots()
plot_density_seq(ax, mu_0=4.0)
plt.show()

In fact it’s easy to show that such convergence will occur, regardless of the initial condition, whenever $|a| < 1$.

To see this, we just have to look at the dynamics of the first two moments, as given in (33.3).

When $|a| < 1$, these sequences converge to the respective limits

$$
\mu^* := \frac{b}{1-a}
\quad \text{and} \quad
v^* = \frac{c^2}{1 - a^2}
\tag{33.4}
$$

(See our lecture on one dimensional dynamics for background on deterministic convergence.)

Hence

$$
\psi_t \to \psi^* = N(\mu^*, v^*)
\quad \text{as } t \to \infty
\tag{33.5}
$$

We can confirm this is valid for the sequence above using the following code.

fig, ax = plt.subplots()
plot_density_seq(ax, mu_0=4.0)

mu_star = b / (1 - a)
std_star = np.sqrt(c**2 / (1 - a**2))  # square root of v_star
psi_star = norm.pdf(grid, loc=mu_star, scale=std_star)
ax.plot(grid, psi_star, 'k-', lw=2, label=r"$\psi^*$")
ax.legend()

plt.show()

As claimed, the sequence $\{\psi_t\}$ converges to $\psi^*$.

We see that, at least for these parameters, the AR(1) model has strong stability properties.

33.3.1. Stationary distributions#

Let’s try to better understand the limiting distribution $\psi^*$.

A stationary distribution is a distribution that is a “fixed point” of the update rule for the AR(1) process.

In other words, if $\psi_t$ is stationary, then $\psi_{t+j} = \psi_t$ for all $j$ in $\mathbb N$.

A different way to put this, specialized to the current setting, is as follows: a density $\psi$ on $\mathbb R$ is stationary for the AR(1) process if

$$
X_t \sim \psi
\quad \implies \quad
a X_t + b + c W_{t+1} \sim \psi
$$

The distribution $\psi^*$ in (33.5) has this property (checking this is an exercise).

(Of course, we are assuming that $|a| < 1$ so that $\psi^*$ is well defined.)

In fact, it can be shown that no other distribution on $\mathbb R$ has this property.

Thus, when $|a| < 1$, the AR(1) model has exactly one stationary density and that density is given by $\psi^*$.
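
We can also verify the fixed-point property numerically: if we draw a large sample from $\psi^* = N(\mu^*, v^*)$ and push every draw through the update rule, the sample mean and variance should be left (approximately) unchanged. A quick sketch, reusing the parameters $a, b, c$ defined above (sample size and seed are arbitrary choices):

n = 100_000
rng = np.random.default_rng(0)

mu_star = b / (1 - a)
v_star = c**2 / (1 - a**2)

x = rng.normal(mu_star, np.sqrt(v_star), size=n)  # X_t drawn from psi^*
x_next = a * x + b + c * rng.standard_normal(n)   # apply the update rule once

print(x_next.mean(), mu_star)  # sample mean stays close to mu^*
print(x_next.var(), v_star)    # sample variance stays close to v^*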

33.4. Ergodicity#

The concept of ergodicity is used in different ways by different authors.

One way to understand it in the present setting is that a version of the law of large numbers is valid for $\{X_t\}$, even though it is not IID.

In particular, averages over time series converge to expectations under the stationary distribution.

Indeed, it can be proved that, whenever $|a| < 1$, we have

$$
\frac{1}{m} \sum_{t=1}^m h(X_t) \to \int h(x) \psi^*(x) \, dx
\quad \text{as } m \to \infty
\tag{33.6}
$$

whenever the integral on the right hand side is finite and well defined.

Example 33.2

If we consider the identity function $h(x) = x$, we get

$$
\frac{1}{m} \sum_{t=1}^m X_t \to \int x \psi^*(x) \, dx
\quad \text{as } m \to \infty
$$

In other words, the time series sample mean converges to the mean of the stationary distribution.
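
Here is a quick numerical illustration: we simulate one long time series and compare its sample mean with $\mu^*$, again reusing the parameters $a, b, c$ from above (the starting point and seed are arbitrary choices):

m = 100_000
rng = np.random.default_rng(1)

x = np.empty(m)
x[0] = -3.0  # an arbitrary starting point
for t in range(m - 1):
    x[t + 1] = a * x[t] + b + c * rng.standard_normal()

print(x.mean())     # time series sample mean
print(b / (1 - a))  # mu^*, the mean of the stationary distribution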

Ergodicity is important for a range of reasons.

For example, (33.6) can be used to test theory.

In this equation, we can use observed data to evaluate the left hand side of (33.6).

And we can use a theoretical AR(1) model to calculate the right hand side.

If $\frac{1}{m} \sum_{t=1}^m X_t$ is not close to $\int x \psi^*(x) \, dx$, even for many observations, then our theory seems to be incorrect and we will need to revise it.

33.5. Exercises#

Exercise 33.1

Let $k$ be a natural number.

The $k$-th central moment of a random variable is defined as

$$
M_k := \mathbb E \left[ (X - \mathbb E X)^k \right]
$$

When that random variable is $N(\mu, \sigma^2)$, it is known that

$$
M_k =
\begin{cases}
0 & \text{if } k \text{ is odd} \\
\sigma^k (k-1)!! & \text{if } k \text{ is even}
\end{cases}
$$

Here $n!!$ is the double factorial.

According to (33.6), we should have, for any $k \in \mathbb N$,

$$
\frac{1}{m} \sum_{t=1}^m (X_t - \mu^*)^k \approx M_k
$$

when $m$ is large.

Confirm this by simulation at a range of $k$ using the default parameters from the lecture.
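
One possible approach is sketched below (a sketch, not a definitive solution): simulate a long series, then compare sample central moments around $\mu^*$ with the double factorial formula. Here scipy.special.factorial2 computes $n!!$; the seed and series length are arbitrary choices.

from scipy.special import factorial2

def check_moments(k_vals, m=500_000, a=0.9, b=0.1, c=0.5, seed=2):
    "Compare simulated central moments of the AR(1) series with M_k."
    rng = np.random.default_rng(seed)
    x = np.empty(m)
    x[0] = b / (1 - a)  # start at the stationary mean
    for t in range(m - 1):
        x[t + 1] = a * x[t] + b + c * rng.standard_normal()
    mu_star = b / (1 - a)
    sigma_star = np.sqrt(c**2 / (1 - a**2))
    for k in k_vals:
        sample_moment = np.mean((x - mu_star)**k)
        true_moment = 0.0 if k % 2 else sigma_star**k * factorial2(k - 1)
        print(f"k = {k}: sample = {sample_moment:.4f}, theory = {true_moment:.4f}")

check_moments([1, 2, 3, 4, 6])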

Exercise 33.2

Write your own version of a one dimensional kernel density estimator, which estimates a density from a sample.

Write it as a class that takes the data $X$ and bandwidth $h$ when initialized and provides a method $f$ such that

$$
f(x) = \frac{1}{hn} \sum_{i=1}^n K \left( \frac{x - X_i}{h} \right)
$$

For $K$ use the Gaussian kernel ($K$ is the standard normal density).

Write the class so that the bandwidth defaults to Silverman’s rule (see the “rule of thumb” discussion on this page).

Test the class you have written by going through the steps

  1. simulate data $X_1, \ldots, X_n$ from distribution $\phi$

  2. plot the kernel density estimate over a suitable range

  3. plot the density of $\phi$ on the same figure

for distributions $\phi$ of the following types

Use $n = 500$.

Make a comment on your results. (Do you think this is a good estimator of these distributions?)
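
A minimal sketch of such a class is shown below. The bandwidth default uses one common variant of Silverman’s rule of thumb, $h = 1.06 \hat\sigma n^{-1/5}$; adjust it to match the rule stated on the page referenced above. The test distribution used here (a standard normal) is an arbitrary illustrative choice, not one of the distributions the exercise specifies.

class KDE:
    "Gaussian kernel density estimator with a rule-of-thumb bandwidth."

    def __init__(self, X, h=None):
        self.X = np.asarray(X)
        self.n = len(self.X)
        if h is None:
            # One common variant of Silverman's rule of thumb
            h = 1.06 * np.std(self.X) * self.n**(-1/5)
        self.h = h

    def f(self, x):
        "Evaluate the density estimate at each point in x."
        x = np.asarray(x)
        u = (x[:, None] - self.X[None, :]) / self.h  # shape (len(x), n)
        return norm.pdf(u).mean(axis=1) / self.h

# Quick test with an illustrative choice of phi
rng = np.random.default_rng(3)
data = rng.standard_normal(500)
x_grid = np.linspace(-4, 4, 200)

fig, ax = plt.subplots()
ax.plot(x_grid, KDE(data).f(x_grid), label="KDE estimate")
ax.plot(x_grid, norm.pdf(x_grid), label="true density")
ax.legend()
plt.show()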

Exercise 33.3

In the lecture we discussed the following fact: for the AR(1) process

$$
X_{t+1} = a X_t + b + c W_{t+1}
$$

with $\{W_t\}$ IID and standard normal,

$$
\psi_t = N(\mu, s^2)
\implies
\psi_{t+1} = N(a \mu + b, \, a^2 s^2 + c^2)
$$

Confirm this, at least approximately, by simulation. Let

  • $a = 0.9$

  • $b = 0.0$

  • $c = 0.1$

  • $\mu = -3$

  • $s = 0.2$

First, plot $\psi_t$ and $\psi_{t+1}$ using the true distributions described above.

Second, plot $\psi_{t+1}$ on the same figure (in a different color) as follows:

  1. Generate $n$ draws of $X_t$ from the $N(\mu, s^2)$ distribution

  2. Update them all using the rule $X_{t+1} = a X_t + b + c W_{t+1}$

  3. Use the resulting sample of $X_{t+1}$ values to produce a density estimate via kernel density estimation.

Try this for $n = 2000$ and confirm that the simulation-based estimate of $\psi_{t+1}$ does converge to the theoretical distribution.
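
One possible sketch of this simulation, using scipy.stats.gaussian_kde for step 3 (any kernel density estimator, including the class from Exercise 33.2, would do; the seed is an arbitrary choice):

from scipy.stats import gaussian_kde

a, b, c = 0.9, 0.0, 0.1
mu, s = -3.0, 0.2
n = 2000

rng = np.random.default_rng(4)
x_grid = np.linspace(-4, -2, 200)

fig, ax = plt.subplots()
# True densities psi_t and psi_{t+1}
ax.plot(x_grid, norm.pdf(x_grid, mu, s), label=r"$\psi_t$")
ax.plot(x_grid, norm.pdf(x_grid, a * mu + b, np.sqrt(a**2 * s**2 + c**2)),
        label=r"$\psi_{t+1}$")

# Simulation-based estimate of psi_{t+1}
x = rng.normal(mu, s, size=n)                    # draws of X_t
x_next = a * x + b + c * rng.standard_normal(n)  # apply the update rule
ax.plot(x_grid, gaussian_kde(x_next)(x_grid), 'k--',
        label=r"$\psi_{t+1}$ estimate")

ax.legend()
plt.show()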