A Brief Introduction to Modern Time Series
Definition A time series is a random function $x_t$ of an argument $t$ in a set $T$. In other words, a time series is a family of random variables $\ldots, x_{t-1}, x_t, x_{t+1}, \ldots$ corresponding to all elements in the set $T$, where $T$ is supposed to be a denumerable, infinite set.
Definition An observed time series $\{x_t \mid t \in T_0 \subset T\}$ is regarded as a part of one realization of a random function $x_t$. An infinite set of possible realizations which might have been observed is called an ensemble.
To put things more rigorously, the time series (or random function) is a real function $x(\omega,t)$ of the two variables $\omega$ and $t$, where $\omega \in \Omega$ and $t \in T$. If we fix the value of $\omega$, we have a real function $x(t \mid \omega)$ of the time $t$, which is a realization of the time series. If we fix the value of $t$, then we have a random variable $x(\omega \mid t)$. For a given point in time there is a probability distribution over $x$. Thus a random function $x(\omega,t)$ can be regarded as either a family of random variables or as a family of realizations.
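Definition We define the distribution function of the random variable $x(\omega \mid t_0)$, i.e. of $x$ at the fixed time $t_0$, as
$$P\{x(\omega \mid t_0) \le x\} = F_{t_0}(x).$$
Similarly we can define the joint distribution for n random variables
$$P\{x(\omega \mid t_1) \le x_1, \ldots, x(\omega \mid t_n) \le x_n\} = F_{t_1,\ldots,t_n}(x_1,\ldots,x_n).$$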
The points which distinguish time series analysis from ordinary
statistical analyses are the following:
(1) The dependency among observations at different chronological
points in time plays an essential role. In other words, the order
of observations is important. In ordinary statistical analysis
it is assumed that the observations are mutually independent.
(2) The domain of t is infinite.
(3) We have to make an inference from one realization. The realization
of the random variable can be observed only once at each point
in time. In multivariate analysis we have many observations on
a finite number of variables. This critical difference necessitates
the assumption of stationarity.
Definition The random function $x_t$ is said to be strictly stationary if all the finite dimensional distribution functions defining $x_t$ remain the same even if the whole group of points $t_1, t_2, \ldots, t_n$ is shifted along the time axis. That is, if
$$F_{t_1+k,\ldots,t_n+k}(x_1,\ldots,x_n) = F_{t_1,\ldots,t_n}(x_1,\ldots,x_n)$$
for any integers $t_1, t_2, \ldots, t_n$ and $k$, where $F_{t_1,\ldots,t_n}$ denotes the joint distribution function of $x_{t_1}, \ldots, x_{t_n}$. Graphically, one could picture the realization of a strictly stationary series as having not only the same level in two different intervals, but also the same distribution function, right down to the parameters which define it.
The assumption of stationarity makes our lives simpler and less
costly. Without stationarity we would have to sample the process
frequently at each time point in order to build up a characterization
of the distribution functions in the earlier definition. Stationarity
means that we can confine our attention to a few of the simplest
numerical functions, i.e., the moments of the distributions. The
central moments are given by
Definition (i) The mean value of the time series $\{x_t\}$ is
$$\mu_1(t) = E\{x_t\},$$
i.e., the first order moment.
(ii) The autocovariance function of $\{x_t\}$ is
$$\mu_{11}(t,s) = E\{(x_t - \mu_1(t))(x_s - \mu_1(s))\},$$
i.e., the second moment about the mean. If $t=s$ then you have the variance of $x_t$. We will use $\gamma_k$ to denote the autocovariance of a stationary series, where $k$ denotes the difference between $t$ and $s$.
(iii) The autocorrelation function (ACF) of $\{x_t\}$ is
$$\rho(t,s) = \frac{\mu_{11}(t,s)}{\sqrt{\mu_{11}(t,t)\,\mu_{11}(s,s)}}.$$
We will use $\rho_k$ to denote the autocorrelation of a stationary series, where $k$ denotes the difference between $t$ and $s$.
(iv) The partial autocorrelation (PACF), $\phi_{kk}$, is the correlation between $z_t$ and $z_{t+k}$ after removing their mutual linear dependence on the intervening variables $z_{t+1}, z_{t+2}, \ldots, z_{t+k-1}$.
One simple way to compute the partial autocorrelation between $z_t$ and $z_{t+k}$ is to run the two regressions
$$z_t = \alpha_1 z_{t+1} + \cdots + \alpha_{k-1} z_{t+k-1} + u_t$$
$$z_{t+k} = \beta_1 z_{t+1} + \cdots + \beta_{k-1} z_{t+k-1} + v_{t+k}$$
then compute the correlation between the two residual vectors. Or, after measuring the variables as deviations from their means, the partial autocorrelation can be found as the LS regression coefficient on $z_t$ in the model
$$\dot z_{t+k} = \phi_{k1}\dot z_{t+k-1} + \phi_{k2}\dot z_{t+k-2} + \cdots + \phi_{kk}\dot z_t + e_{t+k} \qquad (10)$$
where the dot over the variable indicates that it is measured as a deviation from its mean.
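(v) The Yule-Walker equations provide an important relationship between the partial autocorrelations and the autocorrelations. Multiply both sides of equation 10 by $\dot z_{t+k-j}$ and take expectations. This operation gives us the following difference equation in the autocovariances
$$\gamma_j = \phi_{k1}\gamma_{j-1} + \phi_{k2}\gamma_{j-2} + \cdots + \phi_{kk}\gamma_{j-k}$$
or, in terms of the autocorrelations
$$\rho_j = \phi_{k1}\rho_{j-1} + \phi_{k2}\rho_{j-2} + \cdots + \phi_{kk}\rho_{j-k}.$$
This seemingly simple representation is really a powerful result. Namely, for $j=1,2,\ldots,k$ we can write the full system of equations, known as the Yule-Walker equations,
$$\begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{pmatrix} = \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{pmatrix}$$
From linear algebra you know that the matrix of $\rho$'s is of full rank. Therefore it is possible to apply Cramer's rule successively for $k=1,2,\ldots$ to solve the system for the partial autocorrelations. The first three are
$$\phi_{11} = \rho_1, \qquad \phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}, \qquad \phi_{33} = \frac{\begin{vmatrix} 1 & \rho_1 & \rho_1 \\ \rho_1 & 1 & \rho_2 \\ \rho_2 & \rho_1 & \rho_3 \end{vmatrix}}{\begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix}}.$$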
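As a rough illustration of solving the Yule-Walker system for the partial autocorrelations, the sketch below uses the Durbin-Levinson recursion rather than Cramer's rule; it solves the same equations lag by lag. The function name `pacf_from_acf` and the example autocorrelations are assumptions made for this illustration only.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations phi_11, phi_22, ... from autocorrelations.

    rho[0] holds rho_1, rho[1] holds rho_2, and so on (rho_0 = 1 is implied).
    Uses the Durbin-Levinson recursion, which solves the Yule-Walker
    equations successively for k = 1, 2, ...
    """
    pacf = []
    prev = []  # phi_{k-1,1}, ..., phi_{k-1,k-1} from the previous step
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            cur = [phi_kk]
        else:
            num = rho[k - 1] - sum(prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients phi_{k,1}, ..., phi_{k,k-1}.
            cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            cur.append(phi_kk)
        pacf.append(phi_kk)
        prev = cur
    return pacf


# Hypothetical autocorrelations, chosen only to exercise the function.
example = pacf_from_acf([0.6, 0.2])
print(example)
```

For two lags the result can be checked by hand against the closed form for the second partial autocorrelation, (rho_2 - rho_1^2) / (1 - rho_1^2).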
We have three important results on strictly stationary series.
First, if $\{x_t\}$ is strictly stationary and $E\{x_t^2\} < \infty$ then
$$E\{x_t\} = \mu \quad \text{for all } t.$$
The implication is that we can use any finite realization of the sequence to estimate the mean.
Second, if $\{x_t\}$ is strictly stationary and $E\{x_t^2\} < \infty$ then
$$E\{(x_t - \mu)(x_s - \mu)\} = \gamma_{t-s}.$$
The implication is that the autocovariance depends only on the difference between $t$ and $s$, not their chronological point in time. We could use any pair of intervals in the computation of the autocovariance as long as the time between them was constant. And we can use any finite realization of the data to estimate the autocovariances.
Thirdly, the autocorrelation function in the case of strict stationarity is given by
$$\rho_k = \frac{\gamma_k}{\gamma_0}.$$
The implication is that the autocorrelation depends only on the difference between $t$ and $s$ as well, and again they can be estimated by any finite realization of the data.
If our goal is to estimate parameters which are descriptive of
the possible realizations of the time series, then perhaps strict
stationarity is too restrictive. For example, if the mean and
covariances of xt are constant and independent of the
chronological point in time, then perhaps it is not important
to us that the distribution function be the same for different
time intervals.
Definition A random function is stationary in the wide sense (or weakly stationary, or stationary in Khinchin's sense, or covariance stationary) if $\mu_1(t) \equiv \mu$ and $\mu_{11}(t,s) = \gamma_{t-s}$.
Strict stationarity does not in itself imply weak stationarity. Weak stationarity does not imply strict stationarity. Strict stationarity with $E\{x_t^2\} < \infty$ implies weak stationarity.
Ergodic theorems are concerned with the question of the necessary
and sufficient conditions for making inference from a single realization
of a time series. Basically it boils down to assuming weak stationarity.
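Theorem If $\{x_t\}$ is weakly stationary with mean $\mu$ and covariance function $\gamma_k$, then
$$\operatorname*{plim}_{T \to \infty} \bar x_T = \mu, \qquad \bar x_T = \frac{1}{T}\sum_{t=1}^{T} x_t.$$
That is, for any given $\varepsilon > 0$ and $\eta > 0$ there exists some number $T_0$ such that
$$P\{|\bar x_T - \mu| > \varepsilon\} < \eta$$
for all $T > T_0$, if and only if
$$\lim_{T \to \infty} \frac{1}{T}\sum_{k=0}^{T-1} \gamma_k = 0.$$
This necessary and sufficient condition is that the autocovariances die out, at least on average, in which case the sample mean is a consistent estimator for the population mean.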
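Corollary If $\{x_t\}$ is weakly stationary with $E\{(x_{t+k}x_t)^2\} < \infty$ for any $t$, and $E\{x_{t+k}x_t x_{t+s+k}x_{t+s}\}$ is independent of $t$ for any integer $s$, then
$$\operatorname*{plim}_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T} x_{t+k}x_t = E\{x_{t+k}x_t\}$$
if and only if
$$\lim_{T \to \infty} \frac{1}{T}\sum_{s=0}^{T-1} c_s = 0,$$
where
$$c_s = E\{x_{t+k}x_t x_{t+s+k}x_{t+s}\} - \left(E\{x_{t+k}x_t\}\right)^2.$$
The corollary rests on the assumption that the product series $x_t x_{t+k}$ is itself weakly stationary. The Ergodic Theorem is no more than a law of large numbers when the observations are correlated.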
One might ask at this point about the practical implications of stationarity. The most common application of time series techniques is in modelling macroeconomic data, both theoretic and atheoretic. As an example of the former, one might have a multiplier-accelerator model. For the model to be stationary, the parameters must have certain values. A test of the model is then to collect the relevant data and estimate the parameters. If the estimates are not consistent with stationarity, then one must rethink either the theoretical model or the statistical model, or both.
We now have enough machinery to begin to talk about the modeling
of univariate time series data. There are four steps in the process.
1. building models from theoretical and/or experiential knowledge
2. identifying models based on the data (observed series)
3. fitting the models (estimating the parameters of the model)
4. checking the model
If in the fourth step we are not satisfied we return to step one.
The process is iterative until further checking and respecification
yields no further improvement in results.
Definition Some simple operations include the following:
The backshift operator $B x_t = x_{t-1}$
The forward operator $F x_t = x_{t+1}$
The difference operator $\nabla = 1 - B$, so that $\nabla x_t = x_t - x_{t-1}$
The difference operator can be manipulated as if $B$ were the common ratio of a geometric series. That is, its inverse is the limit of an infinite sum. Namely, $\nabla^{-1} = (1-B)^{-1} = 1/(1-B) = 1 + B + B^2 + \cdots$
The integrate operator $S = \nabla^{-1}$
Since it is the inverse of the difference operator, the integrate operator serves to construct the sum $S x_t = \sum_{j=0}^{\infty} x_{t-j}$.
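A minimal sketch of the difference and integrate operators on a finite sample may help fix ideas. On a finite list the infinite sum has to be truncated, so the sketch assumes a known starting value; the function names `difference` and `integrate` are illustrative and not part of the notes.

```python
from itertools import accumulate


def difference(x):
    """Apply (1 - B): returns x_t - x_{t-1} for t = 1, ..., n-1."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


def integrate(dx, start):
    """Undo differencing by cumulative summation from a known start value."""
    return list(accumulate(dx, initial=start))


levels = [3, 5, 4, 8]
changes = difference(levels)              # [2, -1, 4]
rebuilt = integrate(changes, levels[0])   # [3, 5, 4, 8]
print(changes, rebuilt)
```

The round trip recovers the original levels, which is the finite-sample counterpart of the integrate operator being the inverse of the difference operator.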
MODEL BUILDING
In this section we offer a brief review of the most common sort
of time series models. On the basis of one's knowledge of the
data generating process one picks a class of models for identification
and estimation from the possibilities which follow.
Autoregressive Models
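Definition Suppose that $E x_t = \mu$ is independent of $t$. A model such as
$$x_t - \mu = \phi_1(x_{t-1} - \mu) + \phi_2(x_{t-2} - \mu) + \cdots + \phi_p(x_{t-p} - \mu) + \varepsilon_t$$
with the characteristics $E\varepsilon_t = 0$, $E\varepsilon_t^2 = \sigma_\varepsilon^2$ and $E\varepsilon_t\varepsilon_s = 0$ for $t \ne s$ is called the autoregressive model of order p, AR(p).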
Definition If a time dependent variable (stochastic process) $\{x_t\}$ satisfies
$$E\{x_t \mid x_{t-1}, x_{t-2}, \ldots\} = E\{x_t \mid x_{t-1}, \ldots, x_{t-p}\}$$
then $\{x_t\}$ is said to satisfy the Markov property. On the LHS the expectation is conditioned on the infinite history of $x_t$. On the RHS it is conditioned on only part of the history. From the definitions, an AR(p) model is seen to satisfy the Markov property.
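Using the backshift operator we can write our AR model as
$$\phi(B)(x_t - \mu) = \varepsilon_t, \qquad \phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p.$$
Theorem A necessary and sufficient condition for the AR(p) model to be stationary is that all of the roots of the polynomial
$$1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p = 0$$
lie outside the unit circle.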
Example 1
Consider the AR(1)
$$x_t = \phi_1 x_{t-1} + \varepsilon_t.$$
The only root of $1 - \phi_1 B = 0$ is $B = 1/\phi_1$. The condition for stationarity requires that $|\phi_1| < 1$.
If $\phi_1$ is negative then the observed series will appear very frenetic. E.g., consider an AR(1) with a negative coefficient in which the white noise term has a normal distribution with a zero mean and a variance of one. The observations switch sign with almost every observation.
If, on the other hand, $\phi_1$ is positive then the observed series will be much smoother. In such a series an observation tends to be above 0 if its predecessor was above zero.
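The contrast between a frenetic and a smooth AR(1) can be checked with a small simulation. This is only a sketch: the coefficients -0.9 and 0.9, the seed, the burn-in length and the function names are all assumptions chosen for illustration, not values taken from the notes.

```python
import random


def simulate_ar1(phi, n, seed=0, burn_in=200):
    """Simulate x_t = phi * x_{t-1} + e_t with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    x = 0.0
    out = []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:          # discard the start-up transient
            out.append(x)
    return out


def sign_change_share(x):
    """Fraction of consecutive pairs whose signs differ."""
    changes = sum(1 for a, b in zip(x, x[1:]) if a * b < 0)
    return changes / (len(x) - 1)


frenetic = simulate_ar1(-0.9, 2000)   # negative coefficient (assumed value)
smooth = simulate_ar1(0.9, 2000)      # positive coefficient (assumed value)
print(sign_change_share(frenetic), sign_change_share(smooth))
```

With a strongly negative coefficient most consecutive observations change sign, while with a strongly positive one the series stays on the same side of zero for long stretches.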
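The variance of $\varepsilon_t$ is $\sigma_\varepsilon^2$ for all $t$. The variance of $x_t$, when it has zero mean, is given by
$$E\{x_t^2\} = E\{(\phi_1 x_{t-1} + \varepsilon_t)^2\} = \phi_1^2 E\{x_{t-1}^2\} + \sigma_\varepsilon^2.$$
Since the series is stationary we can write $E\{x_t^2\} = E\{x_{t-1}^2\} = \gamma_0$. Hence,
$$\gamma_0 = \frac{\sigma_\varepsilon^2}{1 - \phi_1^2}.$$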
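The autocovariance function of an AR(1) series is, supposing without loss of generality $\mu = 0$,
$$\gamma_k = E\{x_t x_{t-k}\}.$$
To see what this looks like in terms of the AR parameters we will make use of the fact that we can write $x_t$ as follows
$$x_t = \phi_1^k x_{t-k} + \sum_{j=0}^{k-1} \phi_1^j \varepsilon_{t-j}.$$
Multiplying by $x_{t-k}$ and taking expectations
$$\gamma_k = \phi_1^k \gamma_0 = \frac{\phi_1^k \sigma_\varepsilon^2}{1 - \phi_1^2}.$$
Note that the autocovariances die out as $k$ grows. The autocorrelation function is the autocovariance divided by the variance of the series, $\gamma_0$. Or, $\rho_k = \phi_1^k$.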
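Using the earlier Yule-Walker formulae for the partial autocorrelations we have
$$\phi_{11} = \rho_1 = \phi_1, \qquad \phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} = \frac{\phi_1^2 - \phi_1^2}{1 - \phi_1^2} = 0.$$
For an AR(1) the autocorrelations die out exponentially and the partial autocorrelations exhibit a spike at one lag and are zero thereafter.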
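The exponential decay of the AR(1) autocorrelations can be compared with a simulated series. The sketch below is an illustration under stated assumptions: the coefficient 0.7, the sample size, the seed and the helper names are made up for the example, and the sample autocorrelation uses the divisor n.

```python
import random


def simulate_ar1(phi, n, seed=1, burn_in=200):
    """Simulate x_t = phi * x_{t-1} + e_t with e_t ~ N(0, 1)."""
    rng = random.Random(seed)
    x = 0.0
    out = []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


def sample_autocorr(x, k):
    """Lag-k sample autocorrelation (divisor n in both numerator and denominator)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    ck = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
    return ck / c0


phi = 0.7                       # assumed coefficient, |phi| < 1
series = simulate_ar1(phi, 20000)
for k in (1, 2, 3):
    # Sample value next to the theoretical value phi ** k.
    print(k, round(sample_autocorr(series, k), 3), round(phi ** k, 3))
```

In a long sample the estimates should sit close to phi, phi squared, phi cubed, and so on, which is the geometric decay described in the text.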
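Example 2
Consider the AR(2)
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \varepsilon_t.$$
The associated polynomial in the lag operator is
$$1 - \phi_1 B - \phi_2 B^2 = 0.$$
The roots could be found using the quadratic formula. The roots are
$$B = \frac{-\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{2\phi_2}.$$
When $\phi_1^2 + 4\phi_2 \ge 0$ the roots are real and as a consequence the series will decline exponentially in response to a shock. When $\phi_1^2 + 4\phi_2 < 0$ the roots are complex and the series will appear as a damped sine wave.
The stationarity theorem imposes the following conditions on the AR coefficients
$$\phi_1 + \phi_2 < 1, \qquad \phi_2 - \phi_1 < 1, \qquad |\phi_2| < 1.$$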
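The root condition for an AR(2) can be checked numerically. The sketch below computes the roots of 1 - phi1*B - phi2*B^2 = 0 with the quadratic formula and compares the verdict with the usual triangle conditions phi1 + phi2 < 1, phi2 - phi1 < 1, |phi2| < 1; the function names and the trial coefficients are assumptions for illustration.

```python
import cmath


def ar2_roots(phi1, phi2):
    """Roots of 1 - phi1*B - phi2*B**2 = 0 (assumes phi2 != 0)."""
    disc = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    return ((-phi1 + disc) / (2.0 * phi2), (-phi1 - disc) / (2.0 * phi2))


def is_stationary_ar2(phi1, phi2):
    """Stationary when both roots lie outside the unit circle."""
    return all(abs(r) > 1.0 for r in ar2_roots(phi1, phi2))


def in_stationarity_triangle(phi1, phi2):
    """Equivalent conditions stated directly on the coefficients."""
    return phi1 + phi2 < 1 and phi2 - phi1 < 1 and abs(phi2) < 1


# Hypothetical coefficient pairs: real roots, complex roots, non-stationary.
for phi1, phi2 in [(0.5, 0.3), (1.0, -0.5), (0.5, 0.6)]:
    print(phi1, phi2, ar2_roots(phi1, phi2),
          is_stationary_ar2(phi1, phi2), in_stationarity_triangle(phi1, phi2))
```

The pair with a negative discriminant produces complex conjugate roots, the case in which the response to a shock looks like a damped sine wave.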
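The autocovariance for an AR(2) process, with zero mean, is
$$\gamma_k = \phi_1\gamma_{k-1} + \phi_2\gamma_{k-2}, \qquad k > 0.$$
Dividing through by the variance of $x_t$ gives the autocorrelation function
$$\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2}.$$
Since $\rho_0 = 1$ and $\rho_{-1} = \rho_1$ we can write
$$\rho_1 = \phi_1 + \phi_2\rho_1 \quad\Longrightarrow\quad \rho_1 = \frac{\phi_1}{1 - \phi_2}.$$
Similarly for the second and third autocorrelations
$$\rho_2 = \phi_1\rho_1 + \phi_2 = \frac{\phi_1^2}{1 - \phi_2} + \phi_2, \qquad \rho_3 = \phi_1\rho_2 + \phi_2\rho_1.$$
The other autocorrelations are solved for recursively. Their pattern is governed by the roots of the second order linear difference equation
$$\rho_k - \phi_1\rho_{k-1} - \phi_2\rho_{k-2} = 0.$$
If the roots are real then the autocorrelations will decline exponentially. When the roots are complex the autocorrelations will appear as a damped sine wave.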
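Using the Yule-Walker equations, the partial autocorrelations are
$$\phi_{11} = \rho_1 = \frac{\phi_1}{1 - \phi_2}, \qquad \phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} = \phi_2, \qquad \phi_{33} = 0.$$
Again, the autocorrelations die out slowly. The partial autocorrelation on the other hand is quite distinctive. It has spikes at one and two lags and is zero thereafter.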
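The recursive solution for the AR(2) autocorrelations is easy to sketch in code. The function name `ar2_acf` and the coefficients 0.5 and 0.3 are assumptions for illustration; the starting value rho_1 = phi1 / (1 - phi2) and the recursion rho_k = phi1*rho_{k-1} + phi2*rho_{k-2} are the standard AR(2) results.

```python
def ar2_acf(phi1, phi2, max_lag):
    """Theoretical autocorrelations rho_0, ..., rho_max_lag of a stationary AR(2)."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[: max_lag + 1]


rho = ar2_acf(0.5, 0.3, 6)      # hypothetical coefficients
# Second partial autocorrelation from the closed form; it should equal phi2.
phi_22 = (rho[2] - rho[1] ** 2) / (1.0 - rho[1] ** 2)
print([round(r, 4) for r in rho], round(phi_22, 4))
```

The autocorrelations decay gradually, while the lag-2 partial autocorrelation reproduces the second AR coefficient.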
Theorem If $x_t$ is a stationary AR(p) process then it can be equivalently written as a linear filter model. That is, the polynomial in the backshift operator can be inverted and the AR(p) written as a moving average of infinite order instead.
Example
Suppose $z_t$ is an AR(1) process with zero mean; $z_t = \phi z_{t-1} + a_t$. What is true for the current period must also be true for prior periods. Thus by recursive substitution we can write
$$z_t - \sum_{j=0}^{k-1} \phi^j a_{t-j} = \phi^k z_{t-k}.$$
Square both sides and take expectations
$$E\Big\{\Big(z_t - \sum_{j=0}^{k-1} \phi^j a_{t-j}\Big)^2\Big\} = \phi^{2k} E\{z_{t-k}^2\};$$
the right hand side vanishes as $k \to \infty$ since $|\phi| < 1$. Therefore the sum converges to $z_t$ in quadratic mean. We can rewrite the AR(1) model as a linear filter
$$z_t = \sum_{j=0}^{\infty} \phi^j a_{t-j}$$
that we know to be stationary.
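The quadratic-mean convergence of the truncated moving average to the AR(1) can be illustrated numerically. In this sketch the weights on past shocks are phi^j, and the mean squared truncation error after k terms is phi^(2k) times the variance of the series; the function names, the coefficient 0.8 and the unit shock variance are assumptions for the example.

```python
def ar1_ma_weights(phi, k):
    """First k weights of the MA(infinity) form of an AR(1): 1, phi, phi^2, ..."""
    return [phi ** j for j in range(k)]


def truncation_mse(phi, k, sigma2=1.0):
    """Mean squared error of truncating the filter after k terms.

    Equals phi^(2k) * gamma_0, with gamma_0 = sigma2 / (1 - phi^2).
    """
    gamma0 = sigma2 / (1.0 - phi ** 2)
    return phi ** (2 * k) * gamma0


phi = 0.8                                   # assumed coefficient, |phi| < 1
gamma0 = 1.0 / (1.0 - phi ** 2)
for k in (1, 5, 10, 20):
    captured = sum(w ** 2 for w in ar1_ma_weights(phi, k))
    # Variance explained by the first k shocks plus the error equals gamma_0.
    print(k, round(captured, 4), round(truncation_mse(phi, k), 6), round(gamma0, 4))
```

As k grows the error term shrinks geometrically, which is the sense in which the infinite moving average reproduces the autoregression.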
The Autocorrelation Function and Partial Autocorrelation Generally
Suppose that a stationary series $z_t$ with mean zero is known to be autoregressive. The autocorrelation function of an AR(p) is found by taking expectations of
$$z_t z_{t-k} = \phi_1 z_{t-1}z_{t-k} + \phi_2 z_{t-2}z_{t-k} + \cdots + \phi_p z_{t-p}z_{t-k} + a_t z_{t-k}$$
and dividing through by the variance of $z_t$
$$\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2} + \cdots + \phi_p\rho_{k-p}, \qquad k > 0.$$
This tells us that $\rho_k$ is a linear combination of the previous autocorrelations. We can use this in applying Cramer's rule to the Yule-Walker equations in solving for $\phi_{kk}$. In particular we can see that this linear dependence will cause $\phi_{kk} = 0$ for $k > p$. This distinctive feature of autoregressive series will be very useful when it comes to identification of an unknown series.
Moving Average Models
Consider a dynamic model in which the series of interest depends only on some part of the history of the white noise term.
Definition Suppose $a_t$ is a sequence of i.i.d. random variables with zero mean and finite variance $\sigma_a^2$. Then a moving average process of order q, MA(q), is given by
$$z_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q} = \theta(B)a_t.$$
Theorem: A moving average process is always stationary.
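Proof: Rather than start with a general proof we will do it for a specific case. Suppose that $z_t$ is MA(1). Then $z_t = a_t - \theta_1 a_{t-1}$. Of course, $a_t$ has zero mean and finite variance $\sigma_a^2$. The mean of $z_t$ is always zero. The autocovariances will be given by
$$E\{z_t z_{t+s}\} = \begin{cases} (1 + \theta_1^2)\sigma_a^2 & s = 0 \\ -\theta_1\sigma_a^2 & |s| = 1 \\ 0 & |s| > 1 \end{cases}$$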
You can see that the mean of the random variable does not depend on time in any way. You can also see that the autocovariance depends only on the offset s, not on where in the series we start. We can prove the same result more generally by starting with $z_t = \phi z_{t-1} + a_t$, $|\phi| < 1$, which has the alternate moving average representation
$$z_t = \sum_{j=0}^{\infty} \phi^j a_{t-j}.$$
Consider first the variance of $z_t$.
$$E\{z_t^2\} = E\{(\phi z_{t-1} + a_t)^2\} = \phi^2 E\{z_{t-1}^2\} + \sigma_a^2$$
By recursive substitution you can show that this is equal to
$$\sigma_a^2 \sum_{j=0}^{\infty} \phi^{2j}.$$
The sum we know to be a convergent series so the variance is finite and is independent of time. The covariances are, for example,
$$E\{z_t z_{t-1}\} = \phi\,\sigma_a^2 \sum_{j=0}^{\infty} \phi^{2j}.$$
You can also see that the autocovariances depend only on the relative points in time, not the chronological point in time. Our conclusion from all this is that an MA($\infty$) process is stationary.
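For the general MA(q) process, written as $z_t = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}$, the autocorrelation function is given by
$$\rho_k = \begin{cases} \dfrac{-\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q}{1 + \theta_1^2 + \cdots + \theta_q^2} & k = 1, 2, \ldots, q \\ 0 & k > q \end{cases}$$
The partial autocorrelation function will die out smoothly. You can see this by inverting the process to get an AR($\infty$) process.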
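The cut-off in the MA(q) autocorrelation function can be verified with a short computation. The sketch assumes the sign convention z_t = a_t - theta_1 a_{t-1} - ... - theta_q a_{t-q}; with plus signs the odd-looking minus in the lag-1 value would flip. The function name `ma_acf` and the coefficient values are illustrative.

```python
def ma_acf(theta, max_lag):
    """Theoretical autocorrelations rho_0, ..., rho_max_lag of an MA(q).

    theta = [theta_1, ..., theta_q] in the (assumed) convention
    z_t = a_t - theta_1 a_{t-1} - ... - theta_q a_{t-q}.
    """
    q = len(theta)
    psi = [1.0] + [-th for th in theta]   # weights on a_t, a_{t-1}, ..., a_{t-q}
    gamma = []
    for k in range(max_lag + 1):
        if k > q:
            gamma.append(0.0)             # no shocks in common beyond lag q
        else:
            gamma.append(sum(psi[j] * psi[j + k] for j in range(q - k + 1)))
    return [g / gamma[0] for g in gamma]


print(ma_acf([0.5], 3))          # MA(1): only the lag-1 autocorrelation is non-zero
print(ma_acf([0.5, -0.4], 4))    # MA(2): non-zero at lags 1 and 2 only
```

The white noise variance cancels in the ratio, so it does not need to be supplied.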
Mixed Autoregressive - Moving Average Models
Definition Suppose $a_t$ is a sequence of i.i.d. random variables with zero mean and finite variance. Then an autoregressive, moving average process of order (p,q), ARMA(p,q), is given by
$$z_t - \mu = \phi_1(z_{t-1} - \mu) + \cdots + \phi_p(z_{t-p} - \mu) + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q}.$$
The roots of the autoregressive operator must all lie outside the unit circle. The number of unknowns is p+q+2. The p and q are obvious. The 2 includes the level of the process, $\mu$, and the variance of the white noise term, $\sigma_a^2$.
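Suppose that we combine our AR and MA representations so that the model is
$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \sum_{i=0}^{q} b_i \varepsilon_{t-i} \qquad (1)$$
and the coefficients are normalized so that $b_0 = 1$. Then this representation is called an ARMA(p,q) if the roots of the autoregressive polynomial $1 - a_1 B - \cdots - a_p B^p$ in (1) all lie outside the unit circle.
Suppose that the $y_t$ are measured as deviations from the mean so we can drop $a_0$, then the autocovariance function is derived from
$$E\{y_t y_{t-j}\} = \sum_{i=1}^{p} a_i E\{y_{t-i}y_{t-j}\} + \sum_{i=0}^{q} b_i E\{\varepsilon_{t-i}y_{t-j}\};$$
if $j > q$ then the MA terms drop out in expectation to give
$$\gamma_j = a_1\gamma_{j-1} + a_2\gamma_{j-2} + \cdots + a_p\gamma_{j-p}.$$
That is, the autocovariance function looks like a typical AR for lags after q; they die out smoothly after q, but we cannot say how $\gamma_1, \gamma_2, \ldots, \gamma_q$ will look.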
We can also examine the PACF for this class of model. The model can be written as
$$\phi(B)z_t = \theta(B)a_t.$$
If the MA operator is invertible we can write this as an AR($\infty$) process
$$\theta(B)^{-1}\phi(B)z_t = a_t$$
which suggests that the PACF's die out slowly. With some arithmetic we could show that this happens only after the first p spikes contributed by the AR part.
Empirical Law In actuality, a stationary time series may well be represented by $p \le 2$ and $q \le 2$. If your business is to provide a good approximation to reality and goodness of fit is your criterion then a prodigal model is preferred. If your interest is predictive efficiency then the parsimonious model is preferred.
Autoregressive Integrate Moving Average Models
[Diagram: MA filter, AR filter, Integrate filter]
Sometimes the process, or series, we are trying to model is not
stationary in levels. But it might be stationary in, say, first
differences. That is, in its original form the autocovariances
for the series might not be independent of the chronological point
in time. However, if we construct a new series which is the first
differences of the original series, this new series satisfies
the definition of stationarity. This is often the case with economic
data which is highly trended.
Definition Suppose that $z_t$ is not stationary, but $z_t - z_{t-1}$ satisfies the definition of stationarity. Also, $a_t$, the white noise term, has finite mean and variance. We can write the model as
$$\phi(B)\nabla^d z_t = \theta(B)a_t.$$
This is named an ARIMA(p,d,q) model. p identifies the order of the AR operator, d identifies the power on $\nabla$, q identifies the order of the MA operator.
If the roots of $\phi(B)$ lie outside the unit circle then we can rewrite the differenced series $\nabla^d z_t$ as a linear filter. I.e., it can be written as an MA($\infty$).
We reserve the discussion of the detection of unit roots for another
part of the lecture notes.
Transfer Function Models
Consider a dynamic system with $x_t$ as an input series and $y_t$ as an output series.
These models are a discrete analogue of linear differential equations. We suppose the following relation
$$(1 + \xi_1\nabla + \cdots + \xi_r\nabla^r)\,y_t = g\,(1 + \eta_1\nabla + \cdots + \eta_s\nabla^s)\,x_{t-b}$$
where b indicates a pure delay. Recall that $\nabla = (1-B)$. Making this substitution the model can be written
$$(1 - \delta_1 B - \cdots - \delta_r B^r)\,y_t = (\omega_0 - \omega_1 B - \cdots - \omega_s B^s)\,x_{t-b}.$$
If the coefficient polynomial on $y_t$ can be inverted then the model can be written as
$$y_t = \delta(B)^{-1}\omega(B)\,x_{t-b} = V(B)\,x_t.$$
V(B) is known as the impulse response function. We will come across this terminology again in our later discussion of vector autoregressive, cointegration and error correction models.
MODEL IDENTIFICATION
Having decided on a class of models, one must now identify
the order of the processes generating the data. That is, one must
make best guesses as to the order of the AR and MA processes driving
the stationary series. A stationary series is completely characterized
by its mean and autocovariances. For analytical reasons we usually
work with the autocorrelations and partial autocorrelations. These
two basic tools have unique patterns for stationary AR and MA
processes. One could compute sample estimates of the autocorrelation
and partial autocorrelation functions and compare them to tabulated
results for standard models.
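Definitions
Sample Mean
$$\bar z = \frac{1}{n}\sum_{t=1}^{n} z_t$$
Sample Autocovariance Function
$$\hat\gamma_k = \frac{1}{n}\sum_{t=1}^{n-k} (z_t - \bar z)(z_{t+k} - \bar z)$$
Sample Autocorrelation Function
$$\hat\rho_k = \frac{\hat\gamma_k}{\hat\gamma_0}$$
The sample partial autocorrelations will be
$$\hat\phi_{11} = \hat\rho_1, \qquad \hat\phi_{k+1,k+1} = \frac{\hat\rho_{k+1} - \sum_{j=1}^{k} \hat\phi_{kj}\hat\rho_{k+1-j}}{1 - \sum_{j=1}^{k} \hat\phi_{kj}\hat\rho_j}, \qquad \hat\phi_{k+1,j} = \hat\phi_{kj} - \hat\phi_{k+1,k+1}\hat\phi_{k,k+1-j}.$$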
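The sample mean, autocovariance and autocorrelation can be sketched in a few lines. The version below assumes the common convention of dividing by the full sample size n at every lag (some texts divide by n - k instead); the function names and the toy data are illustrative.

```python
def sample_mean(z):
    """Arithmetic mean of the observed series."""
    return sum(z) / len(z)


def sample_autocov(z, k):
    """Lag-k sample autocovariance with divisor n (assumed convention)."""
    n = len(z)
    zbar = sample_mean(z)
    return sum((z[t] - zbar) * (z[t + k] - zbar) for t in range(n - k)) / n


def sample_acf(z, max_lag):
    """Sample autocorrelations for lags 0, ..., max_lag."""
    g0 = sample_autocov(z, 0)
    return [sample_autocov(z, k) / g0 for k in range(max_lag + 1)]


toy = [1, 2, 3, 4, 5]            # made-up data, for checking by hand
print(sample_mean(toy), sample_autocov(toy, 1), sample_acf(toy, 2))
```

On the toy data the mean is 3, the lag-0 autocovariance is 2 and the lag-1 autocovariance is 0.8, so the lag-1 sample autocorrelation is 0.4.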
Using the autocorrelations and partial autocorrelations is quite
simple in principle. Suppose that we have a series zt,
with zero mean, which is AR(1). If we were to run the regression
of zt+2 on zt+1 and zt we would
expect to find that the coefficient on zt was not different
from zero since this partial autocorrelation ought to be zero.
On the other hand, the autocorrelations for this series ought
to be decreasing exponentially for increasing lags (see the AR(1)
example above).
Suppose that the series is really a first order moving average, MA(1). The autocorrelation should be zero everywhere but at the first lag. The partial autocorrelation ought to die out exponentially.
Even from our very cursory romp through the basics of time series
analysis it is apparent that there is a duality between AR and
MA processes. This duality may be summarized in the following
table.
| | AR(p) | MA(q) | ARMA(p,q) |
| | The stationary AR(p) $\phi(B)z_t = a_t$ can be represented as an MA of infinite order. | The invertible MA(q) $z_t = \theta(B)a_t$ can be represented as an infinite order AR. | Can be represented as either an AR($\infty$) or MA($\infty$), conditional on roots. |
| Autocorrelation | $\rho_k \to 0$ but $\rho_k \ne 0$, from $\phi(B)=0$. That is, the spikes in the correlogram decrease exponentially. | $\rho_k = 0$ for $k \ge q+1$. There are spikes until lag q. | Autocorrelations die out smoothly after q lags. |
| Partial autocorrelation | $\phi_{kk} = 0$ for $k \ge p+1$. There are spikes in the correlogram at lags 1 through p. | $\phi_{kk} \to 0$ but $\phi_{kk} \ne 0$, from $\theta(B)=0$. The spikes die out exponentially. | PACF's die out smoothly after p lags. |
| Stationarity | All roots of $\phi(B)=0$ lie outside the unit circle. | No restriction. | All roots of $\phi(B)=0$ lie outside the unit circle. |
| Invertibility | No restriction. | All roots of $\theta(B)=0$ lie outside the unit circle. | All roots of $\theta(B)=0$ lie outside the unit circle. |
Diagrams of the theoretical correlograms for AR and MA processes can be found in Hoff (1983, pp. 59-71) or Wei (1990, pp. 32-66). The correlograms for a stationary ARMA are somewhat more problematic. The autocorrelations show an irregular pattern of spikes through lag q, then the remaining pattern is the same as that for an AR process. The partial autocorrelations are irregular through lag p, then the PACs decay exponentially as in an MA process.