This volume develops the analysis and concepts of unit root testing and estimation, providing an accessible and critical account of recent advances and extensions of the basic framework. It provides practical guidance through examples and simulation, combined with a firm theoretical base from which to evaluate competing approaches. This second volume of Unit Root Tests in Time Series will benefit readers who have an understanding of the basic concepts of unit root testing, such as the widely used Dickey-Fuller test, and can be read independently of volume one.

It includes developments such as nonparametric approaches to unit root testing, testing for fractional integration, nonlinear models including smooth transition and discrete change models and structural breaks with known or unknown break points. Each technique is illustrated with an empirical example showing theory at work in the context of real economic issues such as the prices of assets, world oil production and measures of economic activity.

This concept of convergence extends that of the simpler case in which the n-th term in a sequence of random variables converges, in some well-defined sense, either to a random variable or to a constant. Chapter 5 starts with the basic ideas underlying random walks, which motivate their use as prototypical stochastic processes in economics. Random walks also turn out to be at the heart of Brownian motion, which is introduced in Chapter 6. The differentiation and integration of stochastic processes involving Brownian motion is considered in Chapter 7.

Strictly, Brownian motion is nowhere differentiable, but the reader may have seen expressions that look like differentials or derivatives being applied to Brownian motion: what, therefore, is meant by $dB(t)$, where $B(t)$ is Brownian motion? Despite the extent of research on the topic of unit root testing, even some 30 years after the seminal contributions by Dickey and Fuller (see, for example, Fuller, and Dickey and Fuller), and hundreds of articles on theory and applications, there are still unresolved issues or areas where practical difficulties may arise; Chapter 8 concludes with a brief summary of some of these areas.

My sincere thanks go to Lorna Eames, my secretary at the University of Reading, who has always responded very positively to my many requests for assistance in preparing the manuscript for this book. A website has been set up to support this book. It gives access to more examples, both numerical and theoretical, to a number of the programs used to draw the graphs and estimate the illustrative models, and to the data used.

Additionally, if you have comments on any aspects of the book please contact me at my email address given below. Essential concepts include the formalisation of the intuitive concept of probability, the related concepts of the sample space, the probability space and random variable, and the development of these to cases such as uncountable sample spaces, which are critical to stochastic processes.

The reader is likely to have some familiarity with the basic concepts in this chapter, perhaps in the context of discrete random variables and distributions such as the binomial, and of continuous random variables that are normally distributed. This chapter is organised as follows. The idea of a random variable, in contrast to a deterministic variable, underlies this chapter and Section 1.

The triple of the probability space, that is the sample space, a field and a probability measure, is developed in Section 1. A random vector, that is a collection of random variables, is considered in Section 1. The idea of constructing a random variable by conditioning an existing random variable on an event in the sample space of another random variable is critical to the concepts of dependence and independence and is considered in Section 1. The final substantive Section, 1.

Describing some of these will help to indicate the essence of a random experiment and related consequences that we model as random variables. In order to reach a destination by a particular time, we may catch a bus or drive. The bus will usually have a stated arrival time at the bus stop, but its actual arrival time will vary, and so a number of outcomes are possible; equally, its arrival at the destination will have a stated time, but a variety of possible outcomes will actually transpire.

Driving to the destination will involve a series of possible outcomes depending, for example, on the traffic and weather conditions. A fruitful line of examples to illustrate probability concepts relates to gambling in some form, for example betting on the outcome of the toss of a coin, the throw of a die or the spin of a roulette wheel. An essential characteristic is that the experiment has a number of possible outcomes, in contrast to the certain outcome of a deterministic experiment.

We may also link or map the outcomes from the random experiment by way of a function to a variable. However, one could argue that repeating the experiment and recording the proportion of heads should, in the limit, determine the probability of heads. Whilst this relative frequency (frequentist) approach is widely adopted, it is not the only view of what probability is and how it should be interpreted; see Porter for an account of the development of probability in the nineteenth century and, for an account of subjective probability, see Jeffrey, and Wright and Ayton. Whatever view is taken on the quantification of uncertainty, the measures so defined must satisfy properties P1 to P3 detailed below in Section 1.

Notation: the basic random variables of this chapter are denoted either as x or y, although other random variables based on these, such as the sum, may also be defined. The distinction between the use of x and y is as follows: in the former case there is no presumption that time is necessarily an important aspect of the random variable, although it may occur in a particular interpretation or example; on the other hand, the use of y implies that time is an essential dimension of the random variable.

Some preliminary notation is first established in Section 1. Typically this will be denoted by an upper case letter such as A or B, or $A_j$. The power set: the power set is the set of all possible subsets; it is denoted $2^{\Omega}$ and may be finite or infinite.

In the example of the previous paragraph, $\Omega_1$ is the sample space of the basic random experiment of tossing a coin; however, it is often more useful to map this into a random variable, typically denoted $x$, or $x_1$, where the subscript indicates the first of a possible multiplicity of random variables. Indeed, the term random variable is a misnomer, with random function being a more accurate description; nevertheless this usage is firmly established. The next step, having identified the sample space of the basic experiment, is to be able to associate probabilities with the elements, collections of elements, or functions of these (as in the case of random variables) in the sample space.

Whilst this can be done quite intuitively in some simple cases, the kinds of problems that can be solved in this way are rather limited. For example, let a coin be fair, and let the experiment be to toss the coin once; we wish to assign a probability, or measure, that accords with our intuition that it should be a non-negative number between 0 and 1, here 1/2 for each of heads and tails, so that the probabilities sum to one. This suggests that we will need to consider rather carefully experiments where the sample space has an infinite number of outcomes.

The generic notation for a field, also known as an algebra, is F. What we have in mind is that these subsets will be the ones to which we seek to assign a probability measure. At an introductory level, in the case of a random experiment with a finite number of outcomes, this concept is implicit rather than explicit; the emphasis is usually on listing all the possible elementary events and then assigning probabilities, P. A probability measure, generically denoted P, is a function defined on the event space of F. Provided the conditions P1, P2 and P3 are met, the resulting function P is a probability measure.
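As a minimal sketch of how P1 to P3 can be checked mechanically for a finite sample space, the following Python assumes they are the usual conditions (non-negativity, P(Ω) = 1, and additivity over disjoint events), takes the power set as the field, and uses a fair die as an illustrative example. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(omega):
    """All subsets of a finite sample space omega (the power set, 2^omega)."""
    items = sorted(omega)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def make_measure(point_probs):
    """Hypothetical helper: extend probabilities of elementary outcomes to any event by summation."""
    return lambda event: sum((point_probs[w] for w in event), Fraction(0))


def is_probability_measure(omega, P):
    """Check P1 (non-negativity), P2 (P(omega) = 1) and P3 (additivity for disjoint events)."""
    events = power_set(omega)
    p1 = all(P(A) >= 0 for A in events)
    p2 = P(frozenset(omega)) == 1
    p3 = all(P(A | B) == P(A) + P(B) for A in events for B in events if not A & B)
    return p1 and p2 and p3


die = {k: Fraction(1, 6) for k in range(1, 7)}  # a fair die
omega = set(die)
P = make_measure(die)
print(len(power_set(omega)))             # 64 events, i.e. 2**6
print(is_probability_measure(omega, P))  # True
print(P(frozenset({2, 4, 6})))           # 1/2, the probability of an even score
```

Exact fractions are used so that the equality checks in P2 and P3 are not disturbed by floating-point rounding.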

Usually, considerations such as the limiting relative frequency of events will motivate the probabilities assigned. This is captured in condition F3 and the associated condition P3. However, we will need to consider sample spaces that comprise an infinite number of outcomes. The case to consider first is where the number of outcomes is countably infinite, or denumerable, in that the outcomes can be put into a one-to-one correspondence with the integers.

What is required is an extension of conditions F3 and P3, to allow infinite unions of events. The condition and its extension to the probability measure are as follows. F4. If $A_j \in \mathcal{F}$ for $j = 1, 2, \ldots$, then $\bigcup_{j=1}^{\infty} A_j \in \mathcal{F}$. There is an equivalent extension of P3 as follows: if $A_j \in \mathcal{F}$ for $j = 1, 2, \ldots$, and the $A_j$ are pairwise disjoint, then $P\left(\bigcup_{j=1}^{\infty} A_j\right) = \sum_{j=1}^{\infty} P(A_j)$. See, for example, Billingsley, chapter 1. A set that is not countable is said to be uncountable; in an intuitive sense it is just too big to be countable, that is, it is nondenumerable. The general case is considered in Section 1.

This approach suggests that the sets to which we would like to assign probabilities, that is, over which we define a probability measure, should be quite general and certainly include open and closed intervals, half-open intervals and, for completeness, singletons. There are a number of equivalent ways of doing this, equivalent because one can be generated from the other by a countable number of the set operations of union, intersection and complementation. The question then is whether any complications or qualifications arise when this is the case.

The requirement is that of measurability, defined as follows. Intuitively, we must be able to map the event(s) of interest in x back to the field of the original sample space. A brief outline of the meaning of image and pre-image is provided in the glossary. The distribution function, also referred to as the cumulative distribution function and abbreviated to cdf, uniquely characterises the probability measure. The properties of a distribution function are: D1. If a density function exists for $F_X$, then it must have the following properties: f1.

Example 1. The density function, $f_X$, assigns the same positive value to all elements in the interval, so that sub-intervals of equal length receive equal probability. To make sense as a density function, its integral over the whole interval must be unity. The normal distribution is of such importance that it has its own notation. For example, interest may focus on whether the prices of two financial assets are related, suggesting we consider two random variables $x_1$ and $x_2$, and the relationship between them. This will often reflect the practical situation, but it is not essential.
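The unit-integral requirement can be checked numerically. The sketch below assumes the uniform distribution on [a, b], whose density is the constant 1/(b - a) on the interval and zero elsewhere; the midpoint-rule helper and the values a = 2, b = 5 are illustrative choices, not from the text.

```python
def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]: 1/(b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def midpoint_integral(f, lo, hi, n=100_000):
    """Approximate the integral of f over [lo, hi] by the midpoint rule with n sub-intervals."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


a, b = 2.0, 5.0
# Integrate over a range wider than [a, b]: the density is zero outside the interval.
total = midpoint_integral(lambda x: uniform_pdf(x, a, b), a - 1.0, b + 1.0)
print(round(total, 4))  # 1.0
```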

By letting the index j take the index of time, x becomes a vector of a random variable at different points in time; such a case is distinguished throughout this book by reserving the notation $y_j$ or $y_t$ for when time is of the essence. By extension, we seek a probability space for the vector of random variables.

A particular subset is a Borel set if it can be obtained by repeated, countable operations of union, intersection and complementation. In fact, as the reader may already have noted, the assumption of independence defined in Section 1. The difference is in the aspects that we choose to emphasise. A description of the sample path would require a functional relationship rather than a single number.

We will often think of the index set T as comprising an infinite number of elements, even in the case of discrete-time processes, where N is countably infinite; in the case of a continuous-time stochastic process, even if T is a finite interval of time, such as [0, 1], the interval is infinitely divisible. In either case, the collection of random variables in Y is infinite. We can now return to the question of what is special about a stochastic process, other than that it is a sequence of random variables.

To highlight the difference, it is useful to consider a question that is typically considered for a sequence of random variables, in the general notation $x_1, x_2, \ldots, x_n$. Such an example occurs when the distribution of a test statistic has a degrees of freedom effect. In this case we interpret the sample space of interest as being that for each $x_j$, rather than the sequence as a whole. In the case of a stochastic process, the sample space is the space of a sequence of length n, or T in the case of a random variable with an inherent time dimension.

If we regard n tosses of a coin as taking place sequentially in time, then we have already encountered the sample space of a stochastic process in Section 1. This is why the appropriate space for a stochastic process is a function space: each sample path is a function not a single outcome. The distribution of interest is not the distribution of a single element, say yt, but the distribution of the complete sample paths, which is the distribution of the functions on time.

Thus, in terms of convergence, it is of limited interest to focus on the t-th or any particular element of the stochastic process. Replication of the process through simulation generates a distribution of sample paths associated with different realisations over the complete sample path, and convergence is now a question of the convergence of one process to another process; for example, the convergence of the random walk process, used in Chapter 5, to another process, in this case Brownian motion, considered in Chapter 6. Of interest in assessing convergence of a stochastic process are the finite-dimensional distributions (fidis); in the continuous-time case, these are the joint distributions of the n-dimensional vector $(y(t_1), y(t_2), \ldots, y(t_n))$. Although it is not generally sufficient to establish convergence, at an intuitive level one can think of the distribution of the stochastic process Y as being the collection of the fidis for all possible choices of sequences of time, $t_1, t_2, \ldots, t_n$. This becomes relevant when providing a meaning to the idea that one stochastic process converges to another; we return to this question in Chapter 4, Section 4.
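To give the fidis some concreteness, the following simulation sketch (an illustration, not a method from the text) generates many sample paths of a ±1 random walk scaled by √T and estimates the variance at r1 and the covariance of the two-dimensional fidi at times r1·T and r2·T. For Brownian motion the covariance is min(r1, r2); for the scaled walk these two moments already equal t1/T, and what the limit adds is normality of the fidis. T, the number of replications and the seed are arbitrary choices.

```python
import random


def scaled_random_walk(T, rng):
    """One sample path S_t / sqrt(T), t = 0..T, where S_t is a running sum of +1/-1 steps."""
    path, s = [0.0], 0
    for _ in range(T):
        s += 1 if rng.random() < 0.5 else -1
        path.append(s / T ** 0.5)
    return path


def fidi_moments(r1, r2, T=400, reps=4000, seed=1):
    """Monte Carlo variance at r1*T and covariance between r1*T and r2*T across replications."""
    rng = random.Random(seed)
    t1, t2 = int(r1 * T), int(r2 * T)
    pairs = []
    for _ in range(reps):
        path = scaled_random_walk(T, rng)
        pairs.append((path[t1], path[t2]))
    m1 = sum(a for a, _ in pairs) / reps
    m2 = sum(b for _, b in pairs) / reps
    var1 = sum((a - m1) ** 2 for a, _ in pairs) / reps
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / reps
    return var1, cov


var_half, cov = fidi_moments(0.5, 0.8)
print(round(var_half, 2), round(cov, 2))  # both near 0.5 = min(0.5, 0.8)
```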

The first of these is the expectation of a random variable, which accords with the common usage of the average or mean of a random variable; the second is the variance, which is one measure of the dispersion in the distribution of outcomes of a random variable; the third necessarily involves more than one random variable and relates to the covariance between random variables; and, finally, the correlation coefficient, which is a scaled version of the covariance. A particularly important case of the covariance and correlation between random variables occurs when, in the case of two random variables, one random variable is a lag of the other.

This case is of such importance that whilst the basic concepts are introduced here, they are developed further in Chapter 2, Section 2. An example is the Poisson distribution function, which assigns mass at points in the set of nonnegative integers; see Section 3.

Also, in each case, the integral in the first line is the Lebesgue-Stieltjes integral, whereas in the second line it is an ordinary Riemann integral; for more on this distinction see Rao, especially Appendix 2A and Chapter 7, Sections 7. In a variation, used below in example 1. This direct way of computing the mean and variance of Bernoulli trials is cumbersome.

This section summarises some rules that apply to the expectation of a function of a random variable. Although similar considerations apply to obtaining the distribution and density of a function, it is frequently the case that the expectation is sufficient for the purpose. There is one case in Chapter 8 (the half-normal) where the distribution of a nonlinear function is needed, but that case can be dealt with intuitively. The reader may note that an extension of this rule was used implicitly in example 1.

Some rules for the expectation and variance of simple linear functions of random variables follow. The variance of $S_n$ is given by: L4. Applying 1. We start with this case and then generalise the argument. In the case that x is a discrete random variable, the following should be familiar from introductory courses; see also Equation 1. What is perhaps not apparent here is why the probability in 1. It turns out that this is an application of a theorem that greatly simplifies the evaluation of the expectation for nonlinear functions; for a formal statement of the theorem and proof see, for example, Ross. The answers are, of course, the same.

In this case it is simple enough to obtain the pmf of z from the pmf of x; however, this is not always the case, and it is in any case unnecessary. In some cases we can say something about $E[g(x)]$ from knowledge of $g(x)$. For a twice-differentiable function, convexity requires that the second derivative of $g(x)$ with respect to $x$ is non-negative (positive for strict convexity), so that the slope of $g(x)$ does not decrease as $x$ increases.
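A small sketch of the two routes to E[g(x)]: first derive the pmf of z = g(x) and then average, or weight g(x) directly by the pmf of x. The pmf and the choice g(x) = x² are hypothetical; note that x = -1 and x = 1 map to the same z, which is why the pmf of z has to be accumulated. The final line checks Jensen's inequality, g(E[x]) ≤ E[g(x)] for convex g.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical pmf of a discrete random variable x (illustrative values only).
pmf_x = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4)}


def g(x):
    return x ** 2  # a convex function


# Route 1: build the pmf of z = g(x), merging x values that map to the same z.
pmf_z = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_z[g(x)] += p
e_z_via_pmf = sum(z * p for z, p in pmf_z.items())

# Route 2: weight g(x) by the pmf of x directly, never forming the pmf of z.
e_z_direct = sum(g(x) * p for x, p in pmf_x.items())

e_x = sum(x * p for x, p in pmf_x.items())
print(e_z_via_pmf, e_z_direct)  # 3/2 3/2
print(g(e_x) <= e_z_direct)     # True: g(E[x]) = 1/4 <= E[g(x)] = 3/2
```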

An example will illustrate the application of this inequality. There is no presumption here that the random variables have an index of time that is important to their definition. For example, in a manufacturing process, the two random variables $x_1$ and $x_2$ may measure two dimensions of an engineered product. The simplest case to start with is that in which the random variables are independent. The idea of stochastic independence of random variables captures the intuitive notion that the outcome of the random variable $x_1$ does not affect the outcome of the random variable $x_2$, for all possible outcomes of $x_1$ and $x_2$.

It thus requires rather more than that two particular events are independent: any pairwise comparison of events that could occur for each of the random variables must lead to independence. An example presented below in Table 1. This notation is shorthand for much more. The definition of independence 1. For this reason, some authors emphasise this point by referring to the global independence of events for random variables.

It is against this background that simple statements such as 1. The subtle difference is that whilst 1. The conditional expectation of $x_2$ given $x_1$ follows from the conditional probabilities; note, however, that there is one expectation for each outcome value of $x_1$. Substituting the right-hand side of 1. The conditional expectation is a random variable, unlike the ordinary expectation; the values it takes depend on the conditioning event. The equality follows because the conditional expectation is evaluated over all possible values of the conditioning event; see Q1.
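The point that the conditional expectation is a random variable, taking one value per outcome of the conditioning variable, and that averaging it over that variable recovers the ordinary expectation, can be illustrated with a small joint pmf. The probabilities below are hypothetical; exact fractions avoid rounding.

```python
from fractions import Fraction

# Hypothetical joint pmf of (x1, x2); keys are (x1, x2) pairs. Illustrative values only.
joint = {(0, 0): Fraction(3, 10), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(2, 10), (1, 1): Fraction(4, 10)}

# Marginal pmf of x1: sum the joint probabilities over x2.
marg_x1 = {}
for (a, _), p in joint.items():
    marg_x1[a] = marg_x1.get(a, 0) + p


def cond_exp_x2(a):
    """E[x2 | x1 = a], using the conditional probabilities P(x2 | x1 = a) = P(a, x2) / P(x1 = a)."""
    return sum(b * p / marg_x1[a] for (x1, b), p in joint.items() if x1 == a)


# One conditional expectation per outcome of x1: it is itself a random variable.
cond = {a: cond_exp_x2(a) for a in marg_x1}
print(cond)  # {0: Fraction(1, 4), 1: Fraction(2, 3)}

e_x2 = sum(b * p for (_, b), p in joint.items())
e_of_cond = sum(cond[a] * marg_x1[a] for a in marg_x1)
print(e_x2 == e_of_cond)  # True: E[E[x2 | x1]] = E[x2]
```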

Other moments, such as the variance, can be conditioned on events in the space of the random variables. This property becomes particularly important in a time series context when $x_2$ is a lag of $x_1$, in which case the covariance and correlation between these two variables are referred to as the autocovariance and autocorrelation; for example, if $x_k$ is the k-th lag of $x_1$, then the covariance of $x_1$ and $x_k$ is known as the k-th order autocovariance, and scaling by the square root of the variance of $x_1$ times the variance of $x_k$ results in the k-th order autocorrelation coefficient; see Chapter 2, Section 2.
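A sketch of the sample counterparts of the k-th order autocovariance and autocorrelation. The divisor T (rather than T - k) is one common convention and is an assumption here, as is the short illustrative series.

```python
def sample_acov(y, k):
    """k-th order sample autocovariance of y, with divisor T (a common convention)."""
    T = len(y)
    ybar = sum(y) / T
    return sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, T)) / T


def sample_acf(y, max_lag):
    """Sample autocorrelations r_k = gamma_hat_k / gamma_hat_0 for k = 1..max_lag."""
    g0 = sample_acov(y, 0)
    return [sample_acov(y, k) / g0 for k in range(1, max_lag + 1)]


y = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0]  # hypothetical series with a cycle of length 4
print(sample_acf(y, 2))  # [0.0, -0.75]
```

Dividing by the sample variance plays the role of scaling by the square root of the product of the two variances, since under stationarity these are the same.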

Under independence the joint event table has the following entries. The probabilities in the final row and final column are just the probabilities of the events comprising $x_1$ and $x_2$, respectively; these are referred to as the marginal probabilities, and collectively as the marginal distribution(s), to distinguish them from the conditional probabilities and conditional distributions in the body of the table. Note that summing the joint probabilities across a row or column gives the marginal probability. In example 1. However, in the case of stochastic processes, there is a natural ordering to the random variables: $x_2$ comes after $x_1$ in the time series sequence, hence it is more natural to condition $x_2$ on $x_1$.

In a partial sum process (psp), the order of the sequence, which is usually related to an index of time, is important. The coin-tossing experiment is an example of a psp, provided that tosses of the coin are inherently consecutive and the random variable is the one that keeps a running tally (sum) of the number of heads or tails. The variance of $S_t$ depends essentially on its ordered place in the sequence.
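The dependence of the variance on position in a partial sum process can be confirmed exactly for short sequences by enumerating every equally likely path. This sketch assumes a fair coin recorded as +1 for heads and -1 for tails, so that Var(S_t) = t; counting heads as 1 and tails as 0 instead would give Var(S_t) = t/4.

```python
from itertools import product


def psp_moments(t):
    """Exact mean and variance of S_t = e_1 + ... + e_t, where each e_j is +1 or -1
    with probability 1/2, obtained by enumerating all 2**t equally likely paths."""
    sums = [sum(path) for path in product((1, -1), repeat=t)]
    mean = sum(sums) / len(sums)
    var = sum((s - mean) ** 2 for s in sums) / len(sums)
    return mean, var


for t in range(1, 6):
    print(t, psp_moments(t))  # mean 0.0 and variance t: the variance grows with position
```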

This example is considered further in example 1. The development is not completely analogous to the discrete case because, if the conditioning event is a single value, it is assigned a zero probability, and so an expression analogous to 1. To outline the approach, but to avoid this difficulty in the first instance, we consider the conditioning event to have a non-zero probability.

In seeking a conditional expectation, we could approach the task by first defining a conditional distribution function, by analogy with the discrete case, as the ratio of the joint distribution function to the conditioning marginal distribution function, or in terms of density functions as the ratio of the joint density function to the conditioning marginal density function. As the density functions exist for the distributions considered in this book, we will approach the conditional expectation from that viewpoint.

The expression 1. This rules out singletons, that is, single points on the $X_1$ axis; these are Borel sets, but lead to the problem that zero probability is assigned to such events for a continuous random variable. To see the problem, note that in the case of a discrete random variable, a conditional probability mass function is obtained by taking a value for, say, $X_1$, as given; this fixes a row of the joint event table.

Each cell entry in that row is then normalised by the sum of such entries, which necessarily results in non-negative conditional probabilities that sum to unity. The extension by analogy to a continuous random variable breaks down because the normalising factor is zero. There are two ways forward. One is to redefine the conditional probability as a limit and the second is to go directly to the concept of a conditional expectation without first defining a conditional distribution or conditional density function.

The solution outlined here is of the former kind and follows Mittelhammer; for an earlier reference see Feller. The second approach is adopted in more advanced treatments, where the emphasis is on a measure-theoretic approach; the interested reader may like to consult Davidson, chapter 10, and Billingsley, chapter 6.

The difference between 1. The end result is simple enough and does have the same form as the discrete case. Thus, as 1. Also of interest in deriving maximum-likelihood-based estimators is the log of the joint pdf. In the case of independent $x_i$, the log of 1. The emphasis here is on the time series context.
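Under independence the joint pdf factorises into a product, so its log is a sum of individual log densities. The sketch assumes i.i.d. normal observations and uses hypothetical data; it also confirms that, for a fixed variance, the sample mean gives a higher log-likelihood than a nearby value of the mean.

```python
import math


def normal_logpdf(x, mu, sigma2):
    """Log density of N(mu, sigma2) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * sigma2) + (x - mu) ** 2 / sigma2)


def loglik_iid_normal(xs, mu, sigma2):
    """Log of the joint pdf of independent N(mu, sigma2) draws: a sum of log densities."""
    return sum(normal_logpdf(x, mu, sigma2) for x in xs)


xs = [0.5, -1.2, 0.3, 2.0, -0.4]  # hypothetical observations
mu_hat = sum(xs) / len(xs)
# For fixed sigma2, the sample mean maximises the log-likelihood in mu.
print(loglik_iid_normal(xs, mu_hat, 1.0) >= loglik_iid_normal(xs, mu_hat + 0.1, 1.0))  # True
```

Working with the sum of logs rather than the product of densities also avoids numerical underflow when the sample is long.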

The iterating can be continued. For the second property, start by considering the product of two random variables, x and z. There is an important case in the context of conditional expectations where, in the product of random variables, one random variable can, in effect, be treated like a constant. The general result is stated as follows; see, for example, Jacod and Protter.

If the process does not change at all over time, it does not matter which sample portion of observations we use to estimate the parameters of the process; we may as well, therefore, use all available observations.

On the other hand, this may be too strict a requirement for some purposes. There may be a break in the mean of the process, whilst the variance of the process remains the same. In that case, assuming that the mean is unchanging is clearly wrong and will lead us into error, since a shift in the mean is a form of nonstationarity; but rather than use only that part of the sample where the mean is constant, we may be able to model the mean change and use all of the sample. The leading case of nonstationarity, at least in econometric terms, is that induced by a unit root in the AR polynomial of an ARMA model for $y_t$, considered more extensively in Chapter 2.

This implies that the variance of $y_t$ is not constant over time and that the k-th order autocovariance of $y_t$ depends on t. This is, however, just one example of how nonstationarity can be induced. Note that stationarity refers to a property of the process generating the outcomes (or data) that we observe; thus we should refer to a stationary or nonstationary process, not to stationary or nonstationary data.
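The contrast between a unit root and a stationary root can be seen from the variance recursion of the AR(1) model $y_t = \rho y_{t-1} + \varepsilon_t$. The sketch assumes $y_0 = 0$ and white-noise $\varepsilon_t$ with variance σ²: with ρ = 1 the variance is tσ², growing without bound, whereas with |ρ| < 1 it settles at σ²/(1 - ρ²) (with a fixed starting value the process is only asymptotically stationary).

```python
def ar1_variances(rho, T, sigma2=1.0):
    """Var(y_t), t = 1..T, for y_t = rho*y_{t-1} + e_t with y_0 = 0 and Var(e_t) = sigma2,
    using the recursion Var(y_t) = rho**2 * Var(y_{t-1}) + sigma2."""
    v, out = 0.0, []
    for _ in range(T):
        v = rho ** 2 * v + sigma2
        out.append(v)
    return out


print(ar1_variances(1.0, 5))                 # [1.0, 2.0, 3.0, 4.0, 5.0]: unit root, variance t
print(round(ar1_variances(0.5, 50)[-1], 6))  # 1.333333 = 1 / (1 - 0.5**2)
```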

Notwithstanding this correct usage, sample data are often referred to as stationary or nonstationary. This is particularly so in the case of data generated from a stochastic process and presented in the form of a time series, when one finds a sample, for example data on GDP over a given period, being referred to as nonstationary. This usage is widespread and does no particular harm provided that the correct meaning is understood.

By independence, we can multiply together the pmfs $P(y_t)$ and, as by assumption each of these is identical, the joint pmf is: $P(y_1, y_2, \ldots, y_T) = \prod_{t=1}^{T} P(y_t)$.

Suppose we wanted to assess the assumption that the two outcomes for each t were, indeed, equally likely. This is a sensible estimator given that the probability structure over the sequence is unchanging see Chapter 3, Section 3. This illustration uses the independence property explicit in the random experiment of coin tossing, but it is not a part of the definition of stationarity.
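For the coin-tossing illustration, the joint pmf of an i.i.d. sequence is the product of the individual pmfs, and the relative frequency of heads is a natural estimator of the probability of heads. The sequence below is hypothetical; the last check shows that the relative frequency gives the observed sequence a higher joint probability than p = 1/2 does.

```python
from fractions import Fraction


def joint_pmf_iid(outcomes, p_heads):
    """Joint pmf of independent, identically distributed tosses: the product of the P(y_t)."""
    prob = Fraction(1)
    for y in outcomes:
        prob *= p_heads if y == "H" else 1 - p_heads
    return prob


def relative_frequency(outcomes):
    """Proportion of heads: an estimator of P(heads) when the probability structure is unchanging."""
    return Fraction(outcomes.count("H"), len(outcomes))


seq = list("HTTHHHTH")  # hypothetical sequence of 8 tosses
print(joint_pmf_iid(seq, Fraction(1, 2)))  # 1/256 = (1/2)**8
print(relative_frequency(seq))             # 5/8
print(joint_pmf_iid(seq, Fraction(5, 8)) > joint_pmf_iid(seq, Fraction(1, 2)))  # True
```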

The next two subsections show what is required depending on the particular concept of stationarity. This means that it does not matter which T-length portion of the sequence we observe. These results imply that other moments, including joint moments, such as the covariances, are invariant to arbitrary time shifts. The first condition states that the mean is constant, the second that the variance is constant and the third that the k-th order autocovariance is invariant to an arbitrary shift in the time origin.
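For reference, the three conditions for weak (covariance) stationarity can be written as follows, using μ, σ² and $\gamma_k$ as assumed notation for the constant mean, the constant variance and the k-th order autocovariance:

$$
\begin{aligned}
&\text{(i)} \quad E(y_t) = \mu \quad \text{for all } t, \\
&\text{(ii)} \quad \operatorname{Var}(y_t) = \sigma^2 < \infty \quad \text{for all } t, \\
&\text{(iii)} \quad \operatorname{Cov}(y_t, y_{t+k}) = \gamma_k \quad \text{for all } t \text{ and } k.
\end{aligned}
$$

Condition (iii) says the autocovariance depends only on the separation k, not on the time origin t.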

From these three conditions, it is evident that a stochastic process could fail to be weakly stationary because at least one of the following holds over time: i. the mean is not constant; ii. the variance is not constant; iii. the k-th order autocovariance depends on t. A stochastic process that is not stationary is said to be nonstationary. Usually it is apparent from the context whether the stationarity being referred to is strict or weak. When the word stationary is used without qualification, it is taken to refer to weak stationarity, shortened to WS but perhaps most frequently referred to as covariance stationarity.

Weak or covariance stationarity is also referred to as wide-sense stationarity, leading to the initials WSS. Ross gives examples of processes that are weakly stationary but not strictly stationary; however, note that, exceptionally, a process could be strictly stationary but not weakly stationary by virtue of the non-existence of its moments. For example, a random process where the components have unchanging marginal and joint Cauchy distributions will be strictly stationary, but not weakly stationary because the moments do not exist.

From 1. One cannot make sense of random walks and Brownian motion without a background knowledge of probability, nor of, for example, memory and persistence in a stochastic process without the concept of dependence. The partial sum process is critical not only to random walks and Brownian motion but also to the development of the distribution theory of unit root tests. There are a number of excellent texts on probability and stochastic processes that can serve as a follow-up to this chapter.

Questions Q1. Is he correct? However, we also need to consider how many ways 5 heads can occur in 10 tosses. This example is due to Bean.

In practice we observe one set of observations, but conceptualise these as outcomes from a process that is inherently capable of replication. In Chapter 1, this sequence was referred to as a stochastic process, where an outcome of such a process is a path function or sample path, not a single point. This chapter proceeds as follows. The lag operator is introduced in Section 2. As characterising the degree of dependence is a key preliminary in this chapter, autocovariances, autocorrelations and variances are introduced in Section 2.

Section 2. Estimation of the long-run variance is considered in Section 2. Throughout this chapter, time is of the essence and, therefore, the adopted notation is of the form $y_t$ for a discrete-time random variable. We outline some basic principles in this subsection; a more extensive discussion can be found in Dhrymes. The lag operator is more than just a convenience of notation; it opens the way to write functions of the lags and leads of a time series variable that enable some quite complex analysis.
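A minimal sketch of the lag operator acting on a finite series, $L^k y_t = y_{t-k}$, and of a polynomial in L applied term by term. Representing undefined pre-sample values as None, and the function names, are illustrative choices.

```python
def lag(y, k=1):
    """Apply L^k: (L^k y)_t = y_{t-k}. The first k values have no pre-sample data and are None."""
    if k == 0:
        return list(y)
    return [None] * k + list(y[:len(y) - k])


def apply_lag_polynomial(coeffs, y):
    """Apply c(L) = c_0 + c_1 L + ... + c_p L^p to y, for t = p..T-1 (where all lags exist)."""
    p = len(coeffs) - 1
    return [sum(c * y[t - j] for j, c in enumerate(coeffs)) for t in range(p, len(y))]


y = [1.0, 4.0, 9.0, 16.0, 25.0]
print(lag(y))                                     # [None, 1.0, 4.0, 9.0, 16.0]
print(apply_lag_polynomial([1.0, -1.0], y))       # (1 - L)y_t: [3.0, 5.0, 7.0, 9.0]
print(apply_lag_polynomial([1.0, -2.0, 1.0], y))  # (1 - L)^2 y_t: [2.0, 2.0, 2.0]
```

Because $(1 - L)^2 = 1 - 2L + L^2$, applying the second polynomial is the same as differencing twice, which is one way the algebra of lag polynomials pays off.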

At this point, it is useful to clarify a distinction in approach that arises in connection with the roots of polynomials. The benefit is a neater way of representing the roots. The roots are the solutions to 1 — 1. Divide the lag polynomial through by 0. Note that isolating the unit root and rewriting the lag polynomial in terms of the remaining root, results in an invertible polynomial, specifically: y t — 1. It features widely in tests for a unit root, which in effect focus on the AR component of the model. This section outlines some important features of this class of model; and subsequent sections use some simple ARMA models for illustration.
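The roots of a lag polynomial, and the effect of isolating a unit root, can be sketched with a hypothetical AR(2) polynomial $1 - 1.5L + 0.5L^2 = (1 - L)(1 - 0.5L)$. The sketch adopts the convention of solving $\phi(z) = 0$, under which stationarity requires roots outside the unit circle; dividing out $(1 - L)$ leaves $1 - 0.5L$, whose root of 2 makes the remaining polynomial invertible. The function names are hypothetical.

```python
import cmath


def lag_poly_roots(phi1, phi2):
    """Roots z of 1 - phi1*z - phi2*z**2 = 0 (phi2 != 0 assumed), possibly complex."""
    a, b, c = -phi2, -phi1, 1.0  # written as a*z**2 + b*z + c = 0
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


def divide_out_unit_root(coeffs, tol=1e-12):
    """Divide c(L) = c_0 + c_1 L + ... + c_p L^p by (1 - L); the quotient coefficients
    are cumulative sums. Requires c(1) = 0, i.e. a unit root, for the division to be exact."""
    q, carry = [], 0.0
    for c in coeffs[:-1]:
        carry += c
        q.append(carry)
    if abs(carry + coeffs[-1]) > tol:
        raise ValueError("c(1) != 0: no unit root to divide out")
    return q


roots = lag_poly_roots(1.5, -0.5)             # 1 - 1.5L + 0.5L^2
print(sorted(abs(r) for r in roots))          # [1.0, 2.0]: one unit root, one outside the unit circle
print(divide_out_unit_root([1.0, -1.5, 0.5]))  # [1.0, -0.5], i.e. 1 - 0.5L
```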

References to follow up this important area of time series analysis are given at the end of the chapter. For simplicity the specification in 2. It is usual to add a specification of $\varepsilon_t$ to complete the ARMA model, and we digress briefly to cover this point. Another possibility requires us to look ahead to a martingale difference sequence (MDS); see Chapter 3, Section 3. In practice, such exact cancellation is rare, but near-cancellation does occur, especially as p and q are increased. At this point, it is worth pointing out a departure from the notation of Equation 2.

This simple model, or a slight generalisation of it, is an often used vehicle for unit root tests. In fact, this notational convention has a substantive base in terms of the formulation of a model for unit root testing, see Chapter 8. The MA form 2. A measure of persistence based on 2. Note that 2. Thus 2. It is usual to impose an invertibility condition on the MA polynomial to ensure identifiability of the MA coefficients.

This is because different sets of MA coefficients give rise to the same autocorrelation structure. The problem can be illustrated most simply with the MA(1) model. Evidently, the lag coefficients in the moving average representation capture the differences due to the unit shock. By way of motivation, consider the ARMA(2, 0) model, with the lag polynomial of example 2.
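The identification problem for the MA(1) model $y_t = \varepsilon_t + \theta\varepsilon_{t-1}$ follows from its first-order autocorrelation, $\theta/(1 + \theta^2)$, which is unchanged when θ is replaced by 1/θ; all higher-order autocorrelations are zero. The sketch below shows this for the pair θ = 0.5 and θ = 2. Restricting attention to |θ| < 1, the invertibility condition, selects one member of the pair.

```python
def ma1_rho1(theta):
    """First-order autocorrelation of y_t = e_t + theta * e_{t-1}; higher orders are zero."""
    return theta / (1 + theta ** 2)


for theta in (0.5, 2.0):
    print(theta, ma1_rho1(theta))  # both give 0.4: the two MA(1) models are indistinguishable
```

The same calculation shows that $|\rho_1| \le 0.5$ for any MA(1), with the bound reached at θ = ±1.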

The concept of an integrated process is intimately related to the concept of stationarity, which was considered in Chapter 1, Section 1. A key aspect in characterising a time series is the extent of its dependence on itself, usually referred to as serial dependence; for linear time series, the basic concept is the autocovariance, considered in the next section. The autocovariance function of a pure MA process, which is just a linear combination of white noise (and hence uncorrelated) inputs, is particularly easy to obtain.
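Because a pure MA(q) process is a finite linear combination of white noise inputs, its autocovariances follow from $\gamma_k = \sigma^2 \sum_{j=0}^{q-k} \theta_j \theta_{j+k}$ with $\theta_0 = 1$, and $\gamma_k = 0$ for $k > q$. The sketch below evaluates this formula; the convention $y_t = \sum_{j=0}^{q}\theta_j\varepsilon_{t-j}$ and the coefficient values are assumptions for illustration.

```python
def ma_autocov(theta, sigma2=1.0, max_lag=None):
    """Autocovariances gamma_0, ..., gamma_max_lag of y_t = sum_{j=0}^{q} theta_j e_{t-j}.

    theta lists theta_1, ..., theta_q; theta_0 = 1 is added here, and Var(e_t) = sigma2.
    """
    c = [1.0] + list(theta)
    q = len(c) - 1
    if max_lag is None:
        max_lag = q + 1
    # gamma_k = sigma2 * sum_{j=0}^{q-k} theta_j * theta_{j+k}; the sum is empty (zero) for k > q.
    return [sigma2 * sum(c[j] * c[j + k] for j in range(q - k + 1))
            for k in range(max_lag + 1)]


print(ma_autocov([0.5]))         # [1.25, 0.5, 0.0]
print(ma_autocov([0.4, 0.2]))    # approximately [1.2, 0.48, 0.2, 0.0]
```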

An alternative method of obtaining the autocovariance and autocorrelation functions is by way of the autocovariance generating function (ACGF), see Section 2. The last expectation is zero because $y_{t-k}$ is a function of $\varepsilon_{t-k}, \varepsilon_{t-k-1}, \ldots$, and so is uncorrelated with the later innovation. Furthermore, the sequences of autocovariances and autocorrelations are clearly summable. In practice, these population quantities are replaced by their sample counterparts.
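Taking an AR(1) model $y_t = \phi y_{t-1} + \varepsilon_t$ with $|\phi| < 1$ for illustration (an assumption about the model being discussed), the fact that $E(y_{t-k}\varepsilon_t) = 0$ for $k \ge 1$ gives $\gamma_k = \phi\gamma_{k-1}$ and $\gamma_0 = \sigma^2/(1-\phi^2)$, so $\rho_k = \phi^{|k|}$ and the two-sided sum of the autocorrelations is $(1+\phi)/(1-\phi)$. The sketch below, with the hypothetical value $\phi = 0.5$, confirms the summability numerically.

```python
def ar1_autocov(phi, sigma2=1.0, max_lag=10):
    """gamma_0, ..., gamma_max_lag for y_t = phi*y_{t-1} + e_t with Var(e_t) = sigma2."""
    if abs(phi) >= 1:
        raise ValueError("stationarity requires |phi| < 1")
    gamma0 = sigma2 / (1.0 - phi ** 2)
    # gamma_k = phi * gamma_{k-1}, because E(y_{t-k} e_t) = 0 for k >= 1.
    return [gamma0 * phi ** k for k in range(max_lag + 1)]


def two_sided_acf_sum(phi, max_lag):
    """Truncated sum of rho_k over k = -max_lag, ..., max_lag, where rho_k = phi**|k|."""
    return 1.0 + 2.0 * sum(phi ** k for k in range(1, max_lag + 1))


print(ar1_autocov(0.5, max_lag=3))   # approximately [1.333, 0.667, 0.333, 0.167]
print(two_sided_acf_sum(0.5, 60))    # close to (1 + phi)/(1 - phi) = 3
```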

These variations affect the small-sample properties, but have no effect asymptotically. The tests are, however, also applied as tests of independence to the levels of series, see, for example, Escanciano and Lobato. However, it is important to bear in mind the underlying hypothesis tests implied by this multiple testing approach. In the context of ARMA models, this is known as the identification of the model, and there is an extensive literature on this subject, see, for example, Brockwell and Davis, chapter 9.
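Assuming that the variations referred to are the Box-Pierce statistic, $Q = T\sum_{k=1}^{h} r_k^2$, and its Ljung-Box modification, $Q^* = T(T+2)\sum_{k=1}^{h} r_k^2/(T-k)$, the sketch below computes both from the sample autocorrelations. The weights $(T+2)/(T-k)$ tend to 1 as T grows, which is why the two versions differ in small samples but not asymptotically. Under the null hypothesis both are approximately $\chi^2(h)$; when applied to the residuals of an ARMA(p, q) model, the degrees of freedom are usually reduced by p + q.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag, using the divisor n throughout."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t - k] for t in range(k, n)) / n / c0
            for k in range(max_lag + 1)]


def box_pierce(x, h):
    n = len(x)
    r = sample_acf(x, h)
    return n * sum(rk ** 2 for rk in r[1:])


def ljung_box(x, h):
    # Small-sample modification: each r_k^2 is reweighted by (n + 2)/(n - k).
    n = len(x)
    r = sample_acf(x, h)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, h + 1))


x = [1, 2, 3, 4, 5]
print(sample_acf(x, 1))                    # [1.0, 0.4], up to rounding
print(box_pierce(x, 1), ljung_box(x, 1))   # 0.8 and 1.4, up to rounding
```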

These are examples of selection criteria that penalise the addition of terms that improve the fit of the model. The role of the second term is to impose a penalty on increasing k. Different choices of the penalty function give different information criteria (IC). Also, in practice, the use of an information criterion may be combined with another criterion, such as no evidence of serial correlation in the residuals of the resulting model.
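Assuming the common form $IC(k) = \ln\hat\sigma^2_k + k\,C(T)/T$, where $\hat\sigma^2_k$ is the residual variance of a model with k coefficients, AIC sets $C(T) = 2$ and the Schwarz criterion (BIC) sets $C(T) = \ln T$. The residual variances below are hypothetical numbers chosen to show that the heavier BIC penalty can select a smaller model.

```python
import math


def info_criterion(sigma2_k, k, T, penalty):
    """IC(k) = ln(sigma2_k) + k * penalty / T: a fit term plus a penalty on k."""
    return math.log(sigma2_k) + k * penalty / T


def select_order(sigma2_by_k, T, criterion="aic"):
    # AIC uses penalty 2; BIC uses ln(T), which penalises k more heavily once T >= 8.
    penalty = 2.0 if criterion == "aic" else math.log(T)
    scores = {k: info_criterion(s2, k, T, penalty) for k, s2 in sigma2_by_k.items()}
    return min(scores, key=scores.get)


# Hypothetical residual variances from models with k = 1, ..., 4 coefficients.
sigma2_by_k = {1: 1.50, 2: 1.20, 3: 1.17, 4: 1.16}
print(select_order(sigma2_by_k, T=100, criterion="aic"))   # 3
print(select_order(sigma2_by_k, T=100, criterion="bic"))   # 2
```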

The k-th order autocovariance is read off the ACGF as the coefficient on the k-th power of z. Two methods predominate for estimating the long-run variance. The first makes no assumption about the parametric form of the DGP and bases the estimator on the sample variance and autocovariances; the second assumes either that the DGP is an ARMA(p, q) model or that it can be reasonably approximated by such a model. The practical problem with the first method is that the upper summation limit cannot extend to infinity; therefore, an estimator is defined by truncating the upper limit to a finite number m.
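A minimal sketch of the first (nonparametric) method, assuming the usual form $\hat\sigma^2_{lr} = \hat\gamma_0 + 2\sum_{j=1}^{m} w_j\hat\gamma_j$, is given below. With $w_j = 1$ it is the unweighted truncated estimator; with the Newey-West (Bartlett) weights $w_j = 1 - j/(m+1)$ the estimate is guaranteed to be non-negative. The AR(1) data are simulated with hypothetical parameters, for which the true long-run variance is $\sigma^2/(1-\phi)^2 = 4$.

```python
import random


def sample_autocov(x, k):
    """gamma_hat_k with divisor n."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / n


def long_run_variance(x, m, weights="none"):
    """Truncated estimator gamma_0 + 2 * sum_{j=1}^{m} w_j * gamma_j.

    weights="none" gives the unweighted sum; weights="bartlett" uses the
    Newey-West (Bartlett) weights w_j = 1 - j/(m + 1).
    """
    total = sample_autocov(x, 0)
    for j in range(1, m + 1):
        w = 1.0 - j / (m + 1) if weights == "bartlett" else 1.0
        total += 2.0 * w * sample_autocov(x, j)
    return total


# Small worked case: gamma_0 = 2.0 and gamma_1 = 0.8 for x = 1, ..., 5.
x = [1, 2, 3, 4, 5]
print(long_run_variance(x, 1), long_run_variance(x, 1, "bartlett"))   # 3.6 and 2.8, up to rounding

# Simulated AR(1) with phi = 0.5 and unit innovation variance (true value 4).
rng = random.Random(2024)
y, prev = [], 0.0
for _ in range(50_000):
    prev = 0.5 * prev + rng.gauss(0.0, 1.0)
    y.append(prev)
lrv_nw = long_run_variance(y, 40, "bartlett")
print(round(lrv_nw, 2))   # should be near 4
```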

Phillips, theorem 4. The practical problem is to determine the orders p and q and to estimate the associated AR and MA parameters. As to estimation, standard econometric software packages, such as EViews, RATS and TSP, offer estimation routines based on the method of moments, conditional least squares and exact maximum likelihood; when available, the latter is usually to be preferred.

Estimation methods differ in how the pre-sample values are dealt with; for example, pre-sample values of $\varepsilon_t$ may be set to zero or they may be backcast given initial estimates of the ARMA coefficients. For details of different estimation methods, see, for example, Brockwell and Davis, chapter 8, and Fuller, chapter 8.
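To illustrate the role of pre-sample values, the sketch below applies conditional least squares to an MA(1) model, $y_t = \varepsilon_t + \theta\varepsilon_{t-1}$ (sign convention assumed): the pre-sample shock is set to zero, the model is inverted recursively, and $\theta$ is chosen by a simple grid search over the invertible region. This illustrates the principle only; it is not how EViews, RATS or TSP implement their routines, and the data are simulated with a hypothetical $\theta = 0.5$.

```python
import random


def ma1_residuals(y, theta, e0=0.0):
    """Invert y_t = e_t + theta*e_{t-1} recursively, with the pre-sample shock e_0 set to e0."""
    resid, e_prev = [], e0
    for v in y:
        e = v - theta * e_prev
        resid.append(e)
        e_prev = e
    return resid


def css_estimate(y):
    """Conditional least squares: minimise the residual sum of squares over an invertible grid."""
    grid = [i / 100 for i in range(-99, 100)]
    return min(grid, key=lambda th: sum(e * e for e in ma1_residuals(y, th)))


# Hypothetical data: theta = 0.5, Gaussian shocks and a zero pre-sample shock.
rng = random.Random(99)
shocks = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [shocks[t] + 0.5 * (shocks[t - 1] if t > 0 else 0.0) for t in range(len(shocks))]

print(max(abs(a - b) for a, b in zip(ma1_residuals(y, 0.5), shocks)))   # essentially zero
print(css_estimate(y))                                                  # close to 0.5
```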

There are three sub-figures for each model. The realisations from the AR(1) model are shown in Figure 2. The sample autocorrelations are shown in Figure 2. There is negative first-order autocorrelation in the MA(1) model, so there is a relatively large number of changes of sign in adjacent realisations, see Figure 2. The observations, in natural logarithms, are graphed in Figure 2. Clearly there is some positive dependency in the series. The short-memory nature of the process is suggested by the finite sum of the autocorrelations in Figure 2.

An ARMA(p, q) model was fitted to the data, with the upper limits of p and q set at 3. The estimated model was an ARMA(2, 0) model. Semi-parametric estimation of the long-run variance was described in Section 2. The results are presented graphically in Figure 2.


The unweighted estimator shows a more marked peak compared to the estimator using the Newey-West weights. Classic texts include Box and Jenkins, and Anderson, both of which are a must for serious time series analysis. In turn, they could be followed by Priestley, Fuller, and Brockwell and Davis; two books with a slightly more econometric orientation are Hamilton and Harvey. The problems associated with estimating the long-run variance are considered by Sul et al.

This can be solved using the standard high school formula for the roots of a quadratic equation; however, it is easier to use a program, for example MATLAB, as used here, to obtain the roots. The trick is to pick off the coefficients on powers of z. Alternatively, refer back to A2.

In an economic context, the more usual situation is that stochastic processes have memory to some degree, and our interest is in assessing the extent of that memory.

This is important for the generalisation of the central limit theorem (CLT), see Chapter 4, Section 4. The concept of temporal dependence is introduced in Section 3. An important way of assessing dependence is through the autocovariances, which are the subject of the next section. This is an example of a more general concept known as strong mixing, which is a way of characterising the extent of dependence in the sequence of $y_t$.

It was introduced by Rosenblatt and is one of a number of mixing conditions; for a survey, see Bradley, who considers eight measures of dependence. Strong mixing is put into a more formal context as follows.

What is required additionally is that the distribution of the innovations is smooth. The formal condition is stated in Andrews (especially Theorem 1) and is satisfied by a large number of distributions, including the normal, exponential, uniform and Cauchy (which has no moments). The importance of the concept of strong mixing is two-fold in the present context. However, notwithstanding its importance elsewhere, what is more important in this context is the extension of the CLT to the functional CLT, see Chapter 6, Section 6.

A slight variation on weak stationarity (WS), which is particularly relevant for AR models, is asymptotic weak stationarity.

## Books by Professor Kerry Patterson

An example will illustrate the point. We observe one particular set of realisations, or sample outcomes, and the question is what can we infer from this one set? What we would like to do is replicate the DGP for Y, say R times; the data could then be arranged into a matrix of dimension R × T, so that the t-th column is the vector of R replicated values for the random variable $y_t$ and the r-th row is the vector of one realisation of the sequence of T random variables.

For a development of this argument, with some time series examples, see Kay, who also gives an example where a WS random process is not ergodic in the mean. Although this example has assumed that $y_t$ is a discrete-time random variable with a discrete number of outcomes, that was simply to fix ideas; there is nothing essential to the argument in this specification, which carries across to a stochastic process comprising continuous-time, continuous random variables.

In part this just collects together previous results for convenience.

Some of the necessary background has already been covered in Section 3. The second property is that of a Markov process, which, although less significant in the present development, characterises an important feature of some economic time series. Note that, for a martingale, $E(y_{t+1} \mid y_t, y_{t-1}, \ldots) = y_t$: no systematic gains or losses will be made if the game is replicated.

If $y_t$ is an asset price, and the conditioning set is current information, then the change in price is unpredictable in the mean. An MDS has serially uncorrelated, but not necessarily independent, stochastic increments. Neither does an MDS require that the variance or higher-order moments are constant. The impact of this property is that it enables a generalisation of some important theorems in statistics and econometrics to other than iid sequences; for example, the central limit theorem and the weak law of large numbers (see Chapter 4, Section 4).
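A standard illustration (not taken from the text) of an MDS that is uncorrelated but not independent is $e_t = z_t z_{t-1}$ with $z_t$ iid N(0, 1). Here $E(e_t \mid z_{t-1}, z_{t-2}, \ldots) = z_{t-1}E(z_t) = 0$, so $e_t$ is an MDS, yet its conditional variance $z_{t-1}^2$ changes over time and, under normality, the squares have first-order autocorrelation $0.25$. The sketch below checks both features by simulation; the seed and sample size are arbitrary.

```python
import random


def acf1(x):
    """First-order sample autocorrelation."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in x)
    return num / den


# Hypothetical construction: z_t iid N(0, 1) and e_t = z_t * z_{t-1}.
rng = random.Random(42)
z = [rng.gauss(0.0, 1.0) for _ in range(200_001)]
e = [z[t] * z[t - 1] for t in range(1, len(z))]

r_e = acf1(e)                      # near 0: e_t is serially uncorrelated
r_e2 = acf1([v * v for v in e])    # near 0.25: e_t is not independent
print(round(r_e, 3), round(r_e2, 3))
```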

First note that time is of the essence in defining the Markov property. The Markov property is easily extended to stochastic processes in continuous time; all that is required is a time index that distinguishes the future, the present and the past, see, for example, Billingsley, p.

Examples of stochastic processes with the Markov property are the Poisson process and Brownian motion (BM); this is a result due essentially to the independent increments involved in both processes. BM is considered at length in Chapter 7; the Poisson process is described in the following section. The question then is: what is a reasonable model for the arrivals? One possibility is the Poisson process.

A martingale is obtained from a Poisson process in the same manner as in example 3. A Poisson process also has the Markov property, and thus is an example of a Markov process, a result due to the independence of the increments; for a proof, see Billingsley, p. The cumulative arrivals for the first ten minutes are shown in Figure 3. The path is right continuous, and it jumps at the discrete points associated with the positive integers. If the particular element of the sample space that was realised for this illustration was realised exactly in a second replication, the whole sample path would be repeated; otherwise, the sample paths will differ.
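The sketch below simulates a Poisson process from independent exponential inter-arrival times, one standard construction, and checks that the compensated process $N(t) - \lambda t$ has mean zero, as a martingale started at zero must. The arrival rate $\lambda = 2$ per minute is a hypothetical value; the ten-minute horizon matches the illustration in the text.

```python
import random


def poisson_arrival_times(rate, horizon, rng):
    """Arrival times on [0, horizon], generated from iid exponential inter-arrival times."""
    times, t = [], rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def count_at(times, t):
    """N(t): arrivals in [0, t]; counting s <= t makes the path right continuous."""
    return sum(1 for s in times if s <= t)


# Hypothetical parameters: 2 arrivals per minute, observed for 10 minutes.
rate, horizon = 2.0, 10.0
rng = random.Random(1)
counts = [len(poisson_arrival_times(rate, horizon, rng)) for _ in range(5000)]
mean_count = sum(counts) / len(counts)

# The compensated process N(t) - rate*t is a martingale, so its mean is 0.
print(mean_count, mean_count - rate * horizon)
```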

To make this point, a number of sample paths for a two-hour period are shown in Figure 3. For a survey of mixing concepts, see Bradley; Withers considers the relationship between linear processes and strong mixing; and for an application in the context of unit root tests, see Phillips. For the connection between mixing sequences and martingale difference sequences at an advanced level, see Hall and Heyde. Martingales and Markov processes are an essential part of the ideas of probability and stochastic processes; at a more advanced level, see Hall and Heyde. An econometric perspective is provided by Davidson, who covers the concepts of dependence, mixing and martingales at a more advanced level; and Billingsley, chapter 6, section 35, gives a similarly advanced coverage of martingales.

McCabe and Tremayne, chapter 11, consider dependent sequences and martingales.

Quite often it is not possible to determine the finite sample properties of, for example, the mean or distribution of a random variable, such as an estimator or a test statistic; but in the limit, as the sample size increases, these properties are more easily determined. There is, however, more than one concept of convergence, and different concepts may be applicable in different circumstances. The matter of convergence is more complex in the case of a sequence of random variables than for a nonstochastic sequence, since a random variable has multiple outcomes.

This would require that $x_n$ and x are defined on the same probability space. Overall, this is a stringent condition for convergence, and some lesser form of convergence may well be sufficient for practical purposes. Starting at the other end of the spectrum, we could ask what is the minimum form of convergence that would be helpful for a test statistic to satisfy in the event that the finite sample distribution is analytically intractable or difficult to use. Even when we know the finite sample distribution, the limit distribution may be easier to use, an example being the normal approximation to the binomial distribution, which works well even for moderate n when p and q are reasonably equal.

Convergence in distribution turns out to be the weakest form of convergence that is sensible and useful. Our central interest is in a sequence of random variables, for example, a test statistic indexed by the sample size. A concept related to convergence (whether, for example, in distribution or in probability) is the order of convergence, which, loosely speaking, is a measure of how quickly sequences converge (if they do); again, this idea can be related to nonstochastic or stochastic sequences, and both are outlined in Section 4.

Finally, the convergence of a stochastic process as a whole, viewed not as the n-th term in a sequence but as the complete trajectory, is considered in Section 4. The corresponding sequence of partial sums does not necessarily have the same convergence property as its component elements.

Consider some of the previous examples, with an S to indicate the sum. The reader may recognise this as a harmonic series; the proof of its divergence has an extensive history not pursued here. The interested reader is referred to Kifowit and Stamps, who examine 20 proofs of its divergence! In the case of random sequences, or sequences of random functions, then, as noted in the introduction to this section, this raises some interesting questions about what convergence could be taken to mean when there is a possibly infinite number of outcomes.
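The divergence of the harmonic series is easy to see numerically, even though its terms $1/n$ tend to zero: the partial sums $S_n$ grow without bound, roughly like $\ln n + 0.5772\ldots$ (Euler's constant). A short check:

```python
import math


def harmonic(n):
    """Partial sum S_n = 1 + 1/2 + ... + 1/n of the harmonic series."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


for n in (10, 1_000, 1_000_000):
    # The terms 1/n tend to zero, yet S_n keeps growing, roughly like ln(n) + 0.5772...
    print(n, round(harmonic(n), 4), round(harmonic(n) - math.log(n), 4))
```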

Four concepts of convergence are outlined in this section, together with related results. The weak convergence condition is qualified by adding that it holds for each X that is a continuity point of F; $F(X)$ is referred to as the limiting distribution of the sequence.

Points of discontinuity in the limit distribution function are excepted so that, for example, discrete limiting distributions are permitted, see McCabe and Tremayne, chapter 3. In the context of convergence in distribution, the probability spaces of $x_n$ and x need not be the same, unlike the concepts of convergence in probability and convergence almost surely that are considered below, as the following example shows.

The probability space is the triple $(\Omega_n, \mathcal{F}_n, P_n)$. The probability measure is as in Equations 3. Thus, the Poisson distribution is the limiting distribution of a sequence of binomially distributed random variables; see Billingsley, p. For an elaboration, see Billingsley. An analogous result holds for convergence in probability, see Section 4. In this example, convergence in distribution (that is, of the cdfs) is accompanied by convergence of the corresponding probability mass functions; in general, however, convergence of the cdfs does not imply that the corresponding densities converge.
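The binomial-to-Poisson limit can be checked numerically. The sketch below holds $\lambda = np$ fixed at a hypothetical value of 2 and reports the largest absolute difference between the Binomial(n, λ/n) and Poisson(λ) probability mass functions as n grows; the function names are illustrative.

```python
import math


def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_pmf_gap(n, lam, kmax=20):
    """Largest |Binomial(n, lam/n) - Poisson(lam)| pmf difference over k = 0, ..., min(n, kmax)."""
    p = lam / n
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(min(n, kmax) + 1))


lam = 2.0   # hypothetical mean, held fixed as n grows
gaps = {n: max_pmf_gap(n, lam) for n in (10, 100, 1000)}
print(gaps)   # the gap shrinks roughly in proportion to 1/n
```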

Case 1 is shown in Figure 4. Because the $x_j$ do not have to be defined on the same probability space, convergence in distribution does not, in general, imply convergence in probability (see Billingsley). This case is the exception, in that convergence in distribution to a constant (that is, to a degenerate distribution) implies convergence in probability. This is a very useful theorem, as it is often relatively easy to determine the plim of $x_n$ by direct means, but less easy to obtain the plim of $g(x_n)$ in the same manner.

A related theorem is the analogue of the CMT for convergence in probability. The weak law of large numbers provides a justification for using the average as an estimator of the population mean. Sure convergence is probably closest to an intuitive understanding of what is meant by convergence of a random variable. Sometimes statements are qualified as holding almost surely or, simply, a.s. See Stout for a detailed analysis of the concept of almost sure convergence. An example will illustrate this point and the idea of the concept.
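The weak law of large numbers can be illustrated by Monte Carlo: for the mean of n independent U(0, 1) draws, the estimated probability of a deviation from 0.5 larger than a fixed ε falls towards zero as n increases. The values ε = 0.05 and 500 replications are arbitrary choices for this sketch.

```python
import random


def prob_far(n, eps, reps, rng):
    """Monte Carlo estimate of P(|mean of n U(0,1) draws - 0.5| > eps)."""
    far = 0
    for _ in range(reps):
        mean = sum(rng.random() for _ in range(n)) / n
        far += abs(mean - 0.5) > eps
    return far / reps


rng = random.Random(7)
probs = [prob_far(n, 0.05, 500, rng) for n in (10, 100, 1000)]
print(probs)   # falls towards 0 as n increases
```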

Almost sure convergence is stronger than convergence in distribution and convergence in probability, and implies both of these forms of convergence. We note this case briefly. This idea is extended in convergence in r-th mean, defined for $x_n$ as follows. Otherwise the principle is the same, but the metric is different for different values of r. The relationship between the convergence concepts is summarised in Table 4.

The order of convergence is also of interest. For example, consider a test statistic that is the ratio of two stochastic sequences. The asymptotic behaviour of the ratio (for example, does it converge or diverge?) depends on the order of convergence of the numerator and denominator components.

We start with the order of nonstochastic sequences, as this is simply extended to stochastic sequences. Following mathematical convention, it is usual to use n, rather than t or T, as an index for the last term in a sequence. This convention is followed in this section, apart from where time series are being directly addressed. Note that when the function under examination comprises elements that are polynomials in n, all that matters is the dominant power of n. In such cases, sums of the $t^j$ appear in the subsequent analysis, and their order of magnitude is of interest. The most common cases are sums of linear and quadratic trends, respectively.

For a more comprehensive list and analysis, see Banerjee et al. In words, the probability that the absolute value of the n-th term in a sequence that is bounded in probability exceeds the finite real number c is less than ε; notice that c may depend on ε. To illustrate what the scaling is doing, consider Figures 4. The densities shown in the figure are estimated from 1, simulations of $S_n$ for the two values of n.
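To illustrate what scaling does, the sketch below simulates $S_n = x_1 + \dots + x_n$ with iid $x_i = \pm 1$ (a hypothetical choice of $x_i$, not necessarily the one behind the figures). The spread of $S_n$ grows like $\sqrt{n}$, so $S_n$ itself is not bounded in probability, whereas $S_n/\sqrt{n}$ has a spread that stays close to 1, consistent with $S_n = O_p(\sqrt{n})$.

```python
import random
import statistics


def partial_sum(n, rng):
    """S_n = x_1 + ... + x_n with iid x_i = +1 or -1, each with probability 1/2."""
    return sum(1 if rng.random() < 0.5 else -1 for _ in range(n))


rng = random.Random(3)
spread = {}
for n in (100, 2500):
    draws = [partial_sum(n, rng) for _ in range(400)]
    # Raw S_n spreads out like sqrt(n); S_n / sqrt(n) keeps a spread near 1.
    spread[n] = (statistics.pstdev(draws), statistics.pstdev([d / n ** 0.5 for d in draws]))
    print(n, spread[n])
```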

These relationships also hold for the stochastic orders $O_p$ and $o_p$. Source: Mittelhammer, lemma 5. The question then arises as to what is meant by the convergence of one stochastic process to another. This is a different concept from convergence of a sequence of random variables, as considered in previous sections.

In contrast, when the sequence of random variables is a stochastic process, so that the elements are indexed by t representing time, the focus is on the sample paths and the function space generated by such stochastic processes. The generation of these sample paths depends not just on the distribution of each random variable in the sequence, but also on their joint distributions. What can we say about the many (possibly an infinite number of) sample paths that could be generated in this way? It is clear that we would need to know not only about the distribution of each of the component random variables $y_t$, but also how they are related.

These are the finite-dimensional distributions, or fidis, of the stochastic process. For example, we could specify a Gaussian stochastic process comprising independent $N(0, 1)$ random variables. Whilst establishing the convergence of one stochastic process to another is more than just the convergence of the respective fidis, that is a good place to start. This is not sufficient by itself to enable us to say that one stochastic process converges to another; the additional condition is uniform tightness, a condition that is also required for a sequence of cdfs to have a cdf as its limit.

A statement of this condition is given in Davidson.

Indeed, it is hard to make sense of the properties of estimators without knowledge of, for example, convergence in distribution (weak convergence) and convergence in probability (plim). In essence this form of convergence relates to the limiting behaviour of the n-th term in a sequence of random variables. A prototypical case would be a sequence of estimators of the same population quantity, for example a regression parameter or a test statistic, where the terms in the sequence are indexed by the sample size.

Interest then centres on the limiting behaviour of the estimator as the sample size increases without limit. However, given two estimators of the same quantity, we would prefer the one that is quicker in approaching the limit. In this context, it is of interest to know the order of convergence of the two estimators. Once the groundwork of convergence of a sequence is achieved, it is possible to move on to the idea of the convergence of a stochastic process.

The technical detail of this form of convergence is beyond the scope of this book, but much of the intuition derives from weak convergence and the convergence of the finite-dimensional distributions. For further reading, see Mittelhammer for an excellent treatment of the concepts of convergence and, for their application in a time series context, see Brockwell and Davis. Classic texts on stochastic processes include Doob, and Gihman and Skorohod.

In the first section, the emphasis is on the probability background of the random walk.

It introduces the classic two-dimensional walk, primarily through the fiction of a gambler, which can be illustrated graphically. Some economic examples are given that confirm the likely importance of the random walk as an appropriate model for some economic processes. The random walk is a natural introduction to Brownian motion, which is the subject of the next chapter, and is an example of a stochastic difference equation, in which the steps in the walk are driven by a random input. By making the steps in the walk smaller and smaller, the random walk can be viewed in the limit as occurring in continuous time and the stochastic difference equation becomes a stochastic differential equation.

Equally, one might consider the random process as occurring in continuous time, with the discrete-time version, that is the random walk, being what is observed. The basic idea of a random walk is introduced in Section 5. Variations on the random walk theme are considered in Section 5. Some intuition about the nature of random walks is provided by looking at the variance of a partial sum process in Section 5. The random walk is an important process in its own right and has been the subject of extension and study.

It is an example of a process that is both a martingale and a Markov process, see Chapter 3, Sections 3. The walker's steps at each t are independent; that is, the direction of the step at t is not affected by any step taken previously. The reader may recognise this as a particularly simple Markov chain, the theory of which offers a very effective means of analysing random walks, see, for example, Ross. In this random walk, not all points are possible, since the walker must, at each stage, take a step to the left (north) or the right (south).

This suggests that an interesting variation would be to allow the walker to continue in a straight line, perhaps with a relatively small probability, and we consider this in Section 5. The possible paths are shown in Figure 5. Peter's overall tally is kept by a banker who allows Peter credit should he find, either at the first throw or subsequently, that he is losing on the overall tally.

In a variation of the game, Peter starts with a capital sum, but this development is not required here. We assume that there are T individual games played sequentially, where the precise nature of T is yet to be determined, but serves the purpose of indicating that time is an essential element of this game. For simplicity, the games are assumed to be played at the rate of one per period t, so t increments in units of 1 from 1 to T.

A question explores this case further. It is clear that $S_t$ is simply the partial sum of the $y_t$ up to and including t, and that the progression of $S_t$ is determined by a simple one-period recursion. Some examples of symmetric random walks are given in Figures 5. Students and professionals alike are often surprised by the time series patterns that can be generated by this process.

To illustrate some key features of the random walk, the results from four simulations of a symmetric binomial random walk are reported in Table 5. To check that the coin is indeed fair in each of the simulations, the table reports the proportion of positive and negative outcomes for each individual trial. The final row gives the length of the maximum sequence in which Peter is not losing, as a percentage of the total time; and the final column reports the average of 5, trials.
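A simulation in the spirit of this table can be sketched as follows. The walk length T = 5,000 and the seed are hypothetical, and the summary statistics (proportions of positive and negative steps, proportion of time with $S_t > 0$, and the longest stretch with $S_t \ge 0$ as a fraction of T) are this sketch's reading of the table's rows, not a reproduction of its numbers.

```python
import random


def simulate_walk(T, rng):
    """Symmetric binomial random walk: y_t = +1 or -1 with equal probability, S_t = S_{t-1} + y_t."""
    steps, path, s = [], [], 0
    for _ in range(T):
        y = 1 if rng.random() < 0.5 else -1
        s += y
        steps.append(y)
        path.append(s)
    return steps, path


def summarise(steps, path):
    T = len(steps)
    run = longest = 0
    for s in path:
        # Length of the current stretch in which Peter is not losing (S_t >= 0).
        run = run + 1 if s >= 0 else 0
        longest = max(longest, run)
    return {
        "prop_plus": steps.count(1) / T,
        "prop_minus": steps.count(-1) / T,
        "prop_time_positive": sum(1 for s in path if s > 0) / T,
        "longest_not_losing": longest / T,
    }


rng = random.Random(5)
walks = [simulate_walk(5000, rng) for _ in range(4)]   # hypothetical T = 5000
summaries = [summarise(*w) for w in walks]
for i, row in enumerate(summaries, start=1):
    print(i, {k: round(v, 3) for k, v in row.items()})
```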

It is reasonably evident from the first two rows that the coin is fair, but $S_t$ is predominantly positive in simulation 2 and predominantly negative in simulations 3 and 4. The last column (the averages) confirms that the walks were indeed constructed from outcomes that were equally likely, on average. As fascinating as this process is, its importance as far as economic time series are concerned may not yet be clear. However, it turns out to contain some very important insights into economic processes.

Next is the probability that Peter is always winning. A related problem of interest, of which the last is a special case, is what fraction of the overall time Peter spends on the positive side of the axis. The resulting expression gives a general result that we can interpret as follows. Since k varies across all possible values, these probabilities must sum to unity.
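For the symmetric simple random walk, a classical result (Feller, chapter III) is that the probability of spending 2k out of 2n time units on the positive side is $u_{2k}u_{2n-2k}$, where $u_{2m} = \binom{2m}{m}2^{-2m}$ is the probability of being at zero at time 2m. Whether this matches the exact form of the text's equation is an assumption; the sketch below evaluates it for a hypothetical 2n = 20 and confirms that the probabilities sum to unity and are largest at the extremes.

```python
import math


def u(m):
    """u_{2m} = P(S_{2m} = 0) = C(2m, m) / 2^(2m) for the symmetric simple random walk."""
    return math.comb(2 * m, m) / 4 ** m


def prob_positive_time(k, n):
    """P(the walk spends 2k of its 2n time units on the positive side) = u_{2k} * u_{2n-2k}.

    A unit of time counts as positive if the path segment lies above the axis (Feller's convention).
    """
    return u(k) * u(n - k)


n = 10   # 2n = 20 steps, a hypothetical length
probs = [prob_positive_time(k, n) for k in range(n + 1)]
print([round(p, 4) for p in probs])
print(math.fsum(probs))   # the probabilities sum to 1
```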

In the first case, we focus on the probability that Peter is never on the losing side. The corresponding graph shows how this probability varies as the number of trials increases. The probabilities for the successive cases exhaust the sample space and, therefore, sum to unity, see Figure 5.

The first two variations are concerned with imparting a direction to the random walk; the third with generalising the distribution and range of outcomes.

This change will impart a direction to the random walk given by the larger probability. These simulations are shown in Figures 5. To illustrate the effect of drift, four cases are shown in Figures 5. The impact of the drift becomes apparent quickly, imparting a clear direction to the random walk. This section considers some variations on the random walk theme.
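The effect of unequal probabilities can be sketched as follows: with $P(y_t = +1) = p$, $E(y_t) = 2p - 1$ and so $E(S_T) = T(2p - 1)$, which is the direction imparted to the walk. The values T = 200 and p = 0.55 are hypothetical.

```python
import random


def walk_endpoint(T, p, rng):
    """S_T for a binomial walk with P(y_t = +1) = p and P(y_t = -1) = 1 - p."""
    return sum(1 if rng.random() < p else -1 for _ in range(T))


rng = random.Random(11)
T, p = 200, 0.55   # hypothetical values
ends = [walk_endpoint(T, p, rng) for _ in range(500)]
mean_end = sum(ends) / len(ends)
# E(y_t) = 2p - 1, so E(S_T) = T(2p - 1), here 20.
print(round(mean_end, 2), round(T * (2 * p - 1), 6))
```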

That is, even in a simple random walk there are actually three options: to the left, to the right and straight on; hence, probabilities can be assigned to each of these outcomes. Now there are three probabilities, say p1 of a positive outcome, p2 of no change and p3 of a negative outcome. In order to ensure that there is no change in the variance of the process compared to the symmetric random walk, the positive and negative outcomes are symmetric, but are varied so that in each case the variance is standardised at unity. When the probability of no change is positive it tends to extend the sojourn times.

As might be anticipated, when the outcomes are drawn from a normal distribution, the random walk path becomes smoother than in the discrete cases so far considered. We may also note at this point a feature of the way that the graphs have been drawn. The covariances of the partial sums will also be of interest; we can infer the general pattern from some simple examples. A change of sign means that the path crosses the zero axis, which here marks the expected value of $S_t$; it is for this reason that a sign change is referred to as mean reversion.
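For partial sums of iid steps with variance $\sigma^2$, $Var(S_t) = t\sigma^2$ and $Cov(S_s, S_t) = \min(s, t)\,\sigma^2$, since $S_s$ and $S_t$ share exactly the first $\min(s, t)$ steps. The sketch below checks both results by simulation with N(0, 1) steps; the number of replications and the seed are arbitrary.

```python
import random


def covariance(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


rng = random.Random(5)
R, T = 20_000, 10
S5, S10 = [], []
for _ in range(R):
    s, path = 0.0, []
    for _ in range(T):
        s += rng.gauss(0.0, 1.0)   # N(0, 1) steps, so sigma^2 = 1
        path.append(s)
    S5.append(path[4])
    S10.append(path[9])

# Theory: Var(S_t) = t * sigma^2 and Cov(S_s, S_t) = min(s, t) * sigma^2.
print(round(covariance(S5, S5), 2), round(covariance(S10, S10), 2), round(covariance(S5, S10), 2))
```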

Thus, a frequent descriptive feature by which to judge broadly whether a path has been generated by a random walk is to look for mean reversion or, equivalently, sign changes, taking $E(S_t)$ as the reference point. The identification of sign changes is slightly different if the underlying random variable is continuous. The expected values of the number of sign changes are 3. Thus, it is incorrect to assume that the lead will change hands quite evenly. Let F T be the distribution function of T for T trials; see Feller, chapter III.
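Sign changes can be counted directly. In the sketch below, zeros of a ±1 walk are skipped and a change is recorded when $S_t$ has the opposite sign to its last non-zero value (one reasonable definition, assumed here). The simulation illustrates the classical result that the number of sign changes grows only in proportion to $\sqrt{T}$, so quadrupling T roughly doubles it; it does not reproduce the expected values quoted in the text.

```python
import random


def sign_changes(T, rng):
    """Count changes of sign of S_t in a symmetric +/-1 walk of length T.

    Zeros are skipped: a change is recorded when S_t has the opposite sign to the
    last non-zero value of the path.
    """
    s, last_sign, changes = 0, 0, 0
    for _ in range(T):
        s += 1 if rng.random() < 0.5 else -1
        if s != 0:
            sign = 1 if s > 0 else -1
            if last_sign != 0 and sign != last_sign:
                changes += 1
            last_sign = sign
    return changes


rng = random.Random(8)
mean_changes = {}
for T in (1000, 4000):
    counts = [sign_changes(T, rng) for _ in range(300)]
    mean_changes[T] = sum(counts) / len(counts)
print(mean_changes)   # quadrupling T roughly doubles the mean number of sign changes
```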

The limiting distribution F T is plotted in Figure 5.
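Assuming that the limiting distribution in question is the arc-sine law for the fraction of time spent on the positive side, $F(x) = (2/\pi)\arcsin\sqrt{x}$, the sketch below compares it with the exact finite-sample distribution built from $u_{2k}u_{2n-2k}$ for a hypothetical 2n = 200 steps. The agreement is already close at this length.

```python
import math


def u(m):
    """u_{2m} = C(2m, m) / 2^(2m), the probability that a symmetric walk is at 0 at time 2m."""
    return math.comb(2 * m, m) / 4 ** m


def discrete_cdf(x, n):
    """P(fraction of the 2n time units spent on the positive side <= x), using u_{2k} u_{2n-2k}."""
    return sum(u(k) * u(n - k) for k in range(n + 1) if k <= x * n)


def arcsine_cdf(x):
    """Limiting (arc-sine) distribution function F(x) = (2/pi) * arcsin(sqrt(x)), 0 <= x <= 1."""
    return 2.0 / math.pi * math.asin(math.sqrt(x))


n = 100   # 2n = 200 steps, a hypothetical length
for x in (0.05, 0.1, 0.5, 0.9):
    print(x, round(discrete_cdf(x, n), 4), round(arcsine_cdf(x), 4))
```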