
Handbook of Statistics 14
Statistical Methods in Finance
Edited by G. S. Maddala and C. R. Rao

ELSEVIER SCIENCE B.V.
Sara Burgerhartstraat 25
P.O. Box 211, 1000 AE Amsterdam, The Netherlands
ISBN: 0-444-81964-9
© 1996 Elsevier Science B.V. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam, The Netherlands.
Special regulations for readers in the U.S.A.: This publication has been registered with the Copyright Clearance Center Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including photocopying outside the U.S.A., should be referred to the publisher, unless otherwise specified.
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein.
This book is printed on acid-free paper.
Printed in The Netherlands.

Table of contents

Preface
Contributors

PART I. ASSET PRICING

Ch. 1. Econometric Evaluation of Asset Pricing Models
W. E. Ferson and R. Jagannathan
1. Introduction
2. Cross-sectional regression methods for testing beta pricing models
3. Asset pricing models and stochastic discount factors
4. The generalized method of moments
5. Model diagnostics
6. Conclusions
Appendix
References

Ch. 2. Instrumental Variables Estimation of Conditional Beta Pricing Models
C. R. Harvey and C. M. Kirby
1. Introduction
2. Single beta models
3. Models with multiple betas
4. Latent variables models
5. Generalized method of moments estimation
6. Closing remarks
References

Ch. 3. Semiparametric Methods for Asset Pricing Models
B. N. Lehmann
1. Introduction
2. Some relevant aspects of the generalized method of moments (GMM)
3. Asset pricing relations and their econometric implications
4. Efficiency gains within alternative beta pricing formulations
5. Concluding remarks
References

PART II. TERM STRUCTURES OF INTEREST RATES

Ch. 4. Modeling the Term Structure
A. R. Pagan, A. D. Hall, and V. Martin
1. Introduction
2. Characteristics of term structure data
3. Models of the term structure
4. Conclusion
References

PART III. VOLATILITY

Ch. 5. Stochastic Volatility
E. Ghysels, A. C. Harvey and E. Renault
1. Introduction
2. Volatility in financial markets
3. Discrete time models
4. Continuous time models
5. Statistical inference
6. Conclusions
References

Ch. 6. Stock Price Volatility
S. F. LeRoy
1. Introduction
2. Statistical issues
3. Dividend-smoothing and nonstationarity
4. Bubbles
5. Time-varying discount rates

6. Interpretation
7. Conclusion
References

Ch. 7. GARCH Models of Volatility
F. C. Palm
1. Introduction
2. GARCH models
3. Statistical inference
4. Statistical properties
5. Conclusions
References

PART IV. PREDICTION

Ch. 8. Forecast Evaluation and Combination
F. X. Diebold and J. A. Lopez
1. Evaluating a single forecast
2. Comparing the accuracy of multiple forecasts
3. Combining forecasts
4. Special topics in evaluating economic and financial forecasts
5. Concluding remarks
References

Ch. 9. Predictable Components in Stock Returns
G. Kaul
1. Introduction
2. Why predictability
3. Predictability of stock returns: The methodology
4. Power comparisons
5. Conclusion
References

Ch. 10. Interest Rate Spreads as Predictors of Business Cycles
K. Lahiri and J. G. Wang
1. Introduction
2. Hamilton's non-linear filter
3. Empirical results
4. Implications for the monetary transmission mechanism

5. Conclusion
Acknowledgement
References

PART V. ALTERNATIVE PROBABILISTIC MODELS

Ch. 11. Nonlinear Time Series, Complexity Theory, and Finance
W. A. Brock and P. J. F. de Lima
1. Introduction
2. Nonlinearity in stock returns
3. Long memory in stock returns
4. Asymmetric information structural models and stylized features of stock returns
5. Concluding remarks
References

Ch. 12. Count Data Models for Financial Data
A. C. Cameron and P. K. Trivedi
1. Introduction
2. Stochastic process models for count and duration data
3. Econometric models of counts
4. Concluding remarks
Acknowledgement
References

Ch. 13. Financial Applications of Stable Distributions
J. H. McCulloch
1. Introduction
2. Basic properties of stable distributions
3. Stable portfolio theory
4. Log-stable option pricing
5. Parameter estimation and empirical issues
Appendix
Acknowledgements
References

Ch. 14. Probability Distributions for Financial Models
J. B. McDonald
1. Introduction
2. Alternative models

3. Applications in finance
Appendix A: Special functions
Appendix B: Data
Acknowledgement
References

PART VI. APPLICATIONS OF SPECIALIZED STATISTICAL METHODS

Ch. 15. Bootstrap Based Tests in Financial Models
G. S. Maddala and H. Li
1. Introduction
2. A review of different bootstrap methods
3. Issues in the generation of bootstrap samples and the test statistics
4. A critique of the application of bootstrap methods in financial models
5. Bootstrap methods for model selection using trading rules
6. Bootstrap methods in long-horizon regressions
7. Impulse response analysis in nonlinear models
8. Conclusions
References

Ch. 16. Principal Components and Factor Analyses
C. R. Rao
1. Introduction
2. Principal components
3. Model based principal components
4. Factor analysis
5. Conclusions
References

Ch. 17. Errors-in-Variables Problems in Financial Models
G. S. Maddala and M. Nimalendran
1. Introduction
2. Grouping methods
3. Alternatives to the two-pass estimation method
4. Direct and reverse regression methods
5. Latent variables / structural equation models with measurement
6. Artificial neural networks (ANN) as alternatives to MIMIC models
7. Signal extraction methods and tests for rationality
8. Qualitative and limited dependent variable models
9. Factor analysis with measurement errors
10. Conclusion
References

Ch. 18. Financial Applications of Artificial Neural Networks
M. Qi
1. Introduction
2. Artificial Neural Networks
3. Relationship between ANN and interpretational statistical models
4. ANN implementation and interpretation
5. Financial applications
6. Conclusions
Acknowledgement
References

Ch. 19. Applications of Limited Dependent Variable Models in Finance
G. S. Maddala
1. Introduction
2. Studies on loan discrimination and default
3. Studies on bond ratings and bond yields
4. Event studies
5. Savings and loan and bank failures
6. Miscellaneous other applications
7. Suggestions for future research
References

PART VII. MISCELLANEOUS OTHER PROBLEMS

Ch. 20. Testing Option Pricing Models
D. S. Bates
1. Introduction
2. Option pricing fundamentals
3. Time series-based tests of option pricing models
4. Implicit parameter estimation
5. Implicit parameter tests of alternate distributional hypotheses
6. Summary and conclusions
References

Ch. 21. Peso Problems: Their Theoretical and Empirical Implications
M. D. D. Evans
1. Introduction
2. Peso problems and forecast errors
3. Peso problems, asset prices and fundamentals
4. Risk aversion and peso problems

5. Econometric issues
6. Conclusion
References

Ch. 22. Modeling Market Microstructure Time Series
J. Hasbrouck
1. Introduction
2. Simple univariate models of prices
3. Simple bivariate models of prices and trades
4. General specifications
5. Time
6. Discreteness
7. Nonlinearity
8. Multiple mechanisms and markets
9. Summary and directions for further work
References

Ch. 23. Statistical Methods in Tests of Portfolio Efficiency: A Synthesis
J. Shanken
1. Introduction
2. Testing efficiency with a riskless asset
3. Testing efficiency without a riskless asset
4. Related work
References

Subject Index
Contents of Previous Volumes

Preface

As with the earlier volumes in this series, the main purpose of this volume of the Handbook of Statistics is to serve as a source reference and teaching supplement to courses in empirical finance. Many graduate students and researchers in the finance area today use sophisticated statistical methods, but there is as yet no comprehensive reference volume on this subject. The present volume is intended to fill this gap.

The first part of the volume covers the area of asset pricing. In the first paper, Ferson and Jagannathan present a comprehensive survey of the literature on econometric evaluation of asset pricing models. The next paper, by Harvey and Kirby, discusses the problems of instrumental variable estimation in latent variable models of asset pricing. The next paper, by Lehmann, reviews semiparametric methods in asset pricing models. Chapter 23, by Shanken, also falls in the category of asset pricing.

Part II of the volume, on the term structure of interest rates, consists of only one paper, by Pagan, Hall and Martin. The paper surveys both the econometric and the finance literature in this area, and shows some similarities and divergences between the two approaches. The paper also documents several stylized facts in the data that prove useful in assessing the adequacy of the different models.

Part III of the volume deals with different aspects of volatility. The first paper, by Ghysels, Harvey and Renault, presents a comprehensive survey of the important topic of stochastic volatility models. These models have their roots both in mathematical finance and in financial econometrics, and are an attractive alternative to the popular ARCH models. The next paper, by LeRoy, presents a critical review of the literature on variance-bounds tests for market efficiency. The third paper, by Palm, on GARCH models of stock price volatility, surveys some more recent developments in this area. Several surveys of ARCH models have appeared in the literature, and these are cited in the paper; the paper surveys developments since the appearance of these surveys.

Part IV of the volume deals with prediction problems. The first paper, by Diebold and Lopez, deals with statistical methods for evaluating forecasts. The second paper, by Kaul, reviews the literature on the predictability of stock returns. This area has always fascinated those involved in making money in financial markets, as well as academics who presumably are interested in studying whether one can, in fact, make money in the financial markets. The third paper, by Lahiri, reviews statistical evidence on interest rate spreads as predictors of business cycles. Since there is not much of a literature to survey in this area, Lahiri presents some new results.

Part V of the volume deals with alternative probabilistic models in finance. The first paper, by Brock and de Lima, surveys several areas subsumed under the rubric "complexity theory." These include chaos theory, nonlinear time series models, long memory models and models with asymmetric information. The next paper, by Cameron and Trivedi, surveys the area of count data models in finance. In some financial studies, the dependent variable is a count, taking non-negative integer values. The next paper, by McCulloch, surveys the literature on stable distributions. This area was very active in finance in the early 1960s owing to the work of Mandelbrot, but it then received little attention until recently, when interest in stable distributions revived. The last paper, by McDonald, reviews the variety of probability distributions that have been and can be used in the statistical analysis of financial data.

Part VI deals with applications of specialized statistical methods in finance. This part covers important statistical methods that are of general applicability (to all the models considered in the previous sections) and are not covered adequately in the other chapters. The first paper, by Maddala and Li, covers the area of bootstrap methods. The second paper, by Rao, covers principal component and factor analyses, which have, during recent years, been widely used in financial research, particularly in arbitrage pricing theory (APT). The third paper, by Maddala and Nimalendran, reviews errors-in-variables models as applied to finance. Almost all variables in finance suffer from the errors-in-variables problem. The fourth paper, by Qi, surveys the applications of artificial neural networks in financial research. These are general nonparametric nonlinear models. The final paper, by Maddala, reviews the applications of limited dependent variable models in financial research.

Part VII of the volume contains surveys of miscellaneous other problems. The first paper, by Bates, surveys the literature on testing option pricing models. The next paper, by Evans, discusses what are known in the financial literature as "peso problems." The next paper, by Hasbrouck, covers market microstructure, which is an active area of research in finance; the paper discusses the time series work in this area. The final paper, by Shanken, gives a comprehensive survey of tests of portfolio efficiency.

One important area left out is the use of Bayesian methods in finance. In principle, all the problems discussed in the several chapters of the volume can be analyzed from the Bayesian point of view. Much of this work remains to be done.

Finally, we would like to thank Ms. Jo Ducey for her invaluable help at several stages in the preparation of this volume and for her patient assistance in seeing the manuscript through to publication.

G. S. Maddala
C. R. Rao

Contributors

D. S. Bates, Department of Finance, Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA (Ch. 20)
W. A. Brock, Department of Economics, University of Wisconsin, Madison, WI 53706, USA (Ch. 11)
A. C. Cameron, Department of Economics, University of California at Davis, Davis, CA 95616-8578, USA (Ch. 12)
P. J. F. de Lima, Department of Economics, The Johns Hopkins University, Baltimore, MD 21218, USA (Ch. 11)
F. X. Diebold, Department of Economics, University of Pennsylvania, Philadelphia, PA 19104, USA (Ch. 8)
M. D. D. Evans, Department of Economics, Georgetown University, Washington DC 20057-1045, USA (Ch. 21)
W. E. Ferson, Department of Finance, University of Washington, Seattle, WA 98195, USA (Ch. 1)
E. Ghysels, Department of Economics, The Pennsylvania State University, University Park, PA 16802 and CIRANO (Centre interuniversitaire de recherche en analyse des organisations), Universite de Montreal, Montreal, Quebec, Canada H3A2A5 (Ch. 5)
A. D. Hall, School of Business, Bond University, Gold Coast, QLD 4229, Australia (Ch. 4)
A. C. Harvey, Department of Statistics, London School of Economics, Houghton Street, London WC2A 2AE, UK (Ch. 5)
C. R. Harvey, Department of Finance, Fuqua School of Business, Box 90120, Duke University, Durham, NC 27708-0120, USA (Ch. 2)
J. Hasbrouck, Department of Finance, Stern School of Business, 44 West 4th Street, New York, NY 10012-1126, USA (Ch. 22)
R. Jagannathan, Finance Department, School of Business and Management, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong (Ch. 1)
G. Kaul, University of Michigan Business School, Ann Arbor, MI 48109-1234 (Ch. 9)
C. M. Kirby, Department of Finance, College of Business & Mgm., University of Maryland, College Park, MD 20742, USA (Ch. 2)
K. Lahiri, Department of Economics, State University of New York at Albany, Albany, NY 12222, USA (Ch. 10)

B. N. Lehmann, Graduate School of International Relations, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0519, USA (Ch. 3)
S. F. LeRoy, Department of Economics, University of California at Santa Barbara, Santa Barbara, CA 93106-9210 (Ch. 6)
H. Li, Department of Management Science, The Chinese University of Hong Kong, 302 Leung Kau Kui Building, Shatin, NT, Hong Kong (Ch. 15)
J. A. Lopez, Department of Economics, University of Pennsylvania, Philadelphia, PA 19104, USA (Ch. 8)
G. S. Maddala, Department of Economics, Ohio State University, 1945 N. High Street, Columbus, OH 43210-1172, USA (Chs. 15, 17, 19)
V. Martin, Department of Economics, University of Melbourne, Parkville, VIC 3052, Australia (Ch. 4)
J. H. McCulloch, Department of Economics and Finance, 410 Arps Hall, 1945 N. High Street, Columbus, OH 43210-1172, USA (Ch. 13)
J. B. McDonald, Department of Economics, Brigham Young University, Provo, UT 84602, USA (Ch. 14)
M. Nimalendran, Department of Finance, College of Business, University of Florida, Gainesville, FL 32611, USA (Ch. 17)
A. R. Pagan, Economics Program, RSSS, Australian National University, Canberra, ACT 0200, Australia (Ch. 4)
F. C. Palm, Department of Quantitative Economics, University of Limburg, P.O. Box 616, 6200 MD Maastricht, The Netherlands (Ch. 7)
M. Qi, Department of Economics, College of Business Administration, Kent State University, P.O. Box 5190, Kent, OH 44242 (Ch. 18)
C. R. Rao, The Pennsylvania State University, Center for Multivariate Analysis, Department of Statistics, 325 Classroom Bldg., University Park, PA 16802-6105, USA (Ch. 16)
E. Renault, Institut D'Economie Industrielle, Universite des Sciences Sociales, Place Anatole France, F-31042 Toulouse Cedex, France (Ch. 5)
J. Shanken, Department of Finance, Simon School of Business, University of Rochester, Rochester, NY 14627, USA (Ch. 23)
P. K. Trivedi, Department of Economics, Indiana University, Bloomington, IN 47405-6620, USA (Ch. 12)
J. G. Wang, AT&T, Rm. N460-WOS, 412 Mt. Kemble Avenue, Morristown, NJ 07960, USA (Ch. 10)

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved.

Econometric Evaluation of Asset Pricing Models

Wayne E. Ferson and Ravi Jagannathan*

* Ferson acknowledges financial support from the Pigott-PACCAR Professorship at the University of Washington. Jagannathan acknowledges financial support from the National Science Foundation, grant SBR-9409824. The views expressed herein are those of the authors and not necessarily those of the Federal Reserve Bank of Minneapolis or the Federal Reserve System.

We provide a brief review of the techniques that are based on the generalized method of moments (GMM) and used for evaluating capital asset pricing models. We first develop the CAPM and multi-beta models and discuss the classical two-stage regression method originally used to evaluate them. We then describe the pricing kernel representation of a generic asset pricing model; this representation facilitates use of the GMM in a natural way for evaluating the conditional and unconditional versions of most asset pricing models. We also discuss diagnostic methods that provide additional insights.

1. Introduction

A major part of the research effort in finance is directed toward understanding why we observe a variety of financial assets with different expected rates of return. For example, the U.S. stock market as a whole earned an average annual return of 11.94% during the period from January 1926 to the end of 1991. U.S. Treasury bills, in contrast, earned only 3.64%. The inflation rate during the same period was 3.11% (see Ibbotson Associates 1992). To appreciate the magnitude of these differences, note that in 1926 a nice dinner for two in New York would have cost about $10. If the same $10 had been invested in Treasury bills, by the end of 1991 it would have grown to $110, still enough for a nice dinner for two. Yet $10 invested in stocks would have grown to $6,756. The point is that the average return differentials among financial assets are both substantial and economically important.

A variety of asset pricing models have been proposed to explain this phenomenon. Asset pricing models describe how the price of a claim to a future payoff is determined in securities markets. Alternatively, we may view asset pricing models as describing the expected rates of return on financial assets, such as stocks, bonds, futures, options, and other securities. Differences among the various asset pricing models arise from differences in their assumptions that restrict investors' preferences, endowments, production, and information sets; the stochastic process governing the arrival of news in the financial markets; and the type of frictions allowed in the markets for real and financial assets.

While there are differences among asset pricing models, there are also important commonalities. All asset pricing models are based on one or more of three central concepts. The first is the law of one price, according to which the prices of any two claims that promise the same future payoff must be the same. The law of one price arises as an implication of the second concept, the no-arbitrage principle. The no-arbitrage principle states that market forces tend to align the prices of financial assets to eliminate arbitrage opportunities. Arbitrage opportunities arise when assets can be combined, by buying and selling, to form portfolios that have zero net cost, no chance of producing a loss, and a positive probability of gain. Arbitrage opportunities tend to be eliminated by trading in financial markets, because prices adjust as investors attempt to exploit them. For example, if there is an arbitrage opportunity because the price of security A is too low, then traders' efforts to purchase security A will tend to drive up its price. The law of one price follows from the no-arbitrage principle when it is possible to buy or sell two claims to the same future payoff: if the two claims do not have the same price, and if transaction costs are smaller than the difference between their prices, then an arbitrage opportunity is created. The arbitrage pricing theory (APT, Ross 1976) is one of the best-known asset pricing models based on arbitrage principles.
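The law-of-one-price argument above can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions of ours, not part of the chapter: each claim's payoff is listed state by state, the total transaction cost of the trade is a single hypothetical number `cost`, and the helper name `lop_arbitrage_profit` is hypothetical.

```python
from typing import Sequence


def lop_arbitrage_profit(payoff_a: Sequence[float], price_a: float,
                         payoff_b: Sequence[float], price_b: float,
                         cost: float) -> float:
    """Riskless profit when two claims to the same payoff trade at
    different prices (hypothetical helper, for illustration only).

    payoff_a, payoff_b: future payoff of each claim in every state.
    cost: assumed total transaction cost of the buy-low / sell-high trade.
    Returns 0.0 when the price gap does not exceed the cost.
    """
    if list(payoff_a) != list(payoff_b):
        raise ValueError("the claims must promise the same payoff in every state")
    # Buy the cheaper claim and sell the dearer one: the future payoffs
    # cancel state by state, so the position can never lose, and the
    # price gap (net of cost) is collected today.
    gap = abs(price_a - price_b)
    return max(gap - cost, 0.0)


# Identical payoffs of 105 or 95, priced at 98.0 and 99.5, with a cost of 0.5
print(lop_arbitrage_profit([105.0, 95.0], 98.0, [105.0, 95.0], 99.5, 0.5))  # 1.0
```

Investing the net proceeds in a riskless asset turns this trade into the zero-cost, no-loss position described in the text; once the cost exceeds the price gap, the function reports no arbitrage.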
The third central concept behind asset pricing models is financial market equilibrium. Investors' desired holdings of financial assets are derived from an optimization problem. A necessary condition for financial market equilibrium in a market with no frictions is that the first-order conditions of the investors' optimization problem be satisfied. This requires that investors be indifferent at the margin to small changes in their asset holdings. Equilibrium asset pricing models follow from the first-order conditions for the investors' portfolio choice problem and from a market-clearing condition. The market-clearing condition states that the aggregate of investors' desired asset holdings must equal the aggregate "market portfolio" of securities in supply.

The earliest of the equilibrium asset pricing models is the Sharpe-Lintner-Mossin-Black capital asset pricing model (CAPM), developed in the early 1960s. The CAPM states that expected asset returns are given by a linear function of the assets' betas, which are their regression coefficients on the market portfolio return. Merton (1973) extended the CAPM, which is a single-period model, to an economic environment where investors make consumption, savings, and investment decisions repetitively over time. Econometrically, Merton's model generalizes the CAPM from a model with a single beta to one with multiple betas. A multiple-beta model states that assets' expected returns are linear functions of a number of betas. The APT of Ross (1976) is another example of a multiple-beta asset pricing model, although in the APT the expected returns are only approximately a linear function of the relevant betas.

In this paper we emphasize, though not exclusively, the econometric evaluation of asset pricing models using the generalized method of moments (GMM, Hansen 1982). We focus on the GMM because, in our opinion, it is the most important innovation in empirical methods in finance within the past fifteen years. The approach is simple, flexible, valid under general statistical assumptions, and often powerful in financial applications. One reason the GMM is "general" is that many empirical methods used in finance and other areas can be viewed as special cases of the GMM.

The rest of this paper is organized as follows. In Section 2 we develop the CAPM and multiple-beta models and discuss the classical two-stage regression procedure that was originally used to evaluate these models. This material provides an introduction to the various statistical issues involved in the empirical study of the models; it also motivates the need for multivariate estimation methods. In Section 3 we describe an alternative representation of the asset pricing models which facilitates the use of the GMM. We show that most asset pricing models can be represented in this stochastic discount factor form. In Section 4 we describe the GMM procedure and illustrate how to use it to estimate and test conditional and unconditional versions of asset pricing models. In Section 5 we discuss model diagnostics that provide additional insight into the causes of statistical rejections and that help assess specification errors in the models. We conclude with a summary in Section 6. In order to avoid a proliferation of symbols, we sometimes use the same symbols to mean different things in different subsections; the definitions should be clear from the context.

2. Cross-sectional regression methods for testing beta pricing models

In this section we first derive the CAPM and generalize its empirical specification to include multiple-beta models. We then describe the intuitively appealing cross-sectional regression method that was first employed by Black, Jensen, and Scholes (1972, abbreviated here as BJS) and discuss its shortcomings.

2.1. The capital asset pricing model

The CAPM was the first equilibrium asset pricing model, and it remains one of the foundations of financial economics. The model was developed by Sharpe (1964), Lintner (1965), Mossin (1966), and Black (1972). There is a huge number of theoretical papers that refine the necessary assumptions and provide derivations of the CAPM. Here we provide a brief review of the theory.

Let $R_{it}$ denote one plus the return on asset $i$ during period $t$, $i = 1, 2, \ldots, N$. Let $R_{mt}$ denote the corresponding gross return for the market portfolio of all assets in the economy. The return on the market portfolio envisioned by the theory is not observable. In view of this, empirical studies of the CAPM commonly assume that the market return is an exact linear function of the return on an observable portfolio of common stocks.1 Then, according to the CAPM,

$$E(R_{it}) = \delta_0 + \delta_1 \beta_i, \tag{2.1}$$

where $\beta_i = \mathrm{Cov}(R_{it}, R_{mt}) / \mathrm{Var}(R_{mt})$.

According to the CAPM, the market portfolio with return $R_{mt}$ is on the minimum-variance frontier of returns. A return is said to be on the minimum-variance frontier if there is no other portfolio with the same expected return but lower variance. If investors are risk averse, the CAPM implies that $R_{mt}$ is on the positively sloped portion of the minimum-variance frontier, which implies that the coefficient $\delta_1 > 0$. In equation (2.1), $\delta_0 = E(R_{0t})$, where the return $R_{0t}$ is referred to as a zero-beta asset to $R_{mt}$ because of the condition $\mathrm{Cov}(R_{0t}, R_{mt}) = 0$.

To derive the CAPM, assume that investors choose asset holdings at each date $t-1$ so as to maximize the following one-period objective function:

$$V\left[E(R_{pt} \mid I),\ \mathrm{Var}(R_{pt} \mid I)\right], \tag{2.2}$$

where $R_{pt}$ denotes the date-$t$ return on the optimally chosen portfolio, and $E(\cdot \mid I)$ and $\mathrm{Var}(\cdot \mid I)$ denote the expectation and variance of return, conditional on the information set $I$ of the investor as of time $t-1$. We assume that the function $V[\cdot, \cdot]$ is increasing and concave in its first argument, decreasing in its second argument, and time-invariant. For the moment we assume that the information set $I$ includes only the unconditional moments of asset returns, and we drop the symbol $I$ to simplify the notation. The first-order conditions for the optimization problem given above can be manipulated to show that the following must hold:

$$E(R_{it}) = E(R_{0t}) + \beta_{ip}\, E(R_{pt} - R_{0t}) \tag{2.3}$$

for every asset $i = 1, 2, \ldots, N$, where $R_{pt}$ is the return on the optimally chosen portfolio, $R_{0t}$ is the return on the asset that has zero covariance with $R_{pt}$, and $\beta_{ip} = \mathrm{Cov}(R_{it}, R_{pt}) / \mathrm{Var}(R_{pt})$.
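In sample, the beta defined after (2.1) is the same number as the OLS slope of $R_{it}$ on a constant and $R_{mt}$ (the first-step estimate $b_i$ the chapter uses later). The NumPy sketch below checks this on simulated data; the return-generating parameters (a true beta of 1.3, the monthly means and volatilities, T = 240) are hypothetical choices for illustration only.

```python
import numpy as np


def sample_beta(r_asset: np.ndarray, r_market: np.ndarray) -> float:
    """Sample analogue of beta_i = Cov(R_it, R_mt) / Var(R_mt)."""
    # Same degrees-of-freedom correction in numerator and denominator,
    # so the ratio equals the OLS slope exactly.
    cov = np.cov(r_asset, r_market, ddof=1)[0, 1]
    return float(cov / np.var(r_market, ddof=1))


rng = np.random.default_rng(0)
T = 240                                          # hypothetical: 20 years of monthly data
r_m = 1.0 + rng.normal(0.008, 0.045, T)          # gross market returns R_mt
r_i = 1.0 + 0.002 + 1.3 * (r_m - 1.0) + rng.normal(0.0, 0.03, T)  # true beta 1.3

b_ratio = sample_beta(r_i, r_m)
# Time-series OLS of R_it on a constant and R_mt
X = np.column_stack([np.ones(T), r_m])
intercept, b_ols = np.linalg.lstsq(X, r_i, rcond=None)[0]
print(f"cov/var: {b_ratio:.6f}  OLS slope: {b_ols:.6f}")
```

Working with gross returns ($R_{it}$ is one plus the return) does not change beta, since adding a constant shifts neither covariances nor variances.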
To get from the first-order condition for an investor's optimization problem, as stated in equation (2.3), to the CAPM, it is useful to understand some of the properties of the minimum-variance frontier, that is, the set of portfolio returns with the minimum variance, given their expected returns. It can be readily verified that the optimally chosen portfolio of the investor is on the minimum-variance frontier. One property of the minimum-variance frontier is that it is closed under portfolio formation. That is, portfolios of frontier portfolios are also on the frontier.

1 When this assumption fails, it introduces market proxy error. This source of error is studied by Roll (1977), Stambaugh (1982), Kandel (1984), Kandel and Stambaugh (1987), Shanken (1987), Hansen and Jagannathan (1994), and Jagannathan and Wang (1996), among others. We will ignore proxy error in our discussion.
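Equation (2.3) holds exactly for any portfolio on the minimum-variance frontier paired with its zero-beta portfolio, which is what allows an investor's optimal portfolio to be swapped for the market portfolio. The NumPy sketch below checks this numerically. The expected returns and covariance matrix are randomly generated, hypothetical inputs, and `frontier_portfolio` is our own helper built on the standard Lagrangian solution of the mean-variance problem, not code from the chapter.

```python
import numpy as np


def frontier_portfolio(mu, sigma, target):
    """Minimum-variance weights with E(R_p) = target and weights summing to 1.

    Lagrangian solution: w = Sigma^{-1}(lam * 1 + gam * mu), where (lam, gam)
    solve the two linear constraints. Returns (w, lam, gam).
    """
    ones = np.ones_like(mu)
    s_inv_1 = np.linalg.solve(sigma, ones)
    s_inv_mu = np.linalg.solve(sigma, mu)
    a, b, c = ones @ s_inv_mu, mu @ s_inv_mu, ones @ s_inv_1
    lam, gam = np.linalg.solve(np.array([[c, a], [a, b]]),
                               np.array([1.0, target]))
    return lam * s_inv_1 + gam * s_inv_mu, lam, gam


rng = np.random.default_rng(1)
n = 6
mu = 1.0 + rng.uniform(0.002, 0.015, n)       # hypothetical expected gross returns
factor = rng.normal(0.0, 0.03, (n, n))
sigma = factor @ factor.T + 0.001 * np.eye(n)  # positive definite covariance

ones = np.ones(n)
gmv_mean = (ones @ np.linalg.solve(sigma, mu)) / (ones @ np.linalg.solve(sigma, ones))
# Target above the global minimum-variance mean: positively sloped part
w_p, lam, gam = frontier_portfolio(mu, sigma, gmv_mean + 0.005)

beta_ip = (sigma @ w_p) / (w_p @ sigma @ w_p)  # Cov(R_i, R_p) / Var(R_p)
mu_p = w_p @ mu
# Sigma @ w_p = lam + gam * mu_i, so zero covariance with p means mean -lam/gam
mu_0 = -lam / gam
w_0, _, _ = frontier_portfolio(mu, sigma, mu_0)  # the zero-beta frontier portfolio

print("Cov(R_0, R_p):", w_0 @ sigma @ w_p)
print("max |(2.3) error|:", np.max(np.abs(mu_0 + beta_ip * (mu_p - mu_0) - mu)))
```

Both printed quantities should be zero up to floating-point error. Choosing a target mean above that of the global minimum-variance portfolio keeps `gam` away from zero, so the zero-beta portfolio exists.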

Suppose that all investors have the same beliefs. Then every investor's optimally chosen portfolio will be on the same frontier, and hence the market portfolio of all assets in the economy, which is a portfolio of every investor's optimally chosen portfolio, will also be on the frontier. It can be shown (Roll 1977) that equation (2.3) will hold if $R_{pt}$ is replaced by the return of any portfolio on the frontier and $R_{0t}$ is replaced by its corresponding zero-beta return. Hence we can replace an investor's optimal portfolio in equation (2.3) with the return on the market portfolio to get the CAPM, as given by equation (2.1).

2.2. Testable implications of the CAPM

Given an interesting collection of assets, and if their expected returns and market-portfolio betas $\beta_i$ are known, a natural way to examine the CAPM would be to estimate the empirical relation between the expected returns and the betas and see if that relation is linear. However, neither betas nor expected returns are observed by the econometrician. Both must be estimated. The finance literature first attacked this problem by using a two-step, time-series, cross-sectional approach. Consider the following sample analogue of the population relation given in (2.1):

$$\bar{R}_i = \delta_0 + \delta_1 b_i + e_i, \quad i = 1, \ldots, N, \tag{2.4}$$

which is a cross-sectional regression of $\bar{R}_i$ on $b_i$, with regression coefficients equal to $\delta_0$ and $\delta_1$. In equation (2.4), $\bar{R}_i$ denotes the sample average return of asset $i$, and $b_i$ is the (OLS) slope coefficient estimate from a regression of the return $R_{it}$, over time, on the market index return $R_{mt}$ and a constant. Let $u_i = \bar{R}_i - E(R_{it})$ and $v_i = \beta_i - b_i$. Substituting $E(R_{it}) = \bar{R}_i - u_i$ and $\beta_i = b_i + v_i$ into (2.1) gives $\bar{R}_i = \delta_0 + \delta_1 b_i + (u_i + \delta_1 v_i)$, which is (2.4) with the composite error $e_i = u_i + \delta_1 v_i$. This gives rise to a classic errors-in-variables problem, as the regressor $b_i$ in the cross-sectional regression model (2.4) is measured with error.
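The errors-in-variables problem in (2.4) can be illustrated with a small Monte Carlo sketch in NumPy. The data-generating process is our own hypothetical choice: a one-factor market model with normal shocks, alphas set so that (2.1) holds exactly with $\delta_0 = 1.003$ and $\delta_1 = 0.01$, true betas drawn uniformly between 0.5 and 1.5, and illustrative volatilities. With a short first-pass window the average two-pass estimate of $\delta_1$ is pulled strongly toward zero; with a long window most of the bias disappears, consistent with the point that the errors-in-variables bias vanishes as T grows.

```python
import numpy as np


def two_pass_delta1(returns: np.ndarray, r_m: np.ndarray) -> float:
    """Two-pass estimate of delta_1 in (2.4).

    returns: T x N matrix of gross returns R_it; r_m: length-T market returns.
    Pass 1: b_i = OLS slope of R_it on a constant and R_mt (one per asset).
    Pass 2: OLS slope of the average returns on a constant and b_i.
    """
    rm_c = r_m - r_m.mean()
    b = rm_c @ (returns - returns.mean(axis=0)) / (rm_c @ rm_c)
    r_bar = returns.mean(axis=0)
    b_c = b - b.mean()
    return float(b_c @ (r_bar - r_bar.mean()) / (b_c @ b_c))


def mean_two_pass_delta1(T, n_assets=300, n_reps=100, delta0=1.003,
                         delta1=0.01, sigma_m=0.045, sigma_e=0.10, seed=0):
    """Average two-pass estimate over simulated samples of length T."""
    rng = np.random.default_rng(seed)
    beta = rng.uniform(0.5, 1.5, n_assets)        # true betas (hypothetical)
    mu_m = delta0 + delta1                        # the market has beta 1 in (2.1)
    alpha = delta0 + delta1 * beta - beta * mu_m  # so E(R_it) = delta0 + delta1*beta_i
    estimates = []
    for _ in range(n_reps):
        r_m = mu_m + sigma_m * rng.standard_normal(T)
        eps = sigma_e * rng.standard_normal((T, n_assets))
        returns = alpha + np.outer(r_m, beta) + eps
        estimates.append(two_pass_delta1(returns, r_m))
    return float(np.mean(estimates))


short_T = mean_two_pass_delta1(T=12)
long_T = mean_two_pass_delta1(T=600)
print(f"true delta1 = 0.0100, T = 12: {short_T:.4f}, T = 600: {long_T:.4f}")
```

Under these hypothetical settings the short-window average should be a small fraction of 0.01, roughly the attenuation factor $\mathrm{Var}(\beta_i) / [\mathrm{Var}(\beta_i) + \mathrm{Var}(v_i)]$; the exact printed values depend on the random seed.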
When $\beta_i$ is estimated from finite time-series samples, the regression (2.4) will deliver inconsistent estimates of $\delta_0$ and $\delta_1$, even with an infinite cross-sectional sample. However, the cross-sectional regression will provide consistent estimates of the coefficients as the time-series sample size $T$ becomes very large. ($T$ is the sample size used in the first step to estimate the beta coefficient $\beta_i$.) The reason is that the first-step estimate of $\beta_i$ is consistent, so as $T$ becomes large, the errors-in-variables problem of the second-stage regression vanishes. The measurement error in beta may be large for individual securities, but it is smaller for portfolios. In view of this fact, early research focused on creating portfolios of securities in such a way that the betas of the portfolios could be estimated precisely. Hence one solution to the errors-in-variables problem is to work with portfolios instead of individual securities. This creates another problem: arbitrarily chosen portfolios tend to exhibit little dispersion in their betas. If all the portfolios available to the econometrician have the same betas, then equation (2.1) has no empirical content as a cross-sectional relation. Black, Jensen, and Scholes (BJS, 1972) came up with an innovative solution to overcome this difficulty.

6 W. E. Ferson and R. Jagannathan

At every point in time for which a cross-sectional regression is run, they estimate betas on individual securities based on past history. They then sort the securities on the estimated values of beta and assign individual securities to beta groups. This results in portfolios with a substantial dispersion in their betas. Similar portfolio formation techniques have become standard practice in the empirical finance literature.

Suppose that we can create portfolios in such a way that the errors-in-variables problem can be viewed as being of second-order importance. We still have to determine how to assess whether there is empirical support for the CAPM. A standard approach in the literature is to consider specific alternative hypotheses about the variables which determine expected asset returns. According to the CAPM, the expected return for any asset is a linear function of its beta only. Therefore, one natural test is to examine whether any other cross-sectional variable has the ability to explain the deviations from equation (2.1). Fama and MacBeth (1973) followed this strategy. They incorporated the square of beta and measures of nonmarket (or residual time-series) variance as additional variables in the cross-sectional regressions. More recent empirical studies have used the relative size of firms, measured by the market value of their equity, the ratio of book-to-market equity, and related variables.² For example, the following model may be specified:

$$E(R_{it}) = \delta_0 + \delta_1 \beta_i + \delta_{\text{size}}\,\mathrm{LME}_i \tag{2.5}$$

where $\mathrm{LME}_i$ is the natural logarithm of the total market value of the equity capital of firm $i$. In what follows we will first show that these ideas extend easily to the general multiple-beta model. We will then develop a sampling theory for the cross-sectional regression estimators.

² Fama and French (1992) is a prominent recent example of this approach. Berk (1995) provides a justification for using relative market value and book-to-price ratios as measures of expected returns.

2.3. Multiple-beta pricing models and cross-sectional regression methods

According to the CAPM, the expected return on an asset is a linear function of its market beta. A multiple-beta model asserts that the expected return is a linear function of several betas, i.e.,

$$E(R_{it}) = \delta_0 + \sum_{k=1}^{K} \delta_k \beta_{ik} \tag{2.6}$$

where $\beta_{ik}$, $k = 1,\dots,K$, are the multiple regression coefficients of the return of asset $i$ on $K$ economy-wide pervasive risk factors, $f_k$, $k = 1,\dots,K$. The coefficient $\delta_0$ is the expected return on an asset that has $\beta_{ik} = 0$ for $k = 1,\dots,K$; i.e., it is the expected return on a zero- (multiple-) beta asset. The coefficient $\delta_k$, corresponding to the $k$th factor, has the following interpretation. It is the expected return differential, or premium, for a portfolio that has $\beta_{ik} = 1$ and $\beta_{ij} = 0$ for all $j \neq k$,
measured in excess of the zero-beta asset's expected return. In other words, it is the expected return premium per unit of beta risk for the risk factor $k$. Ross (1976) showed that an approximate version of (2.6) will hold in an arbitrage-free economy. Connor (1984) provided sufficient conditions for (2.6) to hold exactly in an economy with an infinite number of assets in general equilibrium. This version of the multiple-beta model, the exact APT, has received wide attention in the finance literature. When the econometrician observes the factors $f_k$, the cross-sectional regression method can be used to evaluate the multiple-beta model empirically.³ Consider, for example, the alternative hypothesis that the size of the firm is related to expected returns, given the factor betas. It may be examined by running cross-sectional regressions of returns on the $K$ factor betas and $\mathrm{LME}_i$, similar to equation (2.5), and testing whether the coefficient $\delta_{\text{size}}$ is different from zero.

³ See Chen (1983), Connor and Korajczyk (1986), Lehmann and Modest (1987), and McElroy and Burmeister (1988) for discussions of estimating and testing the model when the factor realizations are not observable, under some additional auxiliary assumptions.

2.4. Sampling distributions for coefficient estimators: The two-stage, cross-sectional regression method

In this section we follow Shanken (1992) and Jagannathan and Wang (1993, 1996) in deriving the asymptotic distribution of the coefficients that are estimated using the cross-sectional regression method. For the purposes of developing the sampling theory, we will work with the following generalization of equation (2.6):

$$E(R_{it}) = \sum_{k=0}^{K_1} \gamma_{1k} A_{ik} + \sum_{k=1}^{K_2} \gamma_{2k} \beta_{ik} \tag{2.7}$$

where $\{A_{ik}\}$ are observable characteristics of firm $i$, which are assumed to be measured without error. The first "characteristic," when $k = 0$, is the constant 1.0. One of the attributes may be the size variable $\mathrm{LME}_i$. The $\beta_{ik}$ are regression betas on a set of $K_2$ economic risk factors, which may include the market index return. Equation (2.7) can be written more compactly using matrix notation as

$$\mu = X\gamma \tag{2.8}$$

where $R_t = [R_{1t},\dots,R_{Nt}]'$, $\mu = E(R_t)$, and $X = [A : \beta]$. The definitions of the matrices $A$ and $\beta$ and of the vector $\gamma$ follow from (2.7). The cross-sectional method proceeds in two stages. First, $\beta$ is estimated by time-series regressions of $R_{it}$ on the risk factors and a constant. The estimates are denoted by $b$. Let $x = [A : b]$, and let $\bar{R}$ denote the time-series average of the return vector $R_t$. Let $g$ denote the estimator of the coefficient vector obtained from the following cross-sectional regression:

$$g = (x'x)^{-1}x'\bar{R} \tag{2.9}$$
where we assume that $x$ is of rank $1 + K_1 + K_2$. If $b$ and $\bar{R}$ converge in probability to $\beta$ and $E(R_t)$, respectively, then $g$ will converge in probability to $\gamma$. Black, Jensen, and Scholes (1972) suggest estimating the sampling errors associated with the estimator $g$ as follows. Regress $R_t$ on $x$ at each date $t$ to obtain $g_t$, where

$$g_t = (x'x)^{-1}x'R_t. \tag{2.10}$$

The BJS estimate of the covariance matrix of $T^{1/2}(g - \gamma)$ is given by

$$v = T^{-1}\sum_t (g_t - g)(g_t - g)' \tag{2.11}$$

which uses the fact that $g$ is the sample mean of the $g_t$'s. Substituting the expression for $g_t$ given in (2.10) into the expression for $v$ given in (2.11) gives

$$v = (x'x)^{-1}x'\Big[T^{-1}\sum_t (R_t - \bar{R})(R_t - \bar{R})'\Big]x(x'x)^{-1}. \tag{2.12}$$

To analyze the BJS covariance matrix estimator, we write the average return vector $\bar{R}$ as

$$\bar{R} = x\gamma + (\bar{R} - \mu) - (x - X)\gamma. \tag{2.13}$$

Substituting this expression for $\bar{R}$ into the expression for $g$ in (2.9) gives

$$g - \gamma = (x'x)^{-1}x'\big[(\bar{R} - \mu) - (b - \beta)\gamma_2\big]. \tag{2.14}$$

Assume that $b$ is a consistent estimate of $\beta$, that $T^{1/2}(\bar{R} - \mu) \to_d u$, and that $T^{1/2}(b - \beta) \to_d h$. Here $u$ and $h$ are random variables with well-defined distributions, and $\to_d$ indicates convergence in distribution. We then have

$$T^{1/2}(g - \gamma) \to_d (X'X)^{-1}X'u - (X'X)^{-1}X'h\gamma_2. \tag{2.15}$$

In (2.15) the first term on the right side is the component of the sampling error that arises from replacing $\mu$ by the sample average $\bar{R}$. The second term is the component of the sampling error that arises from replacing $\beta$ by its estimate $b$. The usual consistent estimate of the asymptotic variance of $u$ is given by

$$T^{-1}\sum_t (R_t - \bar{R})(R_t - \bar{R})'. \tag{2.16}$$

Therefore, a consistent estimate of the variance of the first term in (2.15) is given by

$$(x'x)^{-1}x'\Big[T^{-1}\sum_t (R_t - \bar{R})(R_t - \bar{R})'\Big]x(x'x)^{-1}$$

This is the same as the expression for the BJS estimate $v$ of the covariance matrix of the estimated coefficients, given in (2.12). Hence if we ignore the sampling error that arises from using estimated betas, then the BJS covariance estimator
provides a consistent estimate of the variance of the estimator $g$. However, if the sampling error associated with the betas is not small, then the BJS covariance estimator will be biased. It is not possible to determine the magnitude of the bias in general, but Shanken (1992) provides a method to assess the bias under additional assumptions.⁴ Consider the following univariate time-series regression for the return of asset $i$ on a constant and the $k$th economic factor:

$$R_{it} = \alpha_{ik} + \beta_{ik} f_{kt} + \epsilon_{ikt}. \tag{2.17}$$

We make the following additional assumptions about the error terms in (2.17):

(1) the error $\epsilon_{ikt}$ has mean zero, conditional on the time series of the economic factors $f_k$;
(2) the conditional covariance of $\epsilon_{ikt}$ and $\epsilon_{jlt}$, given the factors, is a fixed constant $\sigma_{ikjl}$, and we denote the matrix of the $\{\sigma_{ikjl}\}_{ij}$ by $\Sigma_{kl}$;
(3) the sample covariance matrix of the factors exists and converges in probability to a constant positive definite matrix $Q$, with typical element $Q_{kl}$.

Theorem 2.1. (Shanken, 1992; Jagannathan and Wang, 1996) $T^{1/2}(g - \gamma)$ converges in distribution to a normally distributed random variable with zero mean and covariance matrix $V + W$, where $V$ is the probability limit of the matrix $v$ given in (2.12) and

$$W = \sum_{l,k=1,\dots,K_2} (X'X)^{-1}X'\Big[\gamma_{2k}\gamma_{2l}\big(Q_{kk}^{-1}Q_{kl}Q_{ll}^{-1}\big)\Sigma_{kl}\Big]X(X'X)^{-1}. \tag{2.18}$$

Proof. See the appendix.

Theorem 2.1 shows how to obtain a consistent estimate of the covariance matrix of the BJS two-step estimator $g$. We first estimate $V$ by $v$, using the BJS method. We then estimate $W$ by its sample analogue. The cross-sectional regression method is intuitively very appealing. However, the above discussion shows that we need rather strong assumptions to assess the sampling errors associated with the parameter estimators.
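The two covariance estimators can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes a balanced $T \times N$ return panel, univariate first-pass betas as in (2.17), and no characteristics other than the constant ($K_1 = 0$). It plugs sample moments into (2.12) and into the expression for $W$ in (2.18). The function name and the simulated inputs are hypothetical.

```python
import numpy as np

def two_pass_with_covariances(R, F):
    """Two-pass CSR with the BJS covariance (2.12) and a sample analogue of W in (2.18).

    R: T x N returns; F: T x K2 factors. Betas are univariate, as in (2.17).
    Returns g and two covariance estimates for g (BJS only, and BJS + W), scaled by 1/T.
    """
    T, N = R.shape
    K2 = F.shape[1]
    Fd = F - F.mean(axis=0)
    Rd = R - R.mean(axis=0)
    Q = Fd.T @ Fd / T                           # sample factor covariance, K2 x K2
    b = (Rd.T @ Fd) / np.diag(Q) / T            # N x K2 univariate betas

    x = np.column_stack([np.ones(N), b])        # x = [A : b] with A = constant only
    proj = np.linalg.solve(x.T @ x, x.T)        # (x'x)^{-1} x'
    g = proj @ R.mean(axis=0)                   # (2.9)

    # BJS: period-by-period CSR coefficients (2.10) and their dispersion (2.11).
    g_t = R @ proj.T                            # T x (1 + K2)
    v = (g_t - g).T @ (g_t - g) / T

    # Sample analogue of W: residuals of each univariate regression (2.17).
    gamma2 = g[1:]
    eps = [Rd - np.outer(Fd[:, k], b[:, k]) for k in range(K2)]   # T x N each
    M = np.zeros((N, N))
    for k in range(K2):
        for l in range(K2):
            sigma_kl = eps[k].T @ eps[l] / T                       # N x N Sigma_kl
            M += gamma2[k] * gamma2[l] * Q[k, l] / (Q[k, k] * Q[l, l]) * sigma_kl
    W = proj @ M @ proj.T
    return g, v / T, (v + W) / T

# Hypothetical one-factor data for illustration.
rng = np.random.default_rng(1)
T, N = 240, 25
F = 0.04 * rng.standard_normal((T, 1))
beta_true = rng.uniform(0.5, 1.5, N)
R = (0.005 + 0.006 * beta_true + np.outer(F[:, 0], beta_true)
     + 0.05 * rng.standard_normal((T, N)))
g, cov_bjs, cov_corrected = two_pass_with_covariances(R, F)
print(g, np.sqrt(np.diag(cov_bjs)), np.sqrt(np.diag(cov_corrected)))
```

The extra term is positive semidefinite, so the corrected standard errors are never smaller than the BJS ones. How much they differ depends on how noisy the first-pass betas are.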
In addition, the econometrician must take a stand on a particular alternative hypothesis against which to reject the model. The general approach developed in Section 4 below has two advantages: it requires weaker statistical assumptions, and it can handle both unspecified and specific alternative hypotheses.

⁴ Shanken (1992) uses betas computed from multiple regressions. For simplicity of exposition, the derivation above uses betas computed from univariate regressions. The two sets of betas are related by an invertible linear transformation. Alternatively, the factors may be orthogonalized without loss of generality.
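The invertible linear transformation in footnote 4 can be made explicit. Multiple-regression betas are $B_M = \mathrm{Cov}(R, f)\,\Omega^{-1}$, and univariate betas are $B_U = \mathrm{Cov}(R, f)\,D^{-1}$, where $\Omega$ is the factor covariance matrix and $D$ is its diagonal. Hence $B_U = B_M\,\Omega D^{-1}$. The same identity holds exactly for OLS estimates computed on a common sample. The sketch below checks it on simulated data, which are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, K = 500, 6, 3
# Correlated factors, so that the two sets of betas genuinely differ.
mix = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.3, -0.4, 1.0]])
F = rng.standard_normal((T, K)) @ mix.T
R = F @ rng.uniform(-1, 2, (K, N)) + rng.standard_normal((T, N))

Fd, Rd = F - F.mean(axis=0), R - R.mean(axis=0)
omega = Fd.T @ Fd / T                          # sample factor covariance
cov_rf = Rd.T @ Fd / T                         # N x K covariances with the factors
B_multi = np.linalg.solve(omega, cov_rf.T).T   # multiple-regression betas
B_uni = cov_rf / np.diag(omega)                # univariate betas
transform = omega / np.diag(omega)             # Omega D^{-1}, an invertible K x K matrix
print(np.allclose(B_uni, B_multi @ transform))  # True
```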

3. Asset pricing models and stochastic discount factors

Virtually all financial asset pricing models imply that any gross asset return $R_{i,t+1}$, multiplied by some market-wide random variable $m_{t+1}$, has a constant conditional expectation:

$$E_t\{m_{t+1}R_{i,t+1}\} = 1, \quad \text{all } i. \tag{3.1}$$

The notation $E_t\{\cdot\}$ denotes the conditional expectation, given a market-wide information set. Sometimes it will be convenient to refer to expectations conditional on a subset $Z_t$ of the market information, which are denoted as $E(\cdot \mid Z_t)$. For example, $Z_t$ can represent a vector of instrumental variables for the public information set that are available to the econometrician. When $Z_t$ is the null information set, the unconditional expectation is denoted as $E(\cdot)$. Taking expected values of equation (3.1) shows that versions of the same equation must hold for the expectations $E(\cdot \mid Z_t)$ and $E(\cdot)$.

The random variable $m_{t+1}$ has various names in the literature. It is known as a stochastic discount factor, an equivalent martingale measure, a Radon-Nikodym derivative, or an intertemporal marginal rate of substitution. We will refer to an $m_{t+1}$ which satisfies (3.1) as a valid stochastic discount factor. The motivation for this term comes from the following observation. Write equation (3.1) as $P_{it} = E_t\{m_{t+1}X_{i,t+1}\}$, where $X_{i,t+1}$ is the payoff of asset $i$ at time $t+1$ (the market value plus any cash payments) and $R_{i,t+1} = X_{i,t+1}/P_{it}$. Equation (3.1) says that if we multiply a future payoff $X_{i,t+1}$ by the stochastic discount factor $m_{t+1}$ and take the expected value, we obtain the present value of the future payoff. The existence of an $m_{t+1}$ that satisfies (3.1) implies that all assets with the same payoffs have the same price (i.e., the law of one price). With the restriction that $m_{t+1}$ is a strictly positive random variable, equation (3.1) becomes equivalent to a no-arbitrage condition.
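In a finite-state setting the pricing relation can be checked directly. The sketch below assumes a three-state economy with hypothetical probabilities $\pi_s$ and hypothetical, strictly positive discount factor values $m_s$. It prices payoffs as $P = E\{mX\} = \sum_s \pi_s m_s X_s$ and confirms that the implied gross returns satisfy $E\{mR\} = 1$. It also confirms that a payoff has the same price as a portfolio replicating it.

```python
import numpy as np

# Hypothetical three-state economy (all numbers are illustrative assumptions).
prob = np.array([0.3, 0.5, 0.2])      # probabilities pi_s
m = np.array([1.10, 0.95, 0.80])      # strictly positive SDF values m_s
state_prices = prob * m               # q_s = pi_s * m_s, all positive

def price(payoff):
    """P = E{m X} = sum_s q_s X_s."""
    return float(state_prices @ payoff)

bond = np.ones(3)
stock = np.array([0.7, 1.1, 1.6])
for x in (bond, stock):
    gross_return = x / price(x)
    # Each gross return satisfies the unconditional version of (3.1): E{m R} = 1.
    assert np.isclose(np.sum(prob * m * gross_return), 1.0)

# Law of one price: a replicating portfolio costs the same as the payoff it replicates.
combo = 0.5 * bond + 2.0 * stock
assert np.isclose(price(combo), 0.5 * price(bond) + 2.0 * price(stock))
print(price(bond), price(stock))
```

Because every $m_s$ is strictly positive, every state price $q_s$ is positive. Positive state prices are exactly what rules out arbitrage in this setting.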
The no-arbitrage condition is that all portfolios of assets with payoffs that can never be negative, but are positive with positive probability, must have positive prices. The no-arbitrage condition does not uniquely identify $m_{t+1}$ unless markets are complete. Markets are complete when there are as many linearly independent payoffs available in the securities markets as there are states of nature at date $t+1$.

To gain additional insight into the stochastic discount factor and the no-arbitrage condition, assume for the moment that markets are complete. Given complete markets, positive state prices are required to rule out arbitrage opportunities.⁵

⁵ See Debreu (1959) and Arrow (1970) for models of complete markets. See Beja (1971), Rubinstein (1976), Ross (1977), Harrison and Kreps (1979), and Hansen and Richard (1987) for further theoretical discussions.

Let $q_{ts}$ denote the time $t$ price of a security that pays one unit at date $t+1$ if, and only if, the state of nature at $t+1$ is $s$. Then the time $t$ price of a
security that promises to pay $\{X_{i,s,t+1}\}$ units at date $t+1$, as a function of the state of nature $s$, is given by

$$\sum_s q_{ts}X_{i,s,t+1} = \sum_s \pi_{ts}\,(q_{ts}/\pi_{ts})\,X_{i,s,t+1}$$

where $\pi_{ts}$ is the probability, as assessed at time $t$, that state $s$ occurs at time $t+1$. Comparing this expression with equation (3.1) shows that $m_{s,t+1} = q_{ts}/\pi_{ts}$ is the value of the stochastic discount factor in state $s$, under the assumption that markets are complete. Since the probabilities are positive, the random variable defined by $\{m_{s,t+1}\}$ is strictly positive if and only if all state prices are positive.

Equation (3.1) is convenient for developing econometric tests of asset pricing models. Let $R_{t+1}$ denote the vector of gross returns on the $N$ assets for which the econometrician has observations. Then (3.1) can be written as

$$E\{R_{t+1}m_{t+1}\} - \underline{1} = \underline{0} \tag{3.2}$$

where $\underline{1}$ denotes the $N$ vector of ones and $\underline{0}$ denotes the $N$ vector of zeros. The set of $N$ equations given in (3.2) will form the basis for tests using the generalized method of moments. It is the specific form of $m_{t+1}$ implied by a model that gives the equation empirical content.

3.1. Stochastic discount factor representations of the CAPM and multiple-beta asset pricing models

Consider the CAPM, as given by equation (2.1):

$$E(R_{i,t+1}) = \delta_0 + \delta_1\beta_i, \qquad \beta_i = \mathrm{Cov}(R_{i,t+1}, R_{m,t+1})/\mathrm{Var}(R_{m,t+1}).$$

The CAPM can also be expressed in the form of equation (3.1), with a particular specification of the stochastic discount factor. To see this, expand the expected product in (3.1) into the product of the expectations plus the covariance, and then rearrange to obtain

$$E(R_{i,t+1}) = 1/E(m_{t+1}) + \mathrm{Cov}\big(R_{i,t+1},\, -m_{t+1}/E(m_{t+1})\big). \tag{3.3}$$

Equating terms in equations (2.1) and (3.3) shows that the CAPM of equation (2.1) is equivalent to a version of equation (3.1), where

$$E(R_{i,t+1}m_{t+1}) = 1, \qquad m_{t+1} = c_0 - c_1R_{m,t+1}, \qquad c_0 = \big[1 + E(R_{m,t+1})\,\delta_1/\mathrm{Var}(R_{m,t+1})\big]/\delta_0 \tag{3.4}$$
and $c_1 = \delta_1/[\delta_0\,\mathrm{Var}(R_{m,t+1})]$. Equation (3.4) was originally derived by Dybvig and Ingersoll (1982).

Now consider the multiple-beta model given in equation (2.6):

$$E(R_{i,t+1}) = \delta_0 + \sum_{k=1}^{K}\delta_k\beta_{ik}.$$

It can be readily verified by substitution that this model implies the following stochastic discount factor representation:

$$E(R_{i,t+1}m_{t+1}) = 1, \qquad m_{t+1} = c_0 + c_1f_{1,t+1} + \dots + c_Kf_{K,t+1}$$

with

$$c_0 = \Big[1 + \sum_k \delta_k E(f_k)/\mathrm{Var}(f_k)\Big]\Big/\delta_0 \quad\text{and}\quad c_j = -\delta_j/[\delta_0\,\mathrm{Var}(f_j)], \quad j = 1,\dots,K. \tag{3.5}$$

This form of the coefficients takes $\beta_{ik}$ to be the univariate betas $\mathrm{Cov}(R_i, f_k)/\mathrm{Var}(f_k)$. These coincide with the multiple regression betas when the factors are mutually uncorrelated.

The preceding results apply to the CAPM and multiple-beta models interpreted as statements about the unconditional expected returns of the assets. In some tests, these models are instead interpreted as statements about conditional expected returns, where the expectations are conditioned on predetermined, publicly available information. All of the analysis of this section can be interpreted as applying to conditional expectations, with the appropriate changes in notation. In this case, the parameters $c_0$, $c_1$, $\delta_0$, $\delta_1$, etc., will be functions of the time $t$ information set.

3.2. Other examples of stochastic discount factors

In equilibrium asset pricing models, equation (3.1) arises as a first-order condition for a consumer-investor's optimization problem. The agent maximizes a lifetime utility function of consumption (possibly including a bequest to heirs). Denote this function by $V(\cdot)$. If the allocation of resources to consumption and to investment assets is optimal, it is not possible to obtain higher utility by changing the allocation. Suppose that an investor considers reducing consumption at time $t$ to purchase more of (any) asset. The utility cost at time $t$ of the forgone consumption is the marginal utility of consumption expenditures $C_t$, denoted by $(\partial V/\partial C_t) > 0$, multiplied by the price $P_{it}$ of the asset, measured in the same units as the consumption expenditures. The expected utility gain of selling the share and consuming the proceeds at time $t+1$ is
$$E_t\{(P_{i,t+1} + D_{i,t+1})(\partial V/\partial C_{t+1})\}$$

where $D_{i,t+1}$ is the cash flow or dividend paid at time $t+1$. If the allocation maximizes expected utility, the following must hold:

$$P_{it}\,E_t\{(\partial V/\partial C_t)\} = E_t\{(P_{i,t+1} + D_{i,t+1})(\partial V/\partial C_{t+1})\}.$$

This intertemporal Euler equation is equivalent to equation (3.1), with

$$m_{t+1} = (\partial V/\partial C_{t+1})/E_t\{(\partial V/\partial C_t)\}. \tag{3.6}$$

The $m_{t+1}$ in equation (3.6) is the intertemporal marginal rate of substitution (IMRS) of the representative consumer. The rest of this section shows how many models in the asset pricing literature are special cases of (3.1), where $m_{t+1}$ is defined by equation (3.6).⁶

If a representative consumer's lifetime utility function $V(\cdot)$ is time-separable, the marginal utility of consumption at time $t$, $(\partial V/\partial C_t)$, depends only on variables dated at time $t$. Lucas (1978) and Breeden (1979) derived consumption-based asset pricing models of the following type, assuming that preferences are time-separable and additive:

$$V = \sum_t \beta^t u(C_t)$$

where $\beta$ is a time discount parameter and $u(\cdot)$ is increasing and concave in current consumption $C_t$. A convenient specification for $u(\cdot)$ is

$$u(C) = [C^{1-\alpha} - 1]/(1 - \alpha). \tag{3.7}$$

In equation (3.7), $\alpha > 0$ is the concavity parameter of the period utility function. This function displays constant relative risk aversion equal to $\alpha$.⁷ Based on these assumptions and using aggregate consumption data, a number of empirical studies test the consumption-based asset pricing model.⁸

Dunn and Singleton (1986) and Eichenbaum, Hansen, and Singleton (1988), among others, model consumption expenditures that may be durable in nature. Durability introduces nonseparability over time, since the flow of consumption services depends on the consumer's previous expenditures, and the utility is defined over the services. Current expenditures increase the consumer's future utility of services if the expenditures are durable. The consumer optimizes over the expenditures $C_t$; thus, durability implies that the marginal utility $(\partial V/\partial C_t)$ depends on variables dated other than date $t$.

⁶ Asset pricing models typically focus on the relation of security returns to aggregate quantities. It is therefore necessary to aggregate the Euler equations of individuals to obtain equilibrium expressions in terms of aggregate quantities. Theoretical conditions which justify the use of aggregate quantities are discussed by Gorman (1953), Wilson (1968), Rubinstein (1974), Constantinides (1982), Lewbel (1989), Luttmer (1993), and Constantinides and Duffie (1994).

⁷ Relative risk aversion in consumption is defined as $-Cu''(C)/u'(C)$. Absolute risk aversion is $-u''(C)/u'(C)$, where a prime denotes a derivative. Ferson (1983) studies a consumption-based asset pricing model with constant absolute risk aversion.

⁸ Substituting (3.7) into (3.6) shows that $m_{t+1} = \beta(C_{t+1}/C_t)^{-\alpha}$. Empirical studies of this model include Hansen and Singleton (1982, 1983), Ferson (1983), Brown and Gibbons (1985), Jagannathan (1985), Ferson and Merrick (1987), and Wheatley (1988).

Another form of time-nonseparability arises if the utility function exhibits habit persistence. Habit persistence means that consumption levels at two points in time are complements. For example, the utility of current consumption may be evaluated relative to what was consumed in the past. Such models are derived by Ryder and Heal (1973), Becker and Murphy (1988), Sundaresan (1989), Constantinides (1990), Detemple and Zapatero (1991), and Novales (1992), among others.

Ferson and Constantinides (1991) model both the durability of consumption expenditures and habit persistence in consumption services. They show that the two combine as opposing effects. In an example where the effect is truncated at a single lag, the derived utility of expenditures is

$$V = (1-\alpha)^{-1}\sum_t \beta^t (C_t + bC_{t-1})^{1-\alpha}. \tag{3.8}$$

The marginal utility at time $t$ is

$$(\partial V/\partial C_t) = \beta^t(C_t + bC_{t-1})^{-\alpha} + \beta^{t+1}b\,E_t\{(C_{t+1} + bC_t)^{-\alpha}\}. \tag{3.9}$$

The coefficient $b$ is positive and measures the rate of depreciation if the good is durable and there is no habit persistence. If habit persistence is present and the good is nondurable, the lagged expenditures enter with a negative effect ($b < 0$).

Ferson and Harvey (1992) and Heaton (1995) consider a form of time-nonseparability which emphasizes seasonality. The utility function is

$$(1-\alpha)^{-1}\sum_t \beta^t (C_t + bC_{t-4})^{1-\alpha}$$

where the consumption expenditure decisions are assumed to be quarterly. The subsistence level (in the case of habit persistence) or the flow of services (in the case of durability) is assumed to depend only on the consumption expenditure in the same quarter of the previous year.
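Two of the discount factors above can be written down and checked in a few lines. The sketch is illustrative. The moment inputs ($\delta_0$, $\delta_1$, and the market mean and variance) and the consumption path are made-up numbers, and the function names are hypothetical. It computes $c_0$ and $c_1$ from (3.4) and then uses the covariance expansion behind (3.3) to verify that $E(mR_i) = 1$ for any asset priced by the CAPM. Finally, it evaluates the power-utility IMRS of footnote 8, $m_{t+1} = \beta(C_{t+1}/C_t)^{-\alpha}$.

```python
import numpy as np

def capm_sdf_coefficients(delta0, delta1, mean_m, var_m):
    """c0 and c1 in (3.4), so that m = c0 - c1 * R_m."""
    c0 = (1.0 + mean_m * delta1 / var_m) / delta0
    c1 = delta1 / (delta0 * var_m)
    return c0, c1

def expected_m_times_r(c0, c1, mean_m, var_m, delta0, delta1, beta_i):
    """E(m R_i) = E(m)E(R_i) + Cov(m, R_i) for a CAPM-priced asset with beta beta_i."""
    mean_sdf = c0 - c1 * mean_m
    mean_r = delta0 + delta1 * beta_i          # CAPM, equation (2.1)
    cov_m_r = -c1 * beta_i * var_m             # Cov(R_i, R_m) = beta_i * Var(R_m)
    return mean_sdf * mean_r + cov_m_r

def power_utility_sdf(consumption, beta, alpha):
    """IMRS m_{t+1} = beta * (C_{t+1}/C_t)^(-alpha), from (3.6) and (3.7)."""
    c = np.asarray(consumption, dtype=float)
    return beta * (c[1:] / c[:-1]) ** (-alpha)

# Hypothetical gross-return moments: zero-beta return 1.003, market premium 0.006.
d0, d1, mu_m, var_m = 1.003, 0.006, 1.009, 0.045 ** 2
c0, c1 = capm_sdf_coefficients(d0, d1, mu_m, var_m)
for b_i in (0.0, 0.8, 1.5):
    print(b_i, expected_m_times_r(c0, c1, mu_m, var_m, d0, d1, b_i))  # 1.0 up to rounding

print(power_utility_sdf([100.0, 101.0, 100.5, 102.0], beta=0.99, alpha=2.0))
```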
Abel (1990) studies a form of habit persistence in which the consumer evaluates current consumption relative to the aggregate consumption in the previous period, which he or she takes as exogenous. The utility function is like equation (3.8), except that the "habit stock," $bC_{t-1}$, refers to aggregate consumption. The idea is that people care about "keeping up with the Joneses." Campbell and Cochrane (1995) also develop a model in which the consumer takes the habit stock as exogenous. This approach results in a simpler and more tractable model, since the consumer's optimization does not have to account for the effects of current decisions on the future habit stock.

Epstein and Zin (1989, 1991) consider a class of recursive preferences which can be written as $V_t = F(C_t, \mathrm{CEQ}_t(V_{t+1}))$. $\mathrm{CEQ}_t(\cdot)$ is a time $t$ "certainty equivalent" for the future lifetime utility $V_{t+1}$. The function $F(\cdot, \mathrm{CEQ}_t(\cdot))$ generalizes the usual expected utility function of lifetime consumption and may be time-nonseparable. Epstein and Zin (1989) study a special case of the recursive preference model in which the preferences are

$$V_t = \Big[(1-\beta)C_t^{\rho} + \beta\,E_t\big(V_{t+1}^{1-\alpha}\big)^{\rho/(1-\alpha)}\Big]^{1/\rho}. \tag{3.10}$$

They show that when $\rho \neq 0$ and $1 - \alpha \neq 0$, the IMRS for a representative agent becomes

$$\big[\beta(C_{t+1}/C_t)^{\rho-1}\big]^{(1-\alpha)/\rho}\,\{R_{m,t+1}\}^{(1-\alpha-\rho)/\rho} \tag{3.11}$$

where $R_{m,t+1}$ is the gross market portfolio return. The coefficient of relative risk aversion for timeless consumption gambles is $\alpha$, and the elasticity of substitution for deterministic consumption is $(1-\rho)^{-1}$. If $\alpha = 1 - \rho$, the model reduces to the time-separable, power utility model. If $\alpha = 1$, the log utility model of Rubinstein (1976) is obtained.

In summary, many asset pricing models are special cases of equation (3.1). Each model specifies that a particular function of the data and the model parameters is a valid stochastic discount factor. We now turn to the issue of estimating the models stated in this form.

4. The generalized method of moments

In this section we provide an overview of the generalized method of moments (GMM) and a brief review of the associated asymptotic test statistics. We then show how the GMM is used to estimate and test various specifications of asset pricing models.

4.1. An overview of the generalized method of moments in asset pricing models

Let $x_{t+1}$ be a vector of observable variables. Suppose a model specifies $m_{t+1} = m(\theta, x_{t+1})$. Estimation of the parameters $\theta$ and tests of the model can then proceed under weak assumptions, using the GMM as developed by Hansen (1982) and illustrated by Hansen and Singleton (1982) and Brown and Gibbons (1985). Define the following model error term:

$$u_{i,t+1} = m(\theta, x_{t+1})R_{i,t+1} - 1. \tag{4.1}$$

Equation (3.1) implies that $E_t\{u_{i,t+1}\} = 0$ for all $i$.
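The Epstein-Zin discount factor (3.11) and the model error (4.1) translate directly into code. The sketch below is a minimal illustration with hypothetical inputs and function names. It also checks numerically the special case stated above: setting $\alpha = 1 - \rho$ recovers the power-utility IMRS $\beta(C_{t+1}/C_t)^{-\alpha}$.

```python
import numpy as np

def epstein_zin_sdf(cons_growth, market_return, beta, alpha, rho):
    """IMRS in (3.11); stated for rho != 0 and 1 - alpha != 0.

    cons_growth: C_{t+1}/C_t; market_return: gross R_{m,t+1}.
    """
    if rho == 0 or alpha == 1:
        raise ValueError("(3.11) is stated for rho != 0 and 1 - alpha != 0")
    g = np.asarray(cons_growth, dtype=float)
    rm = np.asarray(market_return, dtype=float)
    return ((beta * g ** (rho - 1.0)) ** ((1.0 - alpha) / rho)
            * rm ** ((1.0 - alpha - rho) / rho))

def model_errors(sdf, returns):
    """u_{i,t+1} = m_{t+1} R_{i,t+1} - 1 in (4.1); returns is T x N, sdf has length T."""
    return sdf[:, None] * np.asarray(returns, dtype=float) - 1.0

growth = np.array([1.010, 0.995, 1.020])   # hypothetical consumption growth
r_mkt = np.array([1.05, 0.97, 1.12])       # hypothetical gross market returns
rho = -1.0                                 # elasticity of substitution 1/(1 - rho) = 0.5
m_ez = epstein_zin_sdf(growth, r_mkt, beta=0.98, alpha=1.0 - rho, rho=rho)
m_power = 0.98 * growth ** (-(1.0 - rho))
print(np.allclose(m_ez, m_power))          # True: alpha = 1 - rho gives power utility
print(model_errors(m_ez, np.column_stack([r_mkt, np.full(3, 1.01)])))
```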
Given a sample of $N$ assets and $T$ time periods, combine the error terms from (4.1) into a $T \times N$ matrix $u$, with typical row $u_{t+1}'$. By the law of iterated expectations, the model implies that $E(u_{i,t+1} \mid Z_t) = 0$ for all $i$ and $t$ (for any $Z_t$ in the information set at time $t$). Therefore $E(u_{t+1}Z_t) = 0$ for all $t$. The condition $E(u_{t+1}Z_t) = 0$ says that $u_{t+1}$ is orthogonal to $Z_t$, and it is therefore called an orthogonality condition. These orthogonality conditions are the basis of tests of asset pricing models using the GMM.

A few points deserve emphasis. First, GMM estimates and tests of asset pricing models are motivated by the implication that $E(u_{i,t+1} \mid Z_t) = 0$ for any $Z_t$ in the information set at time $t$. However, the estimation actually uses the weaker condition $E(u_{t+1}Z_t) = 0$ for a given set of instruments $Z_t$. Therefore, GMM tests of asset pricing models have not exploited all of the predictions of the theories. We believe that further refinements that exploit the implications of the theories more fully will be useful.

Empirical work on asset pricing models relies on rational expectations, interpreted as the assumption that the expectation terms in the model are mathematical conditional expectations. For example, the rational expectations assumption is used when the expected value in equation (3.1) is treated as a mathematical conditional expectation to obtain expressions for $E(\cdot \mid Z)$ and $E(\cdot)$. Rational expectations implies that the difference between observed realizations and the expectations in the model should be unrelated to the information that the expectations are conditioned on. Equation (3.1) says that the conditional expectation of the product of $m_{t+1}$ and $R_{i,t+1}$ is the constant 1.0. Therefore, the error term $m_{t+1}R_{i,t+1} - 1$ in equation (4.1) should not be predictably different from zero when we use any information available at time $t$. Suppose that instruments $Z_t$ can predict some of the variation over time in a return $R_{i,t+1}$. The model then implies that this predictability is removed when $R_{i,t+1}$ is multiplied by a valid stochastic discount factor $m_{t+1}$. This is the sense in which conditional asset pricing models are asked to "explain" predictable variation in asset returns. This idea generalizes the "random walk" model of stock values, which implies that stock returns should be completely unpredictable.
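A simulation illustrates how instruments detect predictability that a candidate discount factor fails to remove. The setup is hypothetical. A lagged variable $z_t$ forecasts the next gross return. The candidate discount factor is a constant, chosen so that the unconditional moment $E(mR) = 1$ holds exactly in sample. The sample orthogonality conditions $T^{-1}\sum_t u_{t+1}Z_t$ are computed with $Z_t = (1, z_t)'$. The data-generating process and all parameter values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 2000
z = rng.standard_normal(T + 1)                        # hypothetical predictor z_t
# Gross return R_{t+1} partly predictable from z_t (assumed DGP).
r_next = 1.01 + 0.02 * z[:-1] + 0.04 * rng.standard_normal(T)
instruments = np.column_stack([np.ones(T), z[:-1]])  # Z_t = (1, z_t)'

m_const = 1.0 / r_next.mean()                         # constant SDF, as under risk neutrality
u = m_const * r_next - 1.0                            # model error (4.1)
moments = instruments.T @ u / T                       # sample E(u_{t+1} Z_t)
std_err = (instruments * u[:, None]).std(axis=0, ddof=1) / np.sqrt(T)
print(moments, moments / std_err)
```

The constant discount factor prices the average return, so the moment on the constant instrument is zero by construction. It leaves the predictable variation in $R_{t+1}$ unexplained, however, and the moment on $z_t$ is clearly nonzero. A valid $m_{t+1}$ would have to vary with $z_t$ to remove it. A constant $m_{t+1}$ can work only if returns are unpredictable, as in the random walk model.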
The random walk model is a special case, which can be motivated by risk neutrality. Under risk neutrality the IMRS is a constant. In this case, equation (3.1) implies that the return $R_{i,t+1}$ should not differ predictably from a constant.

GMM estimation proceeds by defining an $N \times L$ matrix of sample mean orthogonality conditions, $G = (u'Z/T)$, and letting $g = \mathrm{vec}(G)$. Here $Z$ is a $T \times L$ matrix of observed instruments with typical row $Z_t'$, a subset of the information available at time $t$.⁹ The $\mathrm{vec}(\cdot)$ operator partitions $G$ into row vectors, each of length $L$: $(h_1, h_2, \dots, h_N)$. It then stacks the $h$'s into a vector $g$, whose length equals the number of orthogonality conditions, $NL$. Hansen's (1982) GMM estimates of $\theta$ are the parameter values that make $g$ close to zero by minimizing a quadratic form $g'Wg$, where $W$ is an $NL \times NL$ weighting matrix. Somewhat more generally, let $u_{t+1}(\theta)$ denote the random $N$ vector $R_{t+1}m(\theta, x_{t+1}) - \underline{1}$, and define $g_T(\theta) = T^{-1}\sum_t (u_t(\theta) \otimes Z_{t-1})$. Let $\theta_T$ denote the parameter values that minimize the quadratic form $g_T'A_Tg_T$, where $A_T$ is any positive definite $NL \times NL$ matrix that may depend on the sample. Let $J_T$ denote $T$ times the minimized value of the quadratic form $g_T'A_Tg_T$. Jagannathan and Wang (1993) show that $J_T$ has a weighted chi-square distribution, which can be used to test the hypothesis that (3.1) holds.

⁹ This section assumes that the same instruments are used for each of the asset equations. In general, each asset equation could use a different set of instruments, which complicates the notation.

Theorem 4.1. (Jagannathan and Wang, 1993) Suppose that the matrix $A_T$ converges in probability to a constant positive definite matrix $A$. Assume also that $\sqrt{T}g_T(\theta_0) \to_d N(0, S)$, where $N(\cdot,\cdot)$ denotes the multivariate normal distribution, $\theta_0$ are the true parameter values, and $S$ is a positive definite matrix. Let $D = E[\partial g_T/\partial\theta]\big|_{\theta=\theta_0}$ and let

$$Q = (S^{1/2})(A^{1/2})'\big[I - A^{1/2}D(D'AD)^{-1}D'(A^{1/2})'\big]A^{1/2}(S^{1/2})'$$

where $A^{1/2}$ and $S^{1/2}$ are the upper triangular matrices from the Cholesky decompositions $A = (A^{1/2})'A^{1/2}$ and $S = (S^{1/2})'S^{1/2}$. Then the matrix $Q$ has $NL - \dim(\theta)$ nonzero, positive eigenvalues. Denote these eigenvalues by $\lambda_i$, $i = 1, 2, \dots, NL - \dim(\theta)$. Then $J_T$ converges in distribution to

$$\lambda_1\chi_1 + \dots + \lambda_{NL-\dim(\theta)}\chi_{NL-\dim(\theta)}$$

where $\chi_i$, $i = 1, 2, \dots, NL - \dim(\theta)$, are independent random variables, each with a chi-square distribution with one degree of freedom.

Proof. See Jagannathan and Wang (1993).

Notice that when $A = W = S^{-1}$, the matrix $Q$ is idempotent of rank $NL - \dim(\theta)$. Hence the nonzero eigenvalues of $Q$ are unity. In this case, the asymptotic distribution reduces to a simple chi-square distribution with $NL - \dim(\theta)$ degrees of freedom. This is the special case considered by Hansen (1982), who originally derived the asymptotic distribution of the $J_T$-statistic. The $J_T$-statistic and its extension in Theorem 4.1 provide a goodness-of-fit test for models estimated by the GMM.

Hansen (1982) shows that the estimators of $\theta$ that minimize $g'Wg$ are consistent and asymptotically normal for any fixed $W$. Suppose the weighting matrix $W$ is chosen to be the inverse of a consistent estimate of the covariance matrix $S$ of the orthogonality conditions. The estimators are then asymptotically efficient in the class of estimators that minimize $g'Wg$ for fixed $W$s.
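The stacking convention and the eigenvalue weights of Theorem 4.1 can be sketched as follows. This is an illustration under assumptions, not code from the source. `moment_vector` and `theorem41_weights` are hypothetical helpers, and `S`, `A`, and `D` are small made-up matrices. The sketch uses numpy's lower-triangular Cholesky factors instead of the upper-triangular ones in the theorem. The nonzero eigenvalues of $Q$ do not depend on which square-root factors are used, provided they are used consistently, so the substitution is harmless. The sketch checks the special case noted above: with $A = S^{-1}$, every weight equals one.

```python
import numpy as np

def moment_vector(u, z):
    """g = vec(G) with G = u'Z/T, stacked row by row: (h_1, ..., h_N)."""
    T = u.shape[0]
    G = u.T @ z / T                    # N x L sample orthogonality conditions
    return G.reshape(-1)               # row-major stacking, length N*L

def theorem41_weights(S, A, D):
    """Eigenvalues lambda_i weighting the chi-square(1) terms in the limit of J_T."""
    F = np.linalg.cholesky(A)          # A = F F' (lower-triangular factor)
    G = np.linalg.cholesky(S)          # S = G G'
    FD = F.T @ D
    P = np.eye(A.shape[0]) - FD @ np.linalg.solve(D.T @ A @ D, FD.T)  # projection
    Q = G.T @ F @ P @ F.T @ G
    eig = np.linalg.eigvalsh(Q)        # Q is symmetric
    return np.sort(eig)[::-1][: A.shape[0] - D.shape[1]]

rng = np.random.default_rng(4)
u = rng.standard_normal((100, 3))      # hypothetical T x N model errors
z = np.column_stack([np.ones(100), rng.standard_normal(100)])   # T x L instruments
g = moment_vector(u, z)

n_moments, n_params = 6, 2
B = rng.standard_normal((n_moments, n_moments))
S = B @ B.T + n_moments * np.eye(n_moments)    # hypothetical positive definite S
D = rng.standard_normal((n_moments, n_params))
print(theorem41_weights(S, np.linalg.inv(S), D))   # optimal weighting: all ones
print(theorem41_weights(S, np.eye(n_moments), D))  # identity weighting: general weights
```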
The asymptotic variance matrix of this optimal GMM estimator of the parameter vector is given as

$$\mathrm{Cov}(\hat\theta) = \big[E(\partial g/\partial\theta)'\,W\,E(\partial g/\partial\theta)\big]^{-1} \tag{4.2}$$

where $\partial g/\partial\theta$ is an $NL \times \dim(\theta)$ matrix of derivatives. In practice, a consistent estimator for the asymptotic covariance of the sample mean of the orthogonality conditions is used. That is, we replace $W$ in (4.2) with $\mathrm{Cov}(g_T)^{-1}$ and replace $E(\partial g/\partial\theta)$ with its sample analogue. Hansen (1982) gives an example of a consistent estimator for the optimal weighting matrix:

$$\mathrm{Cov}(g_T) = \Big[(1/T)\sum_t\sum_j (u_{t+1}u_{t+1-j}') \otimes (Z_tZ_{t-j}')\Big] \tag{4.3}$$

where $\otimes$ denotes the Kronecker product. A special case that often proves useful arises when the orthogonality conditions are not serially correlated. In that special case, the optimal weighting matrix is the inverse of the matrix $\mathrm{Cov}(g_T)$, where

$$\mathrm{Cov}(g_T) = \Big[(1/T)\sum_t (u_{t+1}u_{t+1}') \otimes (Z_tZ_t')\Big]. \tag{4.4}$$

The GMM weighting matrices originally proposed by Hansen (1982) have some drawbacks. The estimators are not guaranteed to be positive definite, and they may have poor finite sample properties in some applications. A number of studies have explored alternative estimators for the GMM weighting matrix. In a prominent example, Newey and West (1987a) suggest weighting the autocovariance terms in (4.3) with Bartlett weights to obtain a positive semi-definite matrix. Andrews (1991), Andrews and Monahan (1992), and Ferson and Foerster (1994) propose additional refinements to improve the finite sample properties.

4.2. Testing hypotheses with the GMM

As noted above, the $J_T$-statistic provides a goodness-of-fit test for a model that is estimated by the GMM, when the model is overidentified. Hansen's $J_T$-statistic is the test most commonly used in the finance literature that applies the GMM. Other standard statistical tests based on the GMM are also used in the finance literature to test asset pricing models. One is a generalization of the Wald test, and a second is analogous to a likelihood ratio test statistic. Newey (1985) and Newey and West (1987b) review additional test statistics based on the GMM.

For the Wald test, express the hypothesis to be tested as the $M$-vector valued function $H(\theta) = 0$, where $M < \dim(\theta)$. The GMM estimates of $\theta$ are asymptotically normal, with mean $\theta$ and variance matrix $\mathrm{Cov}(\hat\theta)$.
Given standard regularity conditions, it follows that the estimates of H are asymptotically normal, with mean zero and variance matrix $H_\theta\,\mathrm{Cov}(\hat\theta)\,H_\theta'$, where subscripts denote partial derivatives, and that the quadratic form $T\,H'[H_\theta\,\mathrm{Cov}(\hat\theta)\,H_\theta']^{-1}H$ is asymptotically chi-square with M degrees of freedom, providing a standard Wald test.

A likelihood-ratio-type test is described by Newey and West (1987b), Eichenbaum, Hansen, and Singleton (1988, appendix C), and Gallant (1987). Newey and West (1987b) call this the D test. Assume that the null hypothesis implies that the orthogonality conditions $E(g^*) = 0$ hold, while under the alternative only a subset, $E(g) = 0$, holds; for example, $g^* = (g, h)$. When we estimate the model under the null hypothesis, the quadratic form $g^{*\prime} W^* g^*$ is minimized. Let $W^*_{11}$ be the upper left block of $W^*$; that is, let it be the estimate of $\mathrm{Cov}(g)^{-1}$ under the null. When we hold this matrix fixed, the model can be estimated under the alternative by minimizing $g' W^*_{11} g$. The difference of the two quadratic forms,

$$T\left[g^{*\prime} W^* g^* - g' W^*_{11} g\right],$$

is asymptotically chi-square, with degrees of freedom equal to M, if the null hypothesis is true. Newey and West (1987b) describe additional variations on these tests.

4.3. Illustrations: Using the GMM to test the conditional CAPM

The CAPM imposes nonlinear overidentifying restrictions on the first and second moments of asset returns. These restrictions can form a basis for econometric tests. To see these restrictions more clearly, notice that if an econometrician knows or can estimate $\mathrm{Cov}(R_{it}, R_{mt})$, $E(R_{mt})$, $\mathrm{Var}(R_{mt})$, and $E(R_{0t})$, it is possible to compute $E(R_{it})$ from the CAPM, using equation (2.1). Given a direct sample estimate of $E(R_{it})$, the expected return is overidentified. It is possible to use the overidentification to construct a test of the CAPM by asking if the expected return on the asset is different from the expected return assigned by the model. In this section we illustrate such tests by using both the traditional return-beta formulation and the stochastic discount factor representation of the CAPM. These examples extend easily to the multiple-beta models.

4.3.1. Static or unconditional CAPMs

If we assume that all the expectation terms in the CAPM refer to unconditional expectations, we have an unconditional version of the CAPM. It is straightforward to estimate and then test an unconditional version of the CAPM, using equation (3.1) and the stochastic discount factor representation given in equation (3.4). The stochastic discount factor is $m_{t+1} = c_0 + c_1 R_{m,t+1}$, where $c_0$ and $c_1$ are fixed parameters. Using only the unconditional expectations, the model implies that

$$E\{(c_0 + c_1 R_{m,t+1})R_{t+1} - \mathbf{1}\} = 0,$$

where $R_{t+1}$ is the vector of gross asset returns. The vector of sample orthogonality conditions is

$$g_T = g_T(c_0, c_1) = \frac{1}{T}\sum_t \{(c_0 + c_1 R_{m,t+1})R_{t+1} - \mathbf{1}\}.$$

With N > 2 assets, the number of orthogonality conditions is N and the number of parameters is 2, so the $J_T$-statistic has N - 2 degrees of freedom. Tests of the unconditional CAPM using the stochastic discount factor representation are conducted by Carhart et al. (1995) and Jagannathan and Wang (1996), who reject the model using monthly data for the postwar United States.
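The unconditional SDF test just described can be sketched in pure Python. This is a minimal illustration, not the authors' code: it assumes gross returns are supplied as plain lists, takes the instrument to be $Z_t = 1$ so that each moment error is $u_{t+1} = (c_0 + c_1 R_{m,t+1})R_{t+1} - \mathbf{1}$, and uses hypothetical helper names (`solve`, `long_run_cov`, `capm_sdf_gmm`). The first step uses an identity weighting matrix; the second uses the inverse of the estimate (4.4), or, when `lags > 0`, a Bartlett-weighted (Newey and West 1987a) version of (4.3). Because $g_T$ is linear in $(c_0, c_1)$, each step has a closed form.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(a)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def long_run_cov(u, lags=0):
    """Covariance estimate for moment errors u (T rows, K columns).

    lags=0 is the serially uncorrelated case (4.4); lags > 0 adds autocovariances
    with Bartlett weights 1 - j/(lags + 1), as in Newey and West (1987a).
    """
    T, K = len(u), len(u[0])

    def gamma(j):
        return [[sum(u[t][a] * u[t - j][b] for t in range(j, T)) / T
                 for b in range(K)] for a in range(K)]

    s = gamma(0)
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1.0)
        g = gamma(j)
        s = [[s[a][b] + w * (g[a][b] + g[b][a]) for b in range(K)] for a in range(K)]
    return s


def _gmm_step(D, apply_w):
    """Solve D'W D c = D'W 1 for c = (c0, c1); apply_w(v) must return W v."""
    N = len(D)
    wd0 = apply_w([row[0] for row in D])
    wd1 = apply_w([row[1] for row in D])
    w1 = apply_w([1.0] * N)
    A = [[sum(D[i][p] * wd[i] for i in range(N)) for wd in (wd0, wd1)] for p in range(2)]
    rhs = [sum(D[i][p] * w1[i] for i in range(N)) for p in range(2)]
    return solve(A, rhs)


def capm_sdf_gmm(rm, returns, lags=0):
    """Two-step GMM for m = c0 + c1 * Rm with moments E[m R_i - 1] = 0, i = 1..N.

    rm: T gross market returns; returns: T rows of N gross returns.
    Returns (c0, c1, J_T, degrees of freedom).
    """
    T, N = len(rm), len(returns[0])
    # g_T(c) = D c - 1, where row i of D is (mean R_i, mean Rm * R_i)
    D = [[sum(r[i] for r in returns) / T,
          sum(x * r[i] for x, r in zip(rm, returns)) / T] for i in range(N)]
    c0, c1 = _gmm_step(D, lambda v: list(v))  # step 1: W = identity
    u = [[(c0 + c1 * x) * r[i] - 1.0 for i in range(N)] for x, r in zip(rm, returns)]
    S = long_run_cov(u, lags)
    c0, c1 = _gmm_step(D, lambda v: solve(S, v))  # step 2: W = S^{-1}
    g = [D[i][0] * c0 + D[i][1] * c1 - 1.0 for i in range(N)]
    J = T * sum(gi * wi for gi, wi in zip(g, solve(S, g)))
    return c0, c1, J, N - 2
```

With two assets the system is exactly identified, so $J_T$ is zero up to rounding; with N > 2 assets $J_T$ would be compared with a chi-square distribution with N - 2 degrees of freedom. The hand-written solver only keeps the sketch dependency-free; a numerical linear algebra library would be the natural choice in practice.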

Tests of the unconditional CAPM may also be conducted using the linear, return-beta formulation of equation (2.1) and the GMM. Let $r_t = R_t - R_{0t}\mathbf{1}$ be the vector of excess returns, where $R_{0t}$ is the gross return on some reference asset and $\mathbf{1}$ is an N-vector of ones; also let $u_t = r_t - \beta r_{mt}$, where $\beta$ is the N-vector of the betas of the excess returns, relative to the market, and $r_{mt} = R_{mt} - R_{0t}$ is the excess return on the market portfolio. The model implies that $E(u_t) = E(u_t r_{mt}) = 0$. Let the instruments be $Z_t = (1, r_{mt})'$. The sample orthogonality condition is then

$$g_T(\beta) = T^{-1}\sum_t (r_t - \beta r_{mt}) \otimes Z_t.$$

The number of orthogonality conditions is 2N and the number of parameters is N, so the model is overidentified and may be tested using the $J_T$-statistic. An alternative approach to testing the model using the return-beta formulation is to estimate the model under the hypothesis that expected returns depart from the predictions of the CAPM by a vector of parameters $\alpha$, which are called Jensen's alphas. Redefining $u_t = r_t - \alpha - \beta r_{mt}$, the model has 2N parameters and 2N orthogonality conditions, so it is exactly identified. It is easy to show that the GMM estimators of $\alpha$ and $\beta$ are the same as the OLS estimators, and equation (4.4) delivers White's (1980) heteroskedasticity-consistent standard errors. The CAPM may then be tested, as the hypothesis $\alpha = 0$, using a Wald test or the D-statistic, as described above. Tests of the unconditional CAPM using the linear return-beta formulation are conducted with the GMM by MacKinlay and Richardson (1991), who reject the model for monthly U.S. data.

4.3.2. Conditional CAPMs

Empirical studies that rejected the unconditional CAPM, as well as mounting evidence of predictable variation in the distribution of security rates of return, led to empirical work on conditional versions of the CAPM starting in the early 1980s.
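Returning briefly to the exactly identified return-beta system of Section 4.3.1: there the GMM estimates coincide with OLS and (4.4) reduces to White's sandwich covariance. The sketch below illustrates this for a single asset in pure Python; the function names are hypothetical, and the chi-square(1) p-value uses the identity $P(\chi^2_1 > w) = \mathrm{erfc}(\sqrt{w/2})$. The Wald statistic tests $H(\theta) = \alpha = 0$.

```python
import math


def alpha_beta_white(r, rm):
    """OLS of r_t = alpha + beta * rm_t + u_t for one asset.

    Equal to the exactly identified GMM estimator with Z_t = (1, rm_t)'.
    Returns alpha, beta and the White (1980) covariance matrix of (alpha, beta).
    """
    T = len(r)
    sx = sum(rm)
    sxx = sum(x * x for x in rm)
    det = T * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, T / det]]  # (X'X)^{-1}, X = [1, rm]
    sy = sum(r)
    sxy = sum(x * y for x, y in zip(rm, r))
    alpha = inv[0][0] * sy + inv[0][1] * sxy
    beta = inv[1][0] * sy + inv[1][1] * sxy
    # "Meat" of the sandwich: sum_t e_t^2 x_t x_t' with x_t = (1, rm_t)
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(rm, r):
        e2 = (y - alpha - beta * x) ** 2
        xt = (1.0, x)
        for i in range(2):
            for j in range(2):
                meat[i][j] += e2 * xt[i] * xt[j]
    left = [[sum(inv[i][k] * meat[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    cov = [[sum(left[i][k] * inv[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return alpha, beta, cov


def wald_alpha_zero(alpha, cov):
    """Wald statistic for alpha = 0 and its chi-square(1) p-value."""
    stat = alpha ** 2 / cov[0][0]
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

For N assets the same per-asset regressions apply, but a joint test of $\alpha = 0$ would need the full 2N x 2N covariance across assets, which this single-asset sketch does not build.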
In a conditional asset pricing model it is assumed that the expectation terms in the model are conditional expectations, given a public information set that is represented by a vector of predetermined instrumental variables $Z_t$. The multiple-beta models of Merton (1973) and Cox, Ingersoll, and Ross (1985) are intended to accommodate conditional expectations. Merton (1973, 1980) and Cox, Ingersoll, and Ross also showed how a conditional version of the CAPM may be derived as a special case of their intertemporal models. Hansen and Richard (1987) describe theoretical relations between conditional and unconditional versions of mean-variance efficiency.

The earliest empirical formulations of conditional asset pricing models were the latent variable models developed by Hansen and Hodrick (1983) and Gibbons and Ferson (1985) and later refined by Campbell (1987) and Ferson, Foerster, and Keim (1993). These models allow time-varying expected returns, but maintain the assumption that the conditional betas are fixed parameters. Consider the linear, return-beta representation of the CAPM under these assumptions, writing $E(r_t \mid Z_{t-1}) = \beta E(r_{mt} \mid Z_{t-1})$. The returns are measured in excess of a risk-free asset. Let $r_{1t}$ be some reference asset with nonzero $\beta_1$, so that $E(r_{1t} \mid Z_{t-1}) = \beta_1 E(r_{mt} \mid Z_{t-1})$. Solving this expression for $E(r_{mt} \mid Z_{t-1})$ and substituting, we have

$$E(r_t \mid Z_{t-1}) = C\,E(r_{1t} \mid Z_{t-1}),$$

where $C = (\beta ./ \beta_1)$ and ./ denotes element-by-element division. With this substitution, the expected market risk premium is the latent variable in the model, and C is the N-vector of the model parameters. When we form the error term $u_t = r_t - C r_{1t}$, the model implies $E(u_t \mid Z_{t-1}) = 0$, and we can estimate and test the model by using the GMM. Gibbons and Ferson (1985) argued that the latent variable model is attractive in view of the difficulties in measuring the true market portfolio, but Wheatley (1989) emphasized that it remains necessary to assume that ratios of the betas, measured with respect to the unobserved market portfolio, are constant parameters. Campbell (1987) and Ferson and Foerster (1995) show that a single-beta latent variable model is rejected in U.S. data. This finding rejects the hypothesis that there is a (conditional) minimum-variance portfolio such that the ratios of conditional betas on this portfolio are fixed parameters. Therefore, the empirical evidence suggests that conditional asset pricing models should be consistent with either (1) a time-varying beta or (2) more than one beta for each asset.¹⁰ Conditional multiple-beta models with constant betas are examined empirically by Ferson and Harvey (1991), Evans (1994), and Ferson and Korajczyk (1995). They reject such models with the usual statistical tests but find that they still capture a large fraction of the predictability of stock and bond returns over time.
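The latent variable model can be estimated with the same two-step machinery. The sketch below is illustrative only: it assumes one test asset and one reference asset, both in excess returns, already aligned with a scalar lagged instrument so that $Z_{t-1} = (1, z_{t-1})'$, and the function name is hypothetical. With two moment conditions and the single parameter C, $J_T$ has one degree of freedom; with more test assets the moments would be stacked as $u_t \otimes Z_{t-1}$ in the same way.

```python
def latent_variable_gmm(r_ref, r_test, z_lag):
    """Two-step GMM for E[(r_test_t - C * r_ref_t) * Z_{t-1}] = 0, Z_{t-1} = (1, z_lag[t])'.

    r_ref, r_test: excess returns; z_lag[t] must be known before period t.
    Returns (C, J_T); J_T has 2 moments - 1 parameter = 1 degree of freedom.
    """
    T = len(r_ref)
    Z = [(1.0, z) for z in z_lag]
    # g_T(C) = a - C * b, with a = mean(r_test * Z) and b = mean(r_ref * Z)
    a = [sum(y * zt[k] for y, zt in zip(r_test, Z)) / T for k in range(2)]
    b = [sum(x * zt[k] for x, zt in zip(r_ref, Z)) / T for k in range(2)]

    def quad(W, p, q):
        # p' W q for 2-vectors
        return sum(p[i] * W[i][j] * q[j] for i in range(2) for j in range(2))

    def fit(W):
        # Minimizer of g' W g for the linear moment g = a - C b
        return quad(W, b, a) / quad(W, b, b)

    c = fit([[1.0, 0.0], [0.0, 1.0]])  # step 1: identity weighting
    # Step 2: S = (1/T) sum_t u_t^2 Z Z', the single-equation case of (4.4)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y, zt in zip(r_ref, r_test, Z):
        u2 = (y - c * x) ** 2
        for i in range(2):
            for j in range(2):
                s[i][j] += u2 * zt[i] * zt[j] / T
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    W = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    c = fit(W)
    g = [a[k] - c * b[k] for k in range(2)]
    return c, T * quad(W, g, g)
```

Because the moment conditions are linear in C, no numerical optimizer is needed; a nonlinear specification such as Harvey's would require one.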
When allowing for time-varying betas, these studies find that the time-variation in betas contributes a relatively small amount to the time-variation in expected asset returns. Intuition for this finding can be obtained by considering the following approximation. Suppose that time-variation in expected excess returns is $E(r \mid Z) = \lambda\beta$, where $\lambda$ is a vector of time-varying expected risk premiums for the factors and $\beta$ is a matrix of time-varying betas. Using a Taylor series, we can approximate

$$\mathrm{Var}[E(r \mid Z)] \approx E(\beta)'\,\mathrm{Var}[\lambda]\,E(\beta) + E(\lambda)'\,\mathrm{Var}[\beta]\,E(\lambda).$$

The first term in the decomposition reflects the contribution of the time-varying risk premiums; the second reflects the contribution of time-varying betas. Since the average beta $E(\beta)$ is on the order of 1.0 in monthly data, while the average risk premium $E(\lambda)$ is typically less than 0.01, the first term dominates the second term. This means that time-variation in conditional betas is less important than time-variation in expected risk premiums, from the perspective of modeling predictable variation in expected asset returns.

Although time-variation in conditional betas matters less than time-variation in expected risk premiums for modeling predictable time-variation in asset returns, this does not imply that beta variation is empirically unimportant. From the perspective of modeling the cross-sectional variation in expected asset returns, beta variation over time may be very important. To see this, consider the unconditional expected excess return vector, obtained from the model as

$$E\{E(r \mid Z)\} = E\{\lambda\beta\} = E(\lambda)E(\beta) + \mathrm{Cov}(\lambda, \beta).$$

Viewed as a cross-sectional relation, the term $\mathrm{Cov}(\lambda, \beta)$ may vary significantly in a cross section of assets. Therefore, the implications of a conditional version of the CAPM for the cross section of unconditional expected returns may depend importantly on common time-variation in betas and expected market risk premiums. The empirical tests of Jagannathan and Wang (1996) suggest that this is the case.

¹⁰ A model with more than one fixed beta, and with time-varying risk premiums, is generally consistent with a single, time-varying beta for each asset. For example, assume that there are two factors with constant betas and time-varying risk premiums, where a time-varying combination of the two factors is a minimum-variance portfolio.

Harvey (1989) replaced the constant beta assumption with the assumption that the ratio of the expected market premium to the conditional market variance is a fixed parameter, as in

$$E(r_{mt} \mid Z_{t-1}) / \mathrm{Var}(r_{mt} \mid Z_{t-1}) = \gamma.$$

The conditional expected returns may then be written according to the conditional CAPM as

$$E(r_t \mid Z_{t-1}) = \gamma\,\mathrm{Cov}(r_t, r_{mt} \mid Z_{t-1}).$$

Harvey's version of the conditional CAPM is motivated by Merton's (1980) model in which the ratio $\gamma$, called the market price of risk, is equal to the relative risk aversion of a representative investor in equilibrium.
Harvey also assumes that the conditional expected risk premium on the market (and the conditional market variance, given fixed $\gamma$) is a linear function of the instruments, as in $E(r_{mt} \mid Z_{t-1}) = \delta_m' Z_{t-1}$, where $\delta_m$ is a coefficient vector. Define the error terms $v_t = r_{mt} - \delta_m' Z_{t-1}$ and $w_t = r_t(1 - v_t\gamma)$. The model implies that the stacked error term $u_t = (v_t, w_t)$ satisfies $E(u_t \mid Z_{t-1}) = 0$, so it is straightforward to estimate and then test the model using the GMM. Harvey (1989) rejects this version of the conditional CAPM for monthly data in the U.S. In Harvey (1991) the same formulation is rejected when applied using a world market portfolio and monthly data on the stock markets of 21 developed countries.

The conditional CAPM may be tested using the stochastic discount factor representation given by equation (3.4): $m_{t+1} = c_{0t} - c_{1t} R_{m,t+1}$. In this case the coefficients $c_{0t}$ and $c_{1t}$ are measurable functions of the information set $Z_t$. To implement the model empirically it is necessary to specify functional forms for $c_{0t}$ and $c_{1t}$. From expression (3.4) it can be seen that these coefficients are nonlinear functions of the conditional expected market return and its conditional variance. As yet there is no theoretical guidance for specifying the functional forms. Cochrane (1996) suggests approximating the coefficients using linear functions, and this approach is followed by Carhart et al. (1995), who reject the conditional CAPM for monthly U.S. data. Jagannathan and Wang (1993) show that the conditional CAPM implies an unconditional two-factor model. They show that

$$m_{t+1} = a_0 + a_1 E(r_{m,t+1} \mid I_t) + R_{m,t+1}$$

(where $I_t$ denotes the information set of investors and $a_0$ and $a_1$ are fixed parameters) is a valid stochastic discount factor in the sense that $E(R_{i,t+1} m_{t+1}) = 1$ for this choice of $m_{t+1}$. Using a set of observable instruments $Z_t$ and assuming that $E(r_{m,t+1} \mid Z_t)$ is a linear function of $Z_t$, they find that their version of the model explains the cross section of unconditional expected returns better than does an unconditional version of the CAPM.

Bansal and Viswanathan (1993) develop conditional versions of the CAPM and multiple-factor models in which the stochastic discount factor $m_{t+1}$ is a nonlinear function of the market or factor returns. Using nonparametric methods, they find evidence to support the nonlinear versions of the models. Bansal, Hsieh, and Viswanathan (1993) compare the performance of nonlinear models with linear models, using data on international stocks, bonds, and currency returns, and they find that the nonlinear models perform better. Additional empirical tests of the conditional CAPM and multiple-beta models, using stochastic discount factor representations, are beginning to appear in the literature.
We expect that future studies will further refine the relations among the various empirical specifications.

5. Model diagnostics

We have discussed several examples of stochastic discount factors corresponding to particular theoretical asset pricing models, and we have shown how to test whether these models assign the right expected returns to financial assets. The stochastic discount factors corresponding to these models are particular parametric functions of the data observed by the econometrician. While empirical studies based on these parametric approaches have led to interesting insights, the parametric approach makes strong assumptions about the economic environment. In this section we discuss some alternative econometric approaches to the problem of evaluating asset pricing models.

5.1. Moment inequality restrictions

Hansen and Jagannathan (1991) derive restrictions from asset pricing models while assuming as little structure as possible. In particular, they assume that the financial markets obey the law of one price and that there are no arbitrage opportunities. These assumptions are sufficient to imply that there exists a stochastic discount factor $m_{t+1}$ (which is almost surely positive, if there is no arbitrage) such that equation (3.1) is satisfied. Note that if the stochastic discount factor is a degenerate random variable (i.e., a constant), then equation (3.1) implies that all assets must earn the same expected return. If assets earn different expected returns, then the stochastic discount factor cannot be a constant. In other words, cross-sectional differences in expected asset returns carry implications for the variance of any valid stochastic discount factor that satisfies equation (3.1). Hansen and Jagannathan make use of this observation to derive a lower bound on the volatility of stochastic discount factors.

Shiller (1979, 1981), Singleton (1980), and LeRoy and Porter (1981) derive a related volatility bound in specific models, and their empirical work suggests that the stochastic discount factors implied by these simple models are not volatile enough to explain expected returns across assets. Hansen and Jagannathan (1991) show how to use the volatility bound as a general diagnostic device. In what follows we derive the Hansen and Jagannathan (1991) bound and discuss its empirical application. To simplify the exposition, we focus on an unconditional version of the bound using only the unconditional expectations. We posit a hypothetical, unconditional, risk-free asset with return $R_f = E(m_{t+1})^{-1}$. We take the value of $R_f$, or equivalently $E(m_{t+1})$, as a parameter to be varied as we trace out the bound.
The law of one price guarantees the existence of some stochastic discount factor which satisfies equation (3.1). Consider the following projection of any such $m_{t+1}$ on the vector of gross asset returns, $R_{t+1}$:

$$m_{t+1} = R_{t+1}'\beta + \epsilon_{t+1} \qquad (5.1)$$

where $E(\epsilon_{t+1}R_{t+1}) = 0$ and $\beta$ is the projection coefficient vector. Multiply both sides of equation (5.1) by $R_{t+1}$ and take the expected value of both sides of the equation, using $E[R_{t+1}\epsilon_{t+1}] = 0$, to arrive at an expression which may be solved for $\beta$. Substituting this expression back into (5.1) gives the "fitted values" of the projection as

$$m^*_{t+1} = R_{t+1}'\beta = R_{t+1}'\,E(R_{t+1}R_{t+1}')^{-1}\mathbf{1}. \qquad (5.2)$$

By inspection, the $m^*_{t+1}$ given by equation (5.2) is a valid stochastic discount factor, in the sense that equation (3.1) is satisfied when $m^*_{t+1}$ is used in place of $m_{t+1}$. We have therefore constructed a stochastic discount factor $m^*_{t+1}$ that is also a payoff on an investment position in the N given assets, where the vector $E(R_{t+1}R_{t+1}')^{-1}\mathbf{1}$ provides the weights. This payoff is the unique linear least squares approximation of every admissible stochastic discount factor in the space of available asset payoffs. Substituting $m^*_{t+1}$ for $R_{t+1}'\beta$ in equation (5.1) shows that we may write any stochastic discount factor $m_{t+1}$ as $m_{t+1} = m^*_{t+1} + \epsilon_{t+1}$, where $E(\epsilon_{t+1}m^*_{t+1}) = 0$. It follows that $\mathrm{Var}(m_{t+1}) \ge \mathrm{Var}(m^*_{t+1})$. This expression is the basis of the Hansen-Jagannathan bound¹¹ on the variance of $m_{t+1}$. Since $m^*_{t+1}$ depends only on the second moment matrix of the N returns, the lower bound depends only on the assets available to the econometrician and not on the particular asset pricing model that is being studied. To obtain an explicit expression for the variance bound in terms of the underlying asset-return moments, substitute from the previous expressions to obtain

$$\begin{aligned}
\mathrm{Var}(m_{t+1}) \ge \mathrm{Var}(m^*_{t+1}) &= \beta'\,\mathrm{Var}(R_{t+1})\,\beta \qquad (5.3)\\
&= [\mathrm{Cov}(m, R')\,\mathrm{Var}(R)^{-1}]\,\mathrm{Var}(R)\,[\mathrm{Var}(R)^{-1}\mathrm{Cov}(R, m)]\\
&= [\mathbf{1} - E(m)E(R)]'\,\mathrm{Var}(R)^{-1}\,[\mathbf{1} - E(m)E(R)]
\end{aligned}$$

where the time subscripts are suppressed to conserve notation and the last line follows from $E(mR) = \mathbf{1} = E(m)E(R) + \mathrm{Cov}(m, R)$. As we vary the hypothetical values of $E(m) = R_f^{-1}$, equation (5.3) traces out a curve (a parabola in $E(m)$, $\mathrm{Var}(m)$ space) in $E(m)$, $\sigma(m)$ space, where $\sigma(m)$ is the standard deviation of $m_{t+1}$. If we place $\sigma(m)$ on the y axis and $E(m)$ on the x axis, the Hansen-Jagannathan bounds resemble a cup, and the implication is that any valid stochastic discount factor $m_{t+1}$ must have a mean and standard deviation that place it within the cup.

The lower bound on the volatility of a stochastic discount factor, as given by equation (5.3), is closely related to the standard mean-variance analysis that has long been used in the financial economics literature. To see this, recall that if $r = R - R_f$ is the vector of excess returns, then (3.1) implies, for each asset i,

$$0 = E(m r_i) = E(m)E(r_i) + \rho_i\,\sigma(m)\,\sigma(r_i),$$

where $\rho_i$ is the correlation between m and $r_i$. Since $-1 \le \rho_i \le 1$, we have that $\sigma(m)/E(m) \ge E(r_i)/\sigma(r_i)$ for all i.
The right side of this expression is the Sharpe ratio for asset i. The Sharpe ratio is defined as the expected excess return on an asset, divided by the standard deviation of the excess return (see Sharpe 1994 for a recent discussion of this ratio). Consider plotting every portfolio that can be formed from the N assets in the standard deviation (x axis) - mean (y axis) plane. The set of such portfolios with the smallest possible standard deviation for a given mean return is the minimum-variance boundary. Consider the tangent to the minimum-variance boundary from the point $1/E(m)$ on the y axis. The tangent point is a portfolio of the asset returns, and the slope of this tangent line is the maximum Sharpe ratio that can be attained with a given set of N assets and a given risk-free rate, $R_f = 1/E(m)$. The slope of this line is also equal to $R_f$ multiplied by the Hansen-Jagannathan lower bound on $\sigma(m)$ for a given $E(m) = R_f^{-1}$. That is, we have that

$$\sigma(m) \ge E(m)\,\left|\max_i\{E(r_i)/\sigma(r_i)\}\right|$$

for the given $R_f$, where the maximum is taken over all portfolios i that can be formed from the N assets.

¹¹ Related bounds were derived by Kandel and Stambaugh (1987), MacKinlay (1987, 1995), and Shanken (1987).

The preceding analysis is based on equation (3.1), which is equivalent to the law of one price. The absence of arbitrage opportunities further implies that $m_{t+1}$ is a strictly positive random variable. Hansen and Jagannathan (1991) show how to obtain a tighter bound on the standard deviation of $m_{t+1}$ by making use of the restriction that there are no arbitrage opportunities. They also show how to incorporate conditioning variables into the analysis.

Snow (1991) extends the Hansen-Jagannathan analysis to include higher moments of the asset returns. His extension is based on the Hölder inequality, which implies that for given values of $\delta$ and $p$ such that $(1/\delta) + (1/p) = 1$ it is true that $E(mR) \le E(m^\delta)^{1/\delta}E(R^p)^{1/p}$. Cochrane and Hansen (1992) refine the Hansen-Jagannathan bound to consider information about the correlation between a given stochastic discount factor and the vector of asset returns. This provides a tighter set of restrictions than the original bounds, which only make use of the fact that the correlation must be between -1 and +1.

5.2. Statistical inference for moment inequality restrictions

Cochrane and Hansen (1992), Burnside (1994), and Cecchetti, Lam, and Mark (1994) show how to take sampling errors into account when examining whether a particular candidate stochastic discount factor satisfies the Hansen-Jagannathan bound.
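Ignoring sampling error for the moment, the bound (5.3) and its Sharpe-ratio interpretation can be evaluated directly from estimated moments. In the sketch below (hypothetical names; the mean vector and $\mathrm{Var}(R)$ are taken as given rather than estimated), `hj_bound` evaluates the last line of (5.3) for a chosen $E(m)$, `passes_hj` makes the point comparison for a candidate discount factor, and `max_sharpe_two_assets` searches two-asset portfolio directions by brute force, so that one can confirm numerically that the bound equals $E(m)$ times the maximum Sharpe ratio.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(a)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def hj_bound(v, mu, cov):
    """Last line of (5.3): minimum sigma(m) given E(m) = v.

    mu: mean gross returns E(R); cov: Var(R) as an N x N list of lists.
    """
    w = [1.0 - v * m for m in mu]
    x = solve(cov, w)
    return math.sqrt(sum(wi * xi for wi, xi in zip(w, x)))


def passes_hj(y_mean, y_std, mu, cov):
    """Point check, ignoring sampling error: does (E(y), sigma(y)) lie inside the cup?"""
    return y_std >= hj_bound(y_mean, mu, cov)


def max_sharpe_two_assets(rf, mu, cov, steps=20000):
    """Brute-force max |Sharpe ratio| over directions x = (cos t, sin t) of two excess returns."""
    ex = [m - rf for m in mu]
    best = 0.0
    for k in range(steps):
        t = math.pi * k / steps
        x0, x1 = math.cos(t), math.sin(t)
        num = x0 * ex[0] + x1 * ex[1]
        var = x0 * x0 * cov[0][0] + 2.0 * x0 * x1 * cov[0][1] + x1 * x1 * cov[1][1]
        best = max(best, abs(num) / math.sqrt(var))
    return best
```

Evaluating `hj_bound` over a grid of values for $E(m)$ traces out the cup; the grid search is only a numerical cross-check of the tangency argument, since the maximum squared Sharpe ratio also has the closed form $(\mu - R_f)'\mathrm{Var}(R)^{-1}(\mu - R_f)$.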
In what follows we outline a computation which allows for sampling errors, following the discussion in Cochrane and Hansen (1992). Assume that the econometrician has a time series of T observations on a candidate for the stochastic discount factor, denoted by $y_t$, and on the N asset returns $R_t$. We also assume that the risk-free asset is not one of the N assets. Hence $v = E(m) = 1/R_f$ is an unknown parameter to be estimated. Consider a linear regression of $m_{t+1}$ onto the unit vector and the vector of asset returns, $m_{t+1} = a + R_t'\beta + u_{t+1}$. We use the regression function in the following system of population moment conditions:

$$\begin{aligned}
E(a + R_t'\beta) &= v\\
E(R_t a + R_t R_t'\beta) &= \mathbf{1}_N \qquad (5.4)\\
E(y_t) &= v\\
E[(a + R_t'\beta)^2] - E[y_t^2] &\le 0.
\end{aligned}$$

The first equation says that the expected value of $m_t = a + R_t'\beta$ is v. The second equation says that the regression function for $m_t$ is a valid stochastic discount factor. The third equation says that v is the expected value of the particular candidate discount factor that we wish to test. The fourth equation states that the Hansen-Jagannathan bound is satisfied by the particular candidate stochastic discount factor. We can estimate the parameters v, a, and the N-vector $\beta$, using the N + 3 equations in (5.4), by treating the last inequality as an equality and using the GMM. Treating the last equation as an equality corresponds to the null hypothesis that the mean and variance of $y_t$ place it on the Hansen-Jagannathan boundary. Under this null hypothesis, the minimized value of the GMM criterion function $J_T$, multiplied by T, has a chi-square distribution with one degree of freedom. Cochrane and Hansen (1992) suggest testing the inequality relation using the one-sided test.

5.3. Specification error bounds

The methods we have examined so far are developed, for the most part, under the null hypothesis that the asset pricing model under consideration by the econometrician assigns the right prices (or expected returns) to all assets. An alternative is to assume that the model is wrong and examine how wrong the model is. In this section we follow Hansen and Jagannathan (1994) and discuss one possible way to examine what is missing in a model and assign a scalar measure of the model's misspecification.¹² Let $y_t$ denote the candidate stochastic discount factor corresponding to a given asset pricing model, and let $m^*_t$ denote the unique stochastic discount factor that we constructed earlier, as a combination of asset payoffs.
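The payoff $m^*_t$ can be estimated by replacing $E(R_tR_t')$ in (5.2) with its sample average G. The sketch below (hypothetical names, pure Python) does this and also computes the distance measure defined below in (5.5). It relies on a closed form derived here from the first-order condition of (5.5) rather than quoted from the text: setting the gradient to zero gives $\alpha_T = G^{-1}(\mathbf{1}_N - T^{-1}\sum_t y_tR_t)$, and substituting back gives $\delta_T^2 = e'G^{-1}e$ with $e = T^{-1}\sum_t y_tR_t - \mathbf{1}_N$, the vector of sample pricing errors of $y_t$.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (small systems only)."""
    n = len(a)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def second_moment(returns):
    """G = (1/T) sum_t R_t R_t' for T rows of N gross returns."""
    T, N = len(returns), len(returns[0])
    return [[sum(r[i] * r[j] for r in returns) / T for j in range(N)] for i in range(N)]


def sample_m_star(returns):
    """Sample analogue of (5.2): weights G^{-1} 1 and the series m*_t = R_t' G^{-1} 1."""
    weights = solve(second_moment(returns), [1.0] * len(returns[0]))
    return weights, [sum(w * x for w, x in zip(weights, r)) for r in returns]


def hj_distance(y, returns):
    """Closed-form maximizer of (5.5) for a candidate SDF series y.

    e = mean(y_t R_t) - 1 are the sample pricing errors; alpha_T = -G^{-1} e and
    the maximized objective is delta_T^2 = e' G^{-1} e. Returns (delta_T, alpha_T).
    """
    T, N = len(returns), len(returns[0])
    e = [sum(yt * r[i] for yt, r in zip(y, returns)) / T - 1.0 for i in range(N)]
    alpha = solve(second_moment(returns), [-ei for ei in e])
    delta2 = -sum(ei * ai for ei, ai in zip(e, alpha))
    return math.sqrt(max(delta2, 0.0)), alpha
```

In this sketch $y_t + \alpha_T'R_t$ prices every asset exactly in sample, $\alpha_T'R_t$ is the estimated modifying payoff, and the distance is zero when the candidate is $m^*_t$ itself; the standard error in (5.6) would still require the spectral density estimate described in the text.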
We assume that $E[y_tR_t]$ does not equal $\mathbf{1}_N$, the N-vector of ones; i.e., the model does not correctly price all of the gross returns. We can project $y_t$ on the N asset returns to get $y_t = R_t'\alpha + u_t$, and project $m^*_t$ on the vector of asset returns to get $m^*_t = R_t'\beta + e_t$. Since the candidate $y_t$ does not correctly price all of the assets, $\alpha$ and $\beta$ will not be the same. Define $p_t = (\beta - \alpha)'R_t$ as the modifying payoff to the candidate stochastic discount factor $y_t$. Clearly, $y_t + p_t$ is a valid stochastic discount factor, satisfying equation (3.1). Hansen and Jagannathan (1994) derive specification tests based on the size of the modifying payoff, which measures how far the model's candidate stochastic discount factor $y_t$ is from a valid stochastic discount factor. Hansen and Jagannathan (1994) show that a natural measure of this distance is $\delta = [E(p_t^2)]^{1/2}$, which provides an economic interpretation for the model's misspecification. Payoffs that are orthogonal to $p_t$ are correctly priced by the candidate $y_t$, and $\delta$ is the maximum pricing error from using $y_t$ for any payoff normalized to have a unit second moment. The modifying payoff $p_t$ is also the minimal modification that is sufficient to make $y_t$ a valid stochastic discount factor. Hansen and Jagannathan (1994) consider an estimator of the distance measure $\delta$ given as the solution to the following maximization problem:

$$\delta_T = \max_\alpha \left[T^{-1}\sum_t \left\{y_t^2 - (y_t + \alpha'R_t)^2 + 2\alpha'\mathbf{1}_N\right\}\right]^{1/2}. \qquad (5.5)$$

If $\alpha_T$ is the solution to (5.5), then the estimate of the modifying payoff is $\alpha_T'R_t$. It can be readily verified that the first-order condition to (5.5) implies that $y_t + \alpha_T'R_t$ satisfies the sample counterpart to the asset pricing equation (3.1). To obtain an estimate of the sampling error associated with the estimated value $\delta_T$, consider $u_t = y_t^2 - (y_t + \alpha_T'R_t)^2 + 2\alpha_T'\mathbf{1}_N$. The sample mean of $u_t$ is $\delta_T^2$. We can obtain a consistent estimator of the variance of $\delta_T^2$ by the frequency-zero spectral density estimators described in Newey and West (1987a) or Andrews (1991), applied to the time series $\{u_t - \delta_T^2\}_{t=1}^{T}$. Let $s_T$ denote the estimated standard deviation of $\delta_T^2$ obtained this way. Then, under standard assumptions, $T^{1/2}(\delta_T^2 - \delta^2)/s_T$ converges to a normal (0, 1) random variable. Hence, using the delta method, we obtain

$$T^{1/2}\,\frac{2\delta_T}{s_T}\,(\delta_T - \delta) \to N(0, 1). \qquad (5.6)$$

¹² GMM-based model specification tests are examined in a general setting by Newey (1985). Other related work includes that by Boudoukh, Richardson, and Smith (1993), who compute approximate bounds on the probabilities of the test statistics in the presence of inequality restrictions; Chen and Knez (1992), who develop nonparametric measures of market integration by using related methods; and Hansen, Heaton, and Luttmer (1995), who show how to compute specification error and volatility bounds when there are market frictions such as short-sale constraints and proportional transaction costs.

6. Conclusions

In this article we have reviewed econometric tests of a wide range of asset pricing models, where the models are based on the law of one price, the no-arbitrage principle, and models of market equilibrium with investor optimization. Our review included the earliest of the equilibrium asset pricing models, the CAPM, and also considered dynamic multiple-beta and arbitrage pricing models. We provided some results for the asymptotic distribution of traditional two-pass estimators for asset pricing models stated in the linear, return-beta formulation. We emphasized the econometric evaluation of asset pricing models by using Hansen's (1982) generalized method of moments. Our examples illustrate the simplicity and flexibility of the GMM approach. We showed that most asset pricing models can be represented in a stochastic discount factor form, which makes the application of the GMM straightforward. Finally, we discussed model diagnostics that provide additional insight into the causes of the statistical rejections in GMM tests and which help assess the specification errors in these models.

Appendix

Proof of Theorem 2.1

The proof comes from Jagannathan and Wang (1996). We first introduce some additional notation. Let $I_N$ be the N-dimensional identity matrix and $\mathbf{1}_T$ be a T-dimensional vector of ones. It follows from equation (2.17) that

$$\bar{R} - \mu = T^{-1}(I_N \otimes \mathbf{1}_T')\,\epsilon_k, \qquad k = 1, \ldots, K_2,$$

where $\epsilon_k = (\epsilon_{1k1}, \ldots, \epsilon_{1kT}, \ldots, \epsilon_{Nk1}, \ldots, \epsilon_{NkT})'$. By the definition of $b_k$, we have that

$$b_k - \beta_k = [I_N \otimes ((f_k'f_k)^{-1}f_k')]\,\epsilon_k,$$

where $f_k$ is the vector of demeaned factor realizations, conformable to the vector $\epsilon_k$. In view of the assumption that the conditional covariance of $\epsilon_{ikt}$ and $\epsilon_{jlt}$, given the time series of the factors (denoted by f), is a fixed constant $\sigma_{ijkl}$, we have that

$$\begin{aligned}
E[(b_k - \beta_k)(\bar{R} - \mu)' \mid f] &= T^{-1}[I_N \otimes ((f_k'f_k)^{-1}f_k')]\,E[\epsilon_k\epsilon_l' \mid f]\,(I_N \otimes \mathbf{1}_T)\\
&= T^{-1}[I_N \otimes ((f_k'f_k)^{-1}f_k')]\,(\Sigma_{kl} \otimes I_T)\,(I_N \otimes \mathbf{1}_T)\\
&= T^{-1}\,\Sigma_{kl} \otimes [(f_k'f_k)^{-1}f_k'\mathbf{1}_T] = 0,
\end{aligned}$$

where we denote the matrix of the $\{\sigma_{ijkl}\}_{i,j}$ by $\Sigma_{kl}$. The last line follows from the fact that $f_k'\mathbf{1}_T = 0$. Hence we have shown that $(b_k - \beta_k)$ is uncorrelated with $(\bar{R} - \mu)$. Therefore, the terms u and $h\gamma_2$ should be uncorrelated, and the asymptotic variance of $T^{1/2}(\hat\gamma - \gamma)$ in equation (2.15) is given by

$$(x'x)^{-1}x'[\mathrm{Var}(u) + \mathrm{Var}(h\gamma_2)]\,x(x'x)^{-1}.$$

Let $\pi_{ijkl}$ denote the limiting value of $\mathrm{Cov}(\sqrt{T}\,\overline{f_k\epsilon_{ik}}, \sqrt{T}\,\overline{f_l\epsilon_{jl}})$ as $T \to \infty$. Let the matrix with $\pi_{ijkl}$ being its ij-th element be denoted by $\Pi_{kl}$. We assume that the sample covariance matrix of the factors exists and converges in probability to a constant positive definite matrix $\Omega$, with typical element $\Omega_{kl}$.
Since $\sqrt{T}(b_{ik} - \beta_{ik})$ converges in distribution to the random variable $\Omega_{kk}^{-1}\sqrt{T}\,\overline{f_k\epsilon_{ik}}$, we have

$$W = (x'x)^{-1}x'\,\mathrm{Var}(h\gamma_2)\,x(x'x)^{-1} = \sum_{l,k=1}^{K_2}(x'x)^{-1}x'\left\{\gamma_{2k}\gamma_{2l}\,\frac{\Pi_{kl}}{\Omega_{kk}\Omega_{ll}}\right\}x(x'x)^{-1}$$

where $\Pi_{kl}$ is a matrix whose i,j-th element is the limiting value of $\mathrm{Cov}(\sqrt{T}\,\overline{f_k\epsilon_{ik}}, \sqrt{T}\,\overline{f_l\epsilon_{jl}})$ as $T \to \infty$. Q.E.D.

References

Abel, A. (1990). Asset prices under habit formation and catching up with the Joneses. Amer. Econom. Rev. Papers Proc. 80, 38-42.
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817-858.
Andrews, D. W. K. and J. C. Monahan (1992). An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica 60, 953-966.
Arrow, K. J. (1970). Essays in the Theory of Risk-Bearing. Amsterdam: North-Holland.
Bansal, R. and S. Viswanathan (1993). No-arbitrage and arbitrage pricing: A new approach. J. Finance 48, 1231-1262.
Bansal, R., D. A. Hsieh and S. Viswanathan (1993). A new approach to international arbitrage pricing. J. Finance 48, 1719-1747.
Becker, G. S. and K. M. Murphy (1988). A theory of rational addiction. J. Politic. Econom. 96, 675-700.
Beja, A. (1971). The structure of the cost of capital under uncertainty. Rev. Econom. Stud. 38(8), 359-368.
Berk, J. B. (1995). A critique of size-related anomalies. Rev. Financ. Stud. 8, 275-286.
Black, F. (1972). Capital market equilibrium with restricted borrowing. J. Business 45, 444-455.
Black, F., M. C. Jensen and M. Scholes (1972). The capital asset pricing model: Some empirical tests. In: Studies in the Theory of Capital Markets, M. C. Jensen, ed., New York: Praeger, 79-121.
Boudoukh, J., M. Richardson and T. Smith (1993). Is the ex ante risk premium always positive? A new approach to testing conditional asset pricing models. J. Financ. Econom. 34, 387-408.
Breeden, D. T. (1979). An intertemporal asset pricing model with stochastic consumption and investment opportunities. J. Financ. Econom. 7, 265-296.
Brown, D. P. and M. R. Gibbons (1985). A simple econometric approach for utility-based asset pricing models. J. Finance 40, 359-381.
Burnside, C. (1994). Hansen-Jagannathan bounds as classical tests of asset-pricing models. J. Business Econom. Statist. 12, 57-79.
Campbell, J. Y. (1987). Stock returns and the term structure. J. Financ. Econom. 18, 373-399.
Campbell, J. Y. and J. Cochrane (1995). By force of habit. Manuscript, Harvard Institute of Economic Research, Harvard University.
Carhart, M., K. Welch, R. Stevens and R. Krail (1995). Testing the conditional CAPM. Working Paper, University of Chicago.
Cecchetti, S. G., P. Lam and N. C. Mark (1994). Testing volatility restrictions on intertemporal marginal rates of substitution implied by Euler equations and asset returns. J. Finance 49, 123-152.

Econometric evaluation of asset pricing models

Chen, N. (1983). Some empirical tests of the theory of arbitrage pricing. J. Finance 38, 1393-1414.
Chen, Z. and P. Knez (1992). A measurement framework of arbitrage and market integration. Working Paper, University of Wisconsin.
Cochrane, J. H. (1996). A cross-sectional test of a production based asset pricing model. Working Paper, University of Chicago.
Cochrane, J. H. and L. P. Hansen (1992). Asset pricing explorations for macroeconomics. In: NBER Macroeconomics Annual 1992, O. J. Blanchard and S. Fischer, eds., Cambridge, Mass.: MIT Press.
Connor, G. (1984). A unified beta pricing theory. J. Econom. Theory 34, 13-31.
Connor, G. and R. A. Korajczyk (1986). Performance measurement with the arbitrage pricing theory: A new framework for analysis. J. Financ. Econom. 15, 373-394.
Constantinides, G. M. (1982). Intertemporal asset pricing with heterogeneous consumers and without demand aggregation. J. Business 55, 253-267.
Constantinides, G. M. (1990). Habit formation: A resolution of the equity premium puzzle. J. Politic. Econom. 98, 519-543.
Constantinides, G. M. and D. Duffie (1994). Asset pricing with heterogeneous consumers. Working Paper, University of Chicago and Stanford University.
Cox, J. C., J. E. Ingersoll, Jr. and S. A. Ross (1985). A theory of the term structure of interest rates. Econometrica 53, 385-407.
Debreu, G. (1959). Theory of Value: An Axiomatic Analysis of Economic Equilibrium. New York: Wiley.
Detemple, J. B. and F. Zapatero (1991). Asset prices in an exchange economy with habit formation. Econometrica 59, 1633-1657.
Dunn, K. B. and K. J. Singleton (1986). Modeling the term structure of interest rates under non-separable utility and durability of goods. J. Financ. Econom. 17, 27-55.
Dybvig, P. H. and J. E. Ingersoll, Jr. (1982). Mean-variance theory in complete markets. J. Business 55, 233-251.
Eichenbaum, M. S., L. P. Hansen and K. J. Singleton (1988). A time series analysis of representative agent models of consumption and leisure choice under uncertainty. Quart. J. Econom. 103, 51-78.
Epstein, L. G. and S. E. Zin (1989). Substitution, risk aversion and the temporal behavior of consumption and asset returns: A theoretical framework. Econometrica 57, 937-969.
Epstein, L. G. and S. E. Zin (1991). Substitution, risk aversion and the temporal behavior of consumption and asset returns. J. Politic. Econom. 99, 263-286.
Evans, M. D. D. (1994). Expected returns, time-varying risk, and risk premia. J. Finance 49, 655-679.
Fama, E. F. and K. R. French (1992). The cross-section of expected stock returns. J. Finance 47, 427-465.
Fama, E. F. and J. D. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. J. Politic. Econom. 81, 607-636.
Ferson, W. E. (1983). Expectations of real interest rates and aggregate consumption: Empirical tests. J. Financ. Quant. Anal. 18, 477-497.
Ferson, W. E. and G. M. Constantinides (1991). Habit persistence and durability in aggregate consumption: Empirical tests. J. Financ. Econom. 29, 199-240.
Ferson, W. E. and S. R. Foerster (1994). Finite sample properties of the generalized method of moments tests of conditional asset pricing models. J. Financ. Econom. 36, 29-55.
Ferson, W. E. and S. R. Foerster (1995). Further results on the small-sample properties of the generalized method of moments: Tests of latent variable models. In: Res. Financ., Vol. 13. Greenwich, Conn.: JAI Press, pp. 91-114.
Ferson, W. E., S. R. Foerster and D. B. Keim (1993). General tests of latent variable models and mean-variance spanning. J. Finance 48, 131-156.
Ferson, W. E. and C. R. Harvey (1991). The variation of economic risk premiums. J. Politic. Econom. 99, 385-415.
Ferson, W. E. and C. R. Harvey (1992). Seasonality and consumption-based asset pricing. J. Finance 47, 511-552.

Ferson, W. E. and R. A. Korajczyk (1995). Do arbitrage pricing models explain the predictability of stock returns? J. Business 68, 309-349.
Ferson, W. E. and J. J. Merrick, Jr. (1987). Non-stationarity and stage-of-the-business-cycle effects in consumption-based asset pricing relations. J. Financ. Econom. 18, 127-146.
Gallant, R. (1987). Nonlinear Statistical Models. New York: Wiley.
Gibbons, M. R. and W. Ferson (1985). Testing asset pricing models with changing expectations and an unobservable market portfolio. J. Financ. Econom. 14, 217-236.
Gorman, W. M. (1953). Community preference fields. Econometrica 21, 63-80.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029-1054.
Hansen, L. P., J. Heaton and E. G. J. Luttmer (1995). Econometric evaluation of asset pricing models. Rev. Financ. Stud. 8, 237-274.
Hansen, L. P. and R. Hodrick (1983). Risk averse speculation in the forward foreign exchange market: An econometric analysis of linear models. In: Exchange Rates and International Macroeconomics, J. A. Frenkel, ed., Chicago: University of Chicago Press.
Hansen, L. P. and R. Jagannathan (1991). Implications of security market data for models of dynamic economies. J. Politic. Econom. 99, 225-262.
Hansen, L. P. and R. Jagannathan (1994). Assessing specification errors in stochastic discount factor models. NBER Technical Working Paper No. 153.
Hansen, L. P. and S. F. Richard (1987). The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models. Econometrica 55, 587-613.
Hansen, L. P. and K. J. Singleton (1982). Generalized instrumental variables estimation of nonlinear rational expectations models. Econometrica 50, 1269-1286.
Hansen, L. P. and K. J. Singleton (1983). Stochastic consumption, risk aversion, and the temporal behavior of asset returns. J. Politic. Econom. 91, 249-265.
Harrison, M. and D. Kreps (1979). Martingales and arbitrage in multi-period securities markets. J. Econom. Theory 20, 381-408.
Harvey, C. R. (1989). Time-varying conditional covariances in tests of asset pricing models. J. Financ. Econom. 24, 289-317.
Harvey, C. R. (1991). The world price of covariance risk. J. Finance 46, 111-157.
Heaton, J. (1995). An empirical investigation of asset pricing with temporally dependent preference specifications. Econometrica 63, 681-717.
Ibbotson Associates (1992). Stocks, bonds, bills, and inflation. 1992 Yearbook. Chicago: Ibbotson Associates.
Jagannathan, R. (1985). An investigation of commodity futures prices using the consumption-based intertemporal capital asset pricing model. J. Finance 40, 175-191.
Jagannathan, R. and Z. Wang (1993). The CAPM is alive and well. Federal Reserve Bank of Minneapolis Research Department Staff Report 165.
Jagannathan, R. and Z. Wang (1996). The conditional CAPM and the cross-section of expected returns. J. Finance 51, 3-53.
Kandel, S. (1984). On the exclusion of assets from tests of the mean-variance efficiency of the market portfolio. J. Finance 39, 63-75.
Kandel, S. and R. F. Stambaugh (1987). On correlations and inferences about mean-variance efficiency. J. Financ. Econom. 18, 61-90.
Lehmann, B. N. and D. M. Modest (1987). Mutual fund performance evaluation: A comparison of benchmarks and benchmark comparisons. J. Finance 42, 233-265.
Leroy, S. F. and R. D. Porter (1981). The present value relation: Tests based on implied variance bounds. Econometrica 49, 555-574.
Lewbel, A. (1989). Exact aggregation and a representative consumer. Quart. J. Econom. 104, 621-633.
Lintner, J. (1965). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Rev. Econom. Statist. 47, 13-37.
Lucas, R. E., Jr. (1978). Asset prices in an exchange economy. Econometrica 46, 1429-1445.
Luttmer, E. (1993). Asset pricing in economies with frictions. Working Paper, Northwestern University.

McElroy, M. B. and E. Burmeister (1988). Arbitrage pricing theory as a restricted nonlinear multivariate regression model. J. Business Econom. Statist. 6, 29-42.
MacKinlay, A. C. (1987). On multivariate tests of the CAPM. J. Financ. Econom. 18, 341-371.
MacKinlay, A. C. and M. P. Richardson (1991). Using generalized method of moments to test mean-variance efficiency. J. Finance 46, 511-527.
MacKinlay, A. C. (1995). Multifactor models do not explain deviations from the CAPM. J. Financ. Econom. 38, 3-28.
Merton, R. C. (1973). An intertemporal capital asset pricing model. Econometrica 41, 867-887.
Merton, R. C. (1980). On estimating the expected return on the market: An exploratory investigation. J. Financ. Econom. 8, 323-361.
Mossin, J. (1966). Equilibrium in a capital asset market. Econometrica 34, 768-783.
Newey, W. (1985). Generalized method of moments specification testing. J. Econometrics 29, 229-256.
Newey, W. K. and K. D. West (1987a). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703-708.
Newey, W. K. and K. D. West (1987b). Hypothesis testing with efficient method of moments estimation. Internat. Econom. Rev. 28, 777-787.
Novales, A. (1992). Equilibrium interest-rate determination under adjustment costs. J. Econom. Dynamic Control 16, 1-25.
Roll, R. (1977). A critique of the asset pricing theory's tests: Part 1: On past and potential testability of the theory. J. Financ. Econom. 4, 129-176.
Ross, S. A. (1976). The arbitrage theory of capital asset pricing. J. Econom. Theory 13, 341-360.
Ross, S. (1977). Risk, return and arbitrage. In: Risk and Return in Finance, I. Friend and J. L. Bicksler, eds. Cambridge, Mass.: Ballinger.
Rubinstein, M. (1974). An aggregation theorem for securities markets. J. Financ. Econom. 1, 225-244.
Rubinstein, M. (1976). The valuation of uncertain income streams and the pricing of options. Bell J. Econom. Mgmt. Sci. 7, 407-425.
Ryder, H. E., Jr. and G. M. Heal (1973). Optimum growth with intertemporally dependent preferences. Rev. Econom. Stud. 40, 1-33.
Shanken, J. (1987). Multivariate proxies and asset pricing relations: Living with the Roll critique. J. Financ. Econom. 18, 91-110.
Shanken, J. (1992). On the estimation of beta-pricing models. Rev. Financ. Stud. 5, 1-33.
Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. J. Finance 19, 425-442.
Sharpe, W. F. (1994). The Sharpe ratio. J. Port. Mgmt. 21, 49-58.
Shiller, R. J. (1979). The volatility of long-term interest rates and expectations models of the term structure. J. Politic. Econom. 87, 1190-1219.
Shiller, R. J. (1981). Do stock prices move too much to be justified by subsequent changes in dividends? Amer. Econom. Rev. 71, 421-436.
Singleton, K. J. (1980). Expectations models of the term structure and implied variance bounds. J. Politic. Econom. 88, 1159-1176.
Snow, K. N. (1991). Diagnosing asset pricing models using the distribution of asset returns. J. Finance 46, 955-983.
Stambaugh, R. F. (1982). On the exclusion of assets from tests of the two-parameter model: A sensitivity analysis. J. Financ. Econom. 10, 237-268.
Sundaresan, S. M. (1989). Intertemporally dependent preferences and the volatility of consumption and wealth. Rev. Financ. Stud. 2, 73-89.
Wheatley, S. (1988). Some tests of international equity integration. J. Financ. Econom. 21, 177-212.
Wheatley, S. M. (1989). A critique of latent variable tests of asset pricing models. J. Financ. Econom. 23, 325-338.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817-838.
Wilson, R. (1968). The theory of syndicates. Econometrica 36, 119-132.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved.

2

Instrumental Variables Estimation of Conditional Beta Pricing Models

Campbell R. Harvey and Chris Kirby

A number of well-known asset pricing models imply that the expected return on an asset can be written as a linear function of one or more beta coefficients that measure the asset's sensitivity to sources of undiversifiable risk. This paper provides an overview of the econometric evaluation of such models using the method of instrumental variables. We present numerous examples that cover both single-beta and multi-beta models. These examples are designed to illustrate the various options available to researchers for estimating and testing beta pricing models. We also examine the implications of a variety of different assumptions concerning the time-series behavior of conditional betas, covariances, and reward-to-risk ratios. The techniques discussed in this paper have applications in other areas of asset pricing as well.

1. Introduction

Asset pricing models often imply that the expected return on an asset can be written as a linear combination of market-wide risk premia, where each risk premium is multiplied by a beta coefficient that measures the sensitivity of the return on the asset to a source of undiversifiable risk in the economy. Indeed, this type of tradeoff between risk and expected return is implied by some of the most famous models in financial economics. The Sharpe (1964)-Lintner (1965) capital asset pricing model (CAPM), the Black (1972) CAPM, the Merton (1973) intertemporal CAPM, the arbitrage pricing theory (APT) of Ross (1976), and the Breeden (1979) consumption CAPM can all be classified under the general heading of beta pricing models. Although these models differ in terms of underlying structural assumptions, each implies a pricing relation that is linear in one or more betas.
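Written out, the generic relation just described takes the form below. The notation is ours and is purely illustrative. $r_j$ is the return on asset $j$ in excess of a riskless (or zero-beta) rate. $\beta_{jk}$ is the asset's sensitivity to the $k$th of $K$ sources of undiversifiable risk, and $\lambda_k$ is the market-wide premium on that source. In the single-beta Sharpe-Lintner case, $K = 1$ and $\lambda_1$ is the expected excess return on the market portfolio.

```latex
E[r_j] = \sum_{k=1}^{K} \beta_{jk}\,\lambda_k , \qquad j = 1, \ldots, n .
```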
The fundamental difference between conditional and unconditional beta pricing models is the specification of the information environment that investors use to form expectations. Unconditional models imply that investors set prices based on an unconditional assessment of the joint probability distribution of future returns. Under such a scenario we can construct an estimate of an investor's expected return on an asset by taking an average of past returns. Conditional models, on the other hand, imply that investors have time-varying expectations concerning the joint probability distribution of future returns. To construct an estimate of an investor's conditional expected return on an asset, we have to use the information available to the investor at time $t-1$ to forecast the return for time $t$.

Both conditional and unconditional models attempt to explain the cross-sectional variation in expected returns. Unconditional models imply that differences in average risk across assets determine differences in average returns; their only time-series prediction is that expected returns are constant. Conditional models have similar cross-sectional implications: differences in conditional risk determine differences in conditional expected returns. But conditional models have implications concerning the time-series properties of expected returns as well. Conditional expected returns vary with changes in conditional risk and with fluctuations in market-wide risk premiums. Because these time-series restrictions apply asset by asset, we can, in theory, test a conditional beta pricing model using a single asset.

Empirical tests of beta pricing models can be interpreted within the familiar framework of mean-variance analysis. Unconditional tests seek to determine whether a certain portfolio is on the efficient portion of the unconditional mean-variance frontier. The unconditional frontier is determined by the unconditional means, variances and covariances of the asset returns. Conditional tests of beta pricing models are designed to answer a similar question: does a certain portfolio lie on the efficient portion of the mean-variance frontier at each point in time? In conditional tests, however, the mean-variance frontier is determined by the conditional means, conditional variances, and conditional covariances of asset returns.
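The link between beta pricing and mean-variance efficiency can be checked numerically in the unconditional case. The sketch below assumes Python with NumPy. The expected excess returns `mu`, the covariance matrix `cov`, and the function names are hypothetical.

With excess returns, every asset's expected return equals its beta on a portfolio times that portfolio's expected excess return exactly when the portfolio weights are proportional to $\Sigma^{-1}\mu$, that is, for the tangency portfolio on the frontier. For any other portfolio the pricing errors are generally nonzero. A conditional test asks the same question date by date, with conditional moments in place of `mu` and `cov`.

```python
import numpy as np

# Hypothetical moments for three risky assets: expected excess returns and covariances.
mu = np.array([0.06, 0.08, 0.11])
cov = np.array([[0.040, 0.012, 0.010],
                [0.012, 0.060, 0.020],
                [0.010, 0.020, 0.090]])

def tangency_weights(mu, cov):
    """Fully invested weights proportional to inv(cov) @ mu."""
    w = np.linalg.solve(cov, mu)
    return w / w.sum()

def beta_pricing_errors(w, mu, cov):
    """mu_j minus beta_j * E[r_p], with beta_j = Cov(r_j, r_p) / Var(r_p)."""
    beta = cov @ w / (w @ cov @ w)
    return mu - beta * (w @ mu)

print(beta_pricing_errors(tangency_weights(mu, cov), mu, cov))  # numerically zero
print(beta_pricing_errors(np.full(3, 1 / 3), mu, cov))          # nonzero: equal weights are not efficient
```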
As a general rule, the rejection of unconditional efficiency does not imply a rejection of conditional mean-variance efficiency. This is easily demonstrated using an example given by Dybvig and Ross (1985) and Hansen and Richard (1987). Suppose we are testing whether the 30-day Treasury bill is unconditionally efficient using monthly data. Unconditionally, the 30-day bill does not lie on the efficient frontier. It is a single risky asset (albeit a low-risk one) whose return has non-zero variance, so it is surely dominated by an appropriately chosen portfolio. At the conditional level, however, the conclusion is much different. Conditionally, the 30-day bill is nominally risk free: at the end of each month we know precisely what its nominal return will be over the next month. Because the conditional variance of the return on the T-bill is zero, it must be conditionally efficient.

A number of different methods have been proposed for testing beta pricing models. This paper focuses on one in particular: the method of instrumental variables. Instrumental variables are a set of data, specified by the econometrician, that proxy for the information that investors use to form expectations. The primary advantage of the instrumental variables approach is that it provides a highly tractable way of characterizing time-varying risk and expected returns. Our discussion of the instrumental variables methodology is organized along the following lines. Section 2 uses the conditional version of the Sharpe (1964)-Lintner (1965) CAPM to illustrate how the instrumental variables approach can be employed to estimate and test single beta models. Section 3 extends the analysis to multi-beta models. Section 4 introduces the technique of latent variables. Section 5 provides an overview of the estimation methodology. The final section offers some brief closing remarks.

2. Single beta models

A. The conditional CAPM

The conditional version of the Sharpe (1964)-Lintner (1965) CAPM is undoubtedly one of the most widely studied conditional beta pricing models. We can express the pricing relation associated with this model as:

$$E[r_{jt} \mid \Omega_{t-1}] = \frac{\mathrm{Cov}[r_{jt}, r_{mt} \mid \Omega_{t-1}]}{\mathrm{Var}[r_{mt} \mid \Omega_{t-1}]}\, E[r_{mt} \mid \Omega_{t-1}], \tag{1}$$

where $r_{jt}$ is the return on portfolio $j$ from time $t-1$ to time $t$ measured in excess of the risk free rate, $r_{mt}$ is the excess return on the market portfolio, and $\Omega_{t-1}$ represents the information set that investors use to form expectations. The ratio of the conditional covariance between the return on portfolio $j$ and the return on the market, $\mathrm{Cov}[r_{jt}, r_{mt} \mid \Omega_{t-1}]$, to the conditional variance of the return on the market, $\mathrm{Var}[r_{mt} \mid \Omega_{t-1}]$, is the conditional beta of portfolio $j$ with respect to the market. At each date, any cross-sectional variation in expected returns can be attributed solely to differences in conditional beta coefficients.

As it stands, the pricing relation shown in (1) is untestable. To make it testable we have to impose additional structure on the model. In particular, we have to specify a model for conditional expectations. Thus any test of (1) will be a joint test of the conditional CAPM and the assumed specification for conditional expectations. In theory any functional form could be used. Let $f(Z_{t-1})$ denote the statistical model that generates conditional expectations, where $Z$ is a set of instrumental variables.
The function $f(\cdot)$ could be a linear regression model, a Fourier flexible form [Gallant (1982)], a nonparametric kernel estimator [Silverman (1986), Harvey (1991), and Beneish and Harvey (1995)], a seminonparametric density [Gallant and Tauchen (1989)], a neural net [Gallant and White (1990)], an entropy encoder [Glodjo and Harvey (1995)], or a polynomial series expansion [Harvey and Kirby (1995)].

Once we take a stand on the functional form of the conditional expectations operator, it is straightforward to construct a test of the conditional CAPM. First we use $f(\cdot)$ to obtain fitted values for the conditional mean of $r_{jt}$. This nails down the left-hand side of the pricing relation in (1). Then we apply $f(\cdot)$ again to get fitted values for the three components on the right-hand side of (1). Combining the fitted values for the conditional mean of $r_{mt}$, those for the conditional covariance between $r_{jt}$ and $r_{mt}$, and those for the conditional variance of $r_{mt}$ yields fitted values for the right-hand side of (1). If the conditional CAPM is valid, then the pricing errors (the differences between the fitted values for the left-hand and right-hand sides of (1)) should be small and unpredictable. This is the basic intuition behind all tests of conditional beta pricing models.

In the presentation that follows we focus on one particular specification for conditional expectations: the linear model. This model, though very simple, has distinct advantages over the many nonlinear alternatives. The linear model is exceedingly easy to implement, and Harvey (1991) shows that it performs well against nonlinear alternatives in out-of-sample forecasting of the market return. In addition, the linear specification is actually more general than it may seem. Recent work has shown that many nonlinear models can be consistently approximated via an expanding sequence of finite-dimensional linear models. Harvey and Kirby (1995) exploit this fact to develop a simple procedure for constructing analytic tests of both single beta and multi-beta pricing models.

B. Linear conditional expectations

The easiest way to motivate the linear specification for conditional expectations is to assume that the joint distribution of the asset returns and instrumental variables is spherically invariant. This class of distributions is analyzed in Vershik (1964), who shows that it is sufficient for linear conditional expectations, and is applied to tests of the conditional CAPM in Harvey (1991). Vershik (1964) provides the following characterization. Consider a set of random variables, $\{x_1, \ldots, x_n\}$, that have finite second moments. Let $H$ denote the linear manifold spanned by this set. If all random variables in $H$ that have the same variance have the same distribution, then: (i) $H$ is a spherically invariant space; (ii) $\{x_1, \ldots, x_n\}$ is spherically invariant; and (iii) every distribution function of any variable in $H$ is a spherically invariant distribution.

The above requirements are satisfied, for example, by both the multivariate normal and multivariate t distributions. A potential disadvantage of Vershik's (1964) definition is that it does not encompass distributions such as the Cauchy, for which the variance is undefined. Blake and Thomas (1968) and Chu (1973) propose a definition for an elliptical class of distributions that addresses this shortcoming. A random vector $x$ is said to have an elliptical distribution if and only if its probability density function $p(x)$ can be expressed as a function of a quadratic form, $p(x) = f(x'C^{-1}x)$, where $C$ is positive definite. When the variance-covariance matrix of $x$ exists, it is proportional to $C$, and the Vershik (1964), Blake and Thomas (1968) and Chu (1973) definitions are equivalent.² But the quadratic form of the density also covers distributions such as the Cauchy that imply linear conditional expectations, where the projection constants depend on the characteristic matrix $C$.

² Implicit in Chu's (1973) definition is the existence of the density function. Kelker (1970) provides an alternative approach in terms of the characteristic function. See also Devlin, Gnanadesikan and Kettenring (1976).
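With the linear specification, the fitted-value procedure described for the conditional CAPM reduces to a handful of least-squares projections. The sketch below assumes Python with NumPy. The function names, the simulated data-generating process, and the choice to estimate the conditional covariance and variance by projecting products of forecast errors on $Z_{t-1}$ are illustrative assumptions, not the authors' code. A linear fit of a variance is not guaranteed to stay positive, so this is a sketch rather than a robust estimator.

```python
import numpy as np

def linear_fit(Z, y):
    """Fitted values Z @ coef from an OLS projection of y on the instruments Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Z @ coef

def fitted_pricing_errors(rj, rm, Z):
    """Fitted left-hand side of (1) minus fitted right-hand side, date by date.

    rj, rm : length-T excess returns on portfolio j and on the market
    Z      : T x ell instruments dated t-1 (first column a constant)
    """
    mean_j = linear_fit(Z, rj)            # conditional mean of r_jt (left-hand side)
    mean_m = linear_fit(Z, rm)            # conditional mean of r_mt
    uj, um = rj - mean_j, rm - mean_m     # forecast errors
    cov_jm = linear_fit(Z, uj * um)       # conditional covariance of r_jt and r_mt
    var_m = linear_fit(Z, um * um)        # conditional variance of r_mt (positivity not enforced)
    return mean_j - cov_jm / var_m * mean_m

def simulate(alpha, T=5000, seed=0):
    """Hypothetical data: E[r_m | Z] = 0.5 + 0.3 z and r_j = alpha + 1.2 r_m + noise."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)
    Z = np.column_stack([np.ones(T), z])
    rm = 0.5 + 0.3 * z + 2.0 * rng.standard_normal(T)
    rj = alpha + 1.2 * rm + rng.standard_normal(T)
    return rj, rm, Z

errors_null = fitted_pricing_errors(*simulate(alpha=0.0))
errors_alt = fitted_pricing_errors(*simulate(alpha=1.0))
print(errors_null.mean(), errors_alt.mean())
```

When the simulated model holds (`alpha = 0`), the average fitted pricing error is close to zero. Adding a constant `alpha` to $r_{jt}$ shifts it to roughly `alpha`.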

C. A general framework for testing the CAPM

A linear specification for conditional expectations implies that the return on portfolio $j$ can be written as:

$$r_{jt} = Z_{t-1}\delta_j + u_{jt}, \tag{2}$$

where $u_{jt}$ is the error in forecasting the return on portfolio $j$ at time $t$, $Z_{t-1}$ is a row vector of $\ell$ instrumental variables, and $\delta_j$ is an $\ell \times 1$ vector of time-invariant weights. Substituting the expression shown in (2) into equation (1) yields the restriction:

$$Z_{t-1}\delta_j = \frac{E[u_{jt}u_{mt} \mid Z_{t-1}]}{E[u_{mt}^2 \mid Z_{t-1}]}\, Z_{t-1}\delta_m, \tag{3}$$

where $u_{mt}$ is the error in forecasting the return on the market portfolio. Note that both the variance term, $E[u_{mt}^2 \mid Z_{t-1}]$, and the covariance term, $E[u_{jt}u_{mt} \mid Z_{t-1}]$, are conditioned on $Z_{t-1}$ rather than on $\Omega_{t-1}$. Therefore, the pricing relation in (3) should be regarded as an approximation. This is the case because the expectation of the true conditional covariance, given $Z_{t-1}$, is not the covariance conditioned on $Z_{t-1}$. The two are connected via the relation:

$$E\big[\mathrm{Cov}(r_{jt}, r_{mt} \mid \Omega_{t-1}) \mid Z_{t-1}\big] = \mathrm{Cov}(r_{jt}, r_{mt} \mid Z_{t-1}) - \mathrm{Cov}\big(E[r_{jt} \mid \Omega_{t-1}], E[r_{mt} \mid \Omega_{t-1}] \mid Z_{t-1}\big).$$

An analogous relation holds for the true conditional variance of $r_{mt}$ and the variance conditioned on $Z_{t-1}$. Because the true information set $\Omega$ is unobservable, there is no way to construct a test of the original version of the pricing restriction.

If we multiply both sides of (3) by the conditional variance of the return on the market portfolio, we obtain the restriction:

$$E[u_{mt}^2 Z_{t-1}\delta_j \mid Z_{t-1}] = E[u_{jt}u_{mt} Z_{t-1}\delta_m \mid Z_{t-1}]. \tag{4}$$

Notice that the conditional expected returns on both the market portfolio and portfolio $j$ have been moved inside the expectations operator. This can be done because both of these quantities are known conditional on $Z_{t-1}$. As a result, we do not need to specify an explicit model for the conditional variance and covariance terms. We simply note that, under the null hypothesis, the disturbance

$$e_{jt} = u_{mt}^2 Z_{t-1}\delta_j - u_{jt}u_{mt}Z_{t-1}\delta_m \tag{5}$$

should have mean zero and be uncorrelated with the instrumental variables.
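The orthogonality property of (5) is easy to see in a simulation. The sketch below assumes Python with NumPy and a hypothetical data-generating process in which $E[r_{mt} \mid Z_{t-1}] = Z_{t-1}\delta_m$ and $r_{jt} = \alpha + 1.2\,r_{mt} + \text{noise}$, so the conditional CAPM holds when $\alpha = 0$. It evaluates $e_{jt}$ at the true coefficients (no estimation) and reports the sample analogue of $E[e_{jt}Z_{t-1}']$. With $\alpha \neq 0$, the first element moves to roughly $\alpha\,\mathrm{Var}(u_{mt})$, which is 2 for the values used here. All names and numbers are our own.

```python
import numpy as np

def capm_disturbance(rj, rm, Z, delta_j, delta_m):
    """e_jt of (5): u_mt^2 Z_{t-1} delta_j - u_jt u_mt Z_{t-1} delta_m."""
    mean_j, mean_m = Z @ delta_j, Z @ delta_m
    uj, um = rj - mean_j, rm - mean_m
    return um**2 * mean_j - uj * um * mean_m

def simulate(alpha, beta=1.2, T=200_000, seed=0):
    """Hypothetical data: E[r_m | Z] = Z @ [0.5, 0.3], r_j = alpha + beta * r_m + noise."""
    rng = np.random.default_rng(seed)
    Z = np.column_stack([np.ones(T), rng.standard_normal(T)])
    delta_m = np.array([0.5, 0.3])
    rm = Z @ delta_m + 2.0 * rng.standard_normal(T)      # Var(u_m) = 4
    rj = alpha + beta * rm + rng.standard_normal(T)
    delta_j = beta * delta_m + np.array([alpha, 0.0])    # true linear conditional mean of r_j
    return rj, rm, Z, delta_j, delta_m

def orthogonality_check(alpha):
    """Sample analogue of E[e_jt Z_{t-1}'] at the true coefficients."""
    rj, rm, Z, delta_j, delta_m = simulate(alpha)
    e = capm_disturbance(rj, rm, Z, delta_j, delta_m)
    return (e[:, None] * Z).mean(axis=0)

print(orthogonality_check(0.0))   # both elements near zero
print(orthogonality_check(0.5))   # first element near 0.5 * 4 = 2
```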
If we divide $e_{jt}$ by the conditional variance of the market return, the resulting quantity can be interpreted as the deviation of the observed return from the return predicted by the model. Thus $e_{jt}$ is essentially just a pricing error. A negative pricing error implies that the model is overpricing, while a positive pricing error indicates that the model is underpricing.

The generalized method of moments (GMM), which is discussed in detail in Section 5, provides a direct way to test the above restriction. Suppose we have a total of $n$ assets. We can stack the disturbances in (2) and the pricing errors in (5) into the $(2n+1) \times 1$ vector:

$$\varepsilon_t = (u_t \;\; u_{mt} \;\; e_t)' = \begin{pmatrix} [r_t - Z_{t-1}\delta]' \\ [r_{mt} - Z_{t-1}\delta_m]' \\ [u_{mt}^2 Z_{t-1}\delta - u_{mt}u_t Z_{t-1}\delta_m]' \end{pmatrix}, \tag{6}$$

where $u_t$ is the $1 \times n$ vector of innovations in the conditional means of the portfolio returns $r_t$, $\delta$ is the $\ell \times n$ matrix whose columns are the $\delta_j$, and $e_t$ is the $1 \times n$ vector of pricing errors. The conditional CAPM implies that $\varepsilon_t$ should be uncorrelated with $Z_{t-1}$. So if we form the Kronecker product of $\varepsilon_t$ with the vector of instrumental variables,

$$\varepsilon_t \otimes Z_{t-1}', \tag{7}$$

and take unconditional expectations, we obtain the vector of orthogonality conditions:

$$E[\varepsilon_t \otimes Z_{t-1}'] = 0. \tag{8}$$

With $n$ assets there are $n+1$ columns of innovations for the conditional means and $n$ columns of pricing errors. Thus, with $\ell$ instrumental variables we have $\ell(2n+1)$ orthogonality conditions. Note, however, that there are $\ell(n+1)$ parameters to estimate. This leaves $n\ell$ overidentifying restrictions.³ We can obtain consistent estimates of the $\ell \times n$ matrix of coefficients $\delta$ and the $\ell \times 1$ vector of coefficients $\delta_m$ by minimizing the quadratic objective function:

$$J_T = g_T' S_T^{-1} g_T, \tag{9}$$

where

$$g_T = \frac{1}{T}\sum_{t=1}^{T} \varepsilon_t \otimes Z_{t-1}', \tag{10}$$

and $S_T$ denotes a consistent estimate of

$$S_0 \equiv \sum_{j=-\infty}^{\infty} E\big[(\varepsilon_t \otimes Z_{t-1}')(\varepsilon_{t-j} \otimes Z_{t-j-1}')'\big]. \tag{11}$$

If the conditional CAPM is true, then $T$ times the minimized value of the objective function converges in distribution to a central chi-square random variable with $n\ell$ degrees of freedom. Thus we can use this criterion as a measure of the overall goodness-of-fit of the model.

³ An econometric specification of this form is explored for New York Stock Exchange returns in Harvey (1989) and Huang (1989), for 17 international equity returns in Harvey (1991), for international bond returns in Harvey, Solnik and Zhou (1995), and for emerging equity market returns in Harvey (1995).
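A compact sketch of how (6)-(11) might be taken to data, assuming Python with NumPy and SciPy. The following elements are all our assumptions, not the authors' implementation:

- the names `moments`, `gmm_conditional_capm` and `simulate`;
- the hypothetical data-generating process;
- the OLS starting values and the two-step weighting;
- the choice to keep only the $j = 0$ term of (11), which is appropriate if $\varepsilon_t \otimes Z_{t-1}'$ is serially uncorrelated (a Newey-West weighted sum of lagged terms would be the usual alternative).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def moments(theta, r, rm, Z):
    """Rows are eps_t (x) Z_{t-1}' for system (6); shape T x ell*(2n+1)."""
    T, n = r.shape
    ell = Z.shape[1]
    delta = theta[: ell * n].reshape(ell, n)     # ell x n coefficients, one column per portfolio
    delta_m = theta[ell * n:]                    # ell coefficients for the market
    u = r - Z @ delta                            # T x n innovations, as in (2)
    um = rm - Z @ delta_m                        # market innovations
    e = (um**2)[:, None] * (Z @ delta) - (um * (Z @ delta_m))[:, None] * u   # pricing errors (5)
    eps = np.column_stack([u, um, e])            # T x (2n+1)
    return (eps[:, :, None] * Z[:, None, :]).reshape(T, -1)

def gmm_conditional_capm(r, rm, Z):
    """Two-step GMM for (6); returns estimates, J statistic, d.o.f. and p-value."""
    T, n = r.shape
    ell = Z.shape[1]

    def objective(theta, W):
        g = moments(theta, r, rm, Z).mean(axis=0)   # g_T in (10)
        return g @ W @ g                            # quadratic form in (9)

    # Starting values: least squares for the conditional-mean equations.
    delta0 = np.linalg.lstsq(Z, r, rcond=None)[0]
    delta_m0 = np.linalg.lstsq(Z, rm, rcond=None)[0]
    theta = np.concatenate([delta0.ravel(), delta_m0])

    W = np.eye(ell * (2 * n + 1))                   # step 1: identity weights
    theta = minimize(objective, theta, args=(W,), method="BFGS").x
    f = moments(theta, r, rm, Z)
    W = np.linalg.inv(f.T @ f / T)                  # step 2: S_T^{-1}, lag terms of (11) omitted
    theta = minimize(objective, theta, args=(W,), method="BFGS").x
    J = T * objective(theta, W)
    dof = n * ell                                   # ell(2n+1) moments minus ell(n+1) parameters
    return theta, J, dof, chi2.sf(J, dof)

def simulate(alpha, T=2000, seed=1):
    """Hypothetical data: n = 2 portfolios, Z = [1, z]; the CAPM holds when alpha = 0."""
    rng = np.random.default_rng(seed)
    Z = np.column_stack([np.ones(T), rng.standard_normal(T)])
    rm = Z @ np.array([0.5, 0.3]) + 2.0 * rng.standard_normal(T)
    r = alpha + np.outer(rm, [0.8, 1.2]) + rng.standard_normal((T, 2))
    return r, rm, Z

theta, J, dof, p = gmm_conditional_capm(*simulate(alpha=0.0))
print(f"J = {J:.2f} with {dof} d.o.f., p-value = {p:.3f}")
```

In the simulated example, a nonzero `alpha` violates (5), and the $J$ statistic then lies far in the right tail of its $\chi^2(n\ell)$ reference distribution.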

D. Constant conditional betas

The econometric specification shown in (6) assumes that all of the conditional moments (the means, variances and covariances) change through time. If some of these moments are constant, then we can construct more powerful tests of the conditional CAPM by imposing this additional structure. Traditionally, tests of the CAPM have focused on whether expected returns are proportional to the expected return on a benchmark portfolio. We can construct the same type of test within our conditional pricing framework with a specification of the form:

$$\varepsilon_t = (r_t - r_{mt}\beta)', \tag{12}$$

where $\beta$ is a row vector of $n$ beta coefficients. The coefficient $\beta_j$ represents the ratio of the conditional covariance between the return on portfolio $j$ and the return on the benchmark to the conditional variance of the benchmark return. Typically, we think of $r_{mt}$ as a proxy for the market portfolio. It is important to note, however, that the beta coefficients in (12) are left unrestricted. Thus (12) can also be interpreted as a test of a single factor latent variables model.⁴ In the latent variables framework, $\beta_j$ represents the ratio of the conditional covariance between the return on portfolio $j$ and an unobserved factor to the conditional covariance between the return on the benchmark portfolio and this factor. The testable implication is that $E[\varepsilon_t \mid Z_{t-1}] = 0$, where $\varepsilon_t$ is the vector of pricing errors associated with the constant conditional beta model. There are $n\ell$ orthogonality conditions and $n$ parameters to estimate, so we have $n(\ell-1)$ overidentifying restrictions.

Of course, we can easily incorporate the restrictions on the conditional beta coefficients (that each $\beta_j$ equals the ratio of the conditional covariance to the conditional variance) by changing the specification to:

$$\varepsilon_t = (u_t \;\; u_{mt} \;\; b_t \;\; e_t)' = \begin{pmatrix} [r_t - Z_{t-1}\delta]' \\ [r_{mt} - Z_{t-1}\delta_m]' \\ [u_{mt}^2\beta - u_{mt}u_t]' \\ [r_t - r_{mt}\beta]' \end{pmatrix}, \tag{13}$$

where $b_t$ is the disturbance vector associated with the constant conditional beta assumption.
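Because the pricing errors in (12) are linear in $\beta$, the GMM estimator of that specification has a closed form and needs no numerical optimizer. The following is a sketch under our own assumptions and is not the authors' code:

- Python with NumPy;
- a two-step weighting matrix that keeps only the $j = 0$ term of (11);
- a hypothetical simulated data set with $n = 2$ portfolios and $\ell = 3$ instruments.

It also makes the degrees of freedom of the test, $n(\ell-1)$, explicit.

```python
import numpy as np

def constant_beta_gmm(r, rm, Z):
    """Two-step linear GMM for eps_t = r_t - r_mt * beta with instruments Z_{t-1}.

    r  : T x n excess returns on the test portfolios
    rm : length-T excess return on the benchmark
    Z  : T x ell instruments dated t-1
    Returns (beta_hat, J, dof) with dof = n * (ell - 1).
    """
    T, n = r.shape
    ell = Z.shape[1]
    # Sample moments are linear in beta: g_T(beta) = a - B @ beta.
    a = (r.T @ Z / T).ravel()                          # stacked means of r_jt * Z_{t-1}
    B = np.kron(np.eye(n), (Z.T @ rm / T)[:, None])    # block diagonal, (n*ell) x n

    def solve(W):
        return np.linalg.solve(B.T @ W @ B, B.T @ W @ a)

    beta = solve(np.eye(n * ell))                      # step 1: identity weights
    e = r - np.outer(rm, beta)                         # pricing errors (12)
    f = (e[:, :, None] * Z[:, None, :]).reshape(T, -1)
    W = np.linalg.inv(f.T @ f / T)                     # step 2: S_T^{-1}, lag terms omitted
    beta = solve(W)
    g = a - B @ beta
    return beta, T * g @ W @ g, n * (ell - 1)

def simulate(alpha, T=5000, seed=2):
    """Hypothetical data: n = 2 portfolios, ell = 3 instruments; model holds when alpha = 0."""
    rng = np.random.default_rng(seed)
    Z = np.column_stack([np.ones(T), rng.standard_normal((T, 2))])
    rm = Z @ np.array([0.5, 0.3, 0.2]) + 2.0 * rng.standard_normal(T)
    r = alpha + np.outer(rm, [0.8, 1.2]) + rng.standard_normal((T, 2))
    return r, rm, Z

beta_hat, J, dof = constant_beta_gmm(*simulate(alpha=0.0))
print(beta_hat, J, dof)
```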
Tests based on this specification may shed additional light on the plausibility of the assumption of constant conditional betas. With $n$ assets there are $n+1$ columns of innovations in the conditional means, $n$ columns in $b$ and $n$ columns in $e$. Thus there are $\ell(3n+1)$ orthogonality conditions, $\ell(n+1)+n$ parameters to estimate, and $n(2\ell-1)$ overidentifying restrictions.

[4] See, for example, Hansen and Hodrick (1983), Gibbons and Ferson (1985) and Ferson (1990).

E. Constant conditional reward-to-risk ratio

Another formulation of the conditional CAPM assumes that the conditional reward-to-risk ratio is constant. The conditional reward-to-risk ratio, $E[r_{mt}\mid\Omega_{t-1}]/\mathrm{Var}[r_{mt}\mid\Omega_{t-1}]$, is simply the price of covariance risk. This version of the conditional CAPM is examined in Campbell (1987) and Harvey (1989).

42 C. R. Harvey and C. Kirby

The vector of pricing errors for the model becomes:

$$e_t = r_t - \lambda\, u_{mt}u_t, \tag{14}$$

where $\lambda$ is the conditional expected return on the market divided by its conditional variance. To complete the econometric specification we have to include models for the conditional means. The overall system is:

$$\varepsilon_t = (u_t\;\; u_{mt}\;\; e_t)' = \begin{pmatrix} [r_t - Z_{t-1}\delta]' \\ [r_{mt} - Z_{t-1}\delta_m]' \\ [r_t - \lambda(u_{mt}u_t)]' \end{pmatrix}. \tag{15}$$

With $n$ assets there are $n+1$ columns of innovations in the conditional means and $n$ columns in $e$. Thus with $\ell$ instrumental variables there are $\ell(2n+1)$ orthogonality conditions and $1+\ell(n+1)$ parameters. This leaves $n\ell-1$ overidentifying restrictions.

One way to simplify the estimation in (15) is to note that $E[u_{mt}u_{jt}\mid Z_{t-1}] = E[u_{mt}r_{jt}\mid Z_{t-1}]$. This follows from the fact that:

$$\begin{aligned} E[u_{mt}u_{jt}\mid Z_{t-1}] &= E[u_{mt}(r_{jt} - Z_{t-1}\delta_j)\mid Z_{t-1}] \\ &= E[u_{mt}r_{jt}\mid Z_{t-1}] - E[u_{mt}Z_{t-1}\delta_j\mid Z_{t-1}] \\ &= E[u_{mt}r_{jt}\mid Z_{t-1}] - E[u_{mt}\mid Z_{t-1}]\,Z_{t-1}\delta_j \\ &= E[u_{mt}r_{jt}\mid Z_{t-1}], \end{aligned}$$

where the last step uses $E[u_{mt}\mid Z_{t-1}] = 0$. As a result, we can drop $n$ of the conditional mean equations. The more parsimonious system is:

$$\varepsilon_t = (u_{mt}\;\; e_t)' = \begin{pmatrix} [r_{mt} - Z_{t-1}\delta_m]' \\ [r_t - \lambda(u_{mt}r_t)]' \end{pmatrix}. \tag{16}$$

Now we have $n+1$ equations and $\ell(n+1)$ orthogonality conditions. With $\ell+1$ parameters there are $n\ell-1$ overidentifying restrictions. The specifications shown in (15) and (16) are asymptotically equivalent, but (16) is computationally more manageable.

The specifications in (15) and (16) do not restrict $\lambda$ to equal the ratio of the conditional mean of the market return to its conditional variance. We can easily add this restriction:

$$\varepsilon_t = (u_t\;\; u_{mt}\;\; m_t\;\; e_t)' = \begin{pmatrix} [r_t - Z_{t-1}\delta]' \\ [r_{mt} - Z_{t-1}\delta_m]' \\ [r_{mt} - \lambda u_{mt}^2]' \\ [r_t - \lambda(u_{mt}r_t)]' \end{pmatrix}, \tag{17}$$

where $m_t$ is the disturbance associated with the constant reward-to-risk assumption. Tests of this specification should shed additional light on the plausibility of the assumption of a constant price of covariance risk. With $n$ assets there are $n$ columns in $u$, one column in $u_m$, one column in $m$ and $n$ columns in $e$. Thus there are $\ell(2n+2)$ orthogonality conditions, $\ell(n+1)+1$ parameters, and $\ell(n+1)-1$ overidentifying restrictions.

Instrumental variables estimation of conditional beta pricing models 43

F. Linear conditional betas

Ferson and Harvey (1994, 1995) explore specifications where the conditional betas are modelled as linear functions of the instrumental variables. We could, for example, specify an econometric system of the form:

$$\begin{aligned} u_{1it} &= r_{it} - Z^{i,w}_{t-1}\delta_i \\ u_{2t} &= r_{mt} - Z^{w}_{t-1}\delta_m \\ u_{3it} &= [u_{2t}^2(Z^{i,w}_{t-1}\kappa_i)' - r_{mt}u_{1it}]' \\ u_{4it} &= \mu_i - Z^{i,w}_{t-1}\delta_i \\ u_{5it} &= (-\alpha_i + \mu_i) - Z^{i,w}_{t-1}\kappa_i(Z^{w}_{t-1}\delta_m)' \end{aligned} \tag{18}$$

where the elements of $Z^{i,w}_{t-1}\kappa_i$ are the fitted conditional betas for portfolio $i$, $\mu_i$ is the mean return on portfolio $i$, and $\alpha_i$ is the difference between the unrestricted mean return and the mean return that incorporates the pricing restriction of the conditional CAPM. Note that (18) uses two sets of instruments. The set used to estimate the conditional mean return on portfolio $i$ and the conditional beta for the portfolio, $Z^{i,w}$, includes both asset-specific ($i$) and market-wide ($w$) instruments. The conditional mean return on the market is estimated using only the market-wide instruments. This yields an exactly identified system of equations.[5]

The intuition behind the system shown in (18) is straightforward. The first two equations follow from our assumption of linear conditional expectations; they represent statistical models for expected returns. The third equation follows from the definition of the conditional beta:

$$\beta_{it} = \left(E[u_{2t}^2\mid Z^{i,w}_{t-1}]\right)^{-1}E[r_{mt}u_{1it}\mid Z^{i,w}_{t-1}]. \tag{19}$$

In (18) the conditional beta is modelled as a linear function of both the asset-specific and market-wide information. The last two equations deliver the average pricing error for the conditional CAPM. Note that $\mu_i$ is the average fitted return from the statistical model. Thus $\alpha_i$ is the difference between the average fitted return from our statistical model and the fitted return implied by the pricing relation of the conditional CAPM. It is analogous to the Jensen $\alpha$.
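To make the structure of (18) concrete, the following Python sketch stacks the five disturbances for a single asset and interacts each one with an instrument set. The function name `moments_linear_beta`, the parameter packing, and the instrument assignment ($u_{1it}$ and $u_{3it}$ with $Z^{i,w}_{t-1}$, $u_{2t}$ with $Z^{w}_{t-1}$, $u_{4it}$ and $u_{5it}$ with a constant) are our assumptions rather than the authors' code. They are chosen so that there is exactly one moment condition per parameter, matching the exact identification described above.

```python
import numpy as np

def moments_linear_beta(params, r_i, r_m, Z_iw, Z_w):
    """Per-period moment contributions for a one-asset version of system (18).

    Hypothetical packing: params = [delta_i, delta_m, kappa_i, mu_i, alpha_i],
    with delta_i and kappa_i of length l_i and delta_m of length l_w.
    Z_iw (T x l_i) holds asset-specific plus market-wide instruments and
    Z_w (T x l_w) holds market-wide instruments, both dated t-1.
    """
    l_i, l_w = Z_iw.shape[1], Z_w.shape[1]
    delta_i = params[:l_i]
    delta_m = params[l_i:l_i + l_w]
    kappa_i = params[l_i + l_w:2 * l_i + l_w]
    mu_i, alpha_i = params[2 * l_i + l_w:]

    u1 = r_i - Z_iw @ delta_i                 # innovation in asset i's conditional mean
    u2 = r_m - Z_w @ delta_m                  # innovation in the market's conditional mean
    beta_fit = Z_iw @ kappa_i                 # fitted conditional beta
    u3 = u2**2 * beta_fit - r_m * u1          # conditional variance * beta - covariance
    u4 = mu_i - Z_iw @ delta_i                # mean of the fitted return
    u5 = (mu_i - alpha_i) - beta_fit * (Z_w @ delta_m)  # CAPM-implied mean

    # Assumed instrument assignment: one column per moment condition
    return np.column_stack([
        u1[:, None] * Z_iw,
        u2[:, None] * Z_w,
        u3[:, None] * Z_iw,
        u4,
        u5,
    ])
```

Because the moment matrix has as many columns as there are parameters, estimates could be obtained by driving the column means to zero with a nonlinear equation solver, and $\alpha_i$ could then be assessed with the GMM standard errors reviewed in Section 5.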
In contrast to the classical Jensen $\alpha$, both the betas and the risk premiums in the current analysis change through time. Because of the complexity and size of the above system it is difficult to estimate it for more than one asset at a time. Thus, in general, not all the cross-sectional restrictions of the conditional CAPM can be imposed, and it is not possible to report a multivariate test of whether the $\alpha_i$ are equal to zero. Note, however, that (18) does impose one important cross-sectional restriction. Because the system is exactly identified, the market risk premium, $Z^{w}_{t-1}\delta_m$, will be identical for every asset examined. There are no overidentifying restrictions, so tests of the model are based on whether the coefficient $\alpha_i$ is significantly different from zero. Additional insights might be gained by analyzing the time-series properties of the disturbance:

$$u_{6it} = r_{it} - Z^{i,w}_{t-1}\kappa_i(Z^{w}_{t-1}\delta_m)'. \tag{20}$$

Under the null hypothesis, $E[u_{6it}\mid Z^{i,w}_{t-1}]$ is equal to zero. Thus diagnostics can be conducted by regressing $u_{6it}$ on various information variables. We could also construct tests for time variation in the betas based on the coefficient estimates $\kappa_i$ associated with $Z^{i,w}_{t-1}$.

[5] For analysis of related systems see Ferson (1990), Shanken (1990), Ferson and Harvey (1991), Ferson and Harvey (1993), Ferson and Korajczyk (1995), Ferson (1995), Harvey (1995) and Jagannathan and Wang (1996).

3. Models with multiple betas

A. The multi-beta conditional CAPM

The conditional CAPM can easily be generalized to a model that has multiple sources of risk. Consider, for example, a $k$-factor pricing relation of the form:

$$E[r_t\mid Z_{t-1}] = E[f_t\mid Z_{t-1}]\left(E[u_{ft}'u_{ft}\mid Z_{t-1}]\right)^{-1}E[u_{ft}'u_t\mid Z_{t-1}], \tag{21}$$

where $r_t$ is a row vector of $n$ asset returns, $f_t$ is a $1\times k$ vector of factor realizations, $u_{ft}$ is a vector of innovations in the conditional means of the factors, and $u_t$ is a vector of innovations in the conditional means of the returns. The first term on the right-hand side of (21) represents the conditional expectation of the factor realizations; it has dimension $1\times k$. The second term is the inverse of the $k\times k$ conditional variance-covariance matrix of the factors. The final term measures the conditional covariance of the asset returns with the factors; its dimension is $k\times n$.

The multi-beta pricing relation shown in (21) cannot be tested in the same manner as its single-beta counterpart. Recall that in our analysis of single-beta models it was possible to take the conditional variance of the market return to the left-hand side of the pricing relation. As a result, we could move the conditional means inside the expectations operator.
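As a dimension check on (21), the sketch below computes its unconditional sample analogue, replacing each conditional moment with a full-sample moment. That replacement is an assumption made purely for illustration (the conditional model requires the instrumental variables machinery developed in this chapter), and the function name `multibeta_fitted_means` is hypothetical.

```python
import numpy as np

def multibeta_fitted_means(r, f):
    """Unconditional sample analogue of the right-hand side of (21).

    r: T x n asset returns; f: T x k factor realizations.
    Returns the 1 x n row vector mean(f) Var(f)^{-1} Cov(f, r).
    """
    T = r.shape[0]
    u_f = f - f.mean(axis=0)        # factor innovations around constant means
    u_r = r - r.mean(axis=0)        # return innovations around constant means
    var_f = u_f.T @ u_f / T         # k x k factor covariance matrix
    cov_fr = u_f.T @ u_r / T        # k x n factor/return covariances
    # solve() applies Var(f)^{-1} without forming the inverse explicitly
    return f.mean(axis=0)[None, :] @ np.linalg.solve(var_f, cov_fr)
```

The three factors in the product have dimensions $1\times k$, $k\times k$ and $k\times n$, as described above. When returns are exact linear combinations of the factors, the fitted means reproduce the sample mean returns; with noisy returns the gap between the two is an unconditional pricing error.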
Moving the conditional means inside the expectations operator is not possible with a multi-beta specification. We can, however, get around this problem by focusing on specializations of the multi-beta model that parallel those discussed in the previous section. We begin by considering specifications that restrict the conditional betas to be linear functions of the instruments.

B. Linear conditional betas

The multi-beta analogue of the linear conditional beta specification shown in (18) takes the form:

$$\begin{aligned} u_{1it} &= r_{it} - Z^{i,w}_{t-1}\delta_i \\ u_{2t} &= f_t - Z^{w}_{t-1}\delta_f \\ u_{3it} &= [u_{2t}'u_{2t}(Z^{i,w}_{t-1}\kappa_i)' - f_t'u_{1it}]' \\ u_{4it} &= \mu_i - Z^{i,w}_{t-1}\delta_i \\ u_{5it} &= (-\alpha_i + \mu_i) - Z^{i,w}_{t-1}\kappa_i(Z^{w}_{t-1}\delta_f)' \end{aligned} \tag{22}$$

where the elements of $Z^{i,w}_{t-1}\kappa_i$ are the fitted conditional betas associated with the $k$ sources of risk and $f_t$ is a row vector of factor realizations. Note that, as before, the system is exactly identified, and the vector of conditional betas,

$$\beta_{it} = \left(E[u_{2t}'u_{2t}\mid Z^{i,w}_{t-1}]\right)^{-1}E[f_t'u_{1it}\mid Z^{i,w}_{t-1}], \tag{23}$$

is modelled as a linear function, $Z^{i,w}_{t-1}\kappa_i$, of the instruments. This specification can be tested by assessing the statistical significance of the pricing errors and checking whether the disturbance

$$u_{6it} = r_{it} - Z^{i,w}_{t-1}\kappa_i(Z^{w}_{t-1}\delta_f)' \tag{24}$$

is orthogonal to the instruments. The primary advantage of the above formulation is that fitted values are obtained for the risk premiums, the expected returns, and the conditional betas. Thus it is simple to conduct diagnostics that focus on the performance of the model. Its main disadvantage is that it requires a heavy parameterization.

C. Constant conditional reward-to-risk ratios

Harvey (1989) suggests an alternative approach for testing multi-beta pricing relations. His strategy is to assume that the conditional reward-to-risk ratio is constant for each factor. This results in a multi-beta analogue of the specification shown in (15):

$$\varepsilon_t = (u_t\;\; u_{ft}\;\; e_t)' = \begin{pmatrix} [r_t - Z_{t-1}\delta]' \\ [f_t - Z_{t-1}\delta_f]' \\ [r_t - \lambda(u_{ft}'u_t)]' \end{pmatrix}, \tag{25}$$

where $\lambda$ is a row vector of $k$ time-invariant reward-to-risk measures. The above system can be simplified to

$$\varepsilon_t = (u_{ft}\;\; e_t)' = \begin{pmatrix} [f_t - Z_{t-1}\delta_f]' \\ [r_t - \lambda(u_{ft}'r_t)]' \end{pmatrix} \tag{26}$$

using the same approach that allowed us to simplify the single-beta specification discussed earlier.[6]

[6] Kan and Zhang (1995) generalize this formulation by modelling the conditional reward-to-risk ratios as linear functions of the instrumental variables. Their approach eliminates the need for asset-specific instruments and permits joint estimation of the pricing relation using multiple portfolios. But the types of diagnostics that fall out of the linear conditional beta model (fitted expected returns, betas, etc.) are no longer available.
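The simplified system (26) is straightforward to code because $\lambda(u_{ft}'r_t)$ is just the scalar $u_{ft}\lambda'$ multiplying $r_t$. The sketch below, with a hypothetical function name and parameter packing, returns the $T\times\ell(k+n)$ matrix whose row $t$ is $\varepsilon_t\otimes Z_{t-1}$. Counting columns and parameters gives $\ell(k+n) - (\ell k + k) = n\ell - k$ overidentifying restrictions, which reduces to the $n\ell - 1$ of (16) when $k = 1$.

```python
import numpy as np

def moments_constant_price_of_risk(params, r, f, Z):
    """Moment contributions eps_t (x) Z_{t-1} for a sketch of system (26).

    r: T x n excess returns, f: T x k factors, Z: T x l instruments dated t-1.
    Hypothetical packing: params = [delta_f flattened row by row (l*k), lam (k)].
    """
    T, l = Z.shape
    k = f.shape[1]
    delta_f = params[:l * k].reshape(l, k)
    lam = params[l * k:]
    u_f = f - Z @ delta_f                     # T x k factor innovations
    # lam (u_ft' r_t) equals the scalar (u_ft . lam) times r_t
    e = r - (u_f @ lam)[:, None] * r          # T x n pricing errors
    eps = np.hstack([u_f, e])                 # T x (k + n)
    # Row t is the Kronecker product eps_t (x) Z_{t-1}
    return np.einsum('ti,tj->tij', eps, Z).reshape(T, -1)
```

The column means of this matrix form the sample moment vector that a GMM routine would drive toward zero.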

4. Latent variables models

The latent variables technique introduced by Hansen and Hodrick (1983) and Gibbons and Ferson (1985) provides a rank restriction on the coefficients of the linear specifications that are assumed to describe expected returns. Suppose we assume that the ratio formed by dividing the conditional beta of one asset by the corresponding conditional beta of another asset is constant. Under these circumstances, the $k$-factor conditional beta pricing model implies that all of the variation in expected returns is driven by changes in the $k$ conditional risk premiums. We can still form our estimates of the conditional means by projecting returns on the $\ell$-dimensional vector of instrumental variables. But if all the variation in expected returns is driven by changes in the $k$ risk premiums, then we should not need all $n\ell$ projection coefficients to characterize the time variation in the $n$ returns. Thus the basic idea of the latent variables technique is to test restrictions on the rank of the projection coefficient matrix.

A. Constant conditional beta ratios

First we take the vector of excess returns on our set of portfolios and partition it as:

$$r_t = (r_{1t}\;\vdots\;r_{2t}), \tag{34}$$

where $r_{1t}$ is a $1\times k$ vector of returns on the reference assets and $r_{2t}$ is a $1\times(n-k)$ vector of returns on the test assets. Then we partition the matrix of conditional beta coefficients associated with our multi-factor pricing model accordingly:

$$\beta = (\beta_1\;\vdots\;\beta_2), \tag{35}$$

where $\beta_1$ is $k\times k$ and $\beta_2$ is $k\times(n-k)$. The pricing relation for the multi-beta model tells us that

$$E[r_{1t}\mid Z_{t-1}] = \gamma_t\beta_1 \tag{36}$$

and

$$E[r_{2t}\mid Z_{t-1}] = \gamma_t\beta_2, \tag{37}$$

where $\gamma_t$ is a $1\times k$ vector of time-varying market-wide risk premiums. We can manipulate (36) to obtain the relation $\gamma_t = E[r_{1t}\mid Z_{t-1}]\beta_1^{-1}$. Substituting this expression for $\gamma_t$ into (37) yields the pricing restriction:

$$E[r_{2t}\mid Z_{t-1}] = E[r_{1t}\mid Z_{t-1}]\beta_1^{-1}\beta_2. \tag{38}$$

This says that the conditional expected returns on the test assets are proportional to the conditional expected returns on the reference assets. The constants of proportionality are determined by ratios of conditional betas.
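A small numerical check of (36)-(38) shows both the proportionality restriction and the rank restriction that motivates the latent variables approach. All betas and risk premiums below are arbitrary simulated numbers chosen for illustration, not estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, T = 2, 5, 4                         # factors, assets, periods (illustrative)
beta = rng.normal(size=(k, n))            # constant conditional betas
beta1, beta2 = beta[:, :k], beta[:, k:]   # reference assets, test assets
gamma = rng.normal(size=(T, k))           # time-varying risk premiums, one row per t

mu1 = gamma @ beta1                       # (36): E[r_1t | Z_{t-1}] = gamma_t beta_1
mu2 = gamma @ beta2                       # (37): E[r_2t | Z_{t-1}] = gamma_t beta_2
phi = np.linalg.solve(beta1, beta2)       # beta_1^{-1} beta_2, k x (n - k)

# (38): test-asset means are fixed linear combinations of reference-asset means
restriction_holds = np.allclose(mu2, mu1 @ phi)
# The n expected-return series vary through only k directions
rank = np.linalg.matrix_rank(np.hstack([mu1, mu2]))
print(restriction_holds, rank)
```

The rank of the $T\times n$ matrix of expected returns is $k$, not $n$, which is the property the rank test on the projection coefficients exploits.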

The pricing relation in (38) can be tested in much the same manner as the models discussed earlier.[7] The only real difference is that we no longer have to identify the factors. One possible specification is:

$$\varepsilon_t = (u_{1t}\;\; u_{2t}\;\; e_t)' = \begin{pmatrix} [r_{1t} - Z_{t-1}\delta_1]' \\ [r_{2t} - Z_{t-1}\delta_2]' \\ [Z_{t-1}\delta_2 - Z_{t-1}\delta_1\phi]' \end{pmatrix}, \tag{39}$$

where $\phi = \beta_1^{-1}\beta_2$. There are $k$ columns in $u_{1t}$, $n-k$ columns in $u_{2t}$ and $n-k$ columns in $e_t$. Thus we have $\ell(2n-k)$ orthogonality conditions and $\ell n + k(n-k)$ parameters. This leaves $(\ell-k)(n-k)$ overidentifying restrictions. Note that both the number of instrumental variables and the total number of assets must be greater than the number of factors.

B. Linear conditional covariance ratios

An important disadvantage of (39) is that the ratio of conditional betas, $\phi = \beta_1^{-1}\beta_2$, is assumed to be constant. One way to generalize the latent variables model is to assume that the elements of $\phi$ are linear in the instrumental variables.[8] This assumption follows naturally from the previous specifications that imposed the assumption of linear conditional betas. The resulting latent variables system is:

$$\varepsilon_t = (u_{1t}\;\; u_{2t}\;\; e_t)' = \begin{pmatrix} [r_{1t} - Z_{t-1}\delta_1]' \\ [r_{2t} - Z_{t-1}\delta_2]' \\ [Z_{t-1}\delta_2 - Z_{t-1}\delta_1(\iota\otimes Z^{*}_{t-1})\phi^{*}]' \end{pmatrix}, \tag{40}$$

where $\iota$ is a $k\times 1$ vector of ones. With the original set of instruments the dimension of $\phi^{*}$ in the final set of moment conditions is $\ell(n-k)$ and the system is not identified. Thus the researcher must specify some subset of the original instruments, $Z^{*}$, with dimension $\ell^{*} < \ell$ to be used in the estimation.

Finally, the parameterization in both (39) and (40) can be reduced by substituting the third equation block into the second block. For example, (39) becomes

$$\varepsilon_t = (u_{1t}\;\; e_t)' = \begin{pmatrix} [r_{1t} - Z_{t-1}\delta_1]' \\ [r_{2t} - Z_{t-1}\delta_1\phi]' \end{pmatrix}. \tag{41}$$

In this system, it is not necessary to estimate $\delta_2$.

5. Generalized method of moments estimation

Contemporary empirical research in financial economics makes frequent use of a wide variety of econometric techniques.
The generalized method of moments has proven to be particularly valuable, however, especially in the area of estimating and testing asset pricing models. This section provides an overview of the generalized method of moments (GMM) procedure. We begin by illustrating the intuition behind GMM using a simple example of classical method of moments estimation. This is followed by a brief discussion of the assumptions underlying the GMM approach to estimation and testing, along with a review of some of the key distributional results. For detailed proofs of the consistency and asymptotic normality of GMM estimators see Hansen (1982), Gallant and White (1988), and Pötscher and Prucha (1991a,b).

[7] Harvey, Solnik and Zhou (1995) and Zhou (1995) show how to construct analytic tests of latent variables models.

[8] See Ferson and Foerster (1994).

A. The classical method of moments

The easiest way to illustrate the intuition behind the GMM procedure is to consider a simple example of classical method of moments (CMM) estimation. Suppose we observe a random sample $x_1, x_2, \ldots, x_T$ of $T$ observations drawn from a distribution with probability density function $f(x;\theta)$, where $\theta = [\theta_1, \theta_2, \ldots, \theta_k]'$ denotes a $k\times 1$ vector of unknown parameters. The CMM approach to estimation exploits the fact that in general the $j$th population moment of $x$ about zero,

$$m_j = E[x^j], \tag{42}$$

can be written as a known function of $\theta$. To implement the CMM procedure we first compute the $j$th sample moment of $x$ about zero:

$$\hat m_j = \frac{1}{T}\sum_{i=1}^{T}x_i^j. \tag{43}$$

Then we set the $j$th sample moment equal to the corresponding population moment for $j = 1, 2, \ldots, k$:

$$\begin{aligned} \hat m_1 &= m_1(\theta) \\ \hat m_2 &= m_2(\theta) \\ &\;\;\vdots \\ \hat m_k &= m_k(\theta) \end{aligned} \tag{44}$$

This yields a set of $k$ equations in $k$ unknowns that can be solved to obtain an estimator for the unknown vector $\theta$. Thus the basic idea behind the CMM procedure is to estimate $\theta$ by replacing population moments with their sample analogues.

Now let's take a more concrete version of the above example. Suppose that $x_1, x_2, \ldots, x_T$ is a random sample of size $T$ drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$. To obtain the classical method of moments estimators of $\mu$ and $\sigma^2$ we note that $\sigma^2 = m_2 - (m_1)^2$. This implies that the system of moment equations takes the form:

$$\begin{aligned} \frac{1}{T}\sum_{i=1}^{T}x_i &= \mu \\ \frac{1}{T}\sum_{i=1}^{T}x_i^2 &= \sigma^2 + \mu^2 \end{aligned} \tag{45}$$
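Solving the two equations in (45) takes only a few lines. The sketch below (the function name `cmm_normal` is hypothetical) equates the first two sample moments about zero to $\mu$ and $\sigma^2 + \mu^2$ and solves for the two parameters.

```python
import numpy as np

def cmm_normal(x):
    """Classical method of moments estimates of (mu, sigma^2) for a normal sample.

    Equates the first two sample moments about zero to mu and sigma^2 + mu^2,
    as in (45), and solves the two equations.
    """
    x = np.asarray(x, dtype=float)
    m1_hat = x.mean()                 # first sample moment about zero
    m2_hat = np.mean(x**2)            # second sample moment about zero
    mu_hat = m1_hat                   # from the first equation
    sigma2_hat = m2_hat - m1_hat**2   # from the second equation
    return mu_hat, sigma2_hat
```

For the sample 1, 2, 3, 4 it gives $\hat\mu = 2.5$ and $\hat\sigma^2 = 1.25$, which is the variance computed with divisor $T$ rather than $T-1$.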

Consequently, the CMM estimators for the mean and variance are:

$$\hat\mu = \frac{1}{T}\sum_{i=1}^{T}x_i, \qquad \hat\sigma^2 = \frac{1}{T}\sum_{i=1}^{T}x_i^2 - \left(\frac{1}{T}\sum_{i=1}^{T}x_i\right)^2 = \frac{1}{T}\sum_{i=1}^{T}(x_i - \hat\mu)^2. \tag{46}$$

Notice that these are also the maximum likelihood estimators of $\mu$ and $\sigma^2$.

B. The generalized method of moments

The classical method of moments is just a special case of the generalized method of moments developed by Hansen (1982). This latter procedure provides a general framework for estimation and hypothesis testing that can be used to analyze a wide variety of dynamic economic models. Consider, for example, the class of models that generate conditional moment restrictions of the form:

$$E_t[u_{t+\tau}] = 0, \tag{47}$$

where $E_t[\cdot]$ is the expectations operator conditional on the information set at time $t$, $u_{t+\tau} = h(X_{t+\tau}, \theta_0)$ is an $n\times 1$ vector of disturbance terms, $X_{t+\tau}$ is an $s\times 1$ vector of observable random variables, and $\theta_0$ is an $m\times 1$ vector of unknown parameters. The basic idea behind the GMM procedure is to exploit the moment restrictions in (47) to construct a sample objective function whose minimizer is a consistent and asymptotically normal estimate of the unknown vector $\theta_0$. In order to construct such an objective function, however, we need to make some assumptions about the nature of the data generating process.

Let $Z_t$ denote the date $t$ realization of an $\ell\times 1$ vector of observable instrumental variables. We assume, following Hansen (1982), that the vector process $\{X_t, Z_t\}_{t=-\infty}^{\infty}$ is strictly stationary and ergodic. Note that this assumption rules out a number of features sometimes encountered in economic data, such as deterministic trends, unit roots, and unconditional heteroskedasticity. It accommodates many common forms of conditional heterogeneity, however, and it does not appear to be overly restrictive in most applications.[9] With suitable restrictions on the data generating process in place we can proceed to construct the GMM objective function.
[9] Although it is possible to establish consistency and asymptotic normality of GMM estimators under weaker assumptions, the associated arguments are too complex for an introductory discussion. The interested reader can consult Pötscher and Prucha (1991a,b) for an overview of recent advances in the asymptotic theory of dynamic nonlinear econometric models.

First we form the Kronecker product:

$$f(X_{t+\tau}, Z_t, \theta_0) = u_{t+\tau}\otimes Z_t. \tag{48}$$

Then we note that because $Z_t$ is in the information set at time $t$, the model in (47) implies that:

$$E_t[f(X_{t+\tau}, Z_t, \theta_0)] = 0. \tag{49}$$

Applying the law of iterated expectations to equation (49) yields the unconditional restriction:

$$E[f(X_{t+\tau}, Z_t, \theta_0)] = 0. \tag{50}$$

Equation (50) represents a set of $n\ell$ population orthogonality conditions. The sample analogue of $E[f(X_{t+\tau}, Z_t, \theta)]$,

$$g_T(\theta) = \frac{1}{T}\sum_{t=1}^{T}f(X_{t+\tau}, Z_t, \theta), \tag{51}$$

forms the basis for the GMM objective function. Note that for any given value of $\theta$ the vector $g_T(\theta)$ is just the sample mean of $T$ realizations of the random vector $f(X_{t+\tau}, Z_t, \theta)$. Given that $f(\cdot)$ is continuous and $\{X_t, Z_t\}_{t=-\infty}^{\infty}$ is strictly stationary and ergodic, we have

$$g_T(\theta)\xrightarrow{p}E[f(X_{t+\tau}, Z_t, \theta)] \tag{52}$$

by the law of large numbers. Thus if the economic model is valid the vector $g_T(\theta_0)$ should be close to zero when evaluated for a large number of observations. The GMM estimator of $\theta_0$ is obtained by choosing the value of $\theta$ that minimizes the overall deviation of $g_T(\theta)$ from zero. As long as $E[f(X_{t+\tau}, Z_t, \theta)]$ is continuous in $\theta$ it follows that this estimator is consistent under fairly general regularity conditions.

If the model is exactly identified ($m = n\ell$), the GMM estimator is the value of $\theta$ that sets the sample moments equal to zero. For the more common situation where the model is overidentified ($m < n\ell$), finding a vector of parameters that sets all of the sample moments equal to zero is generally not feasible. It is possible, however, to find a value of $\theta$ that sets $m$ linear combinations of the $n\ell$ sample moment conditions equal to zero. We simply let $A_T$ be an $m\times n\ell$ matrix such that $A_Tg_T(\theta) = 0$ has a well-defined solution. The value of $\theta$ that solves this system of equations is the GMM estimator. Although we have considerable leeway in choosing the weighting matrix $A_T$, Hansen (1982) shows that the variance-covariance matrix of the estimator is minimized by letting $A_T$ equal $D_T'S_T^{-1}$, where $D_T$ and $S_T$ are consistent estimates of

$$D_0 = E\left[\frac{\partial h(X_{t+\tau}, \theta_0)}{\partial\theta'}\otimes Z_t\right] \quad\text{and}\quad S_0 = \sum_{j=-\infty}^{\infty}\Gamma_0(j), \tag{53}$$

with $\Gamma_0(j) = E[f(X_{t+\tau}, Z_t, \theta_0)f(X_{t+\tau-j}, Z_{t-j}, \theta_0)']$. Before considering how to derive this result we first have to establish the asymptotic normality of GMM estimators.

C. Asymptotic normality of GMM estimators

We begin by expressing equation (51) as:

$$\sqrt{T}g_T(\theta) = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}f(X_{t+\tau}, Z_t, \theta). \tag{54}$$

The assumption that $\{X_t, Z_t\}_{t=-\infty}^{\infty}$ is stationary and ergodic, along with standard regularity conditions, implies that a version of the central limit theorem holds. In particular we have that

$$\sqrt{T}g_T(\theta_0)\xrightarrow{d}N(0, S_0), \tag{55}$$

with $S_0$ given by (53). This result allows us to establish the limiting distribution of the GMM estimator $\theta_T$. First we make the following assumptions:

1. The estimator $\theta_T$ converges in probability to $\theta_0$.
2. The weighting matrix $A_T$ converges in probability to $A_0$, where $A_0$ has rank $m$.
3. Define
$$D_T = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{\partial h(X_{t+\tau}, \theta_T)}{\partial\theta'}\otimes Z_t\right). \tag{56}$$
For any $\theta_T$ such that $\theta_T\xrightarrow{p}\theta_0$, the matrix $D_T$ converges in probability to $D_0$, where $D_0$ has rank $m$.

Then we apply the mean value theorem to obtain

$$g_T(\theta_T) = g_T(\theta_0) + D_T^{*}(\theta_T - \theta_0), \tag{57}$$

where $D_T^{*}$ is given by (56) with $\theta_T$ replaced by a vector $\theta_T^{*}$ that lies somewhere within the interval whose endpoints are given by $\theta_T$ and $\theta_0$. Recall that $\theta_T$ is the solution to the system of equations $A_Tg_T(\theta) = 0$. So if we premultiply equation (57) by $A_T$ we have

$$A_Tg_T(\theta_0) + A_TD_T^{*}(\theta_T - \theta_0) = 0. \tag{58}$$

Solving (58) for $(\theta_T - \theta_0)$ and multiplying by $\sqrt{T}$ gives

$$\sqrt{T}(\theta_T - \theta_0) = -[A_TD_T^{*}]^{-1}A_T\sqrt{T}g_T(\theta_0), \tag{59}$$

and by Slutsky's theorem we have

$$\sqrt{T}(\theta_T - \theta_0)\xrightarrow{d}-[A_0D_0]^{-1}A_0\times\{\text{the limiting distribution of }\sqrt{T}g_T(\theta_0)\}. \tag{60}$$

Thus the limiting distribution of the GMM estimator is

$$\sqrt{T}(\theta_T - \theta_0)\xrightarrow{d}N\left(0,\;(A_0D_0)^{-1}A_0S_0A_0'\left[(A_0D_0)^{-1}\right]'\right). \tag{61}$$

Now that we know the limiting distribution of the generic GMM estimator we can determine the best choice for the weighting matrix $A_T$. The natural metric by which to measure our choice is the variance-covariance matrix of the distribution shown in (61). We want, in other words, to choose the $A_T$ that minimizes the variance-covariance matrix of the limiting distribution of the GMM estimator.

D. The asymptotically efficient weighting matrix

The first step in determining the efficient weighting matrix is to note that $S_0$ is symmetric and positive definite. Thus $S_0$ can be written as $S_0 = PP'$ where $P$ is nonsingular, and we can express the variance-covariance matrix in (61) as:

$$\begin{aligned} V &= (A_0D_0)^{-1}A_0S_0A_0'\left[(A_0D_0)^{-1}\right]' \\ &= (A_0D_0)^{-1}A_0P\left((A_0D_0)^{-1}A_0P\right)' \\ &= \left(H + (D_0'S_0^{-1}D_0)^{-1}D_0'(P')^{-1}\right)\left(H + (D_0'S_0^{-1}D_0)^{-1}D_0'(P')^{-1}\right)' \end{aligned} \tag{62}$$

where

$$H = (A_0D_0)^{-1}A_0P - (D_0'S_0^{-1}D_0)^{-1}D_0'(P')^{-1}.$$

At first it may appear a bit odd to define $H$ in this manner, but it simplifies the problem of finding the efficient choice for $A_T$. To see why this is true note that:

$$\begin{aligned} HP^{-1}D_0 &= (A_0D_0)^{-1}A_0PP^{-1}D_0 - (D_0'S_0^{-1}D_0)^{-1}D_0'(P')^{-1}P^{-1}D_0 \\ &= I - I \\ &= 0, \end{aligned} \tag{63}$$

where the second line uses $(P')^{-1}P^{-1} = S_0^{-1}$. As a consequence, the cross-product terms in equation (62) vanish and it reduces to:

$$V = HH' + (D_0'S_0^{-1}D_0)^{-1}. \tag{64}$$

Because $HH'$ is positive semidefinite for any choice of $A_0$, $(D_0'S_0^{-1}D_0)^{-1}$ is the lower bound on the asymptotic variance-covariance matrix of the GMM estimator. It is easily verified by direct substitution that choosing $A_0 = D_0'S_0^{-1}$ makes $H = 0$ and therefore achieves this lower bound.

This completes our review of the distribution theory for GMM estimators. Next we want to consider some of the practical aspects of GMM estimation and see how we might go about testing the restrictions implied by economic models. We begin with a strategy for implementing the GMM procedure.

E. The estimation procedure

To obtain an estimate for the vector of unknown parameters $\theta_0$ we have to solve the system of equations $A_Tg_T(\theta) = 0$. Substituting the optimal choice for the weighting matrix into this expression yields:

$$D_T'S_T^{-1}g_T(\theta) = 0, \tag{65}$$

where $S_T$ is a consistent estimate of the matrix $S_0$. But it is apparent that (65) is just the first-order condition for the problem:

$$\min_{\theta}\;J_T(\theta) = g_T(\theta)'S_T^{-1}g_T(\theta). \tag{66}$$

So given a consistent estimate of $S_0$ we can obtain the GMM estimator for $\theta_0$ by minimizing the quadratic form shown in equation (66).

In order to estimate $\theta_0$ we need a consistent estimate of $S_0$. But, in general, $S_0$ is a function of $\theta_0$. The solution to this dilemma is to perform a two-step estimation procedure. Initially we set $S_T$ equal to the identity matrix and perform the minimization to get a first-stage estimate for $\theta_0$. Although this estimate is not asymptotically efficient it is still consistent. Thus we can use it to construct a consistent estimate of $S_0$. Once we have a consistent estimate of $S_0$ we obtain the second-stage estimate for $\theta_0$ by minimizing the quadratic form shown above.

Let's assume that we have performed the two-step estimation procedure and obtained the efficient GMM estimate of the vector of parameters $\theta_0$. Typically we would like to have some way of evaluating how well the model fits the observed data. One way of obtaining such a goodness-of-fit measure is to construct a test of the overidentifying restrictions.

F. The test for overidentifying restrictions

Suppose the model under consideration is overidentified ($m < n\ell$). Under such circumstances we can develop a test for the overall goodness-of-fit of the model. Recall that by the mean value theorem we can express $g_T(\theta_T)$ as:

$$g_T(\theta_T) = g_T(\theta_0) + D_T^{*}(\theta_T - \theta_0). \tag{67}$$

If we multiply equation (67) by $\sqrt{T}$ and substitute for $\sqrt{T}(\theta_T - \theta_0)$ from equation (59) we obtain:

$$\sqrt{T}g_T(\theta_T) = \left(I - D_T^{*}(A_TD_T^{*})^{-1}A_T\right)\sqrt{T}g_T(\theta_0). \tag{68}$$

Substituting in the optimal choice for $A_T$ yields:

$$\sqrt{T}g_T(\theta_T) = \left(I - D_T^{*}(D_T'S_T^{-1}D_T^{*})^{-1}D_T'S_T^{-1}\right)\sqrt{T}g_T(\theta_0), \tag{69}$$

so that by Slutsky's theorem:

$$\sqrt{T}g_T(\theta_T)\xrightarrow{d}\left(I - D_0(D_0'S_0^{-1}D_0)^{-1}D_0'S_0^{-1}\right)\times N(0, S_0). \tag{70}$$

Because $S_0$ is symmetric and positive definite it can be factored as $S_0 = PP'$, where $P$ is nonsingular. Thus (70) can be written as:

$$\sqrt{T}P^{-1}g_T(\theta_T)\xrightarrow{d}\left(I - P^{-1}D_0(D_0'S_0^{-1}D_0)^{-1}D_0'(P')^{-1}\right)\times N(0, I_{n\ell}). \tag{71}$$

The matrix premultiplying the normal distribution in (71) is idempotent with rank $n\ell - m$. It follows, therefore, that the overidentifying test statistic

$$M_T = Tg_T(\theta_T)'S_0^{-1}g_T(\theta_T) \tag{72}$$

converges to a central chi-square random variable with $n\ell - m$ degrees of freedom. The limiting distribution of $M_T$ remains the same if we use a consistent estimate $S_T$ in place of $S_0$.

Note that in many respects the test for overidentifying restrictions is analogous to the Lagrange multiplier test in maximum likelihood estimation. The GMM estimator of $\theta_0$ is obtained by setting $m$ linear combinations of the $n\ell$ orthogonality conditions equal to zero. Thus there are $n\ell - m$ linearly independent combinations that have not been set equal to zero. Suppose we took these $n\ell - m$ linear combinations of the moment conditions and set them equal to an $(n\ell - m)\times 1$ vector of unknown parameters $\alpha$. The system would then be exactly identified and $M_T$ would be identically equal to zero. Imposing the restriction that $\alpha = 0$ yields the efficient GMM estimator along with a quantity $Tg_T(\theta_T)'S_T^{-1}g_T(\theta_T)$ that can be viewed as the GMM analogue of the score form of the Lagrange multiplier test statistic.

The test for overidentifying restrictions is appealing because it provides a simple way to gauge how well the model fits the data. It would also be convenient, however, to be able to test restrictions on the vector of parameters for the model. As we shall see, such tests can be constructed in a straightforward manner.

G. Hypothesis testing in GMM

Suppose that we are interested in testing restrictions on the vector of parameters of the form:

$$q(\theta_0) = 0, \tag{73}$$

where $q$ is a known $p\times 1$ vector of functions. Let the $p\times m$ matrix $Q_0 = \partial q/\partial\theta'$ denote the Jacobian of $q(\theta)$ evaluated at $\theta_0$. By assumption $Q_0$ has rank $p$. We know that for the efficient choice of the weighting matrix the limiting distribution of the GMM estimator is:

$$\sqrt{T}(\theta_T - \theta_0)\xrightarrow{d}N\left(0, (D_0'S_0^{-1}D_0)^{-1}\right). \tag{74}$$

Thus under fairly general regularity conditions the standard large-sample test criteria are distributed asymptotically as central chi-square random variables with $p$ degrees of freedom when the restrictions hold. Let $\theta_T^{u}$ and $\theta_T^{r}$ denote the unrestricted estimator and the estimator obtained by minimizing $J_T(\theta)$ subject to $q(\theta) = 0$. The Wald test statistic is based on the unrestricted estimator. It takes the form:

$$W_T = Tq(\theta_T^{u})'\left(Q_T(D_T'S_T^{-1}D_T)^{-1}Q_T'\right)^{-1}q(\theta_T^{u}), \tag{75}$$

where $Q_T$, $D_T$ and $S_T$ are consistent estimates of $Q_0$, $D_0$ and $S_0$ computed using $\theta_T^{u}$. The Lagrange multiplier test statistic is constructed using the gradient of $J_T(\theta)$ evaluated at the restricted estimator. It is given by:

$$LM_T = Tg_T(\theta_T^{r})'S_T^{-1}D_T(D_T'S_T^{-1}D_T)^{-1}D_T'S_T^{-1}g_T(\theta_T^{r}), \tag{76}$$

where $D_T$ and $S_T$ are consistent estimates of $D_0$ and $S_0$ computed from $\theta_T^{r}$. The likelihood ratio type test statistic is equal to the difference between the overidentifying test statistics for the restricted and unrestricted estimations:

$$LR_T = T\left(g_T(\theta_T^{r})'S_T^{-1}g_T(\theta_T^{r}) - g_T(\theta_T^{u})'S_T^{-1}g_T(\theta_T^{u})\right). \tag{77}$$

The same estimate $S_T$ must be used for both estimations.

It should be clear from the foregoing discussion that a consistent estimate of $S_0$ is one of the key elements of the GMM approach to estimation and testing. In practice there are a number of different methods for estimating $S_0$, and the appropriate method often depends on the specific characteristics of the model under consideration. The discussion below provides an introduction to heteroskedasticity and autocorrelation consistent estimation of the variance-covariance matrix. A more detailed treatment can be found in Andrews (1991).

H. Robust estimation of the variance-covariance matrix

The variance-covariance matrix of $\sqrt{T}g_T(\theta_0)$ is given by:

$$S_0 = \sum_{j=-\infty}^{\infty}\Gamma_0(j), \tag{78}$$

where $\Gamma_0(j) = E[f(X_{t+\tau}, Z_t, \theta_0)f(X_{t+\tau-j}, Z_{t-j}, \theta_0)']$. Because we have assumed stationarity, this matrix can also be written as:

$$S_0 = \Gamma_0(0) + \sum_{j=1}^{\infty}\left(\Gamma_0(j) + \Gamma_0(j)'\right), \tag{79}$$

using the relation $\Gamma_0(-j) = \Gamma_0(j)'$. Now we want to consider how we might go about estimating $S_0$ consistently. First take the scenario where the vector $f(X_{t+\tau}, Z_t, \theta_0)$ is serially uncorrelated.
Under such circumstances the second term on the right-hand side of equation (79) drops out and

$$\Gamma_T(0) = \frac{1}{T}\sum_{t=1}^{T}f(X_{t+\tau}, Z_t, \theta_T)f(X_{t+\tau}, Z_t, \theta_T)'$$

provides a consistent estimate for $S_0$. The case where $f(\cdot)$ exhibits serial correlation is more complicated. Note that the sum in equation (79) contains an infinite number of terms. It is obviously impossible to estimate each of these terms. One way to proceed would be to treat $f(\cdot)$ as if it were serially correlated for a finite number of lags $L$. Under such circumstances a natural estimator for $S_0$ would be:

$$S_T = \Gamma_T(0) + \sum_{j=1}^{L}\left(\Gamma_T(j) + \Gamma_T(j)'\right), \tag{80}$$

where $\Gamma_T(j) = \frac{1}{T}\sum_{t=j+1}^{T}f(X_{t+\tau}, Z_t, \theta_T)f(X_{t+\tau-j}, Z_{t-j}, \theta_T)'$. As long as the individual $\Gamma_T(j)$ in equation (80) are consistent, the estimator $S_T$ will be consistent provided that $L$ is allowed to increase at a suitable rate as the sample size $T$ increases. But the estimator of $S_0$ in (80) is not guaranteed to be positive semidefinite. This can lead to problems in empirical work. The solution to this difficulty is to calculate $S_T$ as a weighted sum of the $\Gamma_T(j)$ where the weights gradually decline to zero as $j$ increases. If these weights are chosen appropriately then $S_T$ will be both consistent and positive semidefinite.

Suppose we begin by defining the $n\ell(L+1)\times n\ell(L+1)$ partitioned matrix:

$$C_T(L) = \begin{pmatrix} \Gamma_T(0) & \Gamma_T(1)' & \cdots & \Gamma_T(L)' \\ \Gamma_T(1) & \Gamma_T(0) & \cdots & \Gamma_T(L-1)' \\ \vdots & \vdots & \ddots & \vdots \\ \Gamma_T(L) & \Gamma_T(L-1) & \cdots & \Gamma_T(0) \end{pmatrix}. \tag{81}$$

The matrix $C_T(L)$ can always be written in the form $C_T(L) = Y'Y$, where $Y$ is a $(T+L)\times n\ell(L+1)$ partitioned matrix. Take $L = 2$ as an example. Writing $f_t$ for $f(X_{t+\tau}, Z_t, \theta_T)$, the matrix $Y$ is given by:

$$Y = \frac{1}{\sqrt{T}}\begin{pmatrix} 0 & 0 & f_1' \\ 0 & f_1' & f_2' \\ f_1' & f_2' & f_3' \\ \vdots & \vdots & \vdots \\ f_{T-2}' & f_{T-1}' & f_T' \\ f_{T-1}' & f_T' & 0 \\ f_T' & 0 & 0 \end{pmatrix}. \tag{82}$$

From this result it follows that $C_T(L)$ is a positive semidefinite matrix. Next consider the matrix:

$$S_T(L) = \begin{pmatrix} \alpha_0I & \alpha_1I & \cdots & \alpha_LI \end{pmatrix} C_T(L)\begin{pmatrix} \alpha_0I \\ \alpha_1I \\ \vdots \\ \alpha_LI \end{pmatrix}, \tag{83}$$

where the $\alpha_i$ are scalars. Because $S_T(L)$ is the partitioned-matrix equivalent of a quadratic form in a positive semidefinite matrix it must also be positive semidefinite. Equation (83) can be rearranged to show that:

$$S_T(L) = (\alpha_0^2 + \cdots + \alpha_L^2)\Gamma_T(0) + \sum_{j=1}^{L}\left(\sum_{i=0}^{L-j}\alpha_i\alpha_{i+j}\right)\left(\Gamma_T(j) + \Gamma_T(j)'\right). \tag{84}$$

The weighted sum on the right-hand side of equation (84) has the general form of an estimator for the variance-covariance matrix $S_0$. Thus if we select the $\alpha_i$ so that the weights in (84) are a decreasing function of $j$ and we allow $L$ to increase with the sample size at an appropriately slow rate, we obtain a consistent positive semidefinite estimator for $S_0$.

The modified Bartlett weights proposed by Newey and West (1987) have been used extensively in empirical research. Let $w_j$ be the weight placed on $\Gamma_T(j)$ in the calculation of the variance-covariance matrix. The weighting function for modified Bartlett weights takes the form:

$$w_j = \begin{cases} 1 - \dfrac{j}{L+1}, & j = 0, 1, 2, \ldots, L \\[4pt] 0, & j > L \end{cases} \tag{85}$$

where $L$ is the lag truncation parameter. Note that these weights are obtained by setting $\alpha_i = 1/\sqrt{L+1}$ for $i = 0, 1, \ldots, L$. Newey and West (1987) show that if $L$ is allowed to increase with $T$ at a suitable rate, then $S_T$ based on these weights will be a consistent estimator of $S_0$.

Although the weighting scheme proposed by Newey and West (1987) is popular, recent research has shown that other schemes may be preferable. Andrews (1991) explores both the theoretical and empirical performance of a variety of different weighting functions. Based on his results, Parzen weights seem to offer a good combination of analytic tractability and overall performance. The weighting function for Parzen weights is:

$$w_j = \begin{cases} 1 - 6\left(\dfrac{j}{L}\right)^2 + 6\left(\dfrac{j}{L}\right)^3, & 0 \le \dfrac{j}{L} \le \dfrac{1}{2} \\[4pt] 2\left(1 - \dfrac{j}{L}\right)^3, & \dfrac{1}{2} \le \dfrac{j}{L} \le 1 \\[4pt] 0, & \dfrac{j}{L} > 1 \end{cases} \tag{86}$$

The final question we need to address is how to choose the lag truncation parameter $L$ in (86). The simplest strategy is to follow the suggestion of Gallant (1987) and set $L$ equal to the integer closest to $T^{1/5}$. The main advantage of this plug-in approach is that it yields an estimator that depends only on the sample size for the data set in question.
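The estimators in (80), (85) and (86) translate into a few lines of numpy. In the sketch below the function names are hypothetical, the rows of `fvals` stand in for $f(X_{t+\tau}, Z_t, \theta_T)$ evaluated at a first-stage estimate (assumed to be in time order), and the Parzen weights use the argument $j/L$ as in (86). It is an illustration of the formulas, not a production HAC routine.

```python
import numpy as np

def bartlett_weight(j, L):
    # Modified Bartlett weights of Newey and West (1987), equation (85)
    return 1.0 - j / (L + 1.0) if j <= L else 0.0

def parzen_weight(j, L):
    # Parzen weights, equation (86), with argument x = j / L
    x = j / L
    if x <= 0.5:
        return 1.0 - 6.0 * x**2 + 6.0 * x**3
    if x <= 1.0:
        return 2.0 * (1.0 - x)**3
    return 0.0

def hac_covariance(fvals, L, weight=bartlett_weight):
    """Weighted estimate of S_0 from a T x q array of moment contributions."""
    T = fvals.shape[0]
    S = fvals.T @ fvals / T                      # Gamma_T(0)
    for j in range(1, L + 1):
        gamma_j = fvals[j:].T @ fvals[:-j] / T   # Gamma_T(j) = (1/T) sum f_t f_{t-j}'
        S += weight(j, L) * (gamma_j + gamma_j.T)
    return S
```

With Gallant's plug-in rule the truncation lag would be computed as `L = int(round(T ** (1 / 5)))`, which gives $L = 3$ for $T = 500$. Setting `L = 0` returns $\Gamma_T(0)$, the estimator appropriate for serially uncorrelated moment contributions.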
An alternative strategy for choosing $L$, developed by Andrews (1991), may lead to better performance in small samples. He suggests the following data-dependent approach: use the first-stage estimate of $\theta_0$ to construct the sample analogue of $f(X_{t+\tau}, Z_t, \theta_0)$. Then estimate a first-order autoregressive model for each element of this vector. The autocorrelation coefficients, along with the residual variances, can be used to estimate the value of $L$ that minimizes the asymptotic truncated mean-squared error of the estimator. Andrews (1991) presents Monte Carlo results that suggest that estimators of $S_0$ constructed in this manner perform well under most circumstances.
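Putting the pieces of this section together, the sketch below runs the two-step procedure of part E for a linear instrumental variables model $y_t = x_t'\theta + u_t$ with $E[z_tu_t] = 0$, for which the minimization in (66) has a closed-form solution, and then computes the overidentifying statistic (72). The linear model and the function name `linear_gmm_two_step` are illustrative assumptions. $S_T$ is computed as $\Gamma_T(0)$, which is appropriate only when the moment contributions are serially uncorrelated; otherwise a weighted estimator such as (80) with the weights in (85) or (86) would take its place.

```python
import numpy as np

def linear_gmm_two_step(y, X, Z):
    """Two-step GMM for y_t = x_t' theta + u_t with moment conditions E[z_t u_t] = 0.

    y: length-T array, X: T x m regressors, Z: T x l instruments (l >= m).
    Returns the second-step estimate and the statistic M_T of (72).
    """
    T = y.shape[0]
    ZX, Zy = Z.T @ X / T, Z.T @ y / T

    def estimate(W):
        # Minimizes g_T' W g_T with g_T = Zy - ZX theta; the first-order
        # condition ZX' W (Zy - ZX theta) = 0 is linear in theta
        return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)

    # Step 1: identity weighting matrix gives a consistent first-stage estimate
    theta1 = estimate(np.eye(Z.shape[1]))
    # Estimate S_0 by Gamma_T(0) from first-stage moment contributions
    f1 = Z * (y - X @ theta1)[:, None]
    S = f1.T @ f1 / T
    # Step 2: efficient weighting matrix S_T^{-1}
    theta2 = estimate(np.linalg.inv(S))
    # Overidentifying restrictions statistic M_T = T g_T' S_T^{-1} g_T
    g = Z.T @ (y - X @ theta2) / T
    J = T * g @ np.linalg.solve(S, g)
    return theta2, J
```

Under the null hypothesis the statistic is approximately chi-square with $\ell - m$ degrees of freedom (here $n = 1$, so $n\ell - m$ reduces to $\ell - m$). When the model is exactly identified the statistic is zero up to rounding error, in line with the discussion of $M_T$ above.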

58 C. R. Harvey and C. Kirby

6. Closing remarks

Asset pricing models often imply that the expected return on an asset can be written as a linear function of one or more beta coefficients that measure the asset's sensitivity to sources of undiversifiable risk in the economy. This linear tradeoff between risk and expected return makes such models both intuitively appealing and analytically tractable. A number of different methods have been proposed for estimating and testing beta pricing models, but the method of instrumental variables is the approach of choice in most situations. The primary advantage of the instrumental variables approach is that it provides a highly tractable way of characterizing time-varying risk and expected returns.

This paper provides an introduction to the econometric evaluation of both conditional and unconditional beta pricing models. We present numerous examples of how the instrumental variables methodology can be applied to various models. We began with a discussion of the conditional version of the Sharpe (1964)-Lintner (1965) CAPM and used it to illustrate how the instrumental variables approach could be used to estimate and test single beta models. Then we extended the analysis to models with multiple betas and introduced the concept of latent variables. We also provided an overview of the generalized method of moments (GMM) approach to estimation and testing. All of the techniques developed in this paper have applications in other areas of asset pricing as well.

References

Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817-858.
Bansal, R. and C. R. Harvey (1995). Performance evaluation in the presence of dynamic trading strategies. Working Paper, Duke University, Durham, NC.
Beneish, M. D. and C. R. Harvey (1995). Measurement error and nonlinearity in the earnings-returns relation. Working Paper, Duke University, Durham, NC.
Black, F. (1972). Capital market equilibrium with restricted borrowing. J. Business 45, 444-454.
Blake, I. F. and J. B. Thomas (1968). On a class of processes arising in linear estimation theory. IEEE Transactions on Information Theory IT-14, 12-16.
Bollerslev, T., R. F. Engle and J. M. Wooldridge (1988). A capital asset pricing model with time varying covariances. J. Politic. Econom. 96, 116-131.
Breeden, D. (1979). An intertemporal asset pricing model with stochastic consumption and investment opportunities. J. Financ. Econom. 7, 265-296.
Campbell, J. Y. (1987). Stock returns and the term structure. J. Financ. Econom. 18, 373-400.
Carhart, M. and R. J. Krail (1994). Testing the conditional CAPM. Working Paper, University of Chicago.
Chu, K. C. (1973). Estimation and decision for linear systems with elliptically random processes. IEEE Transactions on Automatic Control AC-18, 499-505.
Cochrane, J. (1994). Discrete time empirical finance. Working Paper, University of Chicago.
Devlin, S. J., R. Gnanadesikan and J. R. Kettenring. Some multivariate applications of elliptical distributions. In: S. Ikeda et al., eds., Essays in probability and statistics. Shinko Tsusho, Tokyo, 365-393.

Dybvig, P. H. and S. A. Ross (1985). Differential information and performance measurement using a security market line. J. Finance 40, 383-400.
Dumas, B. and B. Solnik (1995). The world price of exchange rate risk. J. Finance, 445-480.
Fama, E. F. and J. D. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. J. Politic. Econom. 81, 607-636.
Ferson, W. E. (1990). Are the latent variables in time-varying expected returns compensation for consumption risk? J. Finance 45, 397-430.
Ferson, W. E. (1995). Theory and empirical testing of asset pricing models. In: R. A. Jarrow, W. T. Ziemba and V. Maksimovic, eds., North-Holland, 145-200.
Ferson, W. E., S. R. Foerster and D. B. Keim (1993). General tests of latent variables models and mean-variance spanning. J. Finance 48, 131-156.
Ferson, W. E. and C. R. Harvey (1991). The variation of economic risk premiums. J. Politic. Econom. 99, 285-315.
Ferson, W. E. and C. R. Harvey (1993). The risk and predictability of international equity returns. Rev. Financ. Stud. 6, 527-566.
Ferson, W. E. and C. R. Harvey (1994a). An exploratory investigation of the fundamental determinants of national equity market returns. In: J. Frankel, ed., The internationalization of equity markets. University of Chicago Press, Chicago, 59-138.
Ferson, W. E. and R. A. Korajczyk (1995). Do arbitrage pricing models explain the predictability of stock returns? J. Business, 309-350.
Ferson, W. E. and S. R. Foerster (1994). Finite sample properties of the Generalized Method of Moments in tests of conditional asset pricing models. J. Financ. Econom. 36, 29-56.
Gallant, A. R. (1981). On the bias in flexible functional forms and an essentially unbiased form: The Fourier flexible form. J. Econometrics 15, 211-224.
Gallant, A. R. (1987). Nonlinear statistical models. John Wiley and Sons, NY.
Gallant, A. R. and G. E. Tauchen (1989). Seminonparametric estimation of conditionally constrained heterogeneous processes. Econometrica 57, 1091-1120.
Gallant, A. R. and H. White (1988). A unified theory of estimation and inference for nonlinear dynamic models. Basil Blackwell, NY.
Gallant, A. R. and H. White (1990). On learning the derivatives of an unknown mapping with multilayer feedforward networks. University of California at San Diego.
Gibbons, M. R. and W. E. Ferson (1985). Tests of asset pricing models with changing expectations and an unobservable market portfolio. J. Financ. Econom. 14, 217-236.
Glodjo, A. and C. R. Harvey (1995). Forecasting foreign exchange market returns via entropy coding. Working Paper, Duke University, Durham, NC.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029-1054.
Hansen, L. P. and R. J. Hodrick (1983). Risk averse speculation in the forward foreign exchange market: An econometric analysis of linear models. In: J. A. Frenkel, ed., Exchange rates and international macroeconomics. University of Chicago Press, Chicago, IL.
Hansen, L. P. and R. Jagannathan (1991). Implications of security market data for models of dynamic economies. J. Politic. Econom. 99, 225-262.
Hansen, L. P. and R. Jagannathan (1994). Assessing specification errors in stochastic discount factor models. Unpublished working paper, University of Chicago, Chicago, IL.
Hansen, L. P. and S. F. Richard (1987). The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models. Econometrica 55, 587-613.
Hansen, L. P. and K. J. Singleton (1982). Generalized instrumental variables estimation of nonlinear rational expectations models. Econometrica 50, 1269-1285.
Harvey, C. R. (1989). Time-varying conditional covariances in tests of asset pricing models. J. Financ. Econom. 24, 289-317.
Harvey, C. R. (1991a). The world price of covariance risk. J. Finance 46, 111-157.
Harvey, C. R. (1991b). The specification of conditional expectations. Working Paper, Duke University.

Harvey, C. R. (1995). Predictable risk and returns in emerging markets. Rev. Financ. Stud., 773-816.
Harvey, C. R. and C. Kirby (1995). Analytic tests of factor pricing models. Working Paper, Duke University, Durham, NC.
Harvey, C. R., B. H. Solnik and G. Zhou (1995). What determines expected international asset returns? Working Paper, Duke University, Durham, NC.
Huang, R. D. (1989). Tests of the conditional asset pricing model with changing expectations. Unpublished working paper, Vanderbilt University, Nashville, TN.
Jagannathan, R. and Z. Wang (1996). The CAPM is alive and well. J. Finance 51, 3-53.
Kan, R. and C. Zhang (1995). A test of conditional asset pricing models. Working Paper, University of Alberta, Edmonton, Canada.
Keim, D. B. and R. F. Stambaugh (1986). Predicting returns in the bond and stock market. J. Financ. Econom. 17, 357-390.
Kelker, D. (1970). Distribution theory of spherical distributions and a location-scale parameter generalization. Sankhya, Series A, 419-430.
Kirby, C. (1995). Measuring the predictable variation in stock and bond returns. Working Paper, Rice University, Houston, TX.
Lintner, J. (1965). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Rev. Econom. Statist. 47, 13-37.
Merton, R. C. (1973). An intertemporal capital asset pricing model. Econometrica 41, 867-887.
Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity-consistent covariance matrix. Econometrica 55, 703-708.
Pötscher, B. M. and I. R. Prucha (1991a). Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part I: Consistency and approximation concepts. Econometric Rev. 10, 125-216.
Pötscher, B. M. and I. R. Prucha (1991b). Basic structure of the asymptotic theory in dynamic nonlinear econometric models, part II: Asymptotic normality. Econometric Rev. 10, 253-325.
Ross, S. A. (1976). The arbitrage theory of capital asset pricing. J. Econom. Theory 13, 341-360.
Shanken, J. (1990). Intertemporal asset pricing: An empirical investigation. J. Econometrics 45, 99-120.
Sharpe, W. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. J. Finance 19, 425-442.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. Chapman and Hall, London.
Solnik, B. (1991). The economic significance of the predictability of international asset returns. Working Paper, HEC-School of Management.
Vershik, A. M. (1964). Some characteristic properties of Gaussian stochastic processes. Theory Probab. Appl. 9, 353-356.
White, H. (1980). A heteroskedasticity consistent covariance matrix estimator and a direct test of heteroskedasticity. Econometrica 48, 817-838.
Zhou, G. (1995). Small sample rank tests with applications to asset pricing. J. Empirical Finance 2, 71-94.

G.S. Maddala and C.R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved.

3

Semiparametric Methods for Asset Pricing Models

Bruce N. Lehmann

This paper discusses semiparametric estimation procedures for asset pricing models within the generalized method of moments (GMM) framework. GMM is widely applied in the asset pricing context in its unconditional form, but the conditional mean restrictions implied by asset pricing theory are seldom fully exploited. The purpose of this paper is to take some modest steps toward removing these impediments. The nature of efficient GMM estimation is cast in a language familiar to financial economists: the language of maximum correlation or optimal hedge portfolios. Similarly, a family of beta pricing models provides a natural setting for identifying the sources of efficiency gains in asset pricing applications. My hope is that this modest contribution will facilitate more routine exploitation of attainable efficiency gains.

1. Introduction

Asset pricing relations in frictionless markets are inherently semiparametric. That is, it is commonplace for valuation models to be cast in terms of conditional moment restrictions without additional distributional assumptions. Accordingly, a natural estimation strategy replaces population conditional moments with their sample analogues. Put differently, the generalized method of moments (GMM) framework of Hansen (1982) tightly links the economics and econometrics of asset pricing relations. While applications of GMM abound in the asset pricing literature, empirical workers seldom make full use of the GMM apparatus. In particular, researchers generally employ the unconditional forms of the procedures, which do not exploit all of the efficiency gains inherent in the moment conditions implied by asset pricing models.
There are two plausible reasons for this: (1) the information requirements are often sufficiently daunting to make full exploitation seem infeasible, and (2) the literature on efficient semiparametric estimation is somewhat dense. The purpose of this paper is to take some modest steps toward removing these impediments. The nature of efficient GMM estimation is cast in terms familiar to financial economists: the language of maximum correlation or optimal hedge portfolios. Similarly, a family of beta pricing models provides a natural setting for identifying the sources of efficiency gains in asset pricing applications. My hope is that this modest contribution will facilitate more routine exploitation of attainable efficiency gains.

The layout of the paper is as follows. The next section provides an outline of GMM basics with a view toward the subsequent application to asset pricing models. The third section lays out the links between the economics of asset prices when markets do not permit arbitrage opportunities and the econometrics of asset pricing model estimation given the conditional moment restrictions implied by the absence of arbitrage. The general efficiency gains discussed in these two sections are worked out in detail in the fourth section, which documents the sources of efficiency gains in beta pricing models. The final section provides some concluding remarks.

2. Some relevant aspects of the generalized method of moments (GMM)

Before elucidating the links between GMM and asset pricing theory, it is worthwhile to lay out some GMM basics with an eye toward the applications that follow. The coverage is by no means complete. For example, the relevant large sample theory is only sketched (and not laid out rigorously), and what is covered is only a subset of the estimation and inference problems that can be addressed with GMM. The interested reader is referred to the three surveys in Volume 11 of this series, Hall (1993), Newey (1993), and Ogaki (1993), for more thorough coverage and references. The starting point for GMM is a moment restriction of the form:

$$E[g_t(\theta_0) \mid I_{t-1}] = E[g_t(\theta_0)] = 0 \tag{2.1}$$

where $g_t(\theta_0)$ is the conditional mean zero random $q \times 1$ vector in the model, $\theta_0$ is the associated $p \times 1$ vector of parameters in the model, and $I_{t-1}$ is some unspecified information set that at least includes lagged values of $g_t(\theta_0)$.
The restriction to zero conditional mean random variables means that $g_t(\theta_0)$ follows a martingale difference sequence and, thus, is serially uncorrelated.¹ A variety of familiar econometric models take this form. Consider, for example, the linear regression model:

$$y_t = x_t'\theta_0 + \varepsilon_t \tag{2.2}$$

where $y_t$ is the $t$th observation on the dependent variable, $x_t$ is a $p \times 1$ vector of explanatory variables, and $\varepsilon_t$ is a random disturbance term. In this model, suppose that the econometrician observes a vector $z_{t-1}$ for which it is known that $E[\varepsilon_t \mid z_{t-1}] = 0$. Then this model is characterized by the conditional moment condition:

1 The behavior of GMM estimators can be readily established when $g_t(\theta)$ is serially dependent so long as a law of large numbers and central limit theorem apply to its time series average.

$$g_t(\theta_0) = z_{t-1}\varepsilon_t = z_{t-1}(y_t - x_t'\theta_0), \qquad E[z_{t-1}\varepsilon_t \mid I_{t-1}] = E[z_{t-1}\varepsilon_t] = 0. \tag{2.3}$$

When $z_{t-1} = x_t$, this is the linear regression model with possibly stochastic regressors; otherwise, it is an instrumental variables estimator. GMM involves setting sample analogues of these moment conditions as close to zero as possible. Of course, they cannot all be set to zero if the number of linearly independent moment conditions exceeds the number of unknown parameters. Instead, GMM takes $p$ linear combinations of these moment conditions and seeks values of $\theta$ for which these linear combinations are zero.

First, consider the unconditional version of the moment condition, that is, $E[g_t(\theta_0)] = 0$. In order for the model to be identified, assume that $g_t(\theta_0)$ possesses a nonsingular population covariance matrix and that $E[\partial g_t(\theta_0)'/\partial\theta]$ has full row rank. The GMM estimator can be derived in two ways. Following Hansen (1982), the GMM estimator $\hat\theta_T$ minimizes the sample quadratic form based on a sample of $T$ observations on $g_t(\theta_0)$:

$$\min_{\theta}\ \bar g_T(\theta)'\,W_T(\theta_0)\,\bar g_T(\theta); \qquad \bar g_T(\theta) = \frac{1}{T}\sum_{t=1}^{T} g_t(\theta) \tag{2.4}$$

given a positive definite weighting matrix $W_T(\theta_0)$ converging in probability to a positive definite limit $W(\theta_0)$. In this variant, the econometrician chooses $W_T(\theta_0)$ to give the GMM estimator desirable asymptotic properties. Alternatively, we can simply define the estimator $\hat\theta_T$ as the solution to the equation system:

$$A_T(\theta_0)\,\bar g_T(\hat\theta_T) = \frac{1}{T}\sum_{t=1}^{T} A_T(\theta_0)\,g_t(\hat\theta_T) = 0 \tag{2.5}$$

where $A_T(\theta_0)$ is a sequence of $p \times q$ $O_p(1)$ matrices converging to a limit $A(\theta_0)$ with row rank $p$. In this formulation, $A_T(\theta_0)$ is chosen to give the resulting estimator desirable asymptotic properties. The estimating equations for the two variants are, of course, identical in form since:

$$A_T(\theta_0)\,\bar g_T(\hat\theta_T) = G_T(\hat\theta_T)\,W_T(\theta_0)\,\bar g_T(\hat\theta_T) = 0; \qquad G_T(\theta) = \frac{\partial \bar g_T(\theta)'}{\partial\theta} = \frac{1}{T}\sum_{t=1}^{T}\frac{\partial g_t(\theta)'}{\partial\theta} \tag{2.6}$$

For my purpose, equation (2.5) is a more suggestive formulation.
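For the linear instrumental variables moment (2.3), the minimization in (2.4) has a closed form, so the two-step logic is easy to sketch. The NumPy code below is a minimal illustration under stated assumptions: the first step uses $W_T = I$, the second uses the inverse of a no-lag $S_T$ (justified only because $g_t(\theta_0)$ is a martingale difference sequence under (2.1)), and the data-generating process and function names are hypothetical. It also reports Hansen's $J$ statistic for the overidentifying restrictions.

```python
import numpy as np

def linear_gmm(y, X, Z, W):
    """GMM for g_t(theta) = z_{t-1}(y_t - x_t' theta) with weighting matrix W.

    gbar(theta) = Szy - Szx @ theta is linear in theta, so minimizing
    gbar' W gbar has the closed-form solution returned below.
    """
    T = len(y)
    Szx = Z.T @ X / T                    # q x p sample analogue of E[z_{t-1} x_t']
    Szy = Z.T @ y / T                    # q-vector
    A = Szx.T @ W                        # p x q linear combinations, cf. (2.5)-(2.6) up to sign
    return np.linalg.solve(A @ Szx, A @ Szy)

def two_step_gmm(y, X, Z):
    """First step with W = I, second step with W = S_T^{-1}; returns theta, J, avar."""
    T, q = Z.shape
    theta1 = linear_gmm(y, X, Z, np.eye(q))
    g = Z * (y - X @ theta1)[:, None]    # T x q moment contributions at theta1
    S = g.T @ g / T                      # no lag terms: martingale difference moments
    W = np.linalg.inv(S)
    theta2 = linear_gmm(y, X, Z, W)
    gbar = Z.T @ (y - X @ theta2) / T
    J = T * gbar @ W @ gbar              # Hansen's J, approx. chi-square(q - p)
    G = -Z.T @ X / T                     # q x p Jacobian of gbar (transpose of G_T)
    avar = np.linalg.inv(G.T @ W @ G) / T
    return theta2, J, avar

# Hypothetical data: one endogenous regressor, three excluded instruments.
rng = np.random.default_rng(2)
T = 5000
z = rng.standard_normal((T, 3))
v = rng.standard_normal(T)
eps = 0.8 * v + rng.standard_normal(T)          # correlated with x through v
x = z @ np.array([1.0, 0.5, -0.5]) + v
X = np.column_stack([np.ones(T), x])
Z = np.column_stack([np.ones(T), z])
y = X @ np.array([1.0, 2.0]) + eps
theta, J, avar = two_step_gmm(y, X, Z)
```

Here `avar` estimates $[G S^{-1} G']^{-1}/T$ in the chapter's orientation; with serially correlated moments the no-lag $S_T$ would need to be replaced by a HAC estimator.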
The large sample behavior of $\hat\theta_T$ is straightforward, particularly in this case where $g_t(\theta_0)$ is a martingale difference sequence.² An appropriate weak law of large numbers ensures that $\bar g_T(\theta_0) \xrightarrow{p} 0$, which, coupled with the identification conditions, implies that $\hat\theta_T \xrightarrow{p} \theta_0$. So long as the necessary time series averages converge:

2 The standard reference on estimation and inference in this framework is Hansen (1982).

$$S_T(\theta_0) = \frac{1}{T}\sum_{t=1}^{T}E[g_t(\theta_0)g_t(\theta_0)'] \to S(\theta_0), \qquad |S(\theta_0)| > 0, \qquad G_T(\theta_0) \xrightarrow{p} G(\theta_0) \tag{2.7}$$

the standard first order Taylor expansion coupled with Slutsky's theorem yields:

$$\sqrt{T}(\hat\theta_T - \theta_0) = -D(\theta_0)\sqrt{T}\,\bar g_T(\theta_0) + o_p(1); \qquad D(\theta_0) = [A(\theta_0)G(\theta_0)']^{-1}A(\theta_0) \tag{2.8}$$

and an appropriate central limit theorem for martingales ensures that

$$\sqrt{T}(\hat\theta_T - \theta_0) \xrightarrow{d} N[0,\ D(\theta_0)S(\theta_0)D(\theta_0)']. \tag{2.9}$$

Consistent standard error estimates that are robust to conditional heteroskedasticity can be calculated from this expression by replacing $\theta_0$ with $\hat\theta_T$.³

What choice of $A_T(\theta_0)$ or, equivalently, of $W_T(\theta_0)$ is optimal? All fixed weight estimators, that is, those that apply the same matrix $A_T(\theta_0)$ to each $g_t(\theta_0)$ for fixed $T$, are consistent under the weak regularity conditions sketched above. Accordingly, it is natural to compare the asymptotic variances of estimators, a criterion that can, of course, be justified more formally by confining attention to the class of regular estimators that rules out superefficient estimators. The asymptotically optimal $A_T(\theta_0)$ is obtained by equating $W_T(\theta_0)$ with $S_T(\theta_0)^{-1}$, yielding an asymptotic covariance matrix of $[G(\theta_0)S(\theta_0)^{-1}G(\theta_0)']^{-1}$. Once again, $S_T(\theta_0)$ can be estimated consistently by replacing $\theta_0$ with $\hat\theta_T$.⁴

The optimal unconditional GMM estimator has a clear connection with the maximum likelihood estimator (MLE), even though we do not know the probability law generating the data. Let $\mathcal{L}_t(\theta_0, \eta)$ denote the logarithm of the population conditional distribution of the data underlying $g_t(\theta_0)$, where $\eta$ is a possibly infinite dimensional set of nuisance parameters. Similarly, let $\mathcal{L}'_t(\theta_0, \eta)$ denote the true score function, the vector of derivatives of $\mathcal{L}_t(\theta_0, \eta)$ with respect to $\theta$. Consider the unconditional population projection of $\mathcal{L}'_t(\theta_0, \eta)$ on the moment conditions $g_t(\theta_0)$:

3 Autocorrelation is not present under the hypothesis that $g_t(\theta)$ has conditional mean zero and is sampled only once per period (that is, the data are not overlapping). If the data are overlapping, the moment conditions will have a moving average error structure.
See Hansen and Hodrick (1980) for a discussion of covariance matrix estimation in this case and Hansen and Singleton (1982) and Newey and West (1987) for methods appropriate for more general autocorrelation.

4 The possible singularity of $S_T(\theta)$ is discussed indirectly in Section 4.3 as part of the justification for factor structure assumptions. While my focus is not on hypothesis testing, the quadratic form in the fitted value of the moment conditions and the optimal weighting matrix yields the test statistic $T\,\bar g_T(\hat\theta_T)'S_T(\hat\theta_T)^{-1}\bar g_T(\hat\theta_T) \xrightarrow{d} \chi^2(q - p)$, since $p$ degrees of freedom are used in estimating $\theta$. This test of overidentifying conditions is known as Hansen's $J$ test.

$$\mathcal{L}'_t(\theta_0,\eta) = \operatorname{Cov}[\mathcal{L}'_t(\theta_0,\eta),\, g_t(\theta_0)']\operatorname{Var}[g_t(\theta_0)]^{-1}g_t(\theta_0) + v_{\mathcal{L}ut} = -\Phi\Psi^{-1}g_t(\theta_0) + v_{\mathcal{L}ut}; \qquad \Phi = E\left[\frac{\partial g_t(\theta_0)'}{\partial\theta}\right], \quad \Psi = E[g_t(\theta_0)g_t(\theta_0)'] \tag{2.10}$$

since $E[\mathcal{L}'_t(\theta_0,\eta)\,g_t(\theta_0)'] + \Phi$ is zero given sufficient regularity to allow differentiation of the moment condition $E[g_t(\theta_0)] = 0$ under the integral sign. In this notation, the asymptotic variance of the unconditional GMM estimator is $[\Phi\Psi^{-1}\Phi']^{-1}$. Hence, the optimal fixed linear combination of moment conditions $A_T(\theta_0)$ has the largest unconditional correlation with the true, but unknown, conditional score in finite samples. This fact does not lead to finite sample efficiency statements for at least two reasons. First, the MLE itself has no obvious efficiency properties in finite samples outside the case where the score takes the linear form $I(\theta_0)(\hat\theta - \theta_0)$, where $I(\theta_0)$ is the Fisher information matrix. Second, the feasible optimal estimator replaces $\theta_0$ with $\hat\theta_T$ in $A_T(\theta_0)$, yielding a consistent estimator with no obvious finite sample efficiency properties. Nevertheless, the optimal fixed weight GMM estimator retains this optimality property in large samples.

Now consider the conditional version of the moment condition, that is, $E[g_t(\theta_0) \mid I_{t-1}] = 0$. The prior information available to the econometrician is that $g_t(\theta_0)$ is a martingale difference sequence. Hence, the econometrician knows only that linear combinations of the $g_t(\theta_0)$ with weights based on information available at time $t-1$ have zero means; nonlinear functions of $g_t(\theta_0)$ have unknown moments given only the martingale difference assumption. Since the econometrician is free to use time varying weights, consider estimators of the form:⁵

$$\frac{1}{T}\sum_{t=1}^{T} A_{t-1}\,g_t(\hat\theta_T) = 0; \qquad A_{t-1} \in I_{t-1} \tag{2.11}$$

where $A_{t-1}$ is a sequence of $p \times q$ $O_p(1)$ matrices chosen by the econometrician. In order to identify the model, assume $g_t(\theta_0)$ has a nonsingular population conditional covariance matrix $E[g_t(\theta_0)g_t(\theta_0)' \mid I_{t-1}]$ and that $E[\partial g_t(\theta_0)'/\partial\theta \mid I_{t-1}]$ has full row rank.
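A small Monte Carlo sketch can make the role of the time-varying weights in (2.11) concrete. It assumes a hypothetical scalar model ($p = q = 1$) with moment $g_t(\theta) = y_t - \theta x_t$, a regressor $x_t$ known at $t-1$, and a conditional standard deviation $\sigma_t = \exp(0.5\,x_t)$ that is treated as known. Both weight choices below satisfy (2.11) and are consistent; in this example the second, $x_t/\sigma_t^2$, is proportional to $\Phi_{t-1}\Psi_{t-1}^{-1} = -x_t/\sigma_t^2$ (the sign scales out), so it should show the smaller sampling variance. A feasible version would estimate $\sigma_t$ from first-step residuals.

```python
import numpy as np

def solve_weighted_moment(y, x, A):
    """Solve (1/T) sum_t A_{t-1} (y_t - theta x_t) = 0 for scalar theta, i.e. (2.11) with p = q = 1."""
    return np.sum(A * y) / np.sum(A * x)

def simulate(rng, T, theta0=1.0):
    """Hypothetical DGP: y_t = theta0 x_t + sigma_t e_t, with x_t and sigma_t known at t-1."""
    x = rng.standard_normal(T)
    sigma = np.exp(0.5 * x)            # assumed conditional std. dev. of g_t
    y = theta0 * x + sigma * rng.standard_normal(T)
    return y, x, sigma

rng = np.random.default_rng(3)
ols, opt = [], []
for _ in range(500):
    y, x, sigma = simulate(rng, T=200)
    ols.append(solve_weighted_moment(y, x, x))               # A_{t-1} = x_t
    opt.append(solve_weighted_moment(y, x, x / sigma**2))    # A_{t-1} = x_t / sigma_t^2
ols, opt = np.array(ols), np.array(opt)
print("means:", ols.mean(), opt.mean())
print("variances:", ols.var(), opt.var())
```

With a single moment condition, any fixed (unconditional) weight cancels out of the estimating equation, so all of the efficiency gain here comes from letting the weights vary with $I_{t-1}$.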
The basic principles of asymptotically optimal estimation and inference in the conditional and unconditional cases are surprisingly similar, ignoring the difficulties associated with the calculation of conditional expectations $E[\cdot \mid I_{t-1}]$.⁶ Once again, under suitable conditional versions of the regularity conditions sketched above:

5 The estimators could, in principle, involve nonlinear functions of these time series averages, but their asymptotic linearity means that their effect is absorbed in $A_{t-1}$.

6 Hansen (1985), Tauchen (1986), Chamberlain (1987), Hansen, Heaton, and Ogaki (1988), Newey (1990), Robinson (1991), Chamberlain (1992), and Newey (1993) discuss efficient GMM estimation in related circumstances.

$$\frac{1}{T}\sum_{t=1}^{T} A_{t-1}\Phi_{t-1}' \xrightarrow{p} D_c(\theta_0)^{-1}, \quad \Phi_{t-1} = E\left[\frac{\partial g_t(\theta_0)'}{\partial\theta}\,\Big|\, I_{t-1}\right]; \qquad \frac{1}{T}\sum_{t=1}^{T} A_{t-1}\,g_t(\theta_0)g_t(\theta_0)'A_{t-1}' \xrightarrow{p} S_c(\theta_0) \tag{2.12}$$

the sample moment condition (2.11) is asymptotically linear so that:

$$\sqrt{T}(\hat\theta_T - \theta_0) \approx -D_c(\theta_0)\frac{1}{\sqrt{T}}\sum_{t=1}^{T}A_{t-1}g_t(\theta_0) \quad\text{and}\quad \sqrt{T}(\hat\theta_T - \theta_0) \xrightarrow{d} N[0,\ D_c(\theta_0)S_c(\theta_0)D_c(\theta_0)'] \tag{2.13}$$

The econometrician can choose the weighting matrices $A_{t-1}$ to minimize the asymptotic variance of this estimator. The weighting matrices $A^o_{t-1}$ which are optimal in this sense are given by:

$$A^o_{t-1} = \Phi_{t-1}\Psi_{t-1}^{-1}; \qquad \Psi_{t-1} = E[g_t(\theta_0)g_t(\theta_0)' \mid I_{t-1}] \tag{2.14}$$

and the resulting minimal asymptotic variance is:

$$\operatorname{Var}[\sqrt{T}(\hat\theta_T - \theta_0)] \to \left[E\left(\Phi_{t-1}\Psi_{t-1}^{-1}\Phi_{t-1}'\right)\right]^{-1} \tag{2.15}$$

The evaluation of $A^o_{t-1}$ need not be straightforward, and doing so in asset pricing applications is the main preoccupation of Section 4.⁷

The relations between the optimal conditional GMM estimator and the MLE are similar to the relations arising in the unconditional case. The conditional population projection of $\mathcal{L}'_t(\theta_0, \eta)$ on the moment conditions $g_t(\theta_0)$ reveals that:

7 The implementation of this efficient estimator is straightforward given the ability to calculate the relevant conditional expectations. Under weak regularity conditions, the estimator can be implemented in two steps by first obtaining an initial consistent estimate (perhaps using the unconditional GMM estimator (2.5)), estimating the optimal weighting matrix $A^o_{t-1}$ using this preliminary estimate, and then solving (2.14) for the efficient conditional GMM estimator. Of course, equations (2.11) and (2.14) can be iterated until convergence, although the iterative and two step estimators are asymptotically equivalent to first order.

$$\mathcal{L}'_t(\theta_0,\eta) = \operatorname{Cov}[\mathcal{L}'_t(\theta_0,\eta),\, g_t(\theta_0)' \mid I_{t-1}]\operatorname{Var}[g_t(\theta_0) \mid I_{t-1}]^{-1}g_t(\theta_0) + v_{\mathcal{L}ct}$$

since $E[\mathcal{L}'_t(\theta_0,\eta)\,g_t(\theta_0)' + \partial g_t(\theta_0)'/\partial\theta \mid I_{t-1}]$ is zero given sufficient regularity to interchange the order of differentiation and integration of the conditional moment condition $E[g_t(\theta_0) \mid I_{t-1}] = 0$. Hence, the optimal linear combination of moment conditions $A^o_{t-1}$ has the largest conditional correlation with the true, but unknown, conditional score in finite samples. While this observation does not translate into clear finite sample efficiency statements, the GMM estimator based on $A^o_{t-1}$ is that which is most highly correlated with the MLE asymptotically.

It is easy to characterize the relative efficiency of the optimal conditional and unconditional GMM estimators. As is usual, the variance of the difference between the optimal unconditional and conditional GMM estimators is the difference in their variances, since the latter is efficient relative to the former. The difference in the optimal weights given to the martingale increments $g_t(\theta_0)$ is:

$$A^o_{t-1} - A(\theta_0) = [\Phi_{t-1} - G(\theta_0)]\Psi_{t-1}^{-1} + G(\theta_0)[\Psi_{t-1}^{-1} - S(\theta_0)^{-1}] = [\Phi_{t-1} - \Phi]\Psi_{t-1}^{-1} + \Phi[\Psi_{t-1}^{-1} - \Psi^{-1}].$$

Note that the law of iterated expectations applies to both $\Phi_{t-1}$ and $\Psi_{t-1}$ separately but not to the composite $A^o_{t-1}$, so that $E[A^o_{t-1} - A(\theta_0)]$ does not generally converge to zero. In any event, the relative efficiency of the conditional estimator is higher when there is considerable time variation in both $\Phi_{t-1}$ and $\Psi_{t-1}$.

Finally, the conventional application of the GMM procedure lies somewhere between the conditional and unconditional cases. It involves the observation that zero conditional mean random variables are also uncorrelated with the elements of the information set. Let $Z_{t-1} \in I_{t-1}$ denote an $r \times q$ ($r > p$) matrix of predetermined variables and consider the revised moment conditions

$$E[Z_{t-1}g_t(\theta_0) \mid I_{t-1}] = E[Z_{t-1}g_t(\theta_0)] = 0 \quad \forall\, Z_{t-1} \in I_{t-1}. \tag{2.18}$$

In the unconditional GMM procedure discussed above, $Z_{t-1}$ is $I_q$, the $q \times q$ identity matrix. In many applications, the same predetermined variables $z_{t-1}$ multiply each element of $g_t(\theta_0)$, so that $Z_{t-1}$ takes the form $I_q \otimes z_{t-1}$. Finally, different subsets of the information available to the econometrician, $z_{it-1} \in I_{t-1}$, can be applied to each element of $g_t(\theta_0)$, so that $Z_{t-1}$ is given by

$$Z_{t-1} = \begin{pmatrix} z_{1t-1} & 0 & \cdots & 0\\ 0 & z_{2t-1} & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & z_{qt-1} \end{pmatrix} \tag{2.19}$$

While optimal conditional GMM can be applied in this case, the main point of this procedure is to modify unconditional GMM. As before, the unconditional population projection of $\mathcal{L}'_t(\theta_0, \eta)$ on the moment conditions $Z_{t-1}g_t(\theta_0)$ yields

$$\mathcal{L}'_t(\theta_0,\eta) = \operatorname{Cov}[\mathcal{L}'_t(\theta_0,\eta),\, g_t(\theta_0)'Z_{t-1}']\operatorname{Var}[Z_{t-1}g_t(\theta_0)]^{-1}Z_{t-1}g_t(\theta_0) + v_{\mathcal{L}uZt} = -\Phi_Z\Psi_Z^{-1}Z_{t-1}g_t(\theta_0) + v_{\mathcal{L}uZt}; \qquad \Psi_Z = E\{Z_{t-1}g_t(\theta_0)g_t(\theta_0)'Z_{t-1}'\} \tag{2.20}$$

since $E\{\mathcal{L}'_t(\theta_0,\eta)\,g_t(\theta_0)'Z_{t-1}'\} = -\Phi_Z$ given sufficient regularity to allow differentiation under the integral sign. The weights $\Phi_Z\Psi_Z^{-1}Z_{t-1}$ can also be viewed as a linear approximation to the optimal conditional weights $A^o_{t-1} = \Phi_{t-1}\Psi_{t-1}^{-1}$. Put differently, $A^o_{t-1}$ would generally be a nonlinear function of $Z_{t-1}$ if $Z_{t-1}$ were the relevant conditioning information from the perspective of the econometrician.

3. Asset pricing relations and their econometric implications

Modern asset pricing theory follows from the restrictions on security prices that arise when markets do not permit arbitrage opportunities. That the absence of arbitrage implies substantive restrictions is somewhat surprising. Outside of international economics, it is not commonplace for the notion that two eggs should sell for the same price in the absence of transactions costs to yield meaningful economic restrictions on egg prices; after all, two eggs of equal grade and freshness are obviously perfect substitutes.⁸ By contrast, the no-arbitrage assumption yields economically meaningful restrictions on asset prices because of the nature of close substitutes in financial markets. Different assets or, more generally, portfolios of assets may be perfect substitutes in terms of their random payoffs, but this might not be obvious by inspection since the assets may represent claims on seemingly very different cash flows. The asset pricing implications of the absence of arbitrage have been elucidated in a number of papers, including Rubinstein (1976), Ross (1978b), Harrison and Kreps (1979), Chamberlain and Rothschild (1983), and Hansen and Richard (1987).

Consider trade in a securities market on two dates: date $t-1$ (i.e., today) and date $t$ (i.e., tomorrow). There are $N$ risky assets, indexed by $i = 1, \ldots, N$, which need not exhaust the asset menu available to investors.
The nominal price of asset $i$ today is $P_{it-1}$. Its value tomorrow, that is, its price tomorrow plus any cash flow distribution between today and tomorrow, is uncertain from the perspective of today and takes on the random value $P_{it} + D_{it}$ tomorrow. Hence, its gross return (that is, one plus its percentage return) is given by $R_{it} = (P_{it} + D_{it})/P_{it-1}$. Finally, the one period riskless asset, if one exists, has the sure gross return $R_{ft} = 1/P_{ft-1}$, and $\iota$ always denotes a suitably conformable vector of ones.

8 This observation was translated into a lively diatribe by Summers (1985, 1986).
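The gross-return definitions above are the inputs to the finite-state, no-arbitrage argument developed in the following paragraphs. The sketch below uses a hypothetical three-asset, three-state market with made-up payoffs, prices, and objective probabilities. Because the payoffs are linearly independent and $N = S$, the state prices are recovered by solving a linear system; the code then checks that state-price-weighted gross returns equal one, that rescaling state prices by $R_{ft}$ gives probabilities, and that state prices per unit probability price every asset in expectation.

```python
import numpy as np

# Hypothetical two-date market with N = 3 assets and S = 3 states (illustrative numbers).
values = np.array([[1.2, 1.0, 0.8],     # P_ist + D_ist for a risky asset
                   [1.0, 1.0, 1.0],     # riskless discount bond: pays 1 in every state
                   [3.0, 2.0, 1.5]])    # another risky asset
prices = np.array([0.96, 0.95, 2.075])  # P_{i,t-1}

R = values / prices[:, None]            # gross returns R_ist = (P_ist + D_ist) / P_{i,t-1}
Rf = 1.0 / prices[1]                    # riskless gross return 1 / P_{f,t-1}

# With N = S and linearly independent payoffs, state prices solve values @ psi = prices.
psi = np.linalg.solve(values, prices)
pi_star = Rf * psi                      # risk neutral probabilities (sum to one)

pi = np.array([0.25, 0.5, 0.25])        # assumed objective state probabilities
q = psi / pi                            # state prices per unit probability

print("state prices:", psi)
print("sum_s psi_s R_is:", R @ psi)     # equals one for every asset
print("E[Q R_i]:", (R * q) @ pi)        # also equals one for every asset
```

The prices were chosen so that the implied state prices are all positive; picking asset prices arbitrarily can produce a negative state price, which signals an arbitrage opportunity in this market.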

The market has two crucial elements: one environmental and one behavioral. First, the market is frictionless: trade takes place with no taxes, transactions costs, or other restrictions such as short sales constraints.⁹ Second, investors vigorously exploit any arbitrage opportunities, behavior that is facilitated by the no frictions assumption; that is, investors are delighted to make something for nothing and they can costlessly attempt to do so.

In order to illustrate the asset pricing implications of the absence of arbitrage, suppose that a finite number of possible states of nature $s = 1, \ldots, S$ can occur tomorrow and that the possible security values in these states are $P_{ist} + D_{ist}$.¹⁰ Clearly, there can be at most $\min[N, S]$ portfolios with linearly independent payoffs. Hence, the prices of pure contingent claims, that is, securities that pay one unit of account if state $s$ occurs and zero otherwise, are uniquely determined if $N \ge S$ and if there are at least $S$ assets with linearly independent payoffs. If $N < S$, the prices of such claims are not uniquely determined by arbitrage considerations alone, although they are restricted to lie in an $N$-dimensional subspace if the asset payoffs are linearly independent. Let $\psi_{st-1}$ denote the price of a pure contingent claim that pays one unit of account if state $s$ occurs tomorrow and zero otherwise. These state prices are all positive so long as each state occurs with positive probability according to the beliefs of all investors. The price of any asset is the sum of the values of its payoffs state by state.¹¹ In particular:

$$P_{it-1} = \sum_{s=1}^{S}\psi_{st-1}(P_{ist} + D_{ist}); \qquad P_{ft-1} = \sum_{s=1}^{S}\psi_{st-1} \tag{3.1}$$

or, equivalently:

$$\sum_{s=1}^{S}\psi_{st-1}R_{ist} = 1; \qquad R_{ft}\sum_{s=1}^{S}\psi_{st-1} = 1. \tag{3.2}$$

Since they are non-negative, scaling state prices so that they sum to one gives them all of the attributes of probabilities.
Hence, these risk neutral probabilities:

9 Some frictions can be easily accommodated in the no-arbitrage framework, but general frictions present nontrivial complications. For recent work that accommodates proportional transactions costs and short sales constraints, see Hansen, Heaton, and Luttmer (1993), He and Modest (1993), and Luttmer (1993).

10 The restriction to two dates involves little loss of generality, as the abstract states of nature could just as easily index both different dates and states of nature. In addition, most of the results for finite $S$ carry over to the infinite dimensional case, although some technical issues arise in the limit of continuous trading. See Harrison and Kreps (1979) for a discussion.

11 The frictionless market assumption is implicit in this statement. In markets with frictions, the return of a portfolio of contingent claims would not be the weighted average of the returns on the component securities across states but would also depend on the trading costs or taxes incurred in this portfolio.

70 B. N. Lehmann
$$\pi^*_{st-1} = R_{ft-1}\,\psi_{st-1} = \frac{\psi_{st-1}}{\sum_{s=1}^{S}\psi_{st-1}} \tag{3.3}$$
comprise the risk neutral martingale measure, so called because the price of any asset under these probability beliefs is given by:
$$P_{it-1} = R_{ft-1}^{-1}\sum_{s=1}^{S}\pi^*_{st-1}\,(P_{ist} + D_{ist}) \tag{3.4}$$
that is, its expected present value. Risk neutral probabilities are one summary of the implications of the absence of arbitrage; they exist if and only if there is no arbitrage. This formulation of the state pricing problem is extremely convenient for pricing derivative claims. Under the risk neutral martingale measure, the riskless rate is the expected return of any asset or portfolio that does not change the span of the market and for which there is a deterministic mapping between its cash flows and states of nature.
However, it is not a convenient formulation for empirical purposes. Actual return data are drawn from the true (objective) probability measure; that is, actual returns are generated under rational expectations. Accordingly, let $\pi_{st-1}$ be the objective probability that state s occurs at time t given some arbitrary set of information available at time t-1 denoted by $I_{t-1}$. The reformulation of the pricing relations (3.1) and (3.2) in terms of state prices per unit probability $q_{st-1} = \psi_{st-1}/\pi_{st-1}$ reveals:
$$P_{it-1} = \sum_{s=1}^{S}\pi_{st-1}\,q_{st-1}(P_{ist} + D_{ist}) = E[Q_t(P_{it} + D_{it}) \mid I_{t-1}]; \qquad R_{ft-1}^{-1} = \sum_{s=1}^{S}\pi_{st-1}\,q_{st-1} = E[Q_t \mid I_{t-1}] \tag{3.5}$$
where the pricing kernel $Q_t$ takes the value $q_{st-1}$ when state s occurs, or, equivalently, in their expected return form:
$$\sum_{s=1}^{S}\pi_{st-1}\,q_{st-1}R_{ist} = E[Q_tR_{it} \mid I_{t-1}] = 1; \qquad R_{ft-1}\sum_{s=1}^{S}\pi_{st-1}\,q_{st-1} = R_{ft-1}E[Q_t \mid I_{t-1}] = 1 \tag{3.6}$$
At this level of generality, these conditional moment restrictions are the only implications of the hypothesis that markets are frictionless and that market prices are marked by the absence of arbitrage. Asset pricing theory endows these conditional moment conditions with empirical content through models for the pricing kernel $Q_t$ couched in terms of

potential observables.¹² Such models equate the state price per unit probability $q_{st-1}$, the cost per unit probability of receiving one unit of account in state s, with some corresponding measure of the marginal benefit of receiving one unit of account in state s.¹³ Most equilibrium models equate $Q_t$, adjusted for inflation, with the intertemporal marginal rate of substitution of a hypothetical, representative optimizing investor.¹⁴ The most common formulation is additively separable, constant relative risk aversion preferences for which $Q_t = \rho(c_t/c_{t-1})^{-\alpha}$, where ρ is the rate of time preference, $c_t/c_{t-1}$ is the rate of consumption growth, and α is the coefficient of relative risk aversion, all for the representative agent.¹⁵
Accordingly, let $x_t$ denote the relevant observables that characterize these marginal benefits in some asset pricing model. Hence, pricing kernel models take the general form:
$$Q_t = Q(x_t, \theta_Q); \qquad Q_t > 0; \qquad x_t \in I_t \tag{3.7}$$
where $\theta_Q$ is a vector of unknown parameters. To be sure, the parametric component can be further weakened in settings where it is possible to estimate the function $Q(\cdot)$ nonparametrically given only observations on $R_t$ and $x_t$. However, the bulk of the literature involves models in the form (3.7).¹⁶
Equations (3.5) through (3.7) are what make asset pricing theory inherently semiparametric.¹⁷ The parametric component of these asset pricing relations is a
¹² It is also possible to identify the pricing kernel nonparametrically with the returns of particular portfolios. For example, the return of the growth optimal portfolio, which solves $\max_{w_{t-1}} E[\ln(w_{t-1}'R_t) \mid I_{t-1}]$ subject to $w_{t-1}'\iota = 1$ and $w_{t-1} \in I_{t-1}$, is equal to $Q_t^{-1}$. Of course, it is hard to solve this maximization problem without parametric distributional assumptions. See Bansal and Lehmann (1955) for an application to the term structure of interest rates.
The addition of observables can serve to identify payoff relevant states, giving nonparametric estimation a somewhat semiparametric flavor. Put differently, the econometrician typically observes a sequence of returns without information on which states have been realized; the vector $x_t$ provides an indicator of the payoff relevant state of nature realized at time t that helps identify similar outcomes (i.e., states with similar state prices per unit probability). Bansal and Viswanathan (1993) estimate a model along these lines.
¹³ The marginal benefit side of this equation rationalizes the peculiar dating convention for $Q_t$ when it is equal to the time t-1 state price per unit probability.
¹⁴ Embedding inflation in $Q_t$ eliminates the need for separate notation for real and nominal pricing kernels. That is, $Q_t$ is equal to $Q^r_tP_{ct-1}/P_{ct}$, where $P_{ct}$ is an appropriate index for translating real cash flows and the real pricing kernel $Q^r_t$ into nominal cash flows and kernels.
¹⁵ More general models allow for multiple goods and nonseparability of preferences in consumption over time and states, as would arise from durability in consumption goods and from preferences marked by habit formation and non-expected utility maximization. Constantinides and Ferson (1991) summarize much of the durability and habit formation literatures, both theoretically and empirically. See Epstein and Zin (1991a) and Epstein and Zin (1991b) for similar models for $Q_t$ which do not impose state separability. Cochrane (1991) exploits the corresponding marginal conditions for producers.
¹⁶ Exceptions include Bansal and Viswanathan (1993) and the linear model $Q_t = \omega_{t-1}'x_t$ with $\omega_{t-1}$ unobserved, a model discussed in the next section.
¹⁷ To be sure, the econometrician could specify a complete parametric probability model for asset returns and such models figure prominently in asset pricing theory.
Examples include the Capital Asset Pricing Model (CAPM) when it is based on normally distributed returns and the family of continuous time intertemporal asset pricing models when prices are assumed to follow Itô processes.

model for the pricing kernel $Q(x_t, \theta_Q)$. The conditional moment conditions (3.6) can then be used to identify any unknown parameters in the model for $Q_t$ and to test its overidentifying restrictions without additional distributional assumptions.
Note also that the structure of asset pricing theory confers an obvious econometric simplification. The constructed variables $Q_tR_{it} - 1$ constitute a martingale difference sequence and, hence, are serially uncorrelated. This fact greatly simplifies the calculation of the second moments of sample analogues of (3.6), which in turn simplifies estimation and inference.¹⁸
Moreover, the economics of these relations constrains how these conditional moment restrictions can be used for estimation and inference. Ross (1978b) observed that portfolios are the only derivative assets that can be priced solely as a function of observables, time, and primary asset values given only the absence of arbitrage opportunities in frictionless markets. The same is true for econometricians: for a given asset menu, the econometrician knows only the prices and payoffs of portfolios with weights $w_{t-1} \in I_{t-1}$. Hence, only linear combinations of the conditional moment conditions based on information available at time t-1 can be used to estimate the model. Accordingly, in the absence of distributional restrictions, the econometrician must base estimation and inference on estimators of the form:
$$\frac{1}{T}\sum_{t=1}^{T} A_{t-1}\left[R_tQ(x_t, \theta_Q) - \iota\right] = 0; \qquad A_{t-1} \in I_{t-1} \tag{3.8}$$
where ι is an N × 1 vector of ones, $A_{t-1}$ is a sequence of p × N $O_p(1)$ matrices chosen by the econometrician, and p is the number of elements in $\theta_Q$. The matrices $A_{t-1}$ can be interpreted as the weights of p portfolios with random payoffs $A_{t-1}R_t$ that cost $A_{t-1}\iota$ units of account.
How would a financial econometrician choose $A^0_{t-1}$? An econometrician who favors likelihood methods for their desirable asymptotic properties might prefer the p portfolios with maximal conditional correlation with the true, but unknown, conditional score.
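To make (3.8) concrete, here is a hedged Python sketch of its simplest, just-identified case: one asset, one parameter (the risk aversion coefficient α of the CRRA kernel $Q_t = \rho(c_t/c_{t-1})^{-\alpha}$, with ρ treated as known), and the constant instrument $A_{t-1} = 1$. The consumption growth series, the value of ρ, and the returns (constructed so that α = 2 sets the sample moment exactly to zero) are hypothetical, and bisection merely stands in for whatever numerical solver an application would use.

```python
# Just-identified case of (3.8): one asset, instrument A_{t-1} = 1, CRRA kernel.

def kernel(g, rho, alpha):
    """CRRA pricing kernel Q_t = rho * (c_t / c_{t-1}) ** (-alpha)."""
    return rho * g ** (-alpha)

def sample_moment(alpha, growth, returns, rho):
    """(1/T) sum_t A_{t-1} [R_t Q(x_t, theta) - 1] with A_{t-1} = 1."""
    terms = [r * kernel(g, rho, alpha) - 1.0 for g, r in zip(growth, returns)]
    return sum(terms) / len(terms)

def solve_alpha(growth, returns, rho, lo=0.0, hi=10.0, iters=200):
    """Bisection; assumes the sample moment changes sign exactly once on [lo, hi]."""
    f_lo = sample_moment(lo, growth, returns, rho)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = sample_moment(mid, growth, returns, rho)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho = 0.99                                   # hypothetical, treated as known
growth = [0.98, 1.03, 1.01, 0.99, 1.04]      # hypothetical c_t / c_{t-1}
returns = [g ** 2 / rho for g in growth]     # built so that R_t Q_t = g_t ** (2 - alpha)
alpha_hat = solve_alpha(growth, returns, rho)
print(alpha_hat)                             # close to 2
```

An overidentified application would stack several instruments and assets and would need a weighting matrix; this sketch only shows how one sample moment pins down one parameter.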
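In this application, the conditional projection of the score $\partial\mathcal{L}_t(\theta_0)/\partial\theta$ on $[R_tQ(x_t, \theta_Q) - \iota]$ is given by:
$$\begin{aligned}\frac{\partial\mathcal{L}_t(\theta_0)}{\partial\theta} &= \mathrm{Cov}\left[\frac{\partial\mathcal{L}_t(\theta_0)}{\partial\theta}, R_tQ(x_t,\theta_Q)' \,\middle|\, I_{t-1}\right]\mathrm{Var}[R_tQ(x_t,\theta_Q) \mid I_{t-1}]^{-1}[R_tQ(x_t,\theta_Q) - \iota] + v_{\mathcal{L}Qt}\\ &= -\Phi_{t-1}\Psi_{t-1}^{-1}[R_tQ(x_t,\theta_Q) - \iota] + v_{\mathcal{L}Qt};\\ \Phi_{t-1} &= \frac{\partial E[R_tQ(x_t,\theta_Q) \mid I_{t-1}]'}{\partial\theta}; \qquad \Psi_{t-1} = E\{[R_tQ(x_t,\theta_Q) - \iota][R_tQ(x_t,\theta_Q) - \iota]' \mid I_{t-1}\}\end{aligned} \tag{3.9}$$
¹⁸ This observation fails if returns and $Q_t$ are sampled more than once per period. For example, consider the two period total return (i.e., with full reinvestment of intermediate cash flows) $R_{it,t+1} = R_{it}R_{it+1}$, which satisfies the two period moment condition $E[Q_tQ_{t+1}R_{it,t+1} \mid I_{t-1}] = 1$. In this case, the constructed random variable $Q_tQ_{t+1}R_{it,t+1} - 1$ follows a first order moving average process. See Hansen and Hodrick (1980) and Hansen, Heaton, and Ogaki (1988) for more complete discussions.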

since $E\{(\partial\mathcal{L}_t(\theta_0)/\partial\theta)[R_tQ(x_t,\theta_Q) - \iota]' \mid I_{t-1}\} = -\Phi_{t-1}$ given sufficient regularity to permit differentiation under the integral sign. The p portfolios with payoffs $\Phi_{t-1}\Psi_{t-1}^{-1}R_t$ that cost $\Phi_{t-1}\Psi_{t-1}^{-1}\iota$ units of account have no obvious optimality properties from the perspective of prospective investors. However, they are definitely optimal from the perspective of financial econometricians: they are the optimal hedge portfolios for the conditional score of the true, but unknown, log likelihood function.
Put differently, the economics and the econometrics coincide here. The econometrician can only observe conditional linear combinations of the conditional moment conditions and seeks portfolios whose payoffs provide information about the parameters of the pricing kernel $Q(x_t, \theta_Q)$. The optimal portfolio weights are $\Phi_{t-1}\Psi_{t-1}^{-1}$, and the payoffs $\Phi_{t-1}\Psi_{t-1}^{-1}R_t$ maximize the information content of each observation, resulting in an incremental contribution of $\Phi_{t-1}\Psi_{t-1}^{-1}\Phi_{t-1}'$ to the information about $\theta_Q$. In other words, the Fisher information matrix of the true score exceeds $\Phi_{t-1}\Psi_{t-1}^{-1}\Phi_{t-1}'$ by a positive semidefinite matrix C, and C is the smallest such matrix produced by linear combinations of the conditional moment conditions.
This development conceals a host of implementation problems associated with the evaluation of conditional expectations.¹⁹ To be sure, $\Phi_{t-1}$ and $\Psi_{t-1}$ can be estimated with nonparametric methods when they are time invariant functions $\Phi(z_{t-1})$ and $\Psi(z_{t-1})$ for $z_{t-1} \in I_{t-1}$. The extension of the methods of Robinson (1987), Newey (1990), Robinson (1991), and Newey (1993) to the present setting, in which $R_tQ(x_t, \theta_Q) - \iota$ is serially uncorrelated but neither independently distributed over time nor homoskedastic, appears to be straightforward. However, the circumstances in which $A^0_{t-1}$ is a time invariant function of $z_{t-1}$ would appear to be the exception rather than the rule.
Accordingly, the econometrician generally must place further restrictions on the no-arbitrage pricing model in order to proceed with efficient estimation based on conditional moment restrictions, a subject that occupies the next section.
Alternatively, the econometrician can work with weaker moment conditions like the unconditional moment restrictions. The analysis of this case parallels that of optimal conditional GMM. Once again, the fixed weight matrices $A_T(\theta_0)$ from (2.10) are the weights of p portfolios with random payoffs $A_T(\theta_0)R_t$ that cost $A_T(\theta_0)\iota$ units of account. As noted in the previous section, the price of these random payoffs is $\Phi W^{-1}\iota$, which generally differs from $E(A^0_{t-1})\iota$. These portfolios produce the fixed weight moment condition that has maximum unconditional correlation with the derivatives of the true, but unknown, log likelihood function.
¹⁹ The nature of the information set itself is less of an issue. While investors might possess more information than econometricians, this is not a problem because the law of iterated expectations implies that $E[R_{it}Q_t \mid I^e_{t-1}] = 1$ for all $I^e_{t-1} \subseteq I_{t-1}$. Of course, the conditional probabilities $\pi^e_{st-1}$ implicit in this moment condition generally differ from those implicit in $E[R_{it}Q_t \mid I_{t-1}] = 1$, as will the associated values of the pricing kernel $Q^e_t$ (i.e., $q^e_{st-1} = \psi^e_{st-1}/\pi^e_{st-1}$). The dependence of $Q^e_t$ on $\pi^e_{st-1}$ is broken in models for $Q_t$ that equate the state price per unit probability $q_{st-1}$ with the marginal benefit of receiving one unit of account in state s.

Of course, conventional GMM implementations use conditioning information within the optimal unconditional GMM procedure as discussed in the previous section. Let $Z_{t-1} \in I_{t-1}$ denote an r × N matrix of predetermined variables and consider the revised moment conditions:
$$E\{Z_{t-1}E[R_tQ(x_t,\theta_Q) - \iota \mid I_{t-1}]\} = E[Z_{t-1}(R_tQ(x_t,\theta_Q) - \iota)] = 0 \quad \forall\, Z_{t-1} \in I_{t-1} \tag{3.10}$$
In the preceding paragraph, $Z_{t-1}$ is $I_N$, the N × N identity matrix; otherwise, it could reflect identical or different elements of the information set available to investors (i.e., $z_{t-1}$ in $I_N \otimes z_{t-1}$ and $z_{it-1}$ in (2.19), respectively) being applied to each element of $R_tQ(x_t,\theta_Q) - \iota$ as given in the previous section. The introduction of $z_{it-1}$ and $z_{t-1}$ into the unconditional moment condition (3.10) is often described as invoking trading strategies in estimation and inference following Hansen and Jagannathan (1991) and Hansen and Jagannathan (1994). This characterization arises because security returns are given different weights temporally and, when $z_{it-1} \neq z_{t-1}$, cross-sectionally, after the fashion of an active investor. In unconditional GMM, the returns weighted in this fashion are then aggregated into p portfolios with weights that are refined as information is added to (3.10) in the form of additional components of $Z_{t-1}$.
Once again, there is an optimal fixed weight portfolio strategy for the revised moment conditions based on $Z_{t-1}(R_tQ(x_t,\theta_Q) - \iota)$. From (2.20), the active portfolio strategy with portfolio weights $\Phi_ZW_Z^{-1}Z_{t-1}$ has random payoffs $\Phi_ZW_Z^{-1}Z_{t-1}R_t$ and costs $\Phi_ZW_Z^{-1}Z_{t-1}\iota$ units of account. The resulting moment conditions have the largest unconditional correlation with the true, but unknown, unconditional score in finite samples within the class of time varying portfolios with weights that are fixed linear combinations of predetermined variables $Z_{t-1}$.
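The trading-strategy reading of (3.10) can be sketched in Python for a single asset with $Z_{t-1} = (1, z_{t-1})'$: the pricing errors $R_tQ_t - 1$ are averaged as they are, and again after scaling by the lagged instrument, which amounts to a managed position in the asset. The function name, the data, and the choice of one scalar instrument are illustrative assumptions.

```python
# Sample analogues of (3.10) for one asset with Z_{t-1} = (1, z_{t-1})':
# the unscaled pricing error and the pricing error of a "managed portfolio"
# that scales its position in the asset by the lagged instrument z_{t-1}.

def scaled_moments(returns, kernel, instrument):
    """Sample means of (R_t Q_t - 1) and z_{t-1} (R_t Q_t - 1) for t = 1..T-1.

    returns[t], kernel[t] and instrument[t] are all dated t; the instrument
    enters with a one-period lag so that it is known at t-1."""
    errors = [r * q - 1.0 for r, q in zip(returns, kernel)]
    n = len(errors) - 1                      # usable observations after lagging
    m_plain = sum(errors[1:]) / n
    m_scaled = sum(instrument[t - 1] * errors[t] for t in range(1, len(errors))) / n
    return m_plain, m_scaled

# Hypothetical data: the pricing errors average to zero but are predictable by z.
kernel = [1.0, 1.0, 1.0, 1.0]
returns = [1.0, 1.1, 0.9, 1.0]
z = [1.0, 2.0, 3.0, 4.0]
m_plain, m_scaled = scaled_moments(returns, kernel, z)
print(m_plain, m_scaled)
```

Here the unscaled moment is zero up to rounding while the scaled moment is about -0.033: the pricing errors are predictable from $z_{t-1}$, which only the managed-portfolio moment detects.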
Of course, optimal conditional weights can be obtained from the appropriate reformulation of (3.9) above, but the whole point of this approach is that the implementation of this linear approximation to the optimal procedure is straightforward.
4. Efficiency gains within alternative beta pricing formulations
The moment condition $E[Q(x_t,\theta_Q)R_{it} \mid I_{t-1}] = 1$ is often translated into the form of a beta pricing model, so named for its resemblance to the expected return relation that arises in the Capital Asset Pricing Model (CAPM). Beta pricing models serve another purpose in the present setting; they highlight specific dimensions in which fruitful constraints on the pricing kernel model can be added to facilitate more efficient estimation and inference. Put differently, beta pricing models point to assumptions that permit consistent estimation of the components of $A^0_{t-1}$.
Accordingly, consider the population projection of the vector of risky asset returns $R_t$ on $Q(x_t,\theta_Q)$:

$$R_t = \alpha_t + \beta_tQ(x_t,\theta_Q) + \varepsilon_t; \qquad E[\varepsilon_t \mid I_{t-1}] = 0; \qquad \beta_t = \frac{\mathrm{Cov}[R_t, Q(x_t,\theta_Q) \mid I_{t-1}]}{\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]} \tag{4.1}$$
where Var[·] and Cov[·] denote the variance and covariance of their arguments, respectively. Asset pricing theory restricts the intercept vector $\alpha_t$ in this projection, which is determined by substituting (4.1) into the moment condition (3.6):
$$\iota = E[R_tQ(x_t,\theta_Q) \mid I_{t-1}] = \alpha_tE[Q(x_t,\theta_Q) \mid I_{t-1}] + \beta_tE[Q(x_t,\theta_Q)^2 \mid I_{t-1}] \tag{4.2}$$
which, after rearranging terms and insertion into (4.1), yields:
$$R_t = \iota\lambda_{0t} + \beta_t[Q(x_t,\theta_Q) - \lambda_{Qt}] + \varepsilon_t; \qquad E[\varepsilon_t \mid I_{t-1}] = 0; \qquad \lambda_{0t} = E[Q(x_t,\theta_Q) \mid I_{t-1}]^{-1}; \qquad \lambda_{Qt} = \lambda_{0t}E[Q(x_t,\theta_Q)^2 \mid I_{t-1}] \tag{4.3}$$
The riskless asset, if one exists, earns $\lambda_{0t}$; otherwise, $\lambda_{0t}$ is the expected return of all assets with returns uncorrelated with $Q_t$. As noted earlier, the lack of serial correlation in the residual vector $\varepsilon_t$ is econometrically convenient.
The bilinear form of (4.3) is a distinguishing characteristic of these beta pricing models. Put differently, the moment conditions (3.6) constrain expected returns to be linear in the covariances of returns with the pricing kernel. This linear structure is a central feature of all models based on the absence of arbitrage in frictionless markets; that is, the portfolio with returns that are maximally correlated with $Q_t$ is conditionally mean-variance efficient.²⁰ Hence, these asset pricing relations differ from semiparametric multivariate regression models in their restrictions on risk premiums like $\lambda_{Qt}$ and $\lambda_{0t}$.²¹ The multivariate representation of these no-arbitrage models produces a somewhat different, though arithmetically equivalent, description of efficient GMM estimation.
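The algebra leading from (4.1) to (4.3) can be checked numerically. The hedged Python sketch below uses a hypothetical three-state economy with probabilities $\pi_s$ and kernel values $Q_s$, prices an arbitrary payoff at $E[QX]$, and confirms that its expected return equals $\lambda_0 + \beta(E[Q] - \lambda_Q)$, that is, $\lambda_0 - \beta\lambda_0\mathrm{Var}[Q]$. The numbers and helper names are illustrative, and the conditioning on $I_{t-1}$ is suppressed by working with one period's distribution.

```python
# Hypothetical three-state check of the beta pricing form (4.3).

def expect(pi, values):
    """Expectation of a state-contingent quantity under probabilities pi."""
    return sum(p * v for p, v in zip(pi, values))

def beta_pricing_check(pi, q, payoff):
    """Price `payoff` with the kernel q, then compare its expected return with
    lambda_0 + beta * (E[Q] - lambda_Q) from (4.3)."""
    price = expect(pi, [k * x for k, x in zip(q, payoff)])     # P = E[Q X]
    r = [x / price for x in payoff]                             # so E[Q R] = 1
    e_q = expect(pi, q)
    e_q2 = expect(pi, [k * k for k in q])
    var_q = e_q2 - e_q ** 2
    cov_rq = expect(pi, [a * k for a, k in zip(r, q)]) - expect(pi, r) * e_q
    beta = cov_rq / var_q
    lam0 = 1.0 / e_q                   # lambda_0 = E[Q]^{-1}
    lam_q = lam0 * e_q2                # lambda_Q = lambda_0 E[Q^2]
    predicted = lam0 + beta * (e_q - lam_q)
    return expect(pi, r), predicted, beta, lam0

pi = [0.3, 0.4, 0.3]
q = [1.2, 0.95, 0.8]        # the kernel is high in the "bad" state
stock = [0.7, 1.05, 1.4]    # a payoff that is lowest when Q is highest
mean_r, pred_r, beta, lam0 = beta_pricing_check(pi, q, stock)
print(mean_r, pred_r, beta, lam0)
```

Because the stock pays least when $Q$ is high, its beta on the kernel is negative and its expected return exceeds $\lambda_0$.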
The estimator is based on the moment conditions:
$$\frac{1}{T}\sum_{t=1}^{T} A_{\beta t-1}\varepsilon_t = 0; \qquad \varepsilon_t = R_t - \iota\lambda_{0t} - \beta_t[Q(x_t,\theta_Q) - \lambda_{Qt}] \tag{4.4}$$
and, after solving in terms of the expressions for $\lambda_{0t}$ and $\lambda_{Qt}$ (in particular, that $E[Q(x_t,\theta_Q) - \lambda_{Qt} \mid I_{t-1}] = -\lambda_{0t}\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]$) and given sufficient regularity to allow differentiation under the integral sign, the optimal choice $A^0_{\beta t-1}$ is:
²⁰ A portfolio is (conditionally) mean-variance efficient if it minimizes (conditional) variance for a given level of (conditional) mean return. A portfolio is (conditionally) mean-variance efficient for a given set of assets if and only if the (conditional) expected returns of all assets in the set are linear in their (conditional) covariances with the portfolio. See Merton (1972), Roll (1977), and Hansen and Richard (1987).
²¹ They differ in at least one other respect: most regression specifications with serially uncorrelated errors have $E[\varepsilon_t \mid Q_t] = 0$, which need not be satisfied by (4.3).

$$\begin{aligned} A^0_{\beta t-1} &= \Phi_{\beta t-1}\Psi_{\beta t-1}^{-1}; \qquad \Psi_{\beta t-1} = E[\varepsilon_t\varepsilon_t' \mid I_{t-1}];\\ \Phi_{\beta t-1} &= \lambda_{0t}\left(\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]\frac{\partial\beta_t'}{\partial\theta} + \frac{\partial\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]}{\partial\theta}\beta_t'\right) - \frac{\partial\lambda_{0t}}{\partial\theta}\left(\iota - \mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]\beta_t\right)'\\ &= \lambda_{0t}\frac{\partial\mathrm{Cov}[Q(x_t,\theta_Q), R_t' \mid I_{t-1}]}{\partial\theta} - \frac{\partial\lambda_{0t}}{\partial\theta}\left(\iota - \mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]\beta_t\right)' \end{aligned} \tag{4.5}$$
The last line in the expression for $\Phi_{\beta t-1}$ illustrates the relations with (3.9) in the previous section. Note that the observation of the riskless rate eliminates the term involving $\partial\lambda_{0t}/\partial\theta$.²²
There is no generic advantage to casting no-arbitrage models in this beta pricing form unless the econometrician is willing to make additional assumptions about the stochastic processes followed by returns.²³ As is readily apparent, there are only three places where useful restrictions can be placed on beta pricing models: (1) constraints on the behavior of the conditional betas, (2) additional restrictions on the model $Q(x_t,\theta_Q)$, and (3) restrictions on the regression residuals. We discuss each of these in turn in Sections 4.1-4.3, and these ingredients are combined in Section 4.4.
4.1. Conditional beta models
The benefits of a model for conditional betas are obvious. Conditional beta models facilitate the estimation of the pricing kernel model $Q(x_t,\theta_Q)$ by sharpening the general moment restrictions (3.6) with a model for the covariances embedded in them (i.e., $E[Q(x_t,\theta_Q)R_{it} \mid I_{t-1}] = \mathrm{Cov}[Q(x_t,\theta_Q), R_{it} \mid I_{t-1}] + E[Q(x_t,\theta_Q) \mid I_{t-1}]E[R_{it} \mid I_{t-1}]$). They also mitigate some of the problems associated with efficient estimation of asset pricing relations. Put differently, the econometrician is explicitly modeling some of the components of $\Phi_{\beta t-1}$ in this case.
²² In the case of risk neutral pricing, $\Phi_{\beta t-1}$ collapses to $-(\partial\lambda_{0t}/\partial\theta)\iota'$ since $\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]$ is zero, and to zero if, in addition, the econometrician measures the riskless rate.
²³ The law of iterated expectations does not apply to the second moments in these multivariate regression models, so this representation alone does nothing to sharpen unconditional GMM estimation. Additional covariances are introduced in the passage from conditional to unconditional moments because of the bilinear form of beta pricing models.
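The unconditional moment condition for security i is $E\{E[\varepsilon_{it} \mid I_{t-1}]z_{it-1}\} = E[\varepsilon_{it}z_{it-1}] = 0$ for all $z_{it-1} \in I_{t-1}$, and the sum of the two offending covariances $\mathrm{Cov}\{\beta_{it}E[Q(x_t,\theta) - \lambda_{Qt} \mid I_{t-1}], z_{it-1}\} + \mathrm{Cov}\{\beta_{it}, E[Q(x_t,\theta) - \lambda_{Qt} \mid I_{t-1}]\}E[z_{it-1}]$ cannot be separated without further restrictions.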

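Accordingly, suppose the econometrician observes a set of variables $z_{t-1} \in I_{t-1}$, perhaps also contained in $x_t$ (i.e., $z_{t-1} \in x_t$), and specifies a model of the form:
$$\beta_t = \beta(z_{t-1}, \theta_\beta); \qquad z_{t-1} \in I_{t-1} \tag{4.6}$$
where $\theta_\beta$ is the vector of unknown parameters in the model for β. In these circumstances, the beta pricing model becomes:
$$R_t = \iota\lambda_{0t} + \beta(z_{t-1},\theta_\beta)[Q(x_t,\theta_Q) - \lambda_{Qt}] + \varepsilon_{\beta t} \tag{4.7}$$
In the most common form of this model, the conditional betas are constant, $z_{t-1}$ is simply the scalar 1, and $\theta_\beta$ is the corresponding vector of constant conditional betas β. All serial correlation in returns is mediated through the risk premiums given constant conditional betas.²⁴
Models for conditional betas make efficient GMM estimation more feasible by refining the optimal weighting matrices since:
$$\begin{aligned} \Psi_{\beta zt-1} &= E[\varepsilon_{\beta t}\varepsilon_{\beta t}' \mid I_{t-1}];\\ \Phi_{\beta zt-1} &= \lambda_{0t}\left(\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]\frac{\partial\beta(z_{t-1},\theta_\beta)'}{\partial\theta} + \frac{\partial\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]}{\partial\theta}\beta(z_{t-1},\theta_\beta)'\right)\\ &\quad - \frac{\partial\lambda_{0t}}{\partial\theta}\left(\iota - \mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]\beta(z_{t-1},\theta_\beta)\right)' \end{aligned} \tag{4.8}$$
where, as before, an observed riskless rate eliminates the last line of (4.8). Since the parameter vector θ is $(\theta_Q'\ \theta_\beta')'$, $\Phi_{\beta zt-1}$ and $\Phi_{\beta t-1}$ in (4.5) differ in two respects:
$$\Phi_{\beta zt-1} = \begin{bmatrix}\lambda_{0t}\dfrac{\partial\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]}{\partial\theta_Q}\beta(z_{t-1},\theta_\beta)' - \dfrac{\partial\lambda_{0t}}{\partial\theta_Q}\left(\iota - \mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]\beta(z_{t-1},\theta_\beta)\right)' \\[1ex] \lambda_{0t}\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]\dfrac{\partial\beta(z_{t-1},\theta_\beta)'}{\partial\theta_\beta}\end{bmatrix} \tag{4.9}$$
²⁴ Linear models of the form $\beta_{it} = \theta_\beta'S_{i\beta}z_{t-1}$ are also common, where $S_{i\beta}$ is a selection matrix that picks the elements of $z_{t-1}$ relevant for $\beta_{it}$. Linear models for conditional betas naturally arise when the APT holds both conditionally and unconditionally (cf., Lehmann (1992)). Some commercial risk management models allow $\theta_\beta$ to vary both across securities and over time; see Rosenberg (1974) and Rosenberg and Marathe (1979) for early examples. Error terms can be added to these conditional beta models when their residuals are orthogonal to the instruments $z_{t-1} \in I_{t-1}$. Nonlinear models can be thought of as specifications of the relevant components of $\Phi_{\beta t-1}$ by the econometrician.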

A tedious calculation using partitioned matrix inversion verifies that the variance of the efficient GMM estimator of $\theta_Q$ falls after the imposition of the conditional beta model, both because of the reduction in dimensionality in the transition from the derivatives of $\mathrm{Cov}[Q(x_t,\theta_Q), R_t \mid I_{t-1}]$ to the derivatives of $\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]$ in the first line of (4.9) and because of the additional moment conditions arising from the conditional beta model in the second line of (4.9). Hence, the problem of constructing estimates of the covariances between returns and the derivatives of the pricing kernel in (3.9) is replaced by the somewhat simpler problem of estimating the conditional variance of the pricing kernel along with its derivatives in these models. Both formulations require estimation of the conditional mean of $Q(x_t,\theta_Q)$ and its derivatives through $\lambda_{0t}$, a requirement eliminated by observation of a riskless asset. While stochastic process assumptions are required to compute $E[Q(x_t,\theta_Q) \mid I_{t-1}]$, $\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]$, and their derivatives, a conditional beta model and, when possible, measurement of the riskless rate simplify efficient GMM estimation considerably.²⁵
Note also that the optimal conditional weighting matrix $\Phi_{\beta zt-1}\Psi_{\beta zt-1}^{-1}$ has a portfolio interpretation similar to that in the last section. The portfolio interpretation in this case has a long-standing tradition in financial econometrics. Ignoring scale factors, the portfolio weights associated with the estimation of the premium $\lambda_{Qt}$ are proportional to $\beta(z_{t-1},\theta_\beta)$. Similarly, the portfolio weights associated with the estimation of $\lambda_{0t}$ are proportional to $\iota - \beta(z_{t-1},\theta_\beta)$ after scaling $\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]$ to equal one, as is appropriate when the econometrician observes the return of a portfolio perfectly correlated with $Q_t$ but not a model for $Q_t$ itself (a case discussed briefly below).
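The portfolio interpretation of cross-sectional regressions can be sketched with ordinary least squares, i.e., Γ = I in the notation of footnote 27. In a single period, regressing returns on $[\iota\ \beta]$ yields estimates that are themselves portfolio returns: the intercept portfolio costs one unit of account and has a beta of zero, while the slope portfolio costs nothing and has a beta of one. The betas, returns, and function name below are hypothetical, and a weighted or generalized least squares version would replace Γ = I with a covariance-based matrix.

```python
# One-period cross-sectional OLS (Gamma = I) of returns on [iota, beta], read as portfolios.

def ols_portfolio_weights(beta):
    """Rows of (X'X)^{-1} X' for X = [iota, beta]."""
    n = len(beta)
    s_b = sum(beta)
    s_bb = sum(b * b for b in beta)
    det = n * s_bb - s_b * s_b                       # determinant of X'X
    inv = [[s_bb / det, -s_b / det],
           [-s_b / det, n / det]]                    # (X'X)^{-1}
    w0 = [inv[0][0] + inv[0][1] * b for b in beta]   # intercept portfolio
    w1 = [inv[1][0] + inv[1][1] * b for b in beta]   # slope portfolio
    return w0, w1

beta = [0.5, 0.8, 1.0, 1.2, 1.5]             # hypothetical conditional betas
returns = [1.02, 1.05, 1.04, 1.07, 1.08]     # hypothetical realized returns R_t
w0, w1 = ols_portfolio_weights(beta)
lam0_hat = sum(w * r for w, r in zip(w0, returns))   # unit-cost, zero-beta portfolio return
prem_hat = sum(w * r for w, r in zip(w1, returns))   # zero-cost, unit-beta portfolio return
print(lam0_hat, prem_hat)
```

The two estimates are realized portfolio returns, not expected returns, which is why the text stresses that expected premiums require a further time series model.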
Such procedures have been used assuming returns are independently and identically distributed with constant betas, beginning with Douglas (1968) and Lintner (1965) and maturing into a widespread tool in Black, Jensen, and Scholes (1972), Miller and Scholes (1972), and Fama and MacBeth (1973). Shanken (1992) provides a comprehensive and rigorous description of the current state of the art for the independently and identically distributed case.
Models for the determinants of conditional betas have another use: they make it possible to identify aspects of the no-arbitrage model without an explicit model for the pricing kernel $Q_t$. Given only $\beta(z_{t-1},\theta_\beta)$, expected returns are given by:
$$E[R_t \mid I_{t-1}] = \iota\lambda_{0t} + \beta(z_{t-1},\theta_\beta)[\lambda_{pt} - \lambda_{0t}] \tag{4.10}$$
The potentially estimable conditional risk premiums $\lambda_{0t}$ and $\lambda_{pt}$ are the expected returns of conditionally mean-variance efficient portfolios since the expected returns on the assets in this menu are linear in their conditional betas.²⁶ However,
²⁵ The presence of $\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]$ and its derivatives in (4.8) arises because (4.6) is a model for conditional betas, not for conditional covariances. In most applications, conditional beta models are more appropriate.
²⁶ The CAPM is the best known model which takes this form, in which portfolio p is the market portfolio of all risky assets. The market portfolio return is maximally correlated with aggregate wealth (which is proportional to $Q_t$ in this model) in the CAPM in general; it is perfectly correlated if markets are complete.

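these parameters are also the expected returns of any assets or portfolios that cost one unit of account and have conditional betas of one and zero, respectively. Portfolios constructed to have given betas are often called mimicking or basis portfolios in the literature.²⁷ Mimicking portfolios arise in the portfolio interpretation of efficient conditional GMM estimation in this case and delimit what can be learned from conditional beta models alone. Given only the beta model (4.6):
$$\begin{aligned} R_t &= \iota\lambda_{0t} + \beta(z_{t-1},\theta_\beta)[\lambda_{pt} - \lambda_{0t}] + \varepsilon_{\beta pt}; \qquad \Psi_{\beta pt-1} = E[\varepsilon_{\beta pt}\varepsilon_{\beta pt}' \mid I_{t-1}];\\ \Phi_{\beta pt-1} &= -\left\{(\lambda_{pt} - \lambda_{0t})\frac{\partial\beta(z_{t-1},\theta_\beta)'}{\partial\theta} + \frac{\partial\lambda_{0t}}{\partial\theta}[\iota - \beta(z_{t-1},\theta_\beta)]' + \frac{\partial\lambda_{pt}}{\partial\theta}\beta(z_{t-1},\theta_\beta)'\right\} \end{aligned} \tag{4.11}$$
Note that if we treat the risk premiums as unknown parameters in each period, the limiting parameter space is infinite dimensional. Ignoring this obvious problem, the optimal conditional moment restrictions are given by:
$$E\left\{\begin{bmatrix}(\lambda_{pt} - \lambda_{0t})\,\partial\beta(z_{t-1},\theta_\beta)'/\partial\theta_\beta\\ \iota'\\ \beta(z_{t-1},\theta_\beta)'\end{bmatrix}\Psi_{\beta pt-1}^{-1}\left[R_t - \iota\lambda_{0t} - \beta(z_{t-1},\theta_\beta)(\lambda_{pt} - \lambda_{0t})\right] \,\middle|\, I_{t-1}\right\} = 0 \tag{4.12}$$
and the solution for each $\lambda_{0t}$ and $\lambda_{pt} - \lambda_{0t}$ is:
$$\begin{bmatrix}\hat\lambda_{0t}\\ \hat\lambda_{pt} - \hat\lambda_{0t}\end{bmatrix} = \left([\iota\ \ \beta(z_{t-1},\theta_\beta)]'\,\Psi_{\beta pt-1}^{-1}\,[\iota\ \ \beta(z_{t-1},\theta_\beta)]\right)^{-1}[\iota\ \ \beta(z_{t-1},\theta_\beta)]'\,\Psi_{\beta pt-1}^{-1}R_t \tag{4.13}$$
²⁷ See Grinblatt and Titman (1987), Huberman, Kandel, and Stambaugh (1987), Lehmann (1987), Lehmann and Modest (1988), Lehmann (1990), and Shanken (1992) for related discussions. In econometric terms, the portfolio weights that implicitly arise in cross-sectional regression models with arbitrary matrices Γ solve the programming problems:
$$\min_{w_{pt-1}} w_{pt-1}'\Gamma w_{pt-1} \ \text{ subject to } \ w_{pt-1}'\iota = 1 \ \text{ and } \ w_{pt-1}'\beta(z_{t-1},\theta_\beta) = 1$$
$$\min_{w_{0t-1}} w_{0t-1}'\Gamma w_{0t-1} \ \text{ subject to } \ w_{0t-1}'\iota = 1 \ \text{ and } \ w_{0t-1}'\beta(z_{t-1},\theta_\beta) = 0$$
Ordinary least squares corresponds to Γ = I, weighted least squares to $\Gamma = \mathrm{Diag}\{\mathrm{Var}[R_t \mid I_{t-1}]\}$, and generalized least squares to $\Gamma = \mathrm{Var}[R_t \mid I_{t-1}]$.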

which are, in fact, the actual, not the expected, returns of portfolios that cost one and zero units of account and that have conditional betas of zero and one, respectively. Hence, there are three related limitations on what can be measured from risky asset returns given only a conditional beta model. First, the conditional beta model is identified only up to scale: $\beta(z_{t-1},\theta_\beta)(\lambda_{pt} - \lambda_{0t})$ is observationally equivalent to $\varphi\beta(z_{t-1},\theta_\beta)(\lambda_{pt} - \lambda_{0t})/\varphi$ for any $\varphi \neq 0$. Second, the portfolio returns $\hat\lambda_{0t}$ and $\hat\lambda_{pt} - \hat\lambda_{0t}$ have expected returns $\lambda_{0t}$ and $\lambda_{pt} - \lambda_{0t}$, respectively, but the expected returns can only be recovered with an explicit time series model for $E[R_t \mid I_{t-1}]$.²⁸ Third, the pricing kernel $Q_t$ cannot be recovered from this model: only $R_{pt}$, the return of the portfolio of these N risky assets that is maximally correlated with $Q_t$, can be identified from $\hat\lambda_{pt}$ in the limit (i.e., as $\beta(z_{t-1},\hat\theta_\beta) \to \beta(z_{t-1},\theta_\beta)$).
4.2. Multifactor models
Another parametric assumption that facilitates estimation and inference is a linear model for $Q_t$. The typical linear models found in the literature simultaneously strengthen and weaken the assumptions concerning the pricing kernel. Clearly, linearity is more restrictive than possible nonlinear functional forms. However, linear models generally involve weakening the assumption that $Q_t$ is known up to an unknown parameter vector, since the weights are usually treated as unobservable variables.
Some equilibrium models restrict $Q_t$ to be a linear combination (that is, a portfolio) of the returns of portfolios. In intertemporal asset pricing theory, these portfolios let investors hedge against fluctuations in investment opportunities (cf., Merton (1973) and Breeden (1979)). Related results are available from portfolio separation theory, in which such portfolios are optimal for particular preferences (cf., Cass (1970)) or for particular distributions of returns (cf., Ross (1978a)).
Similarly, the Arbitrage Pricing Theory (APT) of Ross (1976) and Ross (1977) combines the no-arbitrage assumption with distributional assumptions describing diversification prospects to produce an approximate linear model for $Q_t$.²⁹ In these circumstances, the pricing kernel $Q_t$ (typically without any adjustment for inflation) follows the linear model:
$$Q_t = \omega_{xt-1}'x_t + \omega_{mt-1}'R_{mt}; \qquad Q_t > 0; \qquad \omega_{xt-1}, \omega_{mt-1} \in I_{t-1} \tag{4.14}$$
where $x_t$ is a vector of variables that are not asset returns while $R_{mt}$ is a vector of portfolio returns. These models typically place no restrictions on the (unobserved) weights $\omega_{xt-1}$ and $\omega_{mt-1}$ save for the requirement that they are based on information available at time t-1 and that they result in strictly positive values of
²⁸ Moments of $\hat\lambda_{0t}$ and $\hat\lambda_{pt} - \hat\lambda_{0t}$ can be estimated. For example, the projection of $\hat\lambda_{0t}$ and $\hat\lambda_{pt} - \hat\lambda_{0t}$ on $z_{t-1} \in I_{t-1}$ recovers the unconditional projection of $\lambda_{0t}$ and $\lambda_{pt} - \lambda_{0t}$ on $z_{t-1} \in I_{t-1}$ in large samples.
²⁹ The APT as developed by Ross (1976) and Ross (1977) places insufficient restrictions on asset prices to identify $Q_t$. In order to obtain the formulation (4.14), sufficient restrictions must be placed on preferences and investment opportunities so that diversifiable risk commands no risk premium.

$Q_t$.³⁰ Put differently, a model takes the more general form $Q(x_t,\theta_Q)$ when $\omega_{xt-1}$ and $\omega_{mt-1}$ are parameterized as $\omega_x(z_{t-1},\theta)$ and $\omega_m(z_{t-1},\theta)$.
Accordingly, consider the linear conditional multifactor model:
$$R_t = \alpha_t + B_x(z_{t-1},\theta_{Bx})x_t + B_m(z_{t-1},\theta_{Bm})R_{mt} + \varepsilon_{Bt} \tag{4.15}$$
The imposition of the moment conditions (2.6) yields the associated restriction on the intercept vector:
$$\alpha_t = [\iota - B_m(z_{t-1},\theta_{Bm})\iota]\lambda_{0t} - B_x(z_{t-1},\theta_{Bx})\lambda_{xt}; \qquad \lambda_{xt} = \lambda_{0t}\left(E[x_tx_t' \mid I_{t-1}]\omega_{xt-1} + E[x_tR_{mt}' \mid I_{t-1}]\omega_{mt-1}\right) \tag{4.16}$$
so that, in principle, $\omega_{xt-1}$ and $\omega_{mt-1}$ can be inverted from the expression for $\lambda_{xt}$. Finally, insertion of this expected return relation into the multifactor model yields:
$$R_t = \iota\lambda_{0t} + B_x(z_{t-1},\theta_{Bx})[x_t - \lambda_{xt}] + B_m(z_{t-1},\theta_{Bm})[R_{mt} - \iota\lambda_{0t}] + \varepsilon_{Bt}; \qquad E[\varepsilon_{Bt} \mid I_{t-1}] = 0 \tag{4.17}$$
Once again, the residual vector has conditional mean zero because expected returns are spanned by the factor loading matrix $B(z_{t-1},\theta_B)$ and a vector of ones.³¹ As is readily apparent, this model requires estimates of the conditional mean vector and covariance matrix of $(x_t'\ R_{mt}')'$.
Note that no restrictions are placed on $E[R_{mt} \mid I_{t-1}]$ in (4.17). If the econometrician observes the returns $R_{mt}$ and the variables $x_t$ with no additional information on $Q_t$, the absence of a model linking $R_{mt}$ with $Q_t$ eliminates the restrictions on $E[R_{mt} \mid I_{t-1}]$ that arise from the moment condition $E[Q_tR_{mt} \mid I_{t-1}] = \iota$. The same observation would hold if the returns of portfolio p were observed in (4.10)-(4.13). Put differently, a linear combination of the returns $R_{mt}$ or of the return $R_{pt}$ provides a scale-free proxy for $Q_t$. In the absence of data on or of a model for $Q_t$, asset pricing relations explain relative asset prices and expected returns, not the levels of asset prices and risk premiums. As with the imposition of conditional beta models, linear factor models simplify estimation and inference by weakening the information requirements.
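The intercept restriction (4.16) can also be verified numerically. In the hypothetical four-state economy below, the kernel is linear in a non-traded factor $x_t$ and in a payoff whose normalized return plays the role of a single $R_{mt}$; a test asset return is projected on $(1, x_t, R_{mt})$ and the fitted intercept is compared with $[1 - B_m]\lambda_0 - B_x\lambda_x$, where $\lambda_x = \lambda_0E[x_tQ_t]$ is what the expression for $\lambda_{xt}$ in (4.16) reduces to. All numbers, weights, and helper names are illustrative assumptions, and conditioning on $I_{t-1}$ is suppressed.

```python
# Hypothetical single-period check of (4.16) with one non-traded factor x and
# one traded portfolio return R_m; conditioning on I_{t-1} is suppressed.

def expect(pi, v):
    return sum(p * a for p, a in zip(pi, v))

def cov(pi, a, b):
    return expect(pi, [u * w for u, w in zip(a, b)]) - expect(pi, a) * expect(pi, b)

def check_intercept_restriction():
    pi = [0.25, 0.25, 0.25, 0.25]
    x = [1.0, -0.5, 0.3, -0.8]       # non-traded factor
    xm = [0.9, 1.1, 1.3, 1.0]        # payoff of the factor portfolio
    # Linear kernel Q = 0.05 x + 0.9 xm, i.e. a linear combination of x and R_m
    q = [0.05 * a + 0.9 * b for a, b in zip(x, xm)]
    assert all(v > 0 for v in q), "the kernel must be strictly positive"

    def to_return(payoff):
        price = expect(pi, [a * b for a, b in zip(q, payoff)])   # price = E[Q X]
        return [v / price for v in payoff]

    rm = to_return(xm)
    r = to_return([1.2, 0.9, 1.0, 1.4])   # an arbitrary test asset

    # Projection of R on (1, x, R_m): solve the 2x2 normal equations for B_x, B_m
    sxx, sxm, smm = cov(pi, x, x), cov(pi, x, rm), cov(pi, rm, rm)
    cx, cm = cov(pi, x, r), cov(pi, rm, r)
    det = sxx * smm - sxm * sxm
    bx = (smm * cx - sxm * cm) / det
    bm = (sxx * cm - sxm * cx) / det
    alpha = expect(pi, r) - bx * expect(pi, x) - bm * expect(pi, rm)

    # Restriction (4.16) with lambda_0 = 1 / E[Q] and lambda_x = lambda_0 E[x Q]
    lam0 = 1.0 / expect(pi, q)
    lamx = lam0 * expect(pi, [a * b for a, b in zip(x, q)])
    return alpha, (1.0 - bm) * lam0 - bx * lamx

alpha, alpha_416 = check_intercept_restriction()
print(alpha, alpha_416)
```

The two intercepts agree because the projection residual is orthogonal to $x_t$ and $R_{mt}$, and hence to the linear kernel built from them.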
Linearity of the pricing kernel confers three modest advantages compared with the conditional beta models of the previous section: (1) the derivatives of the conditional mean and variance of $Q(x_t,\theta_Q)$ are no longer required; (2) the conditional covariance matrices involving $x_t$ and $R_{mt}$ contain no unknown model parameters (in contrast to $\mathrm{Var}[Q(x_t,\theta_Q) \mid I_{t-1}]$); and (3) the linear model permits $\omega_{xt-1}$ and $\omega_{mt-1}$ to remain unobservable. The third point comes at a cost: the
³⁰ Imposing the positivity constraint in linear models is sometimes quite difficult.
³¹ Since the multifactor models described above are cast in terms of $Q_t$, $[\iota - B_m(z_{t-1},\theta_{Bm})\iota]$ will not be identically zero. In multifactor models with no explicit link between $Q_t$ and the underlying common factors, this remains a possibility. See Huberman, Kandel, and Stambaugh (1987), Huberman and Kandel (1987), and Lehmann and Modest (1988) for a discussion of this issue.

82 B. N. Lehmann model places no restrictions on the levels of asset prices and risk premiums. Once again, additional simplifications arise if there is an observed riskless rate. Multifactor models also take the form of prespecified beta models. The analysis of these models parallels that of the single beta case in (4.10)—(4.13). A conditional factor loading model B(zt_l, 6B) can only be identified up to scale and, at best, the econometrician can estimate the returns of the minimum variance basis portfolios, each with a loading of one on one factor and loadings of zero on the others. In terms of the single beta representation, a portfolio of these optimal basis portfolios with time-varying weights has returns that are maximally correlated with Qt or, equivalently, a linear combination of Bf&^jdg) is proportional to the conditional betas B in this multifactor prespecified beta model. 4.3. Diversifiable residual models and estimation in large cross-sections One other simplifying assumption is often made in these models: that the residual vectors are only weakly correlated cross-sectionally. This restriction is the principal assumption of the APT and it implies that residual risk can be eliminated in large, well-diversified portfolios. It is convenient econometrically for the same reason; the impact of residuals on estimation can be eliminated through diversification in large cross-sections. In terms of efficient estimation of beta pricing models, this assumption facilitates estimation of "P^_i, the remaining component of the efficient GMM weighting matrix. To be sure, efficient estimation could proceed by postulating a model for Wpt-i in (4.7) of the form ^z^). However, it is unlikely that an econometrician, particularly one using semiparametric methods, would possess reliable prior information of this form save for the factor models of Section 4.2. Accordingly, consider the addition of a linear factor model to the conditional beta models. 
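Once again, consider the projection:³²

$$R_t = \alpha_t + \beta(z_{t-1},\theta_\beta)\,Q(x_t,\theta_Q) + B_x(z_{t-1},\theta_{Bx})\,x_t + B_m(z_{t-1},\theta_{Bm})\,R_{mt} + \varepsilon_{\beta Bt} \tag{4.19}$$

and the application of the pricing relation to the intercept vector:

$$\alpha_t = [\iota - B_m(z_{t-1},\theta_{Bm})\iota]\lambda_{0t} - \beta(z_{t-1},\theta_\beta)\lambda_{Qt} - B_x(z_{t-1},\theta_{Bx})\lambda_{xt}, \tag{4.20}$$

which, after rearranging terms and insertion into (4.19), yields:

$$
\begin{aligned}
R_t &= \iota\lambda_{0t} + \beta(z_{t-1},\theta_\beta)[Q(x_t,\theta_Q) - \lambda_{Qt}] + B_x(z_{t-1},\theta_{Bx})[x_t - \lambda_{xt}] + B_m(z_{t-1},\theta_{Bm})[R_{mt} - \iota\lambda_{0t}] + \varepsilon_{\beta Bt};\\
\lambda_{Qt} &= \lambda_{0t}E[Q(x_t,\theta_Q)^2\mid I_{t-1}]; \qquad \Psi_{\beta Bt-1} = E[\varepsilon_{\beta Bt}\varepsilon_{\beta Bt}'\mid I_{t-1}]; \qquad \lambda_{xt} = \lambda_{0t}E[x_tQ(x_t,\theta_Q)\mid I_{t-1}].
\end{aligned}
\tag{4.21}
$$

³² Of course, one element of $(x_t', R_{mt}')'$ must be dropped if $(x_t', R_{mt}')'$ and $Q(x_t,\theta_Q)$ are linearly dependent.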
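The rearrangement from (4.19) and (4.20) to (4.21) can be spelled out in two steps. The sketch below assumes, as the spanning argument in the text requires, that $E[\varepsilon_{\beta Bt}Q(x_t,\theta_Q)\mid I_{t-1}] = 0$, that the traded factors are priced so that $E[R_{mt}Q(x_t,\theta_Q)\mid I_{t-1}] = \iota$, that $\alpha_t$ and the loadings are known at $t-1$, and that $\lambda_{0t} = E[Q(x_t,\theta_Q)\mid I_{t-1}]^{-1}$; $\beta(\cdot)$, $B_x(\cdot)$ and $B_m(\cdot)$ abbreviate the loading functions evaluated at $z_{t-1}$.

$$
\begin{aligned}
\iota &= E[R_tQ(x_t,\theta_Q)\mid I_{t-1}] = \alpha_t\lambda_{0t}^{-1} + \beta(\cdot)E[Q(x_t,\theta_Q)^2\mid I_{t-1}] + B_x(\cdot)E[x_tQ(x_t,\theta_Q)\mid I_{t-1}] + B_m(\cdot)\iota\\
\Rightarrow\quad \alpha_t &= [\iota - B_m(\cdot)\iota]\lambda_{0t} - \beta(\cdot)\lambda_{Qt} - B_x(\cdot)\lambda_{xt},\\
R_t &= \alpha_t + \beta(\cdot)Q(x_t,\theta_Q) + B_x(\cdot)x_t + B_m(\cdot)R_{mt} + \varepsilon_{\beta Bt}\\
&= \iota\lambda_{0t} + \beta(\cdot)[Q(x_t,\theta_Q) - \lambda_{Qt}] + B_x(\cdot)[x_t - \lambda_{xt}] + B_m(\cdot)[R_{mt} - \iota\lambda_{0t}] + \varepsilon_{\beta Bt}.
\end{aligned}
$$

The first line multiplies (4.19) by $Q(x_t,\theta_Q)$ and takes conditional expectations; multiplying it by $\lambda_{0t}$ gives (4.20) with $\lambda_{Qt}$ and $\lambda_{xt}$ as defined in (4.21), and substituting back gives the first line of (4.21).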

When all of these components are present in the model, assume that a vector of ones does not lie in the column span of either $B_x(z_{t-1},\theta_{Bx})$ or $B_m(z_{t-1},\theta_{Bm})$. This formulation nests all of the models in the preceding subsections. When $B_x(z_{t-1},\theta_{Bx})$ and $B_m(z_{t-1},\theta_{Bm})$ are identically zero, equations (4.21) yield the conditional beta model (4.7) or, in the absence of the pricing kernel model $Q(x_t,\theta_Q)$, the prespecified beta model (4.11). Similarly, when $\beta(z_{t-1},\theta_\beta)$ is identically zero, equations (4.21) yield the observable linear factor model (4.17) or, without observations on $x_t$ and $R_{mt}$, the multifactor analogue of the prespecified beta model. When all components are included simultaneously, the conditional factor model places structure on the conditional covariance matrix of the residuals $\varepsilon_{\beta t}$ in the conditional beta model (4.7). This factor model represents more than mere elegant variation: it makes it plausible to place a priori restrictions on the conditional variance matrix $\Psi_{\beta t-1}$. In terms of the conditional beta model (4.7), the residual covariance matrix $\Psi_{\beta t-1}$ has an observable factor structure in this model given by:³³

$$\Psi_{\beta t-1} = \begin{pmatrix} B_x(z_{t-1},\theta_{Bx}) & B_m(z_{t-1},\theta_{Bm}) \end{pmatrix} \operatorname{Var}\!\left[\begin{pmatrix} x_t \\ R_{mt} \end{pmatrix} \,\middle|\, I_{t-1}\right] \begin{pmatrix} B_x(z_{t-1},\theta_{Bx})' \\ B_m(z_{t-1},\theta_{Bm})' \end{pmatrix} + \Psi_{\beta Bt-1} = B_{\beta Bt-1}V_{\beta Bt-1}B_{\beta Bt-1}' + \Psi_{\beta Bt-1} \tag{4.22}$$

and its inverse is given by:

$$\Psi_{\beta t-1}^{-1} = \Psi_{\beta Bt-1}^{-1} - \Psi_{\beta Bt-1}^{-1}B_{\beta Bt-1}\big(V_{\beta Bt-1}^{-1} + B_{\beta Bt-1}'\Psi_{\beta Bt-1}^{-1}B_{\beta Bt-1}\big)^{-1}B_{\beta Bt-1}'\Psi_{\beta Bt-1}^{-1}. \tag{4.23}$$

Hence, the factor model provides the final input necessary for the efficient estimation of beta pricing models.

Chamberlain and Rothschild (1983) provide a convenient characterization of diversifiability restrictions for residuals like $\varepsilon_{\beta Bt}$. They assume that the largest eigenvalue of the conditional residual covariance matrix $\Psi_{\beta Bt-1}$ remains bounded as the number of assets grows without bound.
This condition is sufficient for a weak law of large numbers to apply because the residual variance of a portfolio with weights of order $1/N$ (i.e., one for which $w_{t-1}'w_{t-1} \to 0$ as $N \to \infty$) converges to zero, since

$$w_{t-1}'\Psi_{\beta Bt-1}w_{t-1} \le \xi_{\max}(\Psi_{\beta Bt-1})\,w_{t-1}'w_{t-1} \to 0 \quad \text{as } N \to \infty,$$

where $\xi_{\max}(\cdot)$ is the largest eigenvalue of its argument.

³³ Unobservable factor models can be imposed as well, as long as the associated conditional betas are constant. The methods developed for the iid case in Chamberlain and Rothschild (1983), Connor and Korajczyk (1988) and Lehmann and Modest (1988) apply since the residuals in this application are serially uncorrelated. Lehmann (1992) discusses the serially correlated case.
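The bounded-eigenvalue argument can be illustrated with a small simulation-free computation. The sketch assumes a hypothetical banded residual covariance (each asset correlated only with its neighbours, so $\xi_{\max}$ stays bounded as $N$ grows) and equal weights $w = \iota/N$. It reports the portfolio residual variance $w'\Psi w$ next to the bound $\xi_{\max}(\Psi)\,w'w$, and, for contrast, the variance when a common component is added to every covariance, in which case $\xi_{\max}$ grows with $N$ and diversification stalls.

```python
import numpy as np


def banded_cov(n, var=0.04, rho=0.3):
    """Hypothetical residual covariance: each asset is correlated only with its neighbours,
    so the largest eigenvalue stays below var * (1 + 2 * rho) however large n is."""
    cov = np.diag(np.full(n, var))
    idx = np.arange(n - 1)
    cov[idx, idx + 1] = rho * var
    cov[idx + 1, idx] = rho * var
    return cov


def portfolio_residual_variance(cov, w):
    """w' Psi w: residual variance of the portfolio with weights w."""
    return float(w @ cov @ w)


results = []
for n in (10, 100, 1000):
    psi = banded_cov(n)
    w = np.full(n, 1.0 / n)                              # weights of order 1/N, so w'w = 1/N
    weak = portfolio_residual_variance(psi, w)
    bound = np.linalg.eigvalsh(psi).max() * (w @ w)      # xi_max(Psi) * w'w
    # A common component (not diversifiable) makes xi_max grow linearly in N.
    strong = portfolio_residual_variance(psi + 0.01 * np.ones((n, n)), w)
    results.append((n, weak, bound, strong))
    print(n, round(weak, 6), round(bound, 6), round(strong, 6))
```

The weakly correlated case shrinks roughly like $1/N$ and always sits below the eigenvalue bound, while the case with a common component levels off near 0.01, the variance of the common part.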

Unfortunately, there is no obvious way to estimate $\Psi_{\beta Bt-1}$ subject to this boundedness condition.³⁴ Hence, the imposition of diversification constraints in practice generally involves the stronger assumption of a strict factor structure, that is, that $\Psi_{\beta Bt-1}$ is diagonal. Of course, there is no guarantee that a diagonal specification leads to an estimator of higher efficiency than an identity matrix (that is, ordinary least squares) when generalized least squares is appropriate, as would be the case if $\Psi_{\beta Bt-1}$ is unrestricted save for the diversifiability condition $\lim_{N\to\infty}\xi_{\max}(\Psi_{\beta Bt-1}) < \infty$. While weighted least squares may in fact be superior in most applications, conservative inference can be conducted assuming that this specification is false. In any event, the econometrician can allow for a generous amount of dependence in the idiosyncratic variances in the diagonal specification.

What is the large cross-section behavior of GMM estimators, assuming that a weak law applies to the residuals? To facilitate large $N$ analysis, append the subscript $N$ to the residuals $\varepsilon_{\beta BNt}$ and to the associated parameter vectors and matrices $\beta_N(z_{t-1},\theta_{\beta N})$, $B_{xN}(z_{t-1},\theta_{BxN})$, $B_{mN}(z_{t-1},\theta_{BmN})$, and $\Psi_{\beta BNt-1}$, and take all limits as $N$ grows without bound by adding elements to vectors and rows to matrices as securities are added to the asset menu. An arbitrary conditional GMM estimator can be calculated from:

$$\frac{1}{T}\sum_{t=1}^{T} A_{\beta BNt-1}\Big\{R_t - \iota\hat\lambda_{0t} - \beta_N(z_{t-1},\hat\theta_{\beta N})[Q(x_t,\hat\theta_Q) - \hat\lambda_{Qt}] - B_{xN}(z_{t-1},\hat\theta_{BxN})[x_t - \hat\lambda_{xt}] - B_{mN}(z_{t-1},\hat\theta_{BmN})[R_{mt} - \iota\hat\lambda_{0t}]\Big\} = 0, \tag{4.24}$$

where $A_{\beta BNt-1}$ is a sequence of $p \times N$ $O_p(1)$ matrices chosen by the econometrician, having full row rank, for which $\xi_{\min}(A_{\beta BNt-1}A_{\beta BNt-1}') \to \infty$ as $N \to \infty$, where $\xi_{\min}(\cdot)$ is the smallest eigenvalue of its argument. This latter condition ensures that the weights are diversified across securities and not concentrated on only a few assets. Examination of the estimating equations (4.24) reveals the benefits of large cross-sections when residuals are diversifiable.
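Under the strict factor structure just described ($\Psi_{\beta Bt-1}$ diagonal), the inverse covariance matrix in (4.23) needs only a $K \times K$ inversion, where $K$ is the number of factors, rather than an $N \times N$ one. The sketch below assumes hypothetical loadings, factor covariance and idiosyncratic variances at a single date, builds $\Psi_{\beta t-1}$ as in (4.22), and checks the identity (4.23) against a direct computation. The function name `factor_cov_inverse` is illustrative, not from the text.

```python
import numpy as np


def factor_cov_inverse(B, V, psi_diag):
    """Inverse of B V B' + diag(psi) via the identity in (4.23).

    Only the K x K matrix V^{-1} + B' Psi^{-1} B is inverted; Psi is assumed diagonal
    (a strict factor structure), so Psi^{-1} is computed elementwise.
    """
    psi_inv = 1.0 / np.asarray(psi_diag, dtype=float)
    psi_inv_B = B * psi_inv[:, None]               # Psi^{-1} B, shape (N, K)
    core = np.linalg.inv(V) + B.T @ psi_inv_B      # V^{-1} + B' Psi^{-1} B, shape (K, K)
    return np.diag(psi_inv) - psi_inv_B @ np.linalg.solve(core, psi_inv_B.T)


rng = np.random.default_rng(1)
N, K = 200, 3                                      # hypothetical numbers of assets and factors
B = rng.normal(size=(N, K))                        # hypothetical loadings B_{beta B}
A = rng.normal(size=(K, K))
V = A @ A.T + np.eye(K)                            # positive definite factor covariance V_{beta B}
psi = rng.uniform(0.5, 1.5, size=N)                # hypothetical idiosyncratic variances
Sigma = B @ V @ B.T + np.diag(psi)                 # Psi_{beta t-1} as in (4.22)

Sigma_inv = factor_cov_inverse(B, V, psi)
print(np.allclose(Sigma_inv @ Sigma, np.eye(N)))
```

The cost of the structured inverse grows linearly in $N$ apart from forming the final $N \times N$ product, which is one practical reason the diagonal specification is attractive in large cross-sections.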
The sample and population residuals are related by:

$$
\begin{aligned}
\hat\varepsilon_{\beta BNt} = {} & \varepsilon_{\beta BNt} + \iota(\lambda_{0t} - \hat\lambda_{0t}) + \big\{\beta_N(z_{t-1},\theta_{\beta N})[Q(x_t,\theta_Q) - \lambda_{Qt}] - \beta_N(z_{t-1},\hat\theta_{\beta N})[Q(x_t,\hat\theta_Q) - \hat\lambda_{Qt}]\big\}\\
& + [B_{xN}(z_{t-1},\theta_{BxN}) - B_{xN}(z_{t-1},\hat\theta_{BxN})]x_t + [B_{xN}(z_{t-1},\hat\theta_{BxN})\hat\lambda_{xt} - B_{xN}(z_{t-1},\theta_{BxN})\lambda_{xt}]\\
& + [B_{mN}(z_{t-1},\theta_{BmN}) - B_{mN}(z_{t-1},\hat\theta_{BmN})]R_{mt} + [B_{mN}(z_{t-1},\hat\theta_{BmN})\iota\hat\lambda_{0t} - B_{mN}(z_{t-1},\theta_{BmN})\iota\lambda_{0t}],
\end{aligned}
\tag{4.25}
$$

the first component of which is the population residual vector $\varepsilon_{\beta BNt}$ and the remaining components of which represent the difference between the population and fitted parts of the model. Clearly, $\varepsilon_{\beta BNt}$ can be eliminated through diversification and, hence, the application of $A_{\beta BNt-1}$ to $\varepsilon_{\beta BNt}$ will do so, since it places implicit weights of order $1/N$ on each asset as the number of assets grows without bound. However, the benefits of diversification have a limit because of the difference between the population and fitted parts of the model. For example, the sampling errors in $Q(x_t,\theta_Q)$, $\lambda_{Qt}$, $\lambda_{xt}$, $B_{xN}(z_{t-1},\theta_{BxN})$ and $B_{mN}(z_{t-1},\theta_{BmN})$ generally cannot be diversified away in a single cross-section. To be sure, some components of $\hat\varepsilon_{\beta BNt}$ are amenable to diversification in some models. For example, if $\beta(z_{t-1},\theta_\beta)$ is identically zero (i.e., if the pricing kernel $Q_t$ is given by $\omega_{xt-1}'x_t + \omega_{mt-1}'R_{mt}$) and if the models for both $B_{xN}(z_{t-1},\theta_{BxN})$ and $B_{mN}(z_{t-1},\theta_{BmN})$ are linear, the sampling errors $B_{xN}(z_{t-1},\hat\theta_{BxN}) - B_{xN}(z_{t-1},\theta_{BxN})$ and $B_{mN}(z_{t-1},\hat\theta_{BmN}) - B_{mN}(z_{t-1},\theta_{BmN})$ can, in principle, be eliminated through diversification. In this case, the only risk premium that can be consistently estimated from a single cross-section is $\lambda_{0t}$, since the difference $\hat\lambda_{xt} - \lambda_{xt}$ can only be eliminated in large time series samples.³⁵

³⁴ Recently, Ledoit (1994) has proposed estimating covariance matrices using shrinkage estimators of the eigenvalues, an approach that might work here.

³⁵ This point has resulted in much confusion in the beta pricing literature. The literature abounds with inferences drawn from cross-sectional regressions of returns on the betas of individual assets computed with respect to particular portfolios. If the betas in these prespecified beta models are computed with respect to an efficient portfolio, the best one can do in a single cross-section (with a priori knowledge of the population betas and return covariance matrix) is to recover the returns of the efficient portfolio. Information on the risk premium of portfolios like $p$ in Section 4.1 can only be recovered over time, while the return of portfolio 0 converges to the riskless rate in a single cross-section if the residuals of the prespecified beta model are diversifiable given the population value of $\beta_{pt-1}$. Shanken (1992) shows that this is the case using the sample analogue of $\beta_{pt-1}$ in a model with constant conditional betas and independently and identically distributed idiosyncratic disturbances, given appropriate corrections for biases induced by sampling error. See also Lehmann (1988) and Lehmann (1990).

4.4. Feasible (nearly efficient) conditional GMM estimation of beta pricing models

With these preliminaries in mind, we now consider efficient conditional GMM estimation of the composite conditional beta model (4.21). In this model, the optimal choice of $A_{\beta Bt-1}$ is $A^{o}_{\beta Bt-1} = \Phi_{\beta Bt-1}\Psi_{\beta t-1}^{-1}$, where $\Psi_{\beta t-1}^{-1}$ is given by (4.23) and

$$\Phi_{\beta Bt-1} = -\frac{\partial}{\partial\theta}\Big\{\big[\iota - \beta(z_{t-1},\theta_\beta)\operatorname{Var}[Q(x_t,\theta_Q)\mid I_{t-1}] - B_x(z_{t-1},\theta_{Bx})\operatorname{Cov}[x_t,Q(x_t,\theta_Q)\mid I_{t-1}] - B_m(z_{t-1},\theta_{Bm})\iota\big]\lambda_{0t} + B_m(z_{t-1},\theta_{Bm})E[R_{mt}\mid I_{t-1}]\Big\}', \tag{4.26}$$

with $\lambda_{0t} = E[Q(x_t,\theta_Q)\mid I_{t-1}]^{-1}$. In the original formulation (3.6)-(3.9), efficient estimation required $\Phi_{t-1}$, the derivatives of the conditional expectation of $Q(x_t,\theta_Q)R_t$, and $\Psi_{t-1}$, the conditional covariance matrix of $R_tQ(x_t,\theta_Q) - \iota$. Equations (4.21) reflect the kinds of assumptions that the econometrician can make to facilitate efficient estimation. The conditional beta model eases the evaluation of the beta pricing version of $\Phi_{t-1}$, and the factor model assumption places structure on the associated analogue of $\Psi_{t-1}$.

Consistent estimation of $A^{o}_{\beta Bt-1}$ requires the evaluation of a number of conditional moments ($\lambda_{0t}$, $E[R_{mt}\mid I_{t-1}]$, $\operatorname{Var}[Q(x_t,\theta_Q)\mid I_{t-1}]$, and $\operatorname{Cov}[x_t,Q(x_t,\theta_Q)\mid I_{t-1}]$) and their derivatives, when necessary, along with $B_{\beta Bt-1}$, $V_{\beta Bt-1}$, and $\Psi_{\beta Bt-1}$. The most common strategy by far is simply to assume that the relevant conditional moments are time-invariant functions of available information. This strategy was taken throughout this section in the models for conditional betas and conditional factor loadings. For the evaluation of $A^{o}_{\beta Bt-1}$, this approach requires the econometrician to posit relations of the form:

$$
\begin{aligned}
\lambda_{0t} &= E[Q(x_t,\theta_Q)\mid z_{t-1}]^{-1} = \lambda_0(z_{t-1},\delta),\\
E[R_{mt}\mid z_{t-1}] &= \lambda_m(z_{t-1},\delta),\\
\operatorname{Var}[Q(x_t,\theta_Q)\mid z_{t-1}] &= \sigma_Q^2(z_{t-1},\delta),\\
\operatorname{Cov}[x_t,Q(x_t,\theta_Q)\mid z_{t-1}] &= \sigma_{Qx}(z_{t-1},\delta),\\
\operatorname{Var}[(x_t', R_{mt}')'\mid z_{t-1}] &= V_{\beta B}(z_{t-1},\delta),\\
\operatorname{Var}[\varepsilon_{\beta Bt}\mid z_{t-1}] &= \Psi_{\beta B}(z_{t-1},\delta),
\end{aligned}
\tag{4.27}
$$

which permit the consistent estimation of $A^{o}_{\beta Bt-1}$ using initial consistent estimates of $\theta$.

It is far from obvious that a financial econometrician can be expected to have reliable prior information in this form. In most asset pricing applications, the possession of such information about the conditional second moments $\sigma_Q^2(z_{t-1},\delta)$ and $\sigma_{Qx}(z_{t-1},\delta)$ is somewhat more plausible than the existence of the corresponding conditional first moment specifications $\lambda_0(z_{t-1},\delta)$ and $\lambda_m(z_{t-1},\delta)$. However, observation of the riskless rate eliminates the need to model $\lambda_0(z_{t-1},\delta)$, and models for $\operatorname{Cov}[x_t,Q(x_t,\theta_Q)\mid z_{t-1}]$ seem no more demanding than those for other conditional second moments. The conditional covariance matrix $V_{\beta B}(z_{t-1},\delta)$ is somewhat less problematic as well, although the specification of multivariate conditional covariance models is in its infancy. The discussion in Section 4.3 left some ambiguity concerning the availability of plausible models of this sort for $\Psi_{\beta Bt-1}$, due to the inability to impose the general bounded eigenvalue condition. As noted there, the specification of idiosyncratic variances is comparatively straightforward if $\Psi_{\beta Bt-1}$ is diagonal. Finally, conservative inference is always available through the use of the asymptotic covariance matrix in (2.13).

Equations (4.27) can represent either parametric models for these conditional moments or functions that are estimable by semiparametric or nonparametric methods. Robinson (1987), Newey (1990), Robinson (1991), and Newey (1993) discuss bootstrap, nearest neighbor, and series estimation of functions such as those appearing in (4.27). All of these methods suffer from the curse of dimensionality, so their invocation must be justified on a case-by-case basis. Neural network approximations promising somewhat less impairment from this source might be employed as well.³⁶

5. Concluding remarks

This paper shows that efficient semiparametric estimation of asset pricing relations is straightforward in principle, if not in practice. Efficiency follows from the maximum correlation property of the optimal GMM estimators described in the second section, a property that has analogues in the optimal hedge portfolios that arise in asset pricing theory. The semiparametric nature of asset pricing relations naturally leads to a search for efficiency gains in the context of beta pricing models. The structure of these models suggests that efficient estimation is made feasible by the imposition of conditional beta models and/or multifactor models with residuals that satisfy a law of large numbers in the cross-section, models that exist in various incarnations in the beta pricing literature. Hence, strategies that have proved useful in the iid environment have natural, albeit nonlinear and perhaps nonparametric, analogues in this more general setting, the details of which are worked out in the paper.
While it has offered no evidence on the magnitude of possible efficiency gains, the paper has surely pointed to more straightforward interpretation and implementation than has been heretofore attainable. What remains is to extend these results in two dimensions. First, the analysis sidestepped the development of the most general approximations of the conditional moments that comprise the optimal conditional weighting matrices, the subtleties of which arise from the martingale difference nature of the residuals in no-arbitrage asset pricing models, as opposed to the independence assumption frequently made in other applications. The second dimension involves examination of less parametric semiparametric estimators. In the asset pricing arena, this amounts to semiparametric estimation of pricing kernels and state price densities, a more ambitious and perhaps more interesting task.

³⁶ Barron (1993) and Hornik et al. (1993) discuss the superior approximation properties of neural networks in the multidimensional case.

References

Bansal, R. and B. N. Lehmann (1995). Bond returns and the prices of state contingent claims. Graduate School of International Relations and Pacific Studies, University of California at San Diego.
Bansal, R. and S. Viswanathan (1993). No arbitrage and arbitrage pricing: A new approach. J. Finance 48, pp. 1231-1262.
Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory 39, pp. 930-945.
Black, F., M. C. Jensen and M. Scholes (1972). The capital asset pricing model: Some empirical tests. In: M. C. Jensen, ed., Studies in the Theory of Capital Markets. New York: Praeger.
Breeden, D. T. (1979). An intertemporal asset pricing model with stochastic consumption and investment opportunities. J. Financ. Econom. 7, pp. 265-299.
Cass, D. and J. E. Stiglitz (1970). The structure of investor preferences and asset returns and separability in portfolio allocation: A contribution to the pure theory of mutual funds. J. Econom. Theory 2, pp. 122-160.
Chamberlain, G. (1987). Asymptotic efficiency in estimation with conditional moment restrictions. J. Econometrics 34, pp. 305-334.
Chamberlain, G. (1992). Efficiency bounds for semiparametric regression. Econometrica 60, pp. 567-596.
Chamberlain, G. and M. Rothschild (1983). Arbitrage and mean-variance analysis on large asset markets. Econometrica 51, pp. 1281-1304.
Cochrane, J. (1991). Production-based asset pricing and the link between stock returns and economic fluctuations. J. Finance 46, pp. 207-234.
Connor, G. and R. A. Korajczyk (1988). Risk and return in an equilibrium APT: Application of a new test methodology. J. Financ. Econom. 21, pp. 255-289.
Constantinides, G. and W. Ferson (1991). Habit persistence and durability in aggregate consumption: Empirical tests. J. Financ. Econom. 29, pp. 199-240.
Douglas, G. W. (1968). Risk in the Equity Markets: An Empirical Appraisal of Market Efficiency. Ann Arbor, Michigan: University Microfilms, Inc.
Epstein, L. G. and S. E. Zin (1991a). Substitution, risk aversion, and the temporal behavior of consumption and asset returns: A theoretical framework. Econometrica 57, pp. 937-969.
Epstein, L. G. and S. E. Zin (1991b). Substitution, risk aversion, and the temporal behavior of consumption and asset returns: An empirical analysis. J. Politic. Econom. 99, pp. 263-286.
Fama, E. F. and J. D. MacBeth (1973). Risk, return, and equilibrium: Empirical tests. J. Politic. Econom. 81, pp. 607-636.
Grinblatt, M. and S. Titman (1987). The relation between mean-variance efficiency and arbitrage pricing. J. Business 60, pp. 97-112.
Hall, A. (1993). Some aspects of generalized method of moments estimation. In: G. S. Maddala, C. R. Rao and H. D. Vinod, eds., Handbook of Statistics: Econometrics. Amsterdam, The Netherlands: Elsevier Science Publishers, pp. 393-418.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, pp. 1029-1054.
Hansen, L. P. (1985). A method for calculating bounds on the asymptotic covariance matrices of generalized method of moments estimators. J. Econometrics 30, pp. 203-238.
Hansen, L. P., J. Heaton and E. Luttmer (1995). Econometric evaluation of asset pricing models. Rev. Financ. Stud. 8, pp. 237-274.
Hansen, L. P., J. Heaton and M. Ogaki (1988). Efficiency bounds implied by multi-period conditional moment conditions. J. Amer. Stat. Assoc. 83, pp. 863-871.
Hansen, L. P. and R. J. Hodrick (1980). Forward exchange rates as optimal predictors of future spot rates: An econometric analysis. J. Politic. Econom. 88, pp. 829-853.
Hansen, L. P. and R. Jagannathan (1991). Implications of security market data for models of dynamic economies. J. Politic. Econom. 99, pp. 225-262.

Hansen, L. P. and R. Jagannathan (1994). Assessing specification errors in stochastic discount factor models. Research Department, Federal Reserve Bank of Minneapolis, Staff Report 167.
Hansen, L. P. and S. F. Richard (1987). The role of conditioning information in deducing testable restrictions implied by dynamic asset pricing models. Econometrica 55, pp. 587-613.
Hansen, L. P. and K. J. Singleton (1982). Generalized instrumental variables estimation of nonlinear rational expectations models. Econometrica 50, pp. 1269-1286.
Harrison, M. J. and D. Kreps (1979). Martingales and arbitrage in multiperiod securities markets. J. Econom. Theory 20, pp. 381-408.
He, H. and D. Modest (1995). Market frictions and consumption-based asset pricing. J. Politic. Econom. 103, pp. 94-117.
Hornik, K., M. Stinchcombe, H. White and P. Auer (1993). Degree of approximation results for feedforward networks approximating unknown mappings and their derivatives. Neural Computation 6, pp. 1262-1275.
Huberman, G. and S. Kandel (1987). Mean-variance spanning. J. Finance 42, pp. 873-888.
Huberman, G., S. Kandel and R. F. Stambaugh (1987). Mimicking portfolios and exact asset pricing. J. Finance 42, pp. 1-9.
Ledoit, O. (1994). Portfolio selection: Improved covariance matrix estimation. Sloan School of Management, Massachusetts Institute of Technology.
Lehmann, B. N. (1987). Orthogonal portfolios and alternative mean-variance efficiency tests. J. Finance 42, pp. 601-619.
Lehmann, B. N. (1988). Mean-variance efficiency tests in large cross-sections. Graduate School of International Relations and Pacific Studies, University of California at San Diego.
Lehmann, B. N. (1990). Residual risk revisited. J. Econometrics 45, pp. 71-97.
Lehmann, B. N. (1992). Notes on dynamic factor pricing models. Rev. Quant. Finance Account. 2, pp. 69-87.
Lehmann, B. N. and D. M. Modest (1988). The empirical foundations of the arbitrage pricing theory. J. Financ. Econom. 21, pp. 213-254.
Lintner, J. (1965). Security prices and risk: The theory and a comparative analysis of A.T.&T. and leading industrials. Graduate School of Business, Harvard University.
Luttmer, E. (1993). Asset pricing in economies with frictions. Department of Finance, Northwestern University.
Merton, R. C. (1972). An analytical derivation of the efficient portfolio frontier. J. Financ. Quant. Anal. 7, pp. 1851-1872.
Merton, R. C. (1973). An intertemporal capital asset pricing model. Econometrica 41, pp. 867-887.
Miller, M. H. and M. Scholes (1972). Rates of return in relation to risk: A reexamination of some recent findings. In: M. C. Jensen, ed., Studies in the Theory of Capital Markets. New York: Praeger, pp. 79-121.
Newey, W. K. (1990). Efficient instrumental variables estimation of nonlinear models. Econometrica 58, pp. 809-837.
Newey, W. K. (1993). Efficient estimation of models with conditional moment restrictions. In: G. S. Maddala, C. R. Rao and H. D. Vinod, eds., Handbook of Statistics: Econometrics. Amsterdam, The Netherlands: Elsevier Science Publishers.
Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, pp. 703-708.
Ogaki, M. (1993). Generalized method of moments: Econometric applications. In: G. S. Maddala, C. R. Rao and H. D. Vinod, eds., Handbook of Statistics: Econometrics. Amsterdam, The Netherlands: Elsevier Science Publishers, pp. 455-488.
Robinson, P. M. (1987). Asymptotically efficient estimation in the presence of heteroskedasticity of unknown form. Econometrica 55, pp. 875-891.
Robinson, P. M. (1991). Best nonlinear three-stage least squares estimation of certain econometric models. Econometrica 59, pp. 755-786.
Roll, R. W. (1977). A critique of the asset pricing theory's tests. Part I: On past and potential testability of the theory. J. Financ. Econom. 4, pp. 129-176.

Rosenberg, B. (1974). Extra-market components of covariance in security returns. J. Financ. Quant. Anal. 9, pp. 262-274.
Rosenberg, B. and V. Marathe (1979). Tests of capital asset pricing hypotheses. Research in Finance: A Research Annual 1, pp. 115-223.
Ross, S. A. (1976). The arbitrage theory of capital asset pricing. J. Econom. Theory 13, pp. 341-360.
Ross, S. A. (1977). Risk, return, and arbitrage. In: I. Friend and J. L. Bicksler, eds., Risk and Return in Finance. Cambridge, Mass.: Ballinger.
Ross, S. A. (1978a). Mutual fund separation and financial theory: The separating distributions. J. Econom. Theory 17, pp. 254-286.
Ross, S. A. (1978b). A simple approach to the valuation of risky streams. J. Business 51, pp. 453-475.
Rubinstein, M. (1976). The valuation of uncertain income streams and the pricing of options. Bell J. Econom. Mgmt. Sci. 7, pp. 407-425.
Shanken, J. (1992). On the estimation of beta pricing models. Rev. Financ. Stud. 5, pp. 1-33.
Summers, L. H. (1985). On economics and finance. J. Finance 40, pp. 633-636.
Summers, L. H. (1986). Does the stock market rationally reflect fundamental values? J. Finance 41, pp. 591-600.
Tauchen, G. (1986). Statistical properties of generalized method of moments estimators of structural parameters obtained from financial market data. J. Business Econom. Statist. 4, pp. 397-425.

G.S. Maddala and C.R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved.

4

Modeling the term structure*

A. R. Pagan, A. D. Hall and V. Martin

1. Introduction

Models of the term structure of interest rates have assumed increasing importance in recent years in line with the need to value interest rate derivative assets. Economists and econometricians have long held an interest in the subject, as an understanding of the determinants of the term structure has always been viewed as crucial to an understanding of the impact of monetary policy and its transmission mechanism. Most of the approaches taken to the question in the finance literature have revolved around the search for common factors that are thought to underlie the term structure, and little has been borrowed from the economic or econometrics literature on the subject. The converse can also be said about the small amount of attention paid in econometric research to the finance literature models. The aim of the present chapter is to look at the connections between the two literatures, with a view to showing that a synthesis of the two may well provide some useful information for both camps.

The paper begins with a description of a standard set of data on the term structure. This results in a set of stylized facts pertaining to the nature of the stochastic processes generating yields as well as their spreads. Such a set of facts is useful in forming an opinion of the likelihood of various approaches to term structure modelling being capable of replicating the data. Section 3 outlines the various models used in both the economics and finance literature, and assesses how well these models perform in matching the stylized facts. Section 4 presents a conclusion.

* We are grateful for comments on previous versions of this paper by John Robertson, Peter Phillips and Ken Singleton. All computations were performed with a beta version of MICROFIT 4 and GAUSS 3.2.

92 A. R. Pagan, A. D. Hall and V. Martin

2. Characteristics of term structure data

2.1. Univariate properties

The data set examined involves monthly observations on 1, 3, 6 and 9 month and 10 year zero coupon bond yields over the period December 1946 to February 1991, constructed by McCulloch and Kwon (1993); this is an updated version of McCulloch (1989). Table 1 records the autocorrelation characteristics of the series, with $\rho_j$ being the $j$th autocorrelation coefficient, DF the Dickey-Fuller test, ADF(12) the Augmented Dickey-Fuller test with 12 lags, $r_t(\tau)$ the yield on zero-coupon bonds with maturity of $\tau$ months, and $sp_t(\tau)$ the spread $r_t(\tau) - r_t(1)$. It shows that there is strong evidence of a unit root in all interest rate series. Because this would imply the possibility of negative interest rates, finance modellers have generally maintained either that there is no unit root and the series feature mean reversion or, in continuous time, that an appropriate model is given by the stochastic differential equation

$$dr_t = \alpha\,dt + \sigma r_t\,d\eta_t,$$

where, throughout the paper, $d\eta_t$ is a Wiener process. Because of the "levels effect" of $r_t$ upon the volatility of interest rate changes, we can think of this as an equation in $d\log r_t$ with constant volatility, and the logarithmic transformation ensures that $r_t$ remains positive.¹ In any case, the important point to be made here is that interest rates seem to behave as integrated processes, certainly over the samples of data we possess. It may be that the autoregressive root is close to unity, rather than identical to it, but such "near integrated" processes are best handled with the integrated process technology rather than that for stationary processes.
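The DF and ADF(12) statistics in the tables are t-ratios on the lagged level in a regression of the first difference on a constant and the lagged level, with lagged differences added for the augmented version. A minimal NumPy sketch of that regression follows; it assumes a constant but no trend (consistent with the quoted 5% critical value of -2.87) and uses simulated series, not the McCulloch and Kwon data. The function name `adf_t_stat` is illustrative; packaged implementations such as statsmodels' `adfuller` also exist.

```python
import numpy as np


def adf_t_stat(y, lags=0):
    """t-statistic on rho in  dy_t = a + rho * y_{t-1} + sum_j c_j dy_{t-j} + e_t.

    lags=0 gives the plain Dickey-Fuller regression with a constant; lags=12 mirrors ADF(12).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                          # dy[i] = y[i+1] - y[i]
    target = dy[lags:]                       # drop early rows so every lagged difference exists
    cols = [np.ones_like(target), y[lags:-1]]
    for j in range(1, lags + 1):
        cols.append(dy[lags - j:-j])         # lagged difference dy_{t-j}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    s2 = resid @ resid / (len(target) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se


rng = np.random.default_rng(42)
T = 531                                      # months from December 1946 to February 1991
random_walk = np.cumsum(rng.standard_normal(T))
shocks = rng.standard_normal(T)
ar1 = np.zeros(T)
for t in range(1, T):
    ar1[t] = 0.5 * ar1[t - 1] + shocks[t]    # stationary AR(1) with coefficient 0.5

crit_5pct = -2.87                            # 5% critical value quoted under Table 1
for name, series in [("random walk", random_walk), ("AR(1), phi = 0.5", ar1)]:
    stat = adf_t_stat(series)
    print(name, round(stat, 2), "reject unit root" if stat < crit_5pct else "cannot reject")
```

The statistic is compared with Dickey-Fuller rather than normal critical values because, under the unit-root null, the t-ratio does not have a standard distribution.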
Table 1
Autocorrelation features, yields, full sample

           r(1)   r(3)   r(6)   r(9)  r(120)   sp(3)   sp(6)   sp(9)  sp(120)
DF        -2.41  -2.15  -2.12  -2.12  -1.41   -15.32  -11.67  -10.38   -5.60
ADF(12)   -2.02  -1.89  -1.91  -1.89  -1.53    -3.37   -4.21   -4.38   -4.15
ρ1          .98    .98    .99    .99    .99      .38     .59     .66     .89
ρ3                                               .33     .51     .55     .80
ρ6                                               .21     .26     .27     .55
ρ12                                              .38     .30     .26     .32
ρ1(Δr)      .02    .11    .15    .15    .07
The 5% critical value for the DF and ADF tests is -2.87.

¹ It is known that, if $r_t$ is replaced by $r_t^{\gamma}$, the restriction $\gamma > .5$ ensures a positive interest rate while, if $\gamma = .5$, $\sigma^2 < 2\alpha$ is needed.
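The $\rho_j$ rows in Tables 1 to 3 are sample autocorrelations. The sketch below assumes the usual estimator (demeaned series, with lag-$j$ cross products divided by the full sum of squares), which the text does not spell out; `autocorr` and `spread` are illustrative names. The row labelled $\rho_1(\Delta r)$ corresponds to applying the same function to first differences.

```python
import numpy as np


def autocorr(y, j):
    """Sample autocorrelation at lag j >= 1: sum_t d_t d_{t-j} / sum_t d_t^2, d = demeaned y."""
    d = np.asarray(y, dtype=float)
    d = d - d.mean()
    return float(d[j:] @ d[:-j] / (d @ d))


def spread(long_yield, short_yield):
    """sp_t(tau) = r_t(tau) - r_t(1)."""
    return np.asarray(long_yield, dtype=float) - np.asarray(short_yield, dtype=float)


alternating = np.tile([1.0, -1.0], 5)         # ten observations, mean exactly zero
print(autocorr(alternating, 1))               # -(n - 1) / n = -0.9
print(autocorr(alternating, 2))               #  (n - 2) / n =  0.8

# rho_1 of levels versus rho_1 of changes for a simulated random walk (hypothetical data).
rw = np.cumsum(np.random.default_rng(0).standard_normal(500))
print(autocorr(rw, 1), autocorr(np.diff(rw), 1))
```

For an integrated series the first-order autocorrelation of levels sits close to one, as in the yield columns, while that of the changes is small, which is the pattern of the $\rho_1$ and $\rho_1(\Delta r)$ rows.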

Modeling the term structure 93

Instead of the yields, one might examine the time series characteristics of the forward rates. The forward rate $F_t^k(\tau)$ contracted at time $t$ for a $\tau$ period bond to be bought at $t + k$ is

$$F_t^k(\tau) = \frac{1}{\tau}\big[(\tau + k)\,r_t(\tau + k) - k\,r_t(k)\big].$$

For a forward contract one period ahead this becomes

$$F_t^1(\tau) = F_t(\tau) = \frac{1}{\tau}\big[(\tau + 1)\,r_t(\tau + 1) - r_t(1)\big].$$

For reasons that become apparent later, it is also of interest to examine the properties of the forward "spreads" $Fp_t(\tau, \tau - 1) = F_t(\tau - 1) - F_{t-1}(\tau)$. These results are to be found in Table 2. Generally, the conclusions would be the same as for yields, except that the persistence in forward rate spreads is not as marked, particularly as the maturity gets longer.

As Table 1 also shows, there is a lot of persistence in spreads between short-dated maturities; after fitting an AR(2) to $sp_t(3)$, the LM test for serial correlation over 12 lags is 80.71. This persistence shows up in other transformations of the yield series. For example, the realized excess holding yield $h_{t+1}(\tau) = \tau r_t(\tau) - (\tau - 1)r_{t+1}(\tau - 1) - r_t(1)$, when $\tau = 3$, has serial correlation coefficients of .188 (lag 1), .144 (lag 8), and .111 (lag 10). Such processes are persistent, but not integrated, as the ADF(12) for $h_{t+1}(3)$ clearly shows with its value of -5.27. Papers have appeared concluding that the excess holding yield is a non-stationary process; see Evans and Lewis (1994) and Hejazi (1994). That conclusion was reached by the authors performing a Phillips-Hansen (1990) regression of $h_{t+1}(\tau)$ on $F_{t-1}(\tau)$. Applying the same test to our data, with McCulloch's forward rate series, produces an estimated coefficient on $F_{t-1}(\tau)$ of .11 with a t ratio of 10, quite consistent with both Evans and Lewis' and Hejazi's results. However, it does not seem reasonable to interpret this as evidence of non-stationarity.
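The forward rate and excess holding yield definitions above are simple arithmetic on the yield curve. The sketch below assumes a hypothetical array layout, `yields[t, m - 1]` holding the yield at date $t$ on an $m$-month zero, with continuously compounded yields in the same units at every maturity (the form the maturity-weighted formulas imply). The function names are illustrative.

```python
import numpy as np


def forward_rate(yields, t, tau, k=1):
    """F_t^k(tau) = [(tau + k) r_t(tau + k) - k r_t(k)] / tau.

    yields[t, m - 1] is the (hypothetical) yield at date t on an m-month zero coupon bond.
    """
    return ((tau + k) * yields[t, tau + k - 1] - k * yields[t, k - 1]) / tau


def excess_holding_yield(yields, t, tau):
    """h_{t+1}(tau) = tau r_t(tau) - (tau - 1) r_{t+1}(tau - 1) - r_t(1), for tau >= 2."""
    return (tau * yields[t, tau - 1]
            - (tau - 1) * yields[t + 1, tau - 2]
            - yields[t, 0])


months = np.arange(1, 13)
flat = np.full((2, 12), 5.0)                       # same 5% yield at every maturity, two dates
sloped = np.tile(4.0 + 0.1 * months, (2, 1))       # unchanged upward-sloping curve, two dates

print(forward_rate(flat, 0, 3))                    # 5.0: a flat curve implies flat forwards
print(forward_rate(sloped, 0, 3))                  # about 4.5 = (4 * 4.4 - 4.1) / 3
print(excess_holding_yield(sloped, 0, 3))          # about 0.4 = 3 * 4.3 - 2 * 4.2 - 4.1
```

With an unchanged upward-sloping curve, holding the 3-month bond for one month earns more than the 1-month rate because the bond rolls down the curve, which is why $h_{t+1}(3)$ is positive here even though nothing moves.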
Certainly the series is persistent, and an I(1) series like the forward rate exhibits extreme persistence, so regressing one upon the other can be expected to produce some "correlation"; but to conclude from this that the excess holding yield is non-stationary is quite incorrect. A fractionally integrated process that is stationary would also show such a relationship with an I(1) process. Indeed, the autocorrelation functions of the spreads and excess yields are reminiscent of those for the squares of yield changes, which have been modelled by fractionally integrated processes; see Baillie et al. (1993). Nevertheless, the strong persistence in spreads poses a substantial challenge to term structure models.²

Table 2
Autocorrelation features, forward rates, full sample

             F(1)    F(3)    F(6)    F(9)   Fp(0,1)  Fp(2,3)  Fp(5,6)  Fp(8,9)
DF          -2.28   -2.18   -2.39   -2.14   -17.08   -19.52   -20.61   -19.73
ADF(12)     -1.92   -1.97   -1.91   -1.77    -4.07    -5.17    -5.77    -?.69
ρ1            .98     .98     .98     .98      .29      .16      .11      .15
ρ3                                             .18      .06     -.12     -.00
ρ6                                             .11      .01     -.05      .08
ρ12                                            .18     -.02     -.03      .02
ρ1(Δr)        .07     .04     .09     .07
The 5% critical value for the DF and ADF tests is -2.87. (? = digit illegible in the source.)

94 A. R. Pagan, A. D. Hall and V. Martin

As is well known, there was a switch in monetary policy in the US in October 1979 away from targeting interest rates, which means that any analysis has to be re-done to ensure that the results do not simply reflect outcomes from 1979 to 1982. Table 3 therefore presents the same statistics as in Table 1 but using only pre-October 1979 data. It is apparent that the conclusions drawn above are quite robust.

It is also well known that the conditional volatility of $\Delta r_t(\tau)$ depends substantially upon the past, but the exact nature of this dependence has been subject to much less analysis. As will become clear, the most important issue is whether the conditional variance, $\sigma^2_{\tau t}$, exhibits a levels effect and, if so, exactly what relationship is likely to hold. Here we examine the evidence for a "levels effect" in volatility, i.e. whether $\sigma^2_{\tau t}$ depends on $r_{t-1}(\tau)$, concentrating upon the five yields mentioned earlier. Evidence of the effect can be marshalled in a number of ways. By far the simplest approach is to plot $(\Delta r_t(\tau) - \mu)^2$ against $r_{t-1}(\tau)$, and this is done in Fig. 1 for $r_t(1)$.³ The evidence of a levels effect looks very strong. A more structured approach is to estimate the parameters of a diffusion process for yields of the form

$$dr_t = (\alpha_1 - \beta_1 r_t)\,dt + \sigma r_t^{\gamma_1}\,d\eta_t \tag{1}$$

and to examine the estimate of $\gamma_1$. Estimating (1) requires some approximation scheme. Chan et al. (1992) consider a discretization based on the Euler scheme with $h = 1$ ($h$ being the length of the discretization step), which produces equation (2) below.

Table 3
Autocorrelation features, pre October 1979

           r(1)   r(3)   r(6)   r(9)  r(120)   sp(3)   sp(6)   sp(9)  sp(120)
DF         -.76   -.64   -.52   -.55   -.14   -11.99   -8.80   -8.20    -4.05
ADF(12)    -.79  -1.03  -1.00   -.89    .33    -2.64   -2.89   -3.10    -4.30
ρ1          .97    .97    .98    .98    .99      .46     .70     .71      .90
ρ3                                               .34     .56     .60      .84
ρ6                                               .22     .39     .38      .59
ρ12                                              .34     .38     .36      .23
ρ1(Δr)     -.14   -.07    .08    .11   -.04

² Throughout the paper we will take the term structure data as corresponding to actual observations. In practice this may not be so, as a complete term structure is interpolated from observations on parts of the curve. This may introduce biases of unknown magnitude into relationships between yields. McCulloch and Kwon (1993) interpolate with spline functions. Others, e.g. Gourieroux and Scaillet (1994), actually utilize some of the factor models discussed later to suggest forms for the yield curve that may be used for interpolation.
³ Marsh and Rosenfeld (1983) also did this and commented on the relation.
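Tables 2 and 3 summarize each series by a Dickey-Fuller (DF) statistic and by sample autocorrelations ρ_k. The following is a minimal Python sketch of those two statistics, assuming the plain DF regression with a constant and no augmentation lags (the ADF(12) rows would add twelve lagged differences). The simulated series and the function names `autocorr` and `dickey_fuller` are illustrative only; this is not the yield data or the authors' code.

```python
import random


def autocorr(x, k):
    """Sample autocorrelation of the series x at lag k (mean-adjusted)."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    num = sum(d[t] * d[t - k] for t in range(k, n))
    return num / sum(v * v for v in d)


def dickey_fuller(y):
    """t-ratio on y_{t-1} in the OLS regression dy_t = c + rho * y_{t-1} + e_t."""
    x = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(x)
    mx, mdy = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    rho = sum((v - mx) * (w - mdy) for v, w in zip(x, dy)) / sxx
    c = mdy - rho * mx
    s2 = sum((w - c - rho * v) ** 2 for v, w in zip(x, dy)) / (n - 2)
    return rho / (s2 / sxx) ** 0.5


# Illustrative data only: white noise and the random walk built from it.
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
random_walk = []
level = 0.0
for e in white_noise:
    level += e
    random_walk.append(level)

print("DF, white noise:", round(dickey_fuller(white_noise), 2))
print("DF, random walk:", round(dickey_fuller(random_walk), 2))
print("rho_1, random walk:", round(autocorr(random_walk, 1), 3))
```

Against the -2.87 critical value quoted under Table 2, the white-noise statistic rejects a unit root decisively, while the random walk's statistic typically does not, and its first autocorrelation sits close to one, like the yield columns in the tables.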

Modeling the term structure 95

Fig. 1. Plot of squared changes in one month yield against lagged yield.

$$\Delta r_t = \alpha_1 - \beta_1 r_{t-1} + \sigma r_{t-1}^{\gamma_1}\varepsilon_t, \tag{2}$$

where here and in the remainder of the paper $\varepsilon_t$ is n.i.d.(0, 1). For a given $\gamma_1$, equation (2) can be estimated by OLS simply by defining the dependent variable as $\Delta r_t r_{t-1}^{-\gamma_1}$, while the regressors become $x_t = [\,r_{t-1}^{-\gamma_1}\;\; r_{t-1}^{1-\gamma_1}\,]$, as the error term is then $\sigma\varepsilon_t$, which is n.i.d.$(0, \sigma^2)$. Because the conditional mean of $r_t$ depends only on $\alpha_1$ and $\beta_1$, while the conditional variance of $u_t = r_t - E_{t-1}(r_t)$ is $\sigma^2 r_{t-1}^{2\gamma_1}$, which does not involve these parameters, we could estimate the parameters in the following way.

1. Regress $\Delta r_t$ on 1 and $r_{t-1}$ to get $\hat\alpha_1$ and $\hat\beta_1$.
2. Since
$$E_{t-1}[u_t^2] = \sigma^2 r_{t-1}^{2\gamma_1}, \tag{3}$$
then
$$u_t^2 = \sigma^2 r_{t-1}^{2\gamma_1} + v_t, \tag{4}$$
where $E_{t-1}(v_t) = E_{t-1}[u_t^2 - E_{t-1}(u_t^2)] = 0$. Hence we can estimate $\gamma_1$ by using a nonlinear regression program, with $u_t$ replaced by the residuals from step 1.
3. We can then re-estimate $\alpha_1, \beta_1$ by a weighted regression of $\Delta r_t r_{t-1}^{-\hat\gamma_1}$ against $r_{t-1}^{-\hat\gamma_1}$ and $r_{t-1}^{1-\hat\gamma_1}$.

The above steps would produce a maximum likelihood estimator if $\varepsilon_t$ was taken to be $N(0, 1)$ and the estimation of $\gamma_1$ was done by a weighted non-linear regression on (3) using the conditional standard deviation of $v_t$ as weights.⁴ Chan et al. (1992) use a GMM estimator, which jointly estimates $\alpha_1$, $\beta_1$, $\gamma_1$ and $\sigma$ from the set of moments $E(u_t) = 0$, $E(r_{t-1}u_t) = 0$, $E(v_t) = 0$, $E(r_{t-1}v_t) = 0$. Their estimator would coincide with the one described above if the last moment condition was replaced by $E(r_{t-1}^{2\gamma_1}v_t) = 0$. A potential problem with all the estimators is that, if $\beta_1$ is close to zero, the regressors in (2) and (4) will be close to I(1), and so non-standard distribution theory almost certainly applies to the GMM estimator.

Table 4 presents estimates of the parameters $\alpha_1$, $\beta_1$ and $\gamma_1$ found by using three estimation methods. The first is based on estimating the diffusion with an Euler approximation,

$$\Delta r_{th} = \alpha_1 h - \beta_1 h\, r_{(t-1)h} + \sigma h^{1/2} r_{(t-1)h}^{\gamma_1}\varepsilon_t, \tag{5}$$

with $h = 1$; it is the estimator described above as GMM. The others stem from the modern approach of indirect estimation proposed by Gourieroux et al. (1993) and Gallant and Tauchen (1992). In these methods one simulates $K$ sets of observations $r_t^k$ ($k = 1, \ldots, K$) from (5), with a given value of $h$ (we use 1/100) and $\theta' = (\alpha_1\;\beta_1\;\gamma_1\;\sigma^2)$, and then finds the estimate of $\theta$ that sets $\sum_{t=1}^{T}\{K^{-1}\sum_{k=1}^{K} d_\phi(r_t^k; \hat\phi)\}$ to zero, where $\hat\phi$ is an estimator of the parameters of some auxiliary model found by solving $\sum_{t=1}^{T} d_\phi(r_t; \hat\phi) = 0$.⁵ The logic of the estimator is that, if the model (5) is true, then $\hat\phi \to \phi^*$, where $E[d_\phi(r_t; \phi^*)] = 0$, and the term in curly brackets estimates this expectation by simulation. Consistency and asymptotic normality of the indirect estimator follow from the properties of $\hat\phi$ under mis-specification. It is important to note that the auxiliary model need not be correct, but it should be a good representation of the data; otherwise the indirect estimator will be very inefficient. We use two auxiliary models and, in each instance, $d_\phi$ are the scores for $\phi$ from those models. The first is (5) with $h = 1$ and $\varepsilon_t$ assumed n.i.d.(0, 1) (MLE), while the second has $r_t$ following an AR(1) with EGARCH(1,1) errors.

Table 4
Estimates of diffusion process parameters

                r_t(1)         r_t(3)         r_t(6)         r_t(9)         r_t(120)
GMM     α1    .106 (2.19)    .090 (1.82)    .089 (1.80)    .091 (1.87)    .046 (1.77)
        β1    .020 (1.52)    .015 (1.24)    .015 (1.25)    .015 (1.31)    .006 (.98)
        γ1   1.351 (6.73)   1.424 (5.61)   1.532 (4.99)   1.516 (5.12)   1.178 (9.80)
MLE     α1    .071 (2.17)    .047 (2.48)    .041 (3.00)    .037 (3.82)    .015 (3.74)
        β1    .012 (.74)     .007 (7.89)    .005 (2.08)    .004 (1.08)   -.001 (2.35)
        γ1    .583 (2.39)    .648 (1.92)    .694 (2.31)    .753 (3.34)   1.136 (19.30)
EGARCH  α1    .107 (1.63)    .044 (1.89)    .045 (4.34)    .043 (1.67)    .009 (3.30)
        β1   -.004 (2.15)   -.010 (1.57)   -.008 (2.24)   -.008 (2.04)   -.009 (2.36)
        γ1    .838 (2.67)    .974 (5.73)    .947 (7.21)    .941 (3.09)   1.104 (4.88)
Asymptotic t-ratios in parentheses.

⁴ Frydman (1994) argues that the distribution of the MLE of $\beta_1$ is non-standard when $\gamma_1 = 1/2$ and there is no drift.

The visual evidence of Fig. 1 is strongly supported by the estimated parametric models, although there is considerable diversity in the estimates obtained. Perhaps the most interesting aspect of Table 4 is the fact that $\gamma_1$ tends to increase with maturity. Based on the evidence from the indirect estimators, $\gamma_1 = 1/2$ seems a reasonable choice for the shortest maturity, which would correspond to the diffusion process used by Cox et al. (1985). A problem in simply fitting a model with a "levels" effect is that the observed conditional heteroskedasticity in the data might be better accounted for by a GARCH process, and so the appropriate question is either whether there is evidence of a levels effect after removing a GARCH process, or whether a levels representation fits the data better than a GARCH model does.
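The three-step procedure for (2) to (4) can be sketched on data simulated from (2) itself. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter values, the sample size, the flooring of simulated rates at 0.01 and the grid search over γ1 in [0, 2] (standing in for "a nonlinear regression program") are all hypothetical choices, and `ols2`, `simulate` and `three_step` are illustrative names.

```python
import math
import random


def ols2(y, x1, x2):
    """Least squares of y on exactly two regressors x1, x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n, alpha1, beta1, sigma, gamma1, r0=5.0, seed=0):
    """Euler-discretized model (2) with h = 1; rates are floored at 0.01 (an assumption)."""
    rng = random.Random(seed)
    r = [r0]
    for _ in range(n):
        prev = r[-1]
        nxt = prev + alpha1 - beta1 * prev + sigma * prev ** gamma1 * rng.gauss(0.0, 1.0)
        r.append(max(nxt, 0.01))
    return r


def three_step(r, grid_step=0.02, gamma_max=2.0):
    lag = r[:-1]
    dr = [b - a for a, b in zip(r[:-1], r[1:])]
    # Step 1: OLS of dr_t on 1 and r_{t-1}; the slope estimates -beta1.
    a1, slope = ols2(dr, [1.0] * len(lag), lag)
    u2 = [(d - a1 - slope * x) ** 2 for d, x in zip(dr, lag)]
    # Step 2: nonlinear regression u_t^2 = sigma^2 r_{t-1}^(2 gamma1) + v_t.
    # For a fixed gamma1, sigma^2 has a closed form, so gamma1 is found by grid search.
    best = None
    for i in range(int(round(gamma_max / grid_step)) + 1):
        g = i * grid_step
        x = [v ** (2.0 * g) for v in lag]
        s2 = sum(a * b for a, b in zip(x, u2)) / sum(a * a for a in x)
        ssr = sum((b - s2 * a) ** 2 for a, b in zip(x, u2))
        if best is None or ssr < best[0]:
            best = (ssr, g, s2)
    _, g_hat, s2_hat = best
    # Step 3: regress dr_t r^(-g) on r^(-g) and r^(1-g), i.e. OLS on the rescaled model.
    w = [v ** (-g_hat) for v in lag]
    a1_w, slope_w = ols2([d * c for d, c in zip(dr, w)], w, [v * c for v, c in zip(lag, w)])
    return {"alpha1": a1_w, "beta1": -slope_w, "gamma1": g_hat, "sigma": math.sqrt(s2_hat)}


# Hypothetical "true" values: a slowly mean-reverting rate with gamma1 = 1/2.
estimates = three_step(simulate(10000, alpha1=0.05, beta1=0.01, sigma=0.1, gamma1=0.5))
print(estimates)
```

Profiling σ² out of step 2 keeps the nonlinear search one-dimensional; a production version would replace the grid with a proper optimizer and, for the ML variant described above, weight step 2 by the conditional standard deviation of $v_t$.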
To shed some light on these questions, our strategy was to fit augmented EGARCH(1,1) models to $\Delta r_t(\tau) = \mu + \sigma_{\tau t}\varepsilon_{\tau t}$, $\varepsilon_{\tau t} \sim N(0, 1)$, of the form

$$\log\sigma^2_{\tau t} = a_{0\tau} + a_{1\tau}\log\sigma^2_{\tau,t-1} + a_{2\tau}\varepsilon_{\tau,t-1} + a_{3\tau}\Big(|\varepsilon_{\tau,t-1}| - \sqrt{2/\pi}\Big) + \delta\, r_{t-1}(\tau). \tag{6}$$

This specification is used to generate a diagnostic test for the presence of a levels effect, and is not intended to be a good representation of the actual volatility. Hence the t-statistic for testing whether $\delta$ is zero can be regarded as a valid test against more general specifications, e.g. $\delta g(r_{t-1}(\tau))$, where $g(\cdot)$ is some function, provided $r_{t-1}(\tau)$ is correlated with $g(r_{t-1}(\tau))$. Table 5 gives the estimates of $\delta$ and the associated t ratios.

Table 5
δ and t ratios for levels effect

           δ       t
r(1)     .050    3.73
r(3)     .025    3.51
r(6)     .023    3.42
r(9)     .021    3.04
r(120)   .019    2.42

⁵ A Milstein (1974) rather than Euler approximation of (5) was also tried, but there were only very minor differences in the results.

Every yield displays a levels effect, although at the 10 year maturity it seems weaker.⁶ The same conclusion applies to the spreads between forward rates, $Fp_t(\tau, \tau-1)$. Fitting EGARCH(1,1) models to these series for $\tau$ = 1, 3, 6 and 9 months, and allowing the levels effect to be a function of $F_{t-1}(\tau)$, the t-ratios for the hypothesis that this coefficient is zero were 3.85, 3.72, 17.25 and 12.07 respectively.

A number of studies have looked at this phenomenon. Apart from our own work, Chan et al. (1992), Broze et al. (1993), Koedijk et al. (1994) and Brenner et al. (1994) have all considered the question, while Vetzal (1992) and Kearns (1993) have tried to allow for stochastic volatility, i.e. $\sigma^2_t$ is not only a function of the past history of yields. To date no formal comparison of the different models is available, unlike the situation for stock returns, e.g. Gallant et al. (1994). All studies find strong evidence for a levels effect on volatility. Brenner et al. provide ML estimates of the parameters of a discretized joint GARCH/levels model in which the volatility function, $\sigma^2_t$, is the product of a GARCH(1,1) process and a levels effect, i.e. $\sigma^2_t = (a_0 + a_1\sigma^2_{t-1}\varepsilon^2_{t-1} + a_2\sigma^2_{t-1})\,r_{t-1}^{2\gamma}$. The estimated value of $\gamma$ falls to around .5, but remains highly significant. Koedijk et al. (1993) have a similar formulation except that $\sigma^2_t$ is driven by $\varepsilon^2_{t-1}$ rather than $\sigma^2_{t-1}\varepsilon^2_{t-1}$. Again $\gamma$ is reduced but remains highly significant. One might question the use of conventional significance levels for the "raw" t ratios, owing to the fact that one of the regressors is a near-integrated process.
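Fitting (6) by maximum likelihood requires filtering the log-variance recursion through the sample. A minimal sketch of that filter and of the Gaussian log-likelihood follows; it is an illustration only. The starting value `logvar0`, the function names and the toy inputs are hypothetical, and the numerical maximization that would deliver $\hat\delta$ and its t ratio is not shown.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)  # E|e| for e ~ N(0, 1)


def egarch_levels_logvar(dr, r_lag, mu, a0, a1, a2, a3, delta, logvar0):
    """Filter log sigma^2_t from the augmented EGARCH(1,1) in (6).

    dr[t] is the yield change at t; r_lag[t] is r_{t-1}(tau), the lagged level that
    enters sigma^2_t. logvar0 is an assumed starting value for log sigma^2_0.
    """
    logvar = [logvar0]
    for t in range(1, len(dr)):
        eps_prev = (dr[t - 1] - mu) / math.exp(0.5 * logvar[t - 1])  # standardized shock
        logvar.append(a0 + a1 * logvar[t - 1] + a2 * eps_prev
                      + a3 * (abs(eps_prev) - SQRT_2_OVER_PI) + delta * r_lag[t])
    return logvar


def gaussian_loglik(dr, mu, logvar):
    """Gaussian log-likelihood; maximizing it over (mu, a0..a3, delta) gives the MLE."""
    return -0.5 * sum(math.log(2.0 * math.pi) + lv + (d - mu) ** 2 / math.exp(lv)
                      for d, lv in zip(dr, logvar))


# Hypothetical inputs: constant changes equal to mu, so every standardized shock is zero
# and the log-variance moves only through the levels term delta * r_{t-1}(tau).
dr = [0.1] * 5
r_lag = [5.0, 5.0, 6.0, 7.0, 8.0]
lv = egarch_levels_logvar(dr, r_lag, mu=0.1, a0=-1.0, a1=0.0, a2=0.0, a3=0.5,
                          delta=0.05, logvar0=0.0)
print([round(v, 4) for v in lv])
print(gaussian_loglik(dr, 0.1, lv))
```

Because the levels term enters additively in $\log\sigma^2$, a unit rise in the lagged yield scales the conditional variance by $e^{\delta}$, whatever the GARCH dynamics.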
To examine the effects of this we simulated data from an estimated model, equation (6) for $r_t(1)$, treating the MLE estimates as the true parameter values, and then found the distribution of the t ratio for the hypothesis that $\delta = 0$ using the MLE, constructed by taking one step from the true values of the coefficients (this would be a simulation of the asymptotic distribution). The results indicate that the distribution of the t-ratio has fatter tails than the normal, with critical values for two-tailed tests of 2.90 (5%) and 2.41 (10%), but use of these would not change the decisions.

⁶ It is interesting to observe that the distribution of the Dickey-Fuller test is very sensitive to whether there is a levels effect or not. To see this we simulated a model in which $\Delta r_t = .001 + r_{t-1}^{\gamma}\varepsilon_t$, where $\varepsilon_t \sim$ n.i.d.(0, 1) and $\gamma$ took the value zero or unity. A small drift was added, although its influence upon the distribution is likely to be small. The simulated critical values at the 1%, 2.5% and 5% significance levels when $\gamma = 0, 1$ are (-3.14, -6.41), (-2.71, -4.97) and (-2.39, -4.03) respectively. Clearly, the presence of a levels effect in volatility means that the critical values are much larger (in absolute terms), strengthening the claim that Table 1 suggests a unit root in yields.

2.2. Multivariate properties

2.2.1. The level of the yield curve

As was mentioned in the introduction, a great deal of work on the term structure views yields as being driven by a set of $M$ factors,

$$r_t(\tau) = \sum_{j=1}^{M} b_{j\tau}\,\xi_{jt}, \tag{7}$$

and it is important to investigate whether this is a reasonable characterization of the data. It is useful here to recognise that the modern econometrics literature on multivariate relations admits just such a parameterization. Suppose the yields are collected into an $(n \times 1)$ vector $y_t$ and that $y_t$ can be represented as a VAR. Then, if the yields are I(1) and there are $k$ co-integrating vectors among them, Stock and Watson (1988) showed that the yields can be described in the format

$$y_t = J\xi_t + u_t, \qquad \xi_t = \xi_{t-1} + v_t, \tag{8}$$

where $\xi_t$ are the $n - k$ common trends of the system and $E_{t-1}v_t = 0$. The format (8) is commonly referred to as the Beveridge-Nelson-Stock-Watson (BNSW) representation. If there are $(n - 1)$ co-integrating vectors, there will be a single common factor, $\xi_t$, that determines the level of the yields. How the yields relate to one another is governed by $y_t - J\xi_t = u_t$, i.e. the yield curve is a function of $u_t$. Johansen's (1988) tests for the number of co-integrating vectors may be applied to the data described earlier. Table 6 provides the two most commonly used, the maximum eigenvalue (Max) and trace (Tr) tests, for the five yields under investigation, assuming a VAR of order one.⁷ From this table there appear to be four co-integrating vectors, i.e. a single common trend. Johnson (1994), Engsted and Tanggaard (1994) and Hall et al. (1992) reach the same conclusion. Zhang (1993) argues that there are three common trends, but Johnson shows that this is due to Zhang's use of a mixture of yields from zero and non-zero coupon bonds.

What is the common trend? There is no unique answer to this. One solution is to find a yield that is determined outside the system, as that will be the driving force. For a small country, that rate is likely to be the "world interest rate", which in practice means either a Euro-Dollar rate or some combination of the US, German and Japanese interest rates.
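The BNSW format (8) with a single common trend can be illustrated by simulation: every yield inherits the random-walk level $\xi_t$, while spreads between yields only carry the I(0) deviations $u_t$. The sketch below assumes hypothetical unit loadings in $J$ (the pattern the expectations theory implies), a noise standard deviation of 0.1 and illustrative function names; it is not an estimate from the data.

```python
import random


def simulate_bnsw(T, loadings, noise_sd=0.1, seed=0):
    """Simulate (8) with one common trend: y_t = J xi_t + u_t, xi_t = xi_{t-1} + v_t."""
    rng = random.Random(seed)
    xi = 0.0
    rows = []
    for _ in range(T):
        xi += rng.gauss(0.0, 1.0)  # v_t: innovation to the common trend
        # u_t: independent I(0) deviations of each yield from J * xi_t
        rows.append([j * xi + rng.gauss(0.0, noise_sd) for j in loadings])
    return rows


def variance(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


# Five hypothetical yields, all loading one-for-one on the common trend.
yields = simulate_bnsw(5000, loadings=[1.0] * 5)
short_rate = [row[0] for row in yields]
spread = [row[4] - row[0] for row in yields]  # analogue of sp(120) = r(120) - r(1)
print("variance of level :", round(variance(short_rate), 2))
print("variance of spread:", round(variance(spread), 4))
```

The sample variance of the level grows with the sample length, whereas the spread's variance stays near $2 \times 0.1^2$: the co-integrating vector [1 -1] removes the common trend.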
Another candidate for the common trend is the simple average of the rates.⁸ In any case we will take this to be the first factor $\xi_{1t}$ in (7).

Table 6
Tests for cointegration amongst yields

                     5 vs 4   4 vs 3   3 vs 2   2 vs 1   1 vs 0 trends
Max                   273.4    184.7     95.6     30.9      2.4
Crit. value (.05)      33.5     27.1     21.0     14.1      3.8
Tr                    586.9    313.5    128.8     33.3      2.4
Crit. value (.05)      68.5     47.2     29.6     15.4      3.8

⁷ Changing this order to four does not affect any conclusions, but restricting it to unity fits in better with the theoretical discussion.
⁸ See Gonzalo and Granger (1991) for other alternatives.

2.2.2. The shape of the yield curve

The existence of $k$ co-integrating vectors $\alpha$ ($\alpha$ is an $(n \times k)$ matrix), such that $\zeta_t = \alpha' y_t$ is I(0), means that any VAR in $y_t$ has the ECM format

$$\Delta y_t = \gamma\zeta_{t-1} + D(L)\Delta y_{t-1} + e_t, \tag{9}$$

where $E_{t-1}(e_t) = 0$ and $D(L)$ is a polynomial in the lag operator. It is also possible to show that $u_t$ in (8) can be written as a function of the $k$ EC (error correction) terms $\zeta_t$, and this suggests that we might take these to be the remaining factors $\xi_{jt}$ ($j = 2, \ldots, K$) in (7). To make the following discussion more concrete, assume that the expectations theory of the term structure holds, i.e. a $\tau$-period yield is the weighted average of the expected one-period yields into the future. In the case of discount bonds the weights are equal to $1/\tau$, so that the theory states

$$r_t(\tau) = \frac{1}{\tau}\sum_{k=0}^{\tau-1} E_t\big(r_{t+k}(1)\big).$$

Of course this is an hypothesis, albeit one that seems quite sensible. It implies that

$$r_t(\tau) - r_t(1) = \frac{1}{\tau}\sum_{k=0}^{\tau-1}\big[E_t r_{t+k}(1) - E_t r_t(1)\big] = \frac{1}{\tau}\sum_{k=1}^{\tau-1}\sum_{j=1}^{k} E_t\Delta r_{t+j}(1).$$

Now, if the yields are I(1) processes, the yield spread $r_t(\tau) - r_t(1)$ should be I(0), i.e. $r_t(\tau)$ and $r_t(1)$ should be co-integrated with co-integrating vector [1 -1], and these spreads are the EC terms. Therefore, to test the expectations hypothesis for the five yields we need to test whether the matrix of co-integrating vectors has the form

$$\alpha' = \begin{bmatrix} -1 & 1 & 0 & 0 & 0\\ -1 & 0 & 1 & 0 & 0\\ -1 & 0 & 0 & 1 & 0\\ -1 & 0 & 0 & 0 & 1 \end{bmatrix}. \tag{10}$$

Johansen's (1988) test for this gives a $\chi^2(4)$ of 36.8, a very strong rejection of the hypothesis. Such an outcome has also been observed by Hall et al. (1992), Johnson (1994) and Engsted and Tanggaard (1994). A number of possible explanations for the rejection were canvassed in those papers, including an incorrect size for the test statistic. One's inclination is to examine the estimated matrix of co-integrating vectors given by Johansen's procedure, $\hat\alpha$, and to see how closely it corresponds to the hypothesized values; unfortunately, the vectors are not unique, and the estimated quantities will always be linear combinations of the true values. Some structural information is needed to recover the latter, and to this end we write

$$\alpha' = \begin{bmatrix} -\theta_3 & 1 & 0 & 0 & 0\\ -\theta_6 & 0 & 1 & 0 & 0\\ -\theta_9 & 0 & 0 & 1 & 0\\ -\theta_{120} & 0 & 0 & 0 & 1 \end{bmatrix}$$

and then solve the equations $\hat\alpha' = A\alpha'$, where $A$ is some non-singular matrix. This produces $\theta_3 = 1.038$, $\theta_6 = 1.063$, $\theta_9 = 1.075$ and $\theta_{120} = 1.076$, which indicates that the point estimates are quite close to those predicted by the expectations theory. It is also possible to estimate the $\theta_\tau$ by "limited information" rather than "full information" methods. To that end the Phillips-Hansen (1990) estimator was adopted, with a Parzen kernel and eight lags used to form the long-run covariance matrices, producing $\theta_3 = 1.021$, $\theta_6 = 1.034$, $\theta_9 = 1.034$ and $\theta_{120} = .91$. With the exception of the 10 year rate, neither set of estimates seems to diverge greatly from that predicted.

Some insight into why the rejection occurs may be had from (9). Given that co-integration has been established, and working with a VAR(1), i.e. $D(L) = 0$ in (9), the change in each yield should be governed by

$$\Delta r_t(\tau) = \sum_{j=2}^{5}\gamma_{j\tau}\big(r_{t-1}(j) - \theta_j r_{t-1}(1)\big) + e_{\tau t}, \tag{11}$$

where $j = 2, \ldots, 5$ maps one to one into the maturities $\tau = 3, 6, 9, 120$. If the expectations theory is valid, $\theta_j = 1$ and the system becomes

$$\Delta r_t(\tau) = \sum_{j=2}^{5}\gamma_{j\tau}\big(r_{t-1}(j) - r_{t-1}(1)\big) + e_{\tau t},$$

and the hypothesis $H_0: \theta_j = 1$ can be tested by computing the likelihood ratio statistic. It is well known that such a test will be distributed as a $\chi^2(4)$ under the null hypothesis. If the yields were taken to be I(0), the simplest way to test whether $\theta_j = 1$ would be to re-write (11) as

$$\Delta r_t(\tau) = \sum_{j=2}^{5}\gamma_{j\tau}\big(r_{t-1}(j) - r_{t-1}(1)\big) + \Big(\sum_{j=2}^{5}\gamma_{j\tau}(1 - \theta_j)\Big)r_{t-1}(1) + e_{\tau t}, \tag{12}$$

and to test whether the coefficient of $r_{t-1}(1)$ in each of the equations for $\Delta r_t(\tau)$ is zero.
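The normalization step that turns Johansen's arbitrary basis $\hat\alpha'$ into $\theta$ estimates can be sketched directly: since the structured $\alpha'$ has an identity matrix in its last four columns, $A$ is just the last four columns of $\hat\alpha'$, and $\alpha' = A^{-1}\hat\alpha'$. The sketch below checks this on a hypothetical mixing matrix `A_mix` applied to the full-information point estimates quoted above; `solve` and `theta_from_estimate` are illustrative names, and no real Johansen estimate is involved.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting (A square)."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for i in range(n):
            if i != col:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    return [row[n:] for row in M]


def theta_from_estimate(alpha_hat_t):
    """Rotate an estimated k x (k+1) matrix alpha_hat' into the form [-theta | I_k]."""
    # alpha_hat' = A alpha' and alpha' has I_k in its last k columns,
    # so A is the last-k-column block of alpha_hat', and alpha' = A^{-1} alpha_hat'.
    A = [row[1:] for row in alpha_hat_t]
    normalized = solve(A, alpha_hat_t)
    return [-row[0] for row in normalized]


# Structured alpha' built from the point estimates in the text, then mixed by a
# hypothetical non-singular (strictly diagonally dominant) matrix.
theta_true = [1.038, 1.063, 1.075, 1.076]
alpha_t = [[-th] + [1.0 if j == i else 0.0 for j in range(4)] for i, th in enumerate(theta_true)]
A_mix = [[2.0, 1.0, 0.0, 0.0], [0.5, 3.0, 1.0, 0.0], [0.0, 1.0, 4.0, 1.0], [1.0, 0.0, 0.5, 2.0]]
alpha_hat_t = [[sum(A_mix[i][m] * alpha_t[m][j] for m in range(4)) for j in range(5)]
               for i in range(4)]
theta_rec = theta_from_estimate(alpha_hat_t)
print([round(t, 6) for t in theta_rec])
```

The rotation is exact whenever the last four columns of $\hat\alpha'$ are non-singular, which is what makes $\theta$ identified once the identity block is imposed.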
For a number of reasons this does not reproduce the $\chi^2(4)$ test cited above: there are five coefficients being tested, and $r_{t-1}(1)$ will be I(1), making the distribution non-standard. Nevertheless, the separate single-equation tests might still be informative. In this case the t-values for the hypothesis that $r_{t-1}(1)$ has a zero coefficient in each equation were -4.05, -1.77, -.72, -.24 and .55 respectively, suggesting that the rejection of (10) lies in the behaviour of the one month rate, i.e. the spreads are not capable of fully accounting for its movement. Engsted and Tanggaard (1994) also reach this conclusion. It may be that $r_{t-1}(1)$ is proxying for some omitted variable, and the literature has in fact canvassed the possibility of non-linear effects upon the short-term rate. Anderson (1994) makes the influence of spreads upon $\Delta r_t(1)$ non-linear, while Pfann et al. (1994) take the process driving $r_t(1)$ to be a non-linear autoregression; in particular, the latter allow for a number of regimes according to the magnitude of $r_t(1)$, with some of these regimes featuring I(1) behaviour of the rate while others do not. Another possibility, used in Conley et al. (1994), is that the "drift term" in a continuous time model has the form $\sum_{j=-m}^{m} a_j r^{j}$, and this would induce a non-linearity into the relation between $\Delta r_t$ and $r_{t-1}$.

Instead of a mis-specification in the mean, the rejection of (10) may be due to levels effects in $e_{\tau t}$. As noted earlier, the Dickey-Fuller test critical values are very sensitive to this effect, and the test that $r_{t-1}(1)$ has a zero coefficient in the $\Delta r_t(1)$ equation in (12) is actually an ADF test, if the augmenting variables are taken to be the spreads. This led us to conduct a small Monte Carlo study of Johansen's test for (10) under different assumptions about levels effects in the errors of the VAR. The example is a simplified version of the system above featuring only two variables, $y_{1t}$ and $y_{2t}$, with co-integrating vector [1 -1], generated from the vector ECM

$$\Delta y_{1t} = -.8\,(y_{1,t-1} - y_{2,t-1}) + .1\,y_{1,t-1}^{\gamma}\varepsilon_{1t},$$
$$\Delta y_{2t} = -.1\,(y_{1,t-1} - y_{2,t-1}) + .1\,y_{1,t-1}^{\gamma}\varepsilon_{2t}.$$

The 95% critical value for Johansen's test that the co-integrating vector is the true one varies with the value of $\gamma$: 3.95 ($\gamma = 0$), 4.86 ($\gamma = .5$), 5.87 ($\gamma = .6$), 11.20 ($\gamma = .8$) and 23.63 ($\gamma = 1$).
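A Monte Carlo of this kind can be sketched in a few dozen lines: simulate the bivariate ECM above, compute Johansen's LR statistic for the known vector [1 -1] with rank one, and read off the empirical 95% quantile. This is a minimal sketch under assumptions that are mine, not the authors': a VAR(1) with no deterministic terms, a starting level of 10, `abs()` guarding the fractional power, 200 replications of length 400, and illustrative function names. With only 200 replications the quantile is noisy, so it should be read as a rough check against the 3.95 reported for $\gamma = 0$.

```python
import math
import random


def simulate_ecm(T, gamma, rng, y0=10.0):
    """Bivariate ECM with co-integrating vector [1, -1] and volatility .1 * y1^gamma."""
    y1 = y2 = y0  # hypothetical starting level
    path = [(y1, y2)]
    for _ in range(T):
        z = y1 - y2
        scale = 0.1 * abs(y1) ** gamma  # abs() guards against negative levels (assumption)
        y1, y2 = (y1 - 0.8 * z + scale * rng.gauss(0.0, 1.0),
                  y2 - 0.1 * z + scale * rng.gauss(0.0, 1.0))
        path.append((y1, y2))
    return path


def moment(a, b):
    """(1/T) sum_t a_t b_t' for two lists of 2-vectors."""
    T = len(a)
    return [[sum(x[i] * y[j] for x, y in zip(a, b)) / T for j in range(2)] for i in range(2)]


def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def mul2(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lr_known_vector(path, beta=(1.0, -1.0)):
    """LR statistic for H0: co-integrating vector = beta (rank 1, VAR(1), no constant)."""
    r0 = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(path[:-1], path[1:])]  # dy_t
    r1 = path[:-1]  # y_{t-1}
    s00, s01, s11 = moment(r0, r0), moment(r0, r1), moment(r1, r1)
    s10 = [[s01[j][i] for j in range(2)] for i in range(2)]
    q = mul2(s10, mul2(inv2(s00), s01))  # S10 S00^{-1} S01
    m = mul2(inv2(s11), q)
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    lam_max = 0.5 * (tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0)))

    def quad(mat):
        return sum(beta[i] * mat[i][j] * beta[j] for i in range(2) for j in range(2))

    lam_restricted = quad(q) / quad(s11)  # squared canonical correlation with beta imposed
    return len(r0) * math.log((1.0 - lam_restricted) / (1.0 - lam_max))


def simulate_lr(reps, T, gamma, seed=0):
    rng = random.Random(seed)
    return [lr_known_vector(simulate_ecm(T, gamma, rng)) for _ in range(reps)]


stats = sorted(simulate_lr(reps=200, T=400, gamma=0.0))
print("simulated 95% critical value (gamma = 0):", round(stats[int(0.95 * len(stats))], 2))
```

Re-running with `gamma` set to .5, .8 or 1 shows how the levels effect shifts the quantile; the restricted eigenvalue is a Rayleigh quotient bounded by the largest eigenvalue, so the statistic is never negative.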
Clearly, there is a major impact of the levels effect upon the sampling distribution of Johansen's test, and the phenomenon needs much closer investigation, but it is conceivable that the rejection of (10) is simply due to the use of critical values that are too small.

Even if one rejects the co-integrating vectors predicted by the expectations theory, the evidence is still that there are $k = n - 1$ error correction terms. It is natural to equate the remaining $M - 1$ factors in (7) (after elimination of the common trend) with these EC terms, but this is not very helpful, as it would mean that $M = n$, i.e. the number of factors would equal the number of yields. Hall et al. (1992) provide an example of forecasting the term structure using the ECM relation (9), imposing the expectations theory co-integrating vectors to form $\zeta_t$ and then regressing $\Delta y_t$ on $\zeta_{t-1}$ and any lags needed in $\Delta y_t$. Hence their model is equivalent to using a single factor, the common trend, to forecast the level, and $(n - 1)$ factors (the EC or spread terms) to forecast the slope. In practice, however, they impose the restriction that some of the coefficients in $\gamma$ are zero, i.e. the number of factors determining the yield varies with the maturity being examined. It is interesting to note that their representation for $\Delta r_t(4)$ has no EC terms, i.e. it is effectively determined outside the system and plays the role of the "world interest rate" mentioned earlier.

In an attempt to reduce the number of non-trend factors below $n - 1$, it is tempting to assume that (say) only $m = M - 1$ of the $(n - 1)$ terms in $\zeta_t$ appear as determinants of $\Delta r_t(\tau)$ and that these constitute the requisite common factors, but such a restriction would require the remaining $n - 1 - m$ columns of $\gamma$ to be zero, thereby violating the rank condition $\rho(\gamma) = n - 1$. Consequently the factors will need to be combinations of the EC terms. Now, pre-multiplying (7) by $\alpha'$ gives

$$\zeta_t = \alpha' y_t = \sum_{j=1}^{M}\alpha' b_j\,\xi_{jt}, \tag{13}$$

where $b_j' = [b_{j1} \ldots b_{jn}]$ is a $1 \times n$ vector. If we designate the first factor as the common trend then it must be that $\alpha' b_1 = 0$, as the left-hand side is I(0) by construction, meaning that

$$\zeta_t = \alpha'\sum_{j=2}^{M} b_j\,\xi_{jt} = \alpha' B\,\tilde\xi_t, \tag{14}$$

where $\tilde\xi_t$ is the $(K - 1) \times 1$ vector containing $\xi_{2t}, \ldots, \xi_{Kt}$, and $B$ is an $n \times (K - 1)$ matrix with $\rho(B) = K - 1$, where $\rho(\cdot)$ designates rank. Equation (14) enables us to draw a number of interesting conclusions. Firstly, $\rho[\mathrm{cov}(\zeta_t)] = \min[\rho(\alpha), \rho(B)]$, provided $\mathrm{cov}(\tilde\xi_t)$ has rank $K - 1$. Since $K < n$ implies $K - 1 < n - 1$, it must be that $\rho(B) < \rho(\alpha)$, and therefore $\rho[\mathrm{cov}(\zeta_t)] = K - 1$, i.e. the number of factors in the term structure (other than the common trend) may be found by examining the rank of the covariance matrix of the co-integrating errors. Secondly, since $C = \alpha' B$ has $\rho(C) = K - 1$, $\tilde\xi_t = (C'C)^{-1}C'\zeta_t$, and hence the factors will be linear combinations of the EC terms. Applying principal components to the data set composed of the spreads $sp_t(3)$, $sp_t(6)$, $sp_t(9)$ and $sp_t(120)$, the eigenvalues of the covariance matrix are 4.1, .37, .02 and .002, indicating that these four spreads can be summarized very well by three components (at most).⁹ The three components are:

$$\phi_{1t} = .32\,sp_t(3) - .86\,sp_t(6) - .37\,sp_t(9) + .17\,sp_t(120)$$
$$\phi_{2t} = -.78\,sp_t(3) + .00\,sp_t(6) - .55\,sp_t(9) + .29\,sp_t(120)$$
$$\phi_{3t} = .54\,sp_t(3) + .52\,sp_t(6) - .55\,sp_t(9) + .37\,sp_t(120)$$

⁹ The principal components approach, or variants of it, has been used in a number of papers: Litterman and Scheinkman (1991), Dybvig (1989) and Egginton and Hall (1993). This technique finds linear combinations of the yields such that the first combination has the largest possible variance, the second the largest variance among combinations uncorrelated with the first, and so on. Thus the i'th principal component of $y_t$ will be $b_i'y_t$, where $b_i$ is a set of weights. Because one could always multiply through by a scale factor, the $b_i$ are normalized, i.e. $b_i'b_i = 1$. With this restriction the $b_i$ become the eigenvectors of $\mathrm{var}(y_t)$. Collecting them in $b = [b_1 \ldots b_n]$, it is clear that $b'\mathrm{var}(y_t)b = \Lambda$, where $\Lambda$ is a diagonal matrix with the eigenvalues $(\lambda_1 \ldots \lambda_n)$ on it, and that $\mathrm{tr}[b'\mathrm{var}(y_t)b] = \sum_{i=1}^{n}\lambda_i$. It is conventional to order the components according to the magnitude of $\lambda_i$, the first principal component having the largest $\lambda_i$. There is a connection between principal components and common trends. Both seek linear combinations of $y_t$ and, in many cases, one of the components can be interpreted as the common trend, e.g. in Egginton and Hall (1993) the first component is effectively the average of the interest rates, which we have mentioned earlier as a possible common trend.

3. Models of the term structure

In this section we describe some popular ways of modelling the term structure. In order to assess whether these models are capable of replicating observed term structures, it is necessary to decide on some way to compare them to the data. There is a small literature wherein formal statistical tests have been performed on how well the models replicate the data in some designated dimension. Generally, however, the reasons for any rejection of the models remain unclear, as many characteristics are being tested at the one time. In contrast, this chapter uses the method of "stylized facts", i.e. it seeks to match up the predictions of the model with the nature of the data as summarized in Section 2. Thus, we look at whether the models predict that yields are near-integrated, have levels effects in volatility, exhibit specific co-integrating vectors, produce persistence in spreads, and would be compatible with two or (at most) three factors in the term structure.¹⁰

¹⁰ There are many other characteristics of these yields that we ignore in this paper but which are challenging to explain, e.g. the extreme leptokurtosis in the density of the change in yields and in the spreads.

3.1. Solutions from the consumer's Euler equations

Consider a consumer maximising expected utility over a period subject to a budget constraint, i.e. $\max E_t\sum_s \beta^s U(C_s)$, where $\beta$ is a discount factor and $C_s$ is consumption at time $s$. It is well known that a first order condition for this is

$$U'(C_t)\,v_t = E_t\{\beta^{s-t}U'(C_s)\,v_s\},$$

where $v_t$ is the value of an asset (or portfolio) in terms of consumption goods. This can be re-arranged to give

$$E_t\Big[\frac{v_s}{v_t}\,\beta^{s-t}\,\frac{U'(C_s)}{U'(C_t)}\Big] = 1. \tag{15}$$

Assuming that the asset is a discount bond and the general price level is fixed, consider setting $s = t + \tau$, giving $v_t = f_t(\tau)$. The solution of this equation will then provide a complete set of discount bond prices for any maturity. It is useful to re-express (15) as

$$f_t(\tau) = E_t\big[\beta^{\tau}U'(C_{t+\tau})/U'(C_t)\big], \tag{16}$$

imposing the restriction that the bond's value at maturity, $v_{t+\tau}$, equals 1, so as to find the price of a zero coupon bond paying one dollar at maturity. Hence the term structure would then be determined. If the price level is not fixed, (16) needs to be modified to

$$f_t(\tau) = E_t\big[\beta^{\tau}P_t\,U'(C_{t+\tau})/(U'(C_t)P_{t+\tau})\big], \tag{17}$$

where $P_t$ is the price level at time $t$. There have been a few attempts to price bonds from (16) or (17). Canova and Marrinan (1993) and Boudoukh (1993) do this by assuming that $c_t = C_{t+1}/C_t - 1$ and $p_t = P_{t+1}/P_t - 1$ follow a VAR process with some volatility in the errors, and that the utility function has the CRRA form, $U(C_t) = C_t^{1-\gamma}/(1-\gamma)$, where $\gamma$ is the coefficient of relative risk aversion.¹¹ It is necessary to evaluate (17) for the yield $r_t(\tau) = -\tau^{-1}\log f_t(\tau)$:

$$r_t(\tau) = -\frac{1}{\tau}\log E_t\Big[\beta^{\tau}\Big(\frac{C_{t+\tau}}{C_t}\Big)^{-\gamma}\frac{P_t}{P_{t+\tau}}\Big] = -\frac{1}{\tau}\log E_t\big[\beta^{\tau}(1+c_{t\tau})^{-\gamma}(1+p_{t\tau})^{-1}\big],$$

where $c_{t\tau} = C_{t+\tau}/C_t - 1 \approx \log C_{t+\tau} - \log C_t$ and $p_{t\tau} = P_{t+\tau}/P_t - 1 \approx \log P_{t+\tau} - \log P_t$. Expanding around $E_t(c_{t\tau})$ and $E_t(p_{t\tau})$, and ignoring all cross terms and terms of higher order than a quadratic,¹² gives

$$r_t(\tau) \approx -\log\beta - \frac{1}{\tau}\log\Big\{\big(1+E_t(c_{t\tau})\big)^{-\gamma}\big(1+E_t(p_{t\tau})\big)^{-1} + a_{1t\tau}\,\mathrm{var}_t(c_{t\tau}) + a_{2t\tau}\,\mathrm{var}_t(p_{t\tau})\Big\}, \tag{18}$$

where

$$a_{1t\tau} = \tfrac{1}{2}(1+\gamma)\gamma\big(1+E_t(c_{t\tau})\big)^{-\gamma-2}\big(1+E_t(p_{t\tau})\big)^{-1}, \qquad a_{2t\tau} = \big(1+E_t(c_{t\tau})\big)^{-\gamma}\big(1+E_t(p_{t\tau})\big)^{-3}.$$

¹¹ Canova and Marrinan actually use the Cambridge equation for the price level, $P_t = M_t/Y_t$, and so their VAR involves the growth in money, output and consumption.
¹² The conditional covariance terms between $c_{t\tau}$ and $p_{t\tau}$ are ignored, as one is a real and the other a nominal quantity, and most general equilibrium models would make this zero. Boudoukh (1993), however, argues that the conditional covariance is important for explaining the term structure.
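The quality of the quadratic approximation (18) can be checked against a case where the expectation is available exactly. The sketch below assumes, purely for illustration, that $c_{t\tau}$ and $p_{t\tau}$ are independent and each equals its mean plus or minus one standard deviation with probability 1/2; the parameter values and function names are hypothetical.

```python
import math


def yield_approx_18(beta, gamma, tau, mc, mp, vc, vp):
    """Second-order approximation (18) to the tau-period yield under CRRA utility.

    mc, mp: E_t(c_t,tau), E_t(p_t,tau); vc, vp: var_t(c_t,tau), var_t(p_t,tau).
    The cross term between consumption growth and inflation is ignored, as in the text.
    """
    base = (1.0 + mc) ** (-gamma) * (1.0 + mp) ** (-1)
    a1 = 0.5 * (1.0 + gamma) * gamma * (1.0 + mc) ** (-gamma - 2) * (1.0 + mp) ** (-1)
    a2 = (1.0 + mc) ** (-gamma) * (1.0 + mp) ** (-3)
    return -math.log(beta) - math.log(base + a1 * vc + a2 * vp) / tau


def yield_exact_two_point(beta, gamma, tau, mc, mp, vc, vp):
    """Exact yield when c and p are independent and each equals mean +/- sd with prob. 1/2."""
    sc, sp = math.sqrt(vc), math.sqrt(vp)
    expectation = 0.0
    for c in (mc - sc, mc + sc):
        for p in (mp - sp, mp + sp):
            expectation += 0.25 * (1.0 + c) ** (-gamma) * (1.0 + p) ** (-1)
    return -math.log(beta) - math.log(expectation) / tau


# Hypothetical values: beta = .99, gamma = 2, tau = 1, 2% growth, 3% inflation.
params = dict(beta=0.99, gamma=2.0, tau=1.0, mc=0.02, mp=0.03)
approx = yield_approx_18(vc=1e-4, vp=1e-4, **params)
exact = yield_exact_two_point(vc=1e-4, vp=1e-4, **params)
no_risk = yield_approx_18(vc=0.0, vp=0.0, **params)
print(approx, exact, no_risk)
```

For small variances the neglected terms are of fourth order, so the two yields agree closely; both lie below the zero-variance yield, because the convexity terms $a_{1t\tau}$ and $a_{2t\tau}$ are positive.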

Factoring $(1+E_t(c_{t\tau}))^{-\gamma}(1+E_t(p_{t\tau}))^{-1}$ out of the braces in (18) gives

$$r_t(\tau) \approx -\log\beta + \frac{\gamma}{\tau}\log\big(1+E_t(c_{t\tau})\big) + \frac{1}{\tau}\log\big(1+E_t(p_{t\tau})\big) - \frac{1}{\tau}\log\big\{1 + b_{1t\tau}\,\mathrm{var}_t(c_{t\tau}) + b_{2t\tau}\,\mathrm{var}_t(p_{t\tau})\big\}, \tag{19}$$

where $b_{1t\tau} = \tfrac{1}{2}(1+\gamma)\gamma\big(1+E_t(c_{t\tau})\big)^{-2}$ and $b_{2t\tau} = \big(1+E_t(p_{t\tau})\big)^{-2}$.

Equation (19) points to a four factor model of the term structure, with the level being driven by the first two conditional moments of the inflation rate and of consumption growth. However, the relation is not easily interpreted as a linear one, since the weights attached to the volatilities are functions of the conditional means. The problem remains to evaluate the conditional moments. To complete the model it is necessary to assume something about the evolution of $z_{1t} = c_{t1}$ and $z_{2t} = p_{t1}$. These are generally taken to be AR processes of the form $z_{jt} = \Phi_{0j} + \Phi_{1j}z_{j,t-1} + e_{jt}$. Canova and Marrinan (1993) take $\sigma^2_{j,t+1} = \mathrm{var}_t(e_{j,t+1})$ to be GARCH processes of the form $\sigma^2_{j,t+1} = a_{0j} + a_{1j}\sigma^2_{jt} + a_{2j}e^2_{jt}$, whereby the formulae in Baillie and Bollerslev (1992) can be used to evaluate $E_t(z_{jt\tau})$ and $\mathrm{var}_t(z_{jt\tau})$, while Boudoukh (1993) has $\sigma_{jt}$ as a stochastic volatility process. For GARCH models $\mathrm{var}_t(z_{jt\tau})$ is a linear function of $\sigma^2_{j,t+1}$.

How well does this model perform in replicating the stylized facts of the term structure? To produce a near unit root in yields it is necessary that $\log(1+E_t(p_{t\tau})) \approx E_t(p_{t\tau})$ be near-integrated, i.e. inflation must be a near-integrated process, as it is the only one of the two series that has such persistence in either mean or variance; see Boudoukh (1993) for a description of the time series properties of the two series. Then the inflation rate becomes the common trend in the term structure, and the spreads will depend upon consumption growth and the two volatilities.
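The claim that, for GARCH models, the conditional variance of a multi-period quantity is linear in $\sigma^2_{j,t+1}$ can be made concrete. The sketch below assumes that $z_{jt\tau}$ denotes the sum of the next $\tau$ one-period values of an AR(1) with GARCH(1,1) errors; that reading, the parameter values and the function names are hypothetical, and the code uses only the standard GARCH forecast recursion, not the specific formulae of Baillie and Bollerslev (1992).

```python
def expected_garch_path(s2_next, a0, a1, a2, horizon):
    """E_t sigma^2_{t+k}, k = 1..horizon, for sigma^2_{t+1} = a0 + a1 sigma^2_t + a2 e_t^2.

    Since E_t e^2_{t+k} = E_t sigma^2_{t+k} for k >= 1, the forecasts obey
    E_t sigma^2_{t+k+1} = a0 + (a1 + a2) E_t sigma^2_{t+k}.
    """
    path = [s2_next]
    for _ in range(horizon - 1):
        path.append(a0 + (a1 + a2) * path[-1])
    return path


def cumulative_variance(phi1, s2_next, a0, a1, a2, tau):
    """var_t(z_{t+1} + ... + z_{t+tau}) for z_t = phi0 + phi1 z_{t-1} + e_t, GARCH(1,1) e_t."""
    s2 = expected_garch_path(s2_next, a0, a1, a2, tau)
    total = 0.0
    for k in range(1, tau + 1):
        # e_{t+k} enters z_{t+k}, ..., z_{t+tau} with weights 1, phi1, ..., phi1^(tau-k)
        weight = sum(phi1 ** m for m in range(tau - k + 1))
        total += weight ** 2 * s2[k - 1]
    return total


# Hypothetical parameters for a monthly growth-rate series.
p = dict(phi1=0.5, a0=0.1, a1=0.8, a2=0.1, tau=6)
v0, v1, v2 = (cumulative_variance(s2_next=s, **p) for s in (0.0, 1.0, 2.0))
print(v0, v1, v2)
```

Equal increments in $\sigma^2_{t+1}$ produce equal increments in the $\tau$-period variance, which is the linearity the text relies on; the slope shrinks as $a_1 + a_2$ falls below one.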
As there is rather weak evidence of much dependence in either inflation or consumption volatility (see the test statistics in Boudoukh), it is difficult to see the persistence in spreads being explained by these models.¹³ Whether a levels effect in $\Delta r_t(\tau)$ can be produced is unclear; the GARCH structures used by Canova and Marrinan will not produce it, but Boudoukh's stochastic volatility formulation does allow for a levels effect in $\mathrm{var}_t(p_t)$. Moreover, even if volatilities were constant, the conditional means enter the weights attached to them, and this dependence might be used to induce a levels effect into $\Delta r_t(\tau)$. Whilst $E_t(c_{t\tau})$ is likely to be close to a constant due to the weak autocorrelation in consumption growth, there is strong serial correlation in inflation rates, and, with inflation as the common trend, it is conceivable that the requisite effect could be found in that way, although the question was not addressed by the authors.¹⁴

Another attempt at working within this framework is Constantinides (1992), who writes (17) as $f_t(\tau) = E_t[K_{t+\tau}/K_t]$, where $K_t = \beta^t U'(C_t)/P_t$ is referred to as a "pricing kernel". He then makes assumptions about the evolution of $K_t$, in particular that

$$K_t = \exp\Big\{-\Big(g + \tfrac{1}{2}\sigma_0^2\Big)t + z_{0t} + \sum_{i=1}^{N}(z_{it} - \alpha_i)^2\Big\}.$$

He works in continuous time and makes $z_{0t}$ a Wiener process (with variance parameter $\sigma_0^2$), while the other $z_{it}$ are Ornstein-Uhlenbeck diffusion processes with parameters $\lambda_i$ and variances $\sigma_i^2$. Each of the $z_{it}$ is taken to be independent. Under these assumptions it turns out that

$$f_t(\tau) = \Big(\prod_{i=1}^{N} H_i(\tau)^{-1/2}\Big)\exp\Big\{-\Big(g - \sum_{i=1}^{N}\lambda_i\Big)\tau + \sum_{i=1}^{N}\Big[\frac{(z_{it} - \alpha_i e^{\lambda_i\tau})^2}{H_i(\tau)} - (z_{it} - \alpha_i)^2\Big]\Big\},$$

where $H_i(\tau) = \sigma_i^2/\lambda_i + (1 - \sigma_i^2/\lambda_i)e^{2\lambda_i\tau}$. Consequently, $r_t(\tau)$ has the format

$$r_t(\tau) = \delta_{0\tau} + \sum_{i=1}^{N}\delta_{i\tau}(z_{it} - \alpha_i e^{\lambda_i\tau})^2 + \sum_{i=1}^{N}\tau^{-1}(z_{it} - \alpha_i)^2.$$

Terms such as $(z_{it} - \alpha_i)^2$ reflect the fact that the "variance" of the change in $z_{it}$ of an Ornstein-Uhlenbeck process depends upon the level of the variable $z_{it}$. Constantinides' model will have trouble producing the right outcomes: after converting to yields, his model has no factor that would be I(1). The difficulty arises from his specification of the pricing kernel. The pricing kernel used to evaluate (17) contains an I(2) variable, $P_t$, since it is the inflation rate that is I(1). Consequently, the root of the problem with his model is the assumption, implicitly made by Constantinides, that the kernel is only I(1), through the presence of the term $z_{0t}$.

¹³ Although Boudoukh finds much more dependence in his estimated stochastic volatility specification than in GARCH specifications.
¹⁴ Essentially, these are "calibrated" models that emphasise the use of a highly specified theory to explain an observed phenomenon. Hence, one should really distinguish between the model prediction of yields, $r_t^*(\tau)$, and the observed outcomes, $r_t(\tau)$. The gap between the two variables is due to factors not captured within the model, or perhaps to specification errors. Examination of the characteristics of the gap may be very informative.
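The closed-form bond price above can be cross-checked numerically for a single factor ($N = 1$) by integrating $E_t[\exp((z_{t+\tau} - \alpha)^2)]$ against the Gaussian transition density of the Ornstein-Uhlenbeck process. The sketch assumes that $z$ follows $dz = -\lambda z\,dt + \sigma\,dW$ (mean zero) and that the Wiener component of the kernel contributes exactly $e^{\sigma_0^2\tau/2}$, cancelling the $\sigma_0^2/2$ drift so that it reduces to $e^{-g\tau}$; parameter values and function names are hypothetical, and the check requires $\sigma^2/\lambda < 1$ so that the expectation is finite.

```python
import math


def saints_price(tau, z, g, lam, sig, alpha):
    """Closed-form bond price with one OU factor (N = 1), as written in the text."""
    H = sig ** 2 / lam + (1.0 - sig ** 2 / lam) * math.exp(2.0 * lam * tau)
    expo = (-(g - lam) * tau
            + (z - alpha * math.exp(lam * tau)) ** 2 / H
            - (z - alpha) ** 2)
    return H ** -0.5 * math.exp(expo)


def saints_price_numeric(tau, z, g, lam, sig, alpha, n=4000):
    """exp(-g tau) * E_t[exp((z_{t+tau} - alpha)^2)] / exp((z_t - alpha)^2) by Simpson's rule."""
    m = z * math.exp(-lam * tau)  # conditional mean of z_{t+tau}
    v = sig ** 2 * (1.0 - math.exp(-2.0 * lam * tau)) / (2.0 * lam)  # conditional variance
    lo, hi = m - 15.0 * math.sqrt(v), m + 15.0 * math.sqrt(v)
    h = (hi - lo) / n  # n must be even for Simpson's rule

    def integrand(x):
        return math.exp((x - alpha) ** 2 - (x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

    s = integrand(lo) + integrand(hi)
    s += sum((4 if i % 2 else 2) * integrand(lo + i * h) for i in range(1, n))
    expectation = s * h / 3.0
    return math.exp(-g * tau) * expectation / math.exp((z - alpha) ** 2)


# Hypothetical parameters with sig^2 / lam = .18 < 1.
args = dict(tau=2.0, z=0.2, g=0.02, lam=0.5, sig=0.3, alpha=0.1)
closed = saints_price(**args)
numeric = saints_price_numeric(**args)
print(closed, numeric, -math.log(closed) / args["tau"])
```

Agreement of the two prices confirms that $H_i(\tau)$ and the $\alpha_i e^{\lambda_i\tau}$ terms come from completing the square in a Gaussian expectation; at $\tau = 0$ the closed form returns a price of exactly one.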

3.2. One factor models from finance

Finance theory has developed by working with factor models to determine the term structure. Common to the material just discussed is the use of models of an economy in which there is inter-temporal optimization, but a notable difference is the introduction of a production sector and a concern with ensuring that the pricing formulae prohibit the possibility of arbitrage, i.e. the solution tends to be closer to a general than to a partial equilibrium solution. The basic workhorse of the literature is the model due to Cox, Ingersoll and Ross (1985) (CIR). Essentially they propose an economy driven by a number of processes that affect the rate of return to assets, e.g. technological change and (possibly) an inflation factor. Dealing with the simplest case, where there is just a single state vector, $\mu_t$, perhaps total factor productivity (TFP), it is assumed that this variable follows a diffusion process of the form

$$d\mu_t = (b - \kappa\mu_t)\,dt + \phi\,\mu_t^{1/2}\,d\eta_t.$$

General equilibrium in asset markets for such an economy results in an expression for the instantaneous rate of interest of the form

$$dr_t = (a - \beta r_t)\,dt + \sigma r_t^{1/2}\,d\eta_t. \tag{20}$$

Once one has the expression for the instantaneous rate, the whole term structure $f_t(\tau)$ is priced according to the partial differential equation

$$\tfrac{1}{2}\sigma^2 r f_{rr} + (a - \beta r)f_r + f_t - \lambda r f_r - r f = 0, \tag{21}$$

where $f_{rr} = \partial^2 f/\partial r^2$, $f_r = \partial f/\partial r$, $f_t = \partial f/\partial t$, and the term $\lambda r f_r$, which depends upon the covariance of the change in the price of the factor with the percentage change in the optimal portfolio, is the "market price of risk" associated with that factor. This partial differential equation comes from the fact that a zero coupon riskless bond maturing at $t + \tau$ must be valued at

$$f_t(\tau) = E_t\Big[\exp\Big(-\int_t^{t+\tau} r(\psi)\,d\psi\Big)\Big]. \tag{22}$$

Since the expected rate of change of the price of the bond is given by $r + \lambda r f_r/f$, the term $\lambda r f_r/f$ can also be interpreted as a liquidity premium.
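Equation (22) suggests a direct, if slow, way to price a bond: simulate short-rate paths and average the discount factors. A minimal Euler-scheme sketch follows, under stated assumptions: the paths use a mean-reversion parameter `gamma` standing for $\beta + \lambda$ (so that $\lambda = 0$ reproduces (20)), negative rates are truncated at zero, and all parameter values and function names are hypothetical. The deterministic case $\sigma = 0$, where the integral is known exactly, serves as a check on the discretization.

```python
import math
import random


def bond_price_mc(r0, tau, a, gamma, sigma, n_paths=2000, n_steps=200, seed=0):
    """Monte Carlo estimate of (22): the average of exp(-integral of r) over simulated paths.

    Paths follow dr = (a - gamma r) dt + sigma sqrt(r) dW, discretized by an Euler scheme;
    gamma plays the role of beta + lambda (lambda = 0 gives (20)). Rates are truncated at 0.
    """
    rng = random.Random(seed)
    dt = tau / n_steps
    total = 0.0
    for _ in range(n_paths):
        r, integral = r0, 0.0
        for _ in range(n_steps):
            integral += r * dt  # left-point rule for the integral of r over [t, t + tau]
            shock = sigma * math.sqrt(max(r, 0.0) * dt) * rng.gauss(0.0, 1.0)
            r = max(r + (a - gamma * r) * dt + shock, 0.0)
        total += math.exp(-integral)
    return total / n_paths


def bond_price_deterministic(r0, tau, a, gamma):
    """Exact (22) when sigma = 0, using r(s) = a/gamma + (r0 - a/gamma) exp(-gamma s)."""
    mean = a / gamma
    integral = mean * tau + (r0 - mean) * (1.0 - math.exp(-gamma * tau)) / gamma
    return math.exp(-integral)


# Hypothetical parameters: a = .02, gamma = .4 (long-run level .05), sigma = .1.
price_mc = bond_price_mc(0.05, 1.0, a=0.02, gamma=0.4, sigma=0.1)
print("one-year discount bond price:", round(price_mc, 4))
```

Simulation is only a fallback here: for the CIR dynamics the expectation has the closed form given below, and the Monte Carlo estimate carries both sampling error and Euler discretization bias.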
It is clear that we could group together the terms (a - fir)fr and -Xrfr and treat the problem as one of pricing an asset using a "hypothetical " instantaneous rate that is generated by dr, = (a - fir, - krt)dt + ar),2dr\t, 1/2 *■ ' = (a - yrt)dt + art' dy\t . The distinction is between the true probability measure in (20) and the "equivalent martingale measure " in (23).
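To make the two measures concrete, the sketch below simulates the square-root diffusion (20) with a simple Euler-Maruyama scheme; passing $\beta + \lambda$ in place of $\beta$ gives a path under the risk-adjusted drift $\alpha - (\beta + \lambda) r$ of (23). The function name, the "full truncation" device (using $\max(r, 0)$ in the drift and inside the square root) and all parameter values are illustrative assumptions, not part of the CIR analysis.

```python
import math
import random


def simulate_square_root(r0, alpha, beta, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path for dr = (alpha - beta*r) dt + sigma*sqrt(r) dW.

    Full truncation: max(r, 0) is used in the drift and in the square root,
    because a discretized path can step below zero even when the
    continuous-time process cannot.
    """
    rng = random.Random(seed)  # private generator so paths are reproducible
    path = [r0]
    r = r0
    for _ in range(n_steps):
        r_pos = max(r, 0.0)
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        r = r + (alpha - beta * r_pos) * dt + sigma * math.sqrt(r_pos) * dw
        path.append(r)
    return path


if __name__ == "__main__":
    # Hypothetical parameter values (annual units, monthly steps, 10 years).
    alpha, beta, sigma, lam = 0.05, 0.6, 0.1, -0.1
    true_path = simulate_square_root(0.05, alpha, beta, sigma, 1 / 12, 120)
    # Same seed, hence the same shocks, but the risk-adjusted drift of (23).
    rn_path = simulate_square_root(0.05, alpha, beta + lam, sigma, 1 / 12, 120)
    print(true_path[-1], rn_path[-1])
```

Using the same seed for both paths isolates the effect of the drift change from that of the random shocks.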

Modeling the term structure 109

The analytic solution for the term structure in the CIR model is then (see Cox et al. (1985, p. 393))

$$f_t(\tau) = A_1(\tau)\exp(-B_1(\tau) r_t),$$

where

$$A_1(\tau) = \left[\frac{2\delta\exp((\delta+\gamma)\tau/2)}{(\delta+\gamma)(\exp(\delta\tau)-1) + 2\delta}\right]^{2\alpha/\sigma^2}, \qquad B_1(\tau) = \frac{2(\exp(\delta\tau)-1)}{(\delta+\gamma)(\exp(\delta\tau)-1) + 2\delta},$$

and $\delta = (\gamma^2 + 2\sigma^2)^{1/2}$. Converting to a yield,

$$r_t(\tau) = \{-\log A_1(\tau) + B_1(\tau) r_t\}/\tau . \tag{24}$$

This is a single factor model, with the instantaneous rate or, more fundamentally, the "returns" factor driving the whole term structure, i.e. the level of the term structure depends on the value of $r_t$ at any point in time. The slope of the yield curve depends upon the parameters of the diffusion equation as well as the market price of risk.

Perhaps the biggest problem with this methodology is that it will never exactly reproduce an observed yield curve, which bothers practitioners a lot. One response has been to allow $\alpha$ to vary with $t$ and $\tau$. What this does is to add "fudge factors" to the model-based yield curve so that the modified curve equals the observed yield structure. Then, after forecasting $r_{t+1}$ and finding the predicted term structure, the "fudge factors" from the previous period are added on. The need for "fudge factors" suggests that there is substantial mis-specification in the CIR model as a description of the term structure, just as "intercept corrections" in macroeconometric models were given such an interpretation. Brown and Dybvig (1986) estimated the parameters of the CIR model by maximum likelihood and then computed the residuals defined by the gap between the observed bond prices ($f_t$) and the predictions of the model ($f_t^*$). Examination of the residuals pointed to specification errors in the model.¹⁵

¹⁵ Since there are $n$ yields but only one factor, they needed to add a vector of errors to the model to produce a non-singular covariance matrix for $f^*$, in order to be able to form a likelihood. It may be that the mis-specification reflects the assumptions made in this step.

Looking at the CIR model in the light of the stylized facts, the data should possess the characteristic that interest rates are near-integrated processes and possibly co-integrated, with co-integrating vectors between any pair of rates of $[1\ \ {-1}]$, i.e. the spreads should be I(0). The question that arises is whether the CIR model would deliver such a prediction. One problem to be overcome is quantifying the market price of risk, $\lambda$, in the CIR bond formulae. As CIR point out, $\lambda = 0$ if the factor had no effect on the real economy, e.g. if it was some nominal quantity such as the inflation rate. Accordingly, we will adopt this interpretation, allowing us to set $\lambda = 0$. To induce a unit root we set $\beta = 0$, and we also put the drift term $\alpha = 0$. This makes $\gamma = 0$, $\delta = \sqrt{2}\,\sigma$, $A_1(\tau) = 1$ and

$$B_1(\tau) = \frac{2(\exp(\delta\tau)-1)}{\delta(\exp(\delta\tau)-1) + 2\delta}.$$

Now the spread $sp_t(\tau)$ will be

$$sp_t(\tau) = r_t(\tau) - r_t(1) = [\tau^{-1}B_1(\tau) - B_1(1)]\,r_t ,$$

so that we will not get spreads to be I(0) unless the term in square brackets is zero. Generally it will not be. Realistic values for $\alpha$, $\beta$ and $\sigma$ might be the GMM estimates for $r_t(1)$ of .049, .02 and .106. These produce values of $\tau^{-1}B_1(\tau)$ = .990, .967, .930, .890 and .181 for the five maturities. In the limit ($\tau \to \infty$) $B_1(\tau) \to 2/\delta$, and so the spreads between adjoining yields tend to zero as the maturity lengthens. The source of the failure of the spreads to be I(0) is the fact that $\delta \neq 0$. If $\delta = 0$ then, using L'Hôpital's rule, $B_1(\tau) = \tau$, and so the spreads should be identically zero. By making $\sigma$ very small we can always produce results in which the spreads will be very close to being I(0), i.e. even if $\sigma$ is not exactly zero it can be regarded as sufficiently close to zero that the spreads are nearly non-integrated, although the longer the maturity on which the spread is based, the less likely we are to see such an outcome.

Another way of understanding the problem is to look at a discrete form of the fundamental pricing equation (22), $f_t(\tau) = E_t[\exp(-\sum_{j=0}^{\tau-1} r_{t+j})]$. Suppose that $r_t$ is I(1) with martingale difference innovations that are normally distributed. Then

$$f_t(\tau) = \exp(-\tau r_t)\prod_{j=1}^{\tau-1}\exp\left[\tfrac{1}{2}(\tau-j)^2\,\mathrm{var}_t(\Delta r_{t+j})\right].$$

If the conditional variance is a constant, the spreads will therefore be I(0). However, if it depends upon the level of the instantaneous rate, the spreads at any maturity would be equal to a non-linear function of $r_t$. For example, substituting the "square-root" formulation of CIR gives $\mathrm{var}_t(\Delta r_{t+j}) = \sigma^2 r_t$, and $sp_t(\tau) = -\{\sigma^2/(2\tau)\}\sum_{j=1}^{\tau-1}(\tau-j)^2\,r_t$, which moves with the level of $r_t$. Thus, it is important to determine the nature of the conditional variances in the data. Most econometric models of the term structure make these conditional variances GARCH processes, which effectively means that they are functions of $\Delta r_{t-j}$. But, as seen in the section examining the term structure data, there is prima facie evidence of a levels effect even after allowing for a GARCH specification of the conditional variance.

Given the conflicting evidence in Section 2, one might look at other co-integrating vectors when performing the comparison with CIR. In general, the CIR model points towards co-integrating vectors of the form $r_t(\tau) = d(\tau)\,r_t(1)$, where $d(\tau) < 1$ and decreasing with $\tau$. As seen in Section 2, with one exception both the Johansen and Phillips-Hansen estimates of $d(\tau)$ have $d(\tau) > 1$ and increasing in $\tau$. The predictions from CIR-type models are therefore diametrically opposed to the data.¹⁶

3.3. Two factor models from finance

Another response to the discrepancy between the model-based prediction of a yield curve and the observed one is to make the model more complex. It is not uncommon in this literature to see people "bypassing" the step between the instantaneous rate and the fundamental driving forces and simply postulating a process for the instantaneous rate, which is then used to price all the bonds. An example is the paper by Chen and Scott (1992), who assume that the instantaneous rate is the sum of two factors,

$$r_t = \xi_{1t} + \xi_{2t} , \tag{25}$$

where

$$d\xi_{1t} = (\alpha_1 - \beta_1\xi_{1t})\,dt + \sigma_1\xi_{1t}^{1/2}\,d\eta_{1t}, \qquad d\xi_{2t} = (\alpha_2 - \beta_2\xi_{2t})\,dt + \sigma_2\xi_{2t}^{1/2}\,d\eta_{2t},$$

and the $d\eta_{jt}$ are independent, thereby making each factor independent. Then the solution for the bond price is

$$f_t(\tau) = A_1(\tau)A_2(\tau)\exp\{-B_1(\tau)\xi_{1t} - B_2(\tau)\xi_{2t}\},$$

where $A_2$ and $B_2$ are defined analogously to $A_1$ and $B_1$. Obviously this framework could be extended to encompass any number of factors, provided they are assumed to be independent. Another method is that of Longstaff and Schwartz (1992), who also have two factors, but these are related to the underlying rate of return process $\mu_t$ rather than directly to the instantaneous rate. In particular, they wish the two factors to be linear combinations of the instantaneous rate and its conditional variance. The model is interesting because the second factor they use, $\xi_{2t}$, affects only the conditional variance of the $\mu_t$ process, whereas both factors affect the conditional mean. This is unlike Chen and Scott's model, in which $\xi_{1t}$ and $\xi_{2t}$ each affect both the mean and the variance.
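The CIR coefficients and the yield formula (24) are easy to evaluate directly. The sketch below codes $A_1(\tau)$, $B_1(\tau)$ and $\delta$ as defined in Section 3.2 for the risk-adjusted drift $\alpha - \gamma r$, and computes the spread coefficient $\tau^{-1}B_1(\tau) - B_1(1)$ for the polar case $\alpha = \gamma = 0$. As an independent check it also integrates the Riccati equation $B_1'(\tau) = 1 - \gamma B_1(\tau) - \tfrac{1}{2}\sigma^2 B_1(\tau)^2$, $B_1(0) = 0$, which the closed form satisfies under the convention $f = A_1\exp(-B_1 r)$. Function names and the time unit of $\tau$ are assumptions, and the parameter values are illustrative; they are not intended to reproduce the figures quoted in the text.

```python
import math


def cir_coefficients(tau, alpha, gamma, sigma):
    """A_1(tau), B_1(tau) for f_t(tau) = A_1 * exp(-B_1 * r_t), drift alpha - gamma*r."""
    delta = math.sqrt(gamma ** 2 + 2.0 * sigma ** 2)
    e = math.expm1(delta * tau)                      # exp(delta*tau) - 1
    denom = (delta + gamma) * e + 2.0 * delta
    b1 = 2.0 * e / denom
    base = 2.0 * delta * math.exp((delta + gamma) * tau / 2.0) / denom
    a1 = base ** (2.0 * alpha / sigma ** 2)          # equals 1 when alpha = 0
    return a1, b1


def cir_yield(tau, r, alpha, gamma, sigma):
    """Equation (24): r_t(tau) = {-log A_1(tau) + B_1(tau) * r_t} / tau."""
    a1, b1 = cir_coefficients(tau, alpha, gamma, sigma)
    return (-math.log(a1) + b1 * r) / tau


def spread_coefficient(tau, sigma):
    """Polar case alpha = gamma = 0: sp_t(tau) = [B_1(tau)/tau - B_1(1)] * r_t."""
    _, b_tau = cir_coefficients(tau, 0.0, 0.0, sigma)
    _, b_one = cir_coefficients(1.0, 0.0, 0.0, sigma)
    return b_tau / tau - b_one


def riccati_b1(tau, gamma, sigma, n_steps=2000):
    """Integrate B' = 1 - gamma*B - 0.5*sigma^2*B^2, B(0) = 0, by classical RK4."""
    def rhs(b):
        return 1.0 - gamma * b - 0.5 * sigma ** 2 * b ** 2

    h, b = tau / n_steps, 0.0
    for _ in range(n_steps):
        k1 = rhs(b)
        k2 = rhs(b + 0.5 * h * k1)
        k3 = rhs(b + 0.5 * h * k2)
        k4 = rhs(b + h * k3)
        b += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return b


if __name__ == "__main__":
    sigma = 0.106  # illustrative volatility value
    for tau in (1, 3, 6, 9, 120):
        print(tau, spread_coefficient(tau, sigma))
```

The spread coefficient is zero at $\tau = 1$ and negative beyond it, because $B_1(\tau)/\tau$ falls with maturity; shrinking `sigma` towards zero pushes $B_1(\tau)$ towards $\tau$ and the coefficient towards zero, which is the L'Hôpital limit discussed above.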
Empirically, the two factors are regarded as the short-term rate and its conditional volatility, where the latter is estimated by a GARCH process when assessing the quality of the model.¹⁷ Tests of the model are limited to how well it replicates the unconditional standard deviations of yield changes.

¹⁶ Brown and Schaefer (1994) find that the CIR model closely fits the term structure of real yields, where these are computed from British government index-linked bonds. Note that in constructing the Johansen and Phillips-Hansen estimators an intercept was allowed into the relations in order to correspond to $A(\tau)$.

¹⁷ Volatility affects the term structure here by its impact upon $r_t$ in (25). Shen and Starr (1992) raise the interesting question of why volatility should be priced; if one thinks of bonds as part of a larger portfolio, only their covariances with the market portfolio would be relevant. To justify the observed importance of volatility, they note that the bid/ask spread will be a function of volatility and that this has an immediate effect upon yields.

There are a number of other two factor models. Brennan and Schwartz (1979) and Edmister and Madan (1993) begin with the long and short rates following a joint diffusion process. After imposing the "no arbitrage" condition and assuming that the long rate is a traded instrument, Brennan and Schwartz find that the price of the instantaneous risk associated with the long rate can be eliminated, and the two factors then effectively become the instantaneous rate and the yield spread between that rate and the long rate. Eliminating the price of risk for the long rate makes the model non-linear, and they need to linearize to find a solution. Even then there is no analytical solution for the yield curve, as there is with CIR. Edmister and Madan, by contrast, find closed form solutions for the term structure in their formulation. Another possibility for a two factor model might be to allow for stochastic volatility as a factor.

Suppose that the first factor in Chen and Scott's model is a "near I(1)" process whereas the second factor is I(0). Then the instantaneous rate has the common trend format (compare (25) and (8), recognising that J can be regarded as the unit column vector). Using the same parameter values for the first factor as in the polar case discussed in the preceding sub-section, i.e. $\beta_1 = 0$, $\lambda_1 = 0$, $\alpha_1 = 0$ and $\delta_1 = 0$ (so that $A_1(\tau) = 1$ and $B_1(\tau) = \tau$), the first factor disappears from the spreads, which now equal

$$r_t(\tau) - r_t(1) = \log A_2(1) - \tau^{-1}\log A_2(\tau) + [\tau^{-1}B_2(\tau) - B_2(1)]\,\xi_{2t} .$$

Hence the spreads are now stochastic and inherit the properties of the second factor. For them to be persistent, it is necessary that the second factor have that characteristic. Notice also that $r_t(\tau) - r_t(\tau - 1)$ will tend to zero as $\tau \to \infty$, and this may make it implausible to use this model with a large range of maturities. Consequently, this two factor model can be made to reproduce the standard results of the co-integration approach, in the sense that the EC terms are decomposed into a smaller number of factors. Of course the model would predict that the coefficients on the factors would be negative, as $\tau^{-1}B_2(\tau) < B_2(1)$. The conclusion of negative weights extends to any number of factors, provided they are independent, so it is interesting to look at the evidence upon the signs of the coefficients of the factors in our data set, where the non-trend factors are equated with the principal components. Although one cannot uniquely move from the principal components/spreads relation to a spreads/principal components relation, a simple way to get some information on the relationship between spreads and factors is to regress each of the spreads against the principal components. Doing so, the $R^2$ are .999, .999, .98 and .99 respectively, showing that the spreads are well explained by the three components. The results from the regressions are

$$sp_t(3) = .36\psi_{1t} - .83\psi_{2t} + .48\psi_{3t},$$
$$sp_t(6) = -.76\psi_{1t} - .09\psi_{2t} + .42\psi_{3t},$$
$$sp_t(9) = -1.28\psi_{1t} + .33\psi_{2t} + .44\psi_{3t},$$
$$sp_t(120) = -1.44\psi_{1t} + 1.84\psi_{2t} + 2.12\psi_{3t},$$

where the $\psi_{jt}$ are the first three principal components. It is clear that independent factor models would not generate the requisite signs.

Formal testing of two factor pricing models is in its infancy. Pearson and Sun (1994) and Chen and Scott (1993) estimate the parameters of the model by maximum likelihood and provide some evidence that at least two factors are needed to capture the term structure adequately.

The two factor model is also useful for examining some of the literature on the validity of the expectations hypothesis. Campbell and Shiller (1991) pointed out that, if the liquidity premium is constant, the hypothesis implies that

$$r_{t+1}(\tau - 1) - r_t(\tau) = a_0 + \frac{1}{\tau - 1}\,[r_t(\tau) - r_t(1)] . \tag{26}$$

They found that this restriction was strongly rejected by the data. With McCulloch and Kwon's data and $\tau = 3$, the regression of $r_{t+1}(2) - r_t(3)$ against $r_t(3) - r_t(1)$ yields an estimated coefficient of −.09, well away from the predicted value of .5. Of course, the assumption of a constant premium is incorrect. Bond prices are determined by (22) which, when discretized, would be

$$f_t(\tau) = E_t\left[\exp\left(-\sum_{j=t}^{t+\tau-1} r_j\right)\right] = \exp\left(-\sum_{j=t}^{t+\tau-1} E_t r_j\right) v_t = f_t^e(\tau)\,v_t , \tag{27}$$

where $f_t^e(\tau)$ is the bond price predicted by the expectations theory. Thus $r_t(\tau)$ differs from that of the expectations theory by the term $-\tau^{-1}\log v_t$, and this in turn will be a function of the conditional moments of $\Delta r_t$. In the case where $\Delta r_t$ is conditionally normal it depends upon the conditional variance, and the equation corresponding to (26) will now feature a time-varying $a_0$ that depends on this moment. If the conditional variance relates to the spreads with a negative coefficient, then that could cause a negative bias in the coefficient of $r_t(\tau) - r_t(1)$ in the Campbell and Shiller regressions.
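The predicted slope of .5 in the Campbell and Shiller regression can be illustrated by simulation. The sketch below assumes, purely for illustration, that the one-period rate follows a Gaussian AR(1) and that yields obey the pure expectations hypothesis with a zero premium, $r_t(\tau) = \tau^{-1}\sum_{j=0}^{\tau-1}E_t r_{t+j}$; it then regresses $r_{t+1}(\tau-1) - r_t(\tau)$ on $r_t(\tau) - r_t(1)$ by OLS. Function names and parameter values are hypothetical.

```python
import random


def eh_yields(r, mu, rho, max_tau):
    """Yields r(tau), tau = 1..max_tau, under the pure expectations hypothesis
    (zero term premium) when the one-period rate is AR(1), so that
    E_t r_{t+j} = mu + rho**j * (r_t - mu)."""
    curve = {}
    for tau in range(1, max_tau + 1):
        avg_weight = sum(rho ** j for j in range(tau)) / tau
        curve[tau] = mu + (r - mu) * avg_weight
    return curve


def campbell_shiller_slope(tau=3, n=20000, mu=0.05, rho=0.8, sd=0.005, seed=1):
    """OLS slope from regressing r_{t+1}(tau-1) - r_t(tau) on r_t(tau) - r_t(1).
    In the simulated model the population slope is 1/(tau - 1)."""
    rng = random.Random(seed)
    r = mu
    curves = []
    for _ in range(n + 1):
        curves.append(eh_yields(r, mu, rho, tau))
        r = mu + rho * (r - mu) + rng.gauss(0.0, sd)  # AR(1) short rate
    xs = [c[tau] - c[1] for c in curves[:-1]]
    ys = [nxt[tau - 1] - cur[tau] for cur, nxt in zip(curves[:-1], curves[1:])]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    print(campbell_shiller_slope(tau=3))  # near 0.5 by construction
```

Because the simulation imposes a constant (here zero) premium, the estimated slope sits near $1/(\tau-1)$; the −.09 found in the data is what motivates the time-varying premium argument built on (27).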
One scenario in which this happens is if the conditional variance depended upon $\Delta r_t$, as happens with an EGARCH model. Then, due to co-integration amongst yields, $\Delta r_t$ could also be replaced by the lagged spreads, and these will have negative coefficients. More generally, since we observed in Section 2 that the factors influencing the term structure, such as volatility, could be written as linear combinations of the spreads, there is a possibility that term structure anomalies might be explained in this way.

3.4. Multiple non-independent factor models in finance

Duffie and Kan (1993) present a multi-factor model of the term structure in which the factors may not be independent. As for the two factor models, it is assumed that the instantaneous rate is a linear function of $M$ factors, collected in an $M \times 1$ vector $\xi_t$, which evolves according to the diffusion process

$$d\xi_t = \mu(\xi_t)\,dt + \sigma(\xi_t)\,d\eta_t ,$$

where $d\eta_t$ is a vector of standard Brownian motions, and $\mu(\xi_t)$ and $\sigma(\xi_t)$ are, respectively, the drift vector and the volatility matrix. They then ask what type of functions $\mu(\cdot)$ and $\sigma(\cdot)$ are capable of producing a solution for the $n$ bond prices $f_t(\tau)$, $\tau = 1, \ldots, n$, of the exponential affine form

$$f_t(\tau) = \exp[A(\tau) + B(\tau)'\xi_t] = \exp\left[A(\tau) + \sum_{i=1}^{M} B_i(\tau)\,\xi_{it}\right].$$

It turns out that the drift $\mu(\xi_t)$ and the instantaneous covariance matrix $\sigma(\xi_t)\sigma(\xi_t)'$ should be linear (affine) functions of $\xi_t$. Thereupon the solution for $B(\tau)$ can be found by solving an ordinary differential equation of the form $dB(\tau)/d\tau = \mathcal{B}(B(\tau))$, $B(0) = 0$. In most cases only numerical solutions for $B(\tau)$ are available. Duffie and Kan consider some special cases, differing according to the evolution of $\xi_t$. When the $\xi_{it}$ are joint diffusions driven by Brownian motion with a covariance matrix $\Omega$ that is not diagonal, there is the possibility that the weights attached to the factors can have different signs, and so the principal defect of the two factor models of the preceding sub-section might be overcome. To date little empirical work seems to be available on these models, with the exception of El Karoui and Lacoste (1992), who make $\xi_t$ Gaussian with constant volatility.

3.5. Forward rate models

In recent years it has become popular to model the forward rate structure directly rather than the yields, e.g. Ho and Lee (1986) and Heath, Jarrow and Morton (1992) (HJM).
Since the forward rates are linear combinations of the yields, specifications based on the nature of the forward rate structure imply some restriction upon the nature of the yield curve, and conversely. In the light of what is known about the behavior of yields, this sub-section considers the likelihood that popular models of forward rates can replicate the term structure. In what follows, one step ahead forward rates are used along with the HJM framework. In the interest of space, only a simple Euler discretization of the HJM stochastic differential equations describing the evolution of the forward rate curve is used. Many variants of these equations have emerged, but they have the common format

$$F_t(\tau - 1) - F_{t-1}(\tau) = c_{t,\tau-1} + \sigma_{t,\tau-1}\,\varepsilon_{t,\tau-1},$$

where $\varepsilon_{t,\tau-1}$ is n.i.d.(0,1). Differences among the models reflect differences in the assumptions made about volatilities. Examples would be a constant volatility model, in which $c_{t,\tau-1} = a_0 + \sigma^2\tau$ and $\sigma_{t,\tau-1} = \sigma$, or a proportional volatility model, which has $c_{t,\tau-1} = -\lambda\sigma F_t(\tau) + \sigma^2 F_t(\tau)\sum_{k=1}^{\tau}F_t(k)$ and $\sigma_{t,\tau-1} = \sigma F_t(\tau)$. The form of $c_{t,\tau-1}$ reflects the no-arbitrage assumption. After some manipulation it can be shown that

$$F_t(\tau-1) - F_{t-1}(\tau) = \frac{1}{\tau(\tau-1)}\,sp_t(\tau) - \frac{\tau+1}{\tau}\,\bigl(r_t(\tau+1) - r_t(\tau)\bigr) + \frac{\tau+1}{\tau}\,\Delta r_t(\tau+1) - \frac{1}{\tau}\,\Delta r_t(1), \tag{28}$$

so that the equation used by HJM for the evolution of the forward rate incorporates spreads and changes in yields. In turn, using co-integration ideas, $\Delta r_t(\tau+1)$ depends upon spreads, and this shows quite clearly that the characteristics of $F_t(\tau-1) - F_{t-1}(\tau)$ will be those of the spreads (see Table 2). Consequently, at least for small $\tau$, constant volatility models with martingale difference errors could not adequately describe the data. It is possible that proportional volatility models might do so, owing to the dependence of their $c_{t,\tau-1}$ upon $F_t(\tau)$, as the latter is near integrated. To check this we regressed $F_t(2) - F_{t-1}(3)$ against $c_{t,2}$ and $sp_{t-1}(3)$ for $n = 9$ and a variety of values for the market price of risk $\lambda$. For $\lambda = 0$ the t ratio of the coefficient of $sp_{t-1}(3)$ was −4.37, while for very large $\lambda$ it was −4.70. Other values of $\lambda$ gave t ratios between these extremes. Hence, the conditional mean for the forward rates is far more complex than that found in HJM models.
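The decomposition of the forward-rate change into spread, slope and yield-change terms depends on how one step ahead forward rates are built from yields. Assuming, as a hypothesis for this sketch, that $F_t(k) = [(k+1)\,r_t(k+1) - r_t(1)]/k$ (the $k$-period rate for the interval starting one period ahead), the identity $F_t(\tau-1) - F_{t-1}(\tau) = sp_t(\tau)/\{\tau(\tau-1)\} - \tfrac{\tau+1}{\tau}(r_t(\tau+1) - r_t(\tau)) + \tfrac{\tau+1}{\tau}\Delta r_t(\tau+1) - \tfrac{1}{\tau}\Delta r_t(1)$ can be checked numerically on arbitrary yield curves. The function names are illustrative.

```python
import random


def forward(curve, k):
    """Assumed definition: F(k) = [(k + 1) r(k + 1) - r(1)] / k, from {maturity: yield}."""
    return ((k + 1) * curve[k + 1] - curve[1]) / k


def decomposition(now, prev, tau):
    """Spread, slope and yield-change terms on the right-hand side of the identity."""
    spread = now[tau] - now[1]                 # sp_t(tau)
    slope = now[tau + 1] - now[tau]            # r_t(tau+1) - r_t(tau)
    d_long = now[tau + 1] - prev[tau + 1]      # Delta r_t(tau+1)
    d_short = now[1] - prev[1]                 # Delta r_t(1)
    return (spread / (tau * (tau - 1))
            - (tau + 1) / tau * slope
            + (tau + 1) / tau * d_long
            - d_short / tau)


def max_identity_error(trials=200, max_tau=10, seed=0):
    """Largest |LHS - RHS| over random yield curves observed at t-1 and t."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        tau = rng.randint(2, max_tau)          # tau >= 2 so that F_t(tau - 1) is defined
        prev = {m: rng.uniform(0.01, 0.10) for m in range(1, tau + 2)}
        now = {m: rng.uniform(0.01, 0.10) for m in range(1, tau + 2)}
        lhs = forward(now, tau - 1) - forward(prev, tau)
        worst = max(worst, abs(lhs - decomposition(now, prev, tau)))
    return worst


if __name__ == "__main__":
    print(max_identity_error())  # rounding-level error only
```

Since the check holds for any yield curves, it is a purely algebraic property of the assumed definition; whether the chapter's own definition of $F_t(k)$ matches should be confirmed against its earlier sections.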
Moreover, the rank of the covariance matrix of the errors $\varepsilon_{t,\tau-1}$ must reflect the number of factors in the term structure, which appears to be two or three, so that the common assumption of a single error driving all forward spreads seems inaccurate.

A number of formal investigations have been made into the compatibility of the HJM model with the data. Abken (1993) and Thurston (1994) fitted HJM models to forward rate data by GMM, whilst Amin and Morton (1994) used options prices to recover implied volatilities whose evolution was compared to those of the most popular variants of the HJM model. Abken and Thurston reach conflicting conclusions: the latter favours a constant volatility formulation and the former a proportional one, although his general conclusion was that all models were rejected by the data. Consequently, it seems interesting to look at the stylized facts regarding volatility and to compare them with model specifications. Equation (28) is useful for this task. As it has been shown that there is a levels effect in $\Delta r_t(k)$, in order to have constant volatility it would be necessary that there be some "co-levels" effect, analogous to the co-persistence phenomenon of the GARCH literature (Bollerslev and Engle (1993)), i.e. even though $\Delta r_t(k)$ displays a levels effect, the linear combination $\frac{\tau+1}{\tau}\Delta r_t(\tau+1) - \frac{1}{\tau}\Delta r_t(1)$ does not. This contention is easily rejected: a plot of that variable squared against $r_{t-1}(3)$ looks almost identical to Figure 1, and such an observation points to the proportional volatility model as being the appropriate one.

4. Conclusion

This chapter has described methods of modeling the term structure that are to be found in the econometrics and finance literatures. By utilizing a factor representation we have been able to show that there are many similarities between the two approaches. However, there are also some differences. Within the econometrics literature it is common to assume that yields are integrated processes and that spreads constitute the co-integrating relations. Although the finance literature takes the stance that yields are near integrated but stationary, it emerges that the models used in that literature would not predict that the spreads are co-integrating errors if we actually replaced the stationarity assumption by one of a unit root. The reason for this outcome is found to lie in the assumption that the conditional volatility of yields is a function of the level of the yields. Empirical work tends to support such a hypothesis, and we suggest that the consequences of such a relationship can be profound for testing propositions about the term structure. We also document a number of stylized facts about a set of data on yields that prove useful in assessing the likely adequacy of many of the models used in finance for capturing the term structure.

References

Abken, P. A. (1993). Generalized method of moments tests of forward rate processes. Working Paper 93-7, Federal Reserve Bank of Atlanta.
Amin, K. I. and A. J. Morton (1994). Implied volatility functions in arbitrage-free term structure models. J. Financ. Econom. 35, 141-180.
Anderson, H. M. (1994). Transaction costs and nonlinear adjustment towards equilibrium in the US treasury bill market. Mimeo, University of Texas at Austin.
Baillie, R. T. and T. Bollerslev (1992). Prediction in dynamic models with time-dependent conditional variances. J. Econometrics 52, 91-113.
Baillie, R. T., T. Bollerslev and H. O. Mikkelsen (1993). Fractionally integrated autoregressive conditional heteroskedasticity. Mimeo, Michigan State University.
Bollerslev, T. and R. F. Engle (1993). Common persistence in conditional variances: Definition and representation. Econometrica 61, 167-186.
Boudoukh, J. (1993). An equilibrium model of nominal bond prices with inflation-output correlation and stochastic volatility. J. Money, Credit and Banking 25, 636-665.
Brennan, M. J. and E. S. Schwartz (1979). A continuous time approach to the pricing of bonds. J. Banking Finance 3, 133-155.
Brenner, R. J., R. H. Harjes and K. F. Kroner (1994). Another look at alternative models of the short-term interest rate. Mimeo, University of Arizona.

Brown, S. J. and P. H. Dybvig (1986). The empirical implications of the Cox-Ingersoll-Ross theory of the term structure of interest rates. J. Finance XLI, 617-632.
Brown, R. H. and S. M. Schaefer (1994). The term structure of real interest rates and the Cox, Ingersoll and Ross model. J. Financ. Econom. 35, 3-42.
Broze, L., O. Scaillet and J. M. Zakoian (1993). Testing for continuous-time models of the short-term interest rates. CORE Discussion Paper 9331.
Campbell, J. Y. and R. J. Shiller (1991). Yield spreads and interest rate movements: A bird's eye view. Rev. Econom. Stud. 58, 495-514.
Canova, F. and J. Marrinan (1993). Reconciling the term structure of interest rates with the consumption based ICAP model. Mimeo, Brown University.
Chan, K. C., G. A. Karolyi, F. A. Longstaff and A. B. Sanders (1992). An empirical comparison of alternative models of the short-term interest rate. J. Finance XLVII, 1209-1227.
Chen, R. R. and L. Scott (1992). Pricing interest rate options in a two factor Cox-Ingersoll-Ross model of the term structure. Rev. Financ. Stud. 5, 613-636.
Chen, R. R. and L. Scott (1993). Maximum likelihood estimation for a multifactor equilibrium model of the term structure of interest rates. J. Fixed Income 3, 14-31.
Conley, T., L. P. Hansen, E. Luttmer and J. Scheinkman (1994). Estimating subordinated diffusions from discrete time data. Mimeo, University of Chicago.
Constantinides, G. (1992). A theory of the nominal structure of interest rates. Rev. Financ. Stud. 5, 531-552.
Cox, J. C., J. E. Ingersoll and S. A. Ross (1985). A theory of the term structure of interest rates. Econometrica 53, 385-408.
Duffie, D. and R. Kan (1993). A yield-factor model of interest rates. Mimeo, Graduate School of Business, Stanford University.
Dybvig, P. H. (1989). Bonds and bond option pricing based on the current term structure. Working Paper, Washington University in St. Louis.
Edmister, R. O. and D. B. Madan (1993). Informational content in interest rate term structures. Rev. Econom. Statist. 75, 695-699.
Egginton, D. M. and S. G. Hall (1993). An investigation of the effect of funding on the slope of the yield curve. Working Paper No. 6, Bank of England.
El Karoui, N. and V. Lacoste (1992). Multifactor models of the term structure of interest rates. Working Paper, University of Paris VI.
Engsted, T. and C. Tanggaard (1994). Cointegration and the US term structure. J. Banking Finance 18, 167-181.
Evans, M. D. D. and K. L. Lewis (1994). Do stationary risk premia explain it all? Evidence from the term structure. J. Monetary Econom. 33, 285-318.
Frydman, H. (1994). Asymptotic inference for the parameters of a discrete-time square-root process. Math. Finance 4, 169-181.
Gallant, A. R. and G. Tauchen (1992). Which moments to match? Mimeo, Duke University.
Gallant, A. R., D. Hsieh and G. Tauchen (1994). Estimation of stochastic volatility models with diagnostics. Mimeo, Duke University.
Gonzalo, J. and C. W. J. Granger (1991). Estimation of common long-memory components in cointegrated systems. Discussion Paper 91-33, UCSD.
Gourieroux, C., A. Monfort and E. Renault (1993). Indirect inference. J. Appl. Econometrics 8, S85-S118.
Gourieroux, C. and O. Scaillet (1994). Estimation of the term structure from bond data. Working Paper No. 9415, CEPREMAP.
Hall, A. D., H. M. Anderson and C. W. J. Granger (1992). A cointegration analysis of treasury bill yields. Rev. Econom. Statist. 74, 116-126.
Heath, D., R. Jarrow and A. Morton (1992). Bond pricing and the term structure of interest rates: A new methodology for contingent claims valuation. Econometrica 60, 77-105.
Hejazi, W. (1994). Are term premia stationary? Mimeo, University of Toronto.

Ho, T. S. and S.-B. Lee (1986). Term structure movements and pricing interest rate contingent claims. J. Finance 41, 1011-1029.
Johansen, S. (1988). Statistical analysis of cointegrating vectors. J. Econom. Dynamic Control 12, 231-254.
Johnson, P. A. (1994). On the number of common unit roots in the term structure of interest rates. Appl. Econom. 26, 815-820.
Kearns, P. (1993). Volatility and the pricing of interest rate derivative claims. Unpublished doctoral dissertation, University of Rochester.
Koedijk, K. G., F. G. J. A. Nissen, P. C. Schotman and C. C. P. Wolff (1993). The dynamics of short-term interest rate volatility reconsidered. Mimeo, Limburg Institute of Financial Economics.
Litterman, R. and J. Scheinkman (1991). Common factors affecting bond returns. J. Fixed Income 1, 54-61.
Longstaff, F. and E. S. Schwartz (1992). Interest rate volatility and the term structure: A two factor general equilibrium model. J. Finance XLVII, 1259-1282.
Marsh, T. A. and E. R. Rosenfeld (1983). Stochastic processes for interest rates and equilibrium bond prices. J. Finance XXXVIII, 635-650.
Mihlstein, G. N. (1974). Approximate integration of stochastic differential equations. Theory Probab. Appl. 19, 557-562.
McCulloch, J. H. (1989). US term structure data, 1946-1987. Handbook of Monetary Economics 1, 672-715.
McCulloch, J. H. and H. C. Kwon (1993). US term structure data, 1947-1991. Ohio State University Working Paper 93-6.
Pearson, N. D. and T.-S. Sun (1994). Exploiting the conditional density in estimating the term structure: An application to the Cox, Ingersoll and Ross model. J. Finance XLIX, 1279-1304.
Pfann, G. A., P. C. Schotman and R. Tschernig (1994). Nonlinear interest rate dynamics and implications for the term structure. Mimeo, University of Limburg.
Phillips, P. C. B. and B. E. Hansen (1990). Statistical inference in instrumental variables regression with I(1) processes. Rev. Econom. Stud. 57, 99-125.
Shen, P. and R. M. Starr (1992). Liquidity of the treasury bill market and the term structure of interest rates. Discussion Paper 92-32, University of California at San Diego.
Stock, J. H. and M. W. Watson (1988). Testing for common trends. J. Amer. Statist. Assoc. 83, 1097-1107.
Thurston, D. C. (1994). A generalized method of moments comparison of discrete Heath-Jarrow-Morton interest rate models. Asia Pac. J. Mgmt. 11, 1-19.
Vetzal, K. R. (1992). The impact of stochastic volatility on bond option prices. Working Paper 92-08, Institute of Insurance and Pension Research, University of Waterloo, Waterloo, Ontario.
Zhang, Z. (1993). Treasury yield curves and cointegration. Appl. Econom. 25, 361-367.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved.

5

Stochastic Volatility*

Eric Ghysels, Andrew C. Harvey and Eric Renault

1. Introduction

The class of stochastic volatility (SV) models has its roots both in mathematical finance and in financial econometrics. In fact, several variations of SV models originated from research looking at very different issues. Clark (1973), for instance, suggested modeling asset returns as a function of a random process of information arrival. This so-called time deformation approach yielded a time-varying volatility model of asset returns. Later Tauchen and Pitts (1983) refined this work, proposing a mixture of distributions model of asset returns with temporal dependence in information arrivals. Hull and White (1987) were not directly concerned with linking asset returns to information arrival but rather were interested in pricing European options assuming continuous time SV models for the underlying asset. They suggested a diffusion for asset prices with volatility following a positive diffusion process. Yet another approach emerged from the work of Taylor (1986), who formulated a discrete time SV model as an alternative to Autoregressive Conditional Heteroskedasticity (ARCH) models. Until recently, estimating Taylor's model, or any other SV model, remained almost infeasible. Recent advances in econometric theory have made estimation of SV models much easier. As a result, they have become an attractive class of models and an alternative to other classes such as ARCH.

Contributions to the literature on SV models can be found both in mathematical finance and in econometrics. Hence, we face quite a diverse set of topics. We say very little about ARCH models because several excellent surveys on the subject have appeared recently, including those by Bera and Higgins (1995), Bollerslev, Chou and Kroner (1992), Bollerslev, Engle and Nelson (1994) and Diebold and Lopez (1995). Furthermore, since this chapter is written for the Handbook of Statistics, we keep the coverage of the mathematical finance literature to a minimum. Nevertheless, the subject of option pricing figures prominently out of necessity. Indeed, Section 2, which deals with definitions of volatility, has extensive coverage of Black-Scholes implied volatilities. It also summarizes empirical stylized facts and concludes with statistical modeling of volatility. The reader with a greater interest in statistical concepts may want to skip the first three subsections of Section 2, which are more finance oriented, and start with Section 2.4. Section 3 discusses discrete time models, while Section 4 reviews continuous time models. Statistical inference of SV models is the subject of Section 5. Section 6 concludes.

* We benefited from helpful comments from Torben Andersen, David Bates, Frank Diebold, Rene Garcia, Eric Jacquier and Neil Shephard on preliminary drafts of the paper. The first author would like to acknowledge the financial support of FCAR (Quebec), SSHRC (Canada) as well as the hospitality and support of CORE (Louvain-la-Neuve, Belgium). The second author wishes to thank the ESRC for financial support. The third author would like to thank the Institut Universitaire de France, the Federation Francaise des Societes d'Assurance as well as CIRANO and C.R.D.E. for financial support.

120 E. Ghysels, A. C. Harvey and E. Renault

2. Volatility in financial markets

Volatility plays a central role in the pricing of derivative securities. The Black-Scholes model for the pricing of a European option is by far the most widely used formula, even when the underlying assumptions are known to be violated. Section 2.1 will therefore take the Black-Scholes model as a reference point from which to discuss several notions of volatility. A discussion of stylized facts regarding volatility and option prices will appear next in Section 2.2. Both sections set the scene for a formal framework defining stochastic volatility, which is treated in Section 2.3. Finally, Section 2.4 introduces the statistical models of stochastic volatility.

2.1. The Black-Scholes model and implied volatilities

More than half a century after the seminal work of Louis Bachelier (1900), continuous time stochastic processes have become a standard tool to describe the behavior of asset prices. The work of Black and Scholes (1973) and Merton (1990) has been extremely influential in that regard.
In Section 2.1.1 we review some of the assumptions that are made when modeling asset prices by diffusions, in particular to present the concept of instantaneous volatility. In Section 2.1.2 we turn to option pricing models and the various concepts of implied volatility.

2.1.1. An instantaneous volatility concept

We consider a financial asset, say a stock, with today's (time $t$) market price denoted by $S_t$.² Let the information available at time $t$ be described by $I_t$, and consider the conditional distribution, given $I_t$, of the return $S_{t+h}/S_t$ of holding the asset over the period $[t, t+h]$.³ A maintained assumption throughout this chapter will be that asset returns have finite conditional expectation given $I_t$, or:

$$E_t(S_{t+h}/S_t) = S_t^{-1} E_t S_{t+h} < +\infty \qquad (2.1.1)$$

and likewise finite conditional variance given $I_t$, namely

$$V_t(S_{t+h}/S_t) = S_t^{-2} V_t S_{t+h} < +\infty . \qquad (2.1.2)$$

2 Here and in the remainder of the paper we will focus on options written on stocks or exchange rates. The large literature on the term structure of interest rates and related derivative securities will not be covered.
3 Section 2.3 will provide a more rigorous discussion of information sets. It should also be noted that we will use conditional distributions of asset prices $S_{t+h}$ and of returns $S_{t+h}/S_t$ interchangeably, since $S_t$ belongs to $I_t$.

Stochastic volatility 121

The continuously compounded expected rate of return will be characterized by $h^{-1}\log E_t(S_{t+h}/S_t)$. Then a first assumption can be stated as follows:

Assumption 2.1.1.A. The continuously compounded expected rate of return converges almost surely towards a finite value $\mu_S(I_t)$ when $h > 0$ goes to zero.

From this assumption one has $E_t S_{t+h} - S_t \approx h\,\mu_S(I_t)\,S_t$ or, in terms of its differential representation:

$$\left.\frac{d}{d\tau}\,E_t(S_\tau)\right|_{\tau=t} = \mu_S(I_t)\,S_t \quad \text{almost surely} \qquad (2.1.3)$$

where the derivatives are taken from the right. Equation (2.1.3) is sometimes loosely written as $E_t(dS_t) = \mu_S(I_t)\,S_t\,dt$. The next assumption pertains to the conditional variance and can be stated as:

Assumption 2.1.1.B. The conditional variance of the return, $h^{-1}V_t(S_{t+h}/S_t)$, converges almost surely towards a finite value $\sigma_S^2(I_t)$ when $h > 0$ goes to zero.

Again, in terms of its differential representation this amounts to:

$$\left.\frac{d}{d\tau}\,V_t(S_\tau)\right|_{\tau=t} = \sigma_S^2(I_t)\,S_t^2 \quad \text{almost surely} \qquad (2.1.4)$$

and one loosely associates with it the expression $V_t(dS_t) = \sigma_S^2(I_t)\,S_t^2\,dt$. Together, Assumptions 2.1.1.A and B lead to a representation of the asset price dynamics by an equation of the following form:

$$dS_t = \mu_S(I_t)\,S_t\,dt + \sigma_S(I_t)\,S_t\,dW_t \qquad (2.1.5)$$

where $W_t$ is a standard Brownian Motion. Hence, every time a diffusion equation is written for an asset price process we have automatically defined the so-called instantaneous volatility process $\sigma_S(I_t)$, which from the above representation can also be written as:

$$\sigma_S(I_t) = \left[\lim_{h \downarrow 0} h^{-1} V_t(S_{t+h}/S_t)\right]^{1/2} . \qquad (2.1.6)$$

Before turning to the next section we would like to provide a brief discussion of some of the foundations for Assumptions 2.1.1.A and B. It was noted that Bachelier (1900) proposed Brownian Motion as a model of stock price movements.
In modern terminology this amounts to the random walk theory of asset pricing, which claims that asset returns ought not to be predictable because of the informational efficiency of financial markets. Hence, it assumes that returns on consecutive regularly sampled periods $[t+k, t+k+1]$, $k = 0, 1, \ldots, h-1$, are independently (identically) distributed. With such a benchmark in mind, it is natural to view the expectation and the variance of the continuously compounded rate of return $\log(S_{t+h}/S_t)$ as proportional to the maturity $h$ of the investment. Obviously we no longer use Brownian Motion as a process for asset prices, but it is nevertheless worth noting that Assumptions 2.1.1.A and B also imply that the expected rate of return and the associated squared risk (in terms of variance of the rate of return) of an investment over an infinitesimally short interval $[t, t+h]$ are proportional to $h$. Sims (1984) provided some rationale for both assumptions through the concept of "local unpredictability".

To conclude, let us briefly discuss a particular special case of (2.1.5) predominantly used in theoretical developments and also highlight an implicit restriction we made. When $\mu_S(I_t) = \mu_S$ and $\sigma_S(I_t) = \sigma_S$ are constants for all $t$, the asset price is a Geometric Brownian Motion. This process was used by Black and Scholes (1973) to derive their well-known pricing formula for European options. Obviously, since $\sigma_S(I_t)$ is then constant, we no longer have an instantaneous volatility process but rather a single parameter $\sigma_S$, a situation which undoubtedly greatly simplifies many things, including the pricing of options. A second point which needs to be stressed is that Assumptions 2.1.1.A and B allow for the possibility of discrete jumps in the asset price process. Such jumps are typically represented by a Poisson process and have been prominent in the option pricing literature since the work of Merton (1976). Yet, while the assumptions allow in principle for jumps, they do not appear in (2.1.5). Indeed, throughout this chapter we will maintain the assumption of sample path continuity and exclude the possibility of jumps, as we focus exclusively on SV models.

2.1.2. Option prices and implied volatilities

It was noted in the introduction that SV models originated in part from the literature on the pricing of options. We have witnessed over the past two decades a spectacular growth in options and other derivative security markets. Such markets are sometimes characterized as places where "volatilities are traded". In this section we will provide the rationale for such statements and study the relationship between so-called option implied volatilities and the concepts of instantaneous and averaged volatilities of the underlying asset return process.

The Black-Scholes option pricing model is based on a Log-Normal or Geometric Brownian Motion model for the underlying asset price:

$$dS_t = \mu_S\,S_t\,dt + \sigma_S\,S_t\,dW_t \qquad (2.1.7)$$

where $\mu_S$ and $\sigma_S$ are fixed parameters. A European call option with strike price $K$ and maturity $t+h$ has a payoff:

$$(S_{t+h} - K)^+ = \max(0,\ S_{t+h} - K) . \qquad (2.1.8)$$
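The objects just introduced can be made concrete with a short Python sketch, offered as an illustration under stated assumptions rather than as part of the original derivation. It draws $S_{t+h}$ exactly from the Geometric Brownian Motion (2.1.7), evaluates the payoff (2.1.8) and, as a cross-check, compares the discounted average payoff with the closed-form Black-Scholes price (2.1.13) written in the moneyness form (2.1.14)-(2.1.15) derived below. It assumes a constant riskless rate $r$, so that $B(t,t+h) = e^{-rh}$, and sets $\mu_S = r$, which reproduces the risk neutral dynamics (2.1.9); all function names and parameter values are hypothetical.

```python
import math
import random


def simulate_gbm_terminal(s0, mu, sigma, h, n_paths, seed=0):
    """Exact draws of S_{t+h} given S_t = s0 under the GBM (2.1.7):
    S_{t+h} = s0 * exp((mu - sigma**2 / 2) * h + sigma * sqrt(h) * Z), Z ~ N(0, 1)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * h
    scale = sigma * math.sqrt(h)
    return [s0 * math.exp(drift + scale * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]


def call_payoff(s_terminal, strike):
    """European call payoff (2.1.8): (S_{t+h} - K)^+."""
    return max(0.0, s_terminal - strike)


def norm_cdf(z):
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bs_call(s, strike, r, sigma, h):
    """Black-Scholes call price in the moneyness form (2.1.14)-(2.1.15),
    assuming a constant riskless rate r so that B(t, t + h) = exp(-r * h)."""
    x = math.log(s / (strike * math.exp(-r * h)))  # moneyness x_t
    vol = sigma * math.sqrt(h)                     # sigma_S * sqrt(h)
    d = x / vol + vol / 2.0                        # d_t
    return s * (norm_cdf(d) - math.exp(-x) * norm_cdf(d - vol))


# Setting mu = r gives the risk neutral dynamics (2.1.9); the discounted
# average payoff is then a Monte Carlo estimate of (2.1.10).
s0, strike, r, sigma, h = 100.0, 100.0, 0.05, 0.2, 1.0
paths = simulate_gbm_terminal(s0, r, sigma, h, n_paths=100_000, seed=42)
mc_price = math.exp(-r * h) * sum(call_payoff(s, strike) for s in paths) / len(paths)
print(round(bs_call(s0, strike, r, sigma, h), 4))  # 10.4506
print(mc_price)  # close to the closed form, up to Monte Carlo error
```

The Monte Carlo estimate differs from the closed form only by sampling error of order $1/\sqrt{n}$. With $\mu_S \neq r$ the same discounted average would not, in general, equal the arbitrage price, which is why the expectation in (2.1.10) is taken under $Q$.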

Since the seminal Black and Scholes (1973) paper, there is now a well established literature proposing various ways to derive the pricing formula of such a contract. Obviously, it is beyond the scope of this paper to cover this literature in detail.⁴ Instead, only the bare minimum will be presented here, allowing us to discuss the concepts of interest regarding volatility. With continuous costless trading assumed to be feasible, it is possible to form in the Black-Scholes economy a portfolio using one call and a short-sale strategy for the underlying stock to eliminate all risk. This is why the option price can be characterized without ambiguity, using only arbitrage arguments, by equating the market rate of return of the riskless portfolio containing the call option with the risk-free rate. Moreover, such arbitrage-based option pricing does not depend on individual preferences.⁵ This is the reason why the easiest way to derive the Black-Scholes option pricing formula is via a "risk-neutral world", where asset price processes are specified through a modified probability measure, referred to as the risk neutral probability measure and denoted $Q$ (as discussed more explicitly in Section 4.2). This fictitious world, where probabilities in general do not coincide with the Data Generating Process (DGP), is only used to derive the option price, which remains valid in the objective probability setup. In the risk neutral world we have:

$$dS_t/S_t = r_t\,dt + \sigma_S\,dW_t \qquad (2.1.9)$$

$$C_t = C(S_t, K, h, t) = B(t, t+h)\,E_t^Q (S_{t+h} - K)^+ \qquad (2.1.10)$$

where $E_t^Q$ is the expectation under $Q$, $B(t, t+h)$ is the price at time $t$ of a pure discount bond with payoff one unit at time $t+h$, and

$$r_t = -\lim_{h \downarrow 0} \frac{1}{h}\,\log B(t, t+h) \qquad (2.1.11)$$

is the riskless instantaneous interest rate.⁶ We have implicitly assumed that in this market interest rates are nonstochastic ($W_t$ is the only source of risk), so that:

$$B(t, t+h) = \exp\left[-\int_t^{t+h} r_\tau\,d\tau\right] . \qquad (2.1.12)$$

By definition, there are no risk premia in a risk neutral context. Therefore $r_t$ coincides with the instantaneous expected rate of return of the stock, and hence the call option price $C_t$ is the discounted value of its terminal payoff $(S_{t+h} - K)^+$, as stated in (2.1.10).

4 See however Jarrow and Rudd (1983), Cox and Rubinstein (1985), Duffie (1989), Duffie (1992), Hull (1993) or Hull (1995) among others for more elaborate coverage of options and other derivative securities.
5 This is sometimes referred to as preference free option pricing. This terminology may be somewhat misleading, since individual preferences are implicitly taken into account in the market prices of the stock and of the riskless bond. However, the option price only depends on individual preferences through the stock and bond market prices.
6 For notational convenience we denote by the same symbol $W_t$ a Brownian Motion under $P$ (in (2.1.7)) and under $Q$ (in (2.1.9)). Indeed, Girsanov's theorem establishes the link between these two processes (see e.g. Duffie (1992) and Section 4.2.1).

The log-normality of $S_{t+h}$ given $S_t$ allows one to compute the expectation in (2.1.10), yielding the call price formula at time $t$:

$$C_t = S_t\,\Phi(d_t) - K\,B(t, t+h)\,\Phi(d_t - \sigma_S\sqrt{h}) \qquad (2.1.13)$$

where $\Phi$ is the cumulative standard normal distribution function, while $d_t$ will be defined shortly. Formula (2.1.13) is the so-called Black-Scholes option pricing formula. Thus, the option price $C_t$ depends on the stock price $S_t$, the strike price $K$ and the discount factor $B(t, t+h)$. Let us now define:

$$x_t = \log \frac{S_t}{K\,B(t, t+h)} . \qquad (2.1.14)$$

Then we have:

$$C_t/S_t = \Phi(d_t) - e^{-x_t}\,\Phi(d_t - \sigma_S\sqrt{h}) \qquad (2.1.15)$$

with $d_t = x_t/(\sigma_S\sqrt{h}) + \sigma_S\sqrt{h}/2$. It is easy to see the critical role played by the quantity $x_t$, called the moneyness of the option.

- If $x_t = 0$, the current stock price $S_t$ coincides with the present value of the strike price $K$. In other words, the contract may appear to be fair to somebody who would not take into account the stochastic changes of the stock price between $t$ and $t+h$. We shall say that we have in this case an at the money option.
- If $x_t > 0$ (respectively $x_t < 0$) we shall say that the option is in the money (respectively out of the money).⁷

It was noted before that the Black-Scholes formula is widely used among practitioners, even when its assumptions are known to be violated. In particular, the assumption of a constant volatility $\sigma_S$ is unrealistic (see Section 2.2 for empirical evidence). This motivated Hull and White (1987) to introduce an option pricing model with stochastic volatility, assuming that the volatility itself is a state variable independent of $W_t$:⁸

$$dS_t/S_t = r_t\,dt + \sigma_{St}\,dW_t, \qquad (\sigma_{St})_{t\in[0,T]} \text{ and } (W_t)_{t\in[0,T]} \text{ independent Markovian} \qquad (2.1.16)$$

It should be noted that (2.1.16) is still written in a risk neutral context, since $r_t$ coincides with the instantaneous expected return of the stock.
On the other hand, the exogenous volatility risk is not directly traded, which prevents us from defining unambiguously a risk neutral probability measure, as discussed in more detail in Section 4.2.

7 We use here a slightly modified terminology with respect to the usual one. Indeed, it is more common to call options at the money / in the money / out of the money when $S_t = K$ / $S_t > K$ / $S_t < K$, respectively. From an economic point of view, it is more appealing to compare $S_t$ with the present value of the strike price $K$.
8 Other stochastic volatility models similar to Hull and White (1987) appear in Johnson and Shanno (1987), Scott (1987), Wiggins (1987), Chesney and Scott (1989), Stein and Stein (1991) and Heston (1993) among others.

Nevertheless, the option pricing formula (2.1.10) remains valid provided the expectation is computed with respect to the joint probability distribution of the Markovian process $(S, \sigma_S)$, given $(S_t, \sigma_{St})$.⁹ We can then rewrite (2.1.10) as follows:

$$C_t = B(t,t+h)\,E_t(S_{t+h}-K)^+ = B(t,t+h)\,E_t\left\{E\left[(S_{t+h}-K)^+ \mid (\sigma_{S\tau})_{t\le\tau\le t+h}\right]\right\}$$

where the expectation inside the braces is taken with respect to the conditional probability distribution of $S_{t+h}$ given $I_t$ and a volatility path $\sigma_{S\tau}$, $t \le \tau \le t+h$. However, since the volatility process $\sigma_{S\tau}$ is independent of $W_t$, we obtain, using (2.1.15):

$$B(t,t+h)\,E\left[(S_{t+h}-K)^+ \mid (\sigma_{S\tau})_{t\le\tau\le t+h}\right] = S_t\left[\Phi(d_{1t}) - e^{-x_t}\,\Phi(d_{2t})\right] .$$

Here $d_{1t}$ and $d_{2t}$ are defined as follows:

$$d_{1t} = \frac{x_t}{\gamma(t,t+h)\sqrt{h}} + \frac{\gamma(t,t+h)\sqrt{h}}{2}, \qquad d_{2t} = d_{1t} - \gamma(t,t+h)\sqrt{h}$$

where $\gamma(t,t+h) > 0$ and:

$$\gamma^2(t,t+h) = \frac{1}{h}\int_t^{t+h} \sigma_{S\tau}^2\,d\tau . \qquad (2.1.19)$$

This yields the so-called Hull and White option pricing formula:

$$C_t = S_t\,E_t\left[\Phi(d_{1t}) - e^{-x_t}\,\Phi(d_{2t})\right] , \qquad (2.1.20)$$

where the expectation is taken with respect to the conditional probability distribution (for the risk neutral probability measure) of $\gamma(t,t+h)$ given $\sigma_{St}$.¹⁰ In the remainder of this section we will assume that observed option prices obey Hull and White's formula (2.1.20). Then option prices would yield two types of implied volatility concepts: (1) an instantaneous implied volatility, and (2) an averaged implied volatility. To make this more precise, let us assume that the risk neutral probability distribution belongs to a parametric family, $P_\theta$, $\theta \in \Theta$. Then, the Hull and White option pricing formula yields an expression for the option price as a function:

$$C_t = S_t\,F[\sigma_{St}, x_t, \theta_0] \qquad (2.1.21)$$

where $\theta_0$ is the true unknown value of the parameters.

9 We implicitly assume here that the available information $I_t$ contains the past values $(S_\tau, \sigma_\tau)_{\tau \le t}$. This assumption will be discussed in Section 4.2.
10 The conditioning is with respect to $\sigma_{St}$ since it summarizes the relevant information taken from $I_t$ (the process $\sigma$ is assumed to be Markovian and independent of $W$).

Formula (2.1.21) reveals why it is often claimed that "option markets can be thought of as markets trading volatility" (see e.g. Stein (1989)). As a matter of fact, if for any given $(x_t, \theta)$ the function $F(\cdot, x_t, \theta)$ is one-to-one, then equation (2.1.21) can be inverted to yield an implied instantaneous volatility:¹¹

$$\sigma_{St}^{\mathrm{imp}}(\theta) = G[S_t, C_t, x_t, \theta] . \qquad (2.1.22)$$

Bajeux and Rochet (1992), by showing that this one-to-one relationship between option prices and instantaneous volatility holds, in fact formalize the use of option markets as an appropriate instrument to hedge volatility risk. Obviously, implied instantaneous volatilities (2.1.22) could only be useful in practice for pricing or hedging derivative instruments when we know the true unknown value $\theta_0$ or, at least, are able to compute a sufficiently accurate estimate of it. However, the difficulties involved in estimating SV models have long prevented their widespread use in empirical applications. This is the reason why practitioners often prefer another concept of implied volatility, namely the so-called Black-Scholes implied volatility introduced by Latane and Rendleman (1976). It is a process $\omega^{\mathrm{imp}}(t,t+h)$ defined by:

$$C_t = S_t\left[\Phi(d_{1t}) - e^{-x_t}\,\Phi(d_{2t})\right], \qquad d_{1t} = \frac{x_t}{\omega^{\mathrm{imp}}(t,t+h)\sqrt{h}} + \frac{\omega^{\mathrm{imp}}(t,t+h)\sqrt{h}}{2}, \qquad d_{2t} = d_{1t} - \omega^{\mathrm{imp}}(t,t+h)\sqrt{h} \qquad (2.1.23)$$

where $C_t$ is the observed option price.¹² The Hull and White option pricing model can indeed be seen as a theoretical foundation for this practice: the comparison between (2.1.23) and (2.1.20) allows us to interpret the Black-Scholes implied volatility $\omega^{\mathrm{imp}}(t,t+h)$ as an implied averaged volatility, since $\omega^{\mathrm{imp}}(t,t+h)$ is something like a conditional expectation of $\gamma(t,t+h)$ (assuming observed option prices coincide with the Hull and White pricing formula). To be more precise, let us consider the simplest case of at the money options (the general case will be studied in Section 4.2).
Since $x_t = 0$, it follows that $d_{2t} = -d_{1t}$ and therefore $\Phi(d_{1t}) - e^{-x_t}\,\Phi(d_{2t}) = 2\Phi(d_{1t}) - 1$. Hence, $\omega_0^{\mathrm{imp}}(t,t+h)$ (the index 0 is added to make explicit that we consider at the money options) is defined by:

$$\Phi\left(\frac{\omega_0^{\mathrm{imp}}(t,t+h)\sqrt{h}}{2}\right) = E_t\,\Phi\left(\frac{\gamma(t,t+h)\sqrt{h}}{2}\right) . \qquad (2.1.24)$$

Since the cumulative standard normal distribution function is roughly linear in the neighborhood of zero, it follows that (for small maturities $h$):

$$\omega_0^{\mathrm{imp}}(t,t+h) \approx E_t\,\gamma(t,t+h) .$$

This yields an interpretation of the Black-Scholes implied volatility $\omega_0^{\mathrm{imp}}(t,t+h)$ as an implied average volatility:

$$\omega_0^{\mathrm{imp}}(t,t+h) \approx E_t\left[\frac{1}{h}\int_t^{t+h} \sigma_{S\tau}^2\,d\tau\right]^{1/2} . \qquad (2.1.25)$$

11 The fact that $F(\cdot, x_t, \theta)$ is one-to-one is shown to be the case for any diffusion model on $\sigma_{St}$ under certain regularity conditions, see Bajeux and Rochet (1992).
12 We do not explicitly study here the dependence between $\omega^{\mathrm{imp}}(t,t+h)$ and the various related processes $C_t$, $S_t$, $x_t$. This is the reason why, for the sake of simplicity, this dependence is not apparent in the notation $\omega^{\mathrm{imp}}(t,t+h)$.

2.2. Some stylized facts

The search for model specification and selection is always guided by empirical stylized facts. A model's ability to reproduce such stylized facts is a desirable feature, and failure to do so is most often a criterion to dismiss a specification, although one typically does not try to fit or explain all possible empirical regularities at once with a single model. Stylized facts about volatility have been well documented in the ARCH literature, see for instance Bollerslev, Engle and Nelson (1994). Empirical regularities regarding derivative securities and implied volatilities are also well covered, for instance, by Bates (1995a). In this section we will summarize empirical stylized facts, complementing and updating some of the material covered in the aforementioned references.

(a) Thick tails

Since the early sixties it has been observed, notably by Mandelbrot (1963) and Fama (1963, 1965), among others, that asset returns have leptokurtic distributions. As a result, numerous papers have proposed to model asset returns as i.i.d. draws from fat-tailed distributions such as Paretian or Lévy.

(b) Volatility clustering

Any casual observation of financial time series reveals bunching of high and low volatility episodes. In fact, volatility clustering and thick tails of asset returns are intimately related. Indeed, i.i.d. fat-tailed draws offer only a static explanation of thick tails, whereas a key insight provided by ARCH models is a formal link between dynamic (conditional) volatility behavior and (unconditional) heavy tails. ARCH models, introduced by Engle (1982), and the numerous extensions thereafter, as well as SV models, are essentially built to mimic volatility clustering.
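The link between clustering and thick tails can be illustrated with a minimal Python sketch. It assumes a hypothetical log-normal SV recursion, of the kind treated in Section 3, with illustrative parameter values (phi, sigma_eta and the seed are not estimates from any data set). In this model returns are serially uncorrelated, yet $|r_t|$ is positively autocorrelated (clustering), and the unconditional kurtosis equals $3\exp(\sigma_\eta^2/(1-\phi^2))$, about 4.8 for the defaults, above the Gaussian value 3. The same autocorrelation helper applies to the transformations $|r_t|^c$ discussed under (e).

```python
import math
import random


def simulate_log_normal_sv(n, phi=0.9, sigma_eta=0.3, seed=1):
    """Hypothetical discrete-time SV process used only for illustration:
        h_t = phi * h_{t-1} + sigma_eta * eta_t,   r_t = exp(h_t / 2) * eps_t,
    with eta_t and eps_t independent standard normal draws."""
    rng = random.Random(seed)
    # Start the log-variance h from its stationary law N(0, sigma_eta^2 / (1 - phi^2)).
    h = rng.gauss(0.0, sigma_eta / math.sqrt(1.0 - phi ** 2))
    returns = []
    for _ in range(n):
        h = phi * h + sigma_eta * rng.gauss(0.0, 1.0)
        returns.append(math.exp(h / 2.0) * rng.gauss(0.0, 1.0))
    return returns


def kurtosis(x):
    """Sample kurtosis m4 / m2**2; it equals 3 in population for a Gaussian series."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2


def autocorr(x, lag):
    """Sample autocorrelation of the series x at a given positive lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return num / sum(d * d for d in dev)


rets = simulate_log_normal_sv(50_000)
print(kurtosis(rets))       # well above 3: unconditional thick tails
print(autocorr(rets, 1))    # near 0: returns themselves are serially uncorrelated
for c in (0.5, 1.0, 2.0):
    # Clustering shows up in |r_t|^c, the transformation studied by Ding, Granger and Engle (1993).
    print(c, autocorr([abs(v) ** c for v in rets], 1))
```

Heavy tails here come entirely from mixing Gaussian draws over a persistent, randomly moving variance, which is the dynamic mechanism referred to above, as opposed to assuming i.i.d. fat-tailed innovations.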
It is also widely documented that ARCH effects disappear with temporal aggregation, see e.g. Diebold (1988) and Drost and Nijman (1993).

(c) Leverage effects

The so-called leverage effect, a term coined by Black (1976), suggests that stock price movements are negatively correlated with volatility. Because falling stock prices imply an increased leverage of firms, it is believed that this entails more uncertainty and hence volatility. Empirical evidence reported by Black (1976), Christie (1982) and Schwert (1989) suggests, however, that leverage alone is too small to explain the empirical asymmetries one observes in stock prices. Others reporting empirical evidence regarding leverage effects include Nelson (1991), Gallant, Rossi and Tauchen (1992, 1993), Campbell and Kyle (1993) and Engle and Ng (1993).

(d) Information arrivals

Asset returns are typically measured and modeled with observations sampled at fixed frequencies such as daily, weekly or monthly observations. Several authors, including Mandelbrot and Taylor (1967) and Clark (1973), suggested linking asset returns explicitly to the flow of information arrival. In fact, it was already noted that Clark proposed one of the early examples of SV models. Information arrival is non-uniform through time and quite often not directly observable. Conceptually, one can think of asset price movements as the realization of a process $Y_t = Y^*_{Z_t}$, where $Z_t$ is a so-called directing process. This positive nondecreasing stochastic process $Z_t$ can be thought of as being related to the arrival of information. This idea of time deformation or subordinated stochastic processes was used by Mandelbrot and Taylor (1967) to explain fat tailed returns and by Clark (1973) to explain volatility, and was recently refined and further explored by Ghysels, Gourieroux and Jasiak (1995a). Moreover, Easley and O'Hara (1992) provide a microstructure model involving time deformation. In practice, it suggests a direct link between market volatility and (1) trading volume, (2) quote arrivals, (3) forecastable events such as dividend announcements or macroeconomic data releases, and (4) market closures, among many other phenomena linked to information arrival.
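The subordination idea $Y_t = Y^*_{Z_t}$ can be sketched in a few lines of Python. The sketch assumes, hypothetically and for illustration only, that $Y^*$ is a standard Brownian motion in information time and that the directing process $Z_t$ has i.i.d. log-normal increments. Conditionally on the clock, each calendar-time return is Gaussian; unconditionally it is a mixture of normals with kurtosis $3\exp(\text{clock\_sigma}^2)$, roughly 5.3 for the default, so a random clock alone produces fat tails in the spirit of Mandelbrot and Taylor (1967) and Clark (1973). Because the clock increments are independent here, the sketch generates fat tails but no volatility clustering; serially dependent information flow would be needed for that.

```python
import math
import random


def subordinated_returns(n, clock_sigma=0.75, seed=7):
    """Calendar-time returns from Y_t = Y*_{Z_t}: Y* is a standard Brownian motion
    in 'information time', and the directing process Z_t is assumed (hypothetically)
    to have i.i.d. log-normal increments dZ = exp(clock_sigma * N(0, 1))."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        dz = math.exp(clock_sigma * rng.gauss(0.0, 1.0))  # information arriving this period
        out.append(math.sqrt(dz) * rng.gauss(0.0, 1.0))   # Brownian increment over dz: N(0, dz)
    return out


def kurtosis(x):
    """Sample kurtosis m4 / m2**2 (3 in population for a Gaussian series)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2


# A uniform clock (clock_sigma = 0) gives Gaussian returns; a random clock gives fat tails.
print(kurtosis(subordinated_returns(50_000, clock_sigma=0.0)))  # close to 3
print(kurtosis(subordinated_returns(50_000)))                   # close to 3 * exp(0.75**2), about 5.3
```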
Regarding trading volume and volatility, there are several papers documenting stylized facts, notably linking high trading volume with market volatility, see for example Karpoff (1987) or Gallant, Rossi and Tauchen (1992).¹³ The intraday patterns of volatility and market activity, measured for instance by quote arrivals, are also well known and documented. Wood, McInish and Ord (1985) and Harris (1986) studied this phenomenon for securities markets and found a U-shaped pattern, with volatility typically high at the open and close of the market. Around-the-clock trading in foreign exchange markets also yields a distinct volatility pattern which is tied to the intensity of market activity and produces strong seasonal patterns. The intraday patterns for FX markets are analyzed for instance by Muller et al. (1990), Baillie and Bollerslev (1991), Harvey and Huang (1991), Dacorogna et al. (1993), Bollerslev and Ghysels (1994), Andersen and Bollerslev (1995) and Ghysels, Gourieroux and Jasiak (1995b), among others. Another related empirical stylized fact is that of overnight and weekend market closures and their effect on volatility. Fama (1965) and French and Roll (1986) have found that information accumulates more slowly when the NYSE and AMEX are closed, resulting in higher volatility on those markets after weekends and holidays. Similar evidence for FX markets has been reported by Baillie and Bollerslev (1989). Finally, numerous papers have documented increased volatility of financial markets around dividend announcements (Cornell (1978), Patell and Wolfson (1979, 1981)) and macroeconomic data releases (Harvey and Huang (1991, 1992), Ederington and Lee (1993)).

13 There are numerous models, theoretical and empirical, linking trading volume and asset returns which we cannot discuss in detail. A partial list includes Foster and Viswanathan (1993a,b), Ghysels and Jasiak (1994a,b), Hausman and Lo (1991), Huffman (1987), Lamoureux and Lastrapes (1990, 1993), Wang (1993) and Andersen (1995).

(e) Long memory and persistence

Generally speaking, volatility is highly persistent. Particularly for high frequency data, one finds evidence of near unit root behavior of the conditional variance process. In the ARCH literature, numerous estimates of GARCH models for stock market, commodity, foreign exchange and other asset price series are consistent with an IGARCH specification. Likewise, estimation of stochastic volatility models shows similar patterns of persistence (see for instance Jacquier, Polson and Rossi (1994)). These findings have led to a debate regarding modeling persistence in the conditional variance process either via a unit root or a long memory process. The latter approach has been suggested both for ARCH and SV models, see Baillie, Bollerslev and Mikkelsen (1993), Breidt et al. (1993), Harvey (1993) and Comte and Renault (1995). Ding, Granger and Engle (1993) studied the serial correlations of $|r(t,t+1)|^c$ for positive values of $c$, where $r(t,t+1)$ is a one-period return on a speculative asset. They found $|r(t,t+1)|^c$ to have quite high autocorrelations for long lags, while the strongest temporal dependence was for $c$ close to one. This result, initially found for daily S&P500 return series, was also shown to hold for other stock market indices, commodity markets and foreign exchange series (see Granger and Ding (1994)).

(f) Volatility comovements

There is an extensive literature on international comovements of speculative markets.
Whether globalization of equity markets increases price volatility and correlations of stock returns has been the subject of many recent studies, including von Furstenberg and Jeon (1989), Hamao, Masulis and Ng (1990), King, Sentana and Wadhwani (1994), Harvey, Ruiz and Sentana (1992), and Lin, Engle and Ito (1994). Typically one uses factor models to model the commonality of international volatility, as in Diebold and Nerlove (1989), Harvey, Ruiz and Sentana (1992) and Harvey, Ruiz and Shephard (1994), or explores so-called common features, see e.g. Engle and Kozicki (1993), and common trends, as studied by Bollerslev and Engle (1993).

(g) Implied volatility correlations

Stylized facts are typically reported as model-free empirical observations.¹⁴ Implied volatilities are obviously model-based, as they are calculated from a pricing equation of a specific model, namely the Black and Scholes model, as noted in Section 2.1.2. Since they are computed on a daily basis, there is an obvious internal inconsistency, as the model presumes constant volatility. Yet, since many option prices are in fact quoted through their implied volatilities, it is natural to study the time series behavior of the latter. Often one computes a composite measure, since synchronous option prices with different strike prices and maturities for the same underlying asset yield different implied volatilities. The composite measure is usually obtained from a weighting scheme putting more weight on the near-the-money options, which are the most heavily traded in organized markets.¹⁵

14 This is in some part fictitious even for macroeconomic data, for instance when they are detrended or seasonally adjusted. Both detrending and seasonal adjustment are model-based. For the potentially severe impact of detrending on stylized facts see Canova (1992) and Harvey and Jaeger (1993), and for the effect of seasonal adjustment on empirical regularities see Ghysels et al. (1993).

The time series properties of implied volatilities obtained from stock, stock index and currency options are quite similar. They appear stationary and are well described by a first order autoregressive model (see Merville and Pieptea (1989) and Sheikh (1993) for stock options, Poterba and Summers (1986), Stein (1989), Harvey and Whaley (1992) and Diz and Finucane (1993) for the S&P100 contract, and Taylor and Xu (1994), Campa and Chang (1995) and Jorion (1995) for currency options).

It was noted from equation (2.1.25) that implied (average) volatilities are expected to contain information regarding future volatility and therefore should predict the latter. One typically tests such hypotheses by regressing realized volatilities on past implied ones. The empirical evidence regarding the predictive content of implied volatilities is mixed. The time series study of Lamoureux and Lastrapes (1993) considered options on non-dividend-paying stocks and compared the forecasting performance of GARCH, implied volatility and historical volatility estimates; it found that implied volatility forecasts, although biased as one would expect from (2.1.25), outperform the others.
In sharp contrast, Canina and Figlewski (1993) studied S&P100 index call options, for which there is an extremely active market. They found that implied volatilities were virtually useless in forecasting future realized volatilities of the S&P100 index. In a different setting, using weekly sampling intervals for S&P100 option contracts and a different sample, Day and Lewis (1992) found not only that implied volatilities had predictive content but also that they were unbiased. Studies examining options on foreign currencies, such as Jorion (1995), also found that implied volatilities predicted future realizations and that GARCH as well as historical volatilities did not outperform the implied measures of volatility.

(h) The term structure of implied volatilities

The Black-Scholes model predicts a flat term structure of volatilities. In reality, the term structure of at-the-money implied volatilities is typically upward sloping when short term volatilities are low and the reverse when they are high (see Stein (1989)). Taylor and Xu (1994) found that the term structure of implied volatilities from foreign currency options reverses slope every few months. Stein (1989) also found that the actual sensitivity of medium to short term implied volatilities was greater than the estimated sensitivity from the forecast term structure, and concluded that medium term implied volatilities overreacted to information. Diz and Finucane (1993) used different estimation techniques and rejected the overreaction hypothesis, reporting instead evidence suggesting underreaction.

15 Different weighting schemes have been suggested, see for instance Latane and Rendleman (1976), Chiras and Manaster (1978), Beckers (1981), Whaley (1982), Day and Lewis (1988), Engle and Mustafa (1992) and Bates (1995b).

(i) Smiles

If option prices in the market were consistent with the Black-Scholes formula, all the Black-Scholes implied volatilities corresponding to various options written on the same asset would coincide with the volatility parameter $\sigma$ of the underlying asset. In reality this is not the case, and the Black-Scholes implied volatility $\omega^{\mathrm{imp}}(t,t+h)$ defined by (2.1.23) heavily depends on the calendar time $t$, the time to maturity $h$ and the moneyness $x_t = \log[S_t/KB(t,t+h)]$ of the option. This may produce various biases in option pricing or hedging when BS implied volatilities are used to evaluate new options with different strike prices $K$ and maturities $h$. These price distortions, well known to practitioners, are usually documented in the empirical literature under the terminology of the smile effect, where the so-called "smile" refers to the U-shaped pattern of implied volatilities across different strike prices. More precisely, the following stylized facts are extensively documented (see for instance Rubinstein (1985), Clewlow and Xu (1993), Taylor and Xu (1993)):

- The U-shaped pattern of $\omega^{\mathrm{imp}}(t,t+h)$ as a function of $K$ (or $\log K$) has its minimum centered at near-the-money options (discounted $K$ close to $S_t$, i.e. $x_t$ close to zero).
- The volatility smile is often but not always symmetric as a function of $\log K$ (or of $x_t$).
  When the smile is asymmetric, the skewness effect can often be described as the addition of a monotonic curve to the standard symmetric smile: if a decreasing curve is added, implied volatilities tend to rise more for decreasing than for increasing strike prices, and the implied volatility curve has its minimum out of the money. In the reverse case (addition of an increasing curve), implied volatilities tend to rise more with increasing strike prices and their minimum is in the money.
- The amplitude of the smile increases quickly when time to maturity decreases. Indeed, for short maturities the smile effect is very pronounced (BS implied volatilities for synchronous option prices may vary between 15% and 25%), while it almost completely disappears for longer maturities.

It is widely believed that volatility smiles have to be explained by a model of stochastic volatility. This is natural for several reasons. First, it is tempting to propose a model of stochastically time varying volatility to account for stochastically time varying BS implied volatilities. Moreover, the fact that the amplitude of the smile decreases with time to maturity is consistent with a formula like (2.1.25). Indeed, it shows that, when time to maturity is increased, temporal aggregation of volatilities erases conditional heteroskedasticity, which decreases the smile phenomenon. Finally, the skewness itself may also be attributed to the stochastic feature of the volatility process and above all to the correlation of this process with the price process (the so-called leverage effect). Indeed, this effect, while noticeable for stock price data, is small for interest rate and exchange rate series, which is why the skewness of the smile is more often observed for options written on stocks.

Nevertheless, it is important to be cautious about tempting associations: stochastic implied volatility and stochastic volatility; asymmetry in stocks and skewness in the smile. As will be discussed in Section 4, such analogies are not always rigorously proven. Moreover, other arguments to explain the smile and its skewness (jumps, transaction costs, bid-ask spreads, non-synchronous trading, liquidity problems, ...) also have to be taken into account, both for theoretical and for empirical reasons. For instance, there exists empirical evidence suggesting that the most expensive options (the upper parts of the smile curve) are also the least liquid; skewness may therefore be attributed to specific configurations of liquidity in option markets.

2.3. Information sets

So far we have left the specification of information sets vague. This was done on purpose, to focus on one issue at a time. In this section we need to be more formal regarding the definition of information, since it will allow us to clarify several missing links between the various SV models introduced in the literature and also between SV and ARCH models. We know that SV models emerged from research looking at a very diverse set of issues. In this section we will try to define a common thread and a general unifying framework.
We will accomplish this through a careful analysis of information sets, associating with it notions of non-causality in the Granger sense. These causality conditions will allow us to characterize, in Section 2.4, the distinct features of ARCH and SV models.[16]

[16] The analysis in this section has some features in common with Andersen (1992) regarding the use of information sets to clarify the difference between SV and ARCH type models.

2.3.1. State variables and information sets

The Hull and White (1987) model is a simple example of a derivative asset pricing model where the stock price dynamics are governed by some unobservable state variables, such as random volatility. More generally, it is convenient to assume that a multivariate diffusion process $U_t$ summarizes the relevant state variables in the sense that:

$$\frac{dS_t}{S_t} = \mu_t\,dt + \sigma_t\,dW_t, \qquad dU_t = \gamma_t\,dt + \delta_t\,dW_t^U, \qquad \operatorname{Cov}(dW_t, dW_t^U) = \rho_t\,dt \tag{2.3.1}$$

where the stochastic processes $\mu_t, \sigma_t, \gamma_t, \delta_t$ and $\rho_t$ are $I_t^U = \sigma[U_\tau;\ \tau \le t]$-adapted (Assumption 2.3.1). This means that the process $U$ summarizes the whole dynamics of the stock price process $S$ (which justifies the terminology "state" variable) since, for a given sample path $(U_\tau)_{0 \le \tau \le T}$ of state variables, consecutive returns $S_{t_{k+1}}/S_{t_k}$, $0 \le t_1 < t_2 < \cdots < t_k \le T$, are stochastically independent and log-normal (as in the benchmark BS model). The arguments of Section 2.1.2 can be extended to the state-variables framework discussed here (see Garcia and Renault (1995)). Indeed, such an extension provides a theoretical justification for the common use of the Black and Scholes model as a standard method of quoting option prices via their implied volatilities.[17] In fact, it is a way of introducing neglected heterogeneity into the BS option pricing model (see Renault (1995), who draws attention to the similarities with introducing heterogeneity in microeconometric models of labor markets, etc.). In continuous time models, the information available at time $t$ to traders (whose information determines option prices) is characterized by continuous-time observations of both the state variable sample path and the stock price sample path; namely:

$$I_t = \sigma[U_\tau, S_\tau;\ \tau \le t]. \tag{2.3.2}$$

2.3.2. Discrete sampling and Granger noncausality

In the next section we will treat discrete time models explicitly. This requires formulating discrete time analogues of equation (2.3.1). The discrete sampling and Granger noncausality conditions discussed here bring us a step closer to a formal framework for statistical modeling using discrete time data. Clearly, a discrete time analogue of equation (2.3.1) is:

$$\log(S_{t+1}/S_t) = \mu(U_t) + \sigma(U_t)\,\varepsilon_{t+1} \tag{2.3.3}$$

provided we impose some restrictions on the process $\varepsilon_t$. The restrictions we want to impose must be flexible enough to accommodate phenomena such as leverage effects.
A setup that does this is the following:

Assumption 2.3.2.A. The process $\varepsilon_t$ in (2.3.3) is i.i.d. and not Granger-caused by the state variable process $U_t$.

Assumption 2.3.2.B. The process $\varepsilon_t$ in (2.3.3) does not Granger-cause $U_t$.

Assumption 2.3.2.B is useful for the practical use of BS implied volatilities, as it is the discrete time analogue of Assumption 2.3.1, which states that the coefficients of the process $U$ are $I_t^U$-adapted (for further details see Garcia and Renault (1995)).

[17] Garcia and Renault (1995) argued that Assumption 2.3.1 is essential to ensure the homogeneity of option prices with respect to the pair (stock price, strike price), which in turn ensures that BS implied volatilities do not depend on the stock price level but only on the moneyness $S/K$. This homogeneity property was first emphasized by Merton (1973).

Assumption 2.3.2.A is important for the statistical interpretation of the functions $\mu(U_t)$ and $\sigma(U_t)$ as trend and volatility coefficients respectively, namely,

$$\begin{aligned} E\big[\log(S_{t+1}/S_t) \mid (S_\tau/S_{\tau-1};\ \tau \le t)\big] &= E\Big[E\big[\log(S_{t+1}/S_t) \mid (U_\tau, \varepsilon_\tau;\ \tau \le t)\big] \,\Big|\, (S_\tau/S_{\tau-1};\ \tau \le t)\Big] \\ &= E\big[\mu(U_t) \mid (S_\tau/S_{\tau-1};\ \tau \le t)\big] \end{aligned} \tag{2.3.4}$$

since $E[\varepsilon_{t+1} \mid (U_\tau, \varepsilon_\tau;\ \tau \le t)] = E[\varepsilon_{t+1} \mid \varepsilon_\tau;\ \tau \le t] = 0$, due to the Granger non-causality from $U_t$ to $\varepsilon_t$ in Assumption 2.3.2.A. Likewise, one can easily show that

$$\operatorname{Var}\big[\log(S_{t+1}/S_t) - \mu(U_t) \mid (S_\tau/S_{\tau-1};\ \tau \le t)\big] = E\big[\sigma^2(U_t) \mid (S_\tau/S_{\tau-1};\ \tau \le t)\big]. \tag{2.3.5}$$

Implicitly we have introduced in (2.3.4) and (2.3.5) a new information set which, besides $I_t$ defined in (2.3.2), will be useful for further analysis. Indeed, one often confines (statistical) analysis to the information conveyed by a discrete time sampling of stock return series, which will be denoted by the information set

$$I_t^R = \sigma\big[S_\tau/S_{\tau-1}:\ \tau = 0, 1, \ldots, t-1, t\big] \tag{2.3.6}$$

where the superscript $R$ stands for returns. By extending Andersen (1994), we shall adopt as the most general framework for univariate volatility modelling the setup given by Assumptions 2.3.2.A, 2.3.2.B and:

Assumption 2.3.2.C. $\mu(U_t)$ is $I_t^R$-measurable.

Therefore in (2.3.4) and (2.3.5) we have essentially shown that:

$$E\big[\log(S_{t+1}/S_t) \mid I_t^R\big] = \mu(U_t) \tag{2.3.7}$$
$$\operatorname{Var}\big[\log(S_{t+1}/S_t) \mid I_t^R\big] = E\big[\sigma^2(U_t) \mid I_t^R\big]. \tag{2.3.8}$$

2.4. Statistical modelling of stochastic volatility

Financial time series are observed at discrete time intervals, while a majority of theoretical models are formulated in continuous time. Generally speaking, there are two statistical methodologies to resolve this tension. Either one considers, for the purpose of estimation, discrete time statistical models of the continuous time processes, or, alternatively, the statistical model may be specified in continuous time and inference done via a discrete time approximation. In this section we discuss the former approach in detail, while the latter will be introduced in Section 4.
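As an informal illustration of (2.3.7)-(2.3.8), the Python sketch below simulates the discrete-time model (2.3.3) and compares sample moments with their unconditional counterparts, $E[\mu(U_t)]$ and, when $\mu$ is constant, $E[\sigma^2(U_t)]$, which follow from (2.3.7)-(2.3.8) by iterated expectations. The two-state Markov chain for $U_t$, the parameter values and the function names are hypothetical choices made only for illustration; $\varepsilon_t$ is drawn independently of $U$, so Assumptions 2.3.2.A-C hold trivially.

```python
import random
import statistics

def simulate_state_model(n, mu, sigma_by_state, p_stay, seed=0):
    """Simulate log-returns from (2.3.3): mu(U_t) + sigma(U_t) * eps_{t+1}.

    Hypothetical setup: U_t is a symmetric two-state Markov chain that keeps
    its current state with probability p_stay, mu(U_t) = mu is constant, and
    eps is i.i.d. N(0, 1), drawn independently of U.
    """
    rng = random.Random(seed)
    u = rng.randrange(2)                  # uniform start = stationary law of this chain
    returns = []
    for _ in range(n):
        returns.append(mu + sigma_by_state[u] * rng.gauss(0.0, 1.0))
        if rng.random() > p_stay:         # regime switch before the next period
            u = 1 - u
    return returns

def sample_variance(x):
    """Sample variance dividing by n (kept simple and fast for long series)."""
    m = statistics.fmean(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at a positive lag."""
    m = statistics.fmean(x)
    num = sum((x[i] - m) * (x[i - lag] - m) for i in range(lag, len(x)))
    den = sum((v - m) ** 2 for v in x)
    return num / den
```

For example, with `simulate_state_model(200_000, 5e-4, (0.01, 0.03), 0.9)` the sample mean should be close to 5e-4, the sample variance close to $E[\sigma^2(U_t)] = 5 \times 10^{-4}$, the lag-one autocorrelation of returns close to zero, and that of squared returns clearly positive (volatility clustering driven by the latent state).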
The class of discrete time statistical models discussed here is general. In Section 2.4.1 we introduce some notation and terminology. The next section discusses the so-called stochastic autoregressive volatility model introduced by Andersen (1994) as a rather general and flexible semi-parametric framework encompassing various representations of stochastic volatility already available in the literature. Identification of parameters and the restrictions required for it are discussed in Section 2.4.3.

2.4.1. Notation and terminology

In Section 2.3 we left unspecified the functional forms taken by the trend $\mu(\cdot)$ and the volatility $\sigma(\cdot)$. Indeed, in some sense we built the nonparametric framework recently proposed by Lezan, Renault and de Vitry (1995), which they introduced to discuss a notion of stochastic volatility of unknown form.[18] This nonparametric framework encompasses standard parametric models (see Section 2.4.2 for a more formal discussion). For the purpose of illustration, let us consider two extreme cases, assuming for simplicity that $\mu(U_t) = 0$: (i) the discrete time analogue of the Hull and White model (2.1.16) is obtained when $\sigma(U_t) = \sigma_t$ is a stochastic process independent of the standardized innovation process $\varepsilon$ of stock returns, and (ii) $\sigma_t$ may be a deterministic function $h(\varepsilon_\tau,\ \tau \le t)$ of past innovations. The latter is the complete opposite of (i) and leads to a large variety of choices of parameterized functions for $h$, yielding X-ARCH models (GARCH, EGARCH, QTARCH, Periodic GARCH, etc.). Besides these two polar cases, where Assumption 2.3.2.A is fulfilled in a trivial degenerate way, one can also accommodate leverage effects.[19] In particular, the contemporaneous correlation between innovations in $U$ and the return process can be nonzero, since the Granger non-causality assumptions deal with temporal causal links rather than contemporaneous ones. For instance, we may have $\sigma(U_t) = \sigma_t$ with:

$$\log(S_{t+1}/S_t) = \sigma_t\,\varepsilon_{t+1} \tag{2.4.1}$$
$$\operatorname{Cov}(\sigma_{t+1}, \varepsilon_{t+1} \mid I_t^R) \neq 0. \tag{2.4.2}$$

A negative covariance in (2.4.2) is a standard case of leverage effect, and it does not violate the non-causality Assumptions 2.3.2.A and B.
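The leverage mechanism in (2.4.1)-(2.4.2) can be sketched numerically. In the hypothetical specification below, $\log\sigma_{t+1} = \phi\log\sigma_t + s\,\eta_{t+1}$ with $\eta_{t+1} = \rho\,\varepsilon_{t+1} + \sqrt{1-\rho^2}\,\zeta_{t+1}$ ($\zeta$ an independent standard normal), so $\operatorname{Cov}(\sigma_{t+1}, \varepsilon_{t+1}) < 0$ when $\rho < 0$, while $\varepsilon_{t+1}$ stays independent of everything dated $t$ or earlier. Returns should therefore remain serially uncorrelated with mean zero, yet be negatively correlated with next-period log volatility. The functional form, parameter values and function names are assumptions for illustration only.

```python
import math
import random
import statistics

def simulate_lagged_leverage(n, phi, s, rho, seed=1):
    """Simulate (2.4.1)-(2.4.2): y_{t+1} = sigma_t * eps_{t+1}.

    Hypothetical volatility law: log sigma_{t+1} = phi * log sigma_t + s * eta_{t+1},
    eta_{t+1} = rho * eps_{t+1} + sqrt(1 - rho^2) * zeta_{t+1}, zeta ~ N(0, 1).
    Returns (y, log_sigma_next), where log_sigma_next[i] is log sigma_{t+1}
    drawn together with y[i] = y_{t+1}.
    """
    rng = random.Random(seed)
    log_sig = rng.gauss(0.0, s / math.sqrt(1.0 - phi ** 2))   # stationary start
    y, log_sigma_next = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        y.append(math.exp(log_sig) * eps)            # sigma_t is known at time t
        eta = rho * eps + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        log_sig = phi * log_sig + s * eta            # sigma_{t+1} reacts to eps_{t+1}
        log_sigma_next.append(log_sig)
    return y, log_sigma_next

def corr(x, z):
    """Sample correlation of two equally long sequences."""
    mx, mz = statistics.fmean(x), statistics.fmean(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / math.sqrt(sxx * szz)
```

With, say, $\phi = 0.95$, $s = 0.1$ and $\rho = -0.6$, `corr(y[1:], y[:-1])` should be near zero (martingale difference), while `corr(y, log_sigma_next)` should be clearly negative, which is the leverage effect in the lagged form.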
A few concluding observations are worth making to deal with the burgeoning variety of terminology in the literature. First, we have not considered the distinction due to Taylor (1994) between "lagged autoregressive random variance models", given by (2.4.1), and "contemporaneous autoregressive random variance models", defined by:

$$\log(S_{t+1}/S_t) = \sigma_{t+1}\,\varepsilon_{t+1}. \tag{2.4.3}$$

[18] Lezan, Renault and de Vitry (1995) discuss in detail how to recover phenomena such as volatility clustering in this framework. As a nonparametric framework it also has certain advantages regarding (robust) estimation. For instance, they develop methods that can be useful as a first estimation step for efficient algorithms assuming a specific parametric model (see Section 5).

[19] Assumption 2.3.2.B is fulfilled in case (i) but may fail in the GARCH case (ii). When it fails to hold in the latter case, the GARCH framework is not very well suited for option pricing.

Indeed, since the volatility process $\sigma_t$ is unobservable, the settings (2.4.1) and (2.4.3) are observationally equivalent as long as they are not completed by precise (non-)causality assumptions. For instance: (i) (2.4.1) together with Assumption 2.3.2.A appears to be a correct and very general definition of an SV model, possibly completed by Assumption 2.3.2.B for option pricing and by (2.4.2) to introduce leverage effects; (ii) (2.4.3) combined with (2.4.2) would not be a correct definition of an SV model since, in this case, in general $E[\log(S_{t+1}/S_t) \mid I_t^R] \neq 0$, and the model would introduce, via the process $\sigma$, a forecast related not only to volatility but also to the expected return. For notational simplicity, the framework (2.4.3) will be used in Section 3, with the leverage effect captured by $\operatorname{Cov}(\sigma_{t+1}, \varepsilon_t) \neq 0$ instead of $\operatorname{Cov}(\sigma_{t+1}, \varepsilon_{t+1}) \neq 0$. Another terminology was introduced by Amin and Ng (1993) for option pricing. Their distinction between "predictable" and "unpredictable" volatility is very close to the leverage effect concept and can also be analyzed through causality concepts, as discussed in Garcia and Renault (1995). Finally, it will not be necessary to distinguish between weak, semi-strong and strong definitions of SV models by analogy with their ARCH counterparts (see Drost and Nijman (1993)). Indeed, the class of SV models as defined here can accommodate parameterizations which are closed under temporal aggregation (see also Section 4.1 on the subject of temporal aggregation).

2.4.2. Stochastic autoregressive volatility

For simplicity, let us consider the following univariate volatility process:

$$y_{t+1} = \mu_t + \sigma_t\,\varepsilon_{t+1} \tag{2.4.4}$$

where $\mu_t$ is a measurable function of observables $y_\tau \in I_\tau^R$, $\tau \le t$. While our discussion will revolve around (2.4.4), we will discuss several issues which are general and not confined to that specific model; extensions will be covered more explicitly in Section 3.5.
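Point (ii) of the discussion of (2.4.3) can be illustrated with a hypothetical log-normal volatility law in the contemporaneous form: $y_{t+1} = \sigma_{t+1}\varepsilon_{t+1}$ with $\log\sigma_{t+1} = \phi\log\sigma_t + s\,\eta_{t+1}$ and $\operatorname{Corr}(\eta_{t+1}, \varepsilon_{t+1}) = \rho$. Under these Gaussian assumptions a short calculation gives $E(y_{t+1}) = \rho\,s\,E(\sigma_t) = \rho\,s\exp\{s^2/[2(1-\phi^2)]\}$, which is nonzero whenever $\rho \neq 0$, so $\sigma_{t+1}$ then carries information about the expected return and not only about volatility. The sketch below checks this by simulation; the specification and the function names are illustrative assumptions, not part of the original text.

```python
import math
import random
import statistics

def simulate_contemporaneous(n, phi, s, rho, seed=2):
    """Simulate (2.4.3), y_{t+1} = sigma_{t+1} * eps_{t+1}, with Corr(eta, eps) = rho.

    Hypothetical law: log sigma_{t+1} = phi * log sigma_t + s * eta_{t+1}.
    """
    rng = random.Random(seed)
    log_sig = rng.gauss(0.0, s / math.sqrt(1.0 - phi ** 2))   # stationary start
    y = []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        eta = rho * eps + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        log_sig = phi * log_sig + s * eta        # sigma_{t+1} reacts to eps_{t+1} ...
        y.append(math.exp(log_sig) * eps)        # ... and multiplies the same eps_{t+1}
    return y

def implied_mean(phi, s, rho):
    """E(y_{t+1}) = rho * s * E(sigma_t) under the Gaussian log-volatility law above."""
    return rho * s * math.exp(s ** 2 / (2.0 * (1.0 - phi ** 2)))
```

With $\phi = 0.95$, $s = 0.1$ and $\rho = -0.6$, the implied mean is about $-0.063$, and the sample mean of a long simulated series should be close to it, whereas the lagged form (2.4.1) keeps a zero mean.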
Following the result in (2.3.8), we know that:

$$\operatorname{Var}[y_{t+1} \mid I_t^R] = E[\sigma_t^2 \mid I_t^R], \tag{2.4.5}$$

suggesting (1) that volatility clustering can be captured via autoregressive dynamics in the conditional expectation (2.4.5), and (2) that thick tails can be obtained in any one of three ways, namely (a) via heavy tails of the distribution of the white noise $\varepsilon_t$, (b) via the stochastic features of $E[\sigma_t^2 \mid I_t^R]$, and (c) via specific randomness of the volatility process $\sigma_t$ which makes it latent, i.e. $\sigma_t \notin I_t^R$.[20] The volatility dynamics that follow from (1) and (2) are usually an AR(1) model for some nonlinear function of $\sigma_t$. Hence, the volatility process is assumed to be stationary and Markovian of order one, but not necessarily linear AR(1) in $\sigma_t$ itself. This is precisely what motivated Andersen (1994) to introduce the Stochastic Autoregressive Variance, or SARV, class of models, where $\sigma_t$ (or $\sigma_t^2$) is a polynomial function $g(K_t)$ of a Markov process $K_t$ with the following dynamic specification:

$$K_t = \omega + \beta K_{t-1} + [\gamma + \alpha K_{t-1}]\,u_t, \tag{2.4.6}$$

where $\tilde u_t = u_t - 1$ is zero-mean white noise with unit variance. Andersen (1994) discusses sufficient regularity conditions which ensure stationarity and ergodicity of $K_t$. Without entering into the details, let us note that the fundamental non-causality Assumption 2.3.2.A implies that the $u_t$ process in (2.4.6) does not Granger-cause $\varepsilon_t$ in (2.4.4). In fact, the non-causality condition suggests a slight modification of Andersen's (1994) definition. Namely, it suggests assuming $\varepsilon_{t+1}$ independent of $u_{t-j}$, $j \ge 0$, for the conditional probability distribution given $\varepsilon_{t-j}$, $j \ge 0$, rather than for the unconditional distribution. This modification does not invalidate Andersen's SARV class as the most general parametric statistical model studied so far in the volatility literature. The GARCH(1,1) model is straightforwardly obtained from (2.4.6) by letting $K_t = \sigma_t^2$, $\gamma = 0$ and $u_t = \varepsilon_t^2$. Note that the deterministic relationship $u_t = \varepsilon_t^2$ between the stochastic components of (2.4.4) and (2.4.6) emphasizes that, in GARCH models, there is no randomness specific to the volatility process. The Autoregressive Random Variance model popularized by Taylor (1986) also belongs to the SARV class. Here:

$$\log\sigma_{t+1} = \xi + \phi\log\sigma_t + \eta_{t+1}, \tag{2.4.7}$$

where $\eta_{t+1}$ is a white noise disturbance such that $\operatorname{Cov}(\eta_{t+1}, \varepsilon_{t+1}) \neq 0$ to accommodate leverage effects. This is a SARV model with $K_t = \log\sigma_t$, $\alpha = 0$ and $\eta_{t+1} = \gamma\,\tilde u_{t+1}$ (so that $\phi = \beta$ and $\xi = \omega + \gamma$).[21]

[20] Kim and Shephard (1994), using data on weekly returns on the S&P500 Index, found that a t-GARCH model has an almost identical likelihood to the normal-based SV model. This example shows that a specific randomness in $\sigma_t$ may produce the same level of marginal kurtosis as a heavy-tailed Student distribution of the white noise $\varepsilon$.

2.4.3. Identification of parameters

Introducing a general class of processes for volatility, like the SARV class discussed in the previous section, prompts questions regarding identification. Suppose again that

$$y_{t+1} = \sigma_t\,\varepsilon_{t+1}, \qquad \sigma_t^q = g(K_t),\ q \in \{1, 2\}, \qquad K_t = \omega + \beta K_{t-1} + [\gamma + \alpha K_{t-1}]\,u_t. \tag{2.4.8}$$
Andersen (1994) noted that the SARV model (2.4.8) is better interpreted by considering the zero-mean white noise process $\tilde u_t = u_t - 1$:

$$K_t = (\omega + \gamma) + (\alpha + \beta)K_{t-1} + (\gamma + \alpha K_{t-1})\,\tilde u_t. \tag{2.4.9}$$

It is clear from the latter that it may be difficult to distinguish empirically the constant $\omega$ from the "stochastic" constant $\gamma u_t$. Similarly, separate identification of the $\alpha$ and $\beta$ parameters is also problematic, as $(\alpha + \beta)$ governs the persistence of shocks to volatility. These identification problems are usually resolved by imposing (arbitrary) restrictions on the pairs of parameters $(\omega, \gamma)$ and $(\alpha, \beta)$: the GARCH(1,1) and Autoregressive Random Variance specifications assume that $\gamma = 0$ and $\alpha = 0$, respectively. Identification of all parameters without such restrictions generally requires additional constraints, for instance via distributional assumptions on $\varepsilon_{t+1}$ and $u_t$, which restrict the semi-parametric framework of (2.4.6) to a parametric statistical model. To address the issue of identification more rigorously, it is useful to consider, following Andersen (1994), the reparameterization (assuming for notational convenience that $\alpha \neq 0$):

$$K = \frac{\omega + \gamma}{1 - \alpha - \beta}, \qquad p = \alpha + \beta, \qquad \delta = \frac{\gamma}{\alpha}. \tag{2.4.10}$$

Hence equation (2.4.9) can be rewritten as

$$K_t = K + p\,(K_{t-1} - K) + (\delta + K_{t-1})\,U_t,$$

where $U_t = \alpha\tilde u_t$. It is clear from (2.4.10) that only three functions of the original parameters $\alpha, \beta, \gamma, \omega$ may be identified, and that the three parameters $K, p, \delta$ are identified, for instance, from the first three unconditional moments of the process $K_t$.

[21] Andersen (1994) also shows that the SARV framework encompasses another type of random variance model that we have considered as ill-specified, since it combines (2.4.2) and (2.4.3).

To give these identification results an empirical content, it is essential to know: (1) how to go from the moments of the observable process $y_t$ to the moments of the volatility process $\sigma_t$; and (2) how to go from the moments of the volatility process $\sigma_t$ to the moments of the latent process $K_t$. The first point is easily solved by specifying the corresponding moments of the standardized innovation process $\varepsilon$. If we assume, for instance, that $\varepsilon$ is Gaussian and independent of the volatility process, we obtain, for $j \ge 1$:

$$E\,|y_{t+1}\,y_{t+1-j}| = \frac{2}{\pi}\,E(\sigma_t\,\sigma_{t-j}), \qquad E\big(y_{t+1}^2\,|y_{t+1-j}|\big) = \sqrt{\frac{2}{\pi}}\;E(\sigma_t^2\,\sigma_{t-j}). \tag{2.4.11}$$

The solution of the second point requires in general the specification of the mapping $g$ and of the probability distribution of $u_t$ in (2.4.6). For the so-called Log-normal SARV model, it is assumed that $\alpha = 0$ and $K_t = \log\sigma_t$ (Taylor's autoregressive random variance model), and that $u_t$ is normally distributed (log-normality of the volatility process).
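The algebra behind (2.4.6), its GARCH(1,1) special case and the reparameterization (2.4.9)-(2.4.10) can be checked mechanically. The sketch below (function names hypothetical) runs the SARV recursion, verifies that with $K_t = \sigma_t^2$, $\gamma = 0$ and $u_t = \varepsilon_t^2$ it reproduces the GARCH(1,1) conditional variances under the timing $y_{t+1} = \sigma_t\varepsilon_{t+1}$ of (2.4.4), and confirms that the $(K, p, \delta)$ form generates the same path as the original parameters. Drawing $u_t$ from a unit exponential distribution (positive, mean one, variance one) is an assumption made only so that $\tilde u_t = u_t - 1$ is zero-mean with unit variance; with $\alpha + \beta < 1$, $K$ should also be the long-run mean of $K_t$.

```python
import random

def sarv_path(omega, beta, gamma, alpha, u, k0):
    """SARV recursion (2.4.6): K_t = omega + beta*K_{t-1} + (gamma + alpha*K_{t-1}) * u_t."""
    k = [k0]
    for u_t in u:
        k.append(omega + beta * k[-1] + (gamma + alpha * k[-1]) * u_t)
    return k

def garch11_variances(omega, alpha, beta, eps, s2_0):
    """GARCH(1,1) with the timing of (2.4.4): y_t = sigma_{t-1} * eps_t and
    sigma_t^2 = omega + alpha * y_t^2 + beta * sigma_{t-1}^2."""
    s2 = [s2_0]
    for e in eps:
        y = s2[-1] ** 0.5 * e
        s2.append(omega + alpha * y * y + beta * s2[-1])
    return s2

def andersen_reparam(omega, beta, gamma, alpha):
    """Map (omega, beta, gamma, alpha) to (K, p, delta) as in (2.4.10); needs alpha != 0."""
    p = alpha + beta
    return (omega + gamma) / (1.0 - p), p, gamma / alpha

def reparam_path(K, p, delta, alpha, u_tilde, k0):
    """K_t = K + p*(K_{t-1} - K) + (delta + K_{t-1}) * U_t, with U_t = alpha * u~_t."""
    k = [k0]
    for ut in u_tilde:
        k.append(K + p * (k[-1] - K) + (delta + k[-1]) * alpha * ut)
    return k
```

For instance, $(\omega, \beta, \gamma, \alpha) = (0.1, 0.8, 0.05, 0.1)$ maps to $(K, p, \delta) = (1.5, 0.9, 0.5)$. Note that any other parameter vector with the same $\omega + \gamma$, $\alpha + \beta$ and $\gamma/\alpha$ maps to the same triple, which is the identification issue discussed above.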
For the Log-normal SARV model, it is easy to show that:

$$\begin{aligned} E\sigma_t^n &= \exp\big[n\,E K_t + n^2\operatorname{Var}K_t/2\big], \\ E(\sigma_t^n\,\sigma_{t-j}^m) &= E\sigma_t^n\;E\sigma_{t-j}^m\;\exp\big[mn\operatorname{Cov}(K_t, K_{t-j})\big], \\ \operatorname{Cov}(K_t, K_{t-j}) &= \beta^j\operatorname{Var}K_t. \end{aligned} \tag{2.4.12}$$

Without the normality assumption (i.e. QML, mixtures of normals, Student distribution, ...), this model will be studied in much more detail in Sections 3 and 5, from both probabilistic and statistical points of view. Moreover, it serves as a template for studying other specifications of the SARV class of models. In addition, various specifications will be considered in Section 4 as proxies of continuous time models.

3. Discrete time models

The purpose of this section is to discuss the statistical handling of discrete time SV models, using simple univariate cases. We start by defining the most basic SV model, corresponding to the autoregressive random variance model discussed earlier in (2.4.7). We study its statistical properties in Section 3.2 and provide a comparison with ARCH models in Section 3.3. Section 3.4 is devoted to filtering, prediction and smoothing. Various extensions, including multivariate models, are covered in the last section. Estimation of the parameters governing the volatility process is discussed later, in Section 5.

3.1. The discrete time SV model

The discrete time SV model may be written as

$$y_t = \sigma_t\,\varepsilon_t, \qquad t = 1, \ldots, T, \tag{3.1.1}$$

where $y_t$ denotes the demeaned return process $y_t = \log(S_t/S_{t-1}) - \mu$ and $\log\sigma_t^2$ follows an AR(1) process. It will be assumed that $\varepsilon_t$ is a series of independent, identically distributed random disturbances. Usually $\varepsilon_t$ is specified to have a standard distribution, so its variance $\sigma_\varepsilon^2$ is known. Thus for a normal distribution $\sigma_\varepsilon^2$ is unity, while for a t-distribution with $\nu$ degrees of freedom it is $\nu/(\nu - 2)$. Following a convention often adopted in the literature, we write $h_t = \log\sigma_t^2$:

$$y_t = \sigma\,\varepsilon_t\,e^{0.5 h_t} \tag{3.1.2}$$

where $\sigma$ is a scale parameter, which removes the need for a constant term in the stationary first-order autoregressive process

$$h_{t+1} = \phi\,h_t + \eta_t, \qquad \eta_t \sim \mathrm{IID}(0, \sigma_\eta^2), \qquad |\phi| < 1. \tag{3.1.3}$$

It was noted before that if $\varepsilon_t$ and $\eta_t$ are allowed to be correlated with each other, the model can pick up the kind of asymmetric behavior which is often found in stock prices. Indeed, a negative correlation between $\varepsilon_t$ and $\eta_t$ induces a leverage effect.
As in Section 2.4.1, the timing of the disturbance in (3.1.3) ensures that the observations are still a martingale difference, the equation being written in this way so as to tie in with the state space literature. It should be stressed that the above model is only an approximation to the continuous time models of Section 2 observed at discrete intervals. The accuracy of the approximation is examined in Dassios (1995) using Edgeworth expansions (see also Sections 4.1 and 4.3 for further discussion).
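The following sketch simulates (3.1.2)-(3.1.3) with independent Gaussian $\varepsilon_t$ and $\eta_t$ (no leverage) and compares sample moments with the closed-form results derived in Section 3.2: the variance (3.2.1), the kurtosis $3\exp(\sigma_h^2)$, and the lag-one ACF of $|y_t|$ from (3.2.3) with $c = 1$, using $\rho_{h,\tau} = \phi^\tau$ and $\sigma_h^2 = \sigma_\eta^2/(1-\phi^2)$ for the AR(1) in (3.1.3). It also codes $\kappa_c$ for the normal and Student's t cases. Parameter values and function names are illustrative assumptions, and simulated moments only approximate the formulas.

```python
import math
import random
import statistics

def simulate_sv(n, sigma, phi, sigma_eta, seed=3):
    """Simulate (3.1.2)-(3.1.3) with independent N(0,1) eps_t and N(0, sigma_eta^2) eta_t."""
    rng = random.Random(seed)
    h = rng.gauss(0.0, sigma_eta / math.sqrt(1.0 - phi ** 2))  # stationary start
    y = []
    for _ in range(n):
        y.append(sigma * rng.gauss(0.0, 1.0) * math.exp(0.5 * h))
        h = phi * h + rng.gauss(0.0, sigma_eta)
    return y

def kappa_normal(c):
    """kappa_c = E|eps|^(2c) / (E|eps|^c)^2 for a standard normal eps (c != 0)."""
    return math.gamma(c + 0.5) * math.gamma(0.5) / math.gamma(0.5 * c + 0.5) ** 2

def kappa_student(c, nu):
    """kappa_c for Student's t with nu degrees of freedom, (3.2.5); needs |c| < nu/2."""
    num = math.gamma(c + 0.5) * math.gamma(nu / 2 - c) * math.gamma(0.5) * math.gamma(nu / 2)
    den = (math.gamma(0.5 * c + 0.5) * math.gamma(nu / 2 - 0.5 * c)) ** 2
    return num / den

def acf_abs_power(tau, c, sigma_h2, phi, kappa_c):
    """ACF (3.2.3) of |y_t|^c, using rho_{h,tau} = phi**tau for the AR(1) in (3.1.3)."""
    num = math.exp(c * c * sigma_h2 * phi ** tau / 4.0) - 1.0
    den = kappa_c * math.exp(c * c * sigma_h2 / 4.0) - 1.0
    return num / den

def sample_acf(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    m = statistics.fmean(x)
    num = sum((x[i] - m) * (x[i - lag] - m) for i in range(lag, len(x)))
    return num / sum((v - m) ** 2 for v in x)
```

As a sanity check on the coded $\kappa_c$, `kappa_normal(2)` recovers the normal kurtosis of three and `kappa_student(2, 10)` recovers the Student kurtosis $3(\nu-2)/(\nu-4) = 4$.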

3.2. Statistical properties

The following properties of the SV model hold even if $\varepsilon_t$ and $\eta_t$ are contemporaneously correlated. Firstly, as noted, $y_t$ is a martingale difference. Secondly, stationarity of $h_t$ implies stationarity of $y_t$. Thirdly, if $\eta_t$ is normally distributed, it follows from the properties of the lognormal distribution that $E[\exp(a h_t)] = \exp(a^2\sigma_h^2/2)$, where $a$ is a constant and $\sigma_h^2$ is the variance of $h_t$. Hence, if $\varepsilon_t$ has a finite variance, the variance of $y_t$ is given by

$$\operatorname{Var}(y_t) = \sigma^2\sigma_\varepsilon^2\exp(\sigma_h^2/2). \tag{3.2.1}$$

Similarly, if the fourth moment of $\varepsilon_t$ exists, the kurtosis of $y_t$ is $\kappa\exp(\sigma_h^2)$, where $\kappa$ is the kurtosis of $\varepsilon_t$, so $y_t$ exhibits more kurtosis than $\varepsilon_t$. Finally, all the odd moments are zero.

For many purposes we need to consider the moments of powers of absolute values. Again, $\eta_t$ is assumed to be normally distributed. Then, for $\varepsilon_t$ having a standard normal distribution, the following expressions are derived in Harvey (1993):

$$E|y_t|^c = \sigma^c\,2^{c/2}\,\frac{\Gamma(c/2 + 1/2)}{\Gamma(1/2)}\exp\!\Big(\frac{c^2\sigma_h^2}{8}\Big), \qquad c > -1,\ c \neq 0 \tag{3.2.2}$$

and

$$\operatorname{Var}|y_t|^c = \sigma^{2c}\,2^{c}\left\{\frac{\Gamma(c + 1/2)}{\Gamma(1/2)}\exp\!\Big(\frac{c^2\sigma_h^2}{2}\Big) - \Big[\frac{\Gamma(c/2 + 1/2)}{\Gamma(1/2)}\Big]^2\exp\!\Big(\frac{c^2\sigma_h^2}{4}\Big)\right\}, \qquad c > -0.5,\ c \neq 0.$$

Note that $\Gamma(1/2) = \sqrt{\pi}$ and $\Gamma(1) = 1$. Corresponding expressions may be computed for other distributions of $\varepsilon_t$, including Student's t and the General Error Distribution (see Nelson (1991)).

Finally, the square of the coefficient of variation of $\sigma_t^2$ is often used as a measure of the relative strength of the SV process. This is $\operatorname{Var}(\sigma_t^2)/[E(\sigma_t^2)]^2 = \exp(\sigma_h^2) - 1$. Jacquier, Polson and Rossi (1994) argue that this is more easily interpretable than $\sigma_h^2$. In the empirical studies they quote, it is rarely less than 0.1 or greater than 2.

3.2.1. Autocorrelation functions

If we assume that the disturbances $\varepsilon_t$ and $\eta_t$ are mutually independent, and that $\eta_t$ is normal, the ACF of the absolute values of the observations raised to the power $c$ is given by

$$\rho_\tau^{(c)} = \frac{E(|y_t|^c\,|y_{t-\tau}|^c) - \{E(|y_t|^c)\}^2}{E(|y_t|^{2c}) - \{E(|y_t|^c)\}^2} = \frac{\exp(c^2\sigma_h^2\rho_{h,\tau}/4) - 1}{\kappa_c\exp(c^2\sigma_h^2/4) - 1}, \qquad \tau \ge 1,\ c > -0.5,\ c \neq 0, \tag{3.2.3}$$

where $\kappa_c$ is

$$\kappa_c = E(|\varepsilon_t|^{2c})/\{E(|\varepsilon_t|^c)\}^2, \tag{3.2.4}$$

and $\rho_{h,\tau}$, $\tau = 0, 1, 2, \ldots$, denotes the ACF of $h_t$. Taylor (1986) gives this expression for $c$ equal to one and two and $\varepsilon_t$ normally distributed. When $c = 2$, $\kappa_c$ is the kurtosis, and this is three for a normal distribution. More generally,

$$\kappa_c = \Gamma(c + 1/2)\,\Gamma(1/2)/\{\Gamma(c/2 + 1/2)\}^2, \qquad c \neq 0.$$

For Student's t-distribution with $\nu$ degrees of freedom:

$$\kappa_c = \frac{\Gamma(c + 1/2)\,\Gamma(-c + \nu/2)\,\Gamma(1/2)\,\Gamma(\nu/2)}{\{\Gamma(c/2 + 1/2)\,\Gamma(-c/2 + \nu/2)\}^2}, \qquad |c| < \nu/2,\ c \neq 0. \tag{3.2.5}$$

Note that $\nu$ must be at least five if $c$ is two.

The ACF, $\rho_\tau^{(c)}$, has the following features. First, if $\sigma_h^2$ is small and/or $\rho_{h,\tau}$ is close to one,

$$\rho_\tau^{(c)} \simeq \frac{\exp(c^2\sigma_h^2/4) - 1}{\kappa_c\exp(c^2\sigma_h^2/4) - 1}\,\rho_{h,\tau}, \qquad \tau \ge 1; \tag{3.2.6}$$

compare Taylor (1986, pp. 74-75). Thus the shape of the ACF of $h_t$ is approximately carried over to $\rho_\tau^{(c)}$, except that it is multiplied by a factor of proportionality, which must be less than one for $c$ positive, as $\kappa_c$ is greater than one. Secondly, for the t-distribution, $\kappa_c$ declines as $\nu$ goes to infinity. Thus $\rho_\tau^{(c)}$ is a maximum for a normal distribution. On the other hand, a distribution with less kurtosis than the normal will give rise to higher values of $\rho_\tau^{(c)}$. Although (3.2.6) gives an explicit relationship between $\rho_\tau^{(c)}$ and $c$, it does not appear possible to make any general statements regarding $\rho_\tau^{(c)}$ being maximized for certain values of $c$. Indeed, different values of $\sigma_h^2$ lead to different values of $c$ maximizing $\rho_\tau^{(c)}$. If $\sigma_h^2$ is chosen so as to give values of $\rho_\tau^{(c)}$ of a similar size to those reported in Ding, Granger and Engle (1993), then the maximum appears to be attained for $c$ slightly less than one. The shape of the curve relating $\rho_\tau^{(c)}$ to $c$ is similar to the empirical relationships reported in Ding, Granger and Engle, as noted by Harvey (1993).

3.2.2. Logarithmic transformation

Squaring the observations in (3.1.2) and taking logarithms gives

$$\log y_t^2 = \log\sigma^2 + h_t + \log\varepsilon_t^2. \tag{3.2.7}$$

Alternatively

$$\log y_t^2 = \omega + h_t + \xi_t, \tag{3.2.8}$$

where $\omega = \log\sigma^2 + E\log\varepsilon_t^2$, so that the disturbance $\xi_t$ has zero mean by construction. The mean and variance of $\log\varepsilon_t^2$ are known to be $-1.27$ and $\pi^2/2 = 4.93$ when $\varepsilon_t$ has a standard normal distribution; see Abramowitz and Stegun (1970). However, the distribution of $\log\varepsilon_t^2$ is far from normal, being heavily skewed with a long tail.

More generally, if $\varepsilon_t$ has a t-distribution with $\nu$ degrees of freedom, it can be expressed as

$$\varepsilon_t = \zeta_t\,\kappa_t^{-1/2},$$

where $\zeta_t$ is a standard normal variate and $\kappa_t$ is independently distributed such that $\nu\kappa_t$ is chi-square with $\nu$ degrees of freedom. Thus $\log\varepsilon_t^2 = \log\zeta_t^2 - \log\kappa_t$ and, again using results in Abramowitz and Stegun (1970), it follows that the mean and variance of $\log\varepsilon_t^2$ are $-1.27 - \psi(\nu/2) + \log(\nu/2)$ and $4.93 + \psi'(\nu/2)$ respectively, where $\psi(\cdot)$ is the digamma function and $\psi'(\cdot)$ the trigamma function. Note that the moments of $\xi_t$ exist even if the model is formulated in such a way that the distribution of $\varepsilon_t$ is Cauchy, that is $\nu = 1$. In fact, in this case $\xi_t$ is symmetric with excess kurtosis two, compared with excess kurtosis four when $\varepsilon_t$ is Gaussian.

Since $\log\varepsilon_t^2$ is serially independent, it is straightforward to work out the ACF of $\log y_t^2$ for $h_t$ following any stationary process:

$$\rho_\tau^{(0)} = \rho_{h,\tau}\big/\{1 + 4.93/\sigma_h^2\}, \qquad \tau \ge 1. \tag{3.2.9}$$

The notation $\rho_\tau^{(0)}$ reflects the fact that the ACF of a power of an absolute value of the observation is the same as that of the Box-Cox transform, that is $\{|y_t|^c - 1\}/c$, and hence the logarithmic transform of an absolute value, raised to any (non-zero) power, corresponds to $c = 0$. (But note that one cannot simply set $c = 0$ in (3.2.3).)

Note that even if $\eta_t$ and $\varepsilon_t$ are not mutually independent, the $\eta_t$ and $\xi_t$ disturbances are uncorrelated if the joint distribution of $\varepsilon_t$ and $\eta_t$ is symmetric, that is $f(\varepsilon_t, \eta_t) = f(-\varepsilon_t, -\eta_t)$; see Harvey, Ruiz and Shephard (1994). Hence the expression for the ACF in (3.2.9) remains valid.

3.3. Comparison with ARCH models

The GARCH(1,1) model has been applied extensively to financial time series.
The variance in (3.1.1) is assumed to depend on the variance and the squared observation in the previous time period. Thus

$$\sigma_t^2 = \gamma + \alpha\,y_{t-1}^2 + \beta\,\sigma_{t-1}^2, \qquad t = 1, \ldots, T. \tag{3.3.1}$$

The GARCH model was proposed by Bollerslev (1986) and Taylor (1986), and is a generalization of the ARCH model formulated by Engle (1982). The ARCH(1) model is a special case of GARCH(1,1) with $\beta = 0$. The motivation comes from forecasting: in an AR(1) model with independent disturbances, the optimal prediction of the next observation is a fraction of the current observation, and in ARCH(1) the optimal prediction of the next squared observation is a fraction of the current squared observation (plus a constant). The reason is that the optimal forecast is constructed conditional on the current information, and in an ARCH model the variance in the next period is assumed to be known. This construction leads directly to a likelihood function for the model once a distribution is assumed for $\varepsilon_t$. Thus estimation of the parameters upon which $\sigma_t^2$ depends is straightforward in principle. The GARCH formulation introduces terms analogous to moving average terms in an ARMA model, thereby making forecasts a function of a distributed lag of past squared observations.

It is straightforward to show that $y_t$ is a martingale difference with (unconditional) variance $\gamma/(1 - \alpha - \beta)$. Thus $\alpha + \beta < 1$ is the condition for covariance stationarity. As shown in Bollerslev (1986), the condition under which the fourth moment exists in a Gaussian model is $2\alpha^2 + (\alpha + \beta)^2 < 1$. The model then exhibits excess kurtosis. However, the fourth moment condition may not always be satisfied in practice. Somewhat paradoxically, the conditions for strict stationarity are much weaker and, as shown by Nelson (1990), even include the case $\alpha + \beta = 1$.

The specification of GARCH(1,1) means that we can write

$$y_t^2 = \gamma + \alpha\,y_{t-1}^2 + \beta\,\sigma_{t-1}^2 + v_t = \gamma + (\alpha + \beta)\,y_{t-1}^2 + v_t - \beta\,v_{t-1},$$

where $v_t = y_t^2 - \sigma_t^2$ is a martingale difference. Thus $y_t^2$ has the form of an ARMA(1,1) process, and so its ACF can be evaluated in the same way. The ACF of the corresponding ARMA model seems to be indicative of the type of patterns likely to be observed in practice in correlograms of $y_t^2$. The GARCH model can be extended by adding more lags of $\sigma_t^2$ and $y_t^2$. However, GARCH(1,1) seems to be the most widely used.
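The GARCH(1,1) facts just listed can be checked with a short simulation. In this sketch (parameter values and names are hypothetical), the sample variance of a long simulated series should be close to $\gamma/(1-\alpha-\beta)$, the ARMA(1,1) representation of $y_t^2$ holds exactly by construction, and Bollerslev's fourth-moment condition reduces to a one-line test.

```python
import random
import statistics

def simulate_garch11(n, gamma, alpha, beta, seed=4):
    """Simulate (3.3.1): sigma_t^2 = gamma + alpha*y_{t-1}^2 + beta*sigma_{t-1}^2, y_t = sigma_t*eps_t."""
    rng = random.Random(seed)
    s2 = gamma / (1.0 - alpha - beta)    # start at the unconditional variance
    y_prev2 = s2
    ys, s2s = [], []
    for _ in range(n):
        s2 = gamma + alpha * y_prev2 + beta * s2
        y = s2 ** 0.5 * rng.gauss(0.0, 1.0)
        ys.append(y)
        s2s.append(s2)
        y_prev2 = y * y
    return ys, s2s

def fourth_moment_exists(alpha, beta):
    """Bollerslev's (1986) condition for a finite fourth moment, Gaussian GARCH(1,1)."""
    return 2.0 * alpha ** 2 + (alpha + beta) ** 2 < 1.0
```

With $\gamma = 0.05$, $\alpha = 0.05$ and $\beta = 0.9$ the unconditional variance is one and the fourth moment exists, whereas $\alpha = 0.3$, $\beta = 0.65$ is covariance stationary but violates the fourth-moment condition.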
GARCH(1,1) displays properties similar to those of the SV model, particularly if $\phi$ is close to one. This should be clear from (3.2.6), which has the pattern of an ARMA(1,1) process. Clearly $\phi$ plays a role similar to that of $\alpha + \beta$. The main difference in the ACFs seems to show up most at lag one. Jacquier et al. (1994, p. 373) present a graph of the correlogram of the squared weekly returns of a portfolio on the New York Stock Exchange, together with the ACFs implied by fitting SV and GARCH(1,1) models. In this case the ACF implied by the SV model is closer to the sample values.

The SV model displays excess kurtosis even if $\phi$ is zero, since $y_t$ is a mixture of distributions. The $\sigma_\eta^2$ parameter governs the degree of mixing independently of the degree of smoothness of the variance evolution. This is not the case with a GARCH model, where the degree of kurtosis is tied to the roots of the variance equation, $\alpha$ and $\beta$ in the case of GARCH(1,1). Hence, it is very often necessary to use a non-Gaussian GARCH model to capture the high kurtosis typically found in a financial time series. The basic GARCH model does not allow for the kind of asymmetry captured by an SV model with contemporaneously correlated disturbances, although it can be modified as suggested in Engle and Ng (1993). The EGARCH model, proposed by Nelson (1991), handles asymmetry by taking $\log\sigma_t^2$ to be a function of past squares and absolute values of the observations.

3.4. Filtering, smoothing and prediction

For the purposes of pricing options, we need to be able to estimate and predict the variance, $\sigma_t^2$, which, of course, is proportional to the exponential of $h_t$. An estimate based on all the observations up to, and possibly including, the one at time $t$ is called a filtered estimate. On the other hand, an estimate based on all the observations in the sample, including those which came after time $t$, is called a smoothed estimate. Predictions are estimates of future values. As a matter of historical interest, we may wish to examine the evolution of the variance over time by looking at the smoothed estimates. These might be compared with the volatilities implied by the corresponding option prices, as discussed in Section 2.1.2. For pricing "at the money" options, we may be able simply to use the filtered estimate at the end of the sample and the predictions of future values of the variance, as in the method suggested for ARCH models by Noh, Engle and Kane (1994). More generally, it may be necessary to base prices on the full distribution of future values of the variance, perhaps obtained by simulation techniques; for further discussion see Section 4.2.

One can think of constructing filtered and smoothed estimates in a very simple, but arbitrary, way by taking functions (involving estimated parameters) of moving averages of transformed observations. Thus, for some transformation $f(\cdot)$ and function $g(\cdot)$:

$$\tilde\sigma_t^2 = g\Big\{\sum_{j=r}^{t-1} w_{tj}\,f(y_{t-j})\Big\}, \qquad t = 1, \ldots, T, \tag{3.4.1}$$

where $r = 0$ or $1$ for a filtered estimate and $r = t - T$ for a smoothed estimate. Since we have formulated a stochastic volatility model, the natural course of action is to use it as the basis for filtering, smoothing and prediction.
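A minimal sketch of the moving-average estimator (3.4.1) follows. The weights $w_{tj} \propto \lambda^{|j|}$, normalized to sum to one over the available lags, are a hypothetical choice, as are the function names; $f$ and $g$ are supplied by the user. With $f(y) = y^2$ and $g$ the identity it is an exponentially weighted average of squared observations; with $f(y) = \log y^2$ and $g = \exp$ it becomes a weighted geometric mean, the form that reappears in Section 3.4.1 and that breaks down on zero observations (Python's `math.log(0.0)` raises `ValueError`).

```python
import math

def ma_variance_estimates(y, f, g, lam, mode="filtered1"):
    """Sketch of (3.4.1): g( sum_{j=r}^{t-1} w_tj * f(y_{t-j}) ), t = 1, ..., T.

    Hypothetical weights: w_tj proportional to lam**|j|, normalized to sum to one.
    mode 'filtered0' uses r = 0 (y_t, ..., y_1), 'filtered1' uses r = 1
    (y_{t-1}, ..., y_1) and 'smoothed' uses r = t - T (the whole sample).
    """
    T = len(y)
    estimates = []
    for t in range(1, T + 1):                 # 1-based time index, as in the text
        if mode == "filtered0":
            r = 0
        elif mode == "filtered1":
            r = 1
        elif mode == "smoothed":
            r = t - T
        else:
            raise ValueError(f"unknown mode: {mode}")
        lags = range(r, t)                    # j = r, ..., t-1
        if len(lags) == 0:                    # nothing observed yet (t = 1, r = 1)
            estimates.append(float("nan"))
            continue
        w = [lam ** abs(j) for j in lags]
        avg = sum(wj * f(y[t - j - 1]) for wj, j in zip(w, lags)) / sum(w)
        estimates.append(g(avg))
    return estimates
```

For example, `ma_variance_estimates(y, lambda v: v * v, lambda m: m, 0.9)` gives one-sided (filtered, $r = 1$) estimates that never use $y_t$ itself, while `mode="smoothed"` spreads the weights over the whole sample around time $t$.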
For a linear and Gaussian time series model, the state space form can be used as the basis for optimal filtering and smoothing algorithms. Unfortunately, the SV model is nonlinear. This leaves us with three possibilities:
a. compute inefficient estimates based on a linear state space model;
b. use computer-intensive techniques to estimate the optimal filter to a desired level of accuracy;
c. use an (unspecified) ARCH model to approximate the optimal filter.
We now turn to examine each of these in some detail.

3.4.1. Linear state space form

The transformed observations, the $\log y_t^2$'s, can be used to construct a linear state space model, as suggested by Nelson (1988) and Harvey, Ruiz and Shephard (1994). The measurement equation is (3.2.8), while (3.1.3) is the transition equation. The initial conditions for the state, $h_t$, are given by its unconditional mean and variance, that is, zero and $\sigma_\eta^2/(1 - \phi^2)$ respectively. While it may be reasonable to assume that $\eta_t$ is normal, $\xi_t$ would only be normal if the absolute value of $\varepsilon_t$ were lognormal. This is unlikely. Thus application of the Kalman filter and the associated smoothers yields estimators of the state, $h_t$, which are only optimal within the class of estimators based on linear combinations of the $\log y_t^2$'s. Furthermore, it is not the $h_t$'s which are required, but rather their exponentials. Suppose $h_{t|T}$ denotes the smoothed estimator obtained from the linear state space form. Then $\exp(h_{t|T})$ is of the form (3.4.1), multiplied by an estimate of the scaling constant, $\sigma^2$. It can be written as a weighted geometric mean. This makes the estimates vulnerable to very small observations and is an indication of the limitations of this approach.

Working with the logarithmic transformation raises an important practical issue, namely how to handle observations which are zero. This is a reflection of the point raised in the previous paragraph, since obviously any weighted geometric mean involving a zero observation will be zero. More generally, we wish to avoid very small observations. One possible solution is to remove the sample mean. A somewhat more satisfactory alternative, suggested by Fuller and studied by Breidt and Carriquiry (1995), is to make the following transformation based on a Taylor series expansion:

$$\log y_t^2 \simeq \log(y_t^2 + c\,s_y^2) - \frac{c\,s_y^2}{y_t^2 + c\,s_y^2}, \qquad t = 1, \ldots, T, \tag{3.4.2}$$

where $s_y^2$ is the sample variance of the $y_t$'s and $c$ is a small number, the suggested value being 0.02. The effect of this transformation is to reduce the kurtosis in the transformed observations by cutting down the long tail made up of the negative values obtained by taking the logarithms of the "inliers". In other words, it is a form of trimming.
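For concreteness, here is a sketch of the linear state space approach: the Kalman filter applied to $x_t = \log y_t^2$ with measurement equation (3.2.8), transition equation (3.1.3), the unconditional initial conditions given above, and measurement-noise mean and variance $-1.27$ and $\pi^2/2 \approx 4.93$ from the Gaussian case. The Gaussian log-likelihood it accumulates is only a quasi-likelihood, since $\xi_t$ is not normal. The transformation (3.4.2) is included as an optional preprocessing step; if it is used, the mean and variance of the measurement noise change, so $\omega$ and the noise variance would have to be estimated rather than fixed. The function names and any simulated parameter values are assumptions for illustration.

```python
import math
import random

LOG_EPS2_MEAN = -1.2704            # E log eps_t^2 for standard normal eps_t (text: -1.27)
LOG_EPS2_VAR = math.pi ** 2 / 2    # Var log eps_t^2 = pi^2 / 2, about 4.93

def fuller_log_square(y, c=0.02):
    """Transformation (3.4.2): log(y^2 + c s^2) - c s^2 / (y^2 + c s^2)."""
    n = len(y)
    m = sum(y) / n
    cs2 = c * sum((v - m) ** 2 for v in y) / (n - 1)   # c times the sample variance
    return [math.log(v * v + cs2) - cs2 / (v * v + cs2) for v in y]

def kalman_log_sv(x, omega, phi, sigma_eta2, xi_var=LOG_EPS2_VAR):
    """Kalman filter for x_t = omega + h_t + xi_t, h_{t+1} = phi * h_t + eta_t.

    Returns the filtered estimates h_{t|t} and the Gaussian quasi-log-likelihood.
    """
    a = 0.0                                  # h_{1|0}: unconditional mean
    p = sigma_eta2 / (1.0 - phi ** 2)        # p_{1|0}: unconditional variance
    filtered, loglik = [], 0.0
    for obs in x:
        v = obs - omega - a                  # one-step prediction error
        f = p + xi_var                       # its variance
        loglik -= 0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        gain = p / f
        a_upd, p_upd = a + gain * v, p * (1.0 - gain)       # h_{t|t}, p_{t|t}
        filtered.append(a_upd)
        a, p = phi * a_upd, phi ** 2 * p_upd + sigma_eta2   # h_{t+1|t}, p_{t+1|t}
    return filtered, loglik
```

In practice one would maximize `loglik` over $(\omega, \phi, \sigma_\eta^2)$ with a numerical optimizer; `fuller_log_square` keeps zero returns from producing `math.log(0.0)` errors.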
It might be more satisfactory to carry out this procedure after correcting the observations for heteroskedasticity by dividing by preliminary estimates, the $\tilde\sigma_t^2$'s. The $\log\tilde\sigma_t^2$'s are then added to the transformed observations. The $\tilde\sigma_t^2$'s could be constructed from a first round or by using a totally different procedure, perhaps a nonparametric one.

The linear state space form can be modified so as to deal with asymmetric models. It was noted earlier that even if $\eta_t$ and $\varepsilon_t$ are not mutually independent, the disturbances in the state space form are uncorrelated if the joint distribution of $\varepsilon_t$ and $\eta_t$ is symmetric. Thus the above filtering and smoothing operations are still valid, but there is a loss of information stemming from the squaring of the observations. Harvey and Shephard (1993) show that this information may be recovered by conditioning on the signs of the observations, denoted by $s_t$, a variable which takes the value $+1$ ($-1$) when $y_t$ is positive (negative). These signs are, of course, the same as the signs of the $\varepsilon_t$'s. Let $E_+$ ($E_-$) denote the expectation conditional on $\varepsilon_t$ being positive (negative), and assign a similar interpretation to variance and covariance operators. The distribution of $\xi_t$ is not affected by conditioning on the signs of the $\varepsilon_t$'s, but, remembering that $E(\eta_t\mid\varepsilon_t)$ is an odd function of $\varepsilon_t$,

146 E. Ghysels, A. C. Harvey and E. Renault

$$\mu^* = E_+(\eta_t) = E_+\left[E(\eta_t\mid\varepsilon_t)\right] = -E_-(\eta_t),$$

and

$$\gamma^* = \mathrm{Cov}_+(\eta_t,\xi_t) = E_+(\eta_t\xi_t) - E_+(\eta_t)E(\xi_t) = E_+(\eta_t\xi_t) = -\mathrm{Cov}_-(\eta_t,\xi_t),$$

because the expectation of $\xi_t$ is zero and

$$E_+(\eta_t\xi_t) = E_+\left[E(\eta_t\mid\varepsilon_t)\log\varepsilon_t^2\right] - \mu^* E(\log\varepsilon_t^2) = -E_-(\eta_t\xi_t).$$

Finally, $\mathrm{Var}_+(\eta_t) = E_+(\eta_t^2) - \left[E_+(\eta_t)\right]^2 = \sigma_\eta^2 - \mu^{*2}$. The linear state space form is now

$$\log y_t^2 = \omega + h_t + \xi_t,$$
$$h_{t+1} = \phi h_t + s_t\mu^* + \eta_t^*,$$
$$\begin{pmatrix}\xi_t\\ \eta_t^*\end{pmatrix}\Bigg|\, s_t \sim \left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}\sigma_\xi^2 & \gamma^* s_t\\ \gamma^* s_t & \sigma_\eta^2 - \mu^{*2}\end{pmatrix}\right). \tag{3.4.3}$$

The Kalman filter may still be initialized by taking $h_0$ to have mean zero and variance $\sigma_\eta^2/(1-\phi^2)$.

The parameterization in (3.4.3) does not directly involve a parameter representing the correlation between $\varepsilon_t$ and $\eta_t$. The relationship between $\mu^*$ and $\gamma^*$ and the original parameters in the model can only be obtained by making a distributional assumption about $\varepsilon_t$ as well as $\eta_t$. When $\varepsilon_t$ and $\eta_t$ are bivariate normal with $\mathrm{Corr}(\varepsilon_t,\eta_t) = \rho$, we have $E(\eta_t\mid\varepsilon_t) = \rho\sigma_\eta\varepsilon_t$, and so

$$\mu^* = E_+(\eta_t) = \rho\sigma_\eta E_+(\varepsilon_t) = \rho\sigma_\eta\sqrt{2/\pi} = 0.7979\,\rho\sigma_\eta. \tag{3.4.4}$$

Furthermore,

$$\gamma^* = \rho\sigma_\eta E\left(|\varepsilon_t|\log\varepsilon_t^2\right) - 0.7979\,\rho\sigma_\eta E\left(\log\varepsilon_t^2\right) = 1.1061\,\rho\sigma_\eta. \tag{3.4.5}$$

When $\varepsilon_t$ has a $t$-distribution, it can be written as $\zeta_t\kappa_t^{-0.5}$, and $\zeta_t$ and $\eta_t$ can be regarded as having a bivariate normal distribution with correlation $\rho$, while $\kappa_t$ is independent of both. To evaluate $\mu^*$ and $\gamma^*$ one proceeds as before, except that the initial conditioning is on $\zeta_t$ rather than on $\varepsilon_t$, and the required expressions are found to be exactly as in the Gaussian case.

The filtered estimate of the log volatility $h_t$, written as $h_{t+1|t}$, takes the form

$$h_{t+1|t} = \phi h_{t|t-1} + \frac{\phi p_{t|t-1} + \gamma^* s_t}{p_{t|t-1} + \sigma_\xi^2}\left(\log y_t^2 - \omega - h_{t|t-1}\right) + s_t\mu^*,$$

where $p_{t|t-1}$ is the corresponding mean square error of $h_{t|t-1}$. If $\rho < 0$, then $\gamma^* < 0$, and the filtered estimator will behave in a similar way to the EGARCH model estimated by Nelson (1991), with negative observations causing bigger increases in the estimated log volatility than corresponding positive values.

3.4.2. Nonlinear filters

In principle, an exact filter may be written down for the original model (3.1.2) and (3.1.3), with the former taken as the measurement equation. Evaluating such a filter requires approximating a series of integrals by numerical methods. Kitagawa (1987) has proposed a general method for implementing such a filter and Watanabe (1993) has applied it to the SV model. Unfortunately, it appears to be so time consuming as to render it impractical with current computer technology.

As part of their Bayesian treatment of the model as a whole, Jacquier, Polson and Rossi (1994) show how it is possible to obtain smoothed estimates of the volatilities by simulation. What is required is the mean vector of the joint distribution of the volatilities conditional on the observations. However, because simulating this joint distribution is not a practical proposition, they decompose it into a set of univariate distributions in which each volatility is conditional on all the others. These distributions may be denoted $p(\sigma_t\mid\sigma_{-t}, y)$, where $\sigma_{-t}$ denotes all the volatilities apart from $\sigma_t$. What one would like to do is to sample from each of these distributions in turn, with the elements of $\sigma_{-t}$ set equal to their most recent draws, and repeat several thousand times. As such, this is a Gibbs sampler. Unfortunately, there are difficulties. The Markov structure of the SV model may be exploited to write

$$p(\sigma_t\mid\sigma_{-t}, y) = p(\sigma_t\mid\sigma_{t-1}, \sigma_{t+1}, y_t) \propto p(y_t\mid h_t)\,p(h_t\mid h_{t-1})\,p(h_{t+1}\mid h_t),$$

but although the right hand side of the above expression can be written down explicitly, the density is not of a standard form and there is no analytic expression for the normalizing constant. The solution adopted by Jacquier, Polson and Rossi is to employ a series of Metropolis accept/reject independence chains.
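To make the single-move idea concrete, here is a minimal Python sketch (NumPy assumed) that updates one $h_t$ at a time with a Metropolis independence step, holding $\phi$, $\sigma_\eta$ and $\sigma$ fixed. The proposal is an assumption of this sketch: it draws $h_t$ from the Gaussian $p(h_t\mid h_{t-1}, h_{t+1})$ implied by the AR(1) transition, so only the ratio of the $p(y_t\mid h_t)$ terms enters the acceptance probability. Jacquier, Polson and Rossi use a different proposal density and also sample the parameters, which is not attempted here; all names and values are hypothetical.

```python
import numpy as np

def log_lik_h(h_t, y_t, sigma):
    """log p(y_t | h_t), up to a constant, for y_t = sigma * eps_t * exp(h_t / 2), eps_t ~ N(0, 1)."""
    return -0.5 * h_t - y_t**2 * np.exp(-h_t) / (2.0 * sigma**2)

def single_move_sampler(y, phi, sigma_eta, sigma, n_iter=300, burn=50, seed=0):
    """Single-site Metropolis-within-Gibbs sweeps over h_1, ..., h_T (T >= 2),
    with phi, sigma_eta and sigma treated as known.

    Each h_t is proposed from the Gaussian p(h_t | h_{t-1}, h_{t+1}) implied by the
    AR(1) transition; this proposal does not depend on the current h_t, so it is an
    independence chain and the acceptance ratio is p(y_t | proposal) / p(y_t | current).
    Returns the posterior-mean estimate of h and the overall acceptance rate.
    """
    rng = np.random.default_rng(seed)
    T = len(y)
    s2 = sigma_eta**2
    h = np.zeros(T)
    h_sum = np.zeros(T)
    accepted = 0
    for it in range(n_iter):
        for t in range(T):
            if t == 0:          # stationary prior for the first state times p(h_2 | h_1)
                mean, var = phi * h[1], s2
            elif t == T - 1:    # last state: only p(h_T | h_{T-1}) involves it
                mean, var = phi * h[t - 1], s2
            else:
                mean = phi * (h[t - 1] + h[t + 1]) / (1.0 + phi**2)
                var = s2 / (1.0 + phi**2)
            proposal = mean + np.sqrt(var) * rng.normal()
            log_ratio = log_lik_h(proposal, y[t], sigma) - log_lik_h(h[t], y[t], sigma)
            if np.log(rng.uniform()) < log_ratio:
                h[t] = proposal
                accepted += 1
        if it >= burn:
            h_sum += h
    return h_sum / (n_iter - burn), accepted / (n_iter * T)

# Illustration on simulated data (parameter values are arbitrary choices).
rng = np.random.default_rng(1)
T, phi, sigma_eta, sigma = 200, 0.9, 0.5, 1.0
h_true = np.empty(T)
h_true[0] = rng.normal(0.0, sigma_eta / np.sqrt(1.0 - phi**2))
for t in range(T - 1):
    h_true[t + 1] = phi * h_true[t] + sigma_eta * rng.normal()
y = sigma * rng.normal(size=T) * np.exp(0.5 * h_true)

h_hat, acc_rate = single_move_sampler(y, phi, sigma_eta, sigma)
```

Because each $h_t$ moves conditionally on its neighbours, mixing deteriorates as $\phi$ approaches one or $\sigma_\eta$ shrinks, which is the difficulty raised by Kim and Shephard (1994).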
Kim and Shephard (1994) argue that the single-mover algorithm employed by Jacquier, Polson and Rossi will be slow if $\phi$ is close to one and/or $\sigma_\eta^2$ is small. This is because $\sigma_t$ then changes slowly; in fact, when it is constant, the algorithm will not converge at all. Another approach, based on the linear state space form, is to capture the non-normal disturbance term in the measurement equation, $\xi_t$, by a mixture of normals. Watanabe (1993) suggested an approximate method based on a mixture of two moments. Kim and Shephard (1994) propose a multimove sampler based on the linear state space form. Blocks of the $h_t$'s are sampled, rather than taking them one at a time. The technique they use is based on mixing an appropriate number of normal distributions to get the required level of accuracy in approximating the disturbance in (3.2.7). Mahieu and Schotman (1994a) extend this approach by introducing more degrees of freedom in the mixture of normals, where the parameters are estimated rather than fixed a priori. Note that the distribution of the $\sigma_t$'s can be obtained from the simulated distribution of the $h_t$'s.

Jacquier, Polson and Rossi (1994, p. 416) argue that no matter how many mixture components are used in the Kim and Shephard method, the tail behavior of $\log\varepsilon_t^2$ can never be satisfactorily approximated. Indeed, they note that given the discreteness of the Kim and Shephard state space, not all states can be visited in the small number of draws mentioned, i.e. the so-called inlier problem (see also Section 3.4.1 and Nelson (1994)) is still present. As a final point, it should be noted that when the hyperparameters are unknown, the simulated distribution of the state produced by the Bayesian approach allows for their sampling variability.

3.4.3. ARCH models as approximate filters

The purpose here is to draw attention to a subject that will be discussed in greater detail in Section 4.3. In an ARCH model the conditional variance is assumed to be an exact function of past observations. As pointed out by Nelson and Foster (1994, p. 32), this assumption is ad hoc on both economic and statistical grounds. However, because ARCH models are relatively easy to estimate, Nelson (1992) and Nelson and Foster (1994) have argued that a useful strategy is to regard them as filters which produce estimates of the conditional variance. Thus even if we believe we have a continuous time or discrete time SV model, we may decide to estimate a GARCH(1,1) model and treat the $\sigma_t^2$'s as an approximate filter, as in (3.4.1). The estimate is then a weighted average of past squared observations. It delivers an estimate of the mean of the distribution of $\sigma_t^2$, conditional on the observations at time $t-1$. As an alternative, the model suggested by Taylor (1986) and Schwert (1989), in which the conditional standard deviation is set up as a linear combination of the previous conditional standard deviation and the previous absolute value, could be used. This may be more robust to outliers, as it is a linear combination of past absolute values. Nelson and Foster derive an ARCH model which gives the closest approximation to the continuous time SV formulation (see Section 4.3 for more details). This does not correspond to one of the standard models, although it is fairly close to EGARCH.
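The filtering interpretation can be sketched in Python (NumPy assumed) by running the GARCH(1,1) recursion and the Taylor-Schwert recursion with given, not estimated, parameters. The function names, the parameter values and the choice of starting value (the sample variance, or the sample standard deviation) are assumptions of this sketch. The last lines check numerically that the GARCH(1,1) filter is a constant plus an exponentially weighted sum of past squared observations.

```python
import numpy as np

def garch11_filter(y, gamma, alpha, beta, sigma2_0=None):
    """GARCH(1,1) recursion used purely as a filter with given parameters:
    sigma2[t + 1] = gamma + alpha * y[t]**2 + beta * sigma2[t],
    so sigma2[t + 1] uses observations up to y[t] only."""
    y = np.asarray(y, dtype=float)
    sigma2 = np.empty(len(y) + 1)
    sigma2[0] = np.var(y) if sigma2_0 is None else sigma2_0
    for t in range(len(y)):
        sigma2[t + 1] = gamma + alpha * y[t]**2 + beta * sigma2[t]
    return sigma2

def taylor_schwert_filter(y, gamma, alpha, beta, sigma_0=None):
    """Taylor (1986) / Schwert (1989) type recursion for the conditional standard
    deviation: sigma[t + 1] = gamma + alpha * |y[t]| + beta * sigma[t]."""
    y = np.asarray(y, dtype=float)
    sigma = np.empty(len(y) + 1)
    sigma[0] = np.std(y) if sigma_0 is None else sigma_0
    for t in range(len(y)):
        sigma[t + 1] = gamma + alpha * abs(y[t]) + beta * sigma[t]
    return sigma

# Unrolling the GARCH recursion gives the weighted-average form:
# sigma2_T = gamma (1 - beta^T) / (1 - beta) + alpha * sum_j beta^j y_{T-1-j}^2 + beta^T sigma2_0
rng = np.random.default_rng(0)
y = 0.01 * rng.standard_t(df=5, size=1000)
gamma, alpha, beta = 1e-6, 0.05, 0.9
s2 = garch11_filter(y, gamma, alpha, beta)
T = len(y)
j = np.arange(T)
unrolled = (gamma * (1 - beta**T) / (1 - beta)
            + alpha * np.sum(beta**j * y[::-1]**2)
            + beta**T * s2[0])
```

Working with absolute rather than squared values keeps a single large observation from dominating the Taylor-Schwert filter as strongly as it does the GARCH filter.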
For discrete time SV models the filtering theory is not as extensively developed. Indeed, Nelson and Foster point out that a change from stochastic differential equations to difference equations makes a considerable difference in the limit theorems and optimality theory. They study the case of near diffusions as an example to illustrate these differences.

3.5. Extensions of the model

3.5.1. Persistence and seasonality

The simplest nonstationary SV model has $h_t$ following a random walk. The dynamic properties of this model are easily obtained if we work in terms of the logarithmically transformed observations, $\log y_t^2$. All we have to do is take first differences to give a stationary process. The untransformed observations are nonstationary, but the dynamic structure of the model will appear in the ACF of $|y_t/y_{t-1}|^c$, provided that $c < 0.5$. The model is an alternative to IGARCH, that is (3.3.1) with $\alpha + \beta = 1$. The IGARCH model is such that the squared observations have some of the features of an integrated ARMA process, and it is said to exhibit persistence; see Bollerslev and Engle (1993). However, its properties are not straightforward. For example, it must contain a constant, $\gamma$; otherwise, as Nelson (1990) has shown, $\sigma_t^2$ converges almost surely to zero and the model has the peculiar feature of being strictly stationary but not weakly stationary. The nonstationary SV model, on the other hand, can be analyzed on the basis that $h_t$ is a standard integrated process of order one. Filtering and smoothing can be carried out within the linear state space framework, since $\log y_t^2$ is just a random walk plus noise. The initial conditions are handled in the same way as is normally done with nonstationary structural time series models, with a proper prior for the state being effectively formed from the first observation; see Harvey (1989). The optimal filtered estimate of $h_t$ within the class of estimates which are linear in past $\log y_t^2$'s, that is $h_{t|t-1}$, is a constant plus an exponentially weighted moving average (EWMA) of past $\log y_t^2$'s. In IGARCH, $\sigma_t^2$ is given exactly by a constant plus an EWMA of past squared observations.

The random walk volatility can be replaced by other nonstationary specifications. One possibility is the doubly integrated random walk, in which $\Delta^2 h_t$ is white noise. When formulated in continuous time, this model is equivalent to a cubic spline and is known to give a relatively smooth trend when applied in levels models. It is attractive in the SV context if the aim is to find a weighting function which fits a smoothly evolving variance. However, it may be less stable for prediction. Other nonstationary components can easily be brought into $h_t$. For example, a seasonal or intra-daily component can be included; the specification is exactly as in the corresponding levels models discussed in Harvey (1989) and Harvey and Koopman (1993). Again the dynamic properties are given straightforwardly by the usual transformation applied to $\log y_t^2$, and it is not difficult to transform the absolute values suitably.
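For the random walk case, a small Python sketch (NumPy assumed) shows where the EWMA comes from: in the random walk plus noise model the Kalman gain converges to a constant $k$, and the one-step prediction becomes $h_{t+1|t} = (1-k)h_{t|t-1} + k\log y_t^2$. The signal-noise ratio $q = \sigma_\eta^2/\sigma_\xi^2$, its value 0.05 and the function names are illustrative assumptions; treating $\xi_t$ as Gaussian is the same approximation as in Section 3.4.1.

```python
import numpy as np

def steady_state_gain(q):
    """Steady-state Kalman gain for the random walk plus noise model
    x_t = h_t + xi_t, h_{t+1} = h_t + eta_t, with q = var(eta_t) / var(xi_t).

    The predicted-state MSE p (in units of var(xi_t)) solves p^2 - q p - q = 0.
    """
    p = (q + np.sqrt(q**2 + 4.0 * q)) / 2.0
    return p / (p + 1.0)

def ewma_filter(x, q, h0=0.0):
    """Steady-state filter h_{t+1|t} = (1 - k) h_{t|t-1} + k x_t, i.e. an EWMA of
    past x's with weights k (1 - k)^j. Here x would be log y_t^2; any constant
    can be absorbed into the random walk level."""
    k = steady_state_gain(q)
    h = np.empty(len(x) + 1)
    h[0] = h0
    for t, xt in enumerate(x):
        h[t + 1] = (1.0 - k) * h[t] + k * xt
    return h

# The closed-form gain agrees with the limit of the Riccati recursion.
q = 0.05
p = 10.0
for _ in range(500):
    p = p - p**2 / (p + 1.0) + q
k_iterated = p / (p + 1.0)
impulse_response = ewma_filter(np.array([1.0, 0.0, 0.0]), q)
```

With $q = 0.05$ the gain is $k = 0.2$, so an observation's weight in the filtered estimate falls by a factor of 0.8 each period.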
Thus if the volatility consists of a random walk plus a slowly changing, nonstationary seasonal as in Harvey (1989, pp. 40-3), the appropriate transformations are $\Delta_s\log y_t^2$ and $|y_t/y_{t-s}|^c$, where $s$ is the number of seasons. The state space formulation follows along the lines of the corresponding structural time series models for levels. Handling such effects is not so easy within the GARCH framework.

Different approaches to seasonality can also be incorporated in SV models using ideas of time deformation, as discussed in a later subsection. Such approaches may be particularly relevant when dealing with the kind of abrupt changes in seasonality which seem to occur in high frequency foreign exchange data, such as five-minute or tick-by-tick data.

3.5.2. Interventions and other deterministic effects

Intervention variables are easily incorporated into SV models. For example, a sudden structural change in the volatility process can be captured by assuming that

$$\log\sigma_t^2 = \log\sigma^2 + h_t + \lambda w_t,$$

where $w_t$ is zero before the break and one after, and $\lambda$ is an unknown parameter. The logarithmic transformation gives (3.2.8), but with $\lambda w_t$ added to the right hand side. Care needs to be taken when incorporating such effects into ARCH models. For example, in the GARCH(1,1) model a sudden break has to be modelled as

$$\sigma_t^2 = \gamma + \lambda w_t - (\alpha + \beta)\lambda w_{t-1} + \alpha y_{t-1}^2 + \beta\sigma_{t-1}^2,$$

with $\lambda$ constrained so that $\sigma_t^2$ is always positive. More generally, observable explanatory variables, as opposed to intervention dummies, may enter into the model for the variance.

3.5.3. Multivariate models

The multivariate model corresponding to (3.1.2) assumes that each series is generated by a model of the form

$$y_{it} = \sigma_i\varepsilon_{it}\exp(0.5h_{it}), \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T, \tag{3.5.1}$$

with the covariance (correlation) matrix of the vector $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{Nt})'$ being denoted by $\Sigma_\varepsilon$. The vector of volatilities, $h_t$, follows a VAR(1) process, that is

$$h_{t+1} = \Phi h_t + \eta_t,$$

where $\eta_t \sim IID(0, \Sigma_\eta)$. This specification allows the movements in volatility to be correlated across different series via $\Sigma_\eta$. Interactions can be picked up by the off-diagonal elements of $\Phi$. The logarithmic transformation of squared observations leads to a multivariate linear state space model from which estimates of the volatilities can be computed as in Section 3.4.1.

A simple nonstationary model is obtained by assuming that the volatilities follow a multivariate random walk, that is $\Phi = I$. If $\Sigma_\eta$ is singular, of rank $K < N$, there are only $K$ components in volatility, that is each $h_{it}$ in (3.5.1) is a linear combination of $K < N$ common trends:

$$h_t = \Theta h_t^\dagger + \bar h, \tag{3.5.2}$$

where $h_t^\dagger$ is the $K \times 1$ vector of common random walk volatilities, $\bar h$ is a vector of constants and $\Theta$ is an $N \times K$ matrix of factor loadings. Certain restrictions are needed on $\Theta$ and $\bar h$ to ensure identifiability; see Harvey, Ruiz and Shephard (1994).
The logarithms of the squared observations are "co-integrated" in the sense of Engle and Granger (1987), since there are $N - K$ linear combinations of them which are white noise and hence stationary. This implies, for example, that if two series of returns exhibit stochastic volatility, but this volatility is the same, with $\Theta' = (1, 1)$, then the ratio of the series will have no stochastic volatility. The application of the related concept of "co-persistence" can be found in Bollerslev and Engle (1993). However, as in the univariate case, there is some ambiguity about what actually constitutes persistence.
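The co-integration point can be checked with a short Python simulation (NumPy assumed) of (3.5.1) and (3.5.2) with $N = 2$, $K = 1$ and $\Theta' = (1, 1)$. The scale factors, the random walk disturbance standard deviation, the zero constants $\bar h$ and the choice of independent $\varepsilon_{1t}$, $\varepsilon_{2t}$ (that is, $\Sigma_\varepsilon = I$) are assumptions made only for illustration.

```python
import numpy as np

# Two return series (N = 2) sharing one random walk volatility (K = 1), as in (3.5.2).
rng = np.random.default_rng(0)
T, sigma_eta = 1000, 0.1
scale = np.array([0.01, 0.02])      # sigma_1, sigma_2 (illustrative values)
theta = np.array([1.0, 1.0])        # Theta' = (1, 1)
h_bar = np.zeros(2)                 # vector of constants, set to zero for simplicity

h_dagger = np.cumsum(sigma_eta * rng.normal(size=T))   # common random walk volatility
h = np.outer(h_dagger, theta) + h_bar                  # T x 2 matrix of h_it
eps = rng.normal(size=(T, 2))                          # eps_1t, eps_2t independent here
y = scale * eps * np.exp(0.5 * h)                      # y_it = sigma_i eps_it exp(0.5 h_it)

ratio = y[:, 0] / y[:, 1]                              # the common volatility cancels
log_sq = np.log(y**2)
spread = log_sq[:, 0] - log_sq[:, 1]                   # co-integrating combination of the log y_it^2
```

Each $\log y_{it}^2$ inherits the random walk $h_t^\dagger$, but `spread` depends only on $\varepsilon_{1t}$ and $\varepsilon_{2t}$ and is therefore white noise.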

There is no reason why the idea of common components in volatility should not extend to stationary models. The formulation of (3.5.2) would apply, without the need for $\bar h$, and with $h_t^\dagger$ modelled, for example, by a VAR(1). Bollerslev, Engle and Wooldridge (1988) show that a multivariate GARCH model can, in principle, be estimated by maximum likelihood, but because of the large number of parameters involved, computational problems are often encountered unless restrictions are made. The multivariate SV model is much simpler than the general formulation of a multivariate GARCH. However, it is limited in that it does not model changing covariances. In this sense it is analogous to the restricted multivariate GARCH model of Bollerslev (1986), in which the conditional correlations are assumed to be constant. Harvey, Ruiz and Shephard (1994) apply the nonstationary model to four exchange rates and find just two common factors driving volatility. Another application is in Mahieu and Schotman (1994b). A completely different way of modelling exchange rate volatility is to be found in the latent factor ARCH model of Diebold and Nerlove (1989).

3.5.4. Observation intervals, aggregation and time deformation

Suppose that an SV model is observed every $\delta$ time periods. In this case, $h_\tau$, where $\tau$ denotes the new observation (sampling) interval, is still AR(1) but with parameter $\phi^\delta$. The variance of the disturbance, $\eta_\tau$, increases, but $\sigma_h^2$ remains the same. This property of the SV model makes it easy to make comparisons across different sampling intervals; for example, it makes it clear why, if $\phi$ is around 0.98 for daily observations, a value of around 0.9 can be expected if an observation is made every week (assuming a week has 5 days). If averages of observations are observed over the longer period, the comparison is more complicated, as $h_\tau$ will now follow an ARMA(1,1) process. However, the AR parameter is still $\phi^\delta$.
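The skip-sampling arithmetic can be written out as a small Python function (NumPy assumed; the function name and the daily disturbance variance of 0.01 are hypothetical). Observing $h_t$ every $\delta$ periods gives $h_{t+\delta} = \phi^\delta h_t + \sum_{j=0}^{\delta-1}\phi^j\eta_{t+\delta-1-j}$, so the disturbance variance becomes $\sigma_\eta^2(1-\phi^{2\delta})/(1-\phi^2)$ while $\sigma_h^2 = \sigma_\eta^2/(1-\phi^2)$ is unchanged.

```python
import numpy as np

def skip_sample_ar1(phi, sigma_eta2, delta):
    """AR(1) parameters of h when it is observed only every delta periods.

    h_{t+delta} = phi**delta * h_t + sum_{j<delta} phi**j * eta_{t+delta-1-j},
    so the new disturbance variance is sigma_eta2 * (1 - phi**(2*delta)) / (1 - phi**2).
    """
    phi_d = phi**delta
    sigma_eta2_d = sigma_eta2 * (1.0 - phi**(2 * delta)) / (1.0 - phi**2)
    return phi_d, sigma_eta2_d

phi_daily, s2_daily = 0.98, 0.01          # illustrative daily values
phi_weekly, s2_weekly = skip_sample_ar1(phi_daily, s2_daily, delta=5)
var_h_daily = s2_daily / (1 - phi_daily**2)
var_h_weekly = s2_weekly / (1 - phi_weekly**2)
print(round(phi_weekly, 4))               # 0.9039
```

The weekly AR parameter $0.98^5 \approx 0.904$ matches the value of around 0.9 quoted in the text.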
Note that it is difficult to change the observation interval of ARCH processes unless the structure is weakened as in Drost and Nijman (1993); see also Section 4.4.1.

Since, as noted in Section 2.4, one typically uses a discrete time approximation to the continuous time model, it is quite straightforward to handle irregularly spaced observations by using the linear state space form as described, for example, in Harvey (1989). Indeed, the approach originally proposed by Clark (1973), based on subordinated processes to describe asset prices and their volatility, fits quite well into this framework. The techniques for handling irregularly spaced observations can be used as the basis for dealing with time deformed observations, as noted by Stock (1988). Ghysels and Jasiak (1994a,b) suggest an SV model in which the operational time for the continuous time volatility equation is determined by the flow of information. Such time deformed processes may be particularly suited to dealing with high frequency data. If $\tau = g(t)$ is the mapping between calendar time $t$ and operational time $\tau$, then

$$dS_t = \mu S_t\,dt + \sigma(g(t))\,S_t\,dW_{1t}$$

and

$$d\log\sigma(\tau) = a\left(b - \log\sigma(\tau)\right)d\tau + c\,dW_{2\tau},$$

where $W_{1t}$ and $W_{2\tau}$ are standard, independent Wiener processes. The discrete time approximation generalizing (3.1.3), but including a term which in (3.1.2) is incorporated in the constant scale factor $\sigma$, is then

$$h_{t+1} = \left[1 - e^{-a\Delta g(t)}\right]b + e^{-a\Delta g(t)}h_t + \eta_t,$$

where $\Delta g(t)$ is the change in operational time between two consecutive calendar time observations and $\eta_t$ is normally distributed with mean zero and variance $c^2\left(1 - e^{-2a\Delta g(t)}\right)/2a$. Clearly, if $\Delta g(t) = 1$, then $\phi = e^{-a}$ in (3.1.3). Since the flow of information, and hence $\Delta g(t)$, is not directly observable, a mapping to calendar time must be specified to make the model operational. Ghysels and Jasiak (1994a) discuss several specifications revolving around a scaled exponential function relating $g(t)$ to observables such as past volume of trade and past price changes, with asymmetric leverage effects. This approach was also used by Ghysels and Jasiak (1994b) to model return-volume co-movements and by Ghysels, Gourieroux and Jasiak (1995b) for modelling intra-daily high frequency data which exhibit strong seasonal patterns (cf. Section 3.5.1).

3.5.5. Long memory

Baillie, Bollerslev and Mikkelsen (1993) propose a way of extending the GARCH class to account for long memory. They call their models Fractionally Integrated GARCH (FIGARCH), and the key feature is the inclusion of the fractional difference operator, $(1-L)^d$, where $L$ is the lag operator, in the lag structure of past squared observations in the conditional variance equation. However, this model can only be stationary when $d = 0$, in which case it reduces to GARCH. In a later paper, Bollerslev and Mikkelsen (1995) consider a generalization of the EGARCH model of Nelson (1991) in which $\log\sigma_t^2$ is modelled as a distributed lag of past $\varepsilon_t$'s involving the fractional difference operator. This FIEGARCH model is stationary and invertible if $|d| < 0.5$.
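The fractional difference operator can be expanded with a simple recursion, and the stationary fractional noise it generates has a closed-form autocorrelation function. The Python sketch below (standard library only; function names hypothetical) computes both, assuming $0 < d < 0.5$ for the autocorrelations; the ACF expression is the standard one for fractionally integrated noise, and the comparison with an AR(1) with $\phi = 0.99$ simply mirrors the discussion of long memory SV models that follows.

```python
import math

def frac_diff_weights(d, n):
    """First n coefficients pi_j in (1 - L)^d = sum_j pi_j L^j,
    via pi_0 = 1 and pi_j = pi_{j-1} (j - 1 - d) / j."""
    w = [1.0]
    for j in range(1, n):
        w.append(w[-1] * (j - 1 - d) / j)
    return w

def frac_noise_acf(d, lag):
    """Autocorrelation at lag >= 1 of stationary fractional noise (1 - L)^d h_t = eta_t,
    0 < d < 0.5: Gamma(lag + d) Gamma(1 - d) / (Gamma(lag - d + 1) Gamma(d))."""
    return math.exp(math.lgamma(lag + d) + math.lgamma(1.0 - d)
                    - math.lgamma(lag - d + 1.0) - math.lgamma(d))

# Hyperbolic versus exponential decay: d = 0.45 against an AR(1) with phi = 0.99.
for lag in (1, 10, 100, 1000):
    print(lag, frac_noise_acf(0.45, lag), 0.99**lag)
```

For large lags the fractional noise autocorrelations decline roughly like $\tau^{2d-1}$, that is hyperbolically, whereas the AR(1) autocorrelations decline geometrically.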
Breidt, Crato and de Lima (1993) and Harvey (1993) propose an SV model with $h_t$ generated by fractional noise

$$h_t = \eta_t/(1-L)^d, \qquad \eta_t \sim NID(0, \sigma_\eta^2), \qquad 0 < d < 1. \tag{3.5.1}$$

Like the AR(1) model in (3.1.3), this process reduces to white noise and a random walk at the boundary of the parameter space, that is $d = 0$ and $1$ respectively. However, it is only stationary if $d < 0.5$. Thus the transition from stationarity to nonstationarity proceeds in a different way from the AR(1) model. As in the AR(1) case, it is reasonable to constrain the autocorrelations in (3.5.1) to be positive. However, a negative value of $d$ is quite legitimate, and indeed differencing $h_t$ when it is nonstationary gives a stationary "intermediate memory" process in which $-0.5 < d < 0$. The properties of the long memory SV model can be obtained from the formulae in Section 3.2. A comparison of the ACF for $h_t$ following a long memory process with $d = 0.45$ and $\sigma_h^2 = 2$ with the corresponding ACF when $h_t$ is AR(1) with $\phi = 0.99$ can be found in Harvey (1993). Recall that a characteristic property of long memory is a hyperbolic rate of decay for the autocorrelations instead of an exponential rate, a feature observed in the data (see Section 2.2e). The slower decline in the long memory model is very clear and, in fact, for $\tau = 1000$, the long memory autocorrelation is still 0.14, whereas in the AR case it is only 0.000013. The long memory shape closely matches that in Ding, Granger and Engle (1993, pp. 86-8). The model may be extended by letting $\eta_t$ be an ARMA process and/or by adding more components to the volatility equation.

As regards smoothing and filtering, it has already been noted that the state space approach is approximate because of the truncation involved, and is relatively cumbersome because of the length of the state vector. Exact smoothing and filtering, which is optimal within the class of estimators linear in the $\log y_t^2$'s, can be carried out by a direct approach if one is prepared to construct and invert the $T \times T$ covariance matrix of the $\log y_t^2$'s.

4. Continuous time models

At the end of Section 2 we presented a framework for statistical modelling of SV in discrete time and devoted the entire Section 3 to specific discrete time SV models. To motivate the continuous time models, we first study the exact relationship (i.e. without approximation error) between differential equations and SV models in discrete time. We examine this relationship in Section 4.1 via a class of statistical models which are closed under temporal aggregation, and proceed (1) from high frequency discrete time to lower frequencies and (2) from continuous time to discrete time. Next, in Section 4.2, we study option pricing and hedging with continuous time models and elaborate on features such as the smile effect.
The practical implementation of option pricing formulae with SV often requires discrete time SV and/or ARCH models as filters and forecasters of the continuous time volatility processes. Such filters, covered in Section 4.3, are in general discrete time approximations (and not exact discretizations as in Section 4.1) of continuous time SV models. Section 4.4 concludes with extensions of the basic model.

4.1. From discrete to continuous time

The purpose of this section is to provide a rigorous discussion of the relationship between discrete and continuous time SV models. The presentation will proceed first with a discussion of temporal aggregation in the context of the SARV class of models and focus on specific cases including GARCH models. This material is covered in Section 4.1.1. Next we turn our attention to the aggregation of continuous time SV models to yield discrete time representations. This is the subject matter of Section 4.1.2.

4.1.1. Temporal aggregation of discrete time models

Andersen's SARV class of models was presented in Section 2.4 as a general discrete time parametric SV statistical model. Let us consider the zero-mean case, namely:

$$y_{t+1} = \sigma_t\varepsilon_{t+1} \tag{4.1.1}$$

and $\sigma_t^q$, for $q = 1$ or $2$, is a polynomial function $g(K_t)$ of the Markov process $K_t$ with stationary autoregressive representation:

$$K_t = \omega + \beta K_{t-1} + v_t \tag{4.1.2}$$

where $|\beta| < 1$ and

$$E[\varepsilon_{t+1}\mid\varepsilon_\tau, v_\tau, \tau \le t] = 0 \tag{4.1.3a}$$
$$E[\varepsilon_{t+1}^2\mid\varepsilon_\tau, v_\tau, \tau \le t] = 1 \tag{4.1.3b}$$
$$E[v_{t+1}\mid\varepsilon_\tau, v_\tau, \tau \le t] = 0. \tag{4.1.3c}$$

The restrictions (4.1.3a-c) imply that $v$ is a martingale difference sequence with respect to the filtration $\mathcal{F}_t = \sigma[\varepsilon_\tau, v_\tau, \tau \le t]$.²² Moreover, the conditional moment conditions in (4.1.3a-c) also imply that $\varepsilon$ in (4.1.1) is a white noise process in a semi-strong sense, i.e. $E[\varepsilon_{t+1}\mid\varepsilon_\tau, \tau \le t] = 0$ and $E[\varepsilon_{t+1}^2\mid\varepsilon_\tau, \tau \le t] = 1$, and is not Granger-caused by $v$.²³

²² Note that we do not use here the decomposition appearing in (2.4.9), namely $v_t = [\gamma + \alpha K_{t-1}]u_t$.
²³ The Granger noncausality considered here for $\varepsilon_t$ is weaker than Assumption 2.3.2.A, as it applies only to the first two conditional moments.

From the very beginning of Section 2 we chose the continuously compounded rate of return over a particular time horizon as the starting point for continuous time processes. Therefore, let $y_{t+1}$ in (4.1.1) be the continuously compounded rate of return for $[t, t+1]$ of the asset price process $S_t$; consequently:

$$y_{t+1} = \log S_{t+1}/S_t. \tag{4.1.4}$$

Since the unit of time of the sampling interval is to a large extent arbitrary, we would surely want the SV model defined by equations (4.1.1) through (4.1.3) (for given $q$ and function $g$) to be closed under temporal aggregation. As rates of return are flow variables, closure under temporal aggregation means that for any integer $m$,

$$y_{tm}^{(m)} = \log S_{tm}/S_{tm-m} = \sum_{k=0}^{m-1} y_{tm-k}$$

is again conformable to a model of the type (4.1.1) through (4.1.3) for the same choice of $q$ and $g$, involving suitably adapted parameter values. The analysis in this section follows Meddahi and Renault (1995), who study temporal aggregation of SV models in detail, particularly the case $\sigma_t^2 = K_t$, i.e. $q = 2$ and $g$ is the identity function. It is related to the so-called continuous time GARCH approach of Drost and Werker (1994). Hence, we have (4.1.1) with:

$$\sigma_t^2 = \omega + \beta\sigma_{t-1}^2 + v_t. \tag{4.1.5}$$

With conditional moment restrictions (4.1.3a-c) this model is closed under aggregation. For instance, for $m = 2$:

$$y_{t+1}^{(2)} = y_{t+1} + y_t = \sigma_{t-1}^{(2)}\varepsilon_{t+1}^{(2)}$$

with

$$\left(\sigma_{t-1}^{(2)}\right)^2 = \omega^{(2)} + \beta^{(2)}\left(\sigma_{t-3}^{(2)}\right)^2 + v_{t-1}^{(2)},$$

where

$$\omega^{(2)} = 2\omega(1+\beta), \qquad \beta^{(2)} = \beta^2, \qquad v_{t-1}^{(2)} = (\beta+1)\left[\beta v_{t-2} + v_{t-1}\right].$$

Moreover, it is also worth noting that whenever a leverage effect is present at the aggregate level, i.e.

$$\mathrm{Cov}\left(v_{t-1}^{(2)}, \varepsilon_{t-1}^{(2)}\right) \ne 0 \quad\text{with}\quad \varepsilon_{t-1}^{(2)} = (y_{t-1} + y_{t-2})/\sigma_{t-3}^{(2)},$$

it necessarily appears at the disaggregate level, i.e. $\mathrm{Cov}(v_t, \varepsilon_t) \ne 0$.

For the general case, Meddahi and Renault (1995) show that model (4.1.5), together with conditional moment restrictions (4.1.3a-c), is a class of processes closed under aggregation. Given this result, it is of interest to draw a comparison with the work of Drost and Nijman (1993) on temporal aggregation of GARCH. While establishing this link between Meddahi and Renault (1995) and Drost and Nijman (1993), we also uncover issues of leverage properties in GARCH models. Indeed, contrary to what is often believed, we find leverage effect restrictions in GARCH processes. Moreover, we also find from the results of Meddahi and Renault that the class of weak GARCH processes includes certain SV models.

To find a class of GARCH processes which is closed under aggregation, Drost and Nijman (1993) weakened the definition of GARCH: for a positive stationary process $\sigma_t$ satisfying

$$\sigma_t^2 = w + a y_t^2 + b\sigma_{t-1}^2, \tag{4.1.6}$$

where $a + b < 1$, they defined:

- strong GARCH if $y_{t+1}/\sigma_t$ is i.i.d. with mean zero and variance 1;
- semi-strong GARCH if $E[y_{t+1}\mid y_\tau, \tau \le t] = 0$ and $E[y_{t+1}^2\mid y_\tau, \tau \le t] = \sigma_t^2$;
- weak GARCH if $EL[y_{t+1}\mid y_\tau, y_\tau^2, \tau \le t] = 0$ and $EL[y_{t+1}^2\mid y_\tau, y_\tau^2, \tau \le t] = \sigma_t^2$.²⁴

Drost and Nijman show that weak GARCH processes temporally aggregate and provide explicit formulae for their coefficients. In Section 2.4 it was noted that the framework of SARV includes GARCH processes whenever there is no randomness specific to the volatility process. This property will allow us to show that the class of weak GARCH processes, as defined above, in fact includes more general SV processes which are, strictly speaking, not GARCH. The arguments, following Meddahi and Renault (1995), require a classification of the models defined by (4.1.3) and (4.1.5) according to the value of the correlation between $v_t$ and $y_t^2$, namely:

(a) Models with perfect correlation: this first class, henceforth denoted $C_1$, is characterized by a linear correlation between $v_t$ and $y_t^2$, conditional on $(\varepsilon_\tau, v_\tau, \tau < t)$, which is either 1 or $-1$ for the model in (4.1.5).
(b) Models without perfect correlation: this second class, henceforth denoted $C_2$, has the above conditional correlation less than one in absolute value.

The class $C_1$ contains all semi-strong GARCH processes; indeed, whenever $\mathrm{Var}[y_t^2\mid\varepsilon_\tau, v_\tau, \tau < t]$ is proportional to $\mathrm{Var}[v_t\mid\varepsilon_\tau, v_\tau, \tau < t]$ in $C_1$, we have a semi-strong GARCH. Consequently, a semi-strong GARCH process is a model (4.1.5) with (1) restrictions (4.1.3), (2) a perfect conditional correlation as in $C_1$, and (3) restrictions on the conditional kurtosis dynamics.²⁵

Let us now consider the following assumption:

Assumption 4.1.1. The following two conditional expectations are zero:

$$E[\varepsilon_t v_t\mid\varepsilon_\tau, v_\tau, \tau < t] = 0 \tag{4.1.7a}$$
$$E[\varepsilon_t^3\mid\varepsilon_\tau, v_\tau, \tau < t] = 0. \tag{4.1.7b}$$

This assumption amounts to an absence of leverage effects, where the latter is defined in a conditional covariance sense to capture the notion of instantaneous causality discussed in Section 2.4.1 and applied here in the context of weak white noise.²⁶ It should also be noted that (4.1.7a) and (4.1.7b) are in general not equivalent, except for the processes of class $C_1$.

²⁴ For any Hilbert space $H$ of $L^2$, $EL[x_t\mid z, z \in H]$ is the best linear predictor of $x_t$ in terms of 1 and $z \in H$. It should be noted that a strong GARCH process is a fortiori semi-strong, which itself is also a weak GARCH process.
²⁵ In fact, Nelson and Foster (1994) observed that the most commonly used ARCH models effectively assume that the variance of the variance rises linearly in $\sigma_t^4$, which is the main drawback of ARCH models in approximating SV models in continuous time (see also Section 4.3).
²⁶ The conditional expectation (4.1.7b) can be viewed as a conditional covariance between $\varepsilon_t$ and $\varepsilon_t^2$. It is this conditional covariance which, if nonzero, produces leverage effects in GARCH.

The class $C_2$ allows for randomness proper to the volatility process due to the imperfect correlation. Yet, despite this volatility-specific randomness, one can show that under Assumption 4.1.1 processes of $C_2$ satisfy the weak GARCH definition. A fortiori, any SV model conformable to (4.1.3a-c), (4.1.5) and Assumption 4.1.1 (that is, (4.1.7a-b)) is a weak GARCH process. It is indeed the symmetry assumptions (4.1.7a-b), or restrictions on leverage in GARCH, that make $EL[y_{t+1}^2\mid y_\tau, y_\tau^2, \tau \le t] = \sigma_t^2$ (together with the conditional moment restrictions (4.1.3a-c)) and yield the internal consistency for temporal aggregation found by Drost and Nijman (1993, example 2, p. 915) for the class of so-called symmetric weak GARCH(1,1). Hence, this class of weak GARCH(1,1) processes can be viewed as a subclass of processes satisfying (4.1.3) and (4.1.5).²⁷

4.1.2. Temporal aggregation of continuous time models

To facilitate our discussion we will specialize the general continuous time model (2.3.1) to processes with zero drift, i.e.:

$$d\log S_t = \sigma_t\,dW_t \tag{4.1.8a}$$
$$d\sigma_t = \gamma_t\,dt + \delta_t\,dW_t^\sigma \tag{4.1.8b}$$
$$\mathrm{Cov}\left(dW_t, dW_t^\sigma\right) = \rho_t\,dt \tag{4.1.8c}$$

where the stochastic processes $\sigma_t$, $\gamma_t$, $\delta_t$ and $\rho_t$ are $I_t = \sigma[\sigma_\tau; \tau \le t]$ adapted. To ensure that $\sigma_t$ is a nonnegative process, one typically follows one of two strategies: (1) considering a diffusion for $\log\sigma_t^2$, or (2) describing $\sigma_t^2$ as a CEV process (or Constant Elasticity of Variance process, following Cox (1975) and Cox and Ross (1976)).²⁸ The former is frequently encountered in the option pricing literature (see e.g. Wiggins (1987)) and is also clearly related to Nelson (1991), who introduced EGARCH, and to the log-Normal SV model of Taylor (1986). The second class of CEV processes can be written as

$$d\sigma_t^2 = k\left(\theta - \sigma_t^2\right)dt + \gamma\left(\sigma_t^2\right)^\delta dW_t^\sigma \tag{4.1.9}$$

where $\delta \ge 1/2$ ensures that $\sigma_t^2$ is a stationary process with nonnegative values. Equation (4.1.9) can be viewed as the continuous time analogue of the discrete time SARV class of models presented in Section 2.4. This observation establishes links with the discussion of the previous Section 4.1.1 and yields exact discretization results for continuous time SV models.
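Because the drift in (4.1.9) is linear in $\sigma_t^2$, the conditional mean over an interval $\Delta t$ is $\theta(1-e^{-k\Delta t}) + e^{-k\Delta t}\sigma_t^2$ whatever the value of $\delta$, which is the autoregressive structure behind the exact discretization (4.1.10). The Python sketch below (NumPy assumed) checks this numerically: it simulates (4.1.9) with an Euler scheme, an approximation chosen here for simplicity, from many starting values and regresses the end-of-period variance on the starting one. Parameter values and function names are illustrative assumptions.

```python
import numpy as np

def simulate_cev_variance(v0, k, theta, gamma, delta, horizon=1.0, n_steps=200, seed=0):
    """Euler approximation to d(sigma^2) = k (theta - sigma^2) dt + gamma (sigma^2)^delta dW,
    run in parallel from each starting value in v0 and returned at t = horizon.
    Negative excursions of the scheme are truncated at zero inside the coefficients."""
    rng = np.random.default_rng(seed)
    v = np.array(v0, dtype=float)
    dt = horizon / n_steps
    for _ in range(n_steps):
        v_pos = np.maximum(v, 0.0)
        dw = np.sqrt(dt) * rng.normal(size=v.shape)
        v = v + k * (theta - v_pos) * dt + gamma * v_pos**delta * dw
    return np.maximum(v, 0.0)

# Regress sigma^2_{t+1} on sigma^2_t across many independent unit-time paths.
k, theta, gamma, delta = 0.5, 1.0, 0.3, 0.5
v_start = np.random.default_rng(1).uniform(0.2, 3.0, size=20000)
v_end = simulate_cev_variance(v_start, k, theta, gamma, delta)
slope, intercept = np.polyfit(v_start, v_end, 1)
# Exact discretization predicts slope e^{-k} and intercept theta (1 - e^{-k}).
```

The fitted slope and intercept should be close to $e^{-k} \approx 0.607$ and $\theta(1-e^{-k}) \approx 0.393$, up to Monte Carlo noise and the small Euler bias.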
Here, as in the previous section, it will be tempting to draw comparisons with the GARCH class of models, in particular with the diffusions proposed by Drost and Werker (1994) in line with the temporal aggregation of weak GARCH processes.

[27] As noted before, the class of processes satisfying (4.1.3) and (4.1.5) is closed under temporal aggregation, including processes with leverage effects not satisfying Assumption 4.1.1.
[28] Occasionally one encounters specifications which do not ensure nonnegativity of the $\sigma_t$ process. For the sake of computational simplicity, some authors have for instance considered Ornstein-Uhlenbeck processes for $\sigma_t$ or $\sigma_t^2$ (see e.g. Stein and Stein (1991)).

158 E. Ghysels, A. C. Harvey and E. Renault

Firstly, one should note that the CEV process in (4.1.9) implies an autoregressive model in discrete time for $\sigma_t^2$, namely:

$$\sigma_{t+\Delta t}^2 = \theta\left(1 - e^{-k\Delta t}\right) + e^{-k\Delta t}\sigma_t^2 + e^{-k\Delta t}\int_t^{t+\Delta t} e^{k(u-t)}\,\gamma(\sigma_u^2)^{\delta}\,dW_u^{\sigma}. \tag{4.1.10}$$

Meddahi and Renault (1995) show that whenever (4.1.9) and its discretization (4.1.10) govern volatility, the discrete time process $\log S_{t+(j+1)\Delta t}/S_{t+j\Delta t}$, $j \in \mathbb{Z}$, is an SV process satisfying the model restrictions (4.1.3a-c) and (4.1.5). Hence, from the diffusion (4.1.9) we obtain the class of discrete time SV models which is closed under temporal aggregation, as discussed in the previous section. To be more specific, consider for instance $\Delta t = 1$; then from (4.1.10) it follows that:

$$y_{t+1} = \log S_{t+1}/S_t = \sigma_t(1)\,\varepsilon_{t+1}, \qquad (\sigma_{t+1}(1))^2 = \omega + \rho\,(\sigma_t(1))^2 + \nu_{t+1} \tag{4.1.11}$$

where from (4.1.10):

$$\rho = e^{-k}, \qquad \omega = \theta\left(1 - e^{-k}\right), \qquad \ldots \tag{4.1.12}$$

It is important to note from (4.1.12) that absence of leverage effect in continuous time, i.e. $\rho_t = 0$ in (4.1.8c), means no such effect at low frequencies, and the two symmetry conditions of Assumption 4.1.1 are fulfilled. This line of reasoning also explains the temporal aggregation result of Drost and Werker (1994), but, more generally, one can interpret discrete time SV models with leverage effects as exact discretizations of continuous time SV models with leverage.

4.2. Option pricing and hedging

Section 4.2.1 is devoted to the basic option pricing model with SV, namely the Hull and White model of Section 2. We are now better equipped to elaborate on its theoretical foundations. The practical implications appear in Section 4.2.2, while Section 4.2.3 concludes with some extensions of the basic model.

4.2.1. The basic option pricing formula

Consider again formula (2.1.10) for a European option contract maturing at time $t + h = T$. As noted in Section 2.1.2, we assume continuous and frictionless trading.
Moreover, no arbitrage profits can be made from trading in the underlying asset and riskless bonds; interest rates are nonstochastic, so that $B(t, T)$ defined by (2.1.12) denotes the time $t$ price of a unit discount bond maturing at time $T$. Consider now the probability space $(\Omega, \mathcal{F}, P)$, which is the fundamental space of the underlying asset price process $S$:

$$\begin{aligned} dS_t/S_t &= \mu(t, S_t, U_t)\,dt + \sigma_t\,dW_t^S \\ \sigma_t^2 &= f(U_t) \\ dU_t &= a(t, U_t)\,dt + b(t, U_t)\,dW_t^{\sigma} \end{aligned} \tag{4.2.1}$$

where $W_t = (W_t^S, W_t^{\sigma})$ is a standard two-dimensional Brownian motion ($W_t^S$ and $W_t^{\sigma}$ are independent, zero-mean and unit variance) defined on $(\Omega, \mathcal{F}, P)$. The function $f$, called the volatility function, is assumed to be one-to-one. In this framework (under suitable regularity conditions) the no free lunch assumption is equivalent to the existence of a probability distribution $Q$ on $(\Omega, \mathcal{F})$, equivalent to $P$, under which discounted price processes are martingales (see Harrison and Kreps (1979)). Such a probability is called an equivalent martingale measure and is unique if and only if the markets are complete (see Harrison and Pliska (1981)).[29] From the integral form of martingale representations (see Karatzas and Shreve (1988), p. 184), the (positive) density process of any probability measure $Q$ equivalent to $P$ can be written as:

$$M_t = \exp\left[-\int_0^t \lambda_u^S\,dW_u^S - \frac{1}{2}\int_0^t (\lambda_u^S)^2\,du - \int_0^t \lambda_u^{\sigma}\,dW_u^{\sigma} - \frac{1}{2}\int_0^t (\lambda_u^{\sigma})^2\,du\right] \tag{4.2.2}$$

where the processes $\lambda^S$ and $\lambda^{\sigma}$ are adapted to the natural filtration $\mathcal{F}_t = \sigma[W_\tau, \tau \le t]$, $t \ge 0$, and satisfy the integrability conditions (almost surely):

$$\int_0^T (\lambda_u^S)^2\,du < +\infty \quad\text{and}\quad \int_0^T (\lambda_u^{\sigma})^2\,du < +\infty.$$

By Girsanov's theorem the process $\tilde W = (\tilde W^S, \tilde W^{\sigma})$ defined by:

$$\tilde W_t^S = W_t^S + \int_0^t \lambda_u^S\,du \quad\text{and}\quad \tilde W_t^{\sigma} = W_t^{\sigma} + \int_0^t \lambda_u^{\sigma}\,du \tag{4.2.3}$$

is a two-dimensional Brownian motion under $Q$. The dynamics of the underlying asset price under $Q$ are obtained directly from (4.2.1) and (4.2.3). Moreover, the discounted asset price process $S_t B(0, t)$, $0 \le t \le T$, is a $Q$-martingale if and only if, for $r_t$ defined in (2.1.11):

$$\lambda_t^S = \frac{\mu(t, S_t, U_t) - r_t}{\sigma_t}. \tag{4.2.4}$$

Since $S$ is the only traded asset, the process $\lambda^{\sigma}$ is not fixed. The process $\lambda^S$ defined by (4.2.4) is called the asset risk premium.
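A small Monte Carlo experiment illustrates how the density (4.2.2) turns $P$ into a martingale measure. To keep the sketch short we assume constant $\mu$, $r$ and $\sigma$ (so the volatility risk premium $\lambda^{\sigma}$ plays no role); the numerical values are arbitrary. We simulate $S_T$ under $P$, weight each draw by $M_T$ with $\lambda^S = (\mu - r)/\sigma$, and compare averages of the discounted terminal price with and without the weights.

```python
import numpy as np

def girsanov_check(s0=100.0, mu=0.08, r=0.03, sigma=0.2, T=1.0, n=200_000, seed=1):
    """Monte Carlo check that M_T from (4.2.2) makes discounted S a Q-martingale.

    Simplifying assumption: constant mu, r and sigma, so lambda^S is constant
    and the volatility risk premium lambda^sigma does not enter.
    """
    rng = np.random.default_rng(seed)
    w_T = rng.standard_normal(n) * np.sqrt(T)            # W_T^S under P
    s_T = s0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * w_T)
    lam = (mu - r) / sigma                               # asset risk premium (4.2.4)
    m_T = np.exp(-lam * w_T - 0.5 * lam**2 * T)          # density process (4.2.2)
    disc = np.exp(-r * T)                                # B(0, T) with constant r
    return {
        "E_P[M_T]": m_T.mean(),                          # close to 1
        "E_P[B S_T]": (disc * s_T).mean(),               # above s0 when mu > r
        "E_Q[B S_T]": (m_T * disc * s_T).mean(),         # close to s0
    }

print(girsanov_check())
```

The unweighted average drifts upward at rate $\mu - r$, while the $M_T$-weighted average returns to $S_0$; in this sketch the whole adjustment is carried by the asset risk premium $\lambda^S$ of (4.2.4).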
By analogy, any process $\lambda^{\sigma}$ satisfying the required integrability condition can be viewed as a volatility risk premium, and for any choice of $\lambda^{\sigma}$ the probability $Q(\lambda^{\sigma})$ defined by the density process $M$ in (4.2.2) is an equivalent martingale measure. Therefore, given the volatility risk premium process $\lambda^{\sigma}$:

$$C_t^{\lambda^{\sigma}} = B(t, T)\,E_t^{Q(\lambda^{\sigma})}\left[\operatorname{Max}[0, S_T - K]\right], \qquad 0 \le t \le T \tag{4.2.5}$$

is an admissible price process of the European call option.[30]

[29] Here, the market is seen as incomplete (before taking into account the market pricing of the option), so that we have to characterize a set of equivalent martingale measures.

The Hull and White option pricing model relies on the following assumption, which restricts the set of equivalent martingale measures:

Assumption 4.2.1. The volatility risk premium $\lambda_t^{\sigma}$ only depends on the current value of the volatility process: $\lambda_t^{\sigma} = \lambda^{\sigma}(t, U_t)$, $\forall t \in [0, T]$.

This assumption is consistent with an intertemporal equilibrium model where agent preferences are described by time-separable isoelastic utility functions (see He (1993) and Pham and Touzi (1993)). It ensures that $\tilde W^S$ and $\tilde W^{\sigma}$ are independent, so that the $Q(\lambda^{\sigma})$ distribution of $\log S_T/S_t$, conditionally on $\mathcal{F}_t$ and the volatility path $(\sigma_\tau, t \le \tau \le T)$, is normal with mean $\int_t^T r_u\,du - \frac{1}{2}\gamma^2(t, T)$ and variance $\gamma^2(t, T) = \int_t^T \sigma_u^2\,du$. Under Assumption 4.2.1 one can compute the expectation in (4.2.5) conditionally on the volatility path, and finally obtain:

$$C_t^{\lambda^{\sigma}} = S_t\,E_t^{Q(\lambda^{\sigma})}\left[\Phi(d_{1t}) - e^{-x_t}\,\Phi(d_{2t})\right] \tag{4.2.6}$$

with the same notation as in (2.1.20). To conclude, it is worth noting that many option pricing formulae available in the literature have a feature in common with (4.2.6), as they can be expressed as an expectation of the Black-Scholes price over a heterogeneous distribution of the volatility parameter (see Renault (1995) for an elaborate discussion of this subject).

4.2.2. Pricing and hedging with the Hull and White model

The Markov feature of the process $(S, \sigma)$ implies that the option price (4.2.6) only depends on the contemporaneous values of the underlying asset price and its volatility. Moreover, under mild regularity conditions, this function is differentiable.
Therefore, a natural way to solve the hedging problem in this stochastic volatility context is to hedge a given option of price $C_t^1$ by $\Delta_t^*$ units of the underlying asset and $\Sigma_t^*$ units of any other option of price $C_t^2$, where the hedging ratios solve:

$$\begin{cases} \partial C_t^1/\partial S_t - \Delta_t^* - \Sigma_t^*\,\partial C_t^2/\partial S_t = 0 \\ \partial C_t^1/\partial\sigma_t - \Sigma_t^*\,\partial C_t^2/\partial\sigma_t = 0 \end{cases} \tag{4.2.7}$$

Such a procedure, known as the delta-sigma hedging strategy, has been studied by Scott (1991). By showing that any European option completes the market, i.e. $\partial C_t/\partial\sigma_t \ne 0$, $0 \le t \le T$, Bajeux and Rochet (1992) justify the existence of a unique solution to the delta-sigma hedging problem (4.2.7), and the implicit assumption in the previous sections that the available information $I_t$ contains the past values $(S_\tau, \sigma_\tau)$, $\tau \le t$.

[30] Here and elsewhere $E_t^Q(\cdot) = E^Q(\cdot \mid \mathcal{F}_t)$ stands for the conditional expectation operator given $\mathcal{F}_t$ when the price dynamics are governed by $Q$.

In practice, option traders often focus on the risk due to the underlying asset price variations and consider the imperfect hedging strategy $\Sigma_t = 0$ and $\Delta_t = \partial C_t^1/\partial S_t$. Then the Hull and White option pricing formula (4.2.6) directly provides the theoretical value of $\Delta_t$:

$$\Delta_t = \partial C_t^{\lambda^{\sigma}}/\partial S_t = E_t^{Q(\lambda^{\sigma})}\left[\Phi(d_{1t})\right]. \tag{4.2.8}$$

This theoretical value is hard to use in practice since: (1) even if we knew the $Q(\lambda^{\sigma})$ conditional probability distribution of $d_{1t}$ given $I_t$ (summarized by $\sigma_t$), the derivation of the expectation (4.2.8) might be computationally demanding, and (2) this conditional probability is directly related to the conditional probability distribution of $\gamma^2(t, T) = \int_t^T \sigma_u^2\,du$ given $\sigma_t$, which in turn may involve the parameters of the latent process $\sigma_t$ in a nontrivial way. Moreover, these parameters are those of the conditional probability distribution of $\gamma^2(t, T)$ given $\sigma_t$ under the risk-neutral probability $Q(\lambda^{\sigma})$, which is generally different from the data generating process $P$. The statistical inference issues are therefore quite complicated. We will argue in Section 5 that only tools like simulation-based inference methods involving both asset and option prices (via an option pricing model) may provide satisfactory solutions.

Nevertheless, a practical way to avoid these complications is to use the Black-Scholes option pricing model, even though it is known to be misspecified. Indeed, option traders know that they cannot generally obtain sufficiently accurate option prices and hedge ratios by using the BS formula with historical estimates of the volatility parameters based on time series of the underlying asset price. However, the concept of Black-Scholes implied volatility (2.1.23) is known to improve the pricing and hedging properties of the BS model.
This raises two issues: (1) how internally consistent is the simultaneous use of the BS model (which assumes constant volatility) and of BS implied volatility, which is clearly time-varying and stochastic, and (2) how to exploit the panel structure of option pricing errors.[31] Concerning the first issue, we noted in Section 2 that the Hull and White option pricing model can indeed be seen as a theoretical foundation for this pricing practice. Hedging issues and the panel structure of option pricing errors are studied in detail in Renault and Touzi (1992) and Renault (1995).

4.2.3. Smile or smirk?

As noted in Section 2.2, the smile effect is now a well documented empirical stylized fact. Moreover, the smile sometimes becomes a smirk, since it appears more or less lopsided (the so-called skewness effect). We cautioned in Section 2 that some explanations of the smile/smirk effect are founded on tempting analogies rather than rigorous proofs.

[31] The value of $\sigma$ which equates the BS formula to the observed market price of the option depends heavily on the actual date $t$, the strike price $K$ and the time to maturity $(T - t)$, and therefore creates a panel data structure.
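The implied volatility of footnote 31 and the mixing formula (4.2.6) together make it easy to see what kind of smile a Hull and White model produces. Write the BS call price in units of $S_t$ as $c(x, v)$, with $x = \log(F/K)$, forward price $F = S_t/B(t, T)$ and total volatility $v = \gamma(t, T)$. One can verify that $c(-x, v) = 1 - e^{x} + e^{x}c(x, v)$; since this identity is linear in $c$ with coefficients that do not depend on $v$, it carries over to any mixture over $v$ such as (4.2.6), and because $c$ is increasing in $v$ the implied volatilities at $K$ and at $F^2/K$ must coincide. The sketch below illustrates this with a two-point distribution for $v$, a purely illustrative assumption of ours, and recovers implied volatilities by bisection.

```python
import math

def call_norm(x, v):
    """BS call price divided by S_t; x = log(F/K), v = total volatility gamma(t, T)."""
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf(x / v + v / 2.0) - math.exp(-x) * cdf(x / v - v / 2.0)

def hw_call_norm(x, vols, probs):
    """Hull-White price as a probability mixture of BS prices over v, cf. (4.2.6)."""
    return sum(p * call_norm(x, v) for v, p in zip(vols, probs))

def implied_total_vol(price, x, lo=1e-6, hi=5.0, tol=1e-12):
    """Bisection on v; call_norm is increasing in v."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if call_norm(x, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def smile(moneyness, vols, probs):
    """Implied total volatility for each K/F under the mixture model."""
    return [implied_total_vol(hw_call_norm(-math.log(m), vols, probs), -math.log(m))
            for m in moneyness]

vols, probs = (0.10, 0.40), (0.5, 0.5)          # hypothetical two-point law of v
grid = (0.8, 0.9, 1.0, 1 / 0.9, 1.25)            # K/F, symmetric in log-moneyness
for m, iv in zip(grid, smile(grid, vols, probs)):
    print(f"K/F = {m:.3f}   implied total vol = {iv:.6f}")
```

The printed values are lowest at the money and equal in pairs for $K/F$ and $F/K$: a symmetric smile in $\log(K/F)$, with no skew, which is what the Hull and White structure delivers when volatility is independent of the price shock.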

To the best of our knowledge, the state of the art is the following: (i) the first formal proof that a Hull and White option pricing formula implies a symmetric smile was provided by Renault and Touzi (1992); (ii) the first complete proof that the smile/smirk effects can alternatively be explained by liquidity problems (the upper parts of the smile curve, i.e. the most expensive options, are the least liquid) was provided by Platten and Schweizer (1994) using a microstructure model; (iii) there is no formal proof that asymmetries of the probability distribution of the underlying asset price process (leverage effect, non-normality, ...) are able to capture the observed skewness of the smile.

A different attempt to explain the observed skewness is provided by Renault (1995). He showed that a slight discrepancy between the underlying asset price $S_t$ used to infer BS implied volatilities and the stock price $\tilde S_t$ considered by option traders may generate an empirically plausible skewness in the smile. Such nonsynchronous $S_t$ and $\tilde S_t$ may be related to various issues: bid-ask spreads, nonsynchronous trading between the two markets, forecasting strategies based on the leverage effect, etc. Finally, it is also worth noting that a new approach, initiated by Gourieroux, Monfort and Tenreiro (1994) and also followed by Ait-Sahalia, Bickel and Stoker (1994), is to explain the BS implied volatility using a nonparametric function of some observed state variables. Gourieroux, Monfort and Tenreiro (1995), for example, obtain a good nonparametric fit of the following form:

$$\sigma_t(S_t, K) = a(K) + b(K)\left(\log S_t/S_{t-1}\right)^2.$$

A classical smile effect is directly observed on the intercept $a(K)$, but an inverse smile effect appears for the path-dependent effect parameter $b(K)$.
For American options, a different nonparametric approach is pursued by Broadie, Detemple, Ghysels and Torres (1995), where, besides volatility, exercise boundaries for the option contracts are also obtained.[32]

4.3. Filtering and discrete time approximations

In Section 3.4.3 it was noted that the ARCH class of models could be viewed as filters to extract the (continuous time) conditional variance process from discrete time data. Several papers were devoted to the subject, namely Nelson (1990, 1992, 1995a,b) and Nelson and Foster (1994, 1995). It was one of Nelson's seminal contributions to bring together ARCH and continuous time SV. Nelson's first contribution, in his 1990 paper, was to show that ARCH models, which model volatility as functions of past (squared) returns, converge weakly to a diffusion process, either a diffusion for $\log\sigma_t^2$ or a CEV process as described in Section 4.1.2. In particular, it was shown that a GARCH(1,1) model observed at finer and finer time intervals $\Delta t = h$, with conditional variance parameters $\omega_h = h\omega$, $\alpha_h = \alpha(h/2)^{1/2}$ and $\beta_h = 1 - \alpha(h/2)^{1/2} - \theta h$ and conditional mean $\mu_h = hc\sigma_t^2$, converges to a diffusion limit quite similar to equations (4.1.8a) combined with (4.1.9) with $\delta = 1$, namely

$$d\log S_t = c\,\sigma_t^2\,dt + \sigma_t\,dW_t, \qquad d\sigma_t^2 = (\omega - \theta\sigma_t^2)\,dt + \alpha\,\sigma_t^2\,dW_t^{\sigma}.$$

[32] See also Bossaerts and Hillion (1995) for the use of a nonparametric hedging procedure and the smile effect.

Similarly, it was also shown that a sequence of AR(1)-EGARCH(1,1) models converges weakly to an Ornstein-Uhlenbeck diffusion for $\ln\sigma_t^2$:

$$d\ln\sigma_t^2 = \alpha(\beta - \ln\sigma_t^2)\,dt + dW_t^{\sigma}.$$

Hence these basic insights showed that the continuous time stochastic differential equations emerging as diffusion limits of ARCH models were no longer ARCH but instead SV models. Moreover, following Nelson (1992), even when misspecified, ARCH models still kept desirable properties regarding the extraction of continuous time volatility. The argument was that for a wide variety of misspecified ARCH models, the difference between the ARCH filter volatility estimates and the true underlying diffusion volatilities converges to zero in probability as the length of the sampling time interval goes to zero at an appropriate rate. For instance, the GARCH(1,1) model with $\omega_h$, $\alpha_h$ and $\beta_h$ described before estimates $\sigma_t^2$ as follows:

$$\hat\sigma_t^2 = \omega_h(1 - \beta_h)^{-1} + \alpha_h\sum_{i=0}^{\infty}\beta_h^i\,y_{t-h(i+1)}^2$$

where $y_t = \log S_t/S_{t-h}$. This filter can be viewed as a particular case of equation (3.4.1). The GARCH(1,1) and many other models effectively achieve consistent estimation of $\sigma_t$ via a lag polynomial function of past squared returns close to time $t$. The fact that a wide variety of misspecified ARCH models consistently extract $\sigma_t$ from high frequency data raises questions regarding the efficiency of filters. The answers to such questions are provided in Nelson (1995a,b) and Nelson and Foster (1994, 1995). In Section 3.4 it was noted that the linear state space Kalman filter can also be viewed as a (suboptimal) extraction filter for $\sigma_t$. Nelson and Foster (1994) show that the asymptotically optimal linear Kalman filter has asymptotic variance for the normalized estimation error $h^{-1/4}[\ln(\hat\sigma_t^2) - \ln\sigma_t^2]$ equal to $\lambda\,\Psi'(1/2)^{1/2}$, where $\Psi(x) = d[\ln\Gamma(x)]/dx$ and $\lambda$ is a scaling factor.
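The distributed-lag GARCH(1,1) filter can equivalently be run as the familiar recursion $\hat\sigma_t^2 = \omega_h + \beta_h\hat\sigma_{t-h}^2 + \alpha_h y_{t-h}^2$: started at $\omega_h/(1 - \beta_h)$ with no pre-sample returns, the recursion reproduces the sum exactly, truncated at the beginning of the sample. The minimal sketch below checks this with Nelson's parameterization of $(\omega_h, \alpha_h, \beta_h)$. The values of $\omega$, $\alpha$, $\theta$ and $h$, the constant-variance Gaussian returns, and the division of squared returns by $h$ (so that the filter targets a per-unit-time variance) are all our own illustrative assumptions.

```python
import numpy as np

def nelson_garch_params(omega, alpha, theta, h):
    """(omega_h, alpha_h, beta_h) along Nelson's (1990) diffusion-limit sequence."""
    omega_h = h * omega
    alpha_h = alpha * np.sqrt(h / 2.0)
    beta_h = 1.0 - alpha * np.sqrt(h / 2.0) - theta * h
    return omega_h, alpha_h, beta_h

def garch_filter_recursive(y2, omega_h, alpha_h, beta_h):
    """s[t] = omega_h + beta_h s[t-1] + alpha_h y2[t-1], started at omega_h / (1 - beta_h)."""
    s = np.empty(len(y2) + 1)
    s[0] = omega_h / (1.0 - beta_h)
    for t in range(1, len(s)):
        s[t] = omega_h + beta_h * s[t - 1] + alpha_h * y2[t - 1]
    return s

def garch_filter_sum(y2, omega_h, alpha_h, beta_h):
    """Distributed-lag form: omega_h/(1-beta_h) + alpha_h * sum_i beta_h^i y2[t-1-i]."""
    s = np.empty(len(y2) + 1)
    for t in range(len(s)):
        lags = y2[:t][::-1]                      # y2[t-1], y2[t-2], ..., y2[0]
        s[t] = omega_h / (1.0 - beta_h) + alpha_h * np.sum(beta_h ** np.arange(t) * lags)
    return s

h = 1.0 / 78                                     # small sampling interval (illustrative)
om_h, al_h, be_h = nelson_garch_params(omega=0.1, alpha=0.5, theta=1.0, h=h)
rng = np.random.default_rng(3)
y = np.sqrt(0.1 * h) * rng.standard_normal(500)  # returns over intervals of length h
y2 = y**2 / h                                    # per-unit-time scaling (our assumption)
rec = garch_filter_recursive(y2, om_h, al_h, be_h)
print(np.max(np.abs(rec - garch_filter_sum(y2, om_h, al_h, be_h))))
print(rec[-5:])                                  # filtered variances at the end of the sample
```

The recursion costs one multiply-add per observation, whereas the explicit sum grows with the sample; both compute the same lag polynomial in past squared returns.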
A model closely related to EGARCH, of the following form:

$$\ln\hat\sigma_{t+h}^2 = \ln\hat\sigma_t^2 + \rho\lambda\,(S_{t+h} - S_t)\,\hat\sigma_t^{-1} + \lambda(1 - \rho^2)^{1/2}\left[\Gamma(1/2)\,|S_{t+h} - S_t|\,\hat\sigma_t^{-1} - 2^{1/2}\right]$$

yields the asymptotically optimal ARCH filter, with asymptotic variance for the normalized estimation error equal to $\lambda[2(1 - \rho^2)]^{1/2}$, where the parameter $\rho$ measures the leverage effect. These results also show that the differences between the most efficient suboptimal Kalman filter and the optimal ARCH filter can be quite substantial.

Besides filtering, one must also deal with smoothing and forecasting. Both of these issues were discussed in Section 3.4 for discrete time SV models. The prediction properties of (misspecified) ARCH models were studied extensively by Nelson and Foster (1995). Nelson (1995) takes ARCH models a step further by studying smoothing filters, i.e. ARCH models involving not only lagged squared returns but also future realizations, i.e. $\tau = t - T$ in equation (3.4.1).

4.4. Long memory

We conclude this section with a brief discussion of long memory in continuous time SV models. The purpose is to build continuous time long memory stochastic volatility models which are relevant for high frequency financial data and for (long term) option pricing. The reasons motivating the use of long memory models were discussed in Sections 2.2 and 3.5.5. The advantage of continuous time long memory models is their relative ability to provide a more structural interpretation of the parameters governing short term and long term dynamics. The first subsection defines fractional Brownian motion. Next we turn our attention to the fractional SV model, followed by a subsection on filtering and discrete time approximations.

4.4.1. Stochastic integration with respect to fractional Brownian Motion

We recall in this subsection a few definitions and properties of fractional and long memory processes in continuous time, studied extensively for instance in Comte and Renault (1993). Consider the scalar process:

$$x_t = \int_0^t a(t - s)\,dW_s. \tag{4.4.1}$$

Such a process is asymptotically equivalent in quadratic mean to the stationary process:

$$y_t = \int_{-\infty}^t a(t - s)\,dW_s \tag{4.4.2}$$

whenever $\int_0^{\infty} a^2(x)\,dx < +\infty$.
Such processes are called fractional processes if $a(x) = x^{\alpha}\tilde a(x)/\Gamma(1 + \alpha)$ for $|\alpha| < 1/2$, with $\tilde a$ continuously differentiable on $[0, T]$, and where $\Gamma(1 + \alpha)$ is a scaling factor useful for normalizing fractional derivative operators on $[0, T]$. Such processes admit several representations; in particular, they can also be written:

$$x_t = \int_0^t c(t - s)\,dW_s^{\alpha}, \qquad W_t^{\alpha} = \int_0^t \frac{(t - s)^{\alpha}}{\Gamma(1 + \alpha)}\,dW_s \tag{4.4.3}$$

where $W^{\alpha}$ is the so-called fractional Brownian motion of order $\alpha$ (see Mandelbrot and Van Ness (1968)).
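The moving-average definition of $W^{\alpha}$ in (4.4.3) translates directly into a simulation scheme: on a grid of step $T/n$, replace the kernel by a step function evaluated at the left endpoint of each interval and form a weighted sum of Brownian increments. Since $\operatorname{Var}(W_T^{\alpha}) = T^{2\alpha+1}/[(2\alpha+1)\Gamma(1+\alpha)^2]$, the discretization can be checked against this closed form. The grid size, $\alpha$ and number of paths below are illustrative choices of ours.

```python
import math
import numpy as np

def fbm_kernel_matrix(alpha, T=1.0, n=200):
    """Row i-1 holds (t_i - t_j)^alpha / Gamma(1+alpha) for j < i, zero otherwise.

    t_i = i * T / n; column j weights the Brownian increment over (t_j, t_{j+1}],
    i.e. the kernel is evaluated at the left endpoint of each interval.
    """
    t = np.linspace(0.0, T, n + 1)
    diff = t[1:, None] - t[None, :-1]            # t_i - t_j, i = 1..n, j = 0..n-1
    K = np.where(diff > 0, np.maximum(diff, 0.0) ** alpha, 0.0)
    return K / math.gamma(1.0 + alpha)

def simulate_fbm(alpha, T=1.0, n=200, n_paths=5_000, seed=4):
    """Paths of W^alpha at t_1, ..., t_n (columns) from the step-function kernel."""
    rng = np.random.default_rng(seed)
    dW = rng.standard_normal((n_paths, n)) * math.sqrt(T / n)
    return dW @ fbm_kernel_matrix(alpha, T, n).T

alpha, T, n = 0.3, 1.0, 200
exact_var = T ** (2 * alpha + 1) / ((2 * alpha + 1) * math.gamma(1 + alpha) ** 2)
disc_var = float(np.sum(fbm_kernel_matrix(alpha, T, n)[-1] ** 2) * T / n)
mc_var = float(simulate_fbm(alpha, T, n)[:, -1].var())
print(exact_var, disc_var, mc_var)
```

Building the whole lower-triangular kernel matrix costs $O(n^2)$ memory but yields every grid point of every path in one matrix product; for long grids an FFT-based convolution would be the usual alternative.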

The relation between the functions $a$ and $c$ is one-to-one. One can show that $W^{\alpha}$ is not a semimartingale (see e.g. Rogers (1995)), but stochastic integration with respect to $W^{\alpha}$ can be defined properly. The processes $x_t$ are long memory if:

$$\lim_{x \to +\infty} \tilde a(x) = a_{\infty}, \qquad 0 < \alpha < 1/2 \text{ and } 0 < a_{\infty} < +\infty, \tag{4.4.4}$$

for instance,

$$dx_t = -k x_t\,dt + \sigma\,dW_t^{\alpha}, \qquad x_0 = 0, \qquad k > 0, \qquad 0 < \alpha < 1/2, \tag{4.4.5}$$

with its solution given by:

$$x_t = \int_0^t \frac{(t - s)^{\alpha}}{\Gamma(1 + \alpha)}\,dx_s^{(\alpha)} \tag{4.4.6a}$$
$$x_t^{(\alpha)} = \int_0^t e^{-k(t - s)}\,\sigma\,dW_s. \tag{4.4.6b}$$

Note that $x_t^{(\alpha)}$, the derivative of order $\alpha$ of $x_t$, is a solution of the usual SDE: $dz_t = -k z_t\,dt + \sigma\,dW_t$.

4.4.2. The fractional SV model

To facilitate comparison with both the FIEGARCH model and the fractional extensions of the log-Normal SV model discussed in Section 3.5.5, let us consider the following fractional SV model (henceforth FSV):

$$dS_t/S_t = \sigma_t\,dW_t \tag{4.4.7a}$$
$$d\log\sigma_t = -k\log\sigma_t\,dt + \gamma\,dW_t^{\alpha} \tag{4.4.7b}$$

where $k > 0$ and $0 < \alpha < 1/2$. If nonzero, the fractional exponent $\alpha$ provides some degree of freedom in the order of regularity of the volatility process: the greater $\alpha$, the smoother the path of the volatility process. If we denote the autocovariance function of $\sigma$ by $r_\sigma(\cdot)$, then:

$$\alpha > 0 \;\Rightarrow\; \frac{r_\sigma(h) - r_\sigma(0)}{h} \to 0 \quad\text{as } h \to 0.$$

This would be incorrectly interpreted as near-integrated behavior, widely found in high frequency data for instance, when:

$$\frac{r_\sigma(h) - r_\sigma(0)}{h} = \frac{\rho^h - 1}{h} \to \log\rho \quad\text{as } h \to 0,$$

and $\sigma_t$ is a continuous time AR(1) with correlation $\rho$ near 1. The long memory continuous time approach allows us to model persistence with the following features: (1) the volatility process itself (and not just its logarithm) has hyperbolic decay of the correlogram; (2) the persistence of volatility shocks yields leptokurtic features for returns which vanish with temporal aggregation at a slow hyperbolic rate of decay.[33] Indeed, for the rate of return on $[0, h]$:

$$\frac{E\left[\log S_{t+h}/S_t - E(\log S_{t+h}/S_t)\right]^4}{\left(E\left[\log S_{t+h}/S_t - E(\log S_{t+h}/S_t)\right]^2\right)^2} \to 3$$

as $h \to \infty$, at a rate $h^{2\alpha - 1}$ if $\alpha \in (0, 1/2)$ and at a rate $\exp(-kh/2)$ if $\alpha = 0$.

[33] With usual GARCH or SV models, it vanishes at an exponential rate (see Drost and Nijman (1993) and Drost and Werker (1994) for these issues in the short memory case).

4.4.3. Filtering and discrete time approximations

The volatility process dynamics are described by the solution to the SDE (4.4.5), namely:

$$\log\sigma_t = \int_0^t \frac{(t - s)^{\alpha}}{\Gamma(1 + \alpha)}\,d\log\sigma_s^{(\alpha)} \tag{4.4.6}$$

where $\log\sigma_t^{(\alpha)}$ follows the O-U process:

$$d\log\sigma_t^{(\alpha)} = -k\log\sigma_t^{(\alpha)}\,dt + \gamma\,dW_t. \tag{4.4.7}$$

To compute a discrete time approximation, one must evaluate the integral (4.4.6) numerically using only values of the process $\log\sigma^{(\alpha)}$ on a discrete partition of $[0, t]$ at points $j/n$, $j = 0, 1, \ldots, [nt]$.[34] A natural way to proceed is to use step functions, generating the following proxy process:

$$\log\sigma_t^{n} = \sum_{j=1}^{[nt]} \frac{(t - (j-1)/n)^{\alpha}}{\Gamma(1 + \alpha)}\,\Delta\log\sigma_{j/n}^{(\alpha)} \tag{4.4.8}$$

where $\Delta\log\sigma_{j/n}^{(\alpha)} = \log\sigma_{j/n}^{(\alpha)} - \log\sigma_{(j-1)/n}^{(\alpha)}$. Comte and Renault (1995) show that $\log\sigma_t^{n}$ converges to the $\log\sigma_t$ process for $n \to \infty$, uniformly on compact sets.

[34] $[z]$ is the integer $k$ such that $k \le z < k + 1$.

Moreover, by rearranging (4.4.8) one obtains:

$$\log\sigma_{j/n}^{n} = \sum_{i=0}^{j-1} \frac{(i+1)^{\alpha} - i^{\alpha}}{n^{\alpha}\,\Gamma(1 + \alpha)}\,L_n^i\,\log\sigma_{j/n}^{(\alpha)} \tag{4.4.9}$$

where $L_n$ is the lag operator corresponding to the sampling scheme $j/n$, i.e. $L_n Z_{j/n} = Z_{(j-1)/n}$. With this sampling scheme $\log\sigma_{j/n}^{(\alpha)}$ is a discrete time AR(1) deduced from the continuous time process, with the following representation:

$$(1 - \rho_n L_n)\log\sigma_{j/n}^{(\alpha)} = u_{j/n} \tag{4.4.10}$$

where $\rho_n = \exp(-k/n)$ and $u_{j/n}$ is the associated innovation process. Since the process is stationary, we are allowed to write (assuming $\log\sigma_{j/n}^{(\alpha)} = u_{j/n} = 0$ for $j \le 0$):

$$\log\sigma_{j/n}^{n} = \left(\sum_{i=0}^{\infty} a_i\,L_n^i/n^{\alpha}\right)(1 - \rho_n L_n)^{-1}u_{j/n} \tag{4.4.11}$$

which gives a parameterization of the volatility dynamics in two parts: (1) a long memory part, which corresponds to the filter $\sum_{i \ge 0} a_i L_n^i/n^{\alpha}$ with $a_i = [(i + 1)^{\alpha} - i^{\alpha}]/\Gamma(1 + \alpha)$, and (2) a short memory part, which is characterized by the AR(1) process $(1 - \rho_n L_n)^{-1}u_{j/n}$. Indeed, one can show that the long memory filter is "long-term equivalent" to the usual discrete time long memory filters $(1 - L)^{-\alpha}$, in the sense that there is a long term relationship (a cointegration relation) between the two types of processes. However, this long-term equivalence between the long memory filter and the usual discrete time one $(1 - L)^{-\alpha}$ does not imply that the standard parameterization FARIMA$(1, \alpha, 0)$ is well suited to our framework. Indeed, one can show that the usual discrete time filter $(1 - L)^{-\alpha}$ introduces some mixing between long and short term characteristics, whereas the parsimonious continuous time model does not.[35] This feature clearly gives the continuous time FSV model an advantage over discrete time SV and GARCH long memory models.

5. Statistical inference

Evaluating the likelihood function of ARCH models is a relatively straightforward task. In sharp contrast, for SV models it is impossible to obtain explicit expressions for the likelihood function. This is a generic feature common to almost all nonlinear latent variable models. The lack of estimation procedures for SV models made them for a long time an unattractive class of models in comparison to ARCH. In recent years, however, remarkable progress has been made regarding the estimation of nonlinear latent variable models in general and SV models in particular. A flurry of methods is now available and running on computers with ever increasing CPU performance. The early attempts to estimate SV models used a GMM procedure; a prominent example is Melino and Turnbull (1990). Section 5.1 is devoted to GMM estimation in the context of SV models.
Obviously, GMM is not designed to handle continuous time diffusions, as it requires discrete time processes satisfying certain regularity conditions. A continuous time GMM approach, developed by Hansen and Scheinkman (1994), involves moment conditions directly drawn from the continuous time representation of the process. This approach is discussed in Section 5.3. In between, namely in Section 5.2, we discuss the QML approach suggested by Harvey, Ruiz and Shephard (1994) and Nelson (1988). It relies on the fact that the nonlinear (Gaussian) SV model can be transformed into a linear non-Gaussian state space model as in Section 3, and from this a Gaussian quasi-likelihood can be computed. None of the methods covered in Sections 5.1 through 5.3 involves simulation. However, increased computer power has made simulation-based estimation techniques increasingly popular. The simulated method of moments, or simulation-based GMM approach, proposed by Duffie and Singleton (1993), is a first example, covered in Section 5.4. Next we discuss the indirect inference approach of Gourieroux, Monfort and Renault (1993) and the moment matching methods of Gallant and Tauchen (1994) in Section 5.5. Finally, Section 5.6 covers a very large class of estimators using computer intensive Markov Chain Monte Carlo methods, applied in the context of SV models by Jacquier, Polson and Rossi (1994) and Kim and Shephard (1994), and simulation-based ML estimation proposed in Danielsson (1994) and Danielsson and Richard (1993). In each section we limit our focus to the use of estimation procedures in the context of SV models and avoid details regarding econometric theory. Some useful references to complement the material covered here are (1) Hansen (1992), Gallant and White (1988), Hall (1993) and Ogaki (1993) for GMM estimation, (2) Gourieroux and Monfort (1993b) and Wooldridge (1994) for QMLE, (3) Gourieroux and Monfort (1995) and Tauchen (1995) for simulation-based econometric methods including indirect inference and moment matching, and finally (4) Geweke (1995) and Shephard (1995) for Markov Chain Monte Carlo methods.

[35] Namely, $(1 - L_n)^{\alpha}\log\sigma_{j/n}^{n}$ is not an AR(1) process.

5.1. Generalized method of moments

Let us consider the simple version of the discrete time SV model as presented in equations (3.1.2) and (3.1.3), with the additional assumption of normality for the probability distribution of the innovation process $(\varepsilon_t, \eta_t)$. This log-normal SV model has been the subject of at least two extensive Monte Carlo studies on GMM estimation of SV models, conducted by Andersen and Sørensen (1993) and Jacquier, Polson and Rossi (1994). The main idea is to exploit the stationarity and ergodicity properties of the SV model, which yield the convergence of sample moments to their unconditional expectations.
For instance, the second and fourth moments are simple expressions of $\sigma^2$ and $\sigma_h^2$, namely $\sigma^2\exp(\sigma_h^2/2)$ and $3\sigma^4\exp(2\sigma_h^2)$ respectively. If these moments are computed in the sample, $\sigma_h^2$ can be estimated directly from the sample kurtosis, $\kappa$, which is the ratio of the fourth moment to the second moment squared. The expression is just $\hat\sigma_h^2 = \log(\kappa/3)$. The parameter $\sigma^2$ can then be estimated from the second moment by substituting in this estimate of $\sigma_h^2$. We might also compute the first-order autocovariance of $y_t^2$, or simply the sample mean of $y_t^2 y_{t-1}^2$, which has expectation $\sigma^4\exp(\{1 + \phi\}\sigma_h^2)$ and from which, given the estimates of $\sigma^2$ and $\sigma_h^2$, it is straightforward to get an estimate of $\phi$.

The above procedure is an example of the application of the method of moments. In general terms, $m$ moments are computed. For a sample of size $T$, let $g_T(\beta)$ denote the $m \times 1$ vector of differences between each sample moment and its theoretical expression in terms of the model parameters $\beta$. The generalized method of moments (GMM) estimator is constructed by minimizing the criterion function

$$\hat\beta_T = \arg\min_{\beta}\; g_T(\beta)'\,W_T\,g_T(\beta)$$

where $W_T$ is an $m \times m$ weighting matrix reflecting the importance given to matching each of the moments. When $\varepsilon_t$ and $\eta_t$ are mutually independent, Jacquier, Polson and Rossi (1994) suggest using 24 moments. The first four are given by (3.2.2) for $c = 1, 2, 3, 4$, while the analytic expression for the others is:[36]

$$E\left[|y_t y_{t-\tau}|^c\right] = \frac{2^c}{\pi}\,\Gamma^2\!\left(\frac{c+1}{2}\right)\sigma^{2c}\exp\left(\frac{c^2}{4}\sigma_h^2\left[1 + \phi^{\tau}\right]\right), \qquad c = 1, 2, \quad \tau = 1, \ldots, 10.$$

In the more general case when $\varepsilon_t$ and $\eta_t$ are correlated, Melino and Turnbull (1990) included estimates of $E[|y_t|\,y_{t-\tau}]$, $\tau = 0, \pm 1, \pm 2, \ldots, \pm 10$. They presented an explicit expression in the case of $\tau = 1$ and showed that its sign is entirely determined by $\rho$. The GMM method may also be extended to handle a non-normal distribution for $\varepsilon_t$; the required analytic expressions can be obtained as in Section 3.2. On the other hand, the analytic expressions of unconditional moments presented in Section 2.4 for the general SARV model may provide the basis of GMM estimation in more general settings (see Andersen (1994)).

From the very start we expect the GMM estimator not to be efficient. The question is how much inefficiency should be tolerated in exchange for its relative simplicity. The generic setup of GMM leaves unspecified the number of moment conditions, except for the minimal number required for identification, as well as the explicit choice of moments. Moreover, the computation of the weighting matrix is also an issue, since many options exist in practice. The extensive Monte Carlo studies of Andersen and Sørensen (1993) and Jacquier, Polson and Rossi (1994) attempted to answer these outstanding questions. In general they find that GMM is a fairly inefficient procedure, primarily because of the stylized fact, noted in Section 2.2, that $\phi$ in equation (3.1.3) is quite close to unity in most empirical findings, since volatility is highly persistent.
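Both moment-based procedures above can be sketched in a few lines. The code assumes the log-normal SV model $y_t = \sigma\varepsilon_t\exp(h_t/2)$, $h_{t+1} = \phi h_t + \eta_t$ with independent Gaussian $\varepsilon_t$ and $\eta_t$ (our reading of (3.1.2)-(3.1.3)). It provides (i) the exact three-moment inversion for $(\sigma^2, \sigma_h^2, \phi)$; (ii) the 24 theoretical moments, using $E|y_t|^c = \sigma^c 2^{c/2}\Gamma((c+1)/2)\pi^{-1/2}\exp(c^2\sigma_h^2/8)$ for $c = 1, \ldots, 4$ (which follows from normality, and which we assume coincides with (3.2.2)) together with the cross moments displayed above; and (iii) the criterion $g_T(\beta)'W_T g_T(\beta)$. The identity weighting matrix and the simulated parameter values are placeholders, not recommendations.

```python
import math
import numpy as np

def sv_moment_estimates(m2, m4, m11):
    """Method-of-moments inversion for (sigma2, sigma_h2, phi).

    Uses E y^2 = sigma2 exp(sigma_h2/2), E y^4 = 3 sigma2^2 exp(2 sigma_h2)
    and E[y_t^2 y_{t-1}^2] = sigma2^2 exp((1 + phi) sigma_h2).
    """
    sigma_h2 = math.log(m4 / m2**2 / 3.0)        # log(kurtosis / 3); needs kurtosis > 3
    sigma2 = m2 * math.exp(-sigma_h2 / 2.0)
    phi = math.log(m11 / sigma2**2) / sigma_h2 - 1.0
    return sigma2, sigma_h2, phi

def sv_theoretical_moments(sigma2, sigma_h2, phi, lags=range(1, 11)):
    """24 moments: E|y_t|^c, c = 1..4, then E|y_t y_{t-tau}|^c, c = 1, 2, tau = 1..10."""
    out = [sigma2 ** (c / 2) * 2 ** (c / 2) * math.gamma((c + 1) / 2) / math.sqrt(math.pi)
           * math.exp(c**2 * sigma_h2 / 8) for c in (1, 2, 3, 4)]
    for c in (1, 2):
        for tau in lags:
            out.append(2**c / math.pi * math.gamma((c + 1) / 2) ** 2 * sigma2**c
                       * math.exp(c**2 / 4 * sigma_h2 * (1 + phi**tau)))
    return np.array(out)

def sv_sample_moments(y, lags=range(1, 11)):
    """Sample counterparts, in the same order as sv_theoretical_moments."""
    a = np.abs(y)
    out = [np.mean(a**c) for c in (1, 2, 3, 4)]
    for c in (1, 2):
        for tau in lags:
            out.append(np.mean((a[tau:] * a[:-tau]) ** c))
    return np.array(out)

def gmm_objective(params, y, W=None):
    """g_T(beta)' W_T g_T(beta); identity weighting unless W is supplied."""
    g = sv_sample_moments(y) - sv_theoretical_moments(*params)
    W = np.eye(len(g)) if W is None else W
    return float(g @ W @ g)

# Illustrative simulation (hypothetical values): sigma2 = 1, phi = 0.9, sigma_eta = 0.3.
rng = np.random.default_rng(7)
T, phi_true, s_eta = 50_000, 0.9, 0.3
h = np.empty(T)
h[0] = s_eta / math.sqrt(1 - phi_true**2) * rng.standard_normal()   # stationary start
for t in range(1, T):
    h[t] = phi_true * h[t - 1] + s_eta * rng.standard_normal()
y = rng.standard_normal(T) * np.exp(h / 2)
y2 = y**2
truth = (1.0, s_eta**2 / (1 - phi_true**2), phi_true)
print("method of moments:", sv_moment_estimates(y2.mean(), (y2**2).mean(), (y2[1:] * y2[:-1]).mean()))
print("GMM criterion at truth:", gmm_objective(truth, y))
```

Minimizing `gmm_objective` over the three parameters with any numerical optimizer gives a GMM estimate; the choice of $W_T$, and how many of the 24 moments to keep, are exactly the design questions the Monte Carlo studies address.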
For values of $\phi$ close to unity, convergence to unconditional moments is extremely slow, suggesting that only large samples can rescue the situation. The Monte Carlo study of Andersen and Sørensen (1993) provides some guidance on how to control the extent of the inefficiency, notably by keeping the number of moment conditions small. They also provide specific recommendations for the choice of weighting matrix estimators with data-dependent bandwidth using the Bartlett kernel.

[36] A simple way to derive these moment conditions is via a two-step approach similar in spirit to (2.4.8) and (2.4.9) or (3.2.3).

5.2. Quasi maximum likelihood estimation

5.2.1. The basic model

Consider the linear state space model described in Section 3.4.1, in which (3.2.8) is the measurement equation and (3.1.3) is the transition equation. The QML estimators of the parameters $\phi$, $\sigma_\eta^2$ and the variance of $\xi_t$, $\sigma_\xi^2$, are obtained by treating $\xi_t$ and $\eta_t$ as though they were normal and maximizing the prediction error decomposition form of the likelihood obtained via the Kalman filter. As noted in Harvey, Ruiz and Shephard (1994), the quasi maximum likelihood (QML) estimators are asymptotically normal with covariance matrix given by applying the theory in Dunsmuir (1979, p. 502). This assumes that $\eta_t$ and $\xi_t$ have finite fourth moments and that the parameters are not on the boundary of the parameter space. The parameter $\omega$ can be estimated at the same time as the other parameters. Alternatively, it can be estimated as the mean of the $\log y_t^2$'s, since this is asymptotically equivalent when $\phi$ is less than one in absolute value.

Application of the QML method does not require the assumption of a specific distribution for $\varepsilon_t$. We will refer to this as unrestricted QML. However, if a distribution is assumed, it is no longer necessary to estimate $\sigma_\xi^2$, as it is known, and an estimate of the scale factor, $\sigma^2$, can be obtained from the estimate of $\omega$. Alternatively, it can be obtained as suggested in Section 3.4.1. If unrestricted QML estimation is carried out, a value of the parameter determining a particular distribution within a class may be inferred from the estimated variance of $\xi_t$. For example, in the case of the Student's $t$, $\nu$ may be determined from the knowledge that the theoretical value of the variance of $\xi_t$ is $4.93 + \psi'(\nu/2)$ (where $\psi'(\cdot)$ is the derivative of the digamma function $\psi(\cdot)$ introduced in Section 3.2.2).

5.2.2. Asymmetric model

In an asymmetric model, QML may be based on the modified state space form in (3.4.3). The parameters $\sigma^2$, $\sigma_\eta^2$, $\phi$, $\mu^*$ and $\gamma^*$ can be estimated via the Kalman filter without any distributional assumptions, apart from the existence of fourth moments of $\eta_t$ and $\xi_t$ and the joint symmetry of $\xi_t$ and $\eta_t$.
However, if an estimate of $\rho$ is wanted, it is necessary to make distributional assumptions about the disturbances, leading to formulae like (3.4.4) and (3.4.5). These formulae can be used to set up an optimization with respect to the original parameters $\sigma^2$, $\sigma_\eta^2$, $\phi$ and $\rho$. This has the advantage that the constraint $|\rho| < 1$ can be imposed. Note that any $t$-distribution gives the same relationship between the parameters, so within this class it is not necessary to specify the degrees of freedom. Using the QML method with both the original disturbances assumed to be Gaussian, Harvey and Shephard (1993) estimate a model for the CRSP daily returns on a value-weighted US market index from 3rd July 1962 to 31st December 1987. These data were used in the paper by Nelson (1991) to illustrate his EGARCH model. The empirical results indicate a very high negative correlation.

5.2.3. QML in the frequency domain

For a long memory SV model, QML estimation in the time domain becomes relatively less attractive because the state space form (SSF) can only be used by expressing $h_t$ as an autoregressive or moving average process and truncating at a suitably high lag. Thus the approach is cumbersome, though the initial state covariance matrix is easily constructed, and the truncation does not affect the asymptotic properties of the estimators.

Stochastic volatility 171

If the autoregressive approximation, and therefore the SSF, is not used, time domain QML requires the repeated construction and inversion of the $T \times T$ covariance matrix of the $\log y_t^2$'s; see Sowell (1992). On the other hand, QML estimation in the frequency domain is no more difficult than it is in the AR(1) case. Cheung and Diebold (1994) present simulation evidence which suggests that although time domain estimation is more efficient in small samples, the difference is less marked when a mean has to be estimated. The frequency domain (quasi) log-likelihood function is, neglecting constants,

$$\log L = -\frac{1}{2}\sum_{j=1}^{T-1}\log g_j - \pi\sum_{j=1}^{T-1}\frac{I(\lambda_j)}{g_j} \tag{5.2.1}$$

where $I(\lambda_j)$ is the sample spectrum of the $\log y_t^2$'s and $g_j$ is the spectral generating function (SGF), which for (3.5.1) is

$$g_j = \sigma_\eta^2\left[2(1 - \cos\lambda_j)\right]^{-d} + \sigma_\xi^2 .$$

Note that the summation in (5.2.1) is from $j = 1$ rather than $j = 0$. This is because $g_0$ cannot be evaluated for positive $d$. However, the omission of the zero frequency does remove the mean. The unknown parameters are $\sigma_\eta^2$, $\sigma_\xi^2$ and $d$, but $\sigma_\xi^2$ may be concentrated out of the likelihood function by a reparameterisation in which $\sigma_\eta^2$ is replaced by the signal-noise ratio $q = \sigma_\eta^2/\sigma_\xi^2$. On the other hand, if a distribution is assumed for $\varepsilon_t$, then $\sigma_\xi^2$ is known. Breidt, Crato and de Lima (1993) show the consistency of the QML estimator.

When $d$ lies between 0.5 and one, $h_t$ is nonstationary, but differencing the $\log y_t^2$'s yields a zero mean stationary process, the SGF of which is

$$g_j = \sigma_\eta^2\left[2(1 - \cos\lambda_j)\right]^{1-d} + 2(1 - \cos\lambda_j)\,\sigma_\xi^2 .$$

One of the attractions of long memory models is that inference is not affected by the kind of unit root issues which arise with autoregressions. Thus a likelihood based test of the hypothesis that $d = 1$ against the alternative that it is less than one can be constructed using standard theory; see Robinson (1993).

5.2.4. Comparison of GMM and QML

Simulation evidence on the finite sample performance of GMM and QML can be found in Andersen and Sørensen (1993), Ruiz (1994), Jacquier, Polson and Rossi (1994), Breidt and Carriquiry (1995), Andersen and Sørensen (1996) and Harvey and Shephard (1996). The general conclusion seems to be that QML gives estimates with a smaller MSE when the volatility is relatively strong, as reflected in a high coefficient of variation. This is because the normally distributed volatility component in the measurement equation, (3.2.8), is large relative to the non-normal error term. With a lower coefficient of variation, GMM dominates. However, in this case Jacquier, Polson and Rossi (1994, p. 383) observe that "... the performance of both the QML and GMM estimators deteriorates rapidly." In other words, the case for one of the more computer intensive methods outlined in Section 5.6 becomes stronger. Other things being equal, an AR coefficient, $\phi$, close to one tends to favor QML because the autocorrelations are slow to die out and are hence captured less well by the moments used in GMM. For the same reason, GMM is likely to be rather poor in estimating a long memory model.

The attraction of QML is that it is very easy to implement and it extends easily to more general models, for example nonstationary and multivariate ones. At the same time, it provides filtered and smoothed estimates of the state, and predictions. The one-step ahead prediction errors can also be used to construct diagnostics, such as the Box-Ljung statistic, though in evaluating such tests it must be remembered that the observations are non-normal. Thus even if the hyperparameters are eventually estimated by another method, QML may have a valuable role to play in finding a suitable model specification.

5.3. Continuous time GMM

Hansen and Scheinkman (1995) propose to estimate continuous time diffusions using a GMM procedure specifically tailored for such processes. In Section 5.1 we discussed estimation of SV models which are either explicitly formulated as discrete time processes or else are discretizations of the continuous time diffusions. In both cases inference is based on minimizing the difference between unconditional moments and their sample equivalent. For continuous time processes, Hansen and Scheinkman (1995) draw directly upon the diffusion rather than its discretization to formulate moment conditions. To describe the generic setup of the method they proposed, let us consider the following (multivariate) system of $n$ diffusion equations:

$$dy_t = \mu(y_t;\theta)\,dt + \sigma(y_t;\theta)\,dW_t . \tag{5.3.1}$$

A comparison with the notation in Section 2 immediately draws attention to certain limitations of the setup.
First, the functions $\mu_\theta(\cdot) = \mu(\cdot;\theta)$ and $\sigma_\theta(\cdot) = \sigma(\cdot;\theta)$ are parameterized by $y_t$ only, which restricts the state variable process $U_t$ in Section 2 to contemporaneous values of $y_t$. The diffusion in (5.3.1) involves a general vector process $y_t$, hence $y_t$ could include a volatility process to accommodate SV models. Yet, the $y_t$ vector is assumed observable. For the moment we will leave these issues aside, but return to them at the end of the section. Hansen and Scheinkman (1995) consider the infinitesimal operator $A_\theta$ defined for a class of square integrable functions $\varphi: \mathbb{R}^n \to \mathbb{R}$ as follows:

$$A_\theta\varphi(y) = \mu_\theta(y)'\,\frac{\partial\varphi(y)}{\partial y} + \frac{1}{2}\,\mathrm{tr}\left[\sigma_\theta(y)\sigma_\theta(y)'\,\frac{\partial^2\varphi(y)}{\partial y\,\partial y'}\right] . \tag{5.3.2}$$

Because the operator is defined as a limit, namely

$$A_\theta\varphi(y) = \lim_{t \to 0}\frac{1}{t}\left[E\left(\varphi(y_t)\mid y_0 = y\right) - \varphi(y)\right],$$

it does not necessarily exist for all square integrable functions $\varphi$ but only for a restricted domain $D$. A set of moment conditions can now be obtained for this class of functions $\varphi \in D$. Indeed, as shown for instance by Revuz and Yor (1991), the following equalities hold:

$$E\left[A_\theta\varphi(y_t)\right] = 0 , \tag{5.3.3}$$

$$E\left[A_\theta\varphi(y_{t+1})\,\psi(y_t) - \varphi(y_{t+1})\,A_\theta^*\psi(y_t)\right] = 0 , \tag{5.3.4}$$

where $A_\theta^*$ is the adjoint infinitesimal operator of $A_\theta$ for the scalar product associated with the invariant measure of the process $y$.³⁷ By choosing an appropriate set of functions, Hansen and Scheinkman exploit moment conditions (5.3.3) and (5.3.4) to construct a GMM estimator of $\theta$. The choice of the functions $\varphi \in D$ and $\psi \in D^*$ determines what moments of the data are used to estimate the parameters. This obviously raises questions regarding the choice of functions to enhance the efficiency of the estimator, but first and foremost also the identification of $\theta$ via the conditions (5.3.3) and (5.3.4). It was noted at the beginning of the section that the multivariate process $y_t$, in order to cover SV models, must somehow include the latent conditional variance process. Gourieroux and Monfort (1994, 1995) point out that since the moment conditions based on $\varphi$ and $\psi$ cannot include any latent process, it will often (but not always) be impossible to attain identification of all the parameters, particularly those governing the latent volatility process. A possible remedy is to augment the model with observations indirectly related to the latent volatility process, in a sense making it observable. One possible candidate would be to include in $y_t$ both the security price and the Black-Scholes implied volatilities obtained through option market quotations for the underlying asset.
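To make (5.3.3) concrete, the sketch below assumes a hypothetical scalar Ornstein-Uhlenbeck diffusion $dy_t = \kappa(\alpha - y_t)\,dt + s\,dW_t$, whose generator is $A_\theta\varphi = \kappa(\alpha - y)\varphi' + \tfrac{1}{2}s^2\varphi''$, and uses the test functions $\varphi(y) = y$ and $\varphi(y) = y^2$. Function names and parameter values are illustrative assumptions. It also displays a simple form of the identification problem just mentioned: because $A_\theta$ is linear in $(\kappa, s^2)$, conditions of type (5.3.3) alone, whatever the test functions, only pin down $\alpha$ and the ratio $s^2/\kappa$, so conditions such as (5.3.4) are needed to separate $\kappa$ from $s^2$.

```python
import numpy as np


def simulate_ou(T, kappa, alpha, s2, dt=1.0, seed=0):
    """Exact sampling at interval dt of dy = kappa (alpha - y) dt + s dW, s**2 = s2."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-kappa * dt)
    sd = np.sqrt(s2 * (1.0 - rho ** 2) / (2.0 * kappa))
    y = np.empty(T)
    y[0] = alpha + rng.normal(0.0, np.sqrt(s2 / (2.0 * kappa)))  # stationary start
    for t in range(1, T):
        y[t] = alpha + rho * (y[t - 1] - alpha) + sd * rng.standard_normal()
    return y


def generator_moments(y, kappa, alpha, s2):
    """Sample analogues of E[A_theta varphi(y_t)] = 0, eq. (5.3.3), for the OU
    generator A varphi = kappa (alpha - y) varphi' + 0.5 s2 varphi''."""
    drift = kappa * (alpha - y)
    m1 = np.mean(drift)                  # varphi(y) = y:    varphi' = 1,  varphi'' = 0
    m2 = np.mean(drift * 2.0 * y + s2)   # varphi(y) = y**2: varphi' = 2y, varphi'' = 2
    return np.array([m1, m2])


y = simulate_ou(50_000, kappa=0.5, alpha=1.0, s2=0.09, seed=1)
m_true = generator_moments(y, 0.5, 1.0, 0.09)
m_scaled = generator_moments(y, 1.0, 1.0, 0.18)  # kappa and s2 both doubled
alpha_hat = y.mean()          # solves the first sample condition
ratio_hat = 2.0 * y.var()     # solves the second one for s2 / kappa
```

Doubling $\kappa$ and $s^2$ together simply doubles both sample moments, so they vanish at the same parameter ratio and the GMM criterion cannot tell the two parameter points apart.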
This approach is in fact suggested by Pastorello, Renault and Touzi (1993), although not in the context of continuous time GMM but instead using indirect inference methods, which will be discussed in Section 5.5.³⁸ Another possibility is to rely on the time deformation representation of SV models, as discussed in the context of continuous time GMM by Conley et al. (1995).

37 Please note that $A_\theta^*$ is again associated with a domain $D^*$ so that $\varphi \in D$ and $\psi \in D^*$ in (5.3.4).

38 It was noted in Section 2.1.3 that implied volatilities are biased. The indirect inference procedures used by Pastorello, Renault and Touzi (1993) can cope with such biases, as will be explained in Section 5.5. The use of option price data is further discussed in Section 5.7.

5.4. Simulated method of moments

The estimation procedures discussed so far do not involve any simulation techniques. From now on we cover methods combining simulation and estimation, beginning with the simulated method of moments (SMM) estimator, which is covered by Duffie and Singleton (1993) for time series processes.³⁹ In Section 5.1 we noted that GMM estimation of SV models is based on minimizing the distance between a set of chosen sample moments and unconditional population moments expressed as analytical functions of the model parameters. Suppose now that such analytical expressions are hard to obtain. This is particularly the case when such expressions involve marginalizations with respect to a latent process such as a stochastic volatility process. Could we then simulate data from the model for a particular value of the parameters and match moments from the simulated data with sample moments as a substitute? This strategy is precisely what SMM is all about. Indeed, quite often it is fairly straightforward to simulate processes and therefore take advantage of the SMM procedure. Let us consider again as point of reference and illustration the (multivariate) diffusion of the previous section (equation (5.3.1)) and conduct $H$ simulations $i = 1,\dots,H$ using a discretization:

$$\Delta y_t^i(\theta) = \mu\left(y_t^i(\theta);\theta\right) + \sigma\left(y_t^i(\theta);\theta\right)\varepsilon_t^i , \qquad i = 1,\dots,H ,\; t = 1,\dots,T ,$$

where the $y_t^i(\theta)$ are simulated given a parameter $\theta$ and $\varepsilon_t^i$ is i.i.d. Gaussian.⁴⁰ Subject to identification and other regularity conditions, one then considers

$$\hat\theta_{HT} = \arg\min_\theta\left\| f(y_1,\dots,y_T) - \frac{1}{H}\sum_{i=1}^{H} f\left(y_1^i(\theta),\dots,y_T^i(\theta)\right)\right\|$$

with a suitable choice of norm, i.e. weighting matrix for the quadratic form as in GMM, and function $f$ of the data, i.e. moment conditions. The asymptotic distribution theory is quite similar to that of GMM, except that simulation introduces an extra source of random error affecting the efficiency of the SMM estimator in comparison to its GMM counterpart. The efficiency loss can be controlled by the choice of $H$.⁴¹

39 SMM was originally proposed for cross-section applications, see Pakes and Pollard (1989) and McFadden (1989). See also Gourieroux and Monfort (1993a).

40 We discuss in detail the simulation techniques in the next section. Indeed, to control for the discretization bias, one has to simulate with a finer sampling interval.

41 The asymptotic variance of the SMM estimator depends on $H$ through a factor $(1 + H^{-1})$, see e.g. Gourieroux and Monfort (1995).

5.5. Indirect inference and moment matching

The key insight of the indirect inference approach of Gourieroux, Monfort and Renault (1993) and the moment matching approach of Gallant and Tauchen (1994) is the introduction of an auxiliary model parameterized by a vector, say $\beta$, in order to estimate the model of interest. In our case the latter is the SV model.⁴² In the first subsection we will describe the general principle, while the second will focus exclusively on estimating diffusions.

42 It is worth noting that the simulation based inference methods we will describe here are applicable to many other types of models for cross-sectional, time series and panel data.

5.5.1. The principle

We noted at the beginning of Section 5 that ARCH type models are relatively easy to estimate in comparison to SV models. For this reason an ARCH type model may be a possible candidate as an auxiliary model. An alternative strategy would be to try to summarize the features of the data via an SNP density as developed by Gallant and Tauchen (1989). This empirical SNP density, or more specifically its score, could also fulfill the role of auxiliary model. Other possibilities could be considered as well. The idea is then to use the auxiliary model to estimate $\beta$, so that:

$$\hat\beta_T = \arg\max_\beta \sum_{t=1}^{T}\log f^*(y_t \mid y_{t-1};\beta) \tag{5.5.1}$$

where we restrict our attention here to a simple dynamic model with one lag for the purpose of illustration. The objective function $f^*$ in (5.5.1) can be a pseudo-likelihood function when the auxiliary model is deliberately misspecified to facilitate estimation. As an alternative, $f^*$ can be taken from the class of SNP densities.⁴³ Gourieroux, Monfort and Renault then propose to estimate the same parameter vector $\beta$ not using the actual sample data but instead using samples $\{y_t^i(\theta)\}_{t=1}^{T}$ simulated $i = 1,\dots,H$ times, drawn from the model of interest given $\theta$. This yields a new estimator of $\beta$, namely:

$$\hat\beta_{HT}(\theta) = \arg\max_\beta\frac{1}{H}\sum_{i=1}^{H}\sum_{t=1}^{T}\log f^*\left(y_t^i(\theta)\mid y_{t-1}^i(\theta);\beta\right) . \tag{5.5.2}$$

The next step is to minimize a quadratic distance using a weighting matrix $W_T$ to choose an indirect estimator of $\theta$ based on $H$ simulation replications and a sample of $T$ observations, namely:

$$\hat\theta_{HT} = \arg\min_\theta\left(\hat\beta_T - \hat\beta_{HT}(\theta)\right)' W_T\left(\hat\beta_T - \hat\beta_{HT}(\theta)\right) . \tag{5.5.3}$$

The approach of Gallant and Tauchen (1994) avoids the step of estimating $\hat\beta_{HT}(\theta)$ by computing the score function of $f^*$ and minimizing a quadratic distance similar to (5.5.3), but involving the score function evaluated at $\hat\beta_T$ and replacing the sample data by simulated series generated by the model of interest. Under suitable regularity conditions the estimator $\hat\theta_{HT}$ is root-$T$ consistent and asymptotically normal. As with GMM and SMM there is again an optimal weighting matrix. The resulting asymptotic covariance matrix depends on the number of simulations in the same way the SMM estimator depends on $H$.
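A minimal sketch of the indirect inference steps (5.5.1) to (5.5.3), in the spirit of the MA(1) example discussed next: the model of interest is an MA(1) with parameter $\theta$, the auxiliary model is an AR(3) fitted by least squares (a Gaussian pseudo-likelihood), $W_T$ is the identity, and the minimization is a plain grid search. These choices, the function names and the grid are assumptions for illustration, not those of Gourieroux, Monfort and Renault (1993). The same standard normal draws are reused for every trial value of $\theta$ (common random numbers), which keeps the criterion smooth in $\theta$.

```python
import numpy as np


def ar_ols(paths, p):
    """Pooled least-squares AR(p) fit (no intercept) over a list of series.
    With a Gaussian pseudo-likelihood f*, this maximizes the criterion in
    (5.5.1) for one series, or (5.5.2) for H simulated series."""
    X_rows, z_rows = [], []
    for y in paths:
        T = len(y)
        # column j holds lag j + 1 of y, aligned with y[p:]
        X_rows.append(np.column_stack([y[p - j - 1:T - j - 1] for j in range(p)]))
        z_rows.append(y[p:])
    beta, *_ = np.linalg.lstsq(np.vstack(X_rows), np.concatenate(z_rows), rcond=None)
    return beta


def simulate_ma1(theta, shocks):
    """y_t = e_t + theta * e_{t-1} from a fixed array of N(0, 1) shocks."""
    return shocks[1:] + theta * shocks[:-1]


def indirect_inference_ma1(y, p=3, H=10, grid=np.linspace(-0.9, 0.9, 181), seed=0):
    rng = np.random.default_rng(seed)
    beta_T = ar_ols([y], p)                          # auxiliary fit on the data
    shocks = rng.standard_normal((H, len(y) + 1))    # common random numbers
    best_theta, best_obj = None, np.inf
    for theta in grid:
        sims = [simulate_ma1(theta, shocks[i]) for i in range(H)]
        d = beta_T - ar_ols(sims, p)                 # beta_T - beta_HT(theta)
        obj = d @ d                                  # (5.5.3) with W_T = identity
        if obj < best_obj:
            best_theta, best_obj = float(theta), obj
    return best_theta


rng = np.random.default_rng(123)
e = rng.standard_normal(1001)
y_obs = e[1:] + 0.5 * e[:-1]          # "observed" MA(1) sample with theta = 0.5
theta_hat = indirect_inference_ma1(y_obs)
```

With $T = 1000$ observations generated with $\theta = 0.5$, the indirect estimate should land close to 0.5.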
Gourieroux, Monfort and Renault (1993) illustrated the use of the indirect inference estimator with a simple example that we would like to briefly discuss here. Typically AR models are easy to estimate while MA models require more elaborate procedures. Suppose the model of interest is a moving average model of order one with parameter $\theta$. Instead of estimating the MA parameter directly from the data, they propose to estimate an AR($p$) model involving the parameter vector $\beta$. The next step then consists of simulating data using the MA model and proceeding further as described above.⁴⁴ They found that the indirect inference estimator $\hat\theta_{HT}$ appeared to have better finite sample properties than the more traditional maximum likelihood estimators for the MA parameter. In fact the indirect inference estimator exhibited features similar to the median unbiased estimator proposed by Andrews (1993). These properties were confirmed and clarified by Gourieroux, Renault and Touzi (1994), who studied the second order asymptotic expansion of indirect inference estimators and their ability to reduce finite sample bias.

43 The discussion should not leave the impression that the auxiliary model can only be estimated via ML-type estimators. Any root-$T$ consistent asymptotically normal estimation procedure may be used.

5.5.2. Estimating diffusions

Let us consider the same diffusion equation as in Section 5.3, which dealt with continuous time GMM, namely:

$$dy_t = \mu(y_t;\theta)\,dt + \sigma(y_t;\theta)\,dW_t . \tag{5.5.4}$$

In Section 5.3 we noted that the above equation holds under certain restrictions, such as the functions $\mu$ and $\sigma$ being restricted to $y_t$ as arguments. While these restrictions were binding for the setup of Section 5.3, this will not be the case for the estimation procedures discussed here. Indeed, equation (5.5.4) is only used as an illustrative example. The diffusion is then simulated either via exact discretizations or some type of approximate discretization (e.g. Euler or Mil'shtein, see Pardoux and Talay (1985) or Kloeden and Platen (1992) for further details). More precisely, we define the process $y_t^{(\delta)}$ such that:

$$y_{(k+1)\delta}^{(\delta)} = y_{k\delta}^{(\delta)} + \mu\left(y_{k\delta}^{(\delta)};\theta\right)\delta + \sigma\left(y_{k\delta}^{(\delta)};\theta\right)\varepsilon_{(k+1)\delta}\,\delta^{1/2} . \tag{5.5.5}$$

Under suitable regularity conditions (see for instance Stroock and Varadhan (1979)) we know that the diffusion admits a unique solution (in distribution) and the process $y_t^{(\delta)}$ converges to $y_t$ as $\delta$ goes to zero. Therefore one can expect to simulate $y_t$ quite accurately for $\delta$ sufficiently small. The auxiliary model may be a discretization of (5.5.4) choosing $\delta = 1$.
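The Euler scheme (5.5.5), combined with the practice described in the next paragraph of simulating with a small $\delta$ and keeping only the values at the observation dates, can be sketched as follows. The Ornstein-Uhlenbeck drift and volatility in the example are hypothetical choices, picked because the exact unit-interval autocorrelation $e^{-\kappa}$ is known, which makes the discretization bias visible.

```python
import numpy as np


def euler_sample(mu, sigma, y0, n_obs, delta, rng):
    """Euler scheme (5.5.5) with step delta, recording y only at integer
    times, i.e. at the (unit) sampling frequency of the observed data."""
    steps = int(round(1.0 / delta))            # sub-steps per unit interval
    sqrt_delta = np.sqrt(delta)
    z = rng.standard_normal((n_obs, steps))    # the epsilon_{(k+1) delta}
    y = y0
    out = np.empty(n_obs)
    for t in range(n_obs):
        for k in range(steps):
            y = y + mu(y) * delta + sigma(y) * sqrt_delta * z[t, k]
        out[t] = y
    return out


def lag1_autocorr(x):
    x = x - x.mean()
    return float(x[1:] @ x[:-1] / (x @ x))


KAPPA, S = 0.5, 0.3   # hypothetical OU parameters, long-run mean 0


def ou_drift(y):
    return -KAPPA * y


def ou_vol(y):
    return S


rng = np.random.default_rng(7)
fine = euler_sample(ou_drift, ou_vol, 0.0, 10_000, 0.1, rng)    # delta = 1/10
coarse = euler_sample(ou_drift, ou_vol, 0.0, 10_000, 1.0, rng)  # delta = 1
acf_fine, acf_coarse = lag1_autocorr(fine), lag1_autocorr(coarse)
```

With $\kappa = 0.5$, the $\delta = 1$ scheme implies a lag-one autocorrelation of $1 - \kappa = 0.5$, whereas $\delta = 1/10$ gives $0.95^{10} \approx 0.599$, close to the exact value $e^{-0.5} \approx 0.607$.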
Hence, one formulates an ML estimator based on the nonlinear AR model appearing in (5.5.5), setting $\delta = 1$. To control for the discretization bias, one can simulate the underlying diffusion with $\delta = 1/10$ or $1/20$, for instance, and aggregate the simulated data to correspond with the sampling frequency of the DGP. Broze, Scaillet and Zakoïan (1994) discuss the effect of the simulation step size on the asymptotic distribution. The use of simulation-based inference methods becomes particularly appropriate and attractive when diffusions involve latent processes, as is the case with SV models. Gourieroux and Monfort (1994, 1995) discuss several examples and study their performance via Monte Carlo simulation. It should be noted that estimating the diffusion at a coarser discretization is not the only possible choice of auxiliary model. Indeed, Pastorello, Renault and Touzi (1993), Engle and Lee (1994) and Gallant and Tauchen (1994) suggest the use of ARCH-type models. There have been several successful applications of these methods to financial time series. They include Broze et al. (1995), Engle and Lee (1994), Gallant, Hsieh and Tauchen (1994), Gallant and Tauchen (1994, 1995), Ghysels, Gourieroux and Jasiak (1995b), Ghysels and Jasiak (1994a,b) and Pastorello et al. (1993), among others.

44 Again one could use a score principle here, following Gallant and Tauchen (1994). In fact, in a linear Gaussian setting the SNP approach to fit data generated by an MA(1) model would be to estimate an AR($p$) model. Ghysels, Khalaf and Vodounou (1994) provide a more detailed discussion of score-based and indirect inference estimators of MA models as well as their relation with more standard estimators.

5.6. Likelihood-based and Bayesian methods

In a Gaussian linear state space model the likelihood function is constructed from the one step ahead prediction errors. This prediction error decomposition form of the likelihood is used as the criterion function in QML, but of course it is not the exact likelihood in this case. The exact filter proposed by Watanabe (1993) will, in principle, yield the exact likelihood. However, as was noted in Section 3.4.2, because this filter uses numerical integration, it takes a long time to compute, and if numerical optimization is to be carried out with respect to the hyperparameters it becomes impractical.

Kim and Shephard (1994) work with the linear state space form used in QML but approximate the $\log\chi^2$ distribution of the measurement error by a mixture of normals. For each of these normals, a prediction error decomposition likelihood function can be computed. A simulated EM algorithm is used to find the best mixture and hence calculate approximate ML estimates of the hyperparameters.
The exact likelihood function can also be constructed as a mixture of distributions for the observations conditional on the volatilities, that is

$$L(y;\phi,\sigma_\eta^2,\sigma^2) = \int p(y\mid h)\,p(h)\,dh ,$$

where $y$ and $h$ contain the $T$ elements of $y_t$ and $h_t$ respectively. This expression can be written in terms of the $\sigma_t^2$'s, rather than their logarithms, the $h_t$'s, but it makes little difference to what follows. Of course the problem is that the above likelihood has no closed form, so it must be calculated by some kind of simulation method. Excellent discussions can be found in Shephard (1995) and in Jacquier, Polson and Rossi (1994), including the comments. Conceptually, the simplest approach is to use Monte Carlo integration by drawing from the unconditional distribution of $h$ for given values of the parameters, $(\phi, \sigma_\eta^2, \sigma^2)$, and estimating the likelihood as the average of the $p(y\mid h)$'s. This is then repeated, searching over $\phi$, $\sigma_\eta^2$ and $\sigma^2$ until the maximum of the simulated likelihood is found. As it stands this procedure is not very satisfactory, but it may be improved by using ideas of importance sampling. This has been implemented for ML estimation of SV models by Danielsson and Richard (1993) and Danielsson (1994). However, the method becomes more difficult as the sample size increases.

A more promising way of attacking likelihood estimation by simulation techniques is to use Markov Chain Monte Carlo (MCMC) to draw from the distribution of volatilities conditional on the observations. Ways in which this can be done were outlined in sub-Section 3.4.2 on nonlinear filters and smoothers. Kim and Shephard (1994) suggest a method of computing ML estimators by putting their multimove algorithm within a simulated EM algorithm. Jacquier, Polson and Rossi (1994) adopt a Bayesian approach in which the specification of the model has a hierarchical structure, where a prior distribution for the hyperparameters, $\varphi = (\sigma_\eta, \phi, \sigma)'$, joins the conditional distributions $y\mid h$ and $h\mid\varphi$. (Actually the $\sigma_t$'s are used rather than the $h_t$'s.) The joint posterior of $h$ and $\varphi$ is proportional to the product of these three distributions, that is $p(h,\varphi\mid y) \propto p(y\mid h)\,p(h\mid\varphi)\,p(\varphi)$. The introduction of $h$ makes the statistical treatment tractable and is an example of what is called data augmentation; see Tanner and Wong (1987). From the joint posterior, $p(h,\varphi\mid y)$, the marginal $p(h\mid y)$ solves the smoothing problem for the unobserved volatilities, taking account of the sampling variability in the hyperparameters. Conditional on $h$, the posterior of $\varphi$, $p(\varphi\mid h, y)$, is simple to compute from the standard Bayesian treatment of linear models. If it were also possible to sample directly from $p(h\mid\varphi, y)$ at low cost, it would be straightforward to construct a Markov chain by alternating back and forth between drawing from $p(\varphi\mid h, y)$ and $p(h\mid\varphi, y)$. This would produce a cyclic chain, a special case of which is the Gibbs sampler.
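The remark that $p(\varphi\mid h, y)$ is simple to compute can be illustrated with one conditional draw of the volatility equation parameters given $h$, treating $h_t = \gamma + \phi h_{t-1} + \eta_t$ as a linear regression (the scale $\sigma$, which also enters $p(y\mid h)$, is left out). This is a sketch under simplifying assumptions: it uses the diffuse prior $p(\gamma, \phi, \sigma_\eta^2) \propto 1/\sigma_\eta^2$ with $|\phi| < 1$ imposed by rejection, rather than the proper truncated Normal-Gamma prior of JPR (1994), and the intercept $\gamma$ and the function name are illustrative. In a full sampler this step would alternate with draws of $h$ given the parameters, which are not shown.

```python
import numpy as np


def draw_ar1_params(h, rng, max_tries=1000):
    """One draw of (gamma, phi, sig2_eta) from p(. | h) for
    h_t = gamma + phi * h_{t-1} + eta_t, eta_t ~ N(0, sig2_eta), under the
    diffuse prior p(gamma, phi, sig2_eta) ∝ 1 / sig2_eta truncated to |phi| < 1."""
    X = np.column_stack([np.ones(len(h) - 1), h[:-1]])
    z = h[1:]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ z)                   # OLS estimate = posterior mean of coefficients
    ssr = float(np.sum((z - X @ b) ** 2))
    dof = len(z) - X.shape[1]
    chol = np.linalg.cholesky(XtX_inv)
    for _ in range(max_tries):
        sig2 = ssr / rng.chisquare(dof)                          # scaled inverse chi-square
        coef = b + np.sqrt(sig2) * (chol @ rng.standard_normal(2))  # N(b, sig2 (X'X)^-1)
        if abs(coef[1]) < 1.0:                                   # keep stationary draws only
            return coef[0], coef[1], sig2
    raise RuntimeError("no draw with |phi| < 1 accepted")


rng = np.random.default_rng(3)
T, phi, sig2_eta = 2000, 0.95, 0.05
h = np.empty(T)
h[0] = rng.normal(0.0, np.sqrt(sig2_eta / (1.0 - phi ** 2)))
for t in range(1, T):
    h[t] = phi * h[t - 1] + rng.normal(0.0, np.sqrt(sig2_eta))
draws = np.array([draw_ar1_params(h, rng) for _ in range(2000)])
post_mean = draws.mean(axis=0)   # posterior means of (gamma, phi, sig2_eta)
```

Rejecting joint draws with $|\phi| \ge 1$ yields exact draws from the truncated posterior; it is only efficient when little posterior mass lies outside the stationary region.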
However, as was noted in sub-Section 3.4.2, Jacquier, Polson and Rossi (1994) show that it is much better to decompose $p(h\mid\varphi, y)$ into a set of univariate distributions in which each $h_t$, or rather $\sigma_t$, is conditioned on all the others. The prior distribution for $\omega$, the parameters of the volatility process in JPR (1994), is the standard conjugate prior for the linear model, a (truncated) Normal-Gamma. The priors can be made extremely diffuse while remaining proper. JPR conduct an extensive sampling experiment to document the performance of this and more traditional approaches. Simulating stochastic volatility series, they compare the sampling performance of the posterior mean with that of the QML and GMM point estimates. The MCMC posterior means exhibit root mean squared errors anywhere between half and a quarter of the size of those of the GMM and QML point estimates. Even more striking are the volatility smoothing performance results. The root mean squared error of the posterior mean of $h_t$ produced by the Bayesian filter is 10% smaller than that of the point estimate produced by an approximate Kalman filter supplied with the true parameters. Shephard and Kim, in their comment on JPR (1994), point out that for very high $\phi$ and small $\sigma_\eta$, the rate of convergence of the JPR algorithm will slow down. More draws will then be required to obtain the same amount of information. They propose to approximate the volatility disturbance with a discrete mixture of normals. The benefit of the method is that a draw of the whole vector $h$ is then possible, which is faster than $T$ separate draws of each $h_t$. However, this comes at the cost that the draws navigate in a much higher dimensional space due to the discretisation effected.

Also, the convergence of chains based upon discrete mixtures is sensitive to the number of components and their assigned probability weights. Mahieu and Schotman (1994) add some generality to the Shephard and Kim idea by letting the data produce estimates of the characteristics of the discretized state space (probabilities, mean and variance). The original implementation of the JPR algorithm was limited to a very basic model of stochastic volatility, AR(1) with uncorrelated mean and volatility disturbances. In a univariate setup, correlated disturbances are likely to be important for stock returns, i.e., the so called leverage effect. The evidence in Gallant, Rossi and Tauchen (1994) also points to non-normal conditional errors with both skewness and kurtosis. Jacquier, Polson and Rossi (1995a) show how the hierarchical framework allows the convenient extension of the MCMC algorithm to more general models. Namely, they estimate univariate stochastic volatility models with correlated disturbances and skewed and fat-tailed variance disturbances, as well as multivariate models. Alternatively, the MCMC algorithm can be extended to a factor structure. The factors exhibit stochastic volatility and can be observable or non-observable.

5.7. Inference and option price data

Some of the continuous time SV models currently found in the literature were developed to answer questions regarding derivative security pricing. Given this rather explicit link between derivatives and SV diffusions, it is perhaps somewhat surprising that relatively little attention has been paid to the use of option price data to estimate continuous time diffusions. Melino (1994) in his survey in fact notes: "Clearly, information about the stochastic properties of an asset's price is contained both in the history of the asset's price and the price of any options written on it.
Current strategies for combining these two sources of information, including implicit estimation, are uncomfortably ad hoc. Statistically speaking, we need to model the source of the prediction errors in option pricing and to relate the distribution of these errors to the stock price process." For example, implicit estimation, like the computation of BS implied volatilities, is certainly uncomfortably ad hoc from a statistical point of view. In general, each observed option price introduces one source of prediction error when compared to a pricing model. The challenge is to model the joint nondegenerate probability distribution of options and asset prices via a number of unobserved state variables. This approach has been pursued in a number of recent papers, including Christensen (1992), Renault and Touzi (1992), Pastorello et al. (1993), Duan (1994) and Renault (1995). Christensen (1992) considers a pricing model for $n$ assets as a function of a state vector $x_t$ which is $(l + n)$-dimensional and divided into an $l$-dimensional observed ($z_t$) and an $n$-dimensional unobserved ($\omega_t$) component. Let $p_t$ be the price vector of the $n$ assets; then:

$$p_t = m(z_t, \omega_t, \theta) . \tag{5.7.1}$$

Equation (5.7.1) provides a one-to-one relationship between the $n$ latent state variables $\omega_t$ and the $n$ observed prices $p_t$, for given $z_t$ and $\theta$. From a financial viewpoint, it implies that the $n$ assets are appropriate instruments to complete the markets if we assume that the observed state variables $z_t$ are already mimicked by the price dynamics of other (primitive) assets. Moreover, from a statistical viewpoint it allows full structural maximum likelihood estimation, provided the log-likelihood function for observed prices can be deduced easily from a statistical model for $x_t$. For instance, in a Markovian setting where, conditionally on $x_0$, the joint distribution of $x_1^T = (x_t)_{1\le t\le T}$ is given by the density

$$f_x(x_1^T \mid x_0, \theta) = \prod_{t=1}^{T} f(z_t, \omega_t \mid z_{t-1}, \omega_{t-1}, \theta) , \tag{5.7.2}$$

the conditional distribution of the data $D_1^T = (p_t, z_t)_{1\le t\le T}$ given $D_0 = (p_0, z_0)$ is obtained by the usual Jacobian formula:

$$f_D(D_1^T \mid D_0, \theta) = \prod_{t=1}^{T} f\left[z_t, m_\theta^{-1}(z_t, p_t) \mid z_{t-1}, m_\theta^{-1}(z_{t-1}, p_{t-1}), \theta\right] \times \left|\nabla_\omega m\left(z_t, m_\theta^{-1}(z_t, p_t), \theta\right)\right|^{-1} \tag{5.7.3}$$

where $m_\theta^{-1}(z, \cdot)$ is the $\omega$-inverse of $m(z, \cdot, \theta)$, defined formally by $m_\theta^{-1}(z, m(z, \omega, \theta)) = \omega$, while $\nabla_\omega m(\cdot)$ represents the columns corresponding to $\omega$ of the Jacobian matrix. This MLE using price data of derivatives was proposed independently by Christensen (1992) and Duan (1994). Renault and Touzi (1992) were instead more specifically interested in the Hull and White option pricing formula with $z_t = S_t$, the observed underlying asset price, and $\omega_t = \sigma_t$, the unobserved stochastic volatility process.
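The change of variables behind (5.7.3) can be checked in the simplest scalar case: one asset, no observed $z_t$, a single date. The sketch below assumes a hypothetical strictly increasing pricing map $m$, inverts it numerically by bisection and differentiates it by central differences; with $m(\omega) = e^{\omega}$ and normal $\omega$, the implied price density must be lognormal, which provides a check. The function names and the bracketing interval are illustrative assumptions.

```python
import math


def invert_increasing(m, p, lo, hi, tol=1e-12, max_iter=200):
    """Solve m(omega) = p for a strictly increasing scalar map m by bisection."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if m(mid) < p:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def price_density(p, m, f_omega, lo=-50.0, hi=50.0, eps=1e-6):
    """Density of an observed price p = m(omega) given the density f_omega of
    the latent state: f_omega(m^{-1}(p)) * |dm/domega|^{-1}, cf. (5.7.3)."""
    w = invert_increasing(m, p, lo, hi)
    dm = (m(w + eps) - m(w - eps)) / (2.0 * eps)   # numerical Jacobian dm/domega
    return f_omega(w) / abs(dm)


def normal_pdf(x, mean=0.0, sd=0.2):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical "pricing map" m(omega) = exp(omega) with omega ~ N(0, 0.2**2):
# the implied price density is then lognormal.
dens = price_density(1.1, math.exp, normal_pdf)
lognormal = normal_pdf(math.log(1.1)) / 1.1
```

In (5.7.3) the same inversion and Jacobian correction are applied at every date, inside the product with the transition density of the state.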
Then, with the joint process $x_t = (S_t, \sigma_t)$ being Markovian, we have a call price of the form

$$C_t = m(x_t, \theta, K) ,$$

where $\theta = (\alpha', \gamma')'$ involves two types of parameters: (1) the vector $\alpha$ of parameters describing the dynamics of the joint process $x_t = (S_t, \sigma_t)$, which, under the equivalent martingale measure, allows one to compute the expectation with respect to the (risk-neutral) conditional probability distribution of $\gamma^2(t, t+h)$ given $\omega_t$; and (2) the vector $\gamma$ of parameters which characterize the risk premia determining the relation between the risk-neutral probability distribution of the $x$ process and the Data Generating Process. Structural MLE is often difficult to implement. This motivated Renault and Touzi (1992) and Pastorello, Renault and Touzi (1993) to consider less efficient but simpler and more robust procedures involving some proxies of the structural likelihood (5.7.3). To illustrate these procedures, let us consider the standard log-normal SV model in continuous time:

$$d\log\sigma_t = k\left(a - \log\sigma_t\right)dt + c\,dW_t^{\sigma} . \tag{5.7.4}$$

Standard option pricing arguments allow us to ignore misspecifications of the drift of the underlying asset price process. Hence, a first step towards simplicity and robustness is to isolate from the likelihood function the volatility dynamics, namely:

$$\prod_{i=1}^{n}\left(2\pi c^2\right)^{-1/2}\exp\left[-\left(2c^2\right)^{-1}\left(\log\sigma_{t_i} - e^{-k\Delta t}\log\sigma_{t_{i-1}} - a\left(1 - e^{-k\Delta t}\right)\right)^2\right] \tag{5.7.5}$$

associated with a sample $\sigma_{t_i}$, $i = 1,\dots,n$, and $t_i - t_{i-1} = \Delta t$. To approximate this expression one can consider a direct method, as in Renault and Touzi (1992), or an indirect method, as in Pastorello et al. (1993). The former involves calculating implied volatilities from the Hull and White model to create pseudo samples $\sigma_{t_i}$ parameterized by $k$, $a$ and $c$, and computing the maximum of (5.7.5) with respect to those three parameters.⁴⁵ Pastorello et al. (1993) proposed several indirect inference methods, described in Section 5.5, in the context of (5.7.5). For instance, they propose to use an indirect inference strategy involving GARCH(1,1) volatility estimates obtained from the underlying asset (also independently suggested by Engle and Lee (1994)). This produces asymptotically unbiased but rather inefficient estimates. Pastorello et al. indeed find that an indirect inference simplification of the Renault and Touzi direct procedure involving option prices is far more efficient. It is a clear illustration of the intuition that the use of option price data paired with suitable statistical methods should largely improve the accuracy of estimating volatility diffusion parameters.

45 The direct maximization of (5.7.5) using BS implied volatilities has also been proposed, see e.g. Heynen, Kemna and Vorst (1994). Obviously the use of BS implied volatility induces a misspecification bias due to the BS model assumptions.

5.8. Regression models with stochastic volatility

A single equation regression model with stochastic volatility in the disturbance term may be written

$$y_t = x_t'\beta + u_t , \qquad t = 1,\dots,T , \tag{5.8.1}$$

where $y_t$ denotes the $t$th observation, $x_t$ is a $k \times 1$ vector of explanatory variables, $\beta$ is a $k \times 1$ vector of coefficients and $u_t = \sigma\varepsilon_t\exp(0.5h_t)$ as discussed in Section 3. As a special case, the observations may simply have a non-zero mean, so that $x_t'\beta = \mu$. Since $u_t$ is stationary, an OLS regression of $y_t$ on $x_t$ yields a consistent estimator of $\beta$. However, it is not efficient.
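The feasible GLS estimator (5.8.3) and the covariance estimator (5.8.4) given below are easy to code once log-volatility estimates are available. The sketch is illustrative: it plugs in the true simulated $h_t$ rather than a smoothed estimate $h_{t|T}$ (so it is an infeasible GLS benchmark), sets $\sigma = 1$, and uses hypothetical names and parameter values. Setting $h_t = 0$ reduces it to OLS with a heteroskedasticity-robust covariance, which allows a small Monte Carlo comparison of the two slope estimators.

```python
import numpy as np


def sv_gls(y, X, h):
    """Weighted LS of the form (5.8.3) and sandwich covariance (5.8.4),
    weighting observation t by exp(-h[t]); h = 0 gives OLS with a
    White-type covariance."""
    w = np.exp(-h)
    A = X.T @ (w[:, None] * X)                  # sum_t e^{-h_t} x_t x_t'
    beta = np.linalg.solve(A, X.T @ (w * y))
    r = y - X @ beta
    B = X.T @ (((r * w) ** 2)[:, None] * X)     # sum_t r_t^2 e^{-2 h_t} x_t x_t'
    A_inv = np.linalg.inv(A)
    return beta, A_inv @ B @ A_inv


def simulate_sv_regression(T, beta, phi, sig2_eta, rng):
    """y_t = x_t' beta + eps_t exp(h_t / 2) with AR(1) log-volatility h_t."""
    h = np.empty(T)
    h[0] = rng.normal(0.0, np.sqrt(sig2_eta / (1.0 - phi ** 2)))
    for t in range(1, T):
        h[t] = phi * h[t - 1] + rng.normal(0.0, np.sqrt(sig2_eta))
    X = np.column_stack([np.ones(T), rng.standard_normal(T)])
    return X @ beta + rng.standard_normal(T) * np.exp(0.5 * h), X, h


rng = np.random.default_rng(2024)
beta_true = np.array([0.1, 1.0])
ols_slopes, gls_slopes = [], []
for _ in range(300):
    y, X, h = simulate_sv_regression(500, beta_true, 0.95, 0.05, rng)
    ols_slopes.append(sv_gls(y, X, np.zeros_like(h))[0][1])   # h = 0: plain OLS
    gls_slopes.append(sv_gls(y, X, h)[0][1])                  # true h: infeasible GLS
```

Under these assumptions the sampling variance of the GLS slope should be noticeably smaller than that of OLS, in line with the efficiency statement above.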

182 E. Ghysels, A. C. Harvey and E. Renault

For given values of the SV parameters, $\phi$ and $\sigma_\eta^2$, a smoothed estimator of $h_t$, $\tilde h_{t|T}$, can be computed using one of the methods outlined in Section 3.4. Multiplying (5.8.1) through by $\exp(-0.5\tilde h_{t|T})$ gives

$$\tilde y_t = \tilde x_t'\beta + \tilde u_t, \qquad t = 1, \ldots, T, \qquad (5.8.2)$$

where $\tilde y_t = e^{-0.5\tilde h_{t|T}} y_t$ and $\tilde x_t = e^{-0.5\tilde h_{t|T}} x_t$, and the $\tilde u_t$'s can be thought of as heteroskedasticity-corrected disturbances. Harvey and Shephard (1993) show that these disturbances have zero mean and constant variance and are serially uncorrelated, and hence suggest the construction of a feasible GLS estimator

$$\tilde\beta = \left(\sum_{t=1}^{T} e^{-\tilde h_{t|T}} x_t x_t'\right)^{-1} \sum_{t=1}^{T} e^{-\tilde h_{t|T}} x_t y_t . \qquad (5.8.3)$$

In the classical heteroskedastic regression model, $h_t$ is deterministic and depends on a fixed number of unknown parameters. Because these parameters can be estimated consistently, the feasible GLS estimator has the same asymptotic distribution as the GLS estimator. Here $h_t$ is stochastic and the MSE of its estimator is $O(1)$, that is, it does not vanish as $T$ grows, so the situation is somewhat different. Harvey and Shephard (1993) show that, under standard regularity conditions on the sequence of $x_t$, $\tilde\beta$ is asymptotically normal with mean $\beta$ and a covariance matrix which can be consistently estimated by

$$\widehat{\operatorname{avar}}(\tilde\beta) = \left(\sum_{t=1}^{T} e^{-\tilde h_{t|T}} x_t x_t'\right)^{-1} \left(\sum_{t=1}^{T} (y_t - x_t'\tilde\beta)^2 e^{-2\tilde h_{t|T}} x_t x_t'\right) \left(\sum_{t=1}^{T} e^{-\tilde h_{t|T}} x_t x_t'\right)^{-1} . \qquad (5.8.4)$$

When $\tilde h_{t|T}$ is the smoothed estimate given by the linear state space form, the analysis in Harvey and Shephard (1993) suggests that, asymptotically, the feasible GLS estimator is almost as efficient as the GLS estimator and considerably more efficient than the OLS estimator. It would be possible to replace $\exp(\tilde h_{t|T})$ by a better estimate computed from one of the methods described in Section 3.4, but this may not have much effect on the efficiency of the resulting feasible GLS estimator of $\beta$. When $h_t$ is nonstationary, or nearly nonstationary, Hansen (1995) shows that it is possible to construct a feasible adaptive least squares estimator which is asymptotically equivalent to GLS.

Conclusions

No survey is ever complete.
There are two particular areas which we expect to flourish in the years to come but which we were not able to cover. The first is market microstructure, which is well surveyed in a recent review paper by Goodhart and O'Hara (1995). With the ever-increasing availability of high-frequency data series, we anticipate more work involving game-theoretic models. These can now be estimated because of recent advances in econometric methods, similar to those enabling us to estimate diffusions. The second area where we expect interesting research to emerge involves nonparametric procedures to estimate SV continuous-time and derivative securities models. Recent papers include Ait-Sahalia (1994), Ait-Sahalia et al. (1994), Bossaerts, Hafner and Härdle (1995), Broadie et al. (1995), Conley et al. (1995), Elsheimer et al. (1995), Gourieroux, Monfort and Tenreiro (1994), Gourieroux and Scaillet (1995), Hutchinson, Lo and Poggio (1994), Lezan et al. (1995), Lo (1995) and Pagan and Schwert (1992).

Research into the econometrics of stochastic volatility models is relatively new. As our survey has shown, there has been a burst of activity in recent years drawing on the latest statistical technology. As regards the relationship with ARCH, our view is that SV and ARCH are not necessarily direct competitors but rather complement each other in certain respects. Recent advances such as the use of ARCH models as filters, the weakening of GARCH and temporal aggregation, and the introduction of nonparametric methods to fit conditional variances illustrate that a unified strategy for modelling volatility needs to draw on both ARCH and SV.

References

Abramowitz, M. and I. A. Stegun (1970). Handbook of Mathematical Functions. Dover Publications Inc., New York.
Ait-Sahalia, Y. (1994). Nonparametric pricing of interest rate derivative securities. Discussion Paper, Graduate School of Business, University of Chicago.
Ait-Sahalia, Y., P. J. Bickel and T. M. Stoker (1994). Goodness-of-fit tests for regression using kernel methods. Discussion Paper, University of Chicago.
Amin, K. I. and V. Ng (1993). Equilibrium option valuation with systematic stochastic volatility. J. Finance 48, 881-910.
Andersen, T. G. (1992). Volatility. Discussion Paper, Northwestern University.
Andersen, T. G. (1994). Stochastic autoregressive volatility: A framework for volatility modeling. Math. Finance 4, 75-102.
Andersen, T. G. (1996). Return volatility and trading volume: An information flow interpretation of stochastic volatility. J. Finance, to appear.
Andersen, T. G. and T. Bollerslev (1995). Intraday seasonality and volatility persistence in financial markets. J. Emp. Finance, to appear.
Andersen, T. G. and B. Sørensen (1993). GMM estimation of a stochastic volatility model: A Monte Carlo study. J. Business Econom. Statist., to appear.
Andersen, T. G. and B. Sørensen (1996). GMM and QML asymptotic standard deviations in stochastic volatility models: A response to Ruiz (1994). J. Econometrics, to appear.
Andrews, D. W. K. (1993). Exactly median-unbiased estimation of first order autoregressive/unit root models. Econometrica 61, 139-165.
Bachelier, L. (1900). Théorie de la spéculation. Ann. Sci. École Norm. Sup. 17, 21-86. [Also in: On the Random Character of Stock Market Prices (Paul H. Cootner, ed.), The MIT Press, Cambridge, Mass., 1964.]
Baillie, R. T. and T. Bollerslev (1989). The message in daily exchange rates: A conditional variance tale. J. Business Econom. Statist. 7, 297-305.
Baillie, R. T. and T. Bollerslev (1991). Intraday and interday volatility in foreign exchange rates. Rev. Econom. Stud. 58, 565-585.
Baillie, R. T., T. Bollerslev and H. O. Mikkelsen (1993). Fractionally integrated generalized autoregressive conditional heteroskedasticity. J. Econometrics, to appear.
Bajeux, I. and J. C. Rochet (1992). Dynamic spanning: Are options an appropriate instrument? Math. Finance, to appear.
Bates, D. S. (1995a). Testing option pricing models. In: G. S. Maddala ed., Handbook of Statistics, Vol. 14, Statistical Methods in Finance. North-Holland, Amsterdam, in this volume.
Bates, D. S. (1995b). Jumps and stochastic volatility: Exchange rate processes implicit in PHLX Deutschemark options. Rev. Financ. Stud., to appear.
Beckers, S. (1981). Standard deviations implied in option prices as predictors of future stock price variability. J. Banking Finance 5, 363-381.
Bera, A. K. and M. L. Higgins (1995). On ARCH models: Properties, estimation and testing. In: L. Oxley, D. A. R. George, C. J. Roberts and S. Sayer eds., Surveys in Econometrics. Basil Blackwell, Oxford. Reprinted from J. Econom. Surveys.
Black, F. (1976). Studies in stock price volatility changes. Proceedings of the 1976 Business Meeting of the Business and Economic Statistics Section, Amer. Statist. Assoc., 177-181.
Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. J. Politic. Econom. 81, 637-654.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. J. Econometrics 31, 307-327.
Bollerslev, T., R. Y. Chou and K. Kroner (1992). ARCH modelling in finance: A selective review of the theory and empirical evidence. J. Econometrics 52, 5-59.
Bollerslev, T. and R. Engle (1993). Common persistence in conditional variances. Econometrica 61, 166-187.
Bollerslev, T., R. Engle and D. Nelson (1994). ARCH models. In: R. F. Engle and D. McFadden eds., Handbook of Econometrics, Volume IV. North-Holland, Amsterdam.
Bollerslev, T., R. Engle and J. Wooldridge (1988). A capital asset pricing model with time varying covariances. J. Politic. Econom. 96, 116-131.
Bollerslev, T. and E. Ghysels (1994). On periodic autoregressive conditional heteroskedasticity. J. Business Econom. Statist., to appear.
Bollerslev, T. and H. O. Mikkelsen (1995). Modeling and pricing long-memory in stock market volatility. J. Econometrics, to appear.
Bossaerts, P., C. Hafner and W. Härdle (1995). Foreign exchange rates have surprising volatility. Discussion Paper, CentER, University of Tilburg.
Bossaerts, P. and P. Hillion (1995). Local parametric analysis of hedging in discrete time. J. Econometrics, to appear.
Breidt, F. J., N. Crato and P. de Lima (1993). Modeling long-memory stochastic volatility. Discussion Paper, Iowa State University.
Breidt, F. J. and A. L. Carriquiry (1995). Improved quasi-maximum likelihood estimation for stochastic volatility models. Mimeo, Department of Statistics, University of Iowa.
Broadie, M., J. Detemple, E. Ghysels and O. Torres (1995). American options with stochastic volatility: A nonparametric approach. Discussion Paper, CIRANO.
Broze, L., O. Scaillet and J. M. Zakoian (1994). Quasi indirect inference for diffusion processes. Discussion Paper, CORE.
Broze, L., O. Scaillet and J. M. Zakoian (1995). Testing for continuous time models of the short term interest rate. J. Emp. Finance, 199-223.
Campa, J. M. and P. H. K. Chang (1995). Testing the expectations hypothesis on the term structure of implied volatilities in foreign exchange options. J. Finance 50, to appear.
Campbell, J. Y. and A. S. Kyle (1993). Smart money, noise trading and stock price behaviour. Rev. Econom. Stud. 60, 1-34.
Canina, L. and S. Figlewski (1993). The informational content of implied volatility. Rev. Financ. Stud. 6, 659-682.
Canova, F. (1992). Detrending and business cycle facts. Discussion Paper, European University Institute, Florence.
Chesney, M. and L. Scott (1989). Pricing European currency options: A comparison of the modified Black-Scholes model and a random variance model. J. Financ. Quant. Anal. 24, 267-284.
Cheung, Y.-W. and F. X. Diebold (1994). On maximum likelihood estimation of the differencing parameter of fractionally-integrated noise with unknown mean. J. Econometrics 62, 301-316.
Chiras, D. P. and S. Manaster (1978). The information content of option prices and a test of market efficiency. J. Financ. Econom. 6, 213-234.
Christensen, B. J. (1992). Asset prices and the empirical martingale model. Discussion Paper, New York University.
Christie, A. A. (1982). The stochastic behavior of common stock variances: Value, leverage, and interest rate effects. J. Financ. Econom. 10, 407-432.
Clark, P. K. (1973). A subordinated stochastic process model with finite variance for speculative prices. Econometrica 41, 135-156.
Clewlow, L. and X. Xu (1993). The dynamics of stochastic volatility. Discussion Paper, University of Warwick.
Comte, F. and E. Renault (1993). Long memory continuous time models. J. Econometrics, to appear.
Comte, F. and E. Renault (1995). Long memory continuous time stochastic volatility models. Paper presented at the HFDF-I Conference, Zurich.
Conley, T., L. P. Hansen, E. Luttmer and J. Scheinkman (1995). Estimating subordinated diffusions from discrete time data. Discussion Paper, University of Chicago.
Cornell, B. (1978). Using the options pricing model to measure the uncertainty producing effect of major announcements. Financ. Mgmt. 7, 54-59.
Cox, J. C. (1975). Notes on option pricing I: Constant elasticity of variance diffusions. Discussion Paper, Stanford University.
Cox, J. C. and S. Ross (1976). The valuation of options for alternative stochastic processes. J. Financ. Econom. 3, 145-166.
Cox, J. C. and M. Rubinstein (1985). Options Markets. Prentice-Hall, Englewood Cliffs, New Jersey.
Dacorogna, M. M., U. A. Müller, R. J. Nagler, R. B. Olsen and O. V. Pictet (1993). A geographical model for the daily and weekly seasonal volatility in the foreign exchange market. J. Internat. Money Finance 12, 413-438.
Danielsson, J. (1994). Stochastic volatility in asset prices: Estimation with simulated maximum likelihood. J. Econometrics 61, 375-400.
Danielsson, J. and J. F. Richard (1993). Accelerated Gaussian importance sampler with application to dynamic latent variable models. J. Appl. Econometrics 8, S153-S174.
Dassios, A. (1995). Asymptotic expressions for approximations to stochastic variance models. Mimeo, London School of Economics.
Day, T. E. and C. M. Lewis (1988). The behavior of the volatility implicit in the prices of stock index options. J. Financ. Econom. 22, 103-122.
Day, T. E. and C. M. Lewis (1992). Stock market volatility and the information content of stock index options. J. Econometrics 52, 267-287.
Diebold, F. X. (1988). Empirical Modeling of Exchange Rate Dynamics. Springer-Verlag, New York.
Diebold, F. X. and J. A. Lopez (1995). Modeling volatility dynamics. In: K. Hoover ed., Macroeconomics: Developments, Tensions and Prospects.
Diebold, F. X. and M. Nerlove (1989). The dynamics of exchange rate volatility: A multivariate latent factor ARCH model. J. Appl. Econometrics 4, 1-22.
Ding, Z., C. W. J. Granger and R. F. Engle (1993). A long memory property of stock market returns and a new model. J. Emp. Finance 1, 83-108.
Diz, F. and T. J. Finucane (1993). Do the options markets really overreact? J. Futures Markets 13, 298-312.
Drost, F. C. and T. E. Nijman (1993). Temporal aggregation of GARCH processes. Econometrica 61, 909-927.
Drost, F. C. and B. J. M. Werker (1994). Closing the GARCH gap: Continuous time GARCH modelling. Discussion Paper, CentER, University of Tilburg.
Duan, J. C. (1994). Maximum likelihood estimation using price data of the derivative contract. Math. Finance 4, 155-167.
Duan, J. C. (1995). The GARCH option pricing model. Math. Finance 5, 13-32.
Duffie, D. (1989). Futures Markets. Prentice-Hall International Editions.
Duffie, D. (1992). Dynamic Asset Pricing Theory. Princeton University Press.
Duffie, D. and K. J. Singleton (1993). Simulated moments estimation of Markov models of asset prices. Econometrica 61, 929-952.
Dunsmuir, W. (1979). A central limit theorem for parameter estimation in stationary vector time series and its applications to models for a signal observed with noise. Ann. Statist. 7, 490-506.
Easley, D. and M. O'Hara (1992). Time and the process of security price adjustment. J. Finance 47, 577-605.
Ederington, L. H. and J. H. Lee (1993). How markets process information: News releases and volatility. J. Finance 48, 1161-1192.
Elsheimer, B., M. Fisher, D. Nychka and D. Zirvos (1995). Smoothing splines estimates of the discount function based on US bond prices. Discussion Paper, Federal Reserve, Washington, D.C.
Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007.
Engle, R. F. and C. W. J. Granger (1987). Co-integration and error correction: Representation, estimation and testing. Econometrica 55, 251-276.
Engle, R. F. and S. Kozicki (1993). Testing for common features. J. Business Econom. Statist. 11, 369-379.
Engle, R. F. and G. G. J. Lee (1994). Estimating diffusion models of stochastic volatility. Discussion Paper, University of California at San Diego.
Engle, R. F. and C. Mustafa (1992). Implied ARCH models from option prices. J. Econometrics 52, 289-311.
Engle, R. F. and V. K. Ng (1993). Measuring and testing the impact of news on volatility. J. Finance 48, 1749-1801.
Fama, E. F. (1963). Mandelbrot and the stable Paretian distribution. J. Business 36, 420-429.
Fama, E. F. (1965). The behavior of stock market prices. J. Business 38, 34-105.
Foster, D. and S. Viswanathan (1993a). The effect of public information and competition on trading volume and price volatility. Rev. Financ. Stud. 6, 23-56.
Foster, D. and S. Viswanathan (1993b). Can speculative trading explain the volume-volatility relation? Discussion Paper, Fuqua School of Business, Duke University.
French, K. and R. Roll (1986). Stock return variances: The arrival of information and the reaction of traders. J. Financ. Econom. 17, 5-26.
Gallant, A. R., D. A. Hsieh and G. Tauchen (1994). Estimation of stochastic volatility models with suggestive diagnostics. Discussion Paper, Duke University.
Gallant, A. R., P. E. Rossi and G. Tauchen (1992). Stock prices and volume. Rev. Financ. Stud. 5, 199-242.
Gallant, A. R., P. E. Rossi and G. Tauchen (1993). Nonlinear dynamic structures. Econometrica 61, 871-907.
Gallant, A. R. and G. Tauchen (1989). Semiparametric estimation of conditionally constrained heterogeneous processes: Asset pricing applications. Econometrica 57, 1091-1120.
Gallant, A. R. and G. Tauchen (1992). A nonparametric approach to nonlinear time series analysis: Estimation and simulation. In: E. Parzen, D. Brillinger, M. Rosenblatt, M. Taqqu, J. Geweke and P. Caines eds., New Dimensions in Time Series Analysis. Springer-Verlag, New York.
Gallant, A. R. and G. Tauchen (1994). Which moments to match? Econometric Theory, to appear.
Gallant, A. R. and G. Tauchen (1995). Estimation of continuous time models for stock returns and interest rates. Discussion Paper, Duke University.
Gallant, A. R. and H. White (1988). A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models. Basil Blackwell, Oxford.
Garcia, R. and E. Renault (1995). Risk aversion, intertemporal substitution and option pricing. Discussion Paper, CIRANO.
Geweke, J. (1994). Comment on Jacquier, Polson and Rossi. J. Business Econom. Statist. 12, 397-399.
Geweke, J. (1995). Monte Carlo simulation and numerical integration. In: H. Amman, D. Kendrick and J. Rust eds., Handbook of Computational Economics. North-Holland.
Ghysels, E., C. Gourieroux and J. Jasiak (1995a). Market time and asset price movements: Theory and estimation. Discussion Paper, CIRANO and C.R.D.E., Université de Montréal.
Ghysels, E., C. Gourieroux and J. Jasiak (1995b). Trading patterns, time deformation and stochastic volatility in foreign exchange markets. Paper presented at the HFDF Conference, Zurich.
Ghysels, E. and J. Jasiak (1994a). Comments on Bayesian analysis of stochastic volatility models. J. Business Econom. Statist. 12, 399-401.
Ghysels, E. and J. Jasiak (1994b). Stochastic volatility and time deformation: An application of trading volume and leverage effects. Paper presented at the Western Finance Association Meetings, Santa Fe.
Ghysels, E., L. Khalaf and C. Vodounou (1994). Simulation based inference in moving average models. Discussion Paper, CIRANO and C.R.D.E.
Ghysels, E., H. S. Lee and P. Siklos (1993). On the (mis)specification of seasonality and its consequences: An empirical investigation with U.S. data. Empirical Econom. 18, 747-760.
Goodhart, C. A. E. and M. O'Hara (1995). High frequency data in financial markets: Issues and applications. Paper presented at the HFDF Conference, Zurich.
Gourieroux, C. and A. Monfort (1993a). Simulation based inference: A survey with special reference to panel data models. J. Econometrics 59, 5-33.
Gourieroux, C. and A. Monfort (1993b). Pseudo-likelihood methods. In: Maddala et al. eds., Handbook of Statistics Vol. 11. North-Holland, Amsterdam.
Gourieroux, C. and A. Monfort (1994). Indirect inference for stochastic differential equations. Discussion Paper, CREST, Paris.
Gourieroux, C. and A. Monfort (1995). Simulation-Based Econometric Methods. CORE Lecture Series, Louvain-la-Neuve.
Gourieroux, C., A. Monfort and E. Renault (1993). Indirect inference. J. Appl. Econometrics 8, S85-S118.
Gourieroux, C., A. Monfort and C. Tenreiro (1994). Kernel M-estimators: Nonparametric diagnostics for structural models. Discussion Paper, CEPREMAP.
Gourieroux, C., A. Monfort and C. Tenreiro (1995). Kernel M-estimators and functional residual plots. Discussion Paper, CREST-ENSAE, Paris.
Gourieroux, C., E. Renault and N. Touzi (1994). Calibration by simulation for small sample bias correction. Discussion Paper, CREST.
Gourieroux, C. and O. Scaillet (1994). Estimation of the term structure from bond data. J. Emp. Finance, to appear.
Granger, C. W. J. and Z. Ding (1994). Stylized facts on the temporal and distributional properties of daily data for speculative markets. Discussion Paper, University of California, San Diego.
Hall, A. R. (1993). Some aspects of generalized method of moments estimation. In: Maddala et al. eds., Handbook of Statistics Vol. 11. North-Holland, Amsterdam.
Hamao, Y., R. W. Masulis and V. K. Ng (1990). Correlations in price changes and volatility across international stock markets. Rev. Financ. Stud. 3, 281-307.
Hansen, B. E. (1995). Regression with nonstationary volatility. Econometrica 63, 1113-1132.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029-1054.
Hansen, L. P. and J. A. Scheinkman (1995). Back to the future: Generating moment implications for continuous-time Markov processes. Econometrica 63, 767-804.
Harris, L. (1986). A transaction data study of weekly and intradaily patterns in stock returns. J. Financ. Econom. 16, 99-117.
Harrison, J. M. and D. Kreps (1979). Martingales and arbitrage in multiperiod securities markets. J. Econom. Theory 20, 381-408.
Harrison, J. M. and S. Pliska (1981). Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and Their Applications 11, 215-260.
Harrison, P. J. and C. F. Stevens (1976). Bayesian forecasting (with discussion). J. Roy. Statist. Soc., Ser. B, 38, 205-247.
Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
Harvey, A. C. and A. Jaeger (1993). Detrending, stylized facts and the business cycle. J. Appl. Econometrics 8, 231-247.
Harvey, A. C. (1993). Long memory in stochastic volatility. Discussion Paper, London School of Economics.
Harvey, A. C. and S. J. Koopman (1993). Forecasting hourly electricity demand using time-varying splines. J. Amer. Statist. Assoc. 88, 1228-1236.
Harvey, A. C., E. Ruiz and E. Sentana (1992). Unobserved component time series models with ARCH disturbances. J. Econometrics 52, 129-158.
Harvey, A. C., E. Ruiz and N. Shephard (1994). Multivariate stochastic variance models. Rev. Econom. Stud. 61, 247-264.
Harvey, A. C. and N. Shephard (1993). Estimation and testing of stochastic variance models. STICERD Econometrics Discussion Paper EM93/268, London School of Economics.
Harvey, A. C. and N. Shephard (1996). Estimation of an asymmetric stochastic volatility model for asset returns. J. Business Econom. Statist., to appear.
Harvey, C. R. and R. D. Huang (1991). Volatility in the foreign currency futures market. Rev. Financ. Stud. 4, 543-569.
Harvey, C. R. and R. D. Huang (1992). Information trading and fixed income volatility. Discussion Paper, Duke University.
Harvey, C. R. and R. E. Whaley (1992). Market volatility prediction and the efficiency of the S&P 100 index option market. J. Financ. Econom. 31, 43-74.
Hausman, J. A. and A. W. Lo (1991). An ordered probit analysis of transaction stock prices. Discussion Paper, Wharton School, University of Pennsylvania.
He, H. (1993). Option prices with stochastic volatilities: An equilibrium analysis. Discussion Paper, University of California, Berkeley.
Heston, S. L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. Rev. Financ. Stud. 6, 327-343.
Heynen, R., A. Kemna and T. Vorst (1994). Analysis of the term structure of implied volatility. J. Financ. Quant. Anal.
Hull, J. (1993). Options, Futures and Other Derivative Securities. 2nd ed. Prentice-Hall International Editions, New Jersey.
Hull, J. (1995). Introduction to Futures and Options Markets. 2nd ed. Prentice-Hall, Englewood Cliffs, New Jersey.
Hull, J. and A. White (1987). The pricing of options on assets with stochastic volatilities. J. Finance 42, 281-300.
Huffman, G. W. (1987). A dynamic equilibrium model of asset prices and transactions volume. J. Politic. Econom. 95, 138-159.
Hutchinson, J. M., A. W. Lo and T. Poggio (1994). A nonparametric approach to pricing and hedging derivative securities via learning networks. J. Finance 49, 851-890.
Jacquier, E., N. G. Polson and P. E. Rossi (1994). Bayesian analysis of stochastic volatility models (with discussion). J. Business Econom. Statist. 12, 371-417.
Jacquier, E., N. G. Polson and P. E. Rossi (1995a). Multivariate and prior distributions for stochastic volatility models. Discussion Paper, CIRANO.
Jacquier, E., N. G. Polson and P. E. Rossi (1995b). Stochastic volatility: Univariate and multivariate extensions. Rodney White Center for Financial Research, Working Paper 19-95, The Wharton School, University of Pennsylvania.
Jacquier, E., N. G. Polson and P. E. Rossi (1995c). Efficient option pricing under stochastic volatility. Manuscript, The Wharton School, University of Pennsylvania.
Jarrow, R. and Rudd (1983). Option Pricing. Irwin, Homewood, Ill.
Johnson, H. and D. Shanno (1987). Option pricing when the variance is changing. J. Financ. Quant. Anal. 22, 143-152.
Jorion, P. (1995). Predicting volatility in the foreign exchange market. J. Finance 50, to appear.
Karatzas, I. and S. E. Shreve (1988). Brownian Motion and Stochastic Calculus. Springer-Verlag, New York.
Karpoff, J. (1987). The relation between price changes and trading volume: A survey. J. Financ. Quant. Anal. 22, 109-126.
Kim, S. and N. Shephard (1994). Stochastic volatility: Optimal likelihood inference and comparison with ARCH models. Discussion Paper, Nuffield College, Oxford.
King, M., E. Sentana and S. Wadhwani (1994). Volatility and links between national stock markets. Econometrica 62, 901-934.
Kitagawa, G. (1987). Non-Gaussian state space modeling of nonstationary time series (with discussion). J. Amer. Statist. Assoc. 79, 378-389.
Kloeden, P. E. and E. Platen (1992). Numerical Solution of Stochastic Differential Equations. Springer-Verlag, Heidelberg.
Lamoureux, C. and W. Lastrapes (1990). Heteroskedasticity in stock return data: Volume versus GARCH effect. J. Finance 45, 221-229.
Lamoureux, C. and W. Lastrapes (1993). Forecasting stock-return variance: Towards an understanding of stochastic implied volatilities. Rev. Financ. Stud. 6, 293-326.
Latane, H. and R. Rendleman, Jr. (1976). Standard deviations of stock price ratios implied in option prices. J. Finance 31, 369-381.
Lezan, G., E. Renault and T. de Vitry (1995). Forecasting foreign exchange risk. Paper presented at the 7th World Congress of the Econometric Society, Tokyo.
Lin, W. L., R. F. Engle and T. Ito (1994). Do bulls and bears move across borders? International transmission of stock returns and volatility as the world turns. Rev. Financ. Stud., to appear.
Lo, A. W. (1995). Statistical inference for technical analysis via nonparametric estimation. Discussion Paper, MIT.
Mahieu, R. and P. Schotman (1994a). Stochastic volatility and the distribution of exchange rate news. Discussion Paper, University of Limburg.
Mahieu, R. and P. Schotman (1994b). Neglected common factors in exchange rate volatility. J. Emp. Finance 1, 279-311.
Mandelbrot, B. B. (1963). The variation of certain speculative prices. J. Business 36, 394-416.
Mandelbrot, B. and H. Taylor (1967). On the distribution of stock price differences. Oper. Res. 15, 1057-1062.
Mandelbrot, B. B. and J. W. Van Ness (1968). Fractional Brownian motions, fractional noises and applications. SIAM Rev. 10, 422-437.
McFadden, D. (1989). A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica 57, 1027-1057.
Meddahi, N. and E. Renault (1995). Aggregations and marginalisations of GARCH and stochastic volatility models. Discussion Paper, GREMAQ.
Melino, A. and M. Turnbull (1990). Pricing foreign currency options with stochastic volatility. J. Econometrics 45, 239-265.
Melino, A. (1994). Estimation of continuous time models in finance. In: C. A. Sims ed., Advances in Econometrics. Cambridge University Press.
Merton, R. C. (1973). Theory of rational option pricing. Bell J. Econom. Mgmt. Sci. 4, 141-183.
Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous. J. Financ. Econom. 3, 125-144.
Merton, R. C. (1990). Continuous Time Finance. Basil Blackwell, Oxford.
Merville, L. J. and D. R. Pieptea (1989). Stock-price volatility, mean-reverting diffusion, and noise. J. Financ. Econom. 24, 193-214.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller (1954). Equation of state calculations by fast computing machines. J. Chem. Physics 21, 1087-1092.
Müller, U. A., M. M. Dacorogna, R. B. Olsen, O. V. Pictet, M. Schwarz and C. Morgenegg (1990). Statistical study of foreign exchange rates. Empirical evidence of a price change scaling law and intraday analysis. J. Banking Finance 14, 1189-1208.
Nelson, D. B. (1988). Time series behavior of stock market volatility and returns. Ph.D. dissertation, MIT.
Nelson, D. B. (1990). ARCH models as diffusion approximations. J. Econometrics 45, 7-39.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59, 347-370.
Nelson, D. B. (1992). Filtering and forecasting with misspecified ARCH models I: Getting the right variance with the wrong model. J. Econometrics 52, 61-90.
Nelson, D. B. (1994). Comment on Jacquier, Polson and Rossi. J. Business Econom. Statist. 12, 403-406.
Nelson, D. B. (1995a). Asymptotic smoothing theory for ARCH models. Econometrica, to appear.
Nelson, D. B. (1995b). Asymptotic filtering theory for multivariate ARCH models. J. Econometrics, to appear.
Nelson, D. B. and D. P. Foster (1994). Asymptotic filtering theory for univariate ARCH models. Econometrica 62, 1-41.
Nelson, D. B. and D. P. Foster (1995). Filtering and forecasting with misspecified ARCH models II: Making the right forecast with the wrong model. J. Econometrics, to appear.
Noh, J., R. F. Engle and A. Kane (1994). Forecasting volatility and option pricing of the S&P 500 index. J. Derivatives, 17-30.
Ogaki, M. (1993). Generalized method of moments: Econometric applications. In: Maddala et al. eds., Handbook of Statistics Vol. 11. North-Holland, Amsterdam.
Pagan, A. R. and G. W. Schwert (1990). Alternative models for conditional stock volatility. J. Econometrics 45, 267-290.
Pakes, A. and D. Pollard (1989). Simulation and the asymptotics of optimization estimators. Econometrica 57, 995-1026.
Pardoux, E. and D. Talay (1985). Discretization and simulation of stochastic differential equations. Acta Appl. Math. 3, 23-47.
Pastorello, S., E. Renault and N. Touzi (1993). Statistical inference for random variance option pricing. Discussion Paper, CREST.
Patell, J. M. and M. A. Wolfson (1981). The ex-ante and ex-post price effects of quarterly earnings announcements reflected in option and stock prices. J. Account. Res. 19, 434-458.
Patell, J. M. and M. A. Wolfson (1979). Anticipated information releases reflected in call option prices. J. Account. Econom. 1, 117-140.
Pham, H. and N. Touzi (1993). Intertemporal equilibrium risk premia in a stochastic volatility model. Math. Finance, to appear.
Platen, E. and Schweizer (1995). On smile and skewness. Discussion Paper, Australian National University, Canberra.
Poterba, J. and L. Summers (1986). The persistence of volatility and stock market fluctuations. Amer. Econom. Rev. 76, 1142-1151.
Renault, E. (1995). Econometric models of option pricing errors. Invited Lecture presented at the 7th W.C.E.S., Tokyo, August.
Renault, E. and N. Touzi (1992). Option hedging and implicit volatility. Math. Finance, to appear.
Revuz, D. and M. Yor (1991). Continuous Martingales and Brownian Motion. Springer-Verlag, Berlin.
Robinson, P. (1993). Efficient tests of nonstationary hypotheses. Mimeo, London School of Economics.
Rogers, L. C. G. (1995). Arbitrage with fractional Brownian motion. Discussion Paper, University of Bath.
Rubinstein, M. (1985). Nonparametric tests of alternative option pricing models using all reported trades and quotes on the 30 most active CBOE option classes from August 23, 1976 through August 31, 1978. J. Finance 40, 455-480.
Ruiz, E. (1994). Quasi-maximum likelihood estimation of stochastic volatility models. J. Econometrics 63, 289-306.
Schwert, G. W. (1989). Business cycles, financial crises, and stock volatility. Carnegie-Rochester Conference Series on Public Policy 39, 83-126.
Stochastic volatility 191 Scott, L. O. (1987). Option pricing when the variance changes randomly: Theory, estimation and an application. J. Financ. Quant. Anal. 22, 419-438. Scott, L. (1991). Random variance option pricing. Advances in Futures and Options Research, Vol. 5, 113-135. Sheikh, A. M. (1993). The behavior of volatility expectations and their effects on expected returns. J. Business 66, 93-116. Shephard, N. (1995). Statistical aspect of ARCH and stochastic volatility. Discussion Paper 1994, Nuffield College, Oxford University. Sims, A. (1984). Martingale-like behavior of prices. University of Minnesota. Sowell, F. (1992). Maximum likelihood estimation of stationary univariate fractionally integrated time series models. J. Econometrics 53, 165-188. Stein, J. (1989): Overreactions in the options market. J. Finance 44, 1011-1023. Stein, E. M. and J. Stein (1991). Stock price distributions with stochastic volatility: An analytic approach. Rev. Financ. Stud. 4, 727-752. Stock, J. H. (1988). Estimating continuous time processes subject to time deformation. J. Amer. Statist. Assoc. 83, 77-84. Strook, D. W. and S. R. S. Varadhan (1979). Multi-dimensional Diffusion Processes. Springer-Verlag, Heidelberg. Tanner, T. and W. Wong (1987). The calculation of posterior distributions by data augmentation. J. Amer. Statist. Assoc. 82, 528-549. Tauchen, G. (1995). New minimum chi-square methods in empirical finance. Invited Paper presented at the 7th World Congress of the Econometric Society, Tokyo. Tauchen, G and M. Pitts (1983). The price variability-volume relationship on speculative markets. Econometrica 51, 485-505. Taylor, S. J. (1986). Modeling Financial Time Series. John Wiley: Chichester. Taylor, S. J. (1994). Modeling stochastic volatility: A review and comparative study. Math. Finance 4, 183-204. Taylor, S. J. and X. Xu (1994). The term structure of volatility implied by foreign exchange options. J. Financ. Quant Anal. 29, 57-74. Taylor, S. J. and X. Xu (1993). 
The magnitude of implied volatility smiles: Theory and empirical evidence for exchange rates. Discussion Paper, University of Warwick. Von Furstenberg, G. M. and B. Nam Jeon (1989). International stock price movements: Links and messages. Brookings Papers on Economic Activity 1,125-180. Wang, J. (1993). A model of competitive stock trading volume. Discussion Paper, MIT. Watanabe, T. (1993). The time series properties of returns, volatility and trading volume in financial markets. Ph.D. Thesis, Department of Economics, Yale University. West, M. and J. Harrison (1990). Bayesian Forecasting and Dynamic Models. Springer-Verlag, Berlin. Whaley, R. E. (1982). Valuation of American call options on dividend-paying stocks. J. Financ. Econom. 10, 29-58. Wiggins, J. B. (1987). Option values under stochastic volatility: Theory and empirical estimates. J. Financ. Econom. 19, 351-372. Wood, R. T. Mclnish and J. K. Ord (1985). An investigation of transaction data for NYSE Stocks. J. Finance 40, 723-739. Wooldridge, J. M. (1994). Estimation and inference for dependent processes. In: R.F. Engle and D. McFadden eds., Handbook of Econometrics Vol. 4. North Holland, Amsterdam.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved

6

Stock Price Volatility

Stephen F. LeRoy

1. Introduction

In the early days of the efficient capital markets literature, discourse between finance academics and practitioners was characterized by mutual incomprehension. Academics held that security prices were governed exclusively by their prospective payoffs - in fact, the former equaled the discounted expected value of the latter. Practitioners, on the other hand, made no secret of their opinion that only naive academics could take the present-value relation seriously as a theory of asset pricing: everyone knows that traders routinely ignore cash flows, and that large price changes often occur in the complete absence of news about future cash flows. Academics, at least since Samuelson's (1965) paper, responded that rejection of the present-value relation implies the existence of profitable trading rules. Given that no one appeared to be identifying a trading rule that significantly outperforms buy-and-hold, academics saw no grounds for rejecting the present-value relation.

Prior to the 1980s, empirical tests of market efficiency were conducted on the home court of the academics: one searched for evidence of return predictability; failing to find it, one concluded in favor of market efficiency. The variance-bounds tests introduced by Shiller (1981) and LeRoy and Porter (1981), however, can be interpreted as shifting the locus of the debate from the home court of the academics to that of the practitioners - instead of looking for patterns in returns that are ruled out by market efficiency, one looked for the price patterns that are implied by market efficiency. Specifically, one asked whether security price changes are of about the magnitude one would expect if they were generated exclusively by fundamentals.
The implications of this shift from returns tests to price-level tests were at first difficult to sort out since finding a predictable pattern has opposite interpretations in the two cases: finding that fundamentals predict future security returns argues against market efficiency, whereas finding that fundamentals predict current prices supports market efficiency. In both cases the early evidence suggested that the correlation being sought was not in the data; hence the returns tests accepted market efficiency, whereas the variance-bounds tests rejected efficiency.

To understand the relation between returns and variance-bounds tests of market efficiency, note that the simplest specification of the efficient markets model (applied to stock prices) says that

$$E_t(r_{t+1}) = \rho, \tag{1.1}$$

where $r_t$ is the (net) rate of return on stock, $\rho$ is a positive constant, and $E_t$ denotes mathematical expectation conditional on some information set $I_t$. Equation (1.1) says that no matter what agents' information is, the conditional expected rate of return on stock is $\rho$; past information, such as past realized stock returns, should not be correlated with future returns. Conventional efficiency tests directly investigated this implication. Variance-bounds tests, on the other hand, used the definition of the rate of return,

$$r_{t+1} = \frac{d_{t+1} + p_{t+1} - p_t}{p_t}, \tag{1.2}$$

where $d_t$ and $p_t$ denote the dividend and the stock price, to derive from (1.1) the relation

$$p_t = \beta E_t(d_{t+1} + p_{t+1}), \tag{1.3}$$

where $\beta = 1/(1+\rho)$. After successive substitution and application of the law of iterated expectations, (1.3) may be written as

$$p_t = E_t\left(\beta d_{t+1} + \beta^2 d_{t+2} + \cdots + \beta^{n+1} d_{t+n+1} + \beta^{n+1} p_{t+n+1}\right). \tag{1.4}$$

Assuming the convergence condition

$$\lim_{n \to \infty} \beta^{n+1} E_t(p_{t+n+1}) = 0 \tag{1.5}$$

is satisfied, sending $n$ to infinity in (1.4) results in

$$p_t = E_t(p_t^*), \tag{1.6}$$

where $p_t^*$ is the ex-post rational stock price, i.e., the value stock would have if future dividends were perfectly forecastable:

$$p_t^* = \sum_{n=1}^{\infty} \beta^n d_{t+n}. \tag{1.7}$$

Because the conditional expectation of any random variable is less volatile than that random variable itself, (1.6) implies the variance-bounds inequality

$$V(p_t) \le V(p_t^*). \tag{1.8}$$

Both Shiller and LeRoy-Porter reported reversal of the empirical counterpart to inequality (1.8): prices appear to be more volatile than the upper bound implied by the volatility of dividends under market efficiency.
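The bound (1.8) can be checked numerically in a small artificial economy. The sketch below rests on assumptions made here for illustration only, not on anything in the chapter: demeaned dividends follow a first-order autoregression with hypothetical coefficient `lam` and shock standard deviation `sigma`, and investors observe only current and past dividends, so that $E_t(p_t^*) = a\,d_t$ with $a = \beta\lambda/(1-\beta\lambda)$. The infinite sum in (1.7) is truncated after `horizon` terms, which is harmless here because $\beta^{400}$ is negligible.

```python
import numpy as np

def simulate_ar1_economy(T, lam, sigma, beta, horizon, seed=0):
    """Hypothetical economy: demeaned dividends d_{t+1} = lam*d_t + e_{t+1};
    investors observe only current and past dividends."""
    rng = np.random.default_rng(seed)
    n = T + horizon
    d = np.empty(n)
    d[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - lam**2))   # stationary starting value
    for t in range(1, n):
        d[t] = lam * d[t - 1] + rng.normal(0.0, sigma)
    a = beta * lam / (1.0 - beta * lam)       # E_t sum_{n>=1} beta^n d_{t+n} = a * d_t
    p = a * d[:T]                             # rational price, as in (1.6)
    w = beta ** np.arange(1, horizon + 1)     # discount weights in (1.7)
    # Ex-post rational price (1.7), truncated after `horizon` future dividends
    p_star = np.array([w @ d[t + 1:t + 1 + horizon] for t in range(T)])
    return p, p_star

p, p_star = simulate_ar1_economy(T=5000, lam=0.5, sigma=1.0, beta=0.95, horizon=400)
print(f"V(p) = {p.var():.2f}, V(p*) = {p_star.var():.2f}")
```

In this construction the sample variance of `p_star` exceeds that of `p` by a wide margin, because $p_t^*$ adds to $p_t$ a forecast error that is uncorrelated with $p_t$.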

2. Statistical issues

Several statistical issues must be considered in interpreting the fact that the empirical counterpart of inequality (1.8) is apparently reversed. These are (1) bias in parameter estimation, (2) nuisance parameter problems, and (3) sample variation of parameter estimates. Of these, discussion in the variance-bounds literature has concentrated almost exclusively on bias. However, bias is not a serious problem in the absence of nuisance parameter or sample variability problems, since the rejection region can always be modified to allow for bias. In contrast, nuisance parameter problems, which occur whenever the sampling distribution of the test statistic is materially affected by a parameter that is unrestricted under the null hypothesis, make it difficult or impossible to set rejection regions so that rejection will occur with prespecified probability if the null hypothesis is true. Therefore they are much more serious. High sample variability in the test statistic is also a serious problem, since it diminishes the ability of the test to distinguish between the null and the alternative, thereby reducing the power of the test for given size.

In testing (1.8) one immediately encounters the fact that $p_t^*$ cannot be directly constructed from any finite sample, since dividends after the end of the sample are unobservable. The problems of bias, nuisance parameters and sample variability in testing (1.8) take different forms depending on how this problem is addressed. Two methods for estimating $V(p_t^*)$ are available: the model-free estimator used by Shiller and the model-based estimator used by LeRoy-Porter. The model-free estimator simply replaces the unobservable $p_t^*$ with the expected value of $p_t^*$ conditional on the sample, which is observable.
This is given by setting the terminal value $p^*_{T|T}$ of the observable proxy series $p^*_{t|T}$ equal to the actual price $p_T$:

$$p^*_{T|T} = p_T, \tag{2.1}$$

and computing earlier values $p^*_{t|T}$ from the backward recursion

$$p^*_{t|T} = \beta\left(p^*_{t+1|T} + d_{t+1}\right), \tag{2.2}$$

which has the required property

$$E(p_t^* \mid p_1, d_1, \ldots, p_T, d_T) = p^*_{t|T} \tag{2.3}$$

(under the assumption that the population value of $\beta$ is used in the discounting). The estimated series is model-free in the sense that its construction requires no assumptions about how dividends are generated, an attractive property. Using the model-free $p^*_{t|T}$ series to construct $\hat V(p_t^*)$ has several less attractive consequences. Most important, if the model-builder is unwilling to commit to a model for dividends, there is no prospect of evaluating the sample variability of $\hat V(p_t^*)$, rendering construction of confidence intervals impossible. Thus it was no accident that Shiller reported point estimates of $V(p_t)$ and $V(p_t^*)$, but no t-statistics.
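The recursion (2.1)-(2.2) is straightforward to code. The following is a minimal sketch: the function name `model_free_p_star` is hypothetical, arrays are indexed from 0 rather than 1, and `beta` is treated as the known population discount factor, as (2.3) requires.

```python
import numpy as np

def model_free_p_star(p, d, beta):
    """Observable proxy p*_{t|T} for the ex-post rational price.

    p, d : prices and dividends for dates 1..T (stored at indices 0..T-1)
    beta : discount factor, taken as known
    """
    p = np.asarray(p, dtype=float)
    d = np.asarray(d, dtype=float)
    proxy = np.empty_like(p)
    proxy[-1] = p[-1]                        # (2.1): terminal value is the actual price
    for t in range(len(p) - 2, -1, -1):      # (2.2): backward recursion
        proxy[t] = beta * (proxy[t + 1] + d[t + 1])
    return proxy

proxy = model_free_p_star([10.0, 11.0, 12.0], [1.0, 1.0, 1.0], beta=0.5)
```

With three dates, unit dividends and $\beta = 0.5$, the proxy is 3.75, 6.5 and 12; the first value equals $0.5 \cdot 1 + 0.25 \cdot 1 + 0.25 \cdot 12$, i.e. the discounted in-sample dividends plus the discounted terminal price.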

One can, however, investigate the statistical properties of $\hat V(p_t^*)$ under particular models of dividends, and this has been done. As Flavin (1983) and Kleidon (1986) showed, because of the very high serial correlation of $p^*_{t|T}$, $\hat V(p_t^*)$ is severely biased downward as an estimator of $V(p_t^*)$; see Gilles and LeRoy (1991) for an intuitive interpretation. As noted above, this by itself is not a problem, since the rejection region can always be modified to offset the effect of bias. However, such modification cannot be implemented without committing to a dividends model, so if one takes this route the advantage of the model-free estimator is forgone. Also, it is known that the model-free estimator $\hat V(p_t^*)$ has higher sample variability than its model-based counterpart, discussed below.

A model-based estimator of $V(p_t^*)$ can be constructed if one is willing to specify a statistical model assumed to generate dividends. For example, suppose dividends are generated by a first-order autoregression:

$$d_{t+1} = \lambda d_t + \varepsilon_{t+1}. \tag{2.4}$$

Then an expression for the population value of $V(p_t^*)$ is readily computed as a function of $\lambda$, $\sigma^2$ (the variance of $\varepsilon_t$) and $\beta$, and a model-based estimator $\hat V(p_t^*)$ can be constructed by substituting parameter estimates for their population counterparts. Assuming the dividends model is correctly specified, the model-based estimator has little bias (at least in some settings) and, more important, very low sample variability (LeRoy-Parke (1992)). In the setting of LeRoy-Parke the model-based point estimate of $V(p_t^*)$ is about three times greater than the estimate of $V(p_t)$, suggesting acceptance of (1.8). However, due to the nuisance-parameter problem to be discussed now, this result is not of much importance.

Besides the ambiguities resulting from the various methods of constructing $\hat V(p_t^*)$, an even more basic problem arises from the fact that (1.8) is an inequality rather than an equality.
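A small Monte Carlo experiment illustrates the downward bias of the model-free estimator. It is a sketch under assumptions chosen here for illustration, not a replication of Flavin, Kleidon or LeRoy-Parke: zero-mean AR(1) dividends as in (2.4) with hypothetical values $\lambda = 0.5$, $\sigma = 1$, $\beta = 0.95$ and $T = 100$; investors who see only current and past dividends; and a model-based estimator that plugs OLS estimates into the closed form

$$V(p_t^*) = \frac{\sigma^2}{1-\lambda^2}\,\frac{\beta^2(1+\beta\lambda)}{(1-\beta^2)(1-\beta\lambda)},$$

which follows from (1.7) and the AR(1) autocovariances $\sigma^2\lambda^{|k|}/(1-\lambda^2)$. All function names are hypothetical.

```python
import numpy as np

def simulate(T, lam, sigma, beta, rng):
    """Zero-mean AR(1) dividends and the price p_t = E(p*_t | d_t, d_{t-1}, ...)."""
    d = np.empty(T)
    d[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - lam**2))
    for t in range(1, T):
        d[t] = lam * d[t - 1] + rng.normal(0.0, sigma)
    return beta * lam / (1.0 - beta * lam) * d, d

def var_p_star_ar1(lam, sigma2, beta):
    """Population V(p*_t) for zero-mean AR(1) dividends."""
    return (sigma2 / (1.0 - lam**2)) * beta**2 * (1.0 + beta * lam) / (
        (1.0 - beta**2) * (1.0 - beta * lam))

def model_free_estimate(p, d, beta):
    """Sample variance of the proxy series built from (2.1)-(2.2)."""
    proxy = np.empty(len(p))
    proxy[-1] = p[-1]
    for t in range(len(p) - 2, -1, -1):
        proxy[t] = beta * (proxy[t + 1] + d[t + 1])
    return proxy.var()

def model_based_estimate(d, beta):
    """Plug OLS estimates of lam and sigma^2 (no intercept) into the closed form."""
    lam_hat = (d[1:] @ d[:-1]) / (d[:-1] @ d[:-1])
    sigma2_hat = np.mean((d[1:] - lam_hat * d[:-1]) ** 2)
    return var_p_star_ar1(lam_hat, sigma2_hat, beta)

rng = np.random.default_rng(1)
beta, lam, sigma, T, reps = 0.95, 0.5, 1.0, 100, 500
true_v = var_p_star_ar1(lam, sigma**2, beta)
mf = np.empty(reps)
mb = np.empty(reps)
for r in range(reps):
    p, d = simulate(T, lam, sigma, beta, rng)
    mf[r] = model_free_estimate(p, d, beta)
    mb[r] = model_based_estimate(d, beta)

print(f"population V(p*):          {true_v:.2f}")
print(f"mean model-free estimate:  {mf.mean():.2f}")
print(f"mean model-based estimate: {mb.mean():.2f}")
```

One should expect the model-free average to fall well short of the population value, since the proxy series is extremely persistent over a sample of 100 dates. The model-based average should be much closer, although small-sample bias in $\hat\lambda$ means it need not be exactly unbiased in this particular setting.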
Assuming that the null hypothesis is true, the population value of $V(p_t)$ depends on the magnitude of the error in investors' estimates of future dividends. Therefore the same is true of the volatility parameter $V(p_t^*) - V(p_t)$, the sample counterpart of which constitutes the test statistic. This error variance is not restricted by the assumption of market efficiency, leading to its characterization as a nuisance parameter. In LeRoy-Parke it is argued that this problem is very serious quantitatively: there is no way to set a rejection region for the volatility statistic $\hat V(p_t^*) - \hat V(p_t)$. It is argued there that because of this nuisance parameter problem, directly testing (1.8) is essentially impossible. Since (1.8) is the best-known of the variance-bounds relations, this is not a minor conclusion.

There exist other variance-bounds tests that are better-behaved econometrically than inequality (1.8). To develop these, define $\varepsilon_{t+1}$ as the innovation in stock payoffs:

$$\varepsilon_{t+1} = d_{t+1} + p_{t+1} - E_t(d_{t+1} + p_{t+1}), \tag{2.5}$$

so that the present-value relation (1.3) can be written as

$$p_t = \beta E_t(d_{t+1} + p_{t+1}) = \beta(d_{t+1} + p_{t+1} - \varepsilon_{t+1}). \tag{2.6}$$
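Because (2.6) is linear, the payoff innovations can be recovered from observed prices and dividends once $\beta$ is fixed: shifting (2.6) back one period gives $\varepsilon_t = d_t + p_t - \beta^{-1}p_{t-1}$. The sketch below (hypothetical function names, $\beta$ treated as known) applies this to an illustrative economy assumed here, with zero-mean AR(1) dividends and investors who see only dividends, so that $p_t = a\,d_t$ with $a = \beta\lambda/(1-\beta\lambda)$. In that economy the recovered innovations equal $(1+a)$ times the dividend shocks, so they are serially uncorrelated, as the null hypothesis requires.

```python
import numpy as np

def payoff_innovations(p, d, beta):
    """eps_t = d_t + p_t - p_{t-1} / beta, i.e. (2.6) shifted back one period."""
    p = np.asarray(p, dtype=float)
    d = np.asarray(d, dtype=float)
    return d[1:] + p[1:] - p[:-1] / beta

def lag1_autocorr(x):
    """First-order sample autocorrelation."""
    x = x - x.mean()
    return (x[1:] @ x[:-1]) / (x @ x)

rng = np.random.default_rng(2)
beta, lam, sigma, T = 0.95, 0.5, 1.0, 20_000
shocks = rng.normal(0.0, sigma, T)
d = np.empty(T)
d[0] = shocks[0] / np.sqrt(1.0 - lam**2)     # stationary starting value
for t in range(1, T):
    d[t] = lam * d[t - 1] + shocks[t]
a = beta * lam / (1.0 - beta * lam)
p = a * d                                     # rational price under dividends-only information

eps = payoff_innovations(p, d, beta)
print(np.allclose(eps, (1.0 + a) * shocks[1:]), lag1_autocorr(eps))
```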

Substituting recursively, using the definition (1.7) of $p_t^*$ and assuming convergence, (2.6) becomes

$$p_t^* = p_t + \sum_{i=1}^{\infty} \beta^i \varepsilon_{t+i}, \tag{2.7}$$

so that the difference between $p_t^*$ and $p_t$ is expressible as a weighted sum of payoff innovations. Since these innovations are serially uncorrelated, uncorrelated with $p_t$, and (by assumption) of constant variance, equation (2.7) implies

$$V(p_t^*) = V(p_t) + \frac{\beta^2}{1-\beta^2} V(\varepsilon_t). \tag{2.8}$$

Put this result aside for the moment. The upper bound for price volatility is derived by considering the volatility of a hypothetical price series that would obtain if investors had perfect information about future dividends. LeRoy-Porter also showed that a lower bound on price volatility could be derived if one was willing to specify that investors have at least some minimal information about future dividends. Suppose that investors know at least current and past dividends; they may or may not have access to other variables that predict future dividends. Let $\hat p_t$ denote the stock price that would prevail under this minimal information specification:

$$\hat p_t = E(p_t^* \mid d_t, d_{t-1}, d_{t-2}, \ldots). \tag{2.9}$$

Then because $I_t$ is a refinement of the information partition induced by $d_t, d_{t-1}, d_{t-2}, \ldots$, we have

$$\hat p_t = E\big(E(p_t^* \mid I_t) \mid d_t, d_{t-1}, d_{t-2}, \ldots\big) \tag{2.10}$$

by the law of iterated expectations, or

$$\hat p_t = E(p_t \mid d_t, d_{t-1}, d_{t-2}, \ldots), \tag{2.11}$$

using (1.6). Therefore, by exactly the same reasoning used to derive (1.8), we obtain

$$V(\hat p_t) \le V(p_t), \tag{2.12}$$

so the variance of $\hat p_t$ is a lower bound for the variance of $p_t$. This lower bound is without direct empirical interest, since no one has seriously suggested that stock prices are less volatile than is implied by the present-value model under the assumption that investors know current and past dividends. However, the lower bound may be put to a more interesting use. By defining $\hat\varepsilon_{t+1}$ as the payoff innovation under the information set generated by $d_t, d_{t-1}, d_{t-2}, \ldots$,

$$\hat\varepsilon_{t+1} = d_{t+1} + \hat p_{t+1} - E(d_{t+1} + \hat p_{t+1} \mid d_t, d_{t-1}, d_{t-2}, \ldots), \tag{2.13}$$

we derive

$$p_t^* = \hat p_t + \sum_{i=1}^{\infty} \beta^i \hat\varepsilon_{t+i} \tag{2.14}$$

by following exactly the derivation of (2.7). Equation (2.14) implies

$$V(p_t^*) = V(\hat p_t) + \frac{\beta^2}{1-\beta^2} V(\hat\varepsilon_t). \tag{2.15}$$

Equations (2.8) and (2.15) plus the lower-bound inequality (2.12) imply

$$V(\hat\varepsilon_t) \ge V(\varepsilon_t). \tag{2.16}$$

Thus the present-value relation implies not just that prices are less volatile than they would be if investors had perfect information, but also that net one-period payoffs are less volatile than they would be if investors had less information than they (by assumption) do. To test (2.16), one simply fits a univariate time-series model to dividends and uses it to compute $V(\hat\varepsilon_t)$, while $V(\varepsilon_t)$ is just the estimated residual variance in the regression

$$d_t + p_t = \beta^{-1} p_{t-1} + \varepsilon_t. \tag{2.17}$$

This adaptation of LeRoy-Porter's lower bound on price volatility to the formally equivalent (but econometrically much more interesting) upper bound on payoff volatility is due to West (1988). The West test, like Shiller's and LeRoy-Porter's upper-bound tests on price volatility, resulted in rejection. West reported statistically significant rejection (as noted, Shiller did not compute confidence intervals, while LeRoy-Porter's rejections were only of borderline statistical significance).

Generally, the West test is free of the most serious econometric problems that beset the price-bounds tests. Most important, under the null hypothesis payoff innovations are serially uncorrelated, so sample means yield good estimates of population means (recall that model-free tests of price volatility are subject to the problem that $p_t$ and $p_t^*$ are highly serially correlated). Further, the associated t-statistics can be used to compute rejection regions. Finally, there is no need to specify investors' information, since a model-free estimate of $V(\varepsilon_t)$ is used, implying that the nuisance parameter problem that occurs under model-based price-bounds tests does not appear here.

3. Dividend-smoothing and nonstationarity

One objection sometimes raised against the variance-bounds tests is that corporate managers smooth dividends. That being the case, and because the ex-post rational stock price is in turn a highly smoothed average of dividends, it is argued that we should not be surprised that actual stock prices are choppier than ex-post rational prices. This point was raised most forcefully by Marsh and Merton (1983), (1986).¹ Marsh-Merton asserted that the variance-bounds theorems require for their derivation the assumption that dividends are exogenous, and also that the resulting series is stationary. If these assumptions are not satisfied, the variance-bounds theorems are reversed. To prove this, Marsh-Merton (1986) assumed that managers set dividends as a distributed lag on past stock prices:

$$d_t = \sum_{i=1}^{N} \lambda_i p_{t-i}. \tag{3.1}$$

Further, from (1.7) the ex-post rational stock price can be written as

$$p_t^* = \sum_{i=1}^{T-t} \beta^i d_{t+i} + \beta^{T-t} p_T^*. \tag{3.2}$$

Finally, Marsh-Merton took the terminal ex-post rational stock price to be given by the sample average stock price:

$$p_T^* = \frac{1}{T}\sum_{t=1}^{T} p_t. \tag{3.3}$$

Substituting (3.1) and (3.3) into (3.2), it is seen that $p_t^*$ is expressible as a weighted average of the in-sample $p_t$'s. Using this result, Marsh-Merton proved that in every sample $p_t^*$ has lower variance than $p_t$, just the opposite of the variance-bounds theorem.

¹ This discussion is drawn from the 1988 version of Gilles-LeRoy (1991), available from the author. Discussion of Marsh-Merton was deleted from the published version of that paper in response to an editor's request.

Questions emerge about Marsh-Merton's assertion that the variance-bounds inequality is reversed if managers smooth dividends. The most important question arises from the fact that none of the rigorous derivations of the variance-bounds theorems available in the literature make use, explicitly or implicitly, of any assumption of exogeneity or stationarity: instead, the theorems depend only on the fact that the conditional expectation of a random variable is less volatile than the random variable itself. How, then, does dividend smoothing reverse the variance-bounds theorem? It turns out that Marsh-Merton are not in fact asserting that the variance-bounds theorems are incorrect, but only that in the setting they specify the sample counterparts of the variances of $p_t^*$ and $p_t$ reverse the population inequality; Marsh-Merton's failure to use notation that distinguishes population from sample moments renders careful reading of their paper needlessly difficult.
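The weighted-average structure can be made explicit by writing $p_t^* = \sum_s w_{ts}\,p_s$. The sketch below builds the weights implied by (3.1)-(3.3); the lag coefficients and the function name are hypothetical, indices start at 0, and $p_t^*$ is computed only for dates at which every dividend entering (3.2) depends on in-sample prices. Under an additional assumption made here purely for illustration, that the $\lambda_i$ are nonnegative and sum to $\rho = (1-\beta)/\beta$, each row of weights is nonnegative and sums to one, so every $p_t^*$ is a convex combination of in-sample prices.

```python
import numpy as np

def marsh_merton_weights(T, lags, beta):
    """Rows give p*_t as a weighted sum of in-sample prices p_0..p_{T-1}.

    Dividends follow (3.1): d_s = sum_j lags[j-1] * p_{s-j}, j = 1..N.
    p*_t follows (3.2) with the terminal value (3.3), the sample mean price.
    Only dates t >= N-1 are used, so that every d_s with s > t is in-sample.
    """
    N = len(lags)
    W = np.zeros((T - N + 1, T))
    for row, t in enumerate(range(N - 1, T)):
        K = T - 1 - t                          # number of remaining in-sample dividends
        W[row, :] += beta**K / T               # discounted terminal value: the sample mean price
        for i in range(1, K + 1):              # discounted dividend d_{t+i} in (3.2)
            for j, lam_j in enumerate(lags, start=1):
                W[row, t + i - j] += beta**i * lam_j
    return W

beta = 0.95
rho = (1.0 - beta) / beta
lags = rho * np.array([0.5, 0.3, 0.2])         # hypothetical distributed-lag coefficients
T = 60
W = marsh_merton_weights(T, lags, beta)

rng = np.random.default_rng(4)
p = 100.0 + np.cumsum(rng.normal(0.0, 1.0, T))   # hypothetical random-walk price path
p_star = W @ p
print(W.sum(axis=1).min(), W.sum(axis=1).max())
print(np.var(p[len(lags) - 1:]), np.var(p_star))
```

For a random-walk price path the smoothed series `p_star` will typically show much lower sample variance than the prices themselves, the pattern Marsh-Merton report; as the discussion that follows stresses, this concerns sample moments of nonstationary series and does not contradict the population inequality (1.8).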
Marsh-Merton's dividend specification implies that dividends and prices are necessarily nonstationary (this is proved explicitly in Shiller's (1986) comment on Marsh-Merton). Sample moments cannot be expected to satisfy the same inequalities as population moments if the latter are infinite (or time-varying, depending on the interpretation). In nonstationary populations, in fact, there is essentially no relation between population moments and the corresponding sample moments²; indeed, the very idea that there is a correspondence between sample and population moments in time-series analysis derives its meaning from the analysis of stationary series. Thus there is no inconsistency whatever between the assertion that the population variance-bounds inequality is satisfied at every date, as it is in Marsh-Merton's model, and Marsh-Merton's demonstration that under their specification its sample counterpart is reversed for every possible sample. What Marsh-Merton's example demonstrates is that if one uses analytical methods appropriate under stationarity when the data under investigation are nonstationary, one can be misled. Thus formulated, Marsh-Merton's conclusion is surely correct.

The logical implication is that if one wishes to make progress with the analysis of stock price volatility, one should go on to formulate statistical procedures that are appropriate in the nonstationary setting they assume. Marsh-Merton did not do so, and no easy extension of their model would have allowed them to take this next step. The reason is that Marsh-Merton's model does not contain any specification of what exogenous variables drive their model; the only behavior they model is managers' response to stock prices, treated as exogenous, in setting dividends.

Marsh-Merton made two criticisms of the variance-bounds tests: (1) that they depend on the assumption that dividends are stationary, and (2) that they depend on the assumption that dividends are exogenous, as opposed to being smoothed by managers (this second criticism is especially prominent in Marsh-Merton's unpublished paper (1983) dealing with LeRoy-Porter (1981)). Marsh-Merton treated the two points as interchangeable, so that exogeneity was taken to imply stationarity, and dividend-smoothing nonstationarity. In fact dividend exogeneity neither implies nor is implied by stationarity, and the variance-bounds theorems require neither one, as we saw above.

² Gilles-LeRoy (1991) set out an example, adapted from Kleidon (1986), in which the martingale convergence theorem implies that the sample counterpart of the variance-bounds inequality is reversed with arbitrarily high probability in arbitrarily long samples despite being true at each date in the population. As with Marsh-Merton, nonstationarity is the culprit.
It is true that the specific empirical implementation adopted by Shiller has attractive econometric properties only when dividends are stationary in levels.³ However, whether or not the analyst chooses to model the dividend-payout decision, as Marsh-Merton did, or directly assigns dividends a probabilistic model, as LeRoy-Porter did, is immaterial: if the dividends model assumed under the latter coincides with the behavior implied for dividends in the former case, the two are equivalent. It follows that any implementation of the variance-bounds tests that accurately characterizes dividend behavior is acceptable, regardless of whether corporate managers are smoothing dividends and regardless of whether such behavior, if occurring, is modeled.

Whether or not Shiller's assumption of trend-stationarity is acceptable has been controversial: many analysts believe that major macroeconomic time series, such as GNP, have a unit root. The debate about trend-stationarity vs. unit roots in macroeconomic time series is not reviewed here, except to note that (1) of all the major macroeconomic time series, aggregate dividends appear closest to trend-stationarity, and (2) many econometricians believe that it is difficult to distinguish empirically between the trend-stationary and unit-root cases.

³ LeRoy-Porter used a trend correction based on reversing the effect of earnings retention that should have resulted in stationary data, but in fact produced series with a downward trend (which explained why their rejections of the variance-bounds theorems were of only marginal statistical significance). The reasons for the failure of LeRoy-Porter's trend correction are unclear.

Kleidon (1986) showed that if dividends have a unit root, so that dividend shocks have a permanent component, then stock prices should be more volatile than they would be if dividends were stationary. Kleidon expressed the opinion that the evidence of excess volatility reflects nothing more than the nonstationarity of dividends. However, this opinion cannot be sustained. First, the West test is valid if dividends are generated by a linear time-series process with a unit root, so that, if the expected present-value model is correct, dividends and stock prices are cointegrated. West, it is recalled, found significant excess volatility. Other tests, of which Campbell and Shiller (1988) was the first to be published, dealt with dividend nonstationarity by working with the price-dividend ratio instead of price levels. Again the conclusion was that stock prices are excessively volatile. LeRoy-Parke (1992) showed that the variance equality that LeRoy-Porter had used,

$$V(p_t^*) = V(p_t) + \frac{\beta^2}{1-\beta^2} V(\varepsilon_t), \tag{3.4}$$

could be adapted to apply to the intensive price-dividend variables, yielding

$$V(p_t^*/d_t) = V(p_t/d_t) + \delta V(r_t), \tag{3.5}$$

where $\delta$ is a function of various parameters, under the assumption that all variances of the intensive variables $p_t/d_t$, $p_t^*/d_t$ and $r_t$ remain constant over time (this is the counterpart of the assumption, required to derive (3.4), that variances of extensive variables like $p_t$, $p_t^*$ and $\varepsilon_t$ remain constant over time). LeRoy-Parke also found excess volatility (see also LeRoy and Steigerwald, 1993). Thus the debate about whether dividends are trend-stationary or have a unit root is, from the point of view of the variance-bounds tests, irrelevant: either way, volatility exceeds that predicted by the present-value model.

4. Bubbles

These results show that excess volatility occurs under at least some forms of dividend nonstationarity. However, they do not necessarily completely dispose of Marsh-Merton's criticisms; any model-based variance-bounds test requires some specification of the probability law, stationary or nonstationary, assumed to generate dividends, and critics can always question this specification. For example, LeRoy-Parke assumed that dividends follow a geometric random walk, a characterization that appears not to do great violence to the data. However, it may be that the dividend-smoothing behavior of managers results in a less parsimonious model for dividends, in which case LeRoy-Parke's results may reflect nothing more than misspecification of the dividends model.
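To see what a geometric random walk for dividends implies under the present-value model, suppose, purely as an illustration (the lognormal shocks and the values of `mu`, `sigma` and `beta` are hypothetical, not LeRoy-Parke's estimates), that $\log d_{t+1} = \log d_t + \mu + \sigma e_{t+1}$ with standard normal $e_{t+1}$, and that investors know only current and past dividends. Then $E_t(d_{t+n}) = g^n d_t$ with $g = \exp(\mu + \sigma^2/2)$, so (1.6)-(1.7) give $p_t = \kappa\, d_t$ with $\kappa = \beta g/(1-\beta g)$, provided $\beta g < 1$; the rational price-dividend ratio is then constant. The sketch checks the formula by Monte Carlo, truncating (1.7) at a long horizon.

```python
import numpy as np

def grw_price_dividend_ratio(beta, mu, sigma):
    """kappa in p_t = kappa * d_t for lognormal geometric-random-walk dividends."""
    g = np.exp(mu + 0.5 * sigma**2)            # E(d_{t+1} / d_t)
    if beta * g >= 1.0:
        raise ValueError("beta * g must be below 1 for the present value to be finite")
    return beta * g / (1.0 - beta * g)

rng = np.random.default_rng(3)
beta, mu, sigma = 0.95, 0.01, 0.10
horizon, paths = 600, 4000                     # truncation of the infinite sum in (1.7)
log_growth = mu + sigma * rng.standard_normal((paths, horizon))
d_future = np.exp(np.cumsum(log_growth, axis=1))   # d_{t+1}, ..., d_{t+horizon} given d_t = 1
p_star = d_future @ (beta ** np.arange(1, horizon + 1))
kappa = grw_price_dividend_ratio(beta, mu, sigma)
print(f"closed-form kappa = {kappa:.2f}, Monte Carlo mean of p*_t = {p_star.mean():.2f}")
```

A misspecified dividend process would change $\kappa$ (and, with richer investor information, make $p_t/d_t$ vary), which is why the specification question raised in the text matters for model-based tests.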

Two sets of circumstances might invalidate variance-bounds tests based on a particular dividend specification such as the geometric random walk. First, it may be that even data sets as long as a century (the length of Shiller's 1981 data set, which was also used in several of the subsequent variance-bounds papers) are too short to allow accurate estimation of dividend volatility. Regime-shift models, for example, require very long data sets for accurate estimation. Alternatively, the stock market may be subject to a "peso problem": investors might attach time-varying probabilities to an event that did not occur in the finite sample. The second circumstance that might invalidate variance-bounds tests is rational speculative bubbles. Thus consider an extreme case of Marsh-Merton's dividend-smoothing behavior: suppose that firms pay some positive (but low) level of dividends that is deterministic.⁴ Then all fluctuations in earnings show up as additions to (or subtractions from) capital. In this setting the market value of the firm will reflect the value of its capital, which by assumption does not depend on past dividends. Price volatility will obviously exceed the volatility implied by dividends, since the latter is zero, so the variance-bounds theorem is violated. Theoretically, what is happening in this case is that the limiting condition (1.5) is not satisfied, so that stock prices do not equal the limit of the present value of dividends. Models in which (1.5) fails are defined as rational speculative bubbles: prices are higher than the present value of future dividends but, because they are expected to rise still higher, (1.3) is satisfied. Thus insofar as they are suggesting that dividend smoothing invalidates empirical tests of the variance-bounds relations even in infinite samples, Marsh-Merton are asserting the existence of rational speculative bubbles.
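A deterministic special case, assumed here for illustration, makes the failure of (1.5) concrete. Let dividends be a constant $d$ and let $\beta = 1/(1+\rho)$, so that the present value of dividends is $d/\rho$. Adding a term $b_0(1+\rho)^t$ to that present value leaves (1.3) satisfied at every date, yet $\beta^n p_n$ converges to $b_0$ rather than to zero. The numbers (`rho`, `d`, `b0`) are hypothetical.

```python
import numpy as np

rho, d, b0, T = 0.05, 1.0, 2.0, 200            # hypothetical values
beta = 1.0 / (1.0 + rho)
t = np.arange(T + 1)
fundamental = d / rho                          # sum_{n>=1} beta^n d = d / rho
bubble = b0 * (1.0 + rho) ** t
price = fundamental + bubble

# (1.3) with no uncertainty: p_t = beta * (d + p_{t+1}) at every date
lhs = price[:-1]
rhs = beta * (d + price[1:])
print("max |p_t - beta(d + p_{t+1})|:", np.max(np.abs(lhs - rhs)))

# (1.5) fails: beta^n p_n tends to b0 instead of zero
discounted = beta ** t * price
print("beta^T p_T =", discounted[-1])

# The dividend-price ratio falls as the bubble grows
print("d/p at start and end:", d / price[0], d / price[-1])
```

The falling dividend-price ratio in the last line is the same empirical signature of bubbles that the chapter goes on to discuss.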
Bubbles have received much study in the recent economics literature, partly because of their potential role in resolving the excess volatility puzzle (for theoretical studies of rational bubbles, see Gilles and LeRoy (1992) and the sources cited there; for a summary of the empirical results as they apply to variance-bounds, see Flood and Hodrick (1990)). This is not the place for a complete discussion of bubbles; we remark only that the widely held impression that bubbles cannot occur in models incorporating rationality is incorrect. This impression is fostered by the practice of referring incorrectly to (1.5) as a transversality condition (a transversality condition is associated with an optimization problem; no such problem has been specified here), suggesting that its satisfaction is somehow virtually automatic. In fact, (1) there exist well-posed optimization problems that do not have necessary transversality conditions, and (2) transversality conditions, even when necessary for optimization, do not always imply (1.5). Examples are found in Gilles-LeRoy (1992). These examples, it is true, appear recondite. However, recall that the goal here is to explain behavior (excess volatility) that is itself counterintuitive; given this, we should not dismiss counterintuitive specifications of preferences out of hand.

⁴ This specification conflicts with limited liability, which in conjunction with random earnings implies that firm managers may not be able to commit to paying positive dividends with certainty into the infinite future. This objection, while valid, is extraneous to the present concern, and hence is set aside.

If (1.3) is satisfied but (1.5) fails, then the price of stock differs from the expected present value of dividends by a bubble term that satisfies

$$b_{t+1} = (1+\rho) b_t + \eta_{t+1}, \tag{4.1}$$

where $\eta_{t+1}$ has zero conditional mean, so that a bubble is a martingale with drift $\rho$. Since the bubble increases in value at average rate $\rho$, which exceeds the growth rate of dividends (otherwise stock prices would be infinite), stock prices rise more rapidly than dividends. Therefore the dividend-price ratio will decrease over time. Informal examination of a plot of the dividend-price ratio shows no clear downward trend, and the majority of the empirical studies surveyed by Flood-Hodrick (1990) do not find evidence of bubbles. This literature is under rapid development, however, from both the theoretical and empirical sides, and this conclusion may shortly be reversed. For now, however, it is difficult to find support for the contention that firms are smoothing dividends in such a way as to invalidate the stationarity presumed in the variance-bounds tests.

5. Time-varying discount rates

One possible explanation for the apparent excess volatility of securities prices is that conditionally expected rates of return depend on the values taken on by the conditioning variables, contradicting (1.1). There is no reason, other than a desire for simplification, to adopt the restriction that the conditional expected return on stock is constant over time, as implied by (1.1). If agents are risk averse, one would expect the conditions of equilibrium in asset markets to reflect a risk-return tradeoff, so that (1.1) would be replaced by a term involving the higher moments of return distributions as well as the conditional mean (consider CAPM, for example).
Thus equilibrium conditions like (1.1) are best interpreted as obtaining in efficient markets under the additional assumption of risk-neutrality (LeRoy (1973), Lucas (1978)). Further, in simple models in which agents are risk averse, price volatility is likely to exceed that predicted by risk-neutrality. The intuition is simple: under risk aversion agents try to transfer consumption from dates when income and consumption are high to dates when they are low. Decreasing returns in production mean that this transfer is increasingly costly, so security prices must behave in such a way as to penalize agents who make this transfer. If stock prices are high (low) when income is high (low), then agents are motivated to adapt their saving or dissaving to the production technology, as they must in equilibrium. Thus the more risk averse agents are, the more choppy equilibrium stock prices will be (LaCivita and LeRoy (1981), Grossman and Shiller (1981)). This raises the possibility that the apparent volatility is nothing more than an artifact of the misspecification of risk neutrality implicit in (1.1).

A very simple modification of the efficient markets model is, in principle, sufficient to explain existing price volatility. Providing other explanations subsequently became a minor cottage industry, perhaps because it is so easy to modify the characterization of market efficiency so as to alter its volatility prediction (1.8) (see, for example, Eden and Jovanovic (1994), Romer (1993) or Allen and Gale (1994) for recent contributions). For example, consider an overlapping generations model in which the aggregate endowment is deterministic, but some stochastic factor such as a random wealth transfer or a monetary shock affects individual agents. In general this random shock will affect equilibrium stock prices. This juxtaposition of deterministic aggregate dividends and stochastic prices contradicts the simplest formulation of market efficiency: deterministic dividends mean that the right-hand side of (1.8) is zero, while the left-hand side is strictly positive. Yet such models are evidently efficient in any reasonable sense of the word: transactions costs are excluded, and agents are assumed to be rational and to have rational expectations. Models with asymmetric information can likewise be shown to predict price volatility exceeding that associated with the conventional definition of market efficiency. These efforts have been instructive, but should not be viewed as disposing of the volatility puzzle. Properly interpreted, the variance-bounds literature never pointed to a puzzle for which potential theoretical explanations were in short supply. Rather, it showed that a simple model which had served well in some contexts did not appear to serve so well in another.
Resolving the puzzle would consist not in pointing out that other, more general models do not generate the volatility implication that the data contradict (this was never in doubt), but in showing that these models actually explain the observed variations in security prices. Such explanations have not been forthcoming. For example, attempts to incorporate the effects of risk aversion in security pricing have not succeeded (Hansen and Singleton (1983), Mehra and Prescott (1985)), nor have any of the other proposed explanations of excess volatility been successfully implemented empirically. The enduring interest of the variance-bounds controversy lies in the fact that it was here that it was first pointed out that we do not have good explanations, even ex post, for why security prices behave as they do. It is hard to imagine a more important conclusion, and nothing in the recent development of empirical finance has altered it.

6. Interpretation

Variance-bounds tests as currently formulated appear to be essentially free of major econometric problems. For example, LeRoy and Parke (1992) relied on Monte Carlo simulations to assess the behavior of test statistics, thus ensuring that any econometric biases in the real-world statistics appear equally in the simulated statistics. Therefore econometric problems are automatically accommodated in

setting the rejection region. These reformulated variance-bounds tests have continued to find excess price volatility. The debate about statistical problems with the variance-bounds tests has died out in recent years: it is no longer seriously disputed that prices are more volatile than the simplest expected present-value relation implies. As important as the above-mentioned refinements of the variance-bounds tests were in leading to this outcome, another development was still more important: conventional market efficiency tests were themselves evolving at the same time as the variance-bounds tests were being developed. The most important modification of the conventional, return-based market efficiency tests was that they investigated return autocorrelations over much longer horizons than the earlier tests had. Fama and French (1988) found significant predictability in returns. These return autocorrelations are most significant when returns are averaged over five to ten years; earlier studies, such as those reported in Fama (1970), had investigated return autocorrelations over weeks or months rather than years. Comparing conventional market efficiency tests with variance-bounds tests teaches several general methodological lessons about the econometric testing of economic theories. Since the same null hypothesis is tested, one would presume that there are no grounds for interpreting a rejection differently in one case than in the other. Yet this is extraordinarily difficult to keep in mind: the existence of excess volatility suggests the conclusion that "we cannot explain security prices", whereas the return autocorrelation results suggest the more workaday conclusion that "average security returns are subject to gradual shifts over time".
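Long-horizon return autocorrelations of the Fama and French (1988) type can be illustrated with a permanent-plus-temporary-components decomposition of the log price. The sketch below assumes that the log price is the sum of a random walk (innovation standard deviation `sigma_perm`) and an independent stationary AR(1) term (innovation standard deviation `sigma_temp`, coefficient `phi`); the function names, parameter names and monthly parameter values are illustrative assumptions, not estimates. The AR(1) term has small innovations but, being highly autocorrelated, a much larger unconditional standard deviation, and it makes successive k-period returns negatively correlated.

```python
import numpy as np


def ar1_autocov(j, sigma_temp, phi):
    """Autocovariance at lag j of a stationary AR(1) with innovation s.d. sigma_temp."""
    return sigma_temp ** 2 * phi ** j / (1.0 - phi ** 2)


def k_period_return_autocorr(k, sigma_perm, sigma_temp, phi):
    """Correlation between successive non-overlapping k-period log returns.

    Assumed model: log price p_t = w_t + u_t, with w_t a random walk
    (innovation s.d. sigma_perm) and u_t an independent AR(1) with
    coefficient phi. For r_t(k) = p_t - p_{t-k}, the random-walk part
    contributes no covariance between successive returns.
    """
    def g(j):
        return ar1_autocov(j, sigma_temp, phi)

    cov = 2 * g(k) - g(0) - g(2 * k)  # equals -(1 - phi**k)**2 * g(0) <= 0
    var = k * sigma_perm ** 2 + 2 * (g(0) - g(k))
    return cov / var


def simulate_autocorr(k, sigma_perm, sigma_temp, phi, n=200_000, seed=0):
    """Monte Carlo check of the closed form on one long simulated path."""
    rng = np.random.default_rng(seed)
    w = np.cumsum(sigma_perm * rng.standard_normal(n))
    u = np.empty(n)
    # Start the AR(1) in its stationary distribution.
    u[0] = rng.standard_normal() * np.sqrt(ar1_autocov(0, sigma_temp, phi))
    eps = sigma_temp * rng.standard_normal(n)
    for t in range(1, n):
        u[t] = phi * u[t - 1] + eps[t]
    p = w + u
    r = p[k::k] - p[:-k:k]  # non-overlapping k-period returns
    return np.corrcoef(r[:-1], r[1:])[0, 1]


if __name__ == "__main__":
    params = dict(sigma_perm=0.03, sigma_temp=0.03, phi=0.98)  # assumed monthly values
    print("s.d. of temporary component:", np.sqrt(ar1_autocov(0, 0.03, 0.98)))
    for k in (1, 12, 36, 60, 120):
        print(k, round(k_period_return_autocorr(k, **params), 3))
    print("simulated, k = 60:", round(simulate_autocorr(60, **params), 3))
```

With these assumed values the temporary component's unconditional standard deviation is about five times its 3% monthly innovation (the factor is 1/sqrt(1 - 0.98^2)), and the implied autocorrelation is close to zero at a one-month horizon but clearly negative, on the order of -0.1, at horizons of several years. The same persistent term thus produces both a large price variance and long-horizon return predictability.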
To bring home the point that the difference between these two interpretations is unjustified, assume that security prices equal those predicted by the present-value model plus a random term that is independent of dividends, has low innovation variance, and is highly autocorrelated. One can interpret that random term either as representing an irrational fad or as capturing smooth shifts in security returns due to changes in investment opportunities, shifts in social conditions, or whatever. This modification will generate excess volatility, and will also generate return autocorrelations of the type observed. Since the same alternative hypothesis generates both the excess volatility and the return autocorrelations by assumption, there can be no justification for attaching different verbal interpretations to the two rejections. The lesson to be learned is that rejection of a model is just that: rejection of a model. One must be careful not to base interpretations of a rejection on the particular test leading to it rather than on the model being rejected. Despite being generally aware that excess price volatility is statistically the same thing as long-horizon return autocorrelation, many financial economists nonetheless dismiss the possibility that excess price volatility has anything to do with capital market efficiency. Fama (1991) is a good example. Fama began his 1991 update of his 1970 survey by reemphasizing the point (made also in the 1970 survey) that any test of market efficiency is necessarily a joint test with a particular returns model. He then surveyed the evidence (to which

he has been a major contributor) that there exists high negative autocorrelation in returns at long horizons, remarking that this is statistically equivalent to "long swings away from fundamental value" (p. 1581). However, in discussing the variance-bounds tests, Fama expressed the opinion that, although they are "another useful way to show that expected returns vary through time", variance-bounds tests "are not informative about market efficiency". On the contrary, the joint-hypothesis problem would seem to apply no more and no less to variance-bounds tests than to return autocorrelation tests: if one type of evidence is relevant to market efficiency, so is the other. Another lesson is that one must be careful about applying implicit psychological metrics that seem appropriate but in fact are not. For example, it is easy to regard the apparently spectacular rejections of the variance-bounds tests as justifying a strong verbal characterization, whereas the extraneous random term that accounts for return autocorrelations appears too small to justify a similar interpretation. This too is incorrect: a random term that adds and subtracts two or three percentage points, on average, to real stock returns (which average some six to eight percent) will, if it is highly autocorrelated, routinely translate into a large increase in price variance. The small change in real stock returns is arithmetically the same thing as the large increase in price volatility, so the two should be accorded a similar verbal characterization.

7. Conclusion

In the introduction it was noted that the early interchanges between academics and finance practitioners about capital market efficiency generated more heat than light. Models derived from market efficiency, such as CAPM-based portfolio management models, made some inroads among practitioners, but for the most part the debate between proponents and opponents of rationality in financial markets died down.
Parties on both sides agreed to disagree. The evidence of excess price volatility reopened the debate, since it seemed at first to give unambiguous testimony to the existence of irrational elements in security price determination. Now it is clear that there exist other, more conservative ways to interpret the evidence of excess volatility: for example, that we simply do not know what causes changes in the rates at which future expected dividends are discounted. The variance-bounds controversy, together with parallel developments in financial economics, permits a considerable narrowing of the gap separating proponents and opponents of market efficiency. The existence of excess volatility implies that there are profitable trading rules, but these are known to generate only small utility gains to those employing them. In fact, this juxtaposition between large departures from present-value pricing and small gains to those who try to exploit these departures provides the key to finding some middle ground in the efficiency debate. Proponents of market efficiency are vindicated because no one has identified trading rules that are more than marginally profitable. Detractors of market efficiency are vindicated because a large proportion of the variation in security prices remains unexplained by market fundamentals. Both are correct; both are discussing the same set of stylized facts.

Some proponents of market efficiency go to great lengths to argue that it is unscientific to interpret excess volatility as evidence in favor of the importance of psychological elements in security price determination; see, for example, Cochrane's otherwise excellent review (1991) of Shiller's (1989) book. On this view, evidence is scientific only when it is incontrovertible and, presumably, not susceptible to interpretations other than the one proposed. At best this is an unconventional use of the term "scientific". Indeed, if the term "unscientific" is to be applied at all, should it not be to those who feel no embarrassment about the continuing presence in their models of an uninterpreted residual that accounts for most of the variation in the data? Given the continuing failure of financial models based exclusively on received neoclassical economics to provide ex-post explanations of security price behavior, why should being scientific rule out broadening the field of inquiry to include psychological considerations?

References

Allen, F. and D. Gale (1994). Limited market participation and volatility of asset prices. Amer. Econom. Rev. 84, 933-955.
Campbell, J. Y. and R. J. Shiller (1988). The dividend-price ratio and expectations of future dividends and discount factors. Rev. Financ. Stud. 1, 195-228.
Cochrane, J. (1991). Volatility tests and efficient markets: A review essay. J. Monetary Econom. 27, 463-485.
Eden, B. and B. Jovanovic (1994). Asymmetric information and the excess volatility of stock prices. Economic Inquiry 32, 228-235.
Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. J. Finance 25, 383-417.
Fama, E. F. (1991). Efficient capital markets: II. J. Finance 46, 1575-1617.
Fama, E. F. and K. R. French (1988). Permanent and transitory components of stock prices. J. Politic. Econom. 96, 246-273.
Flavin, M. (1983). Excess volatility in the financial markets: A reassessment of the empirical evidence. J. Politic. Econom. 91, 929-956.
Flood, R. P. and R. J. Hodrick (1990). On testing for speculative bubbles. J. Econom. Perspectives 4, 85-101.
Gilles, C. and S. F. LeRoy (1991). Economic aspects of the variance-bounds tests: A survey. Rev. Financ. Stud. 4, 753-791.
Gilles, C. and S. F. LeRoy (1992). Bubbles and charges. Internat. Econom. Rev. 33, 323-339.
Grossman, S. J. and R. J. Shiller (1981). The determinants of the variability of stock prices. Amer. Econom. Rev. Papers Proc. 71, 222-227.
Hansen, L. and K. J. Singleton (1983). Stochastic consumption, risk aversion, and the temporal behavior of asset returns. J. Politic. Econom. 91, 249-265.
Kleidon, A. W. (1986). Variance bounds tests and stock price valuation models. J. Politic. Econom. 94, 953-1001.
LaCivita, C. J. and S. F. LeRoy (1981). Risk aversion and the dispersion of asset prices. J. Business 54, 535-547.

LeRoy, S. F. (1973). Risk aversion and the martingale model of stock prices. Internat. Econom. Rev. 14, 436-446.
LeRoy, S. F. and W. R. Parke (1992). Stock price volatility: Tests based on the geometric random walk. Amer. Econom. Rev. 82, 981-992.
LeRoy, S. F. and R. D. Porter (1981). Stock price volatility: Tests based on implied variance bounds. Econometrica 49, 555-574.
LeRoy, S. F. and D. G. Steigerwald (1993). Volatility. University of Minnesota.
Lucas, R. E. (1978). Asset prices in an exchange economy. Econometrica 46, 1429-1445.
Marsh, T. A. and R. C. Merton (1983). Earnings variability and variance bounds tests for stock market prices: A comment. Reproduced, MIT.
Marsh, T. A. and R. C. Merton (1986). Dividend variability and variance bounds tests for the rationality of stock market prices. Amer. Econom. Rev. 76, 483-498.
Mehra, R. and E. C. Prescott (1985). The equity premium: A puzzle. J. Monetary Econom. 15, 145-161.
Romer, D. (1993). Rational asset price movements without news. Amer. Econom. Rev. 83, 1112-1130.
Samuelson, P. A. (1965). Proof that properly anticipated prices fluctuate randomly. Indust. Mgmt. Rev. 6, 41-49.
Shiller, R. J. (1981). Do stock prices move too much to be justified by subsequent changes in dividends? Amer. Econom. Rev. 71, 421-436.
Shiller, R. J. (1986). The Marsh-Merton model of managers' smoothing of dividends. Amer. Econom. Rev. 76, 499-503.
Shiller, R. J. (1989). Market Volatility. MIT Press, Cambridge, MA.
West, K. (1988). Bubbles, fads and stock price volatility: A partial evaluation. J. Finance 43, 636-656.

G. S. Maddala and C. R. Rao, eds., Handbook of Statistics, Vol. 14
© 1996 Elsevier Science B.V. All rights reserved.

7

GARCH Models of Volatility*

F. C. Palm

1. Introduction

Until some fifteen years ago, the statistical analysis of time series centered on the conditional first moment. The increased role played by risk and uncertainty in models of economic decision making, and the finding that common measures of risk and volatility exhibit strong variation over time, led to the development of new time series techniques for modeling time variation in second moments. In line with Box-Jenkins type models for conditional first moments, Engle (1982) put forward the Autoregressive Conditional Heteroskedastic (ARCH) class of models for conditional variances, which proved to be extremely useful for analyzing economic time series. Since then an extensive literature has developed on modeling higher-order conditional moments, with many applications in the field of financial time series. This vast literature on the theory of, and empirical evidence from, ARCH modeling has been surveyed in Bollerslev et al. (1992), Nijman and Palm (1993), Bollerslev et al. (1994), Diebold and Lopez (1994), Pagan (1995) and Bera and Higgins (1995). A detailed textbook-level treatment of ARCH models is also given by Gourieroux (1992). The purpose of this chapter is to provide a selective account of certain aspects of conditional volatility modeling in finance using ARCH and GARCH (generalized ARCH) models, and to compare the ARCH approach to alternative lines of research. The emphasis will be on recent developments, for instance in multivariate modeling using factor-ARCH models. Finally, an evaluation of the state of the art will be given. In Section 2, we introduce the univariate and multivariate GARCH models (including ARCH models), discuss their properties and the choice of functional form, and compare them with alternative volatility models.
Section 3 will be devoted to problems of inference in these models. In Section 4, the statistical properties of GARCH models, their relationships with continuous-time diffusion models and the forecasting of volatility will be discussed. Finally, in Section 5 we conclude and comment on potentially fruitful directions for future research.

* The author acknowledges many helpful comments by G. S. Maddala on an earlier version of the paper.

2. GARCH models

2.1. Motivation

GARCH models have been developed to account for empirical regularities in financial data. As emphasized by Pagan (1995) and Bollerslev et al. (1994), many financial time series have a number of characteristics in common. First, asset prices are generally nonstationary and often have a unit root, whereas returns are usually stationary. There is increasing evidence that some financial series are fractionally integrated. Second, return series usually show little or no autocorrelation. Serial independence between the squared values of the series, however, is often rejected, pointing towards nonlinear relationships between subsequent observations. Volatility of the return series appears to be clustered: large fluctuations persist for extended periods, and small returns tend to be followed by small returns. These phenomena point towards time-varying conditional variances. Third, normality frequently has to be rejected in favor of some thick-tailed distribution. The presence of unconditional excess kurtosis in the series could be related to the time variation in the conditional variance. Fourth, some series exhibit so-called leverage effects [see Black (1976)], that is, changes in stock prices tend to be negatively correlated with changes in volatility. Some series have skewed unconditional empirical distributions, pointing towards the inappropriateness of the normal distribution. Fifth, volatilities of different securities very often move together, indicating that there are linkages between markets and that some common factors may explain the temporal variation in conditional second moments. In the next subsection, we present several models that account for temporal dependence in conditional variances, skewness and excess kurtosis.

2.2. Univariate GARCH models

Consider stochastic models of the form

$$y_t = \varepsilon_t h_t^{1/2}, \qquad (2.1)$$

$$h_t = \alpha_0 + \sum_{i=1}^{p} \beta_i h_{t-i} + \sum_{i=1}^{q} \alpha_i y_{t-i}^2, \qquad (2.2)$$

with $E\varepsilon_t = 0$, $\mathrm{Var}(\varepsilon_t) = 1$, $\alpha_0 > 0$, $\beta_i \ge 0$, $\alpha_i \ge 0$, and $\sum_{i=1}^{p} \beta_i + \sum_{i=1}^{q} \alpha_i < 1$. This is the GARCH model of order $(p, q)$ introduced by Bollerslev (1986). When $\beta_i = 0$, $i = 1, 2, \ldots, p$, it specializes to the ARCH($q$) model put forward in a seminal paper by Engle (1982). The nonnegativity conditions imply a nonnegative variance, while the condition on the sum of the $\alpha_i$'s and $\beta_i$'s is required for wide-sense stationarity. These sufficient conditions for a nonnegative conditional variance can be substantially weakened, as shown by Nelson and Cao (1992). The conditional variance of $y_t$ can become larger than the unconditional variance, given by

$$\sigma^2 = \alpha_0 \Big/ \Big(1 - \sum_{i=1}^{p} \beta_i - \sum_{i=1}^{q} \alpha_i\Big),$$

if past realizations of $y_{t-i}^2$ have been larger than $\sigma^2$. As shown by Anderson (1992), the GARCH model belongs to the class of deterministic conditional heteroskedasticity models, in which the conditional variance is a function of variables that are in the information set available at time $t$. Adding the assumption of normality, the model can be written as

$$y_t \mid \Phi_{t-1} \sim N(0, h_t), \qquad (2.3)$$

with $h_t$ being given by (2.2) and $\Phi_{t-1}$ being the set of information available a