\chapter{Univariate time series models} \label{chap:timeser} \section{Introduction} \label{sec:tsintro} Time series models are discussed in this chapter and the next. In this chapter we concentrate on ARIMA models, unit root tests, and GARCH. The following chapter deals with cointegration and error correction. \section{ARIMA models} \label{arma-estimation} \subsection{Representation and syntax} \label{arma-repr} The \cmd{arma} command performs estimation of AutoRegressive, Integrated, Moving Average (ARIMA) models. These are models that can be written in the form \begin{equation} \label{eq:plain-0-arma} \phi(L) y_t = \theta(L) \epsilon_t \end{equation} where $\phi(L)$, and $\theta(L)$ are polynomials in the lag operator, $L$, defined such that $L^n x_t = x_{t-n}$, and $\epsilon_t$ is a white noise process. The exact content of $y_t$, of the AR polynomial $\phi()$, and of the MA polynomial $\theta()$, will be explained in the following. \subsection{Mean terms} \label{sec:arma-nonzeromean} The process $y_t$ as written in equation (\ref{eq:plain-0-arma}) has, without further qualifications, mean zero. If the model is to be applied to real data, it is necessary to include some term to handle the possibility that $y_t$ has non-zero mean. There are two possible ways to represent processes with nonzero mean: one is to define $\mu_t$ as the \emph{unconditional} mean of $y_t$, namely the central value of its marginal distribution. Therefore, the series $\tilde{y}_t = y_t - \mu_t$ has mean 0, and the model (\ref{eq:plain-0-arma}) applies to $\tilde{y}_t$. In practice, assuming that $\mu_t$ is a linear function of some observable variables $x_t$, the model becomes \begin{equation} \label{eq:arma-with-x} \phi(L) (y_t - x_t \beta) = \theta(L) \epsilon_t \end{equation} This is sometimes known as a ``regression model with ARMA errors''; its structure may be more apparent if we represent it using two equations: \begin{eqnarray*} y_t & = & x_t \beta + u_t \\ \phi(L) u_t & = & \theta(L) \epsilon_t \end{eqnarray*} The model just presented is also sometimes known as ``ARMAX'' (ARMA + eXogenous variables). It seems to us, however, that this label is more appropriately applied to a different model: another way to include a mean term in (\ref{eq:plain-0-arma}) is to base the representation on the \emph{conditional} mean of $y_t$, that is the central value of the distribution of $y_t$ \emph{given its own past}. Assuming, again, that this can be represented as a linear combination of some observable variables $z_t$, the model would expand to \begin{equation} \label{eq:arma-with-z} \phi(L) y_t = z_t \gamma + \theta(L) \epsilon_t \end{equation} The formulation (\ref{eq:arma-with-z}) has the advantage that $\gamma$ can be immediately interpreted as the vector of marginal effects of the $z_t$ variables on the conditional mean of $y_t$. And by adding lags of $z_t$ to this specification one can estimate \emph{Transfer Function models} (which generalize ARMA by adding the effects of exogenous variable distributed across time). \app{Gretl} provides a way to estimate both forms. Models written as in (\ref{eq:arma-with-x}) are estimated by maximum likelihood; models written as in (\ref{eq:arma-with-z}) are estimated by conditional maximum likelihood. (For more on these options see the section on ``Estimation'' below.) In the special case when $x_t = z_t = 1$ (that is, the models include a constant but no exogenous variables) the two specifications discussed above reduce to \begin{equation} \phi(L) (y_t - \mu) = \theta(L) \epsilon_t \label{eq:arma-with-xconst} \end{equation} and \begin{equation} \phi(L) y_t = \alpha + \theta(L) \epsilon_t \label{eq:arma-with-zconst} \end{equation} respectively. These formulations are essentially equivalent, but if they represent one and the same process $\mu$ and $\alpha$ are, fairly obviously, not numerically identical; rather \[ \alpha = \left(1 - \phi_1 - \ldots - \phi_p\right) \mu \] The \app{gretl} syntax for estimating (\ref{eq:arma-with-xconst}) is simply \begin{code} arma p q ; y \end{code} The AR and MA lag orders, \texttt{p} and \texttt{q}, can be given either as numbers or as pre-defined scalars. The parameter $\mu$ can be dropped if necessary by appending the option \cmd{--nc} (``no constant'') to the command. If estimation of (\ref{eq:arma-with-zconst}) is needed, the switch \option{conditional} must be appended to the command, as in \begin{code} arma p q ; y --conditional \end{code} Generalizing this principle to the estimation of (\ref{eq:arma-with-x}) or (\ref{eq:arma-with-z}), you get that \begin{code} arma p q ; y const x1 x2 \end{code} would estimate the following model: \[ y_t - x_t \beta = \phi_1 \left(y_{t-1} - x_{t-1} \beta \right) + \ldots + \phi_p \left( y_{t-p} - x_{t-p} \beta \right) + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q} \] where in this instance $x_t \beta = \beta_0 + x_{t,1} \beta_1 + x_{t,2} \beta_2$. Appending the \option{conditional} switch, as in \begin{code} arma p q ; y const x1 x2 --conditional \end{code} would estimate the following model: \[ y_t = x_t \gamma + \phi_1 y_{t-1} + \ldots + \phi_p y_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q} \] Ideally, the issue broached above could be made moot by writing a more general specification that nests the alternatives; that is \begin{equation} \label{armax-general} \phi(L) \left(y_t - x_t \beta\right) = z_t \gamma + \theta(L) \epsilon_t ; \end{equation} we would like to generalize the \cmd{arma} command so that the user could specify, for any estimation method, whether certain exogenous variables should be treated as $x_t$s or $z_t$s, but we're not yet at that point (and neither are most other software packages). \subsection{Seasonal models} A more flexible lag structure is desirable when analyzing time series that display strong seasonal patterns. Model (\ref{eq:plain-0-arma}) can be expanded to \begin{equation} \label{eq:seasonal-arma} \phi(L) \Phi(L^s) y_t = \theta(L) \Theta(L^s) \epsilon_t . \end{equation} For such cases, a fuller form of the syntax is available, namely, \begin{code} arma p q ; P Q ; y \end{code} where \texttt{p} and \texttt{q} represent the non-seasonal AR and MA orders, and \texttt{P} and \texttt{Q} the seasonal orders. For example, \begin{code} arma 1 1 ; 1 1 ; y \end{code} would be used to estimate the following model: \[ (1 -\phi L)(1 -\Phi L^s) (y_t - \mu) = (1 + \theta L)(1 + \Theta L^s) \epsilon_t \] If $y_t$ is a quarterly series (and therefore $s=4$), the above equation can be written more explicitly as \[ y_t - \mu = \phi (y_{t-1} - \mu) + \Phi (y_{t-4} - \mu) - (\phi \cdot \Phi) (y_{t-5} - \mu) + \epsilon_t + \theta \epsilon_{t-1} + \Theta \epsilon_{t-4} + (\theta \cdot \Theta) \epsilon_{t-5} \] Such a model is known as a ``multiplicative seasonal ARMA model''. \subsection{Gaps in the lag structure} The standard way to specify an ARMA model in \app{gretl} is via the AR and MA orders, $p$ and $q$ respectively. In this case all lags from 1 to the given order are included. In some cases one may wish to include only certain specific AR and/or MA lags. This can be done in either of two ways. % \begin{itemize} \item One can construct a matrix containing the desired lags (positive integer values) and supply the name of this matrix in place of $p$ or $q$. \item One can give a space-separated list of lags, enclosed in braces, in place of $p$ or $q$. \end{itemize} % The following code illustrates these options: % \begin{code} matrix pvec = {1, 4} arma pvec 1 ; y arma {1 4} 1 ; y \end{code} % Both forms above specify an ARMA model in which AR lags 1 and 4 are used (but not 2 and 3). This facility is available only for the non-seasonal component of the ARMA specification. \subsection{Differencing and ARIMA} The above discussion presupposes that the time series $y_t$ has already been subjected to all the transformations deemed necessary for ensuring stationarity (see also section \ref{sec:uroot}). Differencing is the most common of these transformations, and \app{gretl} provides a mechanism to include this step into the \cmd{arma} command: the syntax \begin{code} arma p d q ; y \end{code} would estimate an ARMA$(p,q)$ model on $\Delta^d y_t$. It is functionally equivalent to \begin{code} series tmp = y loop for i=1..d tmp = diff(tmp) endloop arma p q ; tmp \end{code} except with regard to forecasting after estimation (see below). When the series $y_t$ is differenced before performing the analysis the model is known as ARIMA (``I'' for Integrated); for this reason, \app{gretl} provides the \cmd{arima} command as an alias for \cmd{arma}. Seasonal differencing is handled similarly, with the syntax \begin{code} arma p d q ; P D Q ; y \end{code} where \texttt{D} is the order for seasonal differencing. Thus, the command \begin{code} arma 1 0 0 ; 1 1 1 ; y \end{code} would produce the same parameter estimates as \begin{code} genr dsy = sdiff(y) arma 1 0 ; 1 1 ; dsy \end{code} where we use the \texttt{sdiff} function to create a seasonal difference (e.g.\ for quarterly data, $y_t - y_{t-4}$). In specifying an ARIMA model with exogenous regressors we face a choice which relates back to the discussion of the variant models (\ref{eq:arma-with-x}) and (\ref{eq:arma-with-z}) above. If we choose model (\ref{eq:arma-with-x}), the ``regression model with ARMA errors'', how should this be extended to the case of ARIMA? The issue is whether or not the differencing that is applied to the dependent variable should also be applied to the regressors. Consider the simplest case, ARIMA with non-seasonal differencing of order 1. We may estimate either \begin{equation} \label{eq:arima-with-x} \phi(L) (1-L) (y_t - X_t\beta) = \theta(L) \epsilon_t \end{equation} or \begin{equation} \label{eq:arima-with-z} \phi(L) \left((1-L) y_t - X_t\beta\right) = \theta(L) \epsilon_t \end{equation} % The first of these formulations can be described as a regression model with ARIMA errors, while the second preserves the levels of the $X$ variables. As of \app{gretl} version 1.8.6, the default model is (\ref{eq:arima-with-x}), in which differencing is applied to both $y_t$ and $X_t$. However, when using the default estimation method (native exact ML, see below), the option \option{y-diff-only} may be given, in which case \app{gretl} estimates (\ref{eq:arima-with-z}).\footnote{Prior to \app{gretl} 1.8.6, the default model was (\ref{eq:arima-with-z}). We changed this for the sake of consistency with other software.} \subsection{Estimation} \label{arma-est} The default estimation method for ARMA models is exact maximum likelihood estimation (under the assumption that the error term is normally distributed), using the Kalman filter in conjunction with the BFGS maximization algorithm. The gradient of the log-likelihood with respect to the parameter estimates is approximated numerically. This method produces results that are directly comparable with many other software packages. The constant, and any exogenous variables, are treated as in equation (\ref{eq:arma-with-x}). The covariance matrix for the parameters is computed using a numerical approximation to the Hessian at convergence. The alternative method, invoked with the \option{conditional} switch, is conditional maximum likelihood (CML), also known as ``conditional sum of squares'' \citep[see][p.\ 132]{hamilton94}. This method was exemplified in the script~\ref{jack-arma}, and only a brief description will be given here. Given a sample of size $T$, the CML method minimizes the sum of squared one-step-ahead prediction errors generated by the model for the observations $t_0, \ldots, T$. The starting point $t_0$ depends on the orders of the AR polynomials in the model. The numerical maximization method used is BHHH, and the covariance matrix is computed using a Gauss--Newton regression. The CML method is nearly equivalent to maximum likelihood under the hypothesis of normality; the difference is that the first $(t_0 - 1)$ observations are considered fixed and only enter the likelihood function as conditioning variables. As a consequence, the two methods are asymptotically equivalent under standard conditions --- except for the fact, discussed above, that our CML implementation treats the constant and exogenous variables as per equation (\ref{eq:arma-with-z}). The two methods can be compared as in the following example \begin{code} open data10-1 arma 1 1 ; r arma 1 1 ; r --conditional \end{code} which produces the estimates shown in Table~\ref{tab:ml-cml}. As you can see, the estimates of $\phi$ and $\theta$ are quite similar. The reported constants differ widely, as expected --- see the discussion following equations (\ref{eq:arma-with-xconst}) and (\ref{eq:arma-with-zconst}). However, dividing the CML constant by $1-\phi$ we get 7.38, which is not far from the ML estimate of 6.93. \begin{table}[htbp] \caption{ML and CML estimates} \label{tab:ml-cml} \begin{center} \begin{tabular}{crrrr} \hline Parameter & \multicolumn{2}{c}{ML} & \multicolumn{2}{c}{CML} \\ \hline $\mu$ & 6.93042 & (0.923882) & 1.07322 & (0.488661) \\ $\phi$ & 0.855360 & (0.0511842) & 0.852772 & (0.0450252) \\ $\theta$ & 0.588056 & (0.0986096) & 0.591838 & (0.0456662) \\ \hline \end{tabular} \end{center} \end{table} \subsection{Convergence and initialization} The numerical methods used to maximize the likelihood for ARMA models are not guaranteed to converge. Whether or not convergence is achieved, and whether or not the true maximum of the likelihood function is attained, may depend on the starting values for the parameters. \app{Gretl} employs one of the following two initialization mechanisms, depending on the specification of the model and the estimation method chosen. \begin{enumerate} \item Estimate a pure AR model by Least Squares (nonlinear least squares if the model requires it, otherwise OLS). Set the AR parameter values based on this regression and set the MA parameters to a small positive value (0.0001). \item The Hannan--Rissanen method: First estimate an autoregressive model by OLS and save the residuals. Then in a second OLS pass add appropriate lags of the first-round residuals to the model, to obtain estimates of the MA parameters. \end{enumerate} To see the details of the ARMA estimation procedure, add the \option{verbose} option to the command. This prints a notice of the initialization method used, as well as the parameter values and log-likelihood at each iteration. Besides the built-in initialization mechanisms, the user has the option of specifying a set of starting values manually. This is done via the \cmd{set} command: the first argument should be the keyword \texttt{initvals} and the second should be the name of a pre-specified matrix containing starting values. For example \begin{code} matrix start = { 0, 0.85, 0.34 } set initvals start arma 1 1 ; y \end{code} The specified matrix should have just as many parameters as the model: in the example above there are three parameters, since the model implicitly includes a constant. The constant, if present, is always given first; otherwise the order in which the parameters are expected is the same as the order of specification in the \cmd{arma} or \cmd{arima} command. In the example the constant is set to zero, $\phi_1$ to 0.85, and $\theta_1$ to 0.34. You can get \app{gretl} to revert to automatic initialization via the command \cmd{set initvals auto}. Two variants of the BFGS algorithm are available in \app{gretl}. In general we recommend the default variant, which is based on an implementation by \cite{nash90}, but for some problems the alternative, limited-memory version (L-BFGS-B, see \citealp{byrd-etal95}) may increase the chances of convergence on the ML solution. This can be selected via the \option{lbfgs} option to the \texttt{arma} command. \subsection{Estimation via X-12-ARIMA} As an alternative to estimating ARMA models using ``native'' code, \app{gretl} offers the option of using the external program \app{X-12-ARIMA}. This is the seasonal adjustment software produced and maintained by the U.S. Census Bureau; it is used for all official seasonal adjustments at the Bureau. \app{Gretl} includes a module which interfaces with \app{X-12-ARIMA}: it translates \cmd{arma} commands using the syntax outlined above into a form recognized by \app{X-12-ARIMA}, executes the program, and retrieves the results for viewing and further analysis within \app{gretl}. To use this facility you have to install \app{X-12-ARIMA} separately. Packages for both MS Windows and GNU/Linux are available from the \app{gretl} website, \url{http://gretl.sourceforge.net/}. To invoke \app{X-12-ARIMA} as the estimation engine, append the flag \verb|--x-12-arima|, as in \begin{code} arma p q ; y --x-12-arima \end{code} As with native estimation, the default is to use exact ML but there is the option of using conditional ML with the \option{conditional} flag. However, please note that when \app{X-12-ARIMA} is used in conditional ML mode, the comments above regarding the variant treatments of the mean of the process $y_t$ \textit{do not apply}. That is, when you use \app{X-12-ARIMA} the model that is estimated is (\ref{eq:arma-with-x}), regardless of whether estimation is by exact ML or conditional ML. In addition, the treatment of exogenous regressors in the context of ARIMA differencing is always that shown in equation (\ref{eq:arima-with-x}). \subsection{Forecasting} \label{arma-fcast} ARMA models are often used for forecasting purposes. The autoregressive component, in particular, offers the possibility of forecasting a process ``out of sample'' over a substantial time horizon. \app{Gretl} supports forecasting on the basis of ARMA models using the method set out by \cite{box-jenkins76}.\footnote{See in particular their ``Program 4'' on p.\ 505ff.} The Box and Jenkins algorithm produces a set of integrated AR coefficients which take into account any differencing of the dependent variable (seasonal and/or non-seasonal) in the ARIMA context, thus making it possible to generate a forecast for the level of the original variable. By contrast, if you first difference a series manually and then apply ARMA to the differenced series, forecasts will be for the differenced series, not the level. This point is illustrated in Example~\ref{arima-fcast-script}. The parameter estimates are identical for the two models. The forecasts differ but are mutually consistent: the variable \texttt{fcdiff} emulates the ARMA forecast (static, one step ahead within the sample range, and dynamic out of sample). \begin{script}[htbp] \caption{ARIMA forecasting} \label{arima-fcast-script} \begin{scode} open greene18_2.gdt # log of quarterly U.S. nominal GNP, 1950:1 to 1983:4 genr y = log(Y) # and its first difference genr dy = diff(y) # reserve 2 years for out-of-sample forecast smpl ; 1981:4 # Estimate using ARIMA arima 1 1 1 ; y # forecast over full period smpl --full fcast fc1 # Return to sub-sample and run ARMA on the first difference of y smpl ; 1981:4 arma 1 1 ; dy smpl --full fcast fc2 genr fcdiff = (t<=1982:1)? (fc1 - y(-1)) : (fc1 - fc1(-1)) # compare the forecasts over the later period smpl 1981:1 1983:4 print y fc1 fc2 fcdiff --byobs \end{scode} The output from the last command is: % \begin{code} y fc1 fc2 fcdiff 1981:1 7.964086 7.940930 0.02668 0.02668 1981:2 7.978654 7.997576 0.03349 0.03349 1981:3 8.009463 7.997503 0.01885 0.01885 1981:4 8.015625 8.033695 0.02423 0.02423 1982:1 8.014997 8.029698 0.01407 0.01407 1982:2 8.026562 8.046037 0.01634 0.01634 1982:3 8.032717 8.063636 0.01760 0.01760 1982:4 8.042249 8.081935 0.01830 0.01830 1983:1 8.062685 8.100623 0.01869 0.01869 1983:2 8.091627 8.119528 0.01891 0.01891 1983:3 8.115700 8.138554 0.01903 0.01903 1983:4 8.140811 8.157646 0.01909 0.01909 \end{code} \end{script} \section{Unit root tests} \label{sec:uroot} \subsection{The ADF test} \label{sec:ADFtest} The Augmented Dickey--Fuller (ADF) test is, as implemented in \app{gretl}, the $t$-statistic on $\varphi$ in the following regression: \begin{equation} \label{eq:ADFtest} \Delta y_t = \mu_t + \varphi y_{t-1} + \sum_{i=1}^p \gamma_i \Delta y_{t-i} + \epsilon_t . \end{equation} This test statistic is probably the best-known and most widely used unit root test. It is a one-sided test whose null hypothesis is $\varphi = 0$ versus the alternative $\varphi < 0$ (and hence large negative values of the test statistic lead to the rejection of the null). Under the null, $y_t$ must be differenced at least once to achieve stationarity; under the alternative, $y_t$ is already stationary and no differencing is required. One peculiar aspect of this test is that its limit distribution is non-standard under the null hypothesis: moreover, the shape of the distribution, and consequently the critical values for the test, depends on the form of the $\mu_t$ term. A full analysis of the various cases is inappropriate here: \cite{hamilton94} contains an excellent discussion, but any recent time series textbook covers this topic. Suffice it to say that \app{gretl} allows the user to choose the specification for $\mu_t$ among four different alternatives: \begin{center} \begin{tabular}{cc} \hline $\mu_t$ & command option \\ \hline 0 & \option{nc} \\ $\mu_0$ & \option{c} \\ $\mu_0 + \mu_1 t$ & \option{ct} \\ $\mu_0 + \mu_1 t + \mu_1 t^2$ & \option{ctt} \\ \hline \end{tabular} \end{center} These option flags are not mutually exclusive; when they are used together the statistic will be reported separately for each selected case. By default, \app{gretl} uses the combination \verb|--c --ct|. For each case, approximate p-values are calculated by means of the algorithm developed in \cite{mackinnon96}. The \app{gretl} command used to perform the test is \cmd{adf}; for example \begin{code} adf 4 x1 \end{code} would compute the test statistic as the $t$-statistic for $\varphi$ in equation \ref{eq:ADFtest} with $p=4$ in the two cases $\mu_t = \mu_0$ and $\mu_t = \mu_0 + \mu_1 t$. The number of lags ($p$ in equation \ref{eq:ADFtest}) should be chosen as to ensure that (\ref{eq:ADFtest}) is a parametrization flexible enough to represent adequately the short-run persistence of $\Delta y_t$. Setting $p$ too low results in size distortions in the test, whereas setting $p$ too high leads to low power. As a convenience to the user, the parameter $p$ can be automatically determined. Setting $p$ to a negative number triggers a sequential procedure that starts with $p$ lags and decrements $p$ until the $t$-statistic for the parameter $\gamma_p$ exceeds 1.645 in absolute value. \subsection{The ADF-GLS test} \label{sec:ADF-GLS} \citet*{ERS96} proposed a variant of the ADF test which involves an alternative method of handling the parameters pertaining to the deterministic term $\mu_t$: these are estimated first via Generalized Least Squares, and in a second stage an ADF regression is performed using the GLS residuals. This variant offers greater power than the regular ADF test for the cases $\mu_t = \mu_0$ and $\mu_t = \mu_0 + \mu_1 t$. The ADF-GLS test is available in \app{gretl} via the \verb|--gls| option to the \texttt{adf} command. When this option is selected the \verb|--nc| and \verb|--ctt| options become unavailable, and only one case can be selected at a time; by default the constant-only model is used but a trend can be added using the \verb|--ct| flag. When a trend is present in this test MacKinnon-type p-values are not available; instead we show critical values from Table 1 in \cite{ERS96}. \subsection{The KPSS test} \label{sec:KPSStest} The KPSS test \citep*{KPSS92} is a unit root test in which the null hypothesis is opposite to that in the ADF test: under the null, the series in question is stationary; the alternative is that the series is $I(1)$. The basic intuition behind this test statistic is very simple: if $y_t$ can be written as $y_t = \mu + u_t$, where $u_t$ is some zero-mean stationary process, then not only does the sample average of the $y_t$s provide a consistent estimator of $\mu$, but the long-run variance of $u_t$ is a well-defined, finite number. Neither of these properties hold under the alternative. The test itself is based on the following statistic: \begin{equation} \label{eq:KPSStest} \eta = \frac{\sum_{i=1}^T S_t^2 }{ T^2 \bar{\sigma}^2 } \end{equation} where $S_t = \sum_{s=1}^t e_s$ and $\bar{\sigma}^2$ is an estimate of the long-run variance of $e_t = (y_t - \bar{y})$. Under the null, this statistic has a well-defined (nonstandard) asymptotic distribution, which is free of nuisance parameters and has been tabulated by simulation. Under the alternative, the statistic diverges. As a consequence, it is possible to construct a one-sided test based on $\eta$, where $H_0$ is rejected if $\eta$ is bigger than the appropriate critical value; \app{gretl} provides the 90, 95 and 99 percent quantiles. The critical values are computed via the method presented by \cite{sephton95}, which offers greater accuracy than the values tabulated in \cite{KPSS92}. Usage example: \begin{code} kpss m y \end{code} where \verb|m| is an integer representing the bandwidth or window size used in the formula for estimating the long run variance: \[ \bar{\sigma}^2 = \sum_{i=-m}^m \left( 1 - \frac{|i|}{m+1} \right) \hat{\gamma}_i \] The $\hat{\gamma}_i$ terms denote the empirical autocovariances of $e_t$ from order $-m$ through $m$. For this estimator to be consistent, $m$ must be large enough to accommodate the short-run persistence of $e_t$, but not too large compared to the sample size $T$. If the supplied \verb|m| is non-positive a default value is computed, namely the integer part of $4 \left( \frac{T}{100} \right)^{1/4}$. The above concept can be generalized to the case where $y_t$ is thought to be stationary around a deterministic trend. In this case, formula (\ref{eq:KPSStest}) remains unchanged, but the series $e_t$ is defined as the residuals from an OLS regression of $y_t$ on a constant and a linear trend. This second form of the test is obtained by appending the \option{trend} option to the \cmd{kpss} command: \begin{code} kpss n y --trend \end{code} Note that in this case the asymptotic distribution of the test is different and the critical values reported by \app{gretl} differ accordingly. \subsection{Panel unit root tests} \label{sec:panel-uroot} The most commonly used unit root tests for panel data involve a generalization of the ADF procedure, in which the joint null hypothesis is that a given times series is non-stationary for all individuals in the panel. In this context the ADF regression (\ref{eq:ADFtest}) can be rewritten as \begin{equation} \label{eq:panelADF} \Delta y_{it} = \mu_{it} + \varphi_i y_{i,t-1} + \sum_{j=1}^{p_i} \gamma_{ij} \Delta y_{i,t-j} + \epsilon_{it} \end{equation} The model (\ref{eq:panelADF}) allows for maximal heterogeneity across the individuals in the panel: the parameters of the deterministic term, the autoregressive coefficient $\varphi$, and the lag order $p$ are all specific to the individual, indexed by $i$. One possible modification of this model is to impose the assumption that $\varphi_i = \varphi$ for all $i$; that is, the individual time series share a common autoregressive root (although they may differ in respect of other statistical properties). The choice of whether or not to impose this assumption has an important bearing on the hypotheses under test. Under model (\ref{eq:panelADF}) the joint null is $\varphi_i = 0$ for all $i$, meaning that all the individual time series are non-stationary, and the alternative (simply the negation of the null) is that \emph{at least one} individual time series is stationary. When a common $\varphi$ is assumed, the null is that $\varphi = 0$ and the alternative is that $\varphi < 0$. The null still says that all the individual series are non-stationary, but the alternative now says that they are \emph{all} stationary. The choice of model should take this point into account, as well as the gain in power from forming a pooled estimate of $\varphi$ and, of course, the plausibility of assuming a common AR(1) coefficient.\footnote{If the assumption of a common $\varphi$ seems excessively restrictive, bear in mind that we routinely assume common slope coefficients when estimating panel models, even if this is unlikely to be literally true.} In \app{gretl}, the formulation (\ref{eq:panelADF}) is used automatically when the \texttt{adf} command is used on panel data. The joint test statistic is formed using the method of \citet*{IPS03}. In this context the behavior of \texttt{adf} differs from regular time-series data: only one case of the deterministic term is handled per invocation of the command; the default is that $\mu_{it}$ includes just a constant but the \verb|--nc| and \verb|--ct| flags can be used to suppress the constant or to include a trend, respectively; and the quadratic trend option \verb|--ctt| is not available. The alternative that imposes a common value of $\varphi$ is implemented via the \texttt{levinlin} command. The test statistic is computed as per \citet*{LLC2002}. As with the \texttt{adf} command, the first argument is the lag order and the second is the name of the series to test; and the default case for the deterministic component is a constant only. The options \verb|--nc| and \verb|--ct| have the same effect as with \texttt{adf}. One refinement is that the lag order may be given in either of two forms: if a scalar is given, this is taken to represent a common value of $p$ for all individuals, but you may instead provide a vector holding a set of $p_i$ values, hence allowing the order of autocorrelation of the series to differ by individual. So, for example, given \begin{code} levinlin 2 y levinlin {2,2,3,3,4,4} y \end{code} the first command runs a joint ADF test with a common lag order of 2, while the second (which assumes a panel with six individuals) allows for differing short-run dynamics. The first argument to \texttt{levinlin} can be given as a set of comma-separated integers enclosed in braces, as shown above, or as the name of an appropriately dimensioned pre-defined matrix (see chapter~\ref{chap:matrices}). Besides variants of the ADF test, the KPSS test also can be used with panel data via the \texttt{kpss} command. In this case the test (of the null hypothesis that the given time series is \emph{stationary} for all individuals) is implemented using the method of \cite{choi01}. This is an application of \emph{meta-analysis}, the statistical technique whereby an overall or composite p-value for the test of a given null hypothesis can be computed from the p-values of a set of separate tests. Unfortunately, in the case of the KPSS test we are limited by the unavailability of precise p-values, although if an individual test statistic falls between the 10 percent and 1 percent critical values we are able to interpolate with a fair degree of confidence. This gives rise to four cases. \begin{enumerate} \item All the individual KPSS test statistics fall between the 10 percent and 1 percent critical values: the Choi method gives us a plausible composite p-value. \item Some of the KPSS test statistics exceed the 1 percent value and none fall short of the 10 percent value: we can give an upper bound for the composite p-value by setting the unknown p-values to 0.01. \item Some of the KPSS test statistics fall short of the 10 percent critical value but none exceed the 1 percent value: we can give a lower bound to the composite p-value by setting the unknown p-values to 0.10. \item None of the above conditions are satisfied: the Choi method fails to produce any result for the composite KPSS test. \end{enumerate} \section{Cointegration tests} \label{sec:coint-test} The generally recommended test for cointegration is the Johansen test, which is discussed in detail in chapter~\ref{chap:vecm}. In this context we offer a few remarks on the cointegration test of \cite{engle-granger87}, which builds on the ADF test discussed above (section~\ref{sec:ADFtest}). For the Engle--Granger test, the procedure is: \begin{enumerate} \item Test each series for a unit root using an ADF test. \item Run a ``cointegrating regression'' via OLS. For this we select one of the potentially cointegrated variables as dependent, and include the other potentially cointegrated variables as regressors. \item Perform an ADF test on the residuals from the cointegrating regression. \end{enumerate} The idea is that cointegration is supported if (a) the null of non-stationarity is \emph{not} rejected for each of the series individually, in step 1, while (b) the null \emph{is} rejected for the residuals at step 3. That is, each of the individual series is $I(1)$ but some linear combination of the series is $I(0)$. This test is implemented in \app{gretl} by the \texttt{coint} command, which requires an integer lag order (for the ADF tests) followed by a list of variables to be tested, the first of which will be taken as dependent in the cointegrating regression. Please see the online help for \texttt{coint}, or the \textit{Gretl Command Reference}, for further details. \section{ARCH and GARCH} \label{sec:arch-garch} Heteroskedasticity means a non-constant variance of the error term in a regression model. Autoregressive Conditional Heteroskedasticity (ARCH) is a phenomenon specific to time series models, whereby the variance of the error displays autoregressive behavior; for instance, the time series exhibits successive periods where the error variance is relatively large, and successive periods where it is relatively small. This sort of behavior is reckoned to be quite common in asset markets: an unsettling piece of news can lead to a period of increased volatility in the market. An ARCH error process of order $q$ can be represented as \[ u_t = \sigma_t \varepsilon_t; \qquad \sigma^2_t \equiv {\rm E}(u^2_t|\Omega_{t-1}) = \alpha_0 + \sum_{i=1}^q \alpha_i u^2_{t-i} \] where the $\varepsilon_t$s are independently and identically distributed (iid) with mean zero and variance 1, and where $\sigma_t$ is taken to be the positive square root of $\sigma^2_t$. $\Omega_{t-1}$ denotes the information set as of time $t-1$ and $\sigma^2_t$ is the conditional variance: that is, the variance conditional on information dated $t-1$ and earlier. It is important to notice the difference between ARCH and an ordinary autoregressive error process. The simplest (first-order) case of the latter can be written as \[ u_t = \rho u_{t-1} + \varepsilon_t; \qquad -1 < \rho < 1 \] where the $\varepsilon_t$s are independently and identically distributed with mean zero and variance $\sigma^2$. With an AR(1) error, if $\rho$ is positive then a positive value of $u_t$ will tend to be followed, with probability greater than 0.5, by a positive $u_{t+1}$. With an ARCH error process, a disturbance $u_t$ of large absolute value will tend to be followed by further large absolute values, but with no presumption that the successive values will be of the same sign. ARCH in asset prices is a ``stylized fact'' and is consistent with market efficiency; on the other hand autoregressive behavior of asset prices would violate market efficiency. One can test for ARCH of order $q$ in the following way: \begin{enumerate} \item Estimate the model of interest via OLS and save the squared residuals, $\hat{u}^2_t$. \item Perform an auxiliary regression in which the current squared residual is regressed on a constant and $q$ lags of itself. \item Find the $TR^2$ value (sample size times unadjusted $R^2$) for the auxiliary regression. \item Refer the $TR^2$ value to the $\chi^2$ distribution with $q$ degrees of freedom, and if the p-value is ``small enough'' reject the null hypothesis of homoskedasticity in favor of the alternative of ARCH($q$). \end{enumerate} This test is implemented in \app{gretl} via the \cmd{arch} command. This command may be issued following the estimation of a time-series model by OLS, or by selection from the ``Tests'' menu in the model window (again, following OLS estimation). The result of the test is reported and if the $TR^2$ from the auxiliary regression has a p-value less than 0.10, ARCH estimates are also reported. These estimates take the form of Generalized Least Squares (GLS), specifically weighted least squares, using weights that are inversely proportional to the predicted variances of the disturbances, $\hat{\sigma}_t$, derived from the auxiliary regression. In addition, the ARCH test is available after estimating a vector autoregression (VAR). In this case, however, there is no provision to re-estimate the model via GLS. \subsection{GARCH} \label{subsec:garch} The simple ARCH($q$) process is useful for introducing the general concept of conditional heteroskedasticity in time series, but it has been found to be insufficient in empirical work. The dynamics of the error variance permitted by ARCH($q$) are not rich enough to represent the patterns found in financial data. The generalized ARCH or GARCH model is now more widely used. The representation of the variance of a process in the GARCH model is somewhat (but not exactly) analogous to the ARMA representation of the level of a time series. The variance at time $t$ is allowed to depend on both past values of the variance and past values of the realized squared disturbance, as shown in the following system of equations: \begin{eqnarray} \label{eq:garch-meaneq} y_t & = & X_t \beta + u_t \\ \label{eq:garch-epseq} u_t & = & \sigma_t \varepsilon_t \\ \label{eq:garch-vareq} \sigma^2_t & = & \alpha_0 + \sum_{i=1}^q \alpha_i u^2_{t-i} + \sum_{j=1}^p \delta_j \sigma^2_{t-j} \end{eqnarray} As above, $\varepsilon_t$ is an iid sequence with unit variance. $X_t$ is a matrix of regressors (or in the simplest case, just a vector of 1s allowing for a non-zero mean of $y_t$). Note that if $p=0$, GARCH collapses to ARCH($q$): the generalization is embodied in the $\delta_j$ terms that multiply previous values of the error variance. In principle the underlying innovation, $\varepsilon_t$, could follow any suitable probability distribution, and besides the obvious candidate of the normal or Gaussian distribution the Student's $t$ distribution has been used in this context. Currently \app{gretl} only handles the case where $\varepsilon_t$ is assumed to be Gaussian. However, when the \option{robust} option to the \cmd{garch} command is given, the estimator \app{gretl} uses for the covariance matrix can be considered Quasi-Maximum Likelihood even with non-normal disturbances. See below for more on the options regarding the GARCH covariance matrix. Example: \begin{code} garch p q ; y const x \end{code} where \verb|p| $\ge 0$ and \verb|q| $>0$ denote the respective lag orders as shown in equation (\ref{eq:garch-vareq}). These values can be supplied in numerical form or as the names of pre-defined scalar variables. \subsection{GARCH estimation} \label{subsec:garch-est} Estimation of the parameters of a GARCH model is by no means a straightforward task. (Consider equation~\ref{eq:garch-vareq}: the conditional variance at any point in time, $\sigma^2_t$, depends on the conditional variance in earlier periods, but $\sigma^2_t$ is not observed, and must be inferred by some sort of Maximum Likelihood procedure.) By default \app{gretl} uses native code that employs the BFGS maximizer; you also have the option (activated by the \verb|--fcp| command-line switch) of using the method proposed by \cite{fiorentini96},\footnote{The algorithm is based on Fortran code deposited in the archive of the \textit{Journal of Applied Econometrics} by the authors, and is used by kind permission of Professor Fiorentini.} which was adopted as a benchmark in the study of GARCH results by \cite{mccullough98}. It employs analytical first and second derivatives of the log-likelihood, and uses a mixed-gradient algorithm, exploiting the information matrix in the early iterations and then switching to the Hessian in the neighborhood of the maximum likelihood. (This progress can be observed if you append the \option{verbose} option to \app{gretl}'s \cmd{garch} command.) Several options are available for computing the covariance matrix of the parameter estimates in connection with the \cmd{garch} command. At a first level, one can choose between a ``standard'' and a ``robust'' estimator. By default, the Hessian is used unless the \option{robust} option is given, in which case the QML estimator is used. A finer choice is available via the \cmd{set} command, as shown in Table~\ref{tab:garch-vcv}. \begin{table}[htbp] \caption{Options for the GARCH covariance matrix} \label{tab:garch-vcv} \begin{center} \begin{tabular}{ll} \multicolumn{1}{c}{\textit{command}} & \multicolumn{1}{c}{\textit{effect}} \\ [4pt] \texttt{set garch\_vcv hessian} & Use the Hessian \\ \texttt{set garch\_vcv im} & Use the Information Matrix \\ \texttt{set garch\_vcv op} & Use the Outer Product of the Gradient \\ \texttt{set garch\_vcv qml} & QML estimator \\ \texttt{set garch\_vcv bw} & Bollerslev--Wooldridge ``sandwich'' estimator \end{tabular} \end{center} \end{table} It is not uncommon, when one estimates a GARCH model for an arbitrary time series, to find that the iterative calculation of the estimates fails to converge. For the GARCH model to make sense, there are strong restrictions on the admissible parameter values, and it is not always the case that there exists a set of values inside the admissible parameter space for which the likelihood is maximized. The restrictions in question can be explained by reference to the simplest (and much the most common) instance of the GARCH model, where $p = q = 1$. In the GARCH(1, 1) model the conditional variance is \begin{equation} \label{eq:condvar} \sigma^2_t = \alpha_0 + \alpha_1 u^2_{t-1} + \delta_1 \sigma^2_{t-1} \end{equation} Taking the unconditional expectation of (\ref{eq:condvar}) we get \[ \sigma^2 = \alpha_0 + \alpha_1 \sigma^2 + \delta_1 \sigma^2 \] so that \[ \sigma^2 = \frac{\alpha_0}{1 - \alpha_1 - \delta_1} \] For this unconditional variance to exist, we require that $\alpha_1 + \delta_1 < 1$, and for it to be positive we require that $\alpha_0 > 0$. A common reason for non-convergence of GARCH estimates (that is, a common reason for the non-existence of $\alpha_i$ and $\delta_i$ values that satisfy the above requirements and at the same time maximize the likelihood of the data) is misspecification of the model. It is important to realize that GARCH, in itself, allows \textit{only} for time-varying volatility in the data. If the \textit{mean} of the series in question is not constant, or if the error process is not only heteroskedastic but also autoregressive, it is necessary to take this into account when formulating an appropriate model. For example, it may be necessary to take the first difference of the variable in question and/or to add suitable regressors, $X_t$, as in (\ref{eq:garch-meaneq}). %%% Local Variables: %%% mode: latex %%% TeX-master: "gretl-guide" %%% End: