\chapter{Cointegration and Vector Error Correction Models}
\label{chap:vecm}

\section{Introduction}
\label{sec:VECM-intro}

The twin concepts of cointegration and error correction have drawn a good deal of attention in macroeconometrics over recent years. The attraction of the Vector Error Correction Model (VECM) is that it allows the researcher to embed a representation of economic equilibrium relationships within a relatively rich time-series specification. This approach overcomes the old dichotomy between (a) structural models that faithfully represented macroeconomic theory but failed to fit the data, and (b) time-series models that were accurately tailored to the data but difficult if not impossible to interpret in economic terms.

The basic idea of cointegration relates closely to the concept of unit roots (see section~\ref{sec:uroot}). Suppose we have a set of macroeconomic variables of interest, and we find we cannot reject the hypothesis that some of these variables, considered individually, are non-stationary. Specifically, suppose we judge that a subset of the variables are individually integrated of order 1, or I(1). That is, while they are non-stationary in their levels, their first differences are stationary.

Given the statistical problems associated with the analysis of non-stationary data (for example, the threat of spurious regression), the traditional approach in this case was to take first differences of all the variables before proceeding with the analysis. But this can result in the loss of important information. It may be that while the variables in question are I(1) when taken individually, there exists a linear combination of the variables that is stationary without differencing, or I(0). (There could be more than one such linear combination.) That is, while the ensemble of variables may be ``free to wander'' over time, nonetheless the variables are ``tied together'' in certain ways. And it may be possible to interpret these ties, or \emph{cointegrating vectors}, as representing equilibrium conditions.

For example, suppose we find some or all of the following variables are I(1): money stock, $M$, the price level, $P$, the nominal interest rate, $R$, and output, $Y$. According to standard theories of the demand for money, we would nonetheless expect there to be an equilibrium relationship between real balances, interest rate and output; for example
\[
m - p = \gamma_0 + \gamma_1 y + \gamma_2 r
\qquad \gamma_1 > 0, \gamma_2 < 0
\]
where lower-case variable names denote logs. In equilibrium, then,
\[
m - p - \gamma_1 y - \gamma_2 r = \gamma_0
\]
Realistically, we should not expect this condition to be satisfied each period. We need to allow for the possibility of short-run disequilibrium. But if the system moves back towards equilibrium following a disturbance, it follows that the vector $x = (m, p, y, r)'$ is bound by a cointegrating vector $\beta' = (\beta_1, \beta_2, \beta_3, \beta_4)$, such that $\beta'x$ is stationary (with a mean of $\gamma_0$). Furthermore, if equilibrium is correctly characterized by the simple model above, we have $\beta_2 = -\beta_1$, $\beta_3 < 0$ and $\beta_4 > 0$. These things are testable within the context of cointegration analysis.

There are typically three steps in this sort of analysis:
\begin{enumerate}
\item Test to determine the number of cointegrating vectors, the \emph{cointegrating rank} of the system.
\item Estimate a VECM with the appropriate rank, but subject to no further restrictions.
\item Probe the interpretation of the cointegrating vectors as equilibrium conditions by means of restrictions on the elements of these vectors.
\end{enumerate}

The following sections expand on each of these points, giving further econometric details and explaining how to implement the analysis using \app{gretl}.

\section{Vector Error Correction Models as representation of a cointegrated system}
\label{sec:VECM-rep}

Consider a VAR of order $p$ with a deterministic part given by $\mu_t$ (typically, a polynomial in time). One can write the $n$-variate process $y_t$ as
\begin{equation}
  \label{eq:VECM-VAR}
  y_t = \mu_t + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + \epsilon_t
\end{equation}
But since $y_{t-1} \equiv y_{t} - \Delta y_t$ and $y_{t-i} \equiv y_{t-1} - (\Delta y_{t-1} + \Delta y_{t-2} + \cdots + \Delta y_{t-i+1})$, we can re-write the above as
\begin{equation}
  \label{eq:VECM}
  \Delta y_t = \mu_t + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t ,
\end{equation}
where $\Pi = \sum_{i=1}^p A_i - I$ and $\Gamma_k = -\sum_{i=k+1}^p A_i$. (As a quick check, with $p = 2$ equation (\ref{eq:VECM}) reads $\Delta y_t = \mu_t + (A_1 + A_2 - I) y_{t-1} - A_2 \Delta y_{t-1} + \epsilon_t$.) This is the VECM representation of (\ref{eq:VECM-VAR}).

The interpretation of (\ref{eq:VECM}) depends crucially on $r$, the rank of the matrix $\Pi$.
\begin{itemize}
\item If $r = 0$, the processes are all I(1) and not cointegrated.
\item If $r = n$, then $\Pi$ is invertible and the processes are all I(0).
\item Cointegration occurs in between, when $0 < r < n$ and $\Pi$ can be written as $\alpha \beta'$. In this case, $y_t$ is I(1), but the combination $z_t = \beta'y_t$ is I(0). If, for example, $r=1$ and the first element of $\beta$ were $-1$, then one could write $z_t = -y_{1,t} + \beta_2 y_{2,t} + \cdots + \beta_n y_{n,t}$, which is equivalent to saying that
\[
y_{1,t} = \beta_2 y_{2,t} + \cdots + \beta_n y_{n,t} - z_t
\]
is a long-run equilibrium relationship: the deviations $z_t$ may not be 0 but they are stationary. In this case, (\ref{eq:VECM}) can be written as
\begin{equation}
  \label{eq:VECMab}
  \Delta y_t = \mu_t + \alpha \beta' y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t .
\end{equation}
If $\beta$ were known, then $z_t$ would be observable and all the remaining parameters could be estimated via OLS. In practice, the procedure estimates $\beta$ first and then the rest.
\end{itemize}

The rank of $\Pi$ is investigated by computing the eigenvalues of a closely related matrix whose rank is the same as that of $\Pi$; this matrix, however, is by construction symmetric and positive semidefinite. As a consequence, all its eigenvalues are real and non-negative, and tests on the rank of $\Pi$ can therefore be carried out by testing how many eigenvalues are 0.

If all the eigenvalues are significantly different from 0, then all the processes are stationary. If, on the contrary, there is at least one zero eigenvalue, then the $y_t$ process is integrated, although some linear combination $\beta'y_t$ might be stationary. At the other extreme, if no eigenvalues are significantly different from 0, then not only is the process $y_t$ non-stationary, but the same holds for any linear combination $\beta'y_t$; in other words, no cointegration occurs.

Estimation typically proceeds in two stages: first, a sequence of tests is run to determine $r$, the cointegration rank. Then, for a given rank, the parameters in equation (\ref{eq:VECMab}) are estimated. The two commands that \app{gretl} offers for estimating these systems are \texttt{coint2} and \texttt{vecm}, respectively.
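These concepts are most easily grasped by simulation. The following self-contained script is a minimal sketch (the sample size, seed and variable names are arbitrary choices, and the setup is a special case, with no deterministic terms, of the bivariate example developed in section~\ref{sec:coint-5cases}): it generates a pure random walk $x_t$ plus a second series $y_t$ equal to $x_t$ up to white noise, so that each series is individually I(1) while $y_t - x_t$ is I(0), then applies the two commands just mentioned. One would expect the tests to point to a cointegrating rank of 1, with $\hat{\beta}$ close to $(1, -1)'$.
%
\begin{code}
# minimal simulation sketch: two I(1) series tied by one cointegrating vector
nulldata 200
setobs 1 1 --time-series
set seed 371
series eps = normal()
series u = normal()
series x = cum(eps)   # pure random walk, hence I(1)
series y = x + u      # y - x = u is I(0): x and y cointegrate
coint2 2 y x          # step 1: the tests should point to rank r = 1
vecm 2 1 y x          # step 2: estimate the VECM with r = 1
\end{code}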
The syntax for \texttt{coint2} is
\begin{code}
coint2 p ylist [ ; xlist [ ; zlist ] ]
\end{code}
where \texttt{p} is the number of lags in (\ref{eq:VECM-VAR}); \texttt{ylist} is a list containing the $y_t$ variables; \texttt{xlist} is an optional list of exogenous variables; and \texttt{zlist} is another optional list of exogenous variables whose effects are assumed to be confined to the cointegrating relationships.

The syntax for \texttt{vecm} is
\begin{code}
vecm p r ylist [ ; xlist [ ; zlist ] ]
\end{code}
where \texttt{p} is the number of lags in (\ref{eq:VECM-VAR}); \texttt{r} is the cointegration rank; and the lists \texttt{ylist}, \texttt{xlist} and \texttt{zlist} have the same interpretation as in \texttt{coint2}.

Both commands can be given specific options to handle the treatment of the deterministic component $\mu_t$. These are discussed in the following section.

\section{Interpretation of the deterministic components}
\label{sec:coint-5cases}

Statistical inference in the context of a cointegrated system depends on the hypotheses one is willing to make on the deterministic terms, which leads to the famous ``five cases.'' In equation (\ref{eq:VECM}), the term $\mu_t$ is usually understood to take the form
\[
\mu_t = \mu_0 + \mu_1 \cdot t .
\]
In order to have the model mimic as closely as possible the features of the observed data, there is a preliminary question to settle. Do the data appear to follow a deterministic trend? If so, is it linear or quadratic? Once this is established, one should impose restrictions on $\mu_0$ and $\mu_1$ that are consistent with this judgement.

For example, suppose that the data do not exhibit a discernible trend. This means that the sample average of $\Delta y_t$ is roughly zero, so it is reasonable to assume that its expected value is zero. Write equation (\ref{eq:VECM}) as
\begin{equation}
  \label{eq:VECM-poly}
  \Gamma(L) \Delta y_t = \mu_0 + \mu_1 \cdot t + \alpha z_{t-1} + \epsilon_t ,
\end{equation}
where $z_{t} = \beta' y_{t}$ is assumed to be stationary and therefore to possess finite moments. Taking unconditional expectations, and writing $m_z$ for the (time-invariant) mean of $z_t$, we get
\[
0 = \mu_0 + \mu_1 \cdot t + \alpha m_z .
\]
Since the left-hand side does not depend on $t$, the restriction $\mu_1 = 0$ is a safe bet. As for $\mu_0$, there are just two ways to make the above expression true: either $\mu_0 = 0$ with $m_z = 0$, or $\mu_0$ equals $-\alpha m_z$. The latter possibility is less restrictive in that the vector $\mu_0$ may be non-zero, but is constrained to be a linear combination of the columns of $\alpha$. In that case, $\mu_0$ can be written as $\alpha \cdot c$, and one may write (\ref{eq:VECM-poly}) as
\[
\Gamma(L) \Delta y_t = \alpha \left[ \beta' \quad c \right]
\left[ \begin{array}{c} y_{t-1} \\ 1 \end{array} \right] + \epsilon_t .
\]
The long-run relationship therefore contains an intercept. This type of restriction is usually written
\[
\alpha'_{\perp} \mu_0 = 0 ,
\]
where the columns of $\alpha_{\perp}$ span the left null space of the matrix $\alpha$.

An intuitive understanding of the issue can be gained by means of a simple example. Consider a series $x_t$ which behaves as follows
%
\[
x_t = m + x_{t-1} + \varepsilon_t
\]
%
where $m$ is a real number and $\varepsilon_t$ is a white noise process: $x_t$ is then a random walk with drift $m$. In the special case $m$ = 0, the drift disappears and $x_t$ is a pure random walk.

Consider now another process $y_t$, defined by
%
\[
y_t = k + x_t + u_t
\]
%
where, again, $k$ is a real number and $u_t$ is a white noise process.
Since $u_t$ is stationary by definition, $x_t$ and $y_t$ cointegrate: that is, their difference
%
\[
z_t = y_t - x_t = k + u_t
\]
%
is a stationary process. For $k$ = 0, $z_t$ is simply zero-mean white noise, whereas for $k$ $\ne$ 0 the process $z_t$ is white noise with a non-zero mean.

After some simple substitutions, the two equations above can be represented jointly as a VAR(1) system
%
\[
\left[ \begin{array}{c} y_t \\ x_t \end{array} \right] =
\left[ \begin{array}{c} k + m \\ m \end{array} \right] +
\left[ \begin{array}{rr} 0 & 1 \\ 0 & 1 \end{array} \right]
\left[ \begin{array}{c} y_{t-1} \\ x_{t-1} \end{array} \right] +
\left[ \begin{array}{c} u_t + \varepsilon_t \\ \varepsilon_t \end{array} \right]
\]
%
or in VECM form
%
\begin{eqnarray*}
  \left[ \begin{array}{c} \Delta y_t \\ \Delta x_t \end{array} \right] & = &
  \left[ \begin{array}{c} k + m \\ m \end{array} \right] +
  \left[ \begin{array}{rr} -1 & 1 \\ 0 & 0 \end{array} \right]
  \left[ \begin{array}{c} y_{t-1} \\ x_{t-1} \end{array} \right] +
  \left[ \begin{array}{c} u_t + \varepsilon_t \\ \varepsilon_t \end{array} \right] \\
  & = & \left[ \begin{array}{c} k + m \\ m \end{array} \right] +
  \left[ \begin{array}{r} -1 \\ 0 \end{array} \right]
  \left[ \begin{array}{rr} 1 & -1 \end{array} \right]
  \left[ \begin{array}{c} y_{t-1} \\ x_{t-1} \end{array} \right] +
  \left[ \begin{array}{c} u_t + \varepsilon_t \\ \varepsilon_t \end{array} \right] \\
  & = & \mu_0 + \alpha \beta^{\prime}
  \left[ \begin{array}{c} y_{t-1} \\ x_{t-1} \end{array} \right] + \eta_t =
  \mu_0 + \alpha z_{t-1} + \eta_t ,
\end{eqnarray*}
%
where $\beta$ is the cointegration vector and $\alpha$ is the ``loadings'' or ``adjustments'' vector.

We are now ready to consider three possible cases:
\begin{enumerate}
\item $m$ $\ne$ 0: In this case $x_t$ is trended, as we just saw; it follows that $y_t$ also follows a linear trend because on average it keeps at a fixed distance $k$ from $x_t$. The vector $\mu_0$ is unrestricted.
\item $m$ = 0 and $k$ $\ne$ 0: In this case, $x_t$ is not trended and as a consequence neither is $y_t$. However, the mean distance between $y_t$ and $x_t$ is non-zero. The vector $\mu_0$ is given by
%
\[
\mu_0 = \left[ \begin{array}{c} k \\ 0 \end{array} \right]
\]
%
which is not null and therefore the VECM shown above does have a constant term. The constant, however, is subject to the restriction that its second element must be 0. More generally, $\mu_0$ is a multiple of the vector $\alpha$. Note that the VECM could also be written as
%
\[
\left[ \begin{array}{c} \Delta y_t \\ \Delta x_t \end{array} \right] =
\left[ \begin{array}{r} -1 \\ 0 \end{array} \right]
\left[ \begin{array}{rrr} 1 & -1 & -k \end{array} \right]
\left[ \begin{array}{c} y_{t-1} \\ x_{t-1} \\ 1 \end{array} \right] +
\left[ \begin{array}{c} u_t + \varepsilon_t \\ \varepsilon_t \end{array} \right]
\]
%
which incorporates the intercept into the cointegration vector. This is known as the ``restricted constant'' case.
\item $m$ = 0 and $k$ = 0: This case is the most restrictive: clearly, neither $x_t$ nor $y_t$ are trended, and the mean distance between them is zero. The vector $\mu_0$ is also 0, which explains why this case is referred to as ``no constant.''
\end{enumerate}

In most cases, the choice between these three possibilities is based on a mix of empirical observation and economic reasoning. If the variables under consideration seem to follow a linear trend then we should not place any restriction on the intercept.
Otherwise, the question arises of whether it makes sense to specify a cointegration relationship which includes a non-zero intercept. One example where this is appropriate is the relationship between two interest rates: generally these are not trended, but the VAR might still have an intercept because the difference between the two (the ``interest rate spread'') might be stationary around a non-zero mean (for example, because of a risk or liquidity premium).

The previous example can be generalized in three directions:
\begin{enumerate}
\item If a VAR of order greater than 1 is considered, the algebra gets more convoluted but the conclusions are identical.
\item If the VAR includes more than two endogenous variables the cointegration rank $r$ can be greater than 1. In this case, $\alpha$ is a matrix with $r$ columns, and the case with restricted constant entails the restriction that $\mu_0$ should be some linear combination of the columns of $\alpha$.
\item If a linear trend is included in the model, the deterministic part of the VAR becomes $\mu_0 + \mu_1 t$. The reasoning is practically the same as above except that the focus now centers on $\mu_1$ rather than $\mu_0$. The counterpart to the ``restricted constant'' case discussed above is a ``restricted trend'' case, such that the cointegration relationships include a trend but the first differences of the variables in question do not. In the case of an unrestricted trend, the trend appears in both the cointegration relationships and the first differences, which corresponds to the presence of a quadratic trend in the variables themselves (in levels).
\end{enumerate}

In order to accommodate the five cases, \app{gretl} provides the following options to the \texttt{coint2} and \texttt{vecm} commands:
\begin{center}
\begin{tabular}{ccl}
$\mu_t$ & \textit{option flag} & \textit{description} \\ [4pt]
0 & \option{nc} & no constant \\
$\mu_0, \alpha_{\perp}'\mu_0 = 0 $ & \option{rc} & restricted constant \\
$\mu_0$ & default & unrestricted constant \\
$\mu_0 + \mu_1 t , \alpha_{\perp}'\mu_1 = 0$ & \option{crt} & constant + restricted trend \\
$\mu_0 + \mu_1 t$ & \option{ct} & constant + unrestricted trend
\end{tabular}
\end{center}
Note that for these commands the above options are mutually exclusive. In addition, you have the option of using the \option{seasonals} flag, for augmenting $\mu_t$ with centered seasonal dummies. In each case, p-values are computed via the approximations by Doornik (1998).

\section{The Johansen cointegration tests}
\label{sec:johansen-test}

The two Johansen tests for cointegration are used to establish the rank of $\beta$; in other words, how many cointegration vectors the system has. These are the ``$\lambda$-max'' test, for hypotheses on individual eigenvalues, and the ``trace'' test, for joint hypotheses. Suppose that the eigenvalues $\lambda_i$ are sorted from largest to smallest. The null hypothesis for the ``$\lambda$-max'' test on the $i$-th eigenvalue is that $\lambda_i = 0$. The corresponding trace test, instead, considers the hypothesis $\lambda_j = 0$ for all $j \ge i$.

The \app{gretl} command \cmd{coint2} performs these two tests. The corresponding menu entry in the GUI is ``Model, Time Series, Cointegration Test, Johansen''.

As in the ADF test, the asymptotic distribution of the tests varies with the deterministic component $\mu_t$ one includes in the VAR (see section \ref{sec:coint-5cases} above).
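For reference, the two test statistics take the standard likelihood-ratio forms, recalled here merely for convenience. Writing $T$ for the sample size, the $\lambda$-max statistic for the hypothesis $\lambda_i = 0$ is
\[
-T \ln \left( 1 - \hat{\lambda}_i \right) ,
\]
while the corresponding trace statistic is
\[
-T \sum_{j \ge i} \ln \left( 1 - \hat{\lambda}_j \right) .
\]
These expressions can be checked against the output shown below: for instance, $-53 \ln(1 - 0.43317) \approx 30.09$, which matches the first ``Lmax'' entry.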
The following code uses the \cmd{denmark} data file, supplied with \app{gretl}, to replicate Johansen's example found in his 1995 book.
%
\begin{code}
open denmark
coint2 2 LRM LRY IBO IDE --rc --seasonals
\end{code}
%
In this case, the vector $y_t$ in equation (\ref{eq:VECM}) comprises the four variables \cmd{LRM}, \cmd{LRY}, \cmd{IBO}, \cmd{IDE}. The number of lags equals $p$ in (\ref{eq:VECM-VAR}) (that is, the number of lags of the model written in VAR form). Part of the output is reported below:

\begin{center}
\begin{code}
Johansen test:
Number of equations = 4
Lag order = 2
Estimation period: 1974:3 - 1987:3 (T = 53)
Case 2: Restricted constant

Rank Eigenvalue Trace test  p-value  Lmax test  p-value
   0    0.43317     49.144 [0.1284]     30.087 [0.0286]
   1    0.17758     19.057 [0.7833]     10.362 [0.8017]
   2    0.11279     8.6950 [0.7645]     6.3427 [0.7483]
   3   0.043411     2.3522 [0.7088]     2.3522 [0.7076]
\end{code}
\end{center}

Both the trace and $\lambda$-max tests accept the null hypothesis that the smallest eigenvalue is 0 (see the last row of the table), so we may conclude that the series are in fact non-stationary. However, some linear combination may be I(0), since the $\lambda$-max test rejects the hypothesis that the rank of $\Pi$ is 0 (though the trace test gives less clear-cut evidence for this, with a p-value of $0.1284$).

\section{Identification of the cointegration vectors}
\label{sec:johansen-ident}

The core problem in the estimation of equation (\ref{eq:VECM}) is to find an estimate of $\Pi$ that has by construction rank $r$, so it can be written as $\Pi = \alpha \beta'$, where $\beta$ is the matrix containing the cointegration vectors and $\alpha$ contains the ``adjustment'' or ``loading'' coefficients whereby the endogenous variables respond to deviation from equilibrium in the previous period.

Without further specification, the problem has multiple solutions (in fact, infinitely many). The parameters $\alpha$ and $\beta$ are under-identified: if all columns of $\beta$ are cointegration vectors, then any arbitrary linear combination of those columns is a cointegration vector too. To put it differently, if $\Pi = \alpha_0 \beta_0'$ for specific matrices $\alpha_0$ and $\beta_0$, then $\Pi$ also equals $(\alpha_0 Q)(Q^{-1} \beta_0')$ for any conformable non-singular matrix $Q$. In order to find a unique solution, it is therefore necessary to impose some restrictions on $\alpha$ and/or $\beta$. It can be shown that the minimum number of restrictions that is necessary to guarantee identification is $r^2$. Normalizing one coefficient per column to 1 (or $-1$, according to taste) is a trivial first step, which also helps in that the remaining coefficients can be interpreted as the parameters in the equilibrium relations, but this only suffices when $r=1$.

The method that \app{gretl} uses by default is known as the ``Phillips normalization'', or ``triangular representation''.\footnote{For comparison with other studies, you may wish to normalize $\beta$ differently. Using the \texttt{set} command you can do \verb|set vecm_norm diag| to select a normalization that simply scales the columns of the original $\beta$ such that $\beta_{ij} = 1$ for $i=j$ and $i \leq r$, as used in the empirical section of Boswijk and Doornik (2004). Another alternative is \verb+set vecm_norm first+, which scales $\beta$ such that the elements on the first row equal 1. To suppress normalization altogether, use \verb+set vecm_norm none+.
(To return to the default: \texttt{set vecm\_norm phillips}.)} The starting point is writing $\beta$ in partitioned form as in
\[
\beta = \left[ \begin{array}{c} \beta_1 \\ \beta_2 \end{array} \right] ,
\]
where $\beta_1$ is an $r \times r$ matrix and $\beta_2$ is $(n-r) \times r$. Assuming that $\beta_1$ has full rank, $\beta$ can be post-multiplied by $\beta_1^{-1}$, giving
\[
\hat{\beta} = \left[ \begin{array}{c} I \\ \beta_2 \beta_1^{-1} \end{array} \right] =
\left[ \begin{array}{c} I \\ -B \end{array} \right] .
\]
The coefficients that \app{gretl} produces are $\hat{\beta}$, with $B$ known as the matrix of unrestricted coefficients. In terms of the underlying equilibrium relationship, the Phillips normalization expresses the system of $r$ equilibrium relations as
\begin{eqnarray*}
  y_{1,t} & = & b_{1,r+1} y_{r+1,t} + \ldots + b_{1,n} y_{n,t} \\
  y_{2,t} & = & b_{2,r+1} y_{r+1,t} + \ldots + b_{2,n} y_{n,t} \\
  & \vdots & \\
  y_{r,t} & = & b_{r,r+1} y_{r+1,t} + \ldots + b_{r,n} y_{n,t}
\end{eqnarray*}
where the first $r$ variables are expressed as functions of the remaining $n-r$.

Although the triangular representation ensures that the statistical problem of estimating $\beta$ is solved, the resulting equilibrium relationships may be difficult to interpret. In this case, the user may want to achieve identification by specifying manually the system of $r^2$ constraints that \app{gretl} will use to produce an estimate of $\beta$.

As an example, consider the money demand system presented in section 9.6 of Verbeek (2004). The variables used are \texttt{m} (the log of real money stock M1), \texttt{infl} (inflation), \texttt{cpr} (the commercial paper rate), \texttt{y} (log of real GDP) and \texttt{tbr} (the Treasury bill rate).\footnote{This data set is available in the \texttt{verbeek} data package; see \url{http://gretl.sourceforge.net/gretl_data.html}.}

Estimation of $\beta$ can be performed via the commands
\begin{code}
open money.gdt
smpl 1954:1 1994:4
vecm 6 2 m infl cpr y tbr --rc
\end{code}
and the relevant portion of the output reads
\begin{code}
Maximum likelihood estimates, observations 1954:1-1994:4 (T = 164)
Cointegration rank = 2
Case 2: Restricted constant

beta (cointegrating vectors, standard errors in parentheses)

m      1.0000      0.0000
     (0.0000)    (0.0000)
infl   0.0000      1.0000
     (0.0000)    (0.0000)
cpr    0.56108    -24.367
     (0.10638)   (4.2113)
y     -0.40446   -0.91166
     (0.10277)   (4.0683)
tbr   -0.54293     24.786
     (0.10962)   (4.3394)
const -3.7483      16.751
     (0.78082)   (30.909)
\end{code}

Interpretation of the coefficients of the cointegration matrix $\beta$ would be easier if a meaning could be attached to each of its columns. This is possible by hypothesizing the existence of two long-run relationships: a money demand equation
\[
\mbox{\tt m} = c_1 + \beta_1 \mbox{\tt infl} + \beta_2 \mbox{\tt y} + \beta_3 \mbox{\tt tbr}
\]
and a risk premium equation
\[
\mbox{\tt cpr} = c_2 + \beta_4 \mbox{\tt infl} + \beta_5 \mbox{\tt y} + \beta_6 \mbox{\tt tbr}
\]
which imply that the cointegration matrix can be normalized as
\[
\beta = \left[ \begin{array}{rr}
 -1      & 0 \\
 \beta_1 & \beta_4 \\
 0       & -1 \\
 \beta_2 & \beta_5 \\
 \beta_3 & \beta_6 \\
 c_1     & c_2
\end{array} \right]
\]
This renormalization can be accomplished by means of the \texttt{restrict} command, to be given after the \texttt{vecm} command or, in the graphical interface, by selecting the ``Test, Linear Restrictions'' menu entry.
The syntax for entering the restrictions should be fairly obvious:\footnote{Note that in this context we are bending the usual matrix indexation convention, using the leading index to refer to the \textit{column} of $\beta$ (the particular cointegrating vector). This is standard practice in the literature, and defensible insofar as it is the columns of $\beta$ (the cointegrating relations or equilibrium errors) that are of primary interest.}
\begin{code}
restrict
  b[1,1] = -1
  b[1,3] = 0
  b[2,1] = 0
  b[2,3] = -1
end restrict
\end{code}
which produces
\begin{code}
Cointegrating vectors (standard errors in parentheses)

m     -1.0000       0.0000
     (0.0000)      (0.0000)
infl  -0.023026     0.041039
     (0.0054666)   (0.027790)
cpr    0.0000      -1.0000
     (0.0000)      (0.0000)
y      0.42545     -0.037414
     (0.033718)    (0.17140)
tbr   -0.027790     1.0172
     (0.0045445)   (0.023102)
const  3.3625       0.68744
     (0.25318)     (1.2870)
\end{code}

\section{Over-identifying restrictions}
\label{sec:johansen-overid}

One purpose of imposing restrictions on a VECM system is simply to achieve identification. If these restrictions are simply normalizations, they are not testable and should have no effect on the maximized likelihood. In addition, however, one may wish to formulate constraints on $\beta$ and/or $\alpha$ that derive from the economic theory underlying the equilibrium relationships; substantive restrictions of this sort are then testable via a likelihood-ratio statistic.

\app{Gretl} is capable of testing general linear restrictions of the form
\begin{equation}
  \label{eq:Rb}
  R_b \vec{\beta} = q
\end{equation}
and/or
\begin{equation}
  \label{eq:Ra}
  R_a \vec{\alpha} = 0
\end{equation}
%
Note that the $\beta$ restriction may be non-homogeneous ($q \neq 0$) but the $\alpha$ restriction must be homogeneous. Nonlinear restrictions are not supported, and neither are restrictions that cross between $\beta$ and $\alpha$. In the case where $r > 1$ such restrictions may be in common across all the columns of $\beta$ (or $\alpha$) or may be specific to certain columns of these matrices. This is the case discussed in Boswijk (1995) and Boswijk and Doornik (2004, section 4.4).

The restrictions (\ref{eq:Rb}) and (\ref{eq:Ra}) may be written in explicit form as
\begin{equation}
  \label{eq:vecbeta}
  \vec{\beta} = H\phi + h_0
\end{equation}
and
\begin{equation}
  \label{eq:vecalpha}
  \vec{\alpha'} = G\psi
\end{equation}
respectively, where $\phi$ and $\psi$ are the free parameter vectors associated with $\beta$ and $\alpha$ respectively. We may refer to the free parameters collectively as $\theta$ (the column vector formed by concatenating $\phi$ and $\psi$). \app{Gretl} uses this representation internally when testing the restrictions.

If the list of restrictions that is passed to the \texttt{restrict} command contains more constraints than necessary to achieve identification, then an LR test is performed; moreover, the \texttt{restrict} command can be given the \option{full} switch, in which case full estimates for the restricted system are printed (including the $\Gamma_i$ terms), and the system thus restricted becomes the ``current model'' for the purposes of further tests. Thus you are able to carry out cumulative tests, as in Chapter 7 of Johansen (1995).

\subsection{Syntax}
\label{sec:vecm-restr-syntax}

The full syntax for specifying the restriction is an extension of the one exemplified in the previous section.
Inside a \texttt{restrict}\ldots\texttt{end restrict} block, valid statements are of the form
\begin{center}
\texttt{\emph{parameter linear combination}} = \emph{\texttt{scalar}}
\end{center}
where a parameter linear combination involves a weighted sum of individual elements of $\beta$ or $\alpha$ (but not both in the same combination); the scalar on the right-hand side must be 0 for combinations involving $\alpha$, but can be any real number for combinations involving $\beta$. Below, we give a few examples of valid restrictions:
\begin{code}
b[1,1] = 1.618
b[1,4] + 2*b[2,5] = 0
a[1,3] = 0
a[1,1] - a[1,2] = 0
\end{code}

A special syntax is reserved for the case when a certain constraint should be applied to all columns of $\beta$: in this case, one index is given for each \texttt{b} term, and the square brackets are dropped. Hence, the following syntax
\begin{code}
restrict
  b1 + b2 = 0
end restrict
\end{code}
corresponds to
\[
\beta = \left[ \begin{array}{rr}
\beta_{11} & \beta_{21} \\
-\beta_{11} & -\beta_{21} \\
\beta_{13} & \beta_{23} \\
\beta_{14} & \beta_{24}
\end{array} \right]
\]

The same convention is used for $\alpha$: when only one index is given for each \texttt{a} term, the restriction is presumed to apply to all $r$ columns of $\alpha$, or in other words the given variables are weakly exogenous. For instance, the formulation
%
\begin{code}
restrict
  a3 = 0
  a4 = 0
end restrict
\end{code}
%
specifies that variables 3 and 4 do not respond to the deviation from equilibrium in the previous period.

Finally, a short-cut is available for setting up complex restrictions (but currently only in relation to $\beta$): you can specify $R_b$ and $q$, as in $R_b \vec{\beta} = q$, by giving the names of previously defined matrices. For example,
%
\begin{code}
matrix I4 = I(4)
matrix vR = I4**(I4~zeros(4,1))
matrix vq = mshape(I4,16,1)
restrict
  R = vR
  q = vq
end restrict
\end{code}
%
This manually imposes the Phillips normalization on the $\beta$ estimates for a system of five variables with cointegrating rank 4.

\subsection{An example}
\label{sec:vecm-overid-ex}

Brand and Cassola (2004) propose a money demand system for the Euro area, in which they postulate three long-run equilibrium relationships:
%
\begin{center}
\begin{tabular}{ll}
money demand & $m = \beta_l l + \beta_y y$ \\
Fisher equation & $\pi = \phi l$ \\
Expectation theory of & $l = s$ \\ [-4pt]
interest rates
\end{tabular}
\end{center}
%
where $m$ is real money demand, $l$ and $s$ are long- and short-term interest rates, $y$ is output and $\pi$ is inflation.\footnote{A traditional formulation of the Fisher equation would reverse the roles of the variables in the second equation, but this detail is immaterial in the present context; moreover, the expectation theory of interest rates implies that the third equilibrium relationship should include a constant for the liquidity premium. However, since in this example the system is estimated with the constant term unrestricted, the liquidity premium gets merged in the system intercept and disappears from $z_t$.} (The names for these variables in the \app{gretl} data file are \verb|m_p|, \texttt{rl}, \texttt{rs}, \texttt{y} and \texttt{infl}, respectively.)

The cointegration rank assumed by the authors is 3 and there are 5 variables, giving 15 elements in the $\beta$ matrix. $3 \times 3 = 9$ restrictions are required for identification, and a just-identified system would have $15 - 9 = 6$ free parameters.
However, the postulated long-run relationships feature only three free parameters, so the over-identification rank is 3.

\begin{script}[htbp]
  \caption{Estimation of a money demand system with constraints on $\beta$}
  \label{brand-cassola-script}
Input:
\begin{scodebit}
open brand_cassola.gdt

# perform a few transformations
m_p = m_p*100
y = y*100
infl = infl/4
rs = rs/4
rl = rl/4

# replicate table 4, page 824
vecm 2 3 m_p infl rl rs y -q
genr ll0 = $lnl

restrict --full
  b[1,1] = 1
  b[1,2] = 0
  b[1,4] = 0
  b[2,1] = 0
  b[2,2] = 1
  b[2,4] = 0
  b[2,5] = 0
  b[3,1] = 0
  b[3,2] = 0
  b[3,3] = 1
  b[3,4] = -1
  b[3,5] = 0
end restrict
genr ll1 = $rlnl
\end{scodebit}

Partial output:
\begin{scodebit}
Unrestricted loglikelihood (lu) = 116.60268
Restricted loglikelihood (lr) = 115.86451
2 * (lu - lr) = 1.47635
P(Chi-Square(3) > 1.47635) = 0.68774

beta (cointegrating vectors, standard errors in parentheses)

m_p    1.0000      0.0000      0.0000
     (0.0000)    (0.0000)    (0.0000)
infl   0.0000      1.0000      0.0000
     (0.0000)    (0.0000)    (0.0000)
rl     1.6108    -0.67100      1.0000
    (0.62752)  (0.049482)    (0.0000)
rs     0.0000      0.0000     -1.0000
     (0.0000)    (0.0000)    (0.0000)
y     -1.3304      0.0000      0.0000
   (0.030533)    (0.0000)    (0.0000)
\end{scodebit}
%$
\end{script}

Example \ref{brand-cassola-script} replicates Table 4 on page 824 of the Brand and Cassola article.\footnote{Modulo what appear to be a few typos in the article.} Note that we use the \verb|$lnl| accessor after the \texttt{vecm} command to store the unrestricted log-likelihood and the \verb|$rlnl| accessor after \texttt{restrict} for its restricted counterpart.

The example continues in script~\ref{brand-cassola-tab5}, where we perform further testing to check whether (a) the income elasticity in the money demand equation is 1 ($\beta_y = 1$) and (b) the Fisher relation is homogeneous ($\phi = 1$). Since the \option{full} switch was given to the initial \texttt{restrict} command, additional restrictions can be applied without having to repeat the previous ones. (The second script contains a few \texttt{printf} commands, which are not strictly necessary, to format the output nicely.) It turns out that both of the additional hypotheses are rejected by the data, with p-values of $0.002$ and $0.004$.

\begin{script}[htbp]
  \caption{Further testing of money demand system}
  \label{brand-cassola-tab5}
Input:
\begin{scodebit}
restrict
  b[1,5] = -1
end restrict
genr ll_uie = $rlnl

restrict
  b[2,3] = -1
end restrict
genr ll_hfh = $rlnl

# replicate table 5, page 824
printf "Testing zero restrictions in cointegration space:\n"
printf " LR-test, rank = 3: chi^2(3) = %6.4f [%6.4f]\n", 2*(ll0-ll1), \
  pvalue(X, 3, 2*(ll0-ll1))
printf "Unit income elasticity: LR-test, rank = 3:\n"
printf " chi^2(4) = %g [%6.4f]\n", 2*(ll0-ll_uie), \
  pvalue(X, 4, 2*(ll0-ll_uie))
printf "Homogeneity in the Fisher hypothesis:\n"
printf " LR-test, rank = 3: chi^2(4) = %6.3f [%6.4f]\n", 2*(ll0-ll_hfh), \
  pvalue(X, 4, 2*(ll0-ll_hfh))
\end{scodebit}

Output:
\begin{scodebit}
Testing zero restrictions in cointegration space:
 LR-test, rank = 3: chi^2(3) = 1.4763 [0.6877]
Unit income elasticity: LR-test, rank = 3:
 chi^2(4) = 17.2071 [0.0018]
Homogeneity in the Fisher hypothesis:
 LR-test, rank = 3: chi^2(4) = 15.547 [0.0037]
\end{scodebit}
\end{script}

Another type of test that is commonly performed is the ``weak exogeneity'' test. In this context, a variable is said to be weakly exogenous if all coefficients on the corresponding row in the $\alpha$ matrix are zero.
If this is the case, that variable does not adjust to deviations from any of the long-run equilibria and can be considered an autonomous driving force of the whole system. The code in Example~\ref{brand-cassola-exog} performs this test for each variable in turn, thus replicating the first column of Table 6 on page 825 of Brand and Cassola (2004). The results show that weak exogeneity might perhaps be accepted for the long-term interest rate and real GDP (p-values $0.07$ and $0.08$ respectively).

\begin{script}[htbp]
  \caption{Testing for weak exogeneity}
  \label{brand-cassola-exog}
Input:
\begin{scodebit}
restrict
  a1 = 0
end restrict
ts_m = 2*(ll0 - $rlnl)

restrict
  a2 = 0
end restrict
ts_p = 2*(ll0 - $rlnl)

restrict
  a3 = 0
end restrict
ts_l = 2*(ll0 - $rlnl)

restrict
  a4 = 0
end restrict
ts_s = 2*(ll0 - $rlnl)

restrict
  a5 = 0
end restrict
ts_y = 2*(ll0 - $rlnl)

loop foreach i m p l s y --quiet
  printf "\Delta $i\t%6.3f [%6.4f]\n", ts_$i, pvalue(X, 6, ts_$i)
end loop
\end{scodebit}

Output (variable, LR test, p-value):
\begin{scodebit}
\Delta m     18.111 [0.0060]
\Delta p     21.067 [0.0018]
\Delta l     11.819 [0.0661]
\Delta s     16.000 [0.0138]
\Delta y     11.335 [0.0786]
\end{scodebit}
%$
\end{script}

\subsection{Identification and testability}
\label{sec:ident-test}

One point regarding VECM restrictions that can be confusing at first is that identification (does the restriction identify the system?) and testability (is the restriction testable?) are quite separate matters. Restrictions can be identifying but not testable; less obviously, they can be testable but not identifying.

This can be seen quite easily in relation to a rank-1 system. The restriction $\beta_1 = 1$ is identifying (it pins down the scale of $\beta$) but, being a pure scaling, it is not testable. On the other hand, the restriction $\beta_1 + \beta_2 = 0$ is testable --- the system with this requirement imposed will almost certainly have a lower maximized likelihood --- but it is not identifying; it still leaves open the scale of $\beta$.

We said above that the number of restrictions must be at least $r^2$, where $r$ is the cointegrating rank, for identification. This is a necessary and not a sufficient condition. In fact, when $r>1$ it can be quite tricky to assess whether a given set of restrictions is identifying. \app{Gretl} uses the method suggested by Doornik (1995), where identification is assessed via the rank of the information matrix. It can be shown that for restrictions of the sort (\ref{eq:vecbeta}) and (\ref{eq:vecalpha}) the information matrix has the same rank as the Jacobian matrix
%
\[
{\cal J}(\theta) = \left[ (I_p \otimes \beta) G : (\alpha \otimes I_{p_1}) H \right]
\]
%
A sufficient condition for identification is that the rank of ${\cal J}(\theta)$ equals the number of free parameters. The rank of this matrix is evaluated by examination of its singular values at a randomly selected point in the parameter space.
For practical purposes we treat this condition as if it were both necessary and sufficient; that is, we disregard the special cases where identification could be achieved without this condition being met.\footnote{See Boswijk and Doornik (2004, pp.\ 447--8) for discussion of this point.}

\section{Numerical solution methods}
\label{sec:vecm-opt}

In general, the ML estimator for the restricted VECM problem has no closed form solution, hence the maximum must be found via numerical methods.\footnote{The exception is restrictions that are homogeneous, common to all $\beta$ or all $\alpha$ (in case $r>1$), and involve either $\beta$ only or $\alpha$ only. Such restrictions are handled via the modified eigenvalues method set out by Johansen (1995). We solve directly for the ML estimator, without any need for iterative methods.} In some cases convergence may be difficult, and \app{gretl} provides several choices to solve the problem.

\subsection{Switching and LBFGS}
\label{sec:vecm-algorithms}

Two maximization methods are available in \app{gretl}. The default is the switching algorithm set out in Boswijk and Doornik (2004). The alternative is a limited-memory variant of the BFGS algorithm (LBFGS), using analytical derivatives. This is invoked using the \option{lbfgs} flag with the \texttt{restrict} command.

The switching algorithm works by explicitly maximizing the likelihood at each iteration, with respect to $\hat{\phi}$, $\hat{\psi}$ and $\hat{\Omega}$ (the covariance matrix of the residuals) in turn. This method shares a feature with the basic Johansen eigenvalues procedure, namely, it can handle a set of restrictions that does not fully identify the parameters.

LBFGS, on the other hand, requires that the model be fully identified. When using LBFGS, therefore, you may have to supplement the restrictions of interest with normalizations that serve to identify the parameters. For example, one might use all or part of the Phillips normalization (see section \ref{sec:johansen-ident}).

Neither the switching algorithm nor LBFGS is guaranteed to find the global ML solution.\footnote{In developing \app{gretl}'s VECM-testing facilities we have considered a fair number of ``tricky cases'' from various sources. We'd like to thank Luca Fanelli of the University of Bologna and Sven Schreiber of Goethe University Frankfurt for their help in devising torture-tests for \app{gretl}'s VECM code.} The optimizer may end up at a local maximum (or, in the case of the switching algorithm, at a saddle point).

The solution (or lack thereof) may be sensitive to the initial value selected for $\theta$. By default, \app{gretl} selects a starting point using a deterministic method based on Boswijk (1995), but two further options are available: the initialization may be adjusted using simulated annealing, or the user may supply an explicit initial value for $\theta$.

The default initialization method is:
%
\begin{enumerate}
\item Calculate the unrestricted ML $\hat{\beta}$ using the Johansen procedure.
\item If the restriction on $\beta$ is non-homogeneous, use the method proposed by Boswijk (1995):
\begin{equation}
  \phi_0 = -[(I_r \otimes \hat{\beta}_{\perp})'H]^+ (I_r \otimes \hat{\beta}_{\perp})' h_0
\end{equation}
where $\hat{\beta}'_{\perp} \hat{\beta} = 0$ and $A^+$ denotes the Moore--Penrose inverse of $A$. Otherwise
\begin{equation}
  \phi_0 = (H'H)^{-1} H' \vec{\hat{\beta}}
\end{equation}
\item $\vec{\beta_0} = H\phi_0 + h_0$.
\item Calculate the unrestricted ML $\hat{\alpha}$ conditional on $\beta_0$, as per Johansen:
\begin{equation}
  \label{eq:Jalpha}
  \hat{\alpha} = S_{01} \beta_0 (\beta'_0S_{11}\beta_0)^{-1}
\end{equation}
\item If $\alpha$ is restricted by $\vec{\alpha'} = G\psi$, then $\psi_0 = (G'G)^{-1}G'\,{\rm vec}(\hat{\alpha}')$ and $\vec{\alpha'_0} = G\psi_0$.
\end{enumerate}

\subsection{Alternative initialization methods}
\label{sec:vecm-alt-init}

As mentioned above, \app{gretl} offers the option of adjusting the initialization using \textbf{simulated annealing}. This is invoked by adding the \option{jitter} option to the \texttt{restrict} command.

The basic idea is this: we start at a certain point in the parameter space, and for each of $n$ iterations (currently $n=4096$) we randomly select a new point within a certain radius of the previous one, and determine the likelihood at the new point. If the likelihood is higher, we jump to the new point; otherwise, we jump with probability $P$ (and remain at the previous point with probability $1-P$). As the iterations proceed, the system gradually ``cools'' --- that is, the radius of the random perturbation is reduced, as is the probability of making a jump when the likelihood fails to increase.

In the course of this procedure many points in the parameter space are evaluated, starting with the point arrived at by the deterministic method, which we'll call $\theta_0$. One of these points will be ``best'' in the sense of yielding the highest likelihood: call it $\theta^*$. This point may or may not have a greater likelihood than $\theta_0$. And the procedure has an end point, $\theta_n$, which may or may not be ``best''.

The rule followed by \app{gretl} in selecting an initial value for $\theta$ based on simulated annealing is this: use $\theta^*$ if it yields a higher likelihood than $\theta_0$, otherwise use $\theta_n$. That is, if we get an improvement in the likelihood via annealing, we make full use of this; on the other hand, if we fail to get an improvement we nonetheless allow the annealing to randomize the starting point. Experiments indicated that the latter effect can be helpful.

Besides annealing, a further alternative is \textbf{manual initialization}. This is done by passing a predefined vector to the \texttt{set} command with parameter \texttt{initvals}, as in
%
\begin{verbatim}
set initvals myvec
\end{verbatim}

The details depend on whether the switching algorithm or LBFGS is used. For the switching algorithm, there are two options for specifying the initial values. The more user-friendly one (for most people, we suppose) is to specify a matrix that contains $\vec{\beta}$ followed by $\vec{\alpha}$. For example:
\begin{code}
open denmark.gdt
vecm 2 1 LRM LRY IBO IDE --rc --seasonals

matrix BA = {1, -1, 6, -6, -6, -0.2, 0.1, 0.02, 0.03}
set initvals BA

restrict
  b[1] = 1
  b[1] + b[2] = 0
  b[3] + b[4] = 0
end restrict
\end{code}

In this example --- from Johansen (1995) --- the cointegration rank is 1 and there are 4 variables. However, the model includes a restricted constant (the \option{rc} flag) so that $\beta$ has 5 elements. The $\alpha$ matrix has 4 elements, one per equation. So the matrix \texttt{BA} may be read as
\[
\left(\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \alpha_1, \alpha_2, \alpha_3, \alpha_4 \right)
\]

The other option, which is compulsory when using LBFGS, is to specify the initial values in terms of the free parameters, $\phi$ and $\psi$. Getting this right is somewhat less obvious.
As mentioned above, the implicit-form restriction $R\vec{\beta} = q$ has explicit form $\vec{\beta} = H\phi + h_0$, where $H = R_{\perp}$, the right nullspace of $R$. The vector $\phi$ is shorter, by the number of restrictions, than $\vec{\beta}$. The savvy user will then see what needs to be done. The other point to take into account is that if $\alpha$ is unrestricted, the \textit{effective} length of $\psi$ is 0, since it is then optimal to compute $\alpha$ using Johansen's formula, conditional on $\beta$ (equation \ref{eq:Jalpha} above). The example above could be rewritten as:
\begin{code}
open denmark.gdt
vecm 2 1 LRM LRY IBO IDE --rc --seasonals

matrix phi = {-8, -6}
set initvals phi

restrict --lbfgs
  b[1] = 1
  b[1] + b[2] = 0
  b[3] + b[4] = 0
end restrict
\end{code}

In this more economical formulation the initializer specifies only the two free parameters in $\phi$ (5 elements in $\beta$ minus 3 restrictions). There is no call to give values for $\psi$ since $\alpha$ is unrestricted.

\subsection{Scale removal}
\label{sec:vecm-scale-removal}

Consider a simpler version of the restriction discussed in the previous section, namely,
%
\begin{code}
restrict
  b[1] = 1
  b[1] + b[2] = 0
end restrict
\end{code}

This restriction comprises a substantive, testable requirement --- that $\beta_1$ and $\beta_2$ sum to zero --- and a normalization or scaling, $\beta_1 = 1$. The question arises, might it be easier and more reliable to maximize the likelihood without imposing $\beta_1 = 1$?\footnote{As a numerical matter, that is. In principle this should make no difference.} If so, we could record this normalization, remove it for the purpose of maximizing the likelihood, then reimpose it by scaling the result.

Unfortunately it is not possible to say in advance whether ``scale removal'' of this sort will give better results, for any particular estimation problem. However, this does seem to be the case more often than not. \app{Gretl} therefore performs scale removal where feasible, unless you
\begin{itemize}
\item explicitly forbid this, by giving the \option{no-scaling} option flag to the \texttt{restrict} command; or
\item provide a specific vector of initial values; or
\item select the LBFGS algorithm for maximization.
\end{itemize}

Scale removal is deemed infeasible if there are any cross-column restrictions on $\beta$, or any non-homogeneous restrictions involving more than one element of $\beta$.

In addition, experimentation has suggested to us that scale removal is inadvisable if the system is just identified with the normalization(s) included, so we do not do it in that case. By ``just identified'' we mean that the system would not be identified if any of the restrictions were removed. On that criterion the above example is not just identified, since the removal of the second restriction would not affect identification; and \app{gretl} would in fact perform scale removal in this case unless the user specified otherwise.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: