\chapter{Cointegration and Vector Error Correction Models}
\label{chap:vecm}

\section{Introduction}
\label{sec:VECM-intro}

The twin concepts of cointegration and error correction have drawn a good deal of attention in macroeconometrics over recent years. The attraction of the Vector Error Correction Model (VECM) is that it allows the researcher to embed a representation of economic equilibrium relationships within a relatively rich time-series specification. This approach overcomes the old dichotomy between (a) structural models that faithfully represented macroeconomic theory but failed to fit the data, and (b) time-series models that were accurately tailored to the data but difficult if not impossible to interpret in economic terms.

The basic idea of cointegration relates closely to the concept of unit roots (see section~\ref{sec:uroot}). Suppose we have a set of macroeconomic variables of interest, and we find we cannot reject the hypothesis that some of these variables, considered individually, are non-stationary. Specifically, suppose we judge that a subset of the variables are individually integrated of order 1, or I(1). That is, while they are non-stationary in their levels, their first differences are stationary.

Given the statistical problems associated with the analysis of non-stationary data (for example, the threat of spurious regression), the traditional approach in this case was to take first differences of all the variables before proceeding with the analysis. But this can result in the loss of important information. It may be that while the variables in question are I(1) when taken individually, there exists a linear combination of the variables that is stationary without differencing, or I(0). (There could be more than one such linear combination.) That is, while the ensemble of variables may be ``free to wander'' over time, nonetheless the variables are ``tied together'' in certain ways. And it may be possible to interpret these ties, or \emph{cointegrating vectors}, as representing equilibrium conditions.

For example, suppose we find some or all of the following variables are I(1): money stock, $M$, the price level, $P$, the nominal interest rate, $R$, and output, $Y$. According to standard theories of the demand for money, we would nonetheless expect there to be an equilibrium relationship between real balances, interest rate and output; for example
\[
m - p = \gamma_0 + \gamma_1 y + \gamma_2 r
\qquad \gamma_1 > 0, \gamma_2 < 0
\]
where lower-case variable names denote logs. In equilibrium, then,
\[
m - p - \gamma_1 y - \gamma_2 r = \gamma_0
\]
Realistically, we should not expect this condition to be satisfied each period. We need to allow for the possibility of short-run disequilibrium. But if the system moves back towards equilibrium following a disturbance, it follows that the vector $x = (m, p, y, r)'$ is bound by a cointegrating vector $\beta' = (\beta_1, \beta_2, \beta_3, \beta_4)$, such that $\beta'x$ is stationary (with a mean of $\gamma_0$). Furthermore, if equilibrium is correctly characterized by the simple model above, we have $\beta_2 = -\beta_1$, $\beta_3 < 0$ and $\beta_4 > 0$. These things are testable within the context of cointegration analysis.

There are typically three steps in this sort of analysis:
\begin{enumerate}
\item Test to determine the number of cointegrating vectors, the \emph{cointegrating rank} of the system.
\item Estimate a VECM with the appropriate rank, but subject to no further restrictions.
\item Probe the interpretation of the cointegrating vectors as equilibrium conditions by means of restrictions on the elements of these vectors.
\end{enumerate}

The following sections expand on each of these points, giving further econometric details and explaining how to implement the analysis using \app{gretl}.

\section{Vector Error Correction Models as representation of a cointegrated system}
\label{sec:VECM-rep}

Consider a VAR of order $p$ with a deterministic part given by $\mu_t$ (typically, a polynomial in time). One can write the $n$-variate process $y_t$ as
\begin{equation}
  \label{eq:VECM-VAR}
  y_t = \mu_t + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + \epsilon_t
\end{equation}
But since $y_{t-1} \equiv y_{t} - \Delta y_t$ and $y_{t-i} \equiv y_{t-1} - (\Delta y_{t-1} + \Delta y_{t-2} + \cdots + \Delta y_{t-i+1})$, we can re-write the above as
\begin{equation}
  \label{eq:VECM}
  \Delta y_t = \mu_t + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t ,
\end{equation}
where $\Pi = \sum_{i=1}^p A_i - I$ and $\Gamma_k = -\sum_{i=k+1}^p A_i$. (As a quick check, with $p = 2$ equation (\ref{eq:VECM}) reads $\Delta y_t = \mu_t + (A_1 + A_2 - I) y_{t-1} - A_2 \Delta y_{t-1} + \epsilon_t$.) This is the VECM representation of (\ref{eq:VECM-VAR}).

The interpretation of (\ref{eq:VECM}) depends crucially on $r$, the rank of the matrix $\Pi$.
\begin{itemize}
\item If $r = 0$, the processes are all I(1) and not cointegrated.
\item If $r = n$, then $\Pi$ is invertible and the processes are all I(0).
\item Cointegration occurs in between, when $0 < r < n$ and $\Pi$ can be written as $\alpha \beta'$. In this case, $y_t$ is I(1), but the combination $z_t = \beta'y_t$ is I(0). If, for example, $r=1$ and the first element of $\beta$ were $-1$, then one could write $z_t = -y_{1,t} + \beta_2 y_{2,t} + \cdots + \beta_n y_{n,t}$, which is equivalent to saying that
\[
y_{1,t} = \beta_2 y_{2,t} + \cdots + \beta_n y_{n,t} - z_t
\]
is a long-run equilibrium relationship: the deviations $z_t$ may not be 0 but they are stationary. In this case, (\ref{eq:VECM}) can be written as
\begin{equation}
  \label{eq:VECMab}
  \Delta y_t = \mu_t + \alpha \beta' y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t .
\end{equation}
If $\beta$ were known, then $z_t$ would be observable and all the remaining parameters could be estimated via OLS. In practice, the procedure estimates $\beta$ first and then the rest.
\end{itemize}

The rank of $\Pi$ is investigated by computing the eigenvalues of a closely related matrix whose rank is the same as that of $\Pi$; this matrix, however, is by construction symmetric and positive semidefinite. As a consequence, all its eigenvalues are real and non-negative, and tests on the rank of $\Pi$ can therefore be carried out by testing how many eigenvalues are 0.

If all the eigenvalues are significantly different from 0, then all the processes are stationary. If, on the contrary, there is at least one zero eigenvalue, then the $y_t$ process is integrated, although some linear combination $\beta'y_t$ might be stationary. At the other extreme, if no eigenvalues are significantly different from 0, then not only is the process $y_t$ non-stationary, but the same holds for any linear combination $\beta'y_t$; in other words, no cointegration occurs.

Estimation typically proceeds in two stages: first, a sequence of tests is run to determine $r$, the cointegration rank. Then, for a given rank, the parameters in equation (\ref{eq:VECMab}) are estimated. The two commands that \app{gretl} offers for estimating these systems are \texttt{coint2} and \texttt{vecm}, respectively.
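These concepts are most easily grasped by simulation. The following self-contained script is a minimal sketch (the sample size, seed and variable names are arbitrary choices, and the setup is a special case, with no deterministic terms, of the bivariate example developed in section~\ref{sec:coint-5cases}): it generates a pure random walk $x_t$ plus a second series $y_t$ equal to $x_t$ up to white noise, so that each series is individually I(1) while $y_t - x_t$ is I(0), then applies the two commands just mentioned. One would expect the tests to point to a cointegrating rank of 1, with $\hat{\beta}$ close to $(1, -1)'$.
%
\begin{code}
# minimal simulation sketch: two I(1) series tied by one cointegrating vector
nulldata 200
setobs 1 1 --time-series
set seed 371
series eps = normal()
series u = normal()
series x = cum(eps)   # pure random walk, hence I(1)
series y = x + u      # y - x = u is I(0): x and y cointegrate
coint2 2 y x          # step 1: the tests should point to rank r = 1
vecm 2 1 y x          # step 2: estimate the VECM with r = 1
\end{code}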
The syntax for \texttt{coint2} is
\begin{code}
coint2 p ylist [ ; xlist [ ; zlist ] ]
\end{code}
where \texttt{p} is the number of lags in (\ref{eq:VECM-VAR}); \texttt{ylist} is a list containing the $y_t$ variables; \texttt{xlist} is an optional list of exogenous variables; and \texttt{zlist} is another optional list of exogenous variables whose effects are assumed to be confined to the cointegrating relationships.

The syntax for \texttt{vecm} is
\begin{code}
vecm p r ylist [ ; xlist [ ; zlist ] ]
\end{code}
where \texttt{p} is the number of lags in (\ref{eq:VECM-VAR}); \texttt{r} is the cointegration rank; and the lists \texttt{ylist}, \texttt{xlist} and \texttt{zlist} have the same interpretation as in \texttt{coint2}.

Both commands can be given specific options to handle the treatment of the deterministic component $\mu_t$. These are discussed in the following section.

\section{Interpretation of the deterministic components}
\label{sec:coint-5cases}

Statistical inference in the context of a cointegrated system depends on the hypotheses one is willing to make on the deterministic terms, which leads to the famous ``five cases.'' In equation (\ref{eq:VECM}), the term $\mu_t$ is usually understood to take the form
\[
\mu_t = \mu_0 + \mu_1 \cdot t .
\]
In order to have the model mimic as closely as possible the features of the observed data, there is a preliminary question to settle. Do the data appear to follow a deterministic trend? If so, is it linear or quadratic? Once this is established, one should impose restrictions on $\mu_0$ and $\mu_1$ that are consistent with this judgement.

For example, suppose that the data do not exhibit a discernible trend. This means that the sample average of $\Delta y_t$ is roughly zero, so it is reasonable to assume that its expected value is zero. Write equation (\ref{eq:VECM}) as
\begin{equation}
  \label{eq:VECM-poly}
  \Gamma(L) \Delta y_t = \mu_0 + \mu_1 \cdot t + \alpha z_{t-1} + \epsilon_t ,
\end{equation}
where $z_{t} = \beta' y_{t}$ is assumed to be stationary and therefore to possess finite moments. Taking unconditional expectations, and writing $m_z$ for the (time-invariant) mean of $z_t$, we get
\[
0 = \mu_0 + \mu_1 \cdot t + \alpha m_z .
\]
Since the left-hand side does not depend on $t$, the restriction $\mu_1 = 0$ is a safe bet. As for $\mu_0$, there are just two ways to make the above expression true: either $\mu_0 = 0$ with $m_z = 0$, or $\mu_0$ equals $-\alpha m_z$. The latter possibility is less restrictive in that the vector $\mu_0$ may be non-zero, but is constrained to be a linear combination of the columns of $\alpha$. In that case, $\mu_0$ can be written as $\alpha \cdot c$, and one may write (\ref{eq:VECM-poly}) as
\[
\Gamma(L) \Delta y_t = \alpha \left[ \beta' \quad c \right]
\left[ \begin{array}{c} y_{t-1} \\ 1 \end{array} \right] + \epsilon_t .
\]
The long-run relationship therefore contains an intercept. This type of restriction is usually written
\[
\alpha'_{\perp} \mu_0 = 0 ,
\]
where the columns of $\alpha_{\perp}$ span the left null space of the matrix $\alpha$.

An intuitive understanding of the issue can be gained by means of a simple example. Consider a series $x_t$ which behaves as follows
%
\[
x_t = m + x_{t-1} + \varepsilon_t
\]
%
where $m$ is a real number and $\varepsilon_t$ is a white noise process: $x_t$ is then a random walk with drift $m$. In the special case $m$ = 0, the drift disappears and $x_t$ is a pure random walk.

Consider now another process $y_t$, defined by
%
\[
y_t = k + x_t + u_t
\]
%
where, again, $k$ is a real number and $u_t$ is a white noise process.
Since $u_t$ is stationary by definition, $x_t$ and $y_t$ cointegrate: that is, their difference
%
\[
z_t = y_t - x_t = k + u_t
\]
%
is a stationary process. For $k$ = 0, $z_t$ is simply zero-mean white noise, whereas for $k$ $\ne$ 0 the process $z_t$ is white noise with a non-zero mean.

After some simple substitutions, the two equations above can be represented jointly as a VAR(1) system
%
\[
\left[ \begin{array}{c} y_t \\ x_t \end{array} \right] =
\left[ \begin{array}{c} k + m \\ m \end{array} \right] +
\left[ \begin{array}{rr} 0 & 1 \\ 0 & 1 \end{array} \right]
\left[ \begin{array}{c} y_{t-1} \\ x_{t-1} \end{array} \right] +
\left[ \begin{array}{c} u_t + \varepsilon_t \\ \varepsilon_t \end{array} \right]
\]
%
or in VECM form
%
\begin{eqnarray*}
  \left[ \begin{array}{c} \Delta y_t \\ \Delta x_t \end{array} \right] & = &
  \left[ \begin{array}{c} k + m \\ m \end{array} \right] +
  \left[ \begin{array}{rr} -1 & 1 \\ 0 & 0 \end{array} \right]
  \left[ \begin{array}{c} y_{t-1} \\ x_{t-1} \end{array} \right] +
  \left[ \begin{array}{c} u_t + \varepsilon_t \\ \varepsilon_t \end{array} \right] \\
  & = & \left[ \begin{array}{c} k + m \\ m \end{array} \right] +
  \left[ \begin{array}{r} -1 \\ 0 \end{array} \right]
  \left[ \begin{array}{rr} 1 & -1 \end{array} \right]
  \left[ \begin{array}{c} y_{t-1} \\ x_{t-1} \end{array} \right] +
  \left[ \begin{array}{c} u_t + \varepsilon_t \\ \varepsilon_t \end{array} \right] \\
  & = & \mu_0 + \alpha \beta^{\prime}
  \left[ \begin{array}{c} y_{t-1} \\ x_{t-1} \end{array} \right] + \eta_t =
  \mu_0 + \alpha z_{t-1} + \eta_t ,
\end{eqnarray*}
%
where $\beta$ is the cointegration vector and $\alpha$ is the ``loadings'' or ``adjustments'' vector.

We are now ready to consider three possible cases:
\begin{enumerate}
\item $m$ $\ne$ 0: In this case $x_t$ is trended, as we just saw; it follows that $y_t$ also follows a linear trend because on average it keeps at a fixed distance $k$ from $x_t$. The vector $\mu_0$ is unrestricted.
\item $m$ = 0 and $k$ $\ne$ 0: In this case, $x_t$ is not trended and as a consequence neither is $y_t$. However, the mean distance between $y_t$ and $x_t$ is non-zero. The vector $\mu_0$ is given by
%
\[
\mu_0 = \left[ \begin{array}{c} k \\ 0 \end{array} \right]
\]
%
which is not null and therefore the VECM shown above does have a constant term. The constant, however, is subject to the restriction that its second element must be 0. More generally, $\mu_0$ is a multiple of the vector $\alpha$. Note that the VECM could also be written as
%
\[
\left[ \begin{array}{c} \Delta y_t \\ \Delta x_t \end{array} \right] =
\left[ \begin{array}{r} -1 \\ 0 \end{array} \right]
\left[ \begin{array}{rrr} 1 & -1 & -k \end{array} \right]
\left[ \begin{array}{c} y_{t-1} \\ x_{t-1} \\ 1 \end{array} \right] +
\left[ \begin{array}{c} u_t + \varepsilon_t \\ \varepsilon_t \end{array} \right]
\]
%
which incorporates the intercept into the cointegration vector. This is known as the ``restricted constant'' case.
\item $m$ = 0 and $k$ = 0: This case is the most restrictive: clearly, neither $x_t$ nor $y_t$ are trended, and the mean distance between them is zero. The vector $\mu_0$ is also 0, which explains why this case is referred to as ``no constant.''
\end{enumerate}

In most cases, the choice between these three possibilities is based on a mix of empirical observation and economic reasoning. If the variables under consideration seem to follow a linear trend then we should not place any restriction on the intercept.
Otherwise, the question arises of whether it makes sense to specify a cointegration relationship which includes a non-zero intercept. One example where this is appropriate is the relationship between two interest rates: generally these are not trended, but the VAR might still have an intercept because the difference between the two (the ``interest rate spread'') might be stationary around a non-zero mean (for example, because of a risk or liquidity premium).

The previous example can be generalized in three directions:
\begin{enumerate}
\item If a VAR of order greater than 1 is considered, the algebra gets more convoluted but the conclusions are identical.
\item If the VAR includes more than two endogenous variables the cointegration rank $r$ can be greater than 1. In this case, $\alpha$ is a matrix with $r$ columns, and the case with restricted constant entails the restriction that $\mu_0$ should be some linear combination of the columns of $\alpha$.
\item If a linear trend is included in the model, the deterministic part of the VAR becomes $\mu_0 + \mu_1 t$. The reasoning is practically the same as above except that the focus now centers on $\mu_1$ rather than $\mu_0$. The counterpart to the ``restricted constant'' case discussed above is a ``restricted trend'' case, such that the cointegration relationships include a trend but the first differences of the variables in question do not. In the case of an unrestricted trend, the trend appears in both the cointegration relationships and the first differences, which corresponds to the presence of a quadratic trend in the variables themselves (in levels).
\end{enumerate}

In order to accommodate the five cases, \app{gretl} provides the following options to the \texttt{coint2} and \texttt{vecm} commands:
\begin{center}
\begin{tabular}{ccl}
$\mu_t$ & \textit{option flag} & \textit{description} \\ [4pt]
0 & \option{nc} & no constant \\
$\mu_0, \alpha_{\perp}'\mu_0 = 0 $ & \option{rc} & restricted constant \\
$\mu_0$ & default & unrestricted constant \\
$\mu_0 + \mu_1 t , \alpha_{\perp}'\mu_1 = 0$ & \option{crt} & constant + restricted trend \\
$\mu_0 + \mu_1 t$ & \option{ct} & constant + unrestricted trend
\end{tabular}
\end{center}
Note that for these commands the above options are mutually exclusive. In addition, you have the option of using the \option{seasonals} flag, for augmenting $\mu_t$ with centered seasonal dummies. In each case, p-values are computed via the approximations by Doornik (1998).

\section{The Johansen cointegration tests}
\label{sec:johansen-test}

The two Johansen tests for cointegration are used to establish the rank of $\beta$; in other words, how many cointegration vectors the system has. These are the ``$\lambda$-max'' test, for hypotheses on individual eigenvalues, and the ``trace'' test, for joint hypotheses. Suppose that the eigenvalues $\lambda_i$ are sorted from largest to smallest. The null hypothesis for the ``$\lambda$-max'' test on the $i$-th eigenvalue is that $\lambda_i = 0$. The corresponding trace test, instead, considers the hypothesis $\lambda_j = 0$ for all $j \ge i$.

The \app{gretl} command \cmd{coint2} performs these two tests. The corresponding menu entry in the GUI is ``Model, Time Series, Cointegration Test, Johansen''.

As in the ADF test, the asymptotic distribution of the tests varies with the deterministic component $\mu_t$ one includes in the VAR (see section \ref{sec:coint-5cases} above).
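For reference, the two test statistics take the standard likelihood-ratio forms, recalled here merely for convenience. Writing $T$ for the sample size, the $\lambda$-max statistic for the hypothesis $\lambda_i = 0$ is
\[
-T \ln \left( 1 - \hat{\lambda}_i \right) ,
\]
while the corresponding trace statistic is
\[
-T \sum_{j \ge i} \ln \left( 1 - \hat{\lambda}_j \right) .
\]
These expressions can be checked against the output shown below: for instance, $-53 \ln(1 - 0.43317) \approx 30.09$, which matches the first ``Lmax'' entry.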
The following code uses the \cmd{denmark} data file, supplied with \app{gretl}, to replicate Johansen's example found in his 1995 book.
%
\begin{code}
open denmark
coint2 2 LRM LRY IBO IDE --rc --seasonals
\end{code}
%
In this case, the vector $y_t$ in equation (\ref{eq:VECM}) comprises the four variables \cmd{LRM}, \cmd{LRY}, \cmd{IBO}, \cmd{IDE}. The number of lags equals $p$ in (\ref{eq:VECM-VAR}) (that is, the number of lags of the model written in VAR form). Part of the output is reported below:

\begin{center}
\begin{code}
Johansen test:
Number of equations = 4
Lag order = 2
Estimation period: 1974:3 - 1987:3 (T = 53)
Case 2: Restricted constant

Rank Eigenvalue Trace test  p-value  Lmax test  p-value
   0    0.43317     49.144 [0.1284]     30.087 [0.0286]
   1    0.17758     19.057 [0.7833]     10.362 [0.8017]
   2    0.11279     8.6950 [0.7645]     6.3427 [0.7483]
   3   0.043411     2.3522 [0.7088]     2.3522 [0.7076]
\end{code}
\end{center}

Both the trace and $\lambda$-max tests accept the null hypothesis that the smallest eigenvalue is 0 (see the last row of the table), so we may conclude that the series are in fact non-stationary. However, some linear combination may be I(0), since the $\lambda$-max test rejects the hypothesis that the rank of $\Pi$ is 0 (though the trace test gives less clear-cut evidence for this, with a p-value of $0.1284$).

\section{Identification of the cointegration vectors}
\label{sec:johansen-ident}

The core problem in the estimation of equation (\ref{eq:VECM}) is to find an estimate of $\Pi$ that has by construction rank $r$, so it can be written as $\Pi = \alpha \beta'$, where $\beta$ is the matrix containing the cointegration vectors and $\alpha$ contains the ``adjustment'' or ``loading'' coefficients whereby the endogenous variables respond to deviation from equilibrium in the previous period.

Without further specification, the problem has multiple solutions (in fact, infinitely many). The parameters $\alpha$ and $\beta$ are under-identified: if all columns of $\beta$ are cointegration vectors, then any arbitrary linear combination of those columns is a cointegration vector too. To put it differently, if $\Pi = \alpha_0 \beta_0'$ for specific matrices $\alpha_0$ and $\beta_0$, then $\Pi$ also equals $(\alpha_0 Q)(Q^{-1} \beta_0')$ for any conformable non-singular matrix $Q$. In order to find a unique solution, it is therefore necessary to impose some restrictions on $\alpha$ and/or $\beta$. It can be shown that the minimum number of restrictions that is necessary to guarantee identification is $r^2$. Normalizing one coefficient per column to 1 (or $-1$, according to taste) is a trivial first step, which also helps in that the remaining coefficients can be interpreted as the parameters in the equilibrium relations, but this only suffices when $r=1$.

The method that \app{gretl} uses by default is known as the ``Phillips normalization'', or ``triangular representation''.\footnote{For comparison with other studies, you may wish to normalize $\beta$ differently. Using the \texttt{set} command you can do \verb|set vecm_norm diag| to select a normalization that simply scales the columns of the original $\beta$ such that $\beta_{ij} = 1$ for $i=j$ and $i \leq r$, as used in the empirical section of Boswijk and Doornik (2004). Another alternative is \verb+set vecm_norm first+, which scales $\beta$ such that the elements on the first row equal 1. To suppress normalization altogether, use \verb+set vecm_norm none+.
(To return to the default: \texttt{set vecm\_norm phillips}.)} The starting point is writing $\beta$ in partitioned form as in
\[
\beta = \left[ \begin{array}{c} \beta_1 \\ \beta_2 \end{array} \right] ,
\]
where $\beta_1$ is an $r \times r$ matrix and $\beta_2$ is $(n-r) \times r$. Assuming that $\beta_1$ has full rank, $\beta$ can be post-multiplied by $\beta_1^{-1}$, giving
\[
\hat{\beta} = \left[ \begin{array}{c} I \\ \beta_2 \beta_1^{-1} \end{array} \right] =
\left[ \begin{array}{c} I \\ -B \end{array} \right] .
\]
The coefficients that \app{gretl} produces are $\hat{\beta}$, with $B$ known as the matrix of unrestricted coefficients. In terms of the underlying equilibrium relationship, the Phillips normalization expresses the system of $r$ equilibrium relations as
\begin{eqnarray*}
  y_{1,t} & = & b_{1,r+1} y_{r+1,t} + \ldots + b_{1,n} y_{n,t} \\
  y_{2,t} & = & b_{2,r+1} y_{r+1,t} + \ldots + b_{2,n} y_{n,t} \\
  & \vdots & \\
  y_{r,t} & = & b_{r,r+1} y_{r+1,t} + \ldots + b_{r,n} y_{n,t}
\end{eqnarray*}
where the first $r$ variables are expressed as functions of the remaining $n-r$.

Although the triangular representation ensures that the statistical problem of estimating $\beta$ is solved, the resulting equilibrium relationships may be difficult to interpret. In this case, the user may want to achieve identification by specifying manually the system of $r^2$ constraints that \app{gretl} will use to produce an estimate of $\beta$.

As an example, consider the money demand system presented in section 9.6 of Verbeek (2004). The variables used are \texttt{m} (the log of real money stock M1), \texttt{infl} (inflation), \texttt{cpr} (the commercial paper rate), \texttt{y} (log of real GDP) and \texttt{tbr} (the Treasury bill rate).\footnote{This data set is available in the \texttt{verbeek} data package; see \url{http://gretl.sourceforge.net/gretl_data.html}.}

Estimation of $\beta$ can be performed via the commands
\begin{code}
open money.gdt
smpl 1954:1 1994:4
vecm 6 2 m infl cpr y tbr --rc
\end{code}
and the relevant portion of the output reads
\begin{code}
Maximum likelihood estimates, observations 1954:1-1994:4 (T = 164)
Cointegration rank = 2
Case 2: Restricted constant

beta (cointegrating vectors, standard errors in parentheses)

m      1.0000      0.0000
     (0.0000)    (0.0000)
infl   0.0000      1.0000
     (0.0000)    (0.0000)
cpr    0.56108    -24.367
     (0.10638)   (4.2113)
y     -0.40446   -0.91166
     (0.10277)   (4.0683)
tbr   -0.54293     24.786
     (0.10962)   (4.3394)
const -3.7483      16.751
     (0.78082)   (30.909)
\end{code}

Interpretation of the coefficients of the cointegration matrix $\beta$ would be easier if a meaning could be attached to each of its columns. This is possible by hypothesizing the existence of two long-run relationships: a money demand equation
\[
\mbox{\tt m} = c_1 + \beta_1 \mbox{\tt infl} + \beta_2 \mbox{\tt y} + \beta_3 \mbox{\tt tbr}
\]
and a risk premium equation
\[
\mbox{\tt cpr} = c_2 + \beta_4 \mbox{\tt infl} + \beta_5 \mbox{\tt y} + \beta_6 \mbox{\tt tbr}
\]
which imply that the cointegration matrix can be normalized as
\[
\beta = \left[ \begin{array}{rr}
 -1      & 0 \\
 \beta_1 & \beta_4 \\
 0       & -1 \\
 \beta_2 & \beta_5 \\
 \beta_3 & \beta_6 \\
 c_1     & c_2
\end{array} \right]
\]
This renormalization can be accomplished by means of the \texttt{restrict} command, to be given after the \texttt{vecm} command or, in the graphical interface, by selecting the ``Test, Linear Restrictions'' menu entry.
The syntax for entering the restrictions should be fairly obvious:\footnote{Note that in this context we are bending the usual matrix indexation convention, using the leading index to refer to the \textit{column} of $\beta$ (the particular cointegrating vector). This is standard practice in the literature, and defensible insofar as it is the columns of $\beta$ (the cointegrating relations or equilibrium errors) that are of primary interest.}
\begin{code}
restrict
  b[1,1] = -1
  b[1,3] = 0
  b[2,1] = 0
  b[2,3] = -1
end restrict
\end{code}
which produces
\begin{code}
Cointegrating vectors (standard errors in parentheses)

m     -1.0000       0.0000
     (0.0000)      (0.0000)
infl  -0.023026     0.041039
     (0.0054666)   (0.027790)
cpr    0.0000      -1.0000
     (0.0000)      (0.0000)
y      0.42545     -0.037414
     (0.033718)    (0.17140)
tbr   -0.027790     1.0172
     (0.0045445)   (0.023102)
const  3.3625       0.68744
     (0.25318)     (1.2870)
\end{code}

\section{Over-identifying restrictions}
\label{sec:johansen-overid}

One purpose of imposing restrictions on a VECM system is simply to achieve identification. If these restrictions are simply normalizations, they are not testable and should have no effect on the maximized likelihood. In addition, however, one may wish to formulate constraints on $\beta$ and/or $\alpha$ that derive from the economic theory underlying the equilibrium relationships; substantive restrictions of this sort are then testable via a likelihood-ratio statistic.

\app{Gretl} is capable of testing general linear restrictions of the form
\begin{equation}
  \label{eq:Rb}
  R_b \vec{\beta} = q
\end{equation}
and/or
\begin{equation}
  \label{eq:Ra}
  R_a \vec{\alpha} = 0
\end{equation}
%
Note that the $\beta$ restriction may be non-homogeneous ($q \neq 0$) but the $\alpha$ restriction must be homogeneous. Nonlinear restrictions are not supported, and neither are restrictions that cross between $\beta$ and $\alpha$. In the case where $r > 1$ such restrictions may be in common across all the columns of $\beta$ (or $\alpha$) or may be specific to certain columns of these matrices. This is the case discussed in Boswijk (1995) and Boswijk and Doornik (2004, section 4.4).

The restrictions (\ref{eq:Rb}) and (\ref{eq:Ra}) may be written in explicit form as
\begin{equation}
  \label{eq:vecbeta}
  \vec{\beta} = H\phi + h_0
\end{equation}
and
\begin{equation}
  \label{eq:vecalpha}
  \vec{\alpha'} = G\psi
\end{equation}
respectively, where $\phi$ and $\psi$ are the free parameter vectors associated with $\beta$ and $\alpha$ respectively. We may refer to the free parameters collectively as $\theta$ (the column vector formed by concatenating $\phi$ and $\psi$). \app{Gretl} uses this representation internally when testing the restrictions.

If the list of restrictions that is passed to the \texttt{restrict} command contains more constraints than necessary to achieve identification, then an LR test is performed; moreover, the \texttt{restrict} command can be given the \option{full} switch, in which case full estimates for the restricted system are printed (including the $\Gamma_i$ terms), and the system thus restricted becomes the ``current model'' for the purposes of further tests. Thus you are able to carry out cumulative tests, as in Chapter 7 of Johansen (1995).

\subsection{Syntax}
\label{sec:vecm-restr-syntax}

The full syntax for specifying the restriction is an extension of the one exemplified in the previous section.
Inside a \texttt{restrict}\ldots\texttt{end restrict} block, valid statements are of the form
\begin{center}
\texttt{\emph{parameter linear combination}} = \emph{\texttt{scalar}}
\end{center}
where a parameter linear combination involves a weighted sum of individual elements of $\beta$ or $\alpha$ (but not both in the same combination); the scalar on the right-hand side must be 0 for combinations involving $\alpha$, but can be any real number for combinations involving $\beta$. Below, we give a few examples of valid restrictions:
\begin{code}
b[1,1] = 1.618
b[1,4] + 2*b[2,5] = 0
a[1,3] = 0
a[1,1] - a[1,2] = 0
\end{code}

A special syntax is reserved for the case when a certain constraint should be applied to all columns of $\beta$: in this case, one index is given for each \texttt{b} term, and the square brackets are dropped. Hence, the following syntax
\begin{code}
restrict
  b1 + b2 = 0
end restrict
\end{code}
corresponds to
\[
\beta = \left[ \begin{array}{rr}
\beta_{11} & \beta_{21} \\
-\beta_{11} & -\beta_{21} \\
\beta_{13} & \beta_{23} \\
\beta_{14} & \beta_{24}
\end{array} \right]
\]

The same convention is used for $\alpha$: when only one index is given for each \texttt{a} term, the restriction is presumed to apply to all $r$ columns of $\alpha$, or in other words the given variables are weakly exogenous. For instance, the formulation
%
\begin{code}
restrict
  a3 = 0
  a4 = 0
end restrict
\end{code}
%
specifies that variables 3 and 4 do not respond to the deviation from equilibrium in the previous period.

Finally, a short-cut is available for setting up complex restrictions (but currently only in relation to $\beta$): you can specify $R_b$ and $q$, as in $R_b \vec{\beta} = q$, by giving the names of previously defined matrices. For example,
%
\begin{code}
matrix I4 = I(4)
matrix vR = I4**(I4~zeros(4,1))
matrix vq = mshape(I4,16,1)
restrict
  R = vR
  q = vq
end restrict
\end{code}
%
This manually imposes the Phillips normalization on the $\beta$ estimates for a system of five variables with cointegrating rank 4.

\subsection{An example}
\label{sec:vecm-overid-ex}

Brand and Cassola (2004) propose a money demand system for the Euro area, in which they postulate three long-run equilibrium relationships:
%
\begin{center}
\begin{tabular}{ll}
money demand & $m = \beta_l l + \beta_y y$ \\
Fisher equation & $\pi = \phi l$ \\
Expectation theory of & $l = s$ \\ [-4pt]
interest rates
\end{tabular}
\end{center}
%
where $m$ is real money demand, $l$ and $s$ are long- and short-term interest rates, $y$ is output and $\pi$ is inflation.\footnote{A traditional formulation of the Fisher equation would reverse the roles of the variables in the second equation, but this detail is immaterial in the present context; moreover, the expectation theory of interest rates implies that the third equilibrium relationship should include a constant for the liquidity premium. However, since in this example the system is estimated with the constant term unrestricted, the liquidity premium gets merged in the system intercept and disappears from $z_t$.} (The names for these variables in the \app{gretl} data file are \verb|m_p|, \texttt{rl}, \texttt{rs}, \texttt{y} and \texttt{infl}, respectively.)

The cointegration rank assumed by the authors is 3 and there are 5 variables, giving 15 elements in the $\beta$ matrix. $3 \times 3 = 9$ restrictions are required for identification, and a just-identified system would have $15 - 9 = 6$ free parameters.
However, the postulated long-run relationships feature only three free parameters, so the over-identification rank is 3.

\begin{script}[htbp]
  \caption{Estimation of a money demand system with constraints on $\beta$}
  \label{brand-cassola-script}
Input:
\begin{scodebit}
open brand_cassola.gdt

# perform a few transformations
m_p = m_p*100
y = y*100
infl = infl/4
rs = rs/4
rl = rl/4

# replicate table 4, page 824
vecm 2 3 m_p infl rl rs y -q
genr ll0 = $lnl

restrict --full
  b[1,1] = 1
  b[1,2] = 0
  b[1,4] = 0
  b[2,1] = 0
  b[2,2] = 1
  b[2,4] = 0
  b[2,5] = 0
  b[3,1] = 0
  b[3,2] = 0
  b[3,3] = 1
  b[3,4] = -1
  b[3,5] = 0
end restrict
genr ll1 = $rlnl
\end{scodebit}

Partial output:
\begin{scodebit}
Unrestricted loglikelihood (lu) = 116.60268
Restricted loglikelihood (lr) = 115.86451
2 * (lu - lr) = 1.47635
P(Chi-Square(3) > 1.47635) = 0.68774

beta (cointegrating vectors, standard errors in parentheses)

m_p    1.0000      0.0000      0.0000
     (0.0000)    (0.0000)    (0.0000)
infl   0.0000      1.0000      0.0000
     (0.0000)    (0.0000)    (0.0000)
rl     1.6108    -0.67100      1.0000
    (0.62752)  (0.049482)    (0.0000)
rs     0.0000      0.0000     -1.0000
     (0.0000)    (0.0000)    (0.0000)
y     -1.3304      0.0000      0.0000
   (0.030533)    (0.0000)    (0.0000)
\end{scodebit}
%$
\end{script}

Example \ref{brand-cassola-script} replicates Table 4 on page 824 of the Brand and Cassola article.\footnote{Modulo what appear to be a few typos in the article.} Note that we use the \verb|$lnl| accessor after the \texttt{vecm} command to store the unrestricted log-likelihood and the \verb|$rlnl| accessor after \texttt{restrict} for its restricted counterpart.

The example continues in script~\ref{brand-cassola-tab5}, where we perform further testing to check whether (a) the income elasticity in the money demand equation is 1 ($\beta_y = 1$) and (b) the Fisher relation is homogeneous ($\phi = 1$). Since the \option{full} switch was given to the initial \texttt{restrict} command, additional restrictions can be applied without having to repeat the previous ones. (The second script contains a few \texttt{printf} commands, which are not strictly necessary, to format the output nicely.) It turns out that both of the additional hypotheses are rejected by the data, with p-values of $0.002$ and $0.004$.

\begin{script}[htbp]
  \caption{Further testing of money demand system}
  \label{brand-cassola-tab5}
Input:
\begin{scodebit}
restrict
  b[1,5] = -1
end restrict
genr ll_uie = $rlnl

restrict
  b[2,3] = -1
end restrict
genr ll_hfh = $rlnl

# replicate table 5, page 824
printf "Testing zero restrictions in cointegration space:\n"
printf " LR-test, rank = 3: chi^2(3) = %6.4f [%6.4f]\n", 2*(ll0-ll1), \
  pvalue(X, 3, 2*(ll0-ll1))
printf "Unit income elasticity: LR-test, rank = 3:\n"
printf " chi^2(4) = %g [%6.4f]\n", 2*(ll0-ll_uie), \
  pvalue(X, 4, 2*(ll0-ll_uie))
printf "Homogeneity in the Fisher hypothesis:\n"
printf " LR-test, rank = 3: chi^2(4) = %6.3f [%6.4f]\n", 2*(ll0-ll_hfh), \
  pvalue(X, 4, 2*(ll0-ll_hfh))
\end{scodebit}

Output:
\begin{scodebit}
Testing zero restrictions in cointegration space:
 LR-test, rank = 3: chi^2(3) = 1.4763 [0.6877]
Unit income elasticity: LR-test, rank = 3:
 chi^2(4) = 17.2071 [0.0018]
Homogeneity in the Fisher hypothesis:
 LR-test, rank = 3: chi^2(4) = 15.547 [0.0037]
\end{scodebit}
\end{script}

Another type of test that is commonly performed is the ``weak exogeneity'' test. In this context, a variable is said to be weakly exogenous if all coefficients on the corresponding row in the $\alpha$ matrix are zero.
If this is the case, that variable does not adjust to deviations from any of the long-run equilibria and can be considered an autonomous driving force of the whole system. The code in Example~\ref{brand-cassola-exog} performs this test for each variable in turn, thus replicating the first column of Table 6 on page 825 of Brand and Cassola (2004). The results show that weak exogeneity might perhaps be accepted for the long-term interest rate and real GDP (p-values $0.07$ and $0.08$ respectively).

\begin{script}[htbp]
  \caption{Testing for weak exogeneity}
  \label{brand-cassola-exog}
Input:
\begin{scodebit}
restrict
  a1 = 0
end restrict
ts_m = 2*(ll0 - $rlnl)

restrict
  a2 = 0
end restrict
ts_p = 2*(ll0 - $rlnl)

restrict
  a3 = 0
end restrict
ts_l = 2*(ll0 - $rlnl)

restrict
  a4 = 0
end restrict
ts_s = 2*(ll0 - $rlnl)

restrict
  a5 = 0
end restrict
ts_y = 2*(ll0 - $rlnl)

loop foreach i m p l s y --quiet
  printf "\Delta $i\t%6.3f [%6.4f]\n", ts_$i, pvalue(X, 6, ts_$i)
end loop
\end{scodebit}

Output (variable, LR test, p-value):
\begin{scodebit}
\Delta m     18.111 [0.0060]
\Delta p     21.067 [0.0018]
\Delta l     11.819 [0.0661]
\Delta s     16.000 [0.0138]
\Delta y     11.335 [0.0786]
\end{scodebit}
%$
\end{script}

\subsection{Identification and testability}
\label{sec:ident-test}

One point regarding VECM restrictions that can be confusing at first is that identification (does the restriction identify the system?) and testability (is the restriction testable?) are quite separate matters. Restrictions can be identifying but not testable; less obviously, they can be testable but not identifying.

This can be seen quite easily in relation to a rank-1 system. The restriction $\beta_1 = 1$ is identifying (it pins down the scale of $\beta$) but, being a pure scaling, it is not testable. On the other hand, the restriction $\beta_1 + \beta_2 = 0$ is testable --- the system with this requirement imposed will almost certainly have a lower maximized likelihood --- but it is not identifying; it still leaves open the scale of $\beta$.

We said above that the number of restrictions must be at least $r^2$, where $r$ is the cointegrating rank, for identification. This is a necessary and not a sufficient condition. In fact, when $r>1$ it can be quite tricky to assess whether a given set of restrictions is identifying. \app{Gretl} uses the method suggested by Doornik (1995), where identification is assessed via the rank of the information matrix. It can be shown that for restrictions of the sort (\ref{eq:vecbeta}) and (\ref{eq:vecalpha}) the information matrix has the same rank as the Jacobian matrix
%
\[
{\cal J}(\theta) = \left[ (I_p \otimes \beta) G : (\alpha \otimes I_{p_1}) H \right]
\]
%
A sufficient condition for identification is that the rank of ${\cal J}(\theta)$ equals the number of free parameters. The rank of this matrix is evaluated by examination of its singular values at a randomly selected point in the parameter space.
For practical purposes we treat this condition as if it were both necessary and sufficient; that is, we disregard the special cases where identification could be achieved without this condition being met.\footnote{See Boswijk and Doornik (2004, pp.\ 447--8) for discussion of this point.}

\section{Numerical solution methods}
\label{sec:vecm-opt}

In general, the ML estimator for the restricted VECM problem has no closed form solution, hence the maximum must be found via numerical methods.\footnote{The exception is restrictions that are homogeneous, common to all $\beta$ or all $\alpha$ (in case $r>1$), and involve either $\beta$ only or $\alpha$ only. Such restrictions are handled via the modified eigenvalues method set out by Johansen (1995). We solve directly for the ML estimator, without any need for iterative methods.} In some cases convergence may be difficult, and \app{gretl} provides several choices to solve the problem.

\subsection{Switching and LBFGS}
\label{sec:vecm-algorithms}

Two maximization methods are available in \app{gretl}. The default is the switching algorithm set out in Boswijk and Doornik (2004). The alternative is a limited-memory variant of the BFGS algorithm (LBFGS), using analytical derivatives. This is invoked using the \option{lbfgs} flag with the \texttt{restrict} command.

The switching algorithm works by explicitly maximizing the likelihood at each iteration, with respect to $\hat{\phi}$, $\hat{\psi}$ and $\hat{\Omega}$ (the covariance matrix of the residuals) in turn. This method shares a feature with the basic Johansen eigenvalues procedure, namely, it can handle a set of restrictions that does not fully identify the parameters.

LBFGS, on the other hand, requires that the model be fully identified. When using LBFGS, therefore, you may have to supplement the restrictions of interest with normalizations that serve to identify the parameters. For example, one might use all or part of the Phillips normalization (see section \ref{sec:johansen-ident}).

Neither the switching algorithm nor LBFGS is guaranteed to find the global ML solution.\footnote{In developing \app{gretl}'s VECM-testing facilities we have considered a fair number of ``tricky cases'' from various sources. We'd like to thank Luca Fanelli of the University of Bologna and Sven Schreiber of Goethe University Frankfurt for their help in devising torture-tests for \app{gretl}'s VECM code.} The optimizer may end up at a local maximum (or, in the case of the switching algorithm, at a saddle point).

The solution (or lack thereof) may be sensitive to the initial value selected for $\theta$. By default, \app{gretl} selects a starting point using a deterministic method based on Boswijk (1995), but two further options are available: the initialization may be adjusted using simulated annealing, or the user may supply an explicit initial value for $\theta$.

The default initialization method is:
%
\begin{enumerate}
\item Calculate the unrestricted ML $\hat{\beta}$ using the Johansen procedure.
\item If the restriction on $\beta$ is non-homogeneous, use the method proposed by Boswijk (1995):
\begin{equation}
  \phi_0 = -[(I_r \otimes \hat{\beta}_{\perp})'H]^+ (I_r \otimes \hat{\beta}_{\perp})' h_0
\end{equation}
where $\hat{\beta}'_{\perp} \hat{\beta} = 0$ and $A^+$ denotes the Moore--Penrose inverse of $A$. Otherwise
\begin{equation}
  \phi_0 = (H'H)^{-1} H' \vec{\hat{\beta}}
\end{equation}
\item $\vec{\beta_0} = H\phi_0 + h_0$.
\item Calculate the unrestricted ML $\hat{\alpha}$ conditional on $\beta_0$, as per Johansen:
\begin{equation}
  \label{eq:Jalpha}
  \hat{\alpha} = S_{01} \beta_0 (\beta'_0S_{11}\beta_0)^{-1}
\end{equation}
\item If $\alpha$ is restricted by $\vec{\alpha'} = G\psi$, then $\psi_0 = (G'G)^{-1}G'\,{\rm vec}(\hat{\alpha}')$ and $\vec{\alpha'_0} = G\psi_0$.
\end{enumerate}

\subsection{Alternative initialization methods}
\label{sec:vecm-alt-init}

As mentioned above, \app{gretl} offers the option of adjusting the initialization using \textbf{simulated annealing}. This is invoked by adding the \option{jitter} option to the \texttt{restrict} command.

The basic idea is this: we start at a certain point in the parameter space, and for each of $n$ iterations (currently $n=4096$) we randomly select a new point within a certain radius of the previous one, and determine the likelihood at the new point. If the likelihood is higher, we jump to the new point; otherwise, we jump with probability $P$ (and remain at the previous point with probability $1-P$). As the iterations proceed, the system gradually ``cools'' --- that is, the radius of the random perturbation is reduced, as is the probability of making a jump when the likelihood fails to increase.

In the course of this procedure many points in the parameter space are evaluated, starting with the point arrived at by the deterministic method, which we'll call $\theta_0$. One of these points will be ``best'' in the sense of yielding the highest likelihood: call it $\theta^*$. This point may or may not have a greater likelihood than $\theta_0$. And the procedure has an end point, $\theta_n$, which may or may not be ``best''.

The rule followed by \app{gretl} in selecting an initial value for $\theta$ based on simulated annealing is this: use $\theta^*$ if it yields a higher likelihood than $\theta_0$, otherwise use $\theta_n$. That is, if we get an improvement in the likelihood via annealing, we make full use of this; on the other hand, if we fail to get an improvement we nonetheless allow the annealing to randomize the starting point. Experiments indicated that the latter effect can be helpful.

Besides annealing, a further alternative is \textbf{manual initialization}. This is done by passing a predefined vector to the \texttt{set} command with parameter \texttt{initvals}, as in
%
\begin{verbatim}
set initvals myvec
\end{verbatim}

The details depend on whether the switching algorithm or LBFGS is used. For the switching algorithm, there are two options for specifying the initial values. The more user-friendly one (for most people, we suppose) is to specify a matrix that contains $\vec{\beta}$ followed by $\vec{\alpha}$. For example:
\begin{code}
open denmark.gdt
vecm 2 1 LRM LRY IBO IDE --rc --seasonals

matrix BA = {1, -1, 6, -6, -6, -0.2, 0.1, 0.02, 0.03}
set initvals BA

restrict
  b[1] = 1
  b[1] + b[2] = 0
  b[3] + b[4] = 0
end restrict
\end{code}

In this example --- from Johansen (1995) --- the cointegration rank is 1 and there are 4 variables. However, the model includes a restricted constant (the \option{rc} flag) so that $\beta$ has 5 elements. The $\alpha$ matrix has 4 elements, one per equation. So the matrix \texttt{BA} may be read as
\[
\left(\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \alpha_1, \alpha_2, \alpha_3, \alpha_4 \right)
\]

The other option, which is compulsory when using LBFGS, is to specify the initial values in terms of the free parameters, $\phi$ and $\psi$. Getting this right is somewhat less obvious.
As mentioned above, the implicit-form restriction $R\vec{\beta} = q$ has explicit form $\vec{\beta} = H\phi + h_0$, where $H = R_{\perp}$, the right nullspace of $R$. The vector $\phi$ is shorter, by the number of restrictions, than $\vec{\beta}$. The savvy user will then see what needs to be done. The other point to take into account is that if $\alpha$ is unrestricted, the \textit{effective} length of $\psi$ is 0, since it is then optimal to compute $\alpha$ using Johansen's formula, conditional on $\beta$ (equation \ref{eq:Jalpha} above). The example above could be rewritten as:
\begin{code}
open denmark.gdt
vecm 2 1 LRM LRY IBO IDE --rc --seasonals

matrix phi = {-8, -6}
set initvals phi

restrict --lbfgs
  b[1] = 1
  b[1] + b[2] = 0
  b[3] + b[4] = 0
end restrict
\end{code}

In this more economical formulation the initializer specifies only the two free parameters in $\phi$ (5 elements in $\beta$ minus 3 restrictions). There is no call to give values for $\psi$ since $\alpha$ is unrestricted.

\subsection{Scale removal}
\label{sec:vecm-scale-removal}

Consider a simpler version of the restriction discussed in the previous section, namely,
%
\begin{code}
restrict
  b[1] = 1
  b[1] + b[2] = 0
end restrict
\end{code}

This restriction comprises a substantive, testable requirement --- that $\beta_1$ and $\beta_2$ sum to zero --- and a normalization or scaling, $\beta_1 = 1$. The question arises, might it be easier and more reliable to maximize the likelihood without imposing $\beta_1 = 1$?\footnote{As a numerical matter, that is. In principle this should make no difference.} If so, we could record this normalization, remove it for the purpose of maximizing the likelihood, then reimpose it by scaling the result.

Unfortunately it is not possible to say in advance whether ``scale removal'' of this sort will give better results, for any particular estimation problem. However, this does seem to be the case more often than not. \app{Gretl} therefore performs scale removal where feasible, unless you
\begin{itemize}
\item explicitly forbid this, by giving the \option{no-scaling} option flag to the \texttt{restrict} command; or
\item provide a specific vector of initial values; or
\item select the LBFGS algorithm for maximization.
\end{itemize}

Scale removal is deemed infeasible if there are any cross-column restrictions on $\beta$, or any non-homogeneous restrictions involving more than one element of $\beta$.

In addition, experimentation has suggested to us that scale removal is inadvisable if the system is just identified with the normalization(s) included, so we do not do it in that case. By ``just identified'' we mean that the system would not be identified if any of the restrictions were removed. On that criterion the above example is not just identified, since the removal of the second restriction would not affect identification; and \app{gretl} would in fact perform scale removal in this case unless the user specified otherwise.

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: