\chapter{Loop constructs}
\label{chap:looping}

\section{Introduction}
\label{loop-intro}

The command \cmd{loop} opens a special mode in which \app{gretl}
accepts a block of commands to be repeated zero or more times.  This
feature may be useful for, among other things, Monte Carlo simulations,
bootstrapping of test statistics and iterative estimation procedures.
The general form of a loop is:

\begin{code}
loop control-expression [ --progressive | --verbose | --quiet ]
   loop body
endloop
\end{code}

Five forms of control-expression are available, as explained in
section~\ref{loop-control}.

Not all \app{gretl} commands are available within loops.  The commands
that are not presently accepted in this context are shown in
Table~\ref{tab:nonloopcmds}.

\begin{table}[htbp]
\caption{Commands not usable in loops}
\label{tab:nonloopcmds}
\begin{center}
%% The following is generated automatically
\input tabnonloopcmds.tex
\end{center}
\end{table}

By default, the \cmd{genr} command operates quietly in the context of
a loop (without printing information on the variable generated).  To
force the printing of feedback from \cmd{genr} you may specify the
\option{verbose} option to \cmd{loop}.  The \option{quiet} option
suppresses the usual printout of the number of iterations performed,
which may be desirable when loops are nested.

The \option{progressive} option to \cmd{loop} modifies the behavior of
the commands \cmd{print} and \cmd{store}, and certain estimation
commands, in a manner that may be useful with Monte Carlo analyses
(see section~\ref{loop-progressive}).
    
The following sections explain the various forms of the loop control
expression and provide some examples of use of loops.  

\tip{If you are carrying out a substantial Monte Carlo analysis with
  many thousands of repetitions, memory capacity and processing time
  may be an issue.  To minimize the use of computer resources, run
  your script using the command-line program, \app{gretlcli}, with
  output redirected to a file.}

\section{Loop control variants}
\label{loop-control}

\subsection{Count loop}
\label{loop-count}

The simplest form of loop control is a direct specification of the
number of times the loop should be repeated.  We refer to this as a
``count loop''.  The number of repetitions may be a numerical
constant, as in \verb+loop 1000+, or may be read from a scalar
variable, as in \verb+loop replics+.

If the loop count is given by a variable, say \verb+replics+, that
variable should hold an integer value; if the value is not integral,
it is converted to an integer by truncation.  Note that
\verb+replics+ is evaluated only once, when the loop is initially
compiled.
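
As a minimal sketch of a count loop driven by a scalar, the following
counts the iterations actually performed.  The names \verb+replics+
and \verb+total+ and the values used are purely illustrative.  Going
by the truncation rule stated above, the non-integral count should be
reduced to 3.

\begin{code}
nulldata 20
# deliberately non-integral: truncation should give 3 repetitions
scalar replics = 3.7
scalar total = 0
loop replics --quiet
   total = total + 1
endloop
printf "iterations performed: %g\n", total
\end{code}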
      

\subsection{While loop}
\label{loop-while}

A second sort of control expression takes the form of the keyword
\cmd{while} followed by a boolean expression.  For example,
%
\begin{code}
loop while essdiff > .00001
\end{code}

Execution of the commands within the loop will continue so long as (a)
the specified condition evaluates as true and (b) the number of
iterations does not exceed the value of the internal variable
\verb|loop_maxiter|.  By default this equals 250, but you can specify
a different value via the \cmd{set} command (see the \GCR).
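
The following sketch illustrates a ``while'' loop together with an
adjustment of \verb|loop_maxiter|.  It approximates the square root of
2 by Newton's method; the variable names \verb+x+, \verb+xnew+ and
\verb+gap+, the tolerance and the cap of 1000 are all arbitrary
choices made for illustration.

\begin{code}
nulldata 20
# raise the iteration cap for while loops (the value is arbitrary)
set loop_maxiter 1000
scalar x = 1
scalar gap = 1
loop while gap > 1.0e-10
   # Newton step for the equation x^2 = 2
   scalar xnew = 0.5 * (x + 2/x)
   gap = abs(xnew - x)
   x = xnew
endloop
printf "approximate square root of 2: %.8f\n", x
\end{code}

The loop stops as soon as successive approximations differ by no more
than the tolerance, which in this case should happen long before the
iteration cap becomes relevant.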

\subsection{Index loop}
\label{loop-index}

A third form of loop control uses the special internal index variable
\verb+i+.  In this case you specify starting and ending values for
\verb+i+, which is incremented by one each time round the loop.  The
syntax looks like this: \cmd{loop i=1..20}.

The index variable may be used within the loop body in one or both of
two ways: you can access the integer value of \verb+i+ (see Example
\ref{loop-panel-script}) or you can use its string representation,
\verb+$i+ (see Example \ref{loop-string-script}).
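
A small sketch showing both uses of the index in a single loop may
help.  The variable name \verb+sq+ is an illustrative assumption, and
the string substitution is assumed to apply to the format string of
\cmd{printf} as it does to other command text.

\begin{code}
nulldata 20
loop i=1..3
   # numeric use: the index enters an ordinary expression
   scalar sq = i^2
   # string use: the dollar form is replaced by the text of the index
   printf "i = $i, squared = %g\n", sq
endloop
\end{code}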

The starting and ending values for the index can be given in numerical
form, or by reference to predefined scalar variables.  In the latter
case the variables are evaluated once, when the loop is set up.  In
addition, with time series data you can give the starting and ending
values in the form of dates, as in \cmd{loop i=1950:1..1999:4}.
      
This form of loop is particularly useful in conjunction with the
\texttt{values()} matrix function when some operation must be carried
out for each value of some discrete variable (see chapter
\ref{chap:discrete}). Consider the following example:

\begin{code}
open greene22_2
discrete Z8
v8 = values(Z8)
n = rows(v8)
loop i=1..n
  scalar xi = v8[$i]
  smpl (Z8=xi) --restrict --replace
  printf "mean(Y | Z8 = %g) = %8.5f, sd(Y | Z8 = %g) = %g\n", \
    xi, mean(Y), xi, sd(Y)
endloop
\end{code}

In this case, we evaluate the conditional mean and standard deviation
of the variable \texttt{Y} for each value of \texttt{Z8}.

\subsection{Foreach loop}
\label{loop-each}

The fourth form of loop control also uses the internal variable
\verb+i+, but in this case the variable ranges over a specified list
of strings.  The loop is executed once for each string in the list.
This can be useful for performing repetitive operations on a list of
variables.  Here is an example of the syntax:
      
\begin{code}
loop foreach i peach pear plum
   print "$i"
endloop
\end{code}

This loop will execute three times, printing out ``peach'', ``pear''
and ``plum'' on the respective iterations.  

If you wish to loop across a list of variables that are contiguous in
the dataset, you can give the names of the first and last variables in
the list, separated by ``\verb+..+'', rather than having to type all
the names.  For example, say we have 50 variables \verb+AK+,
\verb+AL+, \dots{}, \verb+WY+, containing income levels for the states
of the US.  To run a regression of income on time for each of the
states we could do:

\begin{code}
genr time
loop foreach i AL..WY
   ols $i const time
endloop
\end{code}

This loop variant can also be used for looping across the elements in
a \textit{named list} (see chapter~\ref{chap-persist}).  For example:

\begin{code}
list ylist = y1 y2 y3
loop foreach i ylist
   ols $i const x1 x2
endloop
\end{code}

Note that if you use this idiom inside a function (see
chapter~\ref{chap:functions}), looping across a list that has been
supplied to the function as an argument, it is necessary to use the
syntax \textsl{listname}.\verb|$i| to reference the list-member
variables.  In the context of the example above, this would mean
replacing the third line with
%
\begin{code}
   ols ylist.$i const x1 x2
\end{code}
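
A fuller sketch of this idiom is given below.  The function name
\verb+ols_each+ and its parameter names are hypothetical, and the form
of the function declaration is an assumption that may need adjusting
for the version of \app{gretl} in use (see
chapter~\ref{chap:functions} for the definitive syntax).

\begin{code}
function void ols_each (list ylist, series x1, series x2)
   loop foreach i ylist
      # list members must be referenced as listname.$i here
      ols ylist.$i const x1 x2
   endloop
end function

# hypothetical call, assuming series y1, y2, y3, x1 and x2 exist
list Y = y1 y2 y3
ols_each(Y, x1, x2)
\end{code}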

\subsection{For loop}
\label{loop-for}

The final form of loop control emulates the \cmd{for} statement in the
C programming language.  The syntax is \texttt{loop for}, followed by
three component expressions, separated by semicolons and surrounded by
parentheses.  The three components are as follows:

\begin{enumerate}
\item Initialization: This is evaluated only once, at the start of the
  loop.  Common example: setting a scalar control variable to some
  starting value.
\item Continuation condition: this is evaluated at the top of each
  iteration (including the first).  If the expression evaluates as
  true (non-zero), iteration continues, otherwise it stops. Common
  example: an inequality expressing a bound on a control variable.
\item Modifier: an expression which modifies the value of
  some variable.  This is evaluated prior to checking the
  continuation condition, on each iteration after the first.
  Common example: a control variable is incremented or
  decremented.
\end{enumerate}

Here's a simple example:
%
\begin{code}
loop for (r=0.01; r<.991; r+=.01)
\end{code}

In this example the variable \verb+r+ will take on the values 0.01,
0.02, \dots{}, 0.99 across the 99 iterations.  Note that due to the
finite precision of floating point arithmetic on computers it may be
necessary to use a continuation condition such as the above,
\verb+r<.991+, rather than the more ``natural'' \verb+r<=.99+.  (Using
double-precision numbers on an x86 processor, at the point where you
would expect \verb+r+ to equal 0.99 it may in fact have value
0.990000000000001.)
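
One way of sidestepping the precision issue altogether, sketched here
as a suggestion rather than a requirement, is to drive the loop with
an integer counter and derive the fractional value from it inside the
loop body.  Small integers are represented exactly in floating point,
so the comparison \verb+j<=99+ behaves as expected.  The counter name
\verb+j+ is an illustrative choice.

\begin{code}
loop for (j=1; j<=99; j+=1)
   # r takes on the values 0.01, 0.02, ..., 0.99
   scalar r = j/100
endloop
\end{code}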

Any or all of the three expressions governing a \texttt{for} loop may
be omitted --- the minimal form is \texttt{(;;)}.  If the continuation
test is omitted it is implicitly true, so you have an infinite loop
unless you arrange for some other way out, such as a \cmd{break}
statement.
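
As a minimal sketch of the latter case, the loop below omits all three
expressions and relies on \cmd{break} to terminate.  The counter
\verb+k+ and the stopping value of 5 are arbitrary choices made for
illustration.

\begin{code}
scalar k = 0
loop for (;;)
   k = k + 1
   # leave the loop once the counter reaches 5
   if k >= 5
      break
   endif
endloop
printf "left the loop with k = %g\n", k
\end{code}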

If the initialization expression in a \texttt{for} loop takes the
common form of setting a scalar variable to a given value, the string
representation of that scalar's value will be available within
the loop via the accessor \verb+$+\textsl{varname}.  


\section{Progressive mode}
\label{loop-progressive}

If the \option{progressive} option is given for a command loop,
special behavior is invoked for certain commands, namely, \cmd{print},
\cmd{store} and simple estimation commands.  By ``simple'' here we
mean commands which (a) estimate a single equation (as opposed to a
system of equations) and (b) do so by means of a single command
statement (as opposed to a block of statements, as with \cmd{nls} and
\cmd{mle}).  The paradigm is \cmd{ols}; other possibilities include
\cmd{tsls}, \cmd{wls}, \cmd{logit} and so on.

The special behavior is as follows.

Estimators: The results from each individual iteration of the
estimator are not printed.  Instead, after the loop is completed you
get a printout of (a) the mean value of each estimated coefficient
across all the repetitions, (b) the standard deviation of those
coefficient estimates, (c) the mean value of the estimated standard
error for each coefficient, and (d) the standard deviation of the
estimated standard errors.  This makes sense only if there is some
random input at each step.

\cmd{print}: When this command is used to print the value of a
variable, you do not get a print each time round the loop.  Instead,
when the loop is terminated you get a printout of the mean and
standard deviation of the variable, across the repetitions of the
loop.  This mode is intended for use with variables that have a scalar
value at each iteration, for example the error sum of squares from a
regression.  Data series cannot be printed in this way.

\cmd{store}: This command writes out the values of the specified
scalars, from each time round the loop, to a specified file.  Thus it
keeps a complete record of their values across the iterations.  For
example, coefficient estimates could be saved in this way so as to
permit subsequent examination of their frequency distribution.  Only
one such \cmd{store} can be used in a given loop.

\section{Loop examples}
\label{loop-examples}


\subsection{Monte Carlo example}
\label{loop-mc-example}

A simple example of a Monte Carlo loop in ``progressive'' mode is
shown in Example~\ref{monte-carlo-loop}.

\begin{script}[htbp]
  \caption{Simple Monte Carlo loop}
  \label{monte-carlo-loop}
\begin{scode}
nulldata 50
seed 547
genr x = 100 * uniform()
# open a "progressive" loop, to be repeated 100 times
loop 100 --progressive
   genr u = 10 * normal()
   # construct the dependent variable
   genr y = 10*x + u
   # run OLS regression
   ols y const x
   # grab the coefficient estimates and R-squared
   genr a = $coeff(const)
   genr b = $coeff(x)
   genr r2 = $rsq
   # arrange for printing of stats on these
   print a b r2
   # and save the coefficients to file
   store coeffs.gdt a b
endloop
\end{scode}
\end{script}

This loop will print out summary statistics for the `a' and `b'
estimates and $R^2$ across the 100 repetitions.  After running the
loop, \verb+coeffs.gdt+, which contains the individual coefficient
estimates from all the runs, can be opened in \app{gretl} to examine
the frequency distribution of the estimates in detail.
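
A follow-up script along the following lines could be used for that
purpose.  This is a sketch which assumes that \verb+coeffs.gdt+ was
written to a location where \cmd{open} can find it, such as the
current working directory.

\begin{code}
open coeffs.gdt
# summary statistics for the stored estimates
summary a b
# frequency distribution of the slope estimates
freq b
\end{code}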

The command \cmd{nulldata} is useful for Monte Carlo work.  Instead of
opening a ``real'' data set, \cmd{nulldata 50} (for instance) opens a
dummy data set, containing just a constant and an index variable, with
a series length of 50. Constructed variables can then be added using
the \cmd{genr} command.  See the \cmd{set} command for information on
generating repeatable pseudo-random series.

\subsection{Iterated least squares}
\label{loop-ils-examples}

Example \ref{greene-ils-script} uses a ``while'' loop to replicate the
estimation of a nonlinear consumption function of the form
	
\[ C = \alpha + \beta Y^{\gamma} + \epsilon \]

as presented in Greene (2000, Example 11.3).  This script is included
in the \app{gretl} distribution under the name \verb+greene11_3.inp+;
you can find it in \app{gretl} under the menu item ``File, Script files,
Practice file, Greene...''.
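
The linearized regression run inside the loop in
Example~\ref{greene-ils-script} can be motivated as follows.  This is
a sketch in notation introduced here for illustration: $\beta_0$ and
$\gamma_0$ denote the current values of the parameters.  A first-order
Taylor expansion of $\beta Y^{\gamma}$ around $(\beta_0, \gamma_0)$
gives

\[ \beta Y^{\gamma} \approx \beta Y^{\gamma_0} + 
   (\gamma - \gamma_0) \, \beta_0 Y^{\gamma_0} \log Y \]

Substituting this into the consumption function and moving the term
that involves only known quantities to the left-hand side yields

\[ C + \gamma_0 \beta_0 Y^{\gamma_0} \log Y \approx 
   \alpha + \beta Y^{\gamma_0} + 
   \gamma \, \beta_0 Y^{\gamma_0} \log Y + \epsilon \]

This is linear in $\alpha$, $\beta$ and $\gamma$, so updated estimates
can be obtained by OLS.  The left-hand side corresponds to the
variable \verb+C0+ in the script, and the two regressors to
\verb+x1+ and \verb+x2+ respectively.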

The option \option{print-final} for the \cmd{ols} command arranges
matters so that the regression results will not be printed each time
round the loop, but the results from the regression on the last
iteration will be printed when the loop terminates.

\begin{script}[htbp]
  \caption{Nonlinear consumption function}
  \label{greene-ils-script}
\begin{scode}
open greene11_3.gdt
# run initial OLS
ols C 0 Y
genr essbak = $ess
genr essdiff = 1
genr beta = $coeff(Y)
genr gamma = 1
# iterate OLS till the error sum of squares converges
loop while essdiff > .00001
   # form the linearized variables
   genr C0 = C + gamma * beta * Y^gamma * log(Y)
   genr x1 = Y^gamma
   genr x2 = beta * Y^gamma * log(Y)
   # run OLS 
   ols C0 0 x1 x2 --print-final --no-df-corr --vcv
   genr beta = $coeff(x1)
   genr gamma = $coeff(x2)
   genr ess = $ess
   genr essdiff = abs(ess - essbak)/essbak
   genr essbak = ess
endloop 
# print parameter estimates using their "proper names"
noecho
printf "alpha = %g\n", $coeff(0)
printf "beta  = %g\n", beta
printf "gamma = %g\n", gamma
\end{scode}
\end{script}

Example~\ref{jack-arma} shows how a loop can be used to
estimate an ARMA model, exploiting the ``outer product of the
gradient'' (OPG) regression discussed by Davidson and MacKinnon in
their \emph{Estimation and Inference in Econometrics}.
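
In outline, and in notation introduced here for illustration, the
script works as follows.  The one-step forecast errors for the
ARMA(1,1) model with constant $c$, autoregressive coefficient $a$ and
moving-average coefficient $m$ are

\[ e_t = y_t - c - a \, y_{t-1} - m \, e_{t-1} \]

Differentiating with respect to each parameter gives the recursions

\[ \frac{\partial e_t}{\partial c} = 
     -1 - m \frac{\partial e_{t-1}}{\partial c}, \qquad
   \frac{\partial e_t}{\partial a} = 
     -y_{t-1} - m \frac{\partial e_{t-1}}{\partial a}, \qquad
   \frac{\partial e_t}{\partial m} = 
     -e_{t-1} - m \frac{\partial e_{t-1}}{\partial m} \]

With the log-likelihood written as $\ell = -\frac{1}{2} \sum_t e_t^2$,
as in the script, the contribution of observation $t$ to the score
for a parameter $\theta$ is

\[ \frac{\partial \ell_t}{\partial \theta} = 
   -e_t \, \frac{\partial e_t}{\partial \theta} \]

The OPG regression regresses a constant on the three score series, and
the resulting coefficients are used to update the parameters.  The
convergence criterion, $T$ minus the error sum of squares from this
regression, can be read as the explained sum of squares; it should
approach zero as the scores become orthogonal to the constant, that
is, as the gradient of the log-likelihood approaches zero.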

\begin{script}[htbp]
  \caption{ARMA(1,1)}
  \label{jack-arma}
\begin{scode}
open armaloop.gdt

genr c = 0
genr a = 0.1
genr m = 0.1

series e = 1.0
genr de_c = e
genr de_a = e
genr de_m = e

genr crit = 1
loop while crit > 1.0e-9

   # one-step forecast errors
   genr e = y - c - a*y(-1) - m*e(-1)  

   # log-likelihood 
   genr loglik = -0.5 * sum(e^2)
   print loglik

   # partials of forecast errors wrt c, a, and m
   genr de_c = -1 - m * de_c(-1) 
   genr de_a = -y(-1) -m * de_a(-1)
   genr de_m = -e(-1) -m * de_m(-1)

   # partials of l wrt c, a and m
   genr sc_c = -de_c * e
   genr sc_a = -de_a * e
   genr sc_m = -de_m * e

   # OPG regression
   ols const sc_c sc_a sc_m --print-final --no-df-corr --vcv

   # Update the parameters
   genr dc = $coeff(sc_c) 
   genr c = c + dc
   genr da = $coeff(sc_a) 
   genr a = a + da
   genr dm = $coeff(sc_m) 
   genr m = m + dm

   printf "  constant        = %.8g (gradient = %#.6g)\n", c, dc
   printf "  ar1 coefficient = %.8g (gradient = %#.6g)\n", a, da
   printf "  ma1 coefficient = %.8g (gradient = %#.6g)\n", m, dm

   genr crit = $T - $ess
   print crit
endloop

genr se_c = $stderr(sc_c)
genr se_a = $stderr(sc_a)
genr se_m = $stderr(sc_m)

noecho
printf "\n"
printf "constant = %.8g (se = %#.6g, t = %.4f)\n", c, se_c, c/se_c
printf "ar1 term = %.8g (se = %#.6g, t = %.4f)\n", a, se_a, a/se_a
printf "ma1 term = %.8g (se = %#.6g, t = %.4f)\n", m, se_m, m/se_m
\end{scode}
\end{script}


\subsection{Indexed loop examples}

Example \ref{loop-panel-script} shows an indexed loop in which the
\cmd{smpl} is keyed to the index variable \verb+i+.  Suppose we have a
panel dataset with observations on a number of hospitals for the years
1991 to 2000 (where the year of the observation is indicated by a
variable named \verb+year+).  We restrict the sample to each of these
years in turn and print cross-sectional summary statistics for
variables 1 through 4.

\begin{script}[htbp]
  \caption{Panel statistics}
  \label{loop-panel-script}
\begin{scode}
open hospitals.gdt
loop i=1991..2000
  smpl (year=i) --restrict --replace
  summary 1 2 3 4
endloop
\end{scode}
\end{script}


Example \ref{loop-string-script} illustrates string substitution in an
indexed loop.

\begin{script}[htbp]
  \caption{String substitution}
  \label{loop-string-script}
\begin{scode}
open bea.dat
loop i=1987..2001
  genr V = COMP$i
  genr TC = GOC$i - PBT$i
  genr C = TC - V
  ols PBT$i const TC V
endloop
\end{scode}
\end{script}

The first time round this loop the variable \verb+V+ will be set to
equal \verb+COMP1987+ and the dependent variable for the \cmd{ols}
will be \verb+PBT1987+. The next time round \verb+V+ will be redefined
as equal to \verb+COMP1988+ and the dependent variable in the
regression will be \verb+PBT1988+.  And so on.

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: