\chapter{Cheat sheet}
\label{chap:cheatsheet}

This chapter explains how to perform some common --- and some not so
common --- tasks in \app{gretl}'s scripting language.  Some but not
all of the techniques listed here are also available through the
graphical interface.  Although the graphical interface may be more
intuitive and less intimidating at first, we encourage users to take
advantage of the power of \app{gretl}'s scripting language as soon as
they feel comfortable with the program.

\section{Dataset handling}

\subsection{``Weird'' periodicities}

\emph{Problem:} You have data sampled every 3 minutes from 9am onwards;
you'll probably want to treat each hour as a cycle of 20 periods.

\emph{Solution:}
\begin{code}
setobs 20 9:1 --special
\end{code}

\emph{Comment:} Now functions like \cmd{sdiff()} (``seasonal''
difference) or estimation methods like seasonal ARIMA will work as
expected.
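
As a minimal sketch of what this buys you, suppose the dataset contains
a series named \texttt{y} (a hypothetical name, not part of the example
above).  With the periodicity set to 20, the seasonal difference
compares each observation with the one taken in the same 3-minute slot
of the previous hour:
\begin{code}
setobs 20 9:1 --special
# sdiff(y) is y(t) - y(t-20): same slot, one hour earlier
series dy = sdiff(y)
\end{code}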

\subsection{Help, my data are backwards!}

\emph{Problem:} Gretl expects time series data to be in chronological
order (most recent observation last), but you have imported
third-party data that are in reverse order (most recent first).

\emph{Solution:}
\begin{code}
setobs 1 1 --cross-section
genr sortkey = -obs
dataset sortby sortkey
setobs 1 1950 --time-series
\end{code}

\emph{Comment:} The first line is required only if the data currently
have a time series interpretation: it removes that interpretation,
because (for fairly obvious reasons) the \cmd{dataset sortby}
operation is not allowed for time series data.  The following two
lines reverse the data, using the negative of the built-in index
variable \texttt{obs}.  The last line is just illustrative: it
establishes the data as annual time series, starting in 1950.

If you have a dataset that is mostly the right way round, but a
particular variable is wrong, you can reverse that variable as
follows:
\begin{code}
genr x = sortby(-obs, x)
\end{code}
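
To see what \cmd{sortby} is doing here, the following self-contained
sketch builds an artificial five-observation dataset and reverses one
series; the names \texttt{x} and \texttt{xr} are purely illustrative.
\begin{code}
nulldata 5
series x = 10 * obs            # 10, 20, 30, 40, 50
series xr = sortby(-obs, x)    # x reordered by decreasing obs
print x xr --byobs             # xr should read 50, 40, 30, 20, 10
\end{code}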


\subsection{Dropping missing observations selectively}

\emph{Problem:} You have a dataset with many variables and want to
restrict the sample to those observations for which there are no
missing observations for the variables \texttt{x1}, \texttt{x2} and
\texttt{x3}.

\begin{samepage}
\emph{Solution:}
\begin{code}
list X = x1 x2 x3
smpl --no-missing X
\end{code}
\end{samepage}

\emph{Comment:} You can now save the file via a \cmd{store} command
to preserve a subsampled version of the dataset. Alternative solutions
based on the \cmd{ok} function, such as
\begin{code}
list X = x1 x2 x3
genr sel = ok(X)
smpl sel --restrict
\end{code}
are perhaps less obvious, but more flexible. Pick your poison.
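
One example of that flexibility is that the indicator series can be
inspected before the sample is cut.  The sketch below counts the
complete observations first; the names \texttt{sel} and
\texttt{ncomplete} are illustrative.
\begin{code}
list X = x1 x2 x3
series sel = ok(X)            # 1 where x1, x2, x3 are all non-missing
scalar ncomplete = sum(sel)   # number of complete observations
printf "%g of %g observations are complete\n", ncomplete, $nobs
smpl sel --restrict
\end{code}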

\subsection{``By'' operations}

\emph{Problem:} You have a discrete variable \texttt{d} and you want
to run some commands (for example, estimate a model) by splitting the
sample according to the values of \texttt{d}.

\emph{Solution:}
\begin{code}
matrix vd = values(d)
m = rows(vd)
loop i=1..m
  scalar sel = vd[i]
  smpl (d=sel) --restrict --replace
  ols y const x
endloop
smpl --full
\end{code}

\emph{Comment:} The main ingredient here is a loop.  You can have
\app{gretl} perform as many instructions as you want for each value of
\texttt{d}, as long as they are allowed inside a loop. Note, however,
that if all you want is descriptive statistics, the \cmd{summary}
command does have a \option{by} option.
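
For instance, assuming the same hypothetical series \texttt{y},
\texttt{x} and \texttt{d} as in the loop above, descriptive statistics
for each value of \texttt{d} might be requested as follows:
\begin{code}
# mark d as discrete, in case gretl has not already detected this
discrete d
# statistics for y and x, computed separately for each value of d
summary y x --by=d
\end{code}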

\subsection{Adding a time series to a panel}

\emph{Problem:} You have a panel dataset (comprising observations of
$n$ individuals in each of $T$ periods) and you want to add a variable
which is available in straight time-series form.  For example, you
want to add annual CPI data to a panel in order to deflate nominal
income figures.

In gretl a panel is represented in stacked time-series format, so in
effect the task is to create a new variable which holds $n$ stacked
copies of the original time series.  Let's say the panel comprises 500
individuals observed in the years 1990, 1995 and 2000 ($n=500$,
$T=3$), and we have these CPI data in the ASCII file \texttt{cpi.txt}:

\begin{code}
date cpi
1990 130.658
1995 152.383
2000 172.192
\end{code}

What we need is for the CPI variable in the panel to repeat these
three values 500 times.

\emph{Solution:} Simple!  With the panel dataset open in gretl,
\begin{code}
append cpi.txt
\end{code}

\emph{Comment:} If the length of the time series is the same as the
length of the time dimension in the panel (3 in this example), gretl
will perform the stacking automatically.  Rather than using the
\cmd{append} command you could use the ``Append data'' item under
the \textsf{File} menu in the GUI program.  For this to work, your
main dataset must be recognized as a panel.  This can be arranged via
the \cmd{setobs} command or the ``Dataset structure'' item under
the \textsf{Data} menu.
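
Putting the pieces together for the example above, a minimal script
might look as follows.  The series name \texttt{income} is an
assumption made for illustration; \texttt{cpi} is the name taken from
the header of \texttt{cpi.txt}.
\begin{code}
# declare the data as a panel in stacked time-series form, with T = 3
setobs 3 1:1 --stacked-time-series
# gretl repeats the 3 CPI values once per individual
append cpi.txt
# deflate nominal income (the series name is hypothetical)
series rincome = 100 * income / cpi
\end{code}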


\section{Creating/modifying variables}

\subsection{Generating a dummy variable for a specific observation}

\emph{Problem:} Generate $d_t = 0$ for all observations but one, for
which $d_t = 1$.

\emph{Solution:}
\begin{code}
  genr d = (t="1984:2")
\end{code}

\emph{Comment:} The internal variable \texttt{t} is used to refer to
observations in string form, so if you have a cross-section sample you
may just use \texttt{d = (t="123")}; of course, if the dataset has
data labels, use the corresponding label. For example, if you open the
dataset \texttt{mrw.gdt}, supplied with \app{gretl} among the
examples, a dummy variable for Italy could be generated via
\begin{code}
  genr DIta = (t="Italy")
\end{code}

Note that this method does not require scripting at all. In fact, you
might as well use the GUI Menu ``Add/Define new variable'' for the
same purpose, with the same syntax.

\subsection{Generating an ARMA(1,1)}

\emph{Problem:} Generate $y_t = 0.9 y_{t-1} + \varepsilon_t - 0.5
\varepsilon_{t-1}$, with $\varepsilon_t \sim N\!I\!I\!D(0,1)$.

\emph{Recommended solution:}
\begin{code}
alpha = 0.9
theta = -0.5
series y = filter(normal(), {1, theta}, alpha)
\end{code}

\emph{``Bread and butter'' solution:}
\begin{code}
alpha = 0.9
theta = -0.5
series e = normal()
series y = 0
series y = alpha * y(-1) + e + theta * e(-1)
\end{code}

\emph{Comment:} The \cmd{filter} function is specifically designed for
this purpose, so in most cases you'll want to take advantage of its
speed and flexibility. That said, in some cases you may want to
generate the series in a manner which is more transparent (maybe for
teaching purposes).

In the second solution, the statement \cmd{series y = 0} is necessary
because the next statement evaluates \texttt{y} recursively, so
\texttt{y[1]} must be set.  Note that you must use the keyword
\texttt{series} here instead of writing \texttt{genr y = 0} or simply
\texttt{y = 0}, to ensure that \texttt{y} is a series and
not a scalar.
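
One informal way to check either recipe is to generate a long sample
and estimate an ARMA(1,1) model on it.  The sketch below is
illustrative only: the sample size and the seed are arbitrary choices,
and the estimates will vary from draw to draw.
\begin{code}
nulldata 1000
setobs 1 1 --time-series
set seed 1234
series y = filter(normal(), {1, -0.5}, 0.9)
# the AR and MA estimates should lie reasonably close to 0.9 and -0.5
arma 1 1 ; y
\end{code}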

\subsection{Recoding a variable}

\emph{Problem:} You want to recode a variable by classes. For example,
you have the age of a sample of individuals ($x_i$) and you need to
compute age classes ($y_i$) as
\begin{eqnarray*}
  y_i = 1 & \mathrm{for} & x_i < 18 \\
  y_i = 2 & \mathrm{for} & 18 \le x_i < 65 \\
  y_i = 3 & \mathrm{for} & x_i \ge 65
\end{eqnarray*}

\emph{Solution:}
\begin{code}
series y = 1 + (x >= 18) + (x >= 65)
\end{code}

\emph{Comment:} True and false expressions are evaluated as 1 and 0
respectively, so they can be manipulated algebraically like any other
number. The same result could also be achieved by using the
conditional assignment operator (see below), but in most cases it
would probably lead to more convoluted constructs.
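
For comparison, here is a sketch of the same recoding written with
nested conditional assignments:
\begin{code}
# first test x < 18; if that fails, test x < 65; otherwise class 3
series y = (x < 18) ? 1 : ((x < 65) ? 2 : 3)
\end{code}
Both forms give $y_i = 2$ for $x_i = 30$, for example: the algebraic
version evaluates $1 + 1 + 0$.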

\subsection{Conditional assignment}

\emph{Problem:} Generate $y_t$ via the following rule:
\[
  y_t = \left\{ 
    \begin{array}{ll} 
      x_t & \mathrm{for} \quad d_t > a \\ 
      z_t & \mathrm{for} \quad d_t \le a 
    \end{array}
    \right. 
\]

\emph{Solution:}
\begin{code}
series y = (d > a) ? x : z
\end{code}

\emph{Comment:} There are several alternatives to the one presented
above. One is a brute force solution using loops. Another one, more
efficient but still suboptimal, would be 
\begin{code}
series y = (d>a)*x + (d<=a)*z  
\end{code}
However, the ternary conditional assignment operator is not only the
most numerically efficient way to accomplish what we want, it is also
remarkably transparent to read when one gets used to it. Some readers
may find it helpful to note that the conditional assignment operator
works exactly the same way as the \texttt{=IF()} function in
spreadsheets.
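
For reference, the brute-force variant mentioned above might be
sketched as follows.  It fills in \texttt{y} one observation at a time
and is shown only for comparison; it assumes that the full dataset is
in use, so that observation numbers run from 1 to \texttt{\$nobs}.
\begin{code}
series y = NA
scalar n = $nobs
loop i=1..n --quiet
  # pick x or z for observation i, depending on d
  y[i] = (d[i] > a) ? x[i] : z[i]
endloop
\end{code}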

\subsection{Generating a time index for panel datasets}

\emph{Problem:} \app{Gretl} has a \texttt{\$unit} accessor, but not
the equivalent for time. What should I use?

\emph{Solution:}
\begin{code}
series x = time
\end{code}

\emph{Comment:} The special construct \cmd{genr time} and its variants
are aware of whether a dataset is a panel.
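
A quick way to see this is to build a tiny artificial panel with two
units and three periods and print the unit and time indices side by
side; the names \texttt{u} and \texttt{x} are illustrative.
\begin{code}
nulldata 6
setobs 3 1:1 --stacked-time-series
series u = $unit
series x = time
# u should read 1 1 1 2 2 2, and x should read 1 2 3 1 2 3
print u x --byobs
\end{code}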

\section{Neat tricks}
\label{sec:cheat-neat}

\subsection{Interaction dummies}

\emph{Problem:} You want to estimate the model $y_i = \mathbf{x}_i
\beta_1 + \mathbf{z}_i \beta_2 + d_i \beta_3 + (d_i \cdot \mathbf{z}_i)
\beta_4 + \varepsilon_i$, where $d_i$ is a dummy variable while
$\mathbf{x}_i$ and $\mathbf{z}_i$ are vectors of explanatory
variables.

\emph{Solution:}
\begin{code}
list X = x1 x2 x3
list Z = z1 z2
list dZ = null
loop foreach i Z
  series d$i = d * $i
  list dZ = dZ d$i
endloop 

ols y X Z d dZ
\end{code} 
%$

\emph{Comment:} It's amazing what string substitution can do for
you, isn't it?
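
To make the substitution explicit: with \texttt{Z} defined as above,
the loop is equivalent to writing out each interaction term by hand.
\begin{code}
# what the foreach loop generates, one series per member of Z
series dz1 = d * z1
series dz2 = d * z2
list dZ = dz1 dz2
ols y X Z d dZ
\end{code}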

\subsection{Realized volatility}

\emph{Problem:} Given data by the minute, you want to compute the
``realized volatility'' for the hour as $RV_t = \frac{1}{60}
\sum_{\tau=1}^{60} y_{t:\tau}^2$. Imagine your sample starts at time 1:1.

\emph{Solution:}
\begin{code}
smpl --full
genr time
# observations 1-60 belong to hour 1, 61-120 to hour 2, and so on
genr hour = int((time-1)/60) + 1
# minute within the hour, running from 1 to 60
genr minute = (time-1) % 60 + 1
setobs hour minute --panel-vars
genr rv = psd(y)^2
setobs 1 1
smpl minute=1 --restrict
store foo rv
\end{code}

\emph{Comment:} Here we trick \app{gretl} into thinking that our
dataset is a panel dataset, where the hours are the ``units'' and
the minutes are the ``time''; this way, we can take advantage of the
special function \cmd{psd()}, panel standard deviation.  Then we
simply drop all observations but one per hour and save the resulting
data (\texttt{store foo rv} translates as ``store in the \app{gretl}
datafile \texttt{foo.gdt} the series \texttt{rv}'').
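
Note that \cmd{psd()} returns a standard deviation, so its square
measures dispersion around the within-unit mean rather than the plain
mean of squares given in the formula.  The two are close when the mean
of \texttt{y} is near zero.  If you want the formula exactly, one
possible alternative, assuming the panel-mean function \cmd{pmean()}
is available in your version of \app{gretl}, is to compute
\texttt{rv} as follows while the panel structure is still in force:
\begin{code}
# mean of squared y within each unit, repeated for every
# observation of that unit
genr rv = pmean(y^2)
\end{code}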

\subsection{Looping over two paired lists}

\emph{Problem:} Suppose you have two lists with the same number of
elements, and you want to apply some command to corresponding elements
over a loop.

\emph{Solution:}
\begin{code}
list L1 = a b c
list L2 = x y z

k1 = 1
loop foreach i L1 --quiet
    k2 = 1
    loop foreach j L2 --quiet
        if k1=k2
            ols $i 0 $j
        endif
        k2++
    endloop
    k1++
endloop
\end{code}
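
With the two lists defined as above, the \texttt{if} test passes only
when the two counters agree, so the nested loop ends up running the
equivalent of:
\begin{code}
ols a 0 x
ols b 0 y
ols c 0 z
\end{code}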

\emph{Comment:} The simplest way to achieve the result is to loop over
all possible combinations and filter out the unneeded ones via an
\texttt{if} condition, as above. That said, in some cases variable
names can help. For example, if
\begin{code}
  list Lx = x1 x2 x3
  list Ly = y1 y2 y3
\end{code}
looping over the integers is quite intuitive and certainly more elegant:
\begin{code}
  loop i=1..3
    ols y$i const x$i
  endloop
\end{code}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: