\chapter{Cheat sheet} \label{chap:cheatsheet} This chapter explains how to perform some common --- and some not so common --- tasks in \app{gretl}'s scripting language. Some but not all of the techniques listed here are also available through the graphical interface. Although the graphical interface may be more intuitive and less intimidating at first, we encourage users to take advantage of the power of \app{gretl}'s scripting language as soon as they feel comfortable with the program. \section{Dataset handling} \subsection{``Weird'' periodicities} \emph{Problem:} You have data sampled each 3 minutes from 9am onwards; you'll probably want to specify the hour as 20 periods. \emph{Solution:} \begin{code} setobs 20 9:1 --special \end{code} \emph{Comment:} Now functions like \cmd{sdiff()} (``seasonal'' difference) or estimation methods like seasonal ARIMA will work as expected. \subsection{Help, my data are backwards!} \emph{Problem:} Gretl expects time series data to be in chronological order (most recent observation last), but you have imported third-party data that are in reverse order (most recent first). \emph{Solution:} \begin{code} setobs 1 1 --cross-section genr sortkey = -obs dataset sortby sortkey setobs 1 1950 --time-series \end{code} \emph{Comment:} The first line is required only if the data currently have a time series interpretation: it removes that interpretation, because (for fairly obvious reasons) the \cmd{dataset sortby} operation is not allowed for time series data. The following two lines reverse the data, using the negative of the built-in index variable \texttt{obs}. The last line is just illustrative: it establishes the data as annual time series, starting in 1950. If you have a dataset that is mostly the right way round, but a particular variable is wrong, you can reverse that variable as follows: \begin{code} genr x = sortby(-obs, x) \end{code} \subsection{Dropping missing observations selectively} \emph{Problem:} You have a dataset with many variables and want to restrict the sample to those observations for which there are no missing observations for the variables \texttt{x1}, \texttt{x2} and \texttt{x3}. \begin{samepage} \emph{Solution:} \begin{code} list X = x1 x2 x3 smpl --no missing X \end{code} \end{samepage} \emph{Comment:} You can now save the file via a \cmd{store} command to preserve a subsampled version of the dataset. Alternative solution based on the \cmd{ok} function, such as \begin{code} list X = x1 x2 x3 genr sel = ok(X) smpl sel --restrict \end{code} are perhaps less obvious, but more flexible. Pick your poison. \subsection{``By'' operations} \emph{Problem:} You have a discrete variable \texttt{d} and you want to run some commands (for example, estimate a model) by splitting the sample according to the values of \texttt{d}. \emph{Solution:} \begin{code} matrix vd = values(d) m = rows(vd) loop for i=1..m scalar sel = vd[i] smpl (d=sel) --restrict --replace ols y const x endloop smpl --full \end{code} \emph{Comment:} The main ingredient here is a loop. You can have \app{gretl} perform as many instructions as you want for each value of \texttt{d}, as long as they are allowed inside a loop. Note, however, that if all you want is descriptive statistics, the \cmd{summary} command does have a \option{by} option. \subsection{Adding a time series to a panel} \emph{Problem:} You have a panel dataset (comprising observations of $n$ indidivuals in each of $T$ periods) and you want to add a variable which is available in straight time-series form. For example, you want to add annual CPI data to a panel in order to deflate nominal income figures. In gretl a panel is represented in stacked time-series format, so in effect the task is to create a new variable which holds $n$ stacked copies of the original time series. Let's say the panel comprises 500 individuals observed in the years 1990, 1995 and 2000 ($n=500$, $T=3$), and we have these CPI data in the ASCII file \texttt{cpi.txt}: \begin{code} date cpi 1990 130.658 1995 152.383 2000 172.192 \end{code} What we need is for the CPI variable in the panel to repeat these three values 500 times. \emph{Solution:} Simple! With the panel dataset open in gretl, \begin{code} append cpi.txt \end{code} \emph{Comment:} If the length of the time series is the same as the length of the time dimension in the panel (3 in this example), gretl will perform the stacking automatically. Rather than using the \cmd{append} command you could use the ``Append data'' item under the \textsf{File} menu in the GUI program. For this to work, your main dataset must be recognized as a panel. This can be arranged via the \cmd{setobs} command or the ``Dataset structure'' item under the \textsf{Data} menu. \section{Creating/modifying variables} \subsection{Generating a dummy variable for a specific observation} \emph{Problem:} Generate $d_t = 0$ for all observation but one, for which $d_t = 1$. \emph{Solution:} \begin{code} genr d = (t="1984:2") \end{code} \emph{Comment:} The internal variable \texttt{t} is used to refer to observations in string form, so if you have a cross-section sample you may just use \texttt{d = (t="123")}; of course, if the dataset has data labels, use the corresponding label. For example, if you open the dataset \texttt{mrw.gdt}, supplied with \app{gretl} among the examples, a dummy variable for Italy could be generated via \begin{code} genr DIta = (t="Italy") \end{code} Note that this method does not require scripting at all. In fact, you might as well use the GUI Menu ``Add/Define new variable'' for the same purpose, with the same syntax. \subsection{Generating an ARMA(1,1)} \emph{Problem:} Generate $y_t = 0.9 y_{t-1} + \varepsilon_t - 0.5 \varepsilon_{t-1}$, with $\varepsilon_t \sim N\!I\!I\!D(0,1)$. \emph{Recommended solution:} \begin{code} alpha = 0.9 theta = -0.5 series y = filter(normal(), {1, theta}, alpha) \end{code} \emph{``Bread and butter'' solution:} \begin{code} alpha = 0.9 theta = -0.5 series e = normal() series y = 0 series y = alpha * y(-1) + e + theta * e(-1) \end{code} \emph{Comment:} The \cmd{filter} function is specifically designed for this purpose so in most cases you'll want to take advantage of its speed and flexibility. That said, in some cases you may want to generate the series in a manner which is more transparent (maybe for teaching purposes). In the second solution, the statement \cmd{series y = 0} is necessary because the next statement evaluates \texttt{y} recursively, so \texttt{y[1]} must be set. Note that you must use the keyword \texttt{series} here instead of writing \texttt{genr y = 0} or simply \texttt{y = 0}, to ensure that \texttt{y} is a series and not a scalar. \subsection{Recoding a variable} \emph{Problem:} You want to recode a variable by classes. For example, you have the age of a sample of individuals ($x_i$) and you need to compute age classes ($y_i$) as \begin{eqnarray*} y_i = 1 & \mathrm{for} & x_i < 18 \\ y_i = 2 & \mathrm{for} & 18 \le x_i < 65 \\ y_i = 3 & \mathrm{for} & x_i \ge 65 \end{eqnarray*} \emph{Solution:} \begin{code} series y = 1 + (x >= 18) + (x >= 65) \end{code} \emph{Comment:} True and false expressions are evaluated as 1 and 0 respectively, so they can be manipulated algebraically as any other number. The same result could also be achieved by using the conditional assignment operator (see below), but in most cases it would probably lead to more convoluted constructs. \subsection{Conditional assignment} \emph{Problem:} Generate $y_t$ via the following rule: \[ y_t = \left\{ \begin{array}{ll} x_t & \mathrm{for} \quad d_t > a \\ z_t & \mathrm{for} \quad d_t \le a \end{array} \right. \] \emph{Solution:} \begin{code} series y = (d > a) ? x : z \end{code} \emph{Comment:} There are several alternatives to the one presented above. One is a brute force solution using loops. Another one, more efficient but still suboptimal, would be \begin{code} series y = (d>a)*x + (d<=a)*z \end{code} However, the ternary conditional assignment operator is not only the most numerically efficient way to accomplish what we want, it is also remarkably transparent to read when one gets used to it. Some readers may find it helpful to note that the conditional assignment operator works exactly the same way as the \texttt{=IF()} function in spreadsheets. \subsection{Generating a time index for panel datasets} \emph{Problem:} \app{Gretl} has a \texttt{\$unit} accessor, but not the equivalent for time. What should I use? \emph{Solution:} \begin{code} series x = time \end{code} \emph{Comment:} The special construct \cmd{genr time} and its variants are aware of whether a dataset is a panel. \section{Neat tricks} \label{sec:cheat-neat} \subsection{Interaction dummies} \emph{Problem:} You want to estimate the model $y_i = \mathbf{x}_i \beta_1 + \mathbf{z}_i \beta_2 + d_i \beta_3 + (d_i \cdot \mathbf{z}_i) \beta_4 + \varepsilon_t$, where $d_i$ is a dummy variable while $\mathbf{x}_i$ and $\mathbf{z}_i$ are vectors of explanatory variables. \emph{Solution:} \begin{code} list X = x1 x2 x3 list Z = z1 z2 list dZ = null loop foreach i Z series d$i = d * $i list dZ = dZ d$i endloop ols y X Z d dZ \end{code} %$ \emph{Comment:} It's amazing what string substitution can do for you, isn't it? \subsection{Realized volatility} \emph{Problem:} Given data by the minute, you want to compute the ``realized volatility'' for the hour as $RV_t = \frac{1}{60} \sum_{\tau=1}^{60} y_{t:\tau}^2$. Imagine your sample starts at time 1:1. \emph{Solution:} \begin{code} smpl --full genr time genr minute = int(time/60) + 1 genr second = time % 60 setobs minute second --panel genr rv = psd(y)^2 setobs 1 1 smpl second=1 --restrict store foo rv \end{code} \emph{Comment:} Here we trick \app{gretl} into thinking that our dataset is a panel dataset, where the minutes are the ``units'' and the seconds are the ``time''; this way, we can take advantage of the special function \cmd{psd()}, panel standard deviation. Then we simply drop all observations but one per minute and save the resulting data (\texttt{store foo rv} translates as ``store in the \app{gretl} datafile \texttt{foo.gdt} the series \texttt{rv}''). \subsection{Looping over two paired lists} \emph{Problem:} Suppose you have two lists with the same number of elements, and you want to apply some command to corresponding elements over a loop. \emph{Solution:} \begin{code} list L1 = a b c list L2 = x y z k1 = 1 loop foreach i L1 --quiet k2 = 1 loop foreach j L2 --quiet if k1=k2 ols $i 0 $j endif k2++ endloop k1++ endloop \end{code} \emph{Comment:} The simplest way to achieve the result is to loop over all possible combinations and filter out the unneeded ones via an \texttt{if} condition, as above. That said, in some cases variable names can help. For example, if \begin{code} list Lx = x1 x2 x3 list Ly = y1 y2 y3 \end{code} looping over the integers is quite intuitive and certainly more elegant: \begin{code} loop for i=1..3 ols y$i const x$i endloop \end{code} %%% Local Variables: %%% mode: latex %%% TeX-master: "gretl-guide" %%% End: