\chapter{Cheat sheet}
\label{chap:cheatsheet}

This chapter explains how to perform some common (and some not so
common) tasks in \app{gretl}'s scripting language. Some but not all
of the techniques listed here are also available through the
graphical interface. Although the graphical interface may be more
intuitive and less intimidating at first, we encourage users to take
advantage of the power of \app{gretl}'s scripting language as soon as
they feel comfortable with the program.

\section{Dataset handling}

\subsection{``Weird'' periodicities}

\emph{Problem:} You have data sampled every 3 minutes from 9am
onwards; you'll probably want to treat each hour as 20 periods.

\emph{Solution:}
\begin{code}
setobs 20 9:1 --special
\end{code}

\emph{Comment:} Now functions like \texttt{sdiff()} (``seasonal''
difference) or estimation methods like seasonal ARIMA will work as
expected.

\subsection{Help, my data are backwards!}

\emph{Problem:} Gretl expects time series data to be in chronological
order (most recent observation last), but you have imported
third-party data that are in reverse order (most recent first).

\emph{Solution:}
\begin{code}
setobs 1 1 --cross-section
genr sortkey = -obs
dataset sortby sortkey
setobs 1 1950 --time-series
\end{code}

\emph{Comment:} The first line is required only if the data currently
have a time series interpretation: it removes that interpretation,
because the \texttt{dataset sortby} operation is not allowed for time
series data. The following two lines reverse the data, using the
negative of the built-in index variable \texttt{obs}. The last line
is just illustrative: it establishes the data as annual time series,
starting in 1950.

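
To see the reversal at work without any imported data, the following
is a minimal sketch on an artificial dataset. The series name
\texttt{x} and the sample size of 5 are arbitrary choices made for
illustration only; the sorting commands are the ones shown above.

\begin{code}
# artificial cross-sectional dataset with 5 observations
nulldata 5
# x takes the values 1,2,3,4,5: pretend these arrived "backwards"
series x = index
# reverse the whole dataset via the negative of the observation index
series sortkey = -obs
dataset sortby sortkey
# illustrative only: declare annual data starting in 1950
setobs 1 1950 --time-series
print x --byobs
\end{code}

After the sort, \texttt{x} should run from 5 down to 1, with the
value 5 attached to the first observation (1950).
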
If you have a dataset that is mostly the right way round, but a
particular variable is wrong, you can reverse that variable as
follows:

\begin{code}
genr x = sortby(-obs, x)
\end{code}

\subsection{Dropping missing observations selectively}

\emph{Problem:} You have a dataset with many variables and want to
restrict the sample to those observations for which there are no
missing values for the variables \texttt{x1}, \texttt{x2} and
\texttt{x3}.

\begin{samepage}
\emph{Solution:}
\begin{code}
list X = x1 x2 x3
genr sel = ok(X)
smpl sel --restrict
\end{code}
\end{samepage}

\emph{Comment:} You can now save the file via a \texttt{store}
command to preserve a subsampled version of the dataset.

\subsection{``By'' operations}

\emph{Problem:} You have a discrete variable \texttt{d} and you want
to run some commands (for example, estimate a model) by splitting the
sample according to the values of \texttt{d}.

\emph{Solution:}
\begin{code}
# column vector holding the distinct values of d
matrix vd = values(d)
m = rows(vd)
loop for i=1..m
  scalar sel = vd[i]
  # restrict the full sample to the observations with d equal to sel
  smpl d == sel --restrict --replace
  ols y const x
end loop
smpl --full
\end{code}

\emph{Comment:} The main ingredient here is a loop. You can have
\app{gretl} perform as many instructions as you want for each value
of \texttt{d}, as long as they are allowed inside a loop.

\section{Creating/modifying variables}

\subsection{Generating a dummy variable for a specific observation}

\emph{Problem:} Generate $d_t = 0$ for all observations but one, for
which $d_t = 1$.

\emph{Solution:}
\begin{code}
genr d = (t == "1984:2")
\end{code}

\emph{Comment:} The internal variable \texttt{t} is used to refer to
observations in string form, so if you have a cross-section sample
you may just use \texttt{d = (t == "123")}; of course, if the dataset
has data labels, use the corresponding label.
For example, if you open the dataset \texttt{mrw.gdt}, supplied with
\app{gretl} among the examples, a dummy variable for Italy could be
generated via
\begin{code}
genr DIta = (t == "Italy")
\end{code}

Note that this method does not require scripting at all. In fact, you
might as well use the GUI menu item ``Add/Define new variable'' for
the same purpose, with the same syntax.

\subsection{Generating an ARMA(1,1)}

\emph{Problem:} Generate $y_t = 0.9 y_{t-1} + \varepsilon_t - 0.5
\varepsilon_{t-1}$, with $\varepsilon_t \sim N\!I\!I\!D(0,1)$.

\emph{Solution:}
\begin{code}
alpha = 0.9
theta = -0.5
series e = normal()
series y = 0
series y = alpha * y(-1) + e + theta * e(-1)
\end{code}

\emph{Comment:} The statement \texttt{series y = 0} is necessary
because the next statement evaluates \texttt{y} recursively, so
\texttt{y[1]} must be set. Note that you must use the keyword
\texttt{series} here instead of writing \texttt{genr y = 0} or simply
\texttt{y = 0}, to ensure that \texttt{y} is a series and not a
scalar.

\subsection{Conditional assignment}

\emph{Problem:} Generate $y_t$ via the following rule:
\[
  y_t = \left\{
    \begin{array}{ll}
      x_t & \mathrm{for} \quad d_t > a \\
      z_t & \mathrm{for} \quad d_t \le a
    \end{array}
    \right.
\]

\emph{Solution:}
\begin{code}
series y = (d > a) ? x : z
\end{code}

\emph{Comment:} There are several alternatives to the one presented
above. One is a brute force solution using loops. Another one, more
efficient but still suboptimal, would be
\begin{code}
series y = (d>a)*x + (d<=a)*z
\end{code}

However, the ternary conditional assignment operator is not only the
most numerically efficient way to accomplish what we want, it is also
remarkably transparent to read once one gets used to it. Some readers
may find it helpful to note that the conditional assignment operator
works exactly the same way as the \texttt{=IF()} function in
spreadsheets.

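
As a quick sanity check, the two formulations can be compared on
artificial data. The sketch below is only an illustration: the sample
size, the seed, the threshold \texttt{a = 0} and the constant series
\texttt{x} and \texttt{z} are all arbitrary assumptions, and the names
\texttt{y1}, \texttt{y2} and \texttt{gap} are made up for the purpose.

\begin{code}
nulldata 10
set seed 12345
# artificial ingredients: d is standard normal, x and z are constants
series d = normal()
series x = 1
series z = -1
scalar a = 0
# ternary operator versus the arithmetic alternative
series y1 = (d > a) ? x : z
series y2 = (d > a)*x + (d <= a)*z
# the two series should coincide observation by observation
series gap = abs(y1 - y2)
printf "largest discrepancy: %g\n", max(gap)
\end{code}

With no missing values in the data, the reported discrepancy should
be zero.
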
\subsection{Generating a time index for panel datasets}

\emph{Problem:} \app{Gretl} has a \texttt{\$unit} accessor, but not
the equivalent for time. What should I use?

\emph{Solution:}
\begin{code}
series x = time
\end{code}

\emph{Comment:} The special construct \cmd{genr time} and its
variants are aware of whether a dataset is a panel.

\section{Neat tricks}
\label{sec:cheat-neat}

\subsection{Interaction dummies}

\emph{Problem:} You want to estimate the model $y_i = \mathbf{x}_i
\beta_1 + \mathbf{z}_i \beta_2 + d_i \beta_3 + (d_i \cdot
\mathbf{z}_i) \beta_4 + \varepsilon_i$, where $d_i$ is a dummy
variable while $\mathbf{x}_i$ and $\mathbf{z}_i$ are vectors of
explanatory variables.

\emph{Solution:}
\begin{code}
list X = x1 x2 x3
list Z = z1 z2
list dZ = null
loop foreach i Z
  # build the interaction of d with each member of Z, e.g. dz1
  series d$i = d * $i
  list dZ = dZ d$i
end loop

ols y X Z d dZ
\end{code}
%$

\emph{Comment:} It's amazing what string substitution can do for you,
isn't it?

\subsection{Realized volatility}

\emph{Problem:} Given data by the minute, you want to compute the
``realized volatility'' for the hour as $RV_t = \frac{1}{60}
\sum_{\tau=1}^{60} y_{t:\tau}^2$. Imagine your sample starts at time
1:1.

\emph{Solution:}
\begin{code}
smpl --full
genr time
# hour index (1, 2, ...) and minute within the hour (1 to 60)
genr hour = int((time-1)/60) + 1
genr minute = (time-1) % 60 + 1
setobs hour minute --panel
genr rv = psd(y)^2
setobs 1 1
smpl minute == 1 --restrict
store foo rv
\end{code}

\emph{Comment:} Here we trick \app{gretl} into thinking that our
dataset is a panel dataset, where the hours are the ``units'' and the
minutes are the ``time''; this way, we can take advantage of the
special function \texttt{psd()}, panel standard deviation. Then we
simply drop all observations but one per hour and save the resulting
data (\texttt{store foo rv} translates as ``store in the \app{gretl}
datafile \texttt{foo.gdt} the series \texttt{rv}'').

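
Since a squared standard deviation measures dispersion around the
per-unit mean, \texttt{psd(y)\^{}2} agrees with the formula for $RV_t$
only approximately, and only when that mean is close to zero. The
following sketch computes the mean of squares literally, via
\texttt{pmean()}, and sets the two side by side. It rests on several
assumptions made purely for illustration: the data are artificial,
the sample is an exact multiple of 60 observations, and the names
\texttt{rv\_raw} and \texttt{rv\_dev} are invented here.

\begin{code}
# three blocks of 60 artificial observations
nulldata 180
set seed 42
series y = normal()
# impose a panel structure with 60 periods per unit
setobs 60 1:1 --stacked-time-series
# literal version of the formula: per-unit mean of squares
series rv_raw = pmean(y^2)
# version based on the panel standard deviation
series rv_dev = psd(y)^2
# index of the period within each unit, created while still a panel
genr time
setobs 1 1 --cross-section
# keep a single observation per block of 60
smpl time == 1 --restrict
print rv_raw rv_dev --byobs
\end{code}

The two columns should be close but not identical; which one is
appropriate depends on whether the mean of $y$ within each block can
be taken as zero.
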
\subsection{Looping over two paired lists}

\emph{Problem:} Suppose you have two lists with the same number of
elements, and you want to apply some command to corresponding
elements over a loop.

\emph{Solution:}
\begin{code}
list L1 = a b c
list L2 = x y z

k1 = 1
loop foreach i L1 --quiet
  k2 = 1
  loop foreach j L2 --quiet
    # act only when both loops are at the same position
    if k1 == k2
      ols $i 0 $j
    endif
    k2++
  end loop
  k1++
end loop
\end{code}

\emph{Comment:} The simplest way to achieve the result is to loop
over all possible combinations and filter out the unneeded ones via
an \texttt{if} condition, as above. That said, in some cases variable
names can help. For example, if
\begin{code}
list Lx = x1 x2 x3
list Ly = y1 y2 y3
\end{code}
looping over the integers is quite intuitive and certainly more
elegant:
\begin{code}
loop for i=1..3
  ols y$i const x$i
end loop
\end{code}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: