\chapter{Maximum likelihood estimation} \label{chap:mle} \section{Generic ML estimation with gretl} Maximum likelihood estimation is a cornerstone of modern inferential procedures. Gretl provides a way to implement this method for a wide range of estimation problems, by use of the \texttt{mle} command. We give here a few examples. To give a foundation for the examples that follow, we start from a brief reminder on the basics of ML estimation. Given a sample of size $T$, it is possible to define the density function\footnote{We are supposing here that our data are a realization of continuous random variables. For discrete random variables, everything continues to apply by referring to the probability function instead of the density. In both cases, the distribution may be conditional on some exogenous variables.} for the whole sample, namely the joint distribution of all the observations $f(\mathbf{Y} ; \theta)$, where $\mathbf{Y} = \left\{ y_1, \ldots, y_T \right\}$. Its shape is determined by a $k$-vector of unknown parameters $\theta$, which we assume is contained in a set $\Theta$, and which can be used to evaluate the probability of observing a sample with any given characteristics. After observing the data, the values $\mathbf{Y}$ are given, and this function can be evaluated for any legitimate value of $\theta$. In this case, we prefer to call it the \emph{likelihood} function; the need for another name stems from the fact that this function works as a density when we use the $y_t$s as arguments and $\theta$ as parameters, whereas in this context $\theta$ is taken as the function's argument, and the data $\mathbf{Y}$ only have the role of determining its shape. In standard cases, this function has a unique maximum. The location of the maximum is unaffected if we consider the logarithm of the likelihood (or log-likelihood for short): this function will be denoted as \[ \LogLik(\theta) = \log f(\mathbf{Y}; \theta) \] The log-likelihood functions that gretl can handle are those where $\LogLik(\theta)$ can be written as \[ \LogLik(\theta) = \sum_{t=1}^T \ell_t(\theta) \] which is true in most cases of interest. The functions $\ell_t(\theta)$ are called the log-likelihood contributions. Moreover, the location of the maximum is obviously determined by the data $\mathbf{Y}$. This means that the value \begin{equation} \label{eq:maxlik} \hat{\theta}(\mathbf{Y}) = \stackunder{\theta \in \Theta}{\mathrm{Argmax}} \LogLik(\theta) \end{equation} is some function of the observed data (a statistic), which has the property, under mild conditions, of being a consistent, asymptotically normal and asymptotically efficient estimator of $\theta$. Sometimes it is possible to write down explicitly the function $\hat{\theta}(\mathbf{Y})$; in general, it need not be so. In these circumstances, the maximum can be found by means of numerical techniques. These often rely on the fact that the log-likelihood is a smooth function of $\theta$, and therefore on the maximum its partial derivatives should all be 0. The \textsl{gradient vector}, or \textsl{score vector}, is a function that enjoys many interesting statistical properties in its own right; it will be denoted here as $\mathbf{g}(\theta)$. It is a $k$-vector with typical element \[ g_i(\theta) = \frac{\partial\LogLik(\theta)}{\partial\theta_i} = \sum_{t=1}^T \frac{\partial\ell_t(\theta)}{\partial\theta_i} \] Gradient-based methods can be shortly illustrated as follows: \begin{enumerate} \item pick a point $\theta_0 \in \Theta$; \item evaluate $\mathbf{g}(\theta_0)$; \item if $\mathbf{g}(\theta_0)$ is ``small'', stop. Otherwise, compute a direction vector $d(\mathbf{g}(\theta_0))$; \item evaluate $\theta_1 = \theta_0 + d(\mathbf{g}(\theta_0))$; \item substitute $\theta_0$ with $\theta_1$; \item restart from 2. \end{enumerate} Many algorithms of this kind exist; they basically differ from one another in the way they compute the direction vector $d(\mathbf{g}(\theta_0))$, to ensure that $\LogLik(\theta_1) > \LogLik(\theta_0)$ (so that we eventually end up on the maximum). The method \app{gretl} uses to maximize the log-likelihood is a gradient-based algorithm known as the \textbf{BFGS} (Broyden, Fletcher, Goldfarb and Shanno) method. This technique is used in most econometric and statistical packages, as it is well-established and remarkably powerful. Clearly, in order to make this technique operational, it must be possible to compute the vector $\mathbf{g}(\theta)$ for any value of $\theta$. In some cases this vector can be written explicitly as a function of $\mathbf{Y}$. If this is not possible or too difficult the gradient may be evaluated numerically. The choice of the starting value, $\theta_0$, is crucial in some contexts and inconsequential in others. In general, however, it is advisable to start the algorithm from ``sensible'' values whenever possible. If a consistent estimator is available, this is usually a safe and efficient choice: this ensures that in large samples the starting point will be likely close to $\hat{\theta}$ and convergence can be achieved in few iterations. The maxmimum number of iterations allowed for the BFGS procedure, and the relative tolerance for assessing convergence, can be adjusted using the \cmd{set} command: the relevant variables are \verb+bfgs_maxiter+ (default value 500) and \verb+bfgs_toler+ (default value, the machine precision to the power 3/4). \subsection{Covariance matrix and standard errors} By default the covariance matrix of the parameter estimates is based on the Outer Product of the Gradient. That is, \[ \widehat{\mbox{Var}}_{\mbox{\scriptsize OPG}}(\hat{\theta}) = \left(G'(\hat{\theta}) G(\hat{\theta}) \right)^{-1} \] where $G(\hat{\theta})$ is the $T \times k$ matrix of contributions to the gradient. Two other options are available. If the \option{hessian} flag is given, the covariance matrix is computed from a numerical approximation to the Hessian at convergence. If the \option{robust} option is selected, the quasi-ML ``sandwich'' estimator is used: \[ \widehat{\mbox{Var}}_{\mbox{\scriptsize QML}}(\hat{\theta}) = H(\hat{\theta})^{-1} G'(\hat{\theta}) G(\hat{\theta}) H(\hat{\theta})^{-1} \] where $H$ denotes the numerical approximation to the Hessian. \section{Gamma estimation} \label{sec:gamma} Suppose we have a sample of $T$ independent and identically distributed observations from a Gamma distribution. The density function for each observation $x_t$ is \begin{equation} \label{eq:gammadens} f(x_t) = \frac{\alpha^p}{\Gamma(p)} x_t^{p-1} \exp\left({-\alpha x_t}\right) \end{equation} The log-likelihood for the entire sample can be written as the logarithm of the joint density of all the observations. Since these are independent and identical, the joint density is the product of the individual densities, and hence its log is \begin{equation} \label{eq:gammaloglik} \LogLik(\alpha, p) = \sum_{t=1}^T \log \left[ \frac{\alpha^p}{\Gamma(p)} x_t^{p-1} \exp\left({-\alpha x_t}\right) \right] = \sum_{t=1}^T \ell_t \end{equation} where \[ \ell_t = p \cdot \log (\alpha x_t) - \gamma(p) - \log x_t - \alpha x_t \] and $\gamma(\cdot)$ is the log of the gamma function. In order to estimate the parameters $\alpha$ and $p$ via ML, we need to maximize (\ref{eq:gammaloglik}) with respect to them. The corresponding \app{gretl} code snippet is \begin{code} scalar alpha = 1 scalar p = 1 mle logl = p*ln(alpha * x) - lngamma(p) - ln(x) - alpha * x end mle \end{code} The two statements \begin{code} alpha = 1 p = 1 \end{code} are necessary to ensure that the variables \texttt{p} and \texttt{alpha} exist before the computation of \texttt{logl} is attempted. The values of these variables will be changed by the execution of the \texttt{mle} command; upon successful completion, they will be replaced by the ML estimates. The starting value is 1 for both; this is arbitrary and does not matter much in this example (more on this later). The above code can be made more readable, and marginally more efficient, by defining a variable to hold $\alpha \cdot x_t$. This command can be embedded into the \texttt{mle} block as follows: \begin{code} scalar alpha = 1 scalar p = 1 mle logl = p*ln(ax) - lngamma(p) - ln(x) - ax series ax = alpha*x params alpha p end mle \end{code} In this case, it is necessary to include the line \texttt{params alpha p} to set the symbols \texttt{p} and \texttt{alpha} apart from \texttt{ax}, which is a temporarily generated variable and not a parameter to be estimated. In a simple example like this, the choice of the starting values is almost inconsequential; the algorithm is likely to converge no matter what the starting values are. However, consistent method-of-moments estimators of $p$ and $\alpha$ can be simply recovered from the sample mean $m$ and variance $V$: since it can be shown that \[ E(x_t) = p/\alpha \qquad V(x_t) = p/\alpha^2 \] it follows that the following estimators \begin{eqnarray*} \bar{\alpha} & = & m/V \\ \bar{p} & = & m \cdot \bar{\alpha} \end{eqnarray*} are consistent, and therefore suitable to be used as starting point for the algorithm. The gretl script code then becomes \begin{code} scalar m = mean(x) scalar alpha = m/var(x) scalar p = m*alpha mle logl = p*ln(ax) - lngamma(p) - ln(x) - ax series ax = alpha*x params alpha p end mle \end{code} Another thing to note is that sometimes parameters are constrained within certain boundaries: in this case, for example, both $\alpha$ and $p$ must be positive numbers. \app{Gretl} does not check for this: it is the user's responsibility to ensure that the function is always evaluated at an admissible point in the parameter space during the iterative search for the maximum. An effective technique is to define a variable for checking that the parameters are admissible and setting the log-likelihood as undefined if the check fails. An example, which uses the conditional assignment operator, follows: \begin{code} scalar m = mean(x) scalar alpha = m/var(x) scalar p = m*alpha mle logl = check ? p*ln(ax) - lngamma(p) - ln(x) - ax : NA series ax = alpha*x scalar check = (alpha>0) & (p>0) params alpha p end mle \end{code} \section{Stochastic frontier cost function} \label{sec:frontier} When modeling a cost function, it is sometimes worthwhile to incorporate explicitly into the statistical model the notion that firms may be inefficient, so that the observed cost deviates from the theoretical figure not only because of unobserved heterogeneity between firms, but also because two firms could be operating at a different efficiency level, despite being identical under all other respects. In this case we may write \[ C_i = C^*_i + u_i + v_i \] where $C_i$ is some variable cost indicator, $C_i^*$ is its ``theoretical'' value, $u_i$ is a zero-mean disturbance term and $v_i$ is the inefficiency term, which is supposed to be nonnegative by its very nature. A linear specification for $C_i^*$ is often chosen. For example, the Cobb--Douglas cost function arises when $C_i^*$ is a linear function of the logarithms of the input prices and the output quantities. The \emph{stochastic frontier} model is a linear model of the form $y_i = x_i \beta + \varepsilon_i$ in which the error term $\varepsilon_i$ is the sum of $u_i$ and $v_i$. A common postulate is that $u_i \sim N(0,\sigma_u^2)$ and $v_i \sim \left|N(0,\sigma_v^2)\right|$. If independence between $u_i$ and $v_i$ is also assumed, then it is possible to show that the density function of $\varepsilon_i$ has the form: \begin{equation} \label{eq:frontdens} f(\varepsilon_i) = \sqrt{\frac{2}{\pi}} \Phi\left(\frac{\lambda \varepsilon_i}{\sigma}\right) \frac{1}{\sigma} \phi\left(\frac{\varepsilon_i}{\sigma}\right) \end{equation} where $\Phi(\cdot)$ and $\phi(\cdot)$ are, respectively, the distribution and density function of the standard normal, $\sigma = \sqrt{\sigma^2_u + \sigma^2_v}$ and $\lambda = \frac{\sigma_u}{\sigma_v}$. As a consequence, the log-likelihood for one observation takes the form (apart form an irrelevant constant) \[ \ell_t = \log\Phi\left(\frac{\lambda \varepsilon_i}{\sigma}\right) - \left[ \log(\sigma) + \frac{\varepsilon_i^2}{2 \sigma^2} \right] \] Therefore, a Cobb--Douglas cost function with stochastic frontier is the model described by the following equations: \begin{eqnarray*} \log C_i & = & \log C^*_i + \varepsilon_i \\ \log C^*_i & = & c + \sum_{j=1}^m \beta_j \log y_{ij} + \sum_{j=1}^n \alpha_j \log p_{ij} \\ \varepsilon_i & = & u_i + v_i \\ u_i & \sim & N(0,\sigma_u^2) \\ v_i & \sim & \left|N(0,\sigma_v^2)\right| \end{eqnarray*} In most cases, one wants to ensure that the homogeneity of the cost function with respect to the prices holds by construction. Since this requirement is equivalent to $\sum_{j=1}^n \alpha_j = 1$, the above equation for $C^*_i$ can be rewritten as \begin{equation} \label{eq:CobbDouglasFrontier} \log C_i - \log p_{in} = c + \sum_{j=1}^m \beta_j \log y_{ij} + \sum_{j=2}^n \alpha_j (\log p_{ij} - \log p_{in}) + \varepsilon_i \end{equation} The above equation could be estimated by OLS, but it would suffer from two drawbacks: first, the OLS estimator for the intercept $c$ is inconsistent because the disturbance term has a non-zero expected value; second, the OLS estimators for the other parameters are consistent, but inefficient in view of the non-normality of $\varepsilon_i$. Both issues can be addressed by estimating (\ref{eq:CobbDouglasFrontier}) by maximum likelihood. Nevertheless, OLS estimation is a quick and convenient way to provide starting values for the MLE algorithm. Example \ref{cost-estimation} shows how to implement the model described so far. The \texttt{banks91} file contains part of the data used in Lucchetti, Papi and Zazzaro (2001). \begin{script}[htbp] \caption{Estimation of stochastic frontier cost function} \label{cost-estimation} \begin{scode} open banks91 # Cobb-Douglas cost function ols cost const y p1 p2 p3 # Cobb-Douglas cost function with homogeneity restrictions genr rcost = cost - p3 genr rp1 = p1 - p3 genr rp2 = p2 - p3 ols rcost const y rp1 rp2 # Cobb-Douglas cost function with homogeneity restrictions # and inefficiency scalar b0 = $coeff(const) scalar b1 = $coeff(y) scalar b2 = $coeff(rp1) scalar b3 = $coeff(rp2) scalar su = 0.1 scalar sv = 0.1 mle logl = ln(cnorm(e*lambda/ss)) - (ln(ss) + 0.5*(e/ss)^2) scalar ss = sqrt(su^2 + sv^2) scalar lambda = su/sv series e = rcost - b0*const - b1*y - b2*rp1 - b3*rp2 params b0 b1 b2 b3 su sv end mle \end{scode} \end{script} \section{GARCH models} \label{sec:garch} GARCH models are handled by gretl via a native function. However, it is instructive to see how they can be estimated through the \texttt{mle} command. The following equations provide the simplest example of a GARCH(1,1) model: \begin{eqnarray*} y_t & = & \mu + \varepsilon_t \\ \varepsilon_t & = & u_t \cdot \sigma_t \\ u_t & \sim & N(0,1) \\ h_t & = & \omega + \alpha \varepsilon^2_{t-1} + \beta h_{t-1}. \end{eqnarray*} Since the variance of $y_t$ depends on past values, writing down the log-likelihood function is not simply a matter of summing the log densities for individual observations. As is common in time series models, $y_t$ cannot be considered independent of the other observations in our sample, and consequently the density function for the whole sample (the joint density for all observations) is not just the product of the marginal densities. Maximum likelihood estimation, in these cases, is achieved by considering \emph{conditional} densities, so what we maximize is a conditional likelihood function. If we define the information set at time $t$ as \[ F_t = \left\{ y_t, y_{t-1}, \ldots \right\} , \] then the density of $y_t$ conditional on $F_{t-1}$ is normal: \[ y_t | F_{t-1} \sim N\left[ \mu, h_{t} \right]. \] By means of the properties of conditional distributions, the joint density can be factorized as follows \[ f(y_t, y_{t-1}, \ldots) = \left[ \prod_{t=1}^T f(y_t |F_{t-1}) \right] \cdot f(y_0) \] If we treat $y_0$ as fixed, then the term $f(y_0)$ does not depend on the unknown parameters, and therefore the conditional log-likelihood can then be written as the sum of the individual contributions as \begin{equation} \label{eq:garchloglik} \LogLik(\mu,\omega,\alpha,\beta) = \sum_{t=1}^T \ell_t \end{equation} where \[ \ell_t = \log \left[ \frac{1}{\sqrt{h_t}} \phi\left( \frac{y_t - \mu}{\sqrt{h_t}} \right) \right] = - \frac{1}{2} \left[ \log(h_t) + \frac{(y_t - \mu)^2}{h_t} \right] \] The following script shows a simple application of this technique, which uses the data file \texttt{djclose}; it is one of the example dataset supplied with gretl and contains daily data from the Dow Jones stock index. \begin{code} open djclose series y = 100*ldiff(djclose) scalar mu = 0.0 scalar omega = 1 scalar alpha = 0.4 scalar beta = 0.0 mle ll = -0.5*(log(h) + (e^2)/h) series e = y - mu series h = var(y) series h = omega + alpha*(e(-1))^2 + beta*h(-1) params mu omega alpha beta end mle \end{code} \section{Analytical derivatives} \label{sec:anal-der} Computation of the score vector is essential for the working of the BFGS method. In all the previous examples, no explicit formula for the computation of the score was given, so the algorithm was fed numerically evaluated gradients. Numerical computation of the score for the $i$-th parameter is performed via a finite approximation of the derivative, namely \[ \pder{\LogLik(\theta_1, \ldots, \theta_n)}{\theta_i} \simeq \frac{\LogLik(\theta_1, \ldots, \theta_i + h, \ldots, \theta_n) - \LogLik(\theta_1, \ldots, \theta_i - h, \ldots, \theta_n)}{2h} \] where $h$ is a small number. In many situations, this is rather efficient and accurate. However, one might want to avoid the approximation and specify an exact function for the derivatives. As an example, consider the following script: % \begin{code} nulldata 1000 genr x1 = normal() genr x2 = normal() genr x3 = normal() genr ystar = x1 + x2 + x3 + normal() genr y = (ystar > 0) scalar b0 = 0 scalar b1 = 0 scalar b2 = 0 scalar b3 = 0 mle logl = y*ln(P) + (1-y)*ln(1-P) series ndx = b0 + b1*x1 + b2*x2 + b3*x3 series P = cnorm(ndx) params b0 b1 b2 b3 end mle --verbose \end{code} Here, 1000 data points are artificially generated for an ordinary probit model:\footnote{Again, gretl does provide a native \texttt{probit} command (see section \ref{sec:logit-probit}), but a probit model makes for a nice example here.} $y_t$ is a binary variable, which takes the value 1 if $y_t^* = \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + \varepsilon_t > 0$ and 0 otherwise. Therefore, $y_t = 1$ with probability $\Phi(\beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t}) = \pi_t$. The probability function for one observation can be written as \[ P(y_t) = \pi_t^{y_t} ( 1 -\pi_t )^{1-y_t} \] Since the observations are independent and identically distributed, the log-likelihood is simply the sum of the individual contributions. Hence \[ \LogLik = \sum_{t=1}^T y_t \log(\pi_t) + (1 - y_t) \log(1 - \pi_t) \] The \option{verbose} switch at the end of the \texttt{end mle} statement produces a detailed account of the iterations done by the BFGS algorithm. In this case, numerical differentiation works rather well; nevertheless, computation of the analytical score is straightforward, since the derivative $\pder{\LogLik}{\beta_i}$ can be written as \[ \pder{\LogLik}{\beta_i} = \pder{\LogLik}{\pi_t} \cdot \pder{\pi_t}{\beta_i} \] via the chain rule, and it is easy to see that \begin{eqnarray*} \pder{\LogLik}{\pi_t} & = & \frac{y_t}{\pi_t} - \frac{1 - y_t}{1 - \pi_t} \\ \pder{\pi_t}{\beta_i} & = & \phi(\beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t}) \cdot x_{it} \end{eqnarray*} The \texttt{mle} block in the above script can therefore be modified as follows: % \begin{code} mle logl = y*ln(P) + (1-y)*ln(1-P) series ndx = b0 + b1*x1 + b2*x2 + b3*x3 series P = cnorm(ndx) series tmp = dnorm(ndx)*(y/P - (1-y)/(1-P)) deriv b0 = tmp deriv b1 = tmp*x1 deriv b2 = tmp*x2 deriv b3 = tmp*x3 end mle --verbose \end{code} Note that the \texttt{params} statement has been replaced by a series of \texttt{deriv} statements; these have the double function of identifying the parameters over which to optimize and providing an analytical expression for their respective score elements. \section{Debugging ML scripts} \label{sec:mle-debug} We have discussed above the main sorts of statements that are permitted within an \texttt{mle} block, namely % \begin{itemize} \item auxiliary commands to generate helper variables; \item \texttt{deriv} statements to specify the gradient with respect to each of the parameters; and \item a \texttt{params} statement to identify the parameters in case analytical derivatives are not given. \end{itemize} For the purpose of debugging ML estimators one additional sort of statement is allowed: you can print the value of a relevant variable at each step of the iteration. This facility is more restricted then the regular \texttt{print} command. The command word \texttt{print} should be followed by the name of just one variable (a scalar, series or matrix). In the last example above a key variable named \texttt{tmp} was generated, forming the basis for the analytical derivatives. To track the progress of this variable one could add a print statement within the ML block, as in % \begin{code} series tmp = dnorm(ndx)*(y/P - (1-y)/(1-P)) print tmp \end{code} \section{Using functions} \label{sec:mle-func} The \texttt{mle} command allows you to estimate models that \app{gretl} does not provide natively: in some cases, it may be a good idea to wrap up the \texttt{mle} block in a user-defined function (see Chapter \ref{chap:functions}), so as to extend \app{gretl}'s capabilities in a modular and flexible way. As an example, we will take a simple case of a model that \app{gretl} does not yet provide natively: the zero-inflated Poisson model, or ZIP for short.\footnote{The actual ZIP model is in fact a bit more general than the one presented here. The specialized version discussed in this section was chosen for the sake of simplicity. For futher details, see Greene (2003).} In this model, we assume that we observe a mixed population: for some individuals, the variable $y_t$ is (conditionally on a vector of exogenous covariates $x_t$) distributed as a Poisson random variate; for some others, $y_t$ is identically 0. The trouble is, we don't know which category a given individual belongs to. For instance, suppose we have a sample of women, and the variable $y_t$ represents the number of children that woman $t$ has. There may be a certain proportion, $\alpha$, of women for whom $y_t = 0$ with certainty (maybe out of a personal choice, or due to physical impossibility). But there may be other women for whom $y_t = 0$ just as a matter of chance --- they haven't happened to have any children at the time of observation. In formulae: \begin{eqnarray*} P(y_t = k | x_t) & = & \alpha d_t + (1 - \alpha) \left[e^{-\mu_t} \frac{\mu_t^{y_t}}{y_t!}\right] \\ \mu_t & = & \exp(x_t \beta) \\ d_t & = & \left\{ \begin{array}{ll} 1 & \mathrm{for} \quad y_t = 0 \\ 0 & \mathrm{for} \quad y_t > 0 \end{array} \right. \end{eqnarray*} Writing a \texttt{mle} block for this model is not difficult: \begin{code} mle ll = logprob series xb = exp(b0 + b1 * x) series d = (y=0) series poiprob = exp(-xb) * xb^y / gamma(y+1) series logprob = (alpha>0) && (alpha<1) ? \ log(alpha*d + (1-alpha)*poiprob) : NA params alpha b0 b1 end mle -v \end{code} However, the code above has to be modified each time we change our specification by, say, adding an explanatory variable. Using functions, we can simplify this task considerably and eventually be able to write something easy like \begin{code} list X = const x zip(y, X) \end{code} \begin{script}[htbp] \caption{Zero-inflated Poisson Model --- user-level function} \label{mle-zip-main} \begin{scode} /* user-level function: estimate the model and print out the results */ function zip(series y, list X) matrix ret = zip_estimate(y, X) matrix coef = ret[,1] matrix vcv = ret[,2:cols(ret)] printf "\nZero-inflated Poisson model:\n\n" scalar c = coef[1] scalar se = sqrt(vcv[1,1]) scalar zs = c/se scalar pv = 2*pvalue(n, zs) printf " alpha%9.4f%9.4f%8.3f%8.3f\n", c, se, zs, pv k = 2 loop foreach i X -q sprintf s "$i" scalar c = coef[k] scalar se = sqrt(vcv[k,k]) scalar zs = c/se scalar pv = 2*pvalue(n, zs) printf "%10s%9.4f%9.4f%8.3f%8.3f\n", s, c, se, zs, pv k++ end loop end function \end{scode} \end{script} %$ Let's see how this can be done. First we need to define a function called \texttt{zip()} that will take two arguments: a dependent variable \texttt{y} and a list of explanatory variables \texttt{X}. An example of such function can be seen in script~\ref{mle-zip-main}. By inspecting the function code, you can see that the actual estimation does not happen here: rather, the \texttt{zip()} function merely formats and prints out the results coming from another user-written function, namely \texttt{zip\_estimate()}. \begin{script}[htbp] \caption{Zero-inflated Poisson Model --- internal functions} \label{mle-zip-inc} \begin{scode} /* compute the log probabilities for the plain Poisson model */ function ln_poi_prob(series y, list X, matrix beta) series xb = lincomb(X, beta) series ret = -exp(xb) + y*xb - lngamma(y+1) return series ret end function /* compute the log probabilities for the zero-inflated Poisson model */ function ln_zip_prob(series y, list X, matrix beta, scalar p0) # check if the probability is in [0,1]; otherwise, return NA if (p0>1) || (p0<0) series ret = NA else series ret = ln_poi_prob(y, X, beta) + ln(1-p0) series ret = (y=0) ? ln(p0 + exp(ret)) : ret endif return series ret end function /* do the actual estimation (silently) */ function zip_estimate(series y, list X) # initialize alpha to a "sensible" value: half the frequency # of zeros in the sample scalar alpha = mean(y=0)/2 # initialize the coeffs (we assume the first explanatory # variable is the constant here) matrix coef = zeros(nelem(X), 1) coef[1] = mean(y) / (1-alpha) # do the actual ML estimation mle ll = ln_zip_prob(y, X, coef, alpha) params alpha coef end mle --hessian --quiet matrix ret = $coeff ~ $vcv return matrix ret end function \end{scode} \end{script} %$ The function \texttt{zip\_estimate()} is not meant to be executed directly; it just contains the number-crunching part of the job, whose results are then picked up by the end function \texttt{zip()}. In turn, \texttt{zip\_estimate()} calls other user-written functions to perform other tasks. The whole set of ``internal'' functions is shown in the panel \ref{mle-zip-inc}. All the functions shown in \ref{mle-zip-main} and \ref{mle-zip-inc} can be stored in a separate \texttt{inp} file and executed once, at the beginning of our job, by means of the \texttt{include} command. Assuming the name of this script file is \texttt{zip\_est.inp}, the following is an example script which \begin{itemize} \item includes the script file; \item generates a simulated dataset; \item performs the estimation of a ZIP model on the artificial data. \end{itemize} % \begin{script}[htbp] % \caption{Zero-inflated Poisson Model --- example \texttt{.inp} file} % \label{mle-zip-inp} % \begin{scode} \begin{code} set echo off set messages off # include the user-written functions include zip_est.inp # generate the artificial data nulldata 1000 set seed 732237 scalar truep = 0.2 scalar b0 = 0.2 scalar b1 = 0.5 series x = normal() series y = (uniform()<truep) ? 0 : genpois(exp(b0 + b1*x)) list X = const x # estimate the zero-inflated Poisson model zip(y, X) \end{code} % \end{scode} % \end{script} The results are as follows: \begin{code} Zero-inflated Poisson model: alpha 0.2031 0.0238 8.531 0.000 const 0.2570 0.0417 6.162 0.000 x 0.4667 0.0321 14.527 0.000 \end{code} A further step may then be creating a function package for accessing your new \texttt{zip()} function via \app{gretl}'s graphical interface. For details on how to do this, see section \ref{sec:func-packages}. %%% Local Variables: %%% mode: latex %%% TeX-master: "gretl-guide" %%% End: