\chapter{Discrete and censored dependent variables} \label{chap:discr-models} \section{Logit and probit models} \label{sec:logit-probit} It often happens that one wants to specify and estimate a model in which the dependent variable is not continuous, but discrete. A typical example is a model in which the dependent variable is the occupational status of an individual (1 = employed, 0 = unemployed). A convenient way of formalizing this situation is to consider the variable $y_i$ as a Bernoulli random variable and analyze its distribution conditional on the explanatory variables $x_i$. That is, % \begin{equation} \label{eq:qr-Bernoulli} y_i = \left\{ \begin{array}{ll} 1 & P_i \\ 0 & 1 - P_i \end{array} \right. \end{equation} % where $P_i = P(y_i = 1 | x_i) $ is a given function of the explanatory variables $x_i$. In most cases, the function $P_i$ is a cumulative distribution function $F$, applied to a linear combination of the $x_i$s. In the probit model, the normal cdf is used, while the logit model employs the logistic function $\Lambda()$. Therefore, we have % \begin{eqnarray} \label{eq:qr-link} \textrm{probit} & \qquad & P_i = F(z_i) = \Phi(z_i) \\ \textrm{logit} & \qquad & P_i = F(z_i) = \Lambda(z_i) = \frac{1}{1 + e^{-z_i}} \\ & &z_i = \sum_{j=1}^k x_{ij} \beta_j \end{eqnarray} % where $z_i$ is commonly known as the \emph{index} function. Note that in this case the coefficients $\beta_j$ cannot be interpreted as the partial derivatives of $E(y_i | x_i)$ with respect to $x_{ij}$. However, for a given value of $x_i$ it is possible to compute the vector of ``slopes'', that is \[ \mathrm{slope}_j(\bar{x}) = \left. \pder{F(z)}{x_j} \right|_{z = \bar{z}} \] \app{Gretl} automatically computes the slopes, setting each explanatory variable at its sample mean. Another, equivalent way of thinking about this model is in terms of an unobserved variable $y^*_i$ which can be described thus: % \begin{equation} \label{eq:qr-latent} y^*_i = \sum_{j=1}^k x_{ij} \beta_j + \varepsilon_i = z_i + \varepsilon_i \end{equation} % We observe $y_i = 1$ whenever $y^*_i > 0$ and $y_i = 0$ otherwise. If $\varepsilon_i$ is assumed to be normal, then we have the probit model. The logit model arises if we assume that the density function of $\varepsilon_i$ is % \[ \lambda(\varepsilon_i) = \pder{\Lambda(\varepsilon_i)}{\varepsilon_i} = \frac{e^{-\varepsilon_i}}{(1 + e^{-\varepsilon_i})^2} \] Both the probit and logit model are estimated in \app{gretl} via maximum likelihood, where the log-likelihood can be written as \begin{equation} \label{eq:qr-loglik} L(\beta) = \sum_{y_i=0} \ln [ 1 - F(z_i)] + \sum_{y_i=1} \ln F(z_i), \end{equation} which is always negative, since $0 < F(\cdot) < 1$. Since the score equations do not have a closed form solution, numerical optimization is used. However, in most cases this is totally transparent to the user, since usually only a few iterations are needed to ensure convergence. The \option{verbose} switch can be used to track the maximization algorithm. \begin{script}[htbp] \caption{Estimation of simple logit and probit models} \label{simple-QR} \begin{scode} open greene19_1 logit GRADE const GPA TUCE PSI probit GRADE const GPA TUCE PSI \end{scode} \end{script} As an example, we reproduce the results given in Greene (2000), chapter 21, where the effectiveness of a program for teaching economics is evaluated by the improvements of students' grades. Running the code in example \ref{simple-QR} gives the following output: \begin{code} Model 1: Logit estimates using the 32 observations 1-32 Dependent variable: GRADE VARIABLE COEFFICIENT STDERROR T STAT SLOPE (at mean) const -13.0213 4.93132 -2.641 GPA 2.82611 1.26294 2.238 0.533859 TUCE 0.0951577 0.141554 0.672 0.0179755 PSI 2.37869 1.06456 2.234 0.449339 Mean of GRADE = 0.344 Number of cases 'correctly predicted' = 26 (81.2%) f(beta'x) at mean of independent vars = 0.189 McFadden's pseudo-R-squared = 0.374038 Log-likelihood = -12.8896 Likelihood ratio test: Chi-square(3) = 15.4042 (p-value 0.001502) Akaike information criterion (AIC) = 33.7793 Schwarz Bayesian criterion (BIC) = 39.6422 Hannan-Quinn criterion (HQC) = 35.7227 Predicted 0 1 Actual 0 18 3 1 3 8 Model 2: Probit estimates using the 32 observations 1-32 Dependent variable: GRADE VARIABLE COEFFICIENT STDERROR T STAT SLOPE (at mean) const -7.45232 2.54247 -2.931 GPA 1.62581 0.693883 2.343 0.533347 TUCE 0.0517288 0.0838903 0.617 0.0169697 PSI 1.42633 0.595038 2.397 0.467908 Mean of GRADE = 0.344 Number of cases 'correctly predicted' = 26 (81.2%) f(beta'x) at mean of independent vars = 0.328 McFadden's pseudo-R-squared = 0.377478 Log-likelihood = -12.8188 Likelihood ratio test: Chi-square(3) = 15.5459 (p-value 0.001405) Akaike information criterion (AIC) = 33.6376 Schwarz Bayesian criterion (BIC) = 39.5006 Hannan-Quinn criterion (HQC) = 35.581 Predicted 0 1 Actual 0 18 3 1 3 8 \end{code} In this context, the \dollar{uhat} accessor function takes a special meaning: it returns generalized residuals as defined in Gourieroux \textit{et al} (1987), which can be interpreted as unbiased estimators of the latent disturbances $\varepsilon_t$. These are defined as % \begin{equation} \label{eq:QR-genres} u_i = \left\{ \begin{array}{ll} y_i - \hat{P}_i & \textrm{for the logit model} \\ y_i\cdot \frac{\phi(\hat{z}_i)}{\Phi(\hat{z}_i)} - ( 1 - y_i ) \cdot \frac{\phi(\hat{z}_i)}{1 - \Phi(\hat{z}_i)} & \textrm{for the probit model} \\ \end{array} \right. \end{equation} Among other uses, generalized residuals are often used for diagnostic purposes. For example, it is very easy to set up an omitted variables test equivalent to the familiar LM test in the context of a linear regression; example \ref{QR-add} shows how to perform a variable addition test. \begin{script}[htbp] \caption{Variable addition test in a probit model} \label{QR-add} \begin{scode} open greene19_1 probit GRADE const GPA PSI series u = $uhat %$ ols u const GPA PSI TUCE -q printf "Variable addition test for TUCE:\n" printf "Rsq * T = %g (p. val. = %g)\n", $trsq, pvalue(X,1,$trsq) \end{scode} \end{script} \subsection{The perfect prediction problem} \label{sec:perfpred} One curious characteristic of logit and probit models is that (quite paradoxically) estimation is not feasible if a model fits the data perfectly; this is called the \emph{perfect prediction problem}. The reason why this problem arises is easy to see by considering equation (\ref{eq:qr-loglik}): if for some vector $\beta$ and scalar $k$ it's the case that $z_i < k$ whenever $y_i=0$ and $z_i > k$ whenever $y_i=1$, the same thing is true for any multiple of $\beta$. Hence, $L(\beta)$ can be made arbitrarily close to 0 simply by choosing enormous values for $\beta$. As a consequence, the log-likelihood has no maximum, despite being bounded. \app{Gretl} has a mechanism for preventing the algorithm from iterating endlessly in search of a non-existent maximum. One sub-case of interest is when the perfect prediction problem arises because of a single binary explanatory variable. In this case, the offending variable is dropped from the model and estimation proceeds with the reduced specification. Nevertheless, it may happen that no single ``perfect classifier'' exists among the regressors, in which case estimation is simply impossible and the algorithm stops with an error. This behavior is triggered during the iteration process if \[ \stackunder{i: y_i = 0}{\max z_i} \, < \, \stackunder{i: y_i = 1}{\min z_i} \] If this happens, unless your model is trivially mis-specified (like predicting if a country is an oil exporter on the basis of oil revenues), it is normally a small-sample problem: you probably just don't have enough data to estimate your model. You may want to drop some of your explanatory variables. This problem is well analyzed in Stokes (2004); the results therein are replicated in the example script \texttt{murder\_rates.inp}. \section{Ordered response models} \label{sec:ordered} These models constitute a simple variation on ordinary logit/probit models, and are usually applied when the dependent variable is a discrete and ordered measurement --- not simply binary, but on an ordinal rather than an interval scale. For example, this sort of model may be applied when the dependent variable is a qualitative assessment such as ``Good'', ``Average'' and ``Bad''. In the general case, consider an ordered response variable, $y$, that can take on any of the $J+1$ values ${0,1,2,\dots,J}$. We suppose, as before, that underlying the observed response is a latent variable, \[ y^* = X\beta + \varepsilon = z + \varepsilon \] Now define ``cut points'', $\alpha_1 < \alpha_2 < \cdots < \alpha_J$, such that % \begin{equation*} \begin{array}{ll} y = 0 & \textrm{if } \, y^* \leq \alpha_1 \\ y = 1 & \textrm{if } \, \alpha_1 < y^* \leq \alpha_2 \\ \vdots \\ y = J & \textrm{if } \, y^* > \alpha_J \\ \end{array} \end{equation*} For example, if the response takes on three values there will be two such cut points, $\alpha_1$ and $\alpha_2$. The probability that individual $i$ exhibits response $j$, conditional on the characteristics $x_i$, is then given by % \begin{equation} \label{eq:QR-ordered} P(y_i = j \,|\, x_i) = \left\{ \begin{array}{ll} P(y^* \leq \alpha_1 \,|\, x_i) = F(\alpha_1 - z_i) & \textrm{for }\, j = 0 \\ P(\alpha_j < y^* \leq \alpha_{j+1} \,|\, x_i) = F(\alpha_{j+1} - z_i) - F(\alpha_j - z_i) & \textrm{for }\, 0 < j < J \\ P(y^* > \alpha_J \,|\, x_i) = 1 - F(\alpha_J - z_i) & \textrm{for }\, j = J \end{array} \right. \end{equation} % The unknown parameters $\alpha_j$ are estimated jointly with the $\beta$s via maximum likelihood. The $\hat{\alpha}_j$ estimates are reported by \app{gretl} as \texttt{cut1}, \texttt{cut2} and so on. In order to apply these models in \app{gretl}, the dependent variable must either take on only non-negative integer values, or be explicitly marked as discrete. (In case the variable has non-integer values, it will be recoded internally.) Note that \app{gretl} does not provide a separate command for ordered models: the \texttt{logit} and \texttt{probit} commands automatically estimate the ordered version if the dependent variable is acceptable, but not binary. Example \ref{ex:oprobit} reproduces the results presented in section 15.10 of Wooldridge (2002a). The question of interest in this analysis is what difference it makes, to the allocation of assets in pension funds, whether individual plan participants have a choice in the matter. The response variable is an ordinal measure of the weight of stocks in the pension portfolio. Having reported the results of estimation of the ordered model, Wooldridge illustrates the effect of the \texttt{choice} variable by reference to an ``average'' participant. The example script shows how one can compute this effect in \app{gretl}. \begin{script}[htbp] \caption{Ordered probit model} \label{ex:oprobit} \begin{scode} /* Replicate the results in Wooldridge, Econometric Analysis of Cross Section and Panel Data, section 15.10, using pension-plan data from Papke (AER, 1998). The dependent variable, pctstck (percent stocks), codes the asset allocation responses of "mostly bonds", "mixed" and "mostly stocks" as {0, 50, 100}. The independent variable of interest is "choice", a dummy indicating whether individuals are able to choose their own asset allocations. */ open pension.gdt # demographic characteristics of participant list DEMOG = age educ female black married # dummies coding for income level list INCOME = finc25 finc35 finc50 finc75 finc100 finc101 # Papke's OLS approach ols pctstck const choice DEMOG INCOME wealth89 prftshr # save the OLS choice coefficient choice_ols = $coeff(choice) # estimate ordered probit probit pctstck choice DEMOG INCOME wealth89 prftshr k = $ncoeff matrix b = $coeff[1:k-2] a1 = $coeff[k-1] a2 = $coeff[k] /* Wooldridge illustrates the 'choice' effect in the ordered probit by reference to a single, non-black male aged 60, with 13.5 years of education, income in the range $50K - $75K and wealth of $200K, participating in a plan with profit sharing. */ matrix X = {60, 13.5, 0, 0, 0, 0, 0, 0, 1, 0, 0, 200, 1} # with 'choice' = 0 scalar Xb = (0 ~ X) * b P0 = cdf(N, a1 - Xb) P50 = cdf(N, a2 - Xb) - P0 P100 = 1 - cdf(N, a2 - Xb) E0 = 50 * P50 + 100 * P100 # with 'choice' = 1 Xb = (1 ~ X) * b P0 = cdf(N, a1 - Xb) P50 = cdf(N, a2 - Xb) - P0 P100 = 1 - cdf(N, a2 - Xb) E1 = 50 * P50 + 100 * P100 printf "\nWith choice, E(y) = %.2f, without E(y) = %.2f\n", E1, E0 printf "Estimated choice effect via ML = %.2f (OLS = %.2f)\n", E1 - E0, choice_ols \end{scode} \end{script} After estimating ordered models, the \dollar{uhat} accessor yields generalized residuals as in binary models; additionally, the \dollar{yhat} accessor function returns $\hat{z}_i$, so it is possible to compute an unbiased estimator of the latent variable $y^*_i$ simply by adding the two together. \section{Multinomial logit} \label{sec:mlogit} When the dependent variable is not binary and does not have a natural ordering, \emph{multinomial} models are used. Multinomial logit is supported in \app{gretl} via the \verb|--multinomial| option to the \texttt{logit} command. Simple models can also be handled via the \texttt{mle} command (see chapter \ref{chap:mle}). We give here an example of such a model. Let the dependent variable, $y_i$, take on integer values $0,1,\dots p$. The probability that $y_i = k$ is given by \[ P(y_i = k | x_i) = \frac{\exp(x_i \beta_k)}{\sum_{j=0}^p \exp(x_i \beta_j)} \] For the purpose of identification one of the outcomes must be taken as the ``baseline''; it is usually assumed that $\beta_0 = 0$, in which case \[ P(y_i = k | x_i) = \frac{\exp(x_i \beta_k)}{1 + \sum_{j=1}^p \exp(x_i \beta_j)} \] and \[ P(y_i = 0 | x_i) = \frac{1}{1 + \sum_{j=1}^p \exp(x_i \beta_j)} . \] Example~\ref{ex:mlogit} reproduces Table 15.2 in Wooldridge (2002a), based on data on career choice from Keane and Wolpin (1997). The dependent variable is the occupational status of an individual (0 = in school; 1 = not in school and not working; 2 = working), and the explanatory variables are education and work experience (linear and square) plus a ``black'' binary variable. The full data set is a panel; here the analysis is confined to a cross-section for 1987. For explanations of the matrix methods employed in the script, see chapter~\ref{chap:matrices}. \begin{script}[htbp] \caption{Multinomial logit} \label{ex:mlogit} \begin{scode} function series mlogitlogprobs (series y, matrix X, matrix theta) scalar n = max(y) scalar k = cols(X) matrix b = mshape(theta,k,n) matrix tmp = X*b series ret = -ln(1 + sumr(exp(tmp))) loop for i=1..n --quiet series x = tmp[,i] ret += (y=$i)? x : 0 endloop return ret end function open keane.gdt # for the manual mle variant the dep. var. must be 0-based status = status - 1 # and we must exclude missing values smpl (year=87 && ok(status)) --restrict matrix X = { const, educ, exper, expersq, black } scalar k = cols(X) matrix theta = zeros(2*k, 1) mle loglik = mlogitlogprobs(status,X,theta) params theta end mle --hessian # Compare the built-in command (in this case we don't need # status to be 0-based, and NAs are handled correctly) smpl --full status = status + 1 smpl (year=87) --restrict logit status 0 educ exper expersq black --multinomial \end{scode} %$ \end{script} \section{The Tobit model} \label{sec:tobit} The Tobit model is used when the dependent variable of a model is \emph{censored}.\footnote{We assume here that censoring occurs from below at 0. Censoring from above, or at a point different from zero, can be rather easily handled by re-defining the dependent variable appropriately. For the more general case of two-sided censoring the \cmd{intreg} command may be used (see below).} Assume a latent variable $y^*_i$ can be described as % \[ y^*_i = \sum_{j=1}^k x_{ij} \beta_j + \varepsilon_i , \] % where $\varepsilon_i \sim N(0,\sigma^2)$. If $y^*_i$ were observable, the model's parameters could be estimated via ordinary least squares. On the contrary, suppose that we observe $y_i$, defined as % \begin{equation} \label{eq:tobit} y_i = \left\{ \begin{array}{ll} y^*_i & \mathrm{for} \quad y^*_i > 0 \\ 0 & \mathrm{for} \quad y^*_i \le 0 \end{array} \right. \end{equation} % In this case, regressing $y_i$ on the $x_i$s does not yield consistent estimates of the parameters $\beta$, because the conditional mean $E(y_i|x_i)$ is not equal to $\sum_{j=1}^k x_{ij} \beta_j$. It can be shown that restricting the sample to non-zero observations would not yield consistent estimates either. The solution is to estimate the parameters via maximum likelihood. The syntax is simply % \begin{code} tobit depvar indvars \end{code} As usual, progress of the maximization algorithm can be tracked via the \option{verbose} switch, while \dollar{uhat} returns the generalized residuals. Note that in this case the generalized residual is defined as $\hat{u}_i = E(\varepsilon_i | y_i = 0)$ for censored observations, so the familiar equality $\hat{u}_i = y_i - \hat{y}_i$ only holds for uncensored observations, that is, when $y_i>0$. An important difference between the Tobit estimator and OLS is that the consequences of non-normality of the disturbance term are much more severe: non-normality implies inconsistency for the Tobit estimator. For this reason, the output for the tobit model includes the Chesher--Irish (1987) normality test by default. \section{Interval regression} \label{sec:intreg} The interval regression model arises when the dependent variable is unobserved for some (possibly all) observations; what we observe instead is an interval in which the dependent variable lies. In other words, the data generating process is assumed to be \[ y^*_i = x_i \beta + \epsilon_i \] but we only know that $m_i \le y^*_i \le M_i$, where the interval may be left- or right-unbounded (but not both). If $m_i = M_i$, we effectively observe $y^*_i$ and no information loss occurs. In practice, each observation belongs to one of four categories: \begin{enumerate} \item left-unbounded, when $m_i = -\infty$, \item right-unbounded, when $M_i = \infty$, \item bounded, when $-\infty < m_i < M_i <\infty$ and \item point observations when $m_i = M_i$. \end{enumerate} It is interesting to note that this model bears similarities to other models in several special cases: \begin{itemize} \item When all observations are point observations the model trivially reduces to the ordinary linear regression model. \item When $m_i = M_i$ when $y^*_i> 0$, while $m_i = -\infty$ and $M_i = 0$ otherwise, we have the Tobit model (see \ref{sec:tobit}). \item The interval model could be thought of an ordered probit model (see \ref{sec:ordered}) in which the cut points (the $\alpha_j$ coefficients in eq. \ref{eq:QR-ordered}) are observed and don't need to be estimated. \end{itemize} The \app{gretl} command \texttt{intreg} estimates interval models by maximum likelihood, assuming normality of the disturbance term $\epsilon_i$. Its syntax is % \begin{code} intreg minvar maxvar X \end{code} % where \texttt{minvar} contains the $m_i$ series, with \texttt{NA}s for left-unbounded observations, and \texttt{maxvar} contains $M_i$, with \texttt{NA}s for right-unbounded observations. By default, standard errors are computed using the negative inverse of the Hessian. If the \option{robust} flag is given, then QML or Huber--White standard errors are calculated instead. In this case the estimated covariance matrix is a ``sandwich'' of the inverse of the estimated Hessian and the outer product of the gradient. If the model specification contains regressors other than just a constant, the output includes a chi-square statistic for testing the joint null hypothesis that none of these regressors has any effect on the outcome. This is a Wald statistic based on the estimated covariance matrix. If you wish to construct a likelihood ratio test, this is easily done by estimating both the full model and the null model (containing only the constant), saving the log-likelihood in both cases via the \dollar{lnl} accessor, and then referring twice the difference between the two log-likelihoods to the chi-square distribution with $k$ degrees of freedom, where $k$ is the number of additional regressors (see the \texttt{pvalue} command in the \GCR). An example is contained in the sample script \texttt{wtp.inp}, provided with the \app{gretl} distribution. \begin{script}[htbp] \caption{Interval model on artificial data} \label{ex:interval} Input: \begin{scodebit} nulldata 100 # generate artificial data set seed 201449 x = normal() epsilon = 0.2*normal() ystar = 1 + x + epsilon lo_bound = floor(ystar) hi_bound = ceil(ystar) # run the interval model intreg lo_bound hi_bound const x # estimate ystar gen_resid = $uhat yhat = $yhat + gen_resid corr ystar yhat \end{scodebit} Output (selected portions): \begin{scodebit} Model 1: Interval estimates using the 100 observations 1-100 Lower limit: lo_bound, Upper limit: hi_bound coefficient std. error t-ratio p-value --------------------------------------------------------- const 0.993762 0.0338325 29.37 1.22e-189 *** x 0.986662 0.0319959 30.84 8.34e-209 *** Chi-square(1) 950.9270 p-value 8.3e-209 Log-likelihood -44.21258 Akaike criterion 94.42517 Schwarz criterion 102.2407 Hannan-Quinn 97.58824 sigma = 0.223273 Left-unbounded observations: 0 Right-unbounded observations: 0 Bounded observations: 100 Point observations: 0 ... corr(ystar, yhat) = 0.98960092 Under the null hypothesis of no correlation: t(98) = 68.1071, with two-tailed p-value 0.0000 \end{scodebit} \end{script} As with the probit and Tobit models, after a model has been estimated the \dollar{uhat} accessor returns the generalized residual, which is an estimate of $\epsilon_i$: more precisely, it equals $y_i - x_i \hat{\beta}$ for point observations and $E(\epsilon_i| m_i, M_i, x_i)$ otherwise. Note that it is possible to compute an unbiased predictor of $y^*_i$ by summing this estimate to $x_i \hat{\beta}$. Script \ref{ex:interval} shows an example. As a further similarity with Tobit, the interval regression model may deliver inconsistent estimates if the disturbances are non-normal; hence, the Chesher--Irish (1987) test for normality is included by default here too. \section{Sample selection model} \label{sec:heckit} In the sample selection model (also known as ``Tobit II'' model), there are two latent variables: % \begin{eqnarray} \label{eq:heckit1} y^*_i & = & \sum_{j=1}^k x_{ij} \beta_j + \varepsilon_i \\ \label{eq:heckit2} s^*_i & = & \sum_{j=1}^p z_{ij} \gamma_j + \eta_i \end{eqnarray} % and the observation rule is given by % \begin{equation} \label{eq:tobitII} y_i = \left\{ \begin{array}{ll} y^*_i & \mathrm{for} \quad s^*_i > 0 \\ \diamondsuit & \mathrm{for} \quad s^*_i \le 0 \end{array} \right. \end{equation} In this context, the $\diamondsuit$ symbol indicates that for some observations we simply do not have data on $y$: $y_i$ may be 0, or missing, or anything else. A dummy variable $d_i$ is normally used to set censored observations apart. One of the most popular applications of this model in econometrics is a wage equation coupled with a labor force participation equation: we only observe the wage for the employed. If $y^*_i$ and $s^*_i$ were (conditionally) independent, there would be no reason not to use OLS for estimating equation (\ref{eq:heckit1}); otherwise, OLS does not yield consistent estimates of the parameters $\beta_j$. Since conditional independence between $y^*_i$ and $s^*_i$ is equivalent to conditional independence between $\varepsilon_i$ and $\eta_i$, one may model the co-dependence between $\varepsilon_i$ and $\eta_i$ as \[ \varepsilon_i = \lambda \eta_i + v_i ; \] substituting the above expression in (\ref{eq:heckit1}), you obtain the model that is actually estimated: \[ y_i = \sum_{j=1}^k x_{ij} \beta_j + \lambda \hat{\eta}_i + v_i , \] so the hypothesis that censoring does not matter is equivalent to the hypothesis $H_0: \lambda = 0$, which can be easily tested. The parameters can be estimated via maximum likelihood under the assumption of joint normality of $\varepsilon_i$ and $\eta_i$; however, a widely used alternative method yields the so-called \emph{Heckit} estimator, named after Heckman (1979). The procedure can be briefly outlined as follows: first, a probit model is fit on equation (\ref{eq:heckit2}); next, the generalized residuals are inserted in equation (\ref{eq:heckit1}) to correct for the effect of sample selection. \app{Gretl} provides the \texttt{heckit} command to carry out estimation; its syntax is % \begin{code} heckit y X ; d Z \end{code} % where \texttt{y} is the dependent variable, \texttt{X} is a list of regressors, \texttt{d} is a dummy variable holding 1 for uncensored observations and \texttt{Z} is a list of explanatory variables for the censoring equation. Since in most cases maximum likelihood is the method of choice, by default \app{gretl} computes ML estimates. The 2-step Heckit estimates can be obtained by using the \option{two-step} option. After estimation, the \dollar{uhat} accessor contains the generalized residuals. As in the ordinary Tobit model, the residuals equal the difference between actual and fitted $y_i$ only for uncensored observations (those for which $d_i = 1$). Example \ref{ex:heckit} shows two estimates from the dataset used in Mroz (1987): the first one replicates Table 22.7 in Greene (2003),\footnote{Note that the estimates given by \app{gretl} do not coincide with those found in the printed volume. They do, however, match those found on the errata web page for Greene's book: \url{http://pages.stern.nyu.edu/~wgreene/Text/Errata/ERRATA5.htm}.} while the second one replicates table 17.1 in Wooldridge (2002a). \begin{script}[htbp] \caption{Heckit model} \label{ex:heckit} \begin{scode} open mroz87.gdt genr EXP2 = AX^2 genr WA2 = WA^2 genr KIDS = (KL6+K618)>0 # Greene's specification list X = const AX EXP2 WE CIT list Z = const WA WA2 FAMINC KIDS WE heckit WW X ; LFP Z --two-step heckit WW X ; LFP Z # Wooldridge's specification series NWINC = FAMINC - WW*WHRS series lww = log(WW) list X = const WE AX EXP2 list Z = X NWINC WA KL6 K618 heckit lww X ; LFP Z --two-step \end{scode} \end{script} % \section{Count data} % \label{sec:poisson} % also include example script for negative binomial (done in Verbeek % example files). %%% Local Variables: %%% mode: latex %%% TeX-master: "gretl-guide" %%% End: