\chapter{Discrete and censored dependent variables}
\label{chap:discr-models}

\section{Logit and probit models}
\label{sec:logit-probit}

It often happens that one wants to specify and estimate a model in
which the dependent variable is not continuous, but discrete. A
typical example is a model in which the dependent variable is the
occupational status of an individual (1 = employed, 0 = unemployed). A
convenient way of formalizing this situation is to consider the
variable $y_i$ as a Bernoulli random variable and analyze its
distribution conditional on the explanatory variables $x_i$.  That is,
%
\begin{equation}
  \label{eq:qr-Bernoulli}
  y_i = \left\{ 
    \begin{array}{ll} 
      1 & P_i \\ 0 & 1 - P_i 
    \end{array}
    \right.
\end{equation}
%
where $P_i = P(y_i = 1 | x_i) $ is a given function of the explanatory
variables $x_i$.

In most cases, the function $P_i$ is a cumulative distribution
function $F$, applied to a linear combination of the $x_i$s. In the
probit model, the normal cdf is used, while the logit model employs
the logistic function $\Lambda()$. Therefore, we have
%
\begin{eqnarray}
  \label{eq:qr-link}
  \textrm{probit} & \qquad & P_i = F(z_i) = \Phi(z_i)  \\
  \textrm{logit}  & \qquad & P_i = F(z_i) = \Lambda(z_i) = \frac{1}{1 + e^{-z_i}} \\
  & &z_i = \sum_{j=1}^k x_{ij} \beta_j
\end{eqnarray}
%
where $z_i$ is commonly known as the \emph{index} function. Note that
in this case the coefficients $\beta_j$ cannot be interpreted as the
partial derivatives of $E(y_i | x_i)$ with respect to
$x_{ij}$.  However, for a given value of $x_i$ it is possible to
compute the vector of ``slopes'', that is
\[
  \mathrm{slope}_j(\bar{x}) = \left. \pder{F(z)}{x_j} \right|_{z =
    \bar{z}}
\]
\app{Gretl} automatically computes the slopes, setting each
explanatory variable at its sample mean.
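For instance, the reported slope for a given regressor can be
reproduced by hand by evaluating the density at $\bar{z}$ and
multiplying by the corresponding coefficient. The following minimal
sketch does this for the logit model of example \ref{simple-QR} below,
writing out the logistic density explicitly:
%
\begin{code}
open greene19_1
logit GRADE const GPA TUCE PSI
# index function evaluated at the sample means of the regressors
scalar zbar = $coeff(const) + $coeff(GPA)*mean(GPA) \
  + $coeff(TUCE)*mean(TUCE) + $coeff(PSI)*mean(PSI)
# logistic density at zbar, times the GPA coefficient
scalar slope_GPA = exp(-zbar)/(1 + exp(-zbar))^2 * $coeff(GPA)
printf "slope of GPA at the means = %g\n", slope_GPA
\end{code}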

Another, equivalent way of thinking about this model is in terms of
an unobserved variable $y^*_i$ which can be described thus:
%
\begin{equation}
  \label{eq:qr-latent}
  y^*_i = \sum_{j=1}^k x_{ij} \beta_j + \varepsilon_i = z_i  +
  \varepsilon_i 
\end{equation}
%
We observe $y_i = 1$ whenever $y^*_i > 0$ and $y_i = 0$ otherwise. If
$\varepsilon_i$ is assumed to be normal, then we have the probit
model. The logit model arises if we assume that the density function
of $\varepsilon_i$ is
%
\[
  \lambda(\varepsilon_i) =
  \pder{\Lambda(\varepsilon_i)}{\varepsilon_i} =
  \frac{e^{-\varepsilon_i}}{(1 + e^{-\varepsilon_i})^2}
\]

Both the probit and logit models are estimated in \app{gretl} via
maximum likelihood, where the log-likelihood can be written as
\begin{equation}
  \label{eq:qr-loglik}
  L(\beta) = \sum_{y_i=0} \ln [ 1 - F(z_i)] + \sum_{y_i=1} \ln F(z_i),
\end{equation}
which is always negative, since $0 < F(\cdot) < 1$.  Because the score
equations do not have a closed-form solution, numerical optimization
is used. However, in most cases this is entirely transparent to the
user, as only a few iterations are usually needed to ensure
convergence. The \option{verbose} switch can be used to track the
maximization algorithm.
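For instance, using the dataset of example \ref{simple-QR} below, one
might write (a trivial illustration):
%
\begin{code}
open greene19_1
# print the details of each iteration of the ML algorithm
logit GRADE const GPA TUCE PSI --verbose
\end{code}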

\begin{script}[htbp]
  \caption{Estimation of simple logit and probit models}
  \label{simple-QR}
\begin{scode}
open greene19_1

logit GRADE const GPA TUCE PSI
probit GRADE const GPA TUCE PSI
\end{scode}
\end{script}

As an example, we reproduce the results given in Greene (2000),
chapter 21, where the effectiveness of a program for teaching
economics is evaluated by the improvement in students' grades.
Running the code in example \ref{simple-QR} gives the following output:
\begin{code}

Model 1: Logit estimates using the 32 observations 1-32
Dependent variable: GRADE

      VARIABLE       COEFFICIENT        STDERROR      T STAT       SLOPE
                                                                  (at mean)
  const               -13.0213           4.93132      -2.641
  GPA                   2.82611          1.26294       2.238      0.533859   
  TUCE                  0.0951577        0.141554      0.672      0.0179755  
  PSI                   2.37869          1.06456       2.234      0.449339   

  Mean of GRADE = 0.344
  Number of cases 'correctly predicted' = 26 (81.2%)
  f(beta'x) at mean of independent vars = 0.189
  McFadden's pseudo-R-squared = 0.374038
  Log-likelihood = -12.8896
  Likelihood ratio test: Chi-square(3) = 15.4042 (p-value 0.001502)
  Akaike information criterion (AIC) = 33.7793
  Schwarz Bayesian criterion (BIC) = 39.6422
  Hannan-Quinn criterion (HQC) = 35.7227

           Predicted
             0    1
  Actual 0  18    3
         1   3    8

Model 2: Probit estimates using the 32 observations 1-32
Dependent variable: GRADE

      VARIABLE       COEFFICIENT        STDERROR      T STAT       SLOPE
                                                                  (at mean)
  const                -7.45232          2.54247      -2.931
  GPA                   1.62581          0.693883      2.343      0.533347   
  TUCE                  0.0517288        0.0838903     0.617      0.0169697  
  PSI                   1.42633          0.595038      2.397      0.467908   

  Mean of GRADE = 0.344
  Number of cases 'correctly predicted' = 26 (81.2%)
  f(beta'x) at mean of independent vars = 0.328
  McFadden's pseudo-R-squared = 0.377478
  Log-likelihood = -12.8188
  Likelihood ratio test: Chi-square(3) = 15.5459 (p-value 0.001405)
  Akaike information criterion (AIC) = 33.6376
  Schwarz Bayesian criterion (BIC) = 39.5006
  Hannan-Quinn criterion (HQC) = 35.581

           Predicted
             0    1
  Actual 0  18    3
         1   3    8

\end{code}

In this context, the \dollar{uhat} accessor takes on a special
meaning: it returns the generalized residuals as defined in
Gourieroux \textit{et al.} (1987), which can be interpreted as unbiased
estimators of the latent disturbances $\varepsilon_i$. These are
defined as
%
\begin{equation}
  \label{eq:QR-genres}
  u_i = \left\{
    \begin{array}{ll}
      y_i - \hat{P}_i & \textrm{for the logit model} \\
      y_i\cdot \frac{\phi(\hat{z}_i)}{\Phi(\hat{z}_i)} - 
      ( 1 - y_i ) \cdot \frac{\phi(\hat{z}_i)}{1 - \Phi(\hat{z}_i)}
      & \textrm{for the probit model} \\
    \end{array}
    \right.
\end{equation}
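As a quick check, the probit branch of (\ref{eq:QR-genres}) can be
reconstructed by hand and compared with the output of \dollar{uhat}.
A minimal sketch, again based on the model of example \ref{simple-QR}:
%
\begin{code}
open greene19_1
probit GRADE const GPA TUCE PSI
# rebuild the index function from the estimated coefficients
series z = $coeff(const) + $coeff(GPA)*GPA \
  + $coeff(TUCE)*TUCE + $coeff(PSI)*PSI
series u = $uhat
series gres = GRADE*dnorm(z)/cnorm(z) - (1-GRADE)*dnorm(z)/(1-cnorm(z))
# u and gres should coincide up to numerical precision
print u gres --byobs
\end{code}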

Among other uses, generalized residuals are often used for diagnostic
purposes.  For example, it is very easy to set up an omitted variables
test equivalent to the familiar LM test in the context of a linear
regression; example \ref{QR-add} shows how to perform a variable
addition test.

\begin{script}[htbp]
  \caption{Variable addition test in a probit model}
  \label{QR-add}
\begin{scode}
open greene19_1

probit GRADE const GPA PSI
series u = $uhat 
ols u const GPA PSI TUCE -q
printf "Variable addition test for TUCE:\n"
printf "Rsq * T = %g (p. val. = %g)\n", $trsq, pvalue(X,1,$trsq) 
\end{scode}
\end{script}

\subsection{The perfect prediction problem}
\label{sec:perfpred}

One curious characteristic of logit and probit models is that (quite
paradoxically) estimation is not feasible if a model fits the data
perfectly; this is called the \emph{perfect prediction problem}. The
reason why this problem arises is easy to see by considering equation
(\ref{eq:qr-loglik}): if for some vector $\beta$ and scalar $k$ it's
the case that $z_i < k$ whenever $y_i=0$ and $z_i > k$ whenever
$y_i=1$, the same is true when $\beta$ (and $k$) are scaled up by any
positive factor; moreover, if the model includes a constant, $k$ can
be absorbed into the intercept. Hence, $L(\beta)$ can be made
arbitrarily close to 0 simply by choosing enormous values for
$\beta$. As a consequence, the log-likelihood has no maximum, despite
being bounded from above.

\app{Gretl} has a mechanism for preventing the algorithm from
iterating endlessly in search of a non-existent maximum. One sub-case
of interest is when the perfect prediction problem arises because of
a single binary explanatory variable. In this case, the offending
variable is dropped from the model and estimation proceeds with the
reduced specification. However, it may happen that no single
``perfect classifier'' exists among the regressors, in which case
estimation is simply impossible and the algorithm stops with an
error. This behavior is triggered during the iteration process if
\[
  \max_{i:\, y_i = 0} z_i \, < \, \min_{i:\, y_i = 1} z_i
\]
If this happens, unless your model is trivially mis-specified (like
predicting whether a country is an oil exporter on the basis of oil
revenues), it is normally a small-sample problem: you probably just
don't have enough data to estimate your model. You may want to drop
some of your explanatory variables.

This problem is well analyzed in Stokes (2004); the results therein
are replicated in the example script \texttt{murder\_rates.inp}. 

\section{Ordered response models}
\label{sec:ordered}

These models constitute a simple variation on ordinary logit/probit
models, and are usually applied when the dependent variable is a
discrete and ordered measurement --- not simply binary, but on an
ordinal rather than an interval scale.  For example, this sort of
model may be applied when the dependent variable is a qualitative
assessment such as ``Good'', ``Average'' and ``Bad''.  

In the general case, consider an ordered response variable, $y$, that
can take on any of the $J+1$ values $\{0,1,2,\dots,J\}$.  We suppose, as
before, that underlying the observed response is a latent variable,
\[
  y^* = X\beta + \varepsilon = z + \varepsilon
\]
Now define ``cut points'', $\alpha_1 < \alpha_2 < \cdots < \alpha_J$,
such that
%
\begin{equation*}
  \begin{array}{ll}
    y = 0 & \textrm{if } \, y^* \leq \alpha_1 \\
    y = 1 & \textrm{if } \, \alpha_1 < y^* \leq \alpha_2 \\
    \vdots \\
    y = J & \textrm{if } \, y^* > \alpha_J \\
  \end{array}
\end{equation*}

For example, if the response takes on three values there will be two
such cut points, $\alpha_1$ and $\alpha_2$.

The probability that individual $i$ exhibits response $j$, conditional
on the characteristics $x_i$, is then given by
%
\begin{equation}
  \label{eq:QR-ordered}
  P(y_i = j \,|\, x_i) = \left\{
    \begin{array}{ll}
      P(y^* \leq \alpha_1 \,|\, x_i) = F(\alpha_1 - z_i) & \textrm{for }\, j = 0 \\
      P(\alpha_j < y^* \leq \alpha_{j+1} \,|\, x_i) = 
      F(\alpha_{j+1} - z_i) -  F(\alpha_j - z_i) & \textrm{for }\, 0 < j < J \\
      P(y^* > \alpha_J \,|\, x_i) = 1 -  F(\alpha_J - z_i) & \textrm{for }\, j = J 
    \end{array}
    \right.
\end{equation}
%
The unknown parameters $\alpha_j$ are estimated jointly with the
$\beta$s via maximum likelihood.  The $\hat{\alpha}_j$ estimates are
reported by \app{gretl} as \texttt{cut1}, \texttt{cut2} and so on.

In order to apply these models in \app{gretl}, the dependent variable
must either take on only non-negative integer values, or be explicitly
marked as discrete.  (In case the variable has non-integer values, it
will be recoded internally.)  Note that \app{gretl} does not provide a
separate command for ordered models: the \texttt{logit} and
\texttt{probit} commands automatically estimate the ordered version if
the dependent variable is acceptable, but not binary.
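For instance, if a hypothetical response \texttt{y} were coded as 1.5,
2.5 and 3.5, one would mark it as discrete before estimation (the
regressor names below are also purely illustrative):
%
\begin{code}
discrete y
# estimated as an ordered probit, since y is discrete but not binary
probit y const x1 x2
\end{code}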

Example \ref{ex:oprobit} reproduces the results presented in section
15.10 of Wooldridge (2002a).  The question of interest in this
analysis is what difference it makes, to the allocation of assets
in pension funds, whether individual plan participants have a
choice in the matter.  The response variable is an ordinal measure of
the weight of stocks in the pension portfolio.  Having reported the
results of estimation of the ordered model, Wooldridge illustrates the
effect of the \texttt{choice} variable by reference to an ``average''
participant.  The example script shows how one can compute this effect
in \app{gretl}.  

\begin{script}[htbp]
  \caption{Ordered probit model}
  \label{ex:oprobit}
\begin{scode}
/*
  Replicate the results in Wooldridge, Econometric Analysis of Cross
  Section and Panel Data, section 15.10, using pension-plan data from
  Papke (AER, 1998).

  The dependent variable, pctstck (percent stocks), codes the asset
  allocation responses of "mostly bonds", "mixed" and "mostly stocks"
  as {0, 50, 100}.

  The independent variable of interest is "choice", a dummy indicating
  whether individuals are able to choose their own asset allocations.
*/

open pension.gdt

# demographic characteristics of participant
list DEMOG = age educ female black married
# dummies coding for income level
list INCOME = finc25 finc35 finc50 finc75 finc100 finc101

# Papke's OLS approach
ols pctstck const choice DEMOG INCOME wealth89 prftshr
# save the OLS choice coefficient 
choice_ols = $coeff(choice)

# estimate ordered probit
probit pctstck choice DEMOG INCOME wealth89 prftshr

k = $ncoeff
matrix b = $coeff[1:k-2]
a1 = $coeff[k-1]
a2 = $coeff[k]

/* 
   Wooldridge illustrates the 'choice' effect in the ordered probit 
   by reference to a single, non-black male aged 60, with 13.5 years 
   of education, income in the range $50K - $75K and wealth of $200K, 
   participating in a plan with profit sharing.
*/
matrix X = {60, 13.5, 0, 0, 0, 0, 0, 0, 1, 0, 0, 200, 1}

# with 'choice' = 0
scalar Xb = (0 ~ X) * b
P0 = cdf(N, a1 - Xb)
P50 = cdf(N, a2 - Xb) - P0
P100 = 1 - cdf(N, a2 - Xb)
E0 = 50 * P50 + 100 * P100

# with 'choice' = 1
Xb = (1 ~ X) * b
P0 = cdf(N, a1 - Xb)
P50 = cdf(N, a2 - Xb) - P0
P100 = 1 - cdf(N, a2 - Xb)
E1 = 50 * P50 + 100 * P100

printf "\nWith choice, E(y) = %.2f, without E(y) = %.2f\n", E1, E0
printf "Estimated choice effect via ML = %.2f (OLS = %.2f)\n", E1 - E0,
  choice_ols
\end{scode}
\end{script}

After estimating ordered models, the \dollar{uhat} accessor yields
generalized residuals as in binary models; additionally, the
\dollar{yhat} accessor function returns $\hat{z}_i$, so it is
possible to compute an unbiased estimator of the latent variable
$y^*_i$ simply by adding the two together.
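For example, continuing from the ordered probit of example
\ref{ex:oprobit}, a sketch of this computation is:
%
\begin{code}
probit pctstck choice DEMOG INCOME wealth89 prftshr
series ndx = $yhat             # estimate of the index z_i
series ystar_hat = ndx + $uhat # estimate of the latent variable y*
\end{code}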

\section{Multinomial logit}
\label{sec:mlogit}

When the dependent variable is not binary and does not have a natural
ordering, \emph{multinomial} models are used.  Multinomial logit is
supported in \app{gretl} via the \verb|--multinomial| option to the
\texttt{logit} command.  Simple models can also be handled via the
\texttt{mle} command (see chapter \ref{chap:mle}). We give here an
example of such a model.  Let the dependent variable, $y_i$, take on
integer values $0,1,\dots,p$.  The probability that $y_i = k$ is given
by
\[
  P(y_i = k |  x_i) = \frac{\exp(x_i \beta_k)}{\sum_{j=0}^p \exp(x_i \beta_j)}
\]
For the purpose of identification one of the outcomes must be taken as
the ``baseline''; it is usually assumed that $\beta_0 = 0$, in which case
\[
  P(y_i = k |  x_i) = \frac{\exp(x_i \beta_k)}{1 + \sum_{j=1}^p \exp(x_i \beta_j)} 
\]
and
\[
  P(y_i = 0 |  x_i) = \frac{1}{1 + \sum_{j=1}^p \exp(x_i \beta_j)} .
\]
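As a purely numerical illustration of the two formulas above, the
following sketch computes the response probabilities for a single
observation, with made-up values for $x_i$ and the $\beta_j$ vectors
(here $p = 2$):
%
\begin{code}
matrix x = {1, 0.5}               # x_i: a constant plus one regressor
matrix B = {0.2, -0.7; 1.1, 0.4}  # columns hold beta_1 and beta_2
matrix num = exp(x*B)             # exp(x_i beta_1), exp(x_i beta_2)
scalar den = 1 + num[1] + num[2]
printf "P(y=0) = %g\n", 1/den
printf "P(y=1) = %g\n", num[1]/den
printf "P(y=2) = %g\n", num[2]/den
\end{code}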

Example~\ref{ex:mlogit} reproduces Table 15.2 in Wooldridge (2002a),
based on data on career choice from Keane and Wolpin (1997).  The
dependent variable is the occupational status of an individual (0 = in
school; 1 = not in school and not working; 2 = working), and the
explanatory variables are education and work experience (in linear and
squared form) plus a ``black'' binary variable.  The full data set is a
panel; here the analysis is confined to a cross-section for 1987.  For
explanations of the matrix methods employed in the script, see
chapter~\ref{chap:matrices}.

\begin{script}[htbp]
  \caption{Multinomial logit}
  \label{ex:mlogit}
\begin{scode}
function series mlogitlogprobs (series y, matrix X, matrix theta)
  scalar n = max(y)
  scalar k = cols(X)
  matrix b = mshape(theta,k,n)
  matrix tmp = X*b
  series ret = -ln(1 + sumr(exp(tmp)))

  loop for i=1..n --quiet
    series x = tmp[,i]
    ret += (y=$i)? x : 0 
  endloop

  return ret
end function

open keane.gdt

# for the manual mle variant the dep. var. must be 0-based
status = status - 1
# and we must exclude missing values
smpl (year=87 && ok(status)) --restrict

matrix X = { const, educ, exper, expersq, black }
scalar k = cols(X)
matrix theta = zeros(2*k, 1)

mle loglik = mlogitlogprobs(status,X,theta)
  params theta
end mle --hessian

# Compare the built-in command (in this case we don't need
# status to be 0-based, and NAs are handled correctly)
smpl --full
status = status + 1
smpl (year=87) --restrict
logit status 0 educ exper expersq black --multinomial
\end{scode}
%$
\end{script}


\section{The Tobit model}
\label{sec:tobit}

The Tobit model is used when the dependent variable of a model is
\emph{censored}.\footnote{We assume here that censoring occurs from
  below at 0. Censoring from above, or at a point different from zero,
  can be rather easily handled by re-defining the dependent variable
  appropriately. For the more general case of two-sided censoring the
  \cmd{intreg} command may be used (see below).}  
Assume a latent variable $y^*_i$ can be described
as
%
\[
  y^*_i = \sum_{j=1}^k x_{ij} \beta_j + \varepsilon_i ,
\]
%
where $\varepsilon_i \sim N(0,\sigma^2)$. If $y^*_i$ were observable,
the model's parameters could be estimated via ordinary least squares.
Suppose, instead, that we observe $y_i$, defined as
%
\begin{equation}
  \label{eq:tobit}
  y_i = \left\{ 
    \begin{array}{ll} 
      y^*_i & \mathrm{for} \quad y^*_i > 0 \\ 
      0 & \mathrm{for} \quad y^*_i \le 0 
    \end{array}
    \right. 
\end{equation}
%
In this case, regressing $y_i$ on the $x_i$s does not yield
consistent estimates of the parameters $\beta$, because the
conditional mean $E(y_i|x_i)$ is not equal to $\sum_{j=1}^k x_{ij}
\beta_j$.  It can be shown that restricting the sample to non-zero
observations would not yield consistent estimates either. The solution
is to estimate the parameters via maximum likelihood. The syntax is
simply
%
\begin{code}
tobit depvar indvars
\end{code}
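The following artificial-data sketch (all names are illustrative)
contrasts OLS on the whole sample, OLS on the non-zero observations
only and Tobit. The data are generated with a true slope of 1, which
only the Tobit estimate should recover:
%
\begin{code}
nulldata 500
set seed 3141
series x = normal()
series ystar = 1 + x + normal()
series y = (ystar > 0) ? ystar : 0
# OLS on the full sample: the slope is biased towards zero
ols y const x
# OLS on the uncensored observations only: still inconsistent
smpl y > 0 --restrict
ols y const x
# Tobit on the full sample
smpl --full
tobit y const x
\end{code}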

As usual, progress of the maximization algorithm can be tracked via
the \option{verbose} switch, while \dollar{uhat} returns the
generalized residuals. Note that in this case the generalized residual
is defined as $\hat{u}_i = E(\varepsilon_i | y_i = 0)$ for censored
observations, so the familiar equality $\hat{u}_i = y_i - \hat{y}_i$
only holds for uncensored observations, that is, when $y_i>0$.

An important difference between the Tobit estimator and OLS is that
the consequences of non-normality of the disturbance term are much
more severe: non-normality implies inconsistency for the Tobit
estimator. For this reason, the output for the Tobit model includes
the Chesher--Irish (1987) normality test by default.

\section{Interval regression}
\label{sec:intreg}

The interval regression model arises when the dependent variable is
unobserved for some (possibly all) observations; what we observe
instead is an interval in which the dependent variable lies.  In other
words, the data generating process is assumed to be
\[
  y^*_i = x_i \beta + \epsilon_i
\] 
but we only know that $m_i \le y^*_i \le M_i$, where the interval may
be left- or right-unbounded (but not both). If $m_i = M_i$, we
effectively observe $y^*_i$ and no information loss occurs. In
practice, each observation belongs to one of four categories:
\begin{enumerate}
\item left-unbounded, when $m_i = -\infty$,
\item right-unbounded, when $M_i = \infty$,
\item bounded, when $-\infty < m_i < M_i <\infty$ and
\item point observations when $m_i = M_i$.
\end{enumerate}

It is interesting to note that this model bears similarities to other
models in several special cases:
\begin{itemize}
\item When all observations are point observations the model trivially
  reduces to the ordinary linear regression model.
\item When $m_i = M_i$ whenever $y^*_i > 0$, while $m_i = -\infty$ and
  $M_i = 0$ otherwise, we have the Tobit model (see \ref{sec:tobit} and
  the sketch below).
\item The interval model can be thought of as an ordered probit model
  (see \ref{sec:ordered}) in which the cut points (the $\alpha_j$
  coefficients in eq. \ref{eq:QR-ordered}) are observed and need not
  be estimated.
\end{itemize}
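As an illustration of the second point, the following sketch
(artificial data, illustrative names) casts a Tobit model as an
interval regression; the two commands should produce essentially
identical estimates:
%
\begin{code}
nulldata 200
set seed 101
series x = normal()
series ystar = 1 + x + normal()
series y = (ystar > 0) ? ystar : 0
# bounds: point observations when y > 0, (-infinity, 0] otherwise
series lo = (y > 0) ? y : NA
series hi = y
tobit y const x
intreg lo hi const x
\end{code}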

The \app{gretl} command \texttt{intreg} estimates interval models by
maximum likelihood, assuming normality of the disturbance term
$\epsilon_i$.  Its syntax is
%
\begin{code}
intreg minvar maxvar X
\end{code}
%
where \texttt{minvar} contains the $m_i$ series, with \texttt{NA}s for
left-unbounded observations, and \texttt{maxvar} contains $M_i$, with
\texttt{NA}s for right-unbounded observations.  By default, standard
errors are computed using the negative inverse of the Hessian.  If the
\option{robust} flag is given, then QML or Huber--White standard
errors are calculated instead. In this case the estimated covariance
matrix is a ``sandwich'' of the inverse of the estimated Hessian and
the outer product of the gradient.

If the model specification contains regressors other than just a
constant, the output includes a chi-square statistic for testing the
joint null hypothesis that none of these regressors has any effect on
the outcome.  This is a Wald statistic based on the estimated
covariance matrix.  If you wish to construct a likelihood ratio test,
this is easily done by estimating both the full model and the null
model (containing only the constant), saving the log-likelihood in
both cases via the \dollar{lnl} accessor, and then referring twice the
difference between the two log-likelihoods to the chi-square
distribution with $k$ degrees of freedom, where $k$ is the number of
additional regressors (see the \texttt{pvalue} command in the
\GCR). An example is contained in the sample script \texttt{wtp.inp},
provided with the \app{gretl} distribution.
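A minimal sketch of such a test follows; the series \texttt{lo},
\texttt{hi}, \texttt{x1} and \texttt{x2} are hypothetical, and here
$k = 2$:
%
\begin{code}
intreg lo hi const x1 x2
scalar ll_full = $lnl
intreg lo hi const
scalar ll_null = $lnl
scalar LR = 2*(ll_full - ll_null)
# refer LR to the chi-square distribution with 2 degrees of freedom
pvalue X 2 LR
\end{code}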

\begin{script}[htbp]
  \caption{Interval model on artificial data}
  \label{ex:interval}
Input:
\begin{scodebit}
nulldata 100
# generate artificial data
set seed 201449 
x = normal()
epsilon = 0.2*normal()
ystar = 1 + x + epsilon
lo_bound = floor(ystar)
hi_bound = ceil(ystar)

# run the interval model
intreg lo_bound hi_bound const x

# estimate ystar
gen_resid = $uhat
yhat = $yhat + gen_resid
corr ystar yhat 
\end{scodebit}
Output (selected portions):
\begin{scodebit}
Model 1: Interval estimates using the 100 observations 1-100
Lower limit: lo_bound, Upper limit: hi_bound

             coefficient   std. error   t-ratio    p-value 
  ---------------------------------------------------------
  const       0.993762     0.0338325     29.37    1.22e-189 ***
  x           0.986662     0.0319959     30.84    8.34e-209 ***

Chi-square(1)        950.9270   p-value              8.3e-209
Log-likelihood      -44.21258   Akaike criterion     94.42517
Schwarz criterion    102.2407   Hannan-Quinn         97.58824

sigma = 0.223273
Left-unbounded observations: 0
Right-unbounded observations: 0
Bounded observations: 100
Point observations: 0

...

corr(ystar, yhat) = 0.98960092
Under the null hypothesis of no correlation:
 t(98) = 68.1071, with two-tailed p-value 0.0000
\end{scodebit}
\end{script}

As with the probit and Tobit models, after a model has been estimated
the \dollar{uhat} accessor returns the generalized residual, which is
an estimate of $\epsilon_i$: more precisely, it equals $y_i - x_i
\hat{\beta}$ for point observations and $E(\epsilon_i| m_i, M_i, x_i)$
otherwise. Note that it is possible to compute an unbiased predictor
of $y^*_i$ by adding this estimate to $x_i \hat{\beta}$. Script
\ref{ex:interval} shows an example. As a further similarity with
Tobit, the interval regression model may deliver inconsistent
estimates if the disturbances are non-normal; hence, the
Chesher--Irish (1987) test for normality is included by default here
too.


\section{Sample selection model}
\label{sec:heckit}

In the sample selection model (also known as the ``Tobit II'' model),
there are two latent variables:
%
\begin{eqnarray}
  \label{eq:heckit1}
  y^*_i & = & \sum_{j=1}^k x_{ij} \beta_j + \varepsilon_i \\
  \label{eq:heckit2}
  s^*_i & = & \sum_{j=1}^p z_{ij} \gamma_j + \eta_i 
\end{eqnarray}
%
and the observation rule is given by
%
\begin{equation}
  \label{eq:tobitII}
  y_i = \left\{ 
    \begin{array}{ll} 
      y^*_i & \mathrm{for} \quad s^*_i > 0 \\ 
      \diamondsuit & \mathrm{for} \quad s^*_i \le 0 
    \end{array}
    \right. 
\end{equation}

In this context, the $\diamondsuit$ symbol indicates that for some
observations we simply do not have data on $y$: $y_i$ may be 0, or
missing, or anything else. A dummy variable $d_i$ is normally used to
set censored observations apart.

One of the most popular applications of this model in econometrics is
a wage equation coupled with a labor force participation equation: we
only observe the wage for the employed. If $y^*_i$ and $s^*_i$ were
(conditionally) independent, there would be no reason not to use OLS
for estimating equation (\ref{eq:heckit1}); otherwise, OLS does not
yield consistent estimates of the parameters $\beta_j$.

Since conditional independence between $y^*_i$ and $s^*_i$ is
equivalent to conditional independence between $\varepsilon_i$ and
$\eta_i$, one may model the co-dependence between $\varepsilon_i$ and
$\eta_i$ as 
\[
  \varepsilon_i = \lambda \eta_i + v_i ;
\]
substituting the above expression in (\ref{eq:heckit1}), you obtain
the model that is actually estimated:
\[
  y_i = \sum_{j=1}^k x_{ij} \beta_j + \lambda \hat{\eta}_i + v_i ,
\]
so the hypothesis that censoring does not matter is equivalent to the
hypothesis $H_0: \lambda = 0$, which can be easily tested.

The parameters can be estimated via maximum likelihood under the
assumption of joint normality of $\varepsilon_i$ and $\eta_i$;
however, a widely used alternative method yields the so-called
\emph{Heckit} estimator, named after Heckman (1979). The procedure can
be briefly outlined as follows: first, a probit model is fit on
equation (\ref{eq:heckit2}); next, the generalized residuals are
inserted in equation (\ref{eq:heckit1}) to correct for the effect of
sample selection.
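A ``by hand'' sketch of this two-step logic, using the dataset and
variable lists of example \ref{ex:heckit} below, may help fix ideas;
note that the plain OLS standard errors in the second step are not the
corrected ones that the \texttt{heckit} command reports:
%
\begin{code}
open mroz87.gdt
genr EXP2 = AX^2
genr WA2 = WA^2
genr KIDS = (KL6+K618)>0
list X = const AX EXP2 WE CIT
list Z = const WA WA2 FAMINC KIDS WE
# step 1: probit for the participation (selection) equation
probit LFP Z
# generalized residuals: inverse Mills ratio for the LFP = 1 cases
series eta_hat = $uhat
# step 2: augmented wage equation on the uncensored observations
smpl LFP=1 --restrict
ols WW X eta_hat
omit eta_hat
\end{code}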

\app{Gretl} provides the \texttt{heckit} command to carry out
estimation; its syntax is
%
\begin{code}
heckit y X ; d Z
\end{code}
%
where \texttt{y} is the dependent variable, \texttt{X} is a list of
regressors, \texttt{d} is a dummy variable holding 1 for uncensored
observations and \texttt{Z} is a list of explanatory variables for the
censoring equation.

Since in most cases maximum likelihood is the method of
choice, by default \app{gretl} computes ML estimates. The 2-step
Heckit estimates can be obtained by using the \option{two-step}
option. After estimation, the \dollar{uhat} accessor contains the
generalized residuals. As in the ordinary Tobit model, the residuals
equal the difference between actual and fitted $y_i$ only for
uncensored observations (those for which $d_i = 1$).

Example \ref{ex:heckit} shows two estimates from the dataset used in
Mroz (1987): the first one replicates Table 22.7 in Greene
(2003),\footnote{Note that the estimates given by \app{gretl} do not
  coincide with those found in the printed volume.  They do, however,
  match those found on the errata web page for Greene's book:
  \url{http://pages.stern.nyu.edu/~wgreene/Text/Errata/ERRATA5.htm}.}
while the second one replicates table 17.1 in Wooldridge (2002a).

\begin{script}[htbp]
  \caption{Heckit model}
  \label{ex:heckit}
\begin{scode}
open mroz87.gdt

genr EXP2 = AX^2
genr WA2 = WA^2
genr KIDS = (KL6+K618)>0

# Greene's specification

list X = const AX EXP2 WE CIT
list Z = const WA WA2 FAMINC KIDS WE

heckit WW X ; LFP Z --two-step 
heckit WW X ; LFP Z 

# Wooldridge's specification

series NWINC = FAMINC - WW*WHRS
series lww = log(WW)
list X = const WE AX EXP2
list Z = X NWINC WA KL6 K618

heckit lww X ; LFP Z --two-step 
\end{scode}
\end{script}

% \section{Count data}
% \label{sec:poisson}

% also include example script for negative binomial (done in Verbeek
% example files).

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "gretl-guide"
%%% End: