Error term in logistic regression

by K F Pearce

Hello everyone,

The logistic regression model utilises the relationship:

Logit(P) = XB     (1)   (in matrix notation)

but I would like to ask why an error term does not appear in (1), i.e.
why it is not:

Logit(P) = XB + E

I know that for logistic regression, for a single binomial response,
E(y) = np, and for linear regression:

E(Y) = XB     (2)   (in matrix notation)

or Y = XB + E     (in matrix notation)

But I cannot see the link between (1) and (2) i.e. why the error term
does not appear in (1).  Can anyone help?

Many thanks,
Kim

Re: Error term in logistic regression : Replies

by K F Pearce

Hello everyone,

Many thanks to all who replied to my question on the error term in
logistic regression (below).

Here I list a selection of the replies I received.  If anyone has any
comments then feel free to email me.

Many thanks to you all,

Kind Regards,
Kim

>-----Original Message-----
>From: A UK-based worldwide e-mail broadcast system mailing
>list [mailto:allstat@...] On Behalf Of K F Pearce
>Sent: 26 February 2007 09:01
>To: allstat@...
>Subject: Error term in logistic regression

REPLIES:

Dear Kim,

An excellent question, and one that is close to my heart. The truth is
that the second, more sensible, suggested model below

Logit(P) = XB + E

makes the vector of Logit(P) a stochastic, unobserved latent variable of
size n (where n is the sample size). So, instead of Logit(P) = XB, where
there are only p unknowns (p = the number of columns of X, i.e. the
number of elements in B), the logical model that you suggest, and that I
also use, now has n + p unknowns, and the likelihood becomes intractable.
In short, because the Logit(P)s are not observed, as opposed to the Ys in
a regression model, it is quite hard to get good information on E, which
is required to estimate B.

However, there are many modern approaches to dealing with unobserved
latent variables, both Bayesian and frequentist. As one Bayesian example
that shows good properties, you may like to see my paper in ANZJS:

Gerlach, R., Bird, R. and Hall, A. (2002) "Bayesian variable selection in
logistic regression: predicting company earnings direction", Australian
& NZ Journal of Statistics, 2, pp 155-168.

or a recent paper, which will appear soon, that includes
misclassification of binomial observations:

Gerlach, R. and Stamey, J. "Bayesian model selection for logistic
regression with misclassified outcomes", Statistical Modelling: An
International Journal (to appear, accepted 01/2007).

Hope this helps a little.

Regards
***********************************************
The error term is not included because, for any particular probability,
the error distribution is fixed. That is, to describe a normal
distribution you need the mean and the sd; to describe a binomial
distribution you do not need an sd (although you do need N).
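A quick numeric check of this point (a sketch assuming Python 3.8+ for `math.comb`; the helper name `binomial_moments` is hypothetical): computing the mean and variance of a binomial directly from its probability mass function shows that, once N and p are given, the variance N p (1 - p) follows automatically, so there is no separate sd to estimate.

```python
import math

def binomial_moments(n, p):
    """Mean and variance of Binomial(n, p), computed directly from the pmf."""
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var

# Once n and p are fixed, the spread is fixed too: it always equals n * p * (1 - p).
for n, p in [(1, 0.3), (10, 0.3), (50, 0.8)]:
    mean, var = binomial_moments(n, p)
    print(n, p, round(mean, 6), round(var, 6), round(n * p * (1 - p), 6))
```

A normal distribution, by contrast, can have any variance for a given mean, which is why the linear model needs its own sigma.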
**********************************************

In logistic regression, the observation is not p or logit p; it is Y,
and we assume Y follows a Bernoulli distribution with parameter p.
Therefore the error is included in the Y part, not in the equation.
Similarly in (2), the error is specified in the distribution of Y, which
is usually Normal, and the equation links one of its parameters (mu) with
X and B.

Hope that's clear.
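To make "the error lives in the distribution of Y" concrete, here is a small simulation sketch (the coefficients b0, b1, sigma and the helper names are hypothetical). Both models share the same linear predictor XB; the linear model needs a separate sigma to generate Y, while the logistic model generates Y from p alone, and its spread comes out as p(1-p).

```python
import math
import random

def expit(t):
    """Inverse logit: maps the linear predictor XB to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

def simulate(x, b0, b1, sigma, n_draws, seed=1):
    rng = random.Random(seed)
    eta = b0 + b1 * x  # linear predictor XB, identical in both models

    # Linear regression: Y ~ Normal(mu = XB, sd = sigma); sigma is an extra parameter.
    y_lin = [rng.gauss(eta, sigma) for _ in range(n_draws)]

    # Logistic regression: Y ~ Bernoulli(p) with logit(p) = XB; no sigma anywhere.
    p = expit(eta)
    y_log = [1 if rng.random() < p else 0 for _ in range(n_draws)]
    return eta, p, y_lin, y_log

def mean_var(ys):
    """Sample mean and (n - 1)-denominator sample variance."""
    m = sum(ys) / len(ys)
    return m, sum((y - m) ** 2 for y in ys) / (len(ys) - 1)

eta, p, y_lin, y_log = simulate(x=1.0, b0=-0.5, b1=1.2, sigma=2.0, n_draws=100_000)
print("linear:   E(Y) =", eta, "sample mean/var =", mean_var(y_lin), "sigma^2 =", 2.0 ** 2)
print("logistic: E(Y) =", p, "sample mean/var =", mean_var(y_log), "p(1-p) =", p * (1 - p))
```

In both cases the sample mean tracks E(Y); only in the linear case does the variance have to be supplied separately.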
************************************************
Dear Kim,

There are two good reasons for not including such a term, at least for
uncorrelated data (i.e. assuming absence of nesting or repeated
measures):

First, the binary outcome Y is Bernoulli distributed with expectation
P = exp(XB)/[1 + exp(XB)] and error variance P(1-P), so there is an
error term, but not in P itself, which is E(Y). This is analogous to
linear regression, where E(Y) = XB (without e) and Y = XB + e.

Secondly, to my knowledge the variance of such an e in logistic
regression is not identifiable with only one observation per person.
This can be verified by taking the integral of
exp(XB + e)/[1 + exp(XB + e)] over f(e), where f(e) is the normal
density function.

Using the classical approximation PHI(X) ≈ PSI(1.7X), with
approximation error < 0.01, where PHI is the cumulative normal
distribution function and PSI(X) = exp(X)/[1 + exp(X)], and integrating
out the e-term, you again get the logit function, but with a different
scale parameter in it. So the two models, without and with e, cannot be
distinguished from each other with only one observation per person.
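A sketch of this calculation, assuming e ~ Normal(0, sigma^2): since PSI(t) ≈ PHI(t/1.7), and averaging a normal CDF over a normal e gives exactly another normal CDF, E[PSI(XB + e)] ≈ PHI(XB / sqrt(1.7^2 + sigma^2)) ≈ PSI(XB / sqrt(1 + sigma^2/1.7^2)). So the model with e behaves like a model without e whose coefficients are shrunk by 1/sqrt(1 + sigma^2/1.7^2). The Python below checks both steps numerically; the helper names and the choice sigma = 2 are hypothetical.

```python
import math

def expit(t):
    """PSI(t) = exp(t) / (1 + exp(t)), the inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def norm_cdf(x):
    """PHI(x), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Step 1: size of the classical approximation error |PHI(x) - PSI(1.7 x)| on a grid.
grid = [i / 1000.0 for i in range(-8000, 8001)]
max_err = max(abs(norm_cdf(x) - expit(1.7 * x)) for x in grid)

def marginal_prob(eta, sigma, n_steps=4000):
    """P(Y = 1) for logit(P) = eta + e, e ~ Normal(0, sigma^2): the integral of
    PSI(eta + e) f(e) de, by the trapezoidal rule on [-10 sigma, 10 sigma]."""
    lo, hi = -10.0 * sigma, 10.0 * sigma
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        e = lo + i * h
        density = math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * expit(eta + e) * density
    return total * h

def rescaled_logit(eta, sigma):
    """Step 2: the same probability from a model WITHOUT e, with shrunk coefficients."""
    return expit(eta / math.sqrt(1.0 + sigma ** 2 / 1.7 ** 2))

print(f"max |PHI(x) - PSI(1.7x)| on [-8, 8]: {max_err:.4f}")
for eta in (-2.0, -0.5, 1.0, 3.0):
    print(f"XB = {eta:+.1f}: with e {marginal_prob(eta, 2.0):.4f}, "
          f"rescaled logit {rescaled_logit(eta, 2.0):.4f}, plain logit {expit(eta):.4f}")
```

With one binary observation per person, data generated by (B, sigma) and by the shrunk coefficients alone give essentially the same P(Y = 1 | X), which is the non-identifiability described above.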
P.S.
The first reason is the best one.
On second thought, the identifiability issue can perhaps be resolved by
grouping observations and testing for the presence of what is called
over/underdispersion. But I never delved into the details of that topic.
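A rough sketch of the grouping idea, assuming equal-sized groups that share the same linear predictor (XB = 0 here) and one normal e per group; all names and sizes are hypothetical, and the Pearson chi-square/df statistic is one common dispersion measure, not necessarily what the reply had in mind. Plain binomial data give a value near 1, while a group-level e inflates it.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def pearson_dispersion(counts, n):
    """Pearson chi-square / df for grouped binomial counts sharing one pooled p.
    Near 1: consistent with binomial variation; well above 1: overdispersion."""
    g = len(counts)
    p_hat = sum(counts) / (g * n)
    chi2 = sum((y - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat)) for y in counts)
    return chi2 / (g - 1)

def simulate_groups(n_groups, n_per_group, eta, sigma, seed=7):
    """Each group gets its own e ~ Normal(0, sigma^2) added to the logit (sigma = 0: no e)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_groups):
        p = expit(eta + rng.gauss(0.0, sigma))
        counts.append(sum(rng.random() < p for _ in range(n_per_group)))
    return counts

no_e = pearson_dispersion(simulate_groups(2000, 20, eta=0.0, sigma=0.0), 20)
with_e = pearson_dispersion(simulate_groups(2000, 20, eta=0.0, sigma=1.0), 20)
print("dispersion without e:", round(no_e, 3))
print("dispersion with e   :", round(with_e, 3))
```

Grouping is what makes the extra variance visible: with a single 0/1 outcome per person, the same e only shows up as the rescaling of the coefficients.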
****************************************************
Dear Kim,
As far as I understand, it is an issue of notation.
A GLM models a function of the expected value of Y (mu) as a linear
combination of the X. In the logistic model, the probability P = mu =
E(Y), and that is why the error term does not appear.
Having said that, I have checked three different books discussing GLMs,
and in none of them is the discussion of the error term for binary
models terribly clear.
Hope this helps a bit.
Best wishes,
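The GLM view can be seen directly in a fitting routine. Below is a minimal sketch, assuming one covariate, hypothetical true coefficients, and plain Newton-Raphson (which, for the canonical logit link, coincides with IRLS). The only ingredients are the link and the Bernoulli log-likelihood; the weights p(1-p) are the Bernoulli variances, so no error term or sigma is estimated.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic_irls(xs, ys, n_iter=50):
    """Fit logit(P) = b0 + b1*x by maximizing sum(y*log(p) + (1-y)*log(1-p)).
    The likelihood comes entirely from the Bernoulli distribution of Y."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Score vector U and 2x2 information matrix I for (b0, b1).
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(b0 + b1 * x)
            w = p * (1 - p)  # Bernoulli variance, fixed once p is known
            u0 += y - p
            u1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: b <- b + I^{-1} U, using the closed-form 2x2 inverse.
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < 1e-10:
            break
    return b0, b1

rng = random.Random(42)
true_b0, true_b1 = -1.0, 0.8  # hypothetical values for the simulation
xs = [rng.gauss(0.0, 1.5) for _ in range(20_000)]
ys = [1 if rng.random() < expit(true_b0 + true_b1 * x) else 0 for x in xs]
b0_hat, b1_hat = fit_logistic_irls(xs, ys)
print(b0_hat, b1_hat)
```

Contrast this with least squares for Y = XB + e, where sigma^2 must be estimated separately from the residuals.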
*******************************************************
In logistic regression you are modelling the probability, which already
contains the random phenomenon: in Logit(p), p is the probability of
success. In linear regression, Y = XB + e, the distributional assumption
is that e is normally distributed with mean zero and variance sigma
squared; zero mean means the expectation of e is zero. So if I take the
expectation of Y = XB + e, I get E(Y) = E(XB) + E(e); since XB is
constant, E(XB) = XB, and E(e) = 0, so E(Y) = XB.
Hope this helps.
Cheers.
*******************************************************
Kim

The regression gives the probability of a response: the 'error' arises
in converting that into an actual response. There is no need to estimate
it (unlike in the normal-theory case) because the errors are determined
by the properties of the binomial distribution.

Regards
********************************************************
Dear Kim,
The reason is that the Bernoulli distribution is a single-parameter
distribution with variance theta(1-theta), so that given the mean (theta)
all the variability is known. On the other hand, the Normal distribution
is a two-parameter distribution, so that given the mean mu the variance
is not known. A similar thing to logistic regression happens with
Poisson regression.

This means that both logistic and (particularly) Poisson regression are
extremely vulnerable to the effects of lack of fit, since there is no
residual sigma term to sweep this up.
In consequence, significance can be severely overstated.
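A sketch of how significance gets overstated, assuming a two-group Poisson comparison with gamma-distributed extra variation (shape 1, so the counts are negative binomial); the helper names and numbers are hypothetical. The dispersion-scaled ("quasi-Poisson") standard error is one common remedy, swapped in here for illustration rather than taken from the reply.

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for moderate lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def two_group_rate_test(counts_a, counts_b):
    """Log rate ratio (B vs A) with a naive Poisson SE and an SE scaled
    by the Pearson dispersion estimate phi."""
    mean_a = sum(counts_a) / len(counts_a)
    mean_b = sum(counts_b) / len(counts_b)
    log_rr = math.log(mean_b / mean_a)
    se_naive = math.sqrt(1 / sum(counts_a) + 1 / sum(counts_b))
    chi2 = (sum((y - mean_a) ** 2 / mean_a for y in counts_a)
            + sum((y - mean_b) ** 2 / mean_b for y in counts_b))
    phi = chi2 / (len(counts_a) + len(counts_b) - 2)
    return log_rr, se_naive, se_naive * math.sqrt(phi), phi

rng = random.Random(3)
shape = 1.0  # hypothetical: extra gamma variation with mean 1 and variance 1/shape

def overdispersed(mean, n):
    """Poisson counts whose rate is multiplied by a Gamma(shape, 1/shape) factor."""
    return [poisson_draw(rng, mean * rng.gammavariate(shape, 1.0 / shape)) for _ in range(n)]

a, b = overdispersed(4.0, 500), overdispersed(5.0, 500)
log_rr, se_naive, se_quasi, phi = two_group_rate_test(a, b)
print(f"dispersion phi = {phi:.2f}")
print(f"z naive = {log_rr / se_naive:.2f}, z dispersion-adjusted = {log_rr / se_quasi:.2f}")
```

The naive z-statistic is larger by a factor of sqrt(phi): with no residual sigma term, the extra variation is silently attributed to the signal.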

Have a look at

1. Robinson LD, Jewell NP. Some surprising results about covariate
adjustment in logistic regression models. International Statistical
Review 1991;58:227-240.

for a discussion about problems that can arise if one tries to use
ordinary regression as a guide to what one should see in logistic
regression.
***************************************************************
In addition, this issue is discussed briefly on the web:
http://www-gatago.com/sci/math/symbolic/35699567.html