DHCP Failure on Renew

View: New views
3 Messages — Rating Filter:   Alert me  

DHCP Failure on Renew

by Michael Jones-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello,

We have been having problems with DHCP enabled client dying after half lease
time for quite a while (Also reported by Ulrich a few days ago).

My observations are that the DHCP thread gets stuck in the DHCPST_REBINDING
state and never sends the request nor yields to any other thread.

I'm still trying to understand to whole state machine, but perhaps somebody
could look at the "else" in dhcpc.c (trunk), line 1714 - IMHO that "else"
should not be there. Also, I'm not quite sure how the DHCPST_REBINDING state
should conclude once it has been entered.

cu,
Michael


_______________________________________________
http://lists.egnite.de/mailman/listinfo/en-nut-discussion

Re: DHCP Failure on Renew

by uprinz :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi!

Hm, I am actually working in that direction...
My Problem with DHCP was, that it did not work if you change the network
completely, i.e. unplug the system at home with 192.168.3.0/24 and plug
it in at work with 10.201.32.0/16.

In my opinion there are several problems buriedin DHCP, but let's stick
to the one you saw...

I found this in line 1773:
         else if (dhcpState == DHCPST_REBINDING) {
             retries++;
             if (tmo > dhcpConfig->dyn_rebindTime - (NutGetSeconds() -
leaseTime)) {
                 tmo = dhcpConfig->dyn_rebindTime - NutGetSeconds() -
leaseTime;
             }
             if (dhcpConfig->dyn_rebindTime <= NutGetSeconds() -
leaseTime) {
                 retries = 0;
                 dhcpState = DHCPST_REBINDING;
             }
             /* Broadcast a request for our previous configuration. We
                must not include a server identifier. */
             else if (DhcpBroadcastRequest(sock, bp, xid,
dhcpConfig->dyn_yiaddr, dhcpConfig->dyn_yiaddr, 0, secs) < 0) {


This looks a bit like if the DHCPST_REBINDING is timed out, it stays in
DHCPST_REBINDING in the second if()...
Looks like someone tried to escape this situation by checking for
retries, but the if( retries > n) 'out of here' is missing.
On the other hand, there is a timeout and if that is over one needs to
get out of that loop instead of staying inside.

How did you find out about that problem? Is there a backup DHCP in your
network and the primary failed? Or did you simply leave the box in you
network that long, that the lease expired? I think I remember, that the
EIR did not work after lying around a longer time playing musik. I
simply reset it and it worked again for days. But I never tried to find
out why and I don't know the lease-time the Fritbox assigns to it's
clients.

I'll try that DHCP lease timeout in the next few days. Never checked
that, as my problems were at another situation.

By the way, I modified the DM9000 driver for the EIR and now I can
switch networks with different subnets and subnet-sizes and I get a dhcp
lease... Yes, ouch. Never expected the problem to be that far down in
the OSI model :)

Regards
Ulrich

Michael Jones schrieb:

> Hello,
>
> We have been having problems with DHCP enabled client dying after half lease
> time for quite a while (Also reported by Ulrich a few days ago).
>
> My observations are that the DHCP thread gets stuck in the DHCPST_REBINDING
> state and never sends the request nor yields to any other thread.
>
> I'm still trying to understand to whole state machine, but perhaps somebody
> could look at the "else" in dhcpc.c (trunk), line 1714 - IMHO that "else"
> should not be there. Also, I'm not quite sure how the DHCPST_REBINDING state
> should conclude once it has been entered.
>
> cu,
> Michael
>
>
> _______________________________________________
> http://lists.egnite.de/mailman/listinfo/en-nut-discussion
_______________________________________________
http://lists.egnite.de/mailman/listinfo/en-nut-discussion

Re: DHCP Failure on Renew

by Michael Jones-11 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello,

Ulrich wrote:

>How did you find out about that problem? Is there a backup DHCP in your
>network and the primary failed? Or did you simply leave the box in you
>network that long, that the lease expired? I think I remember, that the
>EIR did not work after lying around a longer time playing musik. I
>simply reset it and it worked again for days. But I never tried to find
>out why and I don't know the lease-time the Fritbox assigns to it's
>clients.

>I'll try that DHCP lease timeout in the next few days. Never checked
>that, as my problems were at another situation.

Well, I simply setup an DHCP server with a 1 hour lease time - the system
crash after half an hour so no big challenge.

As far as I can see the DHCP code is a little overcomplicated and the main
codes readability could have benefited from being stuck in a switch/case
construct.

Any way; I found two problematic points in the code so far. The First, in
line 1708 will cause the DhcpBroadcastRequest call, which is essential for
keeping the state machine running, to not being called as at the time that
this is executed it is already '<=' leaseTime.

So changing the condition to < might help. But also the DhcpBroadcastRequest
should, in my opinion, always be called to make sure the state machine does
not go stale - hence remove the 'else' in line 1714. Also, recalling
NutGetSeconds() again and again throughout the routine also seems like a
waste and induces a possible race condition as the second might just tick
over between to calls.


        else if (dhcpState == DHCPST_RENEWING) {
            retries++;
            if (tmo / 1000 > dhcpConfig->dyn_rebindTime - (NutGetSeconds() -
leaseTime)) {
                tmo = dhcpConfig->dyn_rebindTime - (NutGetSeconds() -
leaseTime) * 1000;
            }
1708:       if (dhcpConfig->dyn_rebindTime <= NutGetSeconds() - leaseTime) {
                retries = 0;
                dhcpState = DHCPST_REBINDING;
            }
            /* Send a request to our leasing server. We must not include
               the server identifier. */
            else if (DhcpSendRequest(sock, dhcpConfig->dyn_sid, bp, xid,
dhcpConfig->dyn_yiaddr, dhcpConfig->dyn_yiaddr, 0, secs) <
                     0) {
                /* Unicast transmit error. */
                retries = 0;
                dhcpState = DHCPST_REBINDING;


A bit lower we encounter the same. Here, I feel that the whole leaseTime
'if' is completely de-placed


        else if (dhcpState == DHCPST_REBINDING) {
            retries++;
            if (tmo > dhcpConfig->dyn_rebindTime - (NutGetSeconds() -
leaseTime)) {
                tmo = dhcpConfig->dyn_rebindTime - NutGetSeconds() -
leaseTime;
            }
1746:       if (dhcpConfig->dyn_rebindTime <= NutGetSeconds() - leaseTime) {
                retries = 0;
                dhcpState = DHCPST_REBINDING;
            }
            /* Broadcast a request for our previous configuration. We
               must not include a server identifier. */
            else if (DhcpBroadcastRequest(sock, bp, xid,
dhcpConfig->dyn_yiaddr, dhcpConfig->dyn_yiaddr, 0, secs) < 0) {
                /* Fatal transmit error on broadcast. Give up. */
                dhcpState = DHCPST_IDLE;

My current implementation with simply removing both 'else' works like a
charm and I have not had any negative side effects. But, my plan is fully
understand the routine and then to re-write and simplify the routine...

Cu,
Michael

_______________________________________________
http://lists.egnite.de/mailman/listinfo/en-nut-discussion