Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)


by Rick Macklem

I can now reproduce what I think others were seeing as slow reconnects
when using NFSv3 over TCP against a server that disconnects inactive
TCP connections. I have had some luck figuring out what is going on
and can reproduce it fairly easily, but I really need help from someone
who understands the FreeBSD TCP stack.

Here's what happens when things break:
- the krpc client does a reconnect like normal and the 3 way handshake
   works for the new socket.
- the client sends an RPC on the new socket
- at about the same time as this data send, the FreeBSD client sends a
   reset (yes, that's correct, an RST).
--> not surprisingly, the server closes down the new connection and
     things get stuck until a timeout and another reconnect get things
     going again.

This can be seen by the snoop trace that follows, with the RST at
packet #18. (If there isn't a client->server RST packet generated
by FreeBSD, the new connection works fine.)

What do I know about this from printfs in the code:
- It seems to happen when the new socket is at the same address as
   the old one. (Not a bug. Just happens that uma_zalloc() allocates
   the same memory as the old one that has been soclose()'d.)
- It is timing related. If I add too many printfs, I can't reproduce
   the problem.

My TCP is really rusty, but my theory is that some timer has been
set by the FIN sent from the server for the old socket and it is
still working on sending the RST when, somehow, the new socket and
its tcp pcb get used for the send?

Anyone out there able to help? (Please, please..)

Here's a snoop trace of one of these (sorry about the long lines).
You can see a successful reconnect, a Fin from the server disconnecting
after 360sec and then the new connection being created with port#871.
Then, at packet #18, there's that pesky RST!!
--- snoop trace of FreeBSD-current client nfsv4-test mounted against
     a Solaris10 server called nfsv4-solaris. (The mount is the regular
     client, although that shouldn't matter, using NFSv3 over TCP.)
   1   0.00000 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=864 Fin Ack=2981056244 Seq=3573648276 Len=0 Win=16588 Options=<nop,nop,tstamp 2883094 1218858>
   2   0.00010 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=740 Syn Seq=729399224 Len=0 Win=65535 Options=<mss 1460,nop,wscale 3,sackOK,tstamp 2883094 0>
   3   0.00054 nfsv4-solaris -> nfsv4-test.cis.uoguelph.ca TCP D=864 S=2049 Rst Seq=2981056244 Len=0 Win=0
   4   0.00055 nfsv4-solaris -> nfsv4-test.cis.uoguelph.ca TCP D=740 S=2049 Syn Ack=729399225 Seq=38857223 Len=0 Win=49232 Options=<nop,nop,tstamp 11852 2883094,mss 1460,nop,wscale 0,nop,nop,sackOK>
   5   0.00086 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=740 Ack=38857224 Seq=729399225 Len=0 Win=8326 Options=<nop,nop,tstamp 2883095 11852>
   6   0.00104 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris NFS C FSSTAT3 FH=9D01
   7   0.00180 nfsv4-solaris -> nfsv4-test.cis.uoguelph.ca TCP D=740 S=2049 Ack=729399357 Seq=38857224 Len=0 Win=49100 Options=<nop,nop,tstamp 11852 2883095>
   8   0.00295 nfsv4-solaris -> nfsv4-test.cis.uoguelph.ca NFS R FSSTAT3 OK
   9   0.10350 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=740 Ack=38857396 Seq=729399357 Len=0 Win=16588 Options=<nop,nop,tstamp 2883198 11853>
  10 360.00514 nfsv4-solaris -> nfsv4-test.cis.uoguelph.ca TCP D=740 S=2049 Fin Ack=729399357 Seq=38857396 Len=0 Win=49232 Options=<nop,nop,tstamp 47853 2883198>
  11 360.00549 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=740 Rst Ack=38857397 Seq=729399357 Len=0 Win=0
  12 360.00557 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=740 Rst Ack=38857397 Seq=729399357 Len=0 Win=0
  13 360.00586 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=740 Ack=38857397 Seq=729399357 Len=0 Win=16588 Options=<nop,nop,tstamp 3242021 47853>
  14 360.00594 nfsv4-solaris -> nfsv4-test.cis.uoguelph.ca TCP D=740 S=2049 Rst Seq=38857397 Len=0 Win=0
  15 495.15000 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=871 Syn Seq=369106877 Len=0 Win=65535 Options=<mss 1460,nop,wscale 3,sackOK,tstamp 3376760 0>
  16 495.15013 nfsv4-solaris -> nfsv4-test.cis.uoguelph.ca TCP D=871 S=2049 Syn Ack=369106878 Seq=159825643 Len=0 Win=49232 Options=<nop,nop,tstamp 61367 3376760,mss 1460,nop,wscale 0,nop,nop,sackOK>
  17 495.15040 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=871 Ack=159825644 Seq=369106878 Len=0 Win=8326 Options=<nop,nop,tstamp 3376761 61367>
  18 495.15089 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=871 Rst Ack=159825644 Seq=369106878 Len=0 Win=0
  19 495.15162 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris NFS C FSSTAT3 FH=9D01
  20 495.15265 nfsv4-solaris -> nfsv4-test.cis.uoguelph.ca TCP D=871 S=2049 Rst Seq=159825644 Len=0 Win=0
  21 495.15305 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=883 Syn Seq=3875668538 Len=0 Win=65535 Options=<mss 1460,nop,wscale 3,sackOK,tstamp 3376763 0>
  22 495.15322 nfsv4-solaris -> nfsv4-test.cis.uoguelph.ca TCP D=883 S=2049 Syn Ack=3875668539 Seq=159964181 Len=0 Win=49232 Options=<nop,nop,tstamp 61367 3376763,mss 1460,nop,wscale 0,nop,nop,sackOK>
  23 495.15348 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=883 Ack=159964182 Seq=3875668539 Len=0 Win=8326 Options=<nop,nop,tstamp 3376764 61367>
  24 495.15368 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris NFS C FSSTAT3 FH=9D01
  25 495.15374 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=883 Rst Ack=159964182 Seq=3875668539 Len=0 Win=0
  26 495.15427 nfsv4-solaris -> nfsv4-test.cis.uoguelph.ca TCP D=883 S=2049 Ack=3875668671 Seq=159964182 Len=0 Win=49100 Options=<nop,nop,tstamp 61367 3376764>
  27 495.15457 nfsv4-test.cis.uoguelph.ca -> nfsv4-solaris TCP D=2049 S=883 Rst Ack=159964182 Seq=3875668671 Len=0 Win=0
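Picking the client->server RSTs for one connection out of a long trace by eye is error prone, so a small filter can help. The following is only a sketch: ex_is_rst_line() is a hypothetical helper, and it assumes the one-line snoop summary layout shown above (packet number, time, source, "->", destination, "TCP D=... S=..." and then the flag names).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if a snoop summary line is a TCP segment
 * with the Rst flag, sent to destination port dport from source port sport.
 * Returns 0 for anything else, including lines that fail to parse.
 */
static int
ex_is_rst_line(const char *line, int dport, int sport)
{
	int num, d, s;
	double when;
	char src[128], dst[128];

	assert(line != NULL);
	/* Expected shape: "<n> <time> <src> -> <dst> TCP D=<port> S=<port> ..." */
	if (sscanf(line, "%d %lf %127s -> %127s TCP D=%d S=%d",
	    &num, &when, src, dst, &d, &s) != 6)
		return (0);
	if (d != dport || s != sport)
		return (0);
	/* snoop prints the flag names as separate words after the ports. */
	return (strstr(line, " Rst ") != NULL);
}
```

Called with dport 2049 and sport 871, this would match packet #18 in the trace above and skip the handshake ACK at #17, the NFS call at #19 and the server's RST at #20.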

_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Rick Macklem



Rick Macklem wrote:
> I can now reproduce what I think others were seeing as slow reconnects
> when using NFSv3 over TCP against a server that disconnects inactive
> TCP connections. I have had some luck figuring out what is going on
> and can reproduce it fairly easily, but I really need help from someone
> who understands the FreeBSD TCP stack.
>
Ok, I haven't made much progress on this, but here's what little I
currently know about it.

The problem occurs after a server has dropped an inactive TCP connection
for an NFS over TCP mount (in my case a Solaris10 server). When the client
does a new connection it, for some reason, sends a RST at almost exactly
the same time as the first RPC request on the new TCP connection, causing
the server to shut it down.

Ok, things I now know don't affect this are:
- doing the soshutdown(), soclose() on the old connection. I commented
   them out and it still happened.
- Avoiding the sobind() on the new connection, done before the
   soconnect().
- Using a non-reserved port#.
(The above tests shot down pretty well all the "theories" I could come up
with.)

The only thing I've found that avoids the problem:
- putting a 2sec delay right before the soconnect() call. (A 1sec delay
   made it hard to reproduce and I've never reproduced it yet with a 2sec
   delay.)
   Not much of a fix, though.

Now, here's where someone may be able to help?

Grep'ng around, I found 4 places where the TCP stack called ip_output()
(one in each of tcp_output.c, tcp_subr.c, tcp_timewait.c and
tcp_syncache.c) and I put a printf like this just before them:
  if (flags & TH_RST)
          printf("sent a reset\n");

  (The exact form varies at each site, because the TCP header flags
   live in different places, and each site prints a different message.)
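The check described above can be sketched in user space as follows. This is only an illustration, not the code that was added to the kernel: the EX_TH_* constants and both helper functions are hypothetical names introduced here, with the bit values taken from the standard TCP header flag layout (the same values as the TH_* macros in <netinet/tcp.h>). Passing a location string is one way to tell the four call sites apart in the output.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Standard TCP header flag bits (hypothetical EX_ names to avoid clashing with TH_*). */
#define EX_TH_FIN 0x01
#define EX_TH_SYN 0x02
#define EX_TH_RST 0x04
#define EX_TH_ACK 0x10

/* Hypothetical helper: 1 if the RST bit is set in flags, otherwise 0. */
static int
ex_is_rst(uint8_t flags)
{
	return ((flags & EX_TH_RST) != 0);
}

/*
 * Hypothetical logging wrapper: prints a line naming the call site when
 * the segment carries RST. Returns 1 if something was printed, else 0.
 */
static int
ex_log_if_rst(const char *where, uint8_t flags)
{
	assert(where != NULL);
	if (!ex_is_rst(flags))
		return (0);
	printf("sent a reset from %s\n", where);
	return (1);
}
```

For example, ex_log_if_rst("tcp_output.c", EX_TH_RST | EX_TH_ACK) prints a line and returns 1, while a plain ACK prints nothing and returns 0.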

Now, the weird part is, that when the extraneous RST is sent to the
server, I don't get any printf. (I do get a few of these, but at other
times for what appear to be legitimate RSTs.)

I can't see anywhere else that the TCP stack would send an RST and, so,
I'm stuck w.r.t. figuring out what is sending them?

Anyone know of another place the TCP stack would make the send happen?
(Or is it queued earlier when I see the printf message, and then the
send is "triggered" inside the ip layer when the first data is sent on
the new connection?)

rick, who is getting sick of looking at this:-)

Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Rui Paulo-3


On 5 Nov 2009, at 16:36, Rick Macklem wrote:

>
>
> Rick Macklem wrote:
>> I can now reproduce what I think others were seeing as slow  
>> reconnects
>> when using NFSv3 over TCP against a server that disconnects inactive
>> TCP connections. I have had some luck figuring out what is going on
>> and can reproduce it fairly easily, but I really need help from  
>> someone
>> who understands the FreeBSD TCP stack.
>>
> Ok, I haven't made much progress on this, but here's what little I
> currently know about it.
>
> The problem occurs after a server has dropped an inactive TCP  
> connection
> for an NFS over TCP mount (in my case a Solaris10 server). When the  
> client
> does a new connection it, for some reason, sends a RST at almost  
> exactly
> the same time as the first RPC request on the new TCP connection,  
> causing
> the server to shut it down.
>
> Ok, things I now know don't affect this are:
> - doing the soshutdown(), soclose() on the old connection. I commented
>  them out and it still happened.
> - Avoiding the sobind() on the new connection, done before the
>  soconnect().
> - Using a non-reserved port#.
> (The above tests shot down pretty well all the "theories" I could  
> come up
> with.)
>
> The only thing I've found that avoids the problem:
> - putting a 2sec delay right before the soconnect() call. (A 1sec  
> delay
>  made it hard to reproduce and I've never reproduced it yet with a  
> 2sec
>  delay.)
>  Not much of a fix, though.
>
> Now, here's where someone may be able to help?
>
> Grep'ng around, I found 4 places where the TCP stack called ip_output()
> (one in each of tcp_output.c, tcp_subr.c, tcp_timewait.c and  
> tcp_syncache.c) and I put a printf like this just before them:
> if (flags & TH_RST)
> printf("sent a reset\n");
>
> (The exact format varies for each, because of where the TCP
>         header flags are and have different printf messages.)
>
> Now, the weird part is, that when the extraneous RST is sent to the
> server, I don't get any printf. (I do get a few of these, but at other
> times for what appear to be legitimate RSTs.)
>
> I can't see anywhere else that the TCP stack would send an RST and,  
> so,
> I'm stuck w.r.t. figuring out what is sending them?
>
> Anyone know of another place the TCP stack would make the send happen?
> (Or is it queued earlier when I see the printf message, and then the
> send is "triggered" inside the ip layer when the first data is sent on
> the new connection?)

Are you running TSO?

--
Rui Paulo

Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Rui Paulo-3

On 5 Nov 2009, at 16:36, Rick Macklem wrote:

> Rick Macklem wrote:
>> I can now reproduce what I think others were seeing as slow  
>> reconnects
>> when using NFSv3 over TCP against a server that disconnects inactive
>> TCP connections. I have had some luck figuring out what is going on
>> and can reproduce it fairly easily, but I really need help from  
>> someone
>> who understands the FreeBSD TCP stack.
>>
> Ok, I haven't made much progress on this, but here's what little I
> currently know about it.
>
> The problem occurs after a server has dropped an inactive TCP  
> connection
> for an NFS over TCP mount (in my case a Solaris10 server). When the  
> client
> does a new connection it, for some reason, sends a RST at almost  
> exactly
> the same time as the first RPC request on the new TCP connection,  
> causing
> the server to shut it down.
>
> Ok, things I now know don't affect this are:
> - doing the soshutdown(), soclose() on the old connection. I commented
> them out and it still happened.
> - Avoiding the sobind() on the new connection, done before the
> soconnect().
> - Using a non-reserved port#.
> (The above tests shot down pretty well all the "theories" I could  
> come up
> with.)
>
> The only thing I've found that avoids the problem:
> - putting a 2sec delay right before the soconnect() call. (A 1sec  
> delay
> made it hard to reproduce and I've never reproduced it yet with a 2sec
> delay.)
> Not much of a fix, though.
>
> Now, here's where someone may be able to help?
>
> Grep'ng around, I found 4 places where the TCP stack called ip_output()
> (one in each of tcp_output.c, tcp_subr.c, tcp_timewait.c and  
> tcp_syncache.c) and I put a printf like this just before them:
> if (flags & TH_RST)
> printf("sent a reset\n");
>
> (The exact format varies for each, because of where the TCP
>       header flags are and have different printf messages.)
>
> Now, the weird part is, that when the extraneous RST is sent to the
> server, I don't get any printf. (I do get a few of these, but at other
> times for what appear to be legitimate RSTs.)
>
> I can't see anywhere else that the TCP stack would send an RST and,  
> so,
> I'm stuck w.r.t. figuring out what is sending them?
>
> Anyone know of another place the TCP stack would make the send happen?
> (Or is it queued earlier when I see the printf message, and then the
> send is "triggered" inside the ip layer when the first data is sent on
> the new connection?)
>
> rick, who is getting sick of looking at this:-)

One option would be to break into ddb at the top of ip_output() and
check the backtrace.

--
Rui Paulo

Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Rick Macklem



On Thu, 5 Nov 2009, Rui Paulo wrote:

>
> Are you running TSO?
>
Wow, I owe you a beer:-) That was the magic bullet...

I'm a dinosaur, so when I first saw this, I thought of that wonderful
time sharing front end to IBM mainframes I had the privilege of using
in the 70s. (There was also a TSO emulator in the early Unix releases,
which set your terminal to the worst possible setting imaginable and
introduced delays of seconds when you tried to do anything. It was
pretty funny for those of us who had experienced the real thing.)

Anyhow, I figured you probably didn't mean this so I grep'd around
and found net.inet.tcp.tso, set it to 0 and...the problem went away.
(I have gotten a RST for the new port# once since then, but it was
in the middle of the 3way handshake instead of after it, so it didn't
cause any grief, just another 3way handshake right away.)

I have no idea if the problem is something generic w.r.t. TSO or specific
to the Intel 82801BA/CAM and/or the fxp driver for it. (I checked and none
of the other net cards I have lying around have TSO support, so I can't
test this by replacing the net card/driver.)

Does anyone know enough about TSO to know if the problem is net chip
specific or generic to using it?

Thanks for the help. I don't think I would have ever found that, rick



Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Pyun YongHyeon

On Fri, Nov 06, 2009 at 03:30:39PM -0500, Rick Macklem wrote:

>
>
> On Thu, 5 Nov 2009, Rui Paulo wrote:
>
> >
> >Are you running TSO?
> >
> Wow, I owe you a beer:-) That was the magic bullet...
>
> I'm a dinosaur, so when I first saw this, I thought of that wonderful
> time sharing front end to IBM mainframes I had the privilege of using
> in the 70s. (There was also a TSO emulator in the early Unix releases,
> which set your terminal to the worst possible setting imaginable and
> introduced delays of seconds when you tried to do anything. It was
> pretty funny for those of us who had experienced the real thing.)
>
> Anyhow, I figured you probably didn't mean this so I grep'd around
> and found net.inet.tcp.tso, set it to 0 and...the problem went away.
> (I have gotten a RST for the new port# once since then, but it was
> in the middle of the 3way handshake instead of after it, so it didn't
> cause any grief, just another 3way handshake right away.)
>
> I have no idea if the problem is something generic w.r.t. TSO or specific
> to the Intel 82801BA/CAM and/or the fxp driver for it. (I checked and none

fxp(4) has TSO support, but I don't think the 82801BA has TSO
capability. Only the i82550 and i82551 support TSO. You can check with
ifconfig whether fxp(4) thinks the controller supports TSO.

> of the other net cards I have lying around have TSO support, so I can't
> test this by replacing the net card/driver.)
>
> Does anyone know enough about TSO to know if the problem is net chip
> specific or generic to using it?
>
> Thanks for the help. I don't think I would have ever found that, rick
>
Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Rick Macklem



On Thu, 5 Nov 2009, Rui Paulo wrote:

>
> Are you running TSO?
>
I spoke too soon the last time. It appears that the setting of
net.inet.tcp.tso does not have any effect and that the Intel chip
on this machine doesn't do TSO. (I tried swapping it for a 3C905 and
the RSTs are still showing up.)

I guess it was just coincidence that the RSTs seemed to stop happening
for a while after I flipped the sysctl.

I think the problem is related to the fact that the server end has
started to close down the connection (sent a FIN), but the client
doesn't do anything until an RPC shows up and then tries to do a
new connection right after shutting down the old one. (Solaris10
generates RSTs on the old connection and it seems that this somehow
triggers one being sent to the server with the new port# instead
of the old port#.)

I'm about to try doing a soshutdown() at the time the server sends
the FIN to the client, to see what effect that has.

I still can't figure out where the pesky RST gets sent or I could
get a stack trace at that point and see what is doing it.

Having lottsa fun with it, rick

Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Julian Elischer

Rick Macklem wrote:

>
>
> On Thu, 5 Nov 2009, Rui Paulo wrote:
>
>>
>> Are you running TSO?
>>
> I spoke too soon the last time. It appears that the setting of
> net.inet.tcp.tso does not have any effect and that the Intel chip
> on this machine doesn't do TSO. (I tried swapping it for a 3C905 and
> the RSTs are showing up.)
>
> I guess it was just coincidence that the RSTs seemed to stop happening
> for a while after I flipped the sysctl.
>
> I think the problem is related to the fact that the server end has
> started to close down the connection (send a FIN), but the client
> doesn't do anything until an RPC shows up and then tried to do a
> new connection right after shutting down the old one. (Solaris10
> generates RSTs on the old connection and it seems that somehow
> triggers one being sent to the server with the new port# instead
> of the old port#.)
>
> I'm about to try doing a soshutdown() at the time the server sends
> the FIN to the client, to see what effect that has.
>
> I still can't figure out where the pesky RST gets sent or I could
> get a stack trace at that point and see what is doing it.
>
> Having lottsa fun with it, rick

not as much fun as we are watching.  :-)

Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Rick Macklem



On Thu, 5 Nov 2009, Rui Paulo wrote:

>>
>> Now, here's where someone may be able to help?
>>
>> Grep'ng around, I found 4 places where the TCP stack called ip_output()
>> (one in each of tcp_output.c, tcp_subr.c, tcp_timewait.c and
>> tcp_syncache.c) and I put a printf like this just before them:
>> if (flags & TH_RST)
>> printf("sent a reset\n");
>>
>> (The exact format varies for each, because of where the TCP
>>        header flags are and have different printf messages.)
>>
>> Now, the weird part is, that when the extraneous RST is sent to the
>> server, I don't get any printf. (I do get a few of these, but at other
>> times for what appear to be legitimate RSTs.)
>>
>> I can't see anywhere else that the TCP stack would send an RST and, so,
>> I'm stuck w.r.t. figuring out what is sending them?
>>
Ok, if you found the previous posts entertaining, you might enjoy this:-)

Along with the printfs before all the ip_output() calls, I added calls
inside ip_output() and, eventually, even calls in front of every
if_output(). Never got anything that indicated an RST was being sent.
(I only saw what I expected, which was an ACK reply being sent.)

BUT, at almost exactly the same time, there were the FreeBSD8-CURRENT
client->server RST packets on the server's snoop trace.

Hmm, did a tcpdump in the client and, yes, the same packets were there.

To keep it simple, I had done the dinosaur thing and plugged both the
client and server into an old, dumb 10baseT hub, so that I could easily
watch everything. (I also had an uplink cable to the net port in the
wall, so I could move kernels around from the machine I usually build
them on.)

I was at the point where I couldn't conceivably figure out where the
FreeBSD-CURRENT client was generating these RSTs. So guess what...
--> it wasn't

I unplugged the uplink cable and, no more RSTs. I've been testing for
long enough now, that I am 99% certain they were being injected. Since
the from address and even the MAC address is correct, I can only assume
that it was the Cisco switch that was doing the injecting. (How else
could a packet come in from the Cisco switch with the MAC address of
the FreeBSD-CURRENT client machine?)
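The reasoning here is a comparison of two views of the same traffic: what the client's own stack logged at its output calls, and what showed up on the wire. A minimal sketch of that comparison follows. Everything in it is hypothetical (the ex_pkt record, the helper names, and the choice of ports, sequence number and flags as the matching key); it only illustrates the idea that a captured packet with the client's source address but no matching send-side log entry did not come from the client's stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the fields used to match a logged send to a captured packet. */
struct ex_pkt {
	uint16_t sport;
	uint16_t dport;
	uint32_t seq;
	uint8_t  flags;
};

static int
ex_same_pkt(const struct ex_pkt *a, const struct ex_pkt *b)
{
	return (a->sport == b->sport && a->dport == b->dport &&
	    a->seq == b->seq && a->flags == b->flags);
}

/*
 * Count captured packets that have no matching entry in the host's own
 * send log. Simplification: a log entry may match more than one capture.
 */
static size_t
ex_count_unlogged(const struct ex_pkt *logged, size_t nlogged,
    const struct ex_pkt *captured, size_t ncaptured)
{
	size_t i, j, missing = 0;

	assert(logged != NULL || nlogged == 0);
	assert(captured != NULL || ncaptured == 0);
	for (i = 0; i < ncaptured; i++) {
		for (j = 0; j < nlogged; j++)
			if (ex_same_pkt(&captured[i], &logged[j]))
				break;
		if (j == nlogged)
			missing++;
	}
	return (missing);
}
```

Fed the handshake ACK from the original trace (port 871, Seq=369106878, ACK only) as the one logged send, and that ACK plus the RST with the same ports and sequence number as the capture, the count of unlogged packets would be 1.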

It was usually triggered by a server reboot. After the server reboot,
the server does send an RST to the client. This seems legit, but might
be what makes Cisco think that "bad things" are happening? (I have no
access to info about the switches or their configuration, although the
campus standard is for these ports to be used by a single desktop machine
only and not a switch or hub.)

So, it seems that FreeBSD8-CURRENT reconnects fine now, so long as
nothing is injecting RSTs into the newly created connection.

Well, I'm not sure I found this fun, but hopefully others are
entertained:-), rick

Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Chuck Swiger-2

On Nov 9, 2009, at 3:04 PM, Rick Macklem wrote:
[ ... ]
> It was usually triggered by a server reboot. After the server reboot,
> the server does send an RST to the client. This seems legit, but might
> be what makes Cisco think that "bad things" are happening? (I have no
> access to info about the switches or their configuration, although the
> campus standard is for these ports to be used by a single desktop  
> machine
> only and not a switch or hub.)

The description you've provided suggests your network admins are  
configuring end-user ports with "Port Fast" to avoid the time required  
to do spanning tree learning & detection; they want you to not use a  
switch or hub on such ports to avoid the risk of creating a loop.  
Cisco routers have some options which cause them to drop packets and
disable the port in such a mode if they see more than the allowed # of
ether MAC addresses coming from that port, or if they receive BPDU
packets indicating that a switch was connected to the port; however,
this wouldn't cause RST packets to be generated, you'd just lose your
uplink.

Seeing forged RST packets suggests that something like the Sandvine  
PTS equipment is also around on that network.

Regards,
--
-Chuck

Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Rick Macklem



On Mon, 9 Nov 2009, Chuck Swiger wrote:

> On Nov 9, 2009, at 3:04 PM, Rick Macklem wrote:
> [ ... ]
>> It was usually triggered by a server reboot. After the server reboot,
>> the server does send an RST to the client. This seems legit, but might
>> be what makes Cisco think that "bad things" are happening? (I have no
>> access to info about the switches or their configuration, although the
>> campus standard is for these ports to be used by a single desktop machine
>> only and not a switch or hub.)
>
> The description you've provided suggests your network admins are configuring
> end-user ports with "Port Fast" to avoid the time required to do spanning
> tree learning & detection; they want you to not use a switch or hub on such
> ports to avoid the risk of creating a loop.  Cisco routers have some options
> which cause them to drop packets and disable the port in such a mode if it
> sees more than the allowed # of ether MAC addresses coming from that port, or
> if it receives BPDU packets indicating that a switch was connected to the
> port; however, this wouldn't cause RST packets to be generated, you'd just
> lose your uplink.
>
> Seeing forged RST packets suggests that something like the Sandvine PTS
> equipment is also around on that network.
>
I'll admit I've never seen any of the hardware and don't know what all
is set up. (The campus networking folks seem to consider such things
"need to know" and I'm not in the "need to know" category:-) So, I
can't say what is at fault, it just sure looks like the RSTs are coming
down the uplink and they even have the MAC of the FreeBSD-CURRENT client.

I do recall that we were the biggest Cisco IP phone installation they
had ever done, when it went in, but that was a fair number of years ago.

rick


Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Doug Rabson-2


On 9 Nov 2009, at 23:04, Rick Macklem wrote:

>
>
> On Thu, 5 Nov 2009, Rui Paulo wrote:
>
>>> Now, here's where someone may be able to help?
>>> Grep'ng around, I found 4 places where the TCP stack called  
>>> ip_output()
>>> (one in each of tcp_output.c, tcp_subr.c, tcp_timewait.c and  
>>> tcp_syncache.c) and I put a printf like this just before them:
>>> if (flags & TH_RST)
>>> printf("sent a reset\n");
>>>
>>> (The exact format varies for each, because of where the TCP
>>>       header flags are and have different printf messages.)
>>> Now, the weird part is, that when the extraneous RST is sent to the
>>> server, I don't get any printf. (I do get a few of these, but at  
>>> other
>>> times for what appear to be legitimate RSTs.)
>>> I can't see anywhere else that the TCP stack would send an RST  
>>> and, so,
>>> I'm stuck w.r.t. figuring out what is sending them?
> Ok, if you found the previous posts entertaining, you might enjoy  
> this:-)
>
> Along with the printfs before all the ip_output() calls, I added calls
> inside ip_output() and, eventually, even calls in front of every  
> if_output(). Never got anything that indicated an RST was being sent.
> (I only saw what I expected, which was an ACK reply being sent.)
>
> BUT, at almost exactly the same time, there were the FreeBSD8-CURRENT
> client->server RST packets on the server's snoop trace.
>
> Hmm, did a tcpdump in the client and, yes, the same packets were  
> there.
>
> To keep it simple, I had done the dinosaur thing and plugged both the
> client and server into an old, dumb 10baseT hub, so that I could  
> easily
> watch everything. (I also had an uplink cable to the net port in the
> wall, so I could move kernels around from the machine I usually build
> them on.)
>
> I was at the point where I couldn't conceivably figure out where the
> FreeBSD-CURRENT client was generating these RSTs. So guess what...
> --> it wasn't
>
> I unplugged the uplink cable and, no more RSTs. I've been testing for
> long enough now, that I am 99% certain they were being injected. Since
> the from address and even the MAC address is correct, I can only  
> assume
> that it was the Cisco switch that was doing the injecting. (How else
> could a packet come in from the Cisco switch with the MAC address of
> the FreeBSD-CURRENT client machine?)
>
> It was usually triggered by a server reboot. After the server reboot,
> the server does send an RST to the client. This seems legit, but might
> be what makes Cisco think that "bad things" are happening? (I have no
> access to info about the switches or their configuration, although the
> campus standard is for these ports to be used by a single desktop  
> machine
> only and not a switch or hub.)
>
> So, it seems that FreeBSD8-CURRENT reconnects fine now, so long as
> nothing is injecting RSTs into the newly created connection.
>
> Well, I'm not sure I found this fun, but hopefully others are
> entertained:-), rick
>

We had some issues at work where the Cisco intrusion detection thing  
was resetting connections bogusly. Do you have IDS enabled on your  
Cisco gear?

Re: Help needed: TCP Wizards (was 8.0-RC1 NFS client timeout issue)

by Olaf Seibert-2

On Mon 09 Nov 2009 at 18:04:39 -0500, Rick Macklem wrote:
> Well, I'm not sure I found this fun, but hopefully others are
> entertained:-), rick

I have to admit that I am :-) Although reading these kinds of stories is
usually more entertaining than being in them...

Thanks again for your adventures!
-Olaf.
--