Possible scheduler (SCHED_ULE) bug?


Possible scheduler (SCHED_ULE) bug?

by Jaime Bozza-2

I believe I have found a problem with the ULE scheduler. At the very least I can show that a problem exists, but I'm not sure where to go from here. The system locks up all processes but doesn't panic, so I have no output to give.

I was able to duplicate this on three different machines and solved it by switching the scheduler to 4BSD.

Here's the environment:

FreeBSD 7.2 i386, installed from bootonly ISO, Custom install, minimal, no other changes other than setting timezone, changing root password, and turning on sshd (allowing root and password connection).

Running portsnap (fetch, then extract) to get latest ports tree.

From ports, make installs of lang/php5 and www/lighttpd, using defaults for all ports installed.

Modified lighttpd.conf for PHP (attached diff), created a short script called uploadfile.php (attached).  File was installed at /usr/local/www/data/uploadfile.php

Start lighttpd (lighttpd_enable="YES" in rc.conf, /usr/local/etc/rc.d/lighttpd start), connect and run script.

As long as I upload a file smaller than 64K, everything works fine. If I try to upload something larger than 64K, the system no longer responds. The console login prompt lets me enter a username and password, but nothing happens after that. A console that is already logged in lets me type a single line, but after I press Enter nothing further happens.

No errors get written anywhere - console, logs, etc.

I'm at a loss as to what to do next. Can anyone give me ideas of what else I can try?








--- lighttpd.conf.sample        2009-10-23 09:37:50.000000000 -0500
+++ lighttpd.conf       2009-10-23 10:02:00.000000000 -0500
@@ -20,7 +20,7 @@
 #                               "mod_auth",
 #                               "mod_status",
 #                               "mod_setenv",
-#                               "mod_fastcgi",
+                                "mod_fastcgi",
 #                               "mod_proxy",
 #                               "mod_simple_vhost",
 #                               "mod_evhost",
@@ -39,7 +39,7 @@
 server.document-root        = "/usr/local/www/data/"
 
 ## where to send error-messages to
-server.errorlog             = "/var/log/lighttpd.error.log"
+server.errorlog             = "/tmp/lighttpd.error.log"
 
 # files to check for if .../ is requested
 index-file.names            = ( "index.php", "index.html",
@@ -115,7 +115,7 @@
 # server.tag                 = "lighttpd"
 
 #### accesslog module
-accesslog.filename          = "/var/log/lighttpd.access.log"
+accesslog.filename          = "/tmp/lighttpd.access.log"
 
 ## deny access the file-extensions
 #
@@ -324,3 +324,20 @@
 # Enable IPV6 and IPV4 together
 server.use-ipv6 = "enable"
 $SERVER["socket"] == "0.0.0.0:80" { }
+
+  fastcgi.server = ( ".php" => ((
+    "bin-path" => "/usr/local/bin/php-cgi",
+    "socket" => "/tmp/hermes.php.socket",
+    "min-procs" => 1,
+    "max-procs" => 1,
+    "bin-environment" => (
+      "PHP_FCGI_CHILDREN" => "4",
+      "PHP_FCGI_MAX_REQUESTS" => "50",
+      "PHPRC" => "/data/sites/support/conf/"
+      ),
+    "bin-copy-environment" => (
+      "PATH", "SHELL", "USER", "TZ"
+      ),
+    "broken-scriptfilename" => "enable",
+  )))
+
\

_______________________________________________
freebsd-stable@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@..."

Attachment: uploadfile.php (880 bytes)

Re: Possible scheduler (SCHED_ULE) bug?

by Scott Ullrich

On Fri, Oct 23, 2009 at 2:46 PM, Jaime Bozza <jbozza@...> wrote:

> I believe I found a problem with the ULE scheduler - At least the fact that there is a problem, but I'm not sure where to go from here.   The system locks all processes, but doesn't panic, so I have no output to give.
>
> I was able to duplicate this on three different machines and solved it by switching the scheduler to 4BSD.
>
> Here's the environment:
>
> FreeBSD 7.2 i386, installed from bootonly ISO, Custom install, minimal, no other changes other than setting timezone, changing root password, and turning on sshd (allowing root and password connection).
>
> Running portsnap (fetch, then extract) to get latest ports tree.
>
> From ports, make installs of lang/php5 and www/lighttpd, using defaults for all ports installed.
>
> Modified lighttpd.conf for PHP (attached diff), created a short script called uploadfile.php (attached).  File was installed at /usr/local/www/data/uploadfile.php
>
> Start lighttpd (lighttpd_enable="YES" in rc.conf, /usr/local/etc/rc.d/lighttpd start), connect and run script.
>
> As long as I upload a file less than 64K, everything works fine.  If I try to upload something larger than 64K, system no longer responds.   Console prompt at login will allow me to enter username/password, but nothing happens after that.  Console prompt logged in will allow me to type a single line, but if I press enter, nothing after that.
>
> No errors get written anywhere - console, logs, etc.
>
> I'm at a loss of what to do next.  Can anyone give me ideas of what else I can do?

Try adding this or changing these items in lighttpd.conf:

## FreeBSD!
server.event-handler = "freebsd-kqueue"
server.network-backend = "writev"

Scott

Re: Possible scheduler (SCHED_ULE) bug?

by Dylan Cochran-2

On 10/23/09, Jaime Bozza <jbozza@...> wrote:

> I believe I found a problem with the ULE scheduler - At least the fact that
> there is a problem, but I'm not sure where to go from here.   The system
> locks all processes, but doesn't panic, so I have no output to give.
>
> I was able to duplicate this on three different machines and solved it by
> switching the scheduler to 4BSD.
>
> Here's the environment:
>
> FreeBSD 7.2 i386, installed from bootonly ISO, Custom install, minimal, no
> other changes other than setting timezone, changing root password, and
> turning on sshd (allowing root and password connection).
>
> Running portsnap (fetch, then extract) to get latest ports tree.
>
> From ports, make installs of lang/php5 and www/lighttpd, using defaults for
> all ports installed.
>
> Modified lighttpd.conf for PHP (attached diff), created a short script
> called uploadfile.php (attached).  File was installed at
> /usr/local/www/data/uploadfile.php
>
> Start lighttpd (lighttpd_enable="YES" in rc.conf,
> /usr/local/etc/rc.d/lighttpd start), connect and run script.
>
> As long as I upload a file less than 64K, everything works fine.  If I try
> to upload something larger than 64K, system no longer responds.   Console
> prompt at login will allow me to enter username/password, but nothing
> happens after that.  Console prompt logged in will allow me to type a single
> line, but if I press enter, nothing after that.
>
> No errors get written anywhere - console, logs, etc.
>
> I'm at a loss of what to do next.  Can anyone give me ideas of what else I
> can do?

Superficially, this seems identical to a deadlock I reported for
7.1-RC1. Would you mind compiling a kernel with these options:

options DDB
options KDB
options SW_WATCHDOG
options DEBUG_VFS_LOCKS


then add the following to /etc/rc.conf:

watchdogd_enable="YES"
watchdogd_flags="-e 'ls -al /etc'"

This should force a panic when the lockup happens again, which will
drop to a debugger.
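
For anyone following along, one way to add the kernel options listed above without editing GENERIC is a small config file that includes it. This is only a sketch: the config name DEBUGULE is made up for the example, and the build commands shown in the comments assume a stock /usr/src tree on the i386 test machine.

```sh
# Print a kernel config that layers the debugging options on top of GENERIC.
# DEBUGULE is a hypothetical name chosen for this example.
debug_kernconf() {
    cat <<'EOF'
include GENERIC
ident   DEBUGULE
options DDB
options KDB
options SW_WATCHDOG
options DEBUG_VFS_LOCKS
EOF
}

# On the test machine (as root) the usual sequence would then be:
#   debug_kernconf > /usr/src/sys/i386/conf/DEBUGULE
#   cd /usr/src
#   make buildkernel KERNCONF=DEBUGULE && make installkernel KERNCONF=DEBUGULE
#   shutdown -r now
```

Keeping the changes in a separate file makes it easy to go back to the stock kernel once the backtrace has been captured.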

Please check the backtrace, and tell me if the call stack is the same
as this one (between the --- interrupt, and --- syscall sections):

KDB: stack backtrace:
db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...)
at db_trace_self_wrapper+0x26
kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
Xtimerint() at Xtimerint+0x1f
--- interrupt, eip = 0xc07ff29d, esp = 0xe66e0b48, ebp = 0xe66e0c34 ---
kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1
sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
syscall(e66e0d38) at syscall+0x335
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (393, FreeBSD ELF32, sendfile), eip = 0x282cb0cb, esp =
0xbfbfc7cc, ebp = 0xbfbfe848 ---
KDB: enter: watchdog timeout

You can type 'reboot' to reboot the machine (in my case, panic would
not work, so a useful dump wasn't in the cards)

RE: Possible scheduler (SCHED_ULE) bug?

by Jaime Bozza-2

> Try adding this or changing these items in lighttpd.conf:
>
> ## FreeBSD!
> server.event-handler = "freebsd-kqueue"
> server.network-backend = "writev"

Scott,

Lighttpd was already using freebsd-kqueue, but I added the writev network-backend and the problem went away.   With this additional information I was able to track down kern/138999, which seems to be the exact issue I'm having.

The additional information I have (over the PR) is that:
1) Any file over 64K causes the problem, not just much larger files.
2) Switching over to SCHED_4BSD eliminates the problem: the system no longer locks up.
3) 7.2 amd64 doesn't have the problem. I tested a similar configuration and was not able to duplicate it on amd64 at all.

I'm CC'ing the original submitter of the PR to give him an update to see if he had any additional luck.

Jaime






Re: Possible scheduler (SCHED_ULE) bug?

by Kostik Belousov

On Fri, Oct 23, 2009 at 04:28:59PM -0400, Dylan Cochran wrote:

> On 10/23/09, Jaime Bozza <jbozza@...> wrote:
> > I believe I found a problem with the ULE scheduler - At least the fact that
> > there is a problem, but I'm not sure where to go from here.   The system
> > locks all processes, but doesn't panic, so I have no output to give.
> >
> > I was able to duplicate this on three different machines and solved it by
> > switching the scheduler to 4BSD.
> >
> > Here's the environment:
> >
> > FreeBSD 7.2 i386, installed from bootonly ISO, Custom install, minimal, no
> > other changes other than setting timezone, changing root password, and
> > turning on sshd (allowing root and password connection).
> >
> > Running portsnap (fetch, then extract) to get latest ports tree.
> >
> > From ports, make installs of lang/php5 and www/lighttpd, using defaults for
> > all ports installed.
> >
> > Modified lighttpd.conf for PHP (attached diff), created a short script
> > called uploadfile.php (attached).  File was installed at
> > /usr/local/www/data/uploadfile.php
> >
> > Start lighttpd (lighttpd_enable="YES" in rc.conf,
> > /usr/local/etc/rc.d/lighttpd start), connect and run script.
> >
> > As long as I upload a file less than 64K, everything works fine.  If I try
> > to upload something larger than 64K, system no longer responds.   Console
> > prompt at login will allow me to enter username/password, but nothing
> > happens after that.  Console prompt logged in will allow me to type a single
> > line, but if I press enter, nothing after that.
> >
> > No errors get written anywhere - console, logs, etc.
> >
> > I'm at a loss of what to do next.  Can anyone give me ideas of what else I
> > can do?
>
> Superficially, this seems identical to a deadlock I reported for
> 7.1-RC1. Would you mind compiling a kernel with these options:
>
> options DDB
> options KDB
> options SW_WATCHDOG
> options DEBUG_VFS_LOCKS
>
>
> then add the following to /etc/rc.conf:
>
> watchdogd_enable="YES"
> watchdogd_flags="-e 'ls -al /etc'"
>
> This should force a panic when the lockup happens again, which will
> drop to a debugger.
>
> Please check the backtrace, and tell me if the call stack is the same
> as this one (between the --- interrupt, and --- syscall sections):
>
> KDB: stack backtrace:
> db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...)
> at db_trace_self_wrapper+0x26
> kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
> hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
> lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
> Xtimerint() at Xtimerint+0x1f
> --- interrupt, eip = 0xc07ff29d, esp = 0xe66e0b48, ebp = 0xe66e0c34 ---
> kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
> do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1
> sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
> syscall(e66e0d38) at syscall+0x335
> Xint0x80_syscall() at Xint0x80_syscall+0x20
> --- syscall (393, FreeBSD ELF32, sendfile), eip = 0x282cb0cb, esp =
> 0xbfbfc7cc, ebp = 0xbfbfe848 ---
> KDB: enter: watchdog timeout

Can you look up the source line for kern_sendfile+0x90d in your
kernel? Run kgdb kernel.debug, then execute "list *(kern_sendfile+0x90d)".


Attachment: attachment0 (203 bytes)

Re: Possible scheduler (SCHED_ULE) bug?

by Jacob Myers-2

Jaime Bozza wrote:
[snip]
> The additional information I have (over the PR) is that:
> 1) Files over 64K cause the problem, not just larger files
I thought it was over 1 MB or so. But maybe I'm wrong. ISTR that I
couldn't trigger it with some images of around 70K.

> 2) switching over to SCHED_4BSD eliminates the problem - system no longer locks.  
I will have to test this. This is indeed interesting...

> 3) 7.2 amd64 doesn't have the problem - Tested a similar configuration and was not able to duplicate on amd64 at all.
I can replicate this problem on FreeBSD 7.2/amd64 reliably.

--
Jacob Myers <Jacob@...> | Website: http://whotookspaz.org
Network Admin, Wilcox Technologies  | Public key: 186A424A
Using FreeBSD since 2007            | Public shell: http://bit.ly/42iGCR
Answer a fool according to his folly, lest he be wise in his own conceit
        -- Proverbs, 26:5

RE: Possible scheduler (SCHED_ULE) bug?

by Jaime Bozza-2

> > The additional information I have (over the PR) is that:
> > 1) Files over 64K cause the problem, not just larger files
> I thought it was over 1 MB or so. But maybe I'm wrong. ISTR that I
> couldn't trigger it with some images of around 70K.

I discovered it originally with a 72K file.  After some tests, I found a 63K file worked and a 65K file didn't.  When I get back into the office, I can test the actual boundary (65535, 65536, 65537, etc), but 64K seems pretty logical.
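
To pin down the exact boundary, files of precisely 65535, 65536 and 65537 bytes can be generated and uploaded one at a time. A minimal sketch, assuming zero-filled files are enough to trigger the hang; the function name and the file naming are made up for the example:

```sh
# Create zero-filled test files that straddle the suspected 64K boundary.
# make_boundary_files and the upload_<size>.bin names are hypothetical.
make_boundary_files() {
    dir=$1
    mkdir -p "$dir" || return 1
    for size in 65535 65536 65537; do
        # bs=1 with count=N writes exactly N bytes.
        dd if=/dev/zero of="$dir/upload_$size.bin" bs=1 count="$size" 2>/dev/null || return 1
    done
}
```

Running `make_boundary_files /tmp/boundary` and then uploading each file in turn through uploadfile.php should show whether the hang starts at 65536 or at 65537 bytes.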

> > 2) switching over to SCHED_4BSD eliminates the problem - system no
> > longer locks.
> I will have to test this. This is indeed interesting...
>
> > 3) 7.2 amd64 doesn't have the problem - Tested a similar
> > configuration and was not able to duplicate on amd64 at all.
> I can replicate this problem on FreeBSD 7.2/amd64 reliably.

I haven't tried larger files. Maybe the boundary is different on amd64? Doing some quick tests right now, I was able to upload a 100MB file without a problem, but this is an amd64 system with SMP, and the filesystem is all ZFS, so too many things are different. I'll have to set up a system that closely mirrors the rest of my tests (UFS, ULE, no SMP, etc.) before I can say I'm not having a problem there.

Jaime


Re: Possible scheduler (SCHED_ULE) bug?

by Arnaud Houdelette

Jaime Bozza wrote:

>>> The additional information I have (over the PR) is that:
>>> 1) Files over 64K cause the problem, not just larger files
>> I thought it was over 1 MB or so. But maybe I'm wrong. ISTR that I
>> couldn't trigger it with some images of around 70K.
>
> I discovered it originally with a 72K file.  After some tests, I found a 63K file worked and a 65K file didn't.  When I get back into the office, I can test the actual boundary (65535, 65536, 65537, etc), but 64K seems pretty logical.
>
>>> 2) switching over to SCHED_4BSD eliminates the problem - system no
>>> longer locks.
>> I will have to test this. This is indeed interesting...
>>
>>> 3) 7.2 amd64 doesn't have the problem - Tested a similar
>>> configuration and was not able to duplicate on amd64 at all.
>> I can replicate this problem on FreeBSD 7.2/amd64 reliably.
>
> I haven't tried larger files - Maybe the boundary is different on amd64?   Doing some quick tests right now, I was able to upload a 100MB file without a problem, but this is an AMD64 system with SMP, plus the filesystem is all ZFS, so there are too many things different.  I'll have to setup a system that closely mirrors the rest of my tests (UFS, ULE, no SMP, etc) before I can say I'm not having a problem there.
>
> Jaime
>
I had the same issue using 7.1 amd64, with ZFS, no SMP.
I'm not really sure what the size boundary is. I can't really test either,
as the machine is remote.
But I can confirm that every attempt to upload certain relatively 'big'
files (around 1MB) with WordPress hung the system until I switched
from sendfile to writev.

I could run some tests on amd64 7.2 with no SMP if that would be of any use.

Arnaud

Re: Possible scheduler (SCHED_ULE) bug?

by Jacob Myers-2

Arnaud Houdelette wrote:

> I had the same issue using 7.1 amd64, with ZFS, no SMP.
> Not really sure what is the size boundary. I can't really test either,
> as the machine is remote.
> But I confirm that each tentative upload of certain relatively 'big'
> files (around 1MB) with wordpress hanged the system before I switched
> from sendfile to writev.
>
> I might do some test on amd64 7.2 with no SMP if it can be of any use ?
>
> Arnaud
I can confirm it happens without SMP on 7.2 and amd64. If you can give
it a try though, well, the more information the better. Any boundary
information, even approximate (well, mostly testing if 64K is the
boundary or if 1 MB or so is) would probably be good, too.

--
Jacob Myers <Jacob@...> | Website: http://whotookspaz.org
Network Admin, Wilcox Technologies  | Public key: 186A424A
Using FreeBSD since 2007            | Public shell: http://bit.ly/42iGCR
Answer a fool according to his folly, lest he be wise in his own conceit
        -- Proverbs, 26:5



Attachment: signature.asc (915 bytes)

RE: Possible scheduler (SCHED_ULE) bug?

by Jaime Bozza-2

From: Jacob Myers [mailto:jacob@...]

> Arnaud Houdelette wrote:
> > I had the same issue using 7.1 amd64, with ZFS, no SMP.
> > Not really sure what is the size boundary. I can't really test either,
> > as the machine is remote.
> > But I confirm that each tentative upload of certain relatively 'big'
> > files (around 1MB) with wordpress hanged the system before I switched
> > from sendfile to writev.
> >
> > I might do some test on amd64 7.2 with no SMP if it can be of any use ?
> >
> > Arnaud
>
> I can confirm it happens without SMP on 7.2 and amd64. If you can give
> it a try though, well, the more information the better. Any boundary
> information, even approximate (well, mostly testing if 64K is the
> boundary or if 1 MB or so is) would probably be good, too.

I haven't tested the specific boundaries yet, but I will do that shortly.

I *was* able to get a crash dump on the i386 system - Will post the details shortly.

My amd64 system is a test system with ZFS, so I couldn't get a crash dump.   Trying to work around that.

On both systems, I used a 72K file (73,688 bytes) to test.  Both systems would "lock up", and then a few seconds later kdb would come up.   It wasn't an immediate thing, at least not on the i386 system.  I wasn't able to watch the amd64 system since it's too far away to time.

Jaime


RE: Possible scheduler (SCHED_ULE) bug?

by Jaime Bozza-2



Sincerely,

Jaime Bozza
MindSites Group, LLC


From: Dylan Cochran [mailto:heliocentric@...]

> Superficially, this seems identical to a deadlock I reported for
> 7.1-RC1. Would you mind compiling a kernel with these options:
>
> <snip>
> KDB: stack backtrace:
> db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...)
> at db_trace_self_wrapper+0x26
> kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
> hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
> lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
> Xtimerint() at Xtimerint+0x1f
> --- interrupt, eip = 0xc07ff29d, esp = 0xe66e0b48, ebp = 0xe66e0c34 ---
> kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
> do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at
> do_sendfile+0xb1
> sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
> syscall(e66e0d38) at syscall+0x335
> Xint0x80_syscall() at Xint0x80_syscall+0x20
> --- syscall (393, FreeBSD ELF32, sendfile), eip = 0x282cb0cb, esp =
> 0xbfbfc7cc, ebp = 0xbfbfe848 ---
> KDB: enter: watchdog timeout
>
> You can type 'reboot' to reboot the machine (in my case, panic would
> not work, so a useful dump wasn't in the cards)

Different offset on mine, but of course I'm using a different kernel.  
kern_sendfile+0x6ad
do_sendfile+0xb1
sendfile+0x13

Luckily, I was able to get a panic, so I have all the files necessary to debug.  Here's the backtrace:

(kgdb) backtrace
#0  doadump () at pcpu.h:196
#1  0xc07f2c57 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc07f2f62 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc0497e47 in db_panic (addr=Could not find the frame base for "db_panic".
) at /usr/src/sys/ddb/db_command.c:446
#4  0xc04985bc in db_command (last_cmdp=0xc0ca9154, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:413
#5  0xc04986ca in db_command_loop () at /usr/src/sys/ddb/db_command.c:466
#6  0xc049a17d in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228
#7  0xc081fdf6 in kdb_trap (type=3, code=0, tf=0xc72e2a5c) at /usr/src/sys/kern/subr_kdb.c:524
#8  0xc0b01b9b in trap (frame=0xc72e2a5c) at /usr/src/sys/i386/i386/trap.c:692
#9  0xc0ae58fb in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#10 0xc081ff7a in kdb_enter_why (why=0xc0b677b2 "watchdog", msg=0xc0b7ef1d "watchdog timeout") at cpufunc.h:60
#11 0xc07b0cad in hardclock (usermode=0, pc=3229966301) at /usr/src/sys/kern/kern_clock.c:640
#12 0xc0aedf1c in lapic_handle_timer (frame=0xc72e2afc) at /usr/src/sys/i386/i386/local_apic.c:785
#13 0xc0ae5edf in Xtimerint () at apic_vector.s:108
#14 0xc0855fdd in kern_sendfile (td=0xc771db40, uap=0xc72e2cfc, hdr_uio=0x0, trl_uio=0x0, compat=0) at atomic.h:160
#15 0xc0856d31 in do_sendfile (td=0xc771db40, uap=0xc72e2cfc, compat=0) at /usr/src/sys/kern/uipc_syscalls.c:1775
#16 0xc0856dd3 in sendfile (td=0xc771db40, uap=0xc72e2cfc) at /usr/src/sys/kern/uipc_syscalls.c:1746
#17 0xc0b01365 in syscall (frame=0xc72e2d38) at /usr/src/sys/i386/i386/trap.c:1094
#18 0xc0ae5960 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:262
#19 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

This is all a bit new to me (debugging, etc), so let me know if you need anything else!

Jaime


RE: Possible scheduler (SCHED_ULE) bug?

by Jaime Bozza-2

From: Kostik Belousov [mailto:kostikbel@...]
> Can you look up the source line for kern_sendfile+0x90d in your
> kernel ? Do kgdb kernel.debug, then execute "list *(kern_sendfile+0x90d)".

In my case, it was kern_sendfile+0x6ad (rebuilt with RELENG_7 this weekend).

Here's the output:

(kgdb) list *(kern_sendfile+0x6ad)
0xc0855fdd is in kern_sendfile (atomic.h:160).
155     static __inline int
156     atomic_cmpset_int(volatile u_int *dst, u_int exp, u_int src)
157     {
158             u_char res;
159
160             __asm __volatile(
161             "       " MPLOCKED "            "
162             "       cmpxchgl %2,%1 ;        "
163             "       sete    %0 ;            "
164             "1:                             "

Not much to go on there.  I posted a backtrace in a previous email, but the relevant sections (I think) are:

#14 0xc0855fdd in kern_sendfile (td=0xc771db40, uap=0xc72e2cfc, hdr_uio=0x0, trl_uio=0x0, compat=0) at atomic.h:160
#15 0xc0856d31 in do_sendfile (td=0xc771db40, uap=0xc72e2cfc, compat=0) at /usr/src/sys/kern/uipc_syscalls.c:1775
#16 0xc0856dd3 in sendfile (td=0xc771db40, uap=0xc72e2cfc) at /usr/src/sys/kern/uipc_syscalls.c:1746
#17 0xc0b01365 in syscall (frame=0xc72e2d38) at /usr/src/sys/i386/i386/trap.c:1094
#18 0xc0ae5960 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:262
#19 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
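
The addresses in these frames can be cross-checked with a little shell arithmetic. Frame #11 of the full backtrace reports the interrupted pc in decimal (pc=3229966301), while frame #14 and the kgdb listing use hex. A small sketch using only values copied from the output above; the variable names are made up for the example:

```sh
# Values copied from the kgdb output.
frame14=0xc0855fdd        # frame #14, inside kern_sendfile
hardclock_pc=3229966301   # frame #11, printed in decimal
offset=0x6ad              # from kern_sendfile+0x6ad

# Convert the interrupted pc to hex so it can be compared with frame #14.
pc_hex=$(printf '0x%x' "$hardclock_pc")
# Subtract the offset to get the address where kern_sendfile starts.
func_start=$(printf '0x%x' $(( $frame14 - $offset )))

echo "hardclock pc  = $pc_hex"
echo "kern_sendfile = $func_start"
```

The pc comes out as 0xc0855fdd, the same address as frame #14, so the watchdog interrupted the CPU at kern_sendfile+0x6ad, and kern_sendfile itself starts at 0xc0855930 in this kernel.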

I'm still going to test the specific boundary, but if there's more information I can give, let me know!

Jaime



RE: Possible scheduler (SCHED_ULE) bug?

by Jaime Bozza-2

From: Arnaud Houdelette [mailto:arnaud.houdelette@...]

> > I haven't tried larger files - Maybe the boundary is different on amd64?   Doing some quick tests
> > right now, I was able to upload a 100MB file without a problem, but this is an AMD64 system with SMP,
> > plus the filesystem is all ZFS, so there are too many things different.  I'll have to setup a system
> > that closely mirrors the rest of my tests (UFS, ULE, no SMP, etc) before I can say I'm not having a
> > problem there.
> >
> > Jaime
> >
> I had the same issue using 7.1 amd64, with ZFS, no SMP.
> Not really sure what is the size boundary. I can't really test either,
> as the machine is remote.
> But I confirm that each tentative upload of certain relatively 'big'
> files (around 1MB) with wordpress hanged the system before I switched
> from sendfile to writev.
>
> I might do some test on amd64 7.2 with no SMP if it can be of any use ?
>
> Arnaud

I was able to duplicate the problem on 7.2-STABLE amd64 no SMP - Problem didn't seem to happen with SMP on.  While I wasn't able to get a crash dump, the crash looked similar.

Jaime


Re: Possible scheduler (SCHED_ULE) bug?

by Jacob Myers-2

Jaime Bozza wrote:

> From: Arnaud Houdelette [mailto:arnaud.houdelette@...]
>>> I haven't tried larger files - Maybe the boundary is different on amd64?   Doing some quick tests
>>> right now, I was able to upload a 100MB file without a problem, but this is an AMD64 system with SMP,
>>> plus the filesystem is all ZFS, so there are too many things different.  I'll have to setup a system
>>> that closely mirrors the rest of my tests (UFS, ULE, no SMP, etc) before I can say I'm not having a
>>> problem there.
>>>
>>> Jaime
>>>
>> I had the same issue using 7.1 amd64, with ZFS, no SMP.
>> Not really sure what is the size boundary. I can't really test either,
>> as the machine is remote.
>> But I confirm that each tentative upload of certain relatively 'big'
>> files (around 1MB) with wordpress hanged the system before I switched
>> from sendfile to writev.
>>
>> I might do some test on amd64 7.2 with no SMP if it can be of any use ?
>>
>> Arnaud
>
> I was able to duplicate the problem on 7.2-STABLE amd64 no SMP - Problem didn't seem to happen with SMP on.  While I wasn't able to get a crash dump, the crash looked similar.
>
> Jaime
>
FWIW, there was a fix committed for this:
http://svn.freebsd.org/viewvc/base?view=revision&revision=198853
See if it helps.

--
Jacob Myers <Jacob@...> | Website: http://whotookspaz.org
Network Admin, Wilcox Technologies  | Public key: 186A424A
Using FreeBSD since 2007            | Public shell: http://bit.ly/42iGCR
Answer a fool according to his folly, lest he be wise in his own conceit
        -- Proverbs, 26:5



Attachment: signature.asc (915 bytes)

Re: Possible scheduler (SCHED_ULE) bug?

by Attilio Rao-2

2009/10/23 Jaime Bozza <jbozza@...>:
> I believe I found a problem with the ULE scheduler - At least the fact that there is a problem, but I'm not sure where to go from here.   The system locks all processes, but doesn't panic, so I have no output to give.
>
> I was able to duplicate this on three different machines and solved it by switching the scheduler to 4BSD.
>
> Here's the environment:
>
> FreeBSD 7.2 i386, installed from bootonly ISO, Custom install, minimal, no other changes other than setting timezone, changing root password, and turning on sshd (allowing root and password connection).

Did you recompile your kernel? Can you show me the revision of
src/sys/kern/sched_ule.c you used?

Attilio


--
Peace can only be achieved by understanding - A. Einstein