WARNING: This server is unstable and will be retired in the next days. If you want to keep this forum available, please request immediately a migration on the Nabble Support forum. Forums that don't receive any migration request will be deleted forever.

server stops handling requests - nowhere near 100% CPU or Memory used

View: New views
10 Messages — Rating Filter:   Alert me  

server stops handling requests - nowhere near 100% CPU or Memory used

by Claudiu Saftoiu :: Rate this Message:

| View Threaded | Show Only this Message

Hello all,

I recently made a lot of changes to my ZODB app, and now I'm experiencing some puzzling behavior.

Normally I have ~15 clients making requests to the server every few seconds. Some of these
requests commit new data to the database, while others just process existing data. I also
have a 'heartbeat' client, which access the view called '/is_alive', which is entirely described
by this function:

    def is_alive(request): return True

Once a day or so, I will get a report from the heartbeat client that the server is down - no heartbeat
has succeeded in the past 10 minutes. Indeed, any URL I go to on the server, will simply not
load, but take forever - even this '/is_alive' view. 

I can still SSH into the server, however. Running 'top', I see that the server is not taking any
CPU time, and is not taking any large amount of memory. The computer itself runs
just fine. I can even access the database with the paster shell, make requests, commit
things, etc., without any exceptional delays. If I CTRL+C the webserver and restart it,
things immediately work just fine. 

Looking at the immediately preceding stdout, I see no hints of errors of any kind. However,
a printline is executed on the server every time two of those clients completes a request,
and I notice that, leading up to the current point in the stdout (where the webserver does not
respond anymore), there are a few "Pipe is broken" messages (which happen when, e.g.,
I go to a view with a web-browser and close the tab before the page loads), and I notice
that requests stopped completing from one of the two clients, then from the other. It seems
a gradual slowdown of some kind, though I'm not entirely certain.

My question is: what could possibly be causing the server to 'lock up', even on a
simple view like 'is_alive', without using any memory or CPU? Is there some ZODB
resource that might be getting gradually exhausted because I'm not handling it properly?

Thanks in advance,
- Claudiu

_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@...
https://mail.zope.org/mailman/listinfo/zodb-dev

Re: server stops handling requests - nowhere near 100% CPU or Memory used

by M.J. Pieters :: Rate this Message:

| View Threaded | Show Only this Message

On Thu, Apr 19, 2012 at 17:20, Claudiu Saftoiu <csaftoiu@...> wrote:
> My question is: what could possibly be causing the server to 'lock up', even
> on a
> simple view like 'is_alive', without using any memory or CPU? Is there some
> ZODB
> resource that might be getting gradually exhausted because I'm not handling
> it properly?

I don't know, anything could lock up your site if something is waiting
for a lock, for example.

Use http://pypi.python.org/pypi/z3c.deadlockdebugger to figure out
what the threads are doing at this time. Preferably, trigger the
dump_threads() method of that module on SIGUSR1, like the Zope
signalstack product does (see
http://svn.plone.org/svn/collective/Products.signalstack/trunk/Products/signalstack/__init__.py
for the exact code to bind the signal handler). That'll tell you
exactly what each thread is busy with when you send the signal.

--
Martijn Pieters
_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@...
https://mail.zope.org/mailman/listinfo/zodb-dev

Re: server stops handling requests - nowhere near 100% CPU or Memory used

by Claudiu Saftoiu :: Rate this Message:

| View Threaded | Show Only this Message

On Thu, Apr 19, 2012 at 11:33 AM, Martijn Pieters <mj@...> wrote:
On Thu, Apr 19, 2012 at 17:20, Claudiu Saftoiu <csaftoiu@...> wrote:
> My question is: what could possibly be causing the server to 'lock up', even
> on a
> simple view like 'is_alive', without using any memory or CPU? Is there some
> ZODB
> resource that might be getting gradually exhausted because I'm not handling
> it properly?

I don't know, anything could lock up your site if something is waiting
for a lock, for example.

Are there locks that could possibly be used for the 'is_alive' function? Here is the
definition in its entirety.

In 'configure.zcml': 
  <view
    view=".views.is_alive"
    name="is_alive"
    renderer="json"
    />
in 'views.py': 
    def is_alive(request):
        return True

Whatever the problem is, it causes 'is_alive' to take forever, and the CPU is not
spinning at 100%, and memory usage is not high. Could this be a lock problem?

Use http://pypi.python.org/pypi/z3c.deadlockdebugger to figure out
what the threads are doing at this time. Preferably, trigger the
dump_threads() method of that module on SIGUSR1, like the Zope
signalstack product does (see
http://svn.plone.org/svn/collective/Products.signalstack/trunk/Products/signalstack/__init__.py
for the exact code to bind the signal handler). That'll tell you
exactly what each thread is busy with when you send the signal.

Thanks, I'll check this out.
- Claudiu

_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@...
https://mail.zope.org/mailman/listinfo/zodb-dev

Re: server stops handling requests - nowhere near 100% CPU or Memory used

by M.J. Pieters :: Rate this Message:

| View Threaded | Show Only this Message

On Thu, Apr 19, 2012 at 18:22, Claudiu Saftoiu <csaftoiu@...> wrote:

> Are there locks that could possibly be used for the 'is_alive' function?
> Here is the
> definition in its entirety.
>
> In 'configure.zcml':
>   <view
>     view=".views.is_alive"
>     name="is_alive"
>     renderer="json"
>     />
> in 'views.py':
>     def is_alive(request):
>         return True
>
> Whatever the problem is, it causes 'is_alive' to take forever, and the CPU
> is not
> spinning at 100%, and memory usage is not high. Could this be a lock
> problem?

I have no idea; all you told is that you use the ZODB, not what server
framework you use to register your views. Is this Grok, Bluebream,
Repoze.BFG, Zope 2 or something else?

I don't think that is_alive would be the cause of this, it looks like
a simple enough view. But if your server uses a thread pool, and all
the other threads are now occupied by something that got locked up,
then it could be that the server is not answering your is_alive
request at all because it is waiting for a thread to free up first.

--
Martijn Pieters
_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@...
https://mail.zope.org/mailman/listinfo/zodb-dev

Re: server stops handling requests - nowhere near 100% CPU or Memory used

by Claudiu Saftoiu :: Rate this Message:

| View Threaded | Show Only this Message

I have no idea; all you told is that you use the ZODB, not what server
framework you use to register your views. Is this Grok, Bluebream,
Repoze.BFG, Zope 2 or something else?

Ah yes, sorry about that. I'm using Repoze.BFG . Does that help any?

I don't think that is_alive would be the cause of this, it looks like
a simple enough view. But if your server uses a thread pool, and all
the other threads are now occupied by something that got locked up,
then it could be that the server is not answering your is_alive
request at all because it is waiting for a thread to free up first.

Yea, I don't think it is the 'is_alive'. I more mentioned that to help
diagnose the problem, i.e., whatever the problem is, it also
affects simple views that don't touch the database at all. 

Hmm, it might be the thread pool issue that you mention. That seems
to make sense. I'll have to see if I have any views that are never finishing. Thanks
for the pointer.

Thanks again,
- Claudiu

_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@...
https://mail.zope.org/mailman/listinfo/zodb-dev

Re: server stops handling requests - nowhere near 100% CPU or Memory used

by Claudiu Saftoiu :: Rate this Message:

| View Threaded | Show Only this Message

I don't know, anything could lock up your site if something is waiting
for a lock, for example.

Use http://pypi.python.org/pypi/z3c.deadlockdebugger to figure out
what the threads are doing at this time. Preferably, trigger the
dump_threads() method of that module on SIGUSR1, like the Zope
signalstack product does (see
http://svn.plone.org/svn/collective/Products.signalstack/trunk/Products/signalstack/__init__.py
for the exact code to bind the signal handler). That'll tell you
exactly what each thread is busy with when you send the signal.

That module seems to be just the trick for seeing if I have
a deadlock issue. I set up the SIGUSR1 thing, but whenever I would
send the signal to the server, it would stop, saying something about
system call interrupting the 'select', so I just made it output the contents
of dump_threads() to a file once a minute. If it happens again I'll look
at the tail end of the file & see what they're all busy with. 

Thanks, seems like this will likely do the trick,
- Claudiu

_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@...
https://mail.zope.org/mailman/listinfo/zodb-dev

Re: server stops handling requests - nowhere near 100% CPU or Memory used

by Claudiu Saftoiu :: Rate this Message:

| View Threaded | Show Only this Message

I don't think that is_alive would be the cause of this, it looks like
a simple enough view. But if your server uses a thread pool, and all
the other threads are now occupied by something that got locked up,
then it could be that the server is not answering your is_alive
request at all because it is waiting for a thread to free up first.

Ah, so it seems that, as I leave the server running longer & longer,
more & more threads are taken up with a `.recv()` call. I think one of
my clients opens requests and does not read them/close them. Eventually
all the threads are blocking in that fashion. 

I will fix my clients. But, is there a server-side fix to this (again, using Repoze.BFG)?
Something to time out the connection after 60 seconds or so if nothing has happened?

Thanks,
- Claudiu

_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@...
https://mail.zope.org/mailman/listinfo/zodb-dev

Re: server stops handling requests - nowhere near 100% CPU or Memory used

by Claudiu Saftoiu :: Rate this Message:

| View Threaded | Show Only this Message

Ah, so it seems that, as I leave the server running longer & longer,
more & more threads are taken up with a `.recv()` call. I think one of
my clients opens requests and does not read them/close them. Eventually
all the threads are blocking in that fashion. 

I will fix my clients. But, is there a server-side fix to this (again, using Repoze.BFG)?
Something to time out the connection after 60 seconds or so if nothing has happened?

Additional info: in particular the blocked threads' stack dumps look like this:

Thread 140605868680960:
  File "/usr/lib/python2.6/threading.py", line 504, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py", line 878, in worker_thread_callback
    runnable()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py", line 1052, in <lambda>
    lambda: self.process_request_in_thread(request, client_address))
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py", line 1068, in process_request_in_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.6/SocketServer.py", line 322, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.6/SocketServer.py", line 617, in __init__
    self.handle()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py", line 442, in handle
    BaseHTTPRequestHandler.handle(self)
  File "/usr/lib/python2.6/BaseHTTPServer.py", line 329, in handle
    self.handle_one_request()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py", line 437, in handle_one_request
    self.wsgi_execute()
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py", line 287, in wsgi_execute
    self.wsgi_start_response)
  File "/home/tsa/env/lib/python2.6/site-packages/repoze/zodbconn/connector.py", line 21, in __call__
    result = self.next_app(environ, start_response)
  File "/home/tsa/env/lib/python2.6/site-packages/repoze/zodbconn/cachecleanup.py", line 25, in __call__
    return self.next_app(environ, start_response)
  File "/home/tsa/env/lib/python2.6/site-packages/repoze/retry/__init__.py", line 65, in __call__
    chunk = original_wsgi_input.read(rest)
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py", line 474, in read
    data = self.file.read(length)
  File "/usr/lib/python2.6/socket.py", line 377, in read
    data = self._sock.recv(left)

Was my assessment of the situation accurate?

_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@...
https://mail.zope.org/mailman/listinfo/zodb-dev

Re: server stops handling requests - nowhere near 100% CPU or Memory used

by Alexandre Garel-2 :: Rate this Message:

| View Threaded | Show Only this Message

Le 20/04/2012 16:06, Claudiu Saftoiu a écrit :
Ah, so it seems that, as I leave the server running longer & longer,
more & more threads are taken up with a `.recv()` call. I think one of
my clients opens requests and does not read them/close them. Eventually
all the threads are blocking in that fashion. 

I will fix my clients. But, is there a server-side fix to this (again, using Repoze.BFG)?
Something to time out the connection after 60 seconds or so if nothing has happened?

Additional info: in particular the blocked threads' stack dumps look like this:

Thread 140605868680960:
  ---------- [snip]----------
  File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py", line 474, in read
    data = self.file.read(length)
  File "/usr/lib/python2.6/socket.py", line 377, in read
    data = self._sock.recv(left)

Was my assessment of the situation accurate?

I would say so, and it seems that what makes the thread hang on is that the socket is not closed by your clients. (otherwise recv would exit with an exception).

Alex
_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@...
https://mail.zope.org/mailman/listinfo/zodb-dev

Re: server stops handling requests - nowhere near 100% CPU or Memory used

by Marius Gedminas-4 :: Rate this Message:

| View Threaded | Show Only this Message

On Fri, Apr 20, 2012 at 10:06:40AM -0400, Claudiu Saftoiu wrote:
> >
> > Ah, so it seems that, as I leave the server running longer & longer,
> > more & more threads are taken up with a `.recv()` call. I think one of
> > my clients opens requests and does not read them/close them.

Does not write, rather.

> > Eventually
> > all the threads are blocking in that fashion.
> >
> > I will fix my clients. But, is there a server-side fix to this (again,
> > using Repoze.BFG)?
> >
>
> Additional info: in particular the blocked threads' stack dumps look like
> this:
> Thread 140605868680960:
>   File "/usr/lib/python2.6/threading.py", line 504, in __bootstrap
>     self.__bootstrap_inner()
>   File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
>     self.run()
>   File "/usr/lib/python2.6/threading.py", line 484, in run
>     self.__target(*self.__args, **self.__kwargs)
>   File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
> line 878, in worker_thread_callback
>     runnable()
>   File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
> line 1052, in <lambda>
>     lambda: self.process_request_in_thread(request, client_address))
>   File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
> line 1068, in process_request_in_thread
>     self.finish_request(request, client_address)
>   File "/usr/lib/python2.6/SocketServer.py", line 322, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/usr/lib/python2.6/SocketServer.py", line 617, in __init__
>     self.handle()
>   File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
> line 442, in handle
>     BaseHTTPRequestHandler.handle(self)
>   File "/usr/lib/python2.6/BaseHTTPServer.py", line 329, in handle
>     self.handle_one_request()
>   File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
> line 437, in handle_one_request
>     self.wsgi_execute()
>   File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
> line 287, in wsgi_execute
>     self.wsgi_start_response)
>   File
> "/home/tsa/env/lib/python2.6/site-packages/repoze/zodbconn/connector.py",
> line 21, in __call__
>     result = self.next_app(environ, start_response)
>   File
> "/home/tsa/env/lib/python2.6/site-packages/repoze/zodbconn/cachecleanup.py",
> line 25, in __call__
>     return self.next_app(environ, start_response)
>   File
> "/home/tsa/env/lib/python2.6/site-packages/repoze/retry/__init__.py", line
> 65, in __call__
>     chunk = original_wsgi_input.read(rest)
>   File "/home/tsa/env/lib/python2.6/site-packages/paste/httpserver.py",
> line 474, in read
>     data = self.file.read(length)
>   File "/usr/lib/python2.6/socket.py", line 377, in read
>     data = self._sock.recv(left)
>
> Was my assessment of the situation accurate?
It appears so.  Here's the code of repoze.retry:
https://github.com/repoze/repoze.retry/blob/master/repoze/retry/__init__.py

Line numbers differ somewhat; the original_wsgi_input.read(rest) is on
line 71 on that github page right now.  It does look like this is the
bit that reads the HTTP request from the client.

> > Something to time out the connection after 60 seconds or so if nothing has
> > happened?

You're using paste.httpserver.  It has support for killing hung threads:
http://pythonpaste.org/modules/httpserver.html#paste.httpserver.server_runner
has a list of options.

You could also put nginx in front, to take care of ill-behaved HTTP
clients.

Marius Gedminas
--
Never be afraid to tell the world who you are.
                -- Anonymous


_______________________________________________
For more information about ZODB, see http://zodb.org/

ZODB-Dev mailing list  -  ZODB-Dev@...
https://mail.zope.org/mailman/listinfo/zodb-dev

signature.asc (205 bytes) Download Attachment