[ tcl-Bugs-2869384 ] Tcl_AsyncMark does not work with --enable-threads


by SourceForge.net

Bugs item #2869384, was opened at 2009-09-29 05:33
Message generated for change (Comment added) made by flatworm
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2869384&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: 04. Async Events
Group: development: 8.6b1.1
Status: Open
Resolution: None
Priority: 6
Private: No
Submitted By: Yevgen Ryazanov (eugene_cdn)
Assigned to: Joe Mistachkin (mistachkin)
Summary: Tcl_AsyncMark does not work with --enable-threads

Initial Comment:
OS: sparc Solaris 10
Tcl_AsyncMark cannot be called in a signal handler because it locks the same mutex (notifierMutex) as Tcl_WaitForEvent. If a signal is handled in the main thread while it is serving Tcl events, a deadlock is very likely. The attached example will produce a stack like this
     -----------------  lwp# 1 / thread# 1  --------------------
     feb40408 lwp_park (0, 0, 0)
     ff0385fc Tcl_MutexLock (ff08247c, 0, 0, 1000, 0, 0) + b4
     ff039624 Tcl_AlertNotifier (2bda8, 0, 0, 1000, 0, 0) + 34
     fefd83a8 Tcl_ThreadAlert (1, 0, 0, 0, 0, 0) + a8
     feedf7e0 Tcl_AsyncMark (1a80d8, 0, 0, 0, 0, 0) + 60
     fee306a8 GotAlarm (e, 0, ffbfeb10, 1, 21c30, 21c2c) + 30
     feb40494 __sighndlr (e, 0, ffbfeb10, fee30678, 0, 1) + c
     feb3558c call_user_handler (e, 0, 4, 0, ff0c2000, ffbfeb10) + 3b8
     feb4187c _write   (4, ff05ce88, 1, 0, 0, 2c0b5) + c
     ff03a2f0 Tcl_WaitForEvent (224e8, fffffffd, ff1966e4, ff3ee0f8, ff3f06e0, 0) + 3c0
     fefd7f74 Tcl_DoOneEvent (0, 6e7f8, 0, 0, 1, 0) + 1ec
     ff17c738 Tk_MainLoop (2c1a8, 31188, 0, 1, 1, 5400) + 48
     ff1966e4 Tk_MainEx (2, ffbff2ac, 10e90, 2c1a8, 4, 4) + b1c
     00010e60 main     (2, ffbff2ac, ffbff2b8, 21000, ff0c0100, ff0c0140) + 50
     000109e0 _start   (0, 0, 0, 0, 0, 0) + 108
     -----------------  lwp# 2 / thread# 2  --------------------
     feb40408 lwp_park (0, 0, 0)
     ff0385fc Tcl_MutexLock (ff08247c, fea7bf1c, fea7be9c, fea7be1c, 0, fea7bea5) + b4
     ff03a8e4 NotifierThreadProc (0, fea7c000, 0, 0, ff03a4a8, 1) + 43c
     feb40368 _lwp_start (0, 0, 0, 0, 0, 0)
in a few seconds. To reproduce the problem, use a regular wish built with --enable-threads and give it the attached script as input:
    wish run.tcl
It will hang shortly. To make it hang sooner, decrease the timeouts. Tk must be serving some events for the application to hang quickly, so the script modifies a text widget in a loop. The build script is attached (the paths need to be modified).

----------------------------------------------------------------------

Comment By: Konstantin Khomoutov (flatworm)
Date: 2009-11-04 15:12

Message:
Note that in order to compile the CVS HEAD with the proposed patch applied
under Linux 2.6.26 I had to tweak the Makefile.

Namely, to compile the recursive-mutex code, _XOPEN_SOURCE must be defined
to be 500 or higher, and as this POSIX compatibility layer makes
strncasecmp() inaccessible, _BSD_SOURCE must also be defined to bring it
back.

So, a temporary solution is to run the usual "./configure" step, then
open the generated Makefile, locate the "AC_FLAGS" variable and add this
line below it:
AC_FLAGS += -D_XOPEN_SOURCE=500 -D_BSD_SOURCE

----------------------------------------------------------------------

Comment By: Joe Mistachkin (mistachkin)
Date: 2009-11-02 13:21

Message:
It seems that my patch fixes something; however, the original test script
still hangs on Linux 2.6.  Thanks to flatworm for helping me test this.  I
would really appreciate it if the original submitter of the bug would run
both tests in their environment and report the results here.

----------------------------------------------------------------------

Comment By: Joe Mistachkin (mistachkin)
Date: 2009-11-02 09:49

Message:
Slightly modified build script and test case to run in my env.

----------------------------------------------------------------------

Comment By: Joe Mistachkin (mistachkin)
Date: 2009-11-02 09:34

Message:
Please try the attached patch in your environment.

----------------------------------------------------------------------

Comment By: Joe Mistachkin (mistachkin)
Date: 2009-11-02 05:56

Message:
The extensive use of locking in the Unix notifier seems to be the cause of
this issue?  I believe that with careful analysis of the locks in the Unix
notifier they can be minimized.  Of special concern is waiting for (an|any)
event while holding any locks because we have no way of knowing precisely
when an event will be triggered.

----------------------------------------------------------------------

Comment By: Yevgen Ryazanov (eugene_cdn)
Date: 2009-10-28 23:47

Message:
Any update? It is not a minor issue. One of the oldest groups of functions
does not work in the situation it was primarily designed for. And there is
no workaround.

----------------------------------------------------------------------

Comment By: Yevgen Ryazanov (eugene_cdn)
Date: 2009-09-29 16:46

Message:
I don't think that a recursive mutex will help. You cannot modify global
data safely in a signal handler without being sure that you are alone.
I agree that the problem cannot be solved without the help of another
thread that does not receive signals.
Please also note that several signals may arrive before the "end of
special handling".
Draft idea:
We need a thread that blocks all signals (except maybe one, usr123) or does
not use notifierMutex. Let's call it right_thread.
AsyncMark:
- if (trylock(&notifierMutex) == 0)
-    // normal code
- lock (&another_mutex)
- queue async_tls_data
- unlock (&another_mutex)
- wake up right_thread (using pthread_kill(right_thread, usr123) or pthread_cond_signal)
right_thread:
- lock (&another_mutex)
- take async_tls_data from queue
- save async_tls_data to right interp
- unlock (&another_mutex)

There is a non-blocking technique to modify global data. May also be used
in any solution.

----------------------------------------------------------------------

Comment By: Konstantin Khomoutov (flatworm)
Date: 2009-09-29 16:19

Message:
It seems this function is actually named pthread_mutex_trylock().

The idea is interesting, but I don't quite get how to implement it:
In the default mode of operation pthread_mutex_trylock() is defined to
return immediately with EBUSY if the mutex lock is held, and the Tcl core
doesn't expect this behaviour from the mutex API it uses (and exposes).
If we instead make all mutexes "recursive" (in terms of pthreads), this
will fix deadlocks occurring in the same thread because of signal handling
but will break expectations in all other cases when a mutex locking
function is supposed to wait on a mutex already locked by some other
thread.

It needs further thinking (and taking into account the threading subsystem
on Windows), as the Tcl mutex API hides platform details from the programmer.
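
Editor's sketch: the difference described above can be observed directly with plain pthreads. The helper below (relock_result is a hypothetical name, not a Tcl or pthreads function) locks a mutex of the given type and then calls pthread_mutex_trylock() on it again from the same thread, returning the result of that second attempt. It is an illustration under these assumptions, not the proposed patch.

```c
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_RECURSIVE on older libcs */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Lock a mutex of type `mutex_type`, then try to take it a second time
 * from the same thread. Returns what the second attempt returned. */
int relock_result(int mutex_type)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc, second;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, mutex_type);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&m);
    assert(rc == 0);

    /* Same thread, lock already held: this never blocks. */
    second = pthread_mutex_trylock(&m);
    if (second == 0)
        pthread_mutex_unlock(&m);   /* undo the recursive acquisition */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return second;
}
```

With PTHREAD_MUTEX_NORMAL the second attempt reports EBUSY, and with PTHREAD_MUTEX_RECURSIVE it succeeds and returns 0. A blocking pthread_mutex_lock() in place of the trylock on the normal mutex would be the self-deadlock discussed in this thread.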

----------------------------------------------------------------------

Comment By: Yevgen Ryazanov (eugene_cdn)
Date: 2009-09-29 15:35

Message:
pthread_trylock may help (together with a global flag or something). I
don't have a clear idea though.

----------------------------------------------------------------------

Comment By: Konstantin Khomoutov (flatworm)
Date: 2009-09-29 15:27

Message:
The problem with this approach is that it implies we control all the
threads existing when our setup code is being run, and postulates the
policy that no thread shall ever be created with signals unmasked, except
the signal-processing thread.

I reckon this is impossible to achieve from an extension implementing
signal handling, and can only possibly be done if integrated directly in
the core, which is questionable, as the concept of signals only exists on
Unix.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2009-09-29 12:54

Message:
I seem to recall (see that 12-year-old paper
http://www.linuxjournal.com/article/2121 ) that the recommended way of
doing pthread+signals is to block all signals in all threads except one,
thus effectively guiding signals to one specific thread that you control.
Does this still work in modern pthreads?
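
Editor's sketch: a common variant of the idea above, and not necessarily what the linked article describes, blocks the signal in every thread and has one dedicated thread collect it synchronously with sigwait(). That thread runs in ordinary thread context, so it may take locks safely. The names signal_thread and route_signal_to_one_thread are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <unistd.h>

/* Dedicated thread: waits for a signal from the set and reports which
 * one arrived. No asynchronous handler is involved. */
static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    int signo = 0;

    if (sigwait(set, &signo) != 0)
        signo = -1;
    return (void *)(intptr_t)signo;
}

/* Block `signo` in the calling thread, start the dedicated thread, send
 * the signal to the whole process and return what that thread received.
 * Note that `signo` stays blocked in the caller afterwards. */
int route_signal_to_one_thread(int signo)
{
    sigset_t set;
    pthread_t tid;
    void *result = NULL;
    int rc;

    sigemptyset(&set);
    sigaddset(&set, signo);

    /* Block before creating any thread: new threads inherit the mask. */
    rc = pthread_sigmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);

    rc = pthread_create(&tid, NULL, signal_thread, &set);
    assert(rc == 0);

    /* Process-directed signal: it stays pending until sigwait() takes it. */
    kill(getpid(), signo);

    rc = pthread_join(tid, &result);
    assert(rc == 0);
    return (int)(intptr_t)result;
}
```

Calling route_signal_to_one_thread(SIGUSR1) from a program's main thread, before any other thread exists, should return SIGUSR1. This works only if every thread in the process has the signal blocked.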

----------------------------------------------------------------------

Comment By: Konstantin Khomoutov (flatworm)
Date: 2009-09-29 12:17

Message:
I hit this while implementing an extension to handle POSIX signals (it is
still unfinished, so not released).

The problem appears to be more complex, and I doubt it can be fixed at all
for every possible case of deadlock, simply because a pthreads mutex is
allowed to deadlock when locked twice from the same thread, and that is
what it does on Linux 2.6 at least.

The problem is that the signal handler bumps into the thread's execution
stack at unpredictable moments, and hence signal handling can be
considered a "superthread" which doesn't play by the usual rules for
threads.

Conceptually, to make Tcl_AsyncMark() (and any other function which tries
to lock the target thread's LTS) not deadlock, we have to make the locking
function attempt locking only if some flag is not raised (meaning that no
lock is held). But using a flag means sharing mutable state, which
inherently implies using another mutex, so we're back at the starting
point.
Masking all signals before locking the LTS mutex is also a bad idea: a)
too many syscalls; b) it doesn't help to prevent mutual deadlocks between
two threads.

So, in the package mentioned above I finally implemented a solution
involving a special manager thread with which signal handlers interact and
which dispatches events to target threads. The solution is complicated but
works. As I intend to release the code under a Tcl-like license, I will be
happy to share it, if you so wish.

----------------------------------------------------------------------

Comment By: Yevgen Ryazanov (eugene_cdn)
Date: 2009-09-29 05:54

Message:
Tried on Linux, Red Hat 4 update 4. The same problem.

----------------------------------------------------------------------


_______________________________________________
Tcl-Bugs mailing list
Tcl-Bugs@...
https://lists.sourceforge.net/lists/listinfo/tcl-bugs