[ tcl-Bugs-2629338 ] segfault on traced vars in TclCallVarTraces

View: New views
1 Messages — Rating Filter:   Alert me  

[ tcl-Bugs-2629338 ] segfault on traced vars in TclCallVarTraces

by SourceForge.net :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Bugs item #2629338, was opened at 2009-02-22 23:40
Message generated for change (Comment added) made by dgp
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2629338&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: 07. Variables
Group: obsolete: 8.5.6
Status: Open
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Don Porter (dgp)
Summary: segfault on traced vars in TclCallVarTraces

Initial Comment:
I've had a bug preventing me from upgrading my product to 8.5 for some time, but I've never been able to narrow it down to a simple example. So, In lieu of that, here's a patch that fixes the bug.

The issue is commonly triggered by tdom, which makes extensive use of variable traces (well, one per dom object.)

The core issue is essentially that TclUnsetVarStruct calls all the unset traces on the variable, then sets about deleting the traces. The problem is, the traces may have already deleted themselves when they were called.  (which tdom's always do.) In particular, the 'head' of the list may have removed itself.

In that case, TclUnsetVarStruct starts iterating over the items to be freed while looking at already-freed memory. Since it was freed so recently, it gets away with this in the normal case, but on rare occasions, the memory has already been reallocated. Hilarity ensues.

I'm not sure if this counts as the same bug or not, but TclDeleteNamespaceVars neglects to update the nextTracePtr for the activeVarTracePtr on the interp, with similar results. I've never seen that bug trigger anywhere but windows, but it shows up fairly often there in our product (on tcl 8.4.13 as well as 8.5.6.)

I also noticed that the VarTrace entries are freed with 'Tcl_EventuallyFree' without updating their next pointers. Since another call might have the current object preserved, but not the next one, the current object might stick around and continue being used while the next object has already been freed. So the patch includes a change to null out the next pointer when calling eventuallyfree. I'm not sure if there's a user-visible bug associated with that or not. I did a cursory check for similar antipatterns in other files for var traces (and found none) but didn't bother checking other link lists.

I will attach the patch.

----------------------------------------------------------------------

>Comment By: Don Porter (dgp)
Date: 2009-11-05 17:56

Message:

To clarify a bit, the patch itself does not worry me.

What worries me is that the bug is still lurking in
there somewhere; this patch was merely a workaround
and not a true fix, and all we've done is make the
true bug even harder to discover.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-11-05 17:52

Message:

I'm very uneasy about this patch, since I still
have no understanding about what bug it is
fixing, nor any test case to confirm that it actually
fixes anything.

The original description says "The core issue is essentially that
TclUnsetVarStruct calls all the unset
traces on the variable, then sets about deleting the traces. The problem
is, the traces may have already deleted themselves when they were called.
(which tdom's always do.) In particular, the 'head' of the list may have
removed itself."

However, UnsetVarStruct moves all trace operations
over to a dummy copy that ought to be out of reach.
tdom does make calls to Tcl_UntraceVar(), true, but
those are no-ops, since they lookup the original
traced variable, which no longer has any traces
to remove since they've all been moved over to
the dummy.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-10-19 17:22

Message:
 keeping open until an actual test is in place.

sorry, misread the patch.  one line in tclTrace.c; many in tclVar.c.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-10-19 17:19

Message:

Only one line got committed on core-8-5-branch?



----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2009-10-17 18:36

Message:
Looking at the patch (as converted by dgp) applied to HEAD (with obvious
offsets) it all looks good to me, so I've applied it (and also to the 8.5
branch). The fact that it's had substantive in-service testing makes me
hopeful...

My only concern is the lack of an effective test for this (and hence
whether the test is minimal) but occasionally that's what one has to put up
with. Leaving this in pending in case anyone thinks of a way to test.
Priority lowered.

----------------------------------------------------------------------

Comment By: Kris Raney (kraney)
Date: 2009-10-01 16:46

Message:
I can add at this point that we've been using TCL with this patch in our
product for nearly a year now, with no recurrences of the original crash,
and with no known regressions caused by the patch.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-04-09 17:19

Message:

tclVar.c got a major major rewrite for 8.5.

I won't be too surprised if that introduced
some bugs, but will make only slow progress
without the help of the rewriter.

The surprising claim is that at least later
8.4.* releases suffer the same bug.  I would
have expected those releases to be quite
stable, unless we can point to some destabilizing
commit.  In particular the general problem
that running trace handler routines can change
the set of traces, I believed to have settled on
a robust solution with the activeTracePtr lists.

Anyhow, I'm not giving up on this, but it doesn't
look like a fix will get accepted in time for the
8.5.7 release.

----------------------------------------------------------------------

Comment By: Kris Raney (kraney)
Date: 2009-04-09 15:27

Message:
That's correct, the bug was observed in 8.4.13 as well, though hitting it
seemed mostly (but not entirely) confined to the windows port.

I believe the bug was not present in 8.4.9, but I wouldn't swear to it.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-04-09 15:19

Message:

Do I understand this bug is observed
in release Tcl 8.4.13 as well as Tcl 8.5.6 ?

If so, has this bug always been present?

If not, is there any guess as to when it
first showed up?


----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-04-09 14:23

Message:

sadly there is very little that is "self-evident"
about the trace implementation.

----------------------------------------------------------------------

Comment By: Kris Raney (kraney)
Date: 2009-03-20 13:55

Message:
No, as I mentioned in the original post, I've never been able to get a
simple example to trigger the bug. My hypothesis is one's memory usage has
to get interesting before it will trigger. That's why I fixed it myself
before entering a ticket. Hopefully, the necessity of the changes I've made
are self-evident, though naturally I'm sure you'd like to have a test case
to try it with. I wish I could oblige.

The bug triggers from namespace delete, from itcl::delete, and when a tdom
DOM object created using the form "tdom createDocument foo var" goes out of
scope. Not on every call to any of those; just on rare occasions. It occurs
most frequently on windows, when using a tclkit based shell. It occurs with
threads enabled, and without threads as well.

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2009-03-20 12:43

Message:

Here's the patch in more customary
format, derived from core-8-5-branch.

Does the original reporter have a short
script to demo the bug?

File Added: 2629338.patch

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2009-02-23 05:45

Message:
Good thing you attached the patch in your initial submission; SF only
allows the submitter (or a project dev) to attach files after submission.
(It's an anti-maliciousness measure.)

----------------------------------------------------------------------

Comment By: Kris Raney (kraney)
Date: 2009-02-22 23:43

Message:
Oops, forgot to log in. That was me.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=2629338&group_id=10894

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Tcl-Bugs mailing list
Tcl-Bugs@...
https://lists.sourceforge.net/lists/listinfo/tcl-bugs