|
View:
New views
20 Messages
—
Rating Filter:
Alert me
|
| < Prev | 1 - 2 - 3 - 4 | Next > |
|
|
|
|
|
|
|
|
the sorry saga of the talloc soname 'fix'Hi Jeremy and Simo,
> Thanks a *lot* for this one Simo, much appreciated. I'm finding all this congratulation rather disturbing. The patch from Simo now creates the very problem you are all so keen to avoid. With the soname bump we had lots of standard mechanisms in place (both packaging and loader) to try to stop having two versions of the library in place at the same time. It would be detected at package install time, and also at runtime by Metze's patch. Now let's look at what happens with this much applauded patch from Simo. 1) the major so versions of the libraries are now the same, so you're telling the packaging managers and the loader that they are compatible. So distros and users will quite happily mix the versions now. 2) the code is not in fact now compatible, as the patch from Metze is still in there, so if anyone actually tries to load both of them, then it will abort(), with no feedback to the user on what is actually wrong, and no mechanism for them to fix it except to manually recompile one or both of the offending packages, if they can even work out what packages need changing. As for avoiding this with symbol versioning, that's all well and good on platforms that have symbol versioning, but many of the platforms that we claim we support don't have that, so now we've left them out in the cold. The so major number change is a crude instrument for handling changes in versions of libraries, but it is portable to all shared library systems I know of, and it works. I really don't understand why there is this revulsion at using the library version number for exactly what it was designed for. Cheers, Tridge |
|
|
Re: the sorry saga of the talloc soname 'fix'Hi again,
I forgot to mention one more problem with what Simo has now setup, just in case you are now tempted to remove Metze's magic number check as a way to 'fix' the problem I pointed out in my last email. The fix for the bug that started this whole discussion also involved a change to the internal struct talloc_reference_handle. By changing the .so number this change was safe, as the linker and package manager (plus Metze's paranoia patch!) will ensure they aren't mixed at runtime. But now that Simo has reverted that change and thus explicitly allowed the old code and the new code to reside in the same running process, we will be mixing the old structure with the new one. This means we at minimum will get valgrind errors where we reference memory beyond the end of the allocated structure. I wonder if it is even an exploitable security hole? So yes, congratulations all round are in order. You've just overridden the talloc package maintainer by introducing real bugs, made us non-portable, introduced silent and difficult to track failures of applications, broken the ABI promises, lied to the distro package managers and loader and generally had a great time. But at least we haven't brought the good name of free software into disrepute by using up a precious .so number, so it was all worth it. Cheers, Tridge |
|
|
Re: the sorry saga of the talloc soname 'fix'On Sat, 2009-07-04 at 11:24 +1000, tridge@... wrote:
> Hi Jeremy and Simo, > > > Thanks a *lot* for this one Simo, much appreciated. > > I'm finding all this congratulation rather disturbing. The patch from > Simo now creates the very problem you are all so keen to avoid. > > With the soname bump we had lots of standard mechanisms in place (both > packaging and loader) to try to stop having two versions of the > library in place at the same time. It would be detected at package > install time, and also at runtime by Metze's patch. Tridge, let's start with a very simple assumption I think we can all agree on. If, in the same process, you end up with 2 different versions of the same library you are pretty much doomed no matter what. (Sure you can be lucky sometimes but in general you are just waiting on a ticking bomb). So, based on this very basic assumption, we can proceed in understanding what happens with or without the soname bump. > Now let's look at what happens with this much applauded patch from > Simo. > > 1) the major so versions of the libraries are now the same, so you're > telling the packaging managers and the loader that they are > compatible. So distros and users will quite happily mix the versions > now. What we are telling packagers is that if you have an application currently using libtalloc.so.1.2.0 and you upgrade to libtalloc.so.1.4.0 then all is fine. And indeed it is. You are not using 2 libraries, you are just using a newer one that provides the same interfaces as the old one, in short it is ABI compatible. It is generally understood that it is ok to use a newer version with an older program. It is also generally understood that if App X is linked against lib Y v.1.5 it is not really a good idea to try to use it with lib Y v.1.1 This is reflected in package dependencies. You will see that samba version 3.2.x which uses talloc 1.2.0 has a dependency on the talloc package with version >= 1.2.0 And if you try to use that samba version with libtalloc 1.4.0 you will see it works. Likewise the next samba package that uses 1.4.0 will have a dependency on libtalloc >= 1.4.0 so, at install time, the user will be told to upgrade libtalloc as well. > 2) the code is not in fact now compatible, as the patch from Metze is > still in there, so if anyone actually tries to load both of them, > then it will abort(), with no feedback to the user on what is > actually wrong, and no mechanism for them to fix it except to > manually recompile one or both of the offending packages, if they can > even work out what packages need changing. No, this can't happen, for the simple reason that you have only one version of the library with the same soname in any given process, for one soname version the dynamic loader will always find the same library. Whatever is internal to the library is not exposed to the rest of the application so there is no code compatibility issue. Of course, if the magic number were exposed to the application (for example in the header files), and the app relied on that number outside of the library itself, than the change would be basically a violation of the ABI. The linker looks at the major soname version by searching for libtalloc.so.1 That's why we normally have a symbolic link like: libtalloc.so.1 -> libtalloc.so.1.2.0 (although we could install the library directly as libtalloc.so.1 without any problem and indeed it is so in some versions of Fedora IIRC). So in /usr/lib you can point only at one version with soname .1 and that means you can't have 2 versions with the same soname loaded at the same time, as the dynamic loader will pick up only one. So the only real consequence of a program linked against libtalloc.so.1.4.0 that finds libtalloc.so.1.2.0 is that it will fail to start because the loader will not find some of the new symbols that are present in 1.4.0 but not in 1.2.0. This will immediately tell the user that they are using the wrong library version. > As for avoiding this with symbol versioning, that's all well and good > on platforms that have symbol versioning, but many of the platforms > that we claim we support don't have that, so now we've left them out > in the cold. Well, we are not using symbol versioning at the moment anyway. We could if we wanted to commit to an even stronger ABI promise, and we would not let platforms without it any more in the cold than they are right now. Platforms with poor linkers/loaders and no packaging system already know the dangers of dynamic libaries, that's why most people tend to build static binaries on them. I suggest people keep building on those platform with all the samba libraries statically builtin, this will avoid them any versioning problem whatsoever, and they can be as careless as they want towards libraries then. > The so major number change is a crude instrument for handling changes > in versions of libraries, but it is portable to all shared library > systems I know of, and it works. I really don't understand why there > is this revulsion at using the library version number for exactly what > it was designed for. On the contrary if you have libtalloc.so.1 and libtalloc.so.2 the loader could theoretically load both and it will just use whatever symbol is found because we don't do symbol versioning, and the symbols in the 2 libraries have the same names. Because the 2 libraries use the same symbol names what can happen is that the loader will load most simbols from, say, libtalloc.so.1.2.0 and the "missing ones" from libtalloc.so.2.0.0 (it really depends on what order they are loaded). This would lead to the abort() metze introduced. And he introduced it exactly because of this problem, that happens only when you change the soname. So, in short, what you describe in 1 and 2 is what happen if you bump the soname not if you keep it. Of course if you need to introduce a change that has to break the ABI there is no alternative, and the packager will have to be careful not to include libraries that link against 2 versions of the library. Now, about your other concerns: > I forgot to mention one more problem with what Simo has now setup, > just in case you are now tempted to remove Metze's magic number check > as a way to 'fix' the problem I pointed out in my last email. > > The fix for the bug that started this whole discussion also involved a > change to the internal struct talloc_reference_handle. By changing the > .so number this change was safe, as the linker and package manager > (plus Metze's paranoia patch!) will ensure they aren't mixed at > runtime. Internal symbols are not exported (see also talloc.exports) so this is not going to happen. > But now that Simo has reverted that change and thus explicitly allowed > the old code and the new code to reside in the same running process, > we will be mixing the old structure with the new one. As explained before this can't happen. > This means we at > minimum will get valgrind errors where we reference memory beyond the > end of the allocated structure. I wonder if it is even an exploitable > security hole? It's not, as it can't happen. > So yes, congratulations all round are in order. You've just overridden > the talloc package maintainer by introducing real bugs, made us > non-portable, introduced silent and difficult to track failures of > applications, broken the ABI promises, lied to the distro package > managers and loader and generally had a great time. But at least we > haven't brought the good name of free software into disrepute by > using up a precious .so number, so it was all worth it. I did none of the things you mention here, and certainly I did not have a great time, by *FAR*. Simo. Note: it is a public holiday here, I won't probably be reachable until Monday (US EST time), so the discussion will have to resume next week if there are still doubts or issues. -- Simo Sorce Samba Team GPL Compliance Officer <simo@...> Principal Software Engineer at Red Hat, Inc. <simo@...> -- Simo Sorce Samba Team GPL Compliance Officer <simo@...> Principal Software Engineer at Red Hat, Inc. <simo@...> |
|
|
Re: the sorry saga of the talloc soname 'fix'On Sat, Jul 04, 2009 at 11:48:36AM +1000, tridge@... wrote:
> So yes, congratulations all round are in order. You've just overridden > the talloc package maintainer by introducing real bugs, made us > non-portable, introduced silent and difficult to track failures of > applications, broken the ABI promises, lied to the distro package > managers and loader and generally had a great time. But at least we > haven't brought the good name of free software into disrepute by > using up a precious .so number, so it was all worth it. Tridge, I don't think this has overridden you as the talloc library maintainer - this thing is your baby and you make the technical decisions on how it works. If you say the right technical bugfix is to require us to break the ABI, we break the ABI and deal with it, no question. But you must admit this fix didn't *require* us to break the ABI. I'm sorry you feel overridden on this, I don't think that was the intention of anyone. Simo is representing Red Hat as the packaging maintainer of the libraries they ship in a distro. If he says doing it this way to maintain the ABI will cause less problems for them and for users of the library, then I would believe him and back him on this. It's been a long time since I delt with the complexities of the details of a runtime linker, so I'm not going to weigh into any technical argument on this, as I'm sure I'll just get it wrong :-). But if Simo says this will cause less problems for a distro, then I'd just trust him. It is after all his job to manage this on behalf of Red Hat, and I think he does a really good job at doing so. That's what my congratulations were for. As Rodney King said, "Can't we all just get along ?" :-). Can't do much more discussion of this today, as it's the 4th of July and I have a barbecue to prepare :-). Happy 4th, everyone (even for non-Americans :-). Jeremy. |
|
|
Re: the sorry saga of the talloc soname 'fix'Hi Jeremy,
> Tridge, I don't think this has overridden you as > the talloc library maintainer I explicity said I didn't want this approach taken. Simo put the patch it anyway and then you and others cheered. When the maintainer doesn't want a particular change then you have 3 choices: 1) convince them to change their mind 2) fork 3) stage a coup and overthrow the maintainer It is _not_ an option just to push the patch anyway and ignore the maintainers wishes. If this had been an uncontentious bug fix and I had not explicitly stated that I didn't want this approach taken then it would have been fine. As it is, you and Simo chose to override the maintainer. That should not happen. > - this thing is your baby and you make the technical decisions on > how it works. If you say the right technical bugfix is to require > us to break the ABI, we break the ABI and deal with it, no > question. yes, that or you convince me I am wrong. > Simo is representing Red Hat as the packaging maintainer > of the libraries they ship in a distro. If he says doing it > this way to maintain the ABI will cause less problems for them > and for users of the library, then I would believe him and > back him on this. As you say, he is the RedHat maintainer. That means he gets to say what is in the RedHat package. That doesn't give him the right to force upstream to follow his approach. He would certainly have been well within his rights to have a patch as part of the RedHat package that changed how talloc behaves if he failed to convince me to make a change and he thought he had no choice. > As Rodney King said, "Can't we all just get along ?" :-). we can get along with talloc embedded in the Samba tree if we follow the norms of developer behaviour within the team. Metze did absolutely the right thing with his patch for this. He argued for the change, and offered me a patch in his own tree that he suggested I put in. Simo suggested a change, I said I didn't want that change and he put it in anyway. What then really pissed me off was all the cheering for this flagrant disregard for the normal flow of free software development. I want to keep the libraries I maintain mastered within the Samba git tree as I think that is of great benefit to Samba. We won't be able to do that if other team members push changes into those libraries when I have asked them not to. Cheers, Tridge |
|
|
Re: the sorry saga of the talloc soname 'fix'On Sat, 2009-07-04 at 13:24 -0700, Jeremy Allison wrote:
> On Sat, Jul 04, 2009 at 11:48:36AM +1000, tridge@... wrote: > > > So yes, congratulations all round are in order. You've just overridden > > the talloc package maintainer by introducing real bugs, made us > > non-portable, introduced silent and difficult to track failures of > > applications, broken the ABI promises, lied to the distro package > > managers and loader and generally had a great time. But at least we > > haven't brought the good name of free software into disrepute by > > using up a precious .so number, so it was all worth it. > > Tridge, I don't think this has overridden you as > the talloc library maintainer - this thing is your > baby and you make the technical decisions on how it works. > If you say the right technical bugfix is to require us to > break the ABI, we break the ABI and deal with it, no question. > But you must admit this fix didn't *require* us to break the > ABI. I'm sorry you feel overridden on this, I don't think > that was the intention of anyone. such an imperative that this had to be overridden without a discussion and agreement on the list? Can I expect things that I maintain and care passionately about (but my turn out to be wrong about) be overridden just because it feels better to do so than hold a discussion? Andrew Bartlett -- Andrew Bartlett http://samba.org/~abartlet/ Authentication Developer, Samba Team http://samba.org Samba Developer, Cisco Inc. |
|
|
Re: the sorry saga of the talloc soname 'fix'On Sat, 2009-07-04 at 11:24 +1000, tridge@... wrote:
> Hi Jeremy and Simo, > > > Thanks a *lot* for this one Simo, much appreciated. > > I'm finding all this congratulation rather disturbing. The patch from > Simo now creates the very problem you are all so keen to avoid. > > With the soname bump we had lots of standard mechanisms in place (both > packaging and loader) to try to stop having two versions of the > library in place at the same time. It would be detected at package > install time, and also at runtime by Metze's patch. > As for avoiding this with symbol versioning, that's all well and good > on platforms that have symbol versioning, but many of the platforms > that we claim we support don't have that, so now we've left them out > in the cold. Why are we even trying to provide and use 'Samba' shared libraries on systems without symbol versions? (And yes, I'm aware that systems without GNU ld can't use them, but given that Samba libraries such as talloc are meant to be a 'value add' - not the fundamental purpose of the project, we should be able to set the rules about where we support them to the places where we can do so with sanity). Andrew Bartlett -- Andrew Bartlett http://samba.org/~abartlet/ Authentication Developer, Samba Team http://samba.org Samba Developer, Cisco Inc. |
|
|
Re: the sorry saga of the talloc soname 'fix'On Mon, 2009-07-06 at 09:00 +1000, Andrew Bartlett wrote:
> On Sat, 2009-07-04 at 13:24 -0700, Jeremy Allison wrote: > > On Sat, Jul 04, 2009 at 11:48:36AM +1000, tridge@... wrote: > > > > > So yes, congratulations all round are in order. You've just overridden > > > the talloc package maintainer by introducing real bugs, made us > > > non-portable, introduced silent and difficult to track failures of > > > applications, broken the ABI promises, lied to the distro package > > > managers and loader and generally had a great time. But at least we > > > haven't brought the good name of free software into disrepute by > > > using up a precious .so number, so it was all worth it. > > > > Tridge, I don't think this has overridden you as > > the talloc library maintainer - this thing is your > > baby and you make the technical decisions on how it works. > > If you say the right technical bugfix is to require us to > > break the ABI, we break the ABI and deal with it, no question. > > But you must admit this fix didn't *require* us to break the > > ABI. I'm sorry you feel overridden on this, I don't think > > that was the intention of anyone. > > The problem I have here is: In our development tree, why was time of > such an imperative that this had to be overridden without a discussion > and agreement on the list? Just for the record. I pushed the patch only after I got 3 acks from 3 of the most senior samba team members. If anyone had nacked the patch I would have never in my right mind thought of pushing it. But there was no negative comment. Perhaps I shouldn't have taken silence as an implicit ack, but normally we nack on the list when a proposed patch is not liked. In all cases where there are only acks we always assume all silent members are not objecting. Simo. -- Simo Sorce Samba Team GPL Compliance Officer <simo@...> Principal Software Engineer at Red Hat, Inc. <simo@...> -- Simo Sorce Samba Team GPL Compliance Officer <simo@...> Principal Software Engineer at Red Hat, Inc. <simo@...> |
|
|
|
|
|
Re: the sorry saga of the talloc soname 'fix'tridge@... wrote:
> Hi Simo, > > > I pushed the patch only after I got 3 acks from 3 of the most senior > > samba team members. If anyone had nacked the patch I would have never in > > my right mind thought of pushing it. But there was no negative comment. > > huh? I objected very strongly, and you were explicitly reverting a > change I made where I chose to change the .so number to resolve the > ABI issue. > > There is no way you could have considered my comments as a "silent > ack" > > Cheers, Tridge The discussion was open for all to see. Clearly, there was no consensus on this important matter when a precipitous move was made. Over the years the Samba Team developed due process, and for the most followed it with harmonious practice. We must learn from this incident, or else face increasing tension from within. I do not wish to speculate where that may lead the team, or what will be left if we do not strive to restore relations within. I prefer to see what happened, not as badly dealt with conflict, but rather as an opportunity to regroup aroung our mutual goals and objectives as a team. Can we do that? - John T. |
|
|
Re: the sorry saga of the talloc soname 'fix'On Sun, 2009-07-05 at 20:56 -0400, simo wrote:
> Perhaps I shouldn't have taken silence as an implicit ack, but normally > we nack on the list when a proposed patch is not liked. In all cases > where there are only acks we always assume all silent members are not > objecting. I think, particularly for reverts, that this is a very dangerous assumption. Andrew Bartlett -- Andrew Bartlett http://samba.org/~abartlet/ Authentication Developer, Samba Team http://samba.org Samba Developer, Cisco Inc. |
|
|
Re: the sorry saga of the talloc soname 'fix'Andrew Bartlett schrieb:
> On Sat, 2009-07-04 at 11:24 +1000, tridge@... wrote: >> Hi Jeremy and Simo, >> >> > Thanks a *lot* for this one Simo, much appreciated. >> >> I'm finding all this congratulation rather disturbing. The patch from >> Simo now creates the very problem you are all so keen to avoid. >> >> With the soname bump we had lots of standard mechanisms in place (both >> packaging and loader) to try to stop having two versions of the >> library in place at the same time. It would be detected at package >> install time, and also at runtime by Metze's patch. > >> As for avoiding this with symbol versioning, that's all well and good >> on platforms that have symbol versioning, but many of the platforms >> that we claim we support don't have that, so now we've left them out >> in the cold. > > Why are we even trying to provide and use 'Samba' shared libraries on > systems without symbol versions? > > (And yes, I'm aware that systems without GNU ld can't use them, but > given that Samba libraries such as talloc are meant to be a 'value add' > - not the fundamental purpose of the project, we should be able to set > the rules about where we support them to the places where we can do so > with sanity). have and use symbol versioning. (But that should be a different discussion). But as talloc is a memory mange library, it's a risk to have two versions loaded at the same time, the same as if you would load 2 versions of glibc. There're cases where there's no risk, like in the libwbclient library where the public api doesn't expose talloc pointers being used, as we have a wbcFreeMemory() function. Here the libwbclient internal could use one version of talloc and callers could use a different version. Even with symbol versioning, we would need to make sure we only load one version of talloc, if a library exposes talloc pointers in the public api, libtevent and other's do this. In this case we need to make sure that the library and its caller use the same version of talloc. metze |
|
|
Re: the sorry saga of the talloc soname 'fix'Hi Simo,
I've now spent all day looking at libtalloc and how it interacts with what is currently in Ubuntu Jaunty. I have downloaded a Fedora image but haven't yet installed it to see if Fedora is as badly placed as Ubuntu is. The result of my investigation is that libtalloc is a complete mess. It turns out that with current Ubuntu we cannot completely avoid having both the old talloc and the new talloc in the same process at the same time. However, if we bump the .so number then at least the developers will get a warning about the mess. I've put together some files for you to look at to give you some idea of just how bad this whole mess is. See http://samba.org/tridge/talloc_mess/ The files are: - a libtalloc 1.2.0 (dev and lib) matching what is currently in Ubuntu Jaunty - a libtalloc 1.4.0 (dev and lib) matching what is produced if we followed your suggested course of action - a libtalloc 2.0.0 (dev and lib) matching what is produced if we follow my preferred choice of using a new .so number - source tar balls with debian build rules for all of the above - a sample 'testtalloc' package that demonstrates the problems (deb plus source) The testtalloc package produces two binaries. One is called test_ldb, and it creates a ldb then tries to free it with talloc_free() which is about as simple a ldb program as you can have. The other is called test_mapi which initialises the MAPI subsystem from openchange then uses talloc_report_full() to show the memory that has been used. I chose these two binaries as they demonstrate different types of brokenness in the way that talloc/ldb/mapi/samba/openchange etc have all been packaged. For example: - The libldb-samba4-0 package provides a libldb.so.0 which has a built in static copy of talloc. - the libmapi.so package links to a dynamic libtalloc.so, but also links to libdcerpc.so - libdcerpc.so has a staticly linked talloc built in - etc etc The same type of brokenness is rife through all the various packages that use talloc currently. If we used the approach you are advocating, then all of these packages (ldb, openchange, mapi, samba etc) won't be marked as needing to be rebuilt. Yet they will all abort with no error message when you actually use them, because they will mix the two incompatible ABIs. Try the test_ldb and test_mapi binaries to see the abort. If we use the approach that I prefer, which is to change the .so number to 2, then at least the developers get a nice warning like this: /usr/bin/ld: warning: libtalloc.so.1, needed by /usr/lib/gcc/x86_64-linux-gnu/4.3.3/../../../../lib/libmapi.so, may conflict with libtalloc.so.2 So at least someone gets told that it won't work at build time, which gives some hope that it might get fixed. If we up the .so number to 2 then you can also see the brokenness by looking at the dependencies, because we are explicitly marking the ABI as having changed. It is easy to see the brokenness using ldd, or by using dpkg. If we don't do this then we're saying "the ABI is the same" when it isn't. This is clearly shown by the abort in the test progams above, regardless of whether you install the 1.4.0 libtalloc or the 2.0.0 libtalloc. So even with your attempts to make the ABIs more similar by putting backward compatibility code into talloc.c we get aborts because the internal structures are not compatible (which is nicely caught by Metze's patch). Your attempts to make the ABIs compatible are not enough, and would pollute the code with a lot of cruft that serves no purpose, plus it will remove the warnings that developers that would otherwise get when things are going to go wrong with some of the libraries. So Simo, please look at the above examples, then please revert your commit. Also, in future, please don't revert a maintainers commits without checking with the maintainer. Also, Metze, you were right, your abort() check on version really is needed, and really does happen with real examples. Thanks! To prevent this happening in future we have to stop mixing staticly linked libraries with shared versions of the same libs. That will mean a lot of changes to the way that lots of libs are produced by the Samba project and how they are linked into projects like openchange. I hope I don't have to spend another day like today tracing shared library problems. As I have said several times previous when proposals of Samba shared libs come up, getting shared libs right is really _really_ hard. We have come nowhere near to getting it right yet, and the work required to get it right is quite substantial. I'm not volunteering to do the work. Cheers, Tridge |
|
|
Re: the sorry saga of the talloc soname 'fix'Hi, Tridge!
On Mon, Jul 06, 2009 at 08:21:34PM +1000, tridge@... wrote: > I've now spent all day looking at libtalloc and how it interacts with > what is currently in Ubuntu Jaunty. I have downloaded a Fedora image > but haven't yet installed it to see if Fedora is as badly placed as > Ubuntu is. Just to get it right, I've got a question: The changes to talloc you made are: a) Add talloc_reparent b) Print error messages to stderr if talloc_[steal|free] is used when references still exist. Is that right? Thanks, Volker |
|
|
Re: the sorry saga of the talloc soname 'fix'Hi Volker,
> The changes to talloc you made are: > > a) Add talloc_reparent > > b) Print error messages to stderr if talloc_[steal|free] is > used when references still exist. It isn't just printing error messages. It also refuses to do ambiguous operations. So if you do this: int *p1 = talloc(NULL, int); int *p2 = talloc(NULL, int) int *c1 = talloc(p1, int); int *r1 = talloc_reference(p2, c1); assert(r1 == c1); then at the end, one piece of code might do: talloc_free(r1); and another bit of code might do: talloc_free(c1); the problem is that c1==r1, so talloc doesn't know which one you mean. If you free via one of the parents (p1 or p2) then there is no ambiguity. If you free via a child with references then there is no way to know what the programmer intended. We can't look up the program stack and say "ahh, they are in the module that did the reference, they must mean to reduce the ref count". There is just no sane way to do that in C. At first I just added the warning, but kept the old behaviour, which was to pick one of the two ambiguous choices. In the case of talloc_free() that was to free the "most recent parent" (or close enough, the details of "most recent" are complicated by a steal). In the case of steal it was to "always steal from the 'real' parent". Once Andrew and I started to look at where these warnings cropped up though, it became very clear that they were in fact real bugs. Some were memory leaks. Some just assumed the opposite behaviour of what was actually done. None of them were clearly 'correct'. So the only sane choice was to change the API. So we now refuse the talloc_free(), and print a warning giving both the line of code where the talloc_free() happened, and all the places that did talloc_reference() of the pointer. That makes it easy to find the problem. The usual fix is to then use talloc_unlink() which allows you to specify which of the two possibilities you wanted. The same applies to talloc_steal(). I added talloc_reparent() as it allows you to solve the problem in much the same way that talloc_unlink() solves the problem for talloc_free(). A talloc_reparent() allows you to do a "steal" like operation, but specifying which parent you want to steal from so there is no ambiguity. If you try to do an ambiguous talloc_steal() then it will be refused, and will return NULL. In all likelyhood existing code will crash, as we often do: a->p = talloc_steal(x, y); and then don't check the return. It isn't ideal that code like this will now crash, but at least there will be a clear warning printed first saying what the problem is. We still need a better way than just using stderr for these warnings. The patch adding #ifdef DEVELOPER is not the right fix, as these bugs can occur at runtime. We need something like: talloc_set_log_fn(samba_talloc_log); so the application offers a clear function to use for logging talloc warnings. All of this is both an API change (although I would claim only for code that was already broken!) and an ABI change. Thus we need a new .so number. Cheers, Tridge |
|
|
Re: the sorry saga of the talloc soname 'fix'Hi, Tridge!
On Mon, Jul 06, 2009 at 09:22:55PM +1000, tridge@... wrote: > All of this is both an API change (although I would claim only for > code that was already broken!) and an ABI change. Thus we need a new > .so number. Thanks for the explanation. So if we skipped the "location" piece of the change, the error messages would get a lot less useful, and the only real change would be to create memory leaks in already broken code. Right? Volker |
|
|
Re: the sorry saga of the talloc soname 'fix'Hi Tridge,
tridge@... wrote: > I chose these two binaries as they demonstrate different types of > brokenness in the way that talloc/ldb/mapi/samba/openchange etc have > all been packaged. For example: > > - The libldb-samba4-0 package provides a libldb.so.0 which has a > built in static copy of talloc. > > - the libmapi.so package links to a dynamic libtalloc.so, but also > links to libdcerpc.so > > - libdcerpc.so has a staticly linked talloc built in > > - etc etc > > The same type of brokenness is rife through all the various packages > that use talloc currently. > ago, so I would expect Karmic to ship with properly working packages. More specifically, libldb-samba4-0 is now gone (there's only one ldb package left) and libdcerpc/libtalloc link against a shared version of libtalloc or fail, rather than falling back to a static version of the library like it would previously. Cheers, Jelmer |
|
|
Re: the sorry saga of the talloc soname 'fix'Hi Volker,
> So if we skipped the "location" piece of the change, the > error messages would get a lot less useful, and the only > real change would be to create memory leaks in already > broken code. Right? yep. I did think about doing that, but I really disliked the idea of creating a bunch of very difficult to track down leaks. I know this change is not pain free, but I think some pain is unavoidable. The previous API was broken (which of course was my fault, as I put the ambiguity in there in the first place). Cheers, Tridge |
| < Prev | 1 - 2 - 3 - 4 | Next > |
| Free embeddable forum powered by Nabble | Forum Help |