Re: [SCM] Samba Shared Repository - branch master updated - release-4-0-0alpha8-205-g7119241

View: New views
20 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 - 3 - 4 | Next >

Parent Message unknown Re: [SCM] Samba Shared Repository - branch master updated - release-4-0-0alpha8-205-g7119241

by Volker Lendecke :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Jul 03, 2009 at 08:26:53AM -0500, Simo Sorce wrote:

> The branch, master has been updated
>        via  7119241c0d12768b31ebdb489aa0bbba6ca21e40 (commit)
>        via  30b2014a01b31d66dd76e0562c5d769dfacf167b (commit)
>        via  2738178d1301f9c1c4144c7472c9419911cd816e (commit)
>       from  b54e48b830dbc3d66f9de5d2711a57a1630809e2 (commit)
>
> http://gitweb.samba.org/?p=samba.git;a=shortlog;h=master
>
>
> - Log -----------------------------------------------------------------
> commit 7119241c0d12768b31ebdb489aa0bbba6ca21e40
> Author: Simo Sorce <idra@...>
> Date:   Fri Jul 3 08:42:23 2009 -0400
>
>     Sort the signature files
>
> commit 30b2014a01b31d66dd76e0562c5d769dfacf167b
> Author: Simo Sorce <idra@...>
> Date:   Thu Jun 18 20:06:00 2009 -0400
>
>     Expose functions need by backend writers
>    
>     move publicly needed structures and functions in the public header.
>     Stop installing internal headers.
>     Update the signature and exports files with the new exposed
>     function.
>
> commit 2738178d1301f9c1c4144c7472c9419911cd816e
> Author: Simo Sorce <idra@...>
> Date:   Thu Jul 2 09:29:20 2009 -0400
>
>     Restore ABI compatibility for talloc.
Thanks a *lot*!

Volker


signature.asc (204 bytes) Download Attachment

Parent Message unknown Re: [SCM] Samba Shared Repository - branch master updated - release-4-0-0alpha8-205-g7119241

by Jeremy Allison :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Jul 03, 2009 at 08:26:53AM -0500, Simo Sorce wrote:
>
> commit 2738178d1301f9c1c4144c7472c9419911cd816e
> Author: Simo Sorce <idra@...>
> Date:   Thu Jul 2 09:29:20 2009 -0400
>
>     Restore ABI compatibility for talloc.

Thanks a *lot* for this one Simo, much appreciated.

Jeremy.

the sorry saga of the talloc soname 'fix'

by tridge@samba.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Jeremy and Simo,

 > Thanks a *lot* for this one Simo, much appreciated.

I'm finding all this congratulation rather disturbing. The patch from
Simo now creates the very problem you are all so keen to avoid.

With the soname bump we had lots of standard mechanisms in place (both
packaging and loader) to try to stop having two versions of the
library in place at the same time. It would be detected at package
install time, and also at runtime by Metze's patch.

Now let's look at what happens with this much applauded patch from
Simo.

 1) the major so versions of the libraries are now the same, so you're
 telling the packaging managers and the loader that they are
 compatible. So distros and users will quite happily mix the versions
 now.

 2) the code is not in fact now compatible, as the patch from Metze is
 still in there, so if anyone actually tries to load both of them,
 then it will abort(), with no feedback to the user on what is
 actually wrong, and no mechanism for them to fix it except to
 manually recompile one or both of the offending packages, if they can
 even work out what packages need changing.

As for avoiding this with symbol versioning, that's all well and good
on platforms that have symbol versioning, but many of the platforms
that we claim we support don't have that, so now we've left them out
in the cold.

The so major number change is a crude instrument for handling changes
in versions of libraries, but it is portable to all shared library
systems I know of, and it works. I really don't understand why there
is this revulsion at using the library version number for exactly what
it was designed for.

Cheers, Tridge

Re: the sorry saga of the talloc soname 'fix'

by tridge@samba.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi again,

I forgot to mention one more problem with what Simo has now setup,
just in case you are now tempted to remove Metze's magic number check
as a way to 'fix' the problem I pointed out in my last email.

The fix for the bug that started this whole discussion also involved a
change to the internal struct talloc_reference_handle. By changing the
.so number this change was safe, as the linker and package manager
(plus Metze's paranoia patch!) will ensure they aren't mixed at
runtime.

But now that Simo has reverted that change and thus explicitly allowed
the old code and the new code to reside in the same running process,
we will be mixing the old structure with the new one. This means we at
minimum will get valgrind errors where we reference memory beyond the
end of the allocated structure. I wonder if it is even an exploitable
security hole?

So yes, congratulations all round are in order. You've just overridden
the talloc package maintainer by introducing real bugs, made us
non-portable, introduced silent and difficult to track failures of
applications, broken the ABI promises, lied to the distro package
managers and loader and generally had a great time. But at least we
haven't brought the good name of free software into disrepute by
using up a precious .so number, so it was all worth it.

Cheers, Tridge

Re: the sorry saga of the talloc soname 'fix'

by simo-7 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sat, 2009-07-04 at 11:24 +1000, tridge@... wrote:

> Hi Jeremy and Simo,
>
>  > Thanks a *lot* for this one Simo, much appreciated.
>
> I'm finding all this congratulation rather disturbing. The patch from
> Simo now creates the very problem you are all so keen to avoid.
>
> With the soname bump we had lots of standard mechanisms in place (both
> packaging and loader) to try to stop having two versions of the
> library in place at the same time. It would be detected at package
> install time, and also at runtime by Metze's patch.

Tridge,
let's start with a very simple assumption I think we can all agree on.

If, in the same process, you end up with 2 different versions of the
same library you are pretty much doomed no matter what. (Sure you can be
lucky sometimes but in general you are just waiting on a ticking bomb).

So, based on this very basic assumption, we can proceed in understanding
what happens with or without the soname bump.

> Now let's look at what happens with this much applauded patch from
> Simo.
>
>  1) the major so versions of the libraries are now the same, so you're
>  telling the packaging managers and the loader that they are
>  compatible. So distros and users will quite happily mix the versions
>  now.

What we are telling packagers is that if you have an application
currently using libtalloc.so.1.2.0 and you upgrade to libtalloc.so.1.4.0
then all is fine. And indeed it is. You are not using 2 libraries, you
are just using a newer one that provides the same interfaces as the old
one, in short it is ABI compatible.

It is generally understood that it is ok to use a newer version with an
older program. It is also generally understood that if App X is linked
against lib Y v.1.5 it is not really a good idea to try to use it with
lib Y v.1.1
This is reflected in package dependencies. You will see that samba
version 3.2.x which uses talloc 1.2.0 has a dependency on the talloc
package with version >= 1.2.0

And if you try to use that samba version with libtalloc 1.4.0 you will
see it works.

Likewise the next samba package that uses 1.4.0 will have a dependency
on libtalloc >= 1.4.0 so, at install time, the user will be told to
upgrade libtalloc as well.


>  2) the code is not in fact now compatible, as the patch from Metze is
>  still in there, so if anyone actually tries to load both of them,
>  then it will abort(), with no feedback to the user on what is
>  actually wrong, and no mechanism for them to fix it except to
>  manually recompile one or both of the offending packages, if they can
>  even work out what packages need changing.

No, this can't happen, for the simple reason that you have only one
version of the library with the same soname in any given process, for
one soname version the dynamic loader will always find the same library.

Whatever is internal to the library is not exposed to the rest of the
application so there is no code compatibility issue.

Of course, if the magic number were exposed to the application (for
example in the header files), and the app relied on that number outside
of the library itself, than the change would be basically a violation of
the ABI.

The linker looks at the major soname version by searching for
libtalloc.so.1 That's why we normally have a symbolic link like:
libtalloc.so.1 -> libtalloc.so.1.2.0 (although we could install the
library directly as libtalloc.so.1 without any problem and indeed it is
so in some versions of Fedora IIRC).

So in /usr/lib you can point only at one version with soname .1 and that
means you can't have 2 versions with the same soname loaded at the same
time, as the dynamic loader will pick up only one.

So the only real consequence of a program linked against
libtalloc.so.1.4.0 that finds libtalloc.so.1.2.0 is that it will fail to
start because the loader will not find some of the new symbols that are
present in 1.4.0 but not in 1.2.0. This will immediately tell the user
that they are using the wrong library version.

> As for avoiding this with symbol versioning, that's all well and good
> on platforms that have symbol versioning, but many of the platforms
> that we claim we support don't have that, so now we've left them out
> in the cold.

Well, we are not using symbol versioning at the moment anyway. We could
if we wanted to commit to an even stronger ABI promise, and we would not
let platforms without it any more in the cold than they are right now.
Platforms with poor linkers/loaders and no packaging system already know
the dangers of dynamic libaries, that's why most people tend to build
static binaries on them. I suggest people keep building on those
platform with all the samba libraries statically builtin, this will
avoid them any versioning problem whatsoever, and they can be as
careless as they want towards libraries then.

> The so major number change is a crude instrument for handling changes
> in versions of libraries, but it is portable to all shared library
> systems I know of, and it works. I really don't understand why there
> is this revulsion at using the library version number for exactly what
> it was designed for.

On the contrary if you have libtalloc.so.1 and libtalloc.so.2 the loader
could theoretically load both and it will just use whatever symbol is
found because we don't do symbol versioning, and the symbols in the 2
libraries have the same names.

Because the 2 libraries use the same symbol names what can happen is
that the loader will load most simbols from, say, libtalloc.so.1.2.0 and
the "missing ones" from libtalloc.so.2.0.0 (it really depends on what
order they are loaded). This would lead to the abort() metze introduced.
And he introduced it exactly because of this problem, that happens only
when you change the soname.

So, in short, what you describe in 1 and 2 is what happen if you bump
the soname not if you keep it.

Of course if you need to introduce a change that has to break the ABI
there is no alternative, and the packager will have to be careful not to
include libraries that link against 2 versions of the library.


Now, about your other concerns:

> I forgot to mention one more problem with what Simo has now setup,
> just in case you are now tempted to remove Metze's magic number check
> as a way to 'fix' the problem I pointed out in my last email.
>
> The fix for the bug that started this whole discussion also involved a
> change to the internal struct talloc_reference_handle. By changing the
> .so number this change was safe, as the linker and package manager
> (plus Metze's paranoia patch!) will ensure they aren't mixed at
> runtime.

Internal symbols are not exported (see also talloc.exports) so this is
not going to happen.

> But now that Simo has reverted that change and thus explicitly allowed
> the old code and the new code to reside in the same running process,
> we will be mixing the old structure with the new one.

As explained before this can't happen.

> This means we at
> minimum will get valgrind errors where we reference memory beyond the
> end of the allocated structure. I wonder if it is even an exploitable
> security hole?

It's not, as it can't happen.

> So yes, congratulations all round are in order. You've just overridden
> the talloc package maintainer by introducing real bugs, made us
> non-portable, introduced silent and difficult to track failures of
> applications, broken the ABI promises, lied to the distro package
> managers and loader and generally had a great time. But at least we
> haven't brought the good name of free software into disrepute by
> using up a precious .so number, so it was all worth it.

I did none of the things you mention here, and certainly I did not have
a great time, by *FAR*.

Simo.

Note: it is a public holiday here, I won't probably be reachable until
Monday (US EST time), so the discussion will have to resume next week if
there are still doubts or issues.

--
Simo Sorce
Samba Team GPL Compliance Officer <simo@...>
Principal Software Engineer at Red Hat, Inc. <simo@...>
--
Simo Sorce
Samba Team GPL Compliance Officer <simo@...>
Principal Software Engineer at Red Hat, Inc. <simo@...>


Re: the sorry saga of the talloc soname 'fix'

by Jeremy Allison :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sat, Jul 04, 2009 at 11:48:36AM +1000, tridge@... wrote:

> So yes, congratulations all round are in order. You've just overridden
> the talloc package maintainer by introducing real bugs, made us
> non-portable, introduced silent and difficult to track failures of
> applications, broken the ABI promises, lied to the distro package
> managers and loader and generally had a great time. But at least we
> haven't brought the good name of free software into disrepute by
> using up a precious .so number, so it was all worth it.

Tridge, I don't think this has overridden you as
the talloc library maintainer - this thing is your
baby and you make the technical decisions on how it works.
If you say the right technical bugfix is to require us to
break the ABI, we break the ABI and deal with it, no question.
But you must admit this fix didn't *require* us to break the
ABI. I'm sorry you feel overridden on this, I don't think
that was the intention of anyone.

Simo is representing Red Hat as the packaging maintainer
of the libraries they ship in a distro. If he says doing it
this way to maintain the ABI will cause less problems for them
and for users of the library, then I would believe him and
back him on this.

It's been a long time since I delt with the complexities of
the details of a runtime linker, so I'm not going to weigh into
any technical argument on this, as I'm sure I'll just get it
wrong :-).

But if Simo says this will cause less problems for a distro,
then I'd just trust him. It is after all his job to manage this
on behalf of Red Hat, and I think he does a really good job
at doing so. That's what my congratulations were for.

As Rodney King said, "Can't we all just get along ?" :-).

Can't do much more discussion of this today, as it's the
4th of July and I have a barbecue to prepare :-). Happy
4th, everyone (even for non-Americans :-).

Jeremy.

Re: the sorry saga of the talloc soname 'fix'

by tridge@samba.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Jeremy,

 > Tridge, I don't think this has overridden you as
 > the talloc library maintainer

I explicity said I didn't want this approach taken. Simo put the patch
it anyway and then you and others cheered.

When the maintainer doesn't want a particular change then you have 3
choices:

  1) convince them to change their mind

  2) fork

  3) stage a coup and overthrow the maintainer

It is _not_ an option just to push the patch anyway and ignore the
maintainers wishes.

If this had been an uncontentious bug fix and I had not explicitly
stated that I didn't want this approach taken then it would have been
fine. As it is, you and Simo chose to override the maintainer. That
should not happen.

 > - this thing is your baby and you make the technical decisions on
 > how it works.  If you say the right technical bugfix is to require
 > us to break the ABI, we break the ABI and deal with it, no
 > question.

yes, that or you convince me I am wrong.

 > Simo is representing Red Hat as the packaging maintainer
 > of the libraries they ship in a distro. If he says doing it
 > this way to maintain the ABI will cause less problems for them
 > and for users of the library, then I would believe him and
 > back him on this.

As you say, he is the RedHat maintainer. That means he gets to say
what is in the RedHat package. That doesn't give him the right to
force upstream to follow his approach.

He would certainly have been well within his rights to have a patch as
part of the RedHat package that changed how talloc behaves if he
failed to convince me to make a change and he thought he had no choice.

 > As Rodney King said, "Can't we all just get along ?" :-).

we can get along with talloc embedded in the Samba tree if we follow
the norms of developer behaviour within the team.

Metze did absolutely the right thing with his patch for this. He
argued for the change, and offered me a patch in his own tree that he
suggested I put in.

Simo suggested a change, I said I didn't want that change and he put
it in anyway. What then really pissed me off was all the cheering for
this flagrant disregard for the normal flow of free software
development.

I want to keep the libraries I maintain mastered within the Samba git
tree as I think that is of great benefit to Samba. We won't be able to
do that if other team members push changes into those libraries when I
have asked them not to.

Cheers, Tridge

Re: the sorry saga of the talloc soname 'fix'

by Andrew Bartlett :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sat, 2009-07-04 at 13:24 -0700, Jeremy Allison wrote:

> On Sat, Jul 04, 2009 at 11:48:36AM +1000, tridge@... wrote:
>
> > So yes, congratulations all round are in order. You've just overridden
> > the talloc package maintainer by introducing real bugs, made us
> > non-portable, introduced silent and difficult to track failures of
> > applications, broken the ABI promises, lied to the distro package
> > managers and loader and generally had a great time. But at least we
> > haven't brought the good name of free software into disrepute by
> > using up a precious .so number, so it was all worth it.
>
> Tridge, I don't think this has overridden you as
> the talloc library maintainer - this thing is your
> baby and you make the technical decisions on how it works.
> If you say the right technical bugfix is to require us to
> break the ABI, we break the ABI and deal with it, no question.
> But you must admit this fix didn't *require* us to break the
> ABI. I'm sorry you feel overridden on this, I don't think
> that was the intention of anyone.
The problem I have here is:  In our development tree, why was time of
such an imperative that this had to be overridden without a discussion
and agreement on the list?

Can I expect things that I maintain and care passionately about (but my
turn out to be wrong about) be overridden just because it feels better
to do so than hold a discussion?

Andrew Bartlett

--
Andrew Bartlett
http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org
Samba Developer, Cisco Inc.


signature.asc (196 bytes) Download Attachment

Re: the sorry saga of the talloc soname 'fix'

by Andrew Bartlett :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sat, 2009-07-04 at 11:24 +1000, tridge@... wrote:

> Hi Jeremy and Simo,
>
>  > Thanks a *lot* for this one Simo, much appreciated.
>
> I'm finding all this congratulation rather disturbing. The patch from
> Simo now creates the very problem you are all so keen to avoid.
>
> With the soname bump we had lots of standard mechanisms in place (both
> packaging and loader) to try to stop having two versions of the
> library in place at the same time. It would be detected at package
> install time, and also at runtime by Metze's patch.

> As for avoiding this with symbol versioning, that's all well and good
> on platforms that have symbol versioning, but many of the platforms
> that we claim we support don't have that, so now we've left them out
> in the cold.

Why are we even trying to provide and use 'Samba' shared libraries on
systems without symbol versions?  

(And yes, I'm aware that systems without GNU ld can't use them, but
given that Samba libraries such as talloc are meant to be a 'value add'
- not the fundamental purpose of the project, we should be able to set
the rules about where we support them to the places where we can do so
with sanity).

Andrew Bartlett
--
Andrew Bartlett
http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org
Samba Developer, Cisco Inc.


signature.asc (196 bytes) Download Attachment

Re: the sorry saga of the talloc soname 'fix'

by simo-7 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, 2009-07-06 at 09:00 +1000, Andrew Bartlett wrote:

> On Sat, 2009-07-04 at 13:24 -0700, Jeremy Allison wrote:
> > On Sat, Jul 04, 2009 at 11:48:36AM +1000, tridge@... wrote:
> >
> > > So yes, congratulations all round are in order. You've just overridden
> > > the talloc package maintainer by introducing real bugs, made us
> > > non-portable, introduced silent and difficult to track failures of
> > > applications, broken the ABI promises, lied to the distro package
> > > managers and loader and generally had a great time. But at least we
> > > haven't brought the good name of free software into disrepute by
> > > using up a precious .so number, so it was all worth it.
> >
> > Tridge, I don't think this has overridden you as
> > the talloc library maintainer - this thing is your
> > baby and you make the technical decisions on how it works.
> > If you say the right technical bugfix is to require us to
> > break the ABI, we break the ABI and deal with it, no question.
> > But you must admit this fix didn't *require* us to break the
> > ABI. I'm sorry you feel overridden on this, I don't think
> > that was the intention of anyone.
>
> The problem I have here is:  In our development tree, why was time of
> such an imperative that this had to be overridden without a discussion
> and agreement on the list?

Just for the record.
I pushed the patch only after I got 3 acks from 3 of the most senior
samba team members. If anyone had nacked the patch I would have never in
my right mind thought of pushing it. But there was no negative comment.

Perhaps I shouldn't have taken silence as an implicit ack, but normally
we nack on the list when a proposed patch is not liked. In all cases
where there are only acks we always assume all silent members are not
objecting.

Simo.

--
Simo Sorce
Samba Team GPL Compliance Officer <simo@...>
Principal Software Engineer at Red Hat, Inc. <simo@...>
--
Simo Sorce
Samba Team GPL Compliance Officer <simo@...>
Principal Software Engineer at Red Hat, Inc. <simo@...>


Parent Message unknown Re: the sorry saga of the talloc soname 'fix'

by tridge@samba.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Simo,

 > I pushed the patch only after I got 3 acks from 3 of the most senior
 > samba team members. If anyone had nacked the patch I would have never in
 > my right mind thought of pushing it. But there was no negative comment.

huh? I objected very strongly, and you were explicitly reverting a
change I made where I chose to change the .so number to resolve the
ABI issue.

There is no way you could have considered my comments as a "silent
ack"

Cheers, Tridge

Re: the sorry saga of the talloc soname 'fix'

by John H Terpstra - Samba Team :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

tridge@... wrote:

> Hi Simo,
>
>  > I pushed the patch only after I got 3 acks from 3 of the most senior
>  > samba team members. If anyone had nacked the patch I would have never in
>  > my right mind thought of pushing it. But there was no negative comment.
>
> huh? I objected very strongly, and you were explicitly reverting a
> change I made where I chose to change the .so number to resolve the
> ABI issue.
>
> There is no way you could have considered my comments as a "silent
> ack"
>
> Cheers, Tridge

The discussion was open for all to see.  Clearly, there was no consensus
on this important matter when a precipitous move was made.  Over the
years the Samba Team developed due process, and for the most followed it
with harmonious practice.

We must learn from this incident, or else face increasing tension from
within. I do not wish to speculate where that may lead the team, or what
will be left if we do not strive to restore relations within.

I prefer to see what happened, not as badly dealt with conflict, but
rather as an opportunity to regroup aroung our mutual goals and
objectives as a team.  Can we do that?

- John T.


Re: the sorry saga of the talloc soname 'fix'

by Andrew Bartlett :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sun, 2009-07-05 at 20:56 -0400, simo wrote:
> Perhaps I shouldn't have taken silence as an implicit ack, but normally
> we nack on the list when a proposed patch is not liked. In all cases
> where there are only acks we always assume all silent members are not
> objecting.

I think, particularly for reverts, that this is a very dangerous
assumption.

Andrew Bartlett

--
Andrew Bartlett
http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org
Samba Developer, Cisco Inc.


signature.asc (196 bytes) Download Attachment

Re: the sorry saga of the talloc soname 'fix'

by Stefan (metze) Metzmacher :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Andrew Bartlett schrieb:

> On Sat, 2009-07-04 at 11:24 +1000, tridge@... wrote:
>> Hi Jeremy and Simo,
>>
>>  > Thanks a *lot* for this one Simo, much appreciated.
>>
>> I'm finding all this congratulation rather disturbing. The patch from
>> Simo now creates the very problem you are all so keen to avoid.
>>
>> With the soname bump we had lots of standard mechanisms in place (both
>> packaging and loader) to try to stop having two versions of the
>> library in place at the same time. It would be detected at package
>> install time, and also at runtime by Metze's patch.
>
>> As for avoiding this with symbol versioning, that's all well and good
>> on platforms that have symbol versioning, but many of the platforms
>> that we claim we support don't have that, so now we've left them out
>> in the cold.
>
> Why are we even trying to provide and use 'Samba' shared libraries on
> systems without symbol versions?  
>
> (And yes, I'm aware that systems without GNU ld can't use them, but
> given that Samba libraries such as talloc are meant to be a 'value add'
> - not the fundamental purpose of the project, we should be able to set
> the rules about where we support them to the places where we can do so
> with sanity).
I'd support to go that road and only support shared libraries when we
have and use symbol versioning. (But that should be a different discussion).

But as talloc is a memory mange library, it's a risk to have two
versions loaded at the same time, the same as if you would load 2
versions of glibc.

There're cases where there's no risk, like in the libwbclient library
where the public api doesn't expose talloc pointers being used,
as we have a wbcFreeMemory() function. Here the libwbclient internal
could use one version of talloc and callers could use a different version.

Even with symbol versioning, we would need to make sure we only
load one version of talloc, if a library exposes talloc pointers
in the public api, libtevent and other's do this. In this case we need
to make sure that the library and its caller use the same version of talloc.

metze



signature.asc (260 bytes) Download Attachment

Re: the sorry saga of the talloc soname 'fix'

by tridge@samba.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Simo,

I've now spent all day looking at libtalloc and how it interacts with
what is currently in Ubuntu Jaunty. I have downloaded a Fedora image
but haven't yet installed it to see if Fedora is as badly placed as
Ubuntu is.

The result of my investigation is that libtalloc is a complete mess.
It turns out that with current Ubuntu we cannot completely avoid
having both the old talloc and the new talloc in the same process at
the same time. However, if we bump the .so number then at least the
developers will get a warning about the mess.

I've put together some files for you to look at to give you some idea
of just how bad this whole mess is. See

  http://samba.org/tridge/talloc_mess/

The files are:

 - a libtalloc 1.2.0 (dev and lib) matching what is currently in Ubuntu
   Jaunty

 - a libtalloc 1.4.0 (dev and lib) matching what is produced if we
   followed your suggested course of action

 - a libtalloc 2.0.0 (dev and lib) matching what is produced if we
   follow my preferred choice of using a new .so number

 - source tar balls with debian build rules for all of the above

 - a sample 'testtalloc' package that demonstrates the problems (deb
   plus source)

The testtalloc package produces two binaries. One is called test_ldb,
and it creates a ldb then tries to free it with talloc_free() which is
about as simple a ldb program as you can have. The other is called
test_mapi which initialises the MAPI subsystem from openchange then
uses talloc_report_full() to show the memory that has been used.

I chose these two binaries as they demonstrate different types of
brokenness in the way that talloc/ldb/mapi/samba/openchange etc have
all been packaged. For example:

  - The libldb-samba4-0 package provides a libldb.so.0 which has a
    built in static copy of talloc.

  - the libmapi.so package links to a dynamic libtalloc.so, but also
    links to libdcerpc.so

  - libdcerpc.so has a staticly linked talloc built in

  - etc etc

The same type of brokenness is rife through all the various packages
that use talloc currently.

If we used the approach you are advocating, then all of these packages
(ldb, openchange, mapi, samba etc) won't be marked as needing to be
rebuilt. Yet they will all abort with no error message when you
actually use them, because they will mix the two incompatible
ABIs. Try the test_ldb and test_mapi binaries to see the abort.

If we use the approach that I prefer, which is to change the .so
number to 2, then at least the developers get a nice warning like
this:

  /usr/bin/ld: warning: libtalloc.so.1, needed by /usr/lib/gcc/x86_64-linux-gnu/4.3.3/../../../../lib/libmapi.so, may conflict with libtalloc.so.2

So at least someone gets told that it won't work at build time, which
gives some hope that it might get fixed.

If we up the .so number to 2 then you can also see the brokenness by
looking at the dependencies, because we are explicitly marking the ABI
as having changed. It is easy to see the brokenness using ldd, or by
using dpkg.

If we don't do this then we're saying "the ABI is the same" when it
isn't. This is clearly shown by the abort in the test progams above,
regardless of whether you install the 1.4.0 libtalloc or the 2.0.0
libtalloc.

So even with your attempts to make the ABIs more similar by putting
backward compatibility code into talloc.c we get aborts because the
internal structures are not compatible (which is nicely caught by
Metze's patch).  Your attempts to make the ABIs compatible are not
enough, and would pollute the code with a lot of cruft that serves no
purpose, plus it will remove the warnings that developers that would
otherwise get when things are going to go wrong with some of the
libraries.

So Simo, please look at the above examples, then please revert your
commit. Also, in future, please don't revert a maintainers commits
without checking with the maintainer.

Also, Metze, you were right, your abort() check on version really is
needed, and really does happen with real examples. Thanks!

To prevent this happening in future we have to stop mixing staticly
linked libraries with shared versions of the same libs. That will mean
a lot of changes to the way that lots of libs are produced by the
Samba project and how they are linked into projects like openchange.

I hope I don't have to spend another day like today tracing shared
library problems. As I have said several times previous when proposals
of Samba shared libs come up, getting shared libs right is really
_really_ hard. We have come nowhere near to getting it right yet, and
the work required to get it right is quite substantial. I'm not
volunteering to do the work.

Cheers, Tridge

Re: the sorry saga of the talloc soname 'fix'

by Volker Lendecke :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi, Tridge!

On Mon, Jul 06, 2009 at 08:21:34PM +1000, tridge@... wrote:
> I've now spent all day looking at libtalloc and how it interacts with
> what is currently in Ubuntu Jaunty. I have downloaded a Fedora image
> but haven't yet installed it to see if Fedora is as badly placed as
> Ubuntu is.

Just to get it right, I've got a question:

The changes to talloc you made are:

a) Add talloc_reparent

b) Print error messages to stderr if talloc_[steal|free] is
   used when references still exist.

Is that right?

Thanks,

Volker


signature.asc (204 bytes) Download Attachment

Re: the sorry saga of the talloc soname 'fix'

by tridge@samba.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Volker,

 > The changes to talloc you made are:
 >
 > a) Add talloc_reparent
 >
 > b) Print error messages to stderr if talloc_[steal|free] is
 >    used when references still exist.

It isn't just printing error messages. It also refuses to do ambiguous
operations. So if you do this:

  int *p1 = talloc(NULL, int);
  int *p2 = talloc(NULL, int)
  int *c1 = talloc(p1, int);
  int *r1 = talloc_reference(p2, c1);
  assert(r1 == c1);

then at the end, one piece of code might do:

  talloc_free(r1);

and another bit of code might do:

  talloc_free(c1);

the problem is that c1==r1, so talloc doesn't know which one you
mean. If you free via one of the parents (p1 or p2) then there is no
ambiguity. If you free via a child with references then there is no
way to know what the programmer intended. We can't look up the program
stack and say "ahh, they are in the module that did the reference,
they must mean to reduce the ref count". There is just no sane way to
do that in C.

At first I just added the warning, but kept the old behaviour, which
was to pick one of the two ambiguous choices. In the case of
talloc_free() that was to free the "most recent parent" (or close
enough, the details of "most recent" are complicated by a steal). In
the case of steal it was to "always steal from the 'real' parent".

Once Andrew and I started to look at where these warnings cropped up
though, it became very clear that they were in fact real bugs. Some
were memory leaks. Some just assumed the opposite behaviour of what
was actually done. None of them were clearly 'correct'. So the only
sane choice was to change the API.

So we now refuse the talloc_free(), and print a warning giving both
the line of code where the talloc_free() happened, and all the places
that did talloc_reference() of the pointer. That makes it easy to find
the problem. The usual fix is to then use talloc_unlink() which allows
you to specify which of the two possibilities you wanted.

The same applies to talloc_steal().  I added talloc_reparent() as it
allows you to solve the problem in much the same way that
talloc_unlink() solves the problem for talloc_free(). A
talloc_reparent() allows you to do a "steal" like operation, but
specifying which parent you want to steal from so there is no
ambiguity.

If you try to do an ambiguous talloc_steal() then it will be refused,
and will return NULL. In all likelyhood existing code will crash, as
we often do:

  a->p = talloc_steal(x, y);

and then don't check the return. It isn't ideal that code like this
will now crash, but at least there will be a clear warning printed
first saying what the problem is.

We still need a better way than just using stderr for these
warnings. The patch adding #ifdef DEVELOPER is not the right fix, as
these bugs can occur at runtime. We need something like:

  talloc_set_log_fn(samba_talloc_log);

so the application offers a clear function to use for logging talloc
warnings.

All of this is both an API change (although I would claim only for
code that was already broken!) and an ABI change. Thus we need a new
.so number.

Cheers, Tridge

Re: the sorry saga of the talloc soname 'fix'

by Volker Lendecke :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi, Tridge!

On Mon, Jul 06, 2009 at 09:22:55PM +1000, tridge@... wrote:
> All of this is both an API change (although I would claim only for
> code that was already broken!) and an ABI change. Thus we need a new
> .so number.

Thanks for the explanation.

So if we skipped the "location" piece of the change, the
error messages would get a lot less useful, and the only
real change would be to create memory leaks in already
broken code. Right?

Volker


signature.asc (204 bytes) Download Attachment

Re: the sorry saga of the talloc soname 'fix'

by Jelmer Vernooij :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Tridge,

tridge@... wrote:

> I chose these two binaries as they demonstrate different types of
> brokenness in the way that talloc/ldb/mapi/samba/openchange etc have
> all been packaged. For example:
>
>   - The libldb-samba4-0 package provides a libldb.so.0 which has a
>     built in static copy of talloc.
>
>   - the libmapi.so package links to a dynamic libtalloc.so, but also
>     links to libdcerpc.so
>
>   - libdcerpc.so has a staticly linked talloc built in
>
>   - etc etc
>
> The same type of brokenness is rife through all the various packages
> that use talloc currently.
>  
FWIW, these issues were fixed in Debian experimental a couple of months
ago, so I would expect Karmic to ship with properly working packages.
More specifically, libldb-samba4-0 is now gone (there's only one ldb
package left) and libdcerpc/libtalloc link against a shared version of
libtalloc or fail, rather than falling back to a static version of the
library like it would previously.

Cheers,

Jelmer

Re: the sorry saga of the talloc soname 'fix'

by tridge@samba.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Volker,

 > So if we skipped the "location" piece of the change, the
 > error messages would get a lot less useful, and the only
 > real change would be to create memory leaks in already
 > broken code. Right?

yep. I did think about doing that, but I really disliked the idea of
creating a bunch of very difficult to track down leaks.

I know this change is not pain free, but I think some pain is
unavoidable. The previous API was broken (which of course was my
fault, as I put the ambiguity in there in the first place).

Cheers, Tridge
< Prev | 1 - 2 - 3 - 4 | Next >