LDB: Patch for fixing counter variables

View: New views
11 Messages — Rating Filter:   Alert me  

LDB: Patch for fixing counter variables

by Matthias Dieter Wallnöfer-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Most of our LDB objects are stored in structures containing a dynamic array and a counter. The counter variable is typically an "unsigned integer". Until here it's all okay. But with the major part of the counter variables (i, j...) it's not: since they're "signed". Well, this doesn't give problems till we don't break the ~ 2,1 * 10^9 barrier - but the API is fine to go beyond: with the possible result of infinite loops (since we are never able to reach the maximum).
Now you can argue: "this does happen very rarely" - where I'm also with you. But that doesn't mean that those counters aren't buggy. Since the LDB is a public library for  LDAP-like directories I would be very pleased to see this fixed.
To achieve this I prepared a patch which changes "signed" types to "unsigned" ones which fix the problem. It passes "make test" and doesn't seem to give problems. The only thing I would like to ask you is to review it, so I could push it soon.

Matthias





ldb_counters.patch (65K) Download Attachment

Re: LDB: Patch for fixing counter variables

by Volker Lendecke :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Nov 06, 2009 at 09:19:46PM +0000, Matthias Dieter Wallnöfer wrote:
> Most of our LDB objects are stored in structures
> containing a dynamic array and a counter. The counter

A quick side-remark for this: I've recently figured out that
for most arrays the explicit counter is superfluous, as we
now have talloc_array_length().

Volker


attachment0 (196 bytes) Download Attachment

Re: LDB: Patch for fixing counter variables

by Andrew Bartlett :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, 2009-11-06 at 21:19 +0000, Matthias Dieter Wallnöfer wrote:

> Most of our LDB objects are stored in structures containing a dynamic
> array and a counter. The counter variable is typically an "unsigned
> integer". Until here it's all okay. But with the major part of the
> counter variables (i, j...) it's not: since they're "signed". Well,
> this doesn't give problems till we don't break the ~ 2,1 * 10^9
> barrier - but the API is fine to go beyond: with the possible result
> of infinite loops (since we are never able to reach the maximum).
> Now you can argue: "this does happen very rarely" - where I'm also
> with you. But that doesn't mean that those counters aren't buggy.
> Since the LDB is a public library for  LDAP-like directories I would
> be very pleased to see this fixed.
> To achieve this I prepared a patch which changes "signed" types to
> "unsigned" ones which fix the problem. It passes "make test" and
> doesn't seem to give problems. The only thing I would like to ask you
> is to review it, so I could push it soon.
Matthias,

As you may have come to expect, my first request is that you split the
patch up.  There are parts of this that look safe, and many parts that
need to looked at more carefully.

The changes the the Python look most dubious, as you are changing the
type of something that is then interpreted as an integer in the python.
(so please remove all the Python changes).

You also have typo fixes and changes to use LDB_SUCCESS mixed in the
same patch.  

Once you do that, I'll take another look, and we can see if some of
these can be easily merged.   Perhaps also produce the patches with more
context?  It's hard to tell from the changed declaration what the effect
would be (otherwise I'll have to do the review in gitk).

Thanks,

Andrew Bartlett

--
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org
Samba Developer, Cisco Inc.



signature.asc (196 bytes) Download Attachment

Parent Message unknown Re: LDB: Patch for fixing counter variables

by Matthias Dieter Wallnöfer-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I don't have much time to chat now, so I submit you a new version of the patch: divided in C and python bindings.

Matthias

--- Andrew Bartlett <abartlet@...> schrieb am Sa, 7.11.2009:

Von: Andrew Bartlett <abartlet@...>
Betreff: Re: LDB: Patch for fixing counter variables
An: mdw@...
CC: idra@..., "Samba Technical Mailinglist" <samba-technical@...>
Datum: Samstag, 7. November 2009, 1:37

On Fri, 2009-11-06 at 21:19 +0000, Matthias Dieter Wallnöfer wrote:

> Most of our LDB objects are stored in structures containing a dynamic
> array and a counter. The counter variable is typically an "unsigned
> integer". Until here it's all okay. But with the major part of the
> counter variables (i, j...) it's not: since they're "signed". Well,
> this doesn't give problems till we don't break the ~ 2,1 * 10^9
> barrier - but the API is fine to go beyond: with the possible result
> of infinite loops (since we are never able to reach the maximum).
> Now you can argue: "this does happen very rarely" - where I'm also
> with you. But that doesn't mean that those counters aren't buggy.
> Since the LDB is a public library for  LDAP-like directories I would
> be very pleased to see this fixed.
> To achieve this I prepared a patch which changes "signed" types to
> "unsigned" ones which fix the problem. It passes "make test" and
> doesn't seem to give problems. The only thing I would like to ask you
> is to review it, so I could push it soon.
Matthias,

As you may have come to expect, my first request is that you split the
patch up.  There are parts of this that look safe, and many parts that
need to looked at more carefully.

The changes the the Python look most dubious, as you are changing the
type of something that is then interpreted as an integer in the python.
(so please remove all the Python changes).

You also have typo fixes and changes to use LDB_SUCCESS mixed in the
same patch. 

Once you do that, I'll take another look, and we can see if some of
these can be easily merged.   Perhaps also produce the patches with more
context?  It's hard to tell from the changed declaration what the effect
would be (otherwise I'll have to do the review in gitk).

Thanks,

Andrew Bartlett

--
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org
Samba Developer, Cisco Inc.







ldb_counters.patch (61K) Download Attachment
ldb_counters_py.patch (4K) Download Attachment

Re: LDB: Patch for fixing counter variables

by Jelmer Vernooij :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi matthias,

On Sat, 2009-11-07 at 09:24 +0000, Matthias Dieter Wallnöfer wrote:
> I don't have much time to chat now, so I submit you a new version of the patch: divided in C and python bindings.
The Python binding changes are pointless. PyList_SetItem() takes an
integer (not an unsigned integer), so changing this doesn't fix
anything. If we really suspect we're going to run into ldb results or
messages with more than billions of elements, we should just raise an
exception if they don't fit in a Python list.

Cheers,

Jelmer

> --- Andrew Bartlett <abartlet@...> schrieb am Sa, 7.11.2009:
>
> Von: Andrew Bartlett <abartlet@...>
> Betreff: Re: LDB: Patch for fixing counter variables
> An: mdw@...
> CC: idra@..., "Samba Technical Mailinglist" <samba-technical@...>
> Datum: Samstag, 7. November 2009, 1:37
>
> On Fri, 2009-11-06 at 21:19 +0000, Matthias Dieter Wallnöfer wrote:
> > Most of our LDB objects are stored in structures containing a dynamic
> > array and a counter. The counter variable is typically an "unsigned
> > integer". Until here it's all okay. But with the major part of the
> > counter variables (i, j...) it's not: since they're "signed". Well,
> > this doesn't give problems till we don't break the ~ 2,1 * 10^9
> > barrier - but the API is fine to go beyond: with the possible result
> > of infinite loops (since we are never able to reach the maximum).
> > Now you can argue: "this does happen very rarely" - where I'm also
> > with you. But that doesn't mean that those counters aren't buggy.
> > Since the LDB is a public library for  LDAP-like directories I would
> > be very pleased to see this fixed.
> > To achieve this I prepared a patch which changes "signed" types to
> > "unsigned" ones which fix the problem. It passes "make test" and
> > doesn't seem to give problems. The only thing I would like to ask you
> > is to review it, so I could push it soon.
>
> Matthias,
>
> As you may have come to expect, my first request is that you split the
> patch up.  There are parts of this that look safe, and many parts that
> need to looked at more carefully.
>
> The changes the the Python look most dubious, as you are changing the
> type of something that is then interpreted as an integer in the python.
> (so please remove all the Python changes).
>
> You also have typo fixes and changes to use LDB_SUCCESS mixed in the
> same patch.  
>
> Once you do that, I'll take another look, and we can see if some of
> these can be easily merged.   Perhaps also produce the patches with more
> context?  It's hard to tell from the changed declaration what the effect
> would be (otherwise I'll have to do the review in gitk).
>
> Thanks,
>
> Andrew Bartlett
>
> --
> Andrew Bartlett                                http://samba.org/~abartlet/
> Authentication Developer, Samba Team           http://samba.org
> Samba Developer, Cisco Inc.
>
>
>
>
>      

--
Jelmer Vernooij <jelmer@...> - http://samba.org/~jelmer/
Jabber: jelmer@...


signature.asc (853 bytes) Download Attachment

Re: LDB: Patch for fixing counter variables

by Matthias Dieter Wallnöfer-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thanks Jelmer, for your clear explaination. I will drop the python part. To be honest I wasn't very about it.

Matthias

--- Jelmer Vernooij <jelmer@...> schrieb am Sa, 7.11.2009:

Von: Jelmer Vernooij <jelmer@...>
Betreff: Re: LDB: Patch for fixing counter variables
An: mdw@...
CC: "Andrew Bartlett" <abartlet@...>, "Samba Technical Mailinglist" <samba-technical@...>, idra@...
Datum: Samstag, 7. November 2009, 11:02

Hi matthias,

On Sat, 2009-11-07 at 09:24 +0000, Matthias Dieter Wallnöfer wrote:
> I don't have much time to chat now, so I submit you a new version of the patch: divided in C and python bindings.
The Python binding changes are pointless. PyList_SetItem() takes an
integer (not an unsigned integer), so changing this doesn't fix
anything. If we really suspect we're going to run into ldb results or
messages with more than billions of elements, we should just raise an
exception if they don't fit in a Python list.

Cheers,

Jelmer

> --- Andrew Bartlett <abartlet@...> schrieb am Sa, 7.11.2009:
>
> Von: Andrew Bartlett <abartlet@...>
> Betreff: Re: LDB: Patch for fixing counter variables
> An: mdw@...
> CC: idra@..., "Samba Technical Mailinglist" <samba-technical@...>
> Datum: Samstag, 7. November 2009, 1:37
>
> On Fri, 2009-11-06 at 21:19 +0000, Matthias Dieter Wallnöfer wrote:
> > Most of our LDB objects are stored in structures containing a dynamic
> > array and a counter. The counter variable is typically an "unsigned
> > integer". Until here it's all okay. But with the major part of the
> > counter variables (i, j...) it's not: since they're "signed". Well,
> > this doesn't give problems till we don't break the ~ 2,1 * 10^9
> > barrier - but the API is fine to go beyond: with the possible result
> > of infinite loops (since we are never able to reach the maximum).
> > Now you can argue: "this does happen very rarely" - where I'm also
> > with you. But that doesn't mean that those counters aren't buggy.
> > Since the LDB is a public library for  LDAP-like directories I would
> > be very pleased to see this fixed.
> > To achieve this I prepared a patch which changes "signed" types to
> > "unsigned" ones which fix the problem. It passes "make test" and
> > doesn't seem to give problems. The only thing I would like to ask you
> > is to review it, so I could push it soon.
>
> Matthias,
>
> As you may have come to expect, my first request is that you split the
> patch up.  There are parts of this that look safe, and many parts that
> need to looked at more carefully.
>
> The changes the the Python look most dubious, as you are changing the
> type of something that is then interpreted as an integer in the python.
> (so please remove all the Python changes).
>
> You also have typo fixes and changes to use LDB_SUCCESS mixed in the
> same patch. 
>
> Once you do that, I'll take another look, and we can see if some of
> these can be easily merged.   Perhaps also produce the patches with more
> context?  It's hard to tell from the changed declaration what the effect
> would be (otherwise I'll have to do the review in gitk).
>
> Thanks,
>
> Andrew Bartlett
>
> --
> Andrew Bartlett                                http://samba.org/~abartlet/
> Authentication Developer, Samba Team           http://samba.org
> Samba Developer, Cisco Inc.
>
>
>
>
>       


--
Jelmer Vernooij <jelmer@...> - http://samba.org/~jelmer/
Jabber: jelmer@...





Re: LDB: Patch for fixing counter variables

by Matthias Dieter Wallnöfer-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Volker Lendecke wrote:
> A quick side-remark for this: I've recently figured out that
> for most arrays the explicit counter is superfluous, as we
> now have talloc_array_length().
>
> Volker
>    
Ah, that would indeed be a nice method to skip those length/amount
specification variables. But for LDB that means huge API changes which
for sure aren't welcome.

Anyway, my patch considers the wrong-typed "i,j" counters which would
have to be corrected also with that approach.

Matthias

Re: LDB: Patch for fixing counter variables

by Matthias Dieter Wallnöfer-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Andrew,

I dropped the wrong part of the patch though your and Jelmer's
suggestion. The other changes should fully compliy - I would like to
remember: "make test" passes as without them - so nothing (additional)
should have been broken.

I hope that you aren't that picky on those changes since then we never
get to an end. For the future I plan also to provide some patches on
other similar s4 code. Of course if you notice an obvious mistake (like
this with python) please inform me. But otherwise going through each
part I don't see very efficient.

Matthias

Andrew Bartlett wrote:

> On Fri, 2009-11-06 at 21:19 +0000, Matthias Dieter Wallnöfer wrote:
>    
>> Most of our LDB objects are stored in structures containing a dynamic
>> array and a counter. The counter variable is typically an "unsigned
>> integer". Until here it's all okay. But with the major part of the
>> counter variables (i, j...) it's not: since they're "signed". Well,
>> this doesn't give problems till we don't break the ~ 2,1 * 10^9
>> barrier - but the API is fine to go beyond: with the possible result
>> of infinite loops (since we are never able to reach the maximum).
>> Now you can argue: "this does happen very rarely" - where I'm also
>> with you. But that doesn't mean that those counters aren't buggy.
>> Since the LDB is a public library for  LDAP-like directories I would
>> be very pleased to see this fixed.
>> To achieve this I prepared a patch which changes "signed" types to
>> "unsigned" ones which fix the problem. It passes "make test" and
>> doesn't seem to give problems. The only thing I would like to ask you
>> is to review it, so I could push it soon.
>>      
> Matthias,
>
> As you may have come to expect, my first request is that you split the
> patch up.  There are parts of this that look safe, and many parts that
> need to looked at more carefully.
>
> The changes the the Python look most dubious, as you are changing the
> type of something that is then interpreted as an integer in the python.
> (so please remove all the Python changes).
>
> You also have typo fixes and changes to use LDB_SUCCESS mixed in the
> same patch.
>
> Once you do that, I'll take another look, and we can see if some of
> these can be easily merged.   Perhaps also produce the patches with more
> context?  It's hard to tell from the changed declaration what the effect
> would be (otherwise I'll have to do the review in gitk).
>
> Thanks,
>
> Andrew Bartlett
>
>    


Re: LDB: Patch for fixing counter variables

by Andrew Bartlett :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Mon, 2009-11-09 at 16:28 +0100, Matthias Dieter Wallnöfer wrote:

> Andrew,
>
> I dropped the wrong part of the patch though your and Jelmer's
> suggestion. The other changes should fully compliy - I would like to
> remember: "make test" passes as without them - so nothing (additional)
> should have been broken.
>
> I hope that you aren't that picky on those changes since then we never
> get to an end. For the future I plan also to provide some patches on
> other similar s4 code. Of course if you notice an obvious mistake (like
> this with python) please inform me. But otherwise going through each
> part I don't see very efficient.
It isn't very efficient, but just like the const changes it is easy to
slip in other changes and subtle bugs in a big 'cleanup' patch.

That's why we need to be careful doing this, and why splitting things up
helps, because it allows me or others to say '1 is ok, still thinking
about 2'.  The more you split it up, the easier it is for me.

Also, for me the whole question of 'is there any point' is still very
much open.

That is, the boundary condition needs to be *way* below 2^31 (can we
really allocate that many elements?), and raising it to 2^32 just seems
to change the rollover point.  
BTW, See: http://xkcd.com/571/

Most of the one I've looked at seem OK, but because it's so large I've
not been able to look at them all.  However some things look like they
could be improved, such as the change to ldb_msg_remove_element(), which
looks like it should be a ptrdiff_t, and check for both above and below
the range.  

And this change:
 char *ldb_binary_encode(void *mem_ctx, struct ldb_val val)
 {
- int i;
  char *ret;
- int len = val.length;
+ unsigned int i, len = val.length;
  unsigned char *buf = val.data;

Would have been better as just a search/replace - the construction 'i,
len = val.length' isn't as nice.

I'm sorry this is so tedious, and to be so picky.  Like the const
changes, it will take time to check all these (and 'make test' generally
isn't sufficient on these kind of changes).

Andrew Bartlett

--
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org
Samba Developer, Cisco Inc.



signature.asc (196 bytes) Download Attachment

Re: LDB: Patch for fixing counter variables

by tridge@samba.org :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Matthias,

 > That is, the boundary condition needs to be *way* below 2^31 (can we
 > really allocate that many elements?), and raising it to 2^32 just seems
 > to change the rollover point.  

Andrew is right about that, we can't ever allocate anything like that
number of elements in ldb. For a start, we limit talloc allocations to
256M, plus lots of the internal algorithms in ldb would start to fail.

There are a few things that are O(N^2) in ldb. If N approaches 2^31,
then N^2 will be 2^62. Think how long it will take to process that :-)

Changing to unsigned counters isn't wrong, but it does need to be done
very carefully, and I don't see it gaining us anything.

Matthias, would you mind if I suggested a more worthwhile project
related to ldb and 'for' loops?

A really worthwhile project would be to find the places that are
O(N^2) or larger and find a way to fix them. To do that you might like
to start by finding all for loop nesting, or loops that call functions
that loop.

To give you an example, consider the code in ltdb_modify_internal()
which checks to see that we don't have a multi-valued attribute with
the same attribute twice (look near line 691 of
lib/ldb/ldb_tdb/ldb_tdb.c). Now think about what happens when we
modify an index record, which may have thousands of @IDX entries
(eg. the objectclass=user index, or a one-level index).

Adding 10k users will call this function 10k times on that
record. After adding N users, each call of that function will cost
O(N^2), as it checks all elements being stored with all previous
elements. So that means adding 10k users will cost O(N^3). So it will
cost about 10^12 memcmp() calls (the calls are in
ldb_val_equal_exact()). Each memcmp compares on average about 10^2
bytes, so in total we memcmp() about 10^14 bytes. A modern CPU can
compare cached memory at a rate of about 4*10^9 bytes/s, so those
memcmp() calls are costing us about 25000 seconds (ie. about 6 hours).

The reason I'm suggesting this is you are concerned about large lists
of elements in ldb. If you plug N=2^31 into the above, then you'll see
that to add that number of users in the current ldb would cost about
2^95 time. The earth (and probably the universe!) would be well and
truly dead by the time we'd added that number of users.

The fix is to develop hashing, binary search and sort functions that
will allow ldb to scale to large sizes. That fix is much more
important than going from 2^31 to 2^32. In the example above, if we
used a qsort() of the elements then checked for duplicates by checking
neighboring elements then the total cost of that part of the code
would drop from 10^14 to about 10^10. So instead of taking 6 hours for
those few lines of code to run it would take about 2 seconds.

There are lots of examples like this in Samba4 (and probably in Samba3
too!), and there is an especially large concentration of them in
ldb. We rarely test for real scalability, and that means that the
developers aren't thinking in terms of big-O calculations when they
write the code.

If you can help us fix that, then maybe someday we can actually think
about approaching 2^31 elements. Then signed/unsigned for the loop
counters will matter :-)

Cheers, Tridge

PS: All numbers in the above are very rough of course - but try adding
a few thousand users with a profiler running and you'll see what I
mean

Re: LDB: Patch for fixing counter variables

by Matthias Dieter Wallnöfer-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Andrew,
> It isn't very efficient, but just like the const changes it is easy to
> slip in other changes and subtle bugs in a big 'cleanup' patch.
>
> That's why we need to be careful doing this, and why splitting things up
> helps, because it allows me or others to say '1 is ok, still thinking
> about 2'.  The more you split it up, the easier it is for me.
>    
But for me more work-intensive. Isn't it possible for you to provide me
a list of functions/files which should be reviewed or corrected?
> Also, for me the whole question of 'is there any point' is still very
> much open.
>
> That is, the boundary condition needs to be *way* below 2^31 (can we
> really allocate that many elements?), and raising it to 2^32 just seems
> to change the rollover point.
> BTW, See: http://xkcd.com/571/
>    
Well, we can also let this changes. But then I would like to see also
the size specifiers ("num_elements", "num_attributes"...) as signed int
(e.g. when we would have implemented this in Java this would be the
right solution). I only dislike this kind of mixing.

> Most of the one I've looked at seem OK, but because it's so large I've
> not been able to look at them all.  However some things look like they
> could be improved, such as the change to ldb_msg_remove_element(), which
> looks like it should be a ptrdiff_t, and check for both above and below
> the range.
>
> And this change:
>   char *ldb_binary_encode(void *mem_ctx, struct ldb_val val)
>   {
> - int i;
>   char *ret;
> - int len = val.length;
> + unsigned int i, len = val.length;
>   unsigned char *buf = val.data;
>
> Would have been better as just a search/replace - the construction 'i,
> len = val.length' isn't as nice.
>    
Thanks for pointing this out! A change on this will happen from my side.
> I'm sorry this is so tedious, and to be so picky.  Like the const
> changes, it will take time to check all these (and 'make test' generally
> isn't sufficient on these kind of changes).
>    
But still I see this not completely unimportant. I handle this as a side
project like the "const"s next to my actual "main project"
("password_hash"). I try to provide a first mergeable version soon
(which does support changes from the administrator side with password
policies and for now lets the "samdb_set_password" call in the actual
form to handle also user passwords right). I only need to rethink about
the removed attribute constraints which you pointed out in another post.

Matthias