New libc malloc patch

Re: New libc malloc patch

by Jason Evans

On Nov 29, 2005, at 2:52 AM, Hiten Pandya wrote:
> I see that you have included an implementation of red-black tree CPP
> macros, but wouldn't it be better if you were to use the ones in
> <sys/tree.h>?  I have only had a cursory look, but I would have
> thought that would be the way to go.

There's an updated patch available:

http://www.canonware.com/~jasone/jemalloc/jemalloc_20051201a.diff

This patch includes the following changes:

*) Use sys/tree.h rather than a separate red-black tree implementation.

*) Use the __isthreaded symbol to avoid locking for single-threaded  
programs, and to simplify malloc initialization.  The extra branches  
that are required to check __isthreaded should be more than offset by  
the removal of an atomic compare/swap operation.

*) Fix an obscure bug (very difficult to trigger without changing  
some compile-time constants).

Jason
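
A minimal sketch of the __isthreaded idea described above, not code taken from the patch. The names demo_isthreaded, demo_lock and demo_unlock are hypothetical, and a pthread mutex stands in for whatever lock the patch really uses. It only shows why a plain branch can replace an atomic operation while the process is still single-threaded.

```c
#include <pthread.h>

/* Hypothetical stand-in for libc's __isthreaded flag (an assumption). */
static int demo_isthreaded = 0;

static pthread_mutex_t demo_mtx = PTHREAD_MUTEX_INITIALIZER;
static unsigned long demo_lock_calls = 0;   /* counts real lock acquisitions */

/* Take the lock only once the process has become multi-threaded. */
static int
demo_lock(void)
{
    if (!demo_isthreaded)
        return (0);         /* single-threaded: one branch, no atomic op */
    pthread_mutex_lock(&demo_mtx);
    demo_lock_calls++;
    return (1);
}

/* Release the lock only if demo_lock() actually took it. */
static void
demo_unlock(int locked)
{
    if (locked)
        pthread_mutex_unlock(&demo_mtx);
}
```

A caller would wrap each allocator entry point in demo_lock()/demo_unlock(); the flag has to be set before the second thread starts running.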
_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: New libc malloc patch

by Hiten Pandya

Thanks Jason!

Kind Regards,

--
Hiten Pandya
hiten.pandya at gmail.com

On 01/12/05, Jason Evans <jasone@...> wrote:

> On Nov 29, 2005, at 2:52 AM, Hiten Pandya wrote:
> > I see that you have included an implementation of red-black tree CPP
> > macros, but wouldn't it be better if you were to use the ones in
> > <sys/tree.h>?  I have only had a cursory look, but I would have
> > thought that would be the way to go.
>
> There's an updated patch available:
>
> http://www.canonware.com/~jasone/jemalloc/jemalloc_20051201a.diff
>
> This patch includes the following changes:
>
> *) Use sys/tree.h rather than a separate red-black tree implementation.
>
> *) Use the __isthreaded symbol to avoid locking for single-threaded
> programs, and to simplify malloc initialization.  The extra branches
> that are required to check __isthreaded should be more than offset by
> the removal of an atomic compare/swap operation.
>
> *) Fix an obscure bug (very difficult to trigger without changing
> some compile-time constants).
>
> Jason
>

Re: New libc malloc patch

by Jason Evans

On Nov 29, 2005, at 12:06 PM, Jon Dama wrote:
> There exists a problem right now--localized to i386 and any other arch
> based on 32-bit pointers: address space is simply too scarce.
>
> Your decision to switch to using mmap as the exclusive source of malloc
> buckets is admirable for its modernity but it simply cannot stand unless
> someone steps up to change the way mmap and brk interact within the
> kernel.

There's a new version of the patch available at:

http://www.canonware.com/~jasone/jemalloc/jemalloc_20051202b.diff

This version of the patch adds the following:

* Prefer to use sbrk() rather than mmap() for the 32-bit platforms.

* Lazily create arenas, so that single-threaded applications don't  
dedicate space to arenas they never use.

* Add the '*' and '/' MALLOC_OPTIONS flags, which allow control over  
the number of arenas.

As of this patch, all of the issues that were brought to my attention  
have been addressed.  This is a good time for additional review and  
serious benchmarking.

Thanks,
Jason
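
A rough sketch of what "lazily create arenas" can look like, under stated assumptions: struct demo_arena, DEMO_NARENAS and demo_arena_get are hypothetical names, the arena count is arbitrary, and calloc() stands in for the internal bootstrap allocation a real malloc implementation would have to use instead. It is not thread-safe as written; a real allocator would need a lock around the creation step.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_NARENAS 4      /* hypothetical arena count */

struct demo_arena {
    unsigned id;            /* index of this arena */
};

static struct demo_arena *demo_arenas[DEMO_NARENAS];   /* all NULL at start */
static unsigned demo_arenas_created = 0;

/*
 * Return arena `ind`, creating it on first use.  Arenas that are never
 * asked for are never allocated, so a single-threaded program that only
 * touches arena 0 pays for one arena.
 */
static struct demo_arena *
demo_arena_get(unsigned ind)
{
    if (ind >= DEMO_NARENAS)
        return (NULL);
    if (demo_arenas[ind] == NULL) {
        struct demo_arena *a = calloc(1, sizeof(*a));

        if (a == NULL)
            return (NULL);
        a->id = ind;
        demo_arenas[ind] = a;
        demo_arenas_created++;
    }
    return (demo_arenas[ind]);
}
```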

Re: New libc malloc patch

by David Xu-2

Jason Evans wrote:

> On Nov 29, 2005, at 12:06 PM, Jon Dama wrote:
>
>> There exists a problem right now--localized to i386 and any other arch
>> based on 32-bit pointers: address space is simply too scarce.
>>
>> Your decision to switch to using mmap as the exclusive source of  malloc
>> buckets is admirable for its modernity but it simply cannot stand  unless
>> someone steps up to change the way mmap and brk interact within the
>> kernel.
>
>
> There's a new version of the patch available at:
>
> http://www.canonware.com/~jasone/jemalloc/jemalloc_20051202b.diff
>
> This version of the patch adds the following:
>
> * Prefer to use sbrk() rather than mmap() for the 32-bit platforms.
>
> * Lazily create arenas, so that single-threaded applications don't  
> dedicate space to arenas they never use.
>
> * Add the '*' and '/' MALLOC_OPTIONS flags, which allow control over  
> the number of arenas.
>
> As of this patch, all of the issues that were brought to my attention  
> have been addressed.  This is a good time for additional review and  
> serious benchmarking.
>
> Thanks,
> Jason

I have a question about the mutex used in the patch.  You are using
a spin loop; isn't that suboptimal?  Also, a thread library like
libpthread supports static priority scheduling, and this mutex does
not work there: it will cause a deadlock.  If a lower-priority thread
locks the mutex and is then preempted by a higher-priority thread that
also calls malloc, the higher-priority thread will spin waiting for
the lower-priority thread to finish, but that will never happen.

David Xu
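
For illustration, a hand-rolled spin lock of the kind being questioned might look roughly like this. It is a sketch using C11 atomics, not the code from the patch, and the demo_spin_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical hand-rolled spin lock; not the code from the patch. */
static atomic_flag demo_spin = ATOMIC_FLAG_INIT;

/* One acquisition attempt: returns true if the lock was taken. */
static bool
demo_spin_trylock(void)
{
    return (!atomic_flag_test_and_set_explicit(&demo_spin,
        memory_order_acquire));
}

/*
 * Busy-wait until the lock is free.  The waiter never blocks and never
 * tells the thread library that it is waiting, so under strict priority
 * scheduling a high-priority waiter can keep the CPU while the
 * low-priority owner never runs to release the lock.
 */
static void
demo_spin_lock(void)
{
    while (!demo_spin_trylock())
        ;   /* spin */
}

static void
demo_spin_unlock(void)
{
    atomic_flag_clear_explicit(&demo_spin, memory_order_release);
}
```

Nothing in the loop gives the scheduler a reason to run the lock owner, which is the scenario described above.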



Re: New libc malloc patch

by kometen

> There's a new version of the patch available at:
> http://www.canonware.com/~jasone/jemalloc/jemalloc_20051202b.diff

When I do a make buildworld I get:

cc -fpic -DPIC -O2 -fno-strict-aliasing -pipe -march=athlon64
-I/usr/src/lib/libc/include -I/usr/src/lib/libc/../../include
-I/usr/src/lib/libc/amd64 -D__DBINTERFACE_PRIVATE
-I/usr/src/lib/libc/../../contrib/gdtoa -DINET6
-I/usr/obj/usr/src/lib/libc -DPOSIX_MISTAKE -I/usr/src/lib/libc/locale
-DBROKEN_DES -DPORTMAP -DDES_BUILTIN -I/usr/src/lib/libc/rpc -DYP
-Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized -c
/usr/src/lib/libc/stdlib/malloc.c -o malloc.So
cc -O2 -fno-strict-aliasing -pipe -march=athlon64
-I/usr/src/lib/libc/include -I/usr/src/lib/libc/../../include
-I/usr/src/lib/libc/amd64 -D__DBINTERFACE_PRIVATE
-I/usr/src/lib/libc/../../contrib/gdtoa -DINET6
-I/usr/obj/usr/src/lib/libc -DPOSIX_MISTAKE -I/usr/src/lib/libc/locale
-DBROKEN_DES -DPORTMAP -DDES_BUILTIN -I/usr/src/lib/libc/rpc -DYP
-Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized -c
/usr/src/lib/libc/stdlib/merge.c
/usr/src/lib/libc/stdlib/malloc.c: In function `malloc_mutex_lock':
/usr/src/lib/libc/stdlib/malloc.c:846: warning: cast from pointer to
integer of different size
/usr/src/lib/libc/stdlib/malloc.c:846: warning: cast from pointer to
integer of different size
/usr/src/lib/libc/stdlib/malloc.c:853: warning: cast from pointer to
integer of different size
/usr/src/lib/libc/stdlib/malloc.c: In function `malloc_mutex_unlock':
/usr/src/lib/libc/stdlib/malloc.c:894: warning: cast from pointer to
integer of different size
*** Error code 1
/usr/src/lib/libc/stdlib/malloc.c: In function `malloc_mutex_lock':
/usr/src/lib/libc/stdlib/malloc.c:846: warning: cast from pointer to
integer of different size
/usr/src/lib/libc/stdlib/malloc.c:846: warning: cast from pointer to
integer of different size
/usr/src/lib/libc/stdlib/malloc.c:853: warning: cast from pointer to
integer of different size
/usr/src/lib/libc/stdlib/malloc.c: In function `malloc_mutex_unlock':
/usr/src/lib/libc/stdlib/malloc.c:894: warning: cast from pointer to
integer of different size
*** Error code 1
2 errors
*** Error code 2
1 error
*** Error code 2
1 error
*** Error code 2
1 error
*** Error code 2
1 error
*** Error code 2
1 error
make -j 3 buildworld  763,54s user 150,98s system 173% cpu 8:48,62 total

twin/usr/src#>uname -a
FreeBSD twin.gnome.no 7.0-CURRENT FreeBSD 7.0-CURRENT #0: Thu Dec  1
21:38:11 CET 2005     root@...:/usr/obj/usr/src/sys/TWIN
amd64

regards
Claus
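
The log does not show the offending lines of malloc.c, so the following is only a guess at the general pattern. On amd64 a pointer is 64 bits wide while int is 32 bits, so casting a pointer to a plain integer type draws this warning, and -Werror (visible in the compiler flags above) turns it into a build failure. A sketch of the usual remedy, with hypothetical function names:

```c
#include <stdint.h>

/*
 * uintptr_t is defined to be wide enough to hold a pointer, so this cast
 * does not lose bits on either 32-bit or 64-bit platforms.  A cast such
 * as (unsigned int)ptr would truncate on amd64.
 */
static uintptr_t
demo_ptr_to_int(const void *ptr)
{
    return ((uintptr_t)ptr);
}

/* Round-trip back to a pointer; valid because no bits were lost. */
static const void *
demo_int_to_ptr(uintptr_t v)
{
    return ((const void *)v);
}
```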

Re: New libc malloc patch

by Jason Evans

On Dec 3, 2005, at 12:45 PM, Claus Guttesen wrote:

>> There's a new version of the patch available at:
>> http://www.canonware.com/~jasone/jemalloc/jemalloc_20051202b.diff
>
> When I do a make buildworld I get:
>
> cc -fpic -DPIC -O2 -fno-strict-aliasing -pipe -march=athlon64
> -I/usr/src/lib/libc/include -I/usr/src/lib/libc/../../include
> -I/usr/src/lib/libc/amd64 -D__DBINTERFACE_PRIVATE
> -I/usr/src/lib/libc/../../contrib/gdtoa -DINET6
> -I/usr/obj/usr/src/lib/libc -DPOSIX_MISTAKE -I/usr/src/lib/libc/locale
> -DBROKEN_DES -DPORTMAP -DDES_BUILTIN -I/usr/src/lib/libc/rpc -DYP
> -Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized -c
> /usr/src/lib/libc/stdlib/malloc.c -o malloc.So
> cc -O2 -fno-strict-aliasing -pipe -march=athlon64
> -I/usr/src/lib/libc/include -I/usr/src/lib/libc/../../include
> -I/usr/src/lib/libc/amd64 -D__DBINTERFACE_PRIVATE
> -I/usr/src/lib/libc/../../contrib/gdtoa -DINET6
> -I/usr/obj/usr/src/lib/libc -DPOSIX_MISTAKE -I/usr/src/lib/libc/locale
> -DBROKEN_DES -DPORTMAP -DDES_BUILTIN -I/usr/src/lib/libc/rpc -DYP
> -Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized -c
> /usr/src/lib/libc/stdlib/merge.c
> /usr/src/lib/libc/stdlib/malloc.c: In function `malloc_mutex_lock':
> /usr/src/lib/libc/stdlib/malloc.c:846: warning: cast from pointer to
> integer of different size
> /usr/src/lib/libc/stdlib/malloc.c:846: warning: cast from pointer to
> integer of different size
> /usr/src/lib/libc/stdlib/malloc.c:853: warning: cast from pointer to
> integer of different size
> /usr/src/lib/libc/stdlib/malloc.c: In function `malloc_mutex_unlock':
> /usr/src/lib/libc/stdlib/malloc.c:894: warning: cast from pointer to
> integer of different size
> *** Error code 1
> /usr/src/lib/libc/stdlib/malloc.c: In function `malloc_mutex_lock':
> /usr/src/lib/libc/stdlib/malloc.c:846: warning: cast from pointer to
> integer of different size
> /usr/src/lib/libc/stdlib/malloc.c:846: warning: cast from pointer to
> integer of different size
> /usr/src/lib/libc/stdlib/malloc.c:853: warning: cast from pointer to
> integer of different size
> /usr/src/lib/libc/stdlib/malloc.c: In function `malloc_mutex_unlock':
> /usr/src/lib/libc/stdlib/malloc.c:894: warning: cast from pointer to
> integer of different size
> *** Error code 1
> 2 errors
> *** Error code 2
> 1 error
> *** Error code 2
> 1 error
> *** Error code 2
> 1 error
> *** Error code 2
> 1 error
> *** Error code 2
> 1 error
> make -j 3 buildworld  763,54s user 150,98s system 173% cpu 8:48,62  
> total
>
> twin/usr/src#>uname -a
> FreeBSD twin.gnome.no 7.0-CURRENT FreeBSD 7.0-CURRENT #0: Thu Dec  1
> 21:38:11 CET 2005     root@...:/usr/obj/usr/src/sys/TWIN
> amd64

Did you use the 20051202b patch?  I thought I had fixed the problem,
but I don't have an amd64 system to test on.  In any case, I'll be
uploading a new patch in a few minutes that removes the offending
code entirely.

Thanks,
Jason

Re: New libc malloc patch

by Jason Evans

On Dec 3, 2005, at 12:26 AM, David Xu wrote:
> I have a question about mutex used in the patch, you are using
> a spin loop, isn't it suboptimal ? and a thread library like libpthread
> supports static priority scheduling, this mutex does not work, it
> will causes a dead lock, if a lower priority thread locked the mutex,
> and preempted by a higher priority thread, and the higher priority
> thread also calls malloc, it will spin there to wait lower
> priority thread to complete, but that will never happen.

David,

You are correct that this is a problem.  Thank you for pointing it  
out.  There's a new patch that uses the spinlocks that are provided  
by the threads libraries.  Please let me know if this looks okay.

Also, this patch removes/modifies the code that was causing build  
failures on amd64, so it's worth giving another try.  Hopefully, it  
will compile now...

http://www.canonware.com/~jasone/jemalloc/jemalloc_20051203a.diff

Thanks,
Jason

Re: New libc malloc patch

by kometen

> Did you use the 20051202b patch?  I thought I had fixed the problem,
> but I don't have an amd64 system to test on.  In any case, I'll be
> uploading up a new patch in a few minutes that removes the offending
> code entirely.

Yes. I'll do the test that you want me to do :-) Thank you!

regards
Claus

Re: New libc malloc patch

by David Xu-2

Jason Evans wrote:

> On Dec 3, 2005, at 12:26 AM, David Xu wrote:
>
>> I have a question about mutex used in the patch, you are using
>> a spin loop, isn't it suboptimal ? and a thread library like  libpthread
>> supports static priority scheduling, this mutex does not work, it
>> will causes a dead lock, if a lower priority thread locked the mutex,
>> and preempted by a higher priority thread, and the higher priority
>> thread also calls malloc, it will spin there to wait lower
>> priority thread to complete, but that will never happen.
>
>
> David,
>
> You are correct that this is a problem.  Thank you for pointing it  
> out.  There's a new patch that uses the spinlocks that are provided  
> by the threads libraries.  Please let me know if this looks okay.
>
> Also, this patch removes/modifies the code that was causing build  
> failures on amd64, so it's worth giving another try.  Hopefully, it  
> will compile now...
>
> http://www.canonware.com/~jasone/jemalloc/jemalloc_20051203a.diff
>
> Thanks,
> Jason
>
The libc spinlocks are deprecated.  In fact, thread libraries try to
keep track of all spinlocks in libc and reset them in the child
process, and they will complain if there are too many spinlocks.  This
is not very correct, but it resolves deadlocks in real-world (weird)
applications.

Because I see you have put _malloc_prefork() and _malloc_postfork()
hooks in the thread libraries, I guess you want to manage all malloc
locks yourself, so you might not need the spinlocks.  You can implement
these locks with the umtx facility provided by the kernel, using
UMTX_OP_WAIT and UMTX_OP_WAKE.  UMTX_OP_LOCK and UMTX_OP_UNLOCK can
also be used to implement locks, but I have reserved those two
operations because I plan to implement a reliable POSIX process-shared
mutex.  You can find code in libthr to study how to use umtx.

Last, I don't know whether umtx will work with libc_r, but libc_r has
already been disconnected from world for some days; it will rot away.

Regards,
David Xu


Re: New libc malloc patch

by David Xu-2

Here is sample code to implement a mutex by using umtx syscalls:

#include <sys/types.h>
#include <sys/ucontext.h>
#include <sys/umtx.h>
#include <machine/atomic.h>
#include <err.h>        /* errx() */
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define LCK_UNLOCKED    0
#define LCK_LOCKED    1
#define LCK_CONTENDED    2

void
lock_mtx(struct umtx *mtx)
{
    volatile uintptr_t *m = (volatile uintptr_t *)mtx;

    for (;;) {
        /* try to lock it. */
        if (atomic_cmpset_acq_ptr(m, LCK_UNLOCKED, LCK_LOCKED))
            return;
        if (atomic_load_acq_ptr(m) == LCK_LOCKED) {
            /*
             * If it is locked by a single thread, try to
             * set it to the contended state.
             */
            if (!atomic_cmpset_acq_ptr(m, LCK_LOCKED, LCK_CONTENDED))
                continue;
        }
        /* If in the contended state, wait for it to be unlocked. */
        if (atomic_load_acq_ptr(m) == LCK_CONTENDED)
            _umtx_op((struct umtx *)m, UMTX_OP_WAIT, LCK_CONTENDED, 0,
                NULL);
    }
}

void
unlock_mtx(struct umtx *mtx)
{
    volatile uintptr_t *m = (volatile uintptr_t *)mtx;

    for (;;) {
        if (atomic_load_acq_ptr(m) == LCK_UNLOCKED)
            errx(1, "unlocking an unlocked mutex");
        if (atomic_load_acq_ptr(m) == LCK_LOCKED) {
            if (atomic_cmpset_acq_ptr(m, LCK_LOCKED, LCK_UNLOCKED))
                return;
        }
        if (atomic_load_acq_ptr(m) == LCK_CONTENDED) {
            atomic_store_rel_ptr(m, LCK_UNLOCKED);
            _umtx_op((struct umtx *)m, UMTX_OP_WAKE, 1, NULL, NULL);
            break;
        }
    }
}

struct umtx m;

void *
lock_test(void *arg)
{
    int i = 0;

    for (i = 0; i < 10000; ++i) {
        lock_mtx(&m);
        pthread_yield();
        unlock_mtx(&m);
    }

    return (0);
}

int main()
{
    pthread_t td1, td2;

    pthread_create(&td1, NULL, lock_test, NULL);
    pthread_create(&td2, NULL, lock_test, NULL);

    pthread_join(td1, NULL);
    pthread_join(td2, NULL);
    return (0);
}


Re: New libc malloc patch

by David Xu-2

David Xu wrote:

> Here is sample code to implement a mutex by using umtx syscalls:
> ...
> void
> unlock_mtx(struct umtx *mtx)
> {
>    volatile uintptr_t *m = (volatile uintptr_t *)mtx;
>
>    for (;;) {
>        if (atomic_load_acq_ptr(m) == LCK_UNLOCKED)
>            err(1, "unlock a unlocked mutex\n");
>        if (atomic_load_acq_ptr(m) == LCK_LOCKED) {
>            if (atomic_cmpset_acq_ptr(m, LCK_LOCKED, LCK_UNLOCKED))
>                return;
>        }
>        if (atomic_load_acq_ptr(m) == LCK_CONTENDED) {
>            atomic_store_rel_ptr(m, LCK_UNLOCKED);
>            _umtx_op((struct umtx *)m, UMTX_OP_WAKE, 1, NULL, NULL);
Oops, that should be:
         _umtx_op((struct umtx *)m, UMTX_OP_WAKE, INT_MAX, NULL, NULL);

Waking every waiter like this is not very efficient if lots of threads
are waiting there. :-)

There is a more efficient version that uses a transaction ID:
http://www.dragonflybsd.org/cvsweb/src/lib/libthread_xu/thread/thr_umtx.c?rev=1.2&content-type=text/x-cvsweb-markup
libthr in FreeBSD does not use these semantics, though; instead they
are implemented in the kernel.

David Xu
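
The three-state protocol in the sample (unlocked, locked, contended) can be tried out on systems without umtx. The sketch below is a portable approximation, not David's code: it uses C11 atomics, replaces the UMTX_OP_WAIT sleep with sched_yield(), and replaces the UMTX_OP_WAKE call with a counter so the wake decision is observable. The demo_* names are hypothetical, and the unlock path uses a single atomic exchange rather than the original loop.

```c
#include <sched.h>
#include <stdatomic.h>

#define LCK_UNLOCKED    0
#define LCK_LOCKED      1
#define LCK_CONTENDED   2

typedef atomic_int demo_mtx_t;      /* hypothetical stand-in for struct umtx */

static atomic_uint demo_wake_calls; /* counts simulated "wake all" calls */

static void
demo_lock_mtx(demo_mtx_t *m)
{
    for (;;) {
        int expected = LCK_UNLOCKED;

        /* Fast path: uncontended acquire. */
        if (atomic_compare_exchange_strong(m, &expected, LCK_LOCKED))
            return;
        /* Held by another thread: record that a waiter exists. */
        expected = LCK_LOCKED;
        (void)atomic_compare_exchange_strong(m, &expected, LCK_CONTENDED);
        /* Assumption: sched_yield() replaces the UMTX_OP_WAIT sleep. */
        if (atomic_load(m) == LCK_CONTENDED)
            sched_yield();
    }
}

static void
demo_unlock_mtx(demo_mtx_t *m)
{
    /* Release and learn the previous state in a single atomic step. */
    int prev = atomic_exchange(m, LCK_UNLOCKED);

    /* Only the contended state needs a wakeup (UMTX_OP_WAKE, INT_MAX). */
    if (prev == LCK_CONTENDED)
        atomic_fetch_add(&demo_wake_calls, 1);
}
```

The point of the contended state is that an unlock with no waiters stays entirely in userland; only the contended path would need the syscall.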


Re: New libc malloc patch

by Jason Evans

On Dec 3, 2005, at 5:40 PM, David Xu wrote:

> The libc spinlocks are deprecated, in fact, thread libraries try to keep
> track of all spinlocks in libc and reset them in child process, they will
> complain if there are too many spinlocks, this is not very correct, but
> would resolve dead lock in real world applications (weird applications).
> Because I see you have put _malloc_prefork() and _malloc_postfork()
> hooks in thread libraries, I guess you want to manage all malloc locks,
> so you might not need to use the spinlocks, you can implement these
> locks by using umtx provided by kernel, you can use UMTX_OP_WAIT
> and UMTX_OP_WAKE to implement these locks, the UMTX_OP_LOCK
> and UMTX_OP_UNLOCK can also be used to implement locks, but I reserve
> these two functions since I have plan to implement reliable POSIX
> process shared mutex. you can find those code in libthr to study how to
> use umtx.
> Last, I don't know if umtx will work with libc_r, but libc_r has
> already been disconnected from world for some days, it will rot away.

I just need simple (low overhead) mutexes that don't cause malloc to  
be called during their initialization.  I would have used  
pthread_mutex_* directly, but cannot due to infinite recursion  
problems during initialization.

As you pointed out, it's important to get priority inheritance right  
in order to avoid priority inversion deadlock, so my hand-rolled  
spinlocks weren't adequate.  I need mutexes that are managed by the  
threads library.  The libc spinlocks appear to fit the bill perfectly  
in that capacity.  It seems to me that using umtx would actually be  
the wrong thing to do, because I'd be circumventing libpthread's  
userland scheduler, and it would be the wrong thing for libc_r, as  
you pointed out.  This approach would work for libthr, but perhaps  
nothing else?

I'd like to keep things as simple and general as possible.  Is the  
current implementation that uses libc spinlocks acceptable?

Thanks,
Jason

P.S. Why are libc spinlocks deprecated?

Re: New libc malloc patch

by David Xu-2

Jason Evans wrote:
> I just need simple (low overhead) mutexes that don't cause malloc to  be
> called during their initialization.
umtx is lightweight and fast, and it does not need malloc.

>  I would have used  pthread_mutex_*
> directly, but cannot due to infinite recursion  problems during
> initialization.
>
Yes, I know the current pthread_mutex implementations use malloc, and
I don't think they will be changed to avoid malloc very soon.

> As you pointed out, it's important to get priority inheritance right  in
> order to avoid priority inversion deadlock, so my hand-rolled  spinlocks
> weren't adequate.  I need mutexes that are managed by the  threads
> library.  The libc spinlocks appear to fit the bill perfectly  in that
> capacity.  It seems to me that using umtx would actually be  the wrong
> thing to do, because I'd be circumventing libpthread's  userland
> scheduler, and it would be the wrong thing for libc_r, as  you pointed
> out.  This approach would work for libthr, but perhaps  nothing else?
>
umtx will work with libpthread.  I cannot find any reason why using umtx
would cause a deadlock: the userland scheduler cannot propagate its
priority decisions across the kernel boundary, and umtx is a blocking
syscall.

> I'd like to keep things as simple and general as possible.  Is the  
> current implementation that uses libc spinlocks acceptable?
>
> Thanks,
> Jason
>
> P.S. Why are libc spinlocks deprecated?
>
>
Because we want other libraries to use pthread mutexes.  If they cannot
be used widely and we have to fall back on spinlocks, that is really in
bad taste.  I think only malloc has the recursion problem.  In fact,
libpthread needs malloc to initialize a spinlock, so you cannot create
spinlocks dynamically in your malloc code; only libthr does not have
this problem.  libc_r also has a priority inversion problem with your
current mutex code.

Regards,
David Xu


Re: New libc malloc patch

by kometen

> Did you use the 20051202b patch?  I thought I had fixed the problem,
> but I don't have an amd64 system to test on.  In any case, I'll be
> uploading up a new patch in a few minutes that removes the offending
> code entirely.

I was able to do a buildworld on current with this patch, but I had
problems getting X to run, and kldxref took all the space on my root
partition during an installkernel.  So I downgraded to 6.0-STABLE
and now get this error:

===> libexec/atrun (all)
cc -O2 -fno-strict-aliasing -pipe -march=athlon64
-DATJOB_DIR=\"/var/at/jobs/\"  -DLFILE=\"/var/at/jobs/.lockfile\"
-DLOADAVG_MX=1.5 -DATSPOOL_DIR=\"/var/at/spool\"  -DVERSION=\"2.9\"
-DDAEMON_UID=1 -DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'
-DDEFAULT_AT_QUEUE=\'c\' -DPERM_PATH=\"/var/at/\"
-I/usr/src/libexec/atrun/../../usr.bin/at -I/usr/src/libexec/atrun  -c
/usr/src/libexec/atrun/atrun.c
cc -O2 -fno-strict-aliasing -pipe -march=athlon64
-DATJOB_DIR=\"/var/at/jobs/\"  -DLFILE=\"/var/at/jobs/.lockfile\"
-DLOADAVG_MX=1.5 -DATSPOOL_DIR=\"/var/at/spool\"  -DVERSION=\"2.9\"
-DDAEMON_UID=1 -DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'
-DDEFAULT_AT_QUEUE=\'c\' -DPERM_PATH=\"/var/at/\"
-I/usr/src/libexec/atrun/../../usr.bin/at -I/usr/src/libexec/atrun  -c
/usr/src/libexec/atrun/gloadavg.c
cc -O2 -fno-strict-aliasing -pipe -march=athlon64
-DATJOB_DIR=\"/var/at/jobs/\"  -DLFILE=\"/var/at/jobs/.lockfile\"
-DLOADAVG_MX=1.5 -DATSPOOL_DIR=\"/var/at/spool\"  -DVERSION=\"2.9\"
-DDAEMON_UID=1 -DDAEMON_GID=1  -DDEFAULT_BATCH_QUEUE=\'E\'
-DDEFAULT_AT_QUEUE=\'c\' -DPERM_PATH=\"/var/at/\"
-I/usr/src/libexec/atrun/../../usr.bin/at -I/usr/src/libexec/atrun  
-o atrun atrun.o gloadavg.o
/usr/obj/usr/src/tmp/usr/lib/libc.so: undefined reference to `calloc'
/usr/obj/usr/src/tmp/usr/lib/libc.so: undefined reference to `posix_memalign'
*** Error code 1

Stop in /usr/src/libexec/atrun.
*** Error code 1

Stop in /usr/src/libexec.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.
make buildworld  1122,93s user 217,28s system 84% cpu 26:18,72 total

twin/usr/src#>uname -a
FreeBSD twin.gnome.no 6.0-STABLE FreeBSD 6.0-STABLE #0: Sun Dec  4
01:18:58 CET 2005     root@...:/usr/obj/usr/src/sys/TWIN
amd64


regards
Claus

Re: New libc malloc patch

by Daniel Eischen

On Sun, 4 Dec 2005, David Xu wrote:

> Jason Evans wrote:
> > I just need simple (low overhead) mutexes that don't cause malloc to  be
> > called during their initialization.
> umtx is light weight and fast and need not malloc.
>
> >  I would have used  pthread_mutex_*
> > directly, but cannot due to infinite recursion  problems during
> > initialization.
> >
> Yes, I know current pthread_mutex implementations use malloc,
> I don't think it will be changed to avoid using malloc very soon.

It's on my list of things to do.

> > As you pointed out, it's important to get priority inheritance right  in
> > order to avoid priority inversion deadlock, so my hand-rolled  spinlocks
> > weren't adequate.  I need mutexes that are managed by the  threads
> > library.  The libc spinlocks appear to fit the bill perfectly  in that
> > capacity.  It seems to me that using umtx would actually be  the wrong
> > thing to do, because I'd be circumventing libpthread's  userland
> > scheduler, and it would be the wrong thing for libc_r, as  you pointed
> > out.  This approach would work for libthr, but perhaps  nothing else?
> >
> umtx will work with libpthread, I can not find any reason why using umtx
> will cause deadlock, the userland scheduler can not propagate its
> priority decision cross kernel, and umtx is a blockable syscall.

The problem is userland code can exit, circumvent the unlock by
exception handling, take a signal and longjmp, etc., which may
leave locks (not known by libpthread) held.  At least with
spinlocks or mutex, the thread libraries can know that the
application is in a critical region and can behave accordingly.
Libpthread will defer switching threads when they are in
critical regions (unless they are blocked).

I think that libc or other libraries that want to be thread-safe
shouldn't try to roll their own locks.  The reason to do so is
that lock overhead may be deemed too great.  If that is the
case, then we should fix the problem at its source ;-)
Of course, the other reason is that mutexes currently have to
be allocated.

--
DE
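
The longjmp case mentioned above can be shown in a few lines. This is only an illustrative sketch with hypothetical names: demo_lock_held stands in for a private lock that the thread library knows nothing about, and the longjmp() stands in for any non-local exit such as one taken from a signal handler.

```c
#include <setjmp.h>

static jmp_buf demo_env;
static int demo_lock_held = 0;      /* hypothetical private lock flag */

/* Acquire the private lock, then optionally bail out before the unlock. */
static void
demo_critical(int bail)
{
    demo_lock_held = 1;             /* "lock" */
    if (bail)
        longjmp(demo_env, 1);       /* non-local exit skips the unlock */
    demo_lock_held = 0;             /* "unlock" */
}

/* Returns the lock state seen after the critical section ends or is abandoned. */
static int
demo_run(int bail)
{
    demo_lock_held = 0;
    if (setjmp(demo_env) == 0)
        demo_critical(bail);
    return (demo_lock_held);
}
```

After demo_run(1) the flag is still set and nothing outside this code can tell that a lock was left held, which is the visibility problem with locks the thread library does not manage.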


Re: New libc malloc patch

by Jason Evans

On Dec 4, 2005, at 4:51 AM, Claus Guttesen wrote:
> I was able to do a buildworld on current with this patch, but I had
> problems getting X to run and kldxref took all my space on the
> root-partition doing a installkernel.

I've fixed the offending bug in kldxref in the latest patch:

http://www.canonware.com/~jasone/jemalloc/jemalloc_20051211b.diff

I spent several hours poking at X, but was never able to determine  
why it goes into an infinite loop.  The infinite loop happens rather  
early, during the load of the libbitmap module.  My best guess is  
that it is stuck trying to acquire the Xlib lock, but cannot be sure,  
since I don't know how to get debug symbols for the loaded X module.  
In any case, malloc is nowhere in the backtrace.  I do not have the  
time to acquire the X expertise that is likely needed to track down  
this problem.  Hopefully someone else will be willing to do so.

No new problems in the malloc code have been found for some time  
now.  It has been tested on i386, sparc64, arm, and amd64.  In my  
opinion, the malloc patch is ready to be committed.  I am now working  
on the assumption that new problems are more likely application bugs  
than malloc bugs.  This seems like a good time to start sharing the  
debugging load with the community. =)

So, how about it?  Can this patch go in now?

Thanks,
Jason

Re: New libc malloc patch

by Julian Elischer

Jason Evans wrote:

> On Dec 4, 2005, at 4:51 AM, Claus Guttesen wrote:
>
>> I was able to do a buildworld on current with this patch, but I had
>> problems getting X to run and kldxref took all my space on the
>> root-partition doing a installkernel.
>
>
> I've fixed the offending bug in kldxref in the latest patch:
>
> http://www.canonware.com/~jasone/jemalloc/jemalloc_20051211b.diff
>
> I spent several hours poking at X, but was never able to determine  
> why it goes into an infinite loop.  The infinite loop happens rather  
> early, during the load of the libbitmap module.  My best guess is  
> that it is stuck trying to acquire the Xlib lock, but cannot be sure,  
> since I don't know how to get debug symbols for the loaded X module.  
> In any case, malloc is nowhere in the backtrace.  I do not have the  
> time to acquire the X expertise that is likely needed to track down  
> this problem.  Hopefully someone else will be willing to do so.
>
> No new problems in the malloc code have been found for some time  
> now.  It has been tested on i386, sparc64, arm, and amd64.  In my  
> opinion, the malloc patch is ready to be committed.  I am now working  
> on the assumption that new problems are more likely application bugs  
> than malloc bugs.  This seems like a good time to start sharing the  
> debugging load with the community. =)
>
> So, how about it?  Can this patch go in now?


I may have missed it, but some benchmark numbers would be good.

Is there no way to make it an option for a while?
That would get good testing AND a fallback for people.


>
> Thanks,
> Jason

Re: New libc malloc patch

by David Xu

Julian Elischer wrote:

>>
>> No new problems in the malloc code have been found for some time  
>> now.  It has been tested on i386, sparc64, arm, and amd64.  In my  
>> opinion, the malloc patch is ready to be committed.  I am now
>> working  on the assumption that new problems are more likely
>> application bugs  than malloc bugs.  This seems like a good time to
>> start sharing the  debugging load with the community. =)
>>
>> So, how about it?  Can this patch go in now?
>
>
>
> I may have missed it but some benchmark numbers could be good.
>
> Is there no way to make it an option for a while?
> that would get good testing AND a fallback for people.
>
I would also like to see some benchmark numbers.  In fact, I had planned
to import ptmalloc in the past; the malloc problem has been discussed
several times on the thread@ list.
Also, it would be nice if a fallback could be provided  :-)

David Xu


Re: New libc malloc patch

by Kris Kennaway

On Mon, Dec 12, 2005 at 08:50:01AM +0800, David Xu wrote:

> Julian Elischer wrote:
>
> >>
> >>No new problems in the malloc code have been found for some time  
> >>now.  It has been tested on i386, sparc64, arm, and amd64.  In my  
> >>opinion, the malloc patch is ready to be committed.  I am now
> >>working  on the assumption that new problems are more likely
> >>application bugs  than malloc bugs.  This seems like a good time to
> >>start sharing the  debugging load with the community. =)
> >>
> >>So, how about it?  Can this patch go in now?
> >
> >
> >
> >I may have missed it but some benchmark numbers could be good.
> >
> >Is there no way to make it an option for a while?
> >that would get good testing AND a fallback for people.
> >
> I would also like to see some benchmark numbers.  In fact, I had planned
> to import ptmalloc in the past; the malloc problem has been discussed
> several times on the thread@ list.

Here are the results of a benchmark in which each thread performs
1,000,000 malloc()/free() requests of 1024 bytes, run with multiple
threads on a 14-CPU sparc64 machine.  This is a poor test because
sparc64 doesn't have TLS support, which jemalloc needs in order to
perform well.  Even so, it shows jemalloc kicking the pants off of
phkmalloc for both single-threaded and multi-threaded malloc.
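For readers who want to try something similar, here is a minimal sketch of this kind of test.  It is not the malloc-test program used for the numbers in this message, whose source is not part of this thread; the names run_malloc_bench and worker_args are made up for illustration, and it reports plain wall-clock time per thread rather than whatever "adjusted timing" means in the output.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-thread parameters; not taken from the real malloc-test. */
struct worker_args {
    size_t size;      /* bytes per allocation */
    long requests;    /* number of malloc()/free() pairs to perform */
    double seconds;   /* wall-clock time, filled in by the worker */
};

static double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static void *worker(void *arg)
{
    struct worker_args *wa = arg;
    double start = now_seconds();

    for (long i = 0; i < wa->requests; i++) {
        /* The volatile pointer makes it harder for the compiler to
         * optimize the malloc()/free() pair away. */
        char *volatile p = malloc(wa->size);

        if (p == NULL)
            abort();
        p[0] = (char)i;   /* touch the allocation */
        free(p);
    }
    wa->seconds = now_seconds() - start;
    return NULL;
}

/*
 * Run nthreads workers concurrently, each doing `requests` allocations of
 * `size` bytes.  Per-thread times are stored in seconds_out[0..nthreads-1].
 * Returns 0 on success, -1 on failure.
 */
int run_malloc_bench(size_t size, long requests, int nthreads,
                     double *seconds_out)
{
    pthread_t *tids;
    struct worker_args *args;
    int started = 0, rc = 0;

    if (size == 0 || nthreads <= 0)
        return -1;
    tids = calloc((size_t)nthreads, sizeof(*tids));
    args = calloc((size_t)nthreads, sizeof(*args));
    if (tids == NULL || args == NULL) {
        free(tids);
        free(args);
        return -1;
    }
    for (int i = 0; i < nthreads; i++) {
        args[i].size = size;
        args[i].requests = requests;
        if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        seconds_out[i] = args[i].seconds;
    }
    free(tids);
    free(args);
    return rc;
}
```

Calling run_malloc_bench(1024, 1000000, 5, times) would roughly correspond to the "1024 1000000 5" runs, assuming the three arguments mean size, request count, and thread count as the output suggests.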

phkmalloc:

# ./malloc-test 1024 1000000 1
Starting test with 1 thread...
 Thread 2114048 adjusted timing: 27.124817 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 2
Starting test with 2 threads...
 Thread 2114560 adjusted timing: 67.535854 seconds for 1000000 requests of 1024 bytes.
 Thread 2114048 adjusted timing: 70.330298 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 3
Starting test with 3 threads...
 Thread 2114048 adjusted timing: 74.154855 seconds for 1000000 requests of 1024 bytes.
 Thread 2115072 adjusted timing: 74.356363 seconds for 1000000 requests of 1024 bytes.
 Thread 2114560 adjusted timing: 77.038550 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 4
Starting test with 4 threads...
 Thread 2115072 adjusted timing: 217.741657 seconds for 1000000 requests of 1024 bytes.
 Thread 2115584 adjusted timing: 228.434310 seconds for 1000000 requests of 1024 bytes.
 Thread 2114048 adjusted timing: 228.941544 seconds for 1000000 requests of 1024 bytes.
 Thread 2114560 adjusted timing: 229.286134 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 5
Starting test with 5 threads...
 Thread 2114048 adjusted timing: 770.255000 seconds for 1000000 requests of 1024 bytes.
 Thread 2115072 adjusted timing: 770.749431 seconds for 1000000 requests of 1024 bytes.
 Thread 2116096 adjusted timing: 771.307654 seconds for 1000000 requests of 1024 bytes.
 Thread 2114560 adjusted timing: 772.293253 seconds for 1000000 requests of 1024 bytes.
 Thread 2115584 adjusted timing: 772.550847 seconds for 1000000 requests of 1024 bytes.

jemalloc:

# ./malloc-test 1024 1000000 1
Starting test with 1 thread...
 Thread -1610612656 adjusted timing: 5.428918 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 2
Starting test with 2 threads...
 Thread -1610612656 adjusted timing: 4.840497 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612176 adjusted timing: 4.948382 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 3
Starting test with 3 threads...
 Thread -1610611696 adjusted timing: 25.065195 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612656 adjusted timing: 25.218103 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612176 adjusted timing: 25.286181 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 4
Starting test with 4 threads...
 Thread -1610612656 adjusted timing: 38.176479 seconds for 1000000 requests of 1024 bytes.
 Thread -1610611216 adjusted timing: 38.221169 seconds for 1000000 requests of 1024 bytes.
 Thread -1610611696 adjusted timing: 38.294425 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612176 adjusted timing: 38.320669 seconds for 1000000 requests of 1024 bytes.

# ./malloc-test 1024 1000000 5
Starting test with 5 threads...
 Thread -1610611216 adjusted timing: 50.376766 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612656 adjusted timing: 50.435407 seconds for 1000000 requests of 1024 bytes.
 Thread -1610611696 adjusted timing: 50.885393 seconds for 1000000 requests of 1024 bytes.
 Thread -1610610736 adjusted timing: 50.943412 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612176 adjusted timing: 50.953694 seconds for 1000000 requests of 1024 bytes.

In other words, jemalloc is about 5 times faster than phkmalloc for
single-threaded malloc, and about 15 times faster with 5 threads.  The
missing TLS on sparc64 shows up in the scaling (performance should be
even better with multiple threads), and there is some large
performance variation with larger numbers of threads (probably due to
hash collisions):

# ./malloc-test 1024 1000000 20
Starting test with 20 threads...
 Thread -1610604016 adjusted timing: 48.297304 seconds for 1000000 requests of 1024 bytes.
 Thread -1610604496 adjusted timing: 104.249693 seconds for 1000000 requests of 1024 bytes.
 Thread -1610602496 adjusted timing: 109.578616 seconds for 1000000 requests of 1024 bytes.
 Thread -1610607856 adjusted timing: 252.337973 seconds for 1000000 requests of 1024 bytes.
 Thread -1610610736 adjusted timing: 254.338225 seconds for 1000000 requests of 1024 bytes.
 Thread -1610606896 adjusted timing: 255.015353 seconds for 1000000 requests of 1024 bytes.
 Thread -1610607376 adjusted timing: 257.463410 seconds for 1000000 requests of 1024 bytes.
 Thread -1610609776 adjusted timing: 257.848283 seconds for 1000000 requests of 1024 bytes.
 Thread -1610605936 adjusted timing: 257.955005 seconds for 1000000 requests of 1024 bytes.
 Thread -1610604976 adjusted timing: 259.303220 seconds for 1000000 requests of 1024 bytes.
 Thread -1610611216 adjusted timing: 259.610871 seconds for 1000000 requests of 1024 bytes.
 Thread -1610606416 adjusted timing: 260.622687 seconds for 1000000 requests of 1024 bytes.
 Thread -1610611696 adjusted timing: 260.857706 seconds for 1000000 requests of 1024 bytes.
 Thread -1610610256 adjusted timing: 261.056716 seconds for 1000000 requests of 1024 bytes.
 Thread -1610608816 adjusted timing: 261.764455 seconds for 1000000 requests of 1024 bytes.
 Thread -1610609296 adjusted timing: 261.800319 seconds for 1000000 requests of 1024 bytes.
 Thread -1610605456 adjusted timing: 261.748707 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612176 adjusted timing: 262.108598 seconds for 1000000 requests of 1024 bytes.
 Thread -1610608336 adjusted timing: 262.119440 seconds for 1000000 requests of 1024 bytes.
 Thread -1610612656 adjusted timing: 262.315112 seconds for 1000000 requests of 1024 bytes.
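
The ratios quoted before the 20-thread run can be checked directly from the per-thread timings.  A small sketch of the arithmetic follows; the helper names mean_seconds, speedup, and spread are made up for illustration and are not part of malloc-test.

```c
#include <stddef.h>

/* Mean of n per-thread timings, in seconds (0.0 if n is 0). */
double mean_seconds(const double *t, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return n ? sum / (double)n : 0.0;
}

/* How many times faster the candidate is than the baseline. */
double speedup(double baseline_seconds, double candidate_seconds)
{
    return baseline_seconds / candidate_seconds;
}

/* Ratio of the slowest thread to the fastest thread within one run.
 * Assumes n >= 1 and all timings are positive. */
double spread(const double *t, size_t n)
{
    double lo = t[0], hi = t[0];

    for (size_t i = 1; i < n; i++) {
        if (t[i] < lo)
            lo = t[i];
        if (t[i] > hi)
            hi = t[i];
    }
    return hi / lo;
}
```

With the numbers above, speedup(27.124817, 5.428918) is about 5.0 for one thread, and the ratio of the mean 5-thread timings (about 771.4 s for phkmalloc against about 50.7 s for jemalloc) is about 15.2.  In the 20-thread jemalloc run, the slowest thread (262.3 s) took about 5.4 times as long as the fastest (48.3 s), which is the variation mentioned above.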

I'll try to test this on a 4 CPU amd64 machine next.

Kris



Re: New libc malloc patch

by Jason Evans

On Dec 11, 2005, at 4:35 PM, Julian Elischer wrote:
> I may have missed it but some benchmark numbers could be good.

I haven't posted any benchmark numbers, but that is a reasonable  
request.  Here's a summary of what I've seen so far.

For single-threaded apps, phkmalloc and jemalloc exhibit very similar  
performance for all of the benchmarks I've run.  Neither is a clear  
winner over the other from what I've seen.

Kris Kennaway already posted some multi-threaded microbenchmark  
results.  My tests have yielded similar results to his.

It would be very informative to run benchmarks with real world  
multithreaded apps.  bind9 (built with threading support) would be a  
great candidate, but thus far I haven't gotten a chance to use the  
machines that Robert Watson uses for such tests.

> Is there no way to make it an option for a while?
> that would get good testing AND a fallback for people.

Unfortunately, there are some low level issues that make the two  
malloc implementations incompatible, and they both need access to  
libc internals in order to work correctly in a multi-threaded  
program.  The way I have been comparing the two implementations is  
via chroot installations.  It might be possible to make the two  
compatible (would require extra coding), but since both of them need  
to be part of libc, we would need a way of building separate libc  
libraries for the two mallocs.  This all seems uglier than it's worth  
to me.  Maybe there's another way...

Jason