New libc malloc patch

New libc malloc patch

by Jason Evans

There is a patch that contains a new libc malloc implementation at:

http://www.canonware.com/~jasone/jemalloc/jemalloc_20051127a.diff

This implementation is very different from the current libc malloc.  
Probably the most important difference is that this one is designed  
with threads and SMP in mind.

The patch has been tested for stability quite a bit already, thanks  
mainly to Kris Kennaway.  However, any help with performance testing  
would be greatly appreciated.  Specifically, I'd like to know how  
well this malloc holds up to threaded workloads on SMP systems.  If  
you have an application that relies on threads, please let me know  
how performance is affected.

Naturally, if you notice horrible performance or ridiculous resident  
memory usage, that's a bad thing and I'd like to hear about it.

Thanks,
Jason

=== Important notes:

* You need to do a full buildworld/installworld in order for the  
patch to work correctly, due to various integration issues with the  
threads libraries and rtld.

* The virtual memory size of processes, as reported in the SIZE field  
by top, will appear astronomical for almost all processes (32+ MB).  
This is expected; it is merely an artifact of using large mmap()ed  
regions rather than sbrk().

* In keeping with the default option settings for CURRENT, the A and
J flags are enabled by default.  When conducting performance tests,
specify MALLOC_OPTIONS="aj" to turn them off.

_______________________________________________
freebsd-current@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@..."

Re: New libc malloc patch

by Hiten Pandya

Jason,

I see that you have included an implementation of red-black tree CPP
macros, but wouldn't it be better if you were to use the ones in
<sys/tree.h>?  I have only had a cursory look, but I would have
thought that would be the way to go.

Just a suggestion.

Best regards,

--
Hiten Pandya
hmp at freebsd.org

Re: New libc malloc patch

by Maxim Sobolev

Just curious: what is the grand plan for this work? I wonder whether it
would make sense to have two mallocs in the system, so that the user can
select the one that better suits his needs.

-Maxim

Re: New libc malloc patch

by Jason Evans

On Nov 29, 2005, at 2:52 AM, Hiten Pandya wrote:
> I see that you have included an implementation of red-black tree CPP
> macros, but wouldn't it be better if you were to use the ones in
> <sys/tree.h>?  I have only had a cursory look, but I would have
> thought that would be the way to go.

There is a feature missing from sys/tree.h that I need (rb_nsearch()  
in the patch), but you are right that it would probably be best to  
use sys/tree.h.  I am going to work on adding RB_NFIND(), and will  
then try switching to sys/tree.h.

Thanks,
Jason

Re: New libc malloc patch

by Jason Evans

On Nov 29, 2005, at 3:37 AM, Maxim Sobolev wrote:
> Just curious: what is the grand plan for this work? I wonder whether it
> would make sense to have two mallocs in the system, so that the user
> can select the one that better suits his needs.

The plan for this work is to replace the current malloc, rather than  
augmenting it.  There is a long history in Unix of using shared  
library tricks to override the system malloc, and the patch does not  
change the ability to do so.  However, in my opinion, explicitly  
providing multiple implementations of malloc in the base OS misses  
the point of providing a general purpose memory allocator.  The goal  
is to have a single implementation that works well for the vast  
majority of extant programs, and to allow applications to provide  
their own implementations when the general purpose allocator fails to  
perform adequately.  phkmalloc did an excellent job in this capacity  
for quite some time, but now that we need to commonly support  
threaded programs on SMP systems, phkmalloc is being strained rather  
badly.  This isn't an indication that we need multiple malloc  
implementations in the base OS; rather it indicates that the system  
malloc implementation needs to take into account constraints that did  
not exist when phkmalloc was designed.

Jason

Re: New libc malloc patch

by Poul-Henning Kamp

In message <1A7D4B98-9474-42B6-8A21-4C9AB8582EC1@...>, Jason Evans writes:

>[...]  phkmalloc did an excellent job in this capacity  
>for quite some time, but now that we need to commonly support  
>threaded programs on SMP systems, phkmalloc is being strained rather  
>badly.  This isn't an indication that we need multiple malloc  
>implementations in the base OS; rather it indicates that the system  
>malloc implementation needs to take into account constraints that did  
>not exist when phkmalloc was designed.

The malloc phkmalloc replaced was written at some point in the
1980s on a VAX, and more or less assumed the VAX was effectively
a single-user machine without effective paging algorithms.

Phkmalloc was written in 1994/5, when I had 4MB of RAM in my
"Gateway Handbook 486", and it very strongly assumed that, with the
RAM prices of the day, I could not afford an upgrade.

I gave a talk about phkmalloc at USENIX ATC 1998 in New Orleans.
One of the central points in the talk was that infrastructure code
should have regular service overhauls, to check that the assumptions
in the design are still valid.

In addition to assumptions phkmalloc makes which are no longer
relevant, there are many assumptions which should be made today
which phkmalloc is not aware of, multi-threading being but one of
them.  Cache line effects, pipeline prefetching, multi-cpu systems,
different VM system algorithms, larger address spaces etc etc etc.

Once Jason is done, I have no doubt that "jemalloc" will beat
phkmalloc in all relevant benchmarking, thereby neatly rendering any
discussion about having multiple mallocs in the tree pointless.

A big thank you from the author of phkmalloc to Jason for following
the service manual to the letter :-)

Poul-Henning

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@...         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Re: New libc malloc patch

by Jon Dama

I have a rather strong objection to make to this proposal (read: if this
change goes in I'm going to have to go through the effort of ripping it
out locally...):

There exists a problem right now--localized to i386 and any other arch
based on 32-bit pointers: address space is simply too scarce.

Your decision to switch to using mmap as the exclusive source of malloc
buckets is admirable for its modernity but it simply cannot stand unless
someone steps up to change the way mmap and brk interact within the
kernel.

The trouble arises from the need to set MAXDSIZ and the resulting effect
it has in determining the start of the mmap region--which I might add is
the location that the shared library loader is placed.  This effectively
(and explicitly) sets the limit for how large of a contiguous region can
be allocated with brk.

What you've done by switching the system malloc to exclusively using
mmap is induced a lot of motivation on the part of the sysadmin to push
that brk/mmap boundary down.

This wouldn't be a problem except that you've effectively shot in the foot
dozens of alternative c malloc implementations, not to mention the memory
allocator routines used in obscure languages such as Modula-3 and Haskell
that rely on brk derived buckets.

This isn't playing very nicely!

I looked into the issues and limitations with phkmalloc several months ago
and concluded that simply adopting ptmalloc2 (the linux malloc) was the
better approach--notably it is willing to draw from both brk and mmap, and
it also implements per-thread arenas.

There is also cause for concern about your "cache-line" business.  Simply
on the face of it there is the problem that the scheduler does not do a
good job of pinning threads to individual CPUs.  The threads are already
bounding from cpu to cpu and thrashing (really thrashing) each CPU cache
along the way.

Second, you've forgotten that there is a layer of indirection between your
address space and the cache: the mapping of logical pages (what you can
see in userspace) to physical pages (the addresses of which actually
matter for the purposes of the cache).  I don't recall off-hand whether or
not the L1 cache on i386 is based on tags of the virtual addresses, but I
am certain that the L2 and L3 caches tag the physical addresses not the
virtual addresses.

This means that your careful address selection based on cache-lines will
only work out if it is done in the vm codepath: remember the mapping of
physical addresses to the virtual addresses that come back from mmap can
be delayed arbitrarily long into the future depending on when the program
actually goes to touch that memory.

Furthermore, the answer may vary depending on the architecture or even the
processor version.

-Jon

Re: New libc malloc patch

by Jason Evans

Jon,

Thanks for your comments.  Fortunately, I don't think things are  
quite as hopeless as you think, though you may be right that some  
adjustments are necessary.  Specific replies follow.

On Nov 29, 2005, at 12:06 PM, Jon Dama wrote:

> There exists a problem right now--localized to i386 and any other arch
> based on 32-bit pointers: address space is simply too scarce.
>
> Your decision to switch to using mmap as the exclusive source of malloc
> buckets is admirable for its modernity but it simply cannot stand unless
> someone steps up to change the way mmap and brk interact within the
> kernel.
>
> The trouble arises from the need to set MAXDSIZ and the resulting effect
> it has in determining the start of the mmap region--which I might add is
> the location that the shared library loader is placed.  This effectively
> (and explicitly) sets the limit for how large of a contiguous region can
> be allocated with brk.
>
> What you've done by switching the system malloc to exclusively using
> mmap is induced a lot of motivation on the part of the sysadmin to push
> that brk/mmap boundary down.
>
> This wouldn't be a problem except that you've effectively shot in the foot
> dozens of alternative c malloc implementations, not to mention the memory
> allocator routines used in obscure languages such as Modula-3 and Haskell
> that rely on brk derived buckets.
>
> This isn't playing very nicely!

Where should MAXDSIZ be?  Given scarce address space, the best we can  
hope for is setting it to the "least bad" default, as measured by  
what programs we care about do.  No matter what we do, some programs  
lose.

That said, it turns out that adding the ability to allocate via brk  
isn't hard.  The code already contains the logic to recycle address  
ranges, in order to reduce mmap system call overhead.  All of the  
locking infrastructure is also already in place.  The only necessary  
modifications are 1) explicit use of brk until all data segment space  
is consumed, 2) special case code for addresses in the brk range, so  
that madvise() is used instead of munmap(), and 3) preferential
re-use of the brk address space over mmap'ed memory.

Do you agree that there is no need for using brk on 64-bit systems?

> I looked into the issues and limitations with phkmalloc several months ago
> and concluded that simply adopting ptmalloc2 (the linux malloc) was the
> better approach--notably it is willing to draw from both brk and mmap, and
> it also implements per-thread arenas.

ptmalloc takes a different approach to per-thread arenas that has  
been shown in multiple papers to not scale as well as the approach I  
took.  The difference isn't significant until you get to 8+ CPUs, but  
we already have systems running with enough CPUs that this is an issue.

> There is also cause for concern about your "cache-line" business.  Simply
> on the face of it there is the problem that the scheduler does not do a
> good job of pinning threads to individual CPUs.  The threads are already
> bounding from cpu to cpu and thrashing (really thrashing) each CPU cache
> along the way.
>
> Second, you've forgotten that there is a layer of indirection between your
> address space and the cache: the mapping of logical pages (what you can
> see in userspace) to physical pages (the addresses of which actually
> matter for the purposes of the cache).  I don't recall off-hand whether or
> not the L1 cache on i386 is based on tags of the virtual addresses, but I
> am certain that the L2 and L3 caches tag the physical addresses not the
> virtual addresses.
>
> This means that your careful address selection based on cache-lines will
> only work out if it is done in the vm codepath: remember the mapping of
> physical addresses to the virtual addresses that come back from mmap can
> be delayed arbitrarily long into the future depending on when the program
> actually goes to touch that memory.
>
> Furthermore, the answer may vary depending on the architecture or even the
> processor version.

I don't think you understand the intent for the "cache-line  
business".  There is only one intention: avoid storing data  
structures in the same cache line if they are likely to be accessed  
simultaneously by multiple threads.  For example, if two independent  
structures are stored right next to each other, although they do not  
require synchronization protection from each other, the hardware will  
send a cache line invalidation message to other CPUs every time  
anything in the cache line is modified.  This means horrible cache  
performance if the data are modified often.

As a particular example, there are per-arena data structures that are
modified quite often (arena_t).  If two arenas were right next to
each other, then they could share a cache line, and performance would
potentially be severely impacted.

|---arena_t---|---arena_t---|
|   |   |   |   |   |   |   |
              ^^^
              BAD!

Thanks,
Jason

Re: New libc malloc patch

by Jon Dama

Jason,

Actually I didn't mean to imply it was hopeless at all.  :-)  Obviously
there are solutions to the address space issues, and I agree it is possible
to change your code to improve the situation quite a bit.  I think the
best one might be to permit mmap to actually consume space below
maxdsiz, either when hinted to do so or when the space above is consumed;
but before I recommend that too strongly, I must say that I haven't looked
into what that would entail at all.

Let me take a closer look at what you are doing with regards to
cache-lines.  You seem to be implying that you are only taking care in
regards to how you malloc within a given page?

I have a suspicion that it might just be better to dump the problem on to
the application in the sense that no malloc should ever be less
than the size of one cache line.  Perhaps this is what you are doing?

-Jon


Re: New libc malloc patch

by Jason Evans

On Nov 29, 2005, at 2:21 PM, Jon Dama wrote:
> Let me take a closer look at what you are doing with regards to
> cache-lines.  You seem to be implying that you are only taking care in
> regards to how you malloc within a given page?

You are correct that I am only taking care about allocations within a  
given page.

> I have a suspicion that it might just be better to dump the problem on to
> the application in the sense that no malloc should ever be less
> than the size of one cache line.  Perhaps this is what you are doing?

I am only worrying about cache line alignment for malloc's internal  
data structures.  It's up to the application to do this for its  
allocations, if necessary (doing so for all allocations would induce  
unacceptable internal fragmentation).  This implementation provides  
posix_memalign(3), which makes it much less painful for the  
application to do so.

Jason

Re: New libc malloc patch

by David Xu-2

Jon Dama wrote:

>I looked into the issues and limitations with phkmalloc several months ago
>and concluded that simply adopting ptmalloc2 (the linux malloc) was the
>better approach--notably it is willing to draw from both brk and mmap, and
>it also implements per-thread arenas.

Hi Jon,

Is there any chance of testing jemalloc against ptmalloc2?  For the
next ten years, I would like us to be using the best-performing memory
allocator. :-)

David Xu

Re: New libc malloc patch

by kometen

> There is a patch that contains a new libc malloc implementation at:
>
> http://www.canonware.com/~jasone/jemalloc/jemalloc_20051127a.diff
>
> This implementation is very different from the current libc malloc.
> Probably the most important difference is that this one is designed
> with threads and SMP in mind.

Do you need CURRENT for this? I applied the patch and tried buildworld
on 6.0-STABLE, but no go.

regards
Claus

Re: New libc malloc patch

by Ulrich Spoerlein

Jason Evans wrote:
> * The virtual memory size of processes, as reported in the SIZE field by top, will appear
> astronomical for almost all processes (32+ MB).  This is expected; it is merely an artifact
> of using large mmap()ed regions rather than sbrk().

Hi,

I just read that mmap() part and have to wonder: Is it possible to
introduce something like the guard pages that OpenBSD has implemented?
I'd love to try this out and see the dozens of applications that fail
due to off-by-one bugs.

If the security features of OpenBSD's new malloc() could be implemented
as new MALLOC_OPTIONS directives, that would be fantastic!

Ulrich Spoerlein
--
 PGP Key ID: F0DB9F44 Encrypted mail welcome!
Fingerprint: F1CE D062 0CA9 ADE3 349B  2FE8 980A C6B5 F0DB 9F44
Ok, which part of "Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn."
didn't you understand?


Re: New libc malloc patch

by Poul-Henning Kamp

In message <20051130111017.GA67032@...>, Ulrich Spoerlein writes:

>I just read that mmap() part and have to wonder: Is it possible to
>introduce something like the guard pages that OpenBSD has implemented?
>I'd love to try this out and see the dozens of applications that fail
>due to off-by-one bugs.

Guard-pages are very expensive and that is why I have not adopted
OpenBSD's patch.
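
To make the cost concrete, here is a minimal C sketch of a guard-page allocation. It is not OpenBSD's implementation and not part of the patch under discussion; the names guarded_alloc() and guarded_free() are hypothetical. Every allocation, however small, consumes at least two pages of address space and needs an mmap() plus an mprotect() call, which is where the expense comes from.

```c
#define _DEFAULT_SOURCE	/* glibc: expose MAP_ANON under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to a whole number of pages. */
static size_t
round_to_pages(size_t size, size_t page)
{
	return ((size + page - 1) / page * page);
}

/*
 * Hypothetical helper: return a buffer of `size` bytes whose last byte
 * sits immediately before an inaccessible guard page.  Writing one byte
 * past the end raises SIGSEGV instead of silently corrupting memory.
 * Note that the returned pointer is only as aligned as `size` allows.
 */
static void *
guarded_alloc(size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t data_len;
	char *base;

	if (size == 0 || size > SIZE_MAX - 2 * page)
		return (NULL);
	data_len = round_to_pages(size, page);

	/* Map the data pages plus one extra page for the guard. */
	base = mmap(NULL, data_len + page, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);
	if (base == MAP_FAILED)
		return (NULL);

	/* Revoke all access to the final page. */
	if (mprotect(base + data_len, page, PROT_NONE) != 0) {
		munmap(base, data_len + page);
		return (NULL);
	}

	/* Place the buffer so that it ends exactly at the guard page. */
	return (base + data_len - size);
}

/* Release a buffer obtained from guarded_alloc() with the same size. */
static void
guarded_free(void *ptr, size_t size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t data_len = round_to_pages(size, page);
	char *base = (char *)ptr + size - data_len;
	int rc = munmap(base, data_len + page);

	assert(rc == 0);
	(void)rc;
}
```

With this layout a 10-byte request still costs two full pages, so a program that makes many small allocations pays heavily in both memory and system calls.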

I would advocate that people use one of the dedicated debugging malloc
implementations (ElectricFence ?) instead of putting too much overhead
into our default malloc.

For all practical purposes, the options J, A, X & Z are the most commonly
used.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@...         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Re: New libc malloc patch

by Daniel O'Connor-3

On Wed, 30 Nov 2005 21:48, Poul-Henning Kamp wrote:

> In message <20051130111017.GA67032@...>, Ulrich Spoerlein writes:
> >I just read that mmap() part and have to wonder: Is it possible to
> >introduce something like the guard pages that OpenBSD has implemented?
> >I'd love to try this out and see the dozens of applications that fail
> >due to off-by-one bugs.
>
> Guard-pages are very expensive and that is why I have not adopted
> OpenBSD's patch.
>
> I would advocate that people use one of the dedicated debugging malloc
> implementations (ElectricFence ?) instead of putting too much overhead
> into our default malloc.

Electric Fence is right, although it IS slow: usually an order of magnitude
or more. Also, if you do use it, you'll probably have to bump up the
vm.max_proc_mmap sysctl or it will fail to allocate memory.

Another good one is valgrind (and it detects more problems to boot :)

> For all practical purposes, the options J, A, X & Z are the most commonly
> used.

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C



Re: New libc malloc patch

by Ulrich Spoerlein

Daniel O'Connor wrote:
> On Wed, 30 Nov 2005 21:48, Poul-Henning Kamp wrote:
> > In message <20051130111017.GA67032@...>, Ulrich Spoerlein writes:
> > >I just read that mmap() part and have to wonder: Is it possible to
> > >introduce something like the guard pages that OpenBSD has implemented?
> > >I'd love to try this out and see the dozens of applications that fail
> > >due to off-by-one bugs.
> >
> > Guard-pages are very expensive and that is why I have not adopted
> > OpenBSD's patch.

Yes, of course it should be disabled by default, but if it could be
implemented so you can switch it on at runtime or compile time (think
INVARIANTS/WITNESS) *and* there is no penalty for the disabled case,
that would be nice.

> > I would advocate that people use one of the dedicated debugging malloc
> > implementations (ElectricFence ?) instead of putting too much overhead
> > into our default malloc.
>
> Electric fence is right. Although it IS slow, an order of magnitude or more
> usually. Also if you do use it you'll probably have to bump up the
> vm.max_proc_mmap sysctl or it will fail to allocate memory.
>
> Another good one is valgrind (and it detects more problems to boot :)

Yes, I usually use dmalloc and valgrind. It's sad that other developers
don't use any of these tools ...

Ulrich Spoerlein
--
 PGP Key ID: F0DB9F44 Encrypted mail welcome!
Fingerprint: F1CE D062 0CA9 ADE3 349B  2FE8 980A C6B5 F0DB 9F44
Ok, which part of "Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn."
didn't you understand?



Re: New libc malloc patch

by Jason Evans

On Nov 30, 2005, at 4:30 AM, Ulrich Spoerlein wrote:

> Daniel O'Connor wrote:
>> On Wed, 30 Nov 2005 21:48, Poul-Henning Kamp wrote:
>>> In message <20051130111017.GA67032@...>, Ulrich Spoerlein writes:
>>>> I just read that mmap() part and have to wonder: Is it possible to
>>>> introduce something like the guard pages that OpenBSD has implemented?
>>>> I'd love to try this out and see the dozens of applications that fail
>>>> due to off-by-one bugs.
>>>
>>> Guard-pages are very expensive and that is why I have not adopted
>>> OpenBSD's patch.
>
> Yes, of course it should be disabled as default, but if it could be
> implemented so you can switch at runtime or compile time (think
> INVARIANTS/WITNESS) *and* there is no penalty for the disabled case,
> that be nice.

In a previous version of the patch, I included compile-time support  
for redzones around allocations.  Kris Kennaway did a full ports tree  
build with redzones enabled, and several ports caused redzone  
corruption, but in every case it was due to writing one byte past the  
end of an allocation.  None of these were serious, since word  
alignment required that the "corrupted" byte be unused.  I suspect  
that we would catch very few serious errors, even if redzones were  
enabled by default.
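
For readers unfamiliar with the technique, the following is a minimal C sketch of redzones layered on top of an existing malloc(). It is not the code from the earlier version of the patch; the names rz_malloc(), rz_check(), and rz_free(), the 16-byte redzone length, and the 0xa5 fill byte are all assumptions made for illustration. The sketch records the byte-exact request size so that the rear redzone can start immediately after the last requested byte, which is what lets it notice a write that lands just one byte past the end.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RZ_HEADER_LEN	16	/* room for the stored size; keeps alignment */
#define RZ_LEN		16	/* redzone bytes on each side (assumed) */
#define RZ_BYTE		0xa5	/* redzone fill pattern (assumed) */

/*
 * Layout of one allocation:
 *   [header: size][front redzone][user data: size bytes][rear redzone]
 */
static void *
rz_malloc(size_t size)
{
	unsigned char *raw;

	if (size > SIZE_MAX - RZ_HEADER_LEN - 2 * RZ_LEN)
		return (NULL);
	raw = malloc(RZ_HEADER_LEN + 2 * RZ_LEN + size);
	if (raw == NULL)
		return (NULL);

	/* Remember the byte-exact size so the rear redzone can be found. */
	memcpy(raw, &size, sizeof(size));
	memset(raw + RZ_HEADER_LEN, RZ_BYTE, RZ_LEN);
	memset(raw + RZ_HEADER_LEN + RZ_LEN + size, RZ_BYTE, RZ_LEN);
	return (raw + RZ_HEADER_LEN + RZ_LEN);
}

/* Return 1 if both redzones are intact, 0 if either was overwritten. */
static int
rz_check(const void *ptr)
{
	const unsigned char *user = ptr;
	const unsigned char *front = user - RZ_LEN;
	const unsigned char *rear;
	size_t size, i;

	memcpy(&size, front - RZ_HEADER_LEN, sizeof(size));
	rear = user + size;
	for (i = 0; i < RZ_LEN; i++) {
		if (front[i] != RZ_BYTE || rear[i] != RZ_BYTE)
			return (0);
	}
	return (1);
}

/* Check the redzones, release the memory, and report the check result. */
static int
rz_free(void *ptr)
{
	int intact = rz_check(ptr);

	free((unsigned char *)ptr - RZ_LEN - RZ_HEADER_LEN);
	return (intact);
}
```

Every allocation in this sketch grows by 48 bytes, which illustrates the memory overhead that redzones impose on all applications.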

Due to some unrelated performance issues, I later did a significant  
rework of the internal data structures, and decided to drop redzone  
support since the new data structures weren't as conducive to  
redzones.  Ultimately, I don't think we would have wanted to leave  
this feature enabled, even for CURRENT, because it required that all  
allocations be larger, thus bloating memory usage for all applications.

As a runtime-switchable feature, I think we still wouldn't want to  
leave it compiled in for production systems.  I spent a lot of time  
looking at valgrind (cachegrind tool) profiles, and found that even  
innocuous additional features such as the tracking of total allocated  
memory had significant negative impacts on performance.  The feature  
that I really didn't want to remove, that is also important to  
redzone support, was byte-exact tracking of allocation size.  The  
extra branches that would be required for runtime support of redzones  
probably wouldn't be worth the cost.

Jason

Re: New libc malloc patch

by ache

On Wed, Nov 30, 2005 at 06:32:54AM -0800, Jason Evans wrote:
> In a previous version of the patch, I included compile-time support  
> for redzones around allocations.  Kris Kennaway did a full ports tree  
> build with redzones enabled, and several ports caused redzone  
> corruption, but in every case it was due to writing one byte past the  
> end of an allocation.  None of these were serious, since word  
> alignment required that the "corrupted" byte be unused.  I suspect  
> that we would catch very few serious errors, even if redzones were  
> enabled by default.

You could offer word-aligned red zones in addition to the byte-aligned
variant, both as malloc options, of course.
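
One possible reading of this suggestion, sketched in C: a byte-exact redzone starts right after the last requested byte, while a word-aligned redzone starts only after the request has been rounded up to a word boundary, so writes into the unused padding go unreported. The enum, the function names, and the choice of sizeof(void *) as the word size are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical redzone placement modes. */
enum rz_mode {
	RZ_BYTE_EXACT,		/* redzone begins right after the request */
	RZ_WORD_ALIGNED		/* redzone begins at the next word boundary */
};

/* Offset from the start of the user data at which the rear redzone begins. */
static size_t
redzone_start(size_t size, enum rz_mode mode)
{
	size_t word = sizeof(void *);	/* assumed word size */

	if (mode == RZ_BYTE_EXACT)
		return (size);
	return ((size + word - 1) / word * word);
}

/*
 * Return 1 if a write at `offset` would land in the rear redzone, 0 if it
 * falls inside the request or in the alignment padding after it.
 */
static int
overrun_detected(size_t size, size_t offset, enum rz_mode mode)
{
	return (offset >= redzone_start(size, mode));
}
```

For a 5-byte request, a write at offset 5 is reported in byte-exact mode but ignored in word-aligned mode, whereas a write at offset 8 is reported in both.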

--
http://ache.pp.ru/

Re: New libc malloc patch

by Ulrich Spoerlein

Jason Evans wrote:
> [Why no redzones in jemalloc]

Thanks for the elaborate explanation. Greatly appreciated.

Ulrich Spoerlein
--
 PGP Key ID: F0DB9F44 Encrypted mail welcome!
Fingerprint: F1CE D062 0CA9 ADE3 349B  2FE8 980A C6B5 F0DB 9F44
Ok, which part of "Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn."
didn't you understand?



Re: New libc malloc patch

by Jason Evans

On Nov 30, 2005, at 1:02 AM, Claus Guttesen wrote:
>> There is a patch that contains a new libc malloc implementation at:
>>
>> http://www.canonware.com/~jasone/jemalloc/jemalloc_20051127a.diff
>
> Do you need current for this? I patched and tried buildworld on 6.0
> stable but no go.

I started work on this before 6.0 branched, and am unaware of any  
changes that would impact the patch.  However, I've only used current  
for the development.

Jason