Whats with the 16-byte alignment in alloc.c?

View: New views
10 Messages — Rating Filter:   Alert me  

Whats with the 16-byte alignment in alloc.c?

by Mark Johnson-21 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I see this in  the qmail/netqmail CHANGES:

    19960812 portability problem: under Solaris 2.4 and possibly other
                   systems, the linker does not give generic alignment
to an array
                   of 4096 chars. tnx JP. impact: some subset of the programs
                   would (reliably) die with a bus error; in the Solaris case,
                   maildir2mbox. fix: redefine space in alloc.c to be aligned.

also this:

    19960829 change: used double union in alloc.c. tnx ME.

I don't see anything in djb's portability notes:

    http://cr.yp.to/unix.html

Are these just 13 year old portability hacks?  Or is there some deep
magic here?  If the former, any idea if they're still needed?  If the
latter, can anybody offer any insight?

Also, there is a (documented) overflow:

    ...
    n = ALIGNMENT + n - (n & (ALIGNMENT - 1)); /* XXX: could overflow */
    ...

See:

    http://seclists.org/fulldisclosure/2005/May/0115.html
    http://seclists.org/fulldisclosure/2005/May/0139.html

for more info.  Any reason not to apply the suggested fix?

Re: Whats with the 16-byte alignment in alloc.c?

by Kyle Wheeler-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Tuesday, September  1 at 05:32 PM, quoth Mark Johnson:
>I see this in  the qmail/netqmail CHANGES:
>
>    19960812 portability problem: under Solaris 2.4 and possibly other
>                   systems, the linker does not give generic alignment
>to an array
>           of 4096 chars. tnx JP. impact: some subset of the programs
>           would (reliably) die with a bus error; in the Solaris case,
>           maildir2mbox. fix: redefine space in alloc.c to be aligned.

On SPARC systems (even modern ones) unaligned memory access causes a
bus error. It's no problem if you have a 4096 byte array that you're
treating as characters (i.e. byte by byte), but if you cast it to be
an array of ints or something, then you could easily be in trouble.
The linker says "ah, that's an array of characters, no alignment
necessary", but if you're casting the array to behave like an array of
ints or doubles, it'll crash. So here's what'd happen: he'd define
realspace to be a character array. The compiler would say "character
arrays don't require alignment, so... this will start at wherever's
convenient (probably someplace unaligned)". Then the code would ask
for something like alloc(sizeof(int)) and then try and use the
returned memory as a container for an int. Because the returned data
wasn't aligned, such operations would cause bus errors. It's possible
SPARC compilers have gotten somewhat smarter, but I wouldn't count on
it.

>also this:
>
>    19960829 change: used double union in alloc.c. tnx ME.

Interesting! That's a cute trick to make the linker allocate aligned
data.

>I don't see anything in djb's portability notes:
>
>    http://cr.yp.to/unix.html
>
>Are these just 13 year old portability hacks?  Or is there some deep
>magic here?

Portability. And yeah, I'd leave them in.

Now, I don't know why alloc.c is in there to begin with. I'm guessing
it's a way to avoid slow malloc implementations or something like
that, but I'm not sure. But for what it does, those details need to
stay there.

>Also, there is a (documented) overflow:
>
>    ...
>    n = ALIGNMENT + n - (n & (ALIGNMENT - 1)); /* XXX: could overflow */
>    ...
>
>See:
>
>    http://seclists.org/fulldisclosure/2005/May/0115.html
>    http://seclists.org/fulldisclosure/2005/May/0139.html
>
>for more info.  Any reason not to apply the suggested fix?

Mmmm... because it's inelegant. Offhand, I'd prefer a fix that looked
more like:

     if (n > avail) { errno = error_nomem; return NULL; }

I don't see any reason to use 0xfffffff0 or any other constant. On the
other hand, it's late, and I'm probably not reading that correctly.

~Kyle
- --
If an elderly respected expert in a given field tells you that
something can be done he is almost certainly right. If an elderly
respected expert in a given field tells you that something is
impossible, he is almost certainly wrong.
                                                  -- Robert A. Heinlein
-----BEGIN PGP SIGNATURE-----
Comment: Thank you for using encryption!

iQIcBAEBCAAGBQJKngfGAAoJECuveozR/AWescoP/RKTVY9kIpby0wsuHpdalFvC
OrAfHZwK+k2LfW06qTPykhqJDUaNk+vUlbbB9uvFAvpZiGolku3eUZnz/haHrKlX
pP1ZtJibgF1cv/x98nMsxArBxADS/DHcMaPfKKnGswYUKiWu8snW5WEw8qO85GS5
FLW/6ePKhNoc3hurn8aYJboHui2h5m+8eEYJq4IP65fOcWiTSGMnPFKG2SOSgGcs
LTjXkEnZ4zMsiDH7Ru1XOw6yqB9Y9Wss5+SOVXHtluTT7lxxGWgM6dRyRb49d157
Pi6jD2aPlBEcHMNkW8KP+cPSVDLUeyPOCuHMLwSxZgzNYojRxajuquZzpuQLQnU4
PIw/E4lSJqzbOCdVrEVohlOVmDB2QAY5owVp9g0pTqANKCPAYoS3qyId11i246Cv
rpSW4cTj5Zg98z4m68sOreHOcwCmUx1WAa7m8o4/xyN9ouJDxo0qmRyuK9aYHB1X
NM5ONQsSTZgpcHPcRD5Io6365j0rOF8Q1YGRl77ZNqEG9QTBxGSMcuf+D9DV/MZv
/2CXQiBjz1iRdKQp19IhppcLUCq3c1lJVDO2Q2K82R7bM+9biXwjuhI5fFiWyyjc
fpW9pew1nYG1UFEnkYK/AopFZDJreuuCV99twtl1MZ42cVJd4k05vmIcRx8ou/+E
Tb80iXwmZLVp4a+1lbxJ
=5jQe
-----END PGP SIGNATURE-----

Re: Whats with the 16-byte alignment in alloc.c?

by Matthew Dempsky-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'm not exactly sure what the question is, but alloc() returns a
pointer to memory aligned along a 16-byte boundary so as to prevent
unaligned memory access.

The portability issue was that a char array only guarantees 1-byte
alignment (though many linkers align large arrays anyway).  By
changing the allocation to a union with a double, it guarantees at
least 8-byte alignment.

I suggest keeping the union with the double.

As for the overflow, I wouldn't test against 0xfffffff0: that post
assumes that stralloc_readyplus(&sa,1) only increases the allocation
by 1 byte, whereas it actually increases it by 12.5% + 30 more bytes.

Returning error_nomem when n >= 0xe0000000 seems like a safe boundary.

Re: Whats with the 16-byte alignment in alloc.c?

by Mark Johnson-21 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Sep 2, 2009 at 2:31 AM, Matthew Dempsky<matthew@...> wrote:
> I'm not exactly sure what the question is, but alloc() returns a
> pointer to memory aligned along a 16-byte boundary so as to prevent
> unaligned memory access.

It was more of a WTF?!? than a specific question.  I've spent too many
years working exclusively in high-level languages...

Why 16 bytes, though?  To support 128-bit data types?

> The portability issue was that a char array only guarantees 1-byte
> alignment (though many linkers align large arrays anyway).  By
> changing the allocation to a union with a double, it guarantees at
> least 8-byte alignment.
>
> I suggest keeping the union with the double.

I suppose there isn't any other way to guarantee the alignment of
realspace.  Even if an alignment fault isn't fatal, performance is
going to suck.

> As for the overflow, I wouldn't test against 0xfffffff0: that post
> assumes that stralloc_readyplus(&sa,1) only increases the allocation
> by 1 byte, whereas it actually increases it by 12.5% + 30 more bytes.
>
> Returning error_nomem when n >= 0xe0000000 seems like a safe boundary.

Actually, I'd really prefer something that doesn't limit the amount of
memory that can be allocated to an arbitrary value.  What about 16
bytes less than UINT_MAX?

Re: Whats with the 16-byte alignment in alloc.c?

by Mark Johnson-21 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Sep 2, 2009 at 12:51 AM, Kyle Wheeler<kyle-qmail@...> wrote:

> On SPARC systems (even modern ones) unaligned memory access causes a
> bus error. It's no problem if you have a 4096 byte array that you're
> treating as characters (i.e. byte by byte), but if you cast it to be
> an array of ints or something, then you could easily be in trouble.
> The linker says "ah, that's an array of characters, no alignment
> necessary", but if you're casting the array to behave like an array of
> ints or doubles, it'll crash. So here's what'd happen: he'd define
> realspace to be a character array. The compiler would say "character
> arrays don't require alignment, so... this will start at wherever's
> convenient (probably someplace unaligned)". Then the code would ask
> for something like alloc(sizeof(int)) and then try and use the
> returned memory as a container for an int. Because the returned data
> wasn't aligned, such operations would cause bus errors. It's possible
> SPARC compilers have gotten somewhat smarter, but I wouldn't count on
> it.

Now that I better understand what he's up to, I don't know that I'd
count on *any* compiler / linker to get that right.

>>Are these just 13 year old portability hacks?  Or is there some deep
>>magic here?
>
> Portability. And yeah, I'd leave them in.

Portability hacks to make a performance hack portable.

> Now, I don't know why alloc.c is in there to begin with. I'm guessing
> it's a way to avoid slow malloc implementations or something like
> that, but I'm not sure. But for what it does, those details need to
> stay there.

I really wish he'd left some design notes on this somewhere, but if he
did, I haven't found them.  If I had to guess, I'd say he was trying
to avoid heap allocation of trivial amounts of memory.  Because heap
allocation is always slower than stack allocation?  Because lots of
small requests tend to result in fragmentation?

>>for more info.  Any reason not to apply the suggested fix?
>
> Mmmm... because it's inelegant. Offhand, I'd prefer a fix that looked
> more like:
>
>     if (n > avail) { errno = error_nomem; return NULL; }
>
> I don't see any reason to use 0xfffffff0 or any other constant. On the
> other hand, it's late, and I'm probably not reading that correctly.

I don't think you are reading that correctly...but then again, it's
late, and I'm probably not, either.

Re: Whats with the 16-byte alignment in alloc.c?

by Matthew Dempsky-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Sep 4, 2009 at 9:55 PM, Mark Johnson<johnsonm@...> wrote:
> Actually, I'd really prefer something that doesn't limit the amount of
> memory that can be allocated to an arbitrary value.  What about 16
> bytes less than UINT_MAX?

If you feel compelled to use a symbolic expression, then use UINT_MAX / 8 * 7.

There's no reason to test against UINT_MAX - 16.  You can easily
detect an actual overflow condition using something like:

    unsigned int m = n;
    n = ALIGNMENT + n - (n & (ALIGNMENT - 1));
    if (n < m) { errno = error_nomem; return 0; }

However, if you're going to do that, then you should similarly change
gen_allocdefs.h and anywhere else that dynamically allocates memory to
make sure they don't overflow a uint either.

Re: Whats with the 16-byte alignment in alloc.c?

by Mark Johnson-21 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sat, Sep 5, 2009 at 12:42 AM, Mark Johnson<johnsonm@...> wrote:
> I really wish he'd left some design notes on this somewhere, but if he
> did, I haven't found them.  If I had to guess, I'd say he was trying
> to avoid heap allocation of trivial amounts of memory.  Because heap
> allocation is always slower than stack allocation?  Because lots of
> small requests tend to result in fragmentation?

Because malloc overallocates?  He does *not* seem to like wasting pages:

http://marc.info/?l=freebsd-performance&m=105622549608769&w=2

Re: Whats with the 16-byte alignment in alloc.c?

by Mark Johnson-21 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sat, Sep 5, 2009 at 3:13 AM, Matthew Dempsky<matthew@...> wrote:

> If you feel compelled to use a symbolic expression, then use UINT_MAX / 8 * 7.
>
> There's no reason to test against UINT_MAX - 16.  You can easily
> detect an actual overflow condition using something like:
>
>    unsigned int m = n;
>    n = ALIGNMENT + n - (n & (ALIGNMENT - 1));
>    if (n < m) { errno = error_nomem; return 0; }
>
> However, if you're going to do that, then you should similarly change
> gen_allocdefs.h and anywhere else that dynamically allocates memory to
> make sure they don't overflow a uint either.

Sounds like a plan.  Theoretically this is all moot with resource
limits in place, but I don't think that's an excuse to be lazy.

Re: Whats with the 16-byte alignment in alloc.c?

by Andy Bradford-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Thus said Mark Johnson on Sat, 05 Sep 2009 17:19:59 CDT:

> Because malloc overallocates?  He does *not* seem to like wasting pages:
>
> http://marc.info/?lÿeebsd-performance&m5622549608769&wÿ

Interesting thread.  I wonder what  program he had  in mind when  he was
suggesting  spawning  10 thousand  processes.  Maybe  that was  just  an
extreme case to prove a point.

Andy
--
[-----------[system uptime]--------------------------------------------]
  9:15pm  up 17 min,  1 user,  load average: 1.05, 1.19, 0.87



Re: Whats with the 16-byte alignment in alloc.c?

by Mark Johnson-21 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sat, Sep 5, 2009 at 5:19 PM, Mark Johnson <johnsonm@...> wrote:

> On Sat, Sep 5, 2009 at 12:42 AM, Mark Johnson<johnsonm@...> wrote:
>> I really wish he'd left some design notes on this somewhere, but if he
>> did, I haven't found them.  If I had to guess, I'd say he was trying
>> to avoid heap allocation of trivial amounts of memory.  Because heap
>> allocation is always slower than stack allocation?  Because lots of
>> small requests tend to result in fragmentation?
>
> Because malloc overallocates?  He does *not* seem to like wasting pages:
>
> http://marc.info/?l=freebsd-performance&m=105622549608769&w=2

Actually, realspace ends up in the ELF .bss section (with the other
static variables).

I actually found a reference to this trick of his later in that thread:

http://marc.info/?l=freebsd-performance&m=105652116929654&w=2

"Quite a few of my programs simulate this effect by checking for space
in a bss array, typically 2K"