ATLAS on win32, pthreads, SSE and stack alignment


ATLAS on win32, pthreads, SSE and stack alignment

by Sébastien Kunz-Jacques-2

I tried to compile ATLAS 3.9.4 under Win32 with MinGW and pthreads.
While single-threaded builds passed make check and make full_test just
fine, I encountered crashes in make ptcheck. After investigating the
problem, I came to the conclusion that these crashes occur because
gcc always maintains a 16-byte aligned stack and pthreads-win32, in
accordance with the Win32 ABI, only guarantees a 4-byte aligned stack.

Since there is a function of the pthreads-win32 library that calls the
start function of a thread created with pthread_create, it is sufficient
to align the stack upon entry to that function to solve these
thread-related issues. It turns out there is a specific gcc attribute
for that:
__attribute__((force_align_arg_pointer))
With this attribute applied, make ptcheck succeeds. However, there is no
real reason why pthreads should align the stack itself. More importantly,
this does not necessarily solve the case where an ATLAS function compiled
into a dll is called from code built by a compiler that does not align
the stack as gcc does. So my questions are as follows:

(a) Should ATLAS itself align the stack in any function called from
pthread_create? The alternative would be to do it in the pthreads-win32
lib; I started a thread (no pun intended) on this topic at
http://sourceware.org/ml/pthreads-win32/2008/msg00053.html.

(b) Does ATLAS already handle this stack alignment issue for interface
functions? I made an MSVC project that calls an ATLAS (actually LAPACK)
function in a dll and it seems to work fine; however, this proves
basically nothing, as most of the tests run by make ptcheck work OK
despite the possible alignment issues.

(c) If not, should it be fixed to do so? There are two ways to do that
(with gcc 4.2 or more recent):
- give each interface function the
__attribute__((force_align_arg_pointer));
- compile each file containing interface functions with the option
-mstackrealign. This may realign the stack for functions that do not
need it, but avoids modifying source files when they are outside the
ATLAS scope (e.g. LAPACK).
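
To make the first option in (c) concrete, here is a minimal sketch of
what tagging an interface function could look like. The macro name
REALIGN_STACK and the function example_dot are hypothetical and are not
taken from the ATLAS sources; the guard on __i386__ is an assumption
that only 32-bit x86 gcc builds need the attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macro: expands to the gcc attribute on 32-bit x86
 * gcc builds and to nothing elsewhere, so other compilers still build. */
#if defined(__GNUC__) && defined(__i386__)
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Hypothetical exported routine standing in for an interface function.
 * With the attribute, gcc emits a prologue that realigns the stack on
 * entry, so the caller's stack alignment no longer matters. */
REALIGN_STACK double example_dot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(x != NULL && y != NULL);
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

The second option needs no source change at all: the option, spelled
-mstackrealign in the GCC manual, would simply be added to the flags
used for the files that contain interface functions.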

It seems to me the optimal solution would be to have pthreads-win32
handle the alignment for internal ATLAS functions, and to add alignment
directives, if required, for interface functions when building
ATLAS as shared libraries on win32. What do you think?

(d) If it is not easy to make these changes, would it be possible to add
an entry to the ATLAS FAQ/errata list pointing to the issues described
here?


BTW, I am willing to help maintain the online doc for ATLAS on
win32 for the topics that I understand (mostly configure/compile issues).

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel

Re: ATLAS on win32, pthreads, SSE and stack alignment

by Clint Whaley

Sébastien,

Thanks for the detailed reports!  I'll answer what I can below, but
can you start submitting this kind of thing as support requests rather
than e-mails?  I have a tough time keeping track of what is going on
via e-mail for such complicated questions.  On the tracker, I can see
a history of everything, and be reminded of what I have done, and still
need to do . . .  I only get to work on this stuff sporadically, so that
having the tracker info is a big help . . .

So, perhaps for any outstanding issues after this e-mail, you can open
up separate support tracker items for each?

>* problems with carriage returns in compiler version strings
>  for some versions of GCC, there is a trailing carriage return in the
>version string returned by gcc -v (this is the case for the current
>MinGW candidate). In that case, the trailing CR gets added to the macro
>definitions in the file $blddir\include\atlas_buildinfo.h, and this causes
>an error later in the build, which forces one to remove these CRs and
>restart the build.

I think I have a fix for this in 3.9.5: I essentially got rid of all
control codes in the string . . .
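
As an illustration of the idea only (this is not the actual 3.9.5
change, which is not shown in this thread), removing control codes from
a version string could be sketched as follows. The function name
strip_control_chars is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: removes every control character (CR, LF, tab, ...)
 * from s in place and returns the new length of the string. */
size_t strip_control_chars(char *s)
{
    size_t r, w = 0;

    assert(s != NULL);
    for (r = 0; s[r] != '\0'; r++) {
        /* iscntrl requires a value representable as unsigned char */
        if (!iscntrl((unsigned char)s[r]))
            s[w++] = s[r];
    }
    s[w] = '\0';
    return w;
}
```

Applied to the string captured from gcc -v before it is written into
the generated header, a filter like this would drop the trailing CR.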

>* problems with the compiler selected for the wall timer
>  when the wall timer is used, the corresponding source files are
>compiled. The make definitions for these files (ATL_walltime.o,
>ATL_cputime.o and time.o) are in makes/Make.sysinfo; they are compiled
>with the "interface" compiler (ICC). Unfortunately they use some gcc
>idioms so when the interface compiler is cl or icl, that timer cannot be
>used. However, using cl as the interface compiler works well when ICC
>and ICCFLAGS are replaced, for the files mentioned, by, say, XCC and
>XCCFLAGS (if XCC is gcc). Why not use XCC or GOODGCC by default here?

Are you throwing the -DPentiumCPS flag to configure or something?  That
should be the only way ATL_walltime has gcc-specific idioms in it
(inline assembly).  You can't use -DPentiumCPS with MSVC++: you
can get the same functionality for free, as ATLAS will use the
MS-specific QueryPerformanceFrequency when compiled under windows . . .

>Btw, the instructions found on
>http://math-atlas.sourceforge.net/errata.html#WinPthreads are mostly
>outdated (references to $(BC) correspond to an old version of atlas I guess)

This whole entry was in error.  Somehow, a whole lot of out-of-date links
were in the file.  I got rid of them . . .
 
>* problems with shared libs generation
> "make shared" does not work with MinGW, because libc which is
>referenced in lib/Makefile (-lc link option) does not exist for that
>target. In fact, it is better to let gcc itself figure out the options
>to give to the linker. It is also convenient to generate .def files and
>import libraries. For me, the lines below work with a 4.x mingw (these
>are for self-contained dlls)

Post this problem to the tracker.  This is way too complex for me to
understand in one sitting.
 
>I tried to compile ATLAS 3.9.4 under Win32 with MinGW and pthreads.
>While single-threaded builds passed make check and make full_test just
>fine, I encountered crashes in make ptcheck. After investigating the
>problem, I came to the conclusion that these crashes occur because
>gcc always maintains a 16-byte aligned stack and pthreads-win32, in
>accordance with the Win32 ABI, only guarantees a 4-byte aligned stack.

First off, I'm not surprised there are problems here, in the sense that I
support ATLAS for cygwin only on Windows.  I've had requests for
MinGW support, but I've never succeeded in getting things to work there
myself (I think the last two times I tried, I couldn't even figure out
how to install and use MinGW).

That being said, you appear to have uncovered an astounding decision on
gcc's part: they have decided to break the ABI on *all* systems because,
I guess, manually aligning >4 byte data was too hard for them.  I just
did a man on gcc looking for stack, and I find this jewel:
>       -mpreferred-stack-boundary=num
>           Attempt to keep the stack boundary aligned to a 2 raised to num
>           byte boundary.  If -mpreferred-stack-boundary is not specified, the
>           default is 4 (16 bytes or 128 bits).

Which means they are not just breaking the Windows ABI, but the ABI under
linux as well, since the x86-32 ABI says that only 4-byte alignment can
be assumed:
   http://math-atlas.sourceforge.net/devel/assembly/abi386-4.pdf

Breaking the ABI in order to avoid a few bit twiddles in the prologue
seems incredibly shortsighted to me, but they appear to have done it.
ATLAS does *not* assume a 16-byte aligned stack on 32-bit OSes
(the x86-64 ABI does specify 16-byte alignment, so my assembly assumes
it there).  Therefore, this problem should only be coming from gcc, not ATLAS.
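
The kind of bit twiddling meant here can be sketched in C. This is an
illustration of manual alignment in general, not code from ATLAS or
gcc; the function name align_up is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rounds addr up to the next multiple of alignment.
 * alignment must be a power of two, which is what makes the mask work. */
uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
}
```

Code that needs 16-byte aligned scratch space on a 4-byte aligned stack
can reserve 15 extra bytes and round the start address up this way,
which is the same add-and-mask that a realigning prologue applies to
the stack pointer (rounding down instead, since the stack grows toward
lower addresses).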

Have you tried adding the flag:      -mpreferred-stack-boundary=2
To all your gcc compiler flags to override gcc's wonderful redefinition of
the x86-32 ABI?

Cheers,
Clint

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************
