3.9.80


3.9.80

by whaley-8

Guys,

I have released 3.9.80, which is primarily a bug-fix release to fix:
   https://sourceforge.net/tracker/index.php?func=detail&aid=3537219&group_id=23725&atid=379482

This bug affects any machine using AVX.

I also switched all of ATLAS's internal gzip usage to bzip2.

Cheers,
Clint

ATLAS 3.9.80 released 06/23/12, changes from 3.9.79:
   * Fixed it so ATL_MinMMAlign is 32 when AVX is used
   * Got rid of HAMMER64SSE2 & HAMMER32SSE3 archdefs; they were for older
     gcc, and my machine died, so I cannot maintain them
   * Fixed xmergvecs so MFLOP_max is max, rather than min
   * Disabled much-abused -Si cputhrchk
   * Replaced all use of gzip/gunzip with bzip2/bunzip2

**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Math-atlas-devel mailing list
Math-atlas-devel@...
https://lists.sourceforge.net/lists/listinfo/math-atlas-devel

Re: 3.9.80

by Volker Braun

Can we skip the throttling check if the user specifies the
architecture? If I want to build a generic binary then I'm not doing
this on the machine where the output will be running, so what's the
point of checking if it's throttled?

Incidentally, I'd like to get some feedback on "generic" choices for a
library that is supposed to run on a wide variety of machines (i.e.
Sage binary builds). We currently use configure -A # -V # with numbers
computed from the following values:

64-bit Intel:
        arch = 'x86SSE2'
        isa_ext = ('SSE2', 'SSE1')

32-bit Intel:
        arch = 'x86x87'
        isa_ext = ('3DNow',)

SPARC:
        arch = 'USIII'
        isa_ext = ()

PPC:
        arch = 'POWER4'
        isa_ext = ()

Itanium:
        arch = 'IA64Itan'
        isa_ext = ()
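
For illustration only, the per-platform choices listed above can be kept in a small lookup table. This is a sketch: the platform keys and the function name are hypothetical, and it deliberately does not compute the numeric arguments for -A and -V, since those codes come from ATLAS's own enumerations.

```python
# Generic (arch, isa_ext) choices copied from the list above.
# The dictionary keys are hypothetical labels, not ATLAS identifiers.
GENERIC_TARGETS = {
    "intel64": ("x86SSE2", ("SSE2", "SSE1")),
    "intel32": ("x86x87", ("3DNow",)),
    "sparc": ("USIII", ()),
    "ppc": ("POWER4", ()),
    "itanium": ("IA64Itan", ()),
}


def generic_target(platform_key):
    """Return the (arch, isa_ext) pair for a platform label.

    Raises KeyError for a platform that has no generic choice listed.
    """
    return GENERIC_TARGETS[platform_key]


if __name__ == "__main__":
    arch, isa_ext = generic_target("intel64")
    print(arch, isa_ext)
```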


Re: 3.9.80

by whaley-8

>Can we skip the throttling check if the user specifies the
>architecture? If I want to build a generic binary then I'm not doing
>this on the machine where the output will be running, so what's the
>point of checking if it's throttled?

Yes, if you aren't building for the compiling machine anyway, then you
can probably get away with turning this off.  See the atlas_install extract
below for some CacheEdge advice.

I have purposely left most of the code in, so that people who insist
can turn the check off easily.  I disabled the flag because essentially every
"atlas auto-build" script I got was throwing it.  It looked to me like people
just said "hey, this makes it install regardless, so I'll just always throw it".

The problem is that ATLAS gets its performance from timings, and when CPU
throttling is on, the OS's throttling has a much larger effect on performance
than almost any optimization that ATLAS applies, which means you get a
library with random transformations applied, rather than an optimized lib.
Since I couldn't get people to stop throwing the flag, I disabled it.
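
A toy simulation may make the point above concrete. This is only a sketch under stated assumptions, not ATLAS's actual timer or search: the three kernel times, the uniform slowdown model for throttling, and all function names are hypothetical. It selects the "fastest" kernel from noisy measurements and reports how often the truly fastest one wins.

```python
import random


def pick_fastest(true_times, noise, rng):
    """Return the index of the kernel with the smallest measured time.

    Each measurement is the true time slowed down by a random factor in
    [1, 1 + noise], a crude stand-in for OS throttling during timing.
    """
    measured = [t * (1.0 + rng.uniform(0.0, noise)) for t in true_times]
    return measured.index(min(measured))


def correct_pick_rate(true_times, noise, trials=2000, seed=0):
    """Fraction of trials in which the truly fastest kernel is selected."""
    rng = random.Random(seed)
    best = true_times.index(min(true_times))
    hits = sum(pick_fastest(true_times, noise, rng) == best
               for _ in range(trials))
    return hits / trials


# Hypothetical kernels whose true times differ by 5% and 10%.
KERNEL_TIMES = [1.00, 1.05, 1.10]

if __name__ == "__main__":
    print("no throttling:   ", correct_pick_rate(KERNEL_TIMES, 0.0))
    print("up to 2x slower: ", correct_pick_rate(KERNEL_TIMES, 1.0))
```

With no noise the best kernel is always chosen; once the slowdown can exceed the few-percent gaps between kernels, the choice is largely down to chance.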

If you use archdefs, then you can still get certain things wrong that aren't
specified by archdefs (e.g., CacheEdge, for many archs), but at least the
entire library isn't randomized.  The generic archdefs tend to be much
more fully specified, since I know that the timings won't hold true for
all machines.

>Incidentally, I'd like to get some feedback on "generic" choices for a
>library that is supposed to run on a wide variety of machines (i.e.
>Sage binary builds). We currently use configure -A # -V # with numbers
>computed from the following values:
>
>64-bit Intel:
>        arch = 'x86SSE2'
>        isa_ext = ('SSE2', 'SSE1')
>
>32-bit Intel:
>        arch = 'x86x87'
>        isa_ext = ('3DNow',)
>
>SPARC:
>        arch = 'USIII'
>        isa_ext = ()

These are up-to-date, but I don't have access to a parallel sparc that
has a modern gcc on it, and so they don't specify any parallel archdefs,
which means your installs will take a long time and do a lot of empirical
tuning.  You can make your own archdefs on a parallel machine if you want
to avoid this.

>
>PPC:
>        arch = 'POWER4'
>        isa_ext = ()

Does that work for things like G4/G5?  There are ISA differences between
PowerPC and POWER archs, as well as architectural diffs . . .
I think the POWER4 archdefs are completely out-of-date; I have access only
to G4/G5, and the G4 just died, so the only thing I can maintain now is G5.

>Itanium:
>        arch = 'IA64Itan'
>        isa_ext = ()

I presently have no access to Itaniums, so am no longer able to update these
archdefs, which are still for gcc 3, just as for POWER4.

So, I'm guessing Itanium & POWER4 are not fully specified, and are also
completely out-of-date :(

Cheers,
Clint

******************************************************************************
\subsubsection{Selecting a good generic CacheEdge}
ATLAS uses the CacheEdge macro set in
\verb+BLDdir/include/atlas_cacheedge.h+ and \verb+atlas_tcacheedge.h+ to
control the L2-cache blocking for the serial and threaded libraries,
respectively.  You'll want to be sure this value is either set to
the minimum L2SIZE of all target architectures, or ridiculously large,
so that no effective L2 blocking is done.  So, if you are using non-Celeron
x86, it is almost always safe to set this value (in both files) to 256K
(262144), since almost all archs have at least this much cache.  If you
know your target machines have more cache than this, then increase this
number appropriately.  If you may have Celerons or other archs with
crippled last-level caches, then I recommend you set CacheEdge to
\verb+4194304+ (4MB).  At this level, CacheEdge doesn't effectively
block for caches, but it will tend to keep your workspace requirements
down.
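
The selection rule above can be sketched as a small helper. This is a minimal illustration, not part of ATLAS: the function and parameter names are hypothetical, and it only computes the number you would then put into both header files.

```python
# Values taken from the text above, in bytes.
DEFAULT_X86_EDGE = 262144    # 256K, the suggested value for non-Celeron x86
NO_BLOCKING_EDGE = 4194304   # 4MB, large enough that L2 blocking is ineffective


def generic_cache_edge(l2_sizes_bytes, may_have_crippled_cache=False):
    """Pick a CacheEdge value (in bytes) for a generic build.

    l2_sizes_bytes: known L2 sizes of the target machines (may be empty).
    may_have_crippled_cache: True if Celerons or similar may be targets.
    """
    if may_have_crippled_cache:
        # Give up on L2 blocking but keep workspace requirements bounded.
        return NO_BLOCKING_EDGE
    if not l2_sizes_bytes:
        # Nothing known about the targets: fall back to the 256K default.
        return DEFAULT_X86_EDGE
    # Otherwise block for the smallest L2 among the targets.
    return min(l2_sizes_bytes)


if __name__ == "__main__":
    print(generic_cache_edge([524288, 262144, 1048576]))
    print(generic_cache_edge([524288], may_have_crippled_cache=True))
```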


**************************************************************************
** R. Clint Whaley, PhD ** Assist Prof, UTSA ** www.cs.utsa.edu/~whaley **
**************************************************************************
