Crashes with 64-bit native code generator on Windows

by David Hansel

Hi all,

We are still trying to figure out why our code crashes (brings up a
Windows error message box saying that the application was terminated) when
compiled with the native 64-bit codegen on Windows.  We were able to break
down the code a bit but unfortunately not enough to produce a small enough
example that could be shared here.  Removing more (mostly unrelated) code
makes the crash go away.  In our testing we have confirmed the following:

- The crashes NEVER occur before the first FFI call.  Removing all FFI
  calls makes the application work without crashes (as well as possible
  without the functionality that would be provided by the FFI calls).
- Sometimes the crashes occur on the first FFI call, sometimes some
  time after the call (within ML code),  sometimes on the second or third
  FFI call.  This changes randomly depending on which code we include.
- The crashes are not caused by our user code in the FFI functions.  We
  have removed all code from the bodies of those functions,  leaving only
  a simple return statement.
- Our FFI DLLs do not have entry functions that would be called
  when the DLL is loaded.
- The crashes do not occur if MLton's C code generator is used.

We were able to create an example that only uses a single FFI call
and crashes on the first call to that function.  I have consolidated
the (partially ml-nlffigen generated) code and listed it below.  Please
let me know if you find any problem in the code.  Please don't mind the
useless conversions in the "cp" function within "foo".  In the real
code they are partially within some compatibility wrapper code and
removing them completely makes the crash go away.  I cannot see
why these conversions should cause a crash.
The code below does cause the crash when called from within our (large)
code.  It does not produce a crash when called within a small example.
As mentioned,  this is the only FFI call that is actually called by
the code.  We do have to include another function making an FFI call
in order to make the crash happen.  However,  that call is never executed
before the crash.  It could be executed some time later.  If that second
FFI code is not present,  the crash does not happen.

My question basically is this:  do you have any suggestions on how to
debug this any further?  Any MLton command-line options for debugging?
Are there any optimization passes that we should try to disable?
Do you know of any caveats that we might have missed when creating our
DLLs?

Any suggestions are welcome.

Best regards,

David


C code:
-------

__declspec(dllexport) int __stdcall foo(const char *p, int f, const void *buf, int sz)
{
  return 1;
}


ML code:
--------

structure F_foo =
struct
   val lib = DynLinkage.open_lib {name = "bar.dll", lazy=true, global=false}
   val h   = DynLinkage.lib_symbol (lib, "foo")

   val callop =
       _import * :
       CMemory.addr ->
       CMemory.cc_addr * CMemory.cc_sint * CMemory.cc_addr * CMemory.cc_sint -> CMemory.cc_sint;

   fun mkcall a (x1, x2, x3, x4) =
       C_Int.Cvt.c_sint
           (CMemory.unwrap_sint
                (callop
                     a
                     (CMemory.wrap_addr (C_Int.reveal (C_Int.Ptr.inject' x1)),
                      CMemory.wrap_sint (C_Int.Cvt.ml_sint x2),
                      CMemory.wrap_addr (C_Int.reveal x3),
                      CMemory.wrap_sint (C_Int.Cvt.ml_sint x4))))

   fun f' (x1 : C_Int.ro C_Int.uchar_obj C_Int.ptr',
           x2 : MLRep.Int.Signed.int,
           x3 : C_Int.voidptr,
           x4 : MLRep.Int.Signed.int) : MLRep.Int.Signed.int =
       C_Int.Cvt.ml_sint
           (C_Int.call (C_Int.mk_fptr (mkcall, DynLinkage.addr h),
                        (x1, C_Int.Cvt.c_sint x2, x3, C_Int.Cvt.c_sint x4)))
end


fun foo (pp : string, f : bool, c : Word8.word vector) : bool =
    let val _   = print "foo-start\n"
        val sz  = Vector.length c
        val buf = C.alloc' C.S.uchar (Word.fromInt sz)

    fun cp (i, p) =
        if   i >= sz
        then ()
        else (C.Set.uchar' (C.Ptr.|*! p, Word8.fromLargeInt (Word32.toLargeInt (Word32.fromLargeInt (Word8.toLargeInt (Vector.sub (c, i))))));
              cp (i+1, C.Ptr.|+! C.S.uchar (p, 1)))
    in
        cp (0, buf);
        F_foo.f' (C.Ptr.null', if f then 1 else 0, C.Ptr.inject' buf, Int32.fromInt sz);
        C.free' buf;
        print "foo-end\n";
        true
    end

--
  ----------------------------------------------------------
  David Hansel
  http://www.reactive-systems.com/
  OpenPGP (GnuPG) public key file:
  http://www.reactive-systems.com/~hansel/pgp_public_key.txt
  ----------------------------------------------------------

_______________________________________________
MLton mailing list
MLton@...
http://mlton.org/mailman/listinfo/mlton

Re: Crashes with 64-bit native code generator on Windows

by Wesley W. Terpstra

Sorry for the slow reply.

On Wed, Nov 11, 2009 at 5:10 AM, David Hansel
<hansel@...> wrote:
> The code below does cause the crash when called from within our (large)
> code.  It does not produce a crash when called within a small example.
> As mentioned,  this is the only FFI call that is actually called by
> the code.  We do have to include another function making an FFI call
> in order to make the crash happen.  However,  that call is never executed
> before the crash.  It could be executed some time later.  If that second
> FFI code is not present,  the crash does not happen.

I went ahead and tried to build it. I compiled the C program (bar.c) using:

x86_64-w64-mingw32-gcc -Wall -O2 -o bar.dll -shared
  -Wl,--out-implib,bar.a -Wl,--output-def,bar.def bar.c

I wrote the following baz.mlb file:

$(SML_LIB)/basis/basis.mlb
$(SML_LIB)/mlnlffi-lib/mlnlffi-lib.mlb
$(SML_LIB)/mlnlffi-lib/memory/memory.mlb
$(SML_LIB)/mlnlffi-lib/internals/c-int.mlb
ann
   "allowFFI true"
in
   baz.sml
end

Then I compiled with:

mlton -target x86_64-w64-mingw32 -link-opt -ldl -verbose 1 baz.mlb

The resulting program worked. Are you using similar compile options?
In the time since your last post have you perhaps found a more
complete crash example?

> My question basically is this:  do you have any suggestions on how to
> debug this any further?  Any MLton command-line options for debugging?

Well, there's -debug true, but gdb under 64-bit windows is so flakey I
wouldn't bother trying that. In fact, the MLton.msi doesn't include
the debug version of the runtime (it is over 200MB due to the windows
debugging format), so you would need to build MLton from source to get
the debug library. I doubt it would help you, though.

> Are there any optimization passes that we should try to disable?

I doubt this is an optimization problem.

> Do you know of any caveats that we might have missed when creating our
> DLLs?

Ok, here are the things I can think of from the top of my head:
0) You're loading a 32-bit dll instead of a 64-bit one. Double check.
1) Windows might require a stack alignment that doesn't match the
amd64 FFI codegen. Your program happens to end up with bad alignment,
and my programs have just never been unlucky. You could declare a
volatile local 64-bit variable and printf its address in the C code.
See if the offset of this variable fails to be 64-bit aligned (only)
in the failing programs.
2) The __stdcall is confusing gcc. There is only one calling
convention under win64. Try specifying nothing.
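
A minimal sketch of the probe described in suggestion 1, assuming it is
pasted into bar.c and called at the top of foo.  The names misalignment
and probe_stack_alignment are made up for illustration.  The Microsoft
x64 calling convention expects the stack to be 16-byte aligned at the
point of a call, so the sketch reports the address modulo 16 as well as
modulo 8; the interesting part is whether the numbers differ between a
working and a failing build.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Distance of p above the previous multiple of `alignment` (0 = aligned). */
static unsigned misalignment(const volatile void *p, uintptr_t alignment)
{
    assert(alignment != 0);
    return (unsigned)((uintptr_t)p % alignment);
}

/* Hypothetical probe: call it first thing inside the exported function. */
int probe_stack_alignment(void)
{
    volatile int64_t marker = 0;   /* volatile so it is not optimized away */
    unsigned off8  = misalignment(&marker, 8);
    unsigned off16 = misalignment(&marker, 16);

    fprintf(stderr, "marker at %p (mod 8 = %u, mod 16 = %u)\n",
            (void *)&marker, off8, off16);
    fflush(stderr);                /* get the line out before any crash */
    return off8 == 0;              /* 1 if the local is 64-bit aligned */
}
```

The probe only produces output if the call actually reaches the DLL, so
it cannot say anything about a crash that happens before that point.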

However, I am guessing blind! Without a way to reproduce this I can't
really help. I've used the FFI quite heavily under win64 in one of our
recent projects without problems, so FFI definitely works most of the
time. It's possible you've found a corner case, which can often be an
alignment problem.

Is the program really too secret to release the buggy part of its
source code? MLton is free. ;)

Re: Crashes with 64-bit native code generator on Windows

by David Hansel

Hello Wesley,

Wesley W. Terpstra wrote:

> Sorry for the slow reply.
>
> On Wed, Nov 11, 2009 at 5:10 AM, David Hansel
> <hansel@...> wrote:
>> The code below does cause the crash when called from within our (large)
>> code.  It does not produce a crash when called within a small example.
>> As mentioned,  this is the only FFI call that is actually called by
>> the code.  We do have to include another function making an FFI call
>> in order to make the crash happen.  However,  that call is never executed
>> before the crash.  It could be executed some time later.  If that second
>> FFI code is not present,  the crash does not happen.
>
> I went ahead and tried to build it.
> [...]
> The resulting program worked. Are you using similar compile options?

We are using Microsoft Visual C++ to create our DLLs so (just in case there
was some obscure compiler setting that we were missing) I gave it a try
and compiled our DLL with gcc -- which didn't change anything.  Our MLton
command line is:

mlton @MLton gc-summary hash-cons 1.0 --
  -target x86_64-w64-mingw32
  -codegen native -profile no -profile-stack false -const 'Exn.keepHistory false'
  -drop-pass deepFlatten -link-opt -ldl -output foo.exe -verbose 2 foo.mlb

I do not know exactly what the '-drop-pass deepFlatten' does but it was put
in by Stephen Weeks back in 2006 when he assisted us in making our code
compile with MLton.  If I remember correctly there was a compiler performance
issue.  However,  as you said before, optimization settings are probably not
the problem here.

> In the time since your last post have you perhaps found a more
> complete crash example?

Unfortunately no.  I have been trying but the crash goes away anytime I cut
down the code some more to produce a smaller example.

>> My question basically is this:  do you have any suggestions on how to
>> debug this any further?  Any MLton command-line options for debugging?
>
> Well, there's -debug true, but gdb under 64-bit windows is so flakey I
> wouldn't bother trying that. In fact, the MLton.msi doesn't include
> the debug version of the runtime (it is over 200MB due to the windows
> debugging format), so you would need to build MLton from source to get
> the debug library. I doubt it would help you, though.

That's unfortunate.

>> Are there any optimization passes that we should try to disable?
>
> I doubt this is an optimization problem.
>
>> Do you know of any caveats that we might have missed when creating our
>> DLLs?
>
> Ok, here are the things I can think of from the top of my head:
> 0) You're loading a 32-bit dll instead of a 64-bit one. Double check.

Double- and triple-checked that.

> 1) Windows might require a stack alignment that doesn't match the
> amd64 FFI codegen. Your program happens to end up with bad alignment,
> and my programs have just never been unlucky. You could declare a
> volatile local 64-bit variable and printf its address in the C code.
> See if the offset of this variable fails to be 64-bit aligned (only)
> in the failing programs.

An alignment problem or something similar is what I suspect,  too.
Creating a local variable won't help because the process dies even
before the first time it enters the code in the DLL,  so any printf
in there will not happen before the crash.

> 2) The __stdcall is confusing gcc. There is only one calling
> convention under win64. Try specifying nothing.

I've tried with and without.  No difference.

> However, I am guessing blind! Without a way to reproduce this I can't
> really help. I've used the FFI quite heavily under win64 in one of our
> recent projects without problems, so FFI definitely works most of the
> time. It's possible you've found a corner case, which can often be an
> alignment problem.
> Is the program really too secret to release the buggy part of its
> source code? MLton is free. ;)

It's good to hear that the FFI has been tested in win64.  I completely
understand about the guessing,  we do have the same problems with our
customer (our product not working with their code,  can't send the code).
Unfortunately since this is a commercial application and we do have to
include a large part of our code to make the crash happen I can definitely
not post the code to the list.  If we can't figure this out otherwise we
might be able to set up an NDA with you so we could send the code to
you in private.

One thing I can make available is the executable that actually experiences
the crash as well as the MLton-produced assembly code.  I don't know what
kind of debugging tools you have available and whether that would be
any help.  Please let me know.

There are two observations that I have made since my last post.  They
may or may not be related to the actual problem but I thought I'd
mention them anyways:

I was looking into what could be causing the problem and came across
file MLton/lib/mlton/sml/mlnlffi-lib/memory/linkage-libdl.sml which
is of course used by the FFI.  I wasn't completely sure what the "era"
deal in that code is,  so I changed the body of function "get" to just
"f()",  resolving the FFI function's address before every call.  After
that change,  all crashes were gone.  Furthermore,  changing the body
of "get" to just "a" does NOT fix the crashes.  That looked good
so I added some "print" statements in "get" to see whether there is
a problem with the address not being resolved properly.  Unfortunately,
just adding the "print" statements also made the crashes go away. In
fact,  just adding 'print "";' at the beginning of "get" eliminates
the crashes.  Interestingly,  this eliminates the crashes completely.
With other changes in our code I was able to eliminate some instances
of the crashes but new ones would pop up at other places.  I suspect
that the proximity of this code to the actual FFI calls might play
a role in that.

I gave the "Debugging Tools for Windows" debugger a try and loaded
the crashing executable there.  With that,  I was able to track the
crash in our simplest example to the following assembly code:

00000000`0054b2c8 4c897df8        mov     qword ptr [rbp-8],r15
00000000`0054b2cc 48892d3d408000  mov     qword ptr [rsim4c_mlton!MLton_main+0x402901 (00000000`0080403d)],rbp
00000000`0054b2d3 4c892526408000  mov     qword ptr [rsim4c_mlton!MLton_main+0x4028ea (00000000`00804026)],r12
00000000`0054b2da ff15683d8000    call    qword ptr [rsim4c_mlton!MLton_main+0x40262c (00000000`00803d68)] ds:00000000`00d4f048=0000000000000000

Note the "=0000000000000000" at the end:  the memory location that the
indirect call reads its target from contains 0,  so the call jumps to
address 0 and the process crashes.  That could be a hint but I don't
know how to look into this any further.  Do you have a suggestion how to
track this back to MLton's assembly output or even to the original ML code?

Best regards,

David


Re: Crashes with 64-bit native code generator on Windows

by David Hansel

Hi again,

I have a few more observations regarding the crashes that might ring a
bell for some of the MLton developers involved in the native code-generators
and/or FFI interface:

1) In my previous post I included a disassembly of the location where the
crash happens.  With some creative grep-ing I was able to find the location
of that code within the assembly code that MLton produces for our program:

_L_176132:
        movq (_c_stackP+0x0)(%rip),%rsp
        movq 0x40(%rbp),%r14
        movq 0x0(%r14),%r13
        movq 0x0(%r13),%r11
        movl %r15d,%r9d
        xorq %r8,%r8
        movl $0x0,%r15d
        movl %r15d,%edx
        xorq %rcx,%rcx
        subq $0x20,%rsp
        addq $0x40,%rbp
        leaq (_L_176133+0x0)(%rip),%r15
        movq %r15,0xFFFFFFFFFFFFFFF8(%rbp)
        movq %rbp,(_gcState+0x10)(%rip)
        movq %r12,(_gcState+0x0)(%rip)
        call *(_applyFFTempFun+0x0)(%rip)   <<------- CRASH
        addq $0x20,%rsp
        movq (_gcState+0x0)(%rip),%r12
        movq (_gcState+0x10)(%rip),%rbp
        jmp _L_176133

I was able to reproduce this with several examples for which the crash
occurs (all of which unfortunately include a large part of our code so
I can not make them available here).  The crash always occurs in the
"*applyFFTempFun" call and always because applyFFTempFun is NULL.

I am reasonably sure that this actually is the location of the crash and
not just some similar looking code because if I comment out the "call"
statement in the assembly code and then compile the executable from the
assembly code, the crash goes away (the target of the FFI call in question
is an empty function that does not do anything so it is not surprising
that skipping the call does not produce other problems).  Also, if I
introduce an infinite loop right before the call,  the compiled program
hangs right before where I would usually observe the crash.

2) As I mentioned before,  if I compile the program from the SML
code and just insert a "print" statement in function "get" within
MLton/lib/mlton/sml/mlnlffi-lib/memory/linkage-libdl.sml,  the crash
also does not occur.  Interestingly,  the MLton-produced assembly code
for that version (only change is the "print" statement) does not contain
ANY calls to "applyFFTempFun".

3) Looking at the MLton source code (amd64-generate-transfers.fun),  I can see
that calls to "applyFFTempFun" seem to be inserted for "Indirect" FFI calls.
I do not know enough about the code generator or the FFI interface to make
much sense out of this.
However,  I can see that the MLton-produced code with the crash only contains
a call to "applyFFTempFun" (which I assume is created in line 1566 of file
amd64-generate-transfers.fun) but never any code that would set the value
of "applyFFTempFun" (which I assume should be created in line 1183 of file
amd64-generate-transfers.fun).

Given these observations,  does anyone have any suggestions about MLton
debugging options or other ways to shed more light on what might be going
wrong here?


Thanks,

David


Re: Crashes with 64-bit native code generator on Windows

by Matthew Fluet

On Mon, Nov 23, 2009 at 8:39 PM, David Hansel <hansel@...> wrote:
> I was looking into what could be causing the problem and came across
> file MLton/lib/mlton/sml/mlnlffi-lib/memory/linkage-libdl.sml which
> is of course used by the FFI.  I wasn't completely sure what the "era"
> deal in that code is,  so I changed the body of function "get" to just
> "f()",  resolving the FFI function's address before every call.

The "era" is supposed to invalidate dynamically loaded library addresses if the executable is started up again after saving the world (MLton.World.save).  Because it is a new executable invocation, the dynamically linked library needs to be reloaded and it might end up at a different address.  This isn't actually done in the linkage-libdl.sml code; see the commented out "Cleaner.addNew" application.  I don't recall why it is disabled.  In any case, unless you are saving and loading worlds, it shouldn't affect your code.
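
The era idea can be sketched in C as follows.  This is an illustration of the caching scheme only, not a translation of linkage-libdl.sml; the names cached_addr, cached_get, start_new_era and count_resolves are hypothetical, and era 0 is assumed to mean "never resolved".

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*resolver_fn)(void);

/* A resolved address together with the era in which it was resolved. */
typedef struct {
    void       *addr;
    unsigned    era;
    resolver_fn resolve;   /* looks the symbol up again when needed */
} cached_addr;

/* Hypothetical global era counter; starts at 1 so that era 0 is stale. */
static unsigned current_era = 1;

/* Return the cached address, resolving it again if it is from an old era. */
void *cached_get(cached_addr *c)
{
    assert(c != NULL && c->resolve != NULL);
    if (c->era != current_era) {
        c->addr = c->resolve();
        c->era  = current_era;
    }
    return c->addr;
}

/* Would be called when a saved world is started up again. */
void start_new_era(void)
{
    current_era++;
}

/* Demo resolver that counts how often a lookup really happens. */
static int resolve_count = 0;
static int dummy_symbol;

static void *count_resolves(void)
{
    resolve_count++;
    return &dummy_symbol;
}
```

With a cached_addr initialized as { NULL, 0, count_resolves }, two calls to cached_get in a row trigger one lookup, and a call after start_new_era triggers a second one.  Replacing the body of "get" with "f()" corresponds to resolving on every call regardless of the era.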
 
> After
> that change,  all crashes were gone.  Furthermore,  changing the body
> of "get" to just "a" does NOT fix the crashes.  That looked good
> so I added some "print" statements in "get" to see whether there is
> a problem with the address not being resolved properly.  Unfortunately,
> just adding the "print" statements also made the crashes go away. In
> fact,  just adding 'print "";' at the beginning of "get" eliminates
> the crashes.  Interestingly,  this eliminates the crashes completely.
> With other changes in our code I was able to eliminate some instances
> of the crashes but new ones would pop up at other places.  I suspect
> that the proximity of this code to the actual FFI calls might play
> a role in that.

This, and your next email, suggest that it is a bug with the native codegen.  The probable role that the proximity of the "print" call plays is that the "print" invokes a C function call, which "resets" the register allocator.  Without the "print" call, there is a wider scope over which the register allocator is able to work, and, apparently, it is mistakenly dropping a def.

Re: Crashes with 64-bit native code generator on Windows

by Matthew Fluet

On Mon, Nov 30, 2009 at 11:00 AM, David Hansel <hansel@...> wrote:
> 1) In my previous post I included a disassembly of the location where the
> crash happens.  With some creative grep-ing I was able to find the location
> of that code within the assembly code that MLton produces for our program:
>
> I was able to reproduce this with several examples for which the crash
> occurs (all of which unfortunately include a large part of our code so
> I can not make them available here).  The crash always occurs in the
> "*applyFFTempFun" call and always because applyFFTempFun is NULL.

I agree that that seems to pinpoint the source of the crash.
 
> 2) As I mentioned before,  if I compile the program from the SML
> code and just insert a "print" statement in function "get" within
> MLton/lib/mlton/sml/mlnlffi-lib/memory/linkage-libdl.sml,  the crash
> also does not occur.  Interestingly,  the MLton-produced assembly code
> for that version (only change is the "print" statement) does not contain
> ANY calls to "applyFFTempFun".

More evidence.  Although, in this case, I suspect that there is still an indirect call in the assembly code.  It simply doesn't go through the temporary variable; the target address gets allocated to a register and stays there.

> 3) Looking at the MLton source code (amd64-generate-transfers.fun),  I can see
> that calls to "applyFFTempFun" seem to be inserted for "Indirect" FFI calls.
> I do not know enough about the code generator or the FFI interface to make
> much sense out of this.
> However,  I can see that the MLton-produced code with the crash only contains
> a call to "applyFFTempFun" (which I assume is created in line 1566 of file
> amd64-generate-transfers.fun) but never any code that would set the value
> of "applyFFTempFun" (which I assume should be created in line 1183 of file
> amd64-generate-transfers.fun).
>
> Given these observations,  does anyone have any suggestions about MLton
> debugging options or other ways to shed more light on what might be going
> wrong here?

Sounds like a bug in the amd64 codegen simplifier and/or register allocator.  It seems that somewhere along the line, the definition of the applyFFTempFun variable is being dropped, but the use in the indirect call is being retained.  When the register allocator comes along and doesn't locally find the def point of applyFFTempFun, it has to fetch the value from the (uninitialized) variable.

Could you compile with "-native-commented 3 -native-split 0 -keep g" and post the basic block that has the call through applyFFTempFun?  It will be pretty noisy, but should shed some light on what the native codegen is doing (wrong).
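
The suspected def/use mismatch can be pictured with a small C model.  This is only an analogy for the generated assembly, not MLton code: the slot parameter stands in for the applyFFTempFun variable, a local stands in for the register, and the names call_with_def, call_with_dropped_def and foo_stub are hypothetical.  The faulty variant returns -1 where the real program would jump to address 0.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*ffi_fn)(const char *, int, const void *, int);

/* Stand-in for the empty DLL function from the first message. */
static int foo_stub(const char *p, int f, const void *buf, int sz)
{
    (void)p; (void)f; (void)buf; (void)sz;
    return 1;
}

/* Intended sequence: store the target in the slot (def),
   then call through the slot (use). */
int call_with_def(ffi_fn *slot, ffi_fn target)
{
    assert(slot != NULL && target != NULL);
    *slot = target;
    return (*slot)(NULL, 0, NULL, 0);
}

/* Suspected faulty shape: the target only ever lives in a local
   ("register"), the store to the slot is missing, and the call still
   reads the slot.  The NULL check replaces the crash with -1. */
int call_with_dropped_def(ffi_fn *slot, ffi_fn target)
{
    ffi_fn in_register = target;   /* loaded, but never written back */
    (void)in_register;
    assert(slot != NULL);
    if (*slot == NULL)
        return -1;                 /* real code would call address 0 here */
    return (*slot)(NULL, 0, NULL, 0);
}
```

Starting from a slot that holds NULL, call_with_dropped_def(&slot, foo_stub) gives -1 while call_with_def(&slot, foo_stub) gives 1, which mirrors the difference between the crashing and the working builds.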


Re: Crashes with 64-bit native code generator on Windows

by David Hansel

Hi Matthew,

Matthew Fluet wrote:

> [...]
> Sounds like a bug in the amd64 codegen simplifier and/or register
> allocator.  It seems that somewhere along the line, the definition of
> the applyFFTempFun variable is being dropped, but the use in the
> indirect call is being retained.  When the register allocator comes
> along, when it doesn't locally find the def point of applyFFTempFun, it
> has to fetch the value from the (uninitialized) variable.
>
> Could you compile with "-native-commented 3 -native-split 0 -keep g" and
> post the basic block that has the call through applyFFTempFun?  It will
> be pretty noisy, but should shed some light on what the native codegen
> is doing (wrong).

See the code below.  It should match up with the code I posted before.
From what I can tell it does look like MLton puts the target address for
applyFFTempFun into a register but then later does the indirect call via
the memory location.

Please let me know if you need any more context or other debugging
information.  It does seem like you are on the right track.

Thanks!

David


/* Live: (SW64(24): ExnStack, SW32(40): Word32, SP(64): Objptr (opt_1516), SP(48): Objptr (opt_36)) */
/* begin: RP(0): Objptr (opt_22)  = OP (SP(64): Objptr (opt_1516), 0): Objptr (opt_22) */
/* end: RP(0): Objptr (opt_22)  = OP (SP(64): Objptr (opt_1516), 0): Objptr (opt_22) */
/* begin: RQ(0): CPointer  = OQ (RP(0): Objptr (opt_22), 0): CPointer */
/* end: RQ(0): CPointer  = OQ (RP(0): Objptr (opt_22), 0): CPointer */
/* CCall {args = (RQ(0): CPointer, NULL, 0x0, NULL, SW32(40): Word32), frameInfo = Some {frameLayoutsIndex = 1072}, func = {args = (CPointer, CPointer, Word32, CPointer, Word32), bytesNeeded = None, convention = cdecl, ensuresBytesFree = false, mayGC = true, maySwitchThreads = false, modifiesFrontier = true, prototype = {args = (CPointer, Int32, CPointer, Int32), res = Some Int32}, readsStackTop = true, return = Word32, symbolScope = external, target = <*>, writesStackTop = true}, return = Some L_176133} */
/* begin ccall: cdecl <*> */
/* CCALL cdecl <*>(MEM<q>{Heap}[(MEM<q>{Heap}[(MEM<q>{Stack}[(MEM<q>{GCStateHold}[((_gcState+0x10))+(0x0)])+(0x40)])+(0x0)])+(0x0)], $0x0, $0x0, $0x0, MEM<l>{Stack}[(MEM<q>{GCStateHold}[((_gcState+0x10))+(0x0)])+(0x28)]) <Some _L_176133> */
/* ************************************************************ */
/* Cache: caches: MEM<q>{StaticNonTemp}[(_c_stackP)+(0x0)] -> %rsp (reserved)  */
        movq (_c_stackP+0x0)(%rip),%rsp
/* ************************************************************ */
/* movq MEM<q>{Heap}[(MEM<q>{Heap}[(MEM<q>{Stack}[(MEM<q>{GCStateHold}[((_gcState+0x10))+(0x0)])+(0x40)])+(0x0)])+(0x0)],MEM<q>{CArg}[(_applyFFTempFun)+(0x0)] */
        movq 0x40(%rbp),%r14
        movq 0x0(%r14),%r13
        movq 0x0(%r13),%r11
/* ************************************************************ */
/* movzlq MEM<l>{Stack}[(MEM<q>{GCStateHold}[((_gcState+0x10))+(0x0)])+(0x28)],MEM<q>{CArg}[(_applyFFTempRegArg)+(0x0)] */
        movl %r15d,%r9d
/* ************************************************************ */
/* Cache: caches: MEM<q>{CArg}[(_applyFFTempRegArg)+(0x0)] -> %r9 (reserved)  */
/* ************************************************************ */
/* movq $0x0,MEM<q>{CArg}[(_applyFFTempRegArg)+(0x8)] */
        xorq %r8,%r8
/* ************************************************************ */
/* Cache: caches: MEM<q>{CArg}[(_applyFFTempRegArg)+(0x8)] -> %r8 (reserved)  */
/* ************************************************************ */
/* movzlq $0x0,MEM<q>{CArg}[(_applyFFTempRegArg)+(0x10)] */
        movl $0x0,%r15d
        movl %r15d,%edx
/* ************************************************************ */
/* Cache: caches: MEM<q>{CArg}[(_applyFFTempRegArg)+(0x10)] -> %rdx (reserved)  */
/* ************************************************************ */
/* movq $0x0,MEM<q>{CArg}[(_applyFFTempRegArg)+(0x18)] */
        xorq %rcx,%rcx
/* ************************************************************ */
/* Cache: caches: MEM<q>{CArg}[(_applyFFTempRegArg)+(0x18)] -> %rcx (reserved)  */
/* ************************************************************ */
/* subq $0x20,MEM<q>{StaticNonTemp}[(_c_stackP)+(0x0)] */
        subq $0x20,%rsp
/* ************************************************************ */
/* Force: commit_memlocs: commit_classes: remove_memlocs: remove_classes: dead_memlocs: dead_classes:  */
/* ************************************************************ */
/* addq $0x40,MEM<q>{GCStateHold}[((_gcState+0x10))+(0x0)] */
        addq $0x40,%rbp
/* ************************************************************ */
/* leaq MEM<q>{Code}[(_L_176133)+(0x0)],MEM<q>{Stack}[(MEM<q>{GCStateHold}[((_gcState+0x10))+(0x0)])+(0xFFFFFFFFFFFFFFF8)] */
        leaq (_L_176133+0x0)(%rip),%r15
        movq %r15,0xFFFFFFFFFFFFFFF8(%rbp)
        movq %rbp,(_gcState+0x10)(%rip)
/* ************************************************************ */
/* Force: commit_memlocs: MEM<q>{Stack}[(MEM<q>{GCStateHold}[((_gcState+0x10))+(0x0)])+(0xFFFFFFFFFFFFFFF8)] commit_classes: remove_memlocs: remove_classes: dead_memlocs: dead_classes:  */
/* ************************************************************ */
/* Force: commit_memlocs: commit_classes: GCStateVolatile GCState CStatic Globals Stack Heap Code CStack remove_memlocs: remove_classes: dead_memlocs: dead_classes:  */
/* ************************************************************ */
/* Force: commit_memlocs: commit_classes: GCStateVolatile GCStateHold GCState Globals Stack Heap remove_memlocs: remove_classes: dead_memlocs: dead_classes:  */
        movq %r12,(_gcState+0x0)(%rip)
/* ************************************************************ */
/* CCall */
/* ************************************************************ */
/* call *MEM<q>{CArg}[(_applyFFTempFun)+(0x0)] */
        call *(_applyFFTempFun+0x0)(%rip)
/* ************************************************************ */
/* XmmUnreserve: registers:  */
/* ************************************************************ */
/* Unreserve: registers: %rcx %rdx %r8 %r9  */
/* ************************************************************ */
/* Force: commit_memlocs: commit_classes: remove_memlocs: remove_classes: dead_memlocs: dead_classes: GCStateVolatile GCStateHold GCState Globals Stack Heap  */
/* ************************************************************ */
/* Return: [(%eax,MEM<l>{StaticTemp}[(_cReturnTemp)+(0x0)])] */
/* ************************************************************ */
/* addq $0x20,MEM<q>{StaticNonTemp}[(_c_stackP)+(0x0)] */
        addq $0x20,%rsp
/* ************************************************************ */
/* Unreserve: registers: %rsp  */
/* ************************************************************ */
/* Cache: caches: MEM<q>{GCStateHold}[((_gcState+0x0))+(0x0)] -> %r12 (reserved) MEM<q>{GCStateHold}[((_gcState+0x10))+(0x0)] -> %rbp (reserved)  */
        movq (_gcState+0x0)(%rip),%r12
        movq (_gcState+0x10)(%rip),%rbp
/* ************************************************************ */
/* XmmCache: caches:  */
/* ************************************************************ */
/* Cache: caches: MEM<l>{StaticTemp}[(_cReturnTemp)+(0x0)] -> %eax (reserved)  */
/* ************************************************************ */
/* Force: commit_memlocs: commit_classes: GCStateVolatile GCState CStatic Globals Stack Heap Code CStack remove_memlocs: remove_classes: dead_memlocs: dead_classes:  */
/* ************************************************************ */
/* jmp _L_176133 */
        jmp _L_176133


Re: Crashes with 64-bit native code generator on Windows

by Matthew Fluet-4

On Mon, Nov 30, 2009 at 12:45 PM, David Hansel <hansel@...> wrote:
> See the code below.  It should match up with the code I posted before.
> From what I can tell it does look like MLton puts the target address for
> applyFFTempFun into a register but then later does the indirect call via
> the memory location.

Yes, it looks like it gets dropped into %r11, but not used from that location.
 
> Please let me know if you need any more context or other debugging
> information.  It does seem like you are on the right track.

We need to find out when the codegen loses track of the fact that %r11 has applyFFTempFun.  Could you compile with "-native-commented 6"?  That's the most debugging information that we can get from a precompiled MLton binary.  It produces a *lot* of debugging information (in the form of comments in the assembly).  Rather than posting to the mailing list, I suggest posting the basic block to http://mlton.org/TemporaryUpload.

-Matthew


Re: Crashes with 64-bit native code generator on Windows

by David Hansel

Hello Matthew,

I tried using "-native-commented 6" but (due to the size of the code involved)
compilation (in the "outputAssembly" stage) seems to take a VERY long time.
I also tried "-native-commented 5" with the same result.  A setting of "4"
worked much faster and I have uploaded a file hansel-20091130-1.s containing
the basic block.

I will try the others again but the "4" setting produced about 1200 .s output
files and with the "6" setting MLton produced the first (.0.s) output file
and then I stopped it after about an hour of not producing any more output.
I will try again but unless the process gets much faster after the second file
I don't think we'll get to output file #913 in any reasonable amount of time.

Are there ways to restrict the additional output to specific parts of the code?
Unfortunately,  I can't cut down the code much without the problem going away.

David



--
  ----------------------------------------------------------
  David Hansel
  Chief Technology Officer -- Reactive Systems, Inc.
  http://www.reactive-systems.com/
  (919) 324-3507 ext. 102 -- hansel@...
  OpenPGP (GnuPG) public key file:
  http://www.reactive-systems.com/~hansel/pgp_public_key.txt
  ----------------------------------------------------------


Re: Crashes with 64-bit native code generator on Windows

by Matthew Fluet-4

On Mon, Nov 30, 2009 at 4:19 PM, David Hansel <hansel@...> wrote:
> I tried using "-native-commented 6" but (due to the size of the code involved)
> compilation (in the "outputAssembly" stage) seems to take a VERY long time.
> I also tried "-native-commented 5" with the same result.  A setting of "4"
> worked much faster and I have uploaded a file hansel-20091130-1.s containing
> the basic block.

That seems to be enough to provide a hint.  I think that the issue is that the function address got placed in %r11, which is a caller-save register.  The contents of caller-save registers are pushed to memory immediately before the call instruction (for any register whose content is live after the call), and the registers are purged from the register allocation.  Of course, the function address is still live *at* the call instruction, although it is not live after the call instruction.  Small examples seem to favor %r15 as the register into which the function address is placed; %r15 is not caller-save, and so is not susceptible to this issue.  This also fits with small changes near the indirect function call eliminating the segfault: such changes alter the liveness and the registers used, and presumably the function address then gets stored in a non-caller-save register.  If this is indeed the source of the issue, then it is simply a native amd64 codegen bug (and, possibly, a latent x86 codegen bug as well) and is independent of the target OS; that is, it is not mingw specific.
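
The hypothesis above can be illustrated with a small toy model.  This is only a sketch and is NOT MLton's register allocator: the function name emit_indirect_call, the CALLER_SAVE set, and the pseudo-assembly strings are assumptions made up for illustration.  It shows how a value that is live at the call but not after it can be purged from a caller-save register without being saved, so that the call falls back to the memory operand.

```python
# Toy model of the hypothesis in this message; it is NOT MLton's allocator.
# The register set and every name below are illustrative assumptions.
CALLER_SAVE = {"%rax", "%rcx", "%rdx", "%r8", "%r9", "%r10", "%r11"}


def emit_indirect_call(target, cached_in, live_after_call):
    """Emit pseudo-assembly for an indirect call through `target`.

    cached_in maps a memory location name to the register currently holding
    its value; live_after_call names the locations needed after the call.
    """
    cached = dict(cached_in)  # what the allocator still tracks
    asm = []
    # Step 1: purge caller-save registers before the call, writing back
    # only those values that are still live after the call returns.
    for loc, reg in sorted(cached_in.items()):
        if reg in CALLER_SAVE:
            if loc in live_after_call:
                asm.append(f"movq {reg},({loc})(%rip)")
            del cached[loc]
    # Step 2: choose the call operand from what is still tracked.
    if target in cached:
        asm.append(f"call *{cached[target]}")
    else:
        asm.append(f"call *({target})(%rip)")
    return asm


# Address cached in caller-save %r11: purged without a write-back.
bad = emit_indirect_call("_applyFFTempFun", {"_applyFFTempFun": "%r11"}, set())
# Address cached in %r15, which is not caller-save: register is used directly.
good = emit_indirect_call("_applyFFTempFun", {"_applyFFTempFun": "%r15"}, set())
print(bad)
print(good)
```

In this model the %r11 case emits only a call through the memory location, with no preceding store of the register to that location, while the %r15 case calls through the register.  Whether the real codegen behaves this way is exactly what the hypothesis asserts; the model only makes the reasoning concrete.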



Re: Crashes with 64-bit native code generator on Windows

by David Hansel

Hi,

Just in case someone comes across this thread in the future:
Matthew Fluet fixed the problem that was causing these crashes
in r7368.

Many thanks,

David



--
  ----------------------------------------------------------
  David Hansel
  http://www.reactive-systems.com/
  OpenPGP (GnuPG) public key file:
  http://www.reactive-systems.com/~hansel/pgp_public_key.txt
  ----------------------------------------------------------
