I need your help :-(


I need your help :-(

by Laurent Michel

I found another problem, this time with the calling convention when using 4 args or more.

Let me show you:

I JIT one of my instructions as follows:

0   jit->prepare(4);
1   jit->movi_i(CJIT_R0,(_nbf+1) * sizeof(ColSlotI));  
2   jit->movi_i(CJIT_R1,_nbl*sizeof(ColSlotI));           
3   jit->movi_p(CJIT_R2,_codeDesc);                        
4   jit->pusharg_p(CJIT_V0);
5   jit->pusharg_p(CJIT_R0);
6   jit->pusharg_p(CJIT_R1);
7   jit->pusharg_p(CJIT_R2);
8   jit->finish(table->getPushFrame());
9   jit->retval(CJIT_V0);

So, it is a call to a function (generated by lightning as well).

It takes 4 arguments: R0/R1 are integers while V0 and R2 are pointers.

Here is the assembly it generates:

(gdb)  x/20i $rip
0x100030c359: mov    $0x20,%eax
0x100030c35e: xor    %ecx,%ecx
0x100030c360: mov    $0x10002c3aa8,%rdx
0x100030c36a: mov    %rbx,%r8
0x100030c36d: mov    %rax,%r9
0x100030c370: mov    %rcx,%r10
0x100030c373: mov    %rdx,%r11
0x100030c376: mov    $0x10002ffff8,%r11
0x100030c380: mov    %r11,%rdi
0x100030c383: mov    %r10,%rsi
0x100030c386: mov    %r8,%rdx
0x100030c389: rex.WB callq  *%r11
0x100030c38c: mov    %eax,%ebx

It initializes eax and ecx (instructions 1/2), then loads rdx (R2, instruction 3). Then come the 4 pushargs, which are supposed to load the input registers with the desired values (let's ignore the missing 32-bit to 64-bit conversion for now, I know how to fix that). Lightning first moves the registers into "temporary" ones, starting with R8, so 4 pushargs use R8, R9, R10 and R11.

Then the CALL instruction of lightning always emits the (fixed) callee address into a register with a mov (register R11), which of course destroys what I had in R11 (a.k.a. rdx). Next, the registers are "shifted" back from the temp locations to the input registers: R11 is written to RDI, R10 to RSI and R8 to RDX. We don't move all the registers either (only 3 movs when there are 4 arguments), so something may be off here too.
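The sequence described above can be replayed with a small register-file model. This is only a sketch: the `Machine` struct, the function names, the argument values and the callee address `0xCA11` are hypothetical and not part of lightning. The logic mirrors the pre-patch `jit_pusharg_i`, `jit_finish` and `jit_shift_args` macros.

```c
#include <assert.h>

/* x86-64 hardware register numbers (only the ones used here are named). */
enum { RAX = 0, RCX = 1, RDX = 2, RBX = 3, RSI = 6, RDI = 7,
       R8 = 8, R9 = 9, R10 = 10, R11 = 11, NREGS = 16 };

typedef struct {
    long reg[NREGS];   /* simulated register file */
    int  argssize;     /* number of pushargs seen so far */
} Machine;

void mov_rr(Machine *m, int src, int dst) {
    assert(src >= 0 && src < NREGS && dst >= 0 && dst < NREGS);
    m->reg[dst] = m->reg[src];
}

/* pusharg: park the n-th pushed value in temporary R8 + (n - 1). */
void old_pusharg(Machine *m, int rs) {
    m->argssize++;
    mov_rr(m, rs, R8 + m->argssize - 1);
}

/* finish: load the callee address into R11, then shift at most three
   temporaries into RDI, RSI and RDX. */
void old_finish(Machine *m, long callee) {
    m->reg[R11] = callee;                 /* clobbers the 4th temporary */
    if (m->argssize--) {
        mov_rr(m, R8 + m->argssize, RDI);
        if (m->argssize--) {
            mov_rr(m, R8 + m->argssize, RSI);
            if (m->argssize--)
                mov_rr(m, R8, RDX);
        }
    }
}

/* Replays the 4-argument call from the listing with made-up values. */
Machine replay_old_call(void) {
    Machine m = { {0}, 0 };
    m.reg[RBX] = 10;   /* V0 */
    m.reg[RAX] = 20;   /* R0 */
    m.reg[RCX] = 30;   /* R1 */
    m.reg[RDX] = 40;   /* R2 */
    old_pusharg(&m, RBX);
    old_pusharg(&m, RAX);
    old_pusharg(&m, RCX);
    old_pusharg(&m, RDX);
    old_finish(&m, 0xCA11);
    return m;
}
```

In the machine returned by `replay_old_call()`, RDI holds the callee address rather than R2's value, and R0's copy in R9 is never moved to an argument register.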

The most striking issue, though, is that using R11 as a dedicated register to load the address of the callee interferes with functions that have 4 input arguments. There is no provision for that now. It seems that the best lightning can do (safely) is 3 arguments. I looked up the Intel documentation and this bit:


indicates that CALL r/m64 should be possible (calling with an immediate that is 64 bits wide).

I'm still too fresh on the low-level instruction encoding to fix that though. (Getting rid of the use of R11 as a temp for the callee address). 

I figured that we ought to add a macro similar to:
#define CALLQsr(R) (_REXQrr(0, R), _O_Mrm (0xff ,_b11,_b010,_r8(R) ))


with a _REXQrm (rather than rr), but I have no idea how to twiddle the bits of the instruction to state that the operand is a 64-bit wide immediate.
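For reference, here is a minimal sketch of the bytes that a register-indirect near call (opcode 0xFF with reg field /2 and mod = 11, the same fields the macro above passes to `_O_Mrm`) assembles to. The function name `encode_call_reg` is hypothetical and is not a lightning macro. As far as I know, x86-64 has no direct call form that takes a 64-bit immediate; the direct form takes a 32-bit relative displacement, so a far-away absolute target has to go through a register or memory operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes `call *%reg` for hardware register number reg (0..15) into out
   and returns the number of bytes written (2 or 3). */
size_t encode_call_reg(uint8_t *out, int reg) {
    size_t n = 0;
    assert(reg >= 0 && reg <= 15);
    if (reg >= 8)
        out[n++] = 0x41;                 /* REX.B: extends ModRM.rm to r8..r15 */
    out[n++] = 0xFF;                     /* opcode group shared by CALL/JMP/... */
    out[n++] = (uint8_t)(0xC0            /* mod = 11: register operand */
                         | (2 << 3)      /* reg = 010: selects CALL (FF /2) */
                         | (reg & 7));   /* rm = low three bits of the register */
    return n;
}
```

For R11 this yields the three bytes 41 FF D3. The gdb listing shows `rex.WB` (0x49) instead; the extra W bit should be redundant, because near calls already default to a 64-bit operand size in 64-bit mode.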

Any help with this is greatly appreciated, as always!

(I have some patches for small things that I'll pass along as well, as soon as I'm done with this.)

PS: Getting to 4 args would be enough for me, but lightning has a more flexible API, and x86_64 allows an arbitrary number of args by spilling onto the stack.

--
  Laurent







_______________________________________________
Lightning mailing list
Lightning@...
http://lists.gnu.org/mailman/listinfo/lightning


Re: I need your help :-(

by Paolo Bonzini-2


> I'm still too fresh on the low-level instruction encoding to fix that
> though. (Getting rid of the use of R11 as a temp for the callee address).

That is one way to go, but what about the attached patch instead, or
something similar?

Since we now use R12 and R13 for V1 and V2, we can use RSI and RDI for
the arguments directly.  Only the third and fourth arguments have to be
saved in R8D/R9D until the call (and shifted to RDX/RCX just before).
The patch supports 6 arguments only, but it wouldn't be hard to spill to
the stack early.
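A rough model of the scheme described above, limited to four arguments. The `Machine` struct, the function names and the values are hypothetical; only the register choices come from the description. The `>= 3` and `>= 4` thresholds follow the prose (third and fourth argument) and are an assumption of this sketch, not a copy of the macro in the patch.

```c
#include <assert.h>

/* x86-64 hardware register numbers (only the ones used here are named). */
enum { RAX = 0, RCX = 1, RDX = 2, RBX = 3, RSI = 6, RDI = 7,
       R8 = 8, R9 = 9, R10 = 10, R11 = 11, NREGS = 16 };

typedef struct {
    long reg[NREGS];   /* simulated register file */
    int  argssize;     /* number of pushargs seen so far */
} Machine;

/* Where each pushed argument waits until the call. */
static const int arg_reg_temp[] = { RDI, RSI, R8, R9 };

void mov_rr(Machine *m, int src, int dst) {
    assert(src >= 0 && src < NREGS && dst >= 0 && dst < NREGS);
    m->reg[dst] = m->reg[src];
}

/* pusharg: args 1 and 2 go straight to RDI/RSI, args 3 and 4 wait in
   R8/R9 so that RDX/RCX stay usable as JIT registers until the call. */
void new_pusharg(Machine *m, int rs) {
    m->argssize++;
    assert(m->argssize <= 4);            /* sketch handles 4 args at most */
    mov_rr(m, rs, arg_reg_temp[m->argssize - 1]);
}

/* finish: shift the parked arguments first, load R11 last. */
void new_finish(Machine *m, long callee) {
    if (m->argssize >= 3) mov_rr(m, R8, RDX);
    if (m->argssize >= 4) mov_rr(m, R9, RCX);
    m->reg[R11] = callee;                /* no argument lives in R11 here */
}

/* Replays a 4-argument call with made-up values. */
Machine replay_new_call(void) {
    Machine m = { {0}, 0 };
    m.reg[RBX] = 10;   /* V0 */
    m.reg[RAX] = 20;   /* R0 */
    m.reg[RCX] = 30;   /* R1 */
    m.reg[RDX] = 40;   /* R2 */
    new_pusharg(&m, RBX);
    new_pusharg(&m, RAX);
    new_pusharg(&m, RCX);
    new_pusharg(&m, RDX);
    new_finish(&m, 0xCA11);
    return m;
}
```

In this model all four values reach RDI, RSI, RDX and RCX, and the callee address in R11 no longer overwrites a parked argument because it is loaded after the shift.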

> I figured that we ought to add a macro similar to:
> #define CALLQsr(R) (_REXQrr(0, R), _O_Mrm (0xff ,_b11,_b010,_r8(R) ))

This would be "call *(imm)", i.e. an indirect call.  I'll check what
binutils produce so that we can replicate that in lightning.

Paolo


diff --git a/lightning/i386/core-64.h b/lightning/i386/core-64.h
index e19789b..9b036cd 100644
--- a/lightning/i386/core-64.h
+++ b/lightning/i386/core-64.h
@@ -36,8 +36,7 @@
 
 /* Used to implement ldc, stc, ... */
 #define JIT_CAN_16 0
-#define JIT_CALLTMPSTART 0x48
-#define JIT_REXTMP       0x4B
+#define JIT_REXTMP _R11D
 
 #define JIT_V_NUM               3
 #define JIT_V(i)                ((i) == 0 ? _EBX : _R11D + (i))
@@ -127,25 +126,20 @@ struct jit_local_state {
 /* Stack isn't used for arguments: */
 #define jit_prepare_i(ni) (_jitl.argssize = 0)
 
-#define jit_pusharg_i(rs) (_jitl.argssize++, MOVQrr(rs, JIT_CALLTMPSTART + _jitl.argssize - 1))
-#define jit_finish(sub)         (MOVQir((long) (sub), JIT_REXTMP), \
- jit_shift_args(), \
+#define jit_pusharg_i(rs) (_jitl.argssize++, MOVQrr(rs, jit_arg_reg_temp[_jitl.argssize - 1]))
+#define jit_finish(sub)         (jit_shift_args(), \
+ MOVQir((long) (sub), JIT_REXTMP), \
  CALLsr(JIT_REXTMP))
-#define jit_reg_is_arg(reg)     ((reg == _EDI) || (reg ==_ESI) || (reg == _EDX))
+#define jit_reg_is_arg(reg)     ((reg) == _ECX || (reg) == _EDX)
 #define jit_finishr(reg) ((jit_reg_is_arg((reg)) ? MOVQrr(reg, JIT_REXTMP) : (void)0), \
                                  jit_shift_args(), \
                                  CALLsr(jit_reg_is_arg((reg)) ? JIT_REXTMP : (reg)))
 
 #define jit_shift_args() \
-   (_jitl.argssize--  \
-    ? ((void)MOVQrr(JIT_CALLTMPSTART + _jitl.argssize, jit_arg_reg_order[0]), \
-       (_jitl.argssize--  \
-        ? ((void)MOVQrr(JIT_CALLTMPSTART + _jitl.argssize, jit_arg_reg_order[1]), \
-           (_jitl.argssize--  \
-            ? (void)MOVQrr(JIT_CALLTMPSTART, jit_arg_reg_order[2])      \
-            : (void)0)) \
-        : (void)0)) \
-    : (void)0)
+   ((_jitl.argssize >= 2 ? (void) (MOVQrr(_R8D, _RDX)) : (void) 0), \
+    (_jitl.argssize >= 3 ? (void) (MOVQrr(_R9D, _RCX)) : (void) 0), \
+    (_jitl.argssize >= 4 ? (void) (PUSHQr(_R10D)) : (void) 0), \
+    (_jitl.argssize >= 5 ? (void) (PUSHQr(_R11D)) : (void) 0))
 
 #define jit_retval_l(rd) ((void)jit_movr_l ((rd), _EAX))
 #define jit_arg_c()        (jit_arg_reg_order[_jitl.nextarg_geti++])
@@ -158,6 +152,8 @@ struct jit_local_state {
 #define jit_arg_ul()        (jit_arg_reg_order[_jitl.nextarg_geti++])
 #define jit_arg_p()        (jit_arg_reg_order[_jitl.nextarg_geti++])
 #define jit_arg_up()        (jit_arg_reg_order[_jitl.nextarg_geti++])
+
+static int jit_arg_reg_temp[] = { _EDI, _ESI, _R8D, _R9D, _R10D, _R11D };
 static int jit_arg_reg_order[] = { _EDI, _ESI, _EDX, _ECX };
 
 #define jit_negr_l(d, rs) jit_opi_((d), (rs), NEGQr(d), (XORQrr((d), (d)), SUBQrr((rs), (d))) )


Re: I need your help :-(

by Paolo Bonzini-2



I committed something similar to the patch I had posted before (except
that this has actually been tested, and works).  It supports up to 6
arguments, all passed in registers.

Paolo


