32+64 bit multiplication with MPY32 on MSP430F54xx

In the last days I did some experiments with the MSP430F5438 and its 32 bit hardware multiplier.
The compiler (non-X build from 12/2008) generates proper inline code for 16x16 bit multiplications. But for 32x32 and 64x64 bit it refers to two external functions which are built into mspgcclib and implicitly linked from there. In these functions, the address of the hardware multiplier registers is hardcoded to the wrong base address, and only the 16x16 bit registers are used.
In the 32x32 bit case the result is useless.
Also, the 64x64 bit multiplication is done in software only and does not use the (16 bit) hardware multiplier at all.

Luckily you can override these functions with your own versions. The source code follows below. Just put it somewhere in one of your C files.

One unfortunate thing is that, by the C language rules, the result of a multiplication has the type of its larger operand (after the usual arithmetic conversions). So a 32x32 bit multiplication has a 32 bit result, even though the hardware multiplier would be able to deliver the full 64 bit result instantly. The same holds for 16x16 bit multiplications (int is 16 bit on the MSP430, and 8 bit operands are first promoted to int).
If you want a result that is bigger than your operands, you have to typecast one of the operands to that bigger size too.
As a result of these language rules, the compiler will use the 'bigger' multiplication function even if it is not necessary at all, leading to completely unnecessary pushing-around of registers that were just zeroed, etc.
You can't get the full power out of the MPY32 module in C because of this.

So if you need to multiply two 16 bit values and need a 32 bit result (or 32x32 with a 64 bit result), you should consider writing your own multiply function and calling it as a function. It will be much faster and smaller than using the '*' operator with one of the operands typecast.
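A minimal portable C sketch of the promotion rule described above. The helper names are hypothetical, and the bodies are plain C rather than MPY32 register accesses; on the MSP430F5438 a hand-written version would presumably load MPY32L/MPY32H and OP2L/OP2H and read the result back from RES0..RES3.

```c
#include <stdint.h>

/* Hypothetical helpers, portable C for illustration only (no MPY32 access). */

/* Both operands are uint32_t, so the product has type uint32_t and the
   upper 32 bits are discarded (unsigned arithmetic wraps modulo 2^32). */
static uint32_t mul32x32_trunc(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Casting one operand to uint64_t converts the other one as well,
   so the full 64 bit product is kept. */
static uint64_t mul32x32_64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 16x16 -> 32: with a 16 bit int (as on the MSP430) the cast is needed to
   keep the upper 16 bits; on a host with 32 bit int it also avoids signed
   int overflow after integer promotion of the uint16_t operands. */
static uint32_t mul16x16_32(uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;
}
```

Wrapping the widening multiply in a small function keeps the cast in one place; whether the compiler then emits a cheap hardware multiply or calls the generic wider routine is exactly the problem described above.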
For a 32x32 bit multiplication with a 64 bit result, the code is basically the same as the __umulsi3hw function below, except that RES2 and RES3 are returned as well.

Also, the compiler does not disable interrupts when calling the 64x64 multiplication function, as this function originally does not use the multiplier hardware. So this had to be added to the function itself. For 32x32 bit, the compiler disables interrupts inline, so it is not necessary to do so inside the function.

The 32x32 bit multiplication takes 26 bytes and executes in 22 cycles; the 64x64 bit multiplication is 96 bytes and executes in 87 cycles.
The compiler-generated inline code for calling the 64x64 bit multiplication with two volatile 64 bit variables as operands and one volatile 64 bit variable as destination adds another 62 bytes/54 cycles, but this varies depending on type and location of source and destination and on the optimisation level.

Have fun,
JMGross

void __attribute__ ((naked)) __umulsi3hw (void) {
    __asm__ __volatile__ ( "mov r10, %0 ":"=m" (MPY32L): );
    __asm__ __volatile__ ( "mov r11, %0 ":"=m" (MPY32H): );
    __asm__ __volatile__ ( "mov r12, %0 ":"=m" (OP2L): );
    __asm__ __volatile__ ( "mov r13, %0 ":"=m" (OP2H): );
    __asm__ __volatile__ ( "mov %0, r14 "::"m" (RES0) );
    __asm__ __volatile__ ( "mov %0, r15 "::"m" (RES1) ); // >8 cycles delay
    __asm__ __volatile__ ( "ret ":: );
}

void __attribute__ ((naked)) __muldi3 (void) { // ARG1*ARG2 mod 2^64 = ARG1L*ARG2L + ((ARG1L*ARG2H + ARG1H*ARG2L) << 32)
    __asm__ __volatile__ ( "push r2 ":: );  // save interrupt state
    __asm__ __volatile__ ( "dint ":: );     // disable interrupts (the compiler does not do it for this call)

    __asm__ __volatile__ ( "mov 4+0(r1), %0 ":"=m" (MPY32L): ); // 4 is the stack offset of the ARG2 LSW (past saved R2 and return address)
    __asm__ __volatile__ ( "mov 4+2(r1), %0 ":"=m" (MPY32H): );
    __asm__ __volatile__ ( "mov r12, %0 ":"=m" (OP2L): );       // ARG2L * ARG1L
    __asm__ __volatile__ ( "mov r13, %0 ":"=m" (OP2H): );

    __asm__ __volatile__ ( "push %0 "::"m" (RES0) );            // low 32 bits of the result to the stack
    __asm__ __volatile__ ( "push %0 "::"m" (RES1) );

    __asm__ __volatile__ ( "mov %1, %0 ":"=m" (RES0):"m" (RES2)); // move bits 32..63 into bits 0..31 for the next operations
    __asm__ __volatile__ ( "mov %1, %0 ":"=m" (RES1):"m" (RES3));

    __asm__ __volatile__ ( "mov 8+0(r1), %0 ":"=m" (MAC32L): ); // + ARG2L * ARG1H; ARG2 is now 4 bytes further up (the 2 pushes above)
    __asm__ __volatile__ ( "mov 8+2(r1), %0 ":"=m" (MAC32H): );
    __asm__ __volatile__ ( "mov r14, %0 ":"=m" (OP2L): );
    __asm__ __volatile__ ( "mov r15, %0 ":"=m" (OP2H): );

    __asm__ __volatile__ ( "mov 8+4(r1), %0 ":"=m" (MAC32L): ); // + ARG2H * ARG1L
    __asm__ __volatile__ ( "mov 8+6(r1), %0 ":"=m" (MAC32H): );
    __asm__ __volatile__ ( "mov r12, %0 ":"=m" (OP2L): );
    __asm__ __volatile__ ( "mov r13, %0 ":"=m" (OP2H): );       // ARG1H*ARG2H only affects bits 64 and up, so skip it

    __asm__ __volatile__ ( "mov %0, r14 "::"m" (RES0) );        // low 32 bits of the sum are the high 32 bits of the result
    __asm__ __volatile__ ( "mov %0, r15 "::"m" (RES1) );

    __asm__ __volatile__ ( "pop r13 ":: );  // low 32 bits of the result from the stack
    __asm__ __volatile__ ( "pop r12 ":: );
    __asm__ __volatile__ ( "pop r2 ":: );   // restore interrupt state
    __asm__ __volatile__ ( "ret ":: );
}

------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Mspgcc-users mailing list
Mspgcc-users@...
https://lists.sourceforge.net/lists/listinfo/mspgcc-users
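The arithmetic behind __muldi3 can be checked on a host with a portable C model. This is a sketch only: mul64_via_32x32 is a hypothetical name, each 32x32 product stands in for one MPY32 or MAC32 load sequence, and it is not the mspgcc library code. It shows why three partial products are enough for a result taken modulo 2^64.

```c
#include <stdint.h>

/* Hypothetical portable model of the __muldi3 sequence: x plays the role of
   ARG1 (r12..r15), y the role of ARG2 (on the stack). */
static uint64_t mul64_via_32x32(uint64_t x, uint64_t y)
{
    uint32_t xl = (uint32_t)x, xh = (uint32_t)(x >> 32);
    uint32_t yl = (uint32_t)y, yh = (uint32_t)(y >> 32);

    /* Step 1: full 64 bit product of the low words (MPY32 = ARG2L, OP2 = ARG1L). */
    uint64_t p = (uint64_t)xl * yl;
    uint32_t lo  = (uint32_t)p;          /* RES0/RES1, pushed to the stack      */
    uint32_t acc = (uint32_t)(p >> 32);  /* RES2/RES3 moved into RES0/RES1      */

    /* Steps 2 and 3: only the low 32 bits of each cross product can reach
       bits 32..63, so 32 bit wrap-around arithmetic is sufficient here. */
    acc += xh * yl;                      /* MAC: ARG2L * ARG1H */
    acc += xl * yh;                      /* MAC: ARG2H * ARG1L */

    /* xh * yh would only affect bits 64..127, so it is skipped. */
    return ((uint64_t)acc << 32) | lo;
}
```

Comparing this model against the compiler's own 64 bit '*' for a few operand pairs is a cheap way to gain confidence in the register-level version before running it on the target.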
Re: 32+64 bit multiplication with MPY32 on MSP430F54xx

hi jmgross,

after spending a couple of days trying to untangle the ti documentation on the new bsl format for this mcu (f5438) -- ti is amazingly vague on this basic subject -- i thought that since you are already using it, i might ask you what tool you're using to flash it. could you please provide some guidance? i'm afraid that i'm addicted to the wonderfully simple bsl.py.

thanks in advance,
steve ayer