Function Address fixup missing?


Function Address fixup missing?

by Mau Liste

Dear Sirs,
I have a problem that, after studying the disassembly and map files at
length, I cannot explain as anything other than a compiler or linker bug.

This problem appears in a large program.
I tried to build a smaller version that reproduces it, without success:
every smaller version I tried seems to work.
So I am sorry that I cannot post the full program here, but I will try
to give all the needed information, hoping for some help.
If needed, I can send more information or the complete code.
Thank you in advance for any help or workaround.

I have a program for the ATmega2561 that is larger than 128K and that makes
use of function pointers. The compiler is WinAVR as of March 13, 2009.

As I understand it, when a function body is in the upper 128K and the
address of this function is taken, the compiler (or linker) generates a
small stub in the "trampoline" area in lower memory that contains a jump
to the function body.
The "address of" operator then returns the address of this stub rather
than the address of the function itself.
This way, since the EIND register is always zero, an indirect call
through the pointer (only 16 bits) ends up at the stub, which in turn
makes a full jump to the function.
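A host-side C model of the scheme described above may make it concrete. This is a sketch under my own assumptions, not avr-gcc or avr-ld code: the helper names are hypothetical, addresses are flash byte addresses, and the value held by a 16-bit function pointer is the word address (byte address / 2).

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model (NOT toolchain code) of the trampoline scheme.
   Names are hypothetical; addresses are flash BYTE addresses. */
#define LOW_FLASH_LIMIT 0x20000UL   /* 128 KiB: reach of a 16-bit word pointer */

/* Value stored in a 16-bit function pointer for a function at func_addr,
   given the byte address of its trampoline stub (ignored if not needed). */
static uint16_t function_pointer_value(uint32_t func_addr, uint32_t stub_addr)
{
    assert(func_addr % 2 == 0 && stub_addr % 2 == 0);  /* code is word aligned */
    if (func_addr < LOW_FLASH_LIMIT)
        return (uint16_t)(func_addr / 2);   /* reachable directly */
    assert(stub_addr < LOW_FLASH_LIMIT);    /* stubs must live in low flash */
    return (uint16_t)(stub_addr / 2);       /* point at the stub instead */
}

/* Word address EICALL jumps to: EIND supplies the upper bits, Z the low 16. */
static uint32_t eicall_target_word(uint8_t eind, uint16_t z)
{
    return ((uint32_t)eind << 16) | z;
}
```

For the function at 0x20E8A with its stub at 0x3E50, the model yields a pointer value of 0x1F28, and EIND:Z with EIND = 0 lands back on the stub.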

If this is correct (and I have verified that it is with smaller
programs), then I don't understand the following results:

I have a function in upper 128K:
int ButtamiViaSubito(void) {
  return 3;
}

In main (in lower memory) I take the address and then I call the
function:

extern int ButtamiViaSubito(void);
typedef int (*PuntaAButtami)(void);
PuntaAButtami puntatore;

int main(void)
{
  ...
  puntatore = ButtamiViaSubito;    // (1)
  ...
  int butta = puntatore();         // (2)
  ...
}

In the disassembly of the full program (from the ELF file) I find:

In the trampoline area:
00003e50 <__trampolines_start>:
    3e50: 0d 94 45 07 jmp 0x20e8a ; 0x20e8a <test+0x9e>
which looks correct: the function is at 20e8a.
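To double-check the trampoline entry, the JMP can be decoded by hand. The sketch below assumes the standard 32-bit JMP encoding from the AVR instruction set manual and the little-endian byte order of the dump ("0d 94 45 07" is the words 0x940d, 0x0745); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decode AVR JMP: 1001 010k kkkk 110k  kkkk kkkk kkkk kkkk,
   where k is a 22-bit WORD address. Returns a BYTE address,
   which is what objdump prints. */
static uint32_t decode_jmp_target(uint16_t w1, uint16_t w2)
{
    assert((w1 & 0xFE0E) == 0x940C);                   /* JMP opcode pattern */
    uint32_t k = ((uint32_t)((w1 >> 4) & 0x1F) << 17)  /* k21..k17 */
               | ((uint32_t)(w1 & 0x1) << 16)          /* k16 */
               | w2;                                   /* k15..k0 */
    return k * 2;                                      /* word -> byte */
}
```

Applied to the stub above it gives 0x20E8A, matching the function address.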

Instruction (1) is:
puntatore = ButtamiViaSubito;
    3efc: 80 e0       ldi r24, 0x00 ; 0
    3efe: 90 e0       ldi r25, 0x00 ; 0
    3f00: 90 93 82 0b sts 0x0B82, r25
    3f04: 80 93 81 0b sts 0x0B81, r24
This looks wrong. The two ldi instructions should load 0x1F28, which is
the word address of the trampoline (byte address 0x3E50 / 2). It looks
as if the linker never fills in this fixup, leaving the value at zero.
The pointer is in RAM at 0x0B81.
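The same kind of check applies to the two ldi instructions. This sketch (hypothetical helper names, standard LDI encoding assumed) recovers the 16-bit value they load so it can be compared with what a correct fixup should produce for a trampoline at byte address 0x3E50.

```c
#include <assert.h>
#include <stdint.h>

/* Decode AVR "LDI Rd, K": 1110 KKKK dddd KKKK, with Rd = r16 + dddd.
   The word is the little-endian pair from the dump ("80 e0" -> 0xe080). */
struct ldi { unsigned reg; uint8_t k; };

static struct ldi decode_ldi(uint16_t w)
{
    assert((w & 0xF000) == 0xE000);                     /* LDI opcode pattern */
    struct ldi r;
    r.reg = 16u + ((w >> 4) & 0xF);
    r.k = (uint8_t)(((w >> 4) & 0xF0) | (w & 0x0F));    /* K high | K low */
    return r;
}

/* 16-bit value built by "ldi r24, lo" followed by "ldi r25, hi". */
static uint16_t pointer_from_ldi_pair(uint16_t lo_insn, uint16_t hi_insn)
{
    struct ldi lo = decode_ldi(lo_insn), hi = decode_ldi(hi_insn);
    assert(lo.reg == 24 && hi.reg == 25);
    return (uint16_t)((hi.k << 8) | lo.k);
}
```

Decoding `80 e0` / `90 e0` yields 0x0000, whereas `ldi r24, 0x28` / `ldi r25, 0x1F` (words 0xE288 / 0xE19F) would yield the expected 0x1F28.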

Instruction (2), by contrast, looks correct:
    int butta = puntatore();
    3ffe: e0 91 81 0b lds r30, 0x0B81
    4002: f0 91 82 0b lds r31, 0x0B82
    4006: 19 95       eicall


The same seems to happen inside the library as well, around the fputc
function, which I assume uses function pointers.
Thanks again for any help.

Regards.
Mau.



_______________________________________________
AVR-GCC-list mailing list
AVR-GCC-list@...
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list

RE: Function Address fixup missing?

by Stu Bell

> I have a program for the ATmega2561 that is larger than 128K
> and that makes use of function pointers. The compiler is
> WinAVR as of March 13, 2009.
>
> As I understood when a function body is in the upper 128K,
> and the address of this function is taken, the compiler (or
> linker) generates a small stub in the "trampoline" area in
> lower memory that contains a jump to the function body.
> The "address of" operator then returns the address of this
> stub, rather than the address of the function itself.
> This way an indirect jump through the pointer (only 16 bits)
> ends up, being the EIND register always zero, to the stub,
> which in turn makes a full jump to the function.

Trampolines work only for statically linked functions, not function
pointers.

This is a known bug and cannot easily be fixed.

The root of the problem is that GCC's architecture does not lend itself
to 24-bit entities.  As you noted, function pointers are 16 bits, which
limits the range to 128K of flash (since all AVR instructions are 2
bytes wide, the program counter addresses 2 byte words).
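The 128K figure follows directly from word addressing. A small host-side sketch of the arithmetic (macro and function names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 16-bit function pointer holds a WORD address, so it can name
   2^16 words = 2^17 bytes = 128 KiB of flash (host-side arithmetic only). */
#define WORDS_REACHABLE (1UL << 16)
#define BYTES_REACHABLE (WORDS_REACHABLE * 2)   /* 0x20000 */

/* True if a function at this flash byte address can be named directly
   by a 16-bit function pointer, i.e. without a trampoline. */
static bool reachable_by_16bit_pointer(uint32_t byte_addr)
{
    assert(byte_addr % 2 == 0);                 /* code is word aligned */
    return byte_addr / 2 < WORDS_REACHABLE;
}
```

A function at 0x20E8A (word 0x10745) falls outside this range, while the trampoline area at 0x3E50 is well inside it.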

GCC could be modified to run with 32-bit (4 byte) pointers, but the
ATtiny folks would yell like mashed cats if that were done universally.
I suppose that a switch could be placed telling GCC whether to use 16
bit or 32 bit pointers, but then the avr-libc library would need to be
compiled both ways leading to problems of making sure that the correct
library is linked with the correct compiled source.

All of the above could be done.  However, the GCC team are notoriously
low on volunteers.  Interested in taking this project on?  I thought
not.

For the moment, the only solution is to place the target of all function
pointers in the bottom part of the ATmega2560/1's flash.  If you check
my post on FreeRTOS for the ATmega2560/1 on AVR Freaks
(http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=70387),
you will find that I describe how to place function pointer targets in
low flash.
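The linked post is the authoritative description of that method. As a rough, hedged sketch of the general idea only: mark every function whose address is taken so it can be collected into one section that the linker places in low flash. The section name `.lowtext` and the `LOW_FLASH` macro are assumptions for illustration, and the linker script or `--section-start` setup that actually places the section below 0x20000 is not shown.

```c
#include <assert.h>

/* Hypothetical marker for "must live in the low 128 KiB of flash".
   With avr-gcc it puts the function in an assumed section ".lowtext";
   the linker still has to be told to place that section low.
   On a host compiler it expands to nothing so the sketch builds and runs. */
#if defined(__AVR__)
#  define LOW_FLASH __attribute__((section(".lowtext")))
#else
#  define LOW_FLASH
#endif

typedef int (*PuntaAButtami)(void);

/* Every function whose address is taken gets the marker. */
LOW_FLASH int ButtamiViaSubito(void) { return 3; }

static int call_through_pointer(PuntaAButtami p)
{
    assert(p != 0);
    return p();   /* indirect call, as in instruction (2) */
}
```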

I should point out that there *is* another fix for the problem:  I
understand that IAR and CodeVision have compilers capable of handling the
larger ATmega series.  Oh wait, those cost *money*.  Well, Cheap, Fast,
Good: choose two.

Best regards,

Stu Bell
DataPlay (DPHI, Inc.)



Re: Function Address fixup missing?

by Maurizio Ferraris Studio


Stu Bell wrote:
> Trampolines work only for statically linked functions, not function
> pointers.
Sorry, but I don't understand this.
I believe that there is no problem with statically linked functions.
GCC actually generates a call with the full address, and no 16 bit
pointers are involved, as in the following disassembly:

    butta = ButtamiViaSubito();
    4014: 0f 94 f5 06 call 0x20dea ; 0x20dea

Here I am calling the same function directly, and GCC generates the
correct code for the ATmega2561.
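Decoding the CALL above shows why no trampoline is involved: the target word address is encoded in the instruction itself, so neither EIND nor a 16-bit pointer takes part. A sketch assuming the standard 32-bit CALL encoding (function name hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Decode AVR CALL: 1001 010k kkkk 111k  kkkk kkkk kkkk kkkk.
   The 22-bit WORD address is inside the instruction; returns a BYTE address. */
static uint32_t decode_call_target(uint16_t w1, uint16_t w2)
{
    assert((w1 & 0xFE0E) == 0x940E);                   /* CALL opcode pattern */
    uint32_t k = ((uint32_t)((w1 >> 4) & 0x1F) << 17)  /* k21..k17 */
               | ((uint32_t)(w1 & 0x1) << 16)          /* k16 */
               | w2;                                   /* k15..k0 */
    return k * 2;
}
```

For `0f 94 f5 06` (words 0x940f, 0x06f5) this gives 0x20DEA, as in the disassembly.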

As I mentioned in my previous mail, I tried a small program
in which it seems to work.

Here is the main:
    typedef void (*PPro)(void);
    PPro ppro1;

    extern void pro1(void) __attribute__ ((section ("spro1")));

    int main(void)
    {
      ppro1 = pro1;

      pro1();

      ppro1();
      while(1)
        ;
      return(0);
    }

Here is the function:
    __attribute__((section ("spro1"))) void pro1(void)
    {
      return;
    }

In the linker flags I added:
    -section-start=spro1=0x21000
to place the routine in the upper 128K.

and the relevant disassembly is:
...
000000cc <__trampolines_start>:
  cc: 0d 94 00 08 jmp 0x21000 ; 0x21000 <pro1>
...
  ppro1 = pro1;
 116: 86 e6       ldi r24, 0x66 ; 102
 118: 90 e0       ldi r25, 0x00 ; 0
 11a: 90 93 01 02 sts 0x0201, r25
 11e: 80 93 00 02 sts 0x0200, r24
...
  ppro1();
 126: e0 91 00 02 lds r30, 0x0200
 12a: f0 91 01 02 lds r31, 0x0201
 12e: 19 95       eicall
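Here the numbers line up: 0x66 is the word address of the stub at byte 0xCC. The check done by hand in this thread can be written as a one-line predicate (hypothetical name, host-side sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The 16-bit value stored into the function pointer, doubled, should
   equal the byte address of the trampoline stub for the target function. */
static bool pointer_hits_stub(uint16_t stored_value, uint32_t stub_byte_addr)
{
    assert(stub_byte_addr < 0x20000UL);   /* stubs must be in low flash */
    return (uint32_t)stored_value * 2 == stub_byte_addr;
}
```

It accepts the small program (0x0066 against 0xCC) and rejects the large one (0x0000 against 0x3E50).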





RE: Function Address fixup missing?

by Stu Bell

> Stu Bell wrote:
> > Trampolines work only for statically linked functions, not function
> > pointers.
> Sorry, but I don't understand this.
> I believe that there is no problem with statically linked functions.

And that's what I said. There is not a problem with statically linked
functions.

In your first email, you wrote:

> I have a function in upper 128K:
> int ButtamiViaSubito(void) {
>   return 3;
> }
>
> In the main (in lower memory) I take the address and then I call the
> function:
>
> extern int ButtamiViaSubito(void);
> typedef int (*PuntaAButtami)(void);
> PuntaAButtami puntatore;
>
> int main(void)
> {
>   ...
>   puntatore = ButtamiViaSubito;    // (1)
>   ...
>   int butta = puntatore();         // (2)
>   ...
> }

At this point, "puntatore" is a function pointer.  Unless the GCC gods
disagree with me (in which case I have been doing a lot of work for
nothing for the last 2 years), GCC only understands puntatore as a
16-bit entity.  Further, it will *not* use the trampoline for this call.

What sayeth thee, oh GCC gods? :-)

In fact, given that the trampoline is not used for static calls to upper
flash, I am also confused as to why it is not used for function
pointers.  I suspect the problem is that if a function pointer is used
in a routine in upper flash, the 16-bit call would go to the wrong
place.  So, an EICALL must be generated to go to the trampoline, and
EIND must be set correctly for the call to work.  On the other hand, if
the compiler is generating a call to the trampoline which it *knows* is
in lower flash, EIND can always be forced to 0 before the call.  But
then the code needs to be smart enough to reset EIND to what it was
before the function pointer call.  That means that *every* function
pointer call would need to generate instructions to save EIND before the
call and more to restore its state after the call.  This would need to
be done because the compiler (which generates the instructions) has no
idea where the eventual location of the code will be, so it must plan
for the worst.

Generating these instructions would need to be architecture-dependent,
since ATtiny owners would be pissed if the compiler added instructions
for a different architecture that are completely worthless to them.

Sounds like this is a job for a volunteer.  Would you like the job, Mau?

Again, as I said in my first post, you *must* place *all* targets of
function pointers (in this case, the function ButtamiViaSubito()) in
lower flash.  There is currently no other solution in GCC.  Trust me,
I've looked for one.

Again, as I said in my first post, if you look in
http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=70387
you will find that I describe exactly how to place all of these function
pointer targets in lower flash.  It isn't hard and if you choose you can
steal (err, leverage, yeah, that's it, leverage!) my work directly.

I will add one more gotcha here -- I've noticed that ISRs also work
"better" when in lower flash.  Theoretically this is not needed, but I
suspect that the compiler has some assumptions about upper versus lower
flash register states (specifically, EIND) that do not hold when an ISR
is in upper flash.

Sorry about the long reply, but I've spent time fighting this issue and
the results are, well, complicated.

Best regards,

Stu Bell
DataPlay (DPHI, Inc.)




Re: Function Address fixup missing?

by Mau Liste


Stu Bell wrote:
>> Stu Bell wrote:
>>> Trampolines work only for statically linked functions, not function
>>> pointers.
>> Sorry, but I don't understand this.
>> I believe that there is no problem with statically linked functions.
>
> And that's what I said. There is not a problem with statically linked
> functions.

Sorry, I meant that trampolines have nothing to do with direct calls.
As I understand it, the ATmega core has direct call and jump
instructions with the full 24-bit address encoded in the instruction, so
neither a trampoline nor an extension register like EIND is needed. I
also believe that GCC is able to generate correct code for direct calls
and jumps anywhere.


> At this point, "puntatore" is a function pointer.  Unless the GCC gods
> disagree with me (in which case I have been doing a lot of work for
> nothing for the last 2 years), GCC only understands puntatore as a
> 16-bit entity.  Further, it will *not* use the trampoline for this call.
I don't agree with this, but maybe some GCC guru can clarify further.
I'll try to explain why, based on the code I showed you in my last mail.

I believe that GCC assumes two things:
- Register EIND is always zero, and nobody ever touches it
- Trampolines are always in lower flash (<128K)

With these assumptions it is possible to use 16-bit pointers to functions
(the default pointers in AVR-GCC) to reach functions anywhere.

I'll try to explain what I have understood, and please tell me if and
where I am wrong.

When in the code there is a request for the address of a function which
is over 128K, this is what I believe happens:

1) A trampoline in lower memory is generated. This trampoline contains a
single jump to the full 24-bit address of the function.
2) The address of the trampoline is returned instead. This is important,
so I repeat: "The address of the trampoline is returned instead". This
address is still a full 24-bit address, but since the trampoline is in
lower memory, the upper 8 bits are zero. So this pointer is "safely"
stored in a 16-bit variable without losing information.

When, in the code, there is a call to the function through its pointer
(stored in a 16-bit variable), this is what I think happens:

1) GCC generates the load of the pointer contents, which is the low
part of the trampoline address, and also generates an eicall instruction.
2) When this code executes, during the eicall processing, the EIND
register is used to extend the 16-bit address, but EIND is zero, so the
24-bit address resulting from this concatenation is the 24-bit address
of the trampoline.
3) The eicall pushes a 24-bit return address onto the stack, and the PC
is loaded with the address of the trampoline, so execution continues there.
4) The trampoline contains the full address of the function, so the PC
is loaded again, this time with the correct function address (and the
EIND register is no longer involved).
5) When the function returns, it finds the full return address on the
stack, so it returns to the point after the eicall.
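Steps 1 to 5 can be traced with a toy host-side model. It is a simplification under stated assumptions: addresses are word addresses, the stack is a small array that grows upward rather than a real AVR stack, the 3-byte return address of a device with more than 128K of flash is modeled as three pushes, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy CPU state (not cycle-accurate). */
struct cpu {
    uint8_t  eind;        /* assumed to stay 0, as GCC expects */
    uint32_t pc;          /* word address */
    uint8_t  stack[16];
    int      sp;          /* index of next free slot; grows upward here */
};

/* Trampoline table entry: stub word address -> jump target word address. */
struct stub { uint32_t at, target; };

/* Steps 1-3: Z already holds the pointer; push return address, jump to EIND:Z. */
static void eicall(struct cpu *c, uint16_t z)
{
    uint32_t ret = c->pc + 1;                 /* EICALL is one word long */
    c->stack[c->sp++] = (uint8_t)ret;
    c->stack[c->sp++] = (uint8_t)(ret >> 8);
    c->stack[c->sp++] = (uint8_t)(ret >> 16);
    c->pc = ((uint32_t)c->eind << 16) | z;
}

/* Step 4: if PC is on a stub, its JMP loads the full target address. */
static void follow_stub(struct cpu *c, const struct stub *s, int n)
{
    for (int i = 0; i < n; i++)
        if (s[i].at == c->pc) { c->pc = s[i].target; return; }
}

/* Step 5: RET pops the 3-byte return address. */
static void ret_insn(struct cpu *c)
{
    uint32_t hi = c->stack[--c->sp];
    uint32_t mid = c->stack[--c->sp];
    uint32_t lo = c->stack[--c->sp];
    c->pc = (hi << 16) | (mid << 8) | lo;
}
```

Starting from the eicall at byte 0x4006 (word 0x2003) with Z = 0x1F28, the model reaches the stub, then word 0x10745 (byte 0x20E8A), and returns to word 0x2004.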


This magic works, in my understanding, without any penalty for code
below 128K, and with only a small penalty in time and space, and only
for functions accessed through a pointer.
And all this using only 16-bit pointers!
Or maybe I am completely wrong, but in my support there is the small
program that I sent to the list in my last mail, which works exactly as
I described.

My original question was about a possible implementation bug, while
you were trying to explain to me that the whole trampoline mechanism
cannot work because of a design problem (16-bit pointers and so on).

Can you comment on this?

As a last comment, I can place all my functions whose address is needed
in lower memory, and indeed this was my first attempt.
Unfortunately the library code is always placed last, so above 128K, and
functions like fputc do use function pointers (I assume to manage open
file descriptors), so I need the trampoline mechanism to work, or
some other workaround.
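Why fputc ends up calling through a pointer can be illustrated with a host-side imitation of a stdio-style stream. It is modeled loosely on avr-libc's streams, which are set up with user-supplied put/get functions via fdevopen() or FDEV_SETUP_STREAM; the `toy_` names are hypothetical and this is not avr-libc's code. On the ATmega2561, at least the user's put function is a pointer target whose address has to be reachable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* A stream stores a pointer to a user "put" function; the library's
   fputc-like routine calls through it. Hypothetical, host-side only. */
struct toy_stream {
    int (*put)(char c, struct toy_stream *s);
    char buf[32];
    size_t len;
};

/* Library side: dispatch through the stored function pointer. */
static int toy_fputc(int c, struct toy_stream *s)
{
    assert(s->put != NULL);
    return s->put((char)c, s) == 0 ? c : EOF;
}

/* User side: e.g. what would drive a UART; here it appends to a buffer. */
static int buffer_put(char c, struct toy_stream *s)
{
    if (s->len + 1 >= sizeof s->buf)
        return -1;                 /* full: report failure */
    s->buf[s->len++] = c;
    s->buf[s->len] = '\0';
    return 0;                      /* success */
}
```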

Thanks all.
Mau.



Re: Function Address fixup missing?

by Georg-Johann Lay-2

Stu Bell wrote:

> The root of the problem is that GCC's architecture does not lend itself
> to 24-bit entities.  As you noted, function pointers are 16 bits, which
> limits the range to 128K of flash (since all AVR instructions are 2
> bytes wide, the program counter addresses 2 byte words).
>
> GCC could be modified to run with 32-bit (4 byte) pointers, but the
> ATtiny folks would yell like mashed cats if that were done universally.
> I suppose that a switch could be placed telling GCC whether to use 16
> bit or 32 bit pointers, but then the avr-libc library would need to be
> compiled both ways leading to problems of making sure that the correct
> library is linked with the correct compiled source.

The multilib would be no problem, same for introducing a new compiler
option. The hard part would be to rework the backend. At the moment,
Pmode is HImode. You would have to set Pmode to SImode, but note that
gcc knows just /one/ pointer mode. So both data and code pointer stuff
would be in 32-bit arithmetics, load and stores.

Believe me, this would certainly trigger a flood of "optimization
regression" bugs from the same folks who requested such a feature...
The E-registers are sometimes changed by the hardware, and sometimes
not. The E-regs are not GPRs, yet they would need handling as if they
were R32. Any pointer arithmetic would involve clobber registers to hold
and modify the E-part of the address.

You really want that?

Georg-Johann

