Adding members to struct cpu_functions

by Guillaume Ballet

Hello list,

I am continuing my effort to port FreeBSD to the BeagleBoard. I
reached the point where the system prompts for the root filesystem. I
am therefore cleaning up my code and will then post it to this list
for comments. I still have a few hacky fixups to remove before it
becomes readable :)

I know that Mark Tinguely, whose help has been precious in this
endeavor, has some patches ready for ARMv6 cache management, so I did
not focus on this.

At the moment, I am using backward-compatibility for the TLB format. I
want to start using the ARMv6 TLB format. My current problem is that
most of the arch-dependent code uses macros that are defined to match
the pre-ARMv6 TLB format. There are several ways of fixing this,
including defining these macros depending on some symbol such as
_ARM_ARCH_* or CPU_ARM*. I am however no friend of heavy preprocessor
flagging. What if instead, cpu_functions was extended to include
fields like the prototype for TLB entries of each size? For example,
take this patch to the following excerpt from pmap_map_chunk in
sys/arm/arm/pmap.c:

  /* See if we can use a L2 large page mapping. */
  if (L2_L_MAPPABLE_P(va, pa, resid)) {
  #ifdef VERBOSE_INIT_ARM
          printf("L");
  #endif
          for (i = 0; i < 16; i++) {
                  pte[l2pte_index(va) + i] =
-                     L2_L_PROTO | pa |
+                    cpufuncs.cf_l2_l_proto | pa |
-                     L2_L_PROT(PTE_KERNEL, prot) | f2l;
+                    cpufuncs.cf_l2_l_prot(PTE_KERNEL, prot) | f2l;
                      PTE_SYNC(&pte[l2pte_index(va) + i]);
          }
          va += L2_L_SIZE;
          pa += L2_L_SIZE;
          resid -= L2_L_SIZE;
          continue;
  }

Would that be acceptable?
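
A minimal sketch of what such an extension could look like, assuming two hypothetical members (cf_l2_l_proto and cf_l2_l_prot) and placeholder bit values that are not the real ARM descriptor encodings. The struct name and the sk_ prefixed helpers are invented for illustration and are not the actual struct cpu_functions:

```c
#include <stdint.h>

typedef uint32_t sk_pte_t;   /* stand-in for the kernel's pt_entry_t */

/* Hypothetical subset of struct cpu_functions carrying the proposed members. */
struct sk_cpu_functions {
	sk_pte_t cf_l2_l_proto;                           /* large-page prototype bits */
	sk_pte_t (*cf_l2_l_prot)(int user, int writable); /* protection bits */
};

/* Placeholder encodings: NOT real ARM descriptor values. */
sk_pte_t sk_prev6_l2_l_prot(int user, int writable)
{
	return (sk_pte_t)((user ? 0x20u : 0u) | (writable ? 0x10u : 0u));
}

sk_pte_t sk_v6_l2_l_prot(int user, int writable)
{
	return (sk_pte_t)((user ? 0x200u : 0u) | (writable ? 0u : 0x100u));
}

const struct sk_cpu_functions sk_prev6_cpufuncs = { 0x1u, sk_prev6_l2_l_prot };
const struct sk_cpu_functions sk_v6_cpufuncs    = { 0x1u, sk_v6_l2_l_prot };

/* Generic code builds an entry without knowing which format is in use. */
sk_pte_t sk_make_l2_l_pte(const struct sk_cpu_functions *cf, uint32_t pa,
    int user, int writable)
{
	return cf->cf_l2_l_proto | pa | cf->cf_l2_l_prot(user, writable);
}
```

The caller stays identical for both formats; only the table passed in changes. The price is one indirect call per entry built.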

Now, assuming people agree with this change, that would only be a
first step because all values for cpufuncs are defined in the same
file (cpufunc.c), which is guarded with as many CPU_ARMx defines as
there are cpu flavors. Is there a specific reason for all these
structures to be defined in the same file, instead of defining each one
in a platform- or cpu-specific file and using the files.* to select the
appropriate cpufunc flavor in the build system?

Guillaume
_______________________________________________
freebsd-arm@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-unsubscribe@..."

Re: Adding members to struct cpu_functions

by Mark Tinguely


>  I am continuing my effort to port FreeBSD to the BeagleBoard. I
>  reached the point where the system prompts for the root filesystem. I
>  am therefore cleaning up my code and will then post it to this list
>  for comments. I still have a few hacky fixups to remove before it
>  becomes readable :)

Congratulations.

>  I know that Mark Tinguely, whose help has been precious in this
>  endeavor, has some patches ready for ARMv6 cache management, so I did
>  not focus on this.

It is all untested. Given the small number of TLB entries on the OMAP,
I am wondering whether we should use the new ASID process identifier and
not flush the TLBs, or just flush them on context switch.

>  At the moment, I am using backward-compatibility for the TLB format. I
>  want to start using the ARMv6 TLB format. My current problem is that
>  most of the arch-dependent code uses macros that are defined to match
>  the pre-ARMv6 TLB format. There are several ways of fixing this,
>  including defining these macros depending on some symbol such as
>  _ARM_ARCH_* or CPU_ARM*. I am however no friend of heavy preprocessor
>  flagging. What if instead, cpu_functions was extended to include
>  fields like the prototype for TLB entries of each size? For example,
>  take this patch to the following excerpt from pmap_map_chunk in
>  sys/arm/arm/pmap.c:
 [deleted example]
>  Now, assuming people agree with this change, that would only be a
>  first step because all values for cpufuncs are defined in the same
>  file (cpufunc.c), which is guarded with as many CPU_ARMx defines as
>  there are cpu flavors. Is there a specific reason for all these
>  structures to be defined in a same file, instead of defining it in a
>  platform- or cpu-specific file and using the files.* to select the
>  appropriate cpufunc flavor in the build system?

I have been pondering the current L1/L2 (in this context we mean Page
Directory entries and Page Table entries, not cache levels) values for
cache, protection, the C/B/TEX (and now the global, secure, and no execute)
and masks for a while. Some of these values are variables and some are
defines. It can be confusing to track down a value for an ARCH/board.

ARM kernels are very specific to processor and board. IMO, we should be
moving some of these settings into processor [and board] directories/files
and making them more consistent. I am not saying anything bad about NetBSD,
where these files originated, nor about the additions that have been made
since. What I am saying is that it would be nice to have a big
reorganization; that takes time and therefore money.

                        -- on a tangent about the future --
Since ARMv7 is coming to FreeBSD, there are other ARMv4/5 vs. ARMv6/7
questions. The most important is: should we break the new ARM chips, with
their physically tagged caches, out into another subbranch, or fold them
into the existing code? One example of existing pmap code that does not
mesh well with ARMv6/7 is the existing flush of the level 2 cache (there
because the old archs have VIVT level 2 caches). ARMv6/7 level 2 caches are
PIPT, and would not be flushed until DMA time. A simple solution would be:
if an arch needs to flush the level 2 cache when it flushes the level 1
cache, then it should do so in the level 1 cache flushing routine.
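
A rough sketch of that last idea, assuming a hypothetical per-CPU table (sk_cache_ops) and counters standing in for the real cache maintenance instructions. None of these names come from the FreeBSD tree:

```c
/* Counters stand in for real cache maintenance operations. */
int sk_l1_flushes;
int sk_l2_flushes;

void sk_l1_wbinv_all(void) { sk_l1_flushes++; }
void sk_l2_wbinv_all(void) { sk_l2_flushes++; }

/* Older CPU with a virtually tagged L2: the L1 routine also flushes L2. */
void sk_vivt_dcache_wbinv_all(void)
{
	sk_l1_wbinv_all();
	sk_l2_wbinv_all();
}

/* CPU with a PIPT L2: the L1 routine leaves L2 alone. */
void sk_pipt_dcache_wbinv_all(void)
{
	sk_l1_wbinv_all();
}

/* Hypothetical per-CPU table, in the spirit of cpu_functions. */
struct sk_cache_ops {
	void (*dcache_wbinv_all)(void);
};

const struct sk_cache_ops sk_vivt_ops = { sk_vivt_dcache_wbinv_all };
const struct sk_cache_ops sk_pipt_ops = { sk_pipt_dcache_wbinv_all };

/* Generic pmap-style caller: no explicit L2 flush, no #ifdef. */
void sk_pmap_flush(const struct sk_cache_ops *ops)
{
	ops->dcache_wbinv_all();
}
```

With this shape the generic code never mentions the level 2 cache; whether it gets flushed is decided entirely by the routine the CPU installs.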

--Mark Tinguely.

Re: Adding members to struct cpu_functions

by Rafal Jaworowski


On 2009-10-08, at 18:13, Mark Tinguely wrote:

> -- on a tangent about the future --
> Since the ARMv7 is coming to FreeBSD, there are other ARMv4/5 vrs  
> ARMv6/7
> questions, the most important is should we break the new ARM chips  
> with
> their physical tagged caches to another subbranch or define it into  
> the
> existing code? One example of the existing pmap code that does not  
> mesh
> well with ARMv6/7 is the exisiting flush of the level 2 cache  
> because the
> old archs have VIVT level 2 caches). ARMv6/7 level 2 caches are PIPT,
> and would not be flushed until DMA time. A simple solution would be if
> an arch needs to flush the level 2 cache when it flushes the level 1
> cache, then it should do so in the level 1 cache flushing routine.

I was wondering whether a separate pmap module for ARMv6-7 would not  
be the best approach. After all v6-7 should be considered an entirely  
new architecture variation, and we would avoid the very likely #ifdefs  
hell in case of a single pmap.c.

Rafal


Re: Adding members to struct cpu_functions

by Stanislav Sedov

On Mon, 12 Oct 2009 13:15:41 +0200
Rafal Jaworowski <raj@...> mentioned:

>
> On 2009-10-08, at 18:13, Mark Tinguely wrote:
>
> > -- on a tangent about the future --
> > Since the ARMv7 is coming to FreeBSD, there are other ARMv4/5 vrs  
> > ARMv6/7
> > questions, the most important is should we break the new ARM chips  
> > with
> > their physical tagged caches to another subbranch or define it into  
> > the
> > existing code? One example of the existing pmap code that does not  
> > mesh
> > well with ARMv6/7 is the exisiting flush of the level 2 cache  
> > because the
> > old archs have VIVT level 2 caches). ARMv6/7 level 2 caches are PIPT,
> > and would not be flushed until DMA time. A simple solution would be if
> > an arch needs to flush the level 2 cache when it flushes the level 1
> > cache, then it should do so in the level 1 cache flushing routine.
>
> I was wondering whether a separate pmap module for ARMv6-7 would not  
> be the best approach. After all v6-7 should be considered an entirely  
> new architecture variation, and we would avoid the very likely #ifdefs  
> hell in case of a single pmap.c.
>
Yeah, I think that would be the best solution.  We could conditionally
select the right pmap.c file based on the target CPU selected (just
like we do for board variations for at91/marvell).

--
Stanislav Sedov
ST4096-RIPE



Re: Adding members to struct cpu_functions

by Guillaume Ballet

On Mon, Oct 12, 2009 at 1:36 PM, Stanislav Sedov <stas@...> wrote:

> On Mon, 12 Oct 2009 13:15:41 +0200
> Rafal Jaworowski <raj@...> mentioned:
>
>>
>> On 2009-10-08, at 18:13, Mark Tinguely wrote:
>>
>> >                     -- on a tangent about the future --
>> > Since the ARMv7 is coming to FreeBSD, there are other ARMv4/5 vrs
>> > ARMv6/7
>> > questions, the most important is should we break the new ARM chips
>> > with
>> > their physical tagged caches to another subbranch or define it into
>> > the
>> > existing code? One example of the existing pmap code that does not
>> > mesh
>> > well with ARMv6/7 is the exisiting flush of the level 2 cache
>> > because the
>> > old archs have VIVT level 2 caches). ARMv6/7 level 2 caches are PIPT,
>> > and would not be flushed until DMA time. A simple solution would be if
>> > an arch needs to flush the level 2 cache when it flushes the level 1
>> > cache, then it should do so in the level 1 cache flushing routine.
>>
>> I was wondering whether a separate pmap module for ARMv6-7 would not
>> be the best approach. After all v6-7 should be considered an entirely
>> new architecture variation, and we would avoid the very likely #ifdefs
>> hell in case of a single pmap.c.
>>
>
> Yeah, I think that would be the best solution.  We could conditionally
> select the right pmap.c file based on the target CPU selected (just
> like we do for board variations for at91/marvell).
>

pmap.c is a very large file that seems to change very often. I fear
that having several versions is going to be difficult to maintain.
Granted, I haven't read the whole file line by line. Yet it seems to me
its content can be abstracted to rely on arch-specific functions that
would be found in cpufuncs instead of hardcoded macros. Is there
something fundamentally wrong with enhancing struct cpu_functions in
order to let the portmeisters decide what the MMU and caching bits should
look like? This is a blocking issue for me, since it looks like the
omap has some problem with backward compatibility mode. Without fixing
up the TLBs in my initarm function, it doesn't work.

Speaking of #ifdef hell, why not break cpufunc.c into several
cpufunc_<myarch>.c files? That would be a good way to start the
reorganization Mark has been talking about in his email.

Guillaume

Re: Adding members to struct cpu_functions

by Nathan Whitehorn

Guillaume Ballet wrote:

> On Mon, Oct 12, 2009 at 1:36 PM, Stanislav Sedov <stas@...> wrote:
>  
>> On Mon, 12 Oct 2009 13:15:41 +0200
>> Rafal Jaworowski <raj@...> mentioned:
>>
>>    
>>> On 2009-10-08, at 18:13, Mark Tinguely wrote:
>>>
>>>      
>>>>                     -- on a tangent about the future --
>>>> Since the ARMv7 is coming to FreeBSD, there are other ARMv4/5 vrs
>>>> ARMv6/7
>>>> questions, the most important is should we break the new ARM chips
>>>> with
>>>> their physical tagged caches to another subbranch or define it into
>>>> the
>>>> existing code? One example of the existing pmap code that does not
>>>> mesh
>>>> well with ARMv6/7 is the exisiting flush of the level 2 cache
>>>> because the
>>>> old archs have VIVT level 2 caches). ARMv6/7 level 2 caches are PIPT,
>>>> and would not be flushed until DMA time. A simple solution would be if
>>>> an arch needs to flush the level 2 cache when it flushes the level 1
>>>> cache, then it should do so in the level 1 cache flushing routine.
>>>>        
>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>> be the best approach. After all v6-7 should be considered an entirely
>>> new architecture variation, and we would avoid the very likely #ifdefs
>>> hell in case of a single pmap.c.
>>>
>>>      
>> Yeah, I think that would be the best solution.  We could conditionally
>> select the right pmap.c file based on the target CPU selected (just
>> like we do for board variations for at91/marvell).
>>
>>    
>
> pmap.c is a very large file that seems to change very often. I fear
> having several versions is going to be difficult to maintain. Granted,
> I haven't read the whole file line after line. Yet it seems to me its
> content can be abstracted to rely on arch-specific functions that
> would be found in cpufuncs instead of hardcoded macros. Is there
> something fundamentally wrong with enhancing struct cpufunc in order
> to let the portmeisters decide what the MMU and caching bits should
> look like? This is a blocking issue for me, since it looks like the
> omap has some problem with backward compatibility mode. Without fixing
> up the TLBs in my initarm function, it doesn't work.
>
> Speaking of #ifdef hell, why not breaking cpufuncs.c into several
> cpufuncs_<myarch>.c? That would be a good way to start that
> reorganization Mark has been talking about in his email.
>  
One thing that might be worth looking at while thinking about this is
how this is done on PowerPC. We have run-time selectable PMAP modules
using KOBJ to handle CPUs with different MMU designs, as well as a
platform module scheme, again using KOBJ, to pick the appropriate PMAP
for the board as well as determine the physical memory layout and such
things. One of the nice things about the approach is that it is easy to
subclass if you have a new, marginally different, design, and it avoids
#ifdef hell as well as letting you build a GENERIC kernel with support
for multiple MMU designs and board types (the last less of a concern on
ARM, though).
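
To make the shape of that approach concrete, here is a plain C sketch of a run-time selectable method table with a "subclass" that overrides one method. It deliberately does not use the real KOBJ interface; every name (sk_mmu_ops, sk_mmu_install, and so on) and every value is an assumption made for illustration:

```c
/* Hypothetical MMU method table; not the KOBJ API. */
struct sk_mmu_ops {
	const char *name;
	unsigned (*page_shift)(void);
	unsigned (*pte_flags)(int writable);
};

/* Base design. Flag values are placeholders. */
unsigned sk_base_page_shift(void) { return 12; }
unsigned sk_base_pte_flags(int writable) { return writable ? 0x3u : 0x1u; }

/* Marginally different design: reuse the base, add one extra bit. */
unsigned sk_variant_pte_flags(int writable)
{
	return sk_base_pte_flags(writable) | 0x100u;
}

const struct sk_mmu_ops sk_mmu_base = {
	"base", sk_base_page_shift, sk_base_pte_flags
};

/* The "subclass" inherits page_shift and overrides pte_flags only. */
const struct sk_mmu_ops sk_mmu_variant = {
	"variant", sk_base_page_shift, sk_variant_pte_flags
};

/* Chosen once at boot, for example by platform probe code. */
static const struct sk_mmu_ops *sk_mmu_active = &sk_mmu_base;

void sk_mmu_install(const struct sk_mmu_ops *ops) { sk_mmu_active = ops; }

/* What generic pmap code would call. */
unsigned sk_pmap_pte_flags(int writable)
{
	return sk_mmu_active->pte_flags(writable);
}
```

A single kernel image can carry both tables and pick one at boot, which is the GENERIC-kernel property mentioned above.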
-Nathan

Re: Adding members to struct cpu_functions

by Rafal Jaworowski


On 2009-10-12, at 15:21, Nathan Whitehorn wrote:

>>>> I was wondering whether a separate pmap module for ARMv6-7 would  
>>>> not
>>>> be the best approach. After all v6-7 should be considered an  
>>>> entirely
>>>> new architecture variation, and we would avoid the very likely  
>>>> #ifdefs
>>>> hell in case of a single pmap.c.
>>>>
>>>>
>>> Yeah, I think that would be the best solution.  We could  
>>> conditionally
>>> select the right pmap.c file based on the target CPU selected (just
>>> like we do for board variations for at91/marvell).
>>>
>>>
>>
>> pmap.c is a very large file that seems to change very often. I fear
>> having several versions is going to be difficult to maintain.  
>> Granted,
>> I haven't read the whole file line after line. Yet it seems to me its
>> content can be abstracted to rely on arch-specific functions that
>> would be found in cpufuncs instead of hardcoded macros. Is there
>> something fundamentally wrong with enhancing struct cpufunc in order
>> to let the portmeisters decide what the MMU and caching bits should
>> look like? This is a blocking issue for me, since it looks like the
>> omap has some problem with backward compatibility mode. Without  
>> fixing
>> up the TLBs in my initarm function, it doesn't work.
>>
>> Speaking of #ifdef hell, why not breaking cpufuncs.c into several
>> cpufuncs_<myarch>.c? That would be a good way to start that
>> reorganization Mark has been talking about in his email.
>>
> One thing that might be worth looking at while thinking about this  
> is how this is done on PowerPC. We have run-time selectable PMAP  
> modules using KOBJ to handle CPUs with different MMU designs, as  
> well as a platform module scheme, again using KOBJ, to pick the  
> appropriate PMAP for the board as well as determine the physical  
> memory layout and such things. One of the nice things about the  
> approach is that it is easy to subclass if you have a new,  
> marginally different, design, and it avoids #ifdef hell as well as  
> letting you build a GENERIC kernel with support for multiple MMU  
> designs and board types (the last less of a concern on ARM, though).

What has always concerned me is the performance cost this imposes. It
would be a really useful exercise to measure the actual impact of the
KOBJ-tized pmap we have in PowerPC; with an often-called interface
like pmap, the penalty might turn out to be not that small.

Rafal


Re: Adding members to struct cpu_functions

by Guillaume Ballet

On Mon, Oct 12, 2009 at 5:07 PM, Rafal Jaworowski <raj@...> wrote:

>
> On 2009-10-12, at 15:21, Nathan Whitehorn wrote:
>
>>>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>>>> be the best approach. After all v6-7 should be considered an entirely
>>>>> new architecture variation, and we would avoid the very likely #ifdefs
>>>>> hell in case of a single pmap.c.
>>>>>
>>>>>
>>>> Yeah, I think that would be the best solution.  We could conditionally
>>>> select the right pmap.c file based on the target CPU selected (just
>>>> like we do for board variations for at91/marvell).
>>>>
>>>>
>>>
>>> pmap.c is a very large file that seems to change very often. I fear
>>> having several versions is going to be difficult to maintain. Granted,
>>> I haven't read the whole file line after line. Yet it seems to me its
>>> content can be abstracted to rely on arch-specific functions that
>>> would be found in cpufuncs instead of hardcoded macros. Is there
>>> something fundamentally wrong with enhancing struct cpufunc in order
>>> to let the portmeisters decide what the MMU and caching bits should
>>> look like? This is a blocking issue for me, since it looks like the
>>> omap has some problem with backward compatibility mode. Without fixing
>>> up the TLBs in my initarm function, it doesn't work.
>>>
>>> Speaking of #ifdef hell, why not breaking cpufuncs.c into several
>>> cpufuncs_<myarch>.c? That would be a good way to start that
>>> reorganization Mark has been talking about in his email.
>>>
>> One thing that might be worth looking at while thinking about this is how
>> this is done on PowerPC. We have run-time selectable PMAP modules using KOBJ
>> to handle CPUs with different MMU designs, as well as a platform module
>> scheme, again using KOBJ, to pick the appropriate PMAP for the board as well
>> as determine the physical memory layout and such things. One of the nice
>> things about the approach is that it is easy to subclass if you have a new,
>> marginally different, design, and it avoids #ifdef hell as well as letting
>> you build a GENERIC kernel with support for multiple MMU designs and board
>> types (the last less of a concern on ARM, though).
>
> What always concerned me was the performance cost this imposes, and it would
> be a really useful exercise to measure what is the actual impact of
> KOBJ-tized pmap we have in PowerPC; with an often-called interface like pmap
> it might occur the penalty is not that little..
>
> Rafal
>
>

Good point. Using KOBJs this way is really cool, but the overhead is
going to be a concern if it is used by an application that allocates
memory very often. This is not the case for most embedded appliances I
have worked with; still, one should not assume anything about the
userland at kernel level.
As a result, extending struct cpu_functions is not a good thing
either, for the same reason: the compiler cannot inline a call
through a function pointer.

In which case, why not create a bunch of header files with the
pattern cpufunc_myarch.h, in which all functions would be declared
inline? Something like:

static inline pt_entry_t l2_l_entry(vm_paddr_t pa, int prot, int cache);
static inline pt_entry_t l2_s_entry(vm_paddr_t pa, int prot, int cache);
...

which would then be included by pmap.c and friends.
One problem is that such a change affects all platforms at the same
time, and therefore requires all portmeisters to implement the
functions that are needed. That should not be too difficult, though,
because so far it was the same macros that were used by all platforms.
Another problem is that it requires some build script magic to make
sure the correct header is included depending on the arch. I wonder if
this is easy?
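
A sketch of what one such header could contain, assuming a hypothetical cpufunc_myarch.h and an invented bit layout (type in bits 0-1, cache in bits 2-3, protection in bits 4-5). Only the page sizes, 64 KB for a large page and 4 KB for a small one, are meant literally; the sk_ names are stand-ins:

```c
#include <stdint.h>

typedef uint32_t sk_pte_t;    /* stand-in for pt_entry_t */
typedef uint32_t sk_paddr_t;  /* stand-in for vm_paddr_t */

/* Contents of a hypothetical cpufunc_myarch.h. Bit layout is illustrative. */
#define SK_L2_L_PROTO  0x1u          /* large (64 KB) page entry */
#define SK_L2_S_PROTO  0x2u          /* small (4 KB) page entry */
#define SK_L2_L_FRAME  0xffff0000u   /* 64 KB aligned frame */
#define SK_L2_S_FRAME  0xfffff000u   /* 4 KB aligned frame */

static inline sk_pte_t
sk_l2_l_entry(sk_paddr_t pa, int prot, int cache)
{
	return (pa & SK_L2_L_FRAME) | SK_L2_L_PROTO |
	    (((uint32_t)prot & 0x3u) << 4) | (((uint32_t)cache & 0x3u) << 2);
}

static inline sk_pte_t
sk_l2_s_entry(sk_paddr_t pa, int prot, int cache)
{
	return (pa & SK_L2_S_FRAME) | SK_L2_S_PROTO |
	    (((uint32_t)prot & 0x3u) << 4) | (((uint32_t)cache & 0x3u) << 2);
}
```

Because the bodies are visible at every call site, the compiler is free to fold them into the caller, which a call through a cpu_functions pointer does not allow.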

Guillaume

Re: Adding members to struct cpu_functions

by Nathan Whitehorn

Rafal Jaworowski wrote:

>
> On 2009-10-12, at 15:21, Nathan Whitehorn wrote:
>
>>>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>>>> be the best approach. After all v6-7 should be considered an entirely
>>>>> new architecture variation, and we would avoid the very likely
>>>>> #ifdefs
>>>>> hell in case of a single pmap.c.
>>>>>
>>>>>
>>>> Yeah, I think that would be the best solution.  We could conditionally
>>>> select the right pmap.c file based on the target CPU selected (just
>>>> like we do for board variations for at91/marvell).
>>>>
>>>>
>>>
>>> pmap.c is a very large file that seems to change very often. I fear
>>> having several versions is going to be difficult to maintain. Granted,
>>> I haven't read the whole file line after line. Yet it seems to me its
>>> content can be abstracted to rely on arch-specific functions that
>>> would be found in cpufuncs instead of hardcoded macros. Is there
>>> something fundamentally wrong with enhancing struct cpufunc in order
>>> to let the portmeisters decide what the MMU and caching bits should
>>> look like? This is a blocking issue for me, since it looks like the
>>> omap has some problem with backward compatibility mode. Without fixing
>>> up the TLBs in my initarm function, it doesn't work.
>>>
>>> Speaking of #ifdef hell, why not breaking cpufuncs.c into several
>>> cpufuncs_<myarch>.c? That would be a good way to start that
>>> reorganization Mark has been talking about in his email.
>>>
>> One thing that might be worth looking at while thinking about this is
>> how this is done on PowerPC. We have run-time selectable PMAP modules
>> using KOBJ to handle CPUs with different MMU designs, as well as a
>> platform module scheme, again using KOBJ, to pick the appropriate
>> PMAP for the board as well as determine the physical memory layout
>> and such things. One of the nice things about the approach is that it
>> is easy to subclass if you have a new, marginally different, design,
>> and it avoids #ifdef hell as well as letting you build a GENERIC
>> kernel with support for multiple MMU designs and board types (the
>> last less of a concern on ARM, though).
>
> What always concerned me was the performance cost this imposes, and it
> would be a really useful exercise to measure what is the actual impact
> of KOBJ-tized pmap we have in PowerPC; with an often-called interface
> like pmap it might occur the penalty is not that little..
Using the KOBJ cache means that it is only marginally more expensive
than a standard function pointer call. There's a 9-year-old note in the
commit log for sys/sys/kobj.h that it takes about 30% longer to call a
function that does nothing via KOBJ versus a direct call on a 300 MHz P2
(a 10 ns time difference). Given that and that pmap methods do, in fact,
do things besides get called and immediately return, I suspect non-KOBJ
related execution time will dwarf any time loss from the indirection.
I'll try to repeat the measurement in the next few days, however, since
this is important to know.
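
For anyone who wants to try a comparable measurement in userland, here is a rough sketch that times a direct call against a call through a function pointer. It is not the KOBJ benchmark from the commit log; the function names are invented, it uses the standard clock() timer, and an optimizing compiler may inline the direct call, so results depend heavily on flags and hardware:

```c
#include <time.h>

/* Volatile sink so the calls are not optimized away entirely. */
static volatile unsigned long sk_sink;

void sk_nop(void) { sk_sink++; }

/* Time 'iters' direct calls; returns CPU seconds. */
double sk_time_direct(unsigned long iters)
{
	clock_t t0 = clock();
	for (unsigned long i = 0; i < iters; i++)
		sk_nop();
	return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Time 'iters' calls through a volatile function pointer. */
double sk_time_indirect(unsigned long iters)
{
	void (*volatile fp)(void) = sk_nop;
	clock_t t0 = clock();
	for (unsigned long i = 0; i < iters; i++)
		fp();
	return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling both with a large iteration count (say 100 million) and dividing the difference by that count gives a per-call estimate. A method that does real work, as pmap methods do, should be timed the same way before drawing conclusions.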
-Nathan

Re: Adding members to struct cpu_functions

by Mark Tinguely


>  As a result, extending the struct cpu_functions is not a good thing
>  either, for the same reason. The compiler can not inline a call
>  through a function pointer.
>
>  In which case, why not create a bunch of headers files with the
>  pattern cpufunc_myarch.h, in which all functions would be declared
>  inline? Something like:
>
>  static inline l2_l_entry(vm_addr_t pa, int prot, int cache);
>  static inline l2_s_entry(vm_addr_t pa, int prot, int cache);
>  ...
>  which would then be included by pmap.c and friends.

I think they need to be regular function calls because assembly routines
call the per-cpu functions. A few simple macros would save the branch to NOP
functions.

>  One problem is that such a change affects all platforms at the same
>  time, and therefore requires all portmeisters to implement the
>  functions that are needed. That should not be too difficult, though,
>  because so far it was the same macros that were used by all platforms.
>  Another problem is that it requires some build script magic to make
>  sure the correct header is included depending on the arch. I wonder if
>  this is easy?


--Mark Tinguely

Re: Adding members to struct cpu_functions

by Guillaume Ballet

On Mon, Oct 12, 2009 at 11:29 PM, Mark Tinguely <tinguely@...> wrote:

>
>>  As a result, extending the struct cpu_functions is not a good thing
>>  either, for the same reason. The compiler can not inline a call
>>  through a function pointer.
>>
>>  In which case, why not create a bunch of headers files with the
>>  pattern cpufunc_myarch.h, in which all functions would be declared
>>  inline? Something like:
>>
>>  static inline l2_l_entry(vm_addr_t pa, int prot, int cache);
>>  static inline l2_s_entry(vm_addr_t pa, int prot, int cache);
>>  ...
>>  which would then be included by pmap.c and friends.
>
> I think they need to be regular function calls because assembly routines
> call the per-cpu functions. A few simple macros would save the branch to NOP
> functions.
>

I'm not sure what you mean by that: would macros be ok, in your
opinion? I am a bit puzzled because I see a contradiction with the
previous sentence that requires the functions to be callable from the
assembly code. Obviously I am misinterpreting, so would you mind
clarifying, please?

I think it is important to notice that even though cache management
relies a lot on assembly functions, I haven't found any page table
management done in assembly past locore.S. I think using macros for
page table management functions can be done. For cache management,
however, I agree that having different pmap.c files is probably the
way to go. In both cases, I am still curious to see what Nathan will
come up with.

I took a more thorough look at pmap, and there is indeed lots of
machine-specific code, especially at the beginning. And when it comes
to cpufunc, it's all about #ifdefs. Since I'm still working on the
cleanup for the beagleboard, I will declare cpufuncs in an
armv6-specific file. Let's call it cpufunc_armv6.c. I am struggling
with another MMU problem at the moment, but I'll try to come up asap
with a patch for pmap.c. It will replace hardcoded values with
machine-defined macros, for reference.

Guillaume

Re: Adding members to struct cpu_functions

by Mark Tinguely

>  >>  either, for the same reason. The compiler can not inline a call
>  >>  through a function pointer.
>  >>
>  >>  In which case, why not create a bunch of headers files with the
>  >>  pattern cpufunc_myarch.h, in which all functions would be declared
>  >>  inline? Something like:
>  >>
>  >>  static inline l2_l_entry(vm_addr_t pa, int prot, int cache);
>  >>  static inline l2_s_entry(vm_addr_t pa, int prot, int cache);
>  >>  ...
>  >>  which would then be included by pmap.c and friends.
>  >
>  > I think they need to be regular function calls because assembly routines
>  > call the per-cpu functions. A few simple macros would save the branch to NOP
>  > functions.
>  >
>
>  I'm not sure what you mean by that: would macros be ok, in your
>  opinion? I am a bit puzzled because I see a contradiction with the
>  previous sentence that requires the functions to be callable from the
>  assembly code. Obviously I am misinterpreting, so would you mind
>  clarifying, please?
>
>  I think it is important to notice that even though cache management
>  relies a lot on assembly function, I haven't found any page table
>  management done in assembly past locore.S. I think using macros for
>  page table management functions can be done. For cache management,
>  however, I agree that having different pmap.c files is probably the
>  way to go. In both cases, I am still curious to see what Nathan will
>  come up with.

You are correct, the page table routines are pmap.c oriented.
I was extending the cleanup thought to all the cpu-specific functions.
There are cpu-specific functions that are NOPs, which we branch to
and back from again. I was just throwing out a global reorganization
thought.
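
A small sketch of the macro idea for C callers, assuming a hypothetical configuration symbol SK_CPU_HAS_OP and invented function names. Assembly callers would still need a real function, as noted earlier in the thread:

```c
/* Counter stands in for a real CPU operation. */
int sk_calls;

void sk_real_op(void) { sk_calls++; }

/*
 * SK_CPU_HAS_OP is a hypothetical config symbol. When it is not defined,
 * the operation is a NOP on this CPU and the macro expands to nothing,
 * so no branch to an empty function is emitted.
 */
#ifdef SK_CPU_HAS_OP
#define sk_cpu_op()	sk_real_op()
#else
#define sk_cpu_op()	((void)0)
#endif

/* Generic caller: identical source for both kinds of CPU. */
int sk_do_work(int x)
{
	sk_cpu_op();
	return x + 1;
}
```

Built without SK_CPU_HAS_OP, sk_do_work() contains no call at all; built with it, the call goes straight to sk_real_op().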

>  I took a more thorough look at pmap, and there is indeed lots of
>  machine-specific code, especially at the beginning. And when it comes
>  to cpufunc, it's all about #ifdefs. Since I'm still working on the
>  cleanup for the beagleboard, I will declare cpufuncs in an
>  armv6-specific file. Let's call it cpufunc_armv6.c. I am struggling
>  with another MMU problem at the moment, but I'll try to come up asap
>  with a patch for pmap.c. It will replace hardcoded values with
>  machine-defined macros, for reference.

I think you are running that processor in v5 mode. There are still some
individuals looking at a cache problem with recent code. I still believe
we need to add the PVF_REF flag when adding the new unmanaged (PVF_UNMAN)
pv_entry, so pmap_fix_cache() will write back the cache and remove
the TLB entry. That, and the changes to remove dangling allocations.

--Mark.
_______________________________________________
freebsd-arm@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-unsubscribe@..."

Re: Adding members to struct cpu_functions

by Nathan Whitehorn-7

Nathan Whitehorn wrote:

> Rafal Jaworowski wrote:
>>
>> On 2009-10-12, at 15:21, Nathan Whitehorn wrote:
>>
>>>>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>>>>> be the best approach. After all v6-7 should be considered an
>>>>>> entirely
>>>>>> new architecture variation, and we would avoid the very likely
>>>>>> #ifdefs
>>>>>> hell in case of a single pmap.c.
>>>>>>
>>>>>>
>>>>> Yeah, I think that would be the best solution.  We could
>>>>> conditionally
>>>>> select the right pmap.c file based on the target CPU selected (just
>>>>> like we do for board variations for at91/marvell).
>>>>>
>>>>>
>>>>
>>>> pmap.c is a very large file that seems to change very often. I fear
>>>> having several versions is going to be difficult to maintain. Granted,
>>>> I haven't read the whole file line after line. Yet it seems to me its
>>>> content can be abstracted to rely on arch-specific functions that
>>>> would be found in cpufuncs instead of hardcoded macros. Is there
>>>> something fundamentally wrong with enhancing struct cpufunc in order
>>>> to let the portmeisters decide what the MMU and caching bits should
>>>> look like? This is a blocking issue for me, since it looks like the
>>>> omap has some problem with backward compatibility mode. Without fixing
>>>> up the TLBs in my initarm function, it doesn't work.
>>>>
>>>> Speaking of #ifdef hell, why not break cpufuncs.c into several
>>>> cpufuncs_<myarch>.c? That would be a good way to start that
>>>> reorganization Mark has been talking about in his email.
>>>>
>>> One thing that might be worth looking at while thinking about this
>>> is how this is done on PowerPC. We have run-time selectable PMAP
>>> modules using KOBJ to handle CPUs with different MMU designs, as
>>> well as a platform module scheme, again using KOBJ, to pick the
>>> appropriate PMAP for the board as well as determine the physical
>>> memory layout and such things. One of the nice things about the
>>> approach is that it is easy to subclass if you have a new,
>>> marginally different, design, and it avoids #ifdef hell as well as
>>> letting you build a GENERIC kernel with support for multiple MMU
>>> designs and board types (the last less of a concern on ARM, though).
>>
>> What always concerned me was the performance cost this imposes, and
>> it would be a really useful exercise to measure the actual impact of
>> the KOBJ-tized pmap we have in PowerPC; with an often-called
>> interface like pmap, the penalty might turn out not to be that small.
> Using the KOBJ cache means that it is only marginally more expensive
> than a standard function pointer call. There's a 9-year-old note in
> the commit log for sys/sys/kobj.h that it takes about 30% longer to
> call a function that does nothing via KOBJ versus a direct call on a
> 300 MHz P2 (a 10 ns time difference). Given that and that pmap methods
> do, in fact, do things besides get called and immediately return, I
> suspect non-KOBJ related execution time will dwarf any time loss from
> the indirection. I'll try to repeat the measurement in the next few
> days, however, since this is important to know.
> -Nathan
I just did the measurements on a 1.8 GHz PowerPC G5. There were four
tests, each repeated 1 million times. "Load and store" involves
incrementing a volatile int from 0 to 1e6 inline. "Direct calls"
involves a branch to a function that returns 0 and does nothing else.
"Function ptr" calls the same function via a pointer stored in a
register, and "KOBJ calls" calls it via KOBJ. Here are the results
(errors are +/- 0.5 ns for the function call measurements due to
compiler optimization jitter, and 0 for load and store, since that takes
a deterministic number of clock cycles):

32-bit kernel:
Load and store:  26.1 ns
Direct calls:   7.2 ns
Function ptr:   8.4 ns
KOBJ calls:     17.8 ns

64-bit kernel:
Load and store:  9.2 ns
Direct calls:   6.1 ns
Function ptr:   8.3 ns
KOBJ calls:     40.5 ns

ABI changes make a large difference, as you can see. The cost of calling
via KOBJ is non-negligible, but small, especially compared to the cost
of doing anything involving memory. I don't know how this changes with
ARM calling conventions.
-Nathan

Re: Adding members to struct cpu_functions

by Rafal Jaworowski


On 2009-10-18, at 17:49, Nathan Whitehorn wrote:

>>>> One thing that might be worth looking at while thinking about  
>>>> this is how this is done on PowerPC. We have run-time selectable  
>>>> PMAP modules using KOBJ to handle CPUs with different MMU  
>>>> designs, as well as a platform module scheme, again using KOBJ,  
>>>> to pick the appropriate PMAP for the board as well as determine  
>>>> the physical memory layout and such things. One of the nice  
>>>> things about the approach is that it is easy to subclass if you  
>>>> have a new, marginally different, design, and it avoids #ifdef  
>>>> hell as well as letting you build a GENERIC kernel with support  
>>>> for multiple MMU designs and board types (the last less of a  
>>>> concern on ARM, though).
>>>
>>> What always concerned me was the performance cost this imposes,
>>> and it would be a really useful exercise to measure the actual
>>> impact of the KOBJ-tized pmap we have in PowerPC; with an
>>> often-called interface like pmap, the penalty might turn out not
>>> to be that small.
>> Using the KOBJ cache means that it is only marginally more  
>> expensive than a standard function pointer call. There's a 9-year-
>> old note in the commit log for sys/sys/kobj.h that it takes about  
>> 30% longer to call a function that does nothing via KOBJ versus a  
>> direct call on a 300 MHz P2 (a 10 ns time difference). Given that  
>> and that pmap methods do, in fact, do things besides get called and  
>> immediately return, I suspect non-KOBJ related execution time will  
>> dwarf any time loss from the indirection. I'll try to repeat the  
>> measurement in the next few days, however, since this is important  
>> to know.
>> -Nathan
> I just did the measurements on a 1.8 GHz PowerPC G5. There were four  
> tests, each repeated 1 million times. "Load and store" involves  
> incrementing a volatile int from 0 to 1e6 inline. "Direct calls"  
> involves a branch to a function that returns 0 and does nothing  
> else. "Function ptr" calls the same function via a pointer stored in  
> a register, and "KOBJ calls" calls it via KOBJ. Here are the results  
> (errors are +/- 0.5 ns for the function call measurements due to  
> compiler optimization jitter, and 0 for load and store, since that  
> takes a deterministic number of clock cycles):
>
> 32-bit kernel:
> Load and store:  26.1 ns
> Direct calls:   7.2 ns
> Function ptr:   8.4 ns
> KOBJ calls:     17.8 ns
>
> 64-bit kernel:
> Load and store:  9.2 ns
> Direct calls:   6.1 ns
> Function ptr:   8.3 ns
> KOBJ calls:     40.5 ns
>
> ABI changes make a large difference, as you can see. The cost of  
> calling via KOBJ is non-negligible, but small, especially compared  
> to the cost of doing anything involving memory. I don't know how  
> this changes with ARM calling conventions.

Very interesting, thanks! Could you elaborate on the testing details  
and share the diagnostic code so we could repeat this with other CPU  
variations like Book-E PowerPC, or ARM?

Rafal


Re: Adding members to struct cpu_functions

by M. Warner Losh

In message: <05B19969-B238-4E3A-8326-624067F0362B@...>
            Rafal Jaworowski <raj@...> writes:
: On 2009-10-18, at 17:49, Nathan Whitehorn wrote:
[[ trimmed ]]
: > I just did the measurements on a 1.8 GHz PowerPC G5. There were four  
: > tests, each repeated 1 million times. "Load and store" involves  
: > incrementing a volatile int from 0 to 1e6 inline. "Direct calls"  
: > involves a branch to a function that returns 0 and does nothing  
: > else. "Function ptr" calls the same function via a pointer stored in  
: > a register, and "KOBJ calls" calls it via KOBJ. Here are the results  
: > (errors are +/- 0.5 ns for the function call measurements due to  
: > compiler optimization jitter, and 0 for load and store, since that  
: > takes a deterministic number of clock cycles):
: >
: > 32-bit kernel:
: > Load and store:  26.1 ns
: > Direct calls:   7.2 ns
: > Function ptr:   8.4 ns
: > KOBJ calls:     17.8 ns
: >
: > 64-bit kernel:
: > Load and store:  9.2 ns
: > Direct calls:   6.1 ns
: > Function ptr:   8.3 ns
: > KOBJ calls:     40.5 ns
: >
: > ABI changes make a large difference, as you can see. The cost of  
: > calling via KOBJ is non-negligible, but small, especially compared  
: > to the cost of doing anything involving memory. I don't know how  
: > this changes with ARM calling conventions.
:
: Very interesting, thanks! Could you elaborate on the testing details  
: and share the diagnostic code so we could repeat this with other CPU  
: variations like Book-E PowerPC, or ARM?

I'd love to see this on MIPS too...

KOBJ is a big win for device configuration, where one memory I/O can
take 60 times these call numbers...

Warner