Adding members to struct cpu_functions

Hello list,

I am continuing my effort to port FreeBSD to the BeagleBoard. I reached the point where the system prompts for the root filesystem. I am therefore cleaning up my code and will then post it to this list for comments. I still have a few hacky fixups to remove before it becomes readable :)

I know that Mark Tinguely, whose help has been precious in this endeavor, has some patches ready for ARMv6 cache management, so I did not focus on this.

At the moment, I am using backward-compatibility for the TLB format. I want to start using the ARMv6 TLB format. My current problem is that most of the arch-dependent code uses macros that are defined to match the pre-ARMv6 TLB format. There are several ways of fixing this, including defining these macros depending on some symbol such as _ARM_ARCH_* or CPU_ARM*. I am however no friend of heavy preprocessor flagging. What if instead, cpu_functions was extended to include fields like the prototype for TLB entries of each size? For example, take this patch to the following excerpt from pmap_map_chunk in sys/arm/arm/pmap.c:

 	/* See if we can use a L2 large page mapping. */
 	if (L2_L_MAPPABLE_P(va, pa, resid)) {
 #ifdef VERBOSE_INIT_ARM
 		printf("L");
 #endif
 		for (i = 0; i < 16; i++) {
 			pte[l2pte_index(va) + i] =
-			    L2_L_PROTO | pa |
+			    cpufuncs.cf_l2_l_proto | pa |
-			    L2_L_PROT(PTE_KERNEL, prot) | f2l;
+			    cpufuncs.l2_l_prot(PTE_KERNEL, prot) | f2l;
 			PTE_SYNC(&pte[l2pte_index(va) + i]);
 		}
 		va += L2_L_SIZE;
 		pa += L2_L_SIZE;
 		resid -= L2_L_SIZE;
 		continue;
 	}

Would that be acceptable?

Now, assuming people agree with this change, that would only be a first step because all values for cpufuncs are defined in the same file (cpufunc.c), which is guarded with as many CPU_ARMx defines as there are cpu flavors. Is there a specific reason for all these structures to be defined in the same file, instead of defining them in a platform- or cpu-specific file and using the files.* to select the appropriate cpufunc flavor in the build system?

Guillaume
_______________________________________________
freebsd-arm@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-unsubscribe@..."
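
A minimal sketch of the proposal above, written as stand-alone C rather than kernel code. The struct name cpu_functions_sketch, the helper make_l2_l_pte, the field name cf_l2_l_prot and all bit values are assumptions made for illustration; they are not the real struct cpu_functions nor real ARM page table encodings.

```c
#include <stdint.h>

typedef uint32_t pt_entry_t;	/* hypothetical: one 32-bit page table entry */

/* Hypothetical subset of a cpu_functions-like table carrying PTE layout. */
struct cpu_functions_sketch {
	pt_entry_t cf_l2_l_proto;			/* type bits of a large-page PTE */
	pt_entry_t (*cf_l2_l_prot)(int user, int write);	/* permission bits */
};

/* Illustrative encoding only; NOT a real ARM bit layout. */
static pt_entry_t
demo_l2_l_prot(int user, int write)
{
	pt_entry_t ap = user ? 0x2u : 0x1u;

	return (ap << 4) | ((pt_entry_t)(write ? 1u : 0u) << 6);
}

/* One table instance per CPU flavor; this one is made up. */
static const struct cpu_functions_sketch demo_cpufuncs = {
	.cf_l2_l_proto = 0x1u,
	.cf_l2_l_prot = demo_l2_l_prot,
};

/* Build one large-page entry the way the patched loop body would. */
static pt_entry_t
make_l2_l_pte(const struct cpu_functions_sketch *cf, uint32_t pa,
    int user, int write, pt_entry_t cache_bits)
{
	return cf->cf_l2_l_proto | pa | cf->cf_l2_l_prot(user, write) |
	    cache_bits;
}
```

With this shape, the caller no longer names L2_L_PROTO or L2_L_PROT directly: the entry layout comes from whichever table was selected for the CPU, at the price of a data load and an indirect call per entry.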

Re: Adding members to struct cpu_functions

> I am continuing my effort to port FreeBSD to the BeagleBoard. I
> reached the point where the system prompts for the root filesystem. I
> am therefore cleaning up my code and will then post it to this list
> for comments. I still have a few hacky fixups to remove before it
> becomes readable :)

Congratulations.

> I know that Mark Tinguely, whose help has been precious in this
> endeavor, has some patches ready for ARMv6 cache management, so I did
> not focus on this.

It is all untested. I am wondering to myself: with the small number of TLB entries on the OMAP, should we use the new ASID process identifier and not flush the TLBs, or just flush them on context switch?

> At the moment, I am using backward-compatibility for the TLB format. I
> want to start using the ARMv6 TLB format. My current problem is that
> most of the arch-dependent code uses macros that are defined to match
> the pre-ARMv6 TLB format. There are several ways of fixing this,
> including defining these macros depending on some symbol such as
> _ARM_ARCH_* or CPU_ARM*. I am however no friend of heavy preprocessor
> flagging. What if instead, cpu_functions was extended to include
> fields like the prototype for TLB entries of each size? For example,
> take this patch to the following excerpt from pmap_map_chunk in
> sys/arm/arm/pmap.c:

> Now, assuming people agree with this change, that would only be a
> first step because all values for cpufuncs are defined in the same
> file (cpufunc.c), which is guarded with as many CPU_ARMx defines as
> there are cpu flavors. Is there a specific reason for all these
> structures to be defined in a same file, instead of defining it in a
> platform- or cpu-specific file and using the files.* to select the
> appropriate cpufunc flavor in the build system?

I have been pondering the current L1/L2 values (in this context we mean Page Directory entries and Page Table entries, not cache levels) for cache, protection, the C/B/TEX bits (and now the global, secure, and no-execute bits) and masks for a while. Some of these values are variables and some are defines. It can be confusing to track down a value for an ARCH/board.

ARM kernels are very specific to processor and board. IMO, we should be moving some of these settings into processor [and board] directories/files and make them more consistent.

I am not saying anything bad about NetBSD, which originated these files, nor about the additions that have been made since. What I am saying is that it would be nice to have a big reorganization; that takes time and therefore money.

-- on a tangent about the future --

Since the ARMv7 is coming to FreeBSD, there are other ARMv4/5 vs. ARMv6/7 questions. The most important is: should we break the new ARM chips with their physically tagged caches out into another subbranch, or define them into the existing code? One example of the existing pmap code that does not mesh well with ARMv6/7 is the existing flush of the level 2 cache (because the old archs have VIVT level 2 caches). ARMv6/7 level 2 caches are PIPT, and would not be flushed until DMA time. A simple solution would be: if an arch needs to flush the level 2 cache when it flushes the level 1 cache, then it should do so in the level 1 cache flushing routine.

--Mark Tinguely.
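
The "simple solution" in the last paragraph can be sketched in plain C as follows. Everything here is hypothetical: struct cache_ops, the *_wbinv_all names and the counters are stand-ins for illustration, not the real cpufunc entries, and the counters merely replace actual cache maintenance instructions.

```c
/* Counters stand in for real cache maintenance operations. */
static int l1_flushes;
static int l2_flushes;

static void l1_wbinv_all(void) { l1_flushes++; }
static void l2_wbinv_all(void) { l2_flushes++; }

/* Core whose L2 must follow every L1 flush: the L1 routine chains to L2. */
static void
chained_dcache_wbinv_all(void)
{
	l1_wbinv_all();
	l2_wbinv_all();
}

/* Core whose L2 is only maintained at DMA time: the L1 routine stops at L1. */
static void
l1_only_dcache_wbinv_all(void)
{
	l1_wbinv_all();
}

/* Hypothetical per-CPU operations table. */
struct cache_ops {
	void (*dcache_wbinv_all)(void);
};

static const struct cache_ops chained_ops = { chained_dcache_wbinv_all };
static const struct cache_ops l1_only_ops = { l1_only_dcache_wbinv_all };

/* Generic caller: it never mentions L2, the per-CPU routine decides. */
static void
pmap_like_flush(const struct cache_ops *ops)
{
	ops->dcache_wbinv_all();
}
```

The point of the arrangement is that the generic code carries no knowledge of which cache levels need flushing together; that knowledge lives in the per-CPU routine.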

Re: Adding members to struct cpu_functions

On 2009-10-08, at 18:13, Mark Tinguely wrote:

> -- on a tangent about the future --
> Since the ARMv7 is coming to FreeBSD, there are other ARMv4/5 vs.
> ARMv6/7 questions, the most important is should we break the new ARM
> chips with their physical tagged caches to another subbranch or define
> it into the existing code? One example of the existing pmap code that
> does not mesh well with ARMv6/7 is the existing flush of the level 2
> cache (because the old archs have VIVT level 2 caches). ARMv6/7 level
> 2 caches are PIPT, and would not be flushed until DMA time. A simple
> solution would be if an arch needs to flush the level 2 cache when it
> flushes the level 1 cache, then it should do so in the level 1 cache
> flushing routine.

I was wondering whether a separate pmap module for ARMv6-7 would not be the best approach. After all, v6-7 should be considered an entirely new architecture variation, and we would avoid the very likely #ifdef hell of a single pmap.c.

Rafal

Re: Adding members to struct cpu_functions

On Mon, 12 Oct 2009 13:15:41 +0200
Rafal Jaworowski <raj@...> mentioned:

> On 2009-10-08, at 18:13, Mark Tinguely wrote:
>
>> -- on a tangent about the future --
>> Since the ARMv7 is coming to FreeBSD, there are other ARMv4/5 vs.
>> ARMv6/7 questions, the most important is should we break the new ARM
>> chips with their physical tagged caches to another subbranch or define
>> it into the existing code? One example of the existing pmap code that
>> does not mesh well with ARMv6/7 is the existing flush of the level 2
>> cache (because the old archs have VIVT level 2 caches). ARMv6/7 level
>> 2 caches are PIPT, and would not be flushed until DMA time. A simple
>> solution would be if an arch needs to flush the level 2 cache when it
>> flushes the level 1 cache, then it should do so in the level 1 cache
>> flushing routine.
>
> I was wondering whether a separate pmap module for ARMv6-7 would not
> be the best approach. After all v6-7 should be considered an entirely
> new architecture variation, and we would avoid the very likely #ifdefs
> hell in case of a single pmap.c.

Yeah, I think that would be the best solution. We could conditionally select the right pmap.c file based on the target CPU selected (just like we do for board variations for at91/marvell).

--
Stanislav Sedov
ST4096-RIPE

Re: Adding members to struct cpu_functions

On Mon, Oct 12, 2009 at 1:36 PM, Stanislav Sedov <stas@...> wrote:
> On Mon, 12 Oct 2009 13:15:41 +0200
> Rafal Jaworowski <raj@...> mentioned:
>
>> On 2009-10-08, at 18:13, Mark Tinguely wrote:
>>
>>> -- on a tangent about the future --
>>> Since the ARMv7 is coming to FreeBSD, there are other ARMv4/5 vs.
>>> ARMv6/7 questions, the most important is should we break the new ARM
>>> chips with their physical tagged caches to another subbranch or define
>>> it into the existing code? One example of the existing pmap code that
>>> does not mesh well with ARMv6/7 is the existing flush of the level 2
>>> cache (because the old archs have VIVT level 2 caches). ARMv6/7 level
>>> 2 caches are PIPT, and would not be flushed until DMA time. A simple
>>> solution would be if an arch needs to flush the level 2 cache when it
>>> flushes the level 1 cache, then it should do so in the level 1 cache
>>> flushing routine.
>>
>> I was wondering whether a separate pmap module for ARMv6-7 would not
>> be the best approach. After all v6-7 should be considered an entirely
>> new architecture variation, and we would avoid the very likely #ifdefs
>> hell in case of a single pmap.c.
>
> Yeah, I think that would be the best solution. We could conditionally
> select the right pmap.c file based on the target CPU selected (just
> like we do for board variations for at91/marvell).

pmap.c is a very large file that seems to change very often. I fear having several versions is going to be difficult to maintain. Granted, I haven't read the whole file line after line. Yet it seems to me its content can be abstracted to rely on arch-specific functions that would be found in cpufuncs instead of hardcoded macros. Is there something fundamentally wrong with enhancing struct cpufunc in order to let the portmeisters decide what the MMU and caching bits should look like? This is a blocking issue for me, since it looks like the omap has some problem with backward compatibility mode. Without fixing up the TLBs in my initarm function, it doesn't work.

Speaking of #ifdef hell, why not break cpufuncs.c into several cpufuncs_<myarch>.c? That would be a good way to start that reorganization Mark has been talking about in his email.

Guillaume

Re: Adding members to struct cpu_functions

Guillaume Ballet wrote:
> On Mon, Oct 12, 2009 at 1:36 PM, Stanislav Sedov <stas@...> wrote:
>> On Mon, 12 Oct 2009 13:15:41 +0200
>> Rafal Jaworowski <raj@...> mentioned:
>>
>>> On 2009-10-08, at 18:13, Mark Tinguely wrote:
>>>
>>>> -- on a tangent about the future --
>>>> Since the ARMv7 is coming to FreeBSD, there are other ARMv4/5 vs.
>>>> ARMv6/7 questions, the most important is should we break the new ARM
>>>> chips with their physical tagged caches to another subbranch or define
>>>> it into the existing code? One example of the existing pmap code that
>>>> does not mesh well with ARMv6/7 is the existing flush of the level 2
>>>> cache (because the old archs have VIVT level 2 caches). ARMv6/7 level
>>>> 2 caches are PIPT, and would not be flushed until DMA time. A simple
>>>> solution would be if an arch needs to flush the level 2 cache when it
>>>> flushes the level 1 cache, then it should do so in the level 1 cache
>>>> flushing routine.
>>>
>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>> be the best approach. After all v6-7 should be considered an entirely
>>> new architecture variation, and we would avoid the very likely #ifdefs
>>> hell in case of a single pmap.c.
>>
>> Yeah, I think that would be the best solution. We could conditionally
>> select the right pmap.c file based on the target CPU selected (just
>> like we do for board variations for at91/marvell).
>
> pmap.c is a very large file that seems to change very often. I fear
> having several versions is going to be difficult to maintain. Granted,
> I haven't read the whole file line after line. Yet it seems to me its
> content can be abstracted to rely on arch-specific functions that
> would be found in cpufuncs instead of hardcoded macros. Is there
> something fundamentally wrong with enhancing struct cpufunc in order
> to let the portmeisters decide what the MMU and caching bits should
> look like? This is a blocking issue for me, since it looks like the
> omap has some problem with backward compatibility mode. Without fixing
> up the TLBs in my initarm function, it doesn't work.
>
> Speaking of #ifdef hell, why not break cpufuncs.c into several
> cpufuncs_<myarch>.c? That would be a good way to start that
> reorganization Mark has been talking about in his email.

One thing that might be worth looking at while thinking about this is how this is done on PowerPC. We have run-time selectable PMAP modules using KOBJ to handle CPUs with different MMU designs, as well as a platform module scheme, again using KOBJ, to pick the appropriate PMAP for the board as well as determine the physical memory layout and such things. One of the nice things about the approach is that it is easy to subclass if you have a new, marginally different, design, and it avoids #ifdef hell as well as letting you build a GENERIC kernel with support for multiple MMU designs and board types (the last less of a concern on ARM, though).

-Nathan
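
To make the run-time selection and subclassing idea concrete, here is a plain C analogy. It deliberately does not use the KOBJ API: struct mmu_class, mmu_install and the *_sketch functions are hypothetical names, and the base-class fallback is a simplified stand-in for what an object system would provide.

```c
#include <stddef.h>

/* Hypothetical method table; a NULL slot means "inherit from base". */
struct mmu_class {
	const struct mmu_class *base;
	int (*enter)(unsigned long va, unsigned long pa);
	int (*remove)(unsigned long va);
};

static int base_enter(unsigned long va, unsigned long pa)
{ (void)va; (void)pa; return 1; }
static int base_remove(unsigned long va)
{ (void)va; return 10; }
static int v6_enter(unsigned long va, unsigned long pa)
{ (void)va; (void)pa; return 2; }

/* A "marginally different" design overrides enter and inherits remove. */
static const struct mmu_class mmu_base = { NULL, base_enter, base_remove };
static const struct mmu_class mmu_v6 = { &mmu_base, v6_enter, NULL };

/* Chosen once at boot, e.g. after probing the CPU. */
static const struct mmu_class *installed_mmu = &mmu_base;

void
mmu_install(const struct mmu_class *cls)
{
	installed_mmu = cls;
}

/* Dispatch: walk up the class chain until a class implements the method. */
int
pmap_enter_sketch(unsigned long va, unsigned long pa)
{
	const struct mmu_class *c;

	for (c = installed_mmu; c != NULL; c = c->base)
		if (c->enter != NULL)
			return c->enter(va, pa);
	return -1;
}

int
pmap_remove_sketch(unsigned long va)
{
	const struct mmu_class *c;

	for (c = installed_mmu; c != NULL; c = c->base)
		if (c->remove != NULL)
			return c->remove(va);
	return -1;
}
```

Because the class is picked at run time, one binary can carry several MMU implementations; the cost is that every call goes through at least one pointer lookup, which is the overhead discussed in the replies that follow.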

Re: Adding members to struct cpu_functions

On 2009-10-12, at 15:21, Nathan Whitehorn wrote:

>>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>>> be the best approach. After all v6-7 should be considered an entirely
>>>> new architecture variation, and we would avoid the very likely #ifdefs
>>>> hell in case of a single pmap.c.
>>>
>>> Yeah, I think that would be the best solution. We could conditionally
>>> select the right pmap.c file based on the target CPU selected (just
>>> like we do for board variations for at91/marvell).
>>
>> pmap.c is a very large file that seems to change very often. I fear
>> having several versions is going to be difficult to maintain. Granted,
>> I haven't read the whole file line after line. Yet it seems to me its
>> content can be abstracted to rely on arch-specific functions that
>> would be found in cpufuncs instead of hardcoded macros. Is there
>> something fundamentally wrong with enhancing struct cpufunc in order
>> to let the portmeisters decide what the MMU and caching bits should
>> look like? This is a blocking issue for me, since it looks like the
>> omap has some problem with backward compatibility mode. Without fixing
>> up the TLBs in my initarm function, it doesn't work.
>>
>> Speaking of #ifdef hell, why not break cpufuncs.c into several
>> cpufuncs_<myarch>.c? That would be a good way to start that
>> reorganization Mark has been talking about in his email.
>
> One thing that might be worth looking at while thinking about this is
> how this is done on PowerPC. We have run-time selectable PMAP modules
> using KOBJ to handle CPUs with different MMU designs, as well as a
> platform module scheme, again using KOBJ, to pick the appropriate
> PMAP for the board as well as determine the physical memory layout
> and such things. One of the nice things about the approach is that it
> is easy to subclass if you have a new, marginally different, design,
> and it avoids #ifdef hell as well as letting you build a GENERIC
> kernel with support for multiple MMU designs and board types (the
> last less of a concern on ARM, though).

What always concerned me was the performance cost this imposes, and it would be a really useful exercise to measure the actual impact of the KOBJ-tized pmap we have in PowerPC; with an often-called interface like pmap, it might turn out that the penalty is not that small.

Rafal

Re: Adding members to struct cpu_functions

On Mon, Oct 12, 2009 at 5:07 PM, Rafal Jaworowski <raj@...> wrote:
> On 2009-10-12, at 15:21, Nathan Whitehorn wrote:
>
>>>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>>>> be the best approach. After all v6-7 should be considered an entirely
>>>>> new architecture variation, and we would avoid the very likely #ifdefs
>>>>> hell in case of a single pmap.c.
>>>>
>>>> Yeah, I think that would be the best solution. We could conditionally
>>>> select the right pmap.c file based on the target CPU selected (just
>>>> like we do for board variations for at91/marvell).
>>>
>>> pmap.c is a very large file that seems to change very often. I fear
>>> having several versions is going to be difficult to maintain. Granted,
>>> I haven't read the whole file line after line. Yet it seems to me its
>>> content can be abstracted to rely on arch-specific functions that
>>> would be found in cpufuncs instead of hardcoded macros. Is there
>>> something fundamentally wrong with enhancing struct cpufunc in order
>>> to let the portmeisters decide what the MMU and caching bits should
>>> look like? This is a blocking issue for me, since it looks like the
>>> omap has some problem with backward compatibility mode. Without fixing
>>> up the TLBs in my initarm function, it doesn't work.
>>>
>>> Speaking of #ifdef hell, why not break cpufuncs.c into several
>>> cpufuncs_<myarch>.c? That would be a good way to start that
>>> reorganization Mark has been talking about in his email.
>>
>> One thing that might be worth looking at while thinking about this is
>> how this is done on PowerPC. We have run-time selectable PMAP modules
>> using KOBJ to handle CPUs with different MMU designs, as well as a
>> platform module scheme, again using KOBJ, to pick the appropriate
>> PMAP for the board as well as determine the physical memory layout
>> and such things. One of the nice things about the approach is that it
>> is easy to subclass if you have a new, marginally different, design,
>> and it avoids #ifdef hell as well as letting you build a GENERIC
>> kernel with support for multiple MMU designs and board types (the
>> last less of a concern on ARM, though).
>
> What always concerned me was the performance cost this imposes, and it
> would be a really useful exercise to measure what is the actual impact
> of KOBJ-tized pmap we have in PowerPC; with an often-called interface
> like pmap it might turn out that the penalty is not that small.
>
> Rafal

Good point. Using KOBJs this way is really cool, but the overhead is going to be a concern if it is used by an application that allocates memory very often. This is not the case for most embedded appliances I have worked with; still, one should not assume anything about the userland at kernel level.

As a result, extending the struct cpu_functions is not a good thing either, for the same reason. The compiler can not inline a call through a function pointer.

In which case, why not create a bunch of header files with the pattern cpufunc_myarch.h, in which all functions would be declared inline? Something like:

    static inline l2_l_entry(vm_addr_t pa, int prot, int cache);
    static inline l2_s_entry(vm_addr_t pa, int prot, int cache);
    ...

which would then be included by pmap.c and friends.

One problem is that such a change affects all platforms at the same time, and therefore requires all portmeisters to implement the functions that are needed. That should not be too difficult, though, because so far it was the same macros that were used by all platforms. Another problem is that it requires some build script magic to make sure the correct header is included depending on the arch. I wonder if this is easy?

Guillaume
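
A rough sketch of the per-arch inline header idea, collapsed into one file so that it compiles on its own. The selector DEMO_CPU_ARMV6, the pt_entry_t typedef, the return types and every bit value are assumptions for illustration; in the proposal above the two branches would live in separate cpufunc_myarch.h headers and the build system, not a macro, would choose between them.

```c
#include <stdint.h>

typedef uint32_t pt_entry_t;	/* hypothetical 32-bit page table entry */

#ifndef DEMO_CPU_ARMV6
#define DEMO_CPU_ARMV6 1	/* stand-in for the build system's choice */
#endif

#if DEMO_CPU_ARMV6
/* Would live in a header such as cpufunc_armv6.h (name assumed). */
static inline pt_entry_t
l2_l_entry(uint32_t pa, pt_entry_t prot, pt_entry_t cache)
{
	return pa | prot | cache | 0x1u;	/* made-up type bits */
}
#else
/* Would live in the header for an older core. */
static inline pt_entry_t
l2_l_entry(uint32_t pa, pt_entry_t prot, pt_entry_t cache)
{
	return pa | prot | cache | 0x5u;	/* made-up type bits */
}
#endif

/* Generic code calls one name; the compiler sees the body and can inline it. */
static inline void
map_large_page(pt_entry_t *pte, uint32_t pa, pt_entry_t prot,
    pt_entry_t cache)
{
	int i;

	/* As in the pmap_map_chunk excerpt, one large page takes 16 slots. */
	for (i = 0; i < 16; i++)
		pte[i] = l2_l_entry(pa, prot, cache);
}
```

Unlike a call through a function pointer, the body of l2_l_entry is visible at the call site, so the per-arch choice costs nothing at run time; the trade-off is that one kernel binary then supports only one entry format.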

Re: Adding members to struct cpu_functions

Rafal Jaworowski wrote:
> On 2009-10-12, at 15:21, Nathan Whitehorn wrote:
>
>>>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>>>> be the best approach. After all v6-7 should be considered an entirely
>>>>> new architecture variation, and we would avoid the very likely #ifdefs
>>>>> hell in case of a single pmap.c.
>>>>
>>>> Yeah, I think that would be the best solution. We could conditionally
>>>> select the right pmap.c file based on the target CPU selected (just
>>>> like we do for board variations for at91/marvell).
>>>
>>> pmap.c is a very large file that seems to change very often. I fear
>>> having several versions is going to be difficult to maintain. Granted,
>>> I haven't read the whole file line after line. Yet it seems to me its
>>> content can be abstracted to rely on arch-specific functions that
>>> would be found in cpufuncs instead of hardcoded macros. Is there
>>> something fundamentally wrong with enhancing struct cpufunc in order
>>> to let the portmeisters decide what the MMU and caching bits should
>>> look like? This is a blocking issue for me, since it looks like the
>>> omap has some problem with backward compatibility mode. Without fixing
>>> up the TLBs in my initarm function, it doesn't work.
>>>
>>> Speaking of #ifdef hell, why not break cpufuncs.c into several
>>> cpufuncs_<myarch>.c? That would be a good way to start that
>>> reorganization Mark has been talking about in his email.
>>
>> One thing that might be worth looking at while thinking about this is
>> how this is done on PowerPC. We have run-time selectable PMAP modules
>> using KOBJ to handle CPUs with different MMU designs, as well as a
>> platform module scheme, again using KOBJ, to pick the appropriate
>> PMAP for the board as well as determine the physical memory layout
>> and such things. One of the nice things about the approach is that it
>> is easy to subclass if you have a new, marginally different, design,
>> and it avoids #ifdef hell as well as letting you build a GENERIC
>> kernel with support for multiple MMU designs and board types (the
>> last less of a concern on ARM, though).
>
> What always concerned me was the performance cost this imposes, and it
> would be a really useful exercise to measure what is the actual impact
> of KOBJ-tized pmap we have in PowerPC; with an often-called interface
> like pmap it might turn out that the penalty is not that small.

Using the KOBJ cache means that it is only marginally more expensive than a standard function pointer call. There's a 9-year-old note in the commit log for sys/sys/kobj.h that it takes about 30% longer to call a function that does nothing via KOBJ versus a direct call on a 300 MHz P2 (a 10 ns time difference). Given that, and that pmap methods do, in fact, do things besides get called and immediately return, I suspect non-KOBJ related execution time will dwarf any time loss from the indirection. I'll try to repeat the measurement in the next few days, however, since this is important to know.

-Nathan

Re: Adding members to struct cpu_functions

> As a result, extending the struct cpu_functions is not a good thing
> either, for the same reason. The compiler can not inline a call
> through a function pointer.
>
> In which case, why not create a bunch of headers files with the
> pattern cpufunc_myarch.h, in which all functions would be declared
> inline? Something like:
>
> static inline l2_l_entry(vm_addr_t pa, int prot, int cache);
> static inline l2_s_entry(vm_addr_t pa, int prot, int cache);
> ...
> which would then be included by pmap.c and friends.

I think they need to be regular function calls because assembly routines call the per-cpu functions. A few simple macros would save the branch to NOP functions.

> One problem is that such a change affects all platforms at the same
> time, and therefore requires all portmeisters to implement the
> functions that are needed. That should not be too difficult, though,
> because so far it was the same macros that were used by all platforms.
> Another problem is that it requires some build script magic to make
> sure the correct header is included depending on the arch. I wonder if
> this is easy?

--Mark Tinguely
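
One way to read the remark about macros saving the branch to NOP functions is sketched below. The names demo_drain_writebuf, demo_real_drain_writebuf and DEMO_HAS_WRITE_BUFFER are invented for this illustration and the counter stands in for a real hardware operation; this is an interpretation of the suggestion, not code from the tree.

```c
/* Counter stands in for a real hardware operation. */
int drain_calls;

/* The out-of-line routine stays available for cores that need it. */
void
demo_real_drain_writebuf(void)
{
	drain_calls++;
}

/*
 * On cores where the operation is a no-op, the macro expands to nothing,
 * so C callers pay neither a call nor a return.
 */
#ifdef DEMO_HAS_WRITE_BUFFER
#define demo_drain_writebuf()	demo_real_drain_writebuf()
#else
#define demo_drain_writebuf()	((void)0)
#endif

/* A caller written once, for every core. */
int
demo_sync_sequence(void)
{
	demo_drain_writebuf();
	return drain_calls;
}
```

Assembly callers cannot use a C macro like this, which matches the point that routines reached from assembly still need to exist as regular functions.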

Re: Adding members to struct cpu_functions

On Mon, Oct 12, 2009 at 11:29 PM, Mark Tinguely <tinguely@...> wrote:
>> As a result, extending the struct cpu_functions is not a good thing
>> either, for the same reason. The compiler can not inline a call
>> through a function pointer.
>>
>> In which case, why not create a bunch of headers files with the
>> pattern cpufunc_myarch.h, in which all functions would be declared
>> inline? Something like:
>>
>> static inline l2_l_entry(vm_addr_t pa, int prot, int cache);
>> static inline l2_s_entry(vm_addr_t pa, int prot, int cache);
>> ...
>> which would then be included by pmap.c and friends.
>
> I think they need to be regular function calls because assembly routines
> call the per-cpu functions. A few simple macros would save the branch to
> NOP functions.

I'm not sure what you mean by that: would macros be ok, in your opinion? I am a bit puzzled because I see a contradiction with the previous sentence that requires the functions to be callable from the assembly code. Obviously I am misinterpreting, so would you mind clarifying, please?

I think it is important to notice that even though cache management relies a lot on assembly functions, I haven't found any page table management done in assembly past locore.S. I think using macros for page table management functions can be done. For cache management, however, I agree that having different pmap.c files is probably the way to go. In both cases, I am still curious to see what Nathan will come up with.

I took a more thorough look at pmap, and there is indeed lots of machine-specific code, especially at the beginning. And when it comes to cpufunc, it's all about #ifdefs. Since I'm still working on the cleanup for the beagleboard, I will declare cpufuncs in an armv6-specific file. Let's call it cpufunc_armv6.c. I am struggling with another MMU problem at the moment, but I'll try to come up asap with a patch for pmap.c. It will replace hardcoded values with machine-defined macros, for reference.

Guillaume

Re: Adding members to struct cpu_functions

> >> As a result, extending the struct cpu_functions is not a good thing
> >> either, for the same reason. The compiler can not inline a call
> >> through a function pointer.
> >>
> >> In which case, why not create a bunch of headers files with the
> >> pattern cpufunc_myarch.h, in which all functions would be declared
> >> inline? Something like:
> >>
> >> static inline l2_l_entry(vm_addr_t pa, int prot, int cache);
> >> static inline l2_s_entry(vm_addr_t pa, int prot, int cache);
> >> ...
> >> which would then be included by pmap.c and friends.
> >
> > I think they need to be regular function calls because assembly routines
> > call the per-cpu functions. A few simple macros would save the branch to
> > NOP functions.
>
> I'm not sure what you mean by that: would macros be ok, in your
> opinion? I am a bit puzzled because I see a contradiction with the
> previous sentence that requires the functions to be callable from the
> assembly code. Obviously I am misinterpreting, so would you mind
> clarifying, please?
>
> I think it is important to notice that even though cache management
> relies a lot on assembly function, I haven't found any page table
> management done in assembly past locore.S. I think using macros for
> page table management functions can be done. For cache management,
> however, I agree that having different pmap.c files is probably the
> way to go. In both cases, I am still curious to see what Nathan will
> come up with.

You are correct, the page table routines are pmap.c oriented. I extended the clean-up thought to all the cpu-specific functions. There are cpu-specific functions that are NOPs that we branch to and back again. I was just throwing out a global re-organization thought.

> I took a more thorough look at pmap, and there is indeed lots of
> machine-specific code, especially at the beginning. And when it comes
> to cpufunc, it's all about #ifdefs. Since I'm still working on the
> cleanup for the beagleboard, I will declare cpufuncs in an
> armv6-specific file. Let's call it cpufunc_armv6.c. I am struggling
> with another MMU problem at the moment, but I'll try to come up asap
> with a patch for pmap.c. It will replace hardcoded values with
> machine-defined macros, for reference.

I think you are running that processor in v5 mode. There are still some individuals looking at a cache problem with recent code. I still believe we need to add the PVF_REF flag when adding the new unmanaged (PVF_UNMAN) pv_entry, so pmap_fix_cache() will clean (write back) the cache and remove the TLB entry. That, and the changes to remove dangling allocations.

--Mark.
|
|
Re: Adding members to struct cpu_functions

Nathan Whitehorn wrote:
> Rafal Jaworowski wrote:
>>
>> On 2009-10-12, at 15:21, Nathan Whitehorn wrote:
>>
>>>>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>>>>> be the best approach. After all v6-7 should be considered an entirely
>>>>>> new architecture variation, and we would avoid the very likely #ifdefs
>>>>>> hell in case of a single pmap.c.
>>>>>>
>>>>> Yeah, I think that would be the best solution. We could conditionally
>>>>> select the right pmap.c file based on the target CPU selected (just
>>>>> like we do for board variations for at91/marvell).
>>>>>
>>>> pmap.c is a very large file that seems to change very often. I fear
>>>> having several versions is going to be difficult to maintain. Granted,
>>>> I haven't read the whole file line after line. Yet it seems to me its
>>>> content can be abstracted to rely on arch-specific functions that
>>>> would be found in cpufuncs instead of hardcoded macros. Is there
>>>> something fundamentally wrong with enhancing struct cpufunc in order
>>>> to let the portmeisters decide what the MMU and caching bits should
>>>> look like? This is a blocking issue for me, since it looks like the
>>>> omap has some problem with backward compatibility mode. Without fixing
>>>> up the TLBs in my initarm function, it doesn't work.
>>>>
>>>> Speaking of #ifdef hell, why not breaking cpufuncs.c into several
>>>> cpufuncs_<myarch>.c? That would be a good way to start that
>>>> reorganization Mark has been talking about in his email.
>>>>
>>> One thing that might be worth looking at while thinking about this
>>> is how this is done on PowerPC. We have run-time selectable PMAP
>>> modules using KOBJ to handle CPUs with different MMU designs, as
>>> well as a platform module scheme, again using KOBJ, to pick the
>>> appropriate PMAP for the board as well as determine the physical
>>> memory layout and such things. One of the nice things about the
>>> approach is that it is easy to subclass if you have a new,
>>> marginally different, design, and it avoids #ifdef hell as well as
>>> letting you build a GENERIC kernel with support for multiple MMU
>>> designs and board types (the last less of a concern on ARM, though).
>>
>> What always concerned me was the performance cost this imposes, and
>> it would be a really useful exercise to measure what is the actual
>> impact of KOBJ-tized pmap we have in PowerPC; with an often-called
>> interface like pmap it might occur the penalty is not that little..
> Using the KOBJ cache means that it is only marginally more expensive
> than a standard function pointer call. There's a 9-year-old note in
> the commit log for sys/sys/kobj.h that it takes about 30% longer to
> call a function that does nothing via KOBJ versus a direct call on a
> 300 MHz P2 (a 10 ns time difference). Given that and that pmap methods
> do, in fact, do things besides get called and immediately return, I
> suspect non-KOBJ related execution time will dwarf any time loss from
> the indirection. I'll try to repeat the measurement in the next few
> days, however, since this is important to know.
> -Nathan

I just did the measurements on a 1.8 GHz PowerPC G5. There were four
tests, each repeated 1 million times. "Load and store" involves
incrementing a volatile int from 0 to 1e6 inline. "Direct calls"
involves a branch to a function that returns 0 and does nothing else.
"Function ptr" calls the same function via a pointer stored in a
register, and "KOBJ calls" calls it via KOBJ. Here are the results
(errors are +/- 0.5 ns for the function call measurements due to
compiler optimization jitter, and 0 for load and store, since that
takes a deterministic number of clock cycles):

32-bit kernel:
Load and store: 26.1 ns
Direct calls:    7.2 ns
Function ptr:    8.4 ns
KOBJ calls:     17.8 ns

64-bit kernel:
Load and store:  9.2 ns
Direct calls:    6.1 ns
Function ptr:    8.3 ns
KOBJ calls:     40.5 ns

ABI changes make a large difference, as you can see. The cost of
calling via KOBJ is non-negligible, but small, especially compared to
the cost of doing anything involving memory. I don't know how this
changes with ARM calling conventions.
-Nathan
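For anyone wanting to try something similar outside the kernel, here is a minimal userland sketch of the four tests described above. It is not the code used for the measurements in this thread, which was not posted. The cached method lookup is only a hypothetical stand-in for KOBJ dispatch: `struct object`, `struct method_desc`, `slow_lookup()` and `dispatch()` are names invented for illustration and are not the real kobj types or API. Timing uses standard C `clock()`, so resolution is coarse and the numbers will depend heavily on compiler flags.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define ITERATIONS 1000000L  /* 1 million repetitions, as in the post */
#define CACHE_SIZE 256       /* assumed cache size, illustration only */

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

typedef int (*method_fn)(void);

/* Hypothetical descriptor, cache entry and object; NOT the kobj types. */
struct method_desc { unsigned id; };
struct cache_entry { const struct method_desc *desc; method_fn fn; };
struct object { struct cache_entry cache[CACHE_SIZE]; };

/* Function under test: returns 0 and does nothing else. */
NOINLINE int null_fn(void)
{
#if defined(__GNUC__)
    __asm__ volatile ("");  /* stops the optimiser from deleting the calls */
#endif
    return 0;
}

/* Stand-in for a full method-table search on a cache miss. */
static method_fn slow_lookup(const struct method_desc *desc)
{
    (void)desc;
    return null_fn;
}

/* Look the method up in a small per-object cache, then call it. */
NOINLINE int dispatch(struct object *obj, const struct method_desc *desc)
{
    struct cache_entry *ce = &obj->cache[desc->id & (CACHE_SIZE - 1)];

    if (ce->desc != desc) {  /* miss: refill this cache slot */
        ce->desc = desc;
        ce->fn = slow_lookup(desc);
    }
    return ce->fn();
}

static double ns_per_iter(clock_t start, clock_t end)
{
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / ITERATIONS;
}

/* Runs the four loops and prints ns per iteration; returns 0 on success. */
int run_benchmarks(void)
{
    static struct object obj;  /* static storage: cache starts zeroed */
    static const struct method_desc desc = { 42 };
    method_fn volatile fp = null_fn;  /* volatile: reloaded on every call */
    volatile int counter = 0;
    int sum = 0;
    clock_t t0, t1;
    long i;

    t0 = clock();
    for (i = 0; i < ITERATIONS; i++)
        counter++;
    t1 = clock();
    printf("Load and store: %.1f ns\n", ns_per_iter(t0, t1));

    t0 = clock();
    for (i = 0; i < ITERATIONS; i++)
        sum += null_fn();
    t1 = clock();
    printf("Direct calls:   %.1f ns\n", ns_per_iter(t0, t1));

    t0 = clock();
    for (i = 0; i < ITERATIONS; i++)
        sum += fp();
    t1 = clock();
    printf("Function ptr:   %.1f ns\n", ns_per_iter(t0, t1));

    t0 = clock();
    for (i = 0; i < ITERATIONS; i++)
        sum += dispatch(&obj, &desc);
    t1 = clock();
    printf("Cached lookup:  %.1f ns\n", ns_per_iter(t0, t1));

    assert(counter == ITERATIONS);
    return sum;
}
```

Calling `run_benchmarks()` from a `main()` prints one line per test. Note that the function pointer here is declared volatile and therefore reloaded from memory on each call, whereas the post describes a pointer held in a register, so the two are not directly comparable. A userland loop like this also says nothing about in-kernel behaviour or about the real KOBJ lookup path.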
Re: Adding members to struct cpu_functions

On 2009-10-18, at 17:49, Nathan Whitehorn wrote:

>>>> One thing that might be worth looking at while thinking about
>>>> this is how this is done on PowerPC. We have run-time selectable
>>>> PMAP modules using KOBJ to handle CPUs with different MMU
>>>> designs, as well as a platform module scheme, again using KOBJ,
>>>> to pick the appropriate PMAP for the board as well as determine
>>>> the physical memory layout and such things. One of the nice
>>>> things about the approach is that it is easy to subclass if you
>>>> have a new, marginally different, design, and it avoids #ifdef
>>>> hell as well as letting you build a GENERIC kernel with support
>>>> for multiple MMU designs and board types (the last less of a
>>>> concern on ARM, though).
>>>
>>> What always concerned me was the performance cost this imposes,
>>> and it would be a really useful exercise to measure what is the
>>> actual impact of KOBJ-tized pmap we have in PowerPC; with an
>>> often-called interface like pmap it might occur the penalty is
>>> not that little..
>> Using the KOBJ cache means that it is only marginally more
>> expensive than a standard function pointer call. There's a
>> 9-year-old note in the commit log for sys/sys/kobj.h that it takes
>> about 30% longer to call a function that does nothing via KOBJ
>> versus a direct call on a 300 MHz P2 (a 10 ns time difference).
>> Given that and that pmap methods do, in fact, do things besides get
>> called and immediately return, I suspect non-KOBJ related execution
>> time will dwarf any time loss from the indirection. I'll try to
>> repeat the measurement in the next few days, however, since this is
>> important to know.
>> -Nathan
> I just did the measurements on a 1.8 GHz PowerPC G5. There were four
> tests, each repeated 1 million times. "Load and store" involves
> incrementing a volatile int from 0 to 1e6 inline. "Direct calls"
> involves a branch to a function that returns 0 and does nothing
> else. "Function ptr" calls the same function via a pointer stored in
> a register, and "KOBJ calls" calls it via KOBJ. Here are the results
> (errors are +/- 0.5 ns for the function call measurements due to
> compiler optimization jitter, and 0 for load and store, since that
> takes a deterministic number of clock cycles):
>
> 32-bit kernel:
> Load and store: 26.1 ns
> Direct calls: 7.2 ns
> Function ptr: 8.4 ns
> KOBJ calls: 17.8 ns
>
> 64-bit kernel:
> Load and store: 9.2 ns
> Direct calls: 6.1 ns
> Function ptr: 8.3 ns
> KOBJ calls: 40.5 ns
>
> ABI changes make a large difference, as you can see. The cost of
> calling via KOBJ is non-negligible, but small, especially compared
> to the cost of doing anything involving memory. I don't know how
> this changes with ARM calling conventions.

Very interesting, thanks! Could you elaborate on the testing details
and share the diagnostic code so we could repeat this with other CPU
variations like Book-E PowerPC, or ARM?

Rafal
Re: Adding members to struct cpu_functions

In message: <05B19969-B238-4E3A-8326-624067F0362B@...>
            Rafal Jaworowski <raj@...> writes:
: On 2009-10-18, at 17:49, Nathan Whitehorn wrote:
[[ trimmed ]]
: > I just did the measurements on a 1.8 GHz PowerPC G5. There were four
: > tests, each repeated 1 million times. "Load and store" involves
: > incrementing a volatile int from 0 to 1e6 inline. "Direct calls"
: > involves a branch to a function that returns 0 and does nothing
: > else. "Function ptr" calls the same function via a pointer stored in
: > a register, and "KOBJ calls" calls it via KOBJ. Here are the results
: > (errors are +/- 0.5 ns for the function call measurements due to
: > compiler optimization jitter, and 0 for load and store, since that
: > takes a deterministic number of clock cycles):
: >
: > 32-bit kernel:
: > Load and store: 26.1 ns
: > Direct calls: 7.2 ns
: > Function ptr: 8.4 ns
: > KOBJ calls: 17.8 ns
: >
: > 64-bit kernel:
: > Load and store: 9.2 ns
: > Direct calls: 6.1 ns
: > Function ptr: 8.3 ns
: > KOBJ calls: 40.5 ns
: >
: > ABI changes make a large difference, as you can see. The cost of
: > calling via KOBJ is non-negligible, but small, especially compared
: > to the cost of doing anything involving memory. I don't know how
: > this changes with ARM calling conventions.
:
: Very interesting, thanks! Could you elaborate on the testing details
: and share the diagnostic code so we could repeat this with other CPU
: variations like Book-E PowerPC, or ARM?

I'd love to see this on MIPS too... KOBJ is a big win for device
configuration, where one memory I/O can take 60 times these call
numbers...

Warner