NUMA?

Hi,

As even Intel's new CPUs have integrated memory controllers and thus become NUMA, I'm interested in what is, in theory (I'm not proposing to do it, I'm just curious), necessary to change in an OS to support NUMA. My guess is:

1) node topology detection: something similar to what ULE does, but also recording which memory ranges are "close" to which CPU and the "distance" between nodes/CPUs;
2) on new image load (exec), pick a node for it among the "least used" nodes and record the choice per-proc; on fork, keep the new process on the same node;
3) schedule threads on a CPU from the proc's node if at all possible (e.g., when a 6-core CPU is still 1 node), then on a "near" node from a list of distances sorted in order of cost;
4) allocate new pages for a proc from its node's memory range(s) if at all possible.

Is this all?

On the other hand, has anyone done a study of the performance increase for today's "consumer" NUMA systems (e.g. 2-4 socket/node x86/x64 systems)? Is it worth it?

_______________________________________________
freebsd-smp@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-smp
To unsubscribe, send any mail to "freebsd-smp-unsubscribe@..."
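A minimal C sketch of points 2 to 4, assuming the topology from point 1 has already been reduced to a per-node distance table (in the style of ACPI SLIT values, where the local distance is 10). Everything here is hypothetical and not FreeBSD code: `struct numa_topo`, `numa_pick_home()`, `numa_fallback_order()`, `numa_alloc_page_node()`, the `load` counters, and the `MAX_NODES` limit. The idea shown is that one distance-sorted node list, computed from the process's home node, can drive both CPU selection and page-allocation fallback.

```c
#include <assert.h>

/* Hypothetical limit; a real kernel would size this from firmware tables. */
#define MAX_NODES 8

/*
 * Hypothetical topology: dist[i][j] is the relative cost for a CPU on
 * node i to reach memory on node j (SLIT-style, local distance 10).
 * load[n] is some measure of how busy node n is, e.g. threads homed there.
 */
struct numa_topo {
	int nnodes;
	int dist[MAX_NODES][MAX_NODES];
	int load[MAX_NODES];
};

/*
 * Point 2: on exec, pick the least-loaded node as the process's home.
 * Ties go to the lowest-numbered node. On fork, the child would simply
 * inherit the parent's home node.
 */
int
numa_pick_home(const struct numa_topo *t)
{
	int best = 0;

	assert(t->nnodes > 0 && t->nnodes <= MAX_NODES);
	for (int n = 1; n < t->nnodes; n++)
		if (t->load[n] < t->load[best])
			best = n;
	return (best);
}

/*
 * Point 3: fill order[] with all nodes, cheapest first as seen from
 * 'home'. Stable insertion sort by distance; equal distances keep
 * lower-numbered nodes first.
 */
void
numa_fallback_order(const struct numa_topo *t, int home, int order[])
{
	assert(home >= 0 && home < t->nnodes);
	for (int i = 0; i < t->nnodes; i++) {
		int n = i, j = i;

		while (j > 0 && t->dist[home][order[j - 1]] > t->dist[home][n]) {
			order[j] = order[j - 1];
			j--;
		}
		order[j] = n;
	}
}

/*
 * Point 4: take a page from the first node in fallback order that still
 * has free pages. Returns the node used, or -1 if every node is empty.
 */
int
numa_alloc_page_node(const struct numa_topo *t, int home, int freepages[])
{
	int order[MAX_NODES];

	numa_fallback_order(t, home, order);
	for (int i = 0; i < t->nnodes; i++) {
		int n = order[i];

		if (freepages[n] > 0) {
			freepages[n]--;
			return (n);
		}
	}
	return (-1);
}
```

On a 4-node "square" with distances 10/20/30, a process homed on node 1 would try nodes 1, 0, 3, 2 in that order, so when node 1 runs out of pages the allocation spills to a one-hop neighbour rather than to the far corner.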
Re: NUMA?

Ivan Voras wrote:
> Hi,

I did the AMD course a few weeks ago, so I'm also very interested in this.

>
> As even Intel's new CPUs have integrated memory controllers and thus
> become NUMA, I'm interested in what is, in theory (I'm not proposing to
> do it, I'm just curious), necessary to change in an OS to support NUMA.
> My guess is:
>
> 1) node topology detection - something similar to what ULE does but also
> recording which memory ranges are "close" to which CPU and the
> "distance" between nodes/CPUs

At a minimum, this is needed before anything else can really work.

> 2) on new image load (exec), pick a node for it, among "least used"
> nodes and record the choice per-proc; on fork, keep the new process on
> the same node

In some cases it may be worth having multiple copies of the read-only text segments. For example, it may eventually be worth having a /bin/sh text segment in each CPU's memory space.

> 3) schedule threads on a CPU from the proc's node if at all possible
> (e.g, when a 6-core CPU is still 1 node), then on a "near" node from a
> list of distances sorted in order of cost

This is where it really starts getting hairy: when do you migrate a process? And what if there are as many runnable threads as processors?

> 4) allocate new pages for a proc from its node's memory range(s) if at
> all possible.
>
> Is this all?

There are other interesting effects too, such as assigning network interrupts to processors that have good access to both the hardware AND the destination, if you can.

>
> On the other hand, did someone do a study of performance increase for
> todays "consumer" NUMA systems (e.g. 2-4 sockets/nodes x86/x64 systems)
> - is it worth it?

Caches hide a multitude of sins.
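Julian's interrupt point can be read as a small optimization: pick the CPU whose node minimizes the combined distance to the device and to where the data ends up. The C sketch below assumes that interpretation and a simple additive cost; `pick_intr_cpu()`, `cpu_node[]`, `dev_node`, `dest_node`, and `MAX_NODES` are hypothetical names, and the distances are SLIT-style relative values, not measurements.

```c
#include <assert.h>

#define MAX_NODES 8	/* hypothetical limit */

/*
 * Hypothetical: choose the CPU to service a device's interrupts so that
 * it is close both to the device (dev_node) and to where the data is
 * consumed (dest_node). cpu_node[c] is the node CPU c belongs to.
 * Cost is the sum of the two distances; ties go to the lower CPU number.
 */
int
pick_intr_cpu(int ncpus, const int cpu_node[],
    const int dist[MAX_NODES][MAX_NODES], int dev_node, int dest_node)
{
	int best = -1, best_cost = 0;

	assert(ncpus > 0);
	for (int c = 0; c < ncpus; c++) {
		int n = cpu_node[c];
		int cost = dist[n][dev_node] + dist[n][dest_node];

		if (best < 0 || cost < best_cost) {
			best = c;
			best_cost = cost;
		}
	}
	return (best);
}
```

With three nodes in a line (0-1-2, hypothetical distances 10/16/28), a NIC on node 0 delivering to a consumer on node 2 would get a CPU on the middle node, whose cost 16 + 16 beats 10 + 28 at either end. Whether a sum is the right cost, and how this interacts with migration of the consumer, is exactly the "hairy" part Julian mentions.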
Re: NUMA?

On Thu, Nov 13, 2008 at 01:35:28AM +0100, Ivan Voras wrote:
> Hi,
>
> As even Intel's new CPUs have integrated memory controllers and thus
> become NUMA, I'm interested in what is, in theory (I'm not proposing to
> do it, I'm just curious), necessary to change in an OS to support NUMA.
> My guess is:
>
> 1) node topology detection - something similar to what ULE does but also
> recording which memory ranges are "close" to which CPU and the
> "distance" between nodes/CPUs
> 2) on new image load (exec), pick a node for it, among "least used"
> nodes and record the choice per-proc; on fork, keep the new process on
> the same node
> 3) schedule threads on a CPU from the proc's node if at all possible
> (e.g, when a 6-core CPU is still 1 node), then on a "near" node from a
> list of distances sorted in order of cost
> 4) allocate new pages for a proc from its node's memory range(s) if at
> all possible.

One good source of information on this topic is IBM's AIX on the POWER4 through POWER6 processors. It has the concept of distant vs. close memory and processors, as well as what it refers to as memory affinity.

Marc
--
Marc Wiz
marc@...
Yes, that really is my last name.
Re: NUMA?

Julian Elischer wrote:
> There are other interesting effects too..
>
> assigning network interrupts to processors that have good access to the
> hardware AND the destination if you can..

UMA also seems to be sensitive to topology. While we're at it, how do you (if at all) deal with kernel memory allocations with respect to topology? Things that have their own thread or process are easy, but AFAIK there is a lot of "thread-agnostic" code.
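One possible answer to the thread-agnostic case, sketched in C under stated assumptions: key kernel allocations on the node of whichever CPU happens to be running, and tag each item with its home node so that a free from any CPU returns it to the node whose memory backs it. This is not how UMA works; `struct item`, `struct node_pool`, `pools_init()`, `item_alloc()`, `item_free()`, `MAX_NODES`, and `POOL_ITEMS` are all hypothetical. The caller is assumed to pass in the current CPU's node, there is no locking (a real kernel would need per-node locks or per-CPU caches), and the fallback scans nodes round-robin rather than in distance order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES	4	/* hypothetical node limit */
#define POOL_ITEMS	16	/* hypothetical per-node capacity */

struct item {
	int home_node;		/* node whose memory backs this item */
	struct item *next;	/* free-list link */
	/* payload would follow */
};

struct node_pool {
	struct item *freelist;			/* per-node free list */
	struct item storage[POOL_ITEMS];	/* stands in for node-local pages */
};

static struct node_pool pools[MAX_NODES];

/* Seed each node's free list with items "backed" by that node. */
void
pools_init(int nnodes)
{
	assert(nnodes > 0 && nnodes <= MAX_NODES);
	for (int n = 0; n < nnodes; n++) {
		pools[n].freelist = NULL;
		for (int i = 0; i < POOL_ITEMS; i++) {
			struct item *it = &pools[n].storage[i];

			it->home_node = n;
			it->next = pools[n].freelist;
			pools[n].freelist = it;
		}
	}
}

/*
 * Allocate on behalf of whatever code is running on cur_node: prefer the
 * local pool, otherwise try the other nodes round-robin.
 * Returns NULL if every pool is empty.
 */
struct item *
item_alloc(int cur_node, int nnodes)
{
	assert(nnodes > 0 && nnodes <= MAX_NODES);
	assert(cur_node >= 0 && cur_node < nnodes);
	for (int k = 0; k < nnodes; k++) {
		int n = (cur_node + k) % nnodes;
		struct item *it = pools[n].freelist;

		if (it != NULL) {
			pools[n].freelist = it->next;
			it->next = NULL;
			return (it);
		}
	}
	return (NULL);
}

/*
 * Free to the item's home node, regardless of which CPU frees it, so
 * items do not drift away from the memory that backs them.
 */
void
item_free(struct item *it)
{
	assert(it->home_node >= 0 && it->home_node < MAX_NODES);
	it->next = pools[it->home_node].freelist;
	pools[it->home_node].freelist = it;
}
```

The design choice worth noting is the home-node tag: an mbuf allocated on node 1 and freed by a thread on node 0 still goes back to node 1's list, so the "thread-agnostic" code never has to know which node it is on at free time.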