Xen 3.4.1 NUMA support


Xen 3.4.1 NUMA support

by Papagiannis Anastasios

Hello,

Does the latest version of Xen (3.4.1) support NUMA machines? Is there a .pdf
or a link that can give me some more details about that? I am working on a
project on Xen performance on NUMA machines, and in Xen 3.3.0 this
performance isn't good. Has something changed in the latest version?

Thanks in advance,
Papagiannis Anastasios


_______________________________________________
Xen-devel mailing list
Xen-devel@...
http://lists.xensource.com/xen-devel

Re: Xen 3.4.1 NUMA support

by Keir Fraser

Add Xen boot parameter 'numa=on' to enable NUMA detection. Then it's up to
you to, for example, pin domains to specific nodes, using the 'cpus=...'
option in the domain config file. See /etc/xen/xmexample1 for an example of
its usage.

 -- Keir
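
A minimal sketch of the pinning described above, assuming a machine where node 1 owns physical CPUs 6-11 (a hypothetical topology). xm domain config files use Python syntax, and the 'cpus' option takes a string such as "6-11". The helper cpus_option() is not part of Xen; it only shows how such a string could be built from a list of CPU numbers. In a real config file you would normally just write the literal string.

```python
def cpus_option(cpu_ids):
    """Collapse physical CPU numbers into an xm-style range string.

    Hypothetical helper for illustration, not a Xen API.
    """
    ids = sorted(set(cpu_ids))
    if not ids:
        return ""
    ranges = []
    start = prev = ids[0]
    for cpu in ids[1:]:
        if cpu == prev + 1:
            prev = cpu
            continue
        ranges.append((start, prev))
        start = prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else "%d-%d" % (a, b)
                    for a, b in ranges)


# Assumed topology: node 1 owns physical CPUs 6..11.
node1_cpus = [6, 7, 8, 9, 10, 11]

# Lines as they might appear in a domain config file.
name = "guest1"                  # hypothetical guest name
memory = 1024                    # MiB
vcpus = 2
cpus = cpus_option(node1_cpus)   # same as writing cpus = "6-11"
```

With 'cpus' set this way, all vCPUs of the guest are restricted to the CPUs of one node.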


RE: Xen 3.4.1 NUMA support

by Dan Magenheimer

VMware has the notion of a "cell" where VMs can be
scheduled only within a cell, not across cells.
Cell boundaries are determined by VMware by
default, though certain settings can override them.

An interesting project might be to implement
"numa=cell" for Xen.... or maybe something similar
is already in George Dunlap's scheduler plans?


Re: Xen 3.4.1 NUMA support

by George Dunlap

I haven't had time to look at NUMA stuff at all.  I probably will look
at it eventually, if no one else does, but I'd be happy if someone else
could pursue it.

 -George


Re: Xen 3.4.1 NUMA support

by dulloor

George,

What's the current scope and status of your scheduler work? Is it
going to look similar to the Linux scheduler (with scheduling domains,
et al.)? In that case, topology is already accounted for, to a large
extent. It would be good to know so that I can work on something that
doesn't overlap.

-dulloor


Re: Xen 3.4.1 NUMA support

by Juergen Gross

Cpupools? :-)

NUMA was a topic I wanted to look at as soon as cpupools are officially
accepted. Keir wanted to propose a way to get rid of the function
continue_hypercall_on_cpu(), which was the cause of most of the
objections to cpupools.
I guess Keir had some higher priority jobs. :-)
So I will try a new patch for cpupools without continue_hypercall_on_cpu()
and perhaps with NUMA support.
George, would this be okay for you? I think your scheduler will still have
problems with domain weights as long as domains are restricted to some
processors, right?

Juergen


--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 636 47950
Fujitsu Technology Solutions              e-mail: juergen.gross@...
Otto-Hahn-Ring 6                        Internet: ts.fujitsu.com
D-81739 Muenchen                 Company details: ts.fujitsu.com/imprint.html


Re: Xen 3.4.1 NUMA support

by George Dunlap

On Mon, Nov 9, 2009 at 11:44 AM, Juergen Gross
<juergen.gross@...> wrote:
> George, would this be okay for you? I think your scheduler still will have
> problems with domain weights as long as domains are restricted to some
> processors, right?

Hmm, this may be a point of discussion at some point.

My plan was actually to have one runqueue per L2 processor cache.
Thus as many as 4 cores (and possibly 8 hyperthreads) would be sharing
the same runqueue; doing CPU pinning within the same runqueue would be
problematic.

I was planning on having credits work mainly within one runqueue, and
then do load balancing between runqueues.  In that case pinning to a
specific runqueue shouldn't cause a problem, because credits of one
runqueue wouldn't affect credits of another one.
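
To make the runqueue-per-L2 idea concrete, here is a rough sketch, not scheduler code: it groups logical CPUs into runqueues by a shared L2 id and checks whether a pinning mask cuts through a runqueue (the problematic case) or covers whole runqueues (the case expected to be fine). The cpu-to-L2 mapping and both function names are assumptions made up for illustration.

```python
from collections import defaultdict


def build_runqueues(cpu_to_l2):
    """Group logical CPUs into one runqueue per shared L2 cache."""
    runqueues = defaultdict(list)
    for cpu, l2_id in sorted(cpu_to_l2.items()):
        runqueues[l2_id].append(cpu)
    return dict(runqueues)


def pin_splits_runqueue(runqueues, allowed_cpus):
    """True if the affinity mask covers only part of some runqueue."""
    allowed = set(allowed_cpus)
    for cpus in runqueues.values():
        overlap = allowed & set(cpus)
        if overlap and overlap != set(cpus):
            return True
    return False


# Assumed topology: two L2 caches, four logical CPUs each.
cpu_to_l2 = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
runqueues = build_runqueues(cpu_to_l2)

print(pin_splits_runqueue(runqueues, [4, 5, 6, 7]))  # whole runqueue
print(pin_splits_runqueue(runqueues, [2, 3]))        # part of a runqueue
```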

However, I haven't implemented or tested this idea yet; it's possible
that having credits kept distinct and doing load balancing between
runqueues will cause unacceptable levels of unfairness.  I expect it
to be fine (esp since Linux's scheduler does this kind of load
balancing, but doesn't share runqueues between logical processors),
but without implementation and testing I can't say for sure.

Thoughts are welcome at this point, but it will probably be better to
have a real discussion once I've posted some patches.

 -George


Re: Xen 3.4.1 NUMA support

by George Dunlap

On Mon, Nov 9, 2009 at 11:39 AM, Dulloor <dulloor@...> wrote:
> What's the current scope and status of your scheduler work ? Is it
> going to look similar to the Linux scheduler (with scheduling domains,
> et al). In that case, topology is already accounted for, to a large
> extent. It would be good to know so that I can work on something that
> doesn't overlap.

My plan was to do something similar to Linux, but with this
difference: Instead of having one runqueue per logical processor (as
both Xen and Linux currently do), and having "domains" all the way up
(as Linux currently does), I had planned on having one runqueue per L2
processor cache.  The main reason to avoid migration is to preserve a
warm cache; but since L1's are replaced so quickly, there should be
little impact to a VM migrating between different threads and cores
which share the same L2.

Above the L2s I was planning on having an idea similar to the Linux
"domains" (although obviously it would need a different name to avoid
confusion), and doing explicit load-balancing between them.  But as I
have not had a chance to test this kind of load balancing yet, the
plan may change somewhat before then.

Problems to solve wrt NUMA, as I understand it, are to balance the
performance cost of sharing a busy local CPU, vs the performance cost
of non-local memory accesses.  This would involve adding the NUMA
logic to the load balancing algorithm.  Which I guess would depend in
part on having a load balancing algorithm to begin with. :-)
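
The trade-off described above can be sketched as a toy cost model. Everything here is an assumption for illustration: the cost formula, the 30% remote-memory penalty and the function names are invented, and no such logic exists in the Xen scheduler as discussed in this thread.

```python
def placement_cost(node, home_node, runnable, cpus, remote_penalty=0.3):
    """Estimated slowdown factor for running one more vCPU on `node`.

    runnable: {node: number of runnable vCPUs already there}
    cpus:     {node: number of physical CPUs}
    """
    # CPU contention: vCPUs per physical CPU, never below 1.0.
    contention = max(1.0, float(runnable[node] + 1) / cpus[node])
    # Memory cost: local accesses are the baseline, remote ones pay a penalty.
    memory = 1.0 if node == home_node else 1.0 + remote_penalty
    return contention * memory


def pick_node(home_node, runnable, cpus, remote_penalty=0.3):
    """Choose the node with the lowest estimated cost."""
    return min(sorted(runnable),
               key=lambda n: placement_cost(n, home_node, runnable,
                                            cpus, remote_penalty))


cpus = {0: 4, 1: 4}
# Home node 0 is heavily loaded: moving away wins despite remote memory.
print(pick_node(0, {0: 8, 1: 0}, cpus))
# Home node 0 has spare CPUs: staying local wins.
print(pick_node(0, {0: 2, 1: 0}, cpus))
```

A real balancer would have to measure both costs rather than assume them, which is why a regression test across workloads matters here.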

Once I have the basic credit patches in working order, would you be
interested in working on the load-balancing between runqueues?  I can
then work on further testing of the credit algorithm.  My ultimate
goal would be to have a basic regression test that people could use to
measure how their changes to the scheduler affect a wide variety of
workloads.

 -George


Re: Xen 3.4.1 NUMA support

by Keir Fraser

On 09/11/2009 11:44, "Juergen Gross" <juergen.gross@...> wrote:

> NUMA was a topic I wanted to look at as soon as cpupools are officially
> accepted. Keir wanted to propose a way to get rid of the function
> continue_hypercall_on_cpu() which was causing most of the stuff leading
> to the objection of cpupools.
> I guess Keir had some higher priority jobs. :-)

Well, I forgot about it. I think the plan was to perhaps keep something like
continue_hypercall_on_cpu(), but not need to actually run the vcpu itself
'over there' but instead schedule a tasklet or somesuch, and sleep on its
completion. That would get rid of the skanky affinity hacks you had to do to
support continue_hypercall_on_cpu(). I'll have a look back at what we
discussed.
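
As a loose user-space analogy only (this is not hypervisor code, and CpuWorkers is an invented name), the pattern of "queue the work over there and sleep on its completion" instead of migrating the caller can be sketched with one worker thread per CPU. Each worker stands in for a per-CPU tasklet context.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CpuWorkers:
    """One single-threaded executor per 'CPU', standing in for tasklets."""

    def __init__(self, ncpus):
        self._workers = [ThreadPoolExecutor(max_workers=1)
                         for _ in range(ncpus)]

    def run_on_cpu(self, cpu, fn, *args):
        # Queue the work on the target worker; the caller does not move.
        future = self._workers[cpu].submit(fn, *args)
        # The caller sleeps here until the work has completed.
        return future.result()

    def shutdown(self):
        for worker in self._workers:
            worker.shutdown()


workers = CpuWorkers(ncpus=2)
caller = threading.get_ident()
ran_on = workers.run_on_cpu(1, threading.get_ident)
print("ran on a different thread:", ran_on != caller)
workers.shutdown()
```

The point of the analogy is that the caller's own placement never changes, so no temporary affinity manipulation is needed.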

 -- Keir




Re: Xen 3.4.1 NUMA support

by dulloor

Sure! Let me know when you have the patches ready. Also, that might be a
good time to see if runq-per-l2 works better.

-dulloor


Re: Xen 3.4.1 NUMA support

by Andre Przywara

Dan Magenheimer wrote:
>> Add Xen boot parameter 'numa=on' to enable NUMA detection.
>> Then it's up to you to, for example, pin domains to specific nodes,
>> using the 'cpus=...' option in the domain config file. See
>> /etc/xen/xmexample1 for an example of its usage.
> VMware has the notion of a "cell" where VMs can be
> scheduled only within a cell, not across cells.
> Cell boundaries are determined by VMware by
> default, though certains settings can override them.
Well, if I got this right, then you are describing the current behaviour
of Xen. It has had a similar feature for some time now (since 3.3, I guess).
When you launch a domain on a numa=on machine, it will pick the least
busy node (which can hold the requested memory) and restrict the
domain to that node (by only allowing CPUs of that node).
This is in XendDomainInfo.py (c/s 17131, 17247, 17709).
It looks like this:
(kernel xen.gz numa=on dom0_mem=6144M dom0_max_vcpus=6 dom0_vcpus_pin)
# xm create opensuse.hvm
# xm create opensuse2.hvm
# xm vcpu-list
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
001-LTP                              1     0     6   -b-      17.8 6-11
001-LTP                              1     1     7   -b-       6.3 6-11
002-LTP                              2     0    12   -b-      19.0 12-17
002-LTP                              2     1    16   -b-       1.6 12-17
002-LTP                              2     2    17   -b-       1.7 12-17
002-LTP                              2     3    14   -b-       1.6 12-17
002-LTP                              2     4    16   -b-       1.6 12-17
002-LTP                              2     5    15   -b-       1.5 12-17
002-LTP                              2     6    12   -b-       1.3 12-17
002-LTP                              2     7    13   -b-       1.8 12-17
Domain-0                             0     0     0   -b-      12.6 0
Domain-0                             0     1     1   -b-       7.6 1
Domain-0                             0     2     2   -b-       8.0 2
Domain-0                             0     3     3   -b-      14.6 3
Domain-0                             0     4     4   r--       1.4 4
Domain-0                             0     5     5   -b-       0.9 5
# xm debug-keys U
(XEN) Domain 0 (total: 2097152):
(XEN)     Node 0: 2097152
(XEN)     Node 1: 0
(XEN)     Node 2: 0
(XEN)     Node 3: 0
(XEN)     Node 4: 0
(XEN)     Node 5: 0
(XEN)     Node 6: 0
(XEN)     Node 7: 0
(XEN) Domain 1 (total: 394219):
(XEN)     Node 0: 0
(XEN)     Node 1: 394219
(XEN)     Node 2: 0
(XEN)     Node 3: 0
(XEN)     Node 4: 0
(XEN)     Node 5: 0
(XEN)     Node 6: 0
(XEN)     Node 7: 0
(XEN) Domain 2 (total: 394219):
(XEN)     Node 0: 0
(XEN)     Node 1: 0
(XEN)     Node 2: 394219
(XEN)     Node 3: 0
(XEN)     Node 4: 0
(XEN)     Node 5: 0
(XEN)     Node 6: 0
(XEN)     Node 7: 0

Note that there were no cpus= lines in the config files, Xen did that
automatically.
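
For checking placement on a larger machine, the dump above can be summarised with a few lines of Python. This is only a sketch: the line format is assumed to be exactly the one shown in the output above, and the function names are invented for illustration.

```python
import re

DOMAIN_RE = re.compile(r"\(XEN\)\s+Domain (\d+) \(total: (\d+)\):")
NODE_RE = re.compile(r"\(XEN\)\s+Node (\d+): (\d+)")


def pages_per_node(log_text):
    """Return {domid: {node: pages}} parsed from the dump format above."""
    result = {}
    current = None
    for line in log_text.splitlines():
        match = DOMAIN_RE.search(line)
        if match:
            current = int(match.group(1))
            result[current] = {}
            continue
        match = NODE_RE.search(line)
        if match and current is not None:
            result[current][int(match.group(1))] = int(match.group(2))
    return result


def home_nodes(per_domain):
    """For each domain, list the nodes that hold any of its memory."""
    return {dom: sorted(n for n, pages in nodes.items() if pages > 0)
            for dom, nodes in per_domain.items()}


sample = """\
(XEN) Domain 1 (total: 394219):
(XEN)     Node 0: 0
(XEN)     Node 1: 394219
(XEN) Domain 2 (total: 394219):
(XEN)     Node 1: 0
(XEN)     Node 2: 394219
"""
print(home_nodes(pages_per_node(sample)))
```

A domain whose list contains more than one node has its memory spread across nodes.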

Domains can be localhost-migrated to another node:
# xm migrate --node=4 1 localhost
The only issue is with domains larger than a node.
If someone has a useful use-case, I can start rebasing my old patches
for NUMA aware HVM domains to Xen unstable.

Regards,
Andre.

BTW: Shouldn't we finally set numa=on as the default value?

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712



Re: Xen 3.4.1 NUMA support

by George Dunlap

Andre Przywara wrote:
> BTW: Shouldn't we set finally numa=on as the default value?
>  
Is there any data to support the idea that this helps significantly on
common systems?

 -George


Re: Xen 3.4.1 NUMA support

by Jan Beulich

>>> Andre Przywara <andre.przywara@...> 09.11.09 16:02 >>>
>BTW: Shouldn't we set finally numa=on as the default value?

I'd say no, at least until the default confinement of a guest to a single
node gets fixed to properly deal with guests having more vCPU-s than
a node's worth of pCPU-s (i.e. I take it for granted that the benefits of
not overcommitting CPUs outweigh the drawbacks of cross-node memory
accesses at the very least for CPU-bound workloads).

Jan



Re: Xen 3.4.1 NUMA support

by Andre Przywara

George Dunlap wrote:
> Andre Przywara wrote:
>> BTW: Shouldn't we set finally numa=on as the default value?
>>  
> Is there any data to support the idea that this helps significantly on
> common systems?
I don't have any numbers handy, but I will see if I can generate some.

Looking from a high level perspective it is a shame that it's not the
default: With numa=off the Xen domain loader will allocate physical
memory from some node (maybe even from several nodes) and will schedule
the guest on some other (even rapidly changing) nodes. According to
Murphy's law you will end up with _all_ of a guest's memory accesses
being remote. But in fact a NUMA architecture is really beneficial for
virtualization: As there are close to zero cross domain memory accesses
(except for Dom0), each node is more or less self contained and each
guest can use the node's memory controller almost exclusively.
But this is all spoiled as most people don't know about Xen's NUMA
capabilities and don't set numa=on. Using this as a default would solve
this.

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 488-3567-12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Jochen Polster; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632



RE: Xen 3.4.1 NUMA support

by Ian Pratt

> >>> Andre Przywara <andre.przywara@...> 09.11.09 16:02 >>>
> >BTW: Shouldn't we set finally numa=on as the default value?
>
> I'd say no, at least until the default confinement of a guest to a single
> node gets fixed to properly deal with guests having more vCPU-s than
> a node's worth of pCPU-s (i.e. I take it for granted that the benefits of
> not overcommitting CPUs outweigh the drawbacks of cross-node memory
> accesses at the very least for CPU-bound workloads).

What default confinement? I thought guests had an all-pCPUs affinity mask by default?

I suspect we will get benefits from enabling NUMA even if all the guests have all-pCPUs affinity masks: all guests will have memory striped across all nodes, which is likely better than allocating from one node and then the other. Obviously assigning VMs to node(s) and allocating memory accordingly is the best plan.

Ian


Re: Xen 3.4.1 NUMA support

by dulloor

I can't find this. Can you please point me to the code?

numa=on/off is only for setting up numa in xen (similar to the linux
knob, but turned off by default). The allocation of memory from a
single node (that you observe) could be because of the way
alloc_heap_pages is implemented (trying to allocate from all the heaps
from a node, before trying the next one) - try looking at dump_numa
output. And, affinities are not set anywhere based on the node from
which allocation happens.

-dulloor


Re: Xen 3.4.1 NUMA support

by Andre Przywara

Dulloor wrote:
> I am not finding this. Can you please point to the code ?
tools/python/xen/xend/XendDomainInfo.py (around line 2600)
with the core code being:
-------------
# pick the node with the lowest load
index = nodeload.index(min(nodeload))
cpumask = info['node_to_cpu'][index]
# restrict every vCPU of the domain to that node's CPUs
for v in range(0, self.info['VCPUs_max']):
    xc.vcpu_setaffinity(self.domid, v, cpumask)
--------------
The code got introduced with c/s 17131 and later got refined with c/s
17247 and c/s 17709.
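
For readers without the tree at hand, the selection step can be approximated by the following self-contained sketch. It is not the xend code: pick_node(), its parameters and the sample numbers are assumptions for illustration, and it only models "least loaded node that can hold the requested memory" as described earlier in the thread, without calling into Xen at all.

```python
def pick_node(node_load, node_free_mem, node_to_cpu, needed_mem):
    """Return (node, cpus) for the least loaded node with enough memory.

    Returns None if no single node can hold `needed_mem`.
    """
    candidates = [n for n in range(len(node_load))
                  if node_free_mem[n] >= needed_mem]
    if not candidates:
        return None
    best = min(candidates, key=lambda n: node_load[n])
    return best, node_to_cpu[best]


# Hypothetical four-node machine with six CPUs per node.
node_to_cpu = [list(range(n * 6, n * 6 + 6)) for n in range(4)]
node_load = [6, 2, 0, 0]                # e.g. vCPUs already placed per node
node_free_mem = [1000, 4000, 500, 4000]  # MiB free per node

print(pick_node(node_load, node_free_mem, node_to_cpu, needed_mem=2048))
```

Node 2 is idle but too small for the request, so the sketch falls back to the next least loaded node that fits.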
>
> numa=on/off is only for setting up numa in xen (similar to the linux
> knob, but turned off by default). The allocation of memory from a
> single node (that you observe) could be because of the way
> alloc_heap_pages is implemented (trying to allocate from all the heaps
> from a node, before trying the next one)
Yes, but if the domain is pinned before it allocates its memory, then
the natural behavior of Xen is to take memory from this local node.

> - try looking at dump_numa
> output. And, affinities are not set anywhere based on the node from
> which allocation happens.
It is the other way round: first the domain is pinned, later the memory
is allocated (based on the node to which the currently scheduled CPU
belongs).

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448 3567 12
----to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd; Thomas M. McCoy; Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632



RE: Xen 3.4.1 NUMA support

by Jan Beulich

>>> Ian Pratt <Ian.Pratt@...> 10.11.09 02:46 >>>
>> >>> Andre Przywara <andre.przywara@...> 09.11.09 16:02 >>>
>> >BTW: Shouldn't we set finally numa=on as the default value?
>>
>> I'd say no, at least until the default confinement of a guest to a single
>> node gets fixed to properly deal with guests having more vCPU-s than
>> a node's worth of pCPU-s (i.e. I take it for granted that the benefits of
>> not overcommitting CPUs outweigh the drawbacks of cross-node memory
>> accesses at the very least for CPU-bound workloads).
>
>What default confinement? I thought guests had an all-pCPUs affinity mask be default?

Not with numa=on (see also Andre's post to this effect): The guest will
get assigned to a node, and its affinity set to that node's CPUs.

Jan



Re: Xen 3.4.1 NUMA support

by Keir Fraser

On 10/11/2009 08:51, "Jan Beulich" <JBeulich@...> wrote:

>> What default confinement? I thought guests had an all-pCPUs affinity mask be
>> default?
>
> Not with numa=on (see also Andre's post to this effect): The guest will
> get assigned to a node, and its affinity set to that node's CPUs.

...And if it didn't, striping would not happen. In fact iirc the default
NUMA allocation policy for an all-pcpus domain is in some respects pessimal:
vcpu0's initial node gets drained of memory first. I.e., you get *less*
'striping' than you could with numa=off where you might at least get lucky.

 -- Keir




Re: Xen 3.4.1 NUMA support

by Keir Fraser

On 09/11/2009 15:19, "Jan Beulich" <JBeulich@...> wrote:

>>>> Andre Przywara <andre.przywara@...> 09.11.09 16:02 >>>
>> BTW: Shouldn't we set finally numa=on as the default value?
>
> I'd say no, at least until the default confinement of a guest to a single
> node gets fixed to properly deal with guests having more vCPU-s than
> a node's worth of pCPU-s (i.e. I take it for granted that the benefits of
> not overcommitting CPUs outweigh the drawbacks of cross-node memory
> accesses at the very least for CPU-bound workloads).

If this would be fixed (e.g., turn off node locality entirely by default for
domains which will not fit into a single node) then I think we could
consider numa=on by default.
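
One possible shape for such a default, as a sketch under stated assumptions: "fits" is taken to mean only that the node has at least as many pCPUs as the guest has vCPUs (memory is ignored), and default_affinity() is an invented name, not an existing xend or Xen function.

```python
def default_affinity(vcpus, node_to_cpu, node_load):
    """CPUs a new domain would be restricted to under the proposed default.

    If no single node has enough pCPUs for the guest, node locality is
    turned off and the domain may run on all CPUs.
    """
    all_cpus = sorted(cpu for cpus in node_to_cpu for cpu in cpus)
    fitting = [n for n, cpus in enumerate(node_to_cpu)
               if len(cpus) >= vcpus]
    if not fitting:
        return all_cpus
    best = min(fitting, key=lambda n: node_load[n])
    return list(node_to_cpu[best])


# Hypothetical two-node machine, six pCPUs per node.
node_to_cpu = [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
node_load = [3, 1]

print(default_affinity(2, node_to_cpu, node_load))  # fits: one node
print(default_affinity(8, node_to_cpu, node_load))  # too big: all CPUs
```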

 -- Keir


