CPU affinity with ULE scheduler

by Archimedes Gaviola-2

To Whom It May Concern:

Can someone explain how the ULE scheduler (the latest version is 2, if
I'm not mistaken) deals with CPU affinity? Are there any existing
benchmarks of this on FreeBSD? I am currently using the 4BSD
scheduler, and what I have observed, especially when processing heavy
network traffic on multiple CPU cores, is that only one CPU is
stressed by network interrupts while the rest are mostly idle.
This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom
network interface cards (bce0 and bce1). Below is a snapshot of the
case.

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   17 root        1 171   52     0K    16K RUN    0  96:04 97.71% idle: cpu0
   15 root        1 171   52     0K    16K RUN    2  98:41 97.07% idle: cpu2
   14 root        1 171   52     0K    16K RUN    3 103:56 95.90% idle: cpu3
   13 root        1 171   52     0K    16K RUN    4 104:17 88.23% idle: cpu4
   12 root        1 171   52     0K    16K RUN    5  97:59 86.57% idle: cpu5
   10 root        1 171   52     0K    16K RUN    7  81:51 82.08% idle: cpu7
   11 root        1 171   52     0K    16K RUN    6  95:28 81.35% idle: cpu6
   16 root        1 171   52     0K    16K RUN    1 102:15 77.78% idle: cpu1
   36 root        1 -68 -187     0K    16K WAIT   7  19:37  4.59% irq23: bce0 bce1
   18 root        1 -32 -151     0K    16K CPU0   0   2:13  0.00% swi4: clock sio
 4488 root        1  96    0 30728K  4292K select 3   1:51  0.00% sshd
   43 root        1 171   52     0K    16K pgzero 3   1:08  0.00% pagezero
  218 root        1  96    0  3852K  1380K select 3   0:38  0.00% syslogd
   20 root        1 -44 -163     0K    16K WAIT   7   0:32  0.00% swi1: net


Thanks,
Archimedes
_______________________________________________
freebsd-smp@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-smp
To unsubscribe, send any mail to "freebsd-smp-unsubscribe@..."

Re: CPU affinity with ULE scheduler

by Ivan Voras-7

Archimedes Gaviola wrote:
> To Whom It May Concern:
>
> Can someone explain how the ULE scheduler (the latest version is 2, if
> I'm not mistaken) deals with CPU affinity? Are there any existing
> benchmarks of this on FreeBSD? I am currently using the 4BSD

Yes but not for network loads. See for example benchmarks in
http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf

> scheduler, and what I have observed, especially when processing heavy
> network traffic on multiple CPU cores, is that only one CPU is
> stressed by network interrupts while the rest are mostly idle.
> This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom
> network interface cards (bce0 and bce1). Below is a snapshot of the
> case.

This is unfortunately so and cannot be changed for now - you are not the
first with this particular performance problem. BUT, looking at the data
in the snapshot you gave, it's not clear that there is a performance
problem in your case - bce is not taking nearly enough CPU time to be a
bottleneck. What exactly do you think is wrong in your case?


Re: CPU affinity with ULE scheduler

by John Baldwin

On Monday 10 November 2008 03:33:23 am Archimedes Gaviola wrote:

> To Whom It May Concern:
>
> Can someone explain how the ULE scheduler (the latest version is 2, if
> I'm not mistaken) deals with CPU affinity? Are there any existing
> benchmarks of this on FreeBSD? I am currently using the 4BSD
> scheduler, and what I have observed, especially when processing heavy
> network traffic on multiple CPU cores, is that only one CPU is
> stressed by network interrupts while the rest are mostly idle.
> This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom
> network interface cards (bce0 and bce1). Below is a snapshot of the
> case.

Interrupts are routed to a single CPU.  Since bce0 and bce1 are both on the
same interrupt (irq 23), the CPU that interrupt is routed to is going to end
up handling all the interrupts for bce0 and bce1.  This is not something ULE or
4BSD has any control over.
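
A minimal Python sketch of how to spot this situation from top(1) output
such as the snapshot above: it looks for interrupt threads (the "irqN:"
entries) whose COMMAND column lists more than one device. The function
name shared_irqs and the regular expression are illustrative assumptions,
and the sample is the irq23 line from the snapshot with its wrapped line
joined.

```python
import re

# Matches the COMMAND column of an interrupt thread in top(1) output,
# e.g. "irq23: bce0 bce1" or "irq256: bce0".
IRQ_RE = re.compile(r"irq(\d+):\s+(.+)$")


def shared_irqs(top_output):
    """Return {irq_number: [devices]} for IRQs that serve more than one device."""
    shared = {}
    for line in top_output.splitlines():
        match = IRQ_RE.search(line)
        if match is None:
            continue  # not an interrupt thread
        devices = match.group(2).split()
        if len(devices) > 1:
            shared[int(match.group(1))] = devices
    return shared


sample = """\
   36 root        1 -68 -187     0K    16K WAIT   7  19:37  4.59% irq23: bce0 bce1
   20 root        1 -44 -163     0K    16K WAIT   7   0:32  0.00% swi1: net
"""

print(shared_irqs(sample))
```

An IRQ that shows up in the result has a single interrupt thread for all
of its devices, so all of their interrupt work lands on one CPU.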

--
John Baldwin

Re: CPU affinity with ULE scheduler

by Archimedes Gaviola-2

On Tue, Nov 11, 2008 at 6:33 AM, John Baldwin <jhb@...> wrote:

> Interrupts are routed to a single CPU.  Since bce0 and bce1 are both on the
> same interrupt (irq 23), the CPU that interrupt is routed to is going to end
> up handling all the interrupts for bce0 and bce1.  This is not something ULE or
> 4BSD has any control over.

Hi John,

Sorry, I posted the wrong snapshot. Here is the right one, showing my concern.

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   17 root        1 171   52     0K    16K CPU0   0  54:28 95.17% idle: cpu0
   15 root        1 171   52     0K    16K CPU2   2  55:55 93.65% idle: cpu2
   14 root        1 171   52     0K    16K CPU3   3  58:53 93.55% idle: cpu3
   13 root        1 171   52     0K    16K RUN    4  59:14 82.47% idle: cpu4
   12 root        1 171   52     0K    16K RUN    5  55:42 82.23% idle: cpu5
   16 root        1 171   52     0K    16K CPU1   1  58:13 77.78% idle: cpu1
   11 root        1 171   52     0K    16K CPU6   6  54:08 76.17% idle: cpu6
   36 root        1 -68 -187     0K    16K WAIT   7   8:50 65.53% irq23: bce0 bce1
   10 root        1 171   52     0K    16K CPU7   7  48:19 29.79% idle: cpu7
   43 root        1 171   52     0K    16K pgzero 2   0:35  1.51% pagezero
 1372 root       10  20    0 16716K  5764K kserel 6  58:42  0.00% kmd
 4488 root        1  96    0 30676K  4236K select 2   1:51  0.00% sshd
   18 root        1 -32 -151     0K    16K WAIT   0   1:14  0.00% swi4: clock s
   20 root        1 -44 -163     0K    16K WAIT   0   0:30  0.00% swi1: net
  218 root        1  96    0  3852K  1376K select 0   0:23  0.00% syslogd
 2171 root        1  96    0 30676K  4224K select 6   0:19  0.00% sshd

I was running a network performance test on this system under
FreeBSD 6.2-RELEASE with its default 4BSD scheduler. I used a tool to
generate a large amount of traffic, around 600-700 Mbps, traversing
the FreeBSD system in both directions, meaning both network
interfaces were receiving traffic. What happened was that the
interrupt thread for irq 23, shared by both interfaces and running on
cpu7, consumed about 65.53% of that CPU, which affected other running
applications and services such as sshd and httpd. These services
become inaccessible while the traffic is being sent. With only one CPU
being stressed, I was thinking of moving to FreeBSD 7.0-RELEASE with
the ULE scheduler, because I thought my problem had to do with how the
scheduler distributes load across multiple CPU cores, especially when
processing network load. So, if this is more a matter of interrupt
handling than of the scheduler, is there a way to optimize it? If the
interrupt is still routed to only one CPU, that still seems
inefficient to me. What decides which CPU an interrupt is bound to,
and can a shared IRQ be avoided? Are there any improvements in
FreeBSD 7.0 with regard to interrupt handling?

Thanks,
Archimedes

Re: CPU affinity with ULE scheduler

by Archimedes Gaviola-2

Hi Ivan,

Archimedes Gaviola wrote:
> To Whom It May Concern:
>
> Can someone explain how the ULE scheduler (the latest version is 2, if
> I'm not mistaken) deals with CPU affinity? Are there any existing
> benchmarks of this on FreeBSD? I am currently using the 4BSD

Yes but not for network loads. See for example benchmarks in
http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf

[Archimedes] Ah okay, so as I understand it, the ULE scheduler in
FreeBSD 7.0 only scales well when scheduling userland applications
such as databases and DNS?

> scheduler, and what I have observed, especially when processing heavy
> network traffic on multiple CPU cores, is that only one CPU is
> stressed by network interrupts while the rest are mostly idle.
> This is an AMD-64 (4x) dual-core IBM system with GigE Broadcom
> network interface cards (bce0 and bce1). Below is a snapshot of the
> case.

This is unfortunately so and cannot be changed for now - you are not the
first with this particular performance problem.

[Archimedes] Meaning the ULE scheduler still has to be improved for
processing network load? I have read some papers and articles saying
that FreeBSD is implementing a parallelized network stack. What is the
status of this development? Would it address the processing of high
network load?

BUT, looking at the data
in the snapshot you gave, it's not clear that there is a performance
problem in your case - bce is not taking nearly enough CPU time to be a
bottleneck. What exactly do you think is wrong in your case?

[Archimedes] Oh, I'm sorry, that was not the right one. Here is the correct snapshot:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   17 root        1 171   52     0K    16K CPU0   0  54:28 95.17% idle: cpu0
   15 root        1 171   52     0K    16K CPU2   2  55:55 93.65% idle: cpu2
   14 root        1 171   52     0K    16K CPU3   3  58:53 93.55% idle: cpu3
   13 root        1 171   52     0K    16K RUN    4  59:14 82.47% idle: cpu4
   12 root        1 171   52     0K    16K RUN    5  55:42 82.23% idle: cpu5
   16 root        1 171   52     0K    16K CPU1   1  58:13 77.78% idle: cpu1
   11 root        1 171   52     0K    16K CPU6   6  54:08 76.17% idle: cpu6
   36 root        1 -68 -187     0K    16K WAIT   7   8:50 65.53% irq23: bce0 bce1
   10 root        1 171   52     0K    16K CPU7   7  48:19 29.79% idle: cpu7
   43 root        1 171   52     0K    16K pgzero 2   0:35  1.51% pagezero
 1372 root       10  20    0 16716K  5764K kserel 6  58:42  0.00% kmd
 4488 root        1  96    0 30676K  4236K select 2   1:51  0.00% sshd
   18 root        1 -32 -151     0K    16K WAIT   0   1:14  0.00% swi4: clock s
   20 root        1 -44 -163     0K    16K WAIT   0   0:30  0.00% swi1: net
  218 root        1  96    0  3852K  1376K select 0   0:23  0.00% syslogd
 2171 root        1  96    0 30676K  4224K select 6   0:19  0.00% sshd

I was doing network performance testing with a traffic generator tool
sending 600-700 Mbps through my FreeBSD system in both directions. As
you can see, irq23, which is shared by both network interfaces bce0
and bce1, is bound to cpu7. Its interrupt thread takes up 65.53% of
that CPU, which affects some of the applications running on the
system, such as sshd and httpd. These services are no longer
accessible under that amount of traffic. Since there are still idle
CPUs, I'm concerned about distributing the load so that no single CPU
is stressed.

Thanks,
Archimedes

Re: CPU affinity with ULE scheduler

by Ivan Voras-7

Archimedes Gaviola wrote:

> [Archimedes] Ah okay, so as I understand it, the ULE scheduler in
> FreeBSD 7.0 only scales well when scheduling userland applications
> such as databases and DNS?

The problem you are seeing is probably not solvable by a better
scheduler. There are other parts of the system that cause performance
bottlenecks. I'd recommend you try 7-STABLE; it might help you, but it
probably won't.



Re: CPU affinity with ULE scheduler

by John Baldwin

On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote:

> I was running a network performance test on this system under
> FreeBSD 6.2-RELEASE with its default 4BSD scheduler. I used a tool to
> generate a large amount of traffic, around 600-700 Mbps, traversing
> the FreeBSD system in both directions, meaning both network
> interfaces were receiving traffic. What happened was that the
> interrupt thread for irq 23, shared by both interfaces and running on
> cpu7, consumed about 65.53% of that CPU, which affected other running
> applications and services such as sshd and httpd. These services
> become inaccessible while the traffic is being sent. With only one CPU
> being stressed, I was thinking of moving to FreeBSD 7.0-RELEASE with
> the ULE scheduler, because I thought my problem had to do with how the
> scheduler distributes load across multiple CPU cores, especially when
> processing network load. So, if this is more a matter of interrupt
> handling than of the scheduler, is there a way to optimize it? If the
> interrupt is still routed to only one CPU, that still seems
> inefficient to me. What decides which CPU an interrupt is bound to,
> and can a shared IRQ be avoided? Are there any improvements in
> FreeBSD 7.0 with regard to interrupt handling?

It depends.  In all likelihood, the interrupts from bce0 and bce1 are both
hardwired to the same interrupt pin and so they will always share the same
ithread when using the legacy INTx interrupts.  However, bce(4) parts do
support MSI, and if you try a newer OS snap (6.3 or later) these devices
should use MSI in which case each NIC would be assigned to a separate CPU.  I
would suggest trying 7.0 or a 7.1 release candidate to see if it does
better.

--
John Baldwin

Re: CPU affinity with ULE scheduler

by Archimedes Gaviola-2

On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin <jhb@...> wrote:

> It depends.  In all likelihood, the interrupts from bce0 and bce1 are both
> hardwired to the same interrupt pin and so they will always share the same
> ithread when using the legacy INTx interrupts.  However, bce(4) parts do
> support MSI, and if you try a newer OS snap (6.3 or later) these devices
> should use MSI in which case each NIC would be assigned to a separate CPU.  I
> would suggest trying 7.0 or a 7.1 release candidate to see if it does
> better.

Hi John,

I tried 7.0-RELEASE, and each network interface is now assigned its own
interrupt on a different CPU, so MSI is working.

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   12 root        1 171 ki31     0K    16K CPU6   6 123:55 100.00% idle: cpu6
   15 root        1 171 ki31     0K    16K CPU3   3 123:54 100.00% idle: cpu3
   14 root        1 171 ki31     0K    16K CPU4   4 123:26 100.00% idle: cpu4
   16 root        1 171 ki31     0K    16K CPU2   2 123:15 100.00% idle: cpu2
   17 root        1 171 ki31     0K    16K CPU1   1 123:15 100.00% idle: cpu1
   37 root        1 -68    -     0K    16K CPU7   7   9:09 100.00% irq256: bce0
   13 root        1 171 ki31     0K    16K CPU5   5 123:49 99.07% idle: cpu5
   40 root        1 -68    -     0K    16K WAIT   0   4:40 51.17% irq257: bce1
   18 root        1 171 ki31     0K    16K RUN    0 117:48 49.37% idle: cpu0
   11 root        1 171 ki31     0K    16K RUN    7 115:25  0.00% idle: cpu7
   19 root        1 -32    -     0K    16K WAIT   0   0:39  0.00% swi4: clock s
14367 root        1  44    0  5176K  3104K select 2   0:01  0.00% dhcpd
   22 root        1 -16    -     0K    16K -      3   0:01  0.00% yarrow
   25 root        1 -24    -     0K    16K WAIT   0   0:00  0.00% swi6: Giant t
11658 root        1  44    0 32936K  4540K select 1   0:00  0.00% sshd
14224 root        1  44    0 32936K  4540K select 5   0:00  0.00% sshd
   41 root        1 -60    -     0K    16K WAIT   0   0:00  0.00% irq1: atkbd0
    4 root        1  -8    -     0K    16K -      2   0:00  0.00% g_down

The bce0 interrupt thread (irq256) is now saturated, using 100% of
CPU7, while irq257 (bce1) uses about 51.17% of CPU0. Do you have any
further recommendations? Is there anything we can do to optimize this
with MSI?
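
The figures in this snapshot can be tabulated with a short script. The
following is a minimal Python sketch, assuming top(1) lines with the
twelve columns shown in the header above; the function name
interrupt_load is hypothetical. It lists each interrupt thread with the
value of its C (CPU) column and its WCPU share, busiest first.

```python
def interrupt_load(top_output):
    """Return (command, cpu, wcpu_percent) for each irq thread, busiest first."""
    rows = []
    for line in top_output.splitlines():
        # Columns: PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
        fields = line.split(None, 11)
        if len(fields) < 12 or not fields[11].startswith("irq"):
            continue  # skip the header, idle threads and ordinary processes
        cpu = int(fields[8])                  # C column
        wcpu = float(fields[10].rstrip("%"))  # WCPU column, e.g. "51.17%"
        rows.append((fields[11].strip(), cpu, wcpu))
    return sorted(rows, key=lambda row: row[2], reverse=True)


sample = """\
   37 root        1 -68    -     0K    16K CPU7   7   9:09 100.00% irq256: bce0
   13 root        1 171 ki31     0K    16K CPU5   5 123:49 99.07% idle: cpu5
   40 root        1 -68    -     0K    16K WAIT   0   4:40 51.17% irq257: bce1
"""

for command, cpu, wcpu in interrupt_load(sample):
    print(f"{command:<14} cpu{cpu} {wcpu:6.2f}%")
```

With MSI each NIC has its own interrupt thread, so the two entries can be
compared directly instead of being folded into one shared irq23 line.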

Thanks,
Archimedes

Re: CPU affinity with ULE scheduler

by John Baldwin

On Thursday 13 November 2008 06:55:01 am Archimedes Gaviola wrote:

> On Wed, Nov 12, 2008 at 1:16 AM, John Baldwin <jhb@...> wrote:
> > On Monday 10 November 2008 11:32:55 pm Archimedes Gaviola wrote:
> >> Actually, I was doing network performance testing on this system with
> >> FreeBSD-6.2 RELEASE using its default 4BSD scheduler. I used a tool to
> >> generate a large amount of traffic, around 600Mbps-700Mbps, traversing
> >> the FreeBSD system in both directions, meaning both network
> >> interfaces are receiving traffic. What happened was that the CPU (cpu7)
> >> that handles irq 23 for both interfaces reached around 65.53% CPU
> >> utilization, which affects other running applications and services
> >> like sshd and httpd. They are no longer accessible while the traffic
> >> is being generated. With only one CPU being stressed, I was thinking of
> >> moving to FreeBSD-7.0 RELEASE with the ULE scheduler, because I thought
> >> my concern had something to do with how the scheduler distributes load
> >> across multiple CPU cores, especially for network processing. So, if it
> >> is more a matter of interrupt handling than of the scheduler, is there
> >> a way we can optimize it? If the interrupt is still routed to only one
> >> CPU, then for me it's still inefficient. Who handles the scheduling of
> >> interrupts onto CPUs in order to prevent a shared IRQ? Are there any
> >> improvements in FreeBSD-7.0 with regard to interrupt handling?
> >
> > It depends.  In all likelihood, the interrupts from bce0 and bce1 are both
> > hardwired to the same interrupt pin and so they will always share the same
> > ithread when using the legacy INTx interrupts.  However, bce(4) parts do
> > support MSI, and if you try a newer OS snap (6.3 or later) these devices
> > should use MSI in which case each NIC would be assigned to a separate CPU.  I
> > would suggest trying 7.0 or a 7.1 release candidate and see if it does
> > better.
> >
> > --
> > John Baldwin
> >
>
> Hi John,
>
> I tried the 7.0 release, and each network interface is now allocated
> to a separate CPU. Here, MSI is already working.
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>    12 root        1 171 ki31     0K    16K CPU6   6 123:55 100.00% idle: cpu6
>    15 root        1 171 ki31     0K    16K CPU3   3 123:54 100.00% idle: cpu3
>    14 root        1 171 ki31     0K    16K CPU4   4 123:26 100.00% idle: cpu4
>    16 root        1 171 ki31     0K    16K CPU2   2 123:15 100.00% idle: cpu2
>    17 root        1 171 ki31     0K    16K CPU1   1 123:15 100.00% idle: cpu1
>    37 root        1 -68    -     0K    16K CPU7   7   9:09 100.00% irq256: bce0
>    13 root        1 171 ki31     0K    16K CPU5   5 123:49 99.07% idle: cpu5
>    40 root        1 -68    -     0K    16K WAIT   0   4:40 51.17% irq257: bce1
>    18 root        1 171 ki31     0K    16K RUN    0 117:48 49.37% idle: cpu0
>    11 root        1 171 ki31     0K    16K RUN    7 115:25  0.00% idle: cpu7
>    19 root        1 -32    -     0K    16K WAIT   0   0:39  0.00% swi4: clock s
> 14367 root        1  44    0  5176K  3104K select 2   0:01  0.00% dhcpd
>    22 root        1 -16    -     0K    16K -      3   0:01  0.00% yarrow
>    25 root        1 -24    -     0K    16K WAIT   0   0:00  0.00% swi6: Giant t
> 11658 root        1  44    0 32936K  4540K select 1   0:00  0.00% sshd
> 14224 root        1  44    0 32936K  4540K select 5   0:00  0.00% sshd
>    41 root        1 -60    -     0K    16K WAIT   0   0:00  0.00% irq1: atkbd0
>     4 root        1  -8    -     0K    16K -      2   0:00  0.00% g_down
>
> The bce0 interface interrupt (irq256) is overloaded and already uses
> 100% of CPU7, while CPU0 is at around 51.17%. Any more
> recommendations? Is there anything we can do to optimize this with
> MSI?

Well, on 7.x you can try turning net.isr.direct off (sysctl).  However, it
seems you are hammering your bce0 interface.  You might want to try using
polling on bce0 and seeing if it keeps up with the traffic better.

--
John Baldwin

Re: CPU affinity with ULE scheduler

by Archimedes Gaviola-2

On Fri, Nov 14, 2008 at 12:28 AM, John Baldwin <jhb@...> wrote:

> Well, on 7.x you can try turning net.isr.direct off (sysctl).  However, it
> seems you are hammering your bce0 interface.  You might want to try using
> polling on bce0 and seeing if it keeps up with the traffic better.
>
> --
> John Baldwin
>

With net.isr.direct=0, my IBM system shows lower CPU utilization for the
interface interrupt threads (bce0 and bce1), but swi1: net increases its
utilization. Can you explain what's happening here? How does net.isr.direct
decrease the CPU utilization of the interface threads? I would really like
to know what happens internally while packets are received by the
interfaces and processed, from the device interrupt up to the
software interrupt level, because I am confused by the effect of
enabling/disabling net.isr.direct in sysctl. Is there a tool that we can
use to trace this process, so we can find out which part of the kernel
is the bottleneck, especially when net.isr.direct=1? By the way,
with device polling enabled, the system experienced packet errors and
the interface throughput was worse, so I avoid using it.

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   16 root        1 171 ki31     0K    16K CPU10  a  86:06 89.06% idle: cpu10
   27 root        1 -44    -     0K    16K CPU1   1  34:37 82.67% swi1: net
   52 root        1 -68    -     0K    16K WAIT   b  51:59 59.77% irq32: bce1
   15 root        1 171 ki31     0K    16K RUN    b  69:28 43.16% idle: cpu11
   25 root        1 171 ki31     0K    16K RUN    1 115:35 24.27% idle: cpu1
   51 root        1 -68    -     0K    16K CPU10  a  35:21 13.48% irq31: bce0
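
For anyone who wants to pull the interrupt and software-interrupt threads out of a snapshot like the one above, here is a minimal Python sketch. It is not a FreeBSD tool: the function name `kernel_thread_load` is made up for illustration, and it assumes the `C` column is printed in hexadecimal, which is how `a` and `b` line up with cpu10 and cpu11 in this snapshot.

```python
from collections import defaultdict

# Rows copied from the top snapshot above.
SNAPSHOT = """\
   16 root        1 171 ki31     0K    16K CPU10  a  86:06 89.06% idle: cpu10
   27 root        1 -44    -     0K    16K CPU1   1  34:37 82.67% swi1: net
   52 root        1 -68    -     0K    16K WAIT   b  51:59 59.77% irq32: bce1
   15 root        1 171 ki31     0K    16K RUN    b  69:28 43.16% idle: cpu11
   25 root        1 171 ki31     0K    16K RUN    1 115:35 24.27% idle: cpu1
   51 root        1 -68    -     0K    16K CPU10  a  35:21 13.48% irq31: bce0
"""


def kernel_thread_load(snapshot):
    """Return {cpu: [(command, wcpu), ...]} for irq and swi threads only."""
    load = defaultdict(list)
    for line in snapshot.splitlines():
        # Columns: PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
        fields = line.split(None, 11)
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip the header and blank lines
        cpu = int(fields[8], 16)  # assumption: C column is hexadecimal
        wcpu = float(fields[10].rstrip("%"))
        command = fields[11].strip()
        if command.startswith(("irq", "swi")):
            load[cpu].append((command, wcpu))
    return dict(load)


for cpu, threads in sorted(kernel_thread_load(SNAPSHOT).items()):
    print(f"cpu{cpu}: {threads}")
```

Grouping by CPU this way makes it easier to see which interrupt thread is competing with which idle thread, but it only summarizes what top already reports; it does not trace anything inside the kernel.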


Regards,
Archimedes

Re: CPU affinity with ULE scheduler

by Archimedes Gaviola-2

On Mon, Nov 17, 2008 at 7:11 PM, Archimedes Gaviola
<archimedes.gaviola@...> wrote:

> With net.isr.direct=0, my IBM system shows lower CPU utilization for the
> interface interrupt threads (bce0 and bce1), but swi1: net increases its
> utilization. Can you explain what's happening here? How does net.isr.direct
> decrease the CPU utilization of the interface threads? I would really like
> to know what happens internally while packets are received by the
> interfaces and processed, from the device interrupt up to the
> software interrupt level, because I am confused by the effect of
> enabling/disabling net.isr.direct in sysctl. Is there a tool that we can
> use to trace this process, so we can find out which part of the kernel
> is the bottleneck, especially when net.isr.direct=1? By the way,
> with device polling enabled, the system experienced packet errors and
> the interface throughput was worse, so I avoid using it.
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>
>   16 root        1 171 ki31     0K    16K CPU10  a  86:06 89.06% idle: cpu10
>   27 root        1 -44    -     0K    16K CPU1   1  34:37 82.67% swi1: net
>   52 root        1 -68    -     0K    16K WAIT   b  51:59 59.77% irq32: bce1
>   15 root        1 171 ki31     0K    16K RUN    b  69:28 43.16% idle: cpu11
>   25 root        1 171 ki31     0K    16K RUN    1 115:35 24.27% idle: cpu1
>   51 root        1 -68    -     0K    16K CPU10  a  35:21 13.48% irq31: bce0
>
>
> Regards,
> Archimedes
>

One more thing: I observed that with net.isr.direct=1, bce0 uses
irq256 and bce1 uses irq257, while with net.isr.direct=0, bce0 uses
irq31 and bce1 uses irq32. What makes the difference?

Re: CPU affinity with ULE scheduler

by Ivan Voras-7

Archimedes Gaviola wrote:

> With net.isr.direct=0, my IBM system shows lower CPU utilization for the
> interface interrupt threads (bce0 and bce1), but swi1: net increases its
> utilization. Can you explain what's happening here? How does net.isr.direct
> decrease the CPU utilization of the interface threads?

The system has a choice between processing the packets in the interrupt
handler (the "irq:bce" process) or in a dedicated network process (the
"swi:net" process). This is about protocol handling, not simply receiving
packets. With net.isr.direct you're toggling between those two options.
If "direct" is 1, the packets are processed in the interrupt handler; if
it's 0, the processing is delegated to swi. It's set to 1 by default
because this setting should yield best latency.
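
A rough way to see why the load moves from the irq thread to swi1: net instead of disappearing is the toy accounting below. It is only a sketch: the function name and the per-packet costs (2 us to receive, 8 us of protocol processing) are invented for illustration, and it ignores the extra queueing and context-switch overhead that deferring the work adds.

```python
def cpu_time_by_thread(packets, rx_cost_us, stack_cost_us, direct):
    """Split the total per-packet CPU cost between the irq and swi threads.

    rx_cost_us    -- hypothetical cost of receiving one packet from the NIC
    stack_cost_us -- hypothetical cost of protocol handling for one packet
    direct        -- True models net.isr.direct=1, False models =0
    """
    if direct:
        # Protocol handling runs inside the interrupt thread itself.
        return {"irq": packets * (rx_cost_us + stack_cost_us), "swi": 0}
    # The interrupt thread only receives; swi does the protocol handling.
    return {"irq": packets * rx_cost_us, "swi": packets * stack_cost_us}


for direct in (True, False):
    print(direct, cpu_time_by_thread(1000, 2, 8, direct))
```

In this model the total is the same in both modes; only the thread that gets charged for the protocol work changes, which matches what top shows when the setting is toggled.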

In both cases the code path a packet must go through is very similar: it
has to be received, then processed through firewalls and network stack
code, then delivered to application(s), so it's a serial process. There
are things that could be better parallelized in the stack and people are
working on them, but they will not be finished any time soon.





Re: CPU affinity with ULE scheduler

by John Baldwin

On Monday 17 November 2008 06:11:00 am Archimedes Gaviola wrote:
> With net.isr.direct=0, CPU utilization per interface (bce0 and bce1)
> drops on my IBM system, but swi1: net utilization increases.
> Can you explain what's happening here? How does net.isr.direct
> reduce the CPU utilization of the interface threads? I would really
> like to know what happens internally as packets are received by the
> interfaces and processed, from the device interrupt up to the
> software interrupt level, because I am confused by the effect of
> enabling or disabling net.isr.direct in sysctl. Is there a tool that
> we can use to trace this process, so that we can tell which part of
> the kernel is the bottleneck, especially when net.isr.direct=1? By the
> way, with device polling enabled, the system experienced packet errors
> and the interface throughput was worse, so I avoid using it.
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>    16 root        1 171 ki31     0K    16K CPU10  a  86:06 89.06% idle: cpu10
>    27 root        1 -44    -     0K    16K CPU1   1  34:37 82.67% swi1: net
>    52 root        1 -68    -     0K    16K WAIT   b  51:59 59.77% irq32: bce1
>    15 root        1 171 ki31     0K    16K RUN    b  69:28 43.16% idle: cpu11
>    25 root        1 171 ki31     0K    16K RUN    1 115:35 24.27% idle: cpu1
>    51 root        1 -68    -     0K    16K CPU10  a  35:21 13.48% irq31: bce0

With net.isr.direct=1, the ithread tries to pass the received packets up to
IP/UDP/TCP/socket directly.  With net.isr.direct=0, the ithread places
received packets on a queue and sends a signal to 'swi1: net'.  The swi thread
wakes up, pulls the packets off of the queue and sends them to
IP/UDP/TCP/socket.
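
To make the two modes concrete, here is a toy Python model of the
description above. It is only a sketch, not FreeBSD code: the class
ToyNetisr, its methods, and protocol_input are hypothetical names made up
for illustration, and real threads are replaced by plain method calls.

```python
from collections import deque


def protocol_input(packet, log, context):
    """Stand-in for IP/UDP/TCP/socket processing.

    Records which (simulated) thread did the protocol work.
    """
    log.append((packet, context))


class ToyNetisr:
    """Toy model of direct vs. queued dispatch (hypothetical, not kernel code)."""

    def __init__(self, direct):
        self.direct = direct        # models net.isr.direct (True = 1, False = 0)
        self.queue = deque()        # models the queue used when direct is off
        self.swi_signalled = False  # models the signal sent to 'swi1: net'
        self.log = []               # (packet, thread that processed it)

    def ithread_receive(self, packet):
        """Called in the context of the device interrupt thread."""
        if self.direct:
            # The ithread runs the protocol stack itself.
            protocol_input(packet, self.log, "ithread")
        else:
            # The ithread only enqueues and signals the software interrupt.
            self.queue.append(packet)
            self.swi_signalled = True

    def swi_net_run(self):
        """Called in the context of 'swi1: net' after it has been signalled."""
        while self.queue:
            protocol_input(self.queue.popleft(), self.log, "swi1: net")
        self.swi_signalled = False


if __name__ == "__main__":
    for direct in (True, False):
        isr = ToyNetisr(direct)
        for pkt in ("pkt-a", "pkt-b"):
            isr.ithread_receive(pkt)
        isr.swi_net_run()
        print(direct, isr.log)
```

In this model the protocol work is charged either to the ithread or to
'swi1: net', which is consistent with the shift in CPU usage between the irq
threads and swi1: net seen in the top output quoted above.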

--
John Baldwin

Re: CPU affinity with ULE scheduler

by John Baldwin

On Monday 17 November 2008 06:36:40 am Archimedes Gaviola wrote:
> One more thing, I observed that when net.isr.direct=1, bce0 is using
> irq256 and bce1 is using irq257 while net.isr.direct=0, bce0 is now
> using irq31 and bce1 is using irq32. What makes it different?

That is not from net.isr.direct.  irq256/257 are used when the bce devices
are using MSI.  irq31/32 are used when the bce devices are using INTx.
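
As a rough aid for reading top output, the following Python sketch labels an
interrupt thread name as MSI or INTx by its IRQ number. The threshold of 256
is an assumption taken from the numbers seen in this thread (irq256/257 for
MSI, irq23/31/32 for INTx) and may differ on other platforms or releases;
classify_irq and FIRST_MSI_IRQ are hypothetical names used only here.

```python
import re

# Assumption: MSI interrupts are numbered from 256 upward on this system,
# as suggested by irq256/irq257 in the output above.
FIRST_MSI_IRQ = 256


def classify_irq(thread_name):
    """Split a name such as 'irq256: bce0' into (irq, devices, kind)."""
    match = re.match(r"irq(\d+):\s*(.+)", thread_name)
    if match is None:
        raise ValueError("not an interrupt thread name: %r" % thread_name)
    irq = int(match.group(1))
    devices = match.group(2).split()
    kind = "MSI" if irq >= FIRST_MSI_IRQ else "INTx"
    return irq, devices, kind


if __name__ == "__main__":
    for name in ("irq23: bce0 bce1", "irq256: bce0", "irq32: bce1"):
        print(name, "->", classify_irq(name))
```

A name that lists two devices, such as 'irq23: bce0 bce1', also shows that
both NICs share one interrupt and therefore one ithread.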

--
John Baldwin

Re: CPU affinity with ULE scheduler

by Archimedes Gaviola-2

> In both cases the code path a packet must go through is very similar: it
> has to be received, then processed through firewalls and network stack
> code, then delivered to application(s), so it's a serial process. There
> are things that could be better parallelized in the stack and people are
> working on them, but they will not be finished any time soon.


Ah, okay, so the project is moving towards network stack parallelism.
What is the benefit of a parallelized network stack compared to the
current serialized one? Are there any known issues with the serialized
network stack on multiple CPUs? If so, in which aspects, components, or
subsystems of the operating system? What changes to the operating
system does network stack parallelism require? How should network
processing be optimized with a parallelized network stack? I have read
a technical paper on the Internet that evaluates network stack
parallelism strategies for modern operating systems
http://www.cs.rice.edu/CS/Architecture/docs/willmann-usenix06.pdf
which describes approaches to implementing a parallelized network
stack and uses FreeBSD as the prototype for the different approaches.
From this, I would like to know which approach FreeBSD is
implementing: message-based parallelism or connection-based
parallelism?

Thanks,
Archimedes

Re: CPU affinity with ULE scheduler

by Archimedes Gaviola-2

> Is there a tool that we can use to trace this process, so that we
> can tell which part of the kernel is the bottleneck, especially when
> net.isr.direct=1? By the way, with device polling enabled, the system
> experienced packet errors and the interface throughput was worse, so
> I avoid using it.
>

I was looking for a tool that shows how packets are processed from the
interface up through the network stack to the applications, but I have
not found one. What I did find is the LOCK_PROFILING tool. I am fairly
sure it does not really address my concern, but I tried it anyway
because I need to learn something about the locks FreeBSD uses. Some
people say that many factors and variables affect network performance
in FreeBSD, so I gave this tool a try. I also got valuable information
from this link
http://markmail.org/message/3uqxi4pipvvoy6jx#query:lock%20profiling%20freebsd+page:1+mid:ymqgrxqf4min54zd+state:results.
Instead of the IBM machine with Broadcom NICs, I used another machine
with 4 x quad-core AMD64 CPUs, also with Broadcom NICs, running
FreeBSD-7.1 BETA2. I collected results both with and without traffic.
With traffic, I used both TCP and UDP to generate load: UDP for upload
and TCP for download in a back-to-back setup.

What I have found is that there's a high wait_total on some of the
following when there's traffic:

  max     total  wait_total    count  avg  wait_avg  cnt_hold  cnt_lock  name
  517  24761291     6165864  4460995    5         1    552124   1558183  net/route.c:293 (sleep mutex:radix node head)
  277   1427082      140797   354220    4         0     14476     20674  amd64/amd64/io_apic.c:212 (spin mutex:icu)
   33     25275       20744     5401    4         3         0      5400  amd64/amd64/mp_machdep.c:974 (spin mutex:sched lock 4)
17283   3346679      104214   107262   31         0      4545      4072  kern/kern_sysctl.c:1334 (sleep mutex:Giant)
  257     28599         386     1302   21         0        35        30  vm/vm_fault.c:667 (sleep mutex:vm object)
  282   2821743        2673   977635    2         0       926       552  net/if_ethersubr.c:405 (sleep mutex:bce1)
   22    743637      157239   256274    2         0      5304     48357  dev/random/randomdev_soft.c:308 (spin mutex:entropy harvest mutex)
  301  16301894      881827  1255534   12         0    241491     45973  dev/bce/if_bce.c:5016 (sleep mutex:bce0)
  273   1228787       55458   103863   11         0      3733      4736  kern/subr_sleepqueue.c:232 (spin mutex:sleepq chain)
  624   4682305     1339783  1251253    3         1     32664    254211  dev/bce/if_bce.c:4320 (sleep mutex:bce1)

With lock profiling, how do we know that a certain kernel structure or
function is causing contention? I have only a little knowledge of
mutexes; can someone elaborate on them, especially sleep and spin
mutexes?
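
One rough way to read a table like the one above is to sort the entries by
wait_total and look at the wait time per acquisition. The Python sketch
below does this with the figures quoted in this message; rank_by_wait and
ROWS are hypothetical names, and treating a high wait_total as a sign of
contention is my assumption, not a statement about how LOCK_PROFILING is
meant to be interpreted.

```python
# (total, wait_total, count, name) copied from the table in this message.
ROWS = [
    (24761291, 6165864, 4460995, "net/route.c:293 (sleep mutex:radix node head)"),
    (1427082, 140797, 354220, "amd64/amd64/io_apic.c:212 (spin mutex:icu)"),
    (25275, 20744, 5401, "amd64/amd64/mp_machdep.c:974 (spin mutex:sched lock 4)"),
    (3346679, 104214, 107262, "kern/kern_sysctl.c:1334 (sleep mutex:Giant)"),
    (28599, 386, 1302, "vm/vm_fault.c:667 (sleep mutex:vm object)"),
    (2821743, 2673, 977635, "net/if_ethersubr.c:405 (sleep mutex:bce1)"),
    (743637, 157239, 256274,
     "dev/random/randomdev_soft.c:308 (spin mutex:entropy harvest mutex)"),
    (16301894, 881827, 1255534, "dev/bce/if_bce.c:5016 (sleep mutex:bce0)"),
    (1228787, 55458, 103863, "kern/subr_sleepqueue.c:232 (spin mutex:sleepq chain)"),
    (4682305, 1339783, 1251253, "dev/bce/if_bce.c:4320 (sleep mutex:bce1)"),
]


def rank_by_wait(rows):
    """Return (name, wait_total, wait per acquisition), largest wait_total first."""
    ranked = []
    for total, wait_total, count, name in rows:
        per_acquisition = wait_total / count if count else 0.0
        ranked.append((name, wait_total, per_acquisition))
    ranked.sort(key=lambda entry: entry[1], reverse=True)
    return ranked


if __name__ == "__main__":
    for name, wait_total, per_acq in rank_by_wait(ROWS):
        print("%10d  %6.2f  %s" % (wait_total, per_acq, name))
```

On these figures the radix node head mutex in net/route.c:293 comes first by
a wide margin, followed by the two bce mutexes in dev/bce/if_bce.c.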

Unfortunately, the complete log is too big for the mailing list, so I
have attached it in compressed format.

Thanks,
Archimedes
