Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly caused by netem)


Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Joao Correia

I have now been running 2.6.31-rc2 for a couple of hours, no freeze.

Let me know if and how I can help with tracking down the original source
of the problem.

Thank you very much for your time,
Joao Correia

On Tue, Jul 7, 2009 at 7:50 AM, Jarek Poplawski<jarkao2@...> wrote:

> On Mon, Jul 06, 2009 at 07:26:43PM +0200, Andres Freund wrote:
>> On Monday 06 July 2009 19:23:18 Joao Correia wrote:
>> > Hello
>> >
>> > Since I already had the kernel compiled and ready to boot when I read
>> > this, I gave it a go anyway :-).
>> >
>> > I can reproduce the freeze with those 4 patches applied, so I can
>> > confirm that it is at least related to, or exposed by, those patches.
>> > There must be something else too, or it's just too much fuzziness: the
>> > freeze now takes a bit longer (approximately five minutes, give or
>> > take) compared to the instant freeze before, but it's there with the
>> > patches, and without them, no freeze.
>> >
>> > I assume there isn't a "safe" way to get them out of current .31-rc's,
>> > right?
>> `echo 0 > /proc/sys/kernel/timer_migration` should mitigate the problem.
>
> I guess it should fix it entirely. Btw., here is a patch that disables
> migration for regular timers, leaving it on for hrtimers only. Could you
> try it?
>
> Thanks,
> Jarek P.
> ---
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 0b36b9e..011429c 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -634,7 +634,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>
>        cpu = smp_processor_id();
>
> -#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
> +#if 0
>        if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
>                int preferred_cpu = get_nohz_load_balancer();
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
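
For reference, the workaround suggested above amounts to writing `0` to a sysctl file. A minimal C sketch of the same operation follows. The helper names `write_sysctl_int` and `read_sysctl_int` are made up for illustration, and the path is a parameter so the code can be tried on an ordinary file first; writing the real file requires root.

```c
#include <stdio.h>

/* Hypothetical helper: write an integer value to a sysctl-style file.
 * Returns 0 on success, -1 on failure. */
int write_sysctl_int(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Hypothetical helper: read the integer back from the same kind of file.
 * Returns 0 on success, -1 on failure. */
int read_sysctl_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	int ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling `write_sysctl_int("/proc/sys/kernel/timer_migration", 0)` as root would be the equivalent of the `echo` command. The setting is not persistent across reboots.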

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Andres Freund

On Tuesday 07 July 2009 12:40:16 Joao Correia wrote:
> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>
> Let me know what/if i can help with tracking down the original source
> of the problem.
Do you no longer see the problem with the
`echo 0 > /proc/sys/kernel/timer_migration` change (or, equivalently, with
the patch from Jarek), or has the problem vanished completely?

Andres

Fwd: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Joao Correia

CC'ing the list and the other recipients who were left out of the previous email.

Joao Correia


---------- Forwarded message ----------
From: Joao Correia <joaomiguelcorreia@...>
Date: Tue, Jul 7, 2009 at 12:03 PM
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (
possibly?caused by netem)
To: Andres Freund <andres@...>


I don't see the problem with the patch from Jarek (sorry for not being
clear about this).

With vanilla 2.6.31-rc2 it is still there.

Joao Correia

On Tue, Jul 7, 2009 at 11:47 AM, Andres Freund<andres@...> wrote:

> On Tuesday 07 July 2009 12:40:16 Joao Correia wrote:
>> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>>
>> Let me know what/if i can help with tracking down the original source
>> of the problem.
> You dont see the problem anymore with the `echo 0 >
> /proc/sys/kernel/timer_migration`  change (or equivalently with the patch from
> Jarek) or has the problem vanished completely?
>
> Andres
>

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Jarek Poplawski

On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>
> Let me know what/if i can help with tracking down the original source
> of the problem.

OK, so we know it's only about timers. Here is another tiny patch
(the previous one should be removed), which could tell us (with an oops)
if something goes wrong while migrating. Either way, with the previous
patch removed, expect the bug to be back :-(

Thanks,
Jarek P.

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Jarek Poplawski

Sorry, here is this tiny patch!

On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>
> Let me know what/if i can help with tracking down the original source
> of the problem.

OK, so we know it's only about timers. Here is another tiny patch
(the previous one should be removed), which could tell us (with an oops)
if something goes wrong while migrating. Either way, with the previous
patch removed, expect the bug to be back :-(

Thanks,
Jarek P.
---

 kernel/timer.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 0b36b9e..61ba855 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -658,6 +658,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
                       spin_unlock(&base->lock);
                       base = new_base;
                       spin_lock(&base->lock);
+                       BUG_ON(tbase_get_base(timer->base));
                       timer_set_base(timer, base);
               }
       }
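
The one-line check is easier to follow with the surrounding hand-off in view. The sketch below is a single-threaded userspace model, not kernel code: `struct base`, `struct timer`, `migrate_timer` and `rogue_attach` are hypothetical stand-ins, and locks are reduced to a flag. It assumes that, as in the `__mod_timer` of that era, the timer's base pointer is cleared before the old base's lock is dropped. Under that assumption the pointer should still be NULL once the new base is locked, and the added `BUG_ON` would fire if anything re-attached the timer in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU timer base; the lock is just a flag. */
struct base {
	bool locked;
};

/* Hypothetical stand-in for struct timer_list. */
struct timer {
	struct base *base;	/* NULL while the timer is "in flight" */
};

void lock_base(struct base *b)
{
	assert(!b->locked);
	b->locked = true;
}

void unlock_base(struct base *b)
{
	assert(b->locked);
	b->locked = false;
}

/* Example interloper: re-attaches the timer to some other base. */
struct base rogue_base;

void rogue_attach(struct timer *t)
{
	t->base = &rogue_base;
}

/*
 * Model of the migration step. The caller holds old_base's lock.
 * Returns true if the timer's base was still NULL after taking the new
 * lock, i.e. the condition checked by the BUG_ON in the patch held.
 * On return, new_base's lock is held and the timer is attached to it.
 */
bool migrate_timer(struct timer *t, struct base *old_base,
		   struct base *new_base,
		   void (*interloper)(struct timer *))
{
	t->base = NULL;			/* detach before dropping the old lock */
	unlock_base(old_base);

	if (interloper)			/* window where neither lock is held */
		interloper(t);

	lock_base(new_base);
	bool still_detached = (t->base == NULL);	/* the patch's check */
	t->base = new_base;
	return still_detached;
}
```

With `interloper` set to NULL the function returns true. Passing `rogue_attach` makes it return false, which corresponds to the case where the kernel check would oops.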

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Andres Freund

On Tuesday 07 July 2009 15:18:03 Jarek Poplawski wrote:
> On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> > I am now running 2.6.31-rc2 for a couple of hours, no freeze.
> > Let me know what/if i can help with tracking down the original source
> > of the problem.
> OK, so we know it's only about timers. Here is another tiny patch
> (the previous one should be removed), which could tell (with oops) if
> there's something while migrating. Anyway, the bug should be back :-(
How do we know this? It still could be a race uncovered by timer migration,
right?

Andres

PS: You forgot the patch ;-)

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Jarek Poplawski

On Tue, Jul 07, 2009 at 03:22:06PM +0200, Andres Freund wrote:

> On Tuesday 07 July 2009 15:18:03 Jarek Poplawski wrote:
> > On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> > > I am now running 2.6.31-rc2 for a couple of hours, no freeze.
> > > Let me know what/if i can help with tracking down the original source
> > > of the problem.
> > OK, so we know it's only about timers. Here is another tiny patch
> > (the previous one should be removed), which could tell (with oops) if
> > there's something while migrating. Anyway, the bug should be back :-(
> How do we know this? It still could be a race uncovered by timer migration,
> right?

Right. But (rather) not by or in hrtimers.

> PS: You forgot the patch ;-)

Yes, I hope you got it already ;-)

Thanks,
Jarek P.

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Andres Freund

On Tuesday 07 July 2009 15:29:37 Jarek Poplawski wrote:

> On Tue, Jul 07, 2009 at 03:22:06PM +0200, Andres Freund wrote:
> > On Tuesday 07 July 2009 15:18:03 Jarek Poplawski wrote:
> > > On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> > > > I am now running 2.6.31-rc2 for a couple of hours, no freeze.
> > > > Let me know what/if i can help with tracking down the original source
> > > > of the problem.
> > >
> > > OK, so we know it's only about timers. Here is another tiny patch
> > > (the previous one should be removed), which could tell (with oops) if
> > > there's something while migrating. Anyway, the bug should be back :-(
> > PS: You forgot the patch ;-)
> Yes, I hope you got it already ;-)
Yes. Can't reboot that machine right now, will test later.

Testing whether it's triggerable inside a VM might be interesting...

Andres

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Jarek Poplawski

On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
...
> Testing whether it's triggerable inside a VM might be interesting...

Probably about as interesting as testing without this patch, or even
less. Maybe I should've warned you, but this type of bug in an -rc, with
possible memory or stack overwrites, might be fatal for your data (at least).

Jarek P.

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Andres Freund

On Tuesday 07 July 2009 15:57:42 Jarek Poplawski wrote:
> On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
> ...
> > Testing whether it's triggerable inside a VM might be interesting...
> Probably similarly to testing without this patch or even less. Maybe
> I should've warned you but this type of bugs in -rc with possible
> memory or stack overwrites might be fatal for your data (at least).
Fortunately all the data on that machine should either be replaceable or
regularly backed up.

Will test later today whether that patch triggers the BUG.

Andres

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Jarek Poplawski

On Tue, Jul 07, 2009 at 06:11:27PM +0200, Andres Freund wrote:

> On Tuesday 07 July 2009 15:57:42 Jarek Poplawski wrote:
> > On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
> > ...
> > > Testing whether it's triggerable inside a VM might be interesting...
> > Probably similarly to testing without this patch or even less. Maybe
> > I should've warned you but this type of bugs in -rc with possible
> > memory or stack overwrites might be fatal for your data (at least).
> Fortunately all the data on that machine should either be replaceable or
> regularly backuped.
>
> Will test later today if that patch bugs.

If you didn't start yet, it would be nice to use this, btw:

CONFIG_HOTPLUG_CPU = N
CONFIG_DEBUG_OBJECTS = Y
CONFIG_DEBUG_OBJECTS_TIMERS = Y

Thanks,
Jarek P.

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Andres Freund

On Wednesday 08 July 2009 10:08:52 Jarek Poplawski wrote:

> On Tue, Jul 07, 2009 at 06:11:27PM +0200, Andres Freund wrote:
> > On Tuesday 07 July 2009 15:57:42 Jarek Poplawski wrote:
> > > On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
> > > ...
> > >
> > > > Testing whether it's triggerable inside a VM might be interesting...
> > >
> > > Probably similarly to testing without this patch or even less. Maybe
> > > I should've warned you but this type of bugs in -rc with possible
> > > memory or stack overwrites might be fatal for your data (at least).
> >
> > Fortunately all the data on that machine should either be replaceable or
> > regularly backuped.
> >
> > Will test later today if that patch bugs.
>
> If you didn't start yet, it would be nice to use this, btw:
>
> CONFIG_HOTPLUG_CPU = N
> CONFIG_DEBUG_OBJECTS = Y
> CONFIG_DEBUG_OBJECTS_TIMERS = Y
So I should test with a single CPU? Or is there a config where HOTPLUG_CPU=n
does not imply !SMP?

Andres

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Jarek Poplawski

On Wed, Jul 08, 2009 at 10:29:34AM +0200, Andres Freund wrote:

> On Wednesday 08 July 2009 10:08:52 Jarek Poplawski wrote:
> > On Tue, Jul 07, 2009 at 06:11:27PM +0200, Andres Freund wrote:
> > > On Tuesday 07 July 2009 15:57:42 Jarek Poplawski wrote:
> > > > On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
> > > > ...
> > > >
> > > > > Testing whether it's triggerable inside a VM might be interesting...
> > > >
> > > > Probably similarly to testing without this patch or even less. Maybe
> > > > I should've warned you but this type of bugs in -rc with possible
> > > > memory or stack overwrites might be fatal for your data (at least).
> > >
> > > Fortunately all the data on that machine should either be replaceable or
> > > regularly backuped.
> > >
> > > Will test later today if that patch bugs.
> >
> > If you didn't start yet, it would be nice to use this, btw:
> >
> > CONFIG_HOTPLUG_CPU = N
> > CONFIG_DEBUG_OBJECTS = Y
> > CONFIG_DEBUG_OBJECTS_TIMERS = Y
> So I should test with a single cpu? Or is there a config where HOTPLUG_CPU does
> not imply !SMP?

No, my single cpu should be enough ;-) There is something wrong I guess.
I can see in my menuconfig:

SMP [=y]
...
HOTPLUG [=n]
...
HOTPLUG_CPU [=y]
...
Depends on SMP && HOTPLUG

So, let it be HOTPLUG_CPU = Y for now...

Jarek P.

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Joao Correia

Hello again

On Tue, Jul 7, 2009 at 11:47 AM, Andres Freund<andres@...> wrote:

> On Tuesday 07 July 2009 12:40:16 Joao Correia wrote:
>> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>>
>> Let me know what/if i can help with tracking down the original source
>> of the problem.
> You dont see the problem anymore with the `echo 0 >
> /proc/sys/kernel/timer_migration`  change (or equivalently with the patch from
> Jarek) or has the problem vanished completely?
>
> Andres
>
> On Tuesday 07 July 2009 13:03:50 Joao Correia wrote:
>> I dont see the problem with the patch from Jarek


I have to correct this information.
I had inserted `echo 0 >> /proc/sys/kernel/timer_migration` into
rc.local, and I left it there when I applied your first patch.

I'm talking about this patch:

diff --git a/kernel/timer.c b/kernel/timer.c
index 0b36b9e..011429c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -634,7 +634,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,

       cpu = smp_processor_id();

-#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+#if 0

After removing the line from rc.local and leaving only the patch, the
freeze still happens. The patch -does not- prevent the freeze. It was
my mistake to say it does; I totally forgot I had added that line to
rc.local.

So again, the only thing that stops the freeze is
`echo 0 >> /proc/sys/kernel/timer_migration`. Apologies for pointing you
in the wrong direction.

I also tried the other patch provided:

 kernel/timer.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 0b36b9e..61ba855 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -658,6 +658,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
                       spin_unlock(&base->lock);
                       base = new_base;
                       spin_lock(&base->lock);
+                       BUG_ON(tbase_get_base(timer->base));
                       timer_set_base(timer, base);
               }
       }

but the oops never triggers, either with your first patch or with the
echo 0 > proc[...]

I was under the impression that disabling the entry in /proc and
applying the first patch would give the same result, but alas, they
do not.

Joao Correia

[PS: I'm providing the patches in this email for context, so that
people don't get lost]

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Jarek Poplawski

On Wed, Jul 08, 2009 at 10:44:47PM +0100, Joao Correia wrote:
> Hello again
Hello!

...
> So again, the only thing that stops that freeze is  `echo 0 >>
> /proc/sys/kernel/timer_migration`. Apologies for pointing you in the
> wrong direction.

No problem: the direction is almost right, we only need one U-turn ;-)
In case you're not bored (or too bored), here is one little patch to check
the other side, hrtimers (apply it after reverting the previous patch).

Thanks,
Jarek P.
---

 kernel/hrtimer.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 9002958..23387e4 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -203,7 +203,7 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
       int cpu, preferred_cpu = -1;

       cpu = smp_processor_id();
-#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+#if 0
       if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
               preferred_cpu = get_nohz_load_balancer();
               if (preferred_cpu >= 0)

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Andres Freund

On Wednesday 08 July 2009 10:08:52 Jarek Poplawski wrote:

> On Tue, Jul 07, 2009 at 06:11:27PM +0200, Andres Freund wrote:
> > On Tuesday 07 July 2009 15:57:42 Jarek Poplawski wrote:
> > > On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
> > > ...
> > >
> > > > Testing whether it's triggerable inside a VM might be interesting...
> > >
> > > Probably similarly to testing without this patch or even less. Maybe
> > > I should've warned you but this type of bugs in -rc with possible
> > > memory or stack overwrites might be fatal for your data (at least).
> >
> > Fortunately all the data on that machine should either be replaceable or
> > regularly backuped.
> >
> > Will test later today if that patch bugs.
>
> If you didn't start yet, it would be nice to use this, btw:
> CONFIG_HOTPLUG_CPU = N
> CONFIG_DEBUG_OBJECTS = Y
> CONFIG_DEBUG_OBJECTS_TIMERS = Y
Unfortunately this just yields the same backtraces during softlockup and not
earlier.
I did not test without lockdep yet, but that should not have stopped the BUG
from appearing, right?


Andres

[  207.233011] BUG: soft lockup - CPU#0 stuck for 61s! [openvpn:4232]
[  207.233011] Modules linked in: sch_netem sbs sbshc snd_hda_codec_conexant pcmcia snd_hda_intel snd_hda_codec iwlagn thinkpad_acpi yenta_socket rsrc_nonstatic pcmcia_core btusb snd_hwdep ehci_hcd uhci_hcd
[  207.233011] irq event stamp: 158057
[  207.233011] hardirqs last  enabled at (158056): [<ffffffff81036a10>] restore_args+0x0/0x30
[  207.233011] hardirqs last disabled at (158057): [<ffffffff81035d3a>] save_args+0x6a/0x70
[  207.233011] softirqs last  enabled at (27750): [<ffffffff8155837d>] lock_sock_nested+0x8d/0x130
[  207.233011] softirqs last disabled at (27756): [<ffffffff81568278>] dev_queue_xmit+0x58/0x4b0
[  207.233011] CPU 0:
[  207.233011] Modules linked in: sch_netem sbs sbshc snd_hda_codec_conexant pcmcia snd_hda_intel snd_hda_codec iwlagn thinkpad_acpi yenta_socket rsrc_nonstatic pcmcia_core btusb snd_hwdep ehci_hcd uhci_hcd
[  207.233011] Pid: 4232, comm: openvpn Not tainted 2.6.31-rc2-andres-00151-gf3060b0-dirty #83 208252G
[  207.233011] RIP: 0010:[<ffffffff812a9eb1>]  [<ffffffff812a9eb1>] delay_tsc+0x51/0x80
[  207.233011] RSP: 0018:ffff88012984f938  EFLAGS: 00000202
[  207.233011] RAX: 000000007086c4e9 RBX: ffff88012984f958 RCX: 000000007086c4e9
[  207.233011] RDX: 000000007086c4e9 RSI: 0000000000006238 RDI: 0000000000000001
[  207.233011] RBP: ffffffff81036b6e R08: ffffffff82189460 R09: 0000000000000002
[  207.233011] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000003fda
[  207.233011] R13: ffff88002ee00000 R14: ffff88012984e000 R15: 0000000000000000
[  207.233011] FS:  00007f518d51a6f0(0000) GS:ffff88002ee00000(0000) knlGS:0000000000000000
[  207.233011] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  207.233011] CR2: 00007f46fb78600c CR3: 000000012bc8f000 CR4: 00000000000026f0
[  207.233011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  207.233011] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  207.233011] Call Trace:
[  207.233011]  [<ffffffff812a9eaa>] ? delay_tsc+0x4a/0x80
[  207.233011]  [<ffffffff812a9d9a>] ? __delay+0xa/0x10
[  207.233011]  [<ffffffff812ae578>] ? _raw_spin_lock+0xd8/0x150
[  207.233011]  [<ffffffff816f0431>] ? _spin_lock+0x51/0x70
[  207.233011]  [<ffffffff81568306>] ? dev_queue_xmit+0xe6/0x4b0
[  207.233011]  [<ffffffff81568306>] ? dev_queue_xmit+0xe6/0x4b0
[  207.233011]  [<ffffffff81568273>] ? dev_queue_xmit+0x53/0x4b0
[  207.233011]  [<ffffffff8159a67c>] ? ip_finish_output+0x13c/0x320
[  207.233011]  [<ffffffff8159a8db>] ? ip_output+0x7b/0xd0
[  207.233011]  [<ffffffff81598b98>] ? ip_generic_getfrag+0x88/0xa0
[  207.233011]  [<ffffffff815996c0>] ? ip_local_out+0x20/0x30
[  207.233011]  [<ffffffff81599957>] ? ip_push_pending_frames+0x287/0x410
[  207.233011]  [<ffffffff815bae18>] ? udp_push_pending_frames+0x168/0x3d0
[  207.233011]  [<ffffffff815bcd07>] ? udp_sendmsg+0x457/0x760
[  207.233011]  [<ffffffff815c4144>] ? inet_sendmsg+0x24/0x60
[  207.233011]  [<ffffffff81555556>] ? sock_sendmsg+0x126/0x140
[  207.233011]  [<ffffffff81097f60>] ? autoremove_wake_function+0x0/0x40
[  207.233011]  [<ffffffff810ab6e7>] ? mark_held_locks+0x67/0x90
[  207.233011]  [<ffffffff816f01fb>] ? _spin_unlock_irqrestore+0x3b/0x70
[  207.233011]  [<ffffffff810ab9fd>] ? trace_hardirqs_on_caller+0x14d/0x190
[  207.233011]  [<ffffffff81556490>] ? sys_sendto+0xf0/0x130
[  207.233011]  [<ffffffff810aba4d>] ? trace_hardirqs_on+0xd/0x10
[  207.233011]  [<ffffffff810a21f7>] ? getnstimeofday+0x57/0xe0
[  207.233011]  [<ffffffff8109c1f1>] ? ktime_get_ts+0x51/0x70
[  207.233011]  [<ffffffff81035ec2>] ? system_call_fastpath+0x16/0x1b

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Joao Correia

On Wed, Jul 8, 2009 at 11:07 PM, Jarek Poplawski<jarkao2@...> wrote:

> On Wed, Jul 08, 2009 at 10:44:47PM +0100, Joao Correia wrote:
>> Hello again
> Hello!
>
> ...
>> So again, the only thing that stops that freeze is  `echo 0 >>
>> /proc/sys/kernel/timer_migration`. Apologies for pointing you in the
>> wrong direction.
>
> No problem: the direction is almost right, we only need one U-turn ;-)
> In case you're not bored or too bored, one little patch to check the
> other side (after reverting the previous patch).
>
> Thanks,
> Jarek P.
> ---
>
>  kernel/hrtimer.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 9002958..23387e4 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -203,7 +203,7 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
>        int cpu, preferred_cpu = -1;
>
>        cpu = smp_processor_id();
> -#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
> +#if 0
>        if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
>                preferred_cpu = get_nohz_load_balancer();
>                if (preferred_cpu >= 0)
>

(this time I triple-checked :-) )

So, with only this last patch applied, no freeze. No need to disable
anything through /proc.

Where should I put the BUG_ON?

Joao Correia
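
The four results reported so far can be tabulated to see which migration path they point at. The sketch below is only bookkeeping, not kernel code, and its struct and function names are hypothetical. It assumes what the two patches show: both `__mod_timer` and `switch_hrtimer_base` gate migration on `get_sysctl_timer_migration()`, so the sysctl switches off both paths while each `#if 0` patch removes only one. It also assumes the sysctl defaults to enabled, and it leaves out the `!pinned` and `idle_cpu()` conditions.

```c
#include <stdbool.h>
#include <stddef.h>

/* One tested configuration and the outcome reported in the thread. */
struct observation {
	const char *what;
	bool sysctl_on;		/* timer_migration left enabled (assumed default) */
	bool timer_patch;	/* kernel/timer.c branch under #if 0 */
	bool hrtimer_patch;	/* kernel/hrtimer.c branch under #if 0 */
	bool froze;
};

const struct observation observations[] = {
	{ "vanilla 2.6.31-rc2",       true,  false, false, true  },
	{ "echo 0 > timer_migration", false, false, false, false },
	{ "timer.c patch only",       true,  true,  false, true  },
	{ "hrtimer.c patch only",     true,  false, true,  false },
};
const size_t n_observations = sizeof(observations) / sizeof(observations[0]);

/* Can regular timers still migrate in this configuration? */
bool timer_path_active(const struct observation *o)
{
	return o->sysctl_on && !o->timer_patch;
}

/* Can hrtimers still migrate in this configuration? */
bool hrtimer_path_active(const struct observation *o)
{
	return o->sysctl_on && !o->hrtimer_patch;
}

/* Does "the freeze happens exactly when this path is active" fit every row? */
bool explains_all(bool (*path_active)(const struct observation *))
{
	for (size_t i = 0; i < n_observations; i++)
		if (path_active(&observations[i]) != observations[i].froze)
			return false;
	return true;
}
```

Here `explains_all(hrtimer_path_active)` returns true and `explains_all(timer_path_active)` returns false. That is consistent with the hrtimer migration path being involved, though it does not rule out a race elsewhere that hrtimer migration merely exposes.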

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Jarek Poplawski

On Wed, Jul 08, 2009 at 11:27:30PM +0100, Joao Correia wrote:

> On Wed, Jul 8, 2009 at 11:07 PM, Jarek Poplawski<jarkao2@...> wrote:
> > On Wed, Jul 08, 2009 at 10:44:47PM +0100, Joao Correia wrote:
> >> Hello again
> > Hello!
> >
> > ...
> >> So again, the only thing that stops that freeze is  `echo 0 >>
> >> /proc/sys/kernel/timer_migration`. Apologies for pointing you in the
> >> wrong direction.
> >
> > No problem: the direction is almost right, we only need one U-turn ;-)
> > In case you're not bored or too bored, one little patch to check the
> > other side (after reverting the previous patch).
> >
> > Thanks,
> > Jarek P.
> > ---
> >
> >  kernel/hrtimer.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> > index 9002958..23387e4 100644
> > --- a/kernel/hrtimer.c
> > +++ b/kernel/hrtimer.c
> > @@ -203,7 +203,7 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
> >        int cpu, preferred_cpu = -1;
> >
> >        cpu = smp_processor_id();
> > -#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
> > +#if 0
> >        if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
> >                preferred_cpu = get_nohz_load_balancer();
> >                if (preferred_cpu >= 0)
> >
>
> (this time i triple-checked :-) )
>
> So, with only this last patch applied, no freeze. No need to disable
> anything through /proc.
>
> Where should i put the BUG_ON?

Hmm... Not so fast! I've looked in timers till now; "tomorrow" I'll
"change resolution". ;-)

Thanks again,
Jarek P.

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Joao Correia

>> (this time i triple-checked :-) )
>>
>> So, with only this last patch applied, no freeze. No need to disable
>> anything through /proc.
>>
>> Where should i put the BUG_ON?
>
> Hmm... Not so fast! I've looked in timers till now; "tomorrow" I'll
> "change resolution". ;-)
>
> Thanks again,
> Jarek P.
>

Of course :-)

Thanks for looking into this.
Joao Correia

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

by Jarek Poplawski

On Thu, Jul 09, 2009 at 12:23:17AM +0200, Andres Freund wrote:
...
> Unfortunately this just yields the same backtraces during softlockup and not
> earlier.
> I did not test without lockdep yet, but that should not have stopped the BUG
> from appearing, right?

Since it looks like hrtimers now, these changes in timers shouldn't
matter. Let's wait for new ideas.

Thanks for testing anyway,
Jarek P.