Kernel Panic occurring when drbd is up & (re)syncing

Hello,

Here we have a two-node setup running CentOS 5.4, Xen 3.0 (CentOS RPMs) and DRBD 8.3.2 (again CentOS RPM). Both servers are Dell PowerEdge 1950s with two quad-core Xeon processors and 32GB of memory. The network card used by DRBD is an Intel 82571EB Gigabit Ethernet card (e1000 driver). The two nodes are connected directly with a crossover cable.

DRBD is configured with one resource (drbd0), on which I have configured an LVM volume group that is then sliced into two LVs. Both LVs are mapped to my Xen VM (PV) as the sda and sdb disks.

Recently, we've had issues where the node that is in Primary state, and hence running the VM, locks up and throws a kernel panic. This seems to be related to DRBD and/or the network stack, because the problem does not occur if we disconnect the DRBD resource. Even worse, the problem occurs very quickly after we connect the DRBD resource, either during resynchronization after being out-of-sync for a while or during normal syncing operations.

No errors show up on the network interface (ifconfig, ethtool).

One thing to note is that the kernel panic seems to complain about checksum functions, so that might be related (see below).

Here is the relevant information:

# rpm -qa | grep -e xen -e drbd
drbd83-8.3.2-6.el5_3
kmod-drbd83-xen-8.3.2-6.el5_3
xen-3.0.3-94.el5
kernel-xen-2.6.18-164.el5
xen-libs-3.0.3-94.el5

# cat /etc/drbd.conf
global {
    usage-count no;
}

common {
    protocol C;

    syncer {
        rate 33M;
        verify-alg crc32c;
        al-extents 1801;
    }
    net {
        cram-hmac-alg sha1;
        max-epoch-size 8192;
        max-buffers 8192;
    }

    disk {
        on-io-error detach;
        no-disk-flushes;
        no-disk-barrier;
        no-md-flushes;
    }
}

resource drbd0 {
    device /dev/drbd0;
    disk /dev/sda6;
    flexible-meta-disk internal;
    on node1 {
        address 10.11.1.1:7788;
    }
    on node2 {
        address 10.11.1.2:7788;
    }
}

### Kernel Panic ###
Unable to handle kernel paging request at ffff880011e3cc64 RIP:
 [<ffffffff80212bad>] csum_partial+0x56/0x4bc
PGD ed8067 PUD ed9067 PMD f69067 PTE 0
Oops: 0000 [1] SMP
last sysfs file: /class/scsi_host/host0/proc_name
CPU 0
Modules linked in: xt_physdev netconsole drbd(U) netloop netbk blktap blkbk ipt_MASQUERADE iptable_nat ip_nat bridge ipv6 xfrm_nalgo crypto_api xt_tcpudp xt_state ip_conntrack_irc xt_conntrack ip_conntrack_ftp xt_mac xt_length xt_limit xt_multiport ipt_ULOG ipt_TCPMSS ipt_TOS ipt_ttl ipt_owner ipt_REJECT ipt_ecn ipt_LOG ipt_recent ip_conntrack iptable_mangle iptable_filter ip_tables nfnetlink x_tables autofs4 dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport joydev ide_cd e1000e cdrom serial_core i5000_edac edac_mc bnx2 serio_raw pcspkr sg dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache ata_piix libata shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 12887, comm: drbd0_receiver Tainted: G 2.6.18-128.1.16.el5xen #1
RIP: e030:[<ffffffff80212bad>] [<ffffffff80212bad>] csum_partial+0x56/0x4bc
RSP: e02b:ffff88000c347718 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff880010ced500
RDX: 00000000000000e7 RSI: 000000000000039c RDI: ffff880011e3cc64
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000025b85e7c R11: 0000000000000002 R12: 0000000000000028
R13: 0000000000000028 R14: ffff88001c56f7b0 R15: 0000000025b85e7c
FS: 00002b391e123f60(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process drbd0_receiver (pid: 12887, threadinfo ffff88000c346000, task ffff88001c207820)
Stack: 000000000000039c 00000000000005b4 ffffffff8023d496 ffff88001e7e48d8
 0000001400000000 ffff8800000003c4 ffff88001c56f7b0 ffff88001e7e48d8
 ffff88001e7e48ec ffff88000c3478e8
Call Trace:
 [<ffffffff8023d496>] skb_checksum+0x11b/0x260
 [<ffffffff80411472>] skb_checksum_help+0x71/0xd0
 [<ffffffff8853f33e>] :iptable_nat:ip_nat_fn+0x56/0x1c3
 [<ffffffff8853f6cf>] :iptable_nat:ip_nat_local_fn+0x32/0xb7
 [<ffffffff8023550c>] nf_iterate+0x41/0x7d
 [<ffffffff8042f004>] dst_output+0x0/0xe
 [<ffffffff80258b28>] nf_hook_slow+0x58/0xbc
 [<ffffffff8042f004>] dst_output+0x0/0xe
 [<ffffffff802359ab>] ip_queue_xmit+0x41c/0x48c
 [<ffffffff8022c1cb>] local_bh_enable+0x9/0xa5
 [<ffffffff8020b6b7>] kmem_cache_alloc+0x62/0x6d
 [<ffffffff8023668d>] alloc_skb_from_cache+0x74/0x13c
 [<ffffffff80222a0b>] tcp_transmit_skb+0x62f/0x667
 [<ffffffff8043903a>] tcp_retransmit_skb+0x53d/0x638
 [<ffffffff80439353>] tcp_xmit_retransmit_queue+0x21e/0x2bb
 [<ffffffff80225cff>] tcp_ack+0x1705/0x1879
 [<ffffffff8021c6b1>] tcp_rcv_established+0x804/0x925
 [<ffffffff80263710>] schedule_timeout+0x1e/0xad
 [<ffffffff8023cef3>] tcp_v4_do_rcv+0x2a/0x2fa
 [<ffffffff8040bbfe>] sk_wait_data+0xac/0xbf
 [<ffffffff8029b018>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80434f71>] tcp_prequeue_process+0x65/0x78
 [<ffffffff8021dd39>] tcp_recvmsg+0x492/0xb1f
 [<ffffffff80233102>] sock_common_recvmsg+0x2d/0x43
 [<ffffffff80233102>] sock_common_recvmsg+0x2d/0x43
 [<ffffffff80231c18>] sock_recvmsg+0x101/0x120
 [<ffffffff80231c18>] sock_recvmsg+0x101/0x120
 [<ffffffff8029b018>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80343366>] swiotlb_map_sg+0xf7/0x205
 [<ffffffff880b563c>] :megaraid_sas:megasas_make_sgl64+0x78/0xa9
 [<ffffffff880b61bc>] :megaraid_sas:megasas_queue_command+0x343/0x3ed
 [<ffffffff884e119f>] :drbd:drbd_recv+0x7b/0x109
 [<ffffffff884e53b2>] :drbd:receive_DataRequest+0x3b/0x655
 [<ffffffff884e1c4b>] :drbd:drbdd+0x77/0x152
 [<ffffffff884e4870>] :drbd:drbdd_init+0xea/0x1dc
 [<ffffffff884f432a>] :drbd:drbd_thread_setup+0xa2/0x18b
 [<ffffffff80260b2c>] child_rip+0xa/0x12
 [<ffffffff884f4288>] :drbd:drbd_thread_setup+0x0/0x18b
 [<ffffffff80260b22>] child_rip+0x0/0x12

Code: 44 8b 0f ff ca 83 ee 04 48 83 c7 04 4d 01 c8 41 89 d2 41 89
RIP [<ffffffff80212bad>] csum_partial+0x56/0x4bc
 RSP <ffff88000c347718>
CR2: ffff880011e3cc64
Kernel panic - not syncing: Fatal exception
#######

Any ideas on how to diagnose this properly and eventually find the culprit?

Regards,
--
Jean-François Chevrette [iWeb]
_______________________________________________
drbd-user mailing list
drbd-user@...
http://lists.linbit.com/mailman/listinfo/drbd-user
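When reading a trace like the one above, a quick first step is to see which loadable modules own frames in it. The following is only a minimal sketch, not something from the original posts: `oops_modules` is a hypothetical helper name, and it assumes module frames are written as `:module:symbol+0xOFF/0xLEN`, which is how they appear in the traces in this thread. It also relies on `grep -o`, which GNU and BSD grep provide but POSIX does not require.

```shell
#!/bin/sh
# oops_modules (hypothetical helper): read a kernel oops on stdin and print
# the name of every loadable module that owns at least one frame in it.
# Assumption: module frames look like ":drbd:drbd_recv+0x7b/0x109".
# Frames from the core kernel (e.g. "csum_partial+0x56/0x4bc") carry no
# ":module:" prefix and are therefore skipped.
oops_modules() {
    # 1. keep only the ":module:symbol+0x" fragments
    # 2. cut out the module name between the first two colons
    # 3. collapse duplicates
    grep -o ':[A-Za-z0-9_]\{1,\}:[A-Za-z0-9_]\{1,\}+0x' \
        | cut -d: -f2 \
        | sort -u
}
```

Usage would be something like `oops_modules < saved-oops.txt` (the file name is made up). For the trace above it should list drbd, iptable_nat and megaraid_sas, which at least narrows down which subsystems were on the stack when csum_partial faulted.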
Re: Kernel Panic occurring when drbd is up & (re)syncing

It appears that there is currently a problem with the latest CentOS/Red Hat kernel. We have noticed the same problem when using LVM snapshots and a backup technology called R1Soft CDP.

Some related info:
http://bugs.centos.org/view.php?id=3869
forum.r1soft.com/showthread.php?t=1158

No sign of a bug at bugzilla.redhat.com.

For now we have reverted to kernel-2.6.18-128.7.1, on which we have not had any issues for the past 4 hours. Previously, the kernel panic would occur a few seconds after starting a 'drbdadm verify'.

DRBD devs might want to check it out.

Regards,
--
Jean-François Chevrette [iWeb]

On 09-11-09 10:20 AM, Jean-Francois Chevrette wrote:
> Hello,
>
> here we have a two nodes setup that are running CentOS 5.4, Xen 3.0
> (CentOS RPMs) and DRBD 8.3.2 (again CentOS RPM).
> [...]
> Any ideas on how to diagnose this properly and eventually find the culprit?
Re: Kernel Panic occurring when drbd is up & (re)syncing

It looks like I am getting a kernel bug on 64-bit Xen Debian in similar conditions, i.e., when running drbd-verify. I have got it happening on both cluster nodes.

Kernel 2.6.26-2-xen-amd64, DRBD 8.3.5 compiled from the Debian unstable package for 8.3.4.

For anyone interested, here is the stack trace.

BR,
Ivars

Nov 16 03:00:29 ariel kernel: [31375.026193] BUG: unable to handle kernel NULL pointer dereference at 0000000000000016
Nov 16 03:00:29 ariel kernel: [31375.026288] IP: [<ffffffffa02f9169>] :drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.026359] PGD 164c4067 PUD 170d1067 PMD 0
Nov 16 03:00:29 ariel kernel: [31375.026423] Oops: 0000 [1] SMP
Nov 16 03:00:29 ariel kernel: [31375.026474] CPU 0
Nov 16 03:00:29 ariel kernel: [31375.026512] Modules linked in: xt_physdev iptable_filter ip_tables x_tables sha1_generic drbd cn iscsi_trgt crc32c libcrc32c ipv6 bridge xfs w83627ehf lm85 hwmon_vid netconsole configfs xenblktap netloop softdog ipmi_watchdog ipmi_msghandler loop psmouse serio_raw pcspkr i2c_i801 i2c_core button rng_core shpchp pci_hotplug intel_agp evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom ide_disk ide_pci_generic ata_piix piix ide_core ata_generic libata scsi_mod dock skge ehci_hcd uhci_hcd thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Nov 16 03:00:29 ariel kernel: [31375.027370] Pid: 3165, comm: cqueue Not tainted 2.6.26-2-xen-amd64 #1
Nov 16 03:00:29 ariel kernel: [31375.027405] RIP: e030:[<ffffffffa02f9169>] [<ffffffffa02f9169>] :drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.027485] RSP: e02b:ffff8800104f3e50 EFLAGS: 00010206
Nov 16 03:00:29 ariel kernel: [31375.027519] RAX: 0000000000000000 RBX: ffff88001648c220 RCX: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027555] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800164c9c10
Nov 16 03:00:29 ariel kernel: [31375.027597] RBP: ffff88001648c1d8 R08: ffff8800104f2000 R09: ffffffff80553e18
Nov 16 03:00:29 ariel kernel: [31375.027633] R10: 0000000000000000 R11: 7fffffffffffffff R12: ffff8800164c9c10
Nov 16 03:00:29 ariel kernel: [31375.027669] R13: ffffffffa02d30c3 R14: ffffffff8057d1c0 R15: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027709] FS: 00007f9ee13c46e0(0000) GS:ffffffff8053a000(0000) knlGS:0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027761] CS: e033 DS: 0000 ES: 0000
Nov 16 03:00:29 ariel kernel: [31375.027793] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 16 03:00:29 ariel kernel: [31375.027829] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 16 03:00:29 ariel kernel: [31375.027866] Process cqueue (pid: 3165, threadinfo ffff8800104f2000, task ffff8800161e1440)
Nov 16 03:00:29 ariel kernel: [31375.027918] Stack: 0000000000000000 ffff88001648c220 ffff88001648c1d8 ffff88001648c1d0
Nov 16 03:00:29 ariel kernel: [31375.028024] ffffffffa02d30c3 ffffffff8057d1c0 0000000000000000 ffffffffa02d30d8
Nov 16 03:00:29 ariel kernel: [31375.028120] 7fffffffffffffff ffff880016f76840 ffff88001648c1d0 ffffffff8023c34c
Nov 16 03:00:29 ariel kernel: [31375.028185] Call Trace:
Nov 16 03:00:29 ariel kernel: [31375.028250] [<ffffffffa02d30c3>] ? :cn:cn_queue_wrapper+0x0/0x33
Nov 16 03:00:29 ariel kernel: [31375.028393] [<ffffffffa02d30d8>] ? :cn:cn_queue_wrapper+0x15/0x33
Nov 16 03:00:29 ariel kernel: [31375.028439] [<ffffffff8023c34c>] ? run_workqueue+0xbe/0x189
Nov 16 03:00:29 ariel kernel: [31375.028482] [<ffffffff8023cd35>] ? worker_thread+0xd5/0xe0
Nov 16 03:00:29 ariel kernel: [31375.028522] [<ffffffff8023f6c1>] ? autoremove_wake_function+0x0/0x2e
Nov 16 03:00:29 ariel kernel: [31375.028564] [<ffffffff8023cc60>] ? worker_thread+0x0/0xe0
Nov 16 03:00:29 ariel kernel: [31375.028601] [<ffffffff8023f593>] ? kthread+0x47/0x74
Nov 16 03:00:29 ariel kernel: [31375.028637] [<ffffffff802283a8>] ? schedule_tail+0x27/0x5c
Nov 16 03:00:29 ariel kernel: [31375.028677] [<ffffffff8020be28>] ? child_rip+0xa/0x12
Nov 16 03:00:29 ariel kernel: [31375.028722] [<ffffffff8023f54c>] ? kthread+0x0/0x74
Nov 16 03:00:29 ariel kernel: [31375.028760] [<ffffffff8020be1e>] ? child_rip+0x0/0x12
Nov 16 03:00:29 ariel kernel: [31375.028796]
Nov 16 03:00:29 ariel kernel: [31375.028824]
Nov 16 03:00:29 ariel kernel: [31375.028852] Code: 41 55 41 54 49 89 fc 55 53 48 83 ec 08 65 8b 04 25 24 00 00 00 83 3d a6 75 01 00 02 74 1e 89 c0 48 c1 e0 07 48 ff 80 00 09 31 a0 <f6> 42 16 20 be 98 00 00 00 0f 84 20 01 00 00 eb 1a 41 5b 5b 5d
Nov 16 03:00:29 ariel kernel: [31375.029581] RIP [<ffffffffa02f9169>] :drbd:drbd_connector_callback+0x32/0x181
Nov 16 03:00:29 ariel kernel: [31375.029657] RSP <ffff8800104f3e50>
Nov 16 03:00:29 ariel kernel: [31375.029688] CR2: 0000000000000016
Nov 16 03:00:29 ariel kernel: [31375.030762] ---[ end trace 296f6157c8798c56 ]---

Jean-Francois Chevrette wrote:
> It appears that there is currently a problem with the latest
> CentOS/Redhat kernel. We have noticed the same problem when using LVM
> snapshots and a backup technology called R1Soft CDP.
>
> Some related info:
> http://bugs.centos.org/view.php?id=3869
> forum.r1soft.com/showthread.php?t=1158
>
> No sign of a bug at bugzilla.redhat.com
>
> For now we have reverted to kernel-2.6.18-128.7.1 on which we did not
> have any issues for the past 4 hours. Previously, a few seconds after
> starting a 'drbdadm verify' the kernel panic would occur.
>
> DRBD devs might want to check it out.
>
> Regards,
Re: Kernel Panic occurring when drbd is up & (re)syncing

Hi list,

Any news about this bug?

2009/11/16 Ivars Strazdiņš <ivars.strazdins@...>:
> It looks like I am getting kernel bug on 64-bit Xen Debian in similar
> conditions, ie, when running drbd-verify.
> I have got it happening on both cluster nodes.
>
> Kernel 2.6.26-2-xen-amd64, DRBD 8.3.5 compiled from Debian unstable package
> for 8.3.4
>
> For anyone interested, here is the stack trace.
> BR,
> Ivars
>
> [...]