Hi,
I experienced a very strange issue this week and I am trying to get to the bottom of it.
At 10:07 local time (09:07Z in the log timestamps below) we lost pretty much all of our virtual infrastructure. Some events were generated on our monitoring kit, syslog, etc., but because the syslog server was itself virtual, logging stopped almost as soon as the issue happened.
I have multiple ESX clusters, one of which runs on fairly recent hardware (an IBM BladeCenter with HS23 blades). All six hosts in this cluster had a PSOD, so obviously all the guest VMs were unavailable too.
The guys onsite eventually figured out what was happening and bounced all the servers. I was out at a customer site, so I did not investigate until well after the event. The two guys who saw the PSOD both tell me it mentioned something about a log being full, but I have checked the dump log and can't find any reference to this.
Here is the crash log from around the time of the event:
2012-10-16T09:07:18.364Z cpu24:4120)0x412200607858:[0x418023a9b0ec]Util_Udelay@vmkernel#nover+0x2f stack: 0x412200010000
2012-10-16T09:07:18.365Z cpu24:4120)0x4122006078a8:[0x418024045a56]_be_mpu_post_wrb_ring@<None>#<None>+0xed stack: 0x4122ffffffff
2012-10-16T09:07:18.365Z cpu24:4120)0x412200607908:[0x4180240428e9]be_function_post_mcc_wrb@<None>#<None>+0x128 stack: 0x0
2012-10-16T09:07:18.365Z cpu24:4120)0x412200607998:[0x418024043aaf]be_eq_modify_delay@<None>#<None>+0x156 stack: 0x0
2012-10-16T09:07:18.366Z cpu24:4120)0x412200607ad8:[0x418024039efd]rate_timer_func@<None>#<None>+0x360 stack: 0x0
2012-10-16T09:07:18.366Z cpu24:4120)0x412200607b78:[0x418023a96e12]Timer_BHHandler@vmkernel#nover+0x225 stack: 0xfffc01000000df
2012-10-16T09:07:18.367Z cpu24:4120)0x412200607bb8:[0x418023a1890d]BH_Check@vmkernel#nover+0x80 stack: 0x4122ffffffff
2012-10-16T09:07:18.367Z cpu24:4120)0x412200607bf8:[0x418023a4221d]IDT_HandleInterrupt@vmkernel#nover+0x13c stack: 0x418046000140
2012-10-16T09:07:18.368Z cpu24:4120)0x412200607c18:[0x418023a42a7d]IDT_IntrHandler@vmkernel#nover+0xa4 stack: 0x412200607d28
2012-10-16T09:07:18.368Z cpu24:4120)0x412200607c28:[0x418023af2047]gate_entry@vmkernel#nover+0x46 stack: 0x4018
2012-10-16T09:07:18.368Z cpu24:4120)0x412200607d28:[0x418023d00281]Power_HaltPCPU@vmkernel#nover+0x274 stack: 0x206a8148a83722
2012-10-16T09:07:18.369Z cpu24:4120)0x412200607e58:[0x418023bf05fa]CpuSchedIdleLoopInt@vmkernel#nover+0xb3d stack: 0x412200607e98
2012-10-16T09:07:18.369Z cpu24:4120)0x412200607e68:[0x418023bf75f6]CpuSched_IdleLoop@vmkernel#nover+0x15 stack: 0x28
2012-10-16T09:07:18.370Z cpu24:4120)0x412200607e98:[0x418023a4631e]Init_SlaveIdle@vmkernel#nover+0x13d stack: 0x0
2012-10-16T09:07:18.370Z cpu24:4120)0x412200607fe8:[0x418023d06479]SMPSlaveIdle@vmkernel#nover+0x310 stack: 0x0
2012-10-16T09:07:21.363Z cpu20:3901569)ALERT: Heartbeat: 618: PCPU 13 didn't have a heartbeat for 8 seconds. *may* be locked up
2012-10-16T09:07:21.363Z cpu13:4134)ALERT: NMI: 1943: NMI IPI received. Was eip(base):ebp:cs [0x3023e6(0x418023a00000):0x412200987f10:0x4010](Src 0x1, CPU13)
2012-10-16T09:07:21.364Z cpu13:4134)0x412200987f10:[0x418023d023e6]PowerSetPStateAnyPCPU@vmkernel#nover+0xf9 stack: 0x18
2012-10-16T09:07:21.365Z cpu13:4134)0x412200987f40:[0x418023d51b88]VMKAcpiStateNotifyHandler@vmkernel#nover+0xcb stack: 0x0
2012-10-16T09:07:21.365Z cpu13:4134)0x412200987f60:[0x418023d1a028]AcpiEvNotifyDispatch@vmkernel#nover+0x63 stack: 0x0
2012-10-16T09:07:21.365Z cpu13:4134)0x412200987ff0:[0x418023a3e2ef]helpFunc@vmkernel#nover+0x54e stack: 0x0
2012-10-16T09:07:21.366Z cpu13:4134)0x412200987ff8:[0x0]<unknown> stack: 0x0
2012-10-16T09:07:27.363Z cpu4:4523330)ALERT: Heartbeat: 618: PCPU 18 didn't have a heartbeat for 8 seconds. *may* be locked up
2012-10-16T09:07:27.363Z cpu18:4155)ALERT: NMI: 1943: NMI IPI received. Was eip(base):ebp:cs [0x4c04da(0x418023a00000):0x412200ec7dd0:0x4010](Src 0x1, CPU18)
2012-10-16T09:07:27.364Z cpu18:4155)0x412200ec7dd0:[0x418023ec04da]__raw_spin_failed@com.vmware.driverAPI#9.2+0x1 stack: 0x410000000001
2012-10-16T09:07:27.365Z cpu18:4155)0x412200ec7e10:[0x41802403c52d]be_get_stats@<None>#<None>+0x94 stack: 0x410005aa0000
2012-10-16T09:07:27.365Z cpu18:4155)0x412200ec7e30:[0x41802403ca88]benet_get_stats@<None>#<None>+0x63 stack: 0x412200ec7e80
2012-10-16T09:07:27.365Z cpu18:4155)0x412200ec7f20:[0x418023ecc1f5]GetDeviceStats@com.vmware.driverAPI#9.2+0x50 stack: 0x410009239168
2012-10-16T09:07:27.366Z cpu18:4155)0x412200ec7f60:[0x418023b85c2b]UplinkAsyncProcessCallsHelperCB@vmkernel#nover+0x122 stack: 0x0
2012-10-16T09:07:27.366Z cpu18:4155)0x412200ec7ff0:[0x418023a3e2ef]helpFunc@vmkernel#nover+0x54e stack: 0x0
2012-10-16T09:07:27.367Z cpu18:4155)0x412200ec7ff8:[0x0]<unknown> stack: 0x0
2012-10-16T09:07:32.363Z cpu28:4124)ALERT: Heartbeat: 618: PCPU 24 didn't have a heartbeat for 21 seconds. *may* be locked up
2012-10-16T09:07:32.363Z cpu24:4120)ALERT: NMI: 1915: NMI IPI recvd. We Halt. eip(base):ebp:cs [0x9b0ec(0x418023a00000):0x412200607858:0x4010](Src0x1, CPU24)
2012-10-16T09:07:32.363Z cpu28:4124)World: 7145: PRDA 0x418047000000 ss 0x0 ds 0x4018 es 0x4018 fs 0x4018 gs 0x4018
2012-10-16T09:07:32.363Z cpu28:4124)World: 7147: TR 0x110 GDT 0x41220071f000 (0x401f) IDT 0x418023af4000 (0xfff)
2012-10-16T09:07:32.403Z cpu28:4124)Panic: 835: Saved backtrace: pcpu 24 Heartbeat NMI
2012-10-16T09:07:32.404Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607858:[0x418023a9b0ec]Util_Udelay@vmkernel#nover+0x2f stack: 0x4122
2012-10-16T09:07:32.404Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x4122006078a8:[0x418024045a56]_be_mpu_post_wrb_ring@<None>#<None>+0xed stac
2012-10-16T09:07:32.405Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607908:[0x4180240428e9]be_function_post_mcc_wrb@<None>#<None>+0x128
2012-10-16T09:07:32.405Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607998:[0x418024043aaf]be_eq_modify_delay@<None>#<None>+0x156 stack:
2012-10-16T09:07:32.406Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607ad8:[0x418024039efd]rate_timer_func@<None>#<None>+0x360 stack: 0x
2012-10-16T09:07:32.406Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607b78:[0x418023a96e12]Timer_BHHandler@vmkernel#nover+0x225 stack: 0
2012-10-16T09:07:32.407Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607bb8:[0x418023a1890d]BH_Check@vmkernel#nover+0x80 stack: 0x4122fff
2012-10-16T09:07:32.407Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607bf8:[0x418023a4221d]IDT_HandleInterrupt@vmkernel#nover+0x13c stac
2012-10-16T09:07:32.408Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607c18:[0x418023a42a7d]IDT_IntrHandler@vmkernel#nover+0xa4 stack: 0x
2012-10-16T09:07:32.408Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607c28:[0x418023af2047]gate_entry@vmkernel#nover+0x46 stack: 0x4018,
2012-10-16T09:07:32.409Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607d28:[0x418023d00281]Power_HaltPCPU@vmkernel#nover+0x274 stack: 0x
2012-10-16T09:07:32.409Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607e58:[0x418023bf05fa]CpuSchedIdleLoopInt@vmkernel#nover+0xb3d stac
2012-10-16T09:07:32.410Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607e68:[0x418023bf75f6]CpuSched_IdleLoop@vmkernel#nover+0x15 stack:
2012-10-16T09:07:32.410Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607e98:[0x418023a4631e]Init_SlaveIdle@vmkernel#nover+0x13d stack: 0x
2012-10-16T09:07:32.411Z cpu28:4124)pcpu 24 Heartbeat NMI: 0x412200607fe8:[0x418023d06479]SMPSlaveIdle@vmkernel#nover+0x310 stack: 0x0,
2012-10-16T09:07:32.429Z cpu28:4124)VMware ESXi 5.0.0 [Releasebuild-768111 x86_64]
PCPU 24: no heartbeat (but 2/2 IPIs received).
2012-10-16T09:07:32.429Z cpu28:4124)cr0=0x80010039 cr2=0x0 cr3=0x10d000 cr4=0x216c
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:0 world:2841010 name:"vmm1:flc-rds03.domain.co.uk" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:1 world:4097 name:"idle1" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:2 world:1744551 name:"vmm0:server06.domain.local" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:3 world:4165861 name:"vmx" (U)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:4 world:3244326 name:"vmm1:dc01-scott.scottmail.co.uk-VSS" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:5 world:4101 name:"idle5" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:6 world:3244324 name:"vmm0:dc01-scott.scottmail.co.uk-VSS" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:7 world:4523330 name:"vmm0:fli-ips02.domain.local" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:8 world:3842033 name:"vmm1:server03.domain.co.uk" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:9 world:4105 name:"idle9" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:10 world:2841008 name:"vmm0:flc-rds03.domain.co.uk" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:11 world:3907678 name:"vmm0:flc-lync01.domain.co.uk-VSS" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:12 world:3846127 name:"vmm0:server03.domain.co.uk" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:13 world:4134 name:"helper0-0" (SH)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:14 world:3265111 name:"vmm0:fls-cog02.domain.co.uk" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:15 world:3899488 name:"vmm1:flc-lync01.domain.co.uk-VSS" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:16 world:4112 name:"idle16" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:17 world:4113 name:"idle17" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:18 world:4155 name:"helper12-0" (SH)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:19 world:4115 name:"idle19" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:20 world:4116 name:"idle20" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:21 world:4117 name:"idle21" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:22 world:4118 name:"idle22" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:23 world:4119 name:"idle23" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:24 world:4120 name:"idle24" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:25 world:4121 name:"idle25" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:26 world:3901569 name:"vmm0:Webserver03.domain.co.uk-VSS" (V)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:27 world:4123 name:"idle27" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:28 world:4124 name:"idle28" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:29 world:4125 name:"idle29" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:30 world:4126 name:"idle30" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)pcpu:31 world:4127 name:"idle31" (IS)
2012-10-16T09:07:32.429Z cpu28:4124)@BlueScreen: PCPU 24: no heartbeat (but 2/2 IPIs received).
2012-10-16T09:07:32.429Z cpu28:4124)Code start: 0x418023a00000 VMK uptime: 52:19:16:13.591
2012-10-16T09:07:32.430Z cpu28:4124)Saved backtrace from: pcpu 24 Heartbeat NMI
2012-10-16T09:07:32.430Z cpu28:4124)0x412200607858:[0x418023a9b0ec]Util_Udelay@vmkernel#nover+0x2f stack: 0x412200010000
2012-10-16T09:07:32.431Z cpu28:4124)0x4122006078a8:[0x418024045a56]_be_mpu_post_wrb_ring@<None>#<None>+0xed stack: 0x4122ffffffff
2012-10-16T09:07:32.432Z cpu28:4124)0x412200607908:[0x4180240428e9]be_function_post_mcc_wrb@<None>#<None>+0x128 stack: 0x0
2012-10-16T09:07:32.433Z cpu28:4124)0x412200607998:[0x418024043aaf]be_eq_modify_delay@<None>#<None>+0x156 stack: 0x0
2012-10-16T09:07:32.434Z cpu28:4124)0x412200607ad8:[0x418024039efd]rate_timer_func@<None>#<None>+0x360 stack: 0x0
2012-10-16T09:07:32.434Z cpu28:4124)0x412200607b78:[0x418023a96e12]Timer_BHHandler@vmkernel#nover+0x225 stack: 0xfffc01000000df
2012-10-16T09:07:32.435Z cpu28:4124)0x412200607bb8:[0x418023a1890d]BH_Check@vmkernel#nover+0x80 stack: 0x4122ffffffff
2012-10-16T09:07:32.436Z cpu28:4124)0x412200607bf8:[0x418023a4221d]IDT_HandleInterrupt@vmkernel#nover+0x13c stack: 0x418046000140
2012-10-16T09:07:32.437Z cpu28:4124)0x412200607c18:[0x418023a42a7d]IDT_IntrHandler@vmkernel#nover+0xa4 stack: 0x412200607d28
2012-10-16T09:07:32.438Z cpu28:4124)0x412200607c28:[0x418023af2047]gate_entry@vmkernel#nover+0x46 stack: 0x4018
2012-10-16T09:07:32.438Z cpu28:4124)0x412200607d28:[0x418023d00281]Power_HaltPCPU@vmkernel#nover+0x274 stack: 0x206a8148a83722
2012-10-16T09:07:32.439Z cpu28:4124)0x412200607e58:[0x418023bf05fa]CpuSchedIdleLoopInt@vmkernel#nover+0xb3d stack: 0x412200607e98
2012-10-16T09:07:32.440Z cpu28:4124)0x412200607e68:[0x418023bf75f6]CpuSched_IdleLoop@vmkernel#nover+0x15 stack: 0x28
2012-10-16T09:07:32.441Z cpu28:4124)0x412200607e98:[0x418023a4631e]Init_SlaveIdle@vmkernel#nover+0x13d stack: 0x0
2012-10-16T09:07:32.442Z cpu28:4124)0x412200607fe8:[0x418023d06479]SMPSlaveIdle@vmkernel#nover+0x310 stack: 0x0
2012-10-16T09:07:32.450Z cpu28:4124)base fs=0x0 gs=0x418047000000 Kgs=0x0
2012-10-01T18:17:33.919Z cpu5:4965)ScsiDeviceIO: 3081: Failed write command to write-quiesced partition naa.60050768028104d2200000000000000e:1
2012-10-16T09:07:32.363Z cpu28:4124)Heartbeat: 618: PCPU 24 didn't have a heartbeat for 21 seconds. *may* be locked up
2012-10-16T09:07:27.363Z cpu18:4155)NMI: 1943: NMI IPI received. Was eip(base):ebp:cs [0x4c04da(0x418023a00000):0x412200ec7dd0:0x4010](Src 0x1, CPU18)
2012-10-16T09:07:27.363Z cpu4:4523330)Heartbeat: 618: PCPU 18 didn't have a heartbeat for 8 seconds. *may* be locked up
2012-10-16T09:07:21.363Z cpu13:4134)NMI: 1943: NMI IPI received. Was eip(base):ebp:cs [0x3023e6(0x418023a00000):0x412200987f10:0x4010](Src 0x1, CPU13)
2012-10-16T09:07:21.363Z cpu20:3901569)Heartbeat: 618: PCPU 13 didn't have a heartbeat for 8 seconds. *may* be locked up
2012-10-16T09:07:18.363Z cpu24:4120)NMI: 1943: NMI IPI received. Was eip(base):ebp:cs [0x9b0ec(0x418023a00000):0x412200607858:0x4010](Src 0x1, CPU24)
2012-10-16T09:07:32.453Z cpu28:4124)Backtrace for current CPU #28, worldID=4124, ebp=0x412200707a68
2012-10-16T09:07:32.454Z cpu28:4124)0x412200707a68:[0x418023a6d0c8]Panic_WithBacktrace@vmkernel#nover+0xa3 stack: 0x412200707ad8, 0x9a0
2012-10-16T09:07:32.454Z cpu28:4124)0x412200707ad8:[0x418023cd9bd7]Heartbeat_DetectCPULockups@vmkernel#nover+0x2be stack: 0x0, 0x410005
2012-10-16T09:07:32.455Z cpu28:4124)0x412200707b78:[0x418023a96df7]Timer_BHHandler@vmkernel#nover+0x20a stack: 0xfffc01000000df, 0xdf,
2012-10-16T09:07:32.455Z cpu28:4124)0x412200707bb8:[0x418023a1890d]BH_Check@vmkernel#nover+0x80 stack: 0x4122ffffffff, 0x412200707cc0,
2012-10-16T09:07:32.456Z cpu28:4124)0x412200707bf8:[0x418023a4221d]IDT_HandleInterrupt@vmkernel#nover+0x13c stack: 0x418047000140, 0x0,
2012-10-16T09:07:32.456Z cpu28:4124)0x412200707c18:[0x418023a42a7d]IDT_IntrHandler@vmkernel#nover+0xa4 stack: 0x412200707d28, 0x418023d
2012-10-16T09:07:32.457Z cpu28:4124)0x412200707c28:[0x418023af2047]gate_entry@vmkernel#nover+0x46 stack: 0x4018, 0x4018, 0x0, 0x0, 0x0
2012-10-16T09:07:32.457Z cpu28:4124)0x412200707d28:[0x418023d00281]Power_HaltPCPU@vmkernel#nover+0x274 stack: 0x206a8b6ea74972, 0x206a8
2012-10-16T09:07:32.458Z cpu28:4124)0x412200707e58:[0x418023bf05fa]CpuSchedIdleLoopInt@vmkernel#nover+0xb3d stack: 0x412200707e98, 0x41
2012-10-16T09:07:32.458Z cpu28:4124)0x412200707e68:[0x418023bf75f6]CpuSched_IdleLoop@vmkernel#nover+0x15 stack: 0x2c, 0x1c, 0x0, 0x2c,
2012-10-16T09:07:32.459Z cpu28:4124)0x412200707e98:[0x418023a4631e]Init_SlaveIdle@vmkernel#nover+0x13d stack: 0x0, 0x200000000, 0x0, 0x
2012-10-16T09:07:32.459Z cpu28:4124)0x412200707fe8:[0x418023d06479]SMPSlaveIdle@vmkernel#nover+0x310 stack: 0x0, 0x0, 0x0, 0x0, 0x0
2012-10-16T09:07:32.459Z cpu28:4124)vmkernel 0x0 .data 0x0 .bss 0x0
2012-10-16T09:07:32.459Z cpu28:4124)procfs 0x418023e9b000 .data 0x417fe3efc000 .bss 0x417fe3efc220
2012-10-16T09:07:32.459Z cpu28:4124)vmkplexer 0x418023e9e000 .data 0x417fe3efd040 .bss 0x417fe3efd4e0
2012-10-16T09:07:32.459Z cpu28:4124)vmklinux_9 0x418023ea2000 .data 0x417fe3eff080 .bss 0x417fe3f0d340
2012-10-16T09:07:32.459Z cpu28:4124)vmklinux_9_2_0_0 0x418023f15000 .data 0x417fe3f120c0 .bss 0x417fe3f1c868
2012-10-16T09:07:32.459Z cpu28:4124)tpm_tis 0x418023f16000 .data 0x417fe3f1d0e0 .bss 0x417fe3f1d300
2012-10-16T09:07:32.459Z cpu28:4124)random 0x418023f19000 .data 0x417fe3f1e140 .bss 0x417fe3f1e880
2012-10-16T09:07:32.459Z cpu28:4124)usb 0x418023f1d000 .data 0x417fe3f22160 .bss 0x417fe3f24100
2012-10-16T09:07:32.459Z cpu28:4124)ehci-hcd 0x418023f3a000 .data 0x417fe3f251a0 .bss 0x417fe3f256a0
2012-10-16T09:07:32.459Z cpu28:4124)hid 0x418023f44000 .data 0x417fe3f261c0 .bss 0x417fe3f267c0
2012-10-16T09:07:32.459Z cpu28:4124)dm 0x418023f49000 .data 0x417fe3f27200 .bss 0x417fe3f27200
2012-10-16T09:07:32.459Z cpu28:4124)nmp 0x418023f4b000 .data 0x417fe3f28240 .bss 0x417fe3f2bd20
2012-10-16T09:07:32.459Z cpu28:4124)vmw_satp_local 0x418023f6b000 .data 0x417fe3f2c260 .bss 0x417fe3f2c2b0
2012-10-16T09:07:32.459Z cpu28:4124)vmw_satp_default_aa 0x418023f6d000 .data 0x417fe3f2d270 .bss 0x417fe3f2d270
2012-10-16T09:07:32.459Z cpu28:4124)vmw_psp_lib 0x418023f6e000 .data 0x417fe3f2e280 .bss 0x417fe3f2e610
2012-10-16T09:07:32.459Z cpu28:4124)vmw_psp_fixed 0x418023f70000 .data 0x417fe3f2f290 .bss 0x417fe3f2f290
2012-10-16T09:07:32.459Z cpu28:4124)vmw_psp_rr 0x418023f72000 .data 0x417fe3f302a0 .bss 0x417fe3f30330
2012-10-16T09:07:32.459Z cpu28:4124)vmw_psp_mru 0x418023f75000 .data 0x417fe3f312b0 .bss 0x417fe3f312b0
2012-10-16T09:07:32.459Z cpu28:4124)libata 0x418023f77000 .data 0x417fe3f322c0 .bss 0x417fe3f35ba0
2012-10-16T09:07:32.459Z cpu28:4124)usb-storage 0x418023f96000 .data 0x417fe3f36300 .bss 0x417fe3f3ac40
2012-10-16T09:07:32.459Z cpu28:4124)vfat 0x418023fa2000 .data 0x417fe3f3c340 .bss 0x417fe3f3e3c0
2012-10-16T09:07:32.459Z cpu28:4124)vprobe 0x418023fab000 .data 0x417fe3f3f380 .bss 0x417fe3f4b200
2012-10-16T09:07:32.459Z cpu28:4124)vmci 0x418023fdc000 .data 0x417fe3f793c0 .bss 0x417fe3f7e380
2012-10-16T09:07:32.459Z cpu28:4124)iscsi_trans 0x418023ffc000 .data 0x417fe3f7f400 .bss 0x417fe3f80820
2012-10-16T09:07:32.459Z cpu28:4124)etherswitch 0x418024007000 .data 0x417fe3f81440 .bss 0x417fe3f91360
2012-10-16T09:07:32.459Z cpu28:4124)netsched 0x41802402a000 .data 0x417fe3f92480 .bss 0x417fe3f95400
2012-10-16T09:07:32.459Z cpu28:4124)cnic_register 0x418024030000 .data 0x417fe3f964c0 .bss 0x417fe3f96760
2012-10-16T09:07:32.459Z cpu28:4124)be2net 0x418024032000 .data 0x417fe3f974e0 .bss 0x417fe3f983c0
2012-10-16T09:07:32.459Z cpu28:4124)usbnet 0x418024055000 .data 0x417fe3f9b520 .bss 0x417fe3f9bbe0
2012-10-16T09:07:32.459Z cpu28:4124)cdc_ether 0x41802405a000 .data 0x417fe3f9c540 .bss 0x417fe3f9c8e0
2012-10-16T09:07:32.459Z cpu28:4124)iscsi_linux 0x41802405c000 .data 0x417fe3f9d580 .bss 0x417fe3f9e040
2012-10-16T09:07:32.459Z cpu28:4124)libfc 0x41802405f000 .data 0x417fe3f9e5a0 .bss 0x417fe3f9f560
2012-10-16T09:07:32.459Z cpu28:4124)libfcoe 0x418024079000 .data 0x417fe3fa05e0 .bss 0x417fe3fa0900
2012-10-16T09:07:32.459Z cpu28:4124)mpt2sas 0x41802407f000 .data 0x417fe3fa1600 .bss 0x417fe3fa2760
2012-10-16T09:07:32.459Z cpu28:4124)lpfc820 0x4180240a8000 .data 0x417fe3fa3640 .bss 0x417fe3fb2d80
2012-10-16T09:07:32.459Z cpu28:4124)lvmdriver 0x418024165000 .data 0x417fe3fb3680 .bss 0x417fe3fb6480
2012-10-16T09:07:32.459Z cpu28:4124)deltadisk 0x418024179000 .data 0x417fe3fb86c0 .bss 0x417fe3fbbac0
2012-10-16T09:07:32.459Z cpu28:4124)multiextent 0x418024195000 .data 0x417fe3fbc700 .bss 0x417fe3fbc780
2012-10-16T09:07:32.459Z cpu28:4124)vmw_satp_svc 0x418024197000 .data 0x417fe3fbd710 .bss 0x417fe3fbd718
2012-10-16T09:07:32.459Z cpu28:4124)heartbeat 0x418024199000 .data 0x417fe3fbe740 .bss 0x417fe3fcd400
2012-10-16T09:07:32.459Z cpu28:4124)shaper 0x4180241a9000 .data 0x417fe3fcd780 .bss 0x417fe3fd1400
2012-10-16T09:07:32.460Z cpu28:4124)cdp 0x4180241af000 .data 0x417fe3fd17c0 .bss 0x417fe3fe1280
2012-10-16T09:07:32.460Z cpu28:4124)ipfix 0x4180241c2000 .data 0x417fe3fe1800 .bss 0x417fe3fefd00
2012-10-16T09:07:32.460Z cpu28:4124)fence_overlay 0x4180241d1000 .data 0x417fe3ff0840 .bss 0x417fe3ff0c50
2012-10-16T09:07:32.460Z cpu28:4124)tcpip3 0x4180241da000 .data 0x417fe3ff1880 .bss 0x417fe3ff9fe0
2012-10-16T09:07:32.460Z cpu28:4124)dvsdev 0x418024289000 .data 0x417fe400e8c0 .bss 0x417fe400e900
2012-10-16T09:07:32.460Z cpu28:4124)dvfilter 0x41802428c000 .data 0x417fe400f900 .bss 0x417fe4010800
2012-10-16T09:07:32.460Z cpu28:4124)esxfw 0x41802429e000 .data 0x417fe4011940 .bss 0x417fe4020d00
2012-10-16T09:07:32.460Z cpu28:4124)vmkapei 0x4180242b0000 .data 0x417fe4021980 .bss 0x417fe4021ae0
2012-10-16T09:07:32.460Z cpu28:4124)vmkibft 0x4180242b5000 .data 0x417fe40229a0 .bss 0x417fe4025be0
2012-10-16T09:07:32.460Z cpu28:4124)vmfs3 0x4180242b8000 .data 0x417fe4026a00 .bss 0x417fe4027560
2012-10-16T09:07:32.460Z cpu28:4124)nfsclient 0x418024305000 .data 0x417fe4028a40 .bss 0x417fe402c100
2012-10-16T09:07:32.460Z cpu28:4124)ipmi_msghandler 0x418024321000 .data 0x417fe402ca80 .bss 0x417fe402d260
2012-10-16T09:07:32.460Z cpu28:4124)ipmi_si_drv 0x41802432a000 .data 0x417fe402daa0 .bss 0x417fe402e360
2012-10-16T09:07:32.460Z cpu28:4124)ipmi_devintf 0x418024334000 .data 0x417fe402eae0 .bss 0x417fe402eda0
2012-10-16T09:07:32.460Z cpu28:4124)vmkstatelogger 0x418024337000 .data 0x417fe402fb00 .bss 0x417fe4032fe0
2012-10-16T09:07:32.460Z cpu28:4124)migrate 0x418024354000 .data 0x417fe4033b40 .bss 0x417fe40387e0
2012-10-16T09:07:32.460Z cpu28:4124)cbt 0x41802439e000 .data 0x417fe4039b80 .bss 0x417fe4039c00
2012-10-16T09:07:32.460Z cpu28:4124)svmmirror 0x4180243a0000 .data 0x417fe403abc0 .bss 0x417fe403ac40
2012-10-16T09:07:32.460Z cpu28:4124)hbr_filter 0x4180243a4000 .data 0x417fe403bc00 .bss 0x417fe403bd40
2012-10-16T09:07:32.460Z cpu28:4124)vmw_satp_lsi 0x4180243c1000 .data 0x417fe4040c40 .bss 0x417fe4040dc8
Coredump to disk.
I raised an incident with support, and they have come back to me saying that it's probably down to the be2net driver (Emulex 10GbE NIC).
I am not disagreeing with them, but I find it a little suspicious that a driver could cause six servers to crash at exactly the same moment. Also, the dump log shows PCPU heartbeat errors, so how could the physical CPU affect the NIC?
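For reference, one way to test the support theory against the dump is to work out which module the unnamed <None>#<None> frames belong to. The backtraces only give function names, but the module table at the end of the dump lists each module's load address, so a frame can be attributed to the module with the highest load address at or below it. The Python sketch below does that lookup for the frames on PCPU 24 and PCPU 18. The module addresses and frames are a hand-copied subset of the log above, and the "highest base at or below the address" rule is an assumption that only holds for addresses that really fall inside a loaded module. With these addresses every frame resolves to be2net, which would suggest the heartbeat alerts may be a symptom rather than a CPU fault: PCPU 24 appears to be delaying in Util_Udelay under _be_mpu_post_wrb_ring, and PCPU 18 appears to be spinning in __raw_spin_failed under be_get_stats, so the vmkernel heartbeat check saw those CPUs as locked up.

```python
import bisect
import re

# Load addresses copied from the module table in the dump above.
# Only the modules around the NIC driver are included here.
MODULE_BASES = {
    "netsched": 0x41802402A000,
    "cnic_register": 0x418024030000,
    "be2net": 0x418024032000,
    "usbnet": 0x418024055000,
    "cdc_ether": 0x41802405A000,
}

# Backtrace frames from PCPU 24 and PCPU 18 whose module was printed as <None>#<None>.
FRAMES = [
    "0x4122006078a8:[0x418024045a56]_be_mpu_post_wrb_ring@<None>#<None>+0xed",
    "0x412200607908:[0x4180240428e9]be_function_post_mcc_wrb@<None>#<None>+0x128",
    "0x412200607998:[0x418024043aaf]be_eq_modify_delay@<None>#<None>+0x156",
    "0x412200607ad8:[0x418024039efd]rate_timer_func@<None>#<None>+0x360",
    "0x412200ec7e10:[0x41802403c52d]be_get_stats@<None>#<None>+0x94",
    "0x412200ec7e30:[0x41802403ca88]benet_get_stats@<None>#<None>+0x63",
]

# Captures the instruction address in [...] and the function name before '@'.
FRAME_RE = re.compile(r"\[(0x[0-9a-fA-F]+)\]([A-Za-z0-9_]+)@")


def module_for(addr, bases):
    """Return the module with the highest load address <= addr, or None.

    Assumes addr lies inside some loaded module; an address past the end of
    the last module would be wrongly attributed to that module.
    """
    ordered = sorted(bases.items(), key=lambda kv: kv[1])
    starts = [start for _, start in ordered]
    idx = bisect.bisect_right(starts, addr) - 1
    if idx < 0:
        return None
    return ordered[idx][0]


def resolve(frames, bases):
    """Map each backtrace line to (function name, owning module)."""
    result = []
    for line in frames:
        m = FRAME_RE.search(line)
        if m:
            addr = int(m.group(1), 16)
            result.append((m.group(2), module_for(addr, bases)))
    return result


if __name__ == "__main__":
    for func, module in resolve(FRAMES, MODULE_BASES):
        print(f"{func:28} -> {module}")
```

The same approach works for the full module table if all of its lines are pasted into MODULE_BASES.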
The servers were installed using the IBM version of ESXi 5.0 and were patched to build 768111 at the time (they have since been brought up to 821926). All the other hosts in the same BladeCenter (HS22s) had no issues.
Any thoughts please?
Andy