These messages are logged in vmkernel.log when WD SATA hard disks are attached to the mainboard's SATA controller:
2013-06-30T06:15:01.293Z cpu2:2572)<3>ata5: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xa frozen
2013-06-30T06:15:01.293Z cpu2:2572)<3>ata5: irq_stat 0x00400040, connection status changed
2013-06-30T06:15:01.293Z cpu2:2572)<3>ata5: SError: { PHYRdyChg 10B8B DevExch }
2013-06-30T06:15:01.294Z cpu2:2572)<6>ata5: hard resetting link
2013-06-30T06:15:06.251Z cpu2:2572)<6>ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
2013-06-30T06:15:06.319Z cpu3:2572)<6>ata5.00: configured for UDMA/100
2013-06-30T06:15:06.319Z cpu3:2572)<6>ata5: EH complete
These messages repeat several times per hour.
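To put a number on "several times per hour", a small script along these lines could count the link resets per port and hour. This is only a sketch: the function name `count_resets` is my own, and the regular expression assumes the log lines look exactly like the ones quoted above.

```python
import re
from collections import Counter

# Matches lines such as:
# 2013-06-30T06:15:01.294Z cpu2:2572)<6>ata5: hard resetting link
# Assumption: timestamp first, then the port name right before the message.
RESET_RE = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}\.\d+Z .*?"
    r"(?P<port>ata\d+): hard resetting link"
)

def count_resets(lines):
    """Count 'hard resetting link' events per (port, hour) pair."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.match(line)
        if match:
            counts[(match.group("port"), match.group("hour"))] += 1
    return counts

# Two of the lines from the excerpt above; only the second is a reset.
sample = [
    "2013-06-30T06:15:01.293Z cpu2:2572)<3>ata5: SError: { PHYRdyChg 10B8B DevExch }",
    "2013-06-30T06:15:01.294Z cpu2:2572)<6>ata5: hard resetting link",
]
for (port, hour), n in sorted(count_resets(sample).items()):
    print(port, hour, n)
```

For a real log, the lines of vmkernel.log would be passed in instead of `sample`, for example by iterating over the opened file.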
As a consequence:
- sometimes the hard disk in question gets "grayed out" in the vSphere Client
- copy processes such as migrations with Veeam fail
- sometimes the whole ESXi host becomes unavailable
- sometimes the HD recovers and is available again for some time
- in practice, the HD is unusable
I have been googling a lot about this and found that it probably comes from the libata driver, but I also found hints that the problem may be triggered by the HD itself.
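To see what the driver is actually reporting, the SErr value from the log can be split into its individual flags. The following is a minimal sketch: the bit positions and short names are written down from my memory of the SATA SError register layout and the names libata prints, so they should be checked against the specification. `decode_serr` is just a helper name I made up.

```python
# SError bit position -> short name as libata prints it.
# Assumption: listed from memory of the SATA SError register layout.
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}

def decode_serr(value):
    """Return the names of all known bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]

# SErr value from the "exception" line in the log excerpt
print(decode_serr(0x4090000))
```

For 0x4090000 this yields PHYRdyChg, 10B8B and DevExch, the same three flags that the SError line in the log shows. As far as I understand them, all three describe events on the link itself (PHY ready state changed, a 10b-to-8b decode error, device presence changed) rather than failed read or write commands.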
BTW, the mainboard is a GA-H55M-USB3; the SATA controller is listed as "IBEX Peak 6-port SATA-AHCI-controller".
I have tested with several WD SATA HDs (2x 1 TB, 1x 1.5 TB). They are all capable of SATA 3 Gb/s.
Changing cables and/or SATA ports did not change the behaviour, and neither did jumpering the HD to 1.5 Gb/s.
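Regarding the link speed, the "SStatus 113 SControl 310" part of the "SATA link up" line can be taken apart as well. This sketch assumes that libata prints both registers in hexadecimal and that each consists of three 4-bit fields (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11), which is how I read the SATA register description. The names `split_scr` and `SPD_NAMES` are my own.

```python
# Meaning of the SPD field; in SStatus it is the negotiated speed,
# in SControl it is the highest speed the host allows.
# Assumption: values taken from my reading of the SATA register description.
SPD_NAMES = {
    0: "none / no limit",
    1: "Gen1 (1.5 Gbps)",
    2: "Gen2 (3.0 Gbps)",
    3: "Gen3 (6.0 Gbps)",
}

def split_scr(value):
    """Split an SStatus or SControl value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,         # bits 0-3: device detection
        "spd": (value >> 4) & 0xF,  # bits 4-7: speed
        "ipm": (value >> 8) & 0xF,  # bits 8-11: interface power management
    }

# Values from the "SATA link up" line, interpreted as hexadecimal
sstatus = split_scr(0x113)
scontrol = split_scr(0x310)
print("negotiated:", SPD_NAMES[sstatus["spd"]])
print("host limit:", SPD_NAMES[scontrol["spd"]])
```

If this reading is right, the SPD field is 1 in both registers, meaning the link came up at Gen1 and the host side was also limiting the port to Gen1 at that point.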
Other HDs are attached to SATA on the same ESXi host:
- 1 Intel SSD on vmhba0
- 1 300 GB Samsung SATA (older model)
- 1 500 GB Samsung SATA (2008)
None of them has any problems.
The brand-new WD HDs were intended to replace the older HDs.
Yesterday I simply replaced the WD 1 TB with another Samsung 500 GB, and the messages did not show up again.
Can anybody help me?
I cannot imagine that this is a general problem with newer WD HDs, but it does sound like a problem with certain HDs.
Otherwise: which brand/model is known to work well?