Looking for some advice on something that has been driving me nuts for a few months.
I'm running ESXi 5.1 on several Dell PowerEdge R805s. I'm using two Netgear ReadyNAS 3200s for storage; they serve NFS over gigabit Ethernet.
This setup has been working great, and generally speaking I have no problems. However, the logs on every ESXi host show that the storage occasionally enters the All Paths Down (APD) state and then exits it. The APD state almost always lasts exactly 7 seconds. The timing does not seem to coincide with anything else I can identify; it can happen at any time of day. Sometimes it happens several times in one day, and other times a week passes between errors. Sometimes the error appears on ALL the ESXi hosts for the same 7 seconds; other times it appears on only one host.
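One way to check the "all hosts vs. one host" pattern more systematically is to pair each APD entry with its exit and cluster the windows that overlap in time. The sketch below is a hypothetical helper, not part of ESXi or any VMware tool. It assumes you have already pulled (host, datastore, timestamp, entered/exited) records out of each host's logs yourself; the exact log wording varies by version, so parsing is left out. The names `pair_events`, `group_overlapping`, and `ApdWindow` are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ApdWindow:
    host: str
    datastore: str
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def pair_events(events):
    """Pair 'enter'/'exit' records into APD windows.

    events: iterable of (host, datastore, timestamp, kind), kind is 'enter' or 'exit'.
    An exit with no preceding enter for the same (host, datastore) is ignored.
    """
    open_windows = {}
    windows = []
    for host, datastore, ts, kind in sorted(events, key=lambda e: e[2]):
        key = (host, datastore)
        if kind == "enter":
            open_windows[key] = ts
        elif kind == "exit" and key in open_windows:
            windows.append(ApdWindow(host, datastore, open_windows.pop(key), ts))
    return windows


def group_overlapping(windows):
    """Cluster windows whose time ranges overlap, so simultaneous APDs land together."""
    groups = []
    for w in sorted(windows, key=lambda w: w.start):
        if groups and w.start <= max(x.end for x in groups[-1]):
            groups[-1].append(w)
        else:
            groups.append([w])
    return groups
```

Groups that span several hosts would point toward something shared (switch, NAS), while single-host groups would point toward that host's own NIC or path; the script only tabulates the pattern and does not diagnose the cause.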
Error example:
For the most part it's not an issue, and performance and operation do not seem to be affected. HOWEVER, every 3-4 months or so the ReadyNAS units become completely unresponsive, and I have to power them off and back on manually to restore access from the ESXi hosts. I'm wondering whether this is related to the sporadic APD errors.
Things I've tried:
- Contacted ReadyNAS support for troubleshooting (spent a LOT of time with them; they really have no idea what is causing this)
- Replaced the gigabit switch
- Upgraded the ReadyNAS firmware
- Tried both teamed and active/backup configurations on the NAS NICs
Any ideas?