Hi all,
I posed this question to Luc Dekens (@LucD22) on Twitter earlier, and he suggested posting it here. Good suggestion.
Scenario:
We have a set of ESX hosts running on blade hardware, currently using dual pass-through 10Gb links out to top-of-rack Juniper EX4500 switches. For our next design I'm proposing in-chassis Force10 MXL switches with uplinks to a distributed spine (either link-aggregated 10Gb or 40Gb). There's been some resistance from the network team, who claim we need full speed to all the blades, all the time. I strongly suspect our overall throughput for a given chassis is far less than 320 Gbit, but I'd like to prove that with some solid data.
We have monitoring, but it all uses 5-minute averages (at best), so transient peaks tend to be missed, and I'd like to capture them if possible.
What I'd like to do:
Generate a set of data that I can use to create a stacked line graph (each host OR pNIC would be a line, with the total of all of them at each datapoint too). Inbound and outbound traffic should probably be kept separate.
I'd like the data to be as granular as possible, so capturing the 20-second averages and calculating the maximum of those averages over each period would be ideal for each host. I'm thinking one datapoint every 30 minutes, over a period of X (likely however long the script is allowed to run?), perhaps with a configurable failsafe that stops collection after X.
For the total, however, it should probably sum all hosts at each collection point and then take the maximum of those sums for each datapoint, so that two peaks that don't occur simultaneously aren't added together and skew the data.
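A minimal Python sketch of that aggregation, assuming the 20-second samples have already been exported (for example with PowerCLI's `Get-Stat -Realtime` using the `net.received.average` and `net.transmitted.average` counters, which report KBps) as `(timestamp, host, direction, rate)` tuples whose timestamps line up across hosts. The function names, the tuple layout and the 30-minute window are hypothetical choices for illustration, not an existing tool.

```python
from collections import defaultdict
from datetime import datetime


def window_start(ts: datetime, minutes: int = 30) -> datetime:
    """Floor a timestamp to the start of its window (minutes must divide 60)."""
    return ts.replace(minute=(ts.minute // minutes) * minutes, second=0, microsecond=0)


def aggregate(samples, minutes: int = 30):
    """Return (per_host, totals) for hypothetical (timestamp, host, direction, rate) samples.

    per_host[(window, direction, host)] = max of that host's 20 s averages in the window
    totals[(window, direction)]         = max over the window of the sum across hosts
                                          taken at the SAME collection point
    """
    per_host = defaultdict(float)
    concurrent = defaultdict(float)  # (timestamp, direction) -> sum across hosts at that instant
    for ts, host, direction, rate in samples:
        key = (window_start(ts, minutes), direction, host)
        per_host[key] = max(per_host[key], rate)
        concurrent[(ts, direction)] += rate

    totals = defaultdict(float)
    for (ts, direction), total in concurrent.items():
        key = (window_start(ts, minutes), direction)
        totals[key] = max(totals[key], total)
    return dict(per_host), dict(totals)
```

Summing at each timestamp before taking the maximum is what keeps non-simultaneous peaks from being added together. If the hosts' sample timestamps are not aligned, they would need to be rounded to a common 20-second boundary first (an assumption this sketch does not handle).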
It may be worth splitting the output into a new CSV file every day (or some other short period), so that a long-running collection doesn't end up in one huge file.
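One way to do the daily split is sketched below in Python, writing one "wide" row per datapoint (a column per host plus a Total column), which is the shape most spreadsheet tools expect for a stacked line graph. The file-naming scheme, the `append_row` helper and its parameters are hypothetical; the host list is assumed to stay fixed for the day, since the header is only written when a file is first created.

```python
import csv
import os
from datetime import date, datetime


def csv_path(directory: str, day: date, direction: str) -> str:
    """Hypothetical naming scheme: one file per day and direction."""
    return os.path.join(directory, f"netstats-{day:%Y-%m-%d}-{direction}.csv")


def append_row(directory, window: datetime, direction, host_values, total, hosts):
    """Append one datapoint; a new file (with a header row) starts each new day."""
    path = csv_path(directory, window.date(), direction)
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["Timestamp", *hosts, "Total"])
        # Hosts with no samples in this window are written as 0.0.
        writer.writerow([window.isoformat(), *(host_values.get(h, 0.0) for h in hosts), total])
    return path
```

Note that the Total column is the maximum of the concurrent sums, so it can be lower than the sum of the host columns in the same row. That is intentional: the stacked area shows each host's own peak, while the Total line shows the real combined peak.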
Other ways of doing this:
Crank up the stats levels to get the "maximum" values in vCenter at the larger intervals. Ideally not: it puts a reasonable amount of strain on the vCenter DB, and it would produce skewed data when combining Max values.
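The skew from combining Max values can be stated as an inequality. Writing $x_h(t)$ for host $h$'s 20-second average at collection point $t$ within a window $W$ (notation introduced here for illustration), every $x_h(t) \le \max_{t' \in W} x_h(t')$, so summing over hosts and then taking the maximum over $t$ gives:

```latex
\max_{t \in W} \sum_{h} x_h(t) \;\le\; \sum_{h} \max_{t \in W} x_h(t)
```

The left side is the real combined peak; the right side is what adding per-host maxima reports. They are equal only when every host peaks at the same collection point, so summing vCenter's per-host maximum values can only overstate the chassis-wide peak.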
Thoughts?
cnidus