Server, storage and IO metrics that matter need context
There is an old saying that the best I/O (Input/Output) is the one that you do not have to do.
In the meantime, let’s get a side of some context with them IOPS from vendors, marketers and their pundits who are tossing them around for server, storage and IO metrics that matter.
Expanding the conversation, the need for more context
The good news is that people are beginning to discuss storage beyond space capacity and cost per GByte, TByte or PByte for DRAM or NAND flash Solid State Devices (SSD) and Hard Disk Drives (HDD), along with Hybrid HDD (HHDD) and Solid State Hybrid Drive (SSHD) based solutions. This applies to traditional enterprise or SMB IT data centers with physical, virtual or cloud based infrastructures.
This is good because it expands the conversation beyond just cost for space capacity into other aspects including performance (IOPS, latency, bandwidth) for various workload scenarios along with availability, energy effectiveness and management.
Adding a side of context
The catch is that IOPS, while part of the equation, are just one aspect of performance and by themselves, without context, may have little meaning or even be misleading in some situations.
Granted, a million IOPS can be entertaining, fun to talk about or simply make good press copy. IOPS vary in size depending on the type of work being done, not to mention reads or writes, random or sequential, all of which have a bearing on data throughput or bandwidth (Mbytes per second) along with response time.
However, are those million IOP's applicable to your environment or needs?
Likewise, what do those million or more IOPS represent about type of work being done? For example, are they small 64 byte or large 64 Kbyte sized, random or sequential, cached reads or lazy writes (deferred or buffered) on a SSD or HDD?
How about the response time or latency for achieving them IOPS?
In other words, what is the context of those metrics and why do they matter?
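To put a rough number on the size question, here is a minimal sketch that treats bandwidth as approximately IOPS multiplied by IO size. The function name and the use of decimal megabytes (1 MB = 1,000,000 bytes) are assumptions for illustration, and the million IOPS figure is the hypothetical headline number from above, not a measurement.

```python
def bandwidth_mb_per_sec(iops: float, io_size_bytes: int) -> float:
    """Approximate bandwidth in MB/s (decimal, 1 MB = 1,000,000 bytes)."""
    return iops * io_size_bytes / 1_000_000


# Same headline IOPS number, two very different IO sizes
small = bandwidth_mb_per_sec(1_000_000, 64)         # 64 byte IOs
large = bandwidth_mb_per_sec(1_000_000, 64 * 1024)  # 64 Kbyte IOs

print(f"1M IOPS at 64 bytes:  {small:,.0f} MB/s")
print(f"1M IOPS at 64 Kbytes: {large:,.0f} MB/s")
```

The same "million IOPS" works out to very different amounts of data moved, which is why the IO size needs to travel with the number.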
Metrics that matter give context, for example IO sizes closer to what your real needs are, reads and writes, mixed workloads, random or sequential, sustained or bursty; in other words, real world reflective.
As with any benchmark, take them with a grain (or more) of salt; the key is to use them as an indicator, then align to your needs. The tool or technology should work for you, not the other way around.
Here are some examples of context that can be added to help make IOP's and other metrics matter:
- What is the IOP size, are they 512 byte (or smaller) vs. 4K bytes (or larger)?
- Are they reads, writes, random, sequential or mixed and what percentage?
- How was the storage configured including RAID, replication, erasure or dispersal codes?
- Then there is the latency or response time and IO queue depths for the given number of IOPS.
- Let us not forget if the storage systems (and servers) were busy with other work or not.
- If there is a cost per IOP, is that list price or discount (hint, if discount start negotiations from there)
- What was the number of threads or workers, along with how many servers?
- What tool was used, its configuration, as well as raw or cooked (aka file system) IO?
- Was the IOP's number with one worker or multiple workers on a single or multiple servers?
- Did the IOP's number come from a single storage system or total of multiple systems?
- Fast storage needs fast servers and networks, what was their configuration?
- Was the performance a short burst, or long sustained period?
- What was the size of the test data used; did it all fit into cache?
- Were short stroking for IOPS or long stroking for bandwidth techniques used?
- Were data footprint reduction (DFR) techniques (thin provisioning, compression or dedupe) used?
- Was write data committed synchronously to storage, or deferred (aka lazy writes)?
The above are just a sampling and not all may be relevant to your particular needs, however they help to put IOP's into more context. Another consideration around IOPS is the configuration of the environment: do they come from an actual running application using some measurement tool, or were they generated by a workload tool such as IOmeter, IOrate or VDbench, among others?
Sure, there are more contexts and information that would be interesting as well, however learning to walk before running will help prevent falling down.
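One way to keep some of the checklist above attached to a number is to record it alongside the IOPS figure. The sketch below is hypothetical: the class name, field names and sample values are made up for illustration and are not from any vendor or tool. It also uses Little's Law (average IOs in flight = IOPS x average latency in seconds) as a simple sanity check between the quoted IOPS, latency and queue depth, assuming a steady-state average.

```python
from dataclasses import dataclass


@dataclass
class IopsClaim:
    """Hypothetical record of an IOPS figure plus some of its context."""
    iops: float
    io_size_bytes: int
    read_pct: float        # 0-100, remainder is writes
    random_pct: float      # 0-100, remainder is sequential
    avg_latency_ms: float
    workers: int
    servers: int
    sustained: bool        # False means a short burst
    fits_in_cache: bool    # did the test data all fit into cache?

    def outstanding_ios(self) -> float:
        """Average IOs in flight per Little's Law: IOPS x latency (seconds)."""
        return self.iops * self.avg_latency_ms / 1000.0


# Illustrative values only, not a measurement
claim = IopsClaim(
    iops=100_000, io_size_bytes=4096, read_pct=70, random_pct=100,
    avg_latency_ms=0.5, workers=8, servers=1,
    sustained=True, fits_in_cache=False,
)
print(f"Implied average IOs in flight: {claim.outstanding_ios():.1f}")
```

If the implied number of IOs in flight is far from the queue depth and worker count that were reported, that is a good prompt for one of those tough yet fair questions.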
Does size or age of vendors make a difference when it comes to context?
Some vendors are doing a good job of going for out of this world record-setting marketing hero numbers.
Meanwhile other vendors are doing a good job of adding context to their IOP or response time or bandwidth among other metrics that matter. There is a mix of startup and established that give context with their IOP's or other metrics, likewise size or age does not seem to matter for those who lack context.
Some vendors may not offer metrics or information publicly, so fine, go under NDA to learn more and see if the results are applicable to your environments.
Likewise, if they do not want to provide the context, then ask some tough yet fair questions to decide if their solution is applicable for your needs.
Putting this all into context
What this means is let us start providing and asking for metrics that matter such as IOP's with context.
If you have a great IOP metric and you want it to matter, then include some context such as what size (e.g. 4K, 8K, 16K, 32K, etc.), percentage of reads vs. writes, latency or response time, random or sequential.
IMHO the most interesting or applicable metrics that matter are those relevant to your environment and application. For example if your main application that needs SSD does about 75% reads (random) and 25% writes (sequential) with an average size of 32K, while fun to hear about, how relevant is a million 64 byte read IOPS? Likewise when looking at IOPS, pay attention to the latency, particularly if SSD or performance is your main concern.
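As a back-of-the-envelope sketch of that example, the snippet below converts a million 64 byte IOPS into the number of 32K IOs that would move the same amount of data, split 75% reads and 25% writes. The variable names are made up for illustration, and the comparison only equates bytes moved per second; it says nothing about whether a given device could actually sustain that mix or at what latency.

```python
VENDOR_IOPS = 1_000_000
VENDOR_IO_SIZE = 64          # bytes
APP_IO_SIZE = 32 * 1024      # bytes, the 32K average from the example
APP_READ_FRACTION = 0.75     # 75% reads, 25% writes

# Bytes per second implied by the headline number
bytes_per_sec = VENDOR_IOPS * VENDOR_IO_SIZE

# How many 32K IOs per second move the same amount of data
equivalent_app_iops = bytes_per_sec / APP_IO_SIZE
equivalent_reads = equivalent_app_iops * APP_READ_FRACTION
equivalent_writes = equivalent_app_iops - equivalent_reads

print(f"Equivalent 32K IOPS: {equivalent_app_iops:,.1f}")
print(f"  reads:  {equivalent_reads:,.1f}")
print(f"  writes: {equivalent_writes:,.1f}")
```

In terms of data moved, the million 64 byte IOPS headline is in the same ballpark as roughly two thousand 32K IOs per second, which is a very different conversation.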
Get in the habit of asking or telling vendors or their surrogates to provide some context with them metrics if you want them to matter.
So how about some context around them IOP's (or latency and bandwidth or availability for that matter)?
Ok, nuff said (for now).
Cheers gs