There have been many articles exploring the performance aspects of file systems, storage systems, and storage devices. Coupled with Throughput (Bytes per second), IOPS (Input/Output Operations per Second) is one of the two measures of performance that are typically examined when discussing storage media. Vendors will publish performance results with data such as “Peak Sequential Throughput is X MB/s” or “Peak IOPS is X” indicating the performance of the storage device. But what does an IOPS really mean and how is it defined?
Typically an IOP is an IO operation where data is sent from an application to the storage device. An IOPS is the measure of how many of these you can perform per second. But notice that the word “typically” is used in this explanation. That means there is no hard and fast definition of an IOPS that is standard for everyone. Consequently, as you can imagine, it’s possible to “game” the results and publish whatever results you like (see a related article, Lies, Damn Lies and File System Benchmarks). That is the sad part of IOPS and Bandwidth – the results can be manipulated to be almost whatever the tester wants.
However, IOPS is a very important performance measure for applications because, believe it or not, many applications perform IO using very small transfer sizes (for example, see this article). How quickly or efficiently a storage system can perform IO operations can drive the overall performance of the application. Moreover, today’s systems have lots of cores and run several applications at one time, further pushing the storage performance requirements. Therefore, knowing the IOPS measure of your storage devices is important, but you need to be critical of the numbers that are published.
There are several tools that are commonly used for measuring IOPS on systems. The first one is called Iometer, which you commonly see used on Windows systems. The second most common tool is IOzone, which has been used in the articles published on Linux Magazine because it is open-source, easy to build on almost any system, has a great many tests and options, and is widely used for storage testing. It is fairly evident at this point that having two tools could lead to some differences in IOPS measurements. Ideally there should be a precise definition of an IOPS with an accepted way to measure it. Then the various tools for examining IOPS would have to prove that they satisfy the definition (“certified” is another way of saying this). But just picking the software tool is perhaps the easiest part of measuring IOPS.
One commonly overlooked aspect of measuring IOPS is the size of the I/O operation (sometimes called the “payload size” using the terminology of the networking world). Does the I/O operation involve just a single byte? Or does it involve 1 MB? Just stating that a device can achieve 1,000 IOPS really tells you nothing. Is that 1,000 1-byte operations per second or 1,000 1MB operations per second?
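To see why the payload size matters, it helps to convert an IOPS figure into the throughput it implies. The short Python sketch below does that arithmetic. The function name and the choice of binary units (1 MB = 1024 * 1024 bytes) are my own assumptions for illustration, not part of any benchmark tool.

```python
def implied_throughput_mb_s(iops, payload_bytes):
    """Throughput in MB/s (1 MB = 1024 * 1024 bytes) implied by an IOPS figure."""
    return iops * payload_bytes / (1024 * 1024)


# The same "1,000 IOPS" claim at three very different payload sizes.
for label, size in [("1 byte", 1), ("4 KB", 4 * 1024), ("1 MB", 1024 * 1024)]:
    mb_s = implied_throughput_mb_s(1000, size)
    print(f"1,000 IOPS at {label}: {mb_s:.3f} MB/s")
```

The same 1,000 IOPS works out to roughly 0.001 MB/s with 1-byte operations, about 3.9 MB/s with 4KB operations, and 1,000 MB/s with 1MB operations, which is why the number means little without the payload size.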
The most common IO operation size for Linux is 4KB (or just 4K). It corresponds to the page size on almost all Linux systems, so it usually produces the best IOPS (but not always). Personally, I want to see IOPS measures for a range of IO operation sizes. I like to see 1KB (in case there is some exceptional performance at really small payload sizes), 4KB, 32KB, 64KB, maybe 128KB or 256KB, and 1MB. The reason I like to see a range of payload sizes is that it tells me how quickly the performance drops with payload size, which I can then compare to the typical payload size of my application(s) (actually the “spectrum” of payload sizes). But if push comes to shove, I want to at least see the 4KB payload size, and most importantly I want the publisher to tell me the payload size they used.
A second commonly overlooked aspect of measuring IOPS is whether the IO operation is a read or write or possibly a mix of them (you knew it wasn’t going to be good when I started numbering discussion points). Hard drives, which have spinning media, usually don’t show much difference in how fast they can execute read and write operations. However, SSDs are a bit different and have asymmetric performance. Consequently, you need to define how the IO operations were performed. For example, it could be stated, “This hardware is capable of Y 4K Write IOPS” where Y is the number, which means that the test was just write operations. If you compare some recent results for the two SSDs that were tested (see this article) you can see that SSDs can have very different Read IOPS and Write IOPS performance – sometimes even an order of magnitude different.
Many vendors choose to publish either Read IOPS or Write IOPS but rarely both. Other vendors like to publish IOPS for a mixed operation environment stating that the test was 75% Read and 25% Write. While they should be applauded for stating the mix of IO operations, they should also publish their Read IOPS performance (all read IO operations), and their Write IOPS performance (all write IO operations) so that the IOPS performance can be bounded. At this point in the article, vendors should be publishing the IOPS performance measures something like the following:
- 4K Read IOPS =
- 4K Write IOPS =
- (optional) 4K (X% Read/Y% Write) IOPS =
Note that the third bullet is optional and the ratio of read to write operations is totally up to the vendor.
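As a rough illustration of why the pure read and pure write numbers bound the mixed result, here is a small Python sketch. It assumes a deliberately simple model that is mine, not a vendor's or a benchmark tool's: operations are serviced one at a time, and each read or write takes the reciprocal of its pure IOPS figure. Real devices overlap and reorder operations, so treat the result as a back-of-the-envelope estimate only. The device figures are hypothetical.

```python
def mixed_iops_estimate(read_iops, write_iops, read_fraction):
    """Estimate mixed-workload IOPS from pure read and pure write IOPS.

    Assumes (simplistically) that operations are serviced one at a time and
    that each read takes 1/read_iops seconds and each write 1/write_iops.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    avg_time_per_op = (read_fraction / read_iops
                       + (1.0 - read_fraction) / write_iops)
    return 1.0 / avg_time_per_op


# Hypothetical device figures, for illustration only.
read_iops, write_iops = 40_000, 10_000
mixed = mixed_iops_estimate(read_iops, write_iops, read_fraction=0.75)
print(f"Estimated 75% Read/25% Write IOPS: {mixed:.0f}")
```

Under this model the mixed figure always lands between the Write IOPS and Read IOPS values, which is exactly why having both of them published is useful.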
A third commonly overlooked aspect of measuring IOPS is whether the IO operations are sequential or random. With sequential IOPS, the IO operations happen sequentially on the storage media. For example block 233 is used for the first IO operation, followed by block 234, followed by block 235, etc. With random IOPS the first IO operation is on block 233 and the second is on block 568192 or something like that. With the right options on the test system, such as a large queue depth, the IO operations can be optimized to improve performance. Plus the storage device itself may do some optimization. With true random IOPS there is much less chance that the server or storage device can optimize the access very much.
Most vendors report the sequential IOPS since typically it has a much larger value than random IOPS. However, in my opinion, random IOPS is much more meaningful, particularly in the case of a server. With a server you may have several applications running at once, accessing different files and different parts of the disk so that to the storage device, the access looks random.
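The difference between the two access patterns is easy to see if you write out which blocks a test would touch. The Python sketch below generates the block numbers and byte offsets for a sequential run and a random run. The function names, the 4KB block size, and the device size used are assumptions for illustration. It does not perform any IO itself.

```python
import random

BLOCK_SIZE = 4 * 1024  # 4KB payload, assumed for this illustration


def sequential_blocks(start_block, count):
    """Block numbers for a sequential run: start, start + 1, start + 2, ..."""
    return [start_block + i for i in range(count)]


def random_blocks(total_blocks, count, seed=None):
    """Block numbers picked uniformly at random from the whole device."""
    rng = random.Random(seed)
    return [rng.randrange(total_blocks) for _ in range(count)]


def byte_offsets(blocks, block_size=BLOCK_SIZE):
    """Convert block numbers into byte offsets on the device."""
    return [block * block_size for block in blocks]


seq = sequential_blocks(233, 3)
print(seq)                # [233, 234, 235]
print(byte_offsets(seq))  # [954368, 958464, 962560]

# A hypothetical device with one million 4KB blocks.
print(random_blocks(1_000_000, 3, seed=42))
```

In the sequential case the next offset is always exactly one block further along, so the OS and the device can anticipate it. In the random case there is no such pattern to exploit.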
So, at this point in the discussion, the IOPS performance should be listed something like the following:
- 4K Random Read IOPS =
- 4K Random Write IOPS =
- 4K Sequential Read IOPS =
- 4K Sequential Write IOPS =
- (optional) 4K Random (X% Read/Y% Write) IOPS =
The IOPS can be either random or sequential (I like to see both), but at the very least they should publish if the IOPS are sequential or random.
A fourth commonly overlooked aspect of measuring IOPS is the queue depth. With Windows storage benchmarks, you see the queue depth adjusted quite a bit in the results. Linux does a pretty good job setting good queue depths so there is much less need to change the defaults. However, the queue depths can be adjusted which can possibly change the performance. Changing the queue depth on Linux is fairly easy.
The Linux IO Scheduler can sort the incoming IO requests into something called the request-queue, where they are optimized for the best possible device access, which usually means sequential access. The size of this queue is controllable. For example, you can look at the queue depth for the “sda” disk in a system and change it as shown below:
# cat /sys/block/sda/queue/nr_requests
# echo 100000 > /sys/block/sda/queue/nr_requests
Configuring the queue depth can only be done by root.
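If you want to record the queue depth alongside your benchmark results, a few lines of Python can read the same sysfs file. This is only a sketch: the function name and the `sysfs_root` parameter are mine (the parameter exists so the function can be pointed at a test directory), and it only reads the value, which does not require root.

```python
from pathlib import Path


def read_nr_requests(device="sda", sysfs_root="/sys/block"):
    """Return the request-queue size for a block device.

    Returns None if the file is missing or unreadable (for example, on a
    system that has no such device or no sysfs).
    """
    path = Path(sysfs_root) / device / "queue" / "nr_requests"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("nr_requests for sda:", read_nr_requests("sda"))
```

Logging this value with every run makes it much easier to compare results later.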
At this point the IOPS performance should be published something like the following:
- 4K Random Read IOPS = X (queue depth = Z)
- 4K Random Write IOPS = Y (queue depth = Z)
- 4K Sequential Read IOPS = X (queue depth = Z)
- 4K Sequential Write IOPS = Y (queue depth = Z)
- (optional) 4K Random (X% Read/Y% Write) IOPS = W (queue depth = Z)
Alternatively, the queue depth can be stated once if it applies to all of the tests.
In the Linux world, not too many “typical” benchmarks try different queue depths since the queue depth is typically 128 already, which provides good performance. Depending upon the workload or the benchmark, you can adjust the queue depth to produce better performance. Just be warned that if you tune the queue depth for some benchmark, real application performance could suffer.
Notice that it is starting to take a fair amount of work to list the IOPS performance. There are at least four IOPS numbers that need to be reported for a specified queue depth. However, I personally would like to see the IOPS performance for several payload sizes and several queue depths. The number of tests that need to be run grows quite rapidly. To take the side of the vendors, producing this amount of benchmarking data takes time, effort, and money. It may not be worthwhile for them to perform all of this work if the great masses neither understand nor appreciate the data. On the other hand, taking the side of the user, this type of data is very useful and important since it can help set expectations when we buy a new storage device. And remember, the customer is always right so we need to continue to ask the vendors for this type of data.
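To get a feel for how quickly the test count grows, the Python sketch below enumerates one possible test matrix. The payload sizes are the ones I listed earlier and the four access patterns come from the list above. The particular set of queue depths is a hypothetical choice for illustration.

```python
from itertools import product

# Payload sizes from earlier in the article, in KB (1024 KB = 1MB).
payload_sizes_kb = [1, 4, 32, 64, 128, 256, 1024]

# Hypothetical set of queue depths to try; 128 is the usual Linux default.
queue_depths = [1, 32, 128]

# The four basic access patterns.
patterns = ["random read", "random write",
            "sequential read", "sequential write"]

test_matrix = list(product(payload_sizes_kb, queue_depths, patterns))

print(f"{len(test_matrix)} test runs needed")  # 7 * 3 * 4 = 84
for size_kb, depth, pattern in test_matrix[:3]:
    print(f"{size_kb}KB {pattern} IOPS (queue depth = {depth})")
```

Even this modest matrix needs 84 runs, before adding any mixed read/write tests or repeating runs to check consistency.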
There are several other “tricks” you can do to improve performance including more OS tuning, turning off all cron jobs during testing, locking processes to specific cores using numactl, and so on. Covering all of them is beyond this article but you can assume that most vendors like to tune their systems to improve performance (ah – the wonders of benchmarks). One way to improve this situation is to report all details of the test environment (I try to do this) so that one could investigate which options might have been changed. However, for rotating media (hard drives), one can estimate the IOPS performance of single devices (i.e. individual drives).
Estimating IOPS for Rotating Media
For pretty much all rotational storage devices, the dominant factors in determining IOPS performance are seek time, access latency, and rotational speed (but typically we think of rotational speed as affecting seek time and latency). Basically the dominant factors affect the time to access a particular block of data on the storage media and report it back. For rotating media the latency is basically the same for read or write operations, making our life a bit easier.
The seek time is usually reported by disk vendors and is the time it takes for the drive head to move into position to read (or write) the correct track. The latency refers to the amount of time it takes for the specific spot on the drive to be in place underneath a drive head. The sum of these two times is the basic amount of time to read (or write) a specific spot on the drive. Since we’re focusing on rotating media, these times are mechanical so we can safely assume they are much larger than the amount of time to actually read the data or get it back to the drive controller and the OS (remember we’re talking IOPS so the amount of data is usually very small).
To estimate the IOPS performance of a hard drive, we simply use the sum of these two average times to compute the number of IO operations we can do per second.
Estimated IOPS = 1 / (average latency + average seek time)
Both values must be in the same units, and they need to be converted to seconds for the result to come out in operations per second. For example, if a disk has an average latency of 3 ms and an average seek time of 4.45 ms, then the estimated IOPS performance is,
Estimated IOPS = 1 / (average latency + average seek time)
Estimated IOPS = 1 / (3 ms + 4.45 ms)
Estimated IOPS = 1 / (0.00745 s)
Estimated IOPS ≈ 134
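The same calculation can be sketched in a few lines of Python. The function names are my own. The helper that derives average rotational latency from the spindle speed uses the common approximation of half a revolution, which is an assumption I am adding here rather than something a drive datasheet guarantees.

```python
def average_rotational_latency_ms(rpm):
    """Approximate average rotational latency: the time for half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def estimate_iops(avg_latency_ms, avg_seek_ms):
    """Estimated IOPS = 1 / (average latency + average seek time).

    Both inputs are in milliseconds and are converted to seconds here.
    """
    total_seconds = (avg_latency_ms + avg_seek_ms) / 1000.0
    return 1.0 / total_seconds


# The worked example from the text: 3 ms latency, 4.45 ms seek.
print(round(estimate_iops(3.0, 4.45)))  # 134

# A 10,000 rpm drive has about 3 ms of average rotational latency.
print(average_rotational_latency_ms(10_000))  # 3.0
```

Plugging in the seek time and spindle speed from a drive's datasheet gives a quick sanity check against any published IOPS figure for a single hard drive.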
This handy-dandy formula works for single rotating-media drives (SSD IOPS performance is more difficult to estimate, and the estimates are not as accurate). Estimating performance for storage arrays that have RAID controllers and several drives is much more difficult. However, there are some articles floating around the web that attempt to estimate the performance.
IOPS is one of the important measures of performance of storage devices. Personally I think it is the first performance measure one should examine since IOPS are important to the overall performance of a system. However, there is no standard definition of an IOPS so just like most benchmarks, it is almost impossible to compare values from one storage device to another or one vendor to another.
In this article I tried to explain a bit about IOPS and how they can be influenced by various factors. Hopefully this helps you realize that published IOPS benchmarks may have been “gamed” by vendors and that you should ask for more details on how the values were found. Even better, you can run the benchmarks yourself or ask those who post benchmarks how they tested for IOPS performance.