The numbers

OK, let me show you the numbers! Look carefully at this graph, because it is the only one you will see in this article:

We can pick out some interesting points:

  • single-thread scenarios show basically no difference: this was expected, as "seeker" doesn't use asynchronous I/O and, with a single thread, only one request is waiting for the disk at a time. In other words, we have nothing to reorder;
  • with 32 threads and both NCQ and OS queues OFF, we see no improvement over the single-thread scenario;
  • with 32 threads and NCQ OFF but OS queue ON, the deadline and anticipatory schedulers show about a 50% improvement in throughput. CFQ is a little slower, while noop (a "dumb" scheduler, being a simple FIFO queue) shows, as expected, no improvement at all;
  • with 32 threads and NCQ ON but OS queue OFF, we see comparable results. Note the good showing of noop (higher than CFQ), which proves that hardware queues are very efficient;
  • with 32 threads and both NCQ and OS queues ON, we see the best performance. Deadline continues to lead, but the surprise here is the good showing from noop: evidently, with NCQ ON, even noop's simplistic FIFO scheme is enough to keep the disk loaded;
  • finally, with a 1024-entry OS queue we don't see any improvement: this was expected since, testing with only 32 threads, we don't have enough requests to fill such a big queue.
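
To make the reasoning behind these points a bit more concrete, here is a small toy model in Python. It is only a sketch and not the benchmark used for the graph: the function name total_seek_distance, the disk size, the request count and the "always serve the nearest pending request" policy are all assumptions made for illustration, and no real scheduler (deadline, CFQ, anticipatory) works exactly this way. A queue size of 1 stands in for "queue OFF". The model measures total head travel, not throughput.

```python
import random


def total_seek_distance(threads, queue_size, requests=2000,
                        disk_size=1_000_000, seed=0):
    """Toy closed-loop model of synchronous random I/O (hypothetical helper).

    Each thread keeps exactly one request in flight, so at most `threads`
    requests are ever outstanding. The queue can only reorder what it holds.
    Returns the total distance travelled by the disk head.
    """
    rng = random.Random(seed)
    # Number of requests the queue can actually choose between.
    window = min(threads, queue_size)
    pending = [rng.randrange(disk_size) for _ in range(window)]
    head = 0
    travelled = 0
    for _ in range(requests):
        # Serve the pending request closest to the current head position.
        nearest = min(pending, key=lambda pos: abs(pos - head))
        pending.remove(nearest)
        travelled += abs(nearest - head)
        head = nearest
        # The thread whose request completed issues a new random one.
        pending.append(rng.randrange(disk_size))
    return travelled


if __name__ == "__main__":
    for threads, queue_size in [(1, 32), (32, 1), (32, 32), (32, 1024)]:
        distance = total_seek_distance(threads, queue_size)
        print(f"threads={threads:2d} queue={queue_size:4d} head travel={distance}")
```

In this model a single thread, or 32 threads with a one-entry queue, leave the scheduler exactly one request to choose from, so nothing can be reordered. A 1024-entry queue behaves exactly like a 32-entry one, because 32 synchronous threads never have more than 32 requests outstanding.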