Linux software RAID 10 layouts performance: near, far and offset benchmark analysis

Written by Gionatan Danti. Posted in Linux & Unix


Testbed and methods

All tests were performed on this machine:

  • Core i7 860 @ 2.8 GHz
  • Motherboard Asus P55 Pro
  • 8 GB of DDR3 memory running @ 1600 MHz
  • Four 1 TB (931.5 GiB) Western Digital Green disks (model WD10EADS-00L5B1, firmware version 01.01A01, ~5400 rpm)
  • Video card GeForce 8400GS
  • OS: RHEL 6.3 x86_64

Each disk was connected to a SATA port provided by the P55 chipset and was partitioned as follows:

  • a first, EXT4 boot partition of ~0.5 GiB in RAID1
  • a second, EXT4 system partition of ~32 GiB in a RAID10 “near” layout
  • a third, SWAP partition of ~4 GiB in a RAID10 “near” array
  • a fourth, XFS data partition of ~100 GiB configured as a RAID10 array with different layouts (“near”, “far”, “offset”).

Please note that, in order to reduce RAID array synchronization time, I used a 100 GiB partition for the benchmarks. However, on the final production server, the data partition was about 900 GiB. This has some performance implications that we will discuss later.

For each RAID10 array, I used the default 512 KiB chunk size.
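For reference, an array like the data one can be created with mdadm along the following lines. The md device and partition names here are only illustrative, not the ones from the actual setup; swapping n2 for f2 or o2 selects the “far” or “offset” layout instead:

  # four-disk RAID10, "near" layout (n2 = two near copies), 512 KiB chunk
  mdadm --create /dev/md3 --level=10 --raid-devices=4 \
        --layout=n2 --chunk=512 \
        /dev/sda4 /dev/sdb4 /dev/sdc4 /dev/sdd4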

In order to quickly evaluate each RAID array layout, I used Intel's Iometer with an 8 GiB test file. As it seems that, on Linux, Iometer is unable to correctly scale the queue depth above 2 (it uses O_DIRECT to open the test file, which serializes and synchronizes access to it), I created a higher queue depth by running 1, 2 and 4 “worker” threads. This setup simulates the load imposed by 1, 2, 4 and 8 concurrent I/O threads.
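As a side note, a roughly comparable load can be generated with fio instead of Iometer. The job below is only a sketch with parameters of my own choosing (4 KiB random reads, four workers sharing a single 8 GiB test file opened with O_DIRECT; the file path is a placeholder), not the exact Iometer settings:

  # four concurrent workers issuing direct 4 KiB random reads
  # against the same 8 GiB test file on the array under test
  fio --name=raid10-test --filename=/mnt/raid10/testfile --size=8g \
      --rw=randread --bs=4k --ioengine=libaio --direct=1 \
      --iodepth=1 --numjobs=4 --group_reporting \
      --runtime=60 --time_based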

I understand that this is limited, imperfect testing. However, all the Linux software RAID comparisons that I found seem to be so flawed that Iometer results alone should be much, much more accurate anyway.

So, it's time for some numbers...

Comments   

 
#11 Eli Vaughan 2014-03-19 17:05
Without getting into the holy war of near/far/offset performance/reliability...

You responded to someone that the option for creating said arrays is the "-p [layout]" option. However, I wanted to point out that (with a performance hit) you can use different options than simply near, far, offset. You can store multiple copies of the mirror (more than 2 mirrors) by simply specifying it in the layout. This will help redundancy, at an obvious cost in performance.

--layout=n3 3 near copies
--layout=f3 3 far copies
--layout=o3 3 offset copies
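
For example (disk names are just placeholders), a three-copy "near" array over four disks could be created with:

  # 3 near copies over 4 disks; usable space is 4/3 of a single disk
  mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        --layout=n3 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1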

Just a note. Great write up.
 
 
#12 Rüdiger Meier 2017-02-28 12:51
I wonder why you write, for the "near" layout: "2x sequential read speed (sequential read access can be striped only over disks with different data)".

Shouldn't it be possible to read blocks A,B,C,D also from 4 different disks?

I guess the far-layout advantage for sequential reads is because rotating disks are usually faster at the beginning of the disk. So when reading far-layout it's possible to only use the first half of each disk.

And here is maybe one disadvantage of far-layout: I guess it's not possible to make all disks larger (or smaller) to enlarge (or shrink) the array space without rebuilding the whole array. This should be no problem for near and offset.
 
 
#13 Gionatan Danti 2017-02-28 16:37
Quoting Rüdiger Meier:

Shouldn't it be possible to read blocks A,B,C,D also from 4 different disks?


Basically, the answer is NO, for two reasons:

1) the kernel md driver can dispatch a single, large read request to chunked/striped disks only. This means that the "mirror" drives (in a RAID10 setup) are not engaged by single sequential read requests. I recently tested a 4-way RAID1 mirror and, while multiple concurrent random read requests scaled very well (4x the single-drive result), single sequential read requests were no faster than a single drive (a sketch of how to reproduce this test follows below).

2) even if the kernel split a single large request and dispatched its chunks to different mirrored drives (and it does NOT do that), you have to consider that, due to how data is physically laid out on the disk platter, scaling would be much less than ideal. For example, let's consider how data on the first disk pair of a RAID10 "near" layout is placed:

DISK1: A B C D E F G H
DISK2: A B C D E F G H

If a request requires both the A and B chunks, it can theoretically engage both disks (and I repeat: with current kernels this does NOT happen), with a corresponding increase in throughput. However, if a subsequent request requires the C and D chunks, you have to consider that DISK1's heads MUST travel over the (redundant) B chunk, wasting potential bandwidth.
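
As a sketch of the test mentioned in point 1 (the md device name and the parameters are placeholders), the difference can be reproduced by comparing a single sequential reader against several concurrent random readers on the md device:

  # single sequential reader: stays close to single-drive throughput,
  # because md does not split one stream across the mirror copies
  fio --name=seq --filename=/dev/md0 --rw=read --bs=1m \
      --ioengine=libaio --direct=1 --iodepth=1 \
      --runtime=30 --time_based

  # four concurrent random readers: md balances them across the
  # mirror copies, so the result scales up to about 4x a single drive
  fio --name=rand --filename=/dev/md0 --rw=randread --bs=4k \
      --ioengine=libaio --direct=1 --iodepth=1 --numjobs=4 \
      --group_reporting --runtime=30 --time_based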

In short: while the RAID10 "near" layout is very good for random reads, it falls short of the offset/far layouts for sequential reads. Anyway, random reads are often the most frequent access pattern, rather than large sequential I/O.

Regards.
 
