Linux software RAID 10 layouts performance: near, far and offset benchmark analysis

Written by Gionatan Danti. Posted in Linux & Unix

RAID 10 layouts

RAID10 requires a minimum of 4 disks (in theory, Linux mdadm can create a custom RAID 10 array using only two disks, but this setup is generally avoided). How many disk failures it tolerates depends on which disks fail. With two copies of each chunk, an N-disk array is only guaranteed to survive a single failure, because a second failure on the disk holding the same data as the first is fatal. In the best case, when no two failed disks hold identical data, it can survive up to N / 2 failures. For a four-disk array this means between one and two failed disks.
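The failure arithmetic for the four-disk case can be checked by brute force. The sketch below is a simplified model, not md's actual code: it assumes two copies per chunk and fixed mirror pairs (0,1) and (2,3), as in the default near layout. The names `survives` and `tolerance` are purely illustrative.

```python
from itertools import combinations

def survives(failed, n_disks):
    """Assumed model: disks form mirror pairs (0,1), (2,3), ...
    The array survives as long as no pair has lost both members."""
    failed = set(failed)
    return all(not {d, d + 1} <= failed for d in range(0, n_disks, 2))

def tolerance(n_disks):
    """Return (guaranteed, best_case) counts of tolerable disk failures."""
    guaranteed = best_case = 0
    for k in range(1, n_disks + 1):
        results = [survives(f, n_disks)
                   for f in combinations(range(n_disks), k)]
        if all(results):      # every k-disk failure is survivable
            guaranteed = k
        if any(results):      # at least one k-disk failure is survivable
            best_case = k
    return guaranteed, best_case

print(tolerance(4))  # (1, 2)
```

On four disks, any single failure is survivable, while a second failure is survivable only if it hits the other mirror pair.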

RAID10 is the combination of RAID 0 and RAID 1. This can be implemented as nested levels or as a native level. In the first case the operating system effectively uses two software drivers to manage the RAID array: the first, higher level RAID 0 driver stripes your requests on top of two virtual disks, each of which is, in turn, a RAID 1 array composed of two or more physical disks managed by a lower level RAID 1 driver. In the latter case (a “native level”), the operating system uses a single RAID driver capable of understanding this complex RAID level and of directly managing the disks, without relying on other RAID implementations.

Linux software RAID has native RAID10 capability, and it exposes three possible layouts for RAID10-style arrays: near (default), far and offset. These layouts have different performance characteristics, so it is important to choose the right layout for your workload. But how do they differ?

I prepared three diagrams showing how data layout is affected by the three “near”, “far” and “offset” options. These diagrams are somewhat simplified; for a full, detailed explanation you should read the md(4) manpage.

The first diagram depicts a RAID10 NEAR layout:

As you can see, the default near layout is very similar to a nested RAID1+0 setup. For example, assuming a 2 MB write and a 512 KB chunk size, the host write is first broken into two 1 MB stripes and these, in turn, into four 512 KB chunks. Finally, each chunk is replicated on two consecutive devices.
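The near placement can be sketched in a few lines of Python. This is a simplified model based on the description above and on md(4), not the kernel's code: it assumes that the copies of each chunk simply fill consecutive slots, row by row. The function name `near_layout` and the (disk, row) representation are illustrative choices.

```python
def near_layout(n_chunks, n_disks=4, copies=2):
    """Map each logical chunk to its (disk, row) locations under a
    simplified 'near' model: copies occupy consecutive slots."""
    placement = {}
    for chunk in range(n_chunks):
        slots = [chunk * copies + c for c in range(copies)]
        placement[chunk] = [(s % n_disks, s // n_disks) for s in slots]
    return placement

# A 2 MB write with 512 KB chunks is four chunks: A, B, C, D.
for chunk, locations in near_layout(4).items():
    print("ABCD"[chunk], locations)
# A [(0, 0), (1, 0)]
# B [(2, 0), (3, 0)]
# C [(0, 1), (1, 1)]
# D [(2, 1), (3, 1)]
```

Each row holds two distinct chunks (1 MB of data), and both copies of a chunk sit at the same row on adjacent disks.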

Compared to a single disk, a four-disk RAID10 near setup should have the following maximum (best-case) performance profile:

  • 2x sequential read speed (sequential read access can be striped only over disks with different data)
  • 2x sequential write speed (while writes can engage all four disks, each chunk must be written twice, once per mirror)
  • 4x random read speed (random reads are bound to the number of active spindles, four in this example)
  • 2x random write speed (again, writes need to be replicated).

Now it is the turn of the FAR layout:

RAID10 FAR layout

As you can see, things are considerably different: the disks are effectively traversed by two RAID0 sets, and each half-disk stores a different set of data. The second stripe set replicates the first one. Please note that the mirrored chunk continues to be the “base unit” of the array, meaning that a single failed disk will not bring down the entire array.

Maximum performance compared to a single disk should be:
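A comparable sketch for the far layout follows. It is again a simplified model, not the kernel's code: it assumes the first copy is striped RAID0-style over the first section of every disk, and the second copy is striped over the second section, shifted by one disk so that the two copies never share a device. The name `far_layout` and the parameter `rows_per_section` (standing in for "half of the disk") are illustrative.

```python
def far_layout(n_chunks, n_disks=4, copies=2, rows_per_section=None):
    """Map each logical chunk to its (disk, row) locations under a
    simplified 'far' model: copy c lives in section c of the disks,
    rotated by c devices."""
    if rows_per_section is None:
        # Assumption: size each section to just fit the data (ceiling division).
        rows_per_section = -(-n_chunks // n_disks)
    placement = {}
    for chunk in range(n_chunks):
        placement[chunk] = [
            ((chunk + c) % n_disks, chunk // n_disks + c * rows_per_section)
            for c in range(copies)
        ]
    return placement

for chunk, locations in far_layout(8).items():
    print("ABCDEFGH"[chunk], locations)
```

With eight chunks on four disks, chunk A lands at (disk 0, row 0) and (disk 1, row 2), while chunk D lands at (disk 3, row 0) and wraps around to (disk 0, row 2).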

  • 4x sequential read
  • 2x sequential write
  • 4x random read
  • 2x random write

So, it seems that the far layout is always better than, or at least on par with, near, right? Well, no. The far layout has a weak point: as the two data copies are placed far away from each other, in random write and mixed random read/write workloads the disks will spend much more time seeking. As seek time is the dominant factor in random workloads, and random workloads are generally the dominant usage pattern, chances are that a far layout will perform considerably worse than a near layout.

Finally, we have the OFFSET layout:

RAID10 OFFSET layout

As the name implies, it is somewhat similar to a “far” layout, but with the difference that the multiple data copies are placed quite near each other. For example, A's copy is placed on the next disk, at a one-chunk offset from A's original location.

Maximum performance compared to a single disk is equal to that of the far layout:
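The offset placement can be sketched in the same way. As before, this is a simplified model rather than md's implementation: it assumes each stripe is written once and then immediately repeated on the following row, rotated by one disk. The function name `offset_layout` is illustrative.

```python
def offset_layout(n_chunks, n_disks=4, copies=2):
    """Map each logical chunk to its (disk, row) locations under a
    simplified 'offset' model: copy c sits c rows below the original,
    rotated by c devices."""
    placement = {}
    for chunk in range(n_chunks):
        stripe = chunk // n_disks
        placement[chunk] = [
            ((chunk + c) % n_disks, stripe * copies + c)
            for c in range(copies)
        ]
    return placement

for chunk, locations in offset_layout(8).items():
    print("ABCDEFGH"[chunk], locations)
```

Here chunk A lands at (disk 0, row 0) and (disk 1, row 1), and chunk E, the first chunk of the second stripe, at (disk 0, row 2) and (disk 1, row 3).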

  • 4x sequential read
  • 2x sequential write
  • 4x random read
  • 2x random write

But we have a difference: as the data copies are near the original location, disk seek time is greatly reduced compared to the standard far layout. This means that the single critical problem of the far layout should be solved, without impacting too much on its very good sequential speed (which is going to be only a little lower).
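To put a rough number on "near the original location", the sketch below computes how many chunk rows separate the two copies of a chunk in each layout. It is a back-of-the-envelope model under stated assumptions: four disks, two copies, and a hypothetical disk of 1000 chunk rows. The name `copy_rows` and the `rows_per_disk` parameter are illustrative.

```python
def copy_rows(layout, chunk, n_disks=4, copies=2, rows_per_disk=1000):
    """Return the row of each copy of `chunk` under simplified models
    of the three layouts (rows_per_disk is a hypothetical disk size)."""
    stripe = chunk // n_disks
    if layout == "near":
        return [(chunk * copies + c) // n_disks for c in range(copies)]
    if layout == "far":
        section = rows_per_disk // copies  # each copy gets its own section
        return [stripe + c * section for c in range(copies)]
    if layout == "offset":
        return [stripe * copies + c for c in range(copies)]
    raise ValueError(f"unknown layout: {layout}")

for layout in ("near", "far", "offset"):
    first, second = copy_rows(layout, chunk=0)
    print(layout, second - first)
# near 0
# far 500
# offset 1
```

In this model the far layout separates the copies by half a disk, while the offset layout keeps them one row apart.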

UPDATE 22/09/2013: as noted by reader Alberto Lauretti (thank you!) in comment #7, the "offset" layout has slightly lower reliability than the near or far layouts. The point is that any failure involving two consecutive disks (e.g. first and second disks, second and third disks, etc.) will lead to data loss. This is a direct consequence of having your data "scrambled" by the offset layout. Anyway, this should be of minor concern: any RAID10 array with a failed disk should be repaired immediately by replacing the failed disk, as even near and far layouts are exposed to data loss if a second disk failure happens (albeit with lower probability than the offset layout).

Will benchmark results confirm or deny these considerations? Let's see...
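The difference between near and offset can be illustrated by listing which two-disk failures destroy both copies of some chunk. The sketch below uses simplified placement models for four disks and two copies, not md's actual code, and only models near and offset. The names `chunk_disks` and `fatal_pairs` are illustrative.

```python
from itertools import combinations

def chunk_disks(layout, chunk, n_disks=4, copies=2):
    """Set of disks holding the copies of `chunk` (simplified models)."""
    if layout == "near":
        return {(chunk * copies + c) % n_disks for c in range(copies)}
    if layout == "offset":
        return {(chunk + c) % n_disks for c in range(copies)}
    raise ValueError(f"unknown layout: {layout}")

def fatal_pairs(layout, n_disks=4, n_chunks=64):
    """Disk pairs whose simultaneous failure loses at least one chunk."""
    fatal = []
    for pair in combinations(range(n_disks), 2):
        if any(chunk_disks(layout, k, n_disks) <= set(pair)
               for k in range(n_chunks)):
            fatal.append(pair)
    return fatal

print(fatal_pairs("near"))    # [(0, 1), (2, 3)]
print(fatal_pairs("offset"))  # [(0, 1), (0, 3), (1, 2), (2, 3)]
```

Under these assumptions, two of the six possible two-disk failures are fatal for near, against four of six for offset (every pair of neighbouring disks, counting the last and first disks as neighbours).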


#11 Eli Vaughan 2014-03-19 17:05
Without getting into the holy war of near/far/offset performance/reliability...

You responded to someone that the option for creating said arrays is the "-p [layout]" option. However, I wanted to point out that (with a performance hit) you can use different options than simply near, far, offset. You can store multiple copies of the mirror (more than 2 mirrors) by simply specifying the number of copies. This will help redundancy, at an obvious hit on performance.

--layout=n3 3 near copies
--layout=f3 3 far copies
--layout=o3 3 offset copies

Just a note. Great write up.

#12 Rüdiger Meier 2017-02-28 12:51
I wonder why you write for the "near" layout:
"2x sequential read speed (sequential read access can be striped only over disks with different data)"

Shouldn't it be possible to read blocks A,B,C,D also from 4 different disks?

I guess the far-layout advantage for sequential reads is because rotating disks are usually faster at the beginning of the disk. So when reading far-layout it's possible to only use the first half of each disk.

And here is maybe one disadvantage of far-layout: I guess it's not possible to make all disks larger (or smaller) to enlarge (or shrink) the array space without rebuilding the whole array. This should be no problem for near and offset.

#13 Gionatan Danti 2017-02-28 16:37
Quoting Rüdiger Meier:

Shouldn't it be possible to read blocks A,B,C,D also from 4 different disks?

Basically, the answer is NO, for two reasons:

1) the kernel md driver can dispatch a single, large read request to chunked/striped disks only. This means that the "mirror" drives (in a RAID10 setup) are not engaged by single sequential read requests. I just recently tested a 4-way RAID1 mirror and, while multiple concurrent random read requests scaled very well (4x the single drive result), single sequential read requests were no faster than a single drive.

2) even if the kernel split a single large request and dispatched its chunks to different mirrored drives (and it does NOT do that), you have to consider that, due to how data are physically laid out on the disk platter, scaling would be much less than ideal. For example, let's consider how data on the first disk pair of a RAID10 "near" layout are placed:


If a request requires both A and B chunks, it can theoretically engage both disks (and I repeat: with current kernels this does NOT happen), with a corresponding increase in throughput. However, if a subsequent request requires C and D chunks, you have to consider that DISK1's heads MUST travel over the (redundant) B chunks, wasting potential bandwidth.

In short: while the RAID10 near layout is very good for random reads, it falls short of offset/far for sequential reads. Anyway, random reads are often the most frequent access pattern, rather than large sequential IO.

