It is no secret that while processors, memory and peripherals have constantly increased in speed (sometimes at a more than linear rate), common storage subsystems have constantly lagged behind the other components.
This is hardly a surprise, considering that the great majority of common permanent data storage systems are based on mechanical (rather than electronic) devices. This mechanical nature intrinsically means that these devices are far slower than processors and other electronic components. Consider, for example, probably the most common storage medium: the hard disk. While these devices have grown in capacity and offer an outstanding space/cost ratio (today you can buy a high-quality 2 TB disk for less than 200€, while 10 years ago you had to pay about the same money for a 20 GB disk), their speed has evolved at a much, much lower rate. This is due to the fact that these mechanical devices have two moving parts: the rotating platters (driven by an electric motor) and the heads (moved around by an actuator).
So, while high capacity guarantees high sequential read and write speeds (because platter areal density has improved tremendously over time), random read and write speeds are only a small fraction of the maximum theoretical speed and are only slightly better than those of a 10-year-old disk. At the same time, these moving parts also mean that hard disks are prone to failure at a rate an order of magnitude (or more) greater than that of purely electronic parts.
The combination of slow speed and high fault rates led some researchers at the University of California to create, in 1987, the specifications of the RAID system. The term RAID stands for “Redundant Array of Inexpensive Disks”: by using multiple physical disks to create a single, virtual disk (a RAID array), it improves speed and/or lowers the possibility of data loss. For more details about RAID's history, please visit this Wikipedia link: http://en.wikipedia.org/wiki/RAID
While originally these RAID arrays were controlled by a dedicated piece of hardware (the RAID controller), today most operating systems can directly control a RAID array. In this case, we have a software RAID setup: the controller's role is assumed by a dedicated software driver that manages the individual disks. Sure, a hardware RAID controller can often guarantee greater performance (given that it has a powerful processor and dedicated RAM to use as a disk cache) but hey – it costs! In contrast, a software RAID setup costs you nothing more than the disks. So, software RAID is commonly used in low-end server installations, where cost is of utmost importance.
A RAID array can be configured in very different manners, depending on the number of disks and on the purpose of the array. The RAID layout greatly affects its performance and its resilience to data loss, so it is of great importance to understand the pros and cons of each common RAID layout, from both the performance and data-loss standpoints. In this article I will try to cover the most commonly used software RAID setups in the Linux environment, in order to provide you with some indications on which layout performs better. However, it is useless to have a very fast storage subsystem that also has a very high chance of losing all your data. So, before the performance results, data-loss resilience will be discussed in the recaps of the various RAID levels.
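To give a rough feel for the capacity/redundancy trade-off behind each layout before the recaps, here is a small Python sketch (my own illustration, not part of the original benchmarks) that computes the usable capacity and the number of disk failures each common RAID level is guaranteed to survive, assuming an array of n identical disks:

```python
# Rough capacity/fault-tolerance trade-offs of the common RAID levels,
# assuming n identical disks of size_tb terabytes each. (In practice
# RAID 5 needs at least 3 disks, RAID 6 at least 4, RAID 10 an even
# number of disks; this sketch does not enforce those minimums.)

def raid_tradeoff(level, n, size_tb):
    """Return (usable capacity in TB, guaranteed tolerated disk failures)."""
    if level == 0:                      # striping: fast, no redundancy
        return n * size_tb, 0
    if level == 1:                      # mirroring: n copies of the data
        return size_tb, n - 1
    if level == 5:                      # striping + one parity block per stripe
        return (n - 1) * size_tb, 1
    if level == 6:                      # striping + two parity blocks per stripe
        return (n - 2) * size_tb, 2
    if level == 10:                     # stripe across mirrored pairs;
        return (n // 2) * size_tb, 1    # guaranteed 1, more if the failures
                                        # land in different mirror pairs
    raise ValueError(f"unsupported RAID level: {level}")

if __name__ == "__main__":
    for level in (0, 1, 5, 6, 10):
        cap, faults = raid_tradeoff(level, n=4, size_tb=2)
        print(f"RAID {level:>2}: {cap} TB usable, "
              f"survives {faults} disk failure(s)")
```

With four 2 TB disks, for instance, RAID 0 gives 8 TB with no redundancy, while RAID 6 gives 4 TB but survives any two disk failures; this is exactly the tension between speed/capacity and resilience that the following recaps explore.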