Wednesday, July 29, 2009

The RAID language

Did you ever wonder about all the options and sub-options you have to choose from when setting up a RAID array?
I had to build a new server this morning and wanted to really understand all these terms, once and for all.
I’m working with HP servers but the RAID language is (mostly) universal:

Let's start with the basics: what is RAID?
Redundant Array of Inexpensive Disks. The most commonly used configurations are:

RAID 0 - No Fault Tolerance
RAID 1+0 - Drive Mirroring
RAID 4 - Data Guarding
RAID 5 - Distributed Data Guarding
RAID 6 (ADG) - Advanced Data Guarding
RAID 0 - No Fault Tolerance: no fault tolerance method is used. However, the data is striped across all physical drives in the array for rapid access.
If you select this option for any of your logical drives, you will lose the data on that logical drive if one physical drive fails. However, because none of the capacity of the logical drive is used for redundant data, this method offers the best processing speed and capacity. You may consider assigning RAID 0 to drives that require large capacity and high speed but hold data whose loss poses no risk.
RAID 1+0 - Drive Mirroring is a fault tolerance method that uses 50 percent of drive storage capacity to provide greater data reliability by storing a duplicate of all user data. Half the physical drives in the array are duplicated or "mirrored" by the other half.
RAID 1+0 first mirrors each drive in the array to another, then stripes the data across the mirrored pairs.
Drive mirroring creates fault tolerance by storing two sets of duplicate data on a pair of disk drives. There must be an even number of drives for RAID 1+0. This is the most costly fault tolerance method.
If a drive fails, the mirror drive provides a backup copy of the files and normal system operations are not interrupted. The mirroring feature requires a minimum of two drives, and in a multiple drive configuration (four or more drives), mirroring can withstand multiple simultaneous drive failures as long as the failed drives are not mirrored to each other.
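The rule that mirroring survives multiple failures "as long as the failed drives are not mirrored to each other" can be sketched in a few lines of Python. This is only an illustration: the pairing scheme (drives 0/1, 2/3, and so on are partners) and the function names are my own assumptions, not how any particular controller numbers its drives.

```python
from itertools import combinations

def mirror_partner(drive):
    """Partner of a drive, ASSUMING drives 0/1, 2/3, 4/5... are mirrored pairs."""
    return drive ^ 1

def array_survives(failed):
    """A RAID 1+0 array keeps its data as long as no failed drive's
    mirror partner has failed as well."""
    return all(mirror_partner(d) not in failed for d in failed)

def surviving_fraction(num_drives, num_failures):
    """Fraction of all possible failure combinations the array survives."""
    combos = list(combinations(range(num_drives), num_failures))
    survived = sum(1 for c in combos if array_survives(set(c)))
    return survived / len(combos)

print(array_survives({0, 2}))    # drives from different pairs
print(array_survives({0, 1}))    # both halves of one pair
print(surviving_fraction(4, 2))  # share of two-drive failures a 4-drive array survives
```

With four drives, two of the six possible two-drive failures hit a mirrored pair, so the array survives the other four.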
RAID 4 - Data Guarding is a fault tolerance method that uses a small percentage of a drive array's storage capacity to store data guard code that is used to recover data if a physical drive fails.
RAID 5 - Distributed Data Guarding is a fault tolerance method that stores parity data across all the physical drives in the array, which allows more simultaneous read operations and higher performance than RAID 4 - Data Guarding. If a drive fails, the controller uses the parity data and the data on the remaining drives to reconstruct data from the failed drive. This allows the system to continue operating with slightly reduced performance until you replace the failed drive.
RAID 5 requires an array with a minimum of 3 physical drives. The capacity of the logical drive used for fault tolerance depends on the number of physical drives in the array. For example, in an array containing 3 physical drives, only 33 percent of the total logical drive storage capacity is used for parity data; while a 14-drive configuration uses only 7 percent.
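To see how parity lets the controller rebuild a failed drive, here is a minimal sketch that assumes the parity is a plain byte-wise XOR of the data blocks (the usual textbook description of RAID 5; what a real controller does internally is not something I have verified). The block contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 3-drive array: two data blocks plus parity.
data = [b"AAAA", b"BBBB"]
parity = xor_blocks(data)

# Pretend the drive holding data[0] failed: rebuild its block from
# the surviving data block and the parity block.
rebuilt = xor_blocks([data[1], parity])
print(rebuilt == data[0])
```

Because XOR is its own inverse, the same helper both computes the parity and recovers the missing block.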
RAID 6 (ADG) - Advanced Data Guarding. This fault tolerance method provides the highest level of data protection. It is similar to RAID 5 in that parity data is distributed across all drives in the array, except that multiple separate sets of parity data are used in RAID 6 (ADG), and the capacity of multiple drives is used to store the parity data. Assuming the capacity of two drives is used for parity data, the system will continue to operate even if two drives fail simultaneously, whereas RAID 4 and RAID 5 can only sustain failure of a single drive. The fault tolerance of RAID 6 (ADG) configurations is actually higher than that of RAID 1+0 configurations because in RAID 1+0 there is a chance that two drives mirrored to each other will fail simultaneously.
RAID 6 (ADG) read performance is similar to that of RAID 5, since all drives can service read operations. However, the write performance is lower with RAID 6 (ADG) than with RAID 5, because parity data must be updated on multiple drives. Performance is further reduced in a degraded state.
RAID 6 (ADG) requires an array with a minimum of 2+P physical drives, where P is the number of drives used for parity data; normally, P = 2. The percentage of total drive capacity used for fault tolerance is equal to the number of drives used for parity data divided by the total number of physical drives. For example, in an array containing a total of five physical drives (two of which are used for parity), 40 percent of the total logical drive storage capacity is used for fault tolerance. A 14-drive configuration (again using two drives for parity) uses only 14 percent of total capacity for fault tolerance.
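The percentages quoted for RAID 5 and RAID 6 (ADG) all come from the same division: parity drives over total drives. A small sketch, with a function name of my own choosing, reproduces the figures above:

```python
def parity_overhead(total_drives, parity_drives):
    """Fraction of raw capacity spent on parity (1 parity drive's worth
    for RAID 5, normally 2 for RAID 6)."""
    if parity_drives < 1 or total_drives < parity_drives + 2:
        raise ValueError("need at least 2 + P drives")
    return parity_drives / total_drives

print(f"RAID 5,  3 drives: {parity_overhead(3, 1):.0%}")   # 33%
print(f"RAID 5, 14 drives: {parity_overhead(14, 1):.0%}")  # 7%
print(f"RAID 6,  5 drives: {parity_overhead(5, 2):.0%}")   # 40%
print(f"RAID 6, 14 drives: {parity_overhead(14, 2):.0%}")  # 14%
```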

Fault Tolerance
The ability of a server to recover from hardware problems without interrupting the server's performance. Fault tolerance methods, among others, include:

RAID 1+0 - Drive Mirroring
RAID 4 - Data Guarding
RAID 5 - Distributed Data Guarding
RAID 6 (ADG) - Advanced Data Guarding

Logical Drive Extension
This allows you to increase the size of an existing logical drive without disturbing the data on the logical drive. If an existing logical drive is full of data, you can extend the logical drive when there is free space on the array. If there is no free space on the array, you can add drives to the array and proceed to extend the logical drive.

Logical Drives
An equal area from all physical drives in a drive array grouped together logically to act as a single hard drive. Logical drives are configured with software utilities to enhance the performance and usability of drive arrays.

Online Spare/Active Spare
Online Spare - a physical drive used in RAID 1+0 - Drive Mirroring, RAID 4 - Data Guarding, RAID 5 - Distributed Data Guarding, and RAID 6 (ADG) - Advanced Data Guarding to provide drive replacement for a failed drive without user intervention. The spare immediately replaces a failed drive as soon as the failure occurs. The controller automatically begins rebuilding the data from the failed drive on the spare to return to a fault tolerant state. The failed drive can be replaced while the system is operating at top performance. The drawback is that the drive is not used while inactive and this reduces the amount of usable storage capacity.
A spare may become an Active Spare if the current Active Spare fails and there are multiple spares available. An Active Spare is a spare that is currently in use by an Array which contains a failed Physical Drive.

Arrays
A group of physical drives configured into one or more logical drives. Arrayed drives have significant performance and data protection advantages over non-arrayed drives.

Array Accelerator
An internal part of the Array Controller that dramatically improves performance of disk read and write operations by providing a buffer. A battery backup and ECC memory protect the data.

Capacity Expansion
A feature that allows an increase in storage capacity of a drive array with the addition of one or more physical drives to the array. With the added space on the array, one or more new logical drives can be created. This feature is available only on Array Controllers that support expansion.

Controller Duplexing/Disk Duplexing
Disk duplexing is a variation of disk mirroring in which each of multiple storage disks has its own SCSI controller. Disk mirroring (also known as RAID-1) is the practice of duplicating data in separate volumes on two hard disks to make storage more fault-tolerant. Mirroring provides data protection in the case of disk failure, because data is constantly updated to both disks. However, since the separate disks rely upon a common controller, access to both copies of data is threatened if the controller fails. Disk duplexing overcomes this problem; the use of redundant controllers enables continued data access as long as one of the controllers continues to function.

Expand Priority
The level of priority that an array expansion, once you have started it, should have over handling current operating system requests.

Maximum Boot Size
Max Boot or Maximum Boot Size determines the number of sectors used for the logical drive. When Max Boot is disabled, the logical drive is created with 32 sectors per track. In this configuration, the largest boot drive which can be created is 4 GB. With Max Boot enabled, the controller creates the logical drive with 63 sectors per track which will allow you to create a boot drive which is up to 8 GB in size. We suggest only enabling Max Boot on the drive from which you will boot your server, as a slight performance gain is seen using 32 sectors per track.
The maximum boot size option is initially disabled. Disabling maximum boot size means that the logical drive will report the default of 32 sectors per track to BIOS calls (int13h). Enabling the maximum boot size increases the number of sectors reported in BIOS calls to the maximum of 63 in order to increase the number of blocks available. Enabling maximum boot size may be necessary to create large boot partitions for some operating systems. For example, enabling maximum boot size on a logical drive in Windows NT 4.0 allows you to create a bootable partition with a maximum size of 8 GB, instead of the 4 GB maximum size allowed when maximum boot size is disabled. When a logical drive larger than 255 GB is created, 63 sectors per track will be reported to BIOS calls regardless of whether or not the maximum boot size was enabled.
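The 4 GB and 8 GB limits fall out of the old BIOS cylinder/head/sector arithmetic. The sketch below assumes the classic int13h limit of 1024 cylinders, a geometry of 255 heads, and 512-byte sectors; none of those three numbers is stated above, so treat them as my assumptions.

```python
BYTES_PER_SECTOR = 512   # assumption: standard sector size
MAX_CYLINDERS = 1024     # assumption: classic int13h cylinder limit
HEADS = 255              # assumption: heads reported by the controller

def max_boot_bytes(sectors_per_track):
    """Largest size addressable through CHS BIOS calls for a given geometry."""
    return MAX_CYLINDERS * HEADS * sectors_per_track * BYTES_PER_SECTOR

for spt in (32, 63):
    size = max_boot_bytes(spt)
    print(f"{spt} sectors/track: {size:,} bytes ({size / 2**30:.2f} GiB)")
```

Under these assumptions, 32 sectors per track gives just under 4 GiB and 63 sectors per track gives just under 8 GiB, which lines up with the limits quoted above.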

Migration
A feature that allows you to change the fault tolerance level or stripe size of a configured logical drive without incurring any data loss.

Online Recovery Server
An Array Controller that has been set by the System Configuration Utility to Online Recovery Server mode is a controller that has the ability to dynamically move storage devices from a failed server to an active server. In effect, the storage devices are hot plugged from one system and hot plugged into the new one.

RAID Overhead
A pre-defined space set aside for RAID redundant information on a logical drive.

Rebuild Priority
The level of priority that rebuilding the data from a failed drive, once that drive has been replaced, should have over handling current requests from the operating system.

Redundant Controllers
A pair of controllers that have been installed into a system and share a single storage system. The controllers are interconnected either via an Inter-Controller Link (ICL) for 64-Bit or Extended PCI Controllers or internally for Fibre Channel Controllers.
The primary controller of the pair handles all communications and control of the storage system and its attached drives. If the primary controller is no longer able to issue read or write commands to these drives, the secondary controller assumes control.

SCSI
SCSI stands for Small Computer System Interface.

SCSI ID
A unique ID assigned to each SCSI device connected to the same SCSI channel. The ID number uniquely defines each peripheral device address and determines the device priority on the bus. ID 7 (SCSI controller) is the highest priority; on a narrow (8-bit) bus, ID 0 is the lowest.

Selective Storage Presentation
Selective Storage Presentation allows logical drives on an array controller to be shared by multiple servers. A server connects to the array controller by using a host controller that is installed in the server. Selective Storage Presentation allows users to name connections from host controllers to array controllers, and it allows users to grant or deny access to connections for each logical drive. Selective Storage Presentation is currently only supported for fibre channel controllers.

Stripe Size
A stripe is a collection of contiguous data that is distributed evenly across all physical drives in a logical drive. The size of the stripe is selected to optimize the performance of the operating system. Stripe size is synonymous with distribution factor.
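To make "distributed evenly across all physical drives" concrete, here is a rough sketch of how a logical block number could be mapped to a drive in a plain striped (RAID 0 style) layout. The function and the `blocks_per_strip` parameter (how many contiguous blocks land on one drive before moving to the next) are hypothetical; real controllers add parity rotation and their own metadata on top of this.

```python
def locate_block(logical_block, num_drives, blocks_per_strip):
    """Return (drive index, block offset on that drive) for a logical block,
    assuming simple round-robin striping with no parity."""
    strip_index, offset_in_strip = divmod(logical_block, blocks_per_strip)
    row, drive = divmod(strip_index, num_drives)
    return drive, row * blocks_per_strip + offset_in_strip

# 3 drives, 4 blocks per strip: blocks 0-3 go to drive 0, 4-7 to drive 1,
# 8-11 to drive 2, then block 12 wraps back around to drive 0.
for block in (0, 4, 8, 12, 13):
    print(block, locate_block(block, num_drives=3, blocks_per_strip=4))
```

A larger strip keeps more sequential data on one drive, while a smaller one spreads even short requests over several drives, which is why the setting gets tuned to the workload.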

Surface Scan Analysis
Surface Scan Analysis is a background process that scans hard drives for bad sectors in fault tolerant logical drives. In RAID 5 or RAID 6 (ADG) configurations, Surface Scan also verifies the consistency of parity data. This process assures that you can recover all data successfully if a drive failure occurs in the future.

Useable Capacity
Space on the array available to the user for logical drives.
