2014/10/02

Storage Best Practices

Comments welcome.  These are just some things I've decided work best and save hassles:

General Storage

  1. Compression should be used on the back-end storage, not in the VM or host that is mounting the storage.
  2. Data should be grouped in volumes based on general purpose/function, performance requirements, backup requirements, and any isolation requirements. For example:
    1. DB logs should be on a separate hosting aggregate/raid-set from the databases that they help to protect.
    2. General business data should not be on the same volume as Financials data, because they have distinct back-up requirements.
    3. File share data and VM data should not be on the same underlying volume (though they may live on the same RAID set / aggregate).
  3. NAS/SAN volumes and shares should be named to say as briefly as possible what they actually are:
    1. Scratch/temp data should always be named so that the user can see that it is "scratch" data, e.g., "localscratch0", "nas_scratch0".
    2. vi_eng_0  (virtual infrastructure for Engineering's VMs, number 0)
    3. volume IT_Software shared out as \\corp.mycompany.com\LS\IT\Software.
  4. Security
    1. Permissions should be set for at least the share and directly-contained folders using domain-local security groups (using nested groups) that are specific to just that share and those folders.
    2. "Everyone" should almost never be used.
    3. "Deny" permissions should almost never be used; instead, create a group that includes only the desired people/roles.
  5. Scratch data should not be backed up.  Take every relevant opportunity to remind users of that.
  6. Linux
    1. LVM
      1. LVM should be used where possible -- always for the OS; for data disks it may be omitted for cases where separate LUNs are presented for data and there is only one filesystem per LUN.
      2. The OS should use a different volume group from major application data. (Simple LAMP boxes with a single disk device may use a single volume group for everything; systems with more disk devices, or where one may wish to restore data LUNs from snapshot, should have a separate VG_Data including all physical volumes holding application data.)
      3. All partitions on a single disk device should belong to the same volume group. (This makes re-assembling a broken system easier: if the data volume group has a physical volume missing/broken or restored from snapshot, the system can still boot if the OS volume group is intact; then the data volume group can be re-assembled.)
      4. Logical volumes should be named with "LV" plus the mount path, substituting "_" for "/"; LV_root for the root "/" filesystem, and LV_swap for the swap filesystem.
      5. /boot should not be on a logical volume.
      6. Logical volumes should always be thick-provisioned.
      7. Resizing
        1. Before any resize operation, always back up the data to external storage first!
        2. When growing a logical volume, always grow the logical volume first, using a round "g" size, then resize the filesystem (where possible, don't specify a size; let the resize tool detect the new size).  If the hosting physical volume is to be grown, grow the hosting partition/LUN and then run pvresize before growing the logical volume.
        3. When shrinking a logical volume, always shrink the filesystem first, using a round "g" size, then resize the logical volume, also using a round "g" size specification.  If the hosting partition/LUN is also to be shrunk, then it should be shrunk last, using a round "g" size specification.
    2. Filesystems should be labeled with the mount path, e.g. "/var/log" where it fits; if the path is too long, then the last unique part of the path should be used, e.g. "postgres-backups".
  7. Windows
    1. Major applications, and those where one may wish to restore data LUNs from snapshot separately from the OS, should place application data on separate disk devices (not just separate partitions).
    2. Filesystems should be labeled based on mount/drive path and purpose, e.g. "C_OS", "D_Data", "F_Logs", "G_MailboxDB00", "C_Appdata_logs"
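
The Linux naming rules above (logical volume names and filesystem labels) are mechanical enough to script. The following is only a minimal sketch: the function names are hypothetical, "last unique part of the path" is simplified to "last path component", and the 16-character default is an assumption that matches the ext4 label limit (other filesystems differ).

```python
def lv_name(mount_path: str) -> str:
    """Build a logical volume name from an absolute mount path ("swap" for swap)."""
    if mount_path == "swap":
        return "LV_swap"
    if mount_path == "/":
        return "LV_root"
    # "LV" plus the mount path, with "_" substituted for "/".
    return "LV" + mount_path.rstrip("/").replace("/", "_")


def fs_label(mount_path: str, max_len: int = 16) -> str:
    """Use the mount path as the label if it fits, else its last component."""
    # max_len is an assumed limit; adjust it for the filesystem in use.
    if len(mount_path) <= max_len:
        return mount_path
    last = mount_path.rstrip("/").rsplit("/", 1)[-1]
    if len(last) > max_len:
        raise ValueError(f"no label of {max_len} characters or fewer for {mount_path!r}")
    return last


print(lv_name("/var/log"))                   # LV_var_log
print(lv_name("/"))                          # LV_root
print(fs_label("/var/log"))                  # /var/log
print(fs_label("/srv/db/postgres-backups"))  # postgres-backups
```

Raising an error when even the last component is too long forces a human to pick the label, rather than silently truncating it into something ambiguous.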

General SAN

  1. LUNs should always be masked on the SAN target device to allow access from only those initiators that require access to that LUN.
  2. LUNs should be formatted, mounted (fstab), and mapped using the multipath device.
  3. Multipathing
    1. Multipathing should be used where possible, with redundancy at each layer (redundant initiators, switches/paths, and targets).
    2. Redundant targets should be used, as well: Initiator A should connect on subnet/switch A to the target LUN on SAN head A; initiator B should connect on subnet/switch B to the target LUN through SAN head B.
    3. Active-active with round-robin load balancing should be used.
    4. LUNs should generally be thinly provisioned (except for LUNs hosting essential databases); the monitoring system (Zabbix) should be configured to alert when the hosting aggregate is at 80, 90, and 95% of capacity.
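
The 80/90/95% alerting rule for thinly provisioned aggregates comes down to a small threshold check. The sketch below only illustrates that logic; it is not Zabbix configuration, and the function and constant names are hypothetical.

```python
# Alert thresholds, in percent of aggregate capacity, taken from the rule above.
THRESHOLDS = (80, 90, 95)


def alert_level(used_bytes: int, total_bytes: int):
    """Return the highest threshold (percent) reached, or None if below all of them."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Integer comparison avoids floating-point rounding at the boundaries.
    reached = [t for t in THRESHOLDS if used_bytes * 100 >= t * total_bytes]
    return max(reached) if reached else None


print(alert_level(50, 100))  # None
print(alert_level(85, 100))  # 80
print(alert_level(95, 100))  # 95
```

Three escalating levels matter more with thin provisioning than with thick, because the LUNs can collectively promise more space than the aggregate actually has.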

FC

  1. FC switches should use WWN-based zoning
  2. Any host OS installation should include *unplugging* the HBA so that the OS does not wipe all visible LUNs. (If the host OS is being installed for boot-from-SAN, then all LUN masks and target attributes should be triple-checked to avoid inadvertent destruction of data.)
  3. Switches in separate paths should be managed separately. (Combining them into a single management domain would combine them into a single fault domain for administrative errors.)
  4. WWNs should, where possible, be aliased on all storage devices (switches and SAN), with the alias including the hostname (and optionally a function), so that they can be easily identified and referenced, and so that all related zonings/mappings can be updated by updating only the alias itself. (For example, if an HBA must be replaced, that would invalidate all WWN-based mappings; but since an alias was used for all mappings, only the single alias definition that included that WWN needs to be updated.)
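
The benefit of aliasing WWNs can be shown with a toy model. This is a sketch only: the alias names, zone names, and WWNs are made up, and real switches keep this in their own zoning database rather than in dictionaries.

```python
# Hypothetical alias table: alias (hostname plus function) -> WWN.
aliases = {
    "dbhost01_hba0": "10:00:00:00:c9:00:00:01",
    "sanhead_a_port0": "50:0a:09:80:00:00:00:0a",
    "tapelib_port0": "50:01:04:f0:00:00:00:0b",
}

# Zones refer to aliases only, never to raw WWNs.
zones = {
    "z_dbhost01_sanhead_a": ["dbhost01_hba0", "sanhead_a_port0"],
    "z_dbhost01_tapelib": ["dbhost01_hba0", "tapelib_port0"],
}


def resolve(zones, aliases):
    """Expand every zone's alias members into the WWNs they currently point to."""
    return {zone: [aliases[member] for member in members]
            for zone, members in zones.items()}


old_wwn = aliases["dbhost01_hba0"]
new_wwn = "10:00:00:00:c9:00:00:99"

# The HBA is replaced: one alias definition changes, every zone follows.
aliases["dbhost01_hba0"] = new_wwn
resolved = resolve(zones, aliases)

print(all(new_wwn in wwns for wwns in resolved.values()))  # True
print(any(old_wwn in wwns for wwns in resolved.values()))  # False
```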

iSCSI

  1. All IQNs should always be typed in all lower-case.
  2. Initiator IQN names should be:
    1. iqn.2014-01.local:hostname[-boot]
    2. The bracketed "-" suffix (e.g., "-boot") is optional, and is only needed if there are several different initiators on a single host.
    3. No iSCSI traffic should be routed.  In other words, any given initiator should connect to a target on the same subnet.
  3. Targets should be mapped using IP address, not hostname.
  4. Jumbo frames (9000-byte MTU) should be used where possible.
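
The IQN naming and "no routed iSCSI" rules above are easy to check in code. A minimal sketch, assuming the "iqn.2014-01.local" prefix from the naming rule; the function names, hostnames, and addresses are hypothetical.

```python
import ipaddress

IQN_PREFIX = "iqn.2014-01.local"  # prefix from the naming rule above


def initiator_iqn(hostname, suffix=None):
    """Build an all-lower-case initiator IQN, with an optional "-suffix"."""
    name = hostname if suffix is None else f"{hostname}-{suffix}"
    return f"{IQN_PREFIX}:{name}".lower()


def is_unrouted(initiator_cidr, target_ip):
    """True if the target IP is on the initiator interface's own subnet."""
    network = ipaddress.ip_interface(initiator_cidr).network
    return ipaddress.ip_address(target_ip) in network


print(initiator_iqn("DBHost01"))                 # iqn.2014-01.local:dbhost01
print(initiator_iqn("DBHost01", "boot"))         # iqn.2014-01.local:dbhost01-boot
print(is_unrouted("10.0.1.5/24", "10.0.1.20"))   # True
print(is_unrouted("10.0.1.5/24", "10.0.2.20"))   # False
```

Lower-casing inside the builder means a mixed-case hostname cannot produce an IQN that breaks the all-lower-case rule.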
