2014/10/06

XenServer boot from iSCSI



UPDATE: XenServer boot from iSCSI (at least with iBFT) is just plain awful. Don't bother.  Multipath does not work. Too many hacks are needed to try to get things to work. It is not supported, anyway, and it will probably break every time you do an upgrade.  Plus, the NICs that you use for iSCSI boot will be unusable for any other purpose.  This aspect of XenServer is very immature and not robust.

  1. Credits (Special thanks for pointers from:)
    1. https://www.krystalmods.com/index.php?title=xenserver-6-supports-iscsi-boot-undocumented-feature&more=1&c=1&tb=1&pb=1
    2. http://serverfault.com/questions/598773/install-xenserver-on-iscsi-target
    3. http://serverfault.com/questions/431864/boot-from-iscsi-how-does-it-work
  2. Notes/Warnings
    1. This was originally written for XenServer 6.2.
    2. The NIC that is used for iSCSI boot will not be available for use for any other purposes (admin network, regular storage network, VM network, etc.)
    3. This is with Intel I350 Gbit NICs, as on a Supermicro X9DRT motherboard, which uses the ibft module.
    4. Once booted, you're in a Busybox environment; some commands will be limited or may not work the way you are accustomed to on a full Linux system. See the "iSCSI stuff in Busybox on Linux" notes below for more troubleshooting steps to confirm that iSCSI works (LUNs are reachable, access is granted, etc.)
    5. As a best practice, use all-lower-case target and initiator names (some firmwares may silently convert case).
    6. WARNING: If you do not have your LUNs properly masked, do not specify the installation target correctly, and so forth, you may destroy your data. Know what you are doing, and use at your own risk.
  3. Connect up the Intel NICs
  4. Enable the Option ROM in the BIOS and add it to the boot order; configure the iSCSI boot-enabled NICs in the NIC ROM set-up (perhaps Ctrl-D when prompted after POST and prior to OS boot). You'll want to configure networking and the initiator and target names; consider following the SAN best practices in the Storage Best Practices notes below
  5. Configure the iscsi target to allow access from this initiator
  6. Boot off the CD and enter the command shell:
    1. Insert the XenServer 6.2 CD and power on the computer.  At the "boot: " prompt, type "shell" and press enter.
  7. Prepare iscsi:
    1. echo "InitiatorName=iqn.2014-01.local:hostname" > /etc/iscsi/initiatorname.iscsi (replace the initiator name and/or hostname as appropriate)
    2. echo "node.session.initial_logon_retry_max = 60" >> /etc/iscsi/iscsid.conf
    3. modprobe iscsi_ibft
    4. modprobe scsi_transport_iscsi
    5. modprobe iscsi_tcp
    6. iscsid -c /etc/iscsi/iscsid.conf -i /etc/iscsi/initiatorname.iscsi -f &
  8. set up multipath (if you have multiple paths)
    1. modprobe dm-multipath
  9. Start the installer
    1. /opt/xensource/installer/init --use_ibft
    2. (use --mpath if you have set up multipath)
    3. (use --device_mapper_multipath=true)
    4. (use --network_device=eth0 , or the correct iSCSI network device, if needed)
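
Before starting the installer, it can help to confirm that the firmware iBFT data and the iSCSI session are actually visible from the shell. A minimal sanity check (the target IP is the example SAN address used in these notes, and the exact iBFT attribute names can vary slightly by kernel version):

  # Confirm the firmware iBFT table was found (populated once iscsi_ibft is loaded)
  ls /sys/firmware/ibft/
  cat /sys/firmware/ibft/target0/target-name
  cat /sys/firmware/ibft/target0/ip-addr

  # Confirm there is a live iSCSI session and that the LUN shows up as a block device
  iscsiadm -m session
  cat /proc/partitions
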
Multipath doesn't seem to "just work". At one point I had to do the following, because otherwise the installer runs without multipath, but when init does the switchroot, the OS hangs and complains that "a device that was previously mounted without multipath is now mounted with multipath."
  1. Go to http://debaan.blogspot.com/2014/10/iscsi-stuff-in-busybox-on-linux.html and follow those steps to get logged in to both LUNs and get them seen with multipath.
  2. Start the installer as before.  Wait until the last step, when it says that it is ready to reboot.
  3. Alt+F2 to get to a command shell (again).
  4. Set up a chroot:
    1. mount -o bind /dev /tmp/root/dev
    2. mount -o bind /sys /tmp/root/sys
    3. mount -o bind /proc /tmp/root/proc
    4. chroot /tmp/root /bin/bash
    5. cd /boot
    6. Re-run the iscsiadm -m discovery -t sendtargets -p 10.10.10.20 commands to make the chroot environment aware of the targets
    7. mkinitrd --with=dm-multipath --with=iscsi_ibft --with=scsi_transport_iscsi -f /boot/initrd-2.6.32.43-0.4.1.xs1.8.0.835.170778xen 2.6.32.43-0.4.1.xs1.8.0.835.170778xen
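
To sanity-check that the rebuilt initrd actually picked up the multipath and iSCSI pieces, its contents can be listed from inside the chroot. This assumes the initrd is a gzip-compressed cpio archive (which is what this era of mkinitrd produces); the filename is the one passed to mkinitrd above:

  cd /boot
  zcat initrd-2.6.32.43-0.4.1.xs1.8.0.835.170778xen | cpio -it | grep -E 'multipath|iscsi'
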

iSCSI stuff in Busybox on Linux


  1. Notes/Warnings
    1. This works with Intel I350 Gbit NICs, as on a Supermicro X9DRT motherboard, which uses the igb (1Gbit) or ixgbe (10Gbit) module as the NIC device driver, and the iscsi_ibft module for iscsi boot. You'll have to change the module names if using Broadcom or other chips.
    2. Once booted, you're in a Busybox environment; some commands will be limited or may not work the way you are accustomed to on a full Linux system.
    3. As a best practice, use all-lower-case target and initiator names (some firmwares may silently convert case).
    4. WARNING: If you do not have your LUNs properly masked, do not specify the installation target correctly, and so forth, you may destroy your data. Know what you are doing, and use at your own risk.
  2. Connect up the Intel NICs
  3. Configure the iscsi target to allow access from this initiator (hint: this is done on your SAN), and mask out other LUNs to be NOT visible to this initiator
  4. Boot off the CD and enter the busybox command environment
  5. Manually configure the ip address and enable the interfaces, and verify connectivity
    1. ifconfig eth0 inet 10.10.10.15 netmask 255.255.255.0 up
    2. route add default gw 10.10.10.1
    3. ping 10.10.10.1; ping www.google.com
    4. At this point you can enable ssh to access this from a different computer over the network, if you need to:
      1. ssh-keygen -f /etc/ssh_host_rsa_key -t rsa  (use an empty passphrase)
      2. ssh-keygen -f /etc/ssh_host_dsa_key -t dsa (use an empty passphrase)
      3. /usr/sbin/sshd
      4. echo "root:supersecret" | chpasswd
      5. (Now copy the hashed password for root from /etc/passwd to /etc/shadow.)
  6. Prepare iscsi:
    1. mkdir /etc/iscsi
    2. echo "InitiatorName=iqn.2014-01.local:hostname" > /etc/iscsi/initiatorname.iscsi (replace the initiator name and/or hostname as appropriate)
    3. echo "node.startup = automatic" > /etc/iscsi/iscsid.conf
    4. echo "node.session.initial_logon_retry_max = 60" >> /etc/iscsi/iscsid.conf
    5. modprobe iscsi_ibft
    6. modprobe scsi_transport_iscsi
    7. modprobe iscsi_tcp
    8. iscsid -c /etc/iscsi/iscsid.conf -i /etc/iscsi/initiatorname.iscsi -f &
  7. detect iscsi targets (where 10.10.10.20 is the IP address of the iSCSI SAN target) (only do this if you are not going to use the ibft boot stuff from an OS installer)
    1. iscsiadm -m discovery -t sendtargets -p 10.10.10.20
    2. iscsiadm -m node -l
  8. set up multipath (if you have multiple paths)
    1. modprobe dm-multipath
    2. multipath -l -r
    3. multipath -l -r (verify that the paths exist and are recognized...)
  9. At this point, you can use the multipath device to access your iSCSI LUNs as needed.
  10. If you want to run an OS installer, at this point start the installer
    1. (in CentOS or XenServer installation media, append --use_ibft)
    2. (use --mpath if you have set up multipath)
    3. (use --network_device=eth0 , or the correct iSCSI network device, if needed)
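
For convenience, here is the manual bring-up from steps 5 through 8 collected into one copy-pasteable sketch. The IP addresses, initiator IQN, and interface name are the example values used above and must be adapted to your environment:

  # Bring up the iSCSI NIC and default route
  ifconfig eth0 inet 10.10.10.15 netmask 255.255.255.0 up
  route add default gw 10.10.10.1

  # Minimal open-iscsi configuration
  mkdir -p /etc/iscsi
  echo "InitiatorName=iqn.2014-01.local:hostname" > /etc/iscsi/initiatorname.iscsi
  echo "node.startup = automatic" > /etc/iscsi/iscsid.conf
  echo "node.session.initial_logon_retry_max = 60" >> /etc/iscsi/iscsid.conf

  # Load the drivers and start the daemon
  modprobe iscsi_ibft
  modprobe scsi_transport_iscsi
  modprobe iscsi_tcp
  iscsid -c /etc/iscsi/iscsid.conf -i /etc/iscsi/initiatorname.iscsi -f &

  # Discover and log in to the targets (skip if an OS installer will use iBFT instead)
  iscsiadm -m discovery -t sendtargets -p 10.10.10.20
  iscsiadm -m node -l

  # Optional: multipath, if there is more than one path
  modprobe dm-multipath
  multipath -l -r
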

2014/10/02

Storage Best Practices

Comments welcome.  These are just some things I've decided work best and save hassles:

General Storage

  1. Compression should be used on the back-end storage, not in the VM or host that is mounting the storage.
  2. Data should be grouped in volumes based on general purpose/function, performance requirements, backup requirements, and any isolation requirements. For example:
    1. DB logs should be on a separate hosting aggregate/raid-set from the databases that they help to protect.
    2. General business data should not be on the same volume as Financials data, because they have distinct back-up requirements.
    3. File share data and VM data should not be on the same underlying volume (though they may live on the same RAID set / aggregate).
  3. NAS/SAN volumes and shares should be named to say as briefly as possible what they actually are:
    1. Scratch/temp data should always be named such that the user can see that it is "scratch" data, e.g. "localscratch0", "nas_scratch0".
    2. vi_eng_0  (virtual infrastructure for Engineering's VM's, number 0)
    3. volume IT_Software shared out as \\corp.mycompany.com\LS\IT\Software.
  4. Security
    1. Permissions should be set for at least the share and directly-contained folders using domain-local security groups (using nested groups) that are specific to just that share and those folders.
    2. "Everyone" should almost never be used.
    3. "Deny" permissions should almost never be used; instead, create a group that includes only the desired people/roles.
  5. Scratch data should not be backed up.  Take every relevant opportunity to remind users of that.
  6. Linux
    1. LVM
      1. LVM should be used where possible -- always for the OS; for data disks it may be omitted for cases where separate LUNs are presented for data and there is only one filesystem per LUN.
      2. The OS should use a different volume group from major application data. (Simple LAMP boxes with a single disk device may use a single volume group for everything; systems with more disk devices, or where one may wish to restore data LUNs from snapshot, should have a separate VG_Data including all physical volumes holding application data.)
      3. All partitions on a single disk device should belong to the same volume group. (This makes re-assembling a broken system easier: if the data volume group has a physical volume missing, broken, or restored from snapshot, the system can still boot as long as the OS volume group is intact; then the data volume group can be re-assembled.)
      4. Logical volumes should be named with "LV" plus the mount path, substituting "_" for "/"; LV_root for the root "/" filesystem, and LV_swap for the swap filesystem.
      5. /boot should not be on a logical volume.
      6. Logical volumes should always be thick-provisioned
      7. Resizing
        1. Before any resize operation, always back up the data to external storage first!
        2. When growing a logical volume, always grow the logical volume first, using a round "g" size, then resize the filesystem (don't specify a size, where possible, and let the resize tool detect the new size).  If the hosting physical volume is to be grown, grow the hosting partition/LUN and then pvresize before growing the logical volume.
        3. When shrinking a logical volume, always shrink the filesystem first, using a round "g" size, then resize the logical volume, also using a round "g" size specification.  If the hosting partition/LUN is also to be shrunk, then it should be shrunk last, using a round "g" size specification. (A worked resize example appears after this list.)
    2. Filesystems should be labeled with the mount path, e.g. "/var/log" where it fits; if the path is too long, then the last unique part of the path should be used, e.g. "postgres-backups".
  7. Windows
    1. For major applications, and for those where one may wish to restore data LUNs from snapshot separately from the OS, application data should be placed on separate disk devices (not just separate partitions)
    2. Filesystems should be labeled based on mount/drive path and purpose, e.g. "C_OS", "D_Data", "F_Logs", "G_MailboxDB00", "C_Appdata_logs"
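
As a concrete illustration of the grow-then-resize and shrink-then-resize rules above, here is a minimal sketch. The volume group, logical volume, and mount point are made-up examples following the naming conventions from this list, and the filesystem commands assume ext3/ext4 (XFS would use xfs_growfs, and cannot be shrunk):

  # Back up the data to external storage first!

  # Growing: grow the LV by a round "g" size first, then let resize2fs detect
  # the new size on its own (no explicit size given); ext3/ext4 can grow online.
  lvextend -L +10g /dev/VG_Data/LV_var_log
  resize2fs /dev/VG_Data/LV_var_log

  # Shrinking: shrink the filesystem first (round "g" size), then the LV to the same size.
  umount /var/log                       # assumes an fstab entry for /var/log
  e2fsck -f /dev/VG_Data/LV_var_log
  resize2fs /dev/VG_Data/LV_var_log 20G
  lvreduce -L 20g /dev/VG_Data/LV_var_log
  mount /var/log
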

General SAN

  1. LUNs should always be masked on the SAN target device to only allow access from only those initiators who require access to that LUN.
  2. LUNs should be formatted and mounted (via fstab) using the multipath device (see the example after this list)
  3. Multipathing
    1. Multipathing should be used where possible, with redundancy at each layer (redundant initiators, switches/paths, and targets)
    2. Redundant targets should be used, as well: Initiator A should connect on subnet/switch A to the target LUN on SAN head A; initiator B should connect on subnet/switch B to the target LUN through SAN head B.
    3. Active-active with round-robin load balancing should be used.
    4. LUNs should generally be thinly provisioned (except for LUNs hosting essential databases); the monitoring system (Zabbix) should be configured to alert when the hosting aggregate is at 80, 90, and 95% of capacity.
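
As an example of mounting a LUN via its multipath device rather than a raw /dev/sdX path (the alias, WWID placeholder, mount point, and filesystem are invented for illustration, and multipath.conf option names can vary between multipath-tools versions):

  # /etc/multipath.conf fragment: friendly alias plus active-active round-robin
  defaults {
      path_grouping_policy   multibus
      path_selector          "round-robin 0"
  }
  multipaths {
      multipath {
          wwid    <WWID of the LUN, as shown by "multipath -l">
          alias   mpath_data0
      }
  }

  # /etc/fstab entry using the multipath device; _netdev delays the mount until networking is up
  /dev/mapper/mpath_data0   /data   ext4   _netdev,defaults   0 0
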

FC

  1. FC switches should use WWN-based zoning
  2. Any host OS installation should include *unplugging* the HBA so that the OS does not wipe all visible LUNs. (If the host OS is being installed for boot-from-SAN, then all LUN masks and target attributes should be triple-checked to avoid inadvertent destruction of data.)
  3. Switches in separate paths should be managed separately. (Combining them into a single management domain would combine them into a single fault domain for administrative errors.)
  4. WWNs should, where possible, be aliased on all storage devices (switches and SAN) to include the hostname (and optionally a function) so that they can be easily identified and referenced, and so that all related zonings/mappings can be updated just by updating the alias itself. (For example, if an HBA must be replaced, that would invalidate all WWN-based mappings; but since we used an alias for all mappings, we just update the single alias definition that included that WWN.)
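
For example, on a Brocade FC switch (Brocade FOS syntax, shown only to illustrate the aliasing idea; other vendors use different commands, and the WWNs, alias, zone, and config names below are made up):

  # Alias the host HBA and the SAN target port, then zone by alias
  alicreate "hosta_hba0", "10:00:00:00:c9:12:34:56"
  alicreate "sana_ctrl0_p1", "50:0a:09:80:12:34:56:78"
  zonecreate "hosta_sana", "hosta_hba0; sana_ctrl0_p1"
  cfgadd "prod_cfg", "hosta_sana"
  cfgsave
  cfgenable "prod_cfg"

  # If the HBA is later replaced, only the alias definition needs updating; the zones keep working.
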

iSCSI

  1. All IQNs should always be typed as all lower-case.
  2. Initiator IQN names should be:
    1. iqn.2014-01.local:hostname[-boot]
    2. The [-boot] suffix is optional, and is only needed if there are several different initiators on a single host
    3. No iscsi traffic should be routed.  In other words, any given initiator should be connecting to a target on the same subnet.
  3. Targets should be mapped using IP address, not hostname.
  4. Jumbo frames (9000-Byte) should be used where possible.
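
As a quick initiator-side check for jumbo frames (the interface name is an example, and the switch ports and SAN target interface must also be configured for 9000-byte frames):

  # Set the MTU on the iSCSI interface
  ifconfig eth1 mtu 9000

  # Verify end-to-end with fragmentation disallowed:
  # 8972 = 9000 - 20 (IP header) - 8 (ICMP header)
  ping -M do -s 8972 -c 3 10.10.10.20
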