2015/08/25

Quickly check hundreds of Linux VMs for read-only filesystems

Sometimes a NAS cluster failover during routine maintenance takes too long, or you bought the wrong storage solution, or you have a major network or SAN fault, or something else goes wrong.

Any of these situations can cause your virtual machines to remount their filesystems read-only. Apart from fixing your infrastructure, the proper way to make your VMs more tolerant is to configure SCSI timeouts within each VM (in the case of VMware) or to change the NFS mount options (in the case of KVM or Xen on NFS). See http://discussions.citrix.com/topic/344713-linux-vm-storage-going-read-only-on-nfs-shares-and-a-proposed-fix-in-xenserver/ for the Citrix XenServer hack.
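
As a rough illustration of the SCSI timeout idea, here is a minimal sketch that raises the per-disk command timeout through sysfs inside a Linux guest. The function name set_scsi_timeouts and its two parameters are my own invention for this example, and the default of 180 seconds is an assumption (a value often suggested for VMware guests), so check what your hypervisor and storage vendor recommend before using it.

```shell
#!/bin/bash
# Sketch only: raise the SCSI command timeout for every sd* disk.
# set_scsi_timeouts, sysfs_root and timeout_secs are hypothetical names.

set_scsi_timeouts() {
    local sysfs_root=${1:-/sys}      # overridable so the function can be tried on a scratch directory
    local timeout_secs=${2:-180}     # assumed default, not a vendor-verified value
    local f
    for f in "$sysfs_root"/block/sd*/device/timeout; do
        [ -e "$f" ] || continue      # the glob matched nothing
        echo "$timeout_secs" > "$f" && echo "$f set to $timeout_secs"
    done
}
```

Run as root inside the guest, calling set_scsi_timeouts with no arguments would write to /sys directly. The setting does not survive a reboot, so a permanent fix needs a udev rule or a boot script.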

To detect read-only filesystems rapidly, I wrote a little script I call "ckrofs".  You use it as follows:

ckrofs ...
or
ckrofs `cat hostlist`



Here's the code:

#!/bin/bash

# Join all arguments (hostnames) into one space-separated string.
HostList="$*"

PARALLELISM=20

function remoteCheck()
{
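   # Probe each mounted ext* filesystem on the target by creating and removing
   # a scratch file named after the remote shell's PID, so an existing file is not clobbered.
   TargetHost=$1
   ThisHostOutput="### $TargetHost ###"
   ThisHostOutput+="$(ssh -q "$TargetHost" 'for targetfs in `mount | grep -E " type ext" | cut -d " " -f 3`; do f="$targetfs/.ckrofs.$$"; touch "$f" > /dev/null 2>&1 && rm -f "$f" || echo -n " $targetfs appears to be readonly #";  done ' 2>&1 || echo ' unable to log in to host' )"
   echo "$ThisHostOutput"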
}

export -f remoteCheck


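# Check up to $PARALLELISM hosts at once. Each host is passed to the function as $1
# instead of being pasted into the command string, and printf avoids a trailing newline.
printf '%s' "${HostList}" | xargs -P "$PARALLELISM" -d " " --replace="HOST" /bin/bash -c 'remoteCheck "$1"' _ HOST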
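
If you would rather not write anything to the target filesystems, a read-only alternative is to look at the mount options the kernel reports. The sketch below is not part of ckrofs. list_ro_ext is a hypothetical helper that reads a table in /proc/mounts format (device, mount point, type, options) and prints the ext2/3/4 mount points whose options include "ro". It assumes the kernel has already flipped the filesystem to read-only, and it will also list filesystems you mounted read-only on purpose.

```shell
#!/bin/bash
# Sketch only: report ext* filesystems that are currently mounted read-only.
# list_ro_ext is a hypothetical helper, not part of the ckrofs script.

list_ro_ext() {
    local mounts_file=${1:-/proc/mounts}
    # Field 2 is the mount point, field 3 the filesystem type,
    # field 4 the comma-separated mount options.
    awk '$3 ~ /^ext[234]$/ {
             n = split($4, opts, ",")
             for (i = 1; i <= n; i++)
                 if (opts[i] == "ro") { print $2; break }
         }' "$mounts_file"
}
```

Matching the option exactly means that "errors=remount-ro" on a healthy read-write mount is not reported. To use it against a remote host you could pipe the table over ssh, for example: ssh -q somehost cat /proc/mounts | list_ro_ext -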