2008/11/14

bond interfaces in Xen

CentOS 5.2 x86_64 / RHEL 5U2

bond type: 802.3ad

We're going to bond eth0 and eth1 into an 802.3ad bond (bond0) and make that the primary bridge for Xen (xenbr0). It will be in VLAN access mode (i.e., no VLAN trunk) and will also be the primary interface for dom0. eth2 and eth3 are present but unused.

Configure the switchports where eth0 and eth1 are plugged in for 802.3ad/LACP.

/etc/modprobe.conf (the bond interface must come first, then the drivers for the other network interfaces):
alias bond0 bonding
alias eth0 e1000
alias eth1 e1000
alias eth2 e1000
alias eth3 e1000
Change this line in /etc/xen/xend-config.sxp:
(network-script 'network-bridge netdev=bond0')
Ensure you have a line in /etc/sysconfig/network to define that bond0 should be used as the default gateway:
GATEWAYDEV=bond0
/etc/sysconfig/network-scripts/ifcfg-bond0 (adjust for your ip settings, use a unique MAC address):
DEVICE=bond0
IPADDR=10.10.10.119
MACADDR=00:00:10:01:01:19
NETMASK=255.255.255.0
NETWORK=10.10.10.0
BROADCAST=10.10.10.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_OPTS="mode=4 miimon=100"

/etc/sysconfig/network-scripts/ifcfg-eth0 (adjust according to your device's true mac address):
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
MASTER=bond0
SLAVE=yes
HWADDR=00:1E:68:37:FA:92
/etc/sysconfig/network-scripts/ifcfg-eth1 (adjust according to your device's true mac address):
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
MASTER=bond0
SLAVE=yes
HWADDR=00:1E:68:37:FA:93
/etc/sysconfig/network-scripts/ifcfg-eth2 (adjust according to your device's true mac address):
DEVICE=eth2
BOOTPROTO=dhcp
HWADDR=00:1E:68:37:FA:94
ONBOOT=no
/etc/sysconfig/network-scripts/ifcfg-eth3 (adjust according to your device's true mac address):
DEVICE=eth3
BOOTPROTO=dhcp
HWADDR=00:1E:68:37:FA:95
ONBOOT=no
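After restarting the network, /proc/net/bonding/bond0 should report 802.3ad mode with both slaves enrolled. A minimal sketch of checking that with awk (the here-doc stands in for the real file; on a live box, run the awk directly against /proc/net/bonding/bond0):

```shell
# Check that the bond is in 802.3ad mode and count enrolled slaves.
# The here-doc below is sample output; on a real system use:
#   awk '...' /proc/net/bonding/bond0
awk '/^Bonding Mode/ { mode = $0 }
     /^Slave Interface/ { slaves++ }
     END { print mode; print slaves " slave(s) enrolled" }' <<'EOF'
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Slave Interface: eth0
MII Status: up
Slave Interface: eth1
MII Status: up
EOF
```

If the mode line shows anything other than 802.3ad, recheck BONDING_OPTS and the switchport LACP configuration.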

802.3ad link aggregation

From wikipedia’s article on 802.3ad:
The most common way to balance the traffic is to use L3 hashes. These hashes are calculated when the first connection is established and then kept in the devices' memory for future use. This effectively limits the client bandwidth in an aggregate to its single member's maximum bandwidth per session. This is the main reason why 50/50 load balancing is almost never reached in real-life implementations, more like 70/30. More advanced distribution layer switches can employ an L4 hash, which will bring the balance closer to 50/50.
...
A limitation on link aggregation is that it would like to avoid reordering Ethernet frames. That goal is approximated by sending all frames associated with a particular session across the same link[3]. Depending on the traffic, this may not provide even distribution across the links in the trunk.
So while 802.3ad does give us redundancy, and traffic to and from all hosts over an 802.3ad link can in aggregate reach the combined bandwidth of the bonded interfaces, each individual client-server connection is limited to the bandwidth of a single member link of the bond.
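To make that concrete, here is a toy sketch of how an L3-style hash pins each flow to one member link. This is not the actual kernel hash (the real xmit_hash_policy implementations differ); it just XORs the last octets of the two IPs to show why a given src/dst pair always lands on the same slave:

```shell
# Toy illustration of L3 hashing: every packet of a given src/dst pair
# hashes to the same slave, so one flow never exceeds one link's bandwidth.
slaves=2
for pair in "10.10.10.5 10.10.10.119" "10.10.10.6 10.10.10.119"; do
  set -- $pair
  a=${1##*.}; b=${2##*.}                     # last octet of each address
  echo "$1 -> $2 always uses slave $(( (a ^ b) % slaves ))"
done
```

With many clients the flows spread across both slaves, which is the aggregate-bandwidth claim above; any single flow stays on one link.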

It is arguable, then, that ALB (adaptive load balancing) may be a simpler and equally reliable way to bond multiple interfaces on a server. ALB requires no switch configuration, so you can more readily move cables between switch ports.
The downside to ALB is that the server presents one IP address with two different MAC addresses (one per physical interface). When one interface loses connectivity, communication with any remote host that was using that interface is lost until an ARP exchange leads the remote host to associate the IP address with the MAC of the remaining interface; some TCP sessions may time out and fail entirely.

I am unclear as to whether gratuitous ARP is used to accelerate this failover.

I learned something new today.

2008/10/29

ldapsearch against AD (or other directory)

find all computer objects in a domain and print their cn attributes (host names):

ldapsearch -b "dc=mydomain,dc=com" -D "cn=myadminaccount,ou=Users,dc=mydomain,dc=com" -H ldap://mydomaincontroller/ -W -x "(objectclass=Computer)" cn
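To reduce the LDIF output to bare host names, pipe it through a small awk filter. A sketch against sample output (hostnames made up; in practice, pipe the ldapsearch above into the awk):

```shell
# Extract just the cn attribute values from LDIF-style output.
awk -F': ' '/^cn: / { print $2 }' <<'EOF'
dn: CN=host1,OU=Computers,DC=mydomain,DC=com
cn: host1

dn: CN=host2,OU=Computers,DC=mydomain,DC=com
cn: host2
EOF
```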

2008/08/01

Awk and other text processing tips

awk is great for working with data that is in several columns. 

How to sum the third column?

e.g., calculate total tps across all physical disks from iostat -d output:

Device:            tps    Blk_read/s    Blk_wrtn/s    Blk_read    Blk_wrtn
sda               3.73         30.38         67.80     4864950    10857636
sdb               3.82         30.39         71.74     4866793    11488576
sdc               0.00          0.05          0.00        7208           8


iostat -d |egrep "sd.\ " | awk 'BEGIN {x=0} {x+=$2} END {print x}'

or, less elegantly,

iostat -d| egrep "sd.\ " | awk 'BEGIN {ORS=""}; {print $2"+"}' | ( cat; echo 0)|bc
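A related trick: awk's associative arrays can sum a column grouped by a key, e.g., totalling per-device across repeated samples (numbers here are illustrative):

```shell
# Sum column 2 grouped by column 1.
printf 'sda 3.73\nsdb 3.82\nsda 1.27\n' |
  awk '{ sum[$1] += $2 } END { for (d in sum) printf "%s %.2f\n", d, sum[d] }' |
  sort
```

The `for (d in sum)` loop emits groups in arbitrary order, hence the trailing sort.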

How to grab just certain columns?

How do I use awk to print the first column, then everything from some later column through the end of the line, for example to grab just the fields I want from an apache log file?

awk '{ print $1" " substr($0, index($0,$6)) }' /var/log/httpd/access_log*

gives us something like

10.95.10.20 "POST /license/associateproduct.php HTTP/1.1" 200 8 "-" "Java/1.6.0_17"
10.95.14.248 "POST /license/authorize.php HTTP/1.1" 200 84 "-" "PycURL/7.19.5"
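The trick is that index($0,$6) finds the character offset where the sixth field begins, and substr prints from there to end of line, preserving the original spacing and quoting. A self-contained illustration (log line made up):

```shell
# Print field 1, then everything from field 6 onward, untouched.
echo '10.95.10.20 - - [01/Aug/2008:00:00:00 -0700] "POST /x.php HTTP/1.1" 200 8' |
  awk '{ print $1" " substr($0, index($0,$6)) }'
```

One caveat: if the text of $6 also appears earlier in the line, index() will match the earlier occurrence.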
 

2008/07/16

NDMP Backups with EMC Networker - Quick Start


Table of Contents

Background

Networker Server Setup

Install Base OS and prerequisites

Install and configure Networker App

Configure Licensing for NDMP Backups

Configure backup location, pool, and media (disk based backups are fast for testing)

Client/Management Station Setup

Install Networker Client and Management Center

Configure the Management Console and Networker Server

Configuring for NetApp FAS

Prep the FAS

Prep Networker

Perform the Backup

Configuring for EMC Celerra

Prep Networker

Prep the Celerra

Perform the Backup




Background

Why NetApp and/or EMC have not produced such a short and simple guide (none that I could find), and why a humble sys-admin like myself has had to figure all this out by trial and error, reflects the fact that EMC (especially EMC, but to some extent NetApp also) has become so large as to no longer realize that customers are there. Someone checks "Product Manual" off a release check-list and calls it good, but nobody evaluates whether the manual is any good, preferring instead to up-sell $5000 of videos or on-site consultants. Nobody thinks, "Hey, this customer ordered the NDMP option; we should send them a quick-start guide to help them get started with some of the big partners."


This guide provides a quick start for performing a disk-based backup via NDMP of your NetApp Filer or EMC Celerra NAS with EMC Networker 7.4. It does not cover the ins and outs of each platform, data archival and disaster preparedness best practices, etc. It was written against my environment at a company where I no longer work; I cannot provide additional support.


If you are evaluating EMC Networker, ask your sales rep for evaluation license codes for the NDMP and disk backup features; otherwise, you'll need to buy them.


The license codes are all temporary (30 days).


Networker Server Setup


Install Base OS and prerequisites

Install CentOS 5 i386. Install the X window system (no GNOME or KDE is required), as some X libraries are a prerequisite for the networker client (and perhaps other components).


Install libgcj and java-1.4.2-gcj-compat for the graphical java components.


service iptables stop

chkconfig iptables off

Disable selinux (set to disabled in /etc/sysconfig/selinux).

Install and configure Networker App

Extract the networker server Linux files as follows:

tar xzf nw74_linux_x86_64.tar.gz

Install the networker server linux component rpms as follows:

rpm -ivh lgtoclnt-7.4-1.i686.rpm lgtolicm-7.4-1.i686.rpm lgtoman-7.4-1.i686.rpm lgtoserv-7.4-1.i686.rpm lgtonode-7.4-1.i686.rpm lgtonmc-3.4-1.i686.rpm

If using DHCP, add DHCP_HOSTNAME=<hostname> to /etc/sysconfig/network-scripts/ifcfg-eth0 . This will cause the DHCP client to also register the hostname in DNS.


Add an entry for the eth0 ip address and hostname to /etc/hosts.


Run these commands to configure and start networker services

/etc/init.d/networker stop

/etc/init.d/gst stop

/opt/lgtonmc/bin/nmc_config

[root@networker ~]# /opt/lgtonmc/bin/nmc_config


NOTE

====

Install has detected the configuration file of a previous lgtonmc

package. Install will attempt to read the configuration parameters

in this file and present them as default values where appropriate.

Please modify any value that is incorrect or needs to be changed.



The Command Line Reports feature of NetWorker Management

Console requires a Java Runtime Environment (JRE) be

installed on this machine. The JRE version should be

1.4.2 or higher, up to (but not including) 1.6.


Is there a supported JRE already installed on this machine [y]? y


Please specify the directory where JRE is installed [/usr/lib/jvm/jre]? /etc/alternatives/jre_1.4.2/


What port should the web server use [9000]?


What port should the GST server use [9001]?


What directory should be used for the LGTOnmc database [/opt/lgtonmc/lgto_gstdb]?


/opt/lgtonmc/lgto_gstdb/lgto_gst.db already exists, do you want to retain this database [y]? n


/opt/lgtonmc/lgto_gstdb/lgto_gst.db already exists, is it okay to remove it [n]? y


Where are the NetWorker binaries installed [/usr/sbin]?


Start daemons at end of configuration [n]?


Creating installation log in /opt/lgtonmc/logs/install.log.


Performing initialization. Please wait...


Installation successful.

/etc/init.d/networker start

/etc/init.d/gst start

chkconfig networker on

chkconfig gst on


Run nsradmin. Type "visual" at the "nsradmin>" prompt to enter visual edit mode. Select type "NSR"; you should then see your networker server. Select "Edit", then arrow down to the "administrator" field and add "*@*". WARNING: this is only for testing, and will enable anybody on any machine to affect your backups. DON'T FORGET TO CHANGE IT.


Configure Licensing for NDMP Backups

(See client setup for instructions on accessing the Networker Management Console.)


In the Networker Management Console, click Setup > Licensing > New, and enter an enabler code for "NetWorker NDMP Client Connection for NetWorker" or "NetWorker NDMP Client Connection – Tier 1", plus Networker Server, Storage, DiskBackup, and NDMP for NetApp (use your tier in each case) as needed.



Configure backup location, pool, and media (disk based backups are fast for testing)

(See client setup for instructions on accessing the Networker Management Console.)


In the NMC, double-click the server name of your Networker Server.


Go to Devices > Nodename > Storage Nodes > Nodename; right-click and select "Properties". Change "Node Type" to NDMP.


Add storage location (device) for disk based backups on the Networker Storage node:


Go to Devices > Nodename > Devices, right-click, and select "New". For "Name:", enter the backup path; for "Media Type", enter "file". Click "OK". Then double-click the newly created device to pull up more configuration options.







Create a media pool for the disk-based backup:


Go to Media > <Server Name> > Media Pools > New


Label the media for the device. Go to Devices > <Server Name> > Devices, right-click your disk backup device, and select "Label". Change the pool to "disk backup pool" (or whatever your pool was). Note: if you don't label it to the right pool, you'll get a perpetual notice in /nsr/logs/messages: "NetWorker media: (waiting) Waiting for 1 writable volumes to backup pool 'disk based backup' disk(s) on <servername>".


Ensure that a volume exists for the media pool; you may have to create it manually.
Go to Media > <Server Name> > Volumes


Client/Management Station Setup


Your workstation will be the management station.


Install Networker Client and Management Center


This table describes the browser, OS, and JRE requirements:


Platform                                        Browser
--------                                        -------
AIX                                             Mozilla 1.7
HP/UX                                           Mozilla 1.6
Red Hat Enterprise Linux Server 3, 4 and 5;
SuSE Linux Enterprise Server 8, 9 and 10        Mozilla 1.7
Solaris 8 or 9                                  Netscape 7; Mozilla 1.7
Solaris 10                                      Mozilla 1.7
Windows 2003 Enterprise or Datacenter Edition   Internet Explorer 6.x
Windows XP, 2000, or 2003 32-bit                Internet Explorer 5.5 or 6.x
Windows Vista                                   Internet Explorer 7.0

Environment                   Version
-----------                   -------
Java Runtime Environment      1.4.2 or greater (but not 1.6) for all, except
                              1.5.0_11 or greater (but not 1.6) for Windows Vista



You may need to disable the Java 6 runtime on your workstation from the Java control panel (found in the system control panel). Also within the Java control panel, you may have to set your temporary file size to 300MB.


Map a drive to \\a80\IT-Software. Then browse on that drive to Apps\EMC\Networker\Networker_7.4\networker_7.4-win and run setup.exe. Select "client" and "networker management console".


Configure the Management Console and Networker Server


Start up the Networker console by opening a web browser on the management workstation to http://localhost:9000/ , click “start”, and use the credentials administrator:administrator .


Note: The Java app that opens will not receive an icon in the task bar.


Accept the EULA. Change the password as desired. Configure the NLM, Database Backup Server, and Networker server fields with your networker server's FQDN. I received an error about permissions on the database, but it seems to work okay.


In the Networker Management Console, on the Enterprise tab, your networker server should show up. Click it, then double-click the application "Networker". This is where you'll do the configuration that follows.



Configuring for NetApp FAS


Prep the FAS


First, ensure that manual hosts-file entries exist for the host name and FQDN of the networker server (e.g., networker), the storage node, and the NDMP server (e.g., fas270c), on each of those nodes. (A correct DNS configuration is not necessarily adequate, and the application's reverse lookups may fail.)
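For example (illustrative addresses and a hypothetical domain; substitute your own), each of the hosts would carry /etc/hosts entries like:

```
192.168.123.10   networker networker.mydomain.com
192.168.123.20   fas270c fas270c.mydomain.com
```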


Prep Networker


Configure the client Fas270c:


Go to Configuration > <Server Name> > Clients > New


Perform the Backup


Run the command on the legato server:

(this example will perform a full NDMP dump of a volume, a qtree, or a subvolume or sub-qtree)

[root@vm-legato ~]# nsrndmp_save -T dump -M -c fas270c.<fqdn> -g Default /vol/vol0/auto-D


Configuring for EMC Celerra

Prep Networker

Configure the client for the Celerra Data mover in the Networker UI:



Set the tape buffer size. (You can ignore this step in 5.5.30.) The default is 128KB; networker disk backup requires 32KB; the range is 64-1168.


Prep the Celerra


Log in to the Celerra front-end as nasadmin.

$ server_param server_2 -facility NDMP -modify bufsz -value 32

[nasadmin@ns350cs nasadmin]$ server_param server_2 -facility NDMP -list

server_2 :

param_name               facility   default    current    configured
maxProtocolVersion       NDMP       4          4
convDialect              NDMP       '8859-1'   '8859-1'
scsiReserve              NDMP       1          1
dialect                  NDMP       ''         ''
includeCkptFs            NDMP       1          1
md5                      NDMP       0          0
forceRecursiveForNonDAR  NDMP       0          0
snapTimeout              NDMP       5          5
bufsz                    NDMP       128        128
snapsure                 NDMP       0          0
v4OldTapeCompatible      NDMP       1          1


Create an NDMP user on the data mover:

[root@ns350cs root]# /nas/sbin/server_user server_2 -add -password ndmp

Creating new user ndmp

User ID: 1000

Group ID: 1000

Comment: NDMP user

Home directory: /home/ndmp

Shell:

Changing password for user ndmp

New passwd: ndmppass

Retype new passwd: ndmppass


Perform the Backup

[root@vm-legato ~]# nsrndmp_save -T dump -M -c 192.168.123.156 -g Default /siq_daily.2/foo

2008/05/07

PAM notes

Most of these notes are taken from the Linux-PAM man page, from http://www.kernel.org/pub/linux/libs/pam/Linux-PAM-html/Linux-PAM_SAG.html, and from http://www.hccfl.edu/pollock/AUnix2/PAM-Help.htm .

Linux PAM (Pluggable Authentication Modules) "is a system of libraries that handle the authentication tasks of applications (services) on the system."

Configuration for individual applications/services may reside in /etc/pam.d . If that directory does not exist, then PAM will look for the single config file /etc/pam.conf . The configuration file(s) define the connection between applications (services) and the pluggable authentication modules (PAMs) that perform the actual authentication tasks.

PAM policy/config file syntax
Each line (rule) in a policy file has 4 parts:
  1. context (service type) - what aspect of the user's request for a restricted service does this line affect?
    1. auth -- (authentication) authenticate a user and set up user credentials. Authentication means that the user proves his identity; typically, this involves entering a password, but it may include a hardware based authentication scheme (e.g., smart card). The setting up of user credentials may include setting up group memberships or other privileges.
    2. account -- (authorization?) -- provide account verification types of service: e.g., has the user's password expired? is this user permitted access to the requested service at this time? Are sufficient system resources available? is this account allowed on the console?
    3. password -- update authentication tokens, for example, "please enter a new password". Typically, there is one module for each challenge/response based auth type.
    4. session -- session setup and cleanup; covers things that should be done prior to a service being given, and after it is withdrawn. For example, leaving audit trails, mounting the user's home dir, or unmounting it after logoff.
  2. control - tells PAM how to handle a "fail" result from a module's authentication task. There are two types of syntax for this control field: the simple one has a single simple keyword; the more complicated one involves a square-bracketed selection of value=action pairs.
    1. For the historical/simple control field, valid values are
      1. required -- "if fail, then ultimately fail, but first finish the remaining stacked modules." failure of such a PAM will ultimately lead to the PAM-API returning a failure after the remaining stacked modules (for this service and type) have been invoked.
      2. requisite -- "if fail, then return to the app now with a failure". like required, but in the case where such a module returns a failure, control is directly returned to the application (without attempting the other stacked modules). The returned value is that associated with the first required or requisite module to fail. This flag may protect, for example, against the possibility of a user getting the opportunity to enter credentials over an unsafe medium; but it may also inform an attacker of valid accounts on a system.
      3. sufficient -- "if a prior required module has not failed, then a success here is good enough to return to the app immediately with success". success of such a module is enough to satisfy the authentication requirements of the stack of modules (if a prior required module has failed, then the success of this one is ignored). A failure of this module is not deemed as fatal to satisfying the application that this type has succeeded. If the module succeeds, the PAM framework returns success to the application immediately, without trying other modules
      4. optional -- the success of this module is only important if it is the only module in the stack associated with this service+type.
      5. include -- include all lines of given type from the configuration file specified as an argument to this control. (On recent RH-based systems, individual application files will tend to include system-auth instead of "other".)
      6. substack -- include all lines of given type from the configuration file specified as an argument to this control. This differs from include in that evaluation of the done and actions in a substack does not cause skipping the rest of the complete module stack, but only of the substack. Jumps in a substack also can not make evaluation jump out of it, and the whole substack is counted as one module when the jump is done in a parent stack. The reset action will reset the state of a module stack to the state it was in as of beginning of the substack evaluation.
  3. module path - the PAM module being called
  4. module arguments (optional) -- options passed to the PAM module. This is a space separated list of tokens that can be used to modify the specific behavior of the given PAM. See individual module documentation for details of that module's options. For arguments that include spaces, surround that argument with square brackets.
In the case where the /etc/pam.conf file is used, an additional service field (e.g. login or su, or other (=default)) appears at the beginning of each line, and describes which service the line applies to.

The lines/modules are run in the order in which they occur in the file. They're passed the module options (if any) and user/request info, and they return a pass/fail (or other) result. The modules are run until an overall pass/fail result is reached, and that result is passed back to the service to which the user has requested access.
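As a concrete sketch of such a stack (simplified, not copied from any particular distribution), an /etc/pam.d/ service file might look like:

```
auth      required    pam_env.so
auth      sufficient  pam_unix.so try_first_pass
auth      required    pam_deny.so
account   required    pam_unix.so
password  required    pam_unix.so use_authtok
session   required    pam_unix.so
```

Here a correct password makes pam_unix.so succeed, and "sufficient" returns success to the application immediately; otherwise the stack falls through to pam_deny.so, which guarantees an overall failure.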

Any line in the config file that is not formatted correctly will generally cause the authentication process to fail.

Errors are written to syslog.

Detail on the more complicated control syntax
  1. The more complicated syntax is as follows:
    1. valid control values have the following form: [value1=action1 value2=action2 ...]
    2. ...where valueN corresponds to the return code from the function invoked in that module line, and is one of these:
      1. success, open_err, symbol_err, service_err, system_err, buf_err, perm_denied, auth_err, cred_insufficient, authinfo_unavail, user_unknown, maxtries, new_authtok_reqd, acct_expired, session_err, cred_unavail, cred_expired, cred_err, no_module_data, conv_err, authtok_err, authtok_recover_err, authtok_lock_busy, authtok_disable_aging, try_again, ignore, abort, authtok_expired, module_unknown, bad_item, conv_again, incomplete, and default.
    3. ActionN can be either an unsigned integer, n, signifying an action of "jump over the next n modules in the stack", or it can take one of the following forms:
      1. ignore -- when used with a stack of modules, the module's return status will not contribute to the return code the application obtains.
      2. bad -- this action indicates that the return code should be thought of as indicative of the module failing. If this module is the first in the stack to fail, its status value will be used for that of the whole stack.
      3. die -- equivalent to bad, with the side effect of terminating the module stack and PAM, and immediately returning to the application
      4. ok -- this tells PAM that the administrator thinks this return code should contribute directly to the return code of the full stack of modules. I.e., if the former state of the stack would lead to a return of PAM_SUCCESS, then the module's return code will override that value. However, if the former state of the stack holds some value that is indicative of a module's failure, this 'ok' value will not be used to override that data.
      5. done -- equivalent to ok, with the side effect of terminating the module stack and PAM immediately and returning to the application.
      6. reset -- clear all memory of the state of the module stack and start again with the next stacked module.
    4. Each of the four keywords, required, requisite, sufficient, and optional, have an equivalent expression in terms of the [...] syntax as follows:
      1. required -- [success=ok new_authtok_reqd=ok ignore=ignore default=bad]
      2. requisite -- [success=ok new_authtok_reqd=ok ignore=ignore default=die]
      3. sufficient -- [success=done new_authtok_reqd=done default=ignore]
      4. optional -- [success=ok new_authtok_reqd=ok default=ignore]
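The integer "jump" action shows up often on RH-style systems. A sketch (module options are illustrative; check your distro's system-auth):

```
auth  [success=1 default=ignore]  pam_succeed_if.so uid < 500 quiet
auth  sufficient                  pam_ldap.so use_first_pass
auth  required                    pam_unix.so try_first_pass
```

If the uid test succeeds (a system account), success=1 jumps over the next one module, skipping pam_ldap.so; LDAP is only consulted for regular users, and everyone falls through to pam_unix.so.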

2008/03/24

Fix Absolute Symbolic Links from tar or rsync SNAFU

Tar will strip leading slashes from symlink targets (if you don't use -P), as will rsync in some situations (for example, with chroot=true). This can put you in a world of hurt. Here are some tips for how to fix it.
  • This one won't work, for a couple of reasons: first, it can't distinguish between intended relative paths and accidental relative paths; also, cut won't work if any fields of the ls output differ in character length, such as ownership or date (squeezing runs of whitespace, e.g. with tr -s ' ', would address that second problem):
    • find . -type l -print0 | xargs -0 ls -laQ | grep ' -> \"relative/path/stuff'| cut --complement -d " " -f 1-8 | sed -e 's/ -> \"/\;\"\//g' | awk -F\; '{ print "ln -sf " $2 " " $1 }' > fix_symlinks
  • How about this:
    • find . -type l -printf 'ln -sf "/%l" "%p"\n' | grep "\"/accidental/relative/path" > fix_symlinks
    • check the fix_symlinks for things that you actually don't want to change.
    • . ./fix_symlinks
There has got to be a more elegant way to do this. *sigh* If my perl skills were half-decent, it'd probably be a breeze.
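Here is the second approach end-to-end in a scratch directory (a self-contained sketch; /etc/passwd is just a stand-in target, and the grep pattern is the "accidental" prefix you'd adjust for your own data):

```shell
# Reproduce an accidentally-relative symlink, then repair it.
tmp=$(mktemp -d)
mkdir -p "$tmp/etc"
: > "$tmp/etc/passwd"                  # stand-in for the real target
cd "$tmp"
ln -s etc/passwd broken                # leading slash was stripped
find . -type l -printf 'ln -sf "/%l" "%p"\n' | grep '"/etc/' > fix_symlinks
sh fix_symlinks                        # review fix_symlinks first on real data!
readlink broken                        # -> /etc/passwd
```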

Also, note the "symlinks" command in the "symlinks" package.