Friday, February 29, 2008

Step 32 & 33 Check for cluster stabilization & VGs varied on


Wait for the cluster to stabilize. You can check whether the cluster is up with the following
commands:
a. netstat -i
b. ifconfig -a : look for the service IP. It will show up on each node once the cluster is up.
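As a quick, hedged sketch of that check (the service IP address below is a placeholder; substitute your own service IP label from /etc/hosts):

lssrc -ls clstrmgrES               # cluster manager state on this node
netstat -i                         # interface summary
ifconfig -a | grep 10.10.10.10     # replace 10.10.10.10 with your service IP; it appears once the cluster is up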


Check whether the VGs under the cluster's RGs are varied on and the filesystems in those
VGs are mounted after the cluster start.


Here test1vg and test2vg are the VGs that are varied on when the cluster is started, and the
filesystems /test2 and /test3 are mounted when the cluster starts.
/test2 and /test3 are in test2vg, which is part of the RG owned by this node.
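A minimal verification sketch, using the example VG and filesystem names above (adjust them to your own configuration):

lsvg -o                  # VGs currently varied on; expect test1vg and test2vg here
lsvg -l test2vg          # LVs and filesystems defined in test2vg
df -k /test2 /test3      # confirm the filesystems are mounted
clRGinfo                 # resource group state per node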
Perform all the tests, such as resource takeover, node failure and network failure, and verify
the cluster before releasing the system to the customer.

step 30 & 31 Synchronize & start Cluster



Synchronize the cluster:
This will sync the configuration from one node to the second node.
smitty cl_sync


That’s it. Now you are ready to start the cluster.
smitty clstart

You can start the cluster together on both nodes or start individually on each node.
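To verify the start on each node, a small sketch using the standard HACMP/RSCT commands referenced elsewhere on this blog:

lssrc -g cluster         # clstrmgrES should show as active
lssrc -ls topsvcs        # heartbeat rings, including diskhb, should be listed
clRGinfo                 # resource groups should come online on their owner nodes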

step 29 Adding IP label & RG owned by Node


Add the service IP label for the owner node and also the VGs owned by the owner node
of this resource group.




Continue similarly for all the resource groups.

step 28 Setting attributes of Resource group


Set attributes of the resource groups already defined:
Here you have to actually assign the resources to the resource groups.
smitty hacmp -> Extended Configuration -> Extended Resource Configuration ->
HACMP extended resource group configuration

step 27 Adding Resource Group


Add Resource Groups:
smitty hacmp -> Extended Configuration -> Extended Resource Configuration ->
HACMP extended resource group configuration


Continue similarly for all the resource groups.
The node selected first while defining the resource group will be the primary owner of
that resource group. The node after that is the secondary node.
Make sure you set the primary node correctly for each resource group. Also set the failover/fallback policies as per the requirements of the setup.
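To review what has been defined so far, a quick sketch using the HACMP utilities listed later in this blog (clRGinfo only reports state once cluster services are running):

cllsgrp                  # list the resource groups
clshowres                # show the resource group configuration, including policies
clRGinfo -v              # owner node and current state of each resource group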

step 26 Defining IP labels


Define the service IP labels for both nodes.
smitty hacmp -> Extended Configuration -> Extended Resource Configuration ->
HACMP extended resource configuration -> Configure HACMP service IP label

step 25 Adding Persistent IP labels



Add a persistent IP label for both nodes.


step 24 Adding persistent IP


Add the persistent IPs:


smitty hacmp -> Extended Configuration -> Extended Topology Configuration ->
Configure HACMP persistent nodes IP label/Addresses



step 23 Adding boot IP & Disk heart beat information




Include all four boot IPs (two for each node) in the ether network already defined. Then include the disk for heartbeat on both nodes in the diskhb network already defined.
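To double-check what ended up in the topology, a hedged sketch using utilities from the command list later in this blog:

cltopinfo                # local view of the cluster topology: networks, interfaces, devices
cllsif                   # per-node list of boot IPs, service IPs and the diskhb device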



step 22 Adding device for Disk Heart Beat


Include the interfaces/devices in the ether n/w and diskhb already defined.
smitty hacmp -> Extended Configuration -> Extended Topology Configuration ->
Configure HACMP communication interfaces/devices -> Add communication
Interfaces/devices.


Step 21 Adding Communication interface


Add the HACMP networks (the ether network and the diskhb network).
smitty hacmp -> Extended Configuration -> Extended Topology Configuration ->
Configure HACMP networks -> Add a network to the HACMP cluster.
Select ether and press Enter.
Then select diskhb and press Enter. diskhb is your non-TCP/IP heartbeat network.

Step 20 Discover HACMP config for Network settings


Discover the HACMP config: this will import, for both nodes, all the node info, boot IPs and
service IPs from /etc/hosts.
smitty hacmp -> Extended Configuration -> Discover HACMP Related Information

Step 19 Define Cluster Nodes


Define the cluster nodes.
smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure an HACMP node -> Add a node to an HACMP cluster
Define both the nodes, one after the other.

Thursday, February 28, 2008

Step 18 to configure HACMP





Define the cluster name.

Steps 1 to 17 to configure HACMP

Steps to configure HACMP:

1. Install the nodes; make sure redundancy is maintained for the power supplies, networks and
fibre networks. Then install AIX on the nodes.
2. Install all the HACMP filesets except HAView and HATivoli.
Install all the RSCT filesets from the AIX base CD.
3. Make sure that the AIX and HACMP patches and the server code are at the latest (ideally
recommended) level.
4. Check that the fileset bos.clvm is present on both nodes. This is required to make the
VGs enhanced concurrent capable.
5. V.IMP: Reboot both the nodes after installing the HACMP filesets.
6. Configure shared storage on both nodes. In the case of a disk heartbeat, also assign a
1 GB shared storage LUN to both nodes.
7. Create the required VGs on the first node only. The VGs can be either normal VGs or
enhanced concurrent VGs. Assign a particular major number to each VG while creating it and
record the major number information (a command sketch covering steps 7-16 follows this list).
To check the major number of a VG, use:
ls -l /dev | grep <vgname>
Mount automatically at system restart should be set to NO.
8. Vary on the VGs that were just created.
9. V.IMP: Create a log LV on each VG first, before creating any other LV. Give the log LV a
unique name.
Initialize (format) the log LV with: logform /dev/loglvname
Repeat this step for all the VGs that were created.
10. Create all the necessary LVs on each VG.
11. Create all the necessary file systems on each LV created. You can create the mount points
as per the requirements of the customer.
Mount automatically at system restart should be set to NO.
12. Unmount all the filesystems and vary off all the VGs.

13. chvg -an <vgname> --- sets each VG to not vary on automatically at
system restart. Run this for every VG.
14. Go to node 2 and run cfgmgr -v to detect the shared disks.
15. Import all the VGs on node 2:
use smitty importvg ----- import with the same major number as assigned on node 1.
16. Run chvg -an for all VGs on node 2 as well.
17. V.IMP: Identify the boot1, boot2, service and persistent IPs for both nodes
and make the entries in /etc/hosts.
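As referenced in step 7, here is a minimal command sketch of steps 7-16 for one example VG. The VG name, LV name, PP size, disk names and the major number 55 are all placeholders (and jfs2 is just one filesystem choice), not values from this setup:

# --- on node 1 ---
mkvg -y test2vg -s 64 -V 55 hdisk2            # create the VG with major number 55
varyonvg test2vg
mklv -y test2loglv -t jfs2log test2vg 1       # dedicated log LV with a unique name
logform /dev/test2loglv                       # initialize the log LV (answer y)
crfs -v jfs2 -g test2vg -m /test2 -A no -a size=2097152 -a logname=/dev/test2loglv
mount /test2                                  # quick test, then clean up
umount /test2
chvg -an test2vg                              # no automatic vary-on at system restart
varyoffvg test2vg

# --- on node 2 ---
cfgmgr -v                                     # detect the shared disks
importvg -V 55 -y test2vg hdisk2              # same major number; the hdisk name may differ here
chvg -an test2vg
varyoffvg test2vg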

Wednesday, February 20, 2008

HACMP v5.x Disk Heartbeat device configuration

Creating a Disk Heartbeat device in HACMP v5.x

Introduction
This document is intended to supplement existing documentation on how to configure, test, and monitor a disk heartbeat device and network in HACMP/ES V 5.x. This feature is new in V5.1, and it provides another alternative for non-ip based heartbeats. The intent of this document is to provide step-by-step directions as they are currently sketchy in the HACMP v5.1 pubs. This will hopefully clarify several misconceptions that have been brought to my attention.
This example consists of a two-node cluster (nodes GT40 & SL55) with shared ESS vpath devices. If more than two nodes exist in your cluster, you will need N non-IP heartbeat networks, where N is the number of nodes in the cluster (i.e. a three-node cluster requires 3 non-IP heartbeat networks). This creates a heartbeat ring.
It’s worth noting that one should not confuse concurrent volume groups with concurrent resource groups, and that there is a difference between concurrent volume groups and enhanced concurrent volume groups. A concurrent resource group is one which may be active on more than one node at a time. A concurrent volume group also shares the characteristic that it may be active on more than one node at a time. This is also true for an enhanced concurrent VG; however, in a non-concurrent resource group, the enhanced concurrent VG, while it may be active and not have a SCSI reserve residing on the disk, normally has its data accessed by only one system at a time.

Pre-Reqs

In this document, it is assumed that the shared storage devices are already made available and configured to AIX, and that the proper levels of RSCT and HACMP are already installed. Since we are utilizing enhanced-concurrent volume groups, it is also necessary to make sure that bos.clvm.enh is installed. This is not normally installed as part of an HACMP installation via the installp command.
Disk Heartbeat Details
This provides the ability to use existing shared disks, regardless of disk type, to provide a serial network like heartbeat path. A benefit of this is that one need not dedicate the integrated serial ports for HACMP heartbeats (if supported on the subject systems) or purchase an 8-port asynchronous adapter.
This feature utilizes a special area on the disk previously reserved for “Concurrent Capable” volume groups (traditionally only for SSA disks). Since AIX 5.2 dropped support for SSA concurrent volume groups, this area is now available for use. This also means that the disk chosen for serial heartbeat can be part of a data volume group. (Note Performance Concerns below)

The disk heart beating code went into the 2.2.1.30 version of RSCT. Some recommended APARs bring that to 2.2.1.31. If you've got that level installed, and HACMP 5.1, you can use disk heart beating. The relevant file to look for is /usr/sbin/rsct/bin/hats_diskhb_nim. Though it is supported mainly through RSCT, we recommend AIX 5.2 when utilizing disk heartbeat.

To use disk heartbeats, no node can issue a SCSI reserve for the disk. This is because both nodes using it for heart beating must be able to read and write to that disk. It is sufficient that the disk be in an enhanced concurrent volume group to meet this requirement. (It should also be possible to use a disk that is in no volume group for disk heart beating. RSCT certainly won't care; but HACMP SMIT panels may not be particularly helpful in setting this up.)

Now, in HACMP 5.1 with AIX 5.1, enhanced concurrent mode volume groups can be used only in concurrent (or "online on all available nodes") resource groups. This means that disk heart beating is useful only to people running concurrent configurations, or who can allocate such a volume group/disk (which is certainly possible, though perhaps an expensive approach). In other words, at HACMP 5.1 and AIX 5.1, typical HACMP clusters (with a server and idle standby) will require an additional concurrent resource group with a disk in an enhanced concurrent VG dedicated for heartbeat use. At AIX 5.2, disk heartbeats can exist on an enhanced concurrent VG that resides in a non-concurrent resource group. At AIX 5.2, one may also use the fast disk takeover feature in non-concurrent resource groups with enhanced concurrent volume groups. With HACMP 5.1 and AIX 5.2, enhanced concurrent mode volume groups can be used in serial access configurations for fast disk takeover, along with disk heart beating. (AIX 5.2 requires RSCT 2.3.1.0 or later) That is, the facility becomes usable to the average customer, without commitment of additional resource, since disk heart beating can occur on a volume group used for ordinary filesystem and logical volume activity.

Performance Concerns with Disk Heart Beating

Most modern disks take somewhere around 15 milliseconds to service an IO request, which means that they can't do much more than 60 seeks per second. The sectors used for disk heart beating are part of the VGDA, which is at the outer edge of the disk, and may not be near the application data. This means that every time a disk heart beat is done, a seek will have to be done. Disk heart beating will typically (with the default parameters) require four (4) seeks per second. That is each of two nodes will write to the disk and read from the disk once/second, for a total of 4 IOPS. So, if possible, a disk should be selected as a heart beat path that does not normally do more than about 50 seeks per second. The filemon tool can be used to monitor the seek activity on a disk.

In cases where a disk must be used for heart beating that already has a high seek rate, it may be necessary to change the heart beat timing parameters to prevent long write delays from being seen as a failure.

The above cautions as stated apply to JBOD configurations, and should be modified based on the technology of the disk subsystem:
• If the disk used for heart beating is in a controller that provides large amounts of cache - such as the ESS - the number of seeks per second can be much larger
• If the disk used for heart beating is part of a RAID set without a caching front end controller, the disk may be able to support fewer seeks, due to the extra activity required by RAID operations
Pros & Cons of using Disk Heart Beating
Pros:
1. No additional hardware needed.
2. Easier to span greater distances.
3. No loss in usable storage space and can use existing data volume groups.
4. Uses enhanced concurrent vgs which also allows for fast-disk takeover.

Cons:
1. Must be aware of the devices diskhb uses and administer devices properly*
2. Lose the forced down option of stopping cluster services because of enhanced concurrent vg usage.

*I have had a customer delete all their disk definitions and run cfgmgr again to clean up number holes in their device definition list. When they did, obviously, the device names did not come back in the same order as they were before. So the diskhb device assigned to HACMP was no longer valid, as a different device was configured using the old device name and it was not part of an enhanced concurrent VG. Hence diskhb no longer worked, and since the customer did not monitor their cluster either, they were unaware that the diskhb no longer worked.


Configuring Disk Heartbeat

As mentioned previously, disk heartbeat utilizes enhanced-concurrent volume groups. If starting with a new configuration of disks, you will want to create enhanced-concurrent volume groups, either manually, or by utilizing C-SPOC. My example shows using C-SPOC which is the best practice to use here.

If you plan to use an existing volume group for disk heartbeats that is not enhanced concurrent, then you will have to convert it using the chvg command. We recommend that the VG be active on only one node and that the application not be running when making this change. Run chvg -C vgname to change the VG to enhanced concurrent mode. Vary it off, then run importvg -L vgname hdiskX on the other node to make it aware that the VG is now enhanced concurrent capable. If using this method, you can skip to the “Creating Disk Heartbeat Devices and Network” section of this document.
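A minimal sketch of that conversion; the VG name datavg and disk name hdisk4 are placeholders:

# on the node where datavg is currently active (application stopped)
chvg -C datavg              # make the VG enhanced concurrent capable
varyoffvg datavg

# on the other node
importvg -L datavg hdisk4   # re-learn the VG definition, including the new concurrent setting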

Disk and VG Preparation

To be able to use C-SPOC successfully, it is required that some basic IP-based topology already exists, and that the storage devices have their PVIDs in both systems’ ODMs. This can be verified by running lspv on each system. If a PVID does not exist on each system, it is necessary to run chdev -l <device> -a pv=yes on each system. This will allow C-SPOC to match up the device(s) as known shared storage devices.
In this example, vpath0 on GT40 is the same virtual disk as vpath3 on SL55.
Use C-SPOC to create an Enhanced Concurrent volume group. In the following example, since vpath devices are being used, the following smit screen paths were used.
smitty cl_adminGo to HACMP Concurrent Logical Volume Management Concurrent Volume Groups Create a Concurrent Volume Group with Data Path Devices and press Enter

Choose the appropriate nodes, and then choose the appropriate shared storage devices based on the PVIDs (vpath0 and vpath3 in this example). Choose a name for the VG (enhconcvg in this example) and the desired PP size, make sure that Enhanced Concurrent Mode is set to true and press Enter. This will create the shared enhanced-concurrent VG needed for our disk heartbeat.
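C-SPOC, as shown above, is the preferred route. If you ever need to do it manually instead, a hedged sketch of the equivalent on AIX 5.2 might look like this (the PP size and major number 60 are placeholders):

# on GT40
mkvg -y enhconcvg -s 32 -C -V 60 vpath0   # -C creates an enhanced concurrent capable VG
varyoffvg enhconcvg

# on SL55
importvg -V 60 -y enhconcvg vpath3
chvg -an enhconcvg
varyoffvg enhconcvg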


It’s a good idea to verify via lspv once this has completed, to make sure the device and VG show up appropriately on both nodes, as follows:

GT40#/ lspv
vpath0 000a7f5af78e0cf4 enhconcvg

SL55#/lspv
vpath3 000a7f5af78e0cf4 enhconcvg

Creating Disk Heartbeat Devices and Network

There are two different ways to do this. Since we have already created the enhanced concurrent vg, we can use the discovery method (1) and let HA find it for us. Or we can do this manually via the Pre-defined devices method (2). Following is an example of each.

1) Creating via Discover Method: (See Note)
Enter smitty hacmp -> Extended Configuration -> Discover HACMP-related Information from Configured Nodes and press Enter

This will run automatically and create a clip_config file that contains the information it has discovered. Once completed, go back to the Extended Configuration menu and choose:

Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Add Communication Interfaces/Devices -> Add Discovered Communication Interface and Devices -> Communication Devices -> choose the appropriate devices (ex. vpath0 and vpath3)

Select Point-to-Point Pair of Discovered Communication Devices to Add
Move cursor to desired item and press F7. Use arrow keys to scroll.
ONE OR MORE items can be selected.
Press Enter AFTER making all selections.
# Node Device Device Path Pvid
> nodeGT40 vpath0 /dev/vpath0 000a7f5af78
> nodeSL55 vpath3 /dev/vpath3 000a7f5af78

Note: Base HA 5.1 appears to have a problem when using this Discovered Devices method. If you get this error: "ERROR: Invalid node name 000a7f5af78e0cf4",
then you will need APAR IY51594. Otherwise you will have to create via the Pre-Defined Devices method. Once corrected, this section will be completed.
2) Creating via Pre-Defined Devices Method

When using this method, it is necessary to create a diskhb network first, then assign the disk-node pair devices to the network. Create the diskhb network as follows:


smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Add a Network to the HACMP cluster -> choose diskhb -> enter the desired network name (ex. disknet1) and press Enter


smitty hacmp -> Extended Configuration -> Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Add Communication Interfaces/Devices -> Add Pre-Defined Communication Interfaces and Devices ->
Communication Devices -> choose your diskhb Network Name

Add a Communication Device

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[Entry Fields]
* Device Name [GT40_hboverdisk]
* Network Type diskhb
* Network Name disknet1
* Device Path [/dev/vpath0]
* Node Name [GT40]


For Device Name, choose a unique name. It will show up in your topology under this name, much like serial heartbeat devices and ttys have in the past.

For the Device Path, you want to put in /dev/<device name> (for example, /dev/vpath0). Then choose the corresponding node for this device and device name (ex. GT40). Then press Enter.

You will repeat this process for the other node (ex. SL55) and the other device (vpath3). This will complete both devices for the diskhb network.

Testing Disk Heartbeat Connectivity

Once the device and network definitions have been created, it is a good idea to test it and make sure communications is working properly. If the volume group is varied on in normal mode on one of the nodes, the test will probably not work.

/usr/sbin/rsct/bin/dhb_read is used to test the validity of a diskhb connection. The usage of dhb_read is as follows:

dhb_read -p devicename //dump diskhb sector contents
dhb_read -p devicename -r //receive data over diskhb network
dhb_read -p devicename -t //transmit data over diskhb network


To test that disknet1, in the example configuration, can communicate from nodeB (ex. SL55) to nodeA (ex. GT40), you would run the following commands:

On nodeA, enter:

dhb_read -p rvpath0 -r

On nodeB, enter:

dhb_read -p rvpath3 -t


Note that the device name is the raw device, as designated by the “r” preceding the device name.


If the link from nodeB to nodeA is operational, both nodes will display:

Link operating normally.

You can run this again and swap which node transmits and which one receives. To make the network active, it is necessary to sync up the cluster. Since the volume group has not been added to the resource group, we will sync up once instead of twice.
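For the reverse direction mentioned above, the same test with the flags swapped (same example devices):

On nodeA, enter: dhb_read -p rvpath0 -t
On nodeB, enter: dhb_read -p rvpath3 -r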


Add Shared Disk as a Shared Resource

In most cases you would have your diskhb device on a shared data vg. It is necessary to add that vg into your resource group and synchronize the cluster.

smitty hacmp -> Extended Configuration -> Extended Resource Configuration -> Extended Resource Group Configuration -> Change/Show Resources and Attributes for a Resource Group and press Enter.

Choose the appropriate resource group, enter the new vg (enhconcvg) into the volume group list and press Enter.

Return to the top of the Extended Configuration menu and synchronize the cluster.


Monitor Disk Heartbeat

Once the cluster is up and running, you can monitor the activity of the disk (actually all) heartbeats via lssrc -ls topsvcs. An example of the output follows:

Subsystem Group PID Status
topsvcs topsvcs 32108 active

Network Name Indx Defd Mbrs St Adapter ID Group ID
disknet1 [ 3] 2 2 S 255.255.10.0 255.255.10.1
disknet1 [ 3] rvpath3 0x86cd1b02 0x86cd1b4f
HB Interval = 2 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent : 229 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 217 ICMP 0 Dropped: 0
NIM's PID: 28724

Be aware that there is a grace period for heartbeats to start processing. This is normally around 60 seconds. So if you run this command quickly after starting the cluster, you may not see anything at all until heartbeat processing is started after the grace period time has elapsed.

Saturday, February 16, 2008

HACMP failover scenario

HA failover scenarios

1. Graceful
For a graceful failover, run “smitty clstop” and select the graceful option. This will not change anything except stopping the cluster on that node.
Note: after stopping the cluster, check the status using lssrc -g cluster. Sometimes the clstrmgrES daemon takes a long time to stop; DO NOT KILL THIS DAEMON. It will stop automatically after a while.
You can do this on both of the nodes.

2. Takeover
For takeover, run “smitty clstop” with the takeover option. This will stop the cluster on that node and the standby node will take over the package.
You can do this on both of the nodes.

3. Soft Package Failover
Run smitty cm_hacmp_resource_group_and_application_management_menu >>> Move a Resource Group to Another Node >>> select the package name and node name >>> Enter.
This will move the package from that node to the node you selected in the menu above. This method gave a lot of trouble in HA 4.5, whereas it runs well on HA 5.2 unless there are application startup issues.
You can do this on both of the nodes.

4. Failover Network Adapter(s):
For this type of testing, run “ifconfig enX down”; the package IP will then fail over to the primary adapter. You should not see any outage at all.

We can manually bring it back to the original adapter (ifconfig enX up), but it is better to reboot the server to bring the package back to the original node.

5. Hardware Failure (crash):
This is a standard type of testing; run the command “reboot -q” and the node will go down without stopping any apps and come back up immediately. The package will fail over to the standby node within about 2 minutes of OS downtime (even though the HA failover itself is fast, some apps will take a long time to start).

Friday, February 15, 2008

Specifying the default gateway on a specific interface in HACMP

Specifying the default gateway on a specific interface

When you're using HACMP, you usually have multiple network adapters installed and thus multiple network interfaces to manage. If AIX configured the default gateway on the wrong interface (like on your management interface instead of the boot interface), you might want to change this, so network traffic isn't sent over the management interface. Here's how you can do this:

First, stop HACMP or do a take-over of the resource groups to another node; this will avoid any problems with applications when you start fiddling with the network configuration.

Then open up a virtual terminal window to the host on your HMC. Otherwise you would lose the connection as soon as you drop the current default gateway.

Now you need to determine where your current default gateway is configured. You can do this by typing: lsattr -El inet0 and netstat -nr. The lsattr command will show you the current default gateway route and the netstat command will show you the interface it is configured on. You can also check the ODM: odmget -q"attribute=route" CuAt.

Now, delete the default gateway like this:
lsattr -El inet0 | awk '$2 ~ /hopcount/ { print $2 }' | read GW
chdev -l inet0 -a delroute=${GW}

If you would now use the route command to specify the default gateway on a specific interface, like this:
route add 0 [ip address of default gateway: xxx.xxx.xxx.254] -if enX
you would have a working entry for the default gateway. But... the route command does not change anything in the ODM. As soon as your system reboots, the default gateway is gone again. Not a good idea.

A better solution is to use the chdev command:
chdev -l inet0 -a addroute=net,-hopcount,0,,0,[ip address of default gateway]
This will set the default gateway to the first interface available.

To specify the interface use:
chdev -l inet0 -a addroute=net,-hopcount,0,if,enX,,0,[ip address of default gateway]
Substitute the correct interface for enX in the command above.

If you previously used the route add command, and after that you use chdev to enter the default gateway, then this will fail. You have to delete it first by using route delete 0, and then give the chdev command.

Afterwards, check with lsattr -El inet0 and odmget -q"attribute=route" CuAt whether the new default gateway is properly configured. And of course, try to ping the IP address of the default gateway and some outside address. Now reboot your system and check that the default gateway remains configured on the correct interface. And start up HACMP again!

HACMP topology & usefull commands

Hacmp can be configured in 3 ways.

1. Rotating
2. Cascading
3. Mutual Failover

The cascading and rotating resource groups are the “classic”, pre-HA 5.1 types. The new “custom” type of resource group has been introduced in HA 5.1 onwards.


Cascading resource group:
Upon node failure, a cascading resource group falls over to the available node with the next priority in the node priority list.
Upon node reintegration into the cluster, a cascading resource group falls back to its home node by default.

Cascading without fallback
With this option, whenever the primary node fails, the package fails over to the next available node in the list, and when the primary node comes back online the package does not fall back automatically. We need to move the package back to its home node at a convenient time.

Rotating resource group:
This is very similar to cascading without fallback: whenever the package fails over to a standby node it will never fall back to the primary node automatically; we need to move it back manually at our convenience.

Mutual takeover:
With the mutual takeover option, both nodes are active (active-active mode). Whenever a failover happens, the package on the failed node moves to the other active node and runs alongside the package already there. Once the failed node comes back online we can move the package back to that node manually.

Useful HACMP commands

clstat - show cluster state and substate; needs clinfo.
cldump - SNMP-based tool to show cluster state
cldisp - similar to cldump, perl script to show cluster state.
cltopinfo - list the local view of the cluster topology.
clshowsrv -a - list the local view of the cluster subsystems.
clfindres (-s) - locate the resource groups and display status.
clRGinfo -v - locate the resource groups and display status.
clcycle - rotate some of the log files.
cl_ping - a cluster ping program with more arguments.
clrsh - cluster rsh program that take cluster node names as argument.
clgetactivenodes - which nodes are active?
get_local_nodename - what is the name of the local node?
clconfig - check the HACMP ODM.
clRGmove - online/offline or move resource groups.
cldare - sync/fix the cluster.
cllsgrp - list the resource groups.
clsnapshotinfo - create a large snapshot of the hacmp configuration.
cllscf - list the network configuration of an hacmp cluster.
clshowres - show the resource group configuration.
cllsif - show network interface information.
cllsres - show short resource group information.
lssrc -ls clstrmgrES - list the cluster manager state.
lssrc -ls topsvcs - show heartbeat information.
cllsnode - list a node centric overview of the hacmp configuration.

Sunday, February 10, 2008

HACMP Basics

HACMP Basics

History
IBM's HACMP has existed for almost 15 years. It's not originally an IBM product; they bought it from CLAM, which was later renamed to Availant and is now called LakeViewTech. Until August 2006, all development of HACMP was done by CLAM. Nowadays IBM does its own development of HACMP in Austin, Poughkeepsie and Bangalore.

IBM's high availability solution for AIX, High Availability Cluster Multi Processing (HACMP), consists of two components:

High Availability: the process of ensuring an application is available for use through the use of duplicated and/or shared resources (eliminating Single Points Of Failure – SPOFs)

Cluster Multi-Processing: multiple applications running on the same nodes with shared or concurrent access to the data.

A high availability solution based on HACMP provides automated failure detection, diagnosis, application recovery and node reintegration. With an appropriate application, HACMP can also provide concurrent access to the data for parallel processing applications, thus offering excellent horizontal scalability.

What needs to be protected? Ultimately, the goal of any IT solution in a critical environment is to provide continuous service and data protection.

High availability is just one building block in achieving the continuous operation goal. It is based on the availability of the hardware, the software (the OS and its components), the application and the network components.

The main objective of HACMP is to eliminate Single Points of Failure (SPOFs).

“…A fundamental design goal of (successful) cluster design is the elimination of single points of failure (SPOFs)…”


Eliminating Single Points of Failure (SPOFs)

Cluster component      Eliminated as a single point of failure by
Node                   Using multiple nodes
Power source           Using multiple circuits or uninterruptible power supplies
Network/adapter        Using redundant network adapters
Network                Using multiple networks to connect nodes
TCP/IP subsystem       Using non-IP networks to connect adjoining nodes & clients
Disk adapter           Using redundant disk adapters or multiple adapters
Disk                   Using multiple disks with mirroring or RAID
Application            Add a node for takeover; configure an application monitor
Administrator          Add a backup administrator or a very detailed operations guide
Site                   Add an additional site


Cluster Components

Here are the recommended practices for important cluster components.


Nodes

HACMP supports clusters of up to 32 nodes, with any combination of active and standby nodes. While it
is possible to have all nodes in the cluster running applications (a configuration referred to as "mutual
takeover"), the most reliable and available clusters have at least one standby node - one node that is normally
not running any applications, but is available to take them over in the event of a failure on an active
node.

Additionally, it is important to pay attention to environmental considerations. Nodes should not have a
common power supply - which may happen if they are placed in a single rack. Similarly, building a cluster
of nodes that are actually logical partitions (LPARs) with a single footprint is useful as a test cluster, but
should not be considered for availability of production applications.
Nodes should be chosen that have sufficient I/O slots to install redundant network and disk adapters.
That is, twice as many slots as would be required for single node operation. This naturally suggests that
processors with small numbers of slots should be avoided. Use of nodes without redundant adapters
should not be considered best practice. Blades are an outstanding example of this. And, just as every cluster
resource should have a backup, the root volume group in each node should be mirrored, or be on a RAID device.
Nodes should also be chosen so that when the production applications are run at peak load, there are still
sufficient CPU cycles and I/O bandwidth to allow HACMP to operate. The production application
should be carefully benchmarked (preferable) or modeled (if benchmarking is not feasible) and nodes chosen
so that they will not exceed 85% busy, even under the heaviest expected load.
Note that the takeover node should be sized to accommodate all possible workloads: if there is a single
standby backing up multiple primaries, it must be capable of servicing multiple workloads. On hardware
that supports dynamic LPAR operations, HACMP can be configured to allocate processors and memory to
a takeover node before applications are started. However, these resources must actually be available, or
acquirable through Capacity Upgrade on Demand. The worst case situation – e.g., all the applications on
a single node – must be understood and planned for.

Networks

HACMP is a network centric application. HACMP networks not only provide client access to the applications
but are used to detect and diagnose node, network and adapter failures. To do this, HACMP uses
RSCT which sends heartbeats (UDP packets) over ALL defined networks. By gathering heartbeat information
on multiple nodes, HACMP can determine what type of failure has occurred and initiate the appropriate
recovery action. Being able to distinguish between certain failures, for example the failure of a network
and the failure of a node, requires a second network! Although this additional network can be “IP
based” it is possible that the entire IP subsystem could fail within a given node. Therefore, in addition
there should be at least one, ideally two, non-IP networks. Failure to implement a non-IP network can potentially
lead to a Partitioned cluster, sometimes referred to as 'Split Brain' Syndrome. This situation can
occur if the IP network(s) between nodes becomes severed or in some cases congested. Since each node is
in fact, still very alive, HACMP would conclude the other nodes are down and initiate a takeover. After
takeover has occurred the application(s) potentially could be running simultaneously on both nodes. If the
shared disks are also online to both nodes, then the result could lead to data divergence (massive data corruption).
This is a situation which must be avoided at all costs.

The most convenient way of configuring non-IP networks is to use Disk Heartbeating as it removes the
problems of distance with rs232 serial networks. Disk heartbeat networks only require a small disk or
LUN. Be careful not to put application data on these disks. Although, it is possible to do so, you don't want
any conflict with the disk heartbeat mechanism!

Adapters

As stated above, each network defined to HACMP should have at least two adapters per node. While it is
possible to build a cluster with fewer, the reaction to adapter failures is more severe: the resource group
must be moved to another node. AIX provides support for Etherchannel, a facility that can be used to aggregate
adapters (increase bandwidth) and provide network resilience. Etherchannel is particularly useful for
fast responses to adapter / switch failures. This must be set up with some care in an HACMP cluster.
When done properly, this provides the highest level of availability against adapter failure. Refer to the IBM
techdocs website: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101785 for further
details.
Many System p TM servers contain built-in Ethernet adapters. If the nodes are physically close together, it
is possible to use the built-in Ethernet adapters on two nodes and a "cross-over" Ethernet cable (sometimes
referred to as a "data transfer" cable) to build an inexpensive Ethernet network between two nodes for
heart beating. Note that this is not a substitute for a non-IP network.
Some adapters provide multiple ports. One port on such an adapter should not be used to back up another
port on that adapter, since the adapter card itself is a common point of failure. The same thing is true
of the built-in Ethernet adapters in most System p servers and currently available blades: the ports have a
common adapter. When the built-in Ethernet adapter can be used, best practice is to provide an additional
adapter in the node, with the two backing up each other.
Be aware of network detection settings for the cluster and consider tuning these values. In HACMP terms,
these are referred to as NIM values. There are four settings per network type which can be used : slow,
normal, fast and custom. With the default setting of normal for a standard Ethernet network, the network
failure detection time would be approximately 20 seconds. With today's switched network technology this
is a large amount of time. By switching to a fast setting the detection time would be reduced by 50% (10
seconds) which in most cases would be more acceptable. Be careful however, when using custom settings,
as setting these values too low can cause false takeovers to occur. These settings can be viewed using a variety
of techniques including : lssrc –ls topsvcs command (from a node which is active) or odmget
HACMPnim |grep –p ether and smitty hacmp.

Applications
The most important part of making an application run well in an HACMP cluster is understanding the
application's requirements. This is particularly important when designing the Resource Group policy behavior
and dependencies. For high availability to be achieved, the application must have the ability to
stop and start cleanly and not explicitly prompt for interactive input. Some applications tend to bond to a
particular OS characteristic such as a uname, serial number or IP address. In most situations, these problems
can be overcome. The vast majority of commercial software products which run under AIX are well
suited to be clustered with HACMP.

Application Data Location
Where should application binaries and configuration data reside? There are many arguments to this discussion.
Generally, keep all the application binaries and data where possible on the shared disk, as it is easy
to forget to update them on all cluster nodes when they change. This can prevent the application from starting or
working correctly, when it is run on a backup node. However, the correct answer is not fixed. Many application
vendors have suggestions on how to set up the applications in a cluster, but these are recommendations.
Just when it seems to be clear cut as to how to implement an application, someone thinks of a new
set of circumstances. Here are some rules of thumb:
If the application is packaged in LPP format, it is usually installed on the local file systems in rootvg. This
behavior can be overcome, by bffcreate’ing the packages to disk and restoring them with the preview option.
This action will show the install paths, then symbolic links can be created prior to install which point
to the shared storage area. If the application is to be used on multiple nodes with different data or configuration,
then the application and configuration data would probably be on local disks and the data sets on
shared disk with application scripts altering the configuration files during fallover. Also, remember the
HACMP File Collections facility can be used to keep the relevant configuration files in sync across the cluster.
This is particularly useful for applications which are installed locally.

Start/Stop Scripts
Application start scripts should not assume the status of the environment. Intelligent programming should
correct any irregular conditions that may occur. The cluster manager spawns these scripts off in a separate
job in the background and carries on processing. Some things a start script should do are:
First, check that the application is not currently running! This is especially crucial for v5.4 users as
resource groups can be placed into an unmanaged state (forced down action, in previous versions).
Using the default startup options, HACMP will rerun the application start script which may cause
problems if the application is actually running. A simple and effective solution is to check the state
of the application on startup. If the application is found to be running just simply end the start script
with exit 0.
Verify the environment. Are all the disks, file systems, and IP labels available?
If different commands are to be run on different nodes, store the executing HOSTNAME in a variable.
Check the state of the data. Does it require recovery? Always assume the data is in an unknown state
since the conditions that occurred to cause the takeover cannot be assumed.
Are there prerequisite services that must be running? Is it feasible to start all prerequisite services
from within the start script? Is there an inter-resource group dependency or resource group sequencing
that can guarantee the previous resource group has started correctly? HACMP v5.2 and later has
facilities to implement checks on resource group dependencies including collocation rules in
HACMP v5.3.
Finally, when the environment looks right, start the application. If the environment is not correct and
error recovery procedures cannot fix the problem, ensure there are adequate alerts (email, SMS,
SMTP traps etc) sent out via the network to the appropriate support administrators.
Stop scripts are different from start scripts in that most applications have a documented start-up routine
and not necessarily a stop routine. The assumption is once the application is started why stop it? Relying
on a failure of a node to stop an application will be effective, but to use some of the more advanced features
of HACMP the requirement exists to stop an application cleanly. Some of the issues to avoid are:


Be sure to terminate any child or spawned processes that may be using the disk resources. Consider
implementing child resource groups.
Verify that the application is stopped to the point that the file system is free to be unmounted. The
fuser command may be used to verify that the file system is free.
In some cases it may be necessary to double check that the application vendor’s stop script did actually
stop all the processes, and occasionally it may be necessary to forcibly terminate some processes.
Clearly the goal is to return the machine to the state it was in before the application start script was run.
Be sure to exit the stop script with a zero return code, as a non-zero return code will stop cluster processing. * Note: This is not the case with start scripts!
Remember, most vendor stop/start scripts are not designed to be cluster proof! A useful tip is to have stop
and start script verbosely output using the same format to the /tmp/hacmp.out file. This can be achieved
by including the following line in the header of the script: set -x && PS4="${0##*/}"'[$LINENO] '
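To illustrate the points above, here is a minimal ksh start-script skeleton; the application name, process pattern, paths and mail address are invented placeholders, not anything from this article:

#!/bin/ksh
# Sample HACMP application server start script (skeleton only)
set -x && PS4="${0##*/}"'[$LINENO] '          # verbose trace into /tmp/hacmp.out

APP=app1                                      # placeholder application name
APPDIR=/test2/app1                            # placeholder install directory on the shared disk

# 1. Do nothing if the application is already running (unmanaged / forced-down resource groups)
if ps -ef | grep -v grep | grep -q "${APP}_server"; then
    echo "${APP} is already running - nothing to do"
    exit 0
fi

# 2. Verify the environment: the shared filesystem must be mounted before going any further
if ! mount | grep -q " /test2 "; then
    echo "/test2 is not mounted" | mail -s "${APP} start failed" support@example.com
    exit 1
fi

# 3. Assume the data is in an unknown state and run the application's own recovery first
${APPDIR}/bin/recover_db                      # placeholder recovery command

# 4. Finally, start the application
${APPDIR}/bin/start_${APP}
exit 0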

AIX Security Checklist

AIX Security Checklist

AIX Environment Procedures

The best way to approach this portion of the checklist is to do a comprehensive physical inventory of the servers. Serial numbers and physical location would be sufficient.

____Record server serial numbers
____Physical location of the servers

Next we want to gather a comprehensive list of both the AIX and pSeries inventories. By running the next 4 scripts we can gather the information for analysis.

____Run these 4 scripts: sysinfo, tcpchk, nfschk and nethwchk. (See Appendix A for scripts)
____sysinfo:
____Determine active logical volume groups on the servers: lsvg -o
____List physical volumes in each volume group: lsvg –p "vgname"
____List logical volumes for each volume group: lsvg –l "vgname"
____List physical volumes information for each hard disk
____lspv hdiskx
____lspv –p hdiskx
____lspv –l hdiskx
____List server software inventory: lslpp -L
____List server software history: lslpp –h
____List all hardware attached to the server: lsdev –C | sort –d
____List system name, nodename, LAN network number, AIX release, AIX version and machine ID: uname –x
____List all system resources on the server: lssrc –a
____List inetd services: lssrc –t 'service name' –p 'process id'
____List all host entries on the servers: hostent -S
____Name all nameservers the servers have access to: namerslv –Is
____Show status of all configured interfaces on the server: netstat –i
____Show network addresses and routing tables: netstat –nr
____Show interface settings: ifconfig
____Check user and group system variables
____Check users: usrck –t ALL
____Check groups: grpck –t ALL
____Run tcbck to verify if it is enabled: tcbck
____Examine the AIX failed logins: who –s /etc/security/failedlogin
____Examine the AIX user log: who /var/adm/wtmp
____Examine the processes from users logged into the servers: who –p /var/adm/wtmp
____List all user attributes: lsuser ALL | sort –d
____List all group attributes: lsgroup ALL
____tcpchk:
____Confirm the tcp subsystem installed: lslpp –l | grep bos.net
____Determine if it is running: lssrc –g tcpip
____Search for .rhosts and .netrc files: find / -name .rhosts -print ; find / -name .netrc –print
____Checks for rsh functionality on host: cat /etc/hosts.equiv
____Checks for remote printing capability: cat /etc/hosts.lpd | grep -v '^#'
____nfschk:
____Verify NFS is installed: lslpp -L | bin/grep nfs
____Check NFS/NIS status: lssrc -g nfs | bin/grep active
____Checks to see if it is an NFS server and what directories are exported: cat /etc/xtab
____Show hosts that export NFS directories: showmount
____Show what directories are exported: showmount –e
____nethwchk
____Show network interfaces that are connected: lsdev –Cc if
____Display active connection on boot: odmget -q value=up CuAt | grep name|cut -c10-12
____Show all interface status: ifconfig -a

Root level access

____Limit users who can su to another UID: lsuser –f ALL
____Audit the sulog: cat /var/adm/sulog
____Verify /etc/profile does not include current directory
____Lock down cron access
____To allow root only: rm -i /var/adm/cron/cron.deny and rm -i /var/adm/cron/cron.allow
____To allow all users: touch cron.allow (if file does not already exist)
____To allow a user access: touch /var/adm/cron/cron.allow then echo "UID" >> /var/adm/cron/cron.allow
____To deny a user access: touch /var/adm/cron/cron.deny then echo "UID" >> /var/adm/cron/cron.deny
____Disable direct remote root login: set rlogin=false for root in the /etc/security/user file or through smit

____Limit the $PATH variable in /etc/environment. Use the users .profile instead.

Authorization/authentication administration

____Report all password inconsistencies and not fix them: pwdck –n ALL
____Report all password inconsistencies and fix them: pwdck –y ALL
____Report all group inconsistencies and not fix them: grpck –n ALL
____Report all group inconsistencies and fix them: grpck –y ALL
____Browse the /etc/security/passwd, /etc/passwd and /etc/group files weekly


SUID/SGID

____Review all SUID/SGID programs owned by root, daemon, and bin.
____Review all SETUID programs: find / -perm -4000 -print
____Review all SETGID programs: find / -perm -2000 -print
____Review all sticky bit programs: find / -perm -1000 -print
____Set user .profile in /etc/security/.profile

Permissions structures

____System directories should have 755 permissions at a minimum
____Root system directories should be owned by root
____Use the sticky bit on the /tmp and /usr/tmp directories.
____Run checksum (md5) against all /bin, /usr/bin, /dev and /usr/sbin files.
____Check device file permissions:
____disk, storage, tape, network (should be 600) owned by root.
____tty devices (should be 622) owned by root.
____/dev/null should be 777.
____List all hidden files in their directories (the .files).
____List all writable directories (use the find command).
____$HOME directories should be 710
____$HOME .profile or .login files should be 600 or 640.
____Look for un-owned files on the server: find / -nouser –print.
Note: Do not remove any /dev files.
____Do not use r-type commands: rsh, rlogin, rcp and tftp or .netrc or .rhosts files.
____Change /etc/hosts file permissions to 660 and review its contents weekly.

____Check for both tcp/udp failed connections to the servers: netstat –p tcp; netstat –p udp.
____Verify contents of /etc/exports (NFS export file).
____If using ftp, make this change to the /etc/inetd.conf file to enable logging.
ftp stream tcp6 nowait root /usr/sbin/ftpd ftpd –l
____Set NFS mounts to –ro (read only) and only to the hosts that they are needed.
____Consider using extended ACL's (please review the tcb man page).
____Before making network connection collect a full system file listing and store it off-line:
ls -Ra -la>/tmp/allfiles.system
____Make use of the strings command to check on files: strings /etc/hosts | grep Kashmir

Recommendations

Remove unnecessary services

By default the Unix operating system gives us 1024 well-known services to connect to; we want to parse this down to a more manageable value. There are 2 files in particular that we want to parse. The first is the /etc/services file itself. A good starting point is to eliminate all unneeded services and add services back as you need them. Below is an excerpt of an existing NTP server's /etc/services file on one of my lab servers.

#
# Network services, Internet style
#
ssh 22/udp
ssh 22/tcp mail
auth 113/tcp authentication
sftp 115/tcp
ntp 123/tcp # Network Time Protocol
ntp 123/udp # Network Time Protocol
#
# UNIX specific services
#
login 513/tcp
shell 514/tcp cmd # no passwords used

Parse /etc/rc.tcpip file

This file starts the daemons that we will be using for the tcp/ip stack on AIX servers. By default the file will start the sendmail, snmp and other daemons. We want to parse this to reflect what
functionality we need this server for. Here is the example for my ntp server.

# Start up the daemons
#
echo "Starting tcpip daemons:"
trap 'echo "Finished starting tcpip daemons."' 0
# Start up syslog daemon (for error and event logging)
start /usr/sbin/syslogd "$src_running"


# Start up Portmapper

start /usr/sbin/portmap "$src_running"

# Start up socket-based daemons
start /usr/sbin/inetd "$src_running"

# Start up Network Time Protocol (NTP) daemon
start /usr/sbin/xntpd "$src_running"

This helps also to better understand what processes are running on the server.


Remove unauthorized /etc/inittab entries

Be aware of what is in the /etc/inittab file on the AIX servers. This file works like the registry in a Microsoft environment. If an intruder wants to hide an automated script, he would want it launched here or in the cron file. Monitor this file closely.

Parse /etc/inetd.conf file

This is the AIX system file that starts system services, like telnet, ftp, etc. We also want to closely watch this file to see if there are any services that have been enabled without authorization. If you are using ssh, for example, this is what the inetd.conf file should look like. Because we are using other means of connection, this file is not used in my environment and should not be of use to you. This is why ssh should be used for all administrative connections into the environment: it provides an encrypted tunnel so connection traffic is secure. In the case of telnet, it is very trivial to sniff the UID and password.

## protocol. "tcp" and "udp" are interpreted as IPv4.
##
## service socket protocol wait/ user server server program
## name type nowait program arguments
##

Edit /etc/rc.net

This is the network configuration file used by AIX. This is the file you use to set your default network route along with your no (network options) attributes. Because the servers will not be used as routers to forward traffic and we do not want to allow loose source routing at all, we will be making a few changes in this file. A lot of them are to protect from DoS and DDoS attacks from the internet. They also protect from ACK and SYN attacks on the internal network.

##################################################################
##################################################################
# Changes made on 06/07/02 to tighten up socket states on this

# server.

##################################################################
if [ -f /usr/sbin/no ] ; then
/usr/sbin/no -o udp_pmtu_discover=0 # stops autodiscovery of MTU
/usr/sbin/no -o tcp_pmtu_discover=0 # on the network interface
/usr/sbin/no -o clean_partial_conns=1 # clears incomplete 3-way conn.
/usr/sbin/no -o bcastping=0 # protects against smurf icmp attacks
/usr/sbin/no -o directed_broadcast=0 # stops packets to broadcast add.
/usr/sbin/no -o ipignoreredirects=1 # prevents loose
/usr/sbin/no -o ipsendredirects=0 # source routing
/usr/sbin/no -o ipsrcrouterecv=0 # attacks on
/usr/sbin/no -o ipsrcrouteforward=0 # our network
/usr/sbin/no -o ip6srcrouteforward=0 # from using indirect
/usr/sbin/no -o icmpaddressmask=0 # dynamic routes
/usr/sbin/no -o nonlocsrcroute=0 # to attack us from
/usr/sbin/no -o ipforwarding=0 # Stops server from acting like a router
fi


Securing root

Change the /etc/motd banner

This computer system is the private property of XYZ Insurance. It is for authorized use only. All users (authorized or non-authorized) have no explicit or implicit expectations of privacy.

Any or all users of this system and all the files on this system may be intercepted, monitored, recorded, copied, audited, inspected and disclosed to XYZ Insurance's management personnel.

By using this system, the end user consents to such interception, monitoring, recording, copying, auditing, inspection and disclosure at the discretion of such personnel. Unauthorized or improper use of this system may result in civil and/or criminal penalties and administrative or disciplinary action, as deemed appropriate. By continuing to use this system, the individual indicates his/her awareness of and consent to these terms and conditions of use.

LOG OFF IMMEDIATELY if you do not agree to the provisions stated in this warning banner.

Modify /etc/security/user

root:
loginretries = 5 – failed retries until account locks
rlogin = false – Disables direct remote login access to a root shell. Need to su from another UID.
admgroups = system
minage = 0 – minimum aging is no time value
maxage = 4 – maximum password age is set to 4 weeks (roughly 30 days)
umask = 22


Tighten up /etc/security/limits

This is an attribute that should be changed to guard against a runaway resource hog. An orphaned process can grow to use
an exorbitant amount of disk space. To prevent this we can set the ulimit value here.

default:
#fsize = 2097151
fsize = 8388604 – sets the soft file size limit to roughly 4 GB (the value is in 512-byte blocks).

Variable changes in /etc/profile

Set the $TMOUT variable in /etc/profile. This will cause an open shell to close after 15 minutes of inactivity. It works in conjunction with the screensaver to prevent an open session from being used to damage the server or, worse, corrupt data on the server.

# Automatic logout, include in export line if uncommented
TMOUT=900

Sudo is your friend….

This is a nice piece of code that system administrators can use in order to allow "root-like" functionality. It allows a non-root user to run system binaries or commands. The /etc/sudoers file is used to configure exactly what the user can do. The service is configured and running on ufxcpidev. The developers are running a script called changeperms in order to tag their .ear files with their own ownership attributes.


First we set up sudo to allow root-like (superuser doer) access for sxnair.

# sudoers file.
#
# This file MUST be edited with the 'visudo' command as root.
#
# See the sudoers man page for the details on how to write a sudoers file.
#
# Host alias specification

# User alias specification

# Cmnd alias specification

# User privilege specification
root ALL=(ALL) ALL
sxnair,jblade,vnaidu ufxcpidev=/bin/chown * /usr/WebSphere/AppServer/installedApps/*
#
#
# Override the built in default settings
Defaults syslog=auth


Defaults logfile=/var/log/sudo.log

For more details, please see the XYZ Company Insurance Work Report that I compiled, or visit this
URL: http://www.courtesan.com/sudo/.

Tighten user/group attributes

Change /etc/security/user

These are some of the changes to the /etc/security/user file that will promote a more secure
configuration of default user attributes at your company.

default:

umask = 077 – defines the default umask value; 077 makes new files accessible only to the owning UID
pwdwarntime = 7 – days of password expiration warnings
loginretries = 5 – failed login attempts before account is locked
histexpire = 52 – defines how long a password cannot be re-used
histsize = 20 – defines how many previous passwords the system remembers
minage = 2 – minimum number of weeks a password is valid
maxage = 8 – maximum number of weeks a password is valid
maxexpired = 4 – maximum time in weeks a password can be changed after it expires

Friday, February 8, 2008

HACMP log files

/usr/sbin/cluster/etc/rhosts --- used to accept incoming communication from clcomdES (the cluster communication daemon, enhanced security)
/usr/es/sbin/cluster/etc/rhosts

Note: If there is an unresolvable label in the /usr/es/sbin/cluster/etc/rhosts file,
then all clcomdES connections from remote nodes will be denied.

cluster manager (clstrmgrES)
cluster lock daemon (cllockdES)
cluster SMUX peer daemon (clsmuxpdES)

The clcomdES daemon is used for cluster configuration operations such as cluster synchronization,
cluster management (C-SPOC) and dynamic reconfiguration (DARE) operations.

For clcomdES there should be at least 20 MB of free space in the /var filesystem:
/var/hacmp/clcomd/clcomd.log -- requires 2 MB
/var/hacmp/clcomd/clcomdiag.log -- requires 18 MB
An additional 1 MB is required for the
/var/hacmp/odmcache directory.

clverify.log is also present in the /var directory:
/var/hacmp/clverify/current/<nodename>/* contains logs from the current execution of clverify
/var/hacmp/clverify/pass/<nodename>/* contains logs from the last passed verification
/var/hacmp/clverify/pass.prev/<nodename>/* contains logs from the second-to-last passed verification
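A quick sketch for keeping an eye on these requirements and logs (paths as listed above):

df -m /var                               # confirm at least ~21 MB free in /var for clcomdES
tail -f /var/hacmp/clcomd/clcomd.log     # watch clcomdES activity
ls -lt /var/hacmp/clverify/current       # most recent clverify run, one subdirectory per node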