Sunday, February 10, 2008

HACMP Basics


History
IBM's HACMP has existed for almost 15 years. It did not start as an IBM product: IBM bought it from CLAM, which was later renamed Availant and is now called LakeViewTech. Until August 2006, all development of HACMP was done by CLAM. Nowadays IBM does its own development of HACMP in Austin, Poughkeepsie and Bangalore.

IBM's high availability solution for AIX, High Availability Cluster Multi Processing (HACMP), consists of two components:

High Availability: the process of ensuring an application is available for use through duplicated and/or shared resources, eliminating single points of failure (SPOFs).

Cluster Multi-Processing: multiple applications running on the same nodes with shared or concurrent access to the data.

A high availability solution based on HACMP provides automated failure detection, diagnosis, application recovery and node reintegration. With an appropriate application, HACMP can also provide concurrent access to the data for parallel processing applications, thus offering excellent horizontal scalability.

What needs to be protected? Ultimately, the goal of any IT solution in a critical environment is to provide continuous service and data protection.

High availability is just one building block in achieving the continuous operation goal. It depends on the availability of the hardware, the software (the OS and its components), the application and the network components.

The main objective of HACMP is to eliminate single points of failure (SPOFs).

“…A fundamental design goal of (successful) cluster design is the elimination of single points of failure (SPOFs)…”


Eliminating Single Points of Failure (SPOFs)

Cluster object      Eliminated as a single point of failure by
Node                Using multiple nodes
Power source        Using multiple circuits or uninterruptible power supplies
Network adapter     Using redundant network adapters
Network             Using multiple networks to connect nodes
TCP/IP subsystem    Using non-IP networks to connect adjoining nodes and clients
Disk adapter        Using redundant disk adapters or multiple adapters
Disk                Using multiple disks with mirroring or RAID
Application         Adding a node for takeover; configuring an application monitor
Administrator       Adding a backup administrator or a very detailed operations guide
Site                Adding an additional site


Cluster Components

Here are the recommended practices for important cluster components.


Nodes

HACMP supports clusters of up to 32 nodes, with any combination of active and standby nodes. While it
is possible to have all nodes in the cluster running applications (a configuration referred to as "mutual
takeover"), the most reliable and available clusters have at least one standby node - one node that is normally
not running any applications, but is available to take them over in the event of a failure on an active
node.

Additionally, it is important to pay attention to environmental considerations. Nodes should not have a
common power supply - which may happen if they are placed in a single rack. Similarly, building a cluster
of nodes that are actually logical partitions (LPARs) with a single footprint is useful as a test cluster, but
should not be considered for availability of production applications.
Nodes should be chosen that have sufficient I/O slots to install redundant network and disk adapters,
that is, twice as many slots as would be required for single-node operation. This naturally suggests that
servers with small numbers of slots should be avoided. Use of nodes without redundant adapters
should not be considered best practice; blades, with their limited adapter slots, are a prominent example.
And, just as every cluster resource should have a backup, the root volume group in each node should be
mirrored or be on a RAID device.
Nodes should also be chosen so that when the production applications are run at peak load, there are still
sufficient CPU cycles and I/O bandwidth to allow HACMP to operate. The production application
should be carefully benchmarked (preferably) or modeled (if benchmarking is not feasible), and nodes chosen
so that they will not exceed 85% busy, even under the heaviest expected load.
Note that the takeover node should be sized to accommodate all possible workloads: if there is a single
standby backing up multiple primaries, it must be capable of servicing multiple workloads. On hardware
that supports dynamic LPAR operations, HACMP can be configured to allocate processors and memory to
a takeover node before applications are started. However, these resources must actually be available, or
acquirable through Capacity Upgrade on Demand. The worst case situation – e.g., all the applications on
a single node – must be understood and planned for.
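The worst-case rule above is simple arithmetic: add up the peak loads the takeover node might have to carry and compare the total with the 85% guideline. The following is a minimal sketch of that check, assuming each workload's peak CPU utilisation has already been expressed as an integer percentage of the takeover node's capacity (figures measured on differently sized nodes would need scaling first). The function name and the LIMIT variable are hypothetical, chosen only for illustration.

```sh
#!/bin/sh
# Worst-case sizing check for a takeover node (illustrative sketch).
# Each argument is one workload's peak CPU utilisation, given as an integer
# percentage of the TAKEOVER node's capacity (assumed already normalised).
# LIMIT is a hypothetical knob defaulting to the 85% guideline.
LIMIT=${LIMIT:-85}

worst_case_fits() {
    total=0
    for pct in "$@"; do
        total=$((total + pct))
    done
    echo "worst-case load: ${total}% (limit ${LIMIT}%)"
    # Exit status 0 means the combined load stays within the limit.
    [ "$total" -le "$LIMIT" ]
}

# Example: one standby backing up three primaries.
# worst_case_fits 30 25 35   -> prints 90%, returns 1 (over the limit)
```

A standby that passes this check for the sum of all the primaries it protects covers the "all the applications on a single node" case; CPU is only one dimension, so memory and I/O bandwidth deserve the same treatment.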

Networks

HACMP is a network-centric application. HACMP networks not only provide client access to the applications
but are also used to detect and diagnose node, network and adapter failures. To do this, HACMP uses
RSCT, which sends heartbeats (UDP packets) over ALL defined networks. By gathering heartbeat information
on multiple nodes, HACMP can determine what type of failure has occurred and initiate the appropriate
recovery action. Being able to distinguish between certain failures, for example the failure of a network
and the failure of a node, requires a second network! Although this additional network can be IP-based,
it is possible that the entire IP subsystem could fail within a given node. Therefore, in addition,
there should be at least one, ideally two, non-IP networks. Failure to implement a non-IP network can
lead to a partitioned cluster, sometimes referred to as 'split brain' syndrome. This situation can
occur if the IP network(s) between nodes become severed or, in some cases, congested. Since each node is
in fact still alive but can no longer hear the others, HACMP on each side would conclude that the other
nodes are down and initiate a takeover. After takeover has occurred, the application(s) could be running
simultaneously on both nodes. If the shared disks are also online to both nodes, the result could be
data divergence (massive data corruption). This is a situation which must be avoided at all costs.
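To make the reasoning concrete, here is a toy decision table showing why a second, non-IP path matters. It is purely illustrative and is not how RSCT or HACMP is implemented: the function name, its inputs ("up", "down", "none") and its messages are all hypothetical. The point is only that with an IP network alone, "peer is dead" and "network is cut" look identical.

```sh
#!/bin/sh
# Toy illustration only; NOT the RSCT/HACMP algorithm.
# $1: heartbeats from the peer over the IP network: "up" or "down"
# $2: heartbeats over a non-IP network (e.g. disk heartbeat):
#     "up", "down", or "none" if no non-IP network is defined
classify_failure() {
    ip=$1
    nonip=$2
    case "$ip/$nonip" in
        up/*)      echo "peer healthy" ;;
        # The non-IP path still sees the peer, so only the IP network failed.
        down/up)   echo "IP network failure: peer alive, no takeover" ;;
        # Silent on every path: the best available inference is a dead node.
        down/down) echo "node failure: takeover" ;;
        # Nothing to cross-check against: a takeover here risks split brain.
        down/none) echo "ambiguous: possible split brain" ;;
        *)         echo "unknown input"; return 1 ;;
    esac
}
```

Only the `down/none` row is truly ambiguous, and it is exactly the row a non-IP network removes.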

The most convenient way of configuring non-IP networks is to use disk heartbeating, as it removes the
distance limitations of RS-232 serial networks. Disk heartbeat networks only require a small disk or
LUN. Be careful not to put application data on these disks: although it is possible to do so, you don't want
any conflict with the disk heartbeat mechanism!

Adapters

As stated above, each network defined to HACMP should have at least two adapters per node. While it is
possible to build a cluster with fewer, the reaction to adapter failures is more severe: the resource group
must be moved to another node. AIX provides support for EtherChannel, a facility that can be used to aggregate
adapters (increasing bandwidth) and provide network resilience. EtherChannel is particularly useful for
fast responses to adapter / switch failures. This must be set up with some care in an HACMP cluster.
When done properly, this provides the highest level of availability against adapter failure. Refer to the IBM
techdocs website: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101785 for further
details.
Many System p servers contain built-in Ethernet adapters. If the nodes are physically close together, it
is possible to use the built-in Ethernet adapters on two nodes and a "cross-over" Ethernet cable (sometimes
referred to as a "data transfer" cable) to build an inexpensive Ethernet network between two nodes for
heartbeating. Note that this is not a substitute for a non-IP network.
Some adapters provide multiple ports. One port on such an adapter should not be used to back up another
port on that adapter, since the adapter card itself is a common point of failure. The same thing is true
of the built-in Ethernet adapters in most System p servers and currently available blades: the ports have a
common adapter. When the built-in Ethernet adapter can be used, best practice is to provide an additional
adapter in the node, with the two backing up each other.
Be aware of the network failure detection settings for the cluster and consider tuning these values. In HACMP terms,
these are referred to as NIM values. There are four settings per network type: slow,
normal, fast and custom. With the default setting of normal for a standard Ethernet network, the network
failure detection time would be approximately 20 seconds. With today's switched network technology this
is a long time. Switching to the fast setting reduces the detection time by 50% (to 10
seconds), which in most cases is more acceptable. Be careful, however, when using custom settings,
as setting these values too low can cause false takeovers. These settings can be viewed in several
ways, including the lssrc -ls topsvcs command (from an active node), odmget
HACMPnim | grep -p ether, and smitty hacmp.
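The detection times quoted above can be estimated from two values that topology services works with: the heartbeat interval and the failure cycle (the number of missed heartbeats tolerated). The sketch below assumes the commonly cited relationship detection time = interval × failure cycle × 2; treat that formula as an assumption to confirm against the HACMP documentation for your release, and read the actual values for your networks from lssrc -ls topsvcs rather than from this example.

```sh
#!/bin/sh
# Rough estimate of network failure detection time (illustrative sketch).
# $1: heartbeat interval in seconds (may be fractional)
# $2: failure cycle, i.e. missed heartbeats before a failure is declared
# Assumed relationship: detection time = interval * failure cycle * 2.
detection_time() {
    awk -v i="$1" -v c="$2" 'BEGIN { printf "%g\n", i * c * 2 }'
}

# detection_time 1 10   -> 20  (matches the ~20 s "normal" figure above)
# detection_time 1 5    -> 10  (matches the ~10 s "fast" figure above)
```

Halving the failure cycle halves the detection time, which is also why very low custom values make false takeovers more likely: a brief congestion spike can swallow enough heartbeats to cross the threshold.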

Applications
The most important part of making an application run well in an HACMP cluster is understanding the
application's requirements. This is particularly important when designing the Resource Group policy behavior
and dependencies. For high availability to be achieved, the application must have the ability to
stop and start cleanly and not explicitly prompt for interactive input. Some applications tend to be tied to a
particular OS characteristic such as the uname, serial number or IP address. In most situations, these problems
can be overcome. The vast majority of commercial software products which run under AIX are well
suited to be clustered with HACMP.

Application Data Location
Where should application binaries and configuration data reside? There are many arguments on both sides.
Generally, keep all the application binaries and data on the shared disk where possible, as it is easy
to forget to update them on all cluster nodes when they change. A stale copy can prevent the application from
starting or working correctly when it is run on a backup node. However, there is no fixed answer. Many application
vendors have suggestions on how to set up their applications in a cluster, but these are only recommendations.
Just when it seems clear-cut how to implement an application, someone thinks of a new
set of circumstances. Here are some rules of thumb:
If the application is packaged in LPP format, it is usually installed on the local file systems in rootvg. This
behavior can be overcome by copying the packages to disk with bffcreate and restoring them with the preview option.
This shows the install paths; symbolic links pointing to the shared storage area can then be created before
the install. If the application is to be used on multiple nodes with different data or configuration,
then the application and configuration data would probably be on local disks and the data sets on
shared disk, with application scripts altering the configuration files during fallover. Also, remember that the
HACMP File Collections facility can be used to keep the relevant configuration files in sync across the cluster.
This is particularly useful for applications which are installed locally.
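The symbolic-link trick described above can be sketched as a small helper. It assumes you already have the install path (from the preview step) and a directory on a shared file system; both example paths and the function name are hypothetical. It refuses to touch a real directory, so an existing local install is never silently hidden.

```sh
#!/bin/sh
# Point a local install path at shared storage BEFORE installing an LPP,
# so the installer writes through the symbolic link onto the shared disk.
# Function name and paths are hypothetical; use the preview's real paths.
link_to_shared() {
    local_path=$1     # e.g. /opt/myapp (path the LPP would install into)
    shared_path=$2    # e.g. /sharedfs/myapp (directory on the shared disk)
    if [ -e "$local_path" ] && [ ! -L "$local_path" ]; then
        echo "refusing: $local_path exists and is not a symlink" >&2
        return 1
    fi
    mkdir -p "$shared_path" "$(dirname "$local_path")" || return 1
    # Remove any previous link first: "ln -s" onto an existing link to a
    # directory would create the new link INSIDE that directory instead.
    [ -L "$local_path" ] && rm -f "$local_path"
    ln -s "$shared_path" "$local_path"
}
```

The link itself lives in each node's rootvg, so it has to be created on every node that can host the application, and the shared file system should be mounted when the helper runs; otherwise mkdir -p would quietly create the directory under the local mount point.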

Start/Stop Scripts
Application start scripts should not assume the status of the environment. Intelligent programming should
correct any irregular conditions that may occur. The cluster manager spawns these scripts off as a separate
job in the background and carries on processing. Some things a start script should do are:
First, check that the application is not currently running! This is especially crucial for v5.4 users, as
resource groups can be placed into an unmanaged state (the forced down action in previous versions).
Using the default startup options, HACMP will rerun the application start script, which may cause
problems if the application is actually running. A simple and effective solution is to check the state
of the application on startup. If the application is found to be running, simply end the start script
with exit 0.
Verify the environment. Are all the disks, file systems, and IP labels available?
If different commands are to be run on different nodes, store the name of the executing host in a variable.
Check the state of the data. Does it require recovery? Always assume the data is in an unknown state,
since the conditions that caused the takeover cannot be known in advance.
Are there prerequisite services that must be running? Is it feasible to start all prerequisite services
from within the start script? Is there an inter-resource group dependency or resource group sequencing
that can guarantee the previous resource group has started correctly? HACMP v5.2 and later have
facilities to implement checks on resource group dependencies, including collocation rules in
HACMP v5.3.
Finally, when the environment looks right, start the application. If the environment is not correct and
error recovery procedures cannot fix the problem, ensure that adequate alerts (email, SMS,
SNMP traps, etc.) are sent out via the network to the appropriate support administrators.
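The checklist above maps naturally onto a start-script skeleton. This is a minimal sketch, not an HACMP-supplied template: APP_PIDFILE, REQUIRED_DIRS, the "myapp" name and the commented-out start command are hypothetical placeholders, and the environment check only tests that directories exist (a production script would also confirm the file systems are mounted and the service IP labels are up). It uses a pid file plus kill -0 to detect a running instance, which assumes the script runs as root, as HACMP application server scripts normally do.

```sh
#!/bin/sh
# Skeleton of a defensive application start script (illustrative sketch).
# APP_PIDFILE, REQUIRED_DIRS and the start command are hypothetical.
APP_PIDFILE=${APP_PIDFILE:-/var/run/myapp.pid}
REQUIRED_DIRS=${REQUIRED_DIRS:-/sharedfs/myapp}   # space-separated list

# 0 if the process recorded in the pid file is alive.
app_is_running() {
    [ -r "$APP_PIDFILE" ] || return 1
    pid=$(cat "$APP_PIDFILE")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# 0 if every required directory is present.
env_ok() {
    for d in $REQUIRED_DIRS; do
        if [ ! -d "$d" ]; then
            echo "missing: $d" >&2
            return 1
        fi
    done
    return 0
}

start_app() {
    # 1. HACMP may rerun this script while the application is still up
    #    (e.g. an unmanaged resource group); if so, do nothing.
    if app_is_running; then
        echo "myapp already running"
        return 0
    fi
    # 2. Verify the environment; raise alerts (email, SMS, SNMP) on failure.
    if ! env_ok; then
        echo "myapp not started: environment check failed" >&2
        return 1
    fi
    # 3. Per-node behaviour can key off the executing host name.
    NODE=$(hostname)
    echo "starting myapp on $NODE"
    # 4. Check or recover the data, then start, e.g. (hypothetical):
    # /opt/myapp/bin/myapp_start
    return 0
}
# A real start script would end with: start_app; exit $?
```

Keeping each check in its own function makes it easy to test the script on a node outside cluster control before handing it to HACMP.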
Stop scripts are different from start scripts in that most applications have a documented start-up routine
but not necessarily a stop routine. The assumption seems to be: once the application is started, why stop it?
Relying on a node failure to stop an application is effective, but using some of the more advanced features
of HACMP requires that the application can be stopped cleanly. Some points to watch are:


Be sure to terminate any child or spawned processes that may be using the disk resources. Consider
implementing child resource groups.
Verify that the application is stopped to the point that the file system is free to be unmounted. The
fuser command may be used to verify that the file system is free.
In some cases it may be necessary to double check that the application vendor’s stop script did actually
stop all the processes, and occasionally it may be necessary to forcibly terminate some processes.
Clearly the goal is to return the machine to the state it was in before the application start script was run.
Always exit the stop script with a zero return code; a non-zero return code will stop cluster processing. Note: this is not the case with start scripts!
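A stop script following these points might look roughly like the sketch below. It is an illustration under stated assumptions, not a vendor or HACMP template: APP_PIDFILE and STOP_TIMEOUT are hypothetical settings, only the main process from the pid file is signalled (children must be found separately, for instance with fuser), and the function always returns 0 so cluster event processing is never halted by the stop step.

```sh
#!/bin/sh
# Skeleton of a cluster-safe stop script (illustrative sketch).
# APP_PIDFILE and STOP_TIMEOUT are hypothetical placeholders.
APP_PIDFILE=${APP_PIDFILE:-/var/run/myapp.pid}
STOP_TIMEOUT=${STOP_TIMEOUT:-30}

# Ask the application to stop, wait up to STOP_TIMEOUT seconds,
# then force it. Child processes are NOT covered here.
stop_app() {
    if [ -r "$APP_PIDFILE" ]; then
        pid=$(cat "$APP_PIDFILE")
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            kill -TERM "$pid" 2>/dev/null
            waited=0
            while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$STOP_TIMEOUT" ]; do
                sleep 1
                waited=$((waited + 1))
            done
            if kill -0 "$pid" 2>/dev/null; then
                echo "forcing stop of $pid" >&2
                kill -KILL "$pid" 2>/dev/null
            fi
        fi
        rm -f "$APP_PIDFILE"
    fi
    # Always succeed: a non-zero exit would halt cluster event processing.
    return 0
}

# 0 if no process is using the file system containing $1 (fuser -c is
# the POSIX option); 2 if fuser is not available on this system.
fs_is_free() {
    command -v fuser >/dev/null 2>&1 || return 2
    [ -z "$(fuser -c "$1" 2>/dev/null)" ]
}
# A real stop script would end with: stop_app; exit 0
```

Calling fs_is_free on each shared mount point after stop_app tells you whether stray processes would still block the unmount.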
Remember, most vendor stop/start scripts are not designed to be cluster-proof! A useful tip is to have the stop
and start scripts write verbose trace output, in the same format, to the /tmp/hacmp.out file. This can be achieved
by including the following line in the header of the script:

    set -x && PS4="${0##*/}"'[$LINENO] '
