Recovering a Failed VIO Disk
Here is a recovery procedure for replacing a failed client disk on a Virtual IO
server. It assumes the client partitions have mirrored (virtual) disks. The
recovery involves both the VIO server and its client partitions. However,
it is non disruptive for the client partitions (no downtime), and may be
non disruptive on the VIO server (depending on disk configuration). This
procedure does not apply to Raid5 or SAN disk failures.
The test system had two VIO servers and an AIX client. The AIX client had two
virtual disks (one disk from each VIO server). The two virtual disks
were mirrored in the client using AIX's mirrorvg. (The procedure would be
the same on a single VIO server with two disks.)
The software levels were:
p520: Firmware SF230_145 VIO Version 1.2.0 Client: AIX 5.3 ML3
We had simulated the disk failure by removing the client LV on one VIO server. The
padmin commands to simulate the failure were:
#rmdev -dev vtscsi01 # The virtual scsi device for the LV (lsmap -all)
#rmlv -f aix_client_lv # Remove the client LV
This caused "hdisk1" on the AIX client to go "missing" ("lsvg -p rootvg"....The
"lspv" will not show disk failure...only the disk status at the last boot..)
The recovery steps included:
VIO Server
Fix the disk failure, and restore the VIOS operating system (if necessary)mklv -lv aix_client_lv rootvg 10G # recreate the client LV mkvdev -vdev aix_client_lv -vadapter vhost1 # connect the client LV to the appropriate vhost
AIX Client
# cfgmgr # discover the new virtual hdisk2
replacepv hdisk1 hdisk2
# rebuild the mirror copy on hdisk2
# bosboot -ad /dev/hdisk2 ( add boot image to hdisk2)
# bootlist -m normal hdisk0 hdisk2 ( add the new disk to the bootlist)
# rmdev -dl hdisk1 ( remove failed hdisk1)
The "replacepv" command assigns hdisk2 to the volume group, rebuilds the mirror, and
then removes hdisk1 from the volume group.
As always, be sure to test this procedure before using in production.
No comments:
Post a Comment