Heartbeat + NFS Configuration

This post continues from the previous one, which covered the DRBD system environment, installation, and configuration.
 
I. Heartbeat Configuration
1. Install heartbeat

# yum install epel-release -y
# yum --enablerepo=epel install heartbeat -y

2. Set up the heartbeat configuration file
(node1)
Edit ha.cf and add the following settings:

# vi /etc/ha.d/ha.cf
logfile         /var/log/ha-log
logfacility     local0
keepalive       2
deadtime        5
ucast           eth0 192.168.0.192    # local interface to send on, and the peer's IP
auto_failback   off
node            drbd1.corp.com drbd2.corp.com

(node2)
Edit ha.cf and add the following settings:

# vi /etc/ha.d/ha.cf
logfile         /var/log/ha-log
logfacility     local0
keepalive       2
deadtime        5
ucast           eth0 192.168.0.191
auto_failback   off
node            drbd1.corp.com drbd2.corp.com
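
The two ha.cf files differ only in the peer address on the ucast line, so both can be produced from one template. The following is a minimal sketch, assuming the interface name, addresses, and host names used above; render_ha_cf is a made-up helper name, not something heartbeat ships.

```shell
#!/bin/bash
# render_ha_cf PEER_IP [IFACE]: print an ha.cf matching the settings above.
# render_ha_cf is a hypothetical helper for this post, not part of heartbeat.
render_ha_cf() {
    local peer_ip="$1" iface="${2:-eth0}"
    cat <<EOF
logfile         /var/log/ha-log
logfacility     local0
keepalive       2
deadtime        5
ucast           ${iface} ${peer_ip}
auto_failback   off
node            drbd1.corp.com drbd2.corp.com
EOF
}

# On node1 the peer is node2 (192.168.0.192).
# As root, redirect the output to /etc/ha.d/ha.cf.
render_ha_cf 192.168.0.192
```

The names on the node line have to match the output of uname -n on each host, so it is worth checking that before starting heartbeat.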

3. Edit the inter-node authentication file authkeys and add the following: (node1, node2)

# vi /etc/ha.d/authkeys
auth 1
1 crc

Set the permissions of the authentication file to 600:

# chmod 600 /etc/ha.d/authkeys
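
The crc method only detects corrupted packets and does not authenticate the peer. Heartbeat also accepts md5 and sha1 with a shared key, which is the safer choice if the heartbeat link is not a dedicated, trusted network. The sketch below prints an authkeys file that uses sha1 with a random key; make_authkeys is a hypothetical helper name, and the key length is an arbitrary choice.

```shell
#!/bin/bash
# make_authkeys: print an authkeys file using sha1 and a random shared key.
# make_authkeys is a hypothetical helper name used only in this sketch.
make_authkeys() {
    local key
    # 32 random bytes rendered as 64 lowercase hex characters
    key=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    printf 'auth 1\n1 sha1 %s\n' "$key"
}

# As root on one node, then copy the identical file to the other node:
#   (umask 077; make_authkeys > /etc/ha.d/authkeys)
make_authkeys
```

Both nodes need exactly the same file, so generate it once and copy it rather than running the helper on each node.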

4. Edit the cluster resource file: (node1, node2)

# vi /etc/ha.d/haresources
drbd1.corp.com IPaddr::192.168.0.190/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/store::ext4 killnfsd

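Note: The scripts named in this file, such as IPaddr and Filesystem, live in /etc/ha.d/resource.d/. You can also place your own service scripts in that directory (for example mysql or www) and add the script name to /etc/ha.d/haresources, so that heartbeat starts the service together with the other resources.
 
IPaddr::192.168.0.190/24/eth0: the IPaddr script configures the floating virtual IP that clients connect to
drbddisk::r0: the drbddisk script promotes the DRBD resource r0 to primary on start and demotes it to secondary on stop
Filesystem::/dev/drbd0::/store::ext4: the Filesystem script mounts and unmounts the DRBD device on /store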
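
Heartbeat starts the resources on a haresources line from left to right and stops them from right to left, and the fields after each :: are passed to the script as arguments ahead of the action, so drbddisk::r0 is run as drbddisk r0 start. The sketch below only illustrates that ordering by splitting the line from this post; start_order and stop_order are made-up names.

```shell
#!/bin/bash
# The haresources line used in this post
line='drbd1.corp.com IPaddr::192.168.0.190/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/store::ext4 killnfsd'

# start_order LINE: drop the preferred node, print one resource per line
# in the order heartbeat starts them (left to right).
start_order() {
    local node rest
    read -r node rest <<<"$1"
    # unquoted on purpose: split the remaining fields on whitespace
    printf '%s\n' $rest
}

# stop_order LINE: the same resources in reverse, as used when releasing them.
stop_order() {
    start_order "$1" | awk '{ a[NR] = $0 } END { for (i = NR; i > 0; i--) print a[i] }'
}

start_order "$line"
```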
 
5. Create the killnfsd script, which restarts the NFS service: (node1, node2)

# vi /etc/ha.d/resource.d/killnfsd
#!/bin/bash
killall -9 nfsd; /etc/init.d/nfs restart; exit 0

Make the script executable (mode 755):

# chmod 755 /etc/ha.d/resource.d/killnfsd

II. Create the DRBD Resource Script drbddisk: (node1, node2)
 
Edit drbddisk and add the following script:

# vi /etc/ha.d/resource.d/drbddisk
#!/bin/bash
#
# This script is intended to be used as resource script by heartbeat
#
# Copyright 2003-2008 LINBIT Information Technologies
# Philipp Reisner, Lars Ellenberg
#
###

DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"

if [ -f $DEFAULTFILE ]; then
 . $DEFAULTFILE
fi

if [ "$#" -eq 2 ]; then
 RES="$1"
 CMD="$2"
else
 RES="all"
 CMD="$1"
fi

## EXIT CODES
# since this is a "legacy heartbeat R1 resource agent" script,
# exit codes actually do not matter that much as long as we conform to
#  http://wiki.linux-ha.org/HeartbeatResourceAgent
# but it does not hurt to conform to lsb init-script exit codes,
# where we can.
#  http://refspecs.linux-foundation.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
####

drbd_set_role_from_proc_drbd()
{
    local out
    if ! test -e /proc/drbd; then
        ROLE="Unconfigured"
        return
    fi

    dev=$( $DRBDADM sh-dev $RES )
    minor=${dev#/dev/drbd}
    if [[ $minor = *[!0-9]* ]] ; then
        # sh-minor is only supported since drbd 8.3.1
        minor=$( $DRBDADM sh-minor $RES )
    fi
    if [[ -z $minor ]] || [[ $minor = *[!0-9]* ]] ; then
        ROLE=Unknown
        return
    fi

    if out=$(sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }" /proc/drbd); then
        set -- $out
        ROLE=${5%/**}
        : ${ROLE:=Unconfigured} # if it does not show up
    else
        ROLE=Unknown
    fi
}

case "$CMD" in
    start)
        # try several times, in case heartbeat deadtime
        # was smaller than drbd ping time
        try=6
        while true; do
            $DRBDADM primary $RES && break
            let "--try" || exit 1 # LSB generic error
            sleep 1
        done
        ;;
    stop)
        # heartbeat (haresources mode) will retry failed stop
        # for a number of times in addition to this internal retry.
        try=3
        while true; do
            $DRBDADM secondary $RES && break
            # We used to lie here, and pretend success for anything != 11,
            # to avoid the reboot on failed stop recovery for "simple
            # config errors" and such. But that is incorrect.
            # Don't lie to your cluster manager.
            # And don't do config errors...
            let --try || exit 1 # LSB generic error
            sleep 1
        done
        ;;
    status)
        if [ "$RES" = "all" ]; then
            echo "A resource name is required for status inquiries."
            exit 10
        fi
        ST=$( $DRBDADM role $RES )
        ROLE=${ST%/**}
        case $ROLE in
            Primary|Secondary|Unconfigured)
                # expected
                ;;
            *)
                # unexpected. whatever...
                # If we are unsure about the state of a resource, we need to
                # report it as possibly running, so heartbeat can, after failed
                # stop, do a recovery by reboot.
                # drbdsetup may fail for obscure reasons, e.g. if /var/lock/ is
                # suddenly readonly.  So we retry by parsing /proc/drbd.
                drbd_set_role_from_proc_drbd
        esac
        case $ROLE in
            Primary)
                echo "running (Primary)"
                exit 0 # LSB status "service is OK"
                ;;
            Secondary|Unconfigured)
                echo "stopped ($ROLE)"
                exit 3 # LSB status "service is not running"
                ;;
            *)
                # NOTE the "running" in below message.
                # this is a "heartbeat" resource script,
                # the exit code is _ignored_.
                echo "cannot determine status, may be running ($ROLE)"
                exit 4 #  LSB status "service status is unknown"
                ;;
        esac
        ;;
    *)
        echo "Usage: drbddisk [resource] {start|stop|status}"
        exit 1
        ;;
esac

exit 0

Make the script executable (mode 755):

# chmod 755 /etc/ha.d/resource.d/drbddisk
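
The status branch of drbddisk depends on two small pieces of shell: the ${ST%/**} expansion, which keeps only the local role from a value such as Primary/Secondary, and the sed fallback that reads the role from /proc/drbd. The sketch below isolates both so they can be tried without a running DRBD; the function names and the sample /proc/drbd line are illustrative assumptions.

```shell
#!/bin/bash
# local_role ROLES: keep the part before the slash, as the status branch does.
local_role() {
    printf '%s\n' "${1%/**}"
}

# role_from_proc MINOR: read /proc/drbd-style text on stdin and print the
# local role of the given minor, using the same sed expression as drbddisk.
role_from_proc() {
    local minor="$1" out
    out=$(sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }")
    # after the substitution the fields are: minor cs <state> ro <roles> ...
    set -- $out
    printf '%s\n' "${5%/**}"
}

local_role "Primary/Secondary"
# Illustrative line in the /proc/drbd format, not captured from a real node
echo ' 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----' | role_from_proc 0
```

On a configured node the script itself can be called the way heartbeat calls it, for example /etc/ha.d/resource.d/drbddisk r0 status.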

III. Start the Heartbeat Service
 
Start the heartbeat service on both nodes, beginning with node1: (node1, node2)

# service heartbeat start
# chkconfig heartbeat on

If the virtual IP 192.168.0.190 now answers ping from another machine, the configuration is working.
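
A successful ping shows that some node answers on the virtual IP, but not which one. To see which node currently holds the address, inspect the interface on each node. The sketch below is one way to do that, assuming the iproute2 ip command is installed; has_vip is a made-up helper, and the sample line is illustrative rather than captured from these machines.

```shell
#!/bin/bash
# has_vip VIP: succeed if the `ip -o -4 addr show` output on stdin lists VIP.
has_vip() {
    grep -qF "inet $1/"
}

# On a node:
#   ip -o -4 addr show dev eth0 | has_vip 192.168.0.190 && echo "this node is active"

# Illustrative line in the format printed by `ip -o -4 addr show`
sample='2: eth0    inet 192.168.0.190/24 brd 192.168.0.255 scope global secondary eth0:0'
if printf '%s\n' "$sample" | has_vip 192.168.0.190; then
    echo "VIP present"
fi
```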
 
IV. Configure NFS: (node1, node2)
 
Edit the exports file and add the following entry:

# vi /etc/exports
/store        *(rw,no_root_squash)

Restart the NFS services:

# service rpcbind restart
# service nfs restart
# chkconfig rpcbind on
# chkconfig nfs off

Note: NFS is deliberately not enabled at boot, because the /etc/ha.d/resource.d/killnfsd script starts it when heartbeat brings up the resources.
 
V. Test High Availability
 
1. Normal switchover
Mount the NFS share on a client:

# mount -t nfs 192.168.0.190:/store /tmp

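Stop the heartbeat service on the primary node node1 to simulate a switchover. The standby node2 takes over as soon as node1 releases its resources, and reads and writes on the client's NFS mount keep working.
 
DRBD status on the standby node2 at this point: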

# service drbd status
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@drbd2.corp.com, 2015-05-12 21:05:41
m:res  cs         ro                 ds                 p  mounted     fstype
0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /store      ext4
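
To put a number on the interruption instead of checking reads and writes by hand, the client can write a timestamp to the share once per second while the switchover happens. This is a minimal sketch, assuming the share is mounted on /tmp as above; probe_write, probe_loop, and the file name ha-probe.log are arbitrary choices for illustration.

```shell
#!/bin/bash
# probe_write DIR: append a timestamp to DIR/ha-probe.log; non-zero on failure.
probe_write() {
    date '+%F %T' >> "$1/ha-probe.log"
}

# probe_loop DIR COUNT: one write attempt per second, reporting failed attempts.
probe_loop() {
    local dir="$1" n="$2" i
    for ((i = 0; i < n; i++)); do
        probe_write "$dir" 2>/dev/null || echo "write failed at $(date '+%T')"
        sleep 1
    done
}

# On the client, started just before stopping heartbeat on node1:
#   probe_loop /tmp 60
```

With the default hard mount, a write during the switchover usually blocks rather than fails, so the outage shows up as a gap between consecutive timestamps in the log.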

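2. Failover after a crash
Force a shutdown by cutting the power to node1.
 
node2 takes over once the 5-second deadtime expires, and reads and writes on the client's NFS mount keep working.
 
DRBD status on node2 at this point (the peer shows as Unknown because node1 is down):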

# service drbd status
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@drbd2.corp.com, 2015-05-12 21:05:41
m:res  cs         ro                 ds                 p  mounted     fstype
0:r0   Connected  Primary/Unknown    UpToDate/DUnknown  C  /store      ext4
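
The status tables above can also be checked from a script, for example in a monitoring job that confirms the surviving node really is Primary. The sketch below parses the device rows of the service drbd status output; drbd_rows and is_primary are made-up names, and the sample text is copied from the first status listing in this post.

```shell
#!/bin/bash
# drbd_rows: read `service drbd status` output on stdin and print
# "resource connection-state roles disk-states" for every device row.
drbd_rows() {
    awk '$1 ~ /^[0-9]+:/ { split($1, a, ":"); print a[2], $2, $3, $4 }'
}

# is_primary RES: succeed if RES is listed with the local role Primary.
is_primary() {
    drbd_rows | awk -v res="$1" '$1 == res && $3 ~ /^Primary\// { found = 1 } END { exit !found }'
}

# Sample copied from the status output after the normal switchover
status='m:res  cs         ro                 ds                 p  mounted     fstype
0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /store      ext4'

printf '%s\n' "$status" | drbd_rows
```

On a live node the same check would be service drbd status | is_primary r0.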