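Orchestrator is a MySQL high-availability and replication topology management tool written in Go. It supports refactoring replication topologies, automatic failover, and manual master-replica switchover. Its web interface displays the topology and status of MySQL replication and can also be used to change replication relationships and some configuration settings. A command-line client and an HTTP API are provided as well, which simplifies operations. Compared with MHA, its most important advantage is that it removes the management node as a single point of failure: Orchestrator uses the Raft protocol to keep itself highly available.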

Install the MySQL database

See MySQL Installation and Deployment (Part 1).

Install the Orchestrator service

Download Orchestrator
Download the orchestrator and orchestrator-client RPM packages.

Install dependencies

$ yum install lib64onig2-5.9.2-4-mdv2012.0.x86_64 -y
$ yum install jq-1.5-1.el7.x86_64 -y

Install Orchestrator

$ rpm -ivh orchestrator-3.1.2-1.x86_64.rpm
$ rpm -ivh orchestrator-client-3.1.2-1.x86_64.rpm

Configure environment variables

$ echo 'export PATH=$PATH:/usr/local/orchestrator' >> ~/.bash_profile
$ echo 'export ORCHESTRATOR_API="localhost:18888/api"' >> ~/.bash_profile
$ source ~/.bash_profile
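
In a Raft deployment, orchestrator-client can be given several API endpoints separated by spaces in ORCHESTRATOR_API, and it then works out which node is the leader. The following is a minimal sketch of building such a value. The helper name build_orchestrator_api is hypothetical, and the port and node addresses are assumed to match the ListenAddress and RaftNodes values used in this article's configuration file.

```shell
# Hypothetical helper (not part of Orchestrator): print a space-separated
# list of API endpoints, one per Raft node.
# Usage: build_orchestrator_api PORT NODE [NODE ...]
build_orchestrator_api() {
    local port="$1"
    shift
    local result=""
    local node
    for node in "$@"; do
        # Append a separating space only when result is already non-empty.
        result="${result:+$result }http://${node}:${port}/api"
    done
    echo "$result"
}

# Example (assumed values, taken from the configuration file in this article):
# export ORCHESTRATOR_API="$(build_orchestrator_api 18888 10.0.137.103 10.0.137.104 10.0.137.105)"
```

Listing every node avoids pointing the client at a follower that cannot serve write requests.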

Configure the database
Orchestrator keeps its own configuration and state data in a backend database, which can be MySQL or SQLite. MySQL is used here.

root@(none) 15:14> create database orchestrator;
root@(none) 15:14> grant all on orchestrator.* to 'orchestrator'@'10.0.%' identified by 'Abcd123#';
root@(none) 15:14> flush privileges;

Configure the parameter file

$ cat orchestrator.conf.json
{
"Debug":true,
"ListenAddress":":18888",
"MySQLTopologyUser":"admin",
"MySQLTopologyPassword":"Abcd123#",
"MySQLOrchestratorHost": "10.0.137.103",
"MySQLOrchestratorPort": 33006,
"MySQLOrchestratorDatabase": "orchestrator",
"MySQLOrchestratorUser": "orchestrator",
"MySQLOrchestratorPassword": "Abcd123#",
"RecoverMasterClusterFilters": ["*"],
"RecoverIntermediateMasterClusterFilters": ["*"],
"FailureDetectionPeriodBlockMinutes": 60,
"RecoveryPeriodBlockSeconds": 3600,
"RaftEnabled": true,
"RaftDatadir": "/var/lib/orchestrator",
"RaftBind": "10.0.137.103",
"DefaultRaftPort": 10008,
"RaftNodes": ["10.0.137.103", "10.0.137.104", "10.0.137.105"]
}
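  • ListenAddress: the address and port on which the web console listens
  • MySQLTopologyUser: the user Orchestrator uses to connect to the managed MySQL instances
  • MySQLTopologyPassword: the password of that user
  • RecoverMasterClusterFilters: the clusters for which automatic master failover is enabled; ["*"] matches all clusters
  • RecoverIntermediateMasterClusterFilters: the clusters for which automatic intermediate master failover is enabled; ["*"] matches all clusters
  • FailureDetectionPeriodBlockMinutes and RecoveryPeriodBlockSeconds are both set to one hour here. This means that for one hour after a failover, a further failure of the master is not detected again and does not trigger another failover.
  • Orchestrator achieves its own high availability through the Raft protocol, so the Raft parameters must be set: enable Raft, then set this node's bind address and the list of member nodes.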
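
The one-hour block window described above can be illustrated with a little arithmetic. This is only a sketch of the idea, not Orchestrator's implementation: the function names are hypothetical, timestamps are plain epoch seconds, and the exact boundary behavior inside Orchestrator is an assumption.

```shell
# Hypothetical helpers illustrating a recovery block window.
# recovery_block_remaining LAST_RECOVERY_EPOCH NOW_EPOCH BLOCK_SECONDS
# Prints how many seconds of the block window are left (0 if it has expired).
recovery_block_remaining() {
    local last="$1" now="$2" block="$3"
    local remaining=$(( last + block - now ))
    [ "$remaining" -gt 0 ] || remaining=0
    echo "$remaining"
}

# recovery_blocked LAST_RECOVERY_EPOCH NOW_EPOCH BLOCK_SECONDS
# Succeeds (exit status 0) while a new recovery would still be blocked.
recovery_blocked() {
    [ "$(recovery_block_remaining "$1" "$2" "$3")" -gt 0 ]
}

# Example with RecoveryPeriodBlockSeconds = 3600:
# recovery_block_remaining 1000 2000 3600   # 2600 seconds still blocked
# recovery_block_remaining 1000 5000 3600   # 0, the window has passed
```

A shorter window allows repeated failovers sooner, at the cost of a higher risk of flapping between masters.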

Start the web console

$ cd /usr/local/orchestrator && ./orchestrator --config=./orchestrator.conf.json http &

Add a backend database node

orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json -c discover -i t-luhx01-v-szzb:33006

Open the web console (http://ip:18888)
Orchestrator_1

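Shut down the master to test automatic failover (the old master is isolated and has to be re-added manually). After an automatic failover, the recovery record must be acknowledged manually. Otherwise subsequent failovers are blocked and the following error appears: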
Orchestrator_2

To acknowledge it manually, open the record in the web console under Audit -> Recovery
Orchestrator_3

It can also be acknowledged with the following command

$ orchestrator-client -c ack-all-recoveries --reason='yes'

VIP switchover script
Add the following entries to the PostFailoverProcesses section of the Orchestrator configuration file

"PostFailoverProcesses": [
"echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log",
"/usr/local/bin/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
],

orch_hook.sh (remember to replace the VIP and the network interface)

#!/bin/bash


isitdead=$1
cluster=$2
oldmaster=$3
newmaster=$4
mysqluser="orchestrator"
export MYSQL_PWD="xxxpassxxx"

logfile="/var/log/orch_hook.log"

# list of clusternames
clusternames=(t-luhx01-v-szzb t-luhx02-v-szzb t-luhx03-v-szzb)

# one array per cluster, named after the cluster alias: ( interface VIP ssh_user intermediate_master_VIP )
luhxdb=( ens192 "10.0.139.201" root "10.0.139.201")

if [[ $isitdead == "DeadMaster" ]]; then

array=$cluster
interface=$array[0]
IP=$array[1]
user=$array[2]

if [ -n "${!IP}" ] ; then

echo $(date)
echo "Recovering from: $isitdead"
echo "New master is: $newmaster"
echo "/usr/local/bin/orch_vip.sh -d 1 -n $newmaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee -a $logfile
/usr/local/bin/orch_vip.sh -d 1 -n $newmaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster

else

echo "Cluster does not exist!" | tee -a $logfile

fi
elif [[ $isitdead == "DeadIntermediateMasterWithSingleSlaveFailingToConnect" ]]; then

array=$cluster
interface=$array[0]
IP=$array[3]
user=$array[2]
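# $5 is expected to carry the replica host:port; the PostFailoverProcesses entry
# shown earlier passes only four arguments, so add a fifth there if this branch is needed
slavehost=`echo $5 | cut -d":" -f1`

echo $(date)
echo "Recovering from: $isitdead"
echo "New intermediate master is: $slavehost"
echo "/usr/local/bin/orch_vip.sh -d 1 -n $slavehost -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee -a $logfile
/usr/local/bin/orch_vip.sh -d 1 -n $slavehost -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster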


elif [[ $isitdead == "DeadIntermediateMaster" ]]; then

array=$cluster
interface=$array[0]
IP=$array[3]
user=$array[2]
slavehost=`echo $5 | sed -E "s/:[0-9]+//g" | sed -E "s/,/ /g"`
showslave=`mysql -h$newmaster -u$mysqluser -sN -e "SHOW SLAVE HOSTS;" | awk '{print $2}'`
newintermediatemaster=`echo $slavehost $showslave | tr ' ' '\n' | sort | uniq -d`

echo $(date)
echo "Recovering from: $isitdead"
echo "New intermediate master is: $newintermediatemaster"
echo "/usr/local/bin/orch_vip.sh -d 1 -n $newintermediatemaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee -a $logfile
/usr/local/bin/orch_vip.sh -d 1 -n $newintermediatemaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster

fi
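
orch_hook.sh relies on two Bash idioms that are easy to miss: indirect expansion, which turns the cluster alias passed by Orchestrator into a lookup in the array of the same name, and sort | uniq -d, which finds the host that appears in two lists. The sketch below isolates both so they can be tried without Orchestrator. The function names are hypothetical, and the array values are the ones used in the script above.

```shell
#!/bin/bash
# One array per cluster alias: ( interface VIP ssh_user intermediate_master_VIP )
luhxdb=( ens192 "10.0.139.201" root "10.0.139.201" )

# lookup_cluster_field ALIAS INDEX
# Builds the string "ALIAS[INDEX]" and resolves it with indirect expansion,
# which is what interface=$array[0] followed by ${!interface} does in the hook.
# Prints an empty string when no array with that name exists.
lookup_cluster_field() {
    local ref="$1[$2]"
    echo "${!ref:-}"
}

# strip_ports "host:port,host:port"
# Removes the ports and turns the commas into spaces.
strip_ports() {
    echo "$1" | sed -E 's/:[0-9]+//g; s/,/ /g'
}

# common_hosts "LIST_A" "LIST_B"
# Prints the hosts present in both whitespace-separated lists.
# Assumes neither list contains duplicates of its own.
common_hosts() {
    # The arguments are deliberately unquoted so each host lands on its own line.
    printf '%s\n' $1 $2 | sort | uniq -d
}

# Example:
# lookup_cluster_field luhxdb 0                       # interface of cluster luhxdb
# common_hosts "$(strip_ports "db1:33006,db2:33006")" "db2 db3"
```

Because the alias is used as a variable name, it has to be a valid Bash identifier (letters, digits and underscores) and must match the array name exactly, including case.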

orch_vip.sh

#!/bin/bash

emailaddress="email@example.com"
sendmail=0

function usage {
cat << EOF
usage: $0 [-h] [-d master is dead] [-o old master] [-s ssh options] [-n new master] [-i interface] [-I virtual IP] [-u SSH user]

OPTIONS:
-h Show this message
-o string Old master hostname or IP address
-d int 1 if the master is dead, otherwise 0
-s string SSH options
-n string New master hostname or IP address
-i string Interface, for example eth0:1
-I string Virtual IP
-u string SSH user
EOF

}

while getopts ho:d:s:n:i:I:u: flag; do
case $flag in
o)
orig_master="$OPTARG";
;;
d)
isitdead="${OPTARG}";
;;
s)
ssh_options="${OPTARG}";
;;
n)
new_master="$OPTARG";
;;
i)
interface="$OPTARG";
;;
I)
vip="$OPTARG";
;;
u)
ssh_user="$OPTARG";
;;
h)
usage;
exit 0;
;;
*)
usage;
exit 1;
;;
esac
done


if [ $OPTIND -eq 1 ]; then
echo "No options were passed";
usage;
exit 1;
fi

shift $(( OPTIND - 1 ));

# discover commands from our path
ssh=$(which ssh)
arping=$(which arping)
ip2util=$(which ip)

# command for adding our vip
cmd_vip_add="sudo -n $ip2util address add ${vip} dev ${interface}"
# command for deleting our vip
cmd_vip_del="sudo -n $ip2util address del ${vip}/32 dev ${interface}"
# command for discovering if our vip is enabled
cmd_vip_chk="sudo -n $ip2util address show dev ${interface} to ${vip%/*}/32"
# command for sending gratuitous arp to announce ip move
cmd_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*} "
# command for sending gratuitous arp to announce ip move on current server
cmd_local_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*} "

vip_stop() {
rc=0

# ensure the vip is removed
$ssh ${ssh_options} -tt ${ssh_user}@${orig_master} \
"[ -n \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_del} && sudo ${ip2util} route flush cache || [ -z \"\$(${cmd_vip_chk})\" ]"
rc=$?
return $rc
}

vip_start() {
rc=0

# ensure the vip is added
# this command should exit with failure if we are unable to add the vip
# if the vip already exists always exit 0 (whether or not we added it)
$ssh ${ssh_options} -tt ${ssh_user}@${new_master} \
"[ -z \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_add} && ${cmd_arp_fix} || [ -n \"\$(${cmd_vip_chk})\" ]"
rc=$?
$cmd_local_arp_fix
return $rc
}

vip_status() {
$arping -c 1 -I ${interface} ${vip%/*}
if ping -c 1 -W 1 "$vip"; then
return 0
else
return 1
fi
}

if [[ $isitdead == 0 ]]; then
echo "Online failover"
if vip_stop; then
if vip_start; then
echo "$vip is moved to $new_master."
if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi
else
echo "Can't add $vip on $new_master!"
if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi
else
echo $rc
echo "Can't remove the $vip from $orig_master!"
if [ $sendmail -eq 1 ]; then mail -s "Can't remove the $vip from $orig_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi


elif [[ $isitdead == 1 ]]; then
echo "Master is dead, failover"
# make sure the vip is not available
if vip_status; then
if vip_stop; then
if [ $sendmail -eq 1 ]; then mail -s "$vip is removed from $orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
else
if [ $sendmail -eq 1 ]; then mail -s "Couldn't remove $vip from $orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi
fi

if vip_start; then
echo "$vip is moved to $new_master."
if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi

else
echo "Can't add $vip on $new_master!"
if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi
else
echo "Wrong argument, the master is dead or live?"
exit 1
fi

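Note: this script only moves the VIP during a switchover, so the VIP must be attached manually the first time. In addition, switch the master over to each node in turn and set the same cluster alias each time, so that the cluster carries the same alias whichever node is the master. The alias here is luhxdb; it must match the array name in orch_hook.sh exactly, because Bash variable names are case-sensitive.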
orchestrator_4
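
Two details of orch_vip.sh are worth isolating. The expansion ${vip%/*} strips an optional prefix length from the VIP, and vip_start and vip_stop use a "check, act, check again" chain so that running them twice is harmless. The sketch below reproduces both with a simulated interface state instead of real ip and ssh calls. All function and variable names here are hypothetical.

```shell
# vip_without_prefix VIP
# Strips a trailing /prefix-length if present, like ${vip%/*} in orch_vip.sh.
vip_without_prefix() {
    echo "${1%/*}"
}

# Simulated state standing in for "ip address show" and "ip address add".
VIP_PRESENT=0
ADD_CALLS=0
vip_check() { [ "$VIP_PRESENT" -eq 1 ]; }
vip_add()   { VIP_PRESENT=1; ADD_CALLS=$((ADD_CALLS + 1)); }

# ensure_vip_present
# Adds the VIP only when it is absent, and succeeds whenever the VIP ends up
# present, whether or not this call was the one that added it.
ensure_vip_present() {
    { ! vip_check && vip_add; } || vip_check
}

# Example:
# vip_without_prefix 10.0.139.201/32   # prints the address without /32
# ensure_vip_present; ensure_vip_present   # the second call adds nothing
```

The same shape, with the check inverted, gives the idempotent removal used by vip_stop.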

Appendix

Command line and API
List all clusters

orchestrator-client -c clusters

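List all cluster aliases

orchestrator-client -c clusters-alias

Discover an instance

orchestrator-client -c discover -i t-luhx01-v-szzb:33006

Forget an instance

orchestrator-client -c forget -i t-luhx02-v-szzb:33006

Print the topology of a given cluster

orchestrator-client -c topology-tabulated -i t-luhx03-v-szzb:33006

Show which API endpoint is in use

orchestrator-client -c which-api

Search for instances

orchestrator-client -c search -i luhx

Print the healthy replicas in a cluster that can serve as control replicas for pt-online-schema-change

orchestrator-client -c which-cluster-osc-running-replicas -i luhxdb

Submit the cluster masters to the KV stores, which can be used for automatic service discovery

orchestrator-client -c submit-masters-to-kv-stores

Relocate a replica under another instance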

orchestrator-client -c relocate -i t-luhx01-v-szzb:33006 -d t-test-v-szzb:33006

Relocate all replicas of one instance under another instance

orchestrator-client -c relocate-replicas -i t-luhx01-v-szzb:33006 -d t-test-v-szzb:33006

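Set up master-master (co-master) replication

orchestrator-client -c make-co-master -i t-luhx01-v-szzb:33006

Raise an instance's promotion priority so that it is preferred as the new master during failover (valid for one hour)

orchestrator-client -c register-candidate -i t-luhx02-v-szzb:33006 --promotion-rule prefer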

Stop replication on an instance

orchestrator-client -c stop-replica-nice -i t-luhx02-v-szzb:33006

Restart replication on an instance

orchestrator-client -c restart-replica -i t-luhx02-v-szzb:33006

Run recovery manually, specifying a failed instance

orchestrator-client -c recover -i t-luhx01-v-szzb:33006

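Graceful master switchover

orchestrator-client -c graceful-master-takeover -a t-luhx01-v-szzb:33006 -d t-luhx03-v-szzb:33006

Manual forced failover

orchestrator-client -c force-master-failover -i t-luhx01-v-szzb:33006

Forcibly discard the current master and promote a designated instance: the old master is left standalone and the designated instance becomes the new master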

orchestrator-client -c force-master-takeover -i t-luhx01-v-szzb:33006 -d t-luhx02-v-szzb:33006

Acknowledge all cluster recoveries, giving a reason

orchestrator-client -c ack-all-recoveries --reason='yes'
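
The instances in this article listen on 33006 rather than MySQL's usual 3306, so it is easy to forget the port in an -i argument. A small wrapper can append it when it is missing. This is a sketch only: normalize_instance is a hypothetical helper, and the fallback of 3306 when no default is given is an assumption.

```shell
# normalize_instance INSTANCE [DEFAULT_PORT]
# Prints INSTANCE unchanged if it already contains a port,
# otherwise appends DEFAULT_PORT (3306 when not given).
normalize_instance() {
    local instance="$1"
    local default_port="${2:-3306}"
    case "$instance" in
        *:*) echo "$instance" ;;
        *)   echo "${instance}:${default_port}" ;;
    esac
}

# Example (host name from this article):
# orchestrator-client -c restart-replica -i "$(normalize_instance t-luhx02-v-szzb 33006)"
```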

Orchestrator Hook
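① "OnFailureDetectionProcesses": [] - runs when a failure is detected
② "PreGracefulTakeoverProcesses": [] - runs before the master is made read-only
③ "PreFailoverProcesses": [] - runs before the recovery actions are taken
④ "PostMasterFailoverProcesses": [] - runs at the end of a successful master recovery
⑤ "PostFailoverProcesses": [] - runs at the end of any successful recovery
⑥ "PostUnsuccessfulFailoverProcesses": [] - runs at the end of any unsuccessful recovery
⑦ "PostIntermediateMasterFailoverProcesses": [] - runs at the end of a successful intermediate master recovery
⑧ "PostGracefulTakeoverProcesses": [] - runs after the old master has been placed below the newly promoted master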

Scenario 1: the master crashes and failover is automatic
① -> ① -> ③ -> ④ -> ⑤

Scenario 2: graceful master switchover
② -> ① -> ③ -> ④ -> ⑤ -> ⑧

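Scenario 3: manual recovery. When a replica is down or in maintenance mode, a master crash does not trigger failover, and recovery has to be run manually
① -> ① -> ③ -> ④ -> ⑤

Scenario 4: manual forced recovery
① -> ③ -> ① -> ④ -> ⑤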
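
One way to observe these sequences on a test cluster is to point every hook at the same small logging script and read the resulting file after each scenario. The following is a minimal sketch under stated assumptions: the function names, the log path, and the HOOK_LOG variable are hypothetical, and the arguments are expected to be filled from placeholders such as {failureType}, {failureClusterAlias}, {failedHost} and {successorHost} used earlier in this article.

```shell
# format_hook_line HOOK FAILURE_TYPE CLUSTER [FAILED_HOST] [SUCCESSOR_HOST]
# Builds one log line; missing hosts are shown as "none".
format_hook_line() {
    printf '[%s] type=%s cluster=%s failed=%s successor=%s\n' \
        "$1" "$2" "$3" "${4:-none}" "${5:-none}"
}

# log_hook_event HOOK FAILURE_TYPE CLUSTER [FAILED_HOST] [SUCCESSOR_HOST]
# Prepends a timestamp and appends the line to the log file.
# The default path is an assumption; override it with HOOK_LOG.
log_hook_event() {
    local logfile="${HOOK_LOG:-/tmp/orch_hooks.log}"
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(format_hook_line "$@")" >> "$logfile"
}

# In a hook script, the last line would be:
# log_hook_event "$@"
```

Each hook entry would pass its own name as the first argument, so the log shows the order in which the hooks actually fired.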
