MySQL Installation and Deployment: Orchestrator

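Orchestrator is a MySQL high-availability and replication topology management tool written in Go. It supports rearranging the replication topology, automatic failover, and manual master/replica switchover. Its web interface shows the MySQL replication topology and its state, and lets you change replication relationships and some configuration settings. It also provides a command line and an HTTP API for day-to-day operations. Compared with MHA, its most important advantage is that it removes the single point of failure of the manager node: Orchestrator keeps itself highly available through the Raft protocol.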

Install the MySQL database

See MySQL Installation and Deployment (Part 1).

Install the Orchestrator service

Download Orchestrator

Install dependencies

$ yum install lib64onig2-5.9.2-4-mdv2012.0.x86_64 -y
$ yum install jq-1.5-1.el7.x86_64 -y

Install Orchestrator

$ rpm -ivh orchestrator-3.1.2-1.x86_64.rpm
$ rpm -ivh orchestrator-client-3.1.2-1.x86_64.rpm

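Configure environment variables

$ echo 'export PATH=$PATH:/usr/local/orchestrator' >> ~/.bash_profile
$ echo 'export ORCHESTRATOR_API="http://localhost:18888/api"' >> ~/.bash_profile
$ source ~/.bash_profile

The port in ORCHESTRATOR_API must match ListenAddress in the configuration file (18888 in this setup).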

Configure the database
Orchestrator stores its data in a backend database, either MySQL or SQLite. This setup uses MySQL.

root@(none) 15:14> create database orchestrator;
root@(none) 15:14> grant all on orchestrator.* to 'orchestrator'@'10.0.%' identified by 'Abcd123#';
root@(none) 15:14> flush privileges;

Configure the parameter file

$ cat orchestrator.conf.json
{
  "Debug":true,
  "ListenAddress":":18888",
  "MySQLTopologyUser":"admin",
  "MySQLTopologyPassword":"Abcd123#",
  "MySQLOrchestratorHost": "10.0.137.103",
  "MySQLOrchestratorPort": 33006,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orchestrator",
  "MySQLOrchestratorPassword": "Abcd123#",
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPeriodBlockSeconds": 3600,
  "RaftEnabled": true,
  "RaftDataDir": "/var/lib/orchestrator",
  "RaftBind": "10.0.137.103",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["10.0.137.103", "10.0.137.104", "10.0.137.105"]
}

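ListenAddress: port of the web console and HTTP API
MySQLTopologyUser: user Orchestrator uses to connect to the managed MySQL instances
MySQLTopologyPassword: password of that user
RecoverMasterClusterFilters: clusters for which automatic master failover is enabled ("*" matches every cluster)
RecoverIntermediateMasterClusterFilters: clusters for which automatic intermediate master failover is enabled ("*" matches every cluster)
FailureDetectionPeriodBlockMinutes and RecoveryPeriodBlockSeconds are both set to one hour: for one hour after a failover, another failure of the master in the same cluster is not acted on and does not trigger a new failover.
Orchestrator's own high availability relies on the Raft protocol, so the Raft parameters must be set to enable Raft and to list the member nodes. The leader is elected among these members.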
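
As a rough illustration of why the RaftNodes list has three members: Raft only makes progress while a majority of the nodes can reach each other. The sketch below computes the majority size and the number of node failures a group can absorb. The function names are hypothetical helpers for this note, not part of Orchestrator.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Orchestrator): majority arithmetic for a Raft group.

# Smallest number of nodes that forms a majority of $1 nodes.
raft_quorum() {
  local nodes=$1
  echo $(( nodes / 2 + 1 ))
}

# Number of nodes that may fail while a majority is still reachable.
raft_tolerated_failures() {
  local nodes=$1
  echo $(( nodes - (nodes / 2 + 1) ))
}

# The RaftNodes list in this setup has three members.
echo "quorum=$(raft_quorum 3) tolerated_failures=$(raft_tolerated_failures 3)"
```

With three nodes the group survives the loss of one node; a two-node group would tolerate none, which is why an odd number of members is the usual choice.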

Start the web console

$ cd /usr/local/orchestrator && ./orchestrator --config=./orchestrator.conf.json http &
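
Because the service is started in the background, it can help to wait until it answers before running client commands. The sketch below retries an arbitrary probe command. The helper name, the retry values and the probe itself (curl against the health endpoint on port 18888) are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: retry a probe command until it succeeds or attempts run out.
# usage: wait_until <attempts> <sleep_seconds> <command> [args...]
wait_until() {
  local attempts=$1 delay=$2
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0          # probe succeeded
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  return 1              # probe never succeeded
}

# Assumed probe: the health endpoint on the ListenAddress port from the config.
# wait_until 30 1 curl -fsS http://localhost:18888/api/health && echo "orchestrator is up"
```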

Add the backend MySQL instances

orchestrator --config=/usr/local/orchestrator/orchestrator.conf.json -c discover -i t-luhx01-v-szzb:33006

Open the web console (http://ip:18888)
[Image: Orchestrator_1]

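Shut down the master to test automatic failover. The old master is isolated and has to be rejoined to the topology manually. After an automatic failover, the recovery record must be acknowledged manually; otherwise later failovers are blocked and the following error appears:
[Image: Orchestrator_2]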

To acknowledge manually, open Audit -> Recovery in the web console and review the records.
[Image: Orchestrator_3]

Alternatively, acknowledge with the following command:

$ orchestrator-client -c ack-all-recoveries --reason='yes'

VIP switchover script
In the Orchestrator configuration file, set the PostFailoverProcesses section as follows:

"PostFailoverProcesses": [
    "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log",
    "/usr/local/bin/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
  ],
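
Orchestrator replaces placeholders such as {failureType} with actual values before it runs each command, so a hook script simply sees positional arguments. The stand-in below prints the four arguments in the order used by the second PostFailoverProcesses entry. The function name and the sample values are made up for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for a hook: print the four positional arguments it receives,
# in the order {failureType} {failureClusterAlias} {failedHost} {successorHost}.
describe_hook_args() {
  printf 'failureType=%s alias=%s failed=%s successor=%s\n' "$1" "$2" "$3" "$4"
}

# Sample values are illustrative only.
describe_hook_args DeadMaster luhxdb t-luhx01-v-szzb t-luhx02-v-szzb
```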

orch_hook.sh (replace the VIP and network interface with your own)

#!/bin/bash


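# Arguments passed by Orchestrator through PostFailoverProcesses
isitdead=$1      # {failureType}
cluster=$2       # {failureClusterAlias}
oldmaster=$3     # {failedHost}
newmaster=$4     # {successorHost}
mysqluser="orchestrator"
export MYSQL_PWD="xxxpassxxx"

logfile="/var/log/orch_hook.log"

# list of clusternames
clusternames=(t-luhx01-v-szzb t-luhx02-v-szzb t-luhx03-v-szzb)

# clustername=( interface IP user Inter_IP)
# The array name must equal the cluster alias received in $2
luhxdb=( ens192 "10.0.139.201" root "10.0.139.201")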

if [[ $isitdead == "DeadMaster" ]]; then

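	# Build references such as "luhxdb[0]"; ${!name} below expands them
	# indirectly to the matching element of the array named after the cluster.
	array=$cluster
	interface="${array}[0]"
	IP="${array}[1]"
	user="${array}[2]"

	if [ -n "${!IP}" ] ; then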

		echo $(date)
		echo "Recovering from: $isitdead"
		echo "New master is: $newmaster"
		echo "/usr/local/bin/orch_vip.sh -d 1 -n $newmaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee -a $logfile
		/usr/local/bin/orch_vip.sh -d 1 -n $newmaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster

	else

		echo "Cluster does not exist!" | tee -a $logfile

	fi
elif [[ $isitdead == "DeadIntermediateMasterWithSingleSlaveFailingToConnect" ]]; then

	array=$cluster
	interface="${array}[0]"
	IP="${array}[3]"
	user="${array}[2]"
	# NOTE: $5 is not supplied by the PostFailoverProcesses entry shown earlier,
	# which passes only four arguments.
	slavehost=$(echo "$5" | cut -d":" -f1)

	echo $(date)
	echo "Recovering from: $isitdead"
	echo "New intermediate master is: $slavehost"
	echo "/usr/local/bin/orch_vip.sh -d 1 -n $slavehost -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee -a $logfile
	/usr/local/bin/orch_vip.sh -d 1 -n $slavehost -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster


elif [[ $isitdead == "DeadIntermediateMaster" ]]; then

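	array=$cluster
	interface="${array}[0]"
	IP="${array}[3]"
	user="${array}[2]"
	# NOTE: $5 is not supplied by the PostFailoverProcesses entry shown earlier.
	# Strip the ports and split the comma-separated host list in $5
	slavehost=$(echo "$5" | sed -E "s/:[0-9]+//g" | sed -E "s/,/ /g")
	# Hosts currently registered as replicas of the new master
	showslave=$(mysql -h"$newmaster" -u"$mysqluser" -sN -e "SHOW SLAVE HOSTS;" | awk '{print $2}')
	# A host that appears in both lists is the new intermediate master
	newintermediatemaster=$(echo $slavehost $showslave | tr ' ' '\n' | sort | uniq -d)

	echo $(date)
	echo "Recovering from: $isitdead"
	echo "New intermediate master is: $newintermediatemaster"
	echo "/usr/local/bin/orch_vip.sh -d 1 -n $newintermediatemaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster" | tee -a $logfile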
	/usr/local/bin/orch_vip.sh -d 1 -n $newintermediatemaster -i ${!interface} -I ${!IP} -u ${!user} -o $oldmaster

fi
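
The hook finds the per-cluster settings through bash indirect expansion: the cluster alias is used as the name of an array, and ${!ref} with ref set to "name[index]" resolves to that array element. The sketch below isolates this lookup using the same array layout; cluster_field is a hypothetical helper, not part of the script above.

```shell
#!/bin/bash
# Same layout as in orch_hook.sh: clustername=( interface IP user Inter_IP )
luhxdb=( ens192 "10.0.139.201" root "10.0.139.201" )

# Hypothetical helper: print one field of the array named after the cluster alias.
# usage: cluster_field <alias> <index>
cluster_field() {
  local ref="$1[$2]"   # e.g. "luhxdb[1]"
  echo "${!ref}"       # indirect expansion resolves to ${luhxdb[1]}
}

cluster_field luhxdb 0     # interface
cluster_field luhxdb 1     # VIP
cluster_field unknown 1    # empty line: no array is defined under that name
```

An alias with no matching array yields an empty value, which is the case the script reports as "Cluster does not exist!".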

orch_vip.sh

#!/bin/bash

emailaddress="email@example.com"
sendmail=0

function usage {
  cat << EOF
 usage: $0 [-h] [-d master is dead] [-o old master ] [-s ssh options] [-n new master] [-i interface] [-I virtual IP] [-u SSH user]
 
 OPTIONS:
    -h        Show this message
    -o string Old master hostname or IP address 
    -d int    1 if the master is dead, otherwise 0
    -s string SSH options
    -n string New master hostname or IP address
    -i string Interface, for example eth0:1
    -I string Virtual IP
    -u string SSH user
EOF

}

while getopts ho:d:s:n:i:I:u: flag; do
  case $flag in
    o)
      orig_master="$OPTARG";
      ;;
    d)
      isitdead="${OPTARG}";
      ;;
    s)
      ssh_options="${OPTARG}";
      ;;
    n)
      new_master="$OPTARG";
      ;;
    i)
      interface="$OPTARG";
      ;;
    I)
      vip="$OPTARG";
      ;;
    u)
      ssh_user="$OPTARG";
      ;;
    h)
      usage;
      exit 0;
      ;;
    *)
      usage;
      exit 1;
      ;;
  esac
done


if [ $OPTIND -eq 1 ]; then 
    echo "No options were passed"; 
    usage;
    exit 1;
fi

shift $(( OPTIND - 1 ));

# discover commands from our path
ssh=$(which ssh)
arping=$(which arping)
ip2util=$(which ip)

# command for adding our vip
cmd_vip_add="sudo -n $ip2util address add ${vip} dev ${interface}"
# command for deleting our vip
cmd_vip_del="sudo -n $ip2util address del ${vip}/32 dev ${interface}"
# command for discovering if our vip is enabled
cmd_vip_chk="sudo -n $ip2util address show dev ${interface} to ${vip%/*}/32"
# command for sending gratuitous arp to announce ip move
cmd_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "
# command for sending gratuitous arp to announce ip move on current server
cmd_local_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*}   "

vip_stop() {
    rc=0

    # ensure the vip is removed
    $ssh ${ssh_options} -tt ${ssh_user}@${orig_master} \
    "[ -n \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_del} && sudo ${ip2util} route flush cache || [ -z \"\$(${cmd_vip_chk})\" ]"
    rc=$?
    return $rc
}

vip_start() {
    rc=0

    # ensure the vip is added
    # this command should exit with failure if we are unable to add the vip
    # if the vip already exists always exit 0 (whether or not we added it)
    $ssh ${ssh_options} -tt ${ssh_user}@${new_master} \
     "[ -z \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_add} && ${cmd_arp_fix} || [ -n \"\$(${cmd_vip_chk})\" ]"
    rc=$?
    $cmd_local_arp_fix
    return $rc
}

vip_status() {
    $arping -c 1 -I ${interface} ${vip%/*}   
    if ping -c 1 -W 1 "$vip"; then
        return 0
    else
        return 1
    fi
}

if [[ $isitdead == 0 ]]; then
    echo "Online failover"
    if vip_stop; then 
        if vip_start; then
            echo "$vip is moved to $new_master."
            if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
        else
            echo "Can't add $vip on $new_master!" 
            if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
            exit 1
        fi
    else
        echo $rc
        echo "Can't remove the $vip from orig_master!"
        if [ $sendmail -eq 1 ]; then mail -s "Can't remove the $vip from orig_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
        exit 1
    fi


elif [[ $isitdead == 1 ]]; then
    echo "Master is dead, failover"
    # make sure the vip is not available 
    if vip_status; then 
        if vip_stop; then
            if [ $sendmail -eq 1 ]; then mail -s "$vip is removed from orig_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
        else
            if [ $sendmail -eq 1 ]; then mail -s "Couldn't remove $vip from orig_master." "$emailaddress" < /dev/null &> /dev/null  ; fi
            exit 1
        fi
    fi

    if vip_start; then
          echo "$vip is moved to $new_master."
          if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null  ; fi

    else
          echo "Can't add $vip on $new_master!" 
          if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null  ; fi
          exit 1
    fi
else
    echo "Wrong argument, the master is dead or live?"
    exit 1

fi
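
vip_start and vip_stop both use a "check && act || re-check" chain so that they can be run repeatedly: the action runs only when needed, and the call still succeeds if the desired state is already in place. The sketch below reproduces that shape with simulated state instead of real ip commands. All function and variable names in it are stand-ins invented for illustration.

```shell
#!/bin/bash
# Simulated state instead of real "ip address" calls (assumption for illustration).
vip_present=""                               # empty means the VIP is not configured

vip_chk() { echo "$vip_present"; }           # stands in for cmd_vip_chk
vip_add() { vip_present="10.0.139.201"; }    # stands in for cmd_vip_add

# Same shape as vip_start: add only when missing, otherwise succeed if already there.
ensure_vip() {
  [ -z "$(vip_chk)" ] && vip_add || [ -n "$(vip_chk)" ]
}

ensure_vip; echo "first call rc=$?"
ensure_vip; echo "second call rc=$?"

# ${vip%/*} removes a trailing "/prefix" so the bare address can be passed to arping.
vip="10.0.139.201/32"
echo "${vip%/*}"
```

If the add step fails, the final re-check also fails and the chain returns non-zero, which is what lets the caller report that the VIP could not be added.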

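Note: this script only moves the VIP during a failover; the VIP has to be attached manually the first time. In addition, switch over to each node in turn and set the same cluster alias each time, so that the alias stays the same whichever node is the master. Here the alias is luhxdb, matching the array name in orch_hook.sh.
[Image: orchestrator_4]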
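
For the initial manual attach, the same ip and arping invocations that orch_vip.sh builds can be used on the current master. The sketch below only prints those commands (a dry run), because running them requires root privileges and a real interface. print_vip_attach is a hypothetical helper, and the interface and address are the values from the luhxdb array.

```shell
#!/bin/bash
# Dry-run sketch: print the commands for the initial VIP attach instead of running them.
# usage: print_vip_attach <interface> <vip>
print_vip_attach() {
  local interface=$1 vip=$2
  echo "sudo ip address add ${vip} dev ${interface}"
  echo "sudo arping -c 1 -I ${interface} ${vip}"
}

# Values taken from the luhxdb array in orch_hook.sh.
print_vip_attach ens192 10.0.139.201
```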

Appendix

Command line and API
List all clusters

orchestrator-client -c clusters

List all cluster aliases

orchestrator-client -c clusters-alias

Discover an instance

orchestrator-client -c discover -i t-luhx01-v-szzb:33006

Forget an instance

orchestrator-client -c forget -i t-luhx02-v-szzb:33006

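Print the topology of a given cluster

orchestrator-client -c topology-tabulated -i t-luhx03-v-szzb:33006

Show which API endpoint is in use

orchestrator-client -c which-api

Search for instances

orchestrator-client -c search -i luhx

Print the healthy replicas in a cluster that can be used for pt-online-schema-change operations

orchestrator-client -c which-cluster-osc-running-replicas -i luhxdb

Submit the cluster masters to the KV stores, which can be used for service discovery

orchestrator-client -c submit-masters-to-kv-stores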

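Relocate a replica under another instance

orchestrator-client -c relocate -i t-luhx01-v-szzb:33006 -d t-test-v-szzb:33006

Relocate all replicas of an instance under another instance

orchestrator-client -c relocate-replicas -i t-luhx01-v-szzb:33006 -d t-test-v-szzb:33006

Set up master-master (co-master) replication

orchestrator-client -c make-co-master -i t-luhx01-v-szzb:33006

Raise an instance's promotion priority so that it is preferred as the new master during a failover (valid for one hour)

orchestrator-client -c register-candidate -i t-luhx02-v-szzb:33006 --promotion-rule prefer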

Stop replication on a given instance

orchestrator-client -c stop-replica-nice -i t-luhx02-v-szzb

Restart replication on a given instance

orchestrator-client -c restart-replica -i t-luhx02-v-szzb

Run a recovery manually, specifying a failed instance

orchestrator-client -c recover -i t-luhx01-v-szzb:33006

Graceful master switchover

orchestrator-client -c graceful-master-takeover -a t-luhx01-v-szzb:33006 -d t-luhx03-v-szzb:33006

Force a failover manually

orchestrator-client -c force-master-failover -i t-luhx01-v-szzb:33006

Forcibly discard the master and designate an instance to promote; the old master is left detached and the designated instance becomes the new master

orchestrator-client -c force-master-takeover -i t-luhx01-v-szzb:33006 -d t-luhx02-v-szzb:33006

Acknowledge cluster recoveries, giving a reason

orchestrator-client -c ack-all-recoveries --reason='yes'

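Orchestrator hooks
① "OnFailureDetectionProcesses": [] - executed when a failure is detected
② "PreGracefulTakeoverProcesses": [] - executed before the master is made read-only
③ "PreFailoverProcesses": [] - executed before the recovery actions start
④ "PostMasterFailoverProcesses": [] - executed at the end of a successful master recovery
⑤ "PostFailoverProcesses": [] - executed at the end of any successful recovery
⑥ "PostUnsuccessfulFailoverProcesses": [] - executed at the end of any unsuccessful recovery
⑦ "PostIntermediateMasterFailoverProcesses": [] - executed at the end of a successful intermediate master recovery
⑧ "PostGracefulTakeoverProcesses": [] - executed after the old master has been placed under the newly promoted master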

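Scenario 1: the master fails and automatic failover takes place
① -> ① -> ③ -> ④ -> ⑤

Scenario 2: graceful master switchover
② -> ① -> ③ -> ④ -> ⑤ -> ⑧

Scenario 3: manual recovery. When the replicas are down or in maintenance mode, a master failure does not trigger a failover and the recovery has to be run manually
① -> ① -> ③ -> ④ -> ⑤

Scenario 4: forced manual failover
① -> ③ -> ① -> ④ -> ⑤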
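
One way to observe the hook order in your own environment is to have every hook append its name to a shared log and then read the log after a test failover. The helper below is a hypothetical sketch of that idea. The log path and the sample arguments are assumptions.

```shell
#!/bin/bash
# Hypothetical tracing helper: append "<timestamp> <hook name> <details>" to a log file.
# usage: trace_hook <logfile> <hook_name> [details...]
trace_hook() {
  local logfile=$1 hook=$2
  shift 2
  printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$hook" "$*" >> "$logfile"
}

# Example: what a PreFailoverProcesses entry could record (values are illustrative).
trace_hook /tmp/orch_hooks.log PreFailoverProcesses DeadMaster t-luhx01-v-szzb
```

Saved as a small script and called from each hook list with the hook's own name, this leaves one line per hook in the order the hooks actually ran.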
