项目概述

Otter是基于数据库增量日志解析,准实时同步到本地机房或异地机房的MySQL/Oracle数据库。
Otter

  • Otter基于canal实现数据库增量日志数据的获取
  • manager运行过程中推送同步配置到node节点
  • node节点将同步状态反馈到manager
  • 基于zookeeper,解决分布式状态调度,允许多个node节点协同工作

otter的S/E/T/L stage阶段模型
Otter SETL
Tips:select阶段为解决数据来源的差异性,比如接入canal获取增量数据,Extract/Transform/Load 阶段类似于数据仓库的ETL模型,具体可为数据join,数据转化,数据Load

Otter应用场景

  1. 异构数据库同步:目前源端只支持MySQL,目标端可以是MySQL或者Oracle,功能主要取决于canal
  2. 单机房同步:RTT<1ms,可用于数据库版本升级、数据迁移、异步二级索引
  3. 异地机房同步:可实现机房容灾
  4. 双向同步:避免回环算法以及数据最终一致性算法
  5. 文件同步:进行数据复制的同时,复制相关的图片等文件

名词说明

otter model

  • pipeline:从源端到目标端的整个过程描述,主要由一些同步映射过程组成
  • channel:同步通道,单向同步中有一个,双向同步有两个
  • DataMediaPair:根据业务表定义映射关系
  • DataMedia:抽象的数据介质概念,可以理解为数据表
  • DataMediaSource:抽象的数据介质源信息,补充描述DataMedia
  • ColumnPair:定义字段映射关系
  • ColumnGroup:定义字段映射组
  • Node:处理同步过程的工作节点,对应一个jvm

项目部署

IP 角色
10.0.139.161 Otter Manager,MySQL(配置库),Zookeeper集群
10.0.139.162 Otter Node,MySQL(数据节点),Zookeeper集群
10.0.139.163 Otter Node,MySQL(数据节点),Zookeeper集群

MySQL部署步骤就省略了,其中要求数据库开启binlog,并且binlog_format必须为ROW模式

Zookeeper集群部署

解压安装包

$ tar -xvf apache-zookeeper-3.5.5-bin.tar.gz 
$ mv apache-zookeeper-3.5.5-bin /usr/local/zookeeper

配置环境变量

$ echo "export PATH=$PATH:/usr/local/zookeeper/bin" >> /etc/profile
$ echo "export ZOOKEEPER_HOME=/usr/local/zookeeper"

创建数据目录

$ mkdir /service/zookeeper/data -p
$ mkdir /service/zookeeper/logs -p

编辑配置文件

$ cat /usr/local/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/service/zookeeper/data
dataLogDir=/service/zookeeper/logs
clientPort=2181
server.1=10.0.139.161:2888:3888
server.2=10.0.139.162:2888:3888
server.3=10.0.139.163:2888:3888

Tips:在datadir下创建myid文件,将当前节点的编号写入进去(例如161对应的就是1)

启动服务

$ zkServer.sh  start
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

$ zkServer.sh status
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader

Manager部署

解压安装包

$ mkdir /usr/local/otter-manager
$ tar -xvf manager.deployer-4.2.18.tar.gz -C /usr/local/otter-manager/

初始化manager系统表

root@(none) 16:04:  source /media/mysql/otter-otter-4.2.18/manager/deployer/src/main/resources/sql/otter-manager-schema.sql
Query OK, 1 row affected (0.00 sec)

修改配置

$ cat /usr/local/otter-manager/conf/otter.properties
## otter manager domain name
otter.domainName = 10.0.139.161
## otter manager http port
otter.port = 8080
## jetty web config xml
otter.jetty = jetty.xml

## otter manager database config
otter.database.driver.class.name = com.mysql.jdbc.Driver
otter.database.driver.url = jdbc:mysql://10.0.139.161:3306/otter
otter.database.driver.username = root
otter.database.driver.password = Abcd123#

## otter communication port
otter.communication.manager.port = 1099

## otter communication payload size (default = 8388608)
otter.communication.payload = 8388608

## otter communication pool size
otter.communication.pool.size = 10

## default zookeeper address
otter.zookeeper.cluster.default = 10.0.139.161:2181
## default zookeeper sesstion timeout = 60s
otter.zookeeper.sessionTimeout = 60000

## otter arbitrate connect manager config
otter.manager.address = ${otter.domainName}:${otter.communication.manager.port}

## should run in product mode , true/false
otter.manager.productionMode = true

## self-monitor enable or disable
otter.manager.monitor.self.enable = true
## self-montir interval , default 120s
otter.manager.monitor.self.interval = 120
## auto-recovery paused enable or disable
otter.manager.monitor.recovery.paused = true
# manager email user config
otter.manager.monitor.email.host = smtp.gmail.com
otter.manager.monitor.email.username =
otter.manager.monitor.email.password =
otter.manager.monitor.email.stmp.port = 465

启动manager

$ sh /usr/local/otter-manager/bin/startup.sh

访问http://10.0.139.161:8090/login.htm,默认用户名密码为admin/admin
otter1

添加zookeeper集群
add-zookeeper

Node部署

在manager添加机器,添加完成会自动生成node对应的唯一标识nid
add-node

  • 机器名称:可自定义
  • 机器ip:对应node节点将要部署的机器ip,如果有多ip时,可选择其中一个ip进行暴露
  • 机器端口:node数据通讯端口
  • 下载端口:node数据下载端口
  • 外部ip:可以通过外部ip走公网处理
  • zookeeper集群:为提升通讯效率,不同机房的机器可选择就近的zookeeper集群

node节点进行跨机房传输时,会使用到HTTP多线程传输技术,目前主要依赖了aria2c做为其下载客户端,因此还需要部署一下aria2cDownLoad

$ tar -xvf aria2-1.19.0.tar.gz 
$ cd aria2-1.19.0
$ ./configure --prefix=/usr/local/aria2
$ make && make install -j 4
$ echo " export PATH=$PATH:/usr/local/aria2/bin" >> /etc/profile
$ source /etc/profile

解压node安装包

$ mkdir /usr/local/otter-node
$ tar -xvf node.deployer-4.2.18.tar.gz -C /usr/local/otter-node/

nid配置(对应manager上的nid)

$ echo 1 > /usr/local/otter-node/conf/nid

配置修改

# otter node root dir
otter.nodeHome = ${user.dir}/../

## otter node dir
otter.htdocs.dir = ${otter.nodeHome}/htdocs
otter.download.dir = ${otter.nodeHome}/download
otter.extend.dir= ${otter.nodeHome}/extend

## default zookeeper sesstion timeout = 60s
otter.zookeeper.sessionTimeout = 60000

## otter communication payload size (default = 8388608)
otter.communication.payload = 8388608

## otter communication pool size
otter.communication.pool.size = 10

## otter arbitrate & node connect manager config
otter.manager.address = 10.0.139.161:1099

启动node

$ sh /usr/local/otter-node/bin/startup.sh

node

同步配置(双A同步,即两个节点都可以写入)

创建用户

root@(none) 10:23:  create user 'canal'@'%' identified by 'Abcd123#';
Query OK, 0 rows affected (0.01 sec)

root@(none) 10:24: GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT on *.* to 'canal'@'%';
Query OK, 0 rows affected (0.01 sec)

root@(none) 10:24: flush privileges;
Query OK, 0 rows affected (0.01 sec)

数据源配置
add-datasource
datasource

数据表配置
add-datatable
datatable

添加canal
add-canal
canal

双向同步需要在node节点的数据库上配置数据表

/*
供 otter 使用, otter 需要对 retl.* 的读写权限,以及对业务表的读写权限
1. 创建database retl
*/
CREATE DATABASE retl;

/* 2. 用户授权 给同步用户授权 */
CREATE USER retl@'%' IDENTIFIED BY 'retl';
GRANT USAGE ON *.* TO `retl`@'%';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO `retl`@'%';
GRANT SELECT, INSERT, UPDATE, DELETE, EXECUTE ON `retl`.* TO `retl`@'%';
/* 业务表授权,这里可以限定只授权同步业务的表 */
GRANT SELECT, INSERT, UPDATE, DELETE ON *.* TO `retl`@'%';

/* 3. 创建系统表 */
USE retl;
DROP TABLE IF EXISTS retl.retl_buffer;
DROP TABLE IF EXISTS retl.retl_mark;
DROP TABLE IF EXISTS retl.xdual;

CREATE TABLE retl_buffer
(
ID BIGINT(20) AUTO_INCREMENT,
TABLE_ID INT(11) NOT NULL,
FULL_NAME varchar(512),
TYPE CHAR(1) NOT NULL,
PK_DATA VARCHAR(256) NOT NULL,
GMT_CREATE TIMESTAMP NOT NULL,
GMT_MODIFIED TIMESTAMP NOT NULL,
CONSTRAINT RETL_BUFFER_ID PRIMARY KEY (ID)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE retl_mark
(
ID BIGINT AUTO_INCREMENT,
CHANNEL_ID INT(11),
CHANNEL_INFO varchar(128),
CONSTRAINT RETL_MARK_ID PRIMARY KEY (ID)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

CREATE TABLE xdual (
ID BIGINT(20) NOT NULL AUTO_INCREMENT,
X timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (ID)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

/* 4. 插入初始化数据 */
INSERT INTO retl.xdual(id, x) VALUES (1,now()) ON DUPLICATE KEY UPDATE x = now();

添加channel,一个channel即看作一个同步任务
add-channel

添加两个pipeline,其中一个需要设置为主节点,并且只支持一个节点进行DDL(高级设置),另外需要注意两个pipeline的select和load选择不一样
add-pipeline
pipeline

针对不同的pipeline配置数据库表映射关系
add-image

回到同步管理界面启动channel即可
start channel

检查pipeline对应的监控和日志
log check

更多项目文档内容请参考项目Otter Wiki