项目概述 Otter是基于数据库增量日志解析,准实时同步到本地机房或异地机房的MySQL/Oracle数据库。
Otter基于canal实现数据库增量日志数据的获取
manager运行过程中推送同步配置到node节点
node节点将同步状态反馈到manager
基于zookeeper,解决分布式状态调度,允许多个node节点协同工作
otter的S/E/T/L stage阶段模型Tips:select阶段为解决数据来源的差异性,比如接入canal获取增量数据,Extract/Transform/Load 阶段类似于数据仓库的ETL模型,具体可为数据join,数据转化,数据Load
Otter应用场景
异构数据库同步:目前源端只支持MySQL,目标端可以是MySQL或者Oracle,功能主要取决于canal
单机房同步:RTT<1ms,可用于数据库版本升级、数据迁移、异步二级索引
异地机房同步:可实现机房容灾
双向同步:避免回环算法以及数据最终一致性算法
文件同步:进行数据复制的同时,复制相关的图片等文件
名词说明
pipeline:从源端到目标端的整个过程描述,主要由一些同步映射过程组成
channel:同步通道,单向同步中有一个,双向同步有两个
DataMediaPair:根据业务表定义映射关系
DataMedia:抽象的数据介质概念,可以理解为数据表
DataMediaSource:抽象的数据介质源信息,补充描述DataMedia
ColumnPair:定义字段映射关系
ColumnGroup:定义字段映射组
Node:处理同步过程的工作节点,对应一个jvm
项目部署
IP
角色
10.0.139.161
Otter Manager,MySQL(配置库),Zookeeper集群
10.0.139.162
Otter Node,MySQL(数据节点),Zookeeper集群
10.0.139.163
Otter Node,MySQL(数据节点),Zookeeper集群
MySQL部署步骤就省略了,其中要求数据库开启binlog,并且binlog_format必须为ROW模式
Zookeeper集群部署 解压安装包
$ tar -xvf apache-zookeeper-3.5.5-bin.tar.gz $ mv apache-zookeeper-3.5.5-bin /usr/local/zookeeper
配置环境变量
$ echo "export PATH=$PATH:/usr/local/zookeeper/bin" >> /etc/profile $ echo "export ZOOKEEPER_HOME=/usr/local/zookeeper"
创建数据目录
$ mkdir /service/zookeeper/data -p $ mkdir /service/zookeeper/logs -p
编辑配置文件
$ cat /usr/local/zookeeper/conf/zoo.cfg tickTime=2000 initLimit=10 syncLimit=5 dataDir=/service/zookeeper/data dataLogDir=/service/zookeeper/logs clientPort=2181 server.1=10.0.139.161:2888:3888 server.2=10.0.139.162:2888:3888 server.3=10.0.139.163:2888:3888
Tips:在datadir下创建myid文件,将当前节点的编号写入进去(例如161对应的就是1)
启动服务
$ zkServer.sh start /usr/bin/java ZooKeeper JMX enabled by default Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg Starting zookeeper ... STARTED $ zkServer.sh status /usr/bin/java ZooKeeper JMX enabled by default Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg Client port found: 2181. Client address: localhost. Mode: leader
Manager部署 解压安装包
$ mkdir /usr/local/otter-manager $ tar -xvf manager.deployer-4.2.18.tar.gz -C /usr/local/otter-manager/
初始化manager系统表
root@(none) 16:04: source /media/mysql/otter-otter-4.2.18/manager/deployer/src/main/resources/sql/otter-manager-schema.sql Query OK, 1 row affected (0.00 sec)
修改配置
$ cat /usr/local/otter-manager/conf/otter.properties ## otter manager domain name otter.domainName = 10.0.139.161 ## otter manager http port otter.port = 8080 ## jetty web config xml otter.jetty = jetty.xml ## otter manager database config otter.database.driver.class.name = com.mysql.jdbc.Driver otter.database.driver.url = jdbc:mysql://10.0.139.161:3306/otter otter.database.driver.username = root otter.database.driver.password = Abcd123# ## otter communication port otter.communication.manager.port = 1099 ## otter communication payload size (default = 8388608) otter.communication.payload = 8388608 ## otter communication pool size otter.communication.pool.size = 10 ## default zookeeper address otter.zookeeper.cluster.default = 10.0.139.161:2181 ## default zookeeper sesstion timeout = 60s otter.zookeeper.sessionTimeout = 60000 ## otter arbitrate connect manager config otter.manager.address = ${otter.domainName}:${otter.communication.manager.port} ## should run in product mode , true/false otter.manager.productionMode = true ## self-monitor enable or disable otter.manager.monitor.self.enable = true ## self-montir interval , default 120s otter.manager.monitor.self.interval = 120 ## auto-recovery paused enable or disable otter.manager.monitor.recovery.paused = true # manager email user config otter.manager.monitor.email.host = smtp.gmail.com otter.manager.monitor.email.username = otter.manager.monitor.email.password = otter.manager.monitor.email.stmp.port = 465
启动manager
$ sh /usr/local/otter-manager/bin/startup.sh
访问http://10.0.139.161:8090/login.htm,默认用户名密码为admin/admin
添加zookeeper集群
Node部署 在manager添加机器,添加完成会自动生成node对应的唯一标识nid
机器名称:可自定义
机器ip:对应node节点将要部署的机器ip,如果有多ip时,可选择其中一个ip进行暴露
机器端口:node数据通讯端口
下载端口:node数据下载端口
外部ip:可以通过外部ip走公网处理
zookeeper集群:为提升通讯效率,不同机房的机器可选择就近的zookeeper集群
node节点进行跨机房传输时,会使用到HTTP多线程传输技术,目前主要依赖了aria2c做为其下载客户端,因此还需要部署一下aria2cDownLoad
$ tar -xvf aria2-1.19.0.tar.gz $ cd aria2-1.19.0 $ ./configure --prefix=/usr/local/aria2 $ make && make install -j 4 $ echo " export PATH=$PATH:/usr/local/aria2/bin" >> /etc/profile $ source /etc/profile
解压node安装包
$ mkdir /usr/local/otter-node $ tar -xvf node.deployer-4.2.18.tar.gz -C /usr/local/otter-node/
nid配置(对应manager上的nid)
$ echo 1 > /usr/local/otter-node/conf/nid
配置修改
# otter node root dir otter.nodeHome = ${user.dir}/../ ## otter node dir otter.htdocs.dir = ${otter.nodeHome}/htdocs otter.download.dir = ${otter.nodeHome}/download otter.extend.dir= ${otter.nodeHome}/extend ## default zookeeper sesstion timeout = 60s otter.zookeeper.sessionTimeout = 60000 ## otter communication payload size (default = 8388608) otter.communication.payload = 8388608 ## otter communication pool size otter.communication.pool.size = 10 ## otter arbitrate & node connect manager config otter.manager.address = 10.0.139.161:1099
启动node
$ sh /usr/local/otter-node/bin/startup.sh
同步配置(双A同步,即两个节点都可以写入) 创建用户
root@(none) 10:23: create user 'canal'@'%' identified by 'Abcd123#'; Query OK, 0 rows affected (0.01 sec) root@(none) 10:24: GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT on *.* to 'canal'@'%'; Query OK, 0 rows affected (0.01 sec) root@(none) 10:24: flush privileges; Query OK, 0 rows affected (0.01 sec)
数据源配置
数据表配置
添加canal
双向同步需要在node节点的数据库上配置数据表
/* 供 otter 使用, otter 需要对 retl.* 的读写权限,以及对业务表的读写权限 1. 创建database retl */ CREATE DATABASE retl; /* 2. 用户授权 给同步用户授权 */ CREATE USER retl@'%' IDENTIFIED BY 'retl'; GRANT USAGE ON *.* TO `retl`@'%'; GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO `retl`@'%'; GRANT SELECT, INSERT, UPDATE, DELETE, EXECUTE ON `retl`.* TO `retl`@'%'; /* 业务表授权,这里可以限定只授权同步业务的表 */ GRANT SELECT, INSERT, UPDATE, DELETE ON *.* TO `retl`@'%'; /* 3. 创建系统表 */ USE retl; DROP TABLE IF EXISTS retl.retl_buffer; DROP TABLE IF EXISTS retl.retl_mark; DROP TABLE IF EXISTS retl.xdual; CREATE TABLE retl_buffer ( ID BIGINT(20) AUTO_INCREMENT, TABLE_ID INT(11) NOT NULL, FULL_NAME varchar(512), TYPE CHAR(1) NOT NULL, PK_DATA VARCHAR(256) NOT NULL, GMT_CREATE TIMESTAMP NOT NULL, GMT_MODIFIED TIMESTAMP NOT NULL, CONSTRAINT RETL_BUFFER_ID PRIMARY KEY (ID) ) ENGINE=InnoDB DEFAULT CHARSET=utf8; CREATE TABLE retl_mark ( ID BIGINT AUTO_INCREMENT, CHANNEL_ID INT(11), CHANNEL_INFO varchar(128), CONSTRAINT RETL_MARK_ID PRIMARY KEY (ID) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; CREATE TABLE xdual ( ID BIGINT(20) NOT NULL AUTO_INCREMENT, X timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (ID) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8; /* 4. 插入初始化数据 */ INSERT INTO retl.xdual(id, x) VALUES (1,now()) ON DUPLICATE KEY UPDATE x = now();
添加channel,一个channel即看作一个同步任务
添加两个pipeline,其中一个需要设置为主节点,并且只支持一个节点进行DDL(高级设置),另外需要注意两个pipeline的select和load选择不一样
针对不同的pipeline配置数据库表映射关系
回到同步管理界面启动channel即可
检查pipeline对应的监控和日志
更多项目文档内容请参考项目Otter Wiki