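pt-stalk is a shell script used to collect OS and MySQL diagnostic information at the moment a problem occurs. This covers resources such as CPU, memory and disk, as well as database lock waits, replication and status information.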

Triggering

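pt-stalk can run as a background service that watches MySQL against a trigger condition you specify. When the condition is met, it collects system and database information for that moment. The relevant options are: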

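  • function: defaults to status, which watches the output of SHOW GLOBAL STATUS; processlist watches the output of SHOW FULL PROCESSLIST; you can also pass the path of a custom trigger script
  • variable: defaults to Threads_running; the variable to watch, which can be any item present in the watched output
  • threshold: defaults to 25; the trigger condition is met when the watched value exceeds this threshold. For non-numeric values (for example a processlist column such as State), use it together with match, which counts the matching rows
  • cycles: defaults to 5; the trigger condition must be met 5 consecutive times before data collection starts
  • iterations: how many collections to run before exiting; by default pt-stalk runs indefinitely
  • run-time: how long each collection gathers data, default 30 seconds
  • sleep: how long to wait after a collection before monitoring resumes, default 300 seconds
  • interval: how often the trigger condition is checked, default 1 second
  • dest: where collected data is stored, default /var/lib/pt-stalk
  • retention-time: how long collected data is kept, default 30 days
  • daemonize: run in the background as a daemon
  • log: the daemon's log file, default /var/log/pt-stalk.log
  • collect: collect diagnostic data when the trigger fires. collect-gdb collects GDB stack traces; collect-strace collects strace output; collect-tcpdump collects tcpdump data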
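The interplay of threshold, cycles and interval is easier to see in code. The following is a simplified model written for illustration, not pt-stalk's actual source: it takes a sequence of already-sampled values (one per interval), counts a hit when a value is strictly greater than the threshold, and assumes, matching the "consecutive" wording above, that any miss resets the counter. The function name should_collect is hypothetical.

```bash
# Simplified model of the trigger decision (NOT pt-stalk's source code).
# Usage: should_collect THRESHOLD CYCLES VALUE...
# A sample is a hit when VALUE > THRESHOLD; collection fires after CYCLES
# consecutive hits. In this model, any miss resets the hit counter.
should_collect() {
  local threshold=$1 cycles=$2
  shift 2
  local hits=0 v
  for v in "$@"; do
    if [ "$v" -gt "$threshold" ]; then
      hits=$((hits + 1))
      if [ "$hits" -ge "$cycles" ]; then
        echo "trigger"
        return 0
      fi
    else
      hits=0   # condition not met on this check: start counting again
    fi
  done
  echo "no-trigger"
  return 1
}
```

For example, should_collect 50 3 60 70 40 55 66 80 prints trigger: the 40 resets the count, and 55, 66 and 80 then form three consecutive hits. A value equal to the threshold does not count as a hit.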

Example

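Create a script that reports CPU usage. When function points to a file, pt-stalk sources that file and calls the trg_plugin function it defines; that function must print a single integer, which is compared against threshold.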

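$ cat /root/highcpu.sh
# pt-stalk sources this file and calls trg_plugin, which must print one integer
function trg_plugin(){
  # Sample CPU for 1 second; print busy % = 100 - %idle (last column) as an integer
  LC_ALL=C sar 1 1 | awk '/^Average:/ {printf "%d\n", 100 - $NF}'
}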
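Before handing a plugin to the daemon, it can help to run it once by hand and confirm it prints a bare integer, since a decimal such as 2.50 cannot be compared with an integer threshold. The helper below, check_plugin, is a hypothetical convenience and not part of percona-toolkit; it sources the plugin in a subshell so nothing leaks into the current shell.

```bash
# Hypothetical helper (not part of percona-toolkit).
# Sources a trigger plugin in a subshell, calls trg_plugin once and checks
# that the output is a single non-negative integer.
check_plugin() {
  local file=$1 out
  # Subshell: the plugin's definitions do not leak into the caller.
  out=$( . "$file" && trg_plugin ) || { echo "plugin failed: $file" >&2; return 1; }
  if [[ "$out" =~ ^[0-9]+$ ]]; then
    echo "ok: $out"
  else
    echo "not an integer: '$out'" >&2
    return 1
  fi
}
```

Running check_plugin /root/highcpu.sh on the monitored host should print a line such as ok: 3. A plugin whose function has a different name fails because trg_plugin is not found.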

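Start pt-stalk as a daemon. Collection is triggered when CPU usage exceeds 50% (threshold=50) on 3 consecutive checks (cycles=3) taken once per second: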

pt-stalk --daemonize --dest=/tmp/pt-stalk --user=root --password=Abcd123# --port=33006 --function=/root/highcpu.sh --variable highcpu --cycles=3 --interval=1 --threshold 50 --sleep=60 --log=/var/log/pt-stalk.log
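A daemonized pt-stalk keeps running until it is killed. A minimal sketch of a stop helper follows, assuming the PID file is at /var/run/pt-stalk.pid (what we believe is the default for pt-stalk's pid option; pass another path if yours differs). The name stop_stalk is hypothetical.

```bash
# Hypothetical helper: stop a daemonized pt-stalk through its PID file.
# Assumes /var/run/pt-stalk.pid unless another path is given as $1.
stop_stalk() {
  local pidfile=${1:-/var/run/pt-stalk.pid} pid
  [ -r "$pidfile" ] || { echo "no pid file: $pidfile" >&2; return 1; }
  pid=$(cat "$pidfile")
  if kill -0 "$pid" 2>/dev/null; then   # is the process still alive?
    kill "$pid"                         # send SIGTERM (kill's default signal)
    echo "stopped $pid"
  else
    echo "process $pid not running" >&2
    return 1
  fi
}
```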

List the collected files. Each trigger produces a set of files that share a timestamp prefix:

[root@t-luhx03-v-szzb pt-stalk]# ls -lrt
total 4800
-rw-r----- 1 root root 395 Jan 18 16:56 2021_01_18_16_56_59-trigger
-rw-r----- 1 root root 14937 Jan 18 16:56 2021_01_18_16_56_59-pmap
-rw-r----- 1 root root 24865 Jan 18 16:57 2021_01_18_16_56_59-variables
-rw-r----- 1 root root 1258 Jan 18 16:57 2021_01_18_16_56_59-log_error
-rw-r----- 1 root root 9370 Jan 18 16:57 2021_01_18_16_56_59-innodbstatus1
-rw-r----- 1 root root 31995 Jan 18 16:57 2021_01_18_16_56_59-ps
-rw-r----- 1 root root 55 Jan 18 16:57 2021_01_18_16_56_59-mutex-status1
-rw-r----- 1 root root 8927 Jan 18 16:57 2021_01_18_16_56_59-opentables1
-rw-r----- 1 root root 39172 Jan 18 16:57 2021_01_18_16_56_59-sysctl
-rw-r----- 1 root root 9393 Jan 18 16:57 2021_01_18_16_56_59-lsof
-rw-r----- 1 root root 137 Jan 18 16:57 2021_01_18_16_56_59-disk-space
-rw-r----- 1 root root 29387 Jan 18 16:57 2021_01_18_16_56_59-iostat
-rw-r----- 1 root root 2784 Jan 18 16:57 2021_01_18_16_56_59-vmstat
-rw-r----- 1 root root 1081739 Jan 18 16:57 2021_01_18_16_56_59-mysqladmin
-rw-r----- 1 root root 104546 Jan 18 16:57 2021_01_18_16_56_59-procstat
-rw-r----- 1 root root 38100 Jan 18 16:57 2021_01_18_16_56_59-meminfo
-rw-r----- 1 root root 32462 Jan 18 16:57 2021_01_18_16_56_59-diskstats
-rw-r----- 1 root root 77935 Jan 18 16:57 2021_01_18_16_56_59-procvmstat
-rw-r----- 1 root root 89640 Jan 18 16:57 2021_01_18_16_56_59-netstat_s
-rw-r----- 1 root root 165210 Jan 18 16:57 2021_01_18_16_56_59-interrupts
-rw-r----- 1 root root 28080 Jan 18 16:57 2021_01_18_16_56_59-df
-rw-r----- 1 root root 375240 Jan 18 16:57 2021_01_18_16_56_59-slabinfo
-rw-r----- 1 root root 24018 Jan 18 16:57 2021_01_18_16_56_59-processlist
-rw-r----- 1 root root 735190 Jan 18 16:57 2021_01_18_16_56_59-netstat
-rw-r----- 1 root root 11730 Jan 18 16:57 2021_01_18_16_56_59-slave-status
-rw-r----- 1 root root 9423 Jan 18 16:57 2021_01_18_16_56_59-innodbstatus2
-rw-r----- 1 root root 16 Jan 18 16:57 2021_01_18_16_56_59-hostname
-rw-r----- 1 root root 8927 Jan 18 16:57 2021_01_18_16_56_59-opentables2
-rw-r----- 1 root root 55 Jan 18 16:57 2021_01_18_16_56_59-mutex-status2
-rw-r----- 1 root root 1242 Jan 18 16:57 2021_01_18_16_56_59-mpstat-overall
-rw-r----- 1 root root 18149 Jan 18 16:57 2021_01_18_16_56_59-mpstat
-rw-r----- 1 root root 2031 Jan 18 16:57 2021_01_18_16_56_59-iostat-overall
-rw-r----- 1 root root 326 Jan 18 16:57 2021_01_18_16_56_59-vmstat-overall
-rw-r----- 1 root root 379534 Jan 18 16:58 2021_01_18_16_56_59-top
-rw-r----- 1 root root 36862 Jan 18 16:58 2021_01_18_16_56_59-output
-rw-r----- 1 root root 396 Jan 18 16:58 2021_01_18_16_58_33-trigger
-rw-r----- 1 root root 15691 Jan 18 16:58 2021_01_18_16_58_33-pmap
-rw-r----- 1 root root 21675 Jan 18 16:58 2021_01_18_16_58_33-variables
-rw-r----- 1 root root 1258 Jan 18 16:58 2021_01_18_16_58_33-log_error
-rw-r----- 1 root root 9372 Jan 18 16:58 2021_01_18_16_58_33-innodbstatus1
-rw-r----- 1 root root 31379 Jan 18 16:58 2021_01_18_16_58_33-ps
-rw-r----- 1 root root 55 Jan 18 16:58 2021_01_18_16_58_33-mutex-status1
-rw-r----- 1 root root 8927 Jan 18 16:58 2021_01_18_16_58_33-opentables1
-rw-r----- 1 root root 39173 Jan 18 16:58 2021_01_18_16_58_33-sysctl
-rw-r----- 1 root root 0 Jan 18 16:58 2021_01_18_16_58_33-dmesg
-rw-r----- 1 root root 244 Jan 18 16:58 2021_01_18_16_58_33-vmstat-overall
-rw-r----- 1 root root 1054 Jan 18 16:58 2021_01_18_16_58_33-iostat-overall
-rw-r----- 1 root root 76 Jan 18 16:58 2021_01_18_16_58_33-mpstat-overall
-rw-r----- 1 root root 9393 Jan 18 16:58 2021_01_18_16_58_33-lsof
-rw-r----- 1 root root 76856 Jan 18 16:58 2021_01_18_16_58_33-top
-rw-r----- 1 root root 6489 Jan 18 16:58 2021_01_18_16_58_33-mpstat
-rw-r----- 1 root root 11801 Jan 18 16:58 2021_01_18_16_58_33-iostat
-rw-r----- 1 root root 1146 Jan 18 16:58 2021_01_18_16_58_33-vmstat
-rw-r----- 1 root root 430080 Jan 18 16:58 2021_01_18_16_58_33-mysqladmin
-rw-r----- 1 root root 31173 Jan 18 16:58 2021_01_18_16_58_33-procvmstat
-rw-r----- 1 root root 41820 Jan 18 16:58 2021_01_18_16_58_33-procstat
-rw-r----- 1 root root 528 Jan 18 16:58 2021_01_18_16_58_33-lock-waits
-rw-r----- 1 root root 66084 Jan 18 16:58 2021_01_18_16_58_33-interrupts
-rw-r----- 1 root root 528 Jan 18 16:58 2021_01_18_16_58_33-transactions
-rw-r----- 1 root root 12984 Jan 18 16:58 2021_01_18_16_58_33-diskstats
-rw-r----- 1 root root 528 Jan 18 16:58 2021_01_18_16_58_33-prepared-statements
-rw-r----- 1 root root 11232 Jan 18 16:58 2021_01_18_16_58_33-df
-rw-r----- 1 root root 35856 Jan 18 16:58 2021_01_18_16_58_33-netstat_s
-rw-r----- 1 root root 15240 Jan 18 16:58 2021_01_18_16_58_33-meminfo
-rw-r----- 1 root root 150096 Jan 18 16:58 2021_01_18_16_58_33-slabinfo
-rw-r----- 1 root root 11118 Jan 18 16:58 2021_01_18_16_58_33-processlist
-rw-r----- 1 root root 289026 Jan 18 16:58 2021_01_18_16_58_33-netstat
-rw-r----- 1 root root 4692 Jan 18 16:58 2021_01_18_16_58_33-slave-status
-rw-r----- 1 root root 19495 Jan 18 16:58 2021_01_18_16_58_33-output
-rw-r----- 1 root root 137 Jan 18 16:58 2021_01_18_16_58_33-disk-space

Note: the files most worth examining are innodbstatus, iostat, lock-waits, transactions and the like.
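With several collections in one directory, a quick overview of which key files each one contains can save time. The hypothetical helper below (list_collections, not part of percona-toolkit) assumes, as in the listing above, that every collection has a <timestamp>-trigger file, and reports which of innodbstatus1, iostat, lock-waits and transactions exist for that timestamp.

```bash
# Hypothetical helper: one line per collection in a pt-stalk dest directory,
# listing which of the key files exist for that timestamp prefix.
list_collections() {
  local dest=${1:-/var/lib/pt-stalk} trig prefix f
  for trig in "$dest"/*-trigger; do
    [ -e "$trig" ] || continue           # glob matched nothing: no collections
    prefix=${trig##*/}                   # strip the directory part
    prefix=${prefix%-trigger}            # e.g. 2021_01_18_16_56_59
    printf '%s:' "$prefix"
    for f in innodbstatus1 iostat lock-waits transactions; do
      if [ -e "$dest/$prefix-$f" ]; then printf ' %s' "$f"; fi
    done
    printf '\n'
  done
}
```

Run against the directory above (list_collections /tmp/pt-stalk), it should show that only the 2021_01_18_16_58_33 collection contains lock-waits and transactions files.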

Analysis

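percona-toolkit also includes pt-sift, which analyzes the data collected by pt-stalk and presents a summary. It accepts a collection directory or, as below, a file prefix (directory plus timestamp).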

[root@t-luhx03-v-szzb ~]# pt-sift /tmp/log/pt-stalk/2021_01_18_17_12_00
======== t-luhx03-v-szzb at 2021_01_18_17_12_00 DEFAULT (15 of 15) ========
--diskstats--
#ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime
{29} dm-0 0.1 39.0 0.0 0% 0.0 13.0 13.1 6.8 0.1 0% 0.0 1.3 0% 0 13.2 1.2 0.3
dm-0 0% . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
--vmstat--
r b swpd free buff cache si so bi bo in cs us sy id wa st
14 0 133756 134628 277768 3901324 0 0 1 38 1 0 2 1 97 0 0
0 0 133756 133356 277784 3904740 0 0 5 150 2538 2345 2 2 95 0 0
wa 0% . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
--innodb--
txns: 1xnot (0s)
0 queries inside InnoDB, 0 queries in queue
Main thread: sleeping, pending reads 0, writes 0, flush 0
Log: lsn = 17603194, chkp = 17603185, chkp age = 9
Threads are waiting at:
Threads are waiting on:
--processlist--
State
2
1 Waiting on empty queue
1 starting
1 logging slow query
Command
2 Sleep
2 Query
1 Daemon
--stack traces--
No stack trace file exists
--oprofile--
No opreport file exists
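The --processlist-- section above is essentially a count of distinct State and Command values. The rough approximation below is not pt-sift's code. It assumes the <prefix>-processlist file holds SHOW FULL PROCESSLIST\G-style output (lines such as "State: starting") and tallies those two fields across the whole file. The function name summarize_processlist is hypothetical.

```bash
# Rough approximation (not pt-sift's code) of the --processlist-- summary.
# Assumes \G-style rows, e.g. "Command: Sleep" / "  State: starting".
summarize_processlist() {
  local file=$1 field
  for field in State Command; do
    echo "$field"
    awk -v f="$field" '
      $1 == f ":" {
        sub(/^[ \t]*[A-Za-z]+:[ \t]?/, "")   # keep only the value (may be empty)
        n[$0]++
      }
      END { for (v in n) printf "%5d %s\n", n[v], v }
    ' "$file" | sort -rn                     # most frequent values first
  done
}
```

Usage: summarize_processlist /tmp/log/pt-stalk/2021_01_18_17_12_00-processlist. Empty State values show up as a count with no text after it, which mirrors the bare "2" line in the pt-sift output above.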