【问题记录】xxl-job admin端锁超时问题

背景

启动 xxl-job admin 端,日志开始报获取锁超时

2024-07-11 15:12:45,375 ERROR [xxl-job, admin JobScheduleHelper#scheduleThread] c.x.j.a.c.thread.JobScheduleHelper [JobScheduleHelper.java : 155] >>>>>>>>>>> xxl-job, JobScheduleHelper#scheduleThread error:{}
com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:124)
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:912)
	at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:354)
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java)
	at com.xxl.job.admin.core.thread.JobScheduleHelper$1.run(JobScheduleHelper.java:74)
	at java.lang.Thread.run(Thread.java:748)
2024-07-11 15:13:35,464 ERROR [xxl-job, admin JobScheduleHelper#scheduleThread] c.x.j.a.c.thread.JobScheduleHelper [JobScheduleHelper.java : 155] >>>>>>>>>>> xxl-job, JobScheduleHelper#scheduleThread error:{}
com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:124)
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:912)
	at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:354)
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java)
	at com.xxl.job.admin.core.thread.JobScheduleHelper$1.run(JobScheduleHelper.java:74)
	at java.lang.Thread.run(Thread.java:748)

分析

定位 admin 端代码如下

                while (!scheduleThreadToStop) {

                    // Scan Job
                    long start = System.currentTimeMillis();

                    Connection conn = null;
                    Boolean connAutoCommit = null;
                    PreparedStatement preparedStatement = null;

                    boolean preReadSuc = true;
                    try {

                        conn = XxlJobAdminConfig.getAdminConfig().getDataSource().getConnection();
                        connAutoCommit = conn.getAutoCommit();
                        conn.setAutoCommit(false);

                        preparedStatement = conn.prepareStatement(  "select * from xxl_job_lock where lock_name = 'schedule_lock' for update" );
                        preparedStatement.execute();

                        // tx start
                        ...
                    }  catch (Exception e) {
                        if (!scheduleThreadToStop) {
                            logger.error(">>>>>>>>>>> xxl-job, JobScheduleHelper#scheduleThread error:{}", e);
                        }
                    } finally {

                        // commit
                        if (conn != null) {
                            try {
                                conn.commit();
                            } catch (SQLException e) {
                                if (!scheduleThreadToStop) {
                                    logger.error(e.getMessage(), e);
                                }
                            }
                            try {
                                conn.setAutoCommit(connAutoCommit);
                            } catch (SQLException e) {
                                if (!scheduleThreadToStop) {
                                    logger.error(e.getMessage(), e);
                                }
                            }
                            try {
                                conn.close();
                            } catch (SQLException e) {
                                if (!scheduleThreadToStop) {
                                    logger.error(e.getMessage(), e);
                                }
                            }
                        }

                        // close PreparedStatement
                        if (null != preparedStatement) {
                            try {
                                preparedStatement.close();
                            } catch (SQLException e) {
                                if (!scheduleThreadToStop) {
                                    logger.error(e.getMessage(), e);
                                }
                            }
                        }
                    }
                    long cost = System.currentTimeMillis()-start;


                    // Wait seconds, align second
                    if (cost < 1000) {  // scan-overtime, not wait
                        try {
                            // pre-read period: success > scan each second; fail > skip this period;
                            TimeUnit.MILLISECONDS.sleep((preReadSuc?1000:PRE_READ_MS) - System.currentTimeMillis()%1000);
                        } catch (InterruptedException e) {
                            if (!scheduleThreadToStop) {
                                logger.error(e.getMessage(), e);
                            }
                        }
                    }

这段代码在任务执行之前会通过 SELECT * FROM xxl_job_lock WHERE lock_name = 'schedule_lock' FOR UPDATE; 获取行锁,避免在 admin 集群的情况下重复调度,且这个任务的周期在 5 秒以内。debug 后发现每次都会阻塞在获取行锁,并且 50 秒报一次错误日志

SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';

发现获取锁超时时间确实是 50 秒,也就是说这个周期任务每次都会卡在获取行锁阻塞超时,接下来查看是什么阻塞了每次获取锁的事务

SELECT * FROM INFORMATION_SCHEMA.INNODB_TRX;

这条 SQL 是查当前事务,能查出两条记录,一条是 LOCK_WAITS 状态的 SELECT * FROM xxl_job_lock WHERE lock_name = 'schedule_lock' FOR UPDATE; 语句,即被阻塞的事务;另一条是 RUNNING 状态的长事务,从 2 个小时前开始。初步判断是因为这条长事务

等到想查这条长事务具体的 SQL 时,却发现那条长事务已经提交或回滚了,程序又恢复正常了,晕。。。

最后整理一下定位此类问题的 SQL

SELECT * FROM sys.innodb_lock_waits;

SELECT * FROM INFORMATION_SCHEMA.INNODB_TRX;

SELECT * FROM performance_schema.data_locks;

SHOW FULL PROCESSLIST;

select thread_id,processlist_id from performance_schema.threads where processlist_id=1352;

select * from performance_schema.events_statements_history where thread_id=4767;​

select * from performance_schema.events_statements_current;

社区

发现 xxl-job github 也有类似 issue:

https://github.com/xuxueli/xxl-job/issues?q=Lock+wait+timeout+exceeded%3B

相关推荐

  1. 问题记录xxl-job admin超时问题

    2024-07-12 04:58:05       26 阅读
  2. 问题记录】线程池死问题

    2024-07-12 04:58:05       24 阅读
  3. SpringClould超时问题汇总

    2024-07-12 04:58:05       27 阅读

最近更新

  1. docker php8.1+nginx base 镜像 dockerfile 配置

    2024-07-12 04:58:05       67 阅读
  2. Could not load dynamic library ‘cudart64_100.dll‘

    2024-07-12 04:58:05       71 阅读
  3. 在Django里面运行非项目文件

    2024-07-12 04:58:05       58 阅读
  4. Python语言-面向对象

    2024-07-12 04:58:05       69 阅读

热门阅读

  1. CNN -1 神经网络-概述2

    2024-07-12 04:58:05       25 阅读
  2. 使用 CSS 实现多列布局

    2024-07-12 04:58:05       22 阅读
  3. 【LeetCode】633. 平方数之和

    2024-07-12 04:58:05       20 阅读
  4. 深度优先算法-DFS(算法篇)

    2024-07-12 04:58:05       24 阅读
  5. C++ override关键字

    2024-07-12 04:58:05       24 阅读
  6. LeetCode 2974.最小数字游戏:排序+交换奇偶位

    2024-07-12 04:58:05       30 阅读
  7. 实现前端用户密码重置功能(有源码)

    2024-07-12 04:58:05       28 阅读
  8. [XCUITest] 处理iOS权限点击授权 有哪些权限?

    2024-07-12 04:58:05       23 阅读
  9. 周鸿祎关于2024年AI看法

    2024-07-12 04:58:05       19 阅读