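Today I ran into a thread pool deadlock caused by parent tasks submitting child tasks to the same pool, so here is a write-up of the problem.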
Simulation
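import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class Main {
    public static void main(String[] args) throws Exception {
        // 10 core threads, 10 max threads, unbounded task queue
        ThreadPoolExecutor executor = new ThreadPoolExecutor(10, 10, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        // Submit 10 parent tasks, which occupy every worker thread in the pool
        for (int i = 0; i < 10; i++) {
            executor.execute(() -> {
                try {
                    System.out.println("Parent task started");
                    // Simulate business work
                    TimeUnit.SECONDS.sleep(5);
                    // The child task goes to the SAME pool; no worker is free, so it sits in the queue
                    Future<?> future = executor.submit(() -> System.out.println("Child task started"));
                    // Blocks forever: the child can only run once a parent releases its worker
                    System.out.println("Waiting for child task result: " + future.get());
                } catch (InterruptedException e) {
                    e.printStackTrace();
                } catch (ExecutionException e) {
                    throw new RuntimeException(e);
                }
            });
        }
        // By now every parent has submitted its child task
        TimeUnit.SECONDS.sleep(6);
        // Print the queued tasks
        System.out.println(executor.getQueue());
        // executor.shutdown();
    }
}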
Analysis
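Start with the pool configuration: 10 core threads, a maximum of 10 threads, and an unbounded task queue. Ten parent tasks are submitted in one go, so every core thread is taken immediately and any task submitted afterward can only go into the blocking queue. Each parent task waits 5 seconds (simulating business work), then submits a child task and waits synchronously for its result. That is where the deadlock arises.
All the child tasks sit in the blocking queue, waiting for a worker thread to become free and pick them up. But every worker thread is running a parent task, and every parent task is waiting for its child task to finish. Neither side can make progress, which is a deadlock (often called a thread starvation deadlock).
Six seconds in, the main thread prints the task queue, and it does contain 10 child tasks. To confirm with jstack, here is part of the output: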
"pool-1-thread-4" #14 prio=5 os_prio=31 tid=0x00007f95a696c000 nid=0x5b03 waiting on condition [0x00007000054cb000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076bbb0780> (a java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at com.cyk.lottery.application.mq.Main.lambda$main$1(Main.java:22)
at com.cyk.lottery.application.mq.Main$$Lambda$1/1717159510.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-1-thread-3" #13 prio=5 os_prio=31 tid=0x00007f95a60df800 nid=0x7b03 waiting on condition [0x00007000053c8000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076b68dd58> (a java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at com.cyk.lottery.application.mq.Main.lambda$main$1(Main.java:22)
at com.cyk.lottery.application.mq.Main$$Lambda$1/1717159510.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"pool-1-thread-2" #12 prio=5 os_prio=31 tid=0x00007f95a60de800 nid=0x5903 waiting on condition [0x00007000052c5000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076ba69098> (a java.util.concurrent.FutureTask)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
The dump also shows that every worker thread running a parent task is in the WAITING (parking) state, stuck in FutureTask#get.
Solution
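If this happened in production it would be a disaster: the thread pool is paralyzed outright, and because the queue is unbounded, tasks that keep pouring in afterward also put the application at risk of an OOM.
How do we fix it? The fix is actually simple: give parent tasks and child tasks separate thread pools, and never share one pool between them.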
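
As a minimal sketch of that idea, the example below gives parent and child tasks their own pools. The class name SeparatePoolsDemo, the helper runParents, and the pool sizes are assumptions made for illustration; they are not from the original code. The parent pool is deliberately sized so that every worker is busy, as in the simulation above, yet the parents still complete because the child tasks run on a different pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SeparatePoolsDemo {

    // Hypothetical helper: runs `parents` parent tasks, each of which waits on
    // one child task, and returns how many parent tasks completed.
    static int runParents(int parents) {
        // One worker per parent, so the parent pool is fully occupied
        ExecutorService parentPool = Executors.newFixedThreadPool(parents);
        // Child tasks get their own workers and never compete with parents
        ExecutorService childPool = Executors.newFixedThreadPool(parents);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < parents; i++) {
                final int id = i;
                results.add(parentPool.submit(() -> {
                    // Submit to the child pool, not back into parentPool
                    Future<String> child = childPool.submit(() -> "child-" + id);
                    // Blocking here is safe: a childPool worker can run the child
                    return child.get();
                }));
            }
            int completed = 0;
            for (Future<String> result : results) {
                result.get(); // returns once the parent and its child are done
                completed++;
            }
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            parentPool.shutdown();
            childPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed parents: " + runParents(10));
    }
}
```

Note that this only holds as long as child tasks do not themselves block on other tasks submitted to the child pool; otherwise the same starvation can reappear one level down.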