Solving Data Skew in Hive with a Two-Stage GROUP BY

1. Data preparation

create table wordcount(a string) row format delimited fields terminated by ',';

load data local inpath 'opt/2.txt' into table wordcount;

hive (default)> select * from wordcount;
OK
wordcount.a
b
a
a
a
a
b
b
c
c
e
d

2. Two-stage GROUP BY to mitigate data skew

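Random salt: ceil(rand()*10), which in practice yields an integer from 1 to 10. Prefixing it to the key spreads the rows of a single hot key across up to 10 groups, so no single reducer has to process all of them.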

select split(salt_a,'_')[1] alpah, sum(count) from
(
select concat_ws('_',cast(ceil(rand()*10) as string),a) salt_a, count(1) count from wordcount group by concat_ws('_',cast(ceil(rand()*10) as string),a)
) b group by split(salt_a,'_')[1];

alpah _c1
a 4
b 3
c 2
d 1
e 1
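
As a rough illustration of why the salt helps, the Python sketch below simulates the grouping outside Hive. The input rows, the seed, and the 10-bucket salt are assumptions chosen for the example. It only compares the size of the largest group with and without salting and does not model Hive's reducers.

```python
import math
import random
from collections import Counter

rng = random.Random(42)  # fixed seed, chosen arbitrarily for repeatability

# Hypothetical skewed input: key "a" dominates the table.
rows = ["a"] * 10_000 + ["b"] * 100 + ["c"] * 10

# Plain GROUP BY: every "a" row lands in the same group.
plain = Counter(rows)

# Salted GROUP BY: mirrors concat_ws('_', ceil(rand()*10), a).
# max(1, ...) guards the rare case where the random draw is exactly 0.0.
salted = Counter(
    f"{max(1, math.ceil(rng.random() * 10))}_{key}" for key in rows
)

largest_plain = max(plain.values())
largest_salted = max(salted.values())
print("largest group without salt:", largest_plain)
print("largest group with salt:   ", largest_salted)
```

With 10 salt values the largest group should come out at roughly a tenth of the unsalted one, which is the work a single reducer would otherwise have to absorb.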

Breakdown:

2.1 Stage 1: GROUP BY on the salted key

select concat_ws('_',cast(ceil(rand()*10) as string),a) salt_a, count(1) count from wordcount group by concat_ws('_',cast(ceil(rand()*10) as string),a)
salt_a count
10_a 1
10_b 1
1_a 2
2_a 1
3_b 1
4_b 1
4_c 1
4_d 1
6_c 1
7_e 1
Time taken: 176.729 seconds, Fetched: 10 row(s)
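
The first stage can be sketched in Python as a count per (salt, key) pair. This is a minimal simulation, not Hive's execution plan: the function name `salted_partial_counts`, the `buckets` parameter, and the seed are hypothetical, and the rows are the 11 values loaded in section 1.

```python
import math
import random
from collections import Counter


def salted_partial_counts(keys, buckets=10, seed=None):
    """Count rows per salted key, like GROUP BY concat_ws('_', salt, a)."""
    rng = random.Random(seed)
    partial = Counter()
    for key in keys:
        # Mirrors ceil(rand()*10); max(1, ...) guards a draw of exactly 0.0.
        salt = max(1, math.ceil(rng.random() * buckets))
        partial[f"{salt}_{key}"] += 1
    return partial


# The rows of the wordcount table from section 1.
rows = ["b", "a", "a", "a", "a", "b", "b", "c", "c", "e", "d"]
partial = salted_partial_counts(rows, seed=0)

for salted_key, count in sorted(partial.items()):
    print(salted_key, count)
```

The exact salted keys differ from run to run (and from the Hive output above) because the salt is random. What stays fixed is that the partial counts for each original key add up to its true total.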

2.2 Stage 2: GROUP BY with the salt removed

select split(salt_a,'_')[1] alpah, sum(count) from
(
-- the stage-1 salted query from 2.1 goes here
) b group by split(salt_a,'_')[1];
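
The second stage can be checked by hand against the stage-1 output listed in 2.1. The Python sketch below strips the salt and sums the partial counts. The helper name `merge_partials` is hypothetical, and the dictionary is simply the stage-1 result copied from above.

```python
from collections import Counter

# Stage-1 output copied from section 2.1 (salted key -> partial count).
stage1 = {
    "10_a": 1, "10_b": 1, "1_a": 2, "2_a": 1, "3_b": 1,
    "4_b": 1, "4_c": 1, "4_d": 1, "6_c": 1, "7_e": 1,
}


def merge_partials(partials):
    """Strip the salt prefix and sum the partial counts per original key."""
    totals = Counter()
    for salted_key, count in partials.items():
        # Plays the role of split(salt_a, '_')[1]; splitting only once
        # keeps original keys that themselves contain '_' intact.
        original_key = salted_key.split("_", 1)[1]
        totals[original_key] += count
    return dict(totals)


totals = merge_partials(stage1)
print(sorted(totals.items()))
# [('a', 4), ('b', 3), ('c', 2), ('d', 1), ('e', 1)]
```

The totals match the final result in section 2. One caveat with the Hive version: split(salt_a,'_')[1] returns only the text between the first and second underscore, so a key that itself contains '_' would be truncated. The sample data has no such keys.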
