CarbonData Connection Count Optimization

2024-06-16 22:20:03

I. Background

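CarbonData data loading is served through the CarbonData Thrift Server. Some loaded segments are actually abnormal even though their status shows Success, so the check script from another blog post is run every day. That script started hitting connection timeouts and failing. The investigation showed that it opened too many connections each day, because it iterated over every segment on every run.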

II. Optimization Strategies

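a. Strategy 1:
1. Add a Spark scheduler pool
In Spark, scheduler pools group jobs so that each group receives its own share of cluster resources, which controls their relative execution priority. Setting up pools helps manage resource contention between different jobs. To use them, you need to enable the Fair Scheduler and create a matching pool configuration file.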
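The ordering between pools can be made concrete. Below is a simplified Python restatement of the rules Spark's fair scheduling algorithm applies when deciding which pool gets the next free task slot. It is a sketch, not Spark's code. A pool running fewer tasks than its `minShare` is served first. Otherwise the pool with the fewest running tasks per unit of `weight` wins, with ties broken by pool name. The `Pool` class and the `fair_compare`/`next_pool` helpers are hypothetical names for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import cmp_to_key


@dataclass
class Pool:
    name: str
    weight: int = 1         # Spark's default pool weight; assumed > 0 in this sketch
    min_share: int = 0      # Spark's default minShare (measured in task slots / cores)
    running_tasks: int = 0


def fair_compare(a: Pool, b: Pool) -> int:
    """Return a negative number if `a` should be offered the next slot before `b`."""
    a_needy = a.running_tasks < a.min_share
    b_needy = b.running_tasks < b.min_share
    if a_needy != b_needy:
        # A pool still below its minimum share always goes first.
        return -1 if a_needy else 1
    if a_needy:
        # Both below minShare: the pool furthest below its minimum goes first.
        ra = a.running_tasks / max(a.min_share, 1)
        rb = b.running_tasks / max(b.min_share, 1)
    else:
        # Both satisfied: the pool with the fewest running tasks per weight goes first.
        ra = a.running_tasks / a.weight
        rb = b.running_tasks / b.weight
    if ra != rb:
        return -1 if ra < rb else 1
    # Tie-break on pool name.
    return (a.name > b.name) - (a.name < b.name)


def next_pool(pools: list[Pool]) -> Pool:
    """Pick the pool that would receive the next free task slot."""
    return sorted(pools, key=cmp_to_key(fair_compare))[0]


if __name__ == "__main__":
    carbon = Pool("my-pool", weight=1, min_share=3, running_tasks=2)
    other = Pool("default", weight=1, min_share=0, running_tasks=0)
    print(next_pool([carbon, other]).name)  # my-pool: still below its minShare of 3
    carbon.running_tasks = 3
    print(next_pool([carbon, other]).name)  # default: 0 tasks per weight vs 3
```

In practice, `minShare` reserves a floor of task slots for the CarbonData queries. `weight` only matters once every pool has reached its floor.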
1-1 Set the scheduler pool
spark.sql.thriftserver.scheduler.pool=my-pool
1-2 Create the pool configuration file (in $SPARK_HOME/conf)
cp fairscheduler.xml.template fairscheduler.xml

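```xml
<allocations>
  <pool name="my-pool">
    <!-- The three pool properties Spark's fair scheduler reads -->
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>3</minShare>
    <!-- The settings below come from the YARN Fair Scheduler queue format;
         Spark's fairscheduler.xml does not read them -->
    <maxRunningApps>50</maxRunningApps>
    <maxResources>100g,50</maxResources>
    <minResources>4g,8</minResources>
    <fairSharePreemptionTimeout>300</fairSharePreemptionTimeout>
    <minSharePreemptionTimeout>120</minSharePreemptionTimeout>
    <fairSharePreemptionThreshold>0.5</fairSharePreemptionThreshold>
  </pool>
</allocations>
```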
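A pool file can mix in settings that Spark never applies, so it helps to see how Spark would interpret it. The sketch below reads a fairscheduler.xml with Python's standard `xml.etree.ElementTree`. It keeps only `schedulingMode`, `weight` and `minShare`, falling back to Spark's defaults (FIFO, 1, 0). Any other element is reported as ignored. The function name `read_pools` is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

# The pool properties Spark's fair scheduler reads, with Spark's defaults.
SPARK_POOL_DEFAULTS = {"schedulingMode": "FIFO", "weight": 1, "minShare": 0}


def read_pools(root: ET.Element) -> dict:
    """Return {pool_name: settings} as Spark would interpret them.

    Elements Spark does not read (e.g. YARN-only queue settings) are
    collected under the "ignored" key instead of being applied.
    """
    pools = {}
    for pool in root.iter("pool"):
        settings = dict(SPARK_POOL_DEFAULTS, ignored=[])
        for child in pool:  # XML comments are dropped by the default parser
            text = (child.text or "").strip()
            if child.tag == "schedulingMode":
                settings["schedulingMode"] = text.upper()
            elif child.tag in ("weight", "minShare"):
                settings[child.tag] = int(text)
            else:
                settings["ignored"].append(child.tag)
        pools[pool.get("name")] = settings
    return pools


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_pools.py $SPARK_HOME/conf/fairscheduler.xml
    for name, cfg in read_pools(ET.parse(sys.argv[1]).getroot()).items():
        print(name, cfg)
```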

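2. Enable asynchronous mode to improve concurrency
spark.sql.hive.thriftServer.async=true
In this mode the Thrift Server runs each statement in a background thread instead of blocking the client request until the query finishes. true is already the default value.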
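Conceptually, async mode means the server hands each statement to a background worker and returns a handle right away. The client then waits or polls for the result. The Python sketch below illustrates only that pattern using `concurrent.futures`. It is an analogy, not the Thrift Server's implementation, and `run_query` is a hypothetical stand-in for executing SQL.

```python
from __future__ import annotations

import time
from concurrent.futures import Future, ThreadPoolExecutor


def run_query(sql: str) -> str:
    """Hypothetical stand-in for executing one SQL statement."""
    time.sleep(0.05)  # pretend the query takes 50 ms
    return f"done: {sql}"


def submit_async(pool: ThreadPoolExecutor, statements: list[str]) -> list[Future]:
    # Each statement goes to a background worker; the caller gets a handle
    # back immediately instead of blocking until the query finishes.
    return [pool.submit(run_query, s) for s in statements]


if __name__ == "__main__":
    statements = [f"SELECT {i}" for i in range(8)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        start = time.perf_counter()
        handles = submit_async(pool, statements)
        print(f"handles returned after {time.perf_counter() - start:.3f}s")
        results = [h.result() for h in handles]  # wait for each result
        print(f"all {len(results)} finished after {time.perf_counter() - start:.3f}s")
```

Run one after another, the eight statements would take about 0.4 s, but here they overlap. The size of the background pool still caps how many statements run at once.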
3. Settings in spark-defaults.conf


```properties
spark.sql.thriftserver.scheduler.pool=my-pool
spark.sql.hive.thriftServer.thrift.port=10000
spark.sql.hive.thriftServer.idleSessionTimeout=3600
spark.sql.hive.thriftServer.async=true
```
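Spark generally does not reject a misspelled `spark.*` key, so it can help to check which keys spark-defaults.conf actually defines. The sketch below is a simplified reader that assumes the java.util.Properties-style syntax the file uses: key and value separated by `=`, `:` or whitespace. Escapes and line continuations are not handled. `EXPECTED` is a hypothetical checklist for this setup.

```python
from __future__ import annotations

import re
import sys

# Simplified: the key ends at the first '=', ':' or whitespace, as in
# java.util.Properties. Escapes and line continuations are NOT handled.
_LINE = re.compile(r"^([^=:\s]+)\s*[=:\s]\s*(.*)$")

# Hypothetical checklist of keys this setup expects to find.
EXPECTED = {
    "spark.sql.thriftserver.scheduler.pool": "my-pool",
    "spark.sql.hive.thriftServer.async": "true",
}


def parse_spark_defaults(text: str) -> dict[str, str]:
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        match = _LINE.match(line)
        if match:
            props[match.group(1)] = match.group(2).strip()
        else:
            props[line] = ""  # a key with no value
    return props


def missing_or_wrong(props: dict[str, str]) -> list[str]:
    """Keys from EXPECTED that are absent or have a different value."""
    return [k for k, v in EXPECTED.items() if props.get(k) != v]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_defaults.py $SPARK_HOME/conf/spark-defaults.conf
    with open(sys.argv[1], encoding="utf-8") as f:
        problems = missing_or_wrong(parse_spark_defaults(f.read()))
    print("OK" if not problems else f"check these keys: {problems}")
```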

4. Launch command
```shell
$SPARK_HOME/bin/spark-submit --master yarn \
  --conf spark.driver.maxResultSize=20g \
  --conf spark.sql.thriftserver.scheduler.pool=my-pool \
  --conf spark.scheduler.mode=FAIR \
  --conf spark.scheduler.allocation.file=$SPARK_HOME/conf/fairscheduler.xml \
  --conf spark.sql.shuffle.partitions=50 \
  --driver-memory 25g --executor-cores 4 --executor-memory 5G --num-executors 10 \
  --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  $SPARK_HOME/carbonlib/apache-carbondata-2.X-bin-sparkx-hadoop2.x.x.jar
```
The pool is selected through spark.sql.thriftserver.scheduler.pool. A JDBC client can also choose it per session with SET spark.sql.thriftserver.scheduler.pool=my-pool;
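It is worth checking what this command asks YARN for. The sketch below does the arithmetic under Spark's default executor memory overhead of max(10% of executor memory, 384 MiB). It assumes no explicit `spark.executor.memoryOverhead` is set, and the helper name is hypothetical.

```python
from __future__ import annotations


def yarn_executor_footprint(num_executors: int, executor_memory_mib: int,
                            executor_cores: int, overhead_factor: float = 0.10,
                            min_overhead_mib: int = 384) -> dict[str, int]:
    """Estimate the executors' YARN request, assuming Spark's default
    memory overhead of max(10% of executor memory, 384 MiB)."""
    overhead = max(int(executor_memory_mib * overhead_factor), min_overhead_mib)
    container_mib = executor_memory_mib + overhead
    return {
        "overhead_mib": overhead,
        "container_mib": container_mib,
        "total_memory_mib": num_executors * container_mib,
        "total_cores": num_executors * executor_cores,
    }


if __name__ == "__main__":
    # Values from the launch command:
    # --num-executors 10 --executor-memory 5G --executor-cores 4
    print(yarn_executor_footprint(10, 5 * 1024, 4))
    # {'overhead_mib': 512, 'container_mib': 5632, 'total_memory_mib': 56320, 'total_cores': 40}
```

The executors therefore request about 55 GiB and 40 cores, before YARN rounds each container up to its allocation increment. The 25g driver heap is separate. Those 40 cores are the task slots that the pool's minShare of 3 is measured against.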
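5. Verify
Check the Thrift Server driver log for a message that the pool was created and for "Removed TaskSet ... from pool my-pool" entries, which show that jobs are actually running in the pool.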
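This check can be scripted. The sketch below scans a driver log for pool-creation and task-set-removal lines. The message formats in the regular expressions are assumptions based on recent Spark versions, so adjust them to what your own log shows. The sample log lines in the test are illustrative, not captured output.

```python
from __future__ import annotations

import re
import sys

# Assumed message formats (recent Spark versions); adjust to your own log.
CREATED = re.compile(r"Created pool:?\s+([^,\s]+)")
REMOVED = re.compile(r"Removed TaskSet (\S+?),.*?from pool\s*(\S*)")


def scan_pool_events(lines) -> tuple[list[str], list[tuple[str, str]]]:
    """Collect created pool names and (task set, pool) removal events."""
    created: list[str] = []
    removed: list[tuple[str, str]] = []
    for line in lines:
        if m := CREATED.search(line):
            created.append(m.group(1))
        elif m := REMOVED.search(line):
            removed.append((m.group(1), m.group(2)))
    return created, removed


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_pools.py <thrift-server-driver-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        created, removed = scan_pool_events(f)
    print("pools created:", created)
    print("task sets finished in my-pool:",
          sum(1 for _, pool in removed if pool == "my-pool"))
```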
b. Strategy 2:
Load balancing through ZooKeeper could also be tried; this still needs testing.
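For reference, HiveServer2-style ZooKeeper discovery balances on the client side. Each server registers an ephemeral znode under a namespace. The Hive JDBC driver receives a URL such as `jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2` and picks one registered server. The sketch below imitates only that selection step. It assumes znode names of the HiveServer2 form `serverUri=host:port;version=...;sequence=...`, and all host names are hypothetical. Whether CarbonThriftServer registers itself in ZooKeeper at all is exactly the part that remains untested.

```python
from __future__ import annotations

import random


def parse_znode(name: str) -> tuple[str, int]:
    """'serverUri=host:port;version=...;sequence=...' -> (host, port)."""
    fields = dict(part.split("=", 1) for part in name.split(";") if "=" in part)
    host, port = fields["serverUri"].rsplit(":", 1)
    return host, int(port)


def pick_server(znode_names: list[str], rng: random.Random | None = None) -> tuple[str, int]:
    """Client-side balancing: choose one registered server uniformly at random."""
    servers = [parse_znode(n) for n in znode_names]
    if not servers:
        raise RuntimeError("no Thrift Server instance is registered")
    return (rng or random).choice(servers)


if __name__ == "__main__":
    # Hypothetical children of the /hiveserver2 namespace
    children = [
        "serverUri=thrift-a:10000;version=x;sequence=0000000001",
        "serverUri=thrift-b:10000;version=x;sequence=0000000002",
    ]
    print(pick_server(children))
```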

Original article: https://blog.csdn.net/weixin_51473488/article/details/139647784
