Spark: Parameter Configuration Priority

Summary:

Priority, from lowest to highest:

parameters passed on spark-submit < configuration set in Scala/Java code < Spark SQL hints

Passing parameters with spark-submit

#!/usr/bin/env bash

source /home/work/batch_job/product/common/common.sh
spark_version="/home/work/opt/spark"
export SPARK_CONF_DIR=${spark_version}/conf/
spark_shell="/home/work/opt/spark/spark3-client/bin/spark-shell"
spark_sql="/home/work/opt/spark/spark3-client/bin/spark-sql"
echo ${spark_sql}
echo ${spark_shell}
${spark_shell} --master yarn \
        --queue test \
        --name "evelopment_sun-data-new_spark_shell" \
        --conf "spark.speculation=true" \
        --conf "spark.network.timeout=400s" \
        --conf "spark.executor.cores=2" \
        --conf "spark.executor.memory=4g" \
        --conf "spark.executor.instances=300" \
        --conf "spark.driver.maxResultSize=4g" \
        --conf "spark.sql.shuffle.partitions=800" \
        --conf "spark.driver.extraJavaOptions=-Dfile.encoding=utf-8" \
        --conf "spark.executor.extraJavaOptions=-Dfile.encoding=utf-8" \
        --conf "spark.driver.memory=8g" \
        --conf "spark.sql.autoBroadcastJoinThreshold=-1" \
        --conf "spark.sql.turing.pooledHiveClientEnable=false" \
        --conf "spark.sql.hive.metastore.jars=/home/work/opt/spark/spark3-client/hive_compatibility/*" \
        --conf "spark.driver.extraClassPath=./__spark_libs__/hive-extensions-2.0.0.0-SNAPSHOT.jar:./hive_jar/parquet-hadoop-bundle-1.6.0.jar:/home/work/opt/spark/spark3-client/hive_compatibility/parquet-hadoop-bundle-1.6.0.jar" \
        --conf "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" \
        --conf "spark.sql.legacy.timeParserPolicy=LEGACY" \
        --conf "spark.sql.storeAssignmentPolicy=LEGACY" \
        --jars ./online-spark-1.0-SNAPSHOT.jar
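
Once the shell comes up, it is worth confirming which values actually took effect. A minimal sketch of that check, assuming the submit command above was used unchanged (the expected values in the comments come from that command):

    // Run inside the spark-shell started above; `spark` and `sc` are predefined there.
    spark.conf.get("spark.sql.shuffle.partitions")   // expected: 800
    sc.getConf.get("spark.executor.memory")          // expected: 4g
    sc.getConf.get("spark.executor.instances")       // expected: 300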

Configuration parameters in Scala/Java code

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    val conf = new SparkConf().setAppName(s"production_data-new_UserOverview_${event_day}")
    val spark = SparkSession.builder().config("spark.debug.maxToStringFields", "500").config(conf).getOrCreate()
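
To illustrate this middle tier of the ordering, here is a minimal sketch (the app name and values are made up for the example): a key set explicitly through the builder wins over the same key passed with --conf on spark-submit, and runtime-settable SQL confs can still be changed afterwards with spark.conf.set.

    import org.apache.spark.sql.SparkSession

    // Sketch only: "priority-demo" and the values below are hypothetical.
    // spark.sql.shuffle.partitions is also set to 800 in the spark-submit command above;
    // the value set here in code is the one that takes effect.
    val spark = SparkSession.builder()
      .appName("priority-demo")
      .config("spark.sql.shuffle.partitions", "200")
      .getOrCreate()

    // Runtime-settable SQL confs can be changed again after the session exists.
    spark.conf.set("spark.sql.shuffle.partitions", "400")
    println(spark.conf.get("spark.sql.shuffle.partitions"))   // prints 400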

SQL hint

SELECT /*+ MERGEJOIN(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;

Reference: Hints - Spark 3.5.0 Documentation
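
As a sketch of why the hint sits at the top of the ordering (t1 and t2 are hypothetical tables): even with spark.sql.autoBroadcastJoinThreshold=-1, which disables automatic broadcast joins and is set in the submit command above, a BROADCAST hint still forces a broadcast join, which can be confirmed from the physical plan.

    // Sketch only: t1 and t2 are assumed to be existing tables or views.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")   // disable automatic broadcasts
    spark.sql(
      "SELECT /*+ BROADCAST(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key"
    ).explain()   // the plan should still show a BroadcastHashJoin because of the hint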
