Query Processing 查询处理

Query Processing Overview • Objective: Provide correct answer to query (almost) as efficiently as possible 查询处理的目的是 高效准确查询Query Processing Operations • Query processing involves several operations: — Lexical & syntactic analysis – transform SQL into an internal form — Normalisation – collecting AND and OR predicates — Semantic analysis – i.e., does the query make sense? — Simplification – e.g., remove common or redundant sub-expressions — Generating an execution plan – query optimisation — Executing the plan and returning results to the client • To describe most of these, we need to use Relational Algebra - 词法和语法分析 - 将 SQL 转换为内部形式 - 规范化 - 收集 AND 和 OR 谓词 - 语义分析  即查询是否有意义? - 简化--例如,删除常见或多余的子表达式 - 生成执行计划--查询优化 - 执行计划并将结果返回给客户端 - 要描述其中的大部分操作,我们需要使用关系代数

Sets and Cartesian Product • Set – a collection of objects characterized by some defining property — e.g., a column in a database table such as last names of all staff • Cartesian Product of sets (×) – one of the operations on sets — e.g., consider two sets in the staff table — set of all first names ◦ fName = {Mary, David} — set of all last names ◦ lName = Howe, Ford — their cartesian product ◦ fName × lName = (Mary,Howe), (Mary,Ford), (David, Howe), (David, Ford)集合 - 以某些定义属性为特征的对象集合 笛卡尔乘积

Relation Relation – defined between two sets and is a subset of cartesian product between those two sets • e.g., FirstNameOf = (Mary, Howe), (David, Ford)关系--在两个集合之间定义,是这两个集合的卡特积子集

Relational Model • The name ‘relational model’ comes from this mathematical notion of relation — Where a relation is a set (collection) of tuples that have related objects such as first name and last name of the same person — e.g., (fName, lName) is a relation • We can have relations over any number of sets — e.g., (staffNo, fName, lName, position) • In general we can denote a relation as (A,B,C,D,... ,Z) where A, B, C and Z are all its attribute sets 关系是具有相关对象(如同一个人的名字和姓氏)的元组集合(集合)

Introducing Relational Algebra • What is relational algebra (RA) and why is it useful? — RA is a symbolic formal way of describing relational operations — RA says how, as well as what (order is important) — Can use re-write rules to simplify and optimise complex queries... • Maths example: — a + b · x + c · x 2 + d · x 3 ; 3 adds, 3 multiplies, 2 powers; — a + x · (b + x · (c + x · d)); 3 adds, 3 multiplies;关系代数是描述关系操作的一种符号化的正式方式 关系代数说明如何操作以及操作什么(顺序很重要) - 可以使用重写规则来简化和优化复杂的查询 简而言之 关系代数就是描述关系操作符号的含义

Basic Relational Algebra Operators • The basic RA operators are: — Selection σ (Sigma); Projection π (Pi); Rename ρ (Rho) • Examples基本关系代数运算符- 选择 σ (Sigma);投影 π (Pi);重命名 ρ (Rho)

关系代数符号

Query Processing Example • Example: find all managers who work at a London Branch:

• There are at least 3 ways of writing this in RA notation:One of these will be the most efficient – but which??

Lexical & Syntactical Analysis & Query Trees • Lexical & syntactical analysis involves: — identifying keywords & literals — identifying table names & aliases — mapping aliases to table names — identifying column names — checking columns exist in tables • The output of this phase is a relational algebra tree (RAT)词法和句法分析与查询树 - 词法和句法分析包括: - 识别关键字和文字 - 识别表名和别名 - 将别名映射到表名 - 识别列名 - 检查列是否存在于表中 - 该阶段的输出是关系代数树 (RAT)

Semantic Analysis • Does the query make sense? — Is the query legal SQL? — Is the RAT connected? – if not, query is incomplete! • Can the query be simplified? – for example:查询的意义 是否为合法SQL RAT连接情况(没有连接查询不完整) 查询的简化

Normalisation & Normal Forms 正则化和正则表达式

• Why is this useful? – sometimes a query might best be split into subqueries (remember set operations?): 将查询拆分成子查询• Disjunctions suggest union:分词代表并集

• Conjunctions suggest intersection:连词代表交集

Some RA Equivalences Rules (Re-Write Rules) • There are many equivalence rules (see C&B pp.736–739). Here are a few:RA等价规则

Generating Query Plans • Most RDBMSs generate candidate query plans by using RA re-write rules to generate alternate RATs and to move operations around each tree:生成查询计划--大多数 RDBMS 通过使用 RA 重写规则生成备用 RAT 并在每个树上移动操作,从而生成候选查询计划

• For complex queries, there may be a very large number of candidate plans...复杂查询可能有多种候选计划

Heuristic Query Optimisation Rules To avoid considering all possible plans, many DBMSs use heuristic rules: • keep together selections (σ) on the same table • perform selections as early as possible • re-write selection on a cartesian product as a join • perform “small joins” first • keep together projections (π) on the same relation • apply projections as early as possible • if duplicates are to be eliminated, use a sort algorithm启发式查询优化规则 为避免考虑所有可能的计划,许多 DBMS 使用启发式规则: - 将同一表上的选择 (σ) 保持在一起 - 尽早执行选择 - 将卡特积上的选择重写为连接 - 先执行 "小连接" - 将同一关系上的投影 (π) 保持在一起 - 尽早应用投影 - 如果要消除重复,则使用排序算法

Cost-Based Query Optimisation • Remember, accessing disc blocks is expensive! • Ideally, the query optimiser should take into account: — the size (cardinality) of each table — which tables have indexes — the type of each index – clustered, non-clustered — which predicates can be evaluated using an index — how much memory query will need – and for how long — whether the query can be split over multiple CPUs基于成本的查询优化  - 理想情况下,查询优化器应考虑以下因素 - 每个表的大小(cardinality) - 哪些表有索引 - 每个索引的类型 聚类、非聚类 - 哪些谓词可以使用索引进行评估 -查询需要多少内存,以及需要多长时间 - 查询是否可以在多个 CPU 上进行分割

相关推荐

  1. Elasticsearch如何处理多个关键字查询

    2024-03-18 05:10:02       17 阅读
  2. SQL Server查询计划(Query Plan)——SQL处理过程

    2024-03-18 05:10:02       36 阅读
  3. mybatis-plus循环处理多个条件的 or 查询

    2024-03-18 05:10:02       29 阅读
  4. Django杂记之数据查询和文件处理

    2024-03-18 05:10:02       12 阅读

最近更新

  1. TCP协议是安全的吗?

    2024-03-18 05:10:02       18 阅读
  2. 阿里云服务器执行yum,一直下载docker-ce-stable失败

    2024-03-18 05:10:02       19 阅读
  3. 【Python教程】压缩PDF文件大小

    2024-03-18 05:10:02       19 阅读
  4. 通过文章id递归查询所有评论(xml)

    2024-03-18 05:10:02       20 阅读

热门阅读

  1. leetcode-提莫攻击

    2024-03-18 05:10:02       23 阅读
  2. [青龙面板]依赖管理一键安装/免代码安装

    2024-03-18 05:10:02       18 阅读
  3. 程序员如何规划职业赛道?

    2024-03-18 05:10:02       24 阅读
  4. 粤嵌6818开发板嵌入式开发Linux内存映射

    2024-03-18 05:10:02       21 阅读
  5. [hive面试必备]-hive如何解决数据倾斜问题

    2024-03-18 05:10:02       26 阅读
  6. 描述CSS选择器及其优先级规则

    2024-03-18 05:10:02       20 阅读
  7. sui move 动态字段练习(4)

    2024-03-18 05:10:02       22 阅读
  8. 面试算法-41-打家劫舍

    2024-03-18 05:10:02       22 阅读
  9. 一周速递|全球车联网产业动态(2024年3月17日)

    2024-03-18 05:10:02       22 阅读
  10. MaskedArray如何填补为nan

    2024-03-18 05:10:02       21 阅读
  11. 安全地使用v-html

    2024-03-18 05:10:02       24 阅读