【数据结构】哈希表

 A structure that maps keys to values is a hash table.

That Every element has equal probability of hashing into any of the slots is simple uniform hashing.

A function that computes the location of the key in the array is a hash function.

Standard deletion cannot be performed in an open addressing hash table, because the cells might have caused collision. Hence, the hash tables implement lazy deletion.

 The average retrieval time when n keys hash to the same slot is given by Theta(n) as the collision occurs in the hash table.

基本要素

watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARGF2aWRfSHp5,size_20,color_FFFFFF,t_70,g_se,x_16

重点:数据字段

TableSize:表的大小

哈希函数:理想情况下,该函数应易于计算,并应确保任何两个不同的键得到不同的单元格。每个键被映射到范围O到tablesize -1中的某个数字,并放置在适当的单元格中。

哈希的基本思想:处理函数的选择,决定当碰撞发生时该做什么,决定表的大小。

特殊情况

A collision occurs when we hash two nonidentical identifiers into the same bucket, i.e.  f ( i1 ) = f ( i2 ) when i1 != i2 .

冲突发生在当我们把两个不相同的标识符放入一个桶中。

An overflow occurs when we hash a new identifier into a full bucket.

过载发生在当我们把新的标识符放入一个满的桶中。

解决冲突方法

The load factor for an open addressing technique should be 0.5. For separate chaining technique, the load factor is 1

条件:1. f ( x ) must be easy to compute and minimizes the number of collisions.

           2.f ( x ) should be unbiased.  That is, for any x and any i, we have that Probability( f ( x ) = i ) = 1 / b.  

watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARGF2aWRfSHp5,size_17,color_FFFFFF,t_70,g_se,x_16

 watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARGF2aWRfSHp5,size_18,color_FFFFFF,t_70,g_se,x_16

 watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARGF2aWRfSHp5,size_19,color_FFFFFF,t_70,g_se,x_16

 watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARGF2aWRfSHp5,size_17,color_FFFFFF,t_70,g_se,x_16

 图样:

watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARGF2aWRfSHp5,size_12,color_FFFFFF,t_70,g_se,x_16

linear probing

f(i)=i is the correct function definition for linear probing.

Primary collision occurs due to linear probing technique. It is overcome using a quadratic probing technique.

quadratic probing

eq?F%28i%29%3Di%5E2

eq?Hash%20%5C%3B%20key%3D%28hash%28x%29+F%28i%5E2%29%29%25table%5C%3Bsize

double hashing

watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARGF2aWRfSHp5,size_18,color_FFFFFF,t_70,g_se,x_16

 watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBARGF2aWRfSHp5,size_19,color_FFFFFF,t_70,g_se,x_16

 Note:

1. If double hashing is correctly implemented, simulations imply that the expected number of probes is almost the same as for a random collision resolution strategy.如果双哈希是正确实现的,仿真表明预期的探测数量几乎是相同的随机碰撞解决策略。           

2. Quadratic probing does not require the use of a second hash function and is thus likely to be simpler and faster in practice.二次探测不需要使用第二个哈希函数,因此在实践中可能更简单、更快。

rehashing

Rehashing is not a collision resolution strategy for open addressing.

Clustering is not a theoretical problem but it occurs in implementations of hashing. Rehashing is a kind of hashing.

Build another table that is about twice as big; 构建另一个大约两倍大的表;

Scan down the entire original hash table for non-deleted elements; 扫描整个原始哈希表中未删除的元素;

Use a new function to hash those elements into the new table.使用一个新函数将这些元素散列到新表中。

When to rehash:1.As soon as the table is half full 等表格满一半的时候

                           2.When an insertion fails 当插入失败时

                           3.When the table reaches a certain load factor当表达到一定的负载系数时

Note: Usually there should have been N/2 insertions before rehash, so O(N) rehash only adds a constant cost to each insertion. However, in an interactive system, the unfortunate user whose insertion caused a rehash could see a slowdown.通常在重哈希之前应该有N/2次插入,所以O(N)重哈希只会给每次插入增加一个常数的代价。然而,在一个交互系统中,不幸的用户的插入导致了重新hash,可能会看到减速。

相关推荐

  1. 数据结构-

    2024-03-27 11:08:02       43 阅读
  2. 数据结构——

    2024-03-27 11:08:02       8 阅读
  3. 数据结构-

    2024-03-27 11:08:02       10 阅读
  4. 数据结构-

    2024-03-27 11:08:02       8 阅读

最近更新

  1. TCP协议是安全的吗?

    2024-03-27 11:08:02       16 阅读
  2. 阿里云服务器执行yum,一直下载docker-ce-stable失败

    2024-03-27 11:08:02       16 阅读
  3. 【Python教程】压缩PDF文件大小

    2024-03-27 11:08:02       15 阅读
  4. 通过文章id递归查询所有评论(xml)

    2024-03-27 11:08:02       18 阅读

热门阅读

  1. 英语口语 3.27

    2024-03-27 11:08:02       18 阅读
  2. C++ Lists(链表)基本用法

    2024-03-27 11:08:02       17 阅读
  3. Github 2024-03-26 开源项目日报 Top10

    2024-03-27 11:08:02       16 阅读
  4. 检查文件是否为图片或者视频

    2024-03-27 11:08:02       17 阅读
  5. 智能媒体时代认知安全的关键资源

    2024-03-27 11:08:02       16 阅读
  6. [蓝桥杯 2015]机器人数目

    2024-03-27 11:08:02       17 阅读