Hash Tables: Principles and Collision Resolution Explained

The Concept of Hashing

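Hashing is the process of converting an input (the key) into a fixed-length output (the hash value) by means of a hash function. The output is usually an integer. A good hash function has the following properties:

Deterministic: the same input must always produce the same output.
Efficient: the computation must be fast.
Uniform: ideally, different inputs are spread evenly over the output space, which reduces collisions.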

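The most common application of hashing is the hash table, which uses the hash value as an array index for storing data, so that lookup, insertion, and deletion take close to constant time on average.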

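For example, suppose the data set to insert is {1, 7, 6, 4, 5, 9} and the hash function is hash(key) = key % capacity, where capacity is the total size of the underlying storage.

With capacity = 10: hash(1) = 1 % 10 = 1, hash(7) = 7 % 10 = 7, hash(6) = 6 % 10 = 6, hash(4) = 4 % 10 = 4, hash(5) = 5 % 10 = 5, hash(9) = 9 % 10 = 9.

Searching this way does not require repeated key comparisons, so it is fast. But what happens when we insert the element 44 into the set? This hash function gives 44 the same hash value as 4 (44 % 10 = 4), so a hash collision occurs.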
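The collision described above can be reproduced in a few lines of C++. This is a minimal sketch: the helper names `slot_for` and `colliding_keys` are hypothetical, chosen only for illustration, and keys are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: hash(key) = key % capacity, as in the example above.
std::size_t slot_for(int key, std::size_t capacity) {
    assert(capacity > 0 && key >= 0);
    return static_cast<std::size_t>(key) % capacity;
}

// Returns the keys from `existing` that land in the same slot as `key`.
std::vector<int> colliding_keys(const std::vector<int>& existing, int key,
                                std::size_t capacity) {
    std::vector<int> hits;
    for (int k : existing) {
        if (slot_for(k, capacity) == slot_for(key, capacity)) hits.push_back(k);
    }
    return hits;
}
```

With the set {1, 7, 6, 4, 5, 9} and a capacity of 10, `colliding_keys(keys, 44, 10)` returns {4}, because both 4 and 44 map to slot 4.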

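The Hash Table Data Structure

A hash table is a data structure that accesses a storage location directly from the key. It has two main parts:

Bucket array: a contiguous block of memory in which each position is called a bucket.
Hash function: maps a key to a bucket index.

To insert a key-value pair, we first compute the key's hash value, then reduce it to a bucket index (for example, by taking it modulo the number of buckets), and store the value in that bucket. A lookup computes the same index and goes straight to the bucket.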
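The two steps (hash the key, then reduce the hash to a bucket index) might look like the sketch below. It assumes string keys and uses the standard `std::hash`; the function name `bucket_index` is hypothetical. The actual values produced by `std::hash` are implementation-defined, so no specific index is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: reduce a key's hash value to a bucket index.
std::size_t bucket_index(const std::string& key, std::size_t bucket_count) {
    assert(bucket_count > 0);
    std::size_t hash_value = std::hash<std::string>{}(key);  // step 1: hash the key
    return hash_value % bucket_count;                         // step 2: map to a bucket
}
```

Because the function is deterministic, an insert and a later lookup of the same key always arrive at the same bucket.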

Hash Functions

The goal of a hash function is to distribute keys evenly across the buckets. Common choices include:

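Direct Addressing

The hash function is H(key) = a*key + b, where a and b are constants. It suits keys that are roughly contiguous and avoids collisions entirely, but it wastes space when the keys are sparse. The simplest form is a = 1, b = 0, that is, H(key) = key.

Advantages: simple, collision-free (as long as keys are not repeated), and lookup is O(1).
Disadvantages: requires the key values to be contiguous and to span a small range. Sparse keys (such as 1, 100, 1000) leave many empty slots and waste storage.
Use when: the key set is static, roughly contiguous, and small in range, such as student records with IDs from 2023001 to 2023100.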
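For the student-ID case, direct addressing could be sketched as follows, taking a = 1 and b = -2023001 so that the first ID maps to index 0. The names `kFirstId`, `kLastId`, `direct_address`, and `StudentTable` are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Assumed key range, taken from the student-ID example: 2023001..2023100.
constexpr int kFirstId = 2023001;
constexpr int kLastId = 2023100;

// H(key) = a*key + b with a = 1 and b = -kFirstId.
std::size_t direct_address(int student_id) {
    assert(student_id >= kFirstId && student_id <= kLastId);
    return static_cast<std::size_t>(student_id - kFirstId);
}

// Hypothetical record table: one slot per possible ID, so no collisions.
struct StudentTable {
    std::array<std::string, kLastId - kFirstId + 1> names{};

    void set(int id, const std::string& name) { names[direct_address(id)] = name; }
    const std::string& get(int id) const { return names[direct_address(id)]; }
};
```

The table needs exactly 100 slots here. If the IDs were 1, 100, and 1000 instead, the same scheme would need 1000 slots for three records, which is the space waste mentioned above.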

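Division Method

The hash function is H(key) = key % p, where p is usually a prime number no larger than the table length. The method is simple and widely applicable, but it can produce collisions, so it must be paired with a collision-resolution strategy, and p should be chosen to keep collisions low.

Advantages: cheap to compute and widely applicable; keys can be integers, strings (converted to an integer first), and so on.
Disadvantages: collisions occur (different keys map to the same address), so a collision-resolution strategy such as separate chaining or open addressing is required.
Use when: most hash table implementations take this approach (for example, the hashing behind C++ unordered_set/unordered_map typically combines other mixing algorithms, but the basic idea is still a modulo reduction).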
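The effect of the choice of p can be seen with a small experiment. The sketch below counts how many distinct slots a key set occupies for a given p; `division_hash` and `distinct_slots` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// H(key) = key % p
std::size_t division_hash(unsigned key, unsigned p) {
    assert(p > 0);
    return key % p;
}

// Hypothetical helper: number of different slots a key set occupies under p.
std::size_t distinct_slots(const std::vector<unsigned>& keys, unsigned p) {
    std::set<std::size_t> slots;
    for (unsigned k : keys) slots.insert(division_hash(k, p));
    return slots.size();
}
```

For the keys {10, 20, 30, 40, 50}, p = 10 sends every key to slot 0 (one distinct slot), while the prime p = 11 gives slots 10, 9, 8, 7, 6 (five distinct slots).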

Hash Collisions and How to Resolve Them

Closed Hashing (Open Addressing)

#pragma once
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Default hash functor: works for keys convertible to size_t (e.g. integers).
template<class K>
struct HashFunc {
    size_t operator()(const K& key) const { return size_t(key); }
};

// Specialization for strings.
template<>
struct HashFunc<std::string> {
    size_t operator()(const std::string& str) const {
        // BKDR-style string hash: multiply by 31 (131 is another common
        // multiplier) and add each character. A custom conversion rule.
        size_t hash = 0;
        for (auto e : str) {
            hash *= 31;
            hash += e;
        }
        return hash;
    }
};

// The three states a slot can be in.
enum State {
    EMPTY,
    EXIST,
    DELETE  // tombstone: erased, but probe sequences must continue past it
};

template<class K, class V>
struct HashNode {
    std::pair<K, V> _kv;
    State _state = EMPTY;
};

template<class K, class V, class Func = HashFunc<K>>
class HashTable {
public:
    HashTable() { _tables.resize(10); }

    bool insert(const std::pair<K, V>& kv) {
        if (find(kv.first)) return false;  // reject duplicate keys

        // Load factor has reached 0.7: rehash into a table twice the size.
        if (_n * 10 / _tables.size() >= 7) {
            HashTable<K, V, Func> newtables;
            newtables._tables.resize(_tables.size() * 2);
            for (size_t i = 0; i < _tables.size(); i++) {
                if (_tables[i]._state == EXIST) {
                    newtables.insert(_tables[i]._kv);
                }
            }
            _tables.swap(newtables._tables);
            _n = newtables._n;
        }

        Func func;
        size_t i = func(kv.first) % _tables.size();
        // Linear probing: step forward until a slot that is not in use.
        while (_tables[i]._state == EXIST) {
            ++i;
            i %= _tables.size();
        }
        _tables[i]._kv = kv;
        _tables[i]._state = EXIST;
        ++_n;
        return true;
    }

    HashNode<K, V>* find(const K& key) {
        Func func;
        size_t target = func(key) % _tables.size();
        // Stop only at an EMPTY slot; tombstones do not end the search.
        while (_tables[target]._state != EMPTY) {
            if (_tables[target]._state != DELETE
                && _tables[target]._kv.first == key) {
                return &_tables[target];
            }
            ++target;
            target %= _tables.size();
        }
        return nullptr;
    }

    bool erase(const K& key) {
        auto ret = find(key);
        if (!ret) return false;
        ret->_state = DELETE;  // mark as a tombstone instead of emptying the slot
        return true;
    }

private:
    std::vector<HashNode<K, V>> _tables;
    // Element counter used for the load factor. erase() does not decrement
    // it, so tombstones keep counting until the next rehash resets it.
    size_t _n = 0;
};

void Test() {
    HashTable<int, int> ht;
    int a[] = { 11, 21, 4, 14, 24, 15, 9 };
    for (auto e : a) {
        ht.insert({ e, e });
    }
    ht.insert({ 9, 9 });  // duplicate key, rejected

    auto ret = ht.find(4);
    if (ret) std::cout << ret->_kv.second << std::endl;

    ht.erase(4);
    std::cout << ht.find(4) << std::endl;  // prints a null pointer
}
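To see where linear probing places the test keys from the code above, the probe rule can be isolated into a small standalone function. This is only a sketch under stated assumptions: keys are distinct and non-negative, -1 is used as an "empty" marker, and there is no resizing or deletion. The name `linear_probe_layout` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Places each key at key % capacity, stepping forward one slot at a time
// (wrapping around) while the slot is occupied. -1 marks an empty slot.
std::vector<int> linear_probe_layout(const std::vector<int>& keys,
                                     std::size_t capacity) {
    assert(keys.size() < capacity);  // keep at least one slot free
    std::vector<int> slots(capacity, -1);
    for (int key : keys) {
        assert(key >= 0);
        std::size_t i = static_cast<std::size_t>(key) % capacity;
        while (slots[i] != -1) i = (i + 1) % capacity;
        slots[i] = key;
    }
    return slots;
}
```

For the keys {11, 21, 4, 14, 24, 15, 9} and a capacity of 10, the slots end up as: 1 → 11, 2 → 21, 4 → 4, 5 → 14, 6 → 24, 7 → 15, 9 → 9, with slots 0, 3, and 8 empty. Note how 14, 24, and 15 are all pushed past their home slot, which is the clustering that linear probing is prone to.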

Open Hashing (Hash Buckets / Separate Chaining)

#pragma once
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Node of a singly linked bucket chain.
template<class K, class V>
struct HashNode {
    std::pair<K, V> _kv;
    HashNode* _next;

    HashNode(const std::pair<K, V>& kv) : _kv(kv), _next(nullptr) {}
};

template<class K>
struct HashFunc {
    size_t operator()(const K& key) const { return size_t(key); }
};

template<>
struct HashFunc<std::string> {
    size_t operator()(const std::string& str) const {
        size_t ret = 0;
        for (auto& e : str) {
            ret *= 31;
            ret += e;
        }
        return ret;
    }
};

template<class K, class V, class Func = HashFunc<K>>
class Hash {
    using Node = HashNode<K, V>;

public:
    Hash() { _tables.resize(10, nullptr); }

    // The table owns raw node pointers, so copying is disabled.
    Hash(const Hash&) = delete;
    Hash& operator=(const Hash&) = delete;

    ~Hash() {
        for (size_t i = 0; i < _tables.size(); i++) {
            Node* cur = _tables[i];
            while (cur) {
                Node* next = cur->_next;
                delete cur;
                cur = next;
            }
        }
    }

    bool insert(const std::pair<K, V>& kv) {
        Func func;
        if (find(kv.first)) return false;  // reject duplicate keys

        // Load factor has reached 1: double the bucket count and relink the
        // existing nodes into the new table (no nodes are reallocated).
        if (_n == _tables.size()) {
            std::vector<Node*> newtables(_tables.size() * 2, nullptr);
            for (size_t i = 0; i < _tables.size(); i++) {
                Node* cur = _tables[i];
                while (cur) {
                    Node* next = cur->_next;
                    size_t newindex = func(cur->_kv.first) % newtables.size();
                    cur->_next = newtables[newindex];
                    newtables[newindex] = cur;
                    cur = next;
                }
            }
            _tables.swap(newtables);
        }

        // Insert at the head of the chain for this key's bucket.
        size_t index = func(kv.first) % _tables.size();
        Node* node = new Node(kv);
        node->_next = _tables[index];
        _tables[index] = node;
        _n++;
        return true;
    }

    HashNode<K, V>* find(const K& key) {
        Func func;
        // Locate the bucket, then walk its chain.
        size_t index = func(key) % _tables.size();
        Node* cur = _tables[index];
        while (cur) {
            if (cur->_kv.first == key) {
                return cur;
            }
            cur = cur->_next;
        }
        return nullptr;
    }

    bool erase(const K& key) {
        Func func;
        size_t index = func(key) % _tables.size();
        Node* cur = _tables[index];
        Node* prev = nullptr;
        while (cur) {
            if (cur->_kv.first == key) {
                // Unlink the node from the head or the middle of the chain.
                if (prev == nullptr) {
                    _tables[index] = cur->_next;
                } else {
                    prev->_next = cur->_next;
                }
                delete cur;
                --_n;
                return true;
            }
            prev = cur;
            cur = cur->_next;
        }
        return false;  // key not found
    }

    // Prints one line per bucket, with chained keys joined by " -> ".
    void Print() {
        for (size_t i = 0; i < _tables.size(); i++) {
            Node* cur = _tables[i];
            while (cur) {
                std::cout << cur->_kv.first;
                if (cur->_next) std::cout << " -> ";
                cur = cur->_next;
            }
            std::cout << std::endl;
        }
    }

private:
    std::vector<Node*> _tables;
    size_t _n = 0;  // number of stored elements
};

void Test() {
    const std::string line(86, '-');

    Hash<int, int> ht;
    int a[] = { 11, 21, 4, 14, 24, 15, 9, 19, 29, 39 };
    for (auto e : a) {
        ht.insert({ e, e });
    }
    ht.Print();
    std::cout << line << std::endl;

    ht.insert({ 6, 6 });  // table is full (10 elements), so this triggers a resize
    ht.Print();
    std::cout << line << std::endl;

    ht.erase(4);
    ht.Print();
    std::cout << line << std::endl;
}
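The bucket layout that chaining produces for the first ten test keys can be checked with a compact standalone sketch. It swaps the hand-written linked list for the standard `std::forward_list` and mirrors the head insertion used above; it assumes distinct, non-negative keys and does no resizing. The names `Buckets` and `chain_layout` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <vector>

using Buckets = std::vector<std::forward_list<int>>;

// Puts each key at the front of the chain for bucket key % bucket_count.
Buckets chain_layout(const std::vector<int>& keys, std::size_t bucket_count) {
    assert(bucket_count > 0);
    Buckets buckets(bucket_count);
    for (int key : keys) {
        assert(key >= 0);
        buckets[static_cast<std::size_t>(key) % bucket_count].push_front(key);
    }
    return buckets;
}
```

With the keys {11, 21, 4, 14, 24, 15, 9, 19, 29, 39} and 10 buckets, bucket 1 holds 21 -> 11, bucket 4 holds 24 -> 14 -> 4, bucket 5 holds 15, and bucket 9 holds 39 -> 29 -> 19 -> 9. Colliding keys share a bucket instead of spilling into neighboring slots, so one crowded bucket does not slow down lookups in the others.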


