Advanced C++: Hash Table Principles and Implementation | 极客日志


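This article covers the core principles behind hash tables in C++: hash function design, load factor control, and collision resolution strategies. It walks through common hashing methods such as direct addressing and division hashing, analyzes the open addressing family (linear probing, quadratic probing, double hashing), and explains the structure and implementation of separate chaining. It focuses on the resizing mechanism and on handling deletion with state markers, and provides complete template-based code, including a functor specialization and a prime table optimization. The material is useful background for understanding how unordered_set and unordered_map work internally.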

Kubernet · Published 2026/3/22 · Updated 2026/6/1523 views

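Concepts

1. What is hashing?

Hashing is a technique that maps input data of arbitrary length (usually called the 'key') to a fixed-length output by means of a specific mathematical algorithm (the 'hash function'). The output is called the 'hash value' or 'hash code'. The main purpose of hashing is fast lookup, storage, and comparison of data, and it is widely used in hash tables, cryptography, and data integrity checks.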

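Core Terminology

I. Hash Functions

A hash function is the core component of a hash table. It maps input data of arbitrary length (the 'key') to a fixed-length output (the 'hash value'), which is typically used to determine where the key is stored in the table.

1. What are the core properties of a hash function?

The core properties of a hash function are:

Determinism: the same input must always map to the same hash value. For example, hashing the string "apple" must give the same result every time.
Compression: the output has a fixed length regardless of the input length. For example, MD5 maps any input to a 128-bit hash value, while the hash function used in a hash table typically maps keys to integers in the range 0 to n-1, where n is the table length.
Efficiency: computing the hash value should be fast and simple to implement, typically O(1) or O(k) time where k is the length of the input, so that hashing does not become the bottleneck of hash table operations.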

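2. What are the design goals of a hash function?

The design goals of a hash function are:

Uniform distribution: ideally, the hash function spreads different keys evenly across the positions of the table, so that many keys do not pile up in a few positions (two distinct keys mapping to the same position is called a 'hash collision'). With a uniform distribution, insertion, lookup, and deletion run in close to O(1) time on average.
Fewer collisions: because the input space (all possible keys) is far larger than the output space (the table length), collisions cannot be avoided entirely, but a good hash function keeps their probability as low as possible.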

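3. What are the common hash functions?

Direct addressing

Direct addressing uses the key itself, or a linear function of the key, as the hash address. It is a simple and intuitive way to construct a hash function.

Formula and basic principle: the hash function for direct addressing usually takes the form:

H(key) = key

or

H(key) = a × key + b

key: the key to be mapped (the identifier of the data being stored).
a and b: constants (a ≠ 0) that apply a linear transformation to the key.
H(key): the resulting hash address, i.e. the position of the data in the hash table.

Pros, cons, and use cases:

Pros: simple and efficient; collision-free (as long as the keys are distinct, every computed address is unique).
Cons: wastes space (if the key range is large, the table must cover the whole range even when few keys are actually stored); keys must be integers.
Use case: the key range is small and contiguous (or densely clustered).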

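Division hashing

Division hashing takes the remainder of the key divided by an integer, which maps a large key range onto the valid index range of the hash table and so determines the storage position. It is one of the classic ways to construct a hash function.

Formula and basic principle: the general form of the division hash function is: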

H(key) = key % m

Multiplication hashing

h(key) = floor(m × ((key × A) mod 1)), where m is the table length and A is a constant with 0 < A < 1

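Universal hashing

|{h ∈ H : h(x) = h(y)}| / |H| ≤ 1/m for any two distinct keys x and y, where H is the family of hash functions and m is the table length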

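How do we handle keys that cannot be used with the modulo operator?

/*------------------ Task: define the hash functor as a struct template ------------------*/
template<class K>
struct HashFunc {
    // Overload operator() to convert a key of type K to size_t, which is used as the hash value
    size_t operator()(const K& key) const {
        return (size_t)key; // Default: direct conversion, suitable for integer types such as int and long
    }
};

/*------------------ Task: specialize the hash functor for string ------------------*/
template<>
struct HashFunc<string> {
    // Overload operator() to convert a string to a hash value
    size_t operator()(const string& s) const {
        // 1. Accumulator for the hash value of the string
        size_t hash = 0;
        // 2. Walk the string with a range-based for loop, in the style of the BKDR hash
        for (auto it : s) {
            // 2.1 Add the character code to the hash value
            hash += it;
            // 2.2 Multiply by the prime 131 (the multiplier BKDR hashing uses to reduce collisions)
            hash *= 131;
        }
        // 3. Return the final hash value
        return hash;
    }
};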

Open addressing: hash structure

/*------------ Task: define an enum for the three slot states ------------*/
enum State {
    EXIST,   // slot holds a live element
    EMPTY,   // slot is empty
    DELETE   // slot held an element that has been erased
};

/*------------ Task: define the struct template stored in each slot ------------*/
template<class K, class V>
struct HashData {
    // 1. The key-value pair
    // 2. The state of the slot
    pair<K, V> _kv;
    State _state = EMPTY; // slots start out empty
};

/*------------ Task: implement a hash table with open addressing (linear probing) ------------*/
template<class K, class V, class Hash = HashFunc<K>>
class HashTable {
private:
    /*------------------ Member variables ------------------*/
    // 1. Array of HashData slots
    // 2. Number of live elements in the table
    vector<HashData<K, V>> _tables;
    size_t _n;
public:
    // …………
};

Resizing

/*------------------ Task: get the next prime >= n, used when resizing the hash table ------------------*/
inline unsigned long _stl_next_prime(unsigned long n) {
    // 1. Size of the prime table
    static const int __stl_num_primes = 28;
    // 2. Prime table covering common hash table sizes
    static const unsigned long _stl_prime_list[__stl_num_primes] = {
        53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157,
        98317, 196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917,
        25165843, 50331653, 100663319, 201326611, 402653189, 805306457,
        1610612741, 3221225473, 4294967291
    };
    // 3. Binary-search for the first prime >= n
    // 3.1 Pointer to the first prime in the table
    const unsigned long* first = _stl_prime_list;
    // 3.2 Pointer one past the last prime in the table
    const unsigned long* last = _stl_prime_list + __stl_num_primes;
    // 3.3 lower_bound() finds the first prime >= n
    const unsigned long* pos = lower_bound(first, last, n);
    // 3.4 If pos == last, every prime in the table is smaller than n, so return the largest
    //     prime *(last - 1); last points one past the end, so last - 1 is the last valid prime
    return pos == last ? *(last - 1) : *pos;
}

Separate chaining: hash structure

/*------------------ Task: define the hash table node as a struct template ------------------*/
template<class K, class V>
struct HashNode {
    /*------------------ Member variables ------------------*/
    // 1. The stored key-value pair
    // 2. Pointer to the next node in the bucket
    pair<K, V> _kv;
    HashNode<K, V>* _next;

    /*------------------ Member functions ------------------*/
    // 1. Constructor for a bucket node
    HashNode(const pair<K, V>& kv) : _kv(kv), _next(nullptr) {}
};

/*------------------ Task: define the hash table class template ------------------*/
template<class K, class V, class Hash = HashFunc<K>>
class HashTable {
private:
    /*------------------ Member variables ------------------*/
    // 1. Array of Node* bucket heads
    // 2. Number of live elements in the table
    vector<HashNode<K, V>*> _tables;
    size_t _n;
public:
    // ...
};

Hash table: HashTable.h

#pragma once
// Required headers
#include <algorithm> // lower_bound
#include <iostream>
#include <string>
#include <utility>   // pair
#include <vector>
using namespace std;

/*------------------ Task: define the generic hash functor template ------------------*/
template<class K>
struct HashFunc {
    // Overload operator() to convert a key of type K to size_t, which is used as the hash value
    size_t operator()(const K& key) const {
        return (size_t)key; // Default: direct conversion, suitable for integer types such as int and long
    }
};

/*------------------ Task: specialize the hash functor for string ------------------*/
template<>
struct HashFunc<string> {
    // Overload operator() to convert a string to a hash value
    size_t operator()(const string& s) const {
        // 1. Accumulator for the hash value of the string
        size_t hash = 0;
        // 2. Walk the string with a range-based for loop, in the style of the BKDR hash
        for (auto it : s) {
            // 2.1 Add the character code to the hash value
            hash += it;
            // 2.2 Multiply by the prime 131 (the multiplier BKDR hashing uses to reduce collisions)
            hash *= 131;
        }
        // 3. Return the final hash value
        return hash;
    }
};

/*------------------ Task: get the next prime >= n, used when resizing the hash table ------------------*/
inline unsigned long _stl_next_prime(unsigned long n) {
    static const int __stl_num_primes = 28;
    static const unsigned long _stl_prime_list[__stl_num_primes] = {
        53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593, 49157,
        98317, 196613, 393241, 786433, 1572869, 3145739, 6291469, 12582917,
        25165843, 50331653, 100663319, 201326611, 402653189, 805306457,
        1610612741, 3221225473, 4294967291
    };
    const unsigned long* first = _stl_prime_list;
    const unsigned long* last = _stl_prime_list + __stl_num_primes;
    const unsigned long* pos = lower_bound(first, last, n);
    return pos == last ? *(last - 1) : *pos;
}

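Open addressing: open_address.h

#pragma once
#include "HashTable.h"
namespace open_address {
    /*------------ Task: define an enum for the three slot states ------------*/
    enum State {
        EXIST,   // slot holds a live element
        EMPTY,   // slot is empty
        DELETE   // slot held an element that has been erased
    };

    /*------------ Task: define the struct template stored in each slot ------------*/
    template<class K, class V>
    struct HashData {
        pair<K, V> _kv;
        State _state = EMPTY;
    };

    /*------------ Task: implement a hash table with open addressing (linear probing) ------------*/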
    template<class K, class V, class Hash = HashFunc<K>>
    class HashTable {
    private:
        vector<HashData<K, V>> _tables;
        size_t _n;
    public:
        HashTable() : _tables(_stl_next_prime(0)), _n(0) {}

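        HashData<K, V>* Find(const K& key) {
            Hash hash;
            size_t hash_0 = hash(key) % _tables.size();
            size_t hash_i = hash_0;
            size_t i = 1;
            // Probe until the first EMPTY slot; DELETE slots do not end the search.
            // The probe count is capped at the table size so the loop also terminates
            // when repeated insert/erase cycles have left no EMPTY slot at all.
            while (_tables[hash_i]._state != EMPTY && i <= _tables.size()) {
                if (_tables[hash_i]._state == EXIST && _tables[hash_i]._kv.first == key) {
                    return &_tables[hash_i];
                }
                hash_i = (hash_0 + i) % _tables.size();
                ++i;
            }
            return nullptr;
        }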

        bool Erase(const K& key) {
            HashData<K, V>* ret = Find(key);
            if (ret) {
                ret->_state = DELETE;
                --_n;
                return true;
            }
            return false;
        }

        bool Insert(const pair<K, V>& kv) {
            if (Find(kv.first)) {
                return false;
            }
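            // Resize once the load factor reaches 0.7 (integer arithmetic avoids floating point)
            if (_n * 10 / _tables.size() >= 7) {
                // Build a larger table and reinsert only the live elements,
                // which also discards every DELETE marker
                HashTable<K, V, Hash> newHt;
                newHt._tables.resize(_stl_next_prime(_tables.size() + 1));
                for (auto& htData : _tables) {
                    if (htData._state == EXIST) {
                        newHt.Insert(htData._kv);
                    }
                }
                _tables.swap(newHt._tables);
            }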
            Hash hashFunc;
            size_t hash_0 = hashFunc(kv.first) % _tables.size();
            size_t hash_i = hash_0;
            size_t i = 1;
            while (_tables[hash_i]._state == EXIST) {
                hash_i = (hash_0 + i) % _tables.size();
                ++i;
            }
            _tables[hash_i]._kv = kv;
            _tables[hash_i]._state = EXIST;
            ++_n;
            return true;
        }
    };
}

Separate chaining: hash_bucket.h

#pragma once
#include "HashTable.h"
namespace hash_bucket {
    template<class K, class V>
    struct HashNode {
        pair<K, V> _kv;
        HashNode<K, V>* _next;
        HashNode(const pair<K, V>& kv) : _kv(kv), _next(nullptr) {}
    };

    template<class K, class V, class Hash = HashFunc<K>>
    class HashTable {
    private:
        vector<HashNode<K, V>*> _tables;
        size_t _n;
        typedef HashNode<K, V> Node;
    public:
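        HashTable() : _tables(_stl_next_prime(0)), _n(0) {}

        // The table owns raw node pointers, so copying is disabled to avoid deleting nodes twice
        HashTable(const HashTable&) = delete;
        HashTable& operator=(const HashTable&) = delete;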

        ~HashTable() {
            for (size_t i = 0; i < _tables.size(); ++i) {
                Node* current = _tables[i];
                while (current) {
                    Node* next = current->_next;
                    delete current;
                    current = next;
                }
                _tables[i] = nullptr;
            }
        }

        Node* Find(const K& key) {
            Hash hashFunc;
            size_t hash_i = hashFunc(key) % _tables.size();
            Node* current = _tables[hash_i];
            while (current) {
                if (current->_kv.first == key) {
                    return current;
                }
                current = current->_next;
            }
            return nullptr;
        }

        bool Erase(const K& key) {
            Hash hashFunc;
            size_t hash_i = hashFunc(key) % _tables.size();
            Node* curr = _tables[hash_i];
            Node* prev = nullptr;
            while (curr) {
                if (curr->_kv.first == key) {
                    if (prev == nullptr) {
                        _tables[hash_i] = curr->_next;
                    } else {
                        prev->_next = curr->_next;
                    }
                    delete curr;
                    --_n;
                    return true;
                }
                prev = curr;
                curr = curr->_next;
            }
            return false;
        }

        bool Insert(const pair<K, V>& kv) {
            if (Find(kv.first)) {
                return false;
            }
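            // Resize once the load factor reaches 1 (element count equals bucket count)
            if (_n == _tables.size()) {
                // The bucket count is doubled here; _stl_next_prime(_tables.size() + 1)
                // would keep it prime instead
                vector<Node*> newVector(_tables.size() * 2);
                for (size_t i = 0; i < _tables.size(); i++) {
                    Node* current = _tables[i];
                    while (current) {
                        Node* next = current->_next;
                        // Relink the existing node at the head of its new bucket
                        // rather than allocating a copy
                        Hash hashFunc;
                        size_t hash_i = hashFunc(current->_kv.first) % newVector.size();
                        current->_next = newVector[hash_i];
                        newVector[hash_i] = current;
                        current = next;
                    }
                    _tables[i] = nullptr;
                }
                _tables.swap(newVector);
            }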
            Node* newNode = new Node(kv);
            Hash hashFunc;
            size_t hash_i = hashFunc(kv.first) % _tables.size();
            newNode->_next = _tables[hash_i];
            _tables[hash_i] = newNode;
            ++_n;
            return true;
        }
    };
}

Test file: Test.cpp

#include "HashTable.h"
#include "open_address.h"
#include "hash_bucket.h"
#include <string>
#include <iostream>
using namespace std;

void printTestResult(const string& testName, bool result) {
    cout << (result ? "[PASS] " : "[FAIL] ") << testName << endl;
}

void test_open_address() {
    cout << "\n===== Testing the open addressing hash table =====" << endl;
    open_address::HashTable<int, string> ht;
    cout << "Hash table created" << endl;

    bool insert1 = ht.Insert({1, "A"});
    printTestResult("insert key 1 with value A", insert1);
    bool insert2 = ht.Insert({1, "B"});
    printTestResult("insert duplicate key 1 with value B (expected to fail)", !insert2);
    bool insert3 = ht.Insert({2, "C"});
    printTestResult("insert key 2 with value C", insert3);

    auto node1 = ht.Find(1);
    printTestResult("find key 1", node1 != nullptr && node1->_kv.second == "A");
    auto node2 = ht.Find(2);
    printTestResult("find key 2", node2 != nullptr && node2->_kv.second == "C");
    auto node3 = ht.Find(3);
    printTestResult("find missing key 3", node3 == nullptr);

    bool erase1 = ht.Erase(1);
    printTestResult("erase key 1", erase1);
    bool erase2 = ht.Erase(1);
    printTestResult("erase key 1 again (expected to fail)", !erase2);
    bool erase3 = ht.Erase(3);
    printTestResult("erase missing key 3", !erase3);

    cout << "\n--- Resize test ---" << endl;
    cout << "Inserting many elements to trigger a resize..." << endl;
    for (int i = 3; i < 100; ++i) {
        ht.Insert({i, to_string(i)});
    }
    cout << "Insertion finished, verifying access..." << endl;
    auto node99 = ht.Find(99);
    printTestResult("find key 99 after resizing", node99 != nullptr && node99->_kv.second == "99");
    cout << "Open addressing hash table test finished" << endl;
}

void test_hash_bucket() {
    cout << "\n===== Testing the separate chaining hash table =====" << endl;
    hash_bucket::HashTable<string, int> ht;
    cout << "Hash table created" << endl;

    bool insert1 = ht.Insert({"apple", 5});
    printTestResult("insert key apple with value 5", insert1);
    bool insert2 = ht.Insert({"apple", 10});
    printTestResult("insert duplicate key apple with value 10 (expected to fail)", !insert2);
    bool insert3 = ht.Insert({"banana", 8});
    printTestResult("insert key banana with value 8", insert3);

    auto node1 = ht.Find("apple");
    printTestResult("find key apple", node1 != nullptr && node1->_kv.second == 5);
    auto node2 = ht.Find("banana");
    printTestResult("find key banana", node2 != nullptr && node2->_kv.second == 8);
    auto node3 = ht.Find("orange");
    printTestResult("find missing key orange", node3 == nullptr);

    bool erase1 = ht.Erase("apple");
    printTestResult("erase key apple", erase1);
    bool erase2 = ht.Erase("apple");
    printTestResult("erase key apple again (expected to fail)", !erase2);
    bool erase3 = ht.Erase("orange");
    printTestResult("erase missing key orange", !erase3);

    cout << "\n--- Resize test ---" << endl;
    cout << "Inserting many elements to trigger a resize..." << endl;
    for (int i = 0; i < 100; ++i) {
        string key = "key_" + to_string(i);
        ht.Insert({key, i});
    }
    cout << "Insertion finished, verifying access..." << endl;
    auto node = ht.Find("key_99");
    printTestResult("find key key_99 after resizing", node != nullptr && node->_kv.second == 99);
    cout << "Separate chaining hash table test finished" << endl;
}

struct Date {
    int _year;
    int _month;
    int _day;
    Date(int year = 1, int month = 1, int day = 1) : _year(year), _month(month), _day(day) {}
    bool operator==(const Date& d) const {
        return _year == d._year && _month == d._month && _day == d._day;
    }
};

struct DateHashFunc {
    size_t operator()(const Date& d) {
        size_t hash = 0;
        hash += d._year; hash *= 131;
        hash += d._month; hash *= 131;
        hash += d._day; hash *= 131;
        return hash;
    }
};

void test01() {
    hash_bucket::HashTable<string, string> ht1;
    const char* a1[] = {"abcd", "sort", "insert"};
    for (auto& it : a1) {
        ht1.Insert({it, it});
    }

    hash_bucket::HashTable<int, int> ht2;
    const int a2[] = {-19, -30, 5, 36, 13, 20, 21, 12};
    for (auto& it : a2) {
        ht2.Insert({it, it});
    }

    hash_bucket::HashTable<Date, int, DateHashFunc> ht3;
    ht3.Insert({{2025, 6, 29}, 1});
    ht3.Insert({{2025, 6, 30}, 1});
}

int main() {
    test_open_address();
    test_hash_bucket();
    test01();
    return 0;
}

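Advanced C++: Hash Table Principles and Implementation

Concepts

1. What is hashing?

Core Terminology

I. Hash Functions

1. What are the core properties of a hash function?

2. What are the design goals of a hash function?

3. What are the common hash functions?

Direct addressing

Division hashing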

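Multiplication hashing

Universal hashing

II. Load Factor

1. What is the load factor?

2. How does the load factor affect hash table performance?

3. What happens when the load factor exceeds the threshold?

III. Hash Collisions

IV. Collision Resolution

Method 1: open addressing

Linear probing

Quadratic probing

Double hashing

Method 2: separate chaining

Basic Operations

How do we handle keys that cannot be used with the modulo operator?

I. Open addressing

Hash structure

Deletion

Resizing

II. Separate chaining

Hash structure

Resizing

Code Implementation

Header files

Hash table: HashTable.h

Open addressing: open_address.h

Separate chaining: hash_bucket.h

Test file: Test.cpp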