C++ Search Engine Core Module: Searcher Design and Code Walkthrough

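Searcher, the core module of this C++ search engine, manages the forward and inverted indexes through a singleton Index object, processes user queries, and returns the results. The flow covers tokenization, triggering retrieval, merging and sorting, and JSON serialization. A hash table deduplicates document IDs and accumulates weights, so a document that matches several query terms ranks higher, and a summary function extracts the context around a keyword. The final output is structured data that the front end can display as a readable list of search results.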

CloudNative · Published 2026/3/28 · Updated 2026/6/12 · 24 views

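The Searcher class is the upper-layer wrapper: it calls into the underlying index and implements the search feature. Its core job is to process the user's search terms and return the matching web page information.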

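1. Singleton pattern

Searcher obtains the Index object through the singleton pattern (Index is the singleton, not Searcher itself) and then builds the forward and inverted indexes.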

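class Searcher {
private:
    ns_index::Index* index = nullptr;

public:
    Searcher() {}
    ~Searcher() {}

    void InitSearcher(const std::string& input) {
        // 1. Create (or fetch) the Index object; Index is a singleton
        index = ns_index::Index::Getinstance();
        // 2. Build the index through that object
        index->BuildIndex(input);
        LOG1(NORMAL, "Index built successfully...");
    }
    // ... Search and GetDesc are shown in the complete code
};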
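
The article does not show how Index::Getinstance is implemented. As a minimal sketch of one common way to write such a singleton in C++11, the following uses a function-local static. SketchIndex, SketchSearcher, and their members are hypothetical stand-ins for illustration, not the article's ns_index::Index.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ns_index::Index (the real class is not shown here).
class SketchIndex {
public:
    // Returns the single shared instance, creating it on first use.
    // Since C++11, a function-local static is initialized exactly once,
    // even when several threads call this function concurrently.
    static SketchIndex* GetInstance() {
        static SketchIndex instance;
        return &instance;
    }

    // Copying would produce a second instance, so it is disabled.
    SketchIndex(const SketchIndex&) = delete;
    SketchIndex& operator=(const SketchIndex&) = delete;

    // Placeholder for real index construction: only records that it ran.
    void BuildIndex(const std::string& input) {
        source_ = input;
        built_ = true;
    }
    bool built() const { return built_; }

private:
    SketchIndex() = default;  // private, so only GetInstance can construct
    std::string source_;
    bool built_ = false;
};

// Mirrors the shape of InitSearcher: fetch the singleton, then build.
class SketchSearcher {
public:
    void InitSearcher(const std::string& input) {
        index_ = SketchIndex::GetInstance();
        assert(index_ != nullptr);
        index_->BuildIndex(input);
    }
    const SketchIndex* index() const { return index_; }

private:
    SketchIndex* index_ = nullptr;
};
```

With this shape, any number of searcher objects end up pointing at the same index, so the index data exists in memory only once.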

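2. Search flow

The function has four main steps: tokenization, triggering retrieval, merging and sorting, and building the JSON result.

2.1 Tokenization

First create a words vector, then use the CutString function to split the user's query into tokens and store them in words.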
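
To make the calling convention concrete, here is a rough stand-in with the same shape as CutString (query in, tokens appended through an output pointer). It splits on whitespace only, which is an assumption made for illustration; the article's CutString is built on a Jieba tokenizer, which can also segment Chinese text. The name CutStringSketch is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical whitespace-only stand-in for ns_util::JiebaUtil::CutString.
// Appends each whitespace-separated token of `query` to `*words`.
void CutStringSketch(const std::string& query, std::vector<std::string>* words) {
    assert(words != nullptr);
    std::istringstream stream(query);
    std::string token;
    while (stream >> token) {  // operator>> skips runs of whitespace
        words->push_back(token);
    }
}
```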

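2.2 Trigger

Using the singleton index, index->GetInvertedList(w) returns the inverted list for keyword w, and its entries are stored in tokens_map. Because tokens_map is a hash map keyed by document ID, a document that matches several keywords is stored only once, with its weights added together. to_lower lowercases each keyword first, so that a difference in case does not cause the match to fail.

While iterating over an inverted list, each tokens_map entry is taken by reference and updated in place, which avoids copying.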
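
The trigger step can be sketched in isolation as follows. The types here (InvertedElem, InvertedIndex, Candidate) and the function names are hypothetical stand-ins: the inverted index is modeled as a plain hash map from keyword to list, and ToLowerAscii replaces boost::to_lower so the sketch needs only the standard library.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the article's index types.
struct InvertedElem {
    uint64_t doc_id;
    std::string word;
    int weight;
};
using InvertedList = std::vector<InvertedElem>;
using InvertedIndex = std::unordered_map<std::string, InvertedList>;

// Same role as InvertedElemPrint: one entry per document.
struct Candidate {
    uint64_t doc_id = 0;
    int weight = 0;
    std::vector<std::string> words;
};

// ASCII-only lowercasing, used here in place of boost::to_lower.
std::string ToLowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// For every query word, look up its inverted list and fold each hit into
// the per-document entry: repeated documents are merged, weights are summed.
std::unordered_map<uint64_t, Candidate> Trigger(const std::vector<std::string>& words,
                                                const InvertedIndex& index) {
    std::unordered_map<uint64_t, Candidate> tokens_map;
    for (const std::string& raw : words) {
        const std::string w = ToLowerAscii(raw);
        auto it = index.find(w);
        if (it == index.end()) continue;  // keyword not indexed
        for (const InvertedElem& elem : it->second) {
            Candidate& item = tokens_map[elem.doc_id];  // creates the entry on first hit
            item.doc_id = elem.doc_id;
            item.weight += elem.weight;
            item.words.push_back(elem.word);
        }
    }
    return tokens_map;
}
```

If document 1 appears in the lists for both "search" (weight 5) and "engine" (weight 4), it ends up as a single entry with weight 9 and both words recorded.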

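2.3 Merge

The processed entries are then collected into inverted_list_all. A vector is used because it is convenient to access and, unlike the hash map, can be sorted; the entries are then sorted by their accumulated weight in descending order.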
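
A minimal sketch of the merge step, assuming the per-document entries have already been accumulated in a hash map. MergedDoc and MergeAndSort are hypothetical names. The tie-break on doc_id is an addition of this sketch, not part of the article's code: a hash map has no defined iteration order, so without it documents of equal weight could come out in any order.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-document entry, same role as InvertedElemPrint.
struct MergedDoc {
    uint64_t doc_id = 0;
    int weight = 0;
    std::vector<std::string> words;
};

// Moves every map value into a vector, then sorts by weight, highest first.
// The map is taken by value so the entries can be moved out safely.
std::vector<MergedDoc> MergeAndSort(std::unordered_map<uint64_t, MergedDoc> tokens_map) {
    std::vector<MergedDoc> all;
    all.reserve(tokens_map.size());
    for (auto& entry : tokens_map) {  // non-const reference, so std::move really moves
        all.push_back(std::move(entry.second));
    }
    std::sort(all.begin(), all.end(), [](const MergedDoc& a, const MergedDoc& b) {
        if (a.weight != b.weight) return a.weight > b.weight;
        return a.doc_id < b.doc_id;  // assumption: deterministic order for equal weights
    });
    return all;
}
```

Note that the comparator must compare the two different arguments; comparing an element with itself always returns false and leaves the order unspecified.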

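2.4 Building the JSON

This step is serialization: converting the content into a standard, linear format that can be stored or transmitted. The code uses the Json library (jsoncpp) for this, appending each serialized result to the overall root.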

Json::StyledWriter writer;
// ... each result object is added with root.append(elem);
*json_string = writer.write(root); // serialization complete

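Note: the code fetches the inverted index at one point and the forward index at another, but both operations revolve around inverted_list_all. In essence, inverted_list_all is the set of candidate documents left after several inverted lists have been deduplicated, merged, and sorted. It is the intermediate data structure that connects the "index query" stage to the "result return" stage.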
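
The hand-off described in this note can be sketched as follows: the sorted candidate list carries only document IDs and weights, and each ID is then resolved against the forward index to get displayable fields. All names here (DocInfoSketch, RankedDoc, ForwardIndexSketch, RenderResults) are hypothetical, the forward index is modeled as a hash map, and plain strings stand in for the JSON objects.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for ns_index::DocInfo.
struct DocInfoSketch {
    std::string title;
    std::string url;
};

// One entry of the sorted candidate list (inverted_list_all).
struct RankedDoc {
    uint64_t doc_id;
    int weight;
};

using ForwardIndexSketch = std::unordered_map<uint64_t, DocInfoSketch>;

// Walks the candidates in ranked order and resolves each doc_id through the
// forward index. Candidates with no forward entry are skipped, mirroring the
// nullptr check in Search.
std::vector<std::string> RenderResults(const std::vector<RankedDoc>& inverted_list_all,
                                       const ForwardIndexSketch& forward) {
    std::vector<std::string> lines;
    for (const RankedDoc& item : inverted_list_all) {
        auto it = forward.find(item.doc_id);
        if (it == forward.end()) continue;
        lines.push_back(it->second.title + " <" + it->second.url + ">");
    }
    return lines;
}
```

The output order is exactly the order of the candidate list, which is why sorting has to happen before this step.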

5. Complete code

#pragma once
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>
#include <boost/algorithm/string.hpp>
#include <jsoncpp/json/json.h>
#include "index.hpp"
#include "log.hpp"
#include "usuallytool.hpp"

namespace ns_searcher {

// One candidate document: accumulated weight plus the keywords that hit it.
struct InvertedElemPrint {
    uint64_t doc_id;
    int weight;
    std::vector<std::string> words;
    InvertedElemPrint() : doc_id(0), weight(0) {}
};

class Searcher {
private:
    ns_index::Index* index = nullptr;

public:
    Searcher() {}
    ~Searcher() {}

    void InitSearcher(const std::string& input) {
        index = ns_index::Index::Getinstance();
        index->BuildIndex(input);
        LOG1(NORMAL, "Index built successfully...");
    }

    void Search(const std::string& query, std::string* json_string) {
        // 1. Tokenize the query
        std::vector<std::string> words;
        ns_util::JiebaUtil::CutString(query, &words);

        // 2. Trigger: look up each keyword and accumulate per document
        std::unordered_map<uint64_t, InvertedElemPrint> tokens_map;
        for (std::string w : words) {  // copied on purpose: to_lower modifies w
            boost::to_lower(w);
            ns_index::InvertedList* inverted_list = index->GetInvertedList(w);
            if (inverted_list == nullptr) continue;
            for (const auto& elem : *inverted_list) {
                auto& item = tokens_map[elem.doc_id];
                item.doc_id = elem.doc_id;
                item.weight += elem.weight;
                item.words.push_back(elem.word);
            }
        }

        // 3. Merge into a vector and sort by weight, highest first
        std::vector<InvertedElemPrint> inverted_list_all;
        inverted_list_all.reserve(tokens_map.size());
        for (auto& item : tokens_map)  // non-const, so std::move actually moves
            inverted_list_all.push_back(std::move(item.second));
        std::sort(inverted_list_all.begin(), inverted_list_all.end(),
                  [](const InvertedElemPrint& e1, const InvertedElemPrint& e2) {
                      return e1.weight > e2.weight;  // was e1.weight > e1.weight
                  });

        // 4. Build the JSON result
        Json::Value root;
        for (auto& item : inverted_list_all) {
            ns_index::DocInfo* doc = index->GetForwardIndex(item.doc_id);
            if (doc == nullptr) continue;
            Json::Value elem;
            elem["title"] = doc->title;
            elem["desc"] = GetDesc(doc->content, item.words[0]);
            elem["url"] = doc->url;
            root.append(elem);
        }
        Json::StyledWriter writer;
        *json_string = writer.write(root);
    }

    // Returns the text around the first case-insensitive match of word.
    std::string GetDesc(const std::string& html_content, const std::string& word) {
        const int prev_step = 50;
        const int next_step = 100;
        // unsigned char keeps std::tolower well-defined for non-ASCII bytes
        auto iter = std::search(html_content.begin(), html_content.end(),
                                word.begin(), word.end(),
                                [](unsigned char x, unsigned char y) {
                                    return std::tolower(x) == std::tolower(y);
                                });
        // std::search reports "not found" by returning end(), so no npos check is needed
        if (iter == html_content.end()) return "None1";
        int pos = static_cast<int>(std::distance(html_content.begin(), iter));
        int start = 0;
        int end = static_cast<int>(html_content.size()) - 1;
        if (pos - prev_step > start) start = pos - prev_step;
        if (pos + next_step < end) end = pos + next_step;
        if (start >= end) return "None2";
        std::string desc = html_content.substr(start, end - start);
        desc += "...";
        return desc;
    }
};

}  // namespace ns_searcher
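
GetDesc builds the summary from a window around the first case-insensitive match of the keyword. The same idea can be tried on its own with the sketch below. It is a variant rather than the article's function: the name MakeSnippet, the default window sizes as parameters, the use of unsigned offsets, and returning an empty string when nothing matches are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Returns up to `before` bytes ahead of the first case-insensitive match of
// `word` in `content` and up to `after` bytes from the match onward, followed
// by "...". Returns an empty string if `word` does not occur.
std::string MakeSnippet(const std::string& content, const std::string& word,
                        std::size_t before = 50, std::size_t after = 100) {
    auto it = std::search(content.begin(), content.end(), word.begin(), word.end(),
                          [](unsigned char a, unsigned char b) {
                              return std::tolower(a) == std::tolower(b);
                          });
    if (it == content.end()) return "";

    const std::size_t pos = static_cast<std::size_t>(it - content.begin());
    // Clamp the window to the bounds of the string.
    const std::size_t start = pos > before ? pos - before : 0;
    const std::size_t end = std::min(content.size(), pos + after);
    return content.substr(start, end - start) + "...";
}
```

For example, MakeSnippet("Hello World", "WORLD") returns "Hello World...", since the whole text fits inside the window. Like GetDesc, this works on byte offsets, so with UTF-8 text the window edges can fall in the middle of a multi-byte character.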
