String handling keeps crashing? You haven't unlocked string's "pitfall-proof buffs" yet: a must-read for C++ers

Ne0inhk

15 Mar 2026 — 25 min read

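✨ 孤廖: personal homepage

🎯 Column: 《C++：从代码到机器》 (C++: From Code to Machine)

🎯 Column: 《Linux系统探幽：从入门到内核》 (Exploring Linux: From Getting Started to the Kernel)

🎯 Column: 《算法磨剑：用C++思考的艺术》 (Sharpening Algorithms: The Art of Thinking in C++)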

Bend but never yield; hold the middle ground and never settle for less.

Contents

Main text
Conclusion

Main text

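1. Why learn the string class?

1.1 Strings in C

In C, a string is a sequence of characters terminated by '\0'. The C standard library provides the str family of functions to work with such strings, but those functions are separate from the data they operate on, which does not fit OOP well. The caller also has to manage the underlying buffer by hand, and one small slip is enough to read or write out of bounds.

In online-judge (OJ) problems, strings almost always appear as the string class. In everyday work most people also reach for string because it is simple, convenient, and quick to use; the C library string functions are rarely called directly.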

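2. The string class in the standard library

2.1 The string class (overview)

Documentation for the string class
To use string, include the <string> header (#include <string>). The class lives in namespace std, so either write std::string or add using namespace std;.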

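2.2 auto and range-based for

The auto keyword

Two small C++11 features are worth introducing here because they make the rest of the article easier to follow. In early C/C++, auto marked a local variable as having automatic storage duration, which was the default anyway, so the keyword saw little use. C++11 recycled it with a new meaning: auto is no longer a storage-class specifier but a placeholder type, and the compiler deduces the variable's type from its initializer at compile time.

- When declaring a pointer, auto and auto* behave the same; when declaring a reference, the & is required (auto&).
- When several variables are declared on one line, every one of them must deduce to the same type; otherwise the compiler reports an error.
- Before C++20, auto cannot be used as a function parameter type. It can be used as a return type (since C++14), but do so sparingly.
- auto cannot be used directly to declare an array.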

Code walkthrough:

    #include <iostream>
    #include <typeinfo>
    using namespace std;

    int func1()
    {
        return 10;
    }

    // Not allowed as a parameter type before C++20
    // void func2(auto a) {}

    // Allowed as a return type (C++14), but use it sparingly
    auto func3()
    {
        return 3;
    }

    int main()
    {
        int a = 10;
        auto b = a;
        auto c = 'a';
        auto d = func1();

        // Compile error C3531: "e": a symbol whose type contains "auto" must have an initializer
        // auto e;

        cout << typeid(b).name() << endl;
        cout << typeid(c).name() << endl;
        cout << typeid(d).name() << endl;

        int x = 10;
        auto y = &x;   // int*
        auto* z = &x;  // int*, same as above
        auto& m = x;   // int&, the & is required for a reference

        cout << typeid(x).name() << endl;
        cout << typeid(y).name() << endl;
        cout << typeid(z).name() << endl;

        auto aa = 1, bb = 2;

        // Compile error C3538: in a declarator list, "auto" must always deduce to the same type
        // auto cc = 3, dd = 4.0;

        // Compile error C3318: "auto []": an array cannot have an element type that contains "auto"
        // auto array[] = { 4, 5, 6 };

        return 0;
    }

    #include <iostream>
    #include <string>
    #include <map>
    using namespace std;

    int main()
    {
        std::map<std::string, std::string> dict = { { "apple", "苹果" }, { "orange", "橙子" }, { "pear", "梨" } };

        // This is where auto shines: compare it with the full iterator type
        // std::map<std::string, std::string>::iterator it = dict.begin();
        auto it = dict.begin();
        while (it != dict.end())
        {
            cout << it->first << ":" << it->second << endl;
            ++it;
        }
        return 0;
    }

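Range-based for

For a collection that already knows its own bounds, making the programmer spell out the loop range is redundant and sometimes error-prone. C++11 therefore introduced the range-based for loop. The parentheses after for are split by a colon ":" into two parts: the first is the variable that receives each element, the second is the range being iterated. The loop advances, fetches each element, and detects the end automatically.

- Range-based for works on arrays and on container objects.
- Under the hood it is simple: iterating a container is rewritten into iterator code, which is also visible at the assembly level.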

Code walkthrough:

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        int array[] = { 1, 2, 3, 4, 5 };

        // C++98-style traversal
        for (size_t i = 0; i < sizeof(array) / sizeof(array[0]); ++i)
        {
            array[i] *= 2;
        }
        for (size_t i = 0; i < sizeof(array) / sizeof(array[0]); ++i)
        {
            cout << array[i] << endl;
        }

        // C++11-style traversal
        for (auto& e : array) // by reference, so the elements are modified in place
            e *= 2;
        for (auto e : array)
            cout << e << " " << endl;

        string str("hello world");
        for (auto ch : str)
        {
            cout << ch << " ";
        }
        cout << endl;
        return 0;
    }
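Since range-based for is described above as iterator code in disguise, a side-by-side sketch may make that concrete. The two helpers below (both names are made up for this illustration) uppercase ASCII letters, once with explicit iterators and once with range-based for. The first is only a rough hand-written equivalent of what the compiler generates, not the exact expansion specified by the standard.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: uppercase ASCII letters using explicit iterators.
std::string upper_with_iterators(std::string s)
{
    for (auto it = s.begin(), last = s.end(); it != last; ++it)
    {
        char& ch = *it; // the loop variable is bound to the dereferenced iterator
        if (ch >= 'a' && ch <= 'z')
            ch -= 32;
    }
    return s;
}

// Hypothetical helper: the same loop written as range-based for.
std::string upper_with_range_for(std::string s)
{
    for (char& ch : s)
    {
        if (ch >= 'a' && ch <= 'z')
            ch -= 32;
    }
    return s;
}
```

Both functions take the string by value, so the caller's string is left untouched and the modified copy is returned.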

2.3 Commonly used string interfaces (only the most frequently used ones are covered below)

1. Common ways to construct a string

    #include <iostream>
    #include <string>
    using namespace std;

    void test1()
    {
        string s1;                // default constructor: an empty string
        string s2("hello,world"); // construct from a C string
        string s3(3, 'a');        // n copies of a char
        string s4 = s2;           // same as string s4(s2): copy construction

        cout << s1 << endl;
        cout << s2 << endl;
        cout << s3 << endl;
        cout << s4 << endl;
    }

    int main()
    {
        test1();
        return 0;
    }

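2. Capacity operations

Function	Description
`size` (important)	Returns the number of valid characters in the string
`length`	Returns the number of valid characters in the string
`capacity`	Returns the total allocated space
`empty` (important)	Checks whether the string is empty; returns `true` if so, otherwise `false`
`clear` (important)	Removes all valid characters
`reserve` (important)	Reserves space for the string
`resize` (important)	Changes the number of valid characters to `n`; any newly added positions are filled with the character `c`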

Code walkthrough:

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        string s1 = "hello,world!";
        cout << s1.size() << endl;
        cout << s1.length() << endl;
        cout << s1.empty() << endl;
        cout << "s1:" << " " << s1 << endl;

        cout << "capacity before clear" << endl;
        cout << s1.capacity() << endl;

        cout << "clear" << endl;
        s1.clear(); // removes the contents only; the space is not released
        cout << "capacity after clear" << endl;
        cout << s1.capacity() << endl;
        cout << s1.empty() << endl;

        cout << "capacity after reserve" << endl;
        s1.reserve(20);
        cout << s1.capacity() << endl;
        cout << s1.size() << endl;

        cout << "resize" << endl;
        s1.resize(10, 'a');
        cout << s1 << endl;
        return 0;
    }

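Notes:
- size() and length() are implemented identically. string was written before the STL and originally offered length(); the other STL containers all use size(), so size() was added to string to keep the interfaces consistent. In practice size() is the one most people use.
- clear() only removes the valid characters; in practice it leaves the underlying capacity unchanged.
- resize(size_t n) and resize(size_t n, char c) both change the number of valid characters to n. When the string grows, resize(n) fills the new positions with '\0', while resize(n, c) fills them with c. Growing may increase the capacity; shrinking leaves the total capacity unchanged. The growth policy triggered by resize can differ between compilers.
- reserve(size_t res_arg = 0) reserves space for the string without changing the number of valid characters. When the argument is smaller than the current capacity, reserve typically leaves the capacity unchanged (before C++20 this was a non-binding shrink request; since C++20 reserve never shrinks).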
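The notes above can be checked with a small probe. The sketch below is illustrative only: CapacityReport and probe_capacity are hypothetical names, not part of any library. It records what reserve, resize, and clear do to size() and capacity(). Exact capacity values depend on the implementation, so the struct stores them for inspection rather than assuming specific numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record of what each call did to the string.
struct CapacityReport
{
    std::size_t size_after_reserve;
    std::size_t capacity_after_reserve;
    std::size_t size_after_resize;
    std::string text_after_resize;
    std::size_t size_after_clear;
    std::size_t capacity_after_clear;
};

// Hypothetical helper: applies reserve, resize, then clear to a copy of s.
CapacityReport probe_capacity(std::string s, std::size_t reserve_to,
                              std::size_t resize_to, char fill)
{
    CapacityReport r{};

    s.reserve(reserve_to);              // changes capacity only, never size
    r.size_after_reserve = s.size();
    r.capacity_after_reserve = s.capacity();

    s.resize(resize_to, fill);          // new positions are filled with `fill`
    r.size_after_resize = s.size();
    r.text_after_resize = s;

    s.clear();                          // size drops to 0
    r.size_after_clear = s.size();
    r.capacity_after_clear = s.capacity();
    return r;
}
```

Calling probe_capacity("hello", 100, 8, 'x') should report a size of 5 after reserve, the text "helloxxx" after resize, and a size of 0 after clear, with a capacity of at least 100 once it has been reserved.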

3. Element access and traversal

Code walkthrough:

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        string s;
        s += "hello,world";
        cout << s << endl;

        // Range-based for is iterators under the hood
        cout << "Traversing s with range-based for:" << endl;
        for (auto& e : s)
        {
            cout << e << " ";
        }
        cout << endl << endl << endl;

        cout << "Traversing s with forward iterators:" << endl;
        for (auto i = s.begin(); i != s.end(); i++)
        {
            cout << *i << " ";
        }
        cout << endl << endl << endl;

        cout << "Traversing s with reverse iterators:" << endl;
        for (auto i = s.rbegin(); i != s.rend(); i++)
        {
            cout << *i << " ";
        }
        cout << endl;
        return 0;
    }

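4. Modifying operations

Function	Description
`push_back`	Appends the character `c` to the end of the string. For example, after `string s = "abc"; s.push_back('d');`, `s` is `"abcd"`
`append`	Appends a string to the end. For example, after `string s = "hello"; s.append(" world");`, `s` is `"hello world"`
`operator+=` (important)	Appends the string `str` to the end; works like `append` but is more concise. For example, after `string s = "I "; s += "love C++";`, `s` is `"I love C++"`
`c_str` (important)	Returns a C-style string (`const char*`), useful for C interfaces such as `printf("%s", s.c_str());` (where `s` is a `string` object)
`find + npos` (important)	Searches forward from position `pos` for the character `c` and returns its position in the string (or `string::npos` if it is not found). For example, with `string s = "abcabc"; size_t pos = s.find('b', 2);`, the search starts at index 2 and returns index 4
`rfind`	Searches backward from position `pos` for the character `c` and returns its position in the string. For example, with `string s = "abcabc"; size_t pos = s.rfind('b', 4);`, the search starts at index 4, where `s[4]` is already `'b'`, so it returns index 4
`substr`	Returns `n` characters starting at position `pos` (if `n` is omitted, everything up to the end of the string). For example, with `string s = "abcdefg"; string sub = s.substr(2, 3);`, `sub` is `"cde"`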

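The table already spells out what each interface does, so only a few of them are demonstrated in code below.

push_back():

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        string s1;
        // push_back appends a single character to the end of the string
        s1.push_back('c');
        char a = 'd';
        s1.push_back(a);
        /*s1.push_back(" const char* c_str");*/ // a string cannot be appended with push_back
        cout << s1 << endl;
        return 0;
    }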

find + npos:

find:

npos:
When find cannot locate what it is looking for, it returns npos.

Code walkthrough:

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        // find returns the index of the first character of the match within the string
        // Search for a string object
        string s = "hello,world";
        string s2 = "llo";
        cout << s.find(s2) << endl;

        // Search for a C string
        cout << s.find("wor") << endl;

        // Starting at pos, search for the first n characters of a C string
        cout << s.find("wor", 5, 2) << endl; // looks for "wo"

        // Search for a char
        cout << s.find('l') << endl;    // finds the first 'l' and stops there
        cout << s.find('l', 5) << endl; // starting at index 5, the next 'l' is the last one

        cout << s.find("www") << endl;  // not found, so npos is returned
        return 0;
    }
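Printing the result of a failed find shows a huge number because npos is the largest value of size_t. In real code the result is usually compared against string::npos before it is used as an index. A minimal sketch of that pattern follows; after_separator is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the text after the first occurrence of sep,
// or an empty string when sep does not occur in s.
std::string after_separator(const std::string& s, char sep)
{
    // Keep the result in size_t: storing it in an int would mangle npos.
    std::size_t pos = s.find(sep);
    if (pos == std::string::npos)
        return "";
    return s.substr(pos + 1);
}
```

For instance, after_separator("hello,world", ',') yields "world", while after_separator("hello", ',') yields an empty string instead of indexing with npos.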

rfind:

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        string s = "hello,world";
        cout << s.rfind('l') << endl; // 9
        return 0;
    }

substr:

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        string s = "hello,world";
        cout << s.substr(2, 3) << endl; // llo
        return 0;
    }

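Notes:
- When appending a character to a string, s.push_back(c), s.append(1, c), and s += 'c' are implemented in much the same way. In practice += is used the most, because it can append a whole string as well as a single character.
- If you can roughly estimate how many characters a string will hold, call reserve first to set the space aside. This avoids the performance cost of repeated reallocations as characters are added.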
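To see the effect of reserving up front, one option is to count how often capacity() changes while appending. The sketch below does that; count_reallocations is a hypothetical helper, and the number of changes without reserve depends on the implementation's growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends n characters one at a time and counts how many
// times capacity() changes, which is used here as a stand-in for reallocations.
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
    std::string s;
    if (reserve_first)
        s.reserve(n); // capacity is now at least n

    std::size_t changes = 0;
    std::size_t last = s.capacity();
    for (std::size_t i = 0; i < n; ++i)
    {
        s += 'x';
        if (s.capacity() != last)
        {
            ++changes;
            last = s.capacity();
        }
    }
    return changes;
}
```

With reserve_first set to true the count stays at zero, because the reserved capacity already covers all n characters; without it, a long enough string has to grow at least once.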

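5. Non-member functions of string

The stream input and output operator overloads are implemented later in this article, in the section that builds a string class by hand.

getline is an important and frequently used function here, and it is worth getting comfortable with.

getline is not a member function of string. It is a free function in namespace std, so call it directly as getline(cin, s).

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        string s;
        cout << "Reading a string that contains a space with cin >>:" << endl;
        cin >> s; // type "abcdef *" (note the space in the input)
        cout << s << endl;

        cout << "Reading a string that contains special characters with getline:" << endl;
        /*s.getline(cin, "dadad ", '\n');*/ // getline is not a member, so this does not compile
        getline(cin, s); // the third argument defaults to '\n', i.e. getline(cin, s, '\n');
        cout << s << endl;
        return 0;
    }

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        string s;
        cout << "Input for getline:" << endl;
        getline(cin, s, '*'); // read everything up to the '*' delimiter
        cout << s << endl;
        return 0;
    }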

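Why does the getline in the first example return without waiting for new input? cin >> s stops at the first whitespace and leaves the rest of the line in the input buffer (here the space, the *, and the newline). endl only flushes the output buffer and has no effect on the input buffer, so getline immediately reads the leftover text. To read a fresh line instead, the leftovers have to be discarded first with cin.ignore(). In the second example the input buffer is empty when getline runs, so it waits for keyboard input and then reads everything up to the '*' delimiter into s.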
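The leftover-input behavior is easier to experiment with on a string stream than on the keyboard. The sketch below assumes a hypothetical helper, word_then_line, that reads one word with >> and then one line with getline, optionally discarding the rest of the current line in between.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: reads a word with >>, then a line with getline.
// When discard_rest_of_line is true, whatever >> left on the current line
// (including the newline) is thrown away before getline runs.
std::string word_then_line(std::istream& in, bool discard_rest_of_line)
{
    std::string word;
    std::string line;
    in >> word; // stops at the first whitespace

    if (discard_rest_of_line)
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

    std::getline(in, line);
    return line;
}
```

Fed the input "abcdef *" followed by a second line, the helper returns " *" when nothing is discarded and the second line when the leftovers are ignored. Passing std::cin instead of a std::istringstream gives the same behavior interactively.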

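Get familiar with the interfaces above; several of them show up in the OJ problems later on. string has other operations as well. They are not all listed here, so look them up in the documentation when you need one you don't know.
string documentation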

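6. The layout of string under VS and g++

Note: the layouts below were verified on a 32-bit platform, where a pointer takes 4 bytes.

The layout of string under VS
A string object takes 28 bytes in total. The internal structure is slightly involved. First comes a union that defines where the characters are stored:
- When the string is shorter than 16 characters, it is stored in a fixed internal character array (this is the small-string optimization, SSO).
- When the string has 16 characters or more, space is allocated on the heap.

Reference:

    union _Bxty
    {   // storage for small buffer or pointer to larger one
        value_type _Buf[_BUF_SIZE];
        pointer _Ptr;
        char _Alias[_BUF_SIZE]; // to permit aliasing
    } _Bx;

This design makes sense. Most strings are shorter than 16 characters, and once a string object has been created it already contains a fixed 16-character array, so no heap allocation is needed and the common case is fast.
Next, one size_t field stores the length of the string, and another size_t field stores the total capacity allocated on the heap.
Finally, there is one more pointer used for other bookkeeping.
That gives 16 + 4 + 4 + 4 = 28 bytes in total.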
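Whether a particular string currently sits in the internal buffer can be probed by checking if its data pointer falls inside the string object itself. The sketch below is an informal experiment, not a portable guarantee: stored_inline is a hypothetical helper, and both the result for short strings and the value of sizeof(std::string) depend on the compiler, the library, and the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: true when the character data lives inside the string
// object itself (small-buffer storage) rather than in a separate heap block.
bool stored_inline(const std::string& s)
{
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    const char* data = s.data();

    // std::less provides a total order over pointers, unlike raw < on
    // pointers into unrelated objects.
    std::less<const char*> before;
    return !before(data, obj_begin) && before(data, obj_end);
}
```

A string of 1000 characters can never be stored inline, since it is larger than the string object. Printing stored_inline(std::string("hi")) alongside sizeof(std::string) on your own toolchain shows whether short strings stay in the internal buffer there.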

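The layout of string under g++
Under g++, string is implemented with copy-on-write. A string object takes only 4 bytes: it contains a single pointer, which points to a block of heap memory holding the following:
- the total capacity
- the number of valid characters
- the reference count

    struct _Rep_base
    {
        size_type     _M_length;
        size_type     _M_capacity;
        _Atomic_word  _M_refcount;
    };

- the heap storage for the characters themselves.

Note that this describes the older libstdc++ string. Since GCC 5, the default std::string ABI no longer uses copy-on-write and uses a small-string buffer instead.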

7. Putting the string interfaces to work

Warm-up:

Reverse only letters

    class Solution
    {
    public:
        bool isLetter(char ch)
        {
            if (ch >= 'a' && ch <= 'z')
                return true;
            if (ch >= 'A' && ch <= 'Z')
                return true;
            return false;
        }

        string reverseOnlyLetters(string S)
        {
            if (S.empty())
                return S;

            // Two indices walk toward each other, skipping non-letters
            size_t begin = 0, end = S.size() - 1;
            while (begin < end)
            {
                while (begin < end && !isLetter(S[begin]))
                    ++begin;

                while (begin < end && !isLetter(S[end]))
                    --end;

                swap(S[begin], S[end]);
                ++begin;
                --end;
            }
            return S;
        }
    };

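First unique character in a string

Counting with a hash-style lookup table

    class Solution
    {
    public:
        int firstUniqChar(string s)
        {
            // Count how many times each character occurs
            int count[256] = { 0 };
            int size = s.size();
            for (int i = 0; i < size; ++i)
                count[static_cast<unsigned char>(s[i])] += 1; // cast avoids a negative index

            // Scan in order and return the first character that occurs exactly once
            for (int i = 0; i < size; ++i)
                if (1 == count[static_cast<unsigned char>(s[i])])
                    return i;

            return -1;
        }
    };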

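Length of the last word in a string

    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        string line;
        // Do not use cin >> line here, because it stops at the first space
        // while (cin >> line)
        while (getline(cin, line))
        {
            // If there is no space, rfind returns npos and the unsigned
            // arithmetic below wraps around to line.size()
            size_t pos = line.rfind(' ');
            cout << line.size() - pos - 1 << endl;
        }
        return 0;
    }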

Valid palindrome

    class Solution
    {
    public:
        bool isLetterOrNumber(char ch)
        {
            return (ch >= '0' && ch <= '9')
                || (ch >= 'a' && ch <= 'z')
                || (ch >= 'A' && ch <= 'Z');
        }

        bool isPalindrome(string s)
        {
            // Convert lowercase letters to uppercase first, then compare
            for (auto& ch : s)
            {
                if (ch >= 'a' && ch <= 'z')
                    ch -= 32;
            }

            int begin = 0, end = s.size() - 1;
            while (begin < end)
            {
                while (begin < end && !isLetterOrNumber(s[begin]))
                    ++begin;

                while (begin < end && !isLetterOrNumber(s[end]))
                    --end;

                if (s[begin] != s[end])
                {
                    return false;
                }
                else
                {
                    ++begin;
                    --end;
                }
            }
            return true;
        }
    };

Once you have learned the STL queue, you can come back and solve this problem again.

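Add strings

    class Solution
    {
    public:
        string addStrings(string num1, string num2)
        {
            // Add from the last digit toward the first. Each result digit can be
            // inserted at the front with insert, or appended with += and the
            // whole string reversed at the end.
            int end1 = num1.size() - 1;
            int end2 = num2.size() - 1;
            int value1 = 0, value2 = 0, next = 0;
            string addret;
            while (end1 >= 0 || end2 >= 0)
            {
                if (end1 >= 0)
                    value1 = num1[end1--] - '0';
                else
                    value1 = 0;

                if (end2 >= 0)
                    value2 = num2[end2--] - '0';
                else
                    value2 = 0;

                int valueret = value1 + value2 + next; // decimal value at this position
                if (valueret > 9)
                {
                    next = 1;        // carry
                    valueret -= 10;  // keep only the current digit
                }
                else
                {
                    next = 0;
                }

                //addret.insert(addret.begin(), valueret+'0');
                addret += static_cast<char>(valueret + '0'); // convert to a digit character
            }

            if (next == 1)
            {
                //addret.insert(addret.begin(), '1');
                addret += '1';
            }

            reverse(addret.begin(), addret.end()); // needs <algorithm>
            return addret;
        }
    };

When I get the chance, I will cover high-precision addition, subtraction, multiplication, and division in my algorithms column, 算法磨剑：用C++思考的艺术.

There are plenty of string problems to practice on sites such as 洛谷 (Luogu) and 力扣 (LeetCode), so I won't go through more of them here.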

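3. Implementing a string class by hand

3.1 The classic string class problem

The sections above gave a brief tour of the string class, which is enough for everyday use. In interviews, however, interviewers like to ask candidates to implement a string class themselves, mainly the constructor, copy constructor, copy assignment operator, and destructor. Take a look at the String class below. Is anything wrong with it?

    #include <cassert>
    #include <cstring>

    // Named String to distinguish it from the standard library's string
    class String
    {
    public:
        /*String()
            :_str(new char[1])
        {*_str = '\0';}
        */
        // String(const char* str = "\0")    bad example
        // String(const char* str = nullptr) bad example
        String(const char* str = "")
        {
            // Passing nullptr when constructing a String is treated as an invalid program
            if (nullptr == str)
            {
                assert(false);
                return;
            }

            _str = new char[strlen(str) + 1];
            strcpy(_str, str);
        }

        ~String()
        {
            if (_str)
            {
                delete[] _str;
                _str = nullptr;
            }
        }

    private:
        char* _str;
    };

    // Test
    void TestString()
    {
        String s1("hello bit!!!");
        String s2(s1); // uses the compiler-generated copy constructor
    }

When TestString returns, the memory that s1 and s2 both point to is freed twice in a row!

Explanation: the String class above does not define its own copy constructor or copy assignment operator, so the compiler generates default ones. When s2 is constructed from s1, the default copy constructor is called. As a result, s1 and s2 share the same block of memory, and that block is freed more than once on destruction, which crashes the program. This kind of copy is called a shallow copy.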

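3.2 Shallow copy

Shallow copy, also called bitwise copy: the compiler simply copies the values stored in the object. If the object manages a resource, several objects end up sharing the same resource. When one of them is destroyed it releases the resource, while the others still believe it is valid, so any further operation on the resource causes an access violation.

It is like a family with two children where the parents buy only one toy. If the two are happy to play together, all is well; if they refuse to share, they fight over it and the toy gets broken.

Deep copy solves the shallow copy problem: every object gets its own independent copy of the resource and shares it with no one. If the parents buy each child a toy of their own, everyone plays with theirs and no trouble arises.

3.3 Deep copy

If a class manages a resource, its copy constructor, copy assignment operator, and destructor must all be written explicitly. In most cases they are implemented as deep copies.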

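3.3.1 The traditional String class

    #include <cassert>
    #include <cstring>

    class String
    {
    public:
        String(const char* str = "")
        {
            // Passing nullptr when constructing a String is treated as an invalid program
            if (nullptr == str)
            {
                assert(false);
                return;
            }

            _str = new char[strlen(str) + 1];
            strcpy(_str, str);
        }

        // Deep copy: allocate a separate buffer and copy the characters
        String(const String& s)
            : _str(new char[strlen(s._str) + 1])
        {
            strcpy(_str, s._str);
        }

        String& operator=(const String& s)
        {
            if (this != &s)
            {
                // Allocate and copy first, then release the old buffer
                char* pStr = new char[strlen(s._str) + 1];
                strcpy(pStr, s._str);
                delete[] _str;
                _str = pStr;
            }
            return *this;
        }

        ~String()
        {
            if (_str)
            {
                delete[] _str;
                _str = nullptr;
            }
        }

    private:
        char* _str;
    };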

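3.3.2 The modern String class

    #include <cassert>
    #include <cstring>
    #include <utility>

    class String
    {
    public:
        String(const char* str = "")
        {
            if (nullptr == str)
            {
                assert(false);
                return;
            }

            _str = new char[strlen(str) + 1];
            strcpy(_str, str);
        }

        // Build a temporary from the source, then take over its buffer
        String(const String& s)
            : _str(nullptr)
        {
            String strTmp(s._str);
            std::swap(_str, strTmp._str);
        }

        // Compare this with the assignment above: which implementation is better?
        // The parameter is taken by value, so it is already a copy; swapping hands
        // the old buffer to s, which frees it when s is destroyed.
        String& operator=(String s)
        {
            std::swap(_str, s._str);
            return *this;
        }

        /*
        String& operator=(const String& s)
        {
            if (this != &s)
            {
                String strTmp(s);
                std::swap(_str, strTmp._str);
            }
            return *this;
        }
        */

        ~String()
        {
            if (_str)
            {
                delete[] _str;
                _str = nullptr;
            }
        }

    private:
        char* _str;
    };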

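3.4 Copy-on-write (overview)

Copy-on-write is a form of procrastination. It builds on shallow copy by adding a reference count.
Reference count: records how many users a resource has. On construction the count is set to 1, and every additional object that uses the resource increments it by 1. When an object is destroyed, it first decrements the count by 1 and then checks whether the resource needs to be released. If the count has dropped to 0, this object was the last user and the resource is released; otherwise it must not be released, because other objects are still using it.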
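The reference-counting rule above can be sketched in a few dozen lines. CowString below is a hypothetical teaching class, not how any real std::string is written: copying only shares the buffer and increments the count, and a write (set_char, a made-up member for this example) first detaches into a private copy when the buffer is shared. The count is a plain integer, so this sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical copy-on-write string, for illustration only.
class CowString
{
public:
    explicit CowString(const char* s = "")
        : _rep(new Rep{ new char[std::strlen(s) + 1], 1 }) // count starts at 1
    {
        std::strcpy(_rep->data, s);
    }

    // Copying shares the buffer and increments the count.
    CowString(const CowString& other) : _rep(other._rep) { ++_rep->refcount; }

    CowString& operator=(const CowString& other)
    {
        if (_rep != other._rep)
        {
            release();
            _rep = other._rep;
            ++_rep->refcount;
        }
        return *this;
    }

    ~CowString() { release(); }

    const char* c_str() const { return _rep->data; }
    std::size_t use_count() const { return _rep->refcount; }

    // Writing: make a private copy first if the buffer is shared.
    void set_char(std::size_t i, char c)
    {
        assert(i < std::strlen(_rep->data));
        if (_rep->refcount > 1)
        {
            Rep* fresh = new Rep{ new char[std::strlen(_rep->data) + 1], 1 };
            std::strcpy(fresh->data, _rep->data);
            --_rep->refcount; // the other owners keep the old buffer
            _rep = fresh;
        }
        _rep->data[i] = c;
    }

private:
    struct Rep
    {
        char* data;
        std::size_t refcount;
    };

    // Decrement the count; the last user frees the resource.
    void release()
    {
        if (--_rep->refcount == 0)
        {
            delete[] _rep->data;
            delete _rep;
        }
    }

    Rep* _rep;
};
```

After CowString b(a), both objects report a use count of 2 and return the same c_str() pointer. Calling b.set_char(0, 'j') gives b its own buffer, so a is unchanged and each count drops back to 1.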

3.5 A hand-written string class

This is a simple imitation of string for the char type only; the library version is a class template that supports multiple character encodings.

    #pragma once
    #include <iostream>
    #include <cstdio>
    #include <cstring>
    #include <cassert>
    #include <utility>
    using namespace std;

    class mystring
    {
    public:
        // Constructor
        mystring(const char* s = "")
        {
            // A null pointer is not a valid argument
            if (s == nullptr)
            {
                assert(false);
                return;
            }
            _capacity = _size = static_cast<int>(strlen(s));
            _str = new char[_capacity + 1]; // one extra slot for '\0'
            strcpy(_str, s);
        }

        // Copy constructor
        mystring(const mystring& s)
        {
            _str = new char[s._capacity + 1];
            strcpy(_str, s._str);
            _size = s._size;
            _capacity = s._capacity;
        }

        // Destructor
        ~mystring()
        {
            delete[] _str;
            _str = nullptr;
            _size = _capacity = 0;
        }

        // Copy assignment
        mystring& operator=(const mystring& s)
        {
            if (this != &s) // guard against self-assignment
            {
                // Allocate capacity + 1 so the buffer matches the recorded capacity
                char* tmp = new char[s._capacity + 1];
                strcpy(tmp, s._str);
                delete[] _str;
                _str = tmp;
                _size = s._size;
                _capacity = s._capacity;
            }
            return *this;
        }

        // Comparison operator overloads
        // Concatenate two strings
        mystring operator+(const mystring& s) const
        {
            mystring strtmp = *this;
            // Make sure there is enough room before appending
            int newlenth = _size + s._size + 1; // includes the '\0'
            if (newlenth > strtmp._capacity)
            {
                // Grow the buffer
                char* newchar = new char[newlenth * 2];
                strcpy(newchar, strtmp._str);
                delete[] strtmp._str;
                strtmp._str = newchar;
                strtmp._capacity = newlenth * 2 - 1; // usable characters, excluding '\0'
            }
            strcat(strtmp._str, s._str);
            strtmp._size = newlenth - 1;
            return strtmp;
        }

        // +=
        mystring& operator+=(const mystring& s)
        {
            *this = (*this + s);
            return *this;
        }
        //.....

        // Stream operator overloads
        friend ostream& operator<<(ostream& out, const mystring& s);
        friend istream& operator>>(istream& in, mystring& s);

        // Other interfaces such as swap, find...
        void swap(mystring& s)
        {
            // Swapping the three members is enough; the buffers themselves are untouched
            std::swap(_str, s._str);
            std::swap(_size, s._size);
            std::swap(_capacity, s._capacity);
        }

        void print() const
        {
            for (int i = 0; i < _size; i++)
            {
                printf("%c", _str[i]);
            }
            printf("\n");
        }

    private:
        char* _str;
        int _size;
        int _capacity;
    };

    ostream& operator<<(ostream& out, const mystring& s)
    {
        // Write to the stream that was passed in, not to stdout
        for (int i = 0; i < s._size; i++)
        {
            out << s._str[i];
        }
        return out;
    }

    istream& operator>>(istream& in, mystring& s)
    {
        // Read from the in stream into a temporary that grows as needed,
        // then replace the old contents
        mystring tmp;
        char ch;
        while (in.get(ch) && ch != ' ' && ch != '\n')
        {
            char one[2] = { ch, '\0' };
            tmp += one;
        }
        s = tmp;
        return in;
    }

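Conclusion:

From the pitfalls of computing lengths by hand with char* and repeatedly fighting buffer overflows, to the smoothness of concatenating and searching in a single line with std::string, to taking apart its dynamic growth and deep-copy logic yourself: learning the string class is, at heart, a hands-on exercise in C++ encapsulation. It hides complex memory management under the hood and leaves a clean interface for the developer. That solves the pain points of C strings and lays the "container + iterator + interface" groundwork for learning the other STL containers.

Perhaps you still have questions about details such as the small-string optimization (SSO) or iterator invalidation, or you struggled with the boundary between deep and shallow copies while writing your own implementation. That is fine: getting better at a technology is a matter of using it thoroughly first and digging deeper afterward. As next steps, try combining string with STL algorithms (for example, sort to order a string's characters or find_if to filter them), or use stringstream in a project to handle complex string conversions, so that the knowledge takes root through practice.

If you come across more efficient ways to use string, or have a different take on the details of the hand-written implementation, feel free to leave a comment. You can also keep following the 《C++：从代码到机器》 column, where upcoming posts will dig deeper into topics such as advanced string features and string performance optimization, and we can move together from "can use it" to "truly understand it" in the world of C++.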