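ESP32 Voice Chat Robot: Integrating Coze and Baidu Qianfan
As IoT and AI converge, connecting low-cost embedded devices to large language models has become a common pattern. This article shows how to build a complete voice interaction system around an ESP32 microcontroller: a Coze bot handles the dialogue logic, and the Baidu Qianfan platform provides ASR (speech recognition) and TTS (speech synthesis). Audio capture and playback run on the device, while recognition, dialogue, and synthesis run in the cloud.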
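1. System Architecture
The pipeline has three stages. The heavy computation runs in the cloud, so the ESP32 only handles audio I/O and networking:
- Voice input: the microphone captures audio over I2S, the samples are buffered to a file on the SD card, and the file is then uploaded to Baidu ASR.
- Response generation: the recognized text is sent to the Coze bot, which generates the reply.
- Voice output: the reply text is sent to Baidu TTS, and the synthesized audio is played through the speaker.
This design relies on cloud compute and avoids deploying a large model on a resource-constrained MCU.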
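2. Hardware
The ESP32 is the main controller and needs the following peripherals:
- ESP32 development board: handles the Wi-Fi connection and control logic.
- Microphone module: an INMP441 (digital, I2S) is recommended. A MAX9814 (analog, with amplifier) can also be used, but it has to be read through the ESP32's ADC rather than the external I2S pins configured in the code below.
- Speaker module: an I2S amplifier module such as the MAX98357A.
- SD card module: temporarily stores the recording and the audio file returned by TTS, which relieves memory pressure.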
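To see why the SD card helps, it is useful to estimate buffer sizes. The sketch below assumes 16 kHz, 16-bit mono PCM and a 5-second clip. These values and the helper names `pcmBytes` and `base64Length` are illustrative assumptions, not settings taken from the project.

```cpp
#include <cstddef>
#include <cstdint>

// Size in bytes of raw PCM audio for the given format and duration.
size_t pcmBytes(uint32_t sampleRate, uint32_t bitsPerSample,
                uint32_t channels, uint32_t durationMs) {
    uint64_t bytesPerSecond =
        (uint64_t)sampleRate * (bitsPerSample / 8) * channels;
    return (size_t)(bytesPerSecond * durationMs / 1000);
}

// Length of the base64 text produced from n raw bytes, padding included.
size_t base64Length(size_t n) {
    return (n + 2) / 3 * 4;
}
```

Under these assumptions, `pcmBytes(16000, 16, 1, 5000)` gives 160000 bytes of raw audio and `base64Length(160000)` gives 213336 characters, a sizable share of the ESP32's 520 KB of on-chip SRAM.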
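3. Software Implementation Notes
3.1 State Machine
Because the flow chains several network requests (ASR, LLM, TTS), a state machine is a good fit. The code defines the states STATE_IDLE, STATE_RECORDING, STATE_ASR, STATE_COZE, STATE_TTS, and STATE_PLAYING, so only one task runs at a time and the stages do not contend for the network client, the SD card, or the I2S ports.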
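The transitions can be summarized in a small pure function. This is a minimal sketch: the state names come from the listing, but the helper `nextState` and its `ok` flag are hypothetical and only illustrate the rule that any failure drops the device back to idle.

```cpp
enum DeviceState {
    STATE_IDLE,
    STATE_RECORDING,
    STATE_ASR,
    STATE_COZE,
    STATE_TTS,
    STATE_PLAYING
};

// Hypothetical helper: returns the state that follows `s`.
// `ok` is false when the current stage failed (network error, empty result...).
DeviceState nextState(DeviceState s, bool ok) {
    if (!ok) return STATE_IDLE;  // any failure aborts the round trip
    switch (s) {
        case STATE_IDLE:      return STATE_RECORDING;
        case STATE_RECORDING: return STATE_ASR;
        case STATE_ASR:       return STATE_COZE;
        case STATE_COZE:      return STATE_TTS;
        case STATE_TTS:       return STATE_PLAYING;
        case STATE_PLAYING:   return STATE_IDLE;
    }
    return STATE_IDLE;
}
```

Keeping the transition table in one place makes it easier to check that every failure path ends in STATE_IDLE, so the main loop does not keep retrying a stage that cannot succeed.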
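3.2 Network and Token Management
The Baidu APIs require an access token with a limited lifetime. The code fetches the token on demand and requests a new one shortly before the current one expires, so the service is not interrupted. A Wi-Fi reconnect routine handles dropped connections.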
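One way to express the "refresh shortly before expiry" rule is to compare elapsed time rather than absolute timestamps, which stays correct when a 32-bit millisecond counter such as `millis()` wraps around (about every 49.7 days). The sketch below is an assumption about how this could be done; `TokenState` and `needsRefresh` are hypothetical names that do not appear in the listing.

```cpp
#include <cstdint>

// Hypothetical bookkeeping for one access token.
struct TokenState {
    uint32_t issuedAtMs;  // millisecond counter value when the token was obtained
    uint32_t lifetimeMs;  // validity period reported by the server, in ms
    bool valid;           // false until a token has been fetched
};

// Returns true when the token is missing or within `marginMs` of expiring.
bool needsRefresh(const TokenState& t, uint32_t nowMs, uint32_t marginMs) {
    if (!t.valid) return true;
    if (t.lifetimeMs <= marginMs) return true;  // lifetime shorter than margin
    uint32_t elapsed = nowMs - t.issuedAtMs;    // unsigned math is wrap-safe
    return elapsed >= t.lifetimeMs - marginMs;
}
```

When converting the server's lifetime from seconds to milliseconds, do the multiplication in an unsigned 32-bit or wider type: a lifetime of several weeks overflows a signed 32-bit `int` once multiplied by 1000.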
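3.3 Audio Stream Handling
Recording and playback both use I2S with DMA buffers, which keeps CPU usage low. Audio is written to the SD card in small blocks as it is captured, so the raw recording never has to sit in RAM as a whole. Note that the ASR step in the listing still assembles the full base64 payload in memory before upload, so the recording length has to stay within what the ESP32's heap can hold.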
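When the recording is read back from the SD card in chunks and each chunk is base64-encoded separately, every chunk except the last should be a multiple of 3 bytes. Otherwise padding characters (`=`) end up in the middle of the payload and the server cannot decode it. The following sketch shows the idea in portable C++; `base64Encode` and `encodeInChunks` are illustrative helpers, not the encoder used on the device.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

static const char kB64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Plain base64 encoder with '=' padding.
std::string base64Encode(const uint8_t* data, size_t len) {
    std::string out;
    out.reserve((len + 2) / 3 * 4);
    for (size_t i = 0; i < len; i += 3) {
        uint32_t n = (uint32_t)data[i] << 16;
        if (i + 1 < len) n |= (uint32_t)data[i + 1] << 8;
        if (i + 2 < len) n |= data[i + 2];
        out += kB64[(n >> 18) & 63];
        out += kB64[(n >> 12) & 63];
        out += (i + 1 < len) ? kB64[(n >> 6) & 63] : '=';
        out += (i + 2 < len) ? kB64[n & 63] : '=';
    }
    return out;
}

// Encodes `data` chunk by chunk. The chunk size is rounded down to a
// multiple of 3 so that only the final chunk can produce padding.
std::string encodeInChunks(const uint8_t* data, size_t len, size_t chunkSize) {
    chunkSize -= chunkSize % 3;
    if (chunkSize < 3) chunkSize = 3;
    std::string out;
    for (size_t off = 0; off < len; off += chunkSize) {
        size_t n = (len - off < chunkSize) ? len - off : chunkSize;
        out += base64Encode(data + off, n);
    }
    return out;
}
```

With a 3-byte-aligned chunk size, the concatenated output is identical to encoding the whole file in one pass, so the read buffer can stay small.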
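4. Core Code Walkthrough
The complete project code follows, with some variable declarations and logic details fixed. It can be used as a reference. Pin assignments and audio parameters (SD_CS, I2S_REC_BCLK, SAMPLE_RATE, RECORD_FILE_PATH, RECORD_DURATION, and so on) are referenced but not defined in the listing and must be supplied for your board.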
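#include <Arduino.h>
#include <WiFi.h>
#include <WiFiClientSecure.h>
#include <ArduinoJson.h>
#include <SPI.h>     // SPI.begin() is used for the SD card
#include <base64.h>  // base64::encode() from the ESP32 Arduino core
#include "driver/i2s.h"
#include "FS.h"
#include "SD.h"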
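// Credentials and endpoints. The values are blank in the source; fill in your own.
const char* WIFI_SSID = "";
const char* WIFI_PASS = "";
const char* COZE_API_KEY = "";
const char* COZE_BOT_ID = "";
const char* COZE_USER_ID = "";
const char* COZE_API_DOMAIN = "";
const int COZE_API_PORT = 443;  // assumed: default HTTPS port
const char* BAIDU_API_KEY = "";
const char* BAIDU_SECRET_KEY = "";
const char* BAIDU_ASR_URL = "";
const char* BAIDU_TTS_URL = "";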
enum DeviceState {
  STATE_IDLE,
  STATE_RECORDING,
  STATE_ASR,
  STATE_COZE,
  STATE_TTS,
  STATE_PLAYING
};
DeviceState currentState = STATE_IDLE;
WiFiClientSecure client;
String accessToken;
unsigned long tokenExpireTime = 0;  // millis() value at which the token expires
String asrText;
String cozeReply;
String response;
String speechBase64;
String retrieveResp;
String msgResp;
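// Timestamped serial log. The function name and format string are assumed.
void logInfo(const String& msg) {
  Serial.printf("[%lu] %s\n", millis(), msg.c_str());
}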
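// Returns true when Wi-Fi is connected, reconnecting first if necessary.
bool checkWiFi() {
  if (WiFi.status() != WL_CONNECTED) {
    logInfo("Wi-Fi disconnected, reconnecting");
    WiFi.reconnect();
    int retry = 0;
    while (WiFi.status() != WL_CONNECTED && retry < 20) {  // retry count assumed
      delay(500);                                          // delay assumed
      retry++;
    }
    if (WiFi.status() == WL_CONNECTED) {
      logInfo("Wi-Fi reconnected, IP: " + WiFi.localIP().toString());
      return true;
    }
  }
  // Report the real link state so a failed reconnect is not treated as success.
  return WiFi.status() == WL_CONNECTED;
}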
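// Converts a value in 0..15 to an uppercase hex digit.
char toHex(uint8_t n) {
  return (n < 10) ? (char)(n + '0') : (char)(n + 'A' - 10);
}
// Percent-encodes a string for use in a URL query (UTF-8 bytes are escaped one by one).
String urlEncode(const String& str) {
  String encodedString = "";
  for (size_t i = 0; i < str.length(); i++) {
    char c = str.charAt(i);
    if (c == ' ') {
      encodedString += "%20";  // assumed; the original replacement string is missing
    } else if (isalnum((unsigned char)c)) {  // cast avoids undefined behavior on bytes >= 0x80
      encodedString += c;
    } else {
      encodedString += '%';
      encodedString += toHex((c & 0xF0) >> 4);
      encodedString += toHex(c & 0x0F);
    }
  }
  return encodedString;
}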
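// Returns the size of a file on the SD card, or 0 if it does not exist.
size_t getFileSize(const char* filePath) {
  if (!SD.exists(filePath)) return 0;
  File file = SD.open(filePath, FILE_READ);
  size_t size = file.size();
  file.close();
  return size;
}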
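// Mounts the SD card on the SPI pins defined for the board.
bool initSD() {
  SPI.begin(SD_SCK, SD_MISO, SD_MOSI, SD_CS);
  if (!SD.begin(SD_CS)) {
    logInfo("SD card mount failed");
    return false;
  }
  logInfo("SD card mounted");
  return true;
}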
// Installs the I2S driver on port 0 for microphone capture (master, RX).
void initI2SRecord() {
  i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = SAMPLE_RATE,
    .bits_per_sample = BITS_PER_SAMPLE,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_I2S_MSB,
    .intr_alloc_flags = 0,
    .dma_buf_count = 8,   // assumed
    .dma_buf_len = 512,   // assumed
    .use_apll = false
  };
  i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
  i2s_pin_config_t pin_config = {
    .bck_io_num = I2S_REC_BCLK,
    .ws_io_num = I2S_REC_LRC,
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = I2S_REC_DIN
  };
  i2s_set_pin(I2S_NUM_0, &pin_config);
  logInfo("I2S recording initialized");
}
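// Installs the I2S driver on port 1 for speaker playback (master, TX).
void initI2SPlay() {
  i2s_config_t i2s_config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
    .sample_rate = 16000,  // assumed; must match the PCM rate requested from TTS
    .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_I2S_MSB,
    .intr_alloc_flags = 0,
    .dma_buf_count = 8,   // assumed
    .dma_buf_len = 512,   // assumed
    .use_apll = false
  };
  i2s_driver_install(I2S_NUM_1, &i2s_config, 0, NULL);
  i2s_pin_config_t pin_config = {
    .bck_io_num = I2S_PLAY_BCLK,
    .ws_io_num = I2S_PLAY_LRC,
    .data_out_num = I2S_PLAY_DOUT,
    .data_in_num = I2S_PIN_NO_CHANGE
  };
  i2s_set_pin(I2S_NUM_1, &pin_config);
  i2s_zero_dma_buffer(I2S_NUM_1);  // assumed call: start with silence
  logInfo("I2S playback initialized");
}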
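// Records RECORD_DURATION ms of audio from the microphone to RECORD_FILE_PATH.
void startRecording() {
  if (currentState != STATE_IDLE) {
    logInfo("Device busy, recording request ignored");
    return;
  }
  if (!checkWiFi()) return;  // assumed check; the called function name is missing
  currentState = STATE_RECORDING;
  recordStartMillis = millis();
  if (SD.exists(RECORD_FILE_PATH)) {
    SD.remove(RECORD_FILE_PATH);
    logInfo("Old recording removed");
  }
  initI2SRecord();
  File recFile = SD.open(RECORD_FILE_PATH, FILE_WRITE);
  if (!recFile) {
    logInfo("Failed to create recording file");
    i2s_driver_uninstall(I2S_NUM_0);
    currentState = STATE_IDLE;
    return;
  }
  logInfo("Recording started");
  int16_t sampleBuffer[512];  // buffer size assumed
  while (currentState == STATE_RECORDING && (millis() - recordStartMillis) < RECORD_DURATION) {
    size_t bytesRead;
    i2s_read(I2S_NUM_0, sampleBuffer, sizeof(sampleBuffer), &bytesRead, portMAX_DELAY);
    if (bytesRead > 0) {
      recFile.write((uint8_t*)sampleBuffer, bytesRead);
    }
    yield();
  }
  recFile.close();
  i2s_driver_uninstall(I2S_NUM_0);
  currentState = STATE_IDLE;
  logInfo("Recording finished");
  if (SD.exists(RECORD_FILE_PATH)) {
    float duration = (float)getFileSize(RECORD_FILE_PATH) / (SAMPLE_RATE * BYTES_PER_SAMPLE);
    logInfo("Recording length: " + String(duration, 2) + " s");
    currentState = STATE_ASR;
  }
}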
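// Fetches a Baidu access token, reusing the cached one until shortly before it expires.
bool getAccessToken() {
  if (accessToken.length() > 0 && millis() < tokenExpireTime - 60000UL) {  // margin assumed
    return true;
  }
  logInfo("Requesting Baidu access token");
  String tokenUrl = "/oauth/2.0/token?grant_type=client_credentials&client_id=" + String(BAIDU_API_KEY) + "&client_secret=" + String(BAIDU_SECRET_KEY);
  response = "";  // clear the shared buffer so stale data is not parsed
  if (client.connect("aip.baidubce.com", 443)) {
    client.print("GET " + tokenUrl + " HTTP/1.1\r\n");
    client.print("Host: aip.baidubce.com\r\n");
    client.print("Connection: close\r\n\r\n");
    while (client.connected() || client.available()) {
      if (client.available()) {
        response += client.readString();
      }
    }
    client.stop();
    int jsonStart = response.indexOf("{");
    if (jsonStart != -1) {
      DynamicJsonDocument doc(2048);  // capacity assumed
      DeserializationError error = deserializeJson(doc, response.substring(jsonStart));
      if (!error && doc.containsKey("access_token")) {
        accessToken = doc["access_token"].as<String>();
        unsigned long expireSeconds = doc["expires_in"].as<unsigned long>();
        // Multiply as unsigned long: a lifetime of weeks overflows a 32-bit int in ms.
        tokenExpireTime = millis() + expireSeconds * 1000UL;
        logInfo("Access token obtained");
        return true;
      }
    }
  }
  logInfo("Failed to obtain access token");
  return false;
}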
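// Uploads the recording to Baidu ASR and stores the recognized text in asrText.
void speechToText() {
  if (currentState != STATE_ASR) return;
  logInfo("Starting speech recognition");
  if (!getAccessToken() || !SD.exists(RECORD_FILE_PATH)) {
    currentState = STATE_IDLE;
    return;
  }
  File recFile = SD.open(RECORD_FILE_PATH, FILE_READ);
  if (!recFile) {
    currentState = STATE_IDLE;
    return;
  }
  // Assumed size. It must be a multiple of 3 so that the base64 chunks
  // can be concatenated without padding in the middle.
  const size_t chunkSize = 3072;
  uint8_t chunk[chunkSize];
  speechBase64 = "";
  while (recFile.available() > 0) {
    size_t bytesRead = recFile.read(chunk, chunkSize);
    speechBase64 += base64::encode(chunk, bytesRead);
  }
  recFile.close();
  DynamicJsonDocument reqDoc(speechBase64.length() + 1024);  // must hold a copy of the payload
  reqDoc["format"] = "pcm";
  reqDoc["rate"] = SAMPLE_RATE;
  reqDoc["channel"] = 1;
  reqDoc["speech"] = speechBase64;
  reqDoc["cuid"] = WiFi.macAddress();
  reqDoc["len"] = getFileSize(RECORD_FILE_PATH);
  String postBody;
  serializeJson(reqDoc, postBody);
  // Assumed query form: the source appends the token to the URL. Baidu's JSON
  // upload format also accepts it as a "token" field in the request body.
  String requestUrl = String(BAIDU_ASR_URL) + "?token=" + accessToken;
  currentState = STATE_IDLE;  // only a successful recognition advances the state
  response = "";
  if (client.connect("vop.baidu.com", 443)) {
    client.print("POST " + requestUrl + " HTTP/1.1\r\n");
    client.print("Host: vop.baidu.com\r\n");
    client.print("Content-Type: application/json\r\nConnection: close\r\n");
    client.print("Content-Length: " + String(postBody.length()) + "\r\n\r\n");
    client.print(postBody);
    while (client.connected() || client.available()) {
      if (client.available()) {
        response += client.readString();
      }
    }
    client.stop();
    int jsonStart = response.indexOf("{");
    if (jsonStart != -1) {
      DynamicJsonDocument resDoc(2048);  // capacity assumed
      DeserializationError error = deserializeJson(resDoc, response.substring(jsonStart));
      if (!error && resDoc["err_no"].as<int>() == 0) {
        asrText = resDoc["result"][0].as<String>();
        logInfo("Recognized text: " + asrText);
        currentState = STATE_COZE;
      } else {
        logInfo("Speech recognition failed");
      }
    }
  }
}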
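// Extracts the bot's answer from a Coze message-list response.
String parseCozeMessages(JsonDocument& resDoc) {
  if (resDoc["code"].as<int>() != 0) {
    return "Coze error: " + resDoc["msg"].as<String>();
  }
  JsonArray data = resDoc["data"].as<JsonArray>();
  String reply = "";
  for (JsonVariant item : data) {
    if (item["type"].as<String>() == "answer") {
      reply = item["content"].as<String>();
      break;
    }
  }
  return reply;
}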
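// Polls the chat status until it completes, then fetches and returns the reply.
String pollCozeReply(const String& conversationId, const String& chatId) {
  String retrieveUrl = "/v3/chat/retrieve?conversation_id=" + conversationId + "&chat_id=" + chatId;
  String msgListUrl = "/v3/chat/message/list?chat_id=" + chatId + "&conversation_id=" + conversationId;
  const int maxRetries = 20;  // assumed
  for (int retry = 0; retry < maxRetries; retry++) {
    logInfo("Polling Coze chat status");
    retrieveResp = "";  // clear the buffer before every attempt
    if (client.connect(COZE_API_DOMAIN, COZE_API_PORT)) {
      client.print("GET " + retrieveUrl + " HTTP/1.1\r\n");
      client.print("Host: " + String(COZE_API_DOMAIN) + "\r\n");
      client.print("Authorization: Bearer " + String(COZE_API_KEY) + "\r\n");
      client.print("Connection: close\r\n\r\n");
      while (client.connected() || client.available()) {
        if (client.available()) retrieveResp += client.readString();
      }
      client.stop();
      int jsonStart = retrieveResp.indexOf("{");
      if (jsonStart != -1) {
        DynamicJsonDocument resDoc(4096);  // capacity assumed
        DeserializationError error = deserializeJson(resDoc, retrieveResp.substring(jsonStart));
        if (!error && resDoc["code"].as<int>() == 0) {
          String status = resDoc["data"]["status"].as<String>();
          if (status == "completed") {
            msgResp = "";
            if (client.connect(COZE_API_DOMAIN, COZE_API_PORT)) {
              client.print("GET " + msgListUrl + " HTTP/1.1\r\n");
              client.print("Host: " + String(COZE_API_DOMAIN) + "\r\n");
              client.print("Authorization: Bearer " + String(COZE_API_KEY) + "\r\n");
              client.print("Connection: close\r\n\r\n");
              while (client.connected() || client.available()) {
                if (client.available()) msgResp += client.readString();
              }
              client.stop();
              int msgJsonStart = msgResp.indexOf("{");
              if (msgJsonStart != -1) {
                DynamicJsonDocument msgDoc(8192);  // capacity assumed
                DeserializationError msgError = deserializeJson(msgDoc, msgResp.substring(msgJsonStart));
                if (!msgError) {
                  return parseCozeMessages(msgDoc);
                }
              }
            }
          } else if (status == "failed") {
            return "";
          }
        }
      }
    }
    delay(1000);  // polling interval assumed
  }
  return "";
}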
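// Sends the recognized text to the Coze bot and stores the answer in cozeReply.
void chatWithCoze() {
  if (currentState != STATE_COZE || asrText.length() == 0) return;
  logInfo("Sending to Coze: " + asrText);
  if (!checkWiFi()) {
    currentState = STATE_IDLE;
    return;
  }
  DynamicJsonDocument reqDoc(asrText.length() + 1024);  // capacity assumed
  reqDoc["bot_id"] = COZE_BOT_ID;
  reqDoc["user_id"] = COZE_USER_ID;
  reqDoc["stream"] = false;
  reqDoc["auto_save_history"] = true;
  JsonArray messages = reqDoc.createNestedArray("additional_messages");
  JsonObject userMsg = messages.createNestedObject();
  userMsg["role"] = "user";
  userMsg["content"] = asrText;
  userMsg["content_type"] = "text";
  String postBody;
  serializeJson(reqDoc, postBody);
  currentState = STATE_IDLE;  // only a non-empty reply advances the state
  response = "";
  if (client.connect(COZE_API_DOMAIN, COZE_API_PORT)) {
    client.print("POST /v3/chat HTTP/1.1\r\n");
    client.print("Host: " + String(COZE_API_DOMAIN) + "\r\n");
    client.print("Authorization: Bearer " + String(COZE_API_KEY) + "\r\n");
    client.print("Content-Type: application/json\r\n");
    client.print("Content-Length: " + String(postBody.length()) + "\r\n");
    client.print("Connection: close\r\n\r\n");
    client.print(postBody);
    while (client.connected() || client.available()) {
      if (client.available()) response += client.readString();
    }
    client.stop();
    int jsonStart = response.indexOf("{");
    if (jsonStart != -1) {
      DynamicJsonDocument resDoc(4096);  // capacity assumed
      DeserializationError error = deserializeJson(resDoc, response.substring(jsonStart));
      if (!error && resDoc["code"].as<int>() == 0) {
        String chatId = resDoc["data"]["id"].as<String>();
        String conversationId = resDoc["data"]["conversation_id"].as<String>();
        logInfo("Chat created, waiting for reply");
        cozeReply = pollCozeReply(conversationId, chatId);
        logInfo("Coze reply: " + cozeReply);
        if (cozeReply.length() > 0) currentState = STATE_TTS;
      }
    }
  }
}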
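// Synthesizes cozeReply with Baidu TTS, saves the audio to the SD card, and plays it.
void textToSpeechAndPlay() {
  if (currentState != STATE_TTS) return;
  if (cozeReply.length() == 0) {  // nothing to say: do not stay stuck in STATE_TTS
    currentState = STATE_IDLE;
    return;
  }
  logInfo("Synthesizing: " + cozeReply);
  if (!getAccessToken()) {
    currentState = STATE_IDLE;
    return;
  }
  String encodedText = urlEncode(cozeReply);
  // Parameter values are assumed. aue=4 requests raw 16 kHz PCM, which the
  // I2S player can output without a decoder.
  String ttsParams = "tex=" + encodedText + "&lan=zh&cuid=" + WiFi.macAddress() + "&ctp=1&tok=" + accessToken + "&aue=4";
  String requestUrl = String(BAIDU_TTS_URL) + "?" + ttsParams;
  if (SD.exists(TTS_FILE_PATH)) {
    SD.remove(TTS_FILE_PATH);
  }
  File ttsFile = SD.open(TTS_FILE_PATH, FILE_WRITE);
  if (!ttsFile) {
    logInfo("Failed to create TTS file");
    currentState = STATE_IDLE;
    return;
  }
  if (client.connect("tsn.baidu.com", 443)) {
    // HTTP/1.0 (assumed) keeps the server from using chunked encoding,
    // which would corrupt the audio written below.
    client.print("GET " + requestUrl + " HTTP/1.0\r\n");
    client.print("Host: tsn.baidu.com\r\n");
    client.print("Connection: close\r\n\r\n");
    bool headerEnd = false;
    uint8_t buf[512];
    while (client.connected() || client.available()) {
      if (!client.available()) {
        delay(1);
        continue;
      }
      if (!headerEnd) {
        String line = client.readStringUntil('\n');
        if (line == "\r") headerEnd = true;  // blank line ends the headers
      } else {
        // Read the body as raw bytes; line-based reads would drop '\n' bytes.
        int n = client.read(buf, sizeof(buf));
        if (n > 0) ttsFile.write(buf, n);
      }
    }
    client.stop();
  }
  ttsFile.close();
  size_t ttsFileSize = getFileSize(TTS_FILE_PATH);
  if (SD.exists(TTS_FILE_PATH) && ttsFileSize > 0) {
    logInfo("Playing reply");
    currentState = STATE_PLAYING;
    File playFile = SD.open(TTS_FILE_PATH, FILE_READ);
    if (playFile) {
      i2s_start(I2S_NUM_1);
      uint8_t playBuffer[1024];  // buffer size assumed
      while (playFile.available() > 0 && currentState == STATE_PLAYING) {
        size_t bytesRead = playFile.read(playBuffer, sizeof(playBuffer));
        size_t bytesWritten;
        i2s_write(I2S_NUM_1, playBuffer, bytesRead, &bytesWritten, portMAX_DELAY);
      }
      playFile.close();
      i2s_zero_dma_buffer(I2S_NUM_1);  // silence the speaker after playback
    }
    logInfo("Playback finished");
  }
  currentState = STATE_IDLE;  // return to idle whether or not audio was played
}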
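// Handles commands typed on the serial console. The command words are assumed;
// any other input is treated as text and sent straight to the Coze bot.
void handleSerialCommand() {
  if (Serial.available() > 0) {
    String input = Serial.readStringUntil('\n');
    input.trim();
    if (input.length() == 0) return;
    logInfo("Input: " + input);
    if (input == "record") {
      startRecording();
    } else if (input == "size") {
      if (SD.exists(RECORD_FILE_PATH)) {
        size_t size = getFileSize(RECORD_FILE_PATH);
        logInfo("Recording size: " + String(size) + " bytes");
      }
    } else if (input == "token") {
      getAccessToken();
      logInfo("Token check done");
    } else {
      if (currentState == STATE_IDLE) {
        asrText = input;
        currentState = STATE_COZE;
      }
    }
  }
}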
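void setup() {
  Serial.begin(115200);  // baud rate assumed
  delay(1000);
  logInfo("ESP32 voice chat robot starting");
  logInfo("Commands (assumed names):");
  logInfo("  record : record and start a conversation");
  logInfo("  size   : show the recording file size");
  logInfo("  token  : check the Baidu access token");
  if (!initSD()) {
    while (true) {  // halt: the pipeline cannot work without the SD card
      logInfo("SD card unavailable, check wiring");
      delay(1000);
    }
  }
  logInfo("Connecting to Wi-Fi");
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  logInfo("Wi-Fi connected");
  client.setInsecure();  // skips TLS certificate verification
  initI2SPlay();
  currentState = STATE_IDLE;
  logInfo("Ready");
}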
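void loop() {
  handleSerialCommand();  // assumed call; the function name is missing in the source
  switch (currentState) {
    case STATE_ASR: speechToText(); break;
    case STATE_COZE: chatWithCoze(); break;
    case STATE_TTS: textToSpeechAndPlay(); break;
    default: break;
  }
  delay(10);  // delay assumed
}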
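The URL-encoding helper in the listing matters because the reply text is usually Chinese, and every UTF-8 byte outside the unreserved set has to be percent-encoded before it goes into the TTS query string. Below is a minimal standalone sketch of the same technique using `std::string`, so it can be tested off-device. It is an illustrative variant, not the project's function, and it additionally leaves `-`, `_`, `.`, and `~` unescaped, as RFC 3986 allows.

```cpp
#include <cctype>
#include <string>

// Percent-encodes every byte that is not an RFC 3986 unreserved character.
std::string urlEncode(const std::string& text) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += (char)c;
        } else {
            out += '%';
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}
```

Iterating over `unsigned char` avoids passing negative values to `std::isalnum`, which would be undefined behavior for the high bytes of multi-byte UTF-8 characters.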

