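If you want a voice-input experience like Doubao's or WeChat's, cloud APIs are accurate but raise privacy concerns, while a local model is free and works offline. These notes record how I deployed local, real-time speech-to-text based on Faster-Whisper.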
Environment setup
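Install the core dependencies in a virtual environment:
pip install faster-whisper pyaudio torch
faster-whisper itself does not depend on PyTorch. torch is listed here only because the script below calls torch.cuda.is_available() to detect a GPU. For GPU acceleration, make sure CUDA and cuDNN are installed correctly. Without a GPU, the script falls back to CPU inference.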
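If you would rather not install PyTorch just for GPU detection, here is a minimal sketch that asks CTranslate2 instead. CTranslate2 is the inference backend that faster-whisper installs. The sketch assumes `ctranslate2.get_cuda_device_count()`, and the helper name `pick_device` is hypothetical. This approach also avoids a CPU-only PyTorch build reporting "no GPU" when CTranslate2 could still use one.

```python
def pick_device():
    """Return (device, compute_type) for WhisperModel without importing torch.

    Assumption: ctranslate2 (a faster-whisper dependency) provides
    get_cuda_device_count(); if it cannot be imported, fall back to CPU.
    """
    try:
        import ctranslate2
    except ImportError:
        return "cpu", "int8"
    if ctranslate2.get_cuda_device_count() > 0:
        return "cuda", "float16"  # same choice as the script's GPU branch
    return "cpu", "int8"          # same choice as the script's CPU branch


if __name__ == "__main__":
    device, compute_type = pick_device()
    print(device, compute_type)
```

The returned pair can be passed straight to `WhisperModel(model_path, device=device, compute_type=compute_type)`.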
Model download
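Faster-Whisper supports several model sizes. Pick one based on your hardware and accuracy needs:
- Tiny/Base/Small: lightweight and fast
- Medium/Large-v2/Large-v3: higher accuracy, higher resource usage
- Distil-Large-v3: a distilled model that balances speed and accuracy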
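To download manually, open the model's Hugging Face repository and go to its "Files and versions" tab. Save config.json, model.bin, tokenizer.json, vocabulary.json and the other key files into a single folder. For example, after downloading large-v3 (and extracting it, if you fetched an archive), point the script's model path at that directory.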
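A missing file only shows up later as a model-load error, so a quick check of the folder can save time. Below is a minimal sketch. The helper name `missing_model_files` is hypothetical, and its file list is taken from the paragraph above. Some repositories ship extra files, which you should keep as well. As an alternative to downloading by hand, `huggingface_hub`'s `snapshot_download` can fetch a whole repository in one call.

```python
from pathlib import Path

# Files named in the text above; a given repository may contain more
REQUIRED_FILES = ("config.json", "model.bin", "tokenizer.json", "vocabulary.json")


def missing_model_files(model_dir):
    """Return the required file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Check the current directory; pass your model folder instead
    print("missing:", missing_model_files(".") or "none")
```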
Recording and transcription script
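The core logic has two parts:
- Capture audio with pyaudio and save it as a temporary WAV file.
- Transcribe that file with the Whisper model.

The complete example below records in 5-second chunks (AUDIO_BUFFER). It transcribes each chunk on a background thread while the next chunk is being recorded. It also includes GPU detection and VAD (voice activity detection) filtering.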
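The first part, saving 16-bit audio frames to a temporary WAV file, needs only the standard library. Below is a minimal sketch in which a synthetic sine tone stands in for microphone data, so it runs without an audio device. The helper names `write_pcm16_wav` and `sine_pcm16` are hypothetical.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16_wav(frames, channels, rate):
    """Write raw 16-bit PCM bytes to a new temporary WAV file and return its path."""
    # delete=False keeps the file after the with block so it can be reopened by name
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        path = f.name
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample, the same width as pyaudio.paInt16
        wf.setframerate(rate)
        wf.writeframes(frames)
    return path


def sine_pcm16(freq, seconds, rate):
    """Synthesize a mono sine tone as little-endian int16 bytes (stand-in for mic input)."""
    n = int(seconds * rate)
    samples = (int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / rate)) for i in range(n))
    return struct.pack("<%dh" % n, *samples)


if __name__ == "__main__":
    path = write_pcm16_wav(sine_pcm16(440, 1.0, 16000), channels=1, rate=16000)
    with wave.open(path, "rb") as wf:
        print(wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())
        # prints: 1 2 16000 16000
    os.remove(path)
```

The full script does the same thing incrementally: the PyAudio callback calls `writeframes` once per captured buffer instead of once at the end.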
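# -*- coding: utf-8 -*-
import os
import sys
import time
import wave
import tempfile
import threading
import torch
import pyaudio
from faster_whisper import WhisperModel

AUDIO_BUFFER = 5  # length of each recording chunk (seconds)


def record_audio(p, device):
    """Record AUDIO_BUFFER seconds from `device` into a temporary WAV file; return its path."""
    # Create a temporary file for the audio (delete=False so it survives the with block)
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        filename = f.name
    channels = int(device["maxInputChannels"])
    rate = int(device["defaultSampleRate"])
    wave_file = wave.open(filename, "wb")
    wave_file.setnchannels(channels)
    wave_file.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wave_file.setframerate(rate)

    def callback(in_data, frame_count, time_info, status):
        # PyAudio calls this from its own thread for every captured buffer
        wave_file.writeframes(in_data)
        return (in_data, pyaudio.paContinue)

    stream = None
    try:
        stream = p.open(format=pyaudio.paInt16,
                        channels=channels,
                        rate=rate,
                        frames_per_buffer=1024,
                        input=True,
                        input_device_index=device["index"],
                        stream_callback=callback)
        time.sleep(AUDIO_BUFFER)  # the callback keeps writing while we wait
    except Exception as e:
        print(f"Recording error: {e}")
    finally:
        if stream is not None:
            stream.stop_stream()
            stream.close()
        wave_file.close()
    return filename


def whisper_audio(filename, model):
    """Transcribe one WAV chunk, print its segments, then delete the file."""
    try:
        segments, info = model.transcribe(
            filename, beam_size=5,
            language="zh",
            vad_filter=True,  # skip non-speech audio using the built-in Silero VAD
            vad_parameters=dict(min_silence_duration_ms=500)
        )
        # segments is a generator: decoding actually happens while iterating
        for segment in segments:
            print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    except Exception as e:
        print(f"Transcription error: {e}")
    finally:
        if os.path.exists(filename):
            os.remove(filename)


def main():
    print("Checking for a GPU...")
    if torch.cuda.is_available():
        device = "cuda"
        compute_type = "float16"
        print("CUDA is available: using GPU (float16)")
    else:
        device = "cpu"
        compute_type = "int8"
        print("No CUDA GPU found: using CPU (int8)")
    model_path = "path/to/faster-whisper-large-v3"  # placeholder: set to your local model folder
    try:
        model = WhisperModel(model_path, device=device, compute_type=compute_type, local_files_only=True)
        print("Model loaded")
    except Exception as e:
        print(f"Failed to load model: {e}")
        sys.exit(1)  # nothing to do without a model

    # Create the PyAudio instance explicitly and release it with terminate() at the end
    p = pyaudio.PyAudio()
    try:
        default_mic = p.get_default_input_device_info()
        print(f"Recording from: {default_mic['name']}")
        print("Press Ctrl+C to stop")
        print("-" * 50)
        while True:
            # Record the next chunk while the previous one is transcribed in the background
            filename = record_audio(p, default_mic)
            thread = threading.Thread(target=whisper_audio, args=(filename, model))
            thread.start()
    except OSError:
        print("No input device found")
    except KeyboardInterrupt:
        print("Recording stopped")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        p.terminate()


if __name__ == "__main__":
    main()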
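The main loop starts a new thread for every 5-second chunk. If transcription runs slower than real time (for example, a large model on CPU), these threads pile up and their output can print out of order. One alternative is a single worker thread fed by a `queue.Queue`, so chunks are transcribed strictly in the order they were recorded. The sketch below is an assumption, not the original design, and the name `start_transcriber` is hypothetical.

```python
import queue
import threading


def start_transcriber(transcribe_chunk):
    """Start one worker thread that handles chunks strictly in arrival order.

    transcribe_chunk: any callable taking a file name, e.g. the script's
    whisper_audio with the model bound via functools.partial (assumption).
    Returns (queue, thread); put None on the queue to stop the worker.
    """
    chunks = queue.Queue()

    def worker():
        while True:
            filename = chunks.get()
            if filename is None:  # sentinel value: shut down
                break
            transcribe_chunk(filename)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return chunks, thread


if __name__ == "__main__":
    done = []
    chunks, worker = start_transcriber(done.append)  # stand-in for whisper_audio
    for name in ["chunk1.wav", "chunk2.wav", "chunk3.wav"]:
        chunks.put(name)
    chunks.put(None)
    worker.join()
    print(done)  # ['chunk1.wav', 'chunk2.wav', 'chunk3.wav']
```

To use the worker in the script, create it once with `chunks, _ = start_transcriber(functools.partial(whisper_audio, model=model))`. Then replace the per-chunk `threading.Thread(...).start()` with `chunks.put(filename)`. Recording still never waits for transcription; only the transcription backlog is serialized.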


