Introduction
There are two mainstream ways to implement speech-to-text input: cloud APIs (lightweight, highly accurate) and local models (free, privacy-preserving, no network required). If your use case has data-privacy requirements, or you simply want to keep costs down, local deployment is a solid choice. This post documents the process of deploying Faster-Whisper for real-time speech-to-text.
Project repository: https://github.com/SYSTRAN/faster-whisper
Environment Setup
Install the core dependencies in a virtual environment. Note: the pyaudiowpatch seen in some tutorials is not a typo for pyaudio but a real Windows-specific fork of PyAudio (it adds WASAPI loopback capture); for ordinary cross-platform microphone input, the standard package is pyaudio:
pip install faster-whisper pyaudio
If you want GPU acceleration, make sure CUDA and cuDNN are installed correctly; consult NVIDIA's official documentation for version compatibility.
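If you would rather not install a system-wide CUDA toolkit, the faster-whisper README also describes pulling the cuBLAS and cuDNN runtime libraries in via pip. The exact package versions depend on your CUDA/CTranslate2 pairing, so treat this as a sketch:

```shell
# Install NVIDIA runtime libraries for GPU inference via pip
# (check the faster-whisper README for the versions matching your setup)
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
```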
Usage
1. Download a Model
The models work fully offline; download the model files into a directory of your choice. Pick a size based on your accuracy/latency trade-off:
- Tiny (smallest/fastest): Systran/faster-whisper-tiny
- Base: Systran/faster-whisper-base
- Small: Systran/faster-whisper-small
- Medium: Systran/faster-whisper-medium
- Large-v2: Systran/faster-whisper-large-v2
- Large-v3 (best quality): Systran/faster-whisper-large-v3
- Distil-Large-v3 (distilled, faster): Systran/faster-distil-whisper-large-v3
From the model's "Files and versions" page on Hugging Face, download the following key files into a single folder:
config.json, model.bin, tokenizer.json, vocabulary.json, preprocessor_config.json
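After downloading, a quick sanity check that all required files actually landed in the target folder can save a confusing load error later. The helper below (check_model_dir is a hypothetical utility for this post, not part of faster-whisper) reports any missing files:

```python
import os

# The files faster-whisper expects inside a local model directory
REQUIRED_FILES = [
    "config.json",
    "model.bin",
    "tokenizer.json",
    "vocabulary.json",
    "preprocessor_config.json",
]

def check_model_dir(model_dir):
    """Return the list of required files missing from model_dir (empty list = complete)."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(model_dir, name))]
```

Run it on your model folder before loading the model; an empty return value means the download is complete.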
2. Real-Time Recording and Transcription Script
Below is a complete example script that records audio in slices, transcribes each slice with the model, and hands the transcription off to worker threads. It detects the input device and applies VAD (voice activity detection) filtering to drop silent stretches automatically.
# -*- coding: utf-8 -*-
import os
import time
import wave
import tempfile
import threading
import torch
import pyaudio
from faster_whisper import WhisperModel

# Length of each recording slice (seconds)
AUDIO_BUFFER = 5


def record_audio(p, device):
    """Record AUDIO_BUFFER seconds from the given input device into a temp WAV file."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        filename = f.name
    wave_file = wave.open(filename, "wb")
    wave_file.setnchannels(int(device["maxInputChannels"]))
    wave_file.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wave_file.setframerate(int(device["defaultSampleRate"]))

    def callback(in_data, frame_count, time_info, status):
        wave_file.writeframes(in_data)
        return (in_data, pyaudio.paContinue)

    stream = None
    try:
        stream = p.open(format=pyaudio.paInt16,
                        channels=int(device["maxInputChannels"]),
                        rate=int(device["defaultSampleRate"]),
                        frames_per_buffer=1024,
                        input=True,
                        input_device_index=device["index"],
                        stream_callback=callback)
        time.sleep(AUDIO_BUFFER)
    except Exception as e:
        print("Recording error: %s" % e)
    finally:
        if stream is not None:
            stream.stop_stream()
            stream.close()
        wave_file.close()
    return filename


def whisper_audio(filename, model):
    """Transcribe a WAV file, print timestamped segments, then delete the file."""
    try:
        segments, info = model.transcribe(
            filename,
            beam_size=5,
            language="zh",          # set to your spoken language, or None for auto-detect
            vad_filter=True,        # drop silent stretches before decoding
            vad_parameters=dict(min_silence_duration_ms=500)
        )
        for segment in segments:
            print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    except Exception as e:
        print("Transcription error: %s" % e)
    finally:
        if os.path.exists(filename):
            os.remove(filename)


def main():
    print("Loading model...")
    if torch.cuda.is_available():
        device = "cuda"
        compute_type = "float16"
        print("Using GPU")
    else:
        device = "cpu"
        compute_type = "int8"
        print("Using CPU")
    # Directory containing the downloaded model files (adjust to your own path)
    model_path = "models/faster-whisper-large-v3"
    try:
        model = WhisperModel(model_path, device=device,
                             compute_type=compute_type, local_files_only=True)
        print("Model loaded")
    except Exception as e:
        print("Failed to load model: %s" % e)
        return

    p = pyaudio.PyAudio()
    try:
        default_mic = p.get_default_input_device_info()
        print("Default microphone: %s" % default_mic["name"])
        print("Recording... press Ctrl+C to stop")
        print("=" * 50)
        while True:
            filename = record_audio(p, default_mic)
            thread = threading.Thread(target=whisper_audio, args=(filename, model))
            thread.start()
    except OSError:
        print("Audio device error")
    except KeyboardInterrupt:
        print("Stopped by user")
    except Exception as e:
        print("Error: %s" % e)
    finally:
        p.terminate()


if __name__ == "__main__":
    main()
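The script prints raw second offsets like "[0.00s -> 4.50s]". If you prefer MM:SS.mmm timestamps, a tiny formatting helper does the job (format_timestamp is a hypothetical utility for this post, not part of faster-whisper):

```python
def format_timestamp(seconds):
    """Convert a float second offset to an MM:SS.mmm string."""
    minutes, secs = divmod(seconds, 60)
    return "%02d:%06.3f" % (int(minutes), secs)

# Example: format_timestamp(75.5) -> "01:15.500"
```

Swap it into the print statement inside whisper_audio to change the output format.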


