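Introduction
To build voice input like that in WeChat or Doubao, there are generally two options: a cloud API or a local model. Cloud services are usually more accurate but raise privacy and cost concerns, whereas a local model is free to run and keeps the audio on your own machine. Faster-Whisper, an efficient reimplementation of Whisper built on CTranslate2, is well suited to local real-time speech recognition.
Environment Setup
Install the dependencies in a virtual environment. The core library is faster-whisper. For recording, either pyaudiowpatch (better audio compatibility on Windows) or standard pyaudio is recommended. The script in this article also imports torch to detect whether a GPU is available, so PyTorch needs to be installed as well.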
pip install faster-whisper
pip install pyaudiowpatch
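Because either pyaudiowpatch or standard pyaudio can serve as the recording backend, a script can probe for whichever one is installed before importing it. The following is a minimal sketch of that idea; the helper name `pick_audio_backend` is hypothetical and is not part of either library.

```python
import importlib
import importlib.util


def pick_audio_backend(candidates=("pyaudiowpatch", "pyaudio")):
    """Return the first module name in `candidates` that is installed, else None.

    `pick_audio_backend` is a hypothetical helper used only for illustration.
    """
    for name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

A caller could then run `pyaudio = importlib.import_module(pick_audio_backend())` after checking that the result is not None, so the rest of the script can refer to the module as `pyaudio` regardless of which package was found.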
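Model Download
Several model sizes are supported, from Tiny to Large-v3. If the server has no internet access, download the model files (config.json, model.bin, tokenizer.json, etc.) manually from Hugging Face, place them in a folder, pass that folder's path as the model, and set local_files_only=True.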
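When working offline, it can help to verify the model folder before loading it. The sketch below checks only for the three files named above and then loads the model with `local_files_only=True`. The helper names `missing_model_files` and `load_local_model` are hypothetical, and because the file list in the text ends with "etc.", a real model folder may need more files than this check covers.

```python
from pathlib import Path

# File names taken from the list in the text. The list ends with "etc.",
# so treating these three as sufficient is an assumption.
REQUIRED_FILES = ("config.json", "model.bin", "tokenizer.json")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the names in `required` that are absent from `model_dir`."""
    folder = Path(model_dir)
    return [name for name in required if not (folder / name).is_file()]


def load_local_model(model_dir, device="cpu", compute_type="int8"):
    """Load a model from a local folder without contacting Hugging Face."""
    missing = missing_model_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    # Imported lazily so the file check works without faster-whisper installed.
    from faster_whisper import WhisperModel
    return WhisperModel(str(model_dir), device=device,
                        compute_type=compute_type, local_files_only=True)
```

Failing early with a list of missing files tends to be easier to diagnose than an error raised from inside the model loader.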
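Core Implementation
Below is the complete real-time recording-to-text script. It uses multiple threads so that recording and transcription run in parallel, and it enables VAD (voice activity detection) to filter out silent segments.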
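# -*- coding: utf-8 -*-
import os
import sys
import time
import wave
import tempfile
import threading

import torch
import pyaudiowpatch as pyaudio
from faster_whisper import WhisperModel

AUDIO_BUFFER = 5  # seconds of audio recorded per chunk


def record_audio(p, device):
    """Record AUDIO_BUFFER seconds from `device` into a temporary WAV file."""
    # Create the temp file and close it first, so wave can reopen it by name.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        filename = f.name
    wave_file = wave.open(filename, "wb")
    wave_file.setnchannels(int(device["maxInputChannels"]))
    wave_file.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wave_file.setframerate(int(device["defaultSampleRate"]))

    def callback(in_data, frame_count, time_info, status):
        """Write each incoming buffer to the WAV file and keep the stream running."""
        wave_file.writeframes(in_data)
        return (in_data, pyaudio.paContinue)

    stream = None
    try:
        stream = p.open(format=pyaudio.paInt16, channels=int(device["maxInputChannels"]),
                        rate=int(device["defaultSampleRate"]),
                        frames_per_buffer=1024,  # assumed value; adjust as needed
                        input=True, input_device_index=device["index"],
                        stream_callback=callback)
        time.sleep(AUDIO_BUFFER)  # the callback fills the file while we wait
    except Exception as e:
        print(f"Recording error: {e}", file=sys.stderr)
    finally:
        if stream is not None:
            stream.stop_stream()
            stream.close()
        wave_file.close()
    return filename


def whisper_audio(filename, model):
    """Transcribe one recorded chunk, print its segments, then delete the file."""
    try:
        segments, info = model.transcribe(
            filename,
            beam_size=5,      # assumed value
            language="zh",    # assumed: Chinese; change to match the spoken language
            vad_filter=True,  # skip silent stretches before decoding
            vad_parameters=dict(min_silence_duration_ms=500),  # assumed value
        )
        for segment in segments:
            print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    except Exception as e:
        print(f"Transcription error: {e}", file=sys.stderr)
    finally:
        if os.path.exists(filename):
            os.remove(filename)


def main():
    print("Loading Whisper model...")
    if torch.cuda.is_available():
        device = "cuda"
        compute_type = "float16"  # assumed compute type for GPU
        print("Using GPU")
    else:
        device = "cpu"
        compute_type = "int8"  # assumed compute type for CPU
        print("Using CPU")
    # Placeholder: set this to the local folder that holds the model files.
    model_path = "path/to/model"
    try:
        model = WhisperModel(model_path, device=device, compute_type=compute_type,
                             local_files_only=True)
        print("Model loaded")
    except Exception as e:
        print(f"Failed to load model: {e}", file=sys.stderr)
        return
    # pyaudiowpatch's PyAudio can be used as a context manager.
    with pyaudio.PyAudio() as p:
        try:
            default_mic = p.get_default_input_device_info()
            print(f"Microphone: {default_mic['name']}")
            print("-" * 50)
            print("Listening... press Ctrl+C to stop")
            while True:
                filename = record_audio(p, default_mic)
                # Transcribe in the background so the next chunk starts recording at once.
                thread = threading.Thread(target=whisper_audio, args=(filename, model))
                thread.start()
        except OSError:
            print("No default input device found", file=sys.stderr)
        except KeyboardInterrupt:
            print("Stopped")
        except Exception as e:
            print(f"Unexpected error: {e}", file=sys.stderr)


if __name__ == "__main__":
    main()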
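The script starts a new thread for every recorded chunk. If transcribing a chunk takes longer than AUDIO_BUFFER seconds, threads can pile up and results may print out of order. One possible alternative, which is not the approach used in the script, is a single worker thread fed by a queue. The sketch below shows that pattern with stand-in callables: `run_pipeline`, `record_chunk`, and `transcribe` are hypothetical names, and the loop is bounded by `num_chunks` so the example terminates.

```python
import queue
import threading


def run_pipeline(record_chunk, transcribe, num_chunks):
    """Record `num_chunks` chunks while one worker transcribes them in order.

    `record_chunk` and `transcribe` are hypothetical callables standing in
    for the script's recording and transcription functions.
    """
    pending = queue.Queue()
    results = []

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more chunks will arrive
                break
            results.append(transcribe(item))

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    for _ in range(num_chunks):
        # The worker transcribes earlier chunks while this call records the next one.
        pending.put(record_chunk())
    pending.put(None)
    t.join()
    return results
```

Because a single worker consumes the queue in FIFO order, the output order matches the recording order. The trade-off is that a slow transcription delays later chunks instead of running alongside them.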


