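Preface
There are two mainstream ways to build voice input like that in WeChat or Doubao: a cloud API or a local model. A cloud API is lightweight and highly accurate, but it requires uploading audio data. A local model is free, keeps the data private, and works without a network connection. This post records how to deploy real-time speech-to-text locally with Faster-Whisper.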
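I. Environment Setup
Install the core dependencies inside a virtual environment. The script in section II also imports torch, which it uses only to detect whether a CUDA GPU is available.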
pip install faster-whisper
pip install pyaudio
pip install torch
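Before moving on, it can help to confirm that the packages are actually importable from the active virtual environment. The following is a minimal sketch, not part of the original setup; the helper name `missing_modules` is hypothetical, and the module list is taken from the imports of the script in section II.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names, not pip package names: faster-whisper is imported as faster_whisper.
    required = ["faster_whisper", "pyaudio", "torch"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported as missing, the usual cause is that pip installed into a different interpreter than the one running the script.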
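II. Usage Steps
1. Download the model
If the server has no internet access, download the model files manually. Pick a model size that fits your needs: Tiny, Base, Small, Medium, Large-v2, or Large-v3.
On the model's Files and versions page on Hugging Face, download the following key files and put them in the same folder: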
- config.json
- model.bin
- tokenizer.json
- vocabulary.json
- preprocessor_config.json
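A quick check that the folder is complete can save a confusing load error later. The sketch below assumes the file names listed above; the helper `missing_model_files` and the folder name in the usage block are hypothetical, and the list may need adjusting if your model repository names its files differently.

```python
from pathlib import Path

# File names taken from the list above; adjust if your model repository differs.
REQUIRED_FILES = (
    "config.json",
    "model.bin",
    "tokenizer.json",
    "vocabulary.json",
    "preprocessor_config.json",
)


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    model_dir = "models/faster-whisper"  # hypothetical folder name
    missing = missing_model_files(model_dir)
    if missing:
        print("Missing files in %s: %s" % (model_dir, ", ".join(missing)))
    else:
        print("Model folder looks complete.")
```

Once the folder is complete, its path is what gets passed to `WhisperModel(...)` together with `local_files_only=True`.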
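2. Real-time recording and transcription script
Below is a complete Python example script. It hands each recorded chunk to a separate thread for transcription, so transcribing does not block the recording loop.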
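# -*- coding: utf-8 -*-
import os
import sys
import time
import wave
import tempfile
import threading

import torch
import pyaudio
from faster_whisper import WhisperModel

AUDIO_BUFFER = 5  # seconds of audio recorded per chunk


def record_audio(p, device):
    """Record AUDIO_BUFFER seconds from `device` into a temporary WAV file."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        filename = f.name

    wave_file = wave.open(filename, "wb")
    wave_file.setnchannels(int(device["maxInputChannels"]))
    wave_file.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wave_file.setframerate(int(device["defaultSampleRate"]))

    def callback(in_data, frame_count, time_info, status):
        # PyAudio calls this for every captured buffer.
        wave_file.writeframes(in_data)
        return (in_data, pyaudio.paContinue)

    stream = None
    try:
        stream = p.open(
            format=pyaudio.paInt16,
            channels=int(device["maxInputChannels"]),
            rate=int(device["defaultSampleRate"]),
            frames_per_buffer=1024,  # assumed buffer size
            input=True,
            input_device_index=device["index"],
            stream_callback=callback,
        )
        time.sleep(AUDIO_BUFFER)  # the callback keeps writing while we wait
    except Exception as e:
        print(f"Recording failed: {e}")
    finally:
        if stream is not None:
            stream.stop_stream()
            stream.close()
        wave_file.close()
    return filename


def whisper_audio(filename, model):
    """Transcribe one recorded chunk, print its segments, then delete the file."""
    try:
        segments, info = model.transcribe(
            filename,
            beam_size=5,  # assumed value
            language="zh",  # assumed; set this to the language being spoken
            vad_filter=True,
            vad_parameters=dict(min_silence_duration_ms=500),  # assumed threshold
        )
        for segment in segments:
            print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    except Exception as e:
        print(f"Transcription failed: {e}")
    finally:
        if os.path.exists(filename):
            os.remove(filename)


def main():
    print("Starting real-time speech-to-text...")
    # The compute types below are assumed defaults; adjust them for your hardware.
    if torch.cuda.is_available():
        device = "cuda"
        compute_type = "float16"
        print("Using GPU (CUDA)")
    else:
        device = "cpu"
        compute_type = "int8"
        print("Using CPU")

    # Placeholder: point this at the folder holding the downloaded model files.
    model_path = "path/to/model_folder"
    try:
        model = WhisperModel(model_path, device=device, compute_type=compute_type, local_files_only=True)
        print("Model loaded")
    except Exception as e:
        print(f"Failed to load model: {e}")
        sys.exit(1)

    # Terminate PyAudio explicitly: not every PyAudio build supports the `with` statement.
    p = pyaudio.PyAudio()
    try:
        default_mic = p.get_default_input_device_info()
        print(f"Microphone: {default_mic['name']}")
        print(f"Sample rate: {int(default_mic['defaultSampleRate'])} Hz")
        print("-" * 50)
        print("Start speaking (Ctrl+C to stop)")
        while True:
            filename = record_audio(p, default_mic)
            thread = threading.Thread(target=whisper_audio, args=(filename, model))
            thread.start()
    except OSError:
        print("No default input device found")
    except KeyboardInterrupt:
        print("Stopped by user")
    except Exception as e:
        print(f"Unexpected error: {e}")
    finally:
        p.terminate()


if __name__ == "__main__":
    main()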
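The script starts one new thread per recorded chunk. If transcription ever takes longer than a recording window, those threads can pile up and results may print out of order. One possible alternative, shown here only as a sketch and not as the script's own design, is a single worker thread fed through a queue. The names `start_worker` and `fake_transcribe` are hypothetical, and `fake_transcribe` stands in for the real model call so the example runs without a model or a microphone.

```python
import queue
import threading


def start_worker(transcribe, results):
    """Start one daemon thread that transcribes queued files in order.

    Returns (jobs, thread). Put file names on `jobs`; put None to stop the worker.
    """
    jobs = queue.Queue()

    def run():
        while True:
            filename = jobs.get()
            if filename is None:  # sentinel: stop the worker
                break
            try:
                results.append(transcribe(filename))
            except Exception as e:
                results.append(f"error: {e}")

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return jobs, thread


if __name__ == "__main__":
    # Stand-in for the real transcription call.
    def fake_transcribe(filename):
        return f"text for {filename}"

    results = []
    jobs, worker = start_worker(fake_transcribe, results)
    for name in ["chunk1.wav", "chunk2.wav", "chunk3.wav"]:
        jobs.put(name)
    jobs.put(None)
    worker.join()
    print(results)
```

In the recording loop, `jobs.put(filename)` would replace creating a thread for each chunk. The trade-off is that a slow transcription delays later chunks instead of running alongside them.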


