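Preface
To build voice input like the kind found in Doubao or WeChat, there are usually two mainstream approaches. The first is a cloud API, which is lightweight and very accurate. The second is a local model, which is free, keeps audio private and needs no network connection. The system I am currently developing needs speech recognition, so this post records how to use Faster-Whisper to turn real-time voice input into text.
If your computer has an NVIDIA graphics card, install CUDA and cuDNN by following the official documentation.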
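Once faster-whisper is installed (next section), you can check whether a GPU is actually usable. Faster-Whisper runs on CTranslate2, which is installed as one of its dependencies, and `ctranslate2.get_cuda_device_count()` reports how many CUDA devices it can see. The sketch below is one possible approach. It assumes that a nonzero count means "use the GPU", but a visible device does not prove that cuDNN is set up correctly. The float16/int8 pairing mirrors the choice made in the script later in this post.

```python
def pick_device():
    """Return (device, compute_type) for WhisperModel.

    Assumption: a visible CUDA device means the GPU path is usable;
    cuDNN problems only surface when the model is actually loaded.
    """
    try:
        import ctranslate2  # pulled in as a dependency of faster-whisper
    except ImportError:
        # faster-whisper (and therefore CTranslate2) is not installed yet
        return "cpu", "int8"
    if ctranslate2.get_cuda_device_count() > 0:
        return "cuda", "float16"  # half precision on the GPU
    return "cpu", "int8"          # 8-bit quantization keeps CPU inference fast


if __name__ == "__main__":
    print(pick_device())
```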
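I. Installing the Environment
Install faster-whisper in your virtual environment:
```shell
pip install faster-whisper
```
Install the audio recording library:
```shell
pip install pyaudio
```
The script in Part II also imports torch, but only to check whether CUDA is available. If torch is not already in your environment, install it as well (pip install torch).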
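You can confirm that the packages are visible from the active virtual environment without importing them. `importlib.util.find_spec` returns `None` for a top-level module that cannot be found. The import names below (`faster_whisper`, `pyaudio`, `torch`) are the ones used by the script in Part II. This is a minimal sketch, not part of the original post.

```python
import importlib.util

# Import names used by the recording script (pip package faster-whisper -> module faster_whisper)
REQUIRED = ("faster_whisper", "pyaudio", "torch")


def missing_packages(names=REQUIRED):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", missing or "none")
```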
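II. Usage Steps
1. Download the model
Manual download (offline use): if your server cannot reach the internet, or you want to keep the model in a specific folder, download it manually. Pick a model size according to your needs. Each entry below is a Hugging Face repository: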
- Tiny (smallest/fastest): Systran/faster-whisper-tiny
- Base: Systran/faster-whisper-base
- Small: Systran/faster-whisper-small
- Medium: Systran/faster-whisper-medium
- Large-v2: Systran/faster-whisper-large-v2
- Large-v3 (best accuracy): Systran/faster-whisper-large-v3
- Distil-Large-v3 (distilled, faster): Systran/faster-distil-whisper-large-v3
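On the model's "Files and versions" page on Hugging Face, download the following key files and put them all in the same folder:
- config.json
- model.bin
- tokenizer.json
- vocabulary.json
- preprocessor_config.json
The script below loads the model from this folder through model_path.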
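Because the script loads with `local_files_only=True`, a missing file only shows up as a load error. The helper below is a hypothetical pre-check that reports which of the files listed above are absent from a folder. The file names are copied from that list. A particular model repository may ship a slightly different set, so adjust `EXPECTED_FILES` to match what you actually downloaded.

```python
from pathlib import Path

# Copied from the file list above; a given model repo may differ (assumption)
EXPECTED_FILES = (
    "config.json",
    "model.bin",
    "tokenizer.json",
    "vocabulary.json",
    "preprocessor_config.json",
)


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present as files in `model_dir`."""
    folder = Path(model_dir)
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    # "path/to/your/model" is a placeholder for the folder you downloaded into
    print(missing_model_files("path/to/your/model"))
```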
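2. Real-time recording-to-text script
The script records the default microphone in fixed-length chunks, saves each chunk to a temporary WAV file, and transcribes it in a background thread. The code is as follows: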
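```python
# -*- coding: utf-8 -*-
import os
import sys
import time
import wave
import tempfile
import threading

import torch
import pyaudio
from faster_whisper import WhisperModel

# Length of each recorded chunk in seconds (original value lost; 5 is an assumption, tune as needed)
AUDIO_BUFFER = 5


def record_audio(p, device):
    """Record AUDIO_BUFFER seconds from `device` into a temporary WAV file and return its path."""
    # Only reserve a file name here; the handle is closed before wave reopens it (needed on Windows)
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        filename = f.name

    channels = int(device["maxInputChannels"])
    rate = int(device["defaultSampleRate"])

    wave_file = wave.open(filename, "wb")
    wave_file.setnchannels(channels)
    wave_file.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wave_file.setframerate(rate)

    def callback(in_data, frame_count, time_info, status):
        # PyAudio calls this from its own thread for every captured buffer
        wave_file.writeframes(in_data)
        return (in_data, pyaudio.paContinue)

    stream = None
    try:
        stream = p.open(format=pyaudio.paInt16, channels=channels, rate=rate,
                        frames_per_buffer=1024,  # buffer size is an assumed value
                        input=True, input_device_index=device["index"], stream_callback=callback)
        time.sleep(AUDIO_BUFFER)  # the callback keeps writing frames while we wait
    except Exception as e:
        print(f"Recording error: {e}")
    finally:
        if stream is not None:
            stream.stop_stream()
            stream.close()
        wave_file.close()
    return filename


def whisper_audio(filename, model):
    """Transcribe one WAV chunk, print its segments, then delete the file."""
    try:
        # beam_size, language ("zh" = Chinese) and min_silence_duration_ms are assumed values
        segments, info = model.transcribe(
            filename, beam_size=5, language="zh", vad_filter=True,
            vad_parameters=dict(min_silence_duration_ms=500))
        # segments is a lazy generator: decoding happens while iterating
        for segment in segments:
            print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    except Exception as e:
        print(f"Transcription error: {e}")
    finally:
        if os.path.exists(filename):
            os.remove(filename)


def main():
    print("Loading model...")
    # Use float16 on the GPU when CUDA is available, otherwise int8 on the CPU
    if torch.cuda.is_available():
        device = "cuda"
        compute_type = "float16"
        print("CUDA available, using GPU")
    else:
        device = "cpu"
        compute_type = "int8"
        print("CUDA not available, using CPU")

    model_path = "path/to/your/model"  # placeholder: the folder holding the downloaded model files
    try:
        model = WhisperModel(model_path, device=device, compute_type=compute_type, local_files_only=True)
        print("Model loaded")
    except Exception as e:
        print(f"Failed to load model: {e}")
        sys.exit(1)

    p = pyaudio.PyAudio()
    try:
        default_mic = p.get_default_input_device_info()
        print(f"Using microphone: {default_mic['name']}")
        print(f"Recording in {AUDIO_BUFFER}-second chunks, press Ctrl+C to stop")
        print("-" * 50)
        print("Listening...")
        while True:
            filename = record_audio(p, default_mic)
            # Transcribe in the background so the next chunk starts recording immediately
            thread = threading.Thread(target=whisper_audio, args=(filename, model))
            thread.start()
    except OSError:
        print("No usable input device (microphone) found")
    except KeyboardInterrupt:
        print("\nStopped")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        p.terminate()


if __name__ == "__main__":
    main()
```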
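Each call to `record_audio` writes AUDIO_BUFFER seconds of 16-bit PCM into a WAV file whose header (channels, sample width, frame rate) matches the microphone stream. The stdlib-only sketch below shows the size arithmetic behind one chunk, which is frames × channels × bytes per sample. It also writes a silent chunk in memory to confirm the duration. The 16 kHz mono defaults are illustrative assumptions. Your microphone's `defaultSampleRate` is often 44100 or 48000, and faster-whisper resamples the file to 16 kHz itself when decoding.

```python
import io
import wave


def chunk_size_bytes(seconds, rate, channels, sample_width):
    """Raw PCM bytes in one chunk: frames * channels * bytes per sample."""
    return int(seconds * rate) * channels * sample_width


def write_silent_chunk(seconds=5, rate=16000, channels=1, sample_width=2):
    """Write `seconds` of silence into an in-memory WAV file and return its bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes per sample, same as pyaudio.paInt16
        wf.setframerate(rate)
        wf.writeframes(b"\x00" * chunk_size_bytes(seconds, rate, channels, sample_width))
    return buffer.getvalue()


def wav_duration(data):
    """Duration in seconds of WAV bytes, read back from the header."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


if __name__ == "__main__":
    data = write_silent_chunk()
    print(len(data), wav_duration(data))
```

For example, a 5-second stereo chunk at 44100 Hz is 5 × 44100 × 2 × 2 = 882,000 bytes of audio data.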



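The script starts a new thread for every chunk. If transcribing a chunk takes longer than AUDIO_BUFFER seconds (for example, a large model running on the CPU), threads pile up and their printed segments can interleave. One alternative, which is not what the script above does, is a single worker thread fed by a `queue.Queue`. Chunks are then transcribed one at a time, in recording order. In the sketch, `transcribe` is any callable that takes a filename. Wiring it to the script as `lambda f: whisper_audio(f, model)`, with the main loop calling `jobs.put(record_audio(p, default_mic))`, is a hypothetical adaptation.

```python
import queue
import threading


def start_worker(transcribe, results):
    """Start one daemon thread that processes queued filenames in arrival order.

    Put None on the returned queue to stop the worker.
    """
    jobs = queue.Queue()

    def worker():
        while True:
            filename = jobs.get()
            try:
                if filename is None:  # sentinel: stop the worker
                    return
                results.append(transcribe(filename))
            finally:
                jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return jobs, thread


if __name__ == "__main__":
    out = []
    jobs, thread = start_worker(lambda name: name.upper(), out)
    for name in ("chunk1.wav", "chunk2.wav", "chunk3.wav"):
        jobs.put(name)
    jobs.put(None)
    thread.join()
    print(out)  # ['CHUNK1.WAV', 'CHUNK2.WAV', 'CHUNK3.WAV']
```

The trade-off is latency. If recording consistently outpaces transcription, the queue grows instead of the thread count, so a smaller model or a GPU is still needed to keep up in real time.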