Faster-Whisper Local Real-Time Speech Recognition: A Hands-On Deployment Guide

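If you want a voice-input experience like Doubao or WeChat, cloud APIs are accurate but raise privacy concerns, while a local model is free and works offline. This post records how to deploy local real-time speech-to-text with Faster-Whisper.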

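Environment Setup

Install the core dependencies inside a virtual environment:

pip install faster-whisper pyaudio

For GPU acceleration, make sure CUDA and cuDNN are installed correctly; without a GPU, inference runs on the CPU. The script later in this post also imports torch to detect the GPU, so install PyTorch as well.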
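Since faster-whisper runs on CTranslate2, GPU visibility can also be checked without PyTorch. A minimal sketch, assuming ctranslate2 is installed (it is a faster-whisper dependency); `pick_device` is a hypothetical helper name, and the float16/int8 choice mirrors this post's script. As far as I know, the device count does not confirm that cuDNN will load, so a broken cuDNN install may still only surface when the model runs.

```python
import importlib.util


def pick_device():
    """Hypothetical helper: return (device, compute_type) for WhisperModel.

    Assumption: float16 on a CUDA GPU, int8 on the CPU, as in this post's script.
    """
    if importlib.util.find_spec("ctranslate2") is not None:
        import ctranslate2  # inference engine that faster-whisper depends on

        try:
            # Counts CUDA devices visible to CTranslate2; this does not
            # prove that cuDNN will load when the model actually runs
            if ctranslate2.get_cuda_device_count() > 0:
                return "cuda", "float16"
        except Exception:
            # A broken CUDA install should not block CPU inference
            pass
    return "cpu", "int8"


if __name__ == "__main__":
    device, compute_type = pick_device()
    print(f"device={device}, compute_type={compute_type}")
```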

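Model Download

Faster-Whisper supports several model sizes; choose one according to your hardware and accuracy needs:

Tiny/Base/Small: lightweight and fast
Medium/Large-v2/v3: higher accuracy, heavier resource usage
Distil-Large-v3: distilled version balancing speed and accuracy (trained for English speech only)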
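To make the trade-off concrete, here is a hypothetical helper that maps hardware and priority to a model. The selection policy is an assumption for illustration, not an official faster-whisper recommendation; the returned strings are size names that WhisperModel accepts in place of a local path, downloading the model from Hugging Face on first use.

```python
def choose_model(device: str, prefer: str = "speed", language: str = "zh") -> str:
    """Hypothetical policy mapping hardware and priority to a model size name.

    device: "cuda" or "cpu"; prefer: "speed" or "accuracy";
    language: expected speech language code.
    """
    if prefer not in ("speed", "accuracy"):
        raise ValueError(f"unknown preference: {prefer!r}")
    if device == "cuda":
        if prefer == "accuracy":
            return "large-v3"
        # The distilled model only handles English, so use it only for "en"
        return "distil-large-v3" if language == "en" else "medium"
    # On CPU, stay with smaller checkpoints to keep latency down
    return "medium" if prefer == "accuracy" else "small"


if __name__ == "__main__":
    print(choose_model("cuda", "speed", "en"))  # distil-large-v3
    print(choose_model("cpu"))                  # small
```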

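To download manually, open the model's Hugging Face repository, go to the "Files and versions" page, and put the key files (config.json, model.bin, tokenizer.json, vocabulary.json, and so on) into the same folder. For large-v3, for example, download those files into one directory and point the model path at it.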
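Before loading with `local_files_only=True`, it can help to confirm the folder is complete. A minimal sketch that checks for the files named above; `missing_model_files` is a hypothetical helper, and accepting vocabulary.txt as an alternative is an assumption, since some older converted models appear to ship that file instead. The large-v3 repository also seems to include a preprocessor_config.json, so copying every file listed on that page is the safest choice.

```python
import tempfile
from pathlib import Path

REQUIRED = ("config.json", "model.bin", "tokenizer.json")
# Assumption: older conversions may use vocabulary.txt instead of .json
VOCAB_CANDIDATES = ("vocabulary.json", "vocabulary.txt")


def missing_model_files(model_dir):
    """Return the names of expected model files absent from model_dir."""
    d = Path(model_dir)
    missing = [name for name in REQUIRED if not (d / name).is_file()]
    if not any((d / name).is_file() for name in VOCAB_CANDIDATES):
        missing.append("vocabulary.json")
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(missing_model_files(d))  # all four names: the folder is empty
```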

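Recording and Transcription Script

The core logic has two parts: capturing audio with pyaudio and saving it to a temporary WAV file, then transcribing that file with the Whisper model. Below is a complete example that includes GPU detection and VAD (voice activity detection) filtering.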

# -*- coding: utf-8 -*-
import os
import sys
import time
import wave
import tempfile
import threading
import torch
import pyaudio
from faster_whisper import WhisperModel

AUDIO_BUFFER = 5  # length of each recording slice (seconds)

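def record_audio(p, device):
    # Reserve a temporary file name and close the handle right away,
    # so wave.open() can reopen the file on every platform (including Windows)
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        filename = f.name

    channels = int(device["maxInputChannels"])
    rate = int(device["defaultSampleRate"])

    wave_file = wave.open(filename, "wb")
    wave_file.setnchannels(channels)
    wave_file.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wave_file.setframerate(rate)

    def callback(in_data, frame_count, time_info, status):
        # PyAudio calls this from its own thread for each captured buffer
        wave_file.writeframes(in_data)
        return (in_data, pyaudio.paContinue)

    stream = None
    try:
        # In callback mode the stream starts recording as soon as it is opened
        stream = p.open(format=pyaudio.paInt16,
                        channels=channels,
                        rate=rate,
                        frames_per_buffer=1024,
                        input=True,
                        input_device_index=device["index"],
                        stream_callback=callback)
        time.sleep(AUDIO_BUFFER)  # record for AUDIO_BUFFER seconds
    except Exception as e:
        print(f"Recording error: {e}")
    finally:
        if stream is not None:
            stream.stop_stream()
            stream.close()
        wave_file.close()
    return filename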

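def whisper_audio(filename, model):
    # Transcribe one audio slice, then delete its temporary file
    try:
        # vad_filter drops silent stretches before decoding
        segments, info = model.transcribe(
            filename, beam_size=5,
            language="zh",  # Chinese; pass None to auto-detect the language
            vad_filter=True,
            vad_parameters=dict(min_silence_duration_ms=500)
        )
        # segments is a generator: decoding actually runs while iterating
        for segment in segments:
            print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    except Exception as e:
        print(f"Transcription error: {e}")
    finally:
        # Always clean up the temporary WAV file
        if os.path.exists(filename):
            os.remove(filename)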

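def main():
    print("Initializing...")
    # Use the GPU when CUDA is available, otherwise fall back to the CPU
    if torch.cuda.is_available():
        device = "cuda"
        compute_type = "float16"  # half precision on the GPU
        print("CUDA GPU detected, using float16")
    else:
        device = "cpu"
        compute_type = "int8"  # int8 quantization keeps CPU inference fast
        print("No GPU detected, using CPU with int8")

    # Placeholder: replace with your local model directory from the download step
    model_path = "path/to/faster-whisper-large-v3"
    try:
        model = WhisperModel(model_path, device=device, compute_type=compute_type, local_files_only=True)
        print("Model loaded")
    except Exception as e:
        print(f"Failed to load model: {e}")
        sys.exit(1)

    p = pyaudio.PyAudio()
    try:
        default_mic = p.get_default_input_device_info()
        print(f"Input device: {default_mic['name']}")
        print(f"Slice length: {AUDIO_BUFFER} s")
        print("=" * 50)
        print("Listening... press Ctrl+C to stop")
        while True:
            # Record one slice, then transcribe it in the background
            # while the next slice is being recorded
            filename = record_audio(p, default_mic)
            thread = threading.Thread(target=whisper_audio, args=(filename, model))
            thread.start()
    except OSError:
        print("No microphone found; check your input device")
    except KeyboardInterrupt:
        print("Stopped")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        # Release PortAudio resources
        p.terminate()

if __name__ == "__main__":
    main()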
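The script starts a new thread for every slice. If transcribing a slice takes longer than AUDIO_BUFFER seconds (plausible on a CPU with a large model), threads accumulate and their printed output is not guaranteed to appear in recording order. A common alternative, not part of the original script, is a single worker thread fed by `queue.Queue`. In this sketch `start_worker` is a hypothetical helper, and `str.upper` stands in for Whisper so the demo runs without a microphone or model.

```python
import queue
import threading


def start_worker(transcribe, results):
    """Start one daemon thread that handles audio slices in arrival order.

    transcribe: any callable taking a WAV path (a stand-in in the demo).
    results: list that collects transcribe()'s return values.
    Put None on the returned queue to stop the worker.
    """
    jobs = queue.Queue()

    def worker():
        while True:
            filename = jobs.get()
            try:
                if filename is None:  # sentinel value: shut down
                    return
                results.append(transcribe(filename))
            finally:
                jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return jobs, thread


if __name__ == "__main__":
    out = []
    jobs, thread = start_worker(str.upper, out)
    for name in ["slice1.wav", "slice2.wav", "slice3.wav"]:
        jobs.put(name)
    jobs.put(None)
    thread.join()
    print(out)  # ['SLICE1.WAV', 'SLICE2.WAV', 'SLICE3.WAV']
```

In main(), the worker would be created once before the loop with something like `jobs, _ = start_worker(lambda f: whisper_audio(f, model), [])`, and the per-slice Thread replaced by `jobs.put(filename)`. The trade-off: output stays in order, but if transcription falls behind, the queue grows and latency rises rather than threads piling up.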
