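Faster-Whisper in Practice: Deploying Local Real-Time Speech Recognition

Introduction

There are two mainstream ways to add voice input: a cloud API (lightweight, high accuracy) or a local model (free, privacy-preserving, no network required). When you handle sensitive data or work offline, local deployment is the better choice. This article walks through the full process of real-time speech-to-text with Faster-Whisper, a framework built on Whisper and optimized for faster inference.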

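Installation

Install the core dependencies inside a virtual environment. pyaudio handles recording and faster-whisper handles model inference. The script in this article also imports torch, solely to detect whether CUDA is available, so install PyTorch separately if it is not already present.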

pip install faster-whisper pyaudio

For GPU acceleration, make sure CUDA and cuDNN are installed correctly.
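Before going further, it can help to confirm that the packages are actually importable in the active virtual environment. The following is a minimal sketch using only the standard library; the helper name missing_modules and the REQUIRED_MODULES list are illustrative and not part of any of these libraries.

```python
import importlib.util

# Import names (not pip names) of the packages this article relies on.
# "torch" is listed only because the script later in the article uses it to detect CUDA.
REQUIRED_MODULES = ["faster_whisper", "pyaudio", "torch"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

This only checks that the modules can be located; it does not verify that CUDA or cuDNN are set up correctly.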

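Preparing the Model

Faster-Whisper supports several model sizes; choose one that matches your hardware. If the server cannot reach the internet, download the model files manually into a directory of your choice.

Recommended models and their Hugging Face repositories: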

Tiny (smallest, fastest): Systran/faster-whisper-tiny
Base: Systran/faster-whisper-base
Small: Systran/faster-whisper-small
Medium: Systran/faster-whisper-medium
Large-v2: Systran/faster-whisper-large-v2
Large-v3 (best accuracy): Systran/faster-whisper-large-v3
Distil-Large-v3 (distilled, fast): Systran/faster-distil-whisper-large-v3

From the model's "Files and versions" page on Hugging Face, download the following key files and place them in the same folder:

config.json
model.bin
tokenizer.json
vocabulary.json
preprocessor_config.json
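A manual download is easy to get wrong, so a quick check of the folder before loading the model may save a confusing error later. The sketch below assumes the file names listed above; the helper name missing_model_files is illustrative, and the exact file set can differ between repositories, so adjust the list to match what the model page shows.

```python
from pathlib import Path

# File names taken from the list above; adjust if the model page differs.
REQUIRED_FILES = [
    "config.json",
    "model.bin",
    "tokenizer.json",
    "vocabulary.json",
    "preprocessor_config.json",
]


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]
```

Calling missing_model_files on the download folder returns an empty list when every listed file is present.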

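Implementation

Below is the complete script for real-time recording and transcription. It uses threading to run transcription separately from recording, so the recording loop in the main thread is never blocked.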

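# -*- coding: utf-8 -*-
import os
import sys
import time
import wave
import tempfile
import threading
import torch
import pyaudio
from faster_whisper import WhisperModel

# Length of each recorded chunk in seconds (assumed value; tune as needed)
AUDIO_BUFFER = 5

def record_audio(p, device):
    """Record AUDIO_BUFFER seconds from `device` into a temporary WAV file."""
    # Reserve a file name, then close the handle so wave can reopen it on any OS
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        filename = f.name
    wave_file = wave.open(filename, "wb")
    wave_file.setnchannels(int(device["maxInputChannels"]))
    wave_file.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wave_file.setframerate(int(device["defaultSampleRate"]))

    def callback(in_data, frame_count, time_info, status):
        """Append each captured buffer to the WAV file."""
        wave_file.writeframes(in_data)
        return (in_data, pyaudio.paContinue)

    stream = None
    try:
        stream = p.open(format=pyaudio.paInt16,
                        channels=int(device["maxInputChannels"]),
                        rate=int(device["defaultSampleRate"]),
                        frames_per_buffer=1024,
                        input=True,
                        input_device_index=device["index"],
                        stream_callback=callback)
        time.sleep(AUDIO_BUFFER)  # the callback keeps writing while we wait
    except Exception as e:
        print(f"Recording failed: {e}")
    finally:
        if stream is not None:
            stream.stop_stream()
            stream.close()
        wave_file.close()
    return filename

def whisper_audio(filename, model):
    """Transcribe one recorded chunk, then delete the temporary file."""
    try:
        # Parameter values below are assumed defaults; adjust for your use case
        segments, info = model.transcribe(
            filename,
            beam_size=5,
            language="zh",  # set to your spoken language, or None to auto-detect
            vad_filter=True,
            vad_parameters=dict(min_silence_duration_ms=1000)
        )
        for segment in segments:
            print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    except Exception as e:
        print(f"Transcription failed: {e}")
    finally:
        # Always clean up the temporary WAV file
        if os.path.exists(filename):
            os.remove(filename)

def main():
    print("Loading model...")
    # Pick the device and a matching compute type (assumed: float16 on GPU, int8 on CPU)
    if torch.cuda.is_available():
        device = "cuda"
        compute_type = "float16"
        print("Using GPU")
    else:
        device = "cpu"
        compute_type = "int8"
        print("Using CPU")

    # Placeholder: replace with the folder that holds the downloaded model files
    model_path = "path/to/model"
    try:
        model = WhisperModel(model_path, device=device, compute_type=compute_type, local_files_only=True)
        print("Model loaded")
    except Exception as e:
        print(f"Failed to load model: {e}")
        sys.exit(1)

    # Standard PyAudio is not a context manager, so terminate it explicitly
    p = pyaudio.PyAudio()
    try:
        default_mic = p.get_default_input_device_info()
        print(f"Input device: {default_mic['name']}")
        print(f"Sample rate: {int(default_mic['defaultSampleRate'])} Hz")
        print("=" * 50)
        print("Start speaking (Ctrl+C to stop)")
        while True:
            filename = record_audio(p, default_mic)
            thread = threading.Thread(target=whisper_audio, args=(filename, model))
            thread.start()
    except OSError:
        print("No default input device found")
    except KeyboardInterrupt:
        print("Stopped by user")
    except Exception as e:
        print(f"Unexpected error: {e}")
    finally:
        p.terminate()

if __name__ == "__main__":
    main()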
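The script starts a new thread for every recorded chunk. If transcription runs slower than recording, threads can pile up and results may print out of order. One possible alternative, shown here only as a sketch and not as the approach the script takes, is a single worker thread fed through a queue. The names transcription_worker and run_pipeline are hypothetical, and the transcribe argument stands in for a call such as whisper_audio.

```python
import queue
import threading


def transcription_worker(jobs, transcribe, results):
    """Consume file names from `jobs` in order until a None sentinel arrives."""
    while True:
        filename = jobs.get()
        if filename is None:
            break
        results.append(transcribe(filename))


def run_pipeline(filenames, transcribe):
    """Feed `filenames` through one worker thread and return results in order."""
    jobs = queue.Queue()
    results = []
    worker = threading.Thread(
        target=transcription_worker, args=(jobs, transcribe, results)
    )
    worker.start()
    for name in filenames:  # in a real script this would be the recording loop
        jobs.put(name)
    jobs.put(None)  # sentinel: no more chunks will arrive
    worker.join()
    return results


if __name__ == "__main__":
    # Stand-in for the model call, so the sketch runs without a microphone or model
    def fake_transcribe(name):
        return "text for " + name

    print(run_pipeline(["chunk1.wav", "chunk2.wav"], fake_transcribe))
```

Because a single worker drains the queue, chunks are transcribed in the order they were recorded, at the cost of growing latency if the model cannot keep up.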

Troubleshooting

1. cuDNN version conflicts

2. Missing DLLs

3. VAD filter errors

Summary
