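Local Real-Time Speech-to-Text with Faster-Whisper

Introduction

There are two mainstream ways to add voice input: a cloud API (lightweight on the client and typically very accurate) or a local model (free, private, and usable offline). This article documents how to deploy Faster-Whisper for real-time speech-to-text from microphone input.

Faster-Whisper project page (GitHub): https://github.com/SYSTRAN/faster-whisper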

I. Setting Up the Environment

Install faster-whisper in your virtual environment:

pip install faster-whisper

Install the audio recording library:

pip install pyaudiowpatch
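
Before moving on, it can help to confirm that the packages are importable from the active virtual environment. The following is a minimal sketch that uses only the standard library. The helper name `find_missing` is hypothetical, and `torch` is included on the assumption that you will run the script in section II, which imports it.

```python
import importlib.util


def find_missing(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names used by the script in this article.
    required = ["faster_whisper", "pyaudiowpatch", "torch"]
    missing = find_missing(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as missing, check that the virtual environment is activated before rerunning pip.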

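II. Usage

1. Download the Model

Manual download (for offline use): if your server has no internet access, or you want to keep the model in a specific folder, you can download it manually. Pick the model that fits your needs from the Hugging Face repositories below: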

Tiny (smallest/fastest): Systran/faster-whisper-tiny
Base: Systran/faster-whisper-base
Small: Systran/faster-whisper-small
Medium: Systran/faster-whisper-medium
Large-v2: Systran/faster-whisper-large-v2
Large-v3 (best accuracy): Systran/faster-whisper-large-v3
Distil-Large-v3 (distilled/fast): Systran/faster-distil-whisper-large-v3

On the model's 'Files and versions' page on Hugging Face, download the following key files and put them in the same folder:

config.json
model.bin
tokenizer.json
vocabulary.json
preprocessor_config.json
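
If the machine you download on has Python available, the manual download could also be scripted. The sketch below assumes the `huggingface_hub` package is installed (this article does not install it explicitly) and uses its `snapshot_download` function. The names `REPO_IDS`, `MODEL_FILES`, and `download_model` are illustrative, and the `vocabulary.*` pattern is a loosened match chosen in case a repository names that file with a different extension.

```python
# Repository IDs as listed in this article.
REPO_IDS = {
    "tiny": "Systran/faster-whisper-tiny",
    "base": "Systran/faster-whisper-base",
    "small": "Systran/faster-whisper-small",
    "medium": "Systran/faster-whisper-medium",
    "large-v2": "Systran/faster-whisper-large-v2",
    "large-v3": "Systran/faster-whisper-large-v3",
    "distil-large-v3": "Systran/faster-distil-whisper-large-v3",
}

# Key files to fetch; "vocabulary.*" is an assumption (see the note above).
MODEL_FILES = [
    "config.json",
    "model.bin",
    "tokenizer.json",
    "vocabulary.*",
    "preprocessor_config.json",
]


def download_model(size, target_dir):
    """Download the key model files for `size` into `target_dir`."""
    # Imported here so the rest of the file works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_IDS[size],
        local_dir=target_dir,
        allow_patterns=MODEL_FILES,
    )
```

For example, `download_model("small", "./models/small")` would fetch the Small model into a hypothetical `./models/small` folder, which can then be copied to the offline server.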

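Keep all of the downloaded model files together in one folder; the script in the next step loads the model from that folder's path.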
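
A quick way to catch an incomplete download is to check the folder for the expected files before loading the model. This is a small sketch using only the standard library. The function name `missing_model_files` is hypothetical, and the file list simply mirrors the one given in this article, so adjust it if your chosen repository ships different file names.

```python
from pathlib import Path

# File names as listed in this article.
REQUIRED_FILES = (
    "config.json",
    "model.bin",
    "tokenizer.json",
    "vocabulary.json",
    "preprocessor_config.json",
)


def missing_model_files(folder, required=REQUIRED_FILES):
    """Return the required file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    # "path/to/model" is a placeholder; point it at your model folder.
    missing = missing_model_files("path/to/model")
    print("Missing files:", missing if missing else "none")
```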

2. Real-Time Recording and Transcription Script

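The code is as follows. Besides the two packages installed above, it imports torch, which it uses only to check whether CUDA is available, so torch must be installed as well.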


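import os
import time
import wave
import tempfile
import threading
import torch
import pyaudiowpatch as pyaudio
from faster_whisper import WhisperModel

# Length of each recorded chunk in seconds (example value; adjust as needed)
AUDIO_BUFFER = 5

def record_audio(p, device):
    """Record AUDIO_BUFFER seconds from the device into a temporary WAV file."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        filename = f.name
        wave_file = wave.open(filename, "wb")
        wave_file.setnchannels(int(device["maxInputChannels"]))
        wave_file.setsampwidth(p.get_sample_size(pyaudio.paInt16))
        wave_file.setframerate(int(device["defaultSampleRate"]))

        def callback(in_data, frame_count, time_info, status):
            """Append each incoming audio buffer to the WAV file."""
            wave_file.writeframes(in_data)
            return (in_data, pyaudio.paContinue)

        stream = None
        try:
            # frames_per_buffer=1024 is an example value
            stream = p.open(format=pyaudio.paInt16, channels=int(device["maxInputChannels"]), rate=int(device["defaultSampleRate"]), frames_per_buffer=1024, input=True, input_device_index=device["index"], stream_callback=callback,)
            time.sleep(AUDIO_BUFFER)  # The callback keeps writing while we wait
        except Exception as e:
            print(f"Recording failed: {e}")
        finally:
            if stream is not None:
                stream.stop_stream()
                stream.close()
            wave_file.close()
        return filename

def whisper_audio(filename, model):
    """Transcribe one recorded chunk, print the result, and delete the file."""
    try:
        # beam_size, language, and the VAD settings are example values;
        # set language to the language being spoken ("zh" is Chinese)
        segments, info = model.transcribe(filename, beam_size=5, language="zh", vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500))
        for segment in segments:
            print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    except Exception as e:
        print(f"Transcription failed: {e}")
    finally:
        # Always clean up the temporary WAV file
        if os.path.exists(filename):
            os.remove(filename)

def main():
    print("Loading the Faster-Whisper model...")
    # Pick the device; the compute types shown are common choices, not requirements
    if torch.cuda.is_available():
        device = "cuda"
        compute_type = "float16"  # Half precision on GPU
        print("Using GPU (CUDA)")
    else:
        device = "cpu"
        compute_type = "int8"  # Quantized inference on CPU
        print("Using CPU")

    # Placeholder: set this to the folder that holds the files from step 1
    model_path = "path/to/your/model/folder"
    try:
        model = WhisperModel(model_path, device=device, compute_type=compute_type, local_files_only=True)
        print("Model loaded.")
    except Exception as e:
        print(f"Failed to load the model: {e}")
        return

    with pyaudio.PyAudio() as p:
        try:
            default_mic = p.get_default_input_device_info()
            print(f"Using microphone: {default_mic['name']}")
            print(f"Recording in {AUDIO_BUFFER}-second chunks")
            print("-" * 50)
            print("Listening... press Ctrl+C to stop.")
            while True:
                # Record one chunk, then transcribe it in a background thread
                filename = record_audio(p, default_mic)
                thread = threading.Thread(target=whisper_audio, args=(filename, model))
                thread.start()
        except OSError:
            print("No default input device found.")
        except KeyboardInterrupt:
            print("Stopped by user.")
        except Exception as e:
            print(f"Unexpected error: {e}")

if __name__ == "__main__":
    main()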
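
The main loop above starts a new thread for every recorded chunk. If transcription ever takes longer than recording, those threads can overlap and print their results out of order. One possible alternative, sketched here with only the standard library, is a single worker thread fed by a queue. `transcribe_fn` is a stand-in for a call such as `whisper_audio(filename, model)`, and the names `start_worker` and `STOP` are hypothetical.

```python
import queue
import threading

STOP = object()  # Sentinel that tells the worker to exit


def start_worker(transcribe_fn):
    """Start one background thread that processes queued filenames in order."""
    jobs = queue.Queue()

    def worker():
        while True:
            filename = jobs.get()
            if filename is STOP:
                break
            try:
                transcribe_fn(filename)
            except Exception as e:
                # Keep the worker alive if a single chunk fails
                print(f"Transcription failed for {filename}: {e}")

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return jobs, thread


if __name__ == "__main__":
    # Demo with a stand-in function instead of a real model
    jobs, thread = start_worker(lambda name: print("would transcribe", name))
    for name in ["chunk1.wav", "chunk2.wav"]:
        jobs.put(name)
    jobs.put(STOP)
    thread.join()
```

With this approach, the recording loop would call `jobs.put(filename)` instead of creating a thread per chunk, at the cost of output lagging behind if the model is slower than real time.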
