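Deploying a Whisper Service Locally with Docker Compose on Ubuntu 24.04

Project Background

Whisper is an automatic speech recognition system that OpenAI open-sourced in 2022. Its core strength is robustness: it keeps recognition accuracy high even with accents, background noise, or technical jargon, and on English speech recognition it approaches human-level accuracy.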

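Technical Principles and Model Selection

Whisper's capabilities stem from its design:

End-to-end Transformer architecture: input audio is split into 30-second segments and converted to log-Mel spectrograms; the encoder extracts features and the decoder predicts text.
Large-scale multitask training: the model is trained on 680,000 hours of multilingual data and supports nearly one hundred languages, along with tasks such as transcription, translation, and timestamp generation.
Unified format: special tokens specify the task, so a single model replaces the multiple stages of a traditional pipeline.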
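
The 30-second segmentation in the first point can be made concrete with a little arithmetic. The sketch below uses the constants published in the openai-whisper package (16 kHz sample rate, hop length of 160 samples, 80 Mel bins by default). The helper names `chunk_bounds` and `spectrogram_shape` are hypothetical, and the fixed back-to-back windows are a simplification: the library's own transcription loop advances its window based on decoded timestamps.

```python
# Constants as published in the openai-whisper package (whisper/audio.py).
SAMPLE_RATE = 16_000   # Hz; all input audio is resampled to this rate
CHUNK_SECONDS = 30     # length of one model input window
HOP_LENGTH = 160       # samples between successive spectrogram frames
N_MELS = 80            # Mel bins (the default for most checkpoints)

SAMPLES_PER_CHUNK = SAMPLE_RATE * CHUNK_SECONDS    # samples in one window
FRAMES_PER_CHUNK = SAMPLES_PER_CHUNK // HOP_LENGTH  # spectrogram frames in one window


def chunk_bounds(num_samples: int) -> list[tuple[int, int]]:
    """Return (start, end) sample offsets of consecutive 30-second windows.

    The final window may be shorter than 30 seconds; Whisper pads it with zeros.
    """
    return [
        (start, min(start + SAMPLES_PER_CHUNK, num_samples))
        for start in range(0, num_samples, SAMPLES_PER_CHUNK)
    ]


def spectrogram_shape() -> tuple[int, int]:
    """Shape of the log-Mel spectrogram fed to the encoder for one window."""
    return (N_MELS, FRAMES_PER_CHUNK)


if __name__ == "__main__":
    seventy_seconds = 70 * SAMPLE_RATE
    print(chunk_bounds(seventy_seconds))
    print(spectrogram_shape())
```

For a 70-second recording this yields two full windows and one shorter final window, each of which becomes an 80 x 3000 spectrogram after padding.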

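Depending on resource and accuracy requirements, the following model sizes are available:

Model	Parameters	Disk space	Use case
tiny	~39 M	~75 MB	Quick demos, severely resource-constrained environments
base	~74 M	~140 MB	Balance of speed and basic accuracy
small	~244 M	~480 MB	Good trade-off between accuracy and speed
medium	~769 M	~1.5 GB	Higher accuracy
large	~1550 M	~3 GB	Highest accuracy, supports all tasks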
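
As one way to apply the table, the sketch below picks the largest model that fits a given disk budget. The function name `largest_model_within` is hypothetical, and the sizes are the rounded figures from the table (with 1.5 GB and 3 GB taken as 1500 MB and 3000 MB), not exact checkpoint sizes. It ignores RAM and GPU memory, which in practice matter at least as much.

```python
from typing import Optional

# (name, parameters in millions, approximate disk size in MB), smallest first.
# Values are the rounded figures from the table above.
MODEL_SPECS = [
    ("tiny", 39, 75),
    ("base", 74, 140),
    ("small", 244, 480),
    ("medium", 769, 1500),
    ("large", 1550, 3000),
]


def largest_model_within(disk_budget_mb: float) -> Optional[str]:
    """Return the largest model whose checkpoint fits the budget, or None."""
    chosen = None
    for name, _params_m, size_mb in MODEL_SPECS:
        if size_mb <= disk_budget_mb:
            chosen = name  # list is ordered, so later matches are larger
    return chosen


if __name__ == "__main__":
    print(largest_model_within(2000))  # prints "medium"
```

The returned name can be passed directly to `whisper.load_model`.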

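Deployment Plan

We provide two forms of the service: a RESTful API built on FastAPI and a web interface built on Gradio. Both are orchestrated with Docker Compose, so they can be started with a single command on Ubuntu 24.04.

1. FastAPI Service Implementation

FastAPI is a good fit for integrating into existing business systems. The code adds audio preprocessing (resampling to 16 kHz, a 100 Hz high-pass filter to cut low-frequency noise, and peak normalization) to improve recognition quality.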

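# whisper_fastapi.py
# Values marked "assumed" were missing from the published listing and are
# placeholders; the remaining parameters mirror whisper_gradio.py.
from fastapi import FastAPI, File, UploadFile, HTTPException
from fastapi.responses import JSONResponse
import whisper
import tempfile
import os
import numpy as np
from scipy import signal
import librosa
import uvicorn
import soundfile as sf

app = FastAPI(
    title="Whisper Transcription API",  # assumed
    description="Transcribe uploaded audio files with Whisper",  # assumed
    version="1.0.0",  # assumed
)

model = None  # loaded lazily and shared across requests


def load_whisper_model():
    """Load the Whisper model once and cache it in the module-level global."""
    global model
    if model is None:
        model = whisper.load_model("medium")
    return model


def preprocess_audio(audio_path):
    """Resample to 16 kHz, high-pass filter at 100 Hz, and peak-normalize."""
    try:
        y, sr = librosa.load(audio_path, sr=16000)
        b, a = signal.butter(4, 100, "highpass", fs=sr)
        y = signal.filtfilt(b, a, y)
        peak = np.max(np.abs(y))
        if peak > 0:  # avoid dividing by zero on silent input
            y = y / peak
        fd, temp_path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
        sf.write(temp_path, y, sr)
        return temp_path
    except Exception as e:
        # Fall back to the unprocessed file if preprocessing fails
        print(f"Audio preprocessing failed: {e}")
        return audio_path


@app.on_event("startup")
async def startup_event():
    print("Loading Whisper model...")
    load_whisper_model()
    print("Whisper model loaded.")


@app.post("/transcribe")  # route path assumed
async def transcribe_audio(file: UploadFile = File(...)):
    # The last two extensions are assumed
    valid_extensions = {".mp3", ".wav", ".ogg", ".m4a", ".flac", ".aac", ".wma"}
    file_extension = os.path.splitext(file.filename)[1].lower()
    if file_extension not in valid_extensions:
        raise HTTPException(status_code=400, detail="Unsupported file format")

    temp_path = None
    processed_audio = None
    try:
        # Persist the upload so librosa and Whisper can read it from disk
        with tempfile.NamedTemporaryFile(delete=False, suffix=file_extension) as temp_file:
            content = await file.read()
            temp_file.write(content)
            temp_path = temp_file.name

        processed_audio = preprocess_audio(temp_path)
        model = load_whisper_model()

        result = model.transcribe(
            processed_audio,
            language="zh",
            task="transcribe",
            beam_size=5,
            best_of=5,
            temperature=0.0,
            patience=1.0,
            suppress_tokens=[-1]
        )
        # Response keys are assumed
        return JSONResponse(content={
            "status": "success",
            "text": result["text"],
            "language": result.get("language", "zh"),
            "filename": file.filename,
        })
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Transcription failed: {e}")
    finally:
        # Remove both the raw upload and the preprocessed copy
        if temp_path and os.path.exists(temp_path):
            os.unlink(temp_path)
        if processed_audio and processed_audio != temp_path and os.path.exists(processed_audio):
            os.unlink(processed_audio)


@app.get("/health")  # route path assumed
async def health_check():
    return JSONResponse(content={"status": "healthy", "model_loaded": model is not None})


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)  # host and port assumed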
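
To try the API from another program, a client only needs to send the audio as a multipart form upload. The sketch below uses only the Python standard library. The form field name `file` matches the handler's `file` parameter, but the URL (`http://localhost:8000/transcribe`) is an assumption, as are the helper names `build_multipart` and `transcribe_file`; adjust them to whatever route and port the service actually exposes.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str, url: str = "http://localhost:8000/transcribe") -> dict:
    """Upload an audio file and return the decoded JSON response.

    The default URL is an assumption, not a documented endpoint.
    """
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # Transcription with larger models can be slow, hence the generous timeout
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)
```

With the service running, `print(transcribe_file("sample.wav"))` would print the JSON returned by the server.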

2. Gradio Interface Implementation

# whisper_gradio.py
import gradio as gr
import whisper
import tempfile
import os
import numpy as np
from scipy import signal
import librosa
import soundfile as sf

model = None  # loaded lazily on first use

USAGE_NOTES = """### Instructions
1. Click 'Upload audio file' or drag and drop a file
2. Supported formats: MP3, WAV, OGG, M4A, FLAC, and others
3. The model is downloaded on first use, so please be patient
"""


def load_whisper_model():
    """Load the Whisper model once and cache it in the module-level global."""
    global model
    if model is None:
        model = whisper.load_model("medium")
    return model


def preprocess_audio(audio_path):
    """Resample to 16 kHz, high-pass filter at 100 Hz, and peak-normalize."""
    try:
        y, sr = librosa.load(audio_path, sr=16000)
        b, a = signal.butter(4, 100, "highpass", fs=sr)
        y = signal.filtfilt(b, a, y)
        peak = np.max(np.abs(y))
        if peak > 0:  # avoid dividing by zero on silent input
            y = y / peak
        fd, temp_path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
        # librosa.output.write_wav was removed in librosa 0.8; use soundfile
        sf.write(temp_path, y, sr)
        return temp_path
    except Exception as e:
        # Fall back to the unprocessed file if preprocessing fails
        print(f"Audio preprocessing failed: {e}")
        return audio_path


def transcribe_audio(audio_file):
    if audio_file is None:
        return "Error: please upload an audio file."
    try:
        model = load_whisper_model()
        processed_audio = preprocess_audio(audio_file)
        result = model.transcribe(
            processed_audio,
            language="zh",
            task="transcribe",
            beam_size=5,
            best_of=5,
            temperature=0.0,
            patience=1.0,
            suppress_tokens=[-1]
        )
        # Remove the preprocessed copy, but never the user's upload
        if processed_audio != audio_file:
            try:
                os.unlink(processed_audio)
            except OSError:
                pass
        return result["text"]
    except Exception as e:
        return f"An error occurred during transcription: {e}"


with gr.Blocks(title="Whisper Audio Transcription") as demo:
    gr.Markdown("# 🎤 Whisper Audio Transcription")
    gr.Markdown("Upload an MP3, WAV, OGG, or other audio file and convert it to text with tuned parameters")
    with gr.Row():
        with gr.Column():
            audio_input = gr.Audio(sources=["upload"], type="filepath", label="Upload audio file", interactive=True)
            submit_btn = gr.Button("Start transcription", variant="primary")
        with gr.Column():
            text_output = gr.Textbox(label="Transcription result", placeholder="The transcribed text will appear here...", lines=10, max_lines=15)
    submit_btn.click(fn=transcribe_audio, inputs=audio_input, outputs=text_output)
    gr.Markdown(USAGE_NOTES)

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7862, share=False)
