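Integrating the Local TTS Model sherpa-onnx into Java for Offline Text-to-Speech
Choosing a Suitable TTS Model
In a project I was recently responsible for, I needed to implement real-time text-to-speech. I started with Alibaba Cloud's Intelligent Speech Synthesis service. Its API is simple and easy to use, and I had it running shortly after integrating it.
As the requirements evolved, however, management asked that the feature also work without a network connection. Some deployment sites cannot reach the public internet, and a cloud TTS service cannot meet that requirement.
So I started looking into TTS models that can be deployed offline. With help from an AI assistant, I tested several models that run locally, including: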
- Piper
- Baidu PaddlePaddle PaddleSpeech
- Sherpa-ONNX TTS
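Here is how the evaluation went:
- Piper: I could not get it to run after several attempts; there were many compatibility problems
- PaddleSpeech: large models, complex deployment, and a poor fit with the Java ecosystem
- Sherpa-ONNX: lightweight models, good performance, simple deployment, and above all a complete Java API, which makes it very friendly to Java developers
After weighing these, Sherpa-ONNX was the best choice for my case. In a Java project in particular, it needs no separate Python service and no mixed-language deployment; you call it directly from Java to get high-quality offline TTS.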
Hands-On Tutorial
Official documentation:
https://k2-fsa.github.io/sherpa/onnx/java-api/non-android-java.html
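To get the Sherpa-ONNX Java TTS feature running, you need to download two core JARs:
- The pure-Java JAR (cross-platform)
- The JAR containing the underlying C++ JNI library (platform-specific, e.g. win-x64, linux-x64)
I used version v1.12.10; you can pick a newer release if you need one.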
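Because the native JAR is platform-specific, it can help to print which file the current machine needs. The following is a minimal sketch, not part of the original project: the class name `NativeJarPicker` is hypothetical, the file-name pattern is taken from the `systemPath` entries in the pom file later in this post, and only the two platforms mentioned here (win-x64 and linux-x64) are mapped.

```java
import java.util.Locale;

// Hypothetical helper: maps the running JVM's OS/arch to the native JAR name.
class NativeJarPicker {
    // Version used in this post; adjust if you download a newer release.
    static final String VERSION = "1.12.10";

    /** Maps os.name / os.arch values to the platform tag used in the JAR file name. */
    static String platformTag(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean x64 = arch.equals("amd64") || arch.equals("x86_64");
        if (os.startsWith("windows") && x64) {
            return "win-x64";
        }
        if (os.startsWith("linux") && x64) {
            return "linux-x64";
        }
        // Other platforms are outside the scope of this sketch.
        throw new IllegalStateException(
                "No mapping for " + osName + "/" + osArch
                        + "; check the release page for a matching native JAR");
    }

    /** Builds the native JAR file name following the pattern used in the pom file. */
    static String nativeJarName(String osName, String osArch) {
        return "sherpa-onnx-native-lib-" + platformTag(osName, osArch)
                + "-v" + VERSION + ".jar";
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        String osArch = System.getProperty("os.arch");
        try {
            System.out.println("Expected native JAR: " + nativeJarName(osName, osArch));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running it on the deployment machine before packaging is a quick way to confirm that the right native JAR is in the `lib` directory.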
Once you have the JARs, you can follow the official example code:
https://github.com/k2-fsa/sherpa-onnx/blob/master/java-api-examples/NonStreamingTtsKokoroZhEn.java
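One point needs special attention:
The official example is missing one required setting:
.setDictDir(dictDir)
Because the model relies on dictionary files, you have to add this call yourself; otherwise the model fails to load.
The pre-trained model I chose is: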
kokoro-multi-lang-v1_0
For the model description and download links, see:
https://k2-fsa.github.io/sherpa/onnx/tts/all/Chinese-English/kokoro-multi-lang-v1_0.html
Once the pre-trained model is downloaded, the preparation is complete. You can now run the example code to perform offline text-to-speech.
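Since a missing file or directory (such as the `dict` folder mentioned above) leads to a load failure, a small pre-flight check can make the cause obvious before the TTS engine is created. This is only a sketch under assumptions: the class name `ModelDirCheck` is hypothetical, and the list of entries is copied from the paths used in the `TtsConfig` class in this post rather than from any official manifest.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which expected model entries are absent.
class ModelDirCheck {
    // Entries referenced by TtsConfig in this post, relative to the model directory.
    static final String[] REQUIRED = {
        "model.onnx", "voices.bin", "tokens.txt",
        "espeak-ng-data", "dict",
        "lexicon-us-en.txt", "lexicon-zh.txt"
    };

    /** Returns the names of required files/directories that do not exist under modelDir. */
    static List<String> missingEntries(Path modelDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.exists(modelDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Default path matches the one used in TtsConfig; pass another as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "./kokoro-multi-lang-v1_0");
        List<String> missing = missingEntries(dir);
        if (missing.isEmpty()) {
            System.out.println("All expected model files found in " + dir);
        } else {
            System.out.println("Missing from " + dir + ": " + missing);
        }
    }
}
```

The same `missingEntries` call could be placed at the top of the bean method that builds the engine, failing fast with a readable message instead of a native load error.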
Core Code
Project Structure

Below is the core code I adapted from the official example:
AudioProcessingUtil
package com.example.demo;

import com.k2fsa.sherpa.onnx.GeneratedAudio;
import com.k2fsa.sherpa.onnx.OfflineTts;

import javax.sound.sampled.*;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class AudioProcessingUtil {

    private static final String OUTPUT_DIR = "./tts-output/";

    static {
        // Make sure the output directory exists
        File dir = new File(OUTPUT_DIR);
        if (!dir.exists()) {
            dir.mkdirs();
        }
    }

    /**
     * Generates TTS audio and converts its format.
     *
     * @param tts       TTS engine instance
     * @param text      text to convert
     * @param speakerId speaker ID
     * @param speed     speaking rate
     * @return path of the converted audio file
     * @throws Exception any error raised during processing
     */
    public static String generateAndConvertAudio(OfflineTts tts, String text, int speakerId, float speed) throws Exception {
        // Use the MD5 of the text as the file name
        String textMd5 = md5(text);

        // Generate the audio
        long start = System.currentTimeMillis();
        GeneratedAudio audio = tts.generate(text, speakerId, speed);
        long stop = System.currentTimeMillis();
        System.out.printf("-- elapsed : %.3f seconds\n", (stop - start) / 1000.0f);

        // Save as a temporary WAV file
        String tempWaveFilename = OUTPUT_DIR + textMd5 + ".wav";
        audio.save(tempWaveFilename);

        // Convert the audio to 48 kHz stereo
        File convertedFile = convertAudioFormat(new File(tempWaveFilename), textMd5);

        // Delete the temporary file
        new File(tempWaveFilename).delete();

        if (convertedFile == null) {
            throw new RuntimeException("Audio conversion failed");
        }
        return convertedFile.getAbsolutePath();
    }

    /**
     * Converts the audio to 48 kHz stereo.
     *
     * @param sourceFile source file
     * @param textMd5    MD5 of the text (used to build the output file name)
     * @return the converted file, or null if the conversion failed
     */
    private static File convertAudioFormat(File sourceFile, String textMd5) {
        AudioInputStream sourceStream = null;
        AudioInputStream convertedStream = null;
        File outputFile = null;
        try {
            // Read the source audio file
            sourceStream = AudioSystem.getAudioInputStream(sourceFile);
            AudioFormat sourceFormat = sourceStream.getFormat();
            System.out.println(String.format("Source format - sample rate: %sHz, channels: %s, bit depth: %sbit",
                    sourceFormat.getSampleRate(), sourceFormat.getChannels(), sourceFormat.getSampleSizeInBits()));

            // Target format: 48000 Hz, stereo (2), 16 bit, signed, little-endian
            AudioFormat targetFormat = new AudioFormat(
                    AudioFormat.Encoding.PCM_SIGNED,
                    48000.0F, // sample rate: 48 kHz
                    16,       // bit depth: 16 bit
                    2,        // channels: 2 (stereo)
                    4,        // frame size: 2 channels * 2 bytes (16 bit) = 4 bytes
                    48000.0F, // frame rate equals the sample rate
                    false     // little-endian
            );

            // Check whether a direct conversion is supported
            if (!AudioSystem.isConversionSupported(targetFormat, sourceFormat)) {
                System.out.println("Direct conversion not supported, trying a two-step conversion");
                // First convert the sample rate and channel count
                AudioFormat intermediateFormat = new AudioFormat(
                        AudioFormat.Encoding.PCM_SIGNED,
                        48000.0F,
                        sourceFormat.getSampleSizeInBits(),
                        2, // convert to stereo first
                        sourceFormat.getSampleSizeInBits() / 8 * 2,
                        48000.0F,
                        sourceFormat.isBigEndian());
                convertedStream = AudioSystem.getAudioInputStream(intermediateFormat, sourceStream);
                // Then convert the remaining properties
                if (!AudioSystem.isConversionSupported(targetFormat, intermediateFormat)) {
                    System.err.println("Unable to convert the audio format");
                    return null;
                }
                convertedStream = AudioSystem.getAudioInputStream(targetFormat, convertedStream);
            } else {
                // Direct conversion
                convertedStream = AudioSystem.getAudioInputStream(targetFormat, sourceStream);
            }

            // Build the output file
            outputFile = new File(OUTPUT_DIR + textMd5 + "_48k.wav");

            // Write the converted audio
            AudioSystem.write(convertedStream, AudioFileFormat.Type.WAVE, outputFile);
            System.out.println(String.format("Audio converted - sample rate: 48000Hz, channels: 2, file: %s",
                    outputFile.getName()));
            return outputFile;
        } catch (UnsupportedAudioFileException e) {
            System.err.println("Unsupported audio file format: " + e.getMessage());
            return null;
        } catch (IOException e) {
            System.err.println("Failed to read or write the audio file: " + e.getMessage());
            return null;
        } catch (Exception e) {
            System.err.println("Audio format conversion failed: " + e.getMessage());
            return null;
        } finally {
            // Close the streams
            try {
                if (convertedStream != null) convertedStream.close();
                if (sourceStream != null) sourceStream.close();
            } catch (IOException e) {
                System.err.println("Failed to close the audio stream: " + e.getMessage());
            }
        }
    }

    /**
     * Computes the MD5 of a string as lowercase hex.
     */
    private static String md5(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Use an explicit charset so the file name does not depend on the platform default
            byte[] messageDigest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hexString = new StringBuilder();
            for (byte b : messageDigest) {
                String hex = Integer.toHexString(0xff & b);
                if (hex.length() == 1) {
                    hexString.append('0');
                }
                hexString.append(hex);
            }
            return hexString.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }
}
DemoApplication
package com.example.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.web.servlet.config.annotation.ResourceHandlerRegistry;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

    @Bean
    public WebMvcConfigurer webMvcConfigurer() {
        return new WebMvcConfigurer() {
            @Override
            public void addResourceHandlers(ResourceHandlerRegistry registry) {
                // Make sure static resources can be served
                registry.addResourceHandler("/**")
                        .addResourceLocations("classpath:/static/");
            }
        };
    }
}
TtsConfig
package com.example.demo;

import com.k2fsa.sherpa.onnx.*;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TtsConfig {

    @Bean
    public OfflineTts offlineTts() {
        String model = "./kokoro-multi-lang-v1_0/model.onnx";
        String voices = "./kokoro-multi-lang-v1_0/voices.bin";
        String tokens = "./kokoro-multi-lang-v1_0/tokens.txt";
        String dataDir = "./kokoro-multi-lang-v1_0/espeak-ng-data";
        String dictDir = "./kokoro-multi-lang-v1_0/dict";
        String lexicon = "./kokoro-multi-lang-v1_0/lexicon-us-en.txt,./kokoro-multi-lang-v1_0/lexicon-zh.txt";

        OfflineTtsKokoroModelConfig kokoroModelConfig = OfflineTtsKokoroModelConfig.builder()
                .setModel(model)
                .setVoices(voices)
                .setTokens(tokens)
                .setDataDir(dataDir)
                .setLexicon(lexicon)
                .setDictDir(dictDir)
                .build();

        OfflineTtsModelConfig modelConfig = OfflineTtsModelConfig.builder()
                .setKokoro(kokoroModelConfig)
                .setNumThreads(2)
                .setDebug(true)
                .build();

        OfflineTtsConfig config = OfflineTtsConfig.builder()
                .setModel(modelConfig)
                .build();

        return new OfflineTts(config);
    }
}
TtsController
package com.example.demo;

import com.k2fsa.sherpa.onnx.*;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/tts")
public class TtsController {

    @Autowired
    private OfflineTts tts;

    @PostMapping("/generate")
    public ResponseEntity<TtsResponse> generateTts(@RequestBody TtsRequest request) {
        try {
            String text = request.getText();
            int speakerId = request.getSpeakerId() != null ? request.getSpeakerId() : 47;
            float speed = request.getSpeed() != null ? request.getSpeed() : 1.0f;

            // Delegate TTS generation and format conversion to the utility class
            String filePath = AudioProcessingUtil.generateAndConvertAudio(tts, text, speakerId, speed);
            return ResponseEntity.ok(new TtsResponse(filePath, "success"));
        } catch (Exception e) {
            e.printStackTrace();
            return ResponseEntity.internalServerError()
                    .body(new TtsResponse(null, "Processing failed: " + e.getMessage()));
        }
    }
}
TtsRequest
package com.example.demo;

public class TtsRequest {

    private String text;
    private Integer speakerId;
    private Float speed;

    // Getters and setters
    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }

    public Integer getSpeakerId() {
        return speakerId;
    }

    public void setSpeakerId(Integer speakerId) {
        this.speakerId = speakerId;
    }

    public Float getSpeed() {
        return speed;
    }

    public void setSpeed(Float speed) {
        this.speed = speed;
    }
}
TtsResponse
package com.example.demo;

public class TtsResponse {

    private String filePath;
    private String message;

    public TtsResponse(String filePath, String message) {
        this.filePath = filePath;
        this.message = message;
    }

    // Getters and setters
    public String getFilePath() {
        return filePath;
    }

    public void setFilePath(String filePath) {
        this.filePath = filePath;
    }

    public String getMessage() {
        return message;
    }

    public void setMessage(String message) {
        this.message = message;
    }
}
index.html
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>TTS Service Test</title>
    <style>
        body { font-family: Arial, sans-serif; max-width: 800px; margin: 0 auto; padding: 20px; }
        .form-group { margin-bottom: 15px; }
        label { display: block; margin-bottom: 5px; font-weight: bold; }
        input, textarea, button { width: 100%; padding: 8px; box-sizing: border-box; }
        textarea { height: 100px; resize: vertical; }
        button { background-color: #007bff; color: white; border: none; cursor: pointer; font-size: 16px; }
        button:hover { background-color: #0056b3; }
        #result { margin-top: 20px; padding: 10px; background-color: #f8f9fa; border: 1px solid #dee2e6; }
        .audio-container { margin-top: 10px; }
    </style>
</head>
<body>
    <h1>TTS Text-to-Speech Service</h1>
    <div class="form-group">
        <label for="text">Input text:</label>
        <textarea id="text" placeholder="Enter the text to convert to speech">运维人员正在操作,请别操作电脑</textarea>
    </div>
    <div class="form-group">
        <label for="speakerId">Speaker ID (0-52):</label>
        <input type="number" id="speakerId" min="0" max="52" value="47">
    </div>
    <div class="form-group">
        <label for="speed">Speed (0.5-2.0):</label>
        <input type="number" id="speed" min="0.5" max="2.0" step="0.1" value="1.0">
    </div>
    <button onclick="generateTts()">Generate speech</button>
    <div id="result"></div>
    <script>
        async function generateTts() {
            const text = document.getElementById('text').value;
            const speakerId = parseInt(document.getElementById('speakerId').value, 10);
            const speed = parseFloat(document.getElementById('speed').value);
            if (!text) {
                alert('Please enter some text');
                return;
            }
            const requestData = { text: text, speakerId: speakerId, speed: speed };
            const resultDiv = document.getElementById('result');
            resultDiv.innerHTML = '<p>Generating speech...</p>';
            try {
                const response = await fetch('/api/tts/generate', {
                    method: 'POST',
                    headers: { 'Content-Type': 'application/json' },
                    body: JSON.stringify(requestData)
                });
                const data = await response.json();
                if (response.ok && data.filePath) {
                    // Update the result area.
                    // Note: browsers usually refuse to load file:// URLs from a page served
                    // over http://, so the player and download link may not work as-is.
                    resultDiv.innerHTML = `
                        <p>Speech generated successfully!</p>
                        <p>File path: ${data.filePath}</p>
                        <div>
                            <audio controls>
                                <source src="file://${data.filePath}" type="audio/wav">
                                Your browser does not support audio playback.
                            </audio>
                        </div>
                        <p><a href="file://${data.filePath}" download>Download the audio file</a></p>
                    `;
                } else {
                    resultDiv.innerHTML = `<p>Speech generation failed: ${data.message || 'unknown error'}</p>`;
                }
            } catch (error) {
                console.error('Error:', error);
                resultDiv.innerHTML = '<p>An error occurred; see the console.</p>';
            }
        }
    </script>
</body>
</html>
pom file
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>4.0.0</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.weixin.wt</groupId>
    <artifactId>demo</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>demo</name>
    <description>demo</description>
    <url/>
    <licenses>
        <license/>
    </licenses>
    <developers>
        <developer/>
    </developers>
    <scm>
        <connection/>
        <developerConnection/>
        <tag/>
        <url/>
    </scm>
    <properties>
        <java.version>17</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter</artifactId>
        </dependency>
        <!-- Web starter dependency -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <!-- Sherpa-onnx core dependency -->
        <dependency>
            <groupId>com.k2fsa.sherpa.onnx</groupId>
            <artifactId>sherpa-onnx</artifactId>
            <version>1.12.10</version>
            <scope>system</scope>
            <systemPath>${project.basedir}/lib/sherpa-onnx-v1.12.10.jar</systemPath>
        </dependency>
        <!-- Windows platform dependency -->
        <dependency>
            <groupId>com.k2fsa.sherpa.onnx</groupId>
            <artifactId>sherpa-onnx-native-lib-win</artifactId>
            <version>1.12.10</version>
            <scope>system</scope>
            <systemPath>${project.basedir}/lib/sherpa-onnx-native-lib-win-x64-v1.12.10.jar</systemPath>
        </dependency>
        <!-- Linux platform dependency -->
        <dependency>
            <groupId>com.k2fsa.sherpa.onnx</groupId>
            <artifactId>sherpa-onnx-native-lib-linux</artifactId>
            <version>1.12.10</version>
            <scope>system</scope>
            <systemPath>${project.basedir}/lib/sherpa-onnx-native-lib-linux-x64-v1.12.10.jar</systemPath>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <annotationProcessorPaths>
                        <path>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </path>
                    </annotationProcessorPaths>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                    <!-- Include system-scope dependencies in the packaged artifact -->
                    <includeSystemScope>true</includeSystemScope>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Result screenshot


With that, the offline TTS model is deployed successfully.
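One detail worth keeping in mind if you build on this code: `AudioProcessingUtil` names the output file after the MD5 of the text alone, so the same text requested with a different speaker ID or speed maps to the same file name. A possible variation, shown here only as a sketch, is to hash all three inputs together. The class name `TtsCacheKey` and the `speakerId|speed|text` layout are my own assumptions, not part of the original project.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a file-name key from every input that affects the audio.
class TtsCacheKey {

    /** Returns a 32-character hex MD5 over speaker ID, speed, and text. */
    static String of(String text, int speakerId, float speed) {
        // Numeric fields come first so a '|' inside the text cannot blur the boundaries.
        String material = speakerId + "|" + speed + "|" + text;
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(material.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Same text, different speed: the two keys differ.
        System.out.println(of("hello", 47, 1.0f));
        System.out.println(of("hello", 47, 1.2f));
    }
}
```

With a key like this, two requests that differ only in speaker or speed no longer write to the same `_48k.wav` file.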