Integrating the Weka Machine Learning Framework with Spring Boot: A Practical Guide

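1. Introduction

Recommendation scenarios come up often in microservice development. In e-commerce, for example, products are recommended based on a user's purchase history, product preferences, browsing trail, and similar signals. In a social app, features are suggested to a user based on their activity history and personal tags. Recommendations like these can be built on big-data platforms and full-scale machine learning pipelines, but for a small system the integration, learning, and maintenance costs of such a stack are often hard to justify. This article therefore walks through how to connect a third-party machine learning framework to a microservice application and use it for business recommendations in a specific scenario.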

2. Machine Learning Options for Java

This section introduces several machine learning frameworks that fit a Java technology stack.

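2.1 Introduction to Weka

Weka (Waikato Environment for Knowledge Analysis) is open-source machine learning and data mining software developed at the University of Waikato in New Zealand. It is known for its ease of use, rich algorithm library, and graphical interface. It covers the whole workflow from data preprocessing to model evaluation, which makes it a good fit for teaching, research, and rapid prototyping. Official site: https://ml.cms.waikato.ac.nz/weka/

Weka is free, non-commercial, and written in Java. It is available as a Maven dependency and exposes a rich Java API.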


GitHub: Weka Wiki


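2.1.1 Key Features of Weka

Weka has the following features:

Open source and free: released under the GNU General Public License, so users can freely use, modify, and distribute it.
Cross-platform: built on Java, it runs on Windows, macOS, Linux, and other systems.
Graphical user interface (GUI): offers intuitive visual tools, so data mining tasks can be completed without programming.
Command-line interface: supports scripted automation for large datasets or batch jobs.
Rich algorithm library: bundles a large collection of machine learning algorithms covering classification, regression, clustering, association rule mining, and more.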

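2.1.2 Core Modules of Weka

Weka implements the data mining workflow through several interface modules:

Explorer: the main interface, with the following panels:
- Preprocess: data cleaning, feature selection, discretization, standardization, and so on.
- Classify: training and evaluating classification algorithms (such as decision trees, SVM, and neural networks).
- Cluster: cluster analysis (such as K-Means and DBSCAN).
- Associate: association rule mining (such as the Apriori algorithm).
- Select attributes: feature selection and dimensionality reduction.
- Visualize: data visualization (scatter plots, box plots, and so on).
Experimenter: designs comparative experiments to evaluate the performance of different algorithms.
Knowledge Flow: builds data processing flows by dragging and dropping components, and supports complex workflows.
Simple CLI: command-line interaction, suited to quick tests or scripted calls.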

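2.1.3 Algorithm Types Supported by Weka

Weka supports the following algorithm types:

Classification: decision trees (J48/C4.5), random forest, naive Bayes, SVM, logistic regression, neural networks, and others.
Regression: linear regression, M5 model trees and rules, support vector regression (SVR).
Clustering: K-Means, EM, DBSCAN, hierarchical clustering.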
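
To give a feel for what a decision-tree learner such as J48 (Weka's implementation of C4.5) does when it picks a split, the sketch below computes entropy and the information gain of a candidate threshold on one numeric attribute. It is a simplified illustration in plain Java, not Weka's code: C4.5 ranks splits by gain ratio and also prunes the tree. The class name `InfoGainSketch`, the sample weights, and the short labels are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class InfoGainSketch {

    /** Shannon entropy (in bits) of the labels at positions [from, to). */
    static double entropy(String[] labels, int from, int to) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = from; i < to; i++) {
            counts.merge(labels[i], 1, Integer::sum);
        }
        double total = to - from;
        double h = 0.0;
        for (int count : counts.values()) {
            double p = count / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /**
     * Information gain of the split "value <= threshold".
     * Assumes sortedValues is in ascending order and labels[i] belongs to sortedValues[i].
     */
    static double infoGain(double[] sortedValues, String[] labels, double threshold) {
        int n = sortedValues.length;
        int left = 0;
        while (left < n && sortedValues[left] <= threshold) {
            left++;
        }
        double before = entropy(labels, 0, n);
        double after = (left / (double) n) * entropy(labels, 0, left)
                + ((n - left) / (double) n) * entropy(labels, left, n);
        return before - after;
    }

    public static void main(String[] args) {
        // Illustrative data only: item weight in kg and a shelf label
        double[] weightKg = {0.2, 0.5, 2.5, 3.0, 18.0, 25.0};
        String[] shelf = {"small", "small", "high", "high", "floor", "floor"};
        System.out.println("gain at weight <= 1.0: " + infoGain(weightKg, shelf, 1.0));
        System.out.println("gain at weight <= 2.7: " + infoGain(weightKg, shelf, 2.7));
    }
}
```

On this toy data the threshold 1.0 scores higher than 2.7, because it isolates one class cleanly while 2.7 leaves both sides mixed. A tree learner repeats this kind of comparison over all attributes and candidate thresholds at every node.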

<properties>
    <maven.compiler.source>17</maven.compiler.source>
    <maven.compiler.target>17</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <satoken.version>1.37.0</satoken.version>
    <spring-ai.version>1.0.0-M6</spring-ai.version>
    <spring-ai-alibaba.version>1.0.0-M6.1</spring-ai-alibaba.version>
</properties>

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>3.2.2</version>
    <relativePath/>
</parent>

<dependencies>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-csv</artifactId>
        <version>1.9.0</version>
    </dependency>
    <dependency>
        <groupId>nz.ac.waikato.cms.weka</groupId>
        <artifactId>weka-stable</artifactId>
        <version>3.8.6</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.83</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-aop</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>8.0.28</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
    </dependency>
    <dependency>
        <groupId>com.alibaba.cloud.ai</groupId>
        <artifactId>spring-ai-alibaba-starter</artifactId>
        <version>${spring-ai-alibaba.version}</version>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<repositories>
    <repository>
        <name>Central Portal Snapshots</name>
        <id>central-portal-snapshots</id>
        <url>https://central.sonatype.com/repository/maven-snapshots/</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>spring-snapshots</id>
        <name>Spring Snapshots</name>
        <url>https://repo.spring.io/snapshot</url>
        <releases>
            <enabled>false</enabled>
        </releases>
    </repository>
</repositories>

package com.congge.command.v3;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

/**
 * Predicts a shelf type from sample data in a local training file.
 */
public class ShelfRecommendationSystemV3 {

    private J48 decisionTree;
    private Instances dataStructure;

    /**
     * Loads the training data and builds the model.
     */
    public void trainModel(String csvFilePath) throws Exception {
        // 1. Load the CSV data
        DataSource source = new DataSource(csvFilePath);
        Instances data = source.getDataSet();

        // 2. Use the last column (shelf type) as the class attribute
        if (data.classIndex() == -1) {
            data.setClassIndex(data.numAttributes() - 1);
        }

        // 3. Keep an empty copy of the header for building prediction instances later
        this.dataStructure = new Instances(data, 0);

        // 4. Build the decision tree model
        decisionTree = new J48();
        decisionTree.buildClassifier(data);
        System.out.println("Model training complete!");
    }

    /**
     * Predicts the recommended shelf type for a new item.
     */
    public String predictShelfType(double length, double width, double height,
                                   double weight, boolean fragile) throws Exception {
        if (decisionTree == null || dataStructure == null) {
            throw new IllegalStateException("Model not trained yet; call trainModel first");
        }

        // Create a new instance sized to the training header; all values start out missing
        Instance instance = new DenseInstance(dataStructure.numAttributes());
        instance.setDataset(dataStructure);

        // Set the attribute values; the class column is left missing
        instance.setValue(0, length);
        instance.setValue(1, width);
        instance.setValue(2, height);
        instance.setValue(3, weight);
        // Column 4 is expected to be a nominal attribute with the values "true" and "false"
        instance.setValue(4, fragile ? "true" : "false");

        // Run the prediction and map the class index back to its label
        double prediction = decisionTree.classifyInstance(instance);
        return dataStructure.classAttribute().value((int) prediction);
    }

    /**
     * Evaluates model accuracy (a simple check against the training data).
     */
    public void evaluateModel(String csvFilePath) throws Exception {
        DataSource source = new DataSource(csvFilePath);
        Instances data = source.getDataSet();
        if (data.classIndex() == -1) {
            data.setClassIndex(data.numAttributes() - 1);
        }

        // Simple evaluation on the training data (real applications should use cross-validation)
        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(decisionTree, data);
        System.out.println("\nModel evaluation results:");
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString());
        System.out.println(eval.toMatrixString());
    }

    public static void main(String[] args) {
        ShelfRecommendationSystemV3 system = new ShelfRecommendationSystemV3();
        try {
            // 1. Train the model
            String csvPath = "data/shelf_data.csv";
            system.trainModel(csvPath);

            // 2. Evaluate the model (optional)
            system.evaluateModel(csvPath);

            // 3. Test predictions
            System.out.println("\nTest predictions:");

            // Test case 1: small, light item
            String recommendation1 = system.predictShelfType(30, 20, 10, 1, false);
            System.out.printf("Item (30x20x10 cm, 1 kg, not fragile) recommended shelf: %s%n", recommendation1);

            // Test case 2: large, heavy item
            String recommendation2 = system.predictShelfType(200, 100, 180, 200, false);
            System.out.printf("Item (200x100x180 cm, 200 kg, not fragile) recommended shelf: %s%n", recommendation2);

            // Test case 3: fragile, medium-sized item
            String recommendation3 = system.predictShelfType(60, 40, 30, 5, true);
            System.out.printf("Item (60x40x30 cm, 5 kg, fragile) recommended shelf: %s%n", recommendation3);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

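The `evaluateModel` method above scores the tree on the same rows it was trained on, which tends to overstate accuracy. Below is a minimal sketch of k-fold cross-validation using Weka's `Evaluation.crossValidateModel`. It assumes `weka-stable` is on the classpath. The class name `CrossValidationSketch`, the in-memory toy dataset, and the choice of 5 folds are illustrative and not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

class CrossValidationSketch {

    /** Builds a small synthetic dataset (weight in kg -> shelf type); all values are made up. */
    static Instances toyData() {
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("weight"));
        attributes.add(new Attribute("shelf_type",
                new ArrayList<>(List.of("small_shelf", "floor_stack"))));

        Instances data = new Instances("toy_shelf_data", attributes, 0);
        data.setClassIndex(data.numAttributes() - 1);

        for (int i = 0; i < 20; i++) {
            boolean light = i < 10;
            double weight = light ? 0.1 * (i + 1) : 10.0 + i;
            double classIndex = light ? 0 : 1; // index into the nominal value list
            data.add(new DenseInstance(1.0, new double[] {weight, classIndex}));
        }
        return data;
    }

    /** Runs k-fold cross-validation and returns the percentage of correctly classified rows. */
    static double crossValidatedAccuracy(Instances data, int folds) throws Exception {
        Evaluation evaluation = new Evaluation(data);
        // Each fold is held out once while a copy of the classifier is trained on the rest
        evaluation.crossValidateModel(new J48(), data, folds, new Random(1));
        return evaluation.pctCorrect();
    }

    public static void main(String[] args) throws Exception {
        double accuracy = crossValidatedAccuracy(toyData(), 5);
        System.out.printf("5-fold cross-validated accuracy: %.2f%%%n", accuracy);
    }
}
```

For the CSV-based class, the same call could be made on the `Instances` loaded in `evaluateModel`, passing a new `J48` instead of the already trained tree.
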
CREATE TABLE product_shelf_mapping (
    id INT AUTO_INCREMENT PRIMARY KEY,
    product_name VARCHAR(100),
    length DOUBLE NOT NULL COMMENT 'Length (cm)',
    width DOUBLE NOT NULL COMMENT 'Width (cm)',
    height DOUBLE NOT NULL COMMENT 'Height (cm)',
    weight DOUBLE NOT NULL COMMENT 'Weight (kg)',
    shelf_type ENUM('small_shelf', 'high_shelf', 'floor_stack') NOT NULL COMMENT 'Shelf type',
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

-- Empty the table (for testing only; remove in production)
TRUNCATE TABLE product_shelf_mapping;

-- Insert test data
INSERT INTO product_shelf_mapping (product_name, length, width, height, weight, shelf_type) VALUES
-- Small shelf (small volume, light weight)
('Small electronic components', 15, 10, 5, 0.2, 'small_shelf'),
('Office stationery set', 30, 20, 8, 0.5, 'small_shelf'),
('Phone accessory box', 25, 15, 5, 0.3, 'small_shelf'),
('Cosmetics samples', 12, 8, 6, 0.1, 'small_shelf'),
('LED bulb', 10, 10, 15, 0.15, 'small_shelf'),
-- High shelf (medium volume, medium weight, stackable)
('Book gift box', 40, 30, 20, 2.5, 'high_shelf'),
('Kitchen seasoning set', 35, 25, 15, 1.8, 'high_shelf'),
('Toy model box', 50, 40, 30, 3.0, 'high_shelf'),
('Household cleaning supplies', 45, 35, 25, 2.2, 'high_shelf'),
('Pet food bag', 55, 40, 20, 4.0, 'high_shelf'),
-- Floor stack (overlong, overweight, or irregularly shaped goods)
('Large furniture parts', 200, 80, 30, 25.0, 'floor_stack'),
('Fitness equipment package', 180, 60, 40, 30.0, 'floor_stack'),
('Industrial equipment crate', 150, 100, 50, 50.0, 'floor_stack'),
('Bicycle shipping box', 160, 70, 20, 18.0, 'floor_stack'),
('Long pipe stock', 300, 10, 10, 12.0, 'floor_stack'),
-- Boundary cases (goods close to the classification thresholds)
('Medium appliance box', 60, 50, 40, 8.0, 'high_shelf'),  -- close to floor stack but slightly lighter
('Heavy small item', 20, 20, 10, 15.0, 'floor_stack'),    -- heavy but small
('Long thin item', 150, 5, 5, 3.0, 'floor_stack'),        -- very long but light
('Large flat item', 120, 90, 2, 5.0, 'floor_stack');      -- large footprint but low height

package com.congge.command.v2;

import lombok.Data;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Utils;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Predicts a shelf type from sample data stored in a database.
 */
public class ShelfRecommendationSystem {

    // MySQL connection settings
    private static final String DB_URL = "jdbc:mysql://<host>:3306/<database>";
    private static final String DB_USER = "<username>";
    private static final String DB_PASSWORD = "<password>";

    public static void main(String[] args) {
        try {
            // 1. Load the training data from MySQL
            List<Product> trainingData = loadTrainingDataFromMySQL();

            // 2. Convert it to Weka's Instances format
            Instances instances = createInstances(trainingData);

            // 3. Train the decision tree model
            J48 tree = trainDecisionTree(instances);

            // 4. Test the model
            testModel(tree, instances);

            // 5. Example: recommend a shelf for a new product
            Product newProduct = new Product();
            newProduct.setLength(120);
            newProduct.setWidth(80);
            newProduct.setHeight(30);
            newProduct.setWeight(25);
            String recommendedShelf = recommendShelf(tree, newProduct);
            System.out.println("\nRecommended shelf type: " + recommendedShelf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Loads the training data from the MySQL database.
     */
    private static List<Product> loadTrainingDataFromMySQL() throws SQLException {
        List<Product> products = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(DB_URL, DB_USER, DB_PASSWORD)) {
            String query = "SELECT length, width, height, weight, shelf_type FROM product_shelf_mapping";
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(query)) {
                while (rs.next()) {
                    Product product = new Product();
                    product.setLength(rs.getDouble("length"));
                    product.setWidth(rs.getDouble("width"));
                    product.setHeight(rs.getDouble("height"));
                    product.setWeight(rs.getDouble("weight"));
                    product.setShelfType(rs.getString("shelf_type"));
                    products.add(product);
                }
            }
        }
        System.out.println("Loaded " + products.size() + " training rows from MySQL");
        return products;
    }

    /**
     * Creates the Weka Instances object.
     */
    private static Instances createInstances(List<Product> products) {
        // Define the attributes
        ArrayList<Attribute> attributes = new ArrayList<>();

        // Numeric attributes: length, width, height, weight
        attributes.add(new Attribute("length"));
        attributes.add(new Attribute("width"));
        attributes.add(new Attribute("height"));
        attributes.add(new Attribute("weight"));

        // Nominal attribute: shelf type
        ArrayList<String> shelfTypes = new ArrayList<>();
        shelfTypes.add("small_shelf");
        shelfTypes.add("high_shelf");
        shelfTypes.add("floor_stack");
        attributes.add(new Attribute("shelf_type", shelfTypes));

        // Create the Instances object and mark the last attribute as the class
        Instances instances = new Instances("ProductShelfRelation", attributes, 0);
        instances.setClassIndex(instances.numAttributes() - 1);

        // Add the data rows
        for (Product product : products) {
            double[] values = new double[instances.numAttributes()];
            values[0] = product.getLength();
            values[1] = product.getWidth();
            values[2] = product.getHeight();
            values[3] = product.getWeight();

            // Nominal values are stored as the index of the label in shelfTypes
            switch (product.getShelfType()) {
                case "small_shelf":
                    values[4] = 0;
                    break;
                case "high_shelf":
                    values[4] = 1;
                    break;
                case "floor_stack":
                    values[4] = 2;
                    break;
                default:
                    values[4] = Utils.missingValue();
            }
            instances.add(new DenseInstance(1.0, values));
        }
        return instances;
    }

    /**
     * Trains the decision tree model.
     */
    private static J48 trainDecisionTree(Instances instances) throws Exception {
        // Classifier options
        String[] options = new String[1];
        options[0] = "-U"; // use an unpruned tree

        J48 tree = new J48();
        tree.setOptions(options);

        // Train the model
        tree.buildClassifier(instances);

        // Print the model
        System.out.println("\nTrained decision tree model:");
        System.out.println(tree);
        return tree;
    }

    /**
     * Estimates model accuracy with a hold-out split.
     */
    private static void testModel(J48 tree, Instances instances) throws Exception {
        // Shuffle a copy so the caller's dataset keeps its original order
        Instances shuffled = new Instances(instances);
        shuffled.randomize(new Random(0));

        // Random split: 70% training, 30% test
        int trainSize = (int) Math.round(shuffled.numInstances() * 0.7);
        int testSize = shuffled.numInstances() - trainSize;
        Instances train = new Instances(shuffled, 0, trainSize);
        Instances test = new Instances(shuffled, trainSize, testSize);

        // Train a separate tree with the same options. Retraining the tree passed in
        // would replace the full-data model that is later used for recommendations.
        J48 evalTree = new J48();
        evalTree.setOptions(tree.getOptions());
        evalTree.buildClassifier(train);

        // Count correct predictions on the test rows
        int correct = 0;
        for (int i = 0; i < test.numInstances(); i++) {
            Instance inst = test.instance(i);
            double predicted = evalTree.classifyInstance(inst);
            double actual = inst.classValue();
            if (predicted == actual) {
                correct++;
            }
        }
        double accuracy = (double) correct / test.numInstances() * 100;
        System.out.printf("%nModel accuracy: %.2f%% (%d/%d)%n", accuracy, correct, test.numInstances());
    }

    /**
     * Recommends a shelf type for a new product.
     */
    public static String recommendShelf(J48 tree, Product product) throws Exception {
        // Build the attribute list (must match the training data)
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("length"));
        attributes.add(new Attribute("width"));
        attributes.add(new Attribute("height"));
        attributes.add(new Attribute("weight"));
        ArrayList<String> shelfTypes = new ArrayList<>();
        shelfTypes.add("small_shelf");
        shelfTypes.add("high_shelf");
        shelfTypes.add("floor_stack");
        attributes.add(new Attribute("shelf_type", shelfTypes));

        // Create the Instances object
        Instances instances = new Instances("Temp", attributes, 0);
        instances.setClassIndex(instances.numAttributes() - 1);

        // Create the instance for the new product
        double[] values = new double[instances.numAttributes()];
        values[0] = product.getLength();
        values[1] = product.getWidth();
        values[2] = product.getHeight();
        values[3] = product.getWeight();
        values[4] = Utils.missingValue(); // the class attribute is unknown, so mark it missing
        instances.add(new DenseInstance(1.0, values));

        // Run the prediction
        double prediction = tree.classifyInstance(instances.instance(0));

        // Map the predicted class index back to its label
        return shelfTypes.get((int) prediction);
    }
}

@Data
class Product {
    private double length;
    private double width;
    private double height;
    private double weight;
    private String shelfType;
}
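
With a dataset this small, the 70/30 hold-out split in `testModel` leaves very few test rows, so the reported accuracy is coarse. The plain-Java sketch below reproduces only the split arithmetic. The row count of 19 is an assumption taken from the sample INSERT statement, and the class name `HoldoutSizes` and its method names are hypothetical.

```java
class HoldoutSizes {

    /** Mirrors the rounding used in testModel: trainSize = round(n * ratio). */
    static int trainSize(int numInstances, double trainRatio) {
        return (int) Math.round(numInstances * trainRatio);
    }

    /** Smallest change in accuracy, in percentage points, that one test row can cause. */
    static double accuracyStep(int testSize) {
        return 100.0 / testSize;
    }

    public static void main(String[] args) {
        int total = 19; // assumed: number of rows in the sample INSERT statement
        int train = trainSize(total, 0.7);
        int test = total - train;
        System.out.println("train rows: " + train);
        System.out.println("test rows: " + test);
        System.out.println("accuracy step per test row: " + accuracyStep(test));
    }
}
```

For 19 rows this gives 13 training rows and 6 test rows, so a single misclassified row moves the accuracy by roughly 16.7 percentage points. That is one reason cross-validation is usually preferred over a single split on very small samples.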

package com.congge.command.v5;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/client/ai")
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    // POST localhost:8082/client/ai/chat with the product as a JSON request body
    @PostMapping("/chat")
    public String chat(@RequestBody Product product) {
        Prompt prompt = new Prompt(new UserMessage(buildPrompt(product)));
        return chatClient.prompt(prompt).call().content();
    }

    // Product (class not shown) is expected to expose getProductName(), getVolume(), and getDensity()
    private static String buildPrompt(Product product) {
        return String.format("""
                You are a warehouse management expert who recommends a suitable shelf type based on a product's dimensions and weight.
                Shelf type definitions:
                1. small_shelf: products with a volume under 5000cm³ and a weight under 1kg
                2. high_shelf: products with a volume of 5000-60000cm³ and a weight of 1-10kg
                3. floor_stack: products with a volume over 60000cm³, a weight over 10kg, or an unusual shape (such as extra long or extra wide)

                Product information:
                Name: %s
                Dimensions: %.1fcm x %.1fcm x %.1fcm
                Weight: %.2fkg
                Volume: %.1fcm³
                Density: %.2fg/cm³

                Based on the information above, return only the recommended shelf type, in the format: Recommended shelf type: [type]
                """,
                product.getProductName(),
                product.getLength(),
                product.getWidth(),
                product.getHeight(),
                product.getWeight(),
                product.getVolume(),
                product.getDensity());
    }
}
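
Because a language model returns free text, the caller usually needs to extract the shelf type from the reply and sanity-check it before acting on it. The plain-Java sketch below pulls one of the three known type names out of a reply and compares it with a deterministic reading of the thresholds listed in the prompt. The class name `ShelfReplyCheck`, both method names, the sample reply, and the fallback for combinations the prompt does not cover (for example a small item weighing 5 kg, treated here as `high_shelf`) are assumptions made for illustration. The "unusual shape" criterion is not modelled.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

class ShelfReplyCheck {

    static final List<String> SHELF_TYPES = List.of("small_shelf", "high_shelf", "floor_stack");

    /** Returns the shelf type named in the reply, but only if exactly one known type appears. */
    static Optional<String> extractShelfType(String reply) {
        if (reply == null) {
            return Optional.empty();
        }
        String lower = reply.toLowerCase(Locale.ROOT);
        List<String> found = SHELF_TYPES.stream().filter(lower::contains).toList();
        return found.size() == 1 ? Optional.of(found.get(0)) : Optional.empty();
    }

    /** One deterministic reading of the volume and weight thresholds stated in the prompt. */
    static String ruleBasedShelfType(double lengthCm, double widthCm, double heightCm, double weightKg) {
        double volume = lengthCm * widthCm * heightCm;
        if (volume > 60_000 || weightKg > 10) {
            return "floor_stack";
        }
        if (volume < 5_000 && weightKg < 1) {
            return "small_shelf";
        }
        // Assumption: everything else, including combinations the prompt leaves open
        return "high_shelf";
    }

    public static void main(String[] args) {
        String reply = "Recommended shelf type: [small_shelf]"; // hypothetical model reply
        String expected = ruleBasedShelfType(30, 20, 8, 0.5);
        Optional<String> parsed = extractShelfType(reply);
        boolean agrees = parsed.map(expected::equals).orElse(false);
        System.out.println("parsed=" + parsed.orElse("<none>")
                + ", rule=" + expected + ", agrees=" + agrees);
    }
}
```

A reply that names no known type, or more than one, yields an empty `Optional`, so the caller can retry or fall back to the rule-based value instead of storing arbitrary text.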

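Integrating the Weka Machine Learning Framework with Spring Boot: A Practical Guide
1. Introduction
2. Machine Learning Options for Java
2.1 Introduction to Weka
2.1.1 Key Features of Weka
2.1.2 Core Modules of Weka
2.1.3 Algorithm Types Supported by Weka
2.1.4 Weka Use Cases
2.2 Introduction to Apache Spark MLlib
2.2.1 Core Modules
2.2.2 Core Features
2.2.3 Technical Advantages
2.2.4 Suitable Scenarios
2.3 Introduction to EasyRec
2.3.1 Technical Highlights
2.3.2 Core Module Architecture
2.3.3 Technical Advantages
2.3.4 Suitable Scenarios
3. Case Study: Recommending Shelves for Goods with Weka
3.1 Requirements
3.2 Prerequisites
3.2.1 Data Preparation
3.2.2 Importing Core Dependencies
3.3 Approach Based on Local Sample Data
3.3.1 Importing Core Dependencies
3.3.2 Adding Local Sample Data
3.3.3 Complete Implementation
3.4 Approach Based on Database Sample Data
3.4.1 Adding a Test Sample Table
3.4.2 Importing Dependencies
3.4.3 Complete Implementation
3.5 Approach Based on a Large Language Model
3.5.1 Adding Configuration
3.5.2 Adding a Test Endpoint
3.5.3 Testing the Result
4. Closing Remarks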
