SeaTunnel 2.3.11 + Web 1.0.3: Docker Deployment with Kafka-to-Hive/ES Sync in Practice
This article demonstrates a Docker cluster deployment of SeaTunnel 2.3.11 and Web 1.0.3, covering Kafka data source configuration, the Hive metastore connection, and Elasticsearch integration. Services are orchestrated with docker-compose to create and run data sync jobs from Kafka to Hive and ES, including common startup-error troubleshooting and dependency installation details.
Author: 萤火微光
Based on SeaTunnel 2.3.11 and Web 1.0.3, this article walks through standing up a cluster environment with Docker and completing data synchronization from Kafka to Hive and Elasticsearch. It focuses on dependency management, configuration-file adjustments, and troubleshooting common startup errors.
Environment Preparation
Directory Layout
Organize the project files as follows to keep dependencies and configuration easy to maintain:
seatunnel-docker/
├── docker-compose.yml            # compose orchestration file
├── hive/                         # Hive-related configuration
│   ├── hive-site.xml
│   └── lib/                      # dependency jars
├── init-sql/                     # MySQL init scripts
├── seatunnel/                    # SeaTunnel server
│   ├── Dockerfile
│   └── apache-seatunnel-2.3.11/
└── seatunnel-web/                # SeaTunnel Web
    ├── Dockerfile
    └── apache-seatunnel-web-1.0.3-bin/
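The hive-site.xml referenced above is not reproduced in full in this article. A minimal sketch could look like the following; the property values are assumptions that simply mirror the metastore URI and warehouse path used elsewhere in this setup:

```xml
<configuration>
  <!-- Point Hive clients at the metastore container defined in docker-compose.yml -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hive-metastore:9083</value>
  </property>
  <!-- Warehouse path matching the ./hive-warehouse volume mount -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/opt/hive/data/warehouse</value>
  </property>
</configuration>
```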
Download the Core Components
Fetch the SeaTunnel binary package and build the Web module from source:
wget https://dlcdn.apache.org/seatunnel/2.3.11/apache-seatunnel-2.3.11-bin.tar.gz
git clone https://github.com/apache/seatunnel-web.git
cd seatunnel-web
sh build.sh
Install Required Dependencies
The Hive Metastore container needs the PostgreSQL JDBC driver, and sync jobs may be missing some Hive dependencies. Download the following jars manually and place them in the corresponding directories:
wget https://jdbc.postgresql.org/download/postgresql-42.5.1.jar
wget https://repo1.maven.org/maven2/org/apache/hive/hive-exec/3.1.3/hive-exec-3.1.3.jar
wget https://repo1.maven.org/maven2/org/apache/hive/hive-metastore/3.1.3/hive-metastore-3.1.3.jar
wget https://repo.maven.apache.org/maven2/org/apache/thrift/libfb303/0.9.3/libfb303-0.9.3.jar
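As a quick sanity check (a sketch, assuming the directory layout from this article: the Postgres driver is mounted into the Hive container, while the Hive jars must sit in SeaTunnel's lib/ for the Hive connector to load them), you can verify the jars ended up where the containers expect them:

```shell
# Expected jar locations per this article's layout.
required="hive/lib/postgresql-42.5.1.jar
seatunnel/apache-seatunnel-2.3.11/lib/hive-exec-3.1.3.jar
seatunnel/apache-seatunnel-2.3.11/lib/hive-metastore-3.1.3.jar
seatunnel/apache-seatunnel-2.3.11/lib/libfb303-0.9.3.jar"
missing=0
for f in $required; do
  # Report each jar that has not been copied into place yet.
  [ -f "$f" ] || { echo "missing: $f"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all dependency jars in place" || echo "copy the missing jars before starting"
```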
Docker Deployment Configuration
docker-compose.yml
This file defines all services, the network, and their dependencies. Pay attention to the extra_hosts and volumes entries: they solve inter-container hostname resolution and data persistence.
version: '3.9'

networks:
  seatunnel-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.16.0.0/24

services:
  hive-metastore-db:
    image: postgres:15
    container_name: hive-metastore-db
    environment:
      POSTGRES_DB: metastore_db
      POSTGRES_USER: hive
      POSTGRES_PASSWORD: hive123456
    ports:
      - "5432:5432"
    volumes:
      - ./hive-metastore-db-data:/var/lib/postgresql/data
    # Required: the service_healthy condition below only works if a healthcheck is defined here.
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U hive -d metastore_db"]
      interval: 5s
      timeout: 5s
      retries: 10
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.2

  hive-metastore:
    image: apache/hive:4.0.0
    container_name: hive-metastore
    depends_on:
      hive-metastore-db:
        condition: service_healthy
    environment:
      SERVICE_NAME: metastore
      DB_DRIVER: postgres
      SERVICE_OPTS: >
        -Djavax.jdo.option.ConnectionDriverName=org.postgresql.Driver
        -Djavax.jdo.option.ConnectionURL=jdbc:postgresql://hive-metastore-db:5432/metastore_db
        -Djavax.jdo.option.ConnectionUserName=hive
        -Djavax.jdo.option.ConnectionPassword=hive123456
    ports:
      - "9083:9083"
    volumes:
      - ./hive/lib/postgresql-42.5.1.jar:/opt/hive/lib/postgresql-42.5.1.jar
      - ./hive/hive-site.xml:/opt/hive/conf/hive-site.xml
      - ./hive-warehouse:/opt/hive/data/warehouse
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.3

  hive-server2:
    image: apache/hive:4.0.0
    container_name: hive-server2
    depends_on:
      - hive-metastore
    environment:
      HIVE_SERVER2_THRIFT_PORT: 10000
      SERVICE_NAME: hiveserver2
      IS_RESUME: "true"
      SERVICE_OPTS: "-Dhive.metastore.uris=thrift://hive-metastore:9083"
    ports:
      - "10000:10000"
      - "10002:10002"
    volumes:
      - ./hive-warehouse:/opt/hive/data/warehouse
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.4

  mysql-seatunnel:
    image: mysql:8.0.42
    container_name: mysql-seatunnel
    environment:
      MYSQL_ROOT_PASSWORD: root123456
      MYSQL_DATABASE: seatunnel
      MYSQL_ROOT_HOST: '%'
    ports:
      - "3806:3306"
    volumes:
      - ./mysql_data:/var/lib/mysql
      - ./init-sql:/docker-entrypoint-initdb.d
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.5

  seatunnel-master:
    build:
      context: ./seatunnel
      dockerfile: Dockerfile
    image: seatunnel:2.3.11
    container_name: seatunnel-master
    extra_hosts:
      - "hive-metastore:172.16.0.3"
      - "hive-metastore-db:172.16.0.2"
    environment:
      SEATUNNEL_HOME: /opt/seatunnel
    command: >
      sh -c "cd /opt/seatunnel && exec bin/seatunnel-cluster.sh -r master"
    ports:
      - "5801:5801"
    volumes:
      - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
      - ./logs/master:/opt/seatunnel/logs
      - ./hive-warehouse:/opt/hive/data/warehouse
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.10

  seatunnel-worker1:
    image: seatunnel:2.3.11
    container_name: seatunnel-worker1
    extra_hosts:
      - "hive-metastore:172.16.0.3"
      - "hive-metastore-db:172.16.0.2"
    environment:
      SEATUNNEL_HOME: /opt/seatunnel
    command: >
      sh -c "cd /opt/seatunnel && exec bin/seatunnel-cluster.sh -r worker"
    volumes:
      - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
      - ./logs/worker1:/opt/seatunnel/logs
      - ./hive-warehouse:/opt/hive/data/warehouse
    depends_on:
      - seatunnel-master
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.11

  seatunnel-worker2:
    image: seatunnel:2.3.11
    container_name: seatunnel-worker2
    extra_hosts:
      - "hive-metastore:172.16.0.3"
      - "hive-metastore-db:172.16.0.2"
    environment:
      SEATUNNEL_HOME: /opt/seatunnel
    command: >
      sh -c "cd /opt/seatunnel && exec bin/seatunnel-cluster.sh -r worker"
    volumes:
      - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
      - ./logs/worker2:/opt/seatunnel/logs
      - ./hive-warehouse:/opt/hive/data/warehouse
    depends_on:
      - seatunnel-master
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.12

  seatunnel-web:
    build:
      context: ./seatunnel-web
      dockerfile: Dockerfile
    image: seatunnel-web:1.0.3
    container_name: seatunnel-web
    extra_hosts:
      - "hive-metastore:172.16.0.3"
      - "hive-metastore-db:172.16.0.2"
    environment:
      SEATUNNEL_HOME: /opt/seatunnel
      SEATUNNEL_WEB_HOME: /opt/seatunnel-web
    ports:
      - "8801:8801"
    volumes:
      - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
      - ./seatunnel-web/apache-seatunnel-web-1.0.3-bin/:/opt/seatunnel-web/
      - ./logs/web:/opt/seatunnel-web/logs
      - ./hive-warehouse:/opt/hive/data/warehouse
    depends_on:
      - seatunnel-master
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.13
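Before bringing the stack up, it can help to confirm that none of the published host ports are already taken. A minimal sketch (assumes the ss tool from iproute2 is available; the port list mirrors the mappings above):

```shell
# Host ports published by the compose file above.
report=""
for p in 5432 9083 10000 10002 3806 5801 8801; do
  # ss -ltn lists listening TCP sockets; grep for the port suffix.
  if command -v ss >/dev/null 2>&1 && ss -ltn 2>/dev/null | grep -q ":$p "; then
    report="$report port $p: in use\n"
  else
    report="$report port $p: free\n"
  fi
done
printf '%b' "$report"
```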
SeaTunnel Server Configuration
Dockerfile
FROM eclipse-temurin:8-jdk-ubi9-minimal
WORKDIR /opt/seatunnel/
ENV SEATUNNEL_HOME=/opt/seatunnel
ENV PATH=$PATH:$SEATUNNEL_HOME/bin
EXPOSE 5801
CMD ["sh", "bin/seatunnel-cluster.sh", "-r", "master"]
Hazelcast Cluster Configuration
The SeaTunnel engine relies on Hazelcast for distributed communication, and the Master and Workers need separate configuration files. The first block below is the Master configuration (port 5801); the second is the Worker configuration (port 5802).
hazelcast:
cluster-name: seatunnel
network:
rest-api:
enabled: false
join:
tcp-ip:
enabled: true
member-list:
- seatunnel-master:5801
- seatunnel-worker1:5802
- seatunnel-worker2:5802
port:
auto-increment: false
port: 5801
properties:
hazelcast.logging.type: log4j2
hazelcast.operation.generic.thread.count: 50
hazelcast.heartbeat.interval.seconds: 2
hazelcast:
cluster-name: seatunnel
network:
join:
tcp-ip:
enabled: true
member-list:
- seatunnel-master:5801
- seatunnel-worker1:5802
- seatunnel-worker2:5802
port:
auto-increment: false
port: 5802
properties:
hazelcast.logging.type: log4j2
hazelcast-client.yaml (used by the Web module to connect to the cluster)
hazelcast-client:
cluster-name: seatunnel
connection-strategy:
connection-retry:
cluster-connect-timeout-millis: 3000
network:
cluster-members:
- seatunnel-master:5801
Install Connector Plugins
Run the install script in the SeaTunnel directory so that the Source/Sink dropdowns in the Web UI have connectors to choose from.
cd seatunnel/apache-seatunnel-2.3.11/
sh bin/install-plugin.sh
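By default the script downloads every connector, which is slow. As a sketch (assuming the config/plugin_config list format used by install-plugin.sh in 2.3.x), you could trim the file down to just the connectors this article uses:

```text
--connectors-v2--
connector-kafka
connector-hive
connector-elasticsearch
connector-console
--end--
```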
SeaTunnel Web Configuration
application.yml
The key settings are the database connection and the JWT secret; if they are missing, token validation fails after startup.
server:
port: 8801
spring:
main:
allow-circular-references: true
datasource:
driver-class-name: com.mysql.cj.jdbc.Driver
url: jdbc:mysql://mysql-seatunnel:3306/seatunnel?useSSL=false&useUnicode=true&characterEncoding=utf-8&allowMultiQueries=true&allowPublicKeyRetrieval=true
username: root
password: root123456
jwt:
expireTime: 86400
secretKey: a3f5c8d2e1b4098765432109abcdef1234567890abcdef
algorithm: HS256
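The secretKey above is only a sample; generate your own random value rather than reusing it. A minimal sketch, assuming openssl is installed:

```shell
# Generate a random 256-bit (64 hex chars) key suitable for HS256.
secret=$(openssl rand -hex 32)
printf 'secretKey: %s\n' "$secret"
```

Paste the printed value into jwt.secretKey in application.yml.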
Fixing the Startup Script
If the container exits immediately after starting, check seatunnel-backend-daemon.sh. The stock script may launch the process in the background, so logs are not captured and Docker sees the main process exit. Remove nohup and the trailing & so the process runs in the foreground where Docker can monitor it:
$JAVA_HOME/bin/java $JAVA_OPTS -cp "$CLASSPATH" $SPRING_OPTS org.apache.seatunnel.app.SeatunnelApplication >> "${LOGDIR}/seatunnel.out" 2>&1
echo "seatunnel-web started"
Plugin Mapping Configuration
Copy the plugin-mapping.properties generated on the server side into the Web configuration directory so both sides identify connectors consistently.
cp seatunnel/apache-seatunnel-2.3.11/connectors/plugin-mapping.properties seatunnel-web/apache-seatunnel-web-1.0.3-bin/conf/
Starting the Services
Bring up all services with a single docker compose command:
docker compose up -d --build
Open the Web UI on port 8801; the default credentials are admin/admin.
Task Configuration in Practice
1. Log In and Set the Language
After logging in, open the settings page and switch the interface language to whichever you find most comfortable.
2. Configure Data Sources
In the "Datasource" menu, add Kafka, Elasticsearch, and Hive in turn.
- Kafka: enter the broker address.
- Hive: set the metastore address to thrift://hive-metastore:9083.
- ES: enter the node addresses.
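For a quick end-to-end test, the Kafka topic can carry flat JSON records whose keys match the virtual-table fields defined in the next step. A sketch (the field names are illustrative, mirroring the Hive table used in the verification section later; the producer command is only an example):

```shell
# One sample record; keys line up with the user_id/type/content columns.
msg='{"user_id":"u001","type":"login","content":"hello seatunnel"}'
echo "$msg"
# To publish it, pipe into your Kafka producer of choice, e.g.:
# echo "$msg" | kafka-console-producer.sh --bootstrap-server <broker>:9092 --topic test_topic
```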
3. Create Virtual Tables
In the "Virtual Tables" menu, pick a configured data source and define the field mappings. This step is what lets SeaTunnel recognize the target table's schema.
4. Configure the Sync Job
This is the core step. Create a new sync job and drag in the Source, FieldMapper, and Sink components.
- Source: choose the Kafka data source and specify the topic.
- FieldMapper: click the "Model" button to auto-match fields, or adjust field types manually.
- Sink: choose the Hive or ES data source.
Note: set the job mode when saving the task; otherwise you may hit the error job env can't be empty, please change config.
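Under the hood, the UI steps above produce a SeaTunnel job configuration. A rough sketch of the Kafka-to-Hive case is shown below; the broker address, topic name, and schema are illustrative, and the env block is what the "job env can't be empty" error is complaining about when missing:

```hocon
env {
  parallelism = 1
  job.mode = "STREAMING"   # required; its absence triggers the error above
}
source {
  Kafka {
    bootstrap.servers = "kafka:9092"   # illustrative broker address
    topic = "test_topic"
    format = json
    schema = {
      fields {
        user_id = string
        type = string
        content = string
      }
    }
    result_table_name = "kafka_source"
  }
}
sink {
  Hive {
    source_table_name = "kafka_source"
    table_name = "default.test_user_data3"
    metastore_uri = "thrift://hive-metastore:9083"
  }
}
```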
5. Start the Job
After configuration, save and start the job, then watch the execution logs to confirm data is flowing.
Verifying Data in Hive
After the job runs, verify the writes through the Beeline client.
docker exec -it hive-server2 beeline -u jdbc:hive2://localhost:10000
CREATE TABLE IF NOT EXISTS default.test_user_data3 (
user_id STRING,
type STRING,
content STRING
) STORED AS PARQUET;
SELECT * FROM default.test_user_data3 LIMIT 10;
Troubleshooting Common Issues
1. The Web Container Exits Immediately
Check whether the background-mode tokens (nohup and &) were removed from seatunnel-backend-daemon.sh, and inspect the /opt/seatunnel-web/logs/seatunnel.out log.
2. The Page Reports Secret Key Null
The error reads secret key byte array cannot be null or empty. Confirm that jwt.secretKey in application.yml is set and non-empty.
3. Hive Hostname Resolution Errors
The logs show Illegal character in hostname, caused by DNS resolution inside the containers. Add extra_hosts IP bindings to seatunnel-master and seatunnel-web in docker-compose.yml:
extra_hosts:
- "hive-metastore:172.16.0.3"
- "hive-metastore-db:172.16.0.2"
4. NoClassDefFoundError
The sync job fails with a missing class. Make sure the hive-exec, hive-metastore, libfb303, and related dependency jars are present under seatunnel/apache-seatunnel-2.3.11/lib/.
5. The Job Succeeds but No Data Appears
Check that the hive-warehouse volume is mounted in docker-compose.yml. Without the mount, data may be written to the container's ephemeral layer and be lost or invisible after a restart.
volumes:
- ./hive-warehouse:/opt/hive/data/warehouse
6. Finding Where a Job Ran
To locate the Worker node that executed a job, search the Master log:
grep "will be executed on worker" ./logs/master/seatunnel-engine-master.log