SeaTunnel 2.3.11 + Web 1.0.3 Docker Deployment in Practice: Syncing Kafka to Hive/ES | 极客日志
laoliangsh, published 2026/4/10, updated 2026/5/22
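This hands-on write-up covers deploying SeaTunnel 2.3.11 and SeaTunnel Web 1.0.3 in Docker containers, configuring data sources and a Kafka virtual table, and finally syncing data from Kafka to Hive and Elasticsearch. It walks through environment preparation, core configuration, job creation, and troubleshooting of common problems.
Environment preparation
Directory layout
Organizing the project files as follows makes later maintenance easier: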
seatunnel-docker/
├── docker-compose.yml                  # Compose file
├── hive/                               # Hive configuration
│   ├── hive-site.xml
│   └── lib/                            # Dependency jars
├── init-sql/                           # Initialization SQL
│   └── seatunnel_server_mysql.sql
├── seatunnel/                          # SeaTunnel server
│   ├── Dockerfile
│   └── apache-seatunnel-2.3.11/
└── seatunnel-web/                      # SeaTunnel Web
    ├── Dockerfile
    └── apache-seatunnel-web-1.0.3-bin/
Download and build
First get the SeaTunnel binary package and the SeaTunnel Web source code. The Web project has to be built locally.
wget https://dlcdn.apache.org/seatunnel/2.3.11/apache-seatunnel-2.3.11-bin.tar.gz
git clone https://github.com/apache/seatunnel-web.git
cd seatunnel-web
sh build.sh
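Preparing dependency jars
The Hive Metastore container needs the PostgreSQL JDBC driver at runtime, and SeaTunnel needs several core Hive jars to write to Hive. Put the driver in hive/lib/ (the compose file mounts it into the Metastore container) and the Hive jars in the lib directory of the SeaTunnel distribution.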
wget https://jdbc.postgresql.org/download/postgresql-42.5.1.jar
wget https://repo1.maven.org/maven2/org/apache/hive/hive-exec/3.1.3/hive-exec-3.1.3.jar
wget https://repo1.maven.org/maven2/org/apache/hive/hive-metastore/3.1.3/hive-metastore-3.1.3.jar
wget https://repo.maven.apache.org/maven2/org/apache/thrift/libfb303/0.9.3/libfb303-0.9.3.jar
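The placement described above can be scripted. The following is a minimal sketch, assuming the four jars were downloaded into the current directory; the function name place_jars is made up for illustration, and the target paths are the ones used by the directory layout and the compose file in this article.

```shell
# Sketch: copy the downloaded jars to where the rest of this setup expects them.
# Assumption: the jars sit in the current directory; $1 is the seatunnel-docker/ root.
place_jars() {
  local root="$1"
  local hive_lib="$root/hive/lib"
  local st_lib="$root/seatunnel/apache-seatunnel-2.3.11/lib"
  mkdir -p "$hive_lib" "$st_lib" || return 1
  # PostgreSQL driver: mounted into the hive-metastore container.
  cp postgresql-42.5.1.jar "$hive_lib/" || return 1
  # Hive client jars: needed by SeaTunnel when it writes to Hive.
  for jar in hive-exec-3.1.3.jar hive-metastore-3.1.3.jar libfb303-0.9.3.jar; do
    cp "$jar" "$st_lib/" || return 1
  done
}
```

Called as place_jars ./seatunnel-docker, it stops at the first jar that is missing, which makes an incomplete download easy to spot.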
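Docker deployment
docker-compose.yml
This file is the core of the whole setup. Every service gets a fixed IP address on a dedicated bridge network, which matters mainly for resolving the Hive Metastore hostname.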
version: '3.9'

networks:
  seatunnel-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.16.0.0/24

services:
  hive-metastore-db:
    image: postgres:15
    container_name: hive-metastore-db
    environment:
      POSTGRES_DB: metastore_db
      POSTGRES_USER: hive
      POSTGRES_PASSWORD: hive123456
    ports:
      - "5432:5432"
    volumes:
      - ./hive-metastore-db-data:/var/lib/postgresql/data
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.2
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U hive -d metastore_db"]
      interval: 5s
      timeout: 5s
      retries: 10

  hive-metastore:
    image: apache/hive:4.0.0
    container_name: hive-metastore
    depends_on:
      hive-metastore-db:
        condition: service_healthy
    environment:
      SERVICE_NAME: metastore
      DB_DRIVER: postgres
      SERVICE_OPTS: >-
        -Djavax.jdo.option.ConnectionDriverName=org.postgresql.Driver
        -Djavax.jdo.option.ConnectionURL=jdbc:postgresql://hive-metastore-db:5432/metastore_db
        -Djavax.jdo.option.ConnectionUserName=hive
        -Djavax.jdo.option.ConnectionPassword=hive123456
    ports:
      - "9083:9083"
    volumes:
      - ./hive/lib/postgresql-42.5.1.jar:/opt/hive/lib/postgresql-42.5.1.jar
      - ./hive/hive-site.xml:/opt/hive/conf/hive-site.xml
      - ./hive-warehouse:/opt/hive/data/warehouse
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.3

  hive-server2:
    image: apache/hive:4.0.0
    container_name: hive-server2
    depends_on:
      - hive-metastore
    environment:
      HIVE_SERVER2_THRIFT_PORT: 10000
      SERVICE_NAME: hiveserver2
      IS_RESUME: "true"
      SERVICE_OPTS: "-Dhive.metastore.uris=thrift://hive-metastore:9083"
    ports:
      - "10000:10000"
      - "10002:10002"
    volumes:
      - ./hive-warehouse:/opt/hive/data/warehouse
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.4

  mysql-seatunnel:
    image: mysql:8.0.42
    container_name: mysql-seatunnel
    environment:
      MYSQL_ROOT_PASSWORD: root123456
      MYSQL_DATABASE: seatunnel
      MYSQL_ROOT_HOST: '%'
    ports:
      - "3806:3306"
    volumes:
      - ./mysql_data:/var/lib/mysql
      - ./init-sql:/docker-entrypoint-initdb.d
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.5
    command: --default-authentication-plugin=mysql_native_password
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 10s
      timeout: 5s
      retries: 5

  seatunnel-master:
    build:
      context: ./seatunnel
      dockerfile: Dockerfile
    image: seatunnel:2.3.11
    container_name: seatunnel-master
    extra_hosts:
      - "hive-metastore:172.16.0.3"
      - "hive-metastore-db:172.16.0.2"
    environment:
      - SEATUNNEL_HOME=/opt/seatunnel
    command: >
      sh -c "cd /opt/seatunnel && exec bin/seatunnel-cluster.sh -r master"
    ports:
      - "5801:5801"
    volumes:
      - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
      - ./logs/master:/opt/seatunnel/logs
      - ./hive-warehouse:/opt/hive/data/warehouse
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.10

  seatunnel-worker1:
    image: seatunnel:2.3.11
    container_name: seatunnel-worker1
    extra_hosts:
      - "hive-metastore:172.16.0.3"
      - "hive-metastore-db:172.16.0.2"
    environment:
      - SEATUNNEL_HOME=/opt/seatunnel
    command: >
      sh -c "cd /opt/seatunnel && exec bin/seatunnel-cluster.sh -r worker"
    volumes:
      - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
      - ./logs/worker1:/opt/seatunnel/logs
      - ./hive-warehouse:/opt/hive/data/warehouse
    depends_on:
      - seatunnel-master
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.11

  seatunnel-worker2:
    image: seatunnel:2.3.11
    container_name: seatunnel-worker2
    extra_hosts:
      - "hive-metastore:172.16.0.3"
      - "hive-metastore-db:172.16.0.2"
    environment:
      - SEATUNNEL_HOME=/opt/seatunnel
    command: >
      sh -c "cd /opt/seatunnel && exec bin/seatunnel-cluster.sh -r worker"
    volumes:
      - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
      - ./logs/worker2:/opt/seatunnel/logs
      - ./hive-warehouse:/opt/hive/data/warehouse
    depends_on:
      - seatunnel-master
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.12

  seatunnel-web:
    build:
      context: ./seatunnel-web
      dockerfile: Dockerfile
    image: seatunnel-web:1.0.3
    container_name: seatunnel-web
    extra_hosts:
      - "hive-metastore:172.16.0.3"
      - "hive-metastore-db:172.16.0.2"
    environment:
      - SEATUNNEL_HOME=/opt/seatunnel
      - SEATUNNEL_WEB_HOME=/opt/seatunnel-web
    ports:
      - "8801:8801"
    volumes:
      - ./seatunnel/apache-seatunnel-2.3.11/:/opt/seatunnel/
      - ./seatunnel-web/apache-seatunnel-web-1.0.3-bin/:/opt/seatunnel-web/
      - ./logs/web:/opt/seatunnel-web/logs
      - ./hive-warehouse:/opt/hive/data/warehouse
    depends_on:
      - seatunnel-master
    networks:
      seatunnel-network:
        ipv4_address: 172.16.0.13
SeaTunnel server configuration
Dockerfile
FROM eclipse-temurin:8-jdk-ubi9-minimal
WORKDIR /opt/seatunnel/
ENV SEATUNNEL_HOME=/opt/seatunnel
ENV PATH=$PATH:$SEATUNNEL_HOME/bin
EXPOSE 5801
CMD ["sh", "bin/seatunnel-cluster.sh", "-r", "master"]
Hazelcast cluster configuration
SeaTunnel uses Hazelcast for distributed communication. The master and worker configurations differ slightly.
Client configuration (hazelcast-client.yaml)
hazelcast-client:
  cluster-name: seatunnel
  properties:
    hazelcast.logging.type: log4j2
  connection-strategy:
    connection-retry:
      cluster-connect-timeout-millis: 3000
  network:
    cluster-members:
      - seatunnel-master:5801
Master node configuration (hazelcast-master.yaml)
hazelcast:
  cluster-name: seatunnel
  network:
    rest-api:
      enabled: false
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      tcp-ip:
        enabled: true
        member-list:
          - seatunnel-master:5801
          - seatunnel-worker1:5802
          - seatunnel-worker2:5802
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 2
    hazelcast.max.no.heartbeat.seconds: 180
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 10
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 100
Worker node configuration (hazelcast-worker.yaml)
hazelcast:
  cluster-name: seatunnel
  network:
    join:
      tcp-ip:
        enabled: true
        member-list:
          - seatunnel-master:5801
          - seatunnel-worker1:5802
          - seatunnel-worker2:5802
    port:
      auto-increment: false
      port: 5802
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 2
    hazelcast.max.no.heartbeat.seconds: 180
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 10
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 100
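Installing connector dependencies
If the drop-down list is empty when you configure a Source component in the Web UI, the corresponding connector plugins are missing. Run the following on the server side to install them automatically.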
cd seatunnel/apache-seatunnel-2.3.11/
sh bin/install-plugin.sh
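To keep the download small, the connector list can be trimmed before running the installer. This sketch assumes, as in the SeaTunnel binary distributions I have seen, that install-plugin.sh reads the connector names from config/plugin_config under the distribution directory; check the script in your copy before relying on it. The helper name write_plugin_config is made up, and the three connectors are simply the ones this article uses.

```shell
# Sketch: restrict the plugin list to the connectors used in this article.
# Assumption: install-plugin.sh reads <SEATUNNEL_HOME>/config/plugin_config.
# $1 is the SeaTunnel distribution directory.
write_plugin_config() {
  local home="$1"
  mkdir -p "$home/config" || return 1
  # Keep a copy of the original list, if there is one.
  if [ -f "$home/config/plugin_config" ]; then
    cp "$home/config/plugin_config" "$home/config/plugin_config.bak" || return 1
  fi
  cat > "$home/config/plugin_config" <<'EOF'
--connectors-v2--
connector-kafka
connector-hive
connector-elasticsearch
--end--
EOF
}
```

Run write_plugin_config seatunnel/apache-seatunnel-2.3.11 first, then the install command shown above.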
SeaTunnel Web configuration
Dockerfile
FROM eclipse-temurin:8-jdk-ubi9-minimal
WORKDIR /opt/seatunnel-web/
ENV SEATUNNEL_WEB_HOME=/opt/seatunnel-web
ENV SEATUNNEL_HOME=/opt/seatunnel
EXPOSE 8801
CMD ["sh", "bin/seatunnel-backend-daemon.sh", "start"]
application.yml
server:
  port: 8801

spring:
  main:
    allow-circular-references: true
  application:
    name: seatunnel
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://mysql-seatunnel:3306/seatunnel?useSSL=false&useUnicode=true&characterEncoding=utf-8&allowMultiQueries=true&allowPublicKeyRetrieval=true
    username: root
    password: root123456

jwt:
  expireTime: 86400
  secretKey: a3f5c8d2e1b4098765432109abcdef1234567890abcdef
  algorithm: HS256
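Adjusting the startup script
If the container exits right after starting, the usual cause is that the script launches the Java process in the background, so the container's main process ends immediately. Edit seatunnel-web/apache-seatunnel-web-1.0.3-bin/bin/seatunnel-backend-daemon.sh, remove nohup and the trailing &, and run the process in the foreground with its output redirected to the log file:
$JAVA_HOME/bin/java $JAVA_OPTS \
  -cp "$CLASSPATH" $SPRING_OPTS \
  org.apache.seatunnel.app.SeatunnelApplication >> "${LOGDIR}/seatunnel.out" 2>&1
echo "seatunnel-web started"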
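Plugin mapping
Copy the plugin mapping file from the server distribution into the Web configuration directory. In testing this step could sometimes be skipped, but doing it is the safer choice.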
cd seatunnel-docker
cp seatunnel/apache-seatunnel-2.3.11/connectors/plugin-mapping.properties seatunnel-web/apache-seatunnel-web-1.0.3-bin/conf/plugin-mapping.properties
Other component configuration
Hive configuration (hive/hive-site.xml)
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hive-metastore:9083</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/opt/hive/data/warehouse</value>
  </property>
  <property>
    <name>metastore.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
</configuration>
Make sure hive/lib contains postgresql-42.5.1.jar.
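MySQL initialization
Copy the SQL script shipped with SeaTunnel Web into the mounted directory so the MySQL container creates the schema automatically on its first start.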
cd seatunnel-docker
cp seatunnel-web/apache-seatunnel-web-1.0.3-bin/script/seatunnel_server_mysql.sql init-sql/seatunnel_server_mysql.sql
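Before starting the stack it can help to confirm that every file the compose file mounts or builds from is in place. The following is a small sketch; check_layout is a made-up helper, and the path list is taken from the directory layout and volume mounts used in this article.

```shell
# Sketch: report files that the compose setup expects but that are missing.
# $1 is the seatunnel-docker/ root. Returns non-zero if anything is missing.
check_layout() {
  local root="$1" missing=0 f
  for f in \
    docker-compose.yml \
    hive/hive-site.xml \
    hive/lib/postgresql-42.5.1.jar \
    init-sql/seatunnel_server_mysql.sql \
    seatunnel/Dockerfile \
    seatunnel-web/Dockerfile
  do
    if [ ! -e "$root/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_layout . run inside seatunnel-docker/ prints one line per missing file and stays silent when the layout is complete.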
Starting the services
docker compose up -d --build
open http://localhost:8801
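The Web backend needs some time after the containers are up before it answers requests. A generic retry helper, sketched below, can wait for it instead of refreshing the browser by hand. The function name retry and the attempt counts are arbitrary choices for illustration.

```shell
# Sketch: retry MAX DELAY CMD... runs CMD until it succeeds,
# at most MAX times, sleeping DELAY seconds between attempts.
retry() {
  local max="$1" delay="$2" i=1
  shift 2
  while ! "$@"; do
    if [ "$i" -ge "$max" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage (not executed here): wait up to 30 x 5 s for the Web port to accept HTTP.
#   retry 30 5 curl -sS -o /dev/null http://localhost:8801
```

curl is called without -f on purpose, so any HTTP response, even a redirect or an error page, counts as "the server is listening".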
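Running the example
Login and language
After the first login, set the interface language in the settings to avoid garbled text in later steps.
Configuring data sources
Data sources must be registered before a job can use them. This example uses Kafka, Elasticsearch, and Hive.
Configuring virtual tables
Open the "Virtual Tables" menu, click "Create", pick a configured data source, enter the table name, and save. The virtual table is what lets SeaTunnel see the source metadata.
Configuring sync jobs
Kafka -> Hive
Go to "Tasks" → "Sync Task Definition" and click "Create".
Drag the Source, FieldMapper, and Sink components onto the canvas.
Source: select the Kafka data source.
FieldMapper: configure the field mappings.
Sink: select the Hive data source.
Set the job mode before saving; otherwise saving fails with the error job env can't be empty.
Kafka -> Elasticsearch
The flow is the same; just switch the Sink component to Elasticsearch.
General workflow
Open the task management page and create a new sync task.
Drag the Source, FieldMapper, and Sink components to build the pipeline.
Double-click Source and enter the Kafka settings.
Double-click FieldMapper and confirm the field mappings.
Double-click Sink and configure the target (Hive or ES).
Save and start the task.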
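For comparison, the same Kafka -> Hive pipeline can be written as a plain SeaTunnel job file and submitted from the command line. The sketch below writes such a file. Several values are assumptions rather than anything from this setup: the broker address kafka:9092, the topic test_user_data, and the consumer group name are placeholders, since the compose file in this article does not include a Kafka service. The field names and the Hive table match the table created later in this article. The FieldMapper step is left out because the names already match.

```shell
# Sketch: write a SeaTunnel job file equivalent to the Kafka -> Hive job built in the UI.
# Assumptions (placeholders): broker kafka:9092, topic test_user_data, group seatunnel-demo.
# $1 is the path of the job file to create.
write_job_conf() {
  cat > "$1" <<'EOF'
env {
  parallelism = 1
  # The job mode must be set; STREAMING is the other accepted value.
  job.mode = "BATCH"
}

source {
  Kafka {
    bootstrap.servers = "kafka:9092"
    topic = "test_user_data"
    consumer.group = "seatunnel-demo"
    start_mode = "earliest"
    format = json
    schema = {
      fields {
        user_id = string
        type = string
        content = string
      }
    }
  }
}

sink {
  Hive {
    table_name = "default.test_user_data3"
    metastore_uri = "thrift://hive-metastore:9083"
  }
}
EOF
}

# Usage (not executed here), from inside the SeaTunnel distribution directory:
#   write_job_conf /tmp/kafka_to_hive.conf
#   sh bin/seatunnel.sh --config /tmp/kafka_to_hive.conf
```

Replacing the sink block with an Elasticsearch sink (its hosts and index options) gives the Kafka -> Elasticsearch variant.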
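Hive operations
Creating the table
Run the SQL through Beeline inside the HiveServer2 container. The two commands below are alternatives (delimited text or Parquet storage); both use IF NOT EXISTS on the same table name, so only the first one you run takes effect.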
docker exec -it hive-server2 beeline -u jdbc:hive2://localhost:10000 -e "CREATE TABLE IF NOT EXISTS default.test_user_data3 (user_id STRING, type STRING, content STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;"
docker exec -it hive-server2 beeline -u jdbc:hive2://localhost:10000 -e "CREATE TABLE IF NOT EXISTS default.test_user_data3 (user_id STRING, type STRING, content STRING) STORED AS PARQUET;"
Inspecting and querying
docker exec -it hive-server2 beeline -u jdbc:hive2://localhost:10000 -e "SHOW TABLES IN default; DESCRIBE default.test_user_data3;"
docker exec -it hive-server2 beeline -u jdbc:hive2://localhost:10000 -e "SELECT * FROM default.test_user_data3 LIMIT 10;"
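To have something to query, the Kafka topic needs a few messages whose fields match the table columns. The generator below is a sketch; gen_messages is a made-up helper, and the values it emits are arbitrary test data.

```shell
# Sketch: print N JSON test messages with the fields user_id, type, content,
# matching the columns of default.test_user_data3.
gen_messages() {
  local n="$1" i=1
  while [ "$i" -le "$n" ]; do
    printf '{"user_id":"u%03d","type":"click","content":"message %d"}\n' "$i" "$i"
    i=$((i + 1))
  done
}

# Usage (not executed here). Broker address and topic are placeholders:
#   gen_messages 10 | kafka-console-producer.sh --bootstrap-server kafka:9092 --topic test_user_data
```

After the sync job has run, the SELECT above should return the generated rows.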
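Troubleshooting
Container exits right after starting
If the seatunnel-web container exits immediately after starting, check seatunnel-backend-daemon.sh. Make sure nohup and the trailing & have been removed so that the Java process runs in the foreground with its output redirected.
Login fails with a null secret key
If the page reports secret key byte array cannot be null or empty, jwt.secretKey in application.yml is usually missing or empty. Set a key that is at least 32 characters long.
jwt:
  secretKey: a3f5c8d2e1b4098765432109abcdef1234567890abcdef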
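Rather than reusing the sample key, a fresh random one can be generated. This sketch uses only standard Unix tools; both function names are made up for illustration, and the 32-character minimum is the rule stated above.

```shell
# Sketch: generate a random 64-character hex string (32 random bytes).
new_jwt_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Sketch: succeed only if the given key has at least 32 characters.
check_jwt_secret() {
  local key="$1"
  [ "${#key}" -ge 32 ]
}

# Usage (not executed here):
#   key=$(new_jwt_secret) && check_jwt_secret "$key" && echo "secretKey: $key"
```

Paste the printed value into jwt.secretKey in application.yml and restart the seatunnel-web container.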
Hive hostname resolution error
If the log shows Illegal character in hostname, hostname resolution inside Docker is failing. Bind the IP addresses explicitly through extra_hosts in docker-compose.yml.
extra_hosts:
  - "hive-metastore:172.16.0.3"
  - "hive-metastore-db:172.16.0.2"
NoClassDefFoundError
If a Hive sync fails with a missing class definition, check that the required Hive jars are in seatunnel/apache-seatunnel-2.3.11/lib:
hive-exec-3.1.3.jar
hive-metastore-3.1.3.jar
libfb303-0.9.3.jar
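Job succeeds but no data appears
If the job reports success but Hive shows no data, check that docker-compose.yml mounts the same host hive-warehouse directory into both the SeaTunnel containers and the Hive containers.
volumes:
  - ./hive-warehouse:/opt/hive/data/warehouse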
Viewing job logs
The master node's log shows on which worker each task is executed.
./logs/master/seatunnel-engine-master.log
Task [TaskGroupLocation{jobId=1080750681855361026, pipelineId=1, taskGroupId=2}] will be executed on worker [[seatunnel-worker2] :5801], slotID [2] ...
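When the log is long, the placement lines can be reduced to a compact form. The sed filter below is a sketch written against the sample line shown above, not against a documented log format, so it may need adjusting for other SeaTunnel versions; task_placement is a made-up name.

```shell
# Sketch: read master log lines on stdin and print
# "<jobId> <pipelineId> <taskGroupId> <worker>:<port>" for each placement line.
# The pattern tolerates an optional space before the port, as in the sample above.
task_placement() {
  sed -nE 's/.*jobId=([0-9]+), pipelineId=([0-9]+), taskGroupId=([0-9]+)\}\] will be executed on worker \[\[([^]]+)\] ?:([0-9]+)\].*/\1 \2 \3 \4:\5/p'
}

# Usage (not executed here):
#   task_placement < ./logs/master/seatunnel-engine-master.log
```

Lines that do not describe a task placement are dropped, so the output lists only where each task group ran.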