Running DataX and datax-web locally on Windows 10


Setting up DataX

Prerequisites

Download the prebuilt DataX package:

https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz

Go to the download directory and open PowerShell there.

Run the extraction command:

tar -xzf datax.tar.gz 

This produces a datax directory (containing bin, job, and so on).

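Configuring the Windows command line for UTF-8

Without this setting, Chinese text in the output of the sync command is garbled.

Either of the following two options makes the log readable:

  1. Change the encoding for the current window only
    Open cmd and first run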
chcp 65001 

Then run

python datax.py C:\Users\longz\Downloads\1.json 

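The log is then displayed as UTF-8 and Chinese text renders correctly.

  2. Make the change permanent (optional)
    On Windows 10/11 the system-wide console encoding can be switched to UTF-8:
    • Settings → Time & Language → Region → Related settings → Administrative language settings → Change system locale → tick "Beta: Use Unicode UTF-8 for worldwide language support" → restart.
      After that, every cmd/PowerShell window defaults to UTF-8 and no longer shows boxes or garbled text.

Alternatively, open the same dialog directly from Control Panel,

tick the same checkbox,

and restart.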
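The garbling described above can be reproduced outside DataX. The following is a minimal sketch, assuming the program writes UTF-8 bytes and the console reads them with code page 936 (GBK), which is the usual default on Simplified Chinese Windows. The helper name show_with_codepage is made up for this illustration.

```python
# Illustration only: UTF-8 bytes read back with a different code page.
# Assumption: the console default is code page 936 (GBK); chcp 65001 switches it to UTF-8.


def show_with_codepage(text: str, written_as: str, read_as: str) -> str:
    """Encode text the way the program writes it, then decode it the way the console reads it."""
    raw = text.encode(written_as)
    # errors="replace" keeps the demo from raising on byte sequences invalid in read_as.
    return raw.decode(read_as, errors="replace")


if __name__ == "__main__":
    message = "任务启动时刻"  # a line DataX prints in its final summary
    garbled = show_with_codepage(message, "utf-8", "gbk")
    readable = show_with_codepage(message, "utf-8", "utf-8")
    # Mismatched encodings change the text; matching encodings preserve it.
    print("unchanged with GBK console:  ", garbled == message)
    print("unchanged with UTF-8 console:", readable == message)
```

Running chcp 65001 corresponds to the second call: writer and reader agree on the encoding, so the text survives intact.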

Preparing test data

Create two databases (test1 and test2 here), each with a table of the same name, user:

CREATE TABLE IF NOT EXISTS `user` (
    `id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    `username` VARCHAR(50) NOT NULL,
    `password` VARCHAR(100) NOT NULL,
    `email` VARCHAR(100) NOT NULL,
    `created_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

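Add test data to the test1 database.

I use a stored procedure here. Calling it, for example with CALL GenerateUserTestData(100); (the run logged further down synchronized 100 rows), generates the test data.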

DELIMITER $$

CREATE PROCEDURE GenerateUserTestData(IN num_records INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE v_username VARCHAR(50);
    DECLARE v_password VARCHAR(100);
    DECLARE v_email VARCHAR(100);

    WHILE i <= num_records DO
        -- Random username, e.g. User_12345
        SET v_username = CONCAT('User_', FLOOR(10000 + RAND() * 90000));
        -- Random password (simplified for the example; encrypt it in real use)
        SET v_password = CONCAT('Pass_', FLOOR(100000 + RAND() * 900000));
        -- Random email of the form userNNNNN@example.com
        SET v_email = CONCAT('user', FLOOR(10000 + RAND() * 90000), '@example.com');

        INSERT INTO `user` (username, password, email)
        VALUES (v_username, v_password, v_email);

        -- Commit every 1000 rows to keep transactions from growing too large
        IF i % 1000 = 0 THEN
            COMMIT;
        END IF;

        SET i = i + 1;
    END WHILE;
END$$

DELIMITER ;
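If creating a stored procedure is inconvenient, rows of the same shape can be produced on the client side. This is only a sketch mirroring the procedure's logic; the function names generate_users and to_insert_statements are invented here, and the statements would still need to be run against test1 with a MySQL client of your choice.

```python
import random


def generate_users(num_records, seed=None):
    """Return (username, password, email) tuples shaped like the procedure's rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_records):
        # FLOOR(10000 + RAND() * 90000) yields 10000..99999
        username = f"User_{rng.randint(10000, 99999)}"
        # FLOOR(100000 + RAND() * 900000) yields 100000..999999
        password = f"Pass_{rng.randint(100000, 999999)}"
        email = f"user{rng.randint(10000, 99999)}@example.com"
        rows.append((username, password, email))
    return rows


def to_insert_statements(rows):
    """Render rows as INSERT statements.

    Plain quoting is acceptable only because every value is built from fixed
    prefixes and digits; use parameterized queries for real input.
    """
    return [
        f"INSERT INTO `user` (username, password, email) VALUES ('{u}', '{p}', '{e}');"
        for u, p, e in rows
    ]


if __name__ == "__main__":
    for statement in to_insert_statements(generate_users(3, seed=42)):
        print(statement)
```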

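Once the data has been generated, the goal is to use DataX to synchronize the data in test1 into test2.

Create the DataX job file (saved here as job\test-user.json):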

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://localhost:3306/test1"],
                "table": ["user"]
              }
            ],
            "column": ["*"],
            "splitPk": ""
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://localhost:3306/test2",
                "table": ["user"]
              }
            ],
            "column": ["*"],
            "writeMode": "insert"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
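DataX warns at run time when column is set to "*" (see the WARN lines in the log further down), because later schema changes can break the job. As a hedged sketch, the same job file can be generated with an explicit column list. The function name build_job and the output file name are illustrative; the column names come from the table definition above.

```python
import json


def build_job(src_url, dst_url, table, columns, username, password, channel=1):
    """Build a mysqlreader -> mysqlwriter job dict with the same layout as the job file above."""
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": username,
            "password": password,
            # The reader's jdbcUrl is a list, as in the job file above.
            "connection": [{"jdbcUrl": [src_url], "table": [table]}],
            "column": list(columns),
            "splitPk": "",
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "username": username,
            "password": password,
            # The writer's jdbcUrl is a plain string, as in the job file above.
            "connection": [{"jdbcUrl": dst_url, "table": [table]}],
            "column": list(columns),
            "writeMode": "insert",
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": channel}},
        }
    }


if __name__ == "__main__":
    job = build_job(
        "jdbc:mysql://localhost:3306/test1",
        "jdbc:mysql://localhost:3306/test2",
        "user",
        ["id", "username", "password", "email", "created_at"],
        "root",
        "root",
    )
    # Hypothetical output name; place the file wherever your job files live.
    with open("test-user.json", "w", encoding="utf-8") as handle:
        json.dump(job, handle, indent=2)
```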

Run the sync command from the datax\bin directory:

python datax.py D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json 
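When the command has to be launched from another script, a thin wrapper is enough. The sketch below only passes the job file as the single positional argument, exactly like the command above. The names build_command and run_job are invented, and treating a non-zero exit code as failure is an assumption about datax.py rather than something shown in this post.

```python
import subprocess
import sys
from pathlib import Path


def build_command(datax_py, job_file, python_exe=sys.executable):
    """Return the argument list equivalent to: python datax.py <job file>."""
    return [python_exe, str(datax_py), str(job_file)]


def run_job(datax_py, job_file):
    """Run one DataX job and return the process exit code."""
    for path in (datax_py, job_file):
        if not Path(path).is_file():
            raise FileNotFoundError(f"not found: {path}")
    # Output is inherited, so the DataX log appears in the current console.
    completed = subprocess.run(build_command(datax_py, job_file))
    return completed.returncode


if __name__ == "__main__":
    # Usage: python run_datax.py <path to datax.py> <path to job json>
    if len(sys.argv) == 3:
        sys.exit(run_job(sys.argv[1], sys.argv[2]))
    print("usage: run_datax.py DATAX_PY JOB_JSON")
```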

Output of the completed run (some messages are in Chinese because the JVM locale is zh_CN):

D:\Information_Technology\worksapce_tool\datax-new\datax\bin>python datax.py D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2025-08-08 14:00:36.524 [main] INFO MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2025-08-08 14:00:36.526 [main] INFO MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2025-08-08 14:00:36.536 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2025-08-08 14:00:36.541 [main] INFO Engine - the machine info =>

    osInfo: Windows 10 amd64 10.0
    jvmInfo: Oracle Corporation 1.8 25.341-b10
    cpu num: 16

    totalPhysicalMemory: -0.00G
    freePhysicalMemory: -0.00G
    maxFileDescriptorCount: -1
    currentOpenFileDescriptorCount: -1

    GC Names [PS MarkSweep, PS Scavenge]

    MEMORY_NAME            | allocation_size | init_size
    PS Eden Space          | 256.00MB        | 256.00MB
    Code Cache             | 240.00MB        | 2.44MB
    Compressed Class Space | 1,024.00MB      | 0.00MB
    PS Survivor Space      | 42.50MB         | 42.50MB
    PS Old Gen             | 683.00MB        | 683.00MB
    Metaspace              | -0.00MB         | 0.00MB

2025-08-08 14:00:36.552 [main] INFO Engine - {"content":[{"reader":{"name":"mysqlreader", "parameter":{"username":"root", "password":"****", "connection":[{"jdbcUrl":["jdbc:mysql://localhost:3306/test1"], "table":["user"]}], "column":["*"], "splitPk":""}}, "writer":{"name":"mysqlwriter", "parameter":{"username":"root", "password":"****", "connection":[{"jdbcUrl":"jdbc:mysql://localhost:3306/test2", "table":["user"]}], "column":["*"], "writeMode":"insert"}}}], "setting":{"speed":{"channel":1 }}}
2025-08-08 14:00:36.565 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2025-08-08 14:00:36.566 [main] INFO JobContainer - DataX jobContainer starts job.
2025-08-08 14:00:36.566 [main] INFO JobContainer - Set jobId = 0
Fri Aug 08 14:00:36 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:42.909 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2025-08-08 14:00:42.910 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:43.091 [job-0] INFO OriginalConfPretreatmentUtil - table:[user] all columns:[ id,username,password,email,created_at ].
2025-08-08 14:00:43.091 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2025-08-08 14:00:43.092 [job-0] INFO OriginalConfPretreatmentUtil - Write data [ insert INTO %s (id,username,password,email,created_at) VALUES(?,?,?,?,?)], which jdbcUrl like:[jdbc:mysql://localhost:3306/test2?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false]
2025-08-08 14:00:43.093 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2025-08-08 14:00:43.093 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2025-08-08 14:00:43.093 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2025-08-08 14:00:43.094 [job-0] INFO JobContainer - jobContainer starts to do split ...
2025-08-08 14:00:43.094 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
2025-08-08 14:00:43.097 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2025-08-08 14:00:43.097 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2025-08-08 14:00:43.115 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2025-08-08 14:00:43.117 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2025-08-08 14:00:43.119 [job-0] INFO JobContainer - Running by standalone Mode.
2025-08-08 14:00:43.124 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2025-08-08 14:00:43.126 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2025-08-08 14:00:43.127 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2025-08-08 14:00:43.136 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2025-08-08 14:00:43.140 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from user ] jdbcUrl:[jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:43.179 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select * from user ] jdbcUrl:[jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2025-08-08 14:00:43.456 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[321]ms
2025-08-08 14:00:43.456 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2025-08-08 14:00:53.142 [job-0] INFO StandAloneJobContainerCommunicator - Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-08-08 14:00:53.142 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2025-08-08 14:00:53.142 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2025-08-08 14:00:53.143 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do post work.
2025-08-08 14:00:53.143 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2025-08-08 14:00:53.144 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: D:\Information_Technology\worksapce_tool\datax-new\datax\hook
2025-08-08 14:00:53.144 [job-0] INFO JobContainer -
    [total cpu info] =>
        averageCpu | maxDeltaCpu | minDeltaCpu
        -1.00%     | -1.00%      | -1.00%

    [total gc info] =>
        NAME         | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
        PS MarkSweep | 1            | 1               | 1               | 0.016s      | 0.016s         | 0.016s
        PS Scavenge  | 1            | 1               | 1               | 0.008s      | 0.008s         | 0.008s

2025-08-08 14:00:53.144 [job-0] INFO JobContainer - PerfTrace not enable!
2025-08-08 14:00:53.145 [job-0] INFO StandAloneJobContainerCommunicator - Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-08-08 14:00:53.146 [job-0] INFO JobContainer -
任务启动时刻 : 2025-08-08 14:00:36
任务结束时刻 : 2025-08-08 14:00:53
任务总计耗时 : 16s
任务平均流量 : 519B/s
记录写入速度 : 10rec/s
读出记录总数 : 100
读写失败总数 : 0
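To check a run automatically instead of reading the whole log, the "Total ... records" summary line can be extracted. This is a small sketch; the regular expression is written against the line format seen in the log above and may need adjusting for other DataX versions, and the name parse_summary is made up here.

```python
import re

# Matches the StandAloneJobContainerCommunicator summary line shown in the log above.
SUMMARY = re.compile(
    r"Total (\d+) records, (\d+) bytes \| Speed ([^,]+), (\d+) records/s "
    r"\| Error (\d+) records, (\d+) bytes"
)


def parse_summary(log_text):
    """Return the last summary line of a DataX log as a dict, or None if there is none."""
    matches = SUMMARY.findall(log_text)
    if not matches:
        return None
    total, total_bytes, speed, rate, err_records, err_bytes = matches[-1]
    return {
        "records": int(total),
        "bytes": int(total_bytes),
        "speed": speed,
        "records_per_second": int(rate),
        "error_records": int(err_records),
        "error_bytes": int(err_bytes),
    }


if __name__ == "__main__":
    line = (
        "2025-08-08 14:00:53.145 [job-0] INFO StandAloneJobContainerCommunicator - "
        "Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | "
        "Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s"
    )
    print(parse_summary(line))
```

For the run above this reports 100 records and 0 error records, which matches the final statistics block.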

Setting up datax-web

Download the source code

git clone https://github.com/WeiYe-Jing/datax-web.git

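Create the database

Run the datax_web.sql file under bin/db (note that the update statements for older versions specify a database name).

Modify the project configuration

1. Edit resources/application.yml under datax_admin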

# data source
datasource:
  username: root
  password: root
  url: jdbc:mysql://localhost:3306/datax_web?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8
  driver-class-name: com.mysql.jdbc.Driver

Adjust the data source settings; only MySQL is supported at present.

# mybatis-plus SQL logging
logging:
  level:
    com.wugui.datax.admin.mapper: error
  path: ./data/applogs/admin

Adjust the log path (path).

# datax-web email
mail:
  host: smtp.qq.com
  port: 25
  username: [email protected]
  password: xxx
  properties:
    mail:
      smtp:
        auth: true
        starttls:
          enable: true
          required: true
      socketFactory:
        class: javax.net.ssl.SSLSocketFactory

Adjust the mail settings (leave them unchanged if you do not need email).

2. Edit resources/application.yml under datax_executor

# log config
logging:
  config: classpath:logback.xml
  path: ./data/applogs/executor/jobhandler

Adjust the log path (path).

datax:
  job:
    admin:
      ### datax-web admin address
      addresses: http://127.0.0.1:8080
    executor:
      appname: datax-executor
      ip:
      port: 9999
      ### job log path
      logpath: ./data/applogs/executor/jobhandler
      ### job log retention days
      logretentiondays: 30
  executor:
    jsonpath: D:\\Information_Technology\\worksapce_tool\\tmp\\executor\\json\\
  pypath: D:\\Information_Technology\\worksapce_tool\\datax-new\\datax\\bin\\datax.py

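Adjust the datax.job settings:

  • admin.addresses: the address where datax_admin is deployed. If the scheduling center is deployed as a cluster with several addresses, separate them with commas. The executor uses this address for "executor heartbeat registration" and "task result callbacks".
  • executor.appname: the executor AppName, the unique identifier of each executor cluster and the basis for grouping executor heartbeat registrations.
  • executor.ip: empty by default, which means the IP is detected automatically. With multiple network cards you can set a specific IP by hand. This IP is not bound to the host and is used only for communication; the address is used for "executor registration" and for "the scheduling center requesting and triggering tasks".
  • executor.port: the executor server port, 9999 by default. When several executors run on one machine, give each a different port.
  • executor.logpath: the disk path where executor run logs are stored. Read and write permission on this path is required.
  • executor.logretentiondays: the number of days executor log files are kept; expired logs are cleaned up automatically. The limit takes effect only when the value is 3 or greater; otherwise (for example -1) automatic cleanup is disabled.
  • executor.jsonpath: the path where temporary DataX json files are saved.
  • pypath: the path of the DataX launch script, for example xxx/datax/bin/datax.py (this is the launch script created in the DataX setup above).
    If the DataX environment variable (DATAX_HOME) is configured on the system, logpath, jsonpath, and pypath can be omitted; log files and temporary json files are then stored under the path given by the environment variable.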
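Wrong paths in this file are easy to miss until the first task fails, so a quick preflight check can help. The sketch below is not part of datax-web; check_executor_paths and retention_cleanup_enabled are hypothetical helpers that simply encode the rules listed above (pypath must be an existing script, jsonpath a writable directory, retention active only from 3 days).

```python
import os
from pathlib import Path


def retention_cleanup_enabled(days):
    """Automatic log cleanup is active only when the configured value is 3 or greater."""
    return days >= 3


def check_executor_paths(pypath, jsonpath):
    """Return a list of human-readable problems; an empty list means the paths look fine."""
    problems = []
    if not Path(pypath).is_file():
        problems.append(f"pypath does not point to a file: {pypath}")
    json_dir = Path(jsonpath)
    if not json_dir.is_dir():
        problems.append(f"jsonpath is not an existing directory: {jsonpath}")
    elif not os.access(json_dir, os.W_OK):
        problems.append(f"jsonpath is not writable: {jsonpath}")
    return problems


if __name__ == "__main__":
    # Placeholder values; substitute the pypath and jsonpath from your application.yml.
    for problem in check_executor_paths("datax/bin/datax.py", "tmp/executor/json"):
        print("WARNING:", problem)
    print("log cleanup enabled for 30 days:", retention_cleanup_enabled(30))
```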

Start the project

Local IDEA development environment
  1. Run DataXAdminApplication under datax_admin
  2. Run DataXExecutorApplication under datax_executor


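After admin starts successfully, the log prints three addresses: two API documentation addresses and one front-end page address.

Startup succeeded

After a successful start, open the page (default administrator user name: admin, password: 123456):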
http://localhost:8080/index.html#/dashboard
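Before opening the browser, it can be useful to confirm that both processes are listening. A minimal sketch, assuming the ports from the configuration above (8080 for admin, 9999 for the executor); port_open is an illustrative helper and checks only that a TCP connection can be made, not that the application is healthy.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the application.yml settings shown earlier.
    for name, port in (("datax_admin", 8080), ("datax_executor", 9999)):
        state = "listening" if port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```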


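Hands-on walkthrough

Project management: add a project
Configure the data sources
Create an executor
Build a task to generate the sync JSON

Click Build; the DataX sync script is generated automatically. Copy the JSON and go back to the menu.

Create the task

Create a task, fill in the form, paste in the JSON, and change the value of byte to 0.

The run succeeds.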
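The byte edit can also be applied programmatically before pasting. This sketch assumes the generated JSON keeps the value at job.setting.speed.byte, which is how the speed block is laid out in the hand-written job earlier in this post; verify the location against the JSON that your datax-web version actually generates. The name set_speed_byte is made up.

```python
import json


def set_speed_byte(job_json, value=0):
    """Return the job JSON text with job.setting.speed.byte set to value.

    Assumption: the generated job has a job.setting.speed object; a KeyError
    is raised if it does not, rather than silently creating one.
    """
    job = json.loads(job_json)
    job["job"]["setting"]["speed"]["byte"] = value
    return json.dumps(job, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical sample standing in for the JSON copied from the Build page.
    sample = '{"job": {"setting": {"speed": {"channel": 3, "byte": 1048576}}, "content": []}}'
    print(set_speed_byte(sample))
```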
