Hive Common Faults: A Multi-Case FAQ Handbook -- Project Summary (Handbook 1)

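Architecture overview

Name	Description
HiveServer	Multiple HiveServer instances can be deployed in one cluster for load sharing. HiveServer exposes the Hive database service: it compiles the HQL statements submitted by users and parses them into the corresponding Yarn tasks or HDFS operations, which extract, transform, and analyze the data.
MetaStore	Multiple MetaStore instances can be deployed in one cluster for load sharing. MetaStore provides the Hive metadata service and is responsible for reading, writing, maintaining, and modifying the structure and property information of Hive tables. It exposes a Thrift interface through which MetaStore clients such as HiveServer, Impala, and WebHCat access and operate on metadata.
WebHCat	Multiple WebHCat instances can be deployed in one cluster for load sharing. WebHCat provides a REST interface through which Hive commands can be run and MapReduce jobs submitted.
Hive client	Includes the interactive command line Beeline, the JDBC driver for JDBC applications, the Python driver for Python applications, and the HCatalog JAR packages for MapReduce.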

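[1] Common fault cases related to parameters and configuration:

Running a set command reports "Cannot modify xxx at runtime"

Symptom

Running a set command returns the following error: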

0: jdbc:hive2://xxx.xxx.xxx.xxx:21066/> set mapred.job.queue.name=QueueA;
Error: Error while processing statement: Cannot modify mapred.job.queue.name at runtime. It is not in list of params that are allowed to be modified at runtime (state=42000,code=1)

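Solution

Option 1:

Log in to the cluster Manager page and choose "Cluster > Services > Hive > Configurations > All Configurations > Hive > Security".
Add the parameter you want to modify to the configuration item hive.security.authorization.sqlstd.confwhitelist.
Click Save and restart HiveServer.

Option 2:

Log in to the cluster Manager page and choose "Cluster > Services > Hive > Configurations > All Configurations > Hive > Security".
Find the option hive.security.whitelist.switch, set it to OFF, click Save, and then restart.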
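To illustrate what Option 1 changes: in Apache Hive, hive.security.authorization.sqlstd.confwhitelist is a Java regular expression, and a parameter can be set at runtime only if its whole name matches it. The sketch below mimics that check in Python. The whitelist value and the function name is_settable are hypothetical, and the Manager page of a given distribution may expect a different value format, so treat this as an illustration rather than the exact server-side logic.

```python
import re

# Hypothetical whitelist value (an assumption, not taken from any real cluster).
# For simple alternations like this one, Python's `re` and Java regex agree.
CONF_WHITELIST = r"mapred\.job\.queue\.name|hive\.exec\.compress\..*"


def is_settable(param: str, whitelist: str = CONF_WHITELIST) -> bool:
    """Return True if the whole parameter name matches the whitelist regex."""
    return re.fullmatch(whitelist, param) is not None


if __name__ == "__main__":
    for name in ("mapred.job.queue.name",
                 "hive.exec.compress.output",
                 "spark.yarn.queue"):
        # A name that does not match would trigger the
        # "Cannot modify ... at runtime" error shown in the symptom.
        print(name, "->", "allowed" if is_settable(name) else "rejected")
```

Note that the dots are escaped: an unescaped dot matches any character and would whitelist more names than intended.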

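How do I specify a queue when submitting a Hive job?

Solution

Set the following parameter before running the statement:

set mapred.job.queue.name=QueueA;
select count(*) from rc;

After the job is submitted, the Yarn page shows that it was submitted to queue QueueA. (Note: queue names are case-sensitive; queueA and Queuea will not work.)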

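How do I specify the output file compression format when importing data into a table?

Solution

To set it globally, that is, to compress the output of all tables, configure the following on the Manager page: hive.exec.compress.output=true. This option must be set to true; otherwise the options below do not take effect.

To set it at the session level, run the following before your command:

set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

Hive currently supports the following compression codecs:

org.apache.hadoop.io.compress.BZip2Codec
org.apache.hadoop.io.compress.Lz4Codec
org.apache.hadoop.io.compress.DeflateCodec
org.apache.hadoop.io.compress.SnappyCodec
org.apache.hadoop.io.compress.GzipCodec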
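As a small convenience, the session-level settings can be generated from a short codec name. This is only a sketch: the short names and the function compression_settings are made up for illustration, while the codec class names are the ones listed above.

```python
# Codec classes copied from the list above; the short-name keys are hypothetical.
CODECS = {
    "bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "lz4": "org.apache.hadoop.io.compress.Lz4Codec",
    "deflate": "org.apache.hadoop.io.compress.DeflateCodec",
    "snappy": "org.apache.hadoop.io.compress.SnappyCodec",
    "gzip": "org.apache.hadoop.io.compress.GzipCodec",
}


def compression_settings(short_name: str) -> list:
    """Return the two session-level set statements for the given codec."""
    codec = CODECS[short_name.lower()]  # raises KeyError for an unknown name
    return [
        # Must be true, otherwise the codec setting has no effect.
        "set hive.exec.compress.output=true;",
        f"set mapred.output.compression.codec={codec};",
    ]


if __name__ == "__main__":
    print("\n".join(compression_settings("snappy")))
```

The generated lines can be pasted into a Beeline session before the insert statement.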

desc output is cut off when the table description is too long

Solution

Additional information: beeline -help lists many client display settings, as follows:

  -u <database url>   the JDBC URL to connect to
  -n <username>   the username to connect as
  -p <password>   the password to connect as
  -d <driver class>   the driver class to use
  -i <init file>   script file for initialization
  -e <query>   query that should be executed
  -f <exec file>   script file that should be executed
  --hiveconf property=value   Use value for given property
  --color=[true/false]   control whether color is used for display
  --showHeader=[true/false]   show column names in query results
  --headerInterval=ROWS;   the interval between which heades are displayed
  --fastConnect=[true/false]   skip building table/column list for tab-completion
  --autoCommit=[true/false]   enable/disable automatic transaction commit
  --verbose=[true/false]   show verbose error messages and debug info
  --showWarnings=[true/false]   display connection warnings
  --showNestedErrs=[true/false]   display nested errors
  --numberFormat=[pattern]   format numbers using DecimalFormat pattern
  --force=[true/false]   continue running script even after errors
  --maxWidth=MAXWIDTH   the maximum width of the terminal
  --maxColumnWidth=MAXCOLWIDTH   the maximum width to use when displaying columns
  --silent=[true/false]   be more silent
  --autosave=[true/false]   automatically save preferences
  --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]   format mode for result display
      Note that csv, and tsv are deprecated - use csv2, tsv2 instead
  --truncateTable=[true/false]   truncate table column when it exceeds length
  --delimiterForDSV=DELIMITER   specify the delimiter for delimiter-separated values output format (default: |)
  --isolation=LEVEL   set the transaction isolation level
  --nullemptystring=[true/false]   set to true to get historic behavior of printing null as empty string
  --socketTimeOut=n   socket connection timeout interval, in second. The default value is 300.

When starting Beeline, set the maxWidth parameter to a sufficiently large value (2000 in the example below):

[[email protected] logs]# beeline --maxWidth=2000
scan complete in 3ms
Connecting to ……
Beeline version 1.1.0 by Apache Hive

After a column is added to a partitioned table, newly inserted data shows as NULL

Symptom

Run the following commands:

create table test_table(
  col1 string,
  col2 string
)
PARTITIONED BY(p1 string)
STORED AS orc
tblproperties('orc.compress'='SNAPPY');
alter table test_table add partition(p1='a');
insert into test_table partition(p1='a') select col1,col2 from temp_table;
alter table test_table add columns(col3 string);
insert into test_table partition(p1='a') select col1,col2,col3 from temp_table;
-- At this point, select * from test_table where p1='a' shows col3 as NULL in every row.
alter table test_table add partition(p1='b');
insert into test_table partition(p1='b') select col1,col2,col3 from temp_table;
-- select * from test_table where p1='b' shows non-NULL values in col3.

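Solution

Without cascade, add columns changes only the table-level metadata, so the existing partition p1='a' keeps its old column list and col3 reads as NULL there. Add the cascade keyword when adding the column so that the change also applies to existing partitions: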

alter table test_table add columns(col3 string) cascade;

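How do I enable Hive on Spark and submit jobs to a specific queue?

Solution

Set the following parameters before running the statement:

set hive.execution.engine = spark;
set spark.yarn.queue = testQueue;

After the job is submitted, the Yarn page shows that it was submitted to queue testQueue. (Note: queue names are case-sensitive; testqueue and TestQueue will not work.)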

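How do I set Spark application parameters for a Hive on Spark application?

Solution

When Hive runs SQL through the Spark engine, Spark application parameters can be set with the set command before the SQL statement is executed.
- For the memoryOverhead parameters, the default unit is MB. The value in the set command must not include a unit; otherwise an error is reported.
- Other parameters can be set in the same way.
- The settings apply only to the current session.

The following are the Spark memory-related parameters.

-- executor memory size
set spark.executor.memory = 1g;
-- driver memory size
set spark.driver.memory = 1g;
-- executor overhead memory size (in MB, no unit suffix)
set spark.yarn.executor.memoryOverhead = 2048;
-- driver overhead memory size (in MB, no unit suffix)
set spark.yarn.driver.memoryOverhead = 1024;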
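On Yarn, the memory Spark requests for a container is the process memory plus its memoryOverhead, so the four values above determine two container sizes. The sketch below works through that arithmetic; the helper names to_mb and container_request_mb are hypothetical, and Yarn may round the request up to its own allocation increment, which is not modelled here.

```python
def to_mb(value: str) -> int:
    """Convert a Spark memory string such as '1g' or '512m' to MB."""
    units = {"m": 1, "g": 1024}
    value = value.strip().lower()
    return int(value[:-1]) * units[value[-1]]


def container_request_mb(memory: str, overhead_mb: int) -> int:
    """Process memory plus overhead (overhead is a plain number of MB)."""
    return to_mb(memory) + overhead_mb


if __name__ == "__main__":
    # Values taken from the set commands above.
    print("executor container:", container_request_mb("1g", 2048), "MB")
    print("driver container:  ", container_request_mb("1g", 1024), "MB")
```

With the values above, each executor asks for 3072 MB and the driver (when it runs inside a Yarn container) for 2048 MB, which is worth comparing against the queue's available resources.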

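How do I set the number of map and reduce tasks?

Solution

Controlling the number of reducers (the value in the command below is the default; by default no fixed number is set):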

set mapred.reduce.tasks=-1;

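Set the amount of data processed by each reducer (the default is 256 MB):

set hive.exec.reducers.bytes.per.reducer=256000000;

Controlling the number of maps. The number of maps cannot be set directly; control it by setting the amount of data loaded by each map (the value in the command below is the default, 256 MB):

set mapreduce.input.fileinputformat.split.maxsize=256000000;

Note: set these parameters for individual sessions only. The settings in the examples apply only to the current session.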
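To see how these sizes translate into task counts, the following rough estimate may help. It assumes mapred.reduce.tasks is left at -1, in which case Apache Hive derives the reducer count from the input size and hive.exec.reducers.bytes.per.reducer, capped by hive.exec.reducers.max. The cap of 1009 used here is the Apache Hive default and is an assumption for any particular distribution. The map estimate is cruder still, because real split counts also depend on file boundaries and the input format.

```python
import math


def estimate_reducers(input_bytes: int,
                      bytes_per_reducer: int = 256_000_000,
                      max_reducers: int = 1009) -> int:
    """Approximate reducer count when mapred.reduce.tasks=-1."""
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


def estimate_maps(input_bytes: int, split_maxsize: int = 256_000_000) -> int:
    """Very rough map count: one map per split of at most split_maxsize bytes."""
    return max(1, math.ceil(input_bytes / split_maxsize))


if __name__ == "__main__":
    ten_gib = 10 * 1024 ** 3
    print("reducers at 256 MB each:", estimate_reducers(ten_gib))
    print("reducers at 128 MB each:", estimate_reducers(ten_gib, 128_000_000))
    print("maps at 256 MB splits:  ", estimate_maps(ten_gib))
```

For 10 GiB of input this gives 42 reducers at the default setting and 84 when the per-reducer size is halved, which shows that lowering either size parameter raises parallelism.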

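Handling out-of-memory errors in MapReduce jobs

Symptom

MapReduce jobs hit various out-of-memory problems while running, for example java heap space, out of memory, and Full GC entries in the AM log.

Solution

First determine whether the memory overflow occurred in the map phase, the reduce phase, or the AM. In the error message, "m" indicates map and "r" indicates reduce.

For the AM, check the GC output in the AM log; Full GC entries indicate that the AM is running out of memory.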

AM memory. Memory required by the AM:

set yarn.app.mapreduce.am.resource.mb=1536;

Maximum memory the AM's JVM may use:

set yarn.app.mapreduce.am.command-opts=-Xmx1024m;

Reduce memory. Memory required by each Reduce Task:

set mapreduce.reduce.memory.mb=4096;

Maximum memory each Reduce Task's JVM may use:

set mapreduce.reduce.java.opts=-Xmx3276M;

Map memory. Memory required by each Map Task:

set mapreduce.map.memory.mb=4096;

Maximum memory each Map Task's JVM may use:

set mapreduce.map.java.opts=-Xmx3276M;
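In the map and reduce examples above, the -Xmx value is roughly 80% of the container memory (3276 of 4096). When raising the container size, it is convenient to keep that ratio so the heap grows with it. The sketch below generates the matching pair of set statements; the function name task_memory_settings and the fixed 4/5 ratio are assumptions for illustration, not a rule enforced by Hive or Yarn.

```python
def task_memory_settings(task: str, container_mb: int) -> list:
    """Build session-level memory settings for 'map' or 'reduce' tasks."""
    if task not in ("map", "reduce"):
        raise ValueError("task must be 'map' or 'reduce'")
    # Keep the heap at 80% of the container, as in 3276 / 4096 above.
    heap_mb = container_mb * 4 // 5
    return [
        f"set mapreduce.{task}.memory.mb={container_mb};",
        f"set mapreduce.{task}.java.opts=-Xmx{heap_mb}M;",
    ]


if __name__ == "__main__":
    # Double the container size for map tasks that ran out of memory.
    print("\n".join(task_memory_settings("map", 8192)))
```

Calling it with 4096 reproduces the two map statements shown above, and 8192 yields a heap of 6553 MB.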

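The parameters above are the memory-related parameters for map, reduce, and the AM. The values shown are the defaults and should be doubled as needed for the actual workload. (Note: the settings apply only to the current session.) The examples all take effect at the session level. To set a parameter globally, the JVM options must be written out in full, not just the -Xmx part. For example, the full AM JVM options (which can be obtained by running set yarn.app.mapreduce.am.command-opts; in the beeline command line) are: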

yarn.app.mapreduce.am.command-opts=-Xmx1024m -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -verbose:gc

Ne0inhk
