Loading Data into Hive Tables
Prepare the data:
Link:
Extraction code: njk8
Inserting data directly into a partitioned table
create table score3 like score;
hive (myhive)> create table score3 like score;
OK
Time taken: 0.191 seconds
hive (myhive)> desc score3;
OK
col_name data_type comment
s_id string
c_id string
s_score int
month string
# Partition Information
# col_name data_type comment
month string
Time taken: 0.035 seconds, Fetched: 9 row(s)
hive (myhive)>
insert into table score3
partition(month ='201807')
values ('001','002','100');
hive (myhive)> insert into table score3
> partition(month='201807')
> values('001','002','100');
Query ID = root_20191120170909_5a3320b8-c7c3-4588-9355-61d8e09a9829
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0009, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0009/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job -kill job_1574208142489_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-11-20 17:09:56,925 Stage-1 map = 0%, reduce = 0%
2019-11-20 17:09:57,981 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.0 sec
MapReduce Total cumulative CPU time: 1 seconds 0 msec
Ended Job = job_1574208142489_0009
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop01:8020/user/hive/warehouse/myhive.db/score3/month=201807/.hive-staging_hive_2019-11-20_17-09-50_926_2625285990599242385-1/-ext-10000
Loading data to table myhive.score3 partition (month=201807)
Partition myhive.score3{month=201807} stats: [numFiles=1, numRows=1, totalSize=12, rawDataSize=11]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.0 sec HDFS Read: 4119 HDFS Write: 287664 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 0 msec
OK
_col0 _col1 _col2
Time taken: 8.303 seconds
hive (myhive)> select * from score3;
OK
score3.s_id score3.c_id score3.s_score score3.month
001 002 100 201807
Time taken: 0.033 seconds, Fetched: 1 row(s)
hive (myhive)>
As you can see, the data was inserted successfully.
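If you need more than one row, INSERT ... VALUES also accepts a comma-separated list of tuples, so a single statement (and a single MapReduce job) can insert them all. A minimal sketch against the same score3 table; the rows themselves are made up:
-- hypothetical rows; one statement inserts both
insert into table score3 partition(month='201807')
values ('002','001',95),('003','002',88);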
Inserting data via queries
Loading data via LOAD
load data local inpath '/home/score.csv'
overwrite into table score
partition(month='201806');
hive (myhive)> load data local inpath '/home/score.csv' overwrite into table score
> partition(month='201806');
Loading data to table myhive.score partition (month=201806)
Partition myhive.score{month=201806} stats: [numFiles=1, numRows=0, totalSize=162, rawDataSize=0]
OK
Time taken: 0.198 seconds
hive (myhive)> select * from score;
OK
score.s_id score.c_id score.s_score score.month
01 01 80 201806
01 02 90 201806
01 03 99 201806
02 01 70 201806
02 02 60 201806
02 03 80 201806
03 01 80 201806
03 02 80 201806
03 03 80 201806
04 01 50 201806
04 02 30 201806
04 03 20 201806
05 01 76 201806
05 02 87 201806
06 01 31 201806
06 03 34 201806
07 02 89 201806
07 03 98 201806
Time taken: 0.053 seconds, Fetched: 18 row(s)
hive (myhive)>
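LOAD DATA can also read directly from HDFS by omitting the LOCAL keyword; in that case Hive moves the source file into the table directory rather than copying it. A sketch, assuming score.csv has already been uploaded to a hypothetical /hivedata directory (the 201805 partition is just for illustration):
-- without LOCAL the path is an HDFS path, and the file is moved (not copied)
load data inpath '/hivedata/score.csv'
overwrite into table score
partition(month='201805');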
Loading data via a query
create table score4 like score;
insert overwrite table score4
partition(month = '201806')
select s_id,c_id,s_score from score;
hive (myhive)> create table score4 like score;
OK
Time taken: 0.046 seconds
hive (myhive)> insert overwrite table score4 partition(month='201806')
> select s_id,c_id,s_score from score;
Query ID = root_20191120171616_de597311-ac29-4903-85a9-31663cb5b092
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0010, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0010/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job -kill job_1574208142489_0010
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2019-11-20 17:16:25,172 Stage-1 map = 0%, reduce = 0%
2019-11-20 17:16:26,226 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.23 sec
MapReduce Total cumulative CPU time: 1 seconds 230 msec
Ended Job = job_1574208142489_0010
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0011, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0011/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job -kill job_1574208142489_0011
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2019-11-20 17:16:37,320 Stage-3 map = 0%, reduce = 0%
2019-11-20 17:16:38,360 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.06 sec
MapReduce Total cumulative CPU time: 1 seconds 60 msec
Ended Job = job_1574208142489_0011
Loading data to table myhive.score4 partition (month=201806)
Partition myhive.score4{month=201806} stats: [numFiles=1, numRows=18, totalSize=162, rawDataSize=144]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Cumulative CPU: 1.23 sec HDFS Read: 17932 HDFS Write: 583767 SUCCESS
Stage-Stage-3: Map: 1 Cumulative CPU: 1.06 sec HDFS Read: 2569 HDFS Write: 287444 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 290 msec
OK
s_id c_id s_score
Time taken: 23.912 seconds
hive (myhive)> select * from score4;
OK
score4.s_id score4.c_id score4.s_score score4.month
01 01 80 201806
01 02 90 201806
01 03 99 201806
02 01 70 201806
02 02 60 201806
02 03 80 201806
03 01 80 201806
03 02 80 201806
03 03 80 201806
04 01 50 201806
04 02 30 201806
04 03 20 201806
05 01 76 201806
05 02 87 201806
06 01 31 201806
06 03 34 201806
07 02 89 201806
07 03 98 201806
Time taken: 0.285 seconds, Fetched: 18 row(s)
Note: the overwrite keyword is required here.
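For contrast, insert into appends to the partition, while insert overwrite replaces its contents first, which is what makes reruns idempotent. A quick sketch using the same score4 table:
-- replaces whatever the 201806 partition already holds; safe to rerun
insert overwrite table score4 partition(month='201806')
select s_id,c_id,s_score from score;

-- appends instead, so running it twice duplicates every row
insert into table score4 partition(month='201806')
select s_id,c_id,s_score from score;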
Multi-insert mode
Commonly used in production environments to split one table into two or more parts.
Load data into the score table:
load data local inpath '/home/score.csv'
overwrite into table score
partition(month='201806');
hive (myhive)> load data local inpath '/home/score.csv'
> overwrite into table score partition(month='201806');
Loading data to table myhive.score partition (month=201806)
Partition myhive.score{month=201806} stats: [numFiles=1, numRows=0, totalSize=162, rawDataSize=0]
OK
Time taken: 1.833 seconds
hive (myhive)> select * from score;
OK
score.s_id score.c_id score.s_score score.month
01 01 80 201806
01 02 90 201806
01 03 99 201806
02 01 70 201806
02 02 60 201806
02 03 80 201806
03 01 80 201806
03 02 80 201806
03 03 80 201806
04 01 50 201806
04 02 30 201806
04 03 20 201806
05 01 76 201806
05 02 87 201806
06 01 31 201806
06 03 34 201806
07 02 89 201806
07 03 98 201806
Time taken: 0.19 seconds, Fetched: 18 row(s)
hive (myhive)>
Create the table for the first part:
create table score_first( s_id string,c_id string)
partitioned by (month string)
row format delimited fields terminated by '\t' ;
hive (myhive)> create table score_first(s_id string,c_id string)
> partitioned by (month string)
> row format delimited fields terminated by '\t';
OK
Time taken: 0.075 seconds
Create the table for the second part:
create table score_second(c_id string,s_score int)
partitioned by (month string)
row format delimited fields terminated by '\t';
hive (myhive)> create table score_second(c_id string,s_score int)
> partitioned by (month string)
> row format delimited fields terminated by '\t';
OK
Time taken: 0.031 seconds
Load data into the first and second tables in a single statement:
from score
insert overwrite table score_first partition(month='201806') select s_id,c_id
insert overwrite table score_second partition(month = '201806') select c_id,s_score;
hive (myhive)> from score
> insert overwrite table score_first partition(month='201806') select s_id,c_id
> insert overwrite table score_second partition(month='201806') select c_id,s_score;
Query ID = root_20191120172828_61923504-26bb-4e0a-a4bf-6e6d8dda0823
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0012, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0012/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job -kill job_1574208142489_0012
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 0
2019-11-20 17:28:47,905 Stage-2 map = 0%, reduce = 0%
2019-11-20 17:28:48,993 Stage-2 map = 50%, reduce = 0%, Cumulative CPU 1.13 sec
2019-11-20 17:28:50,055 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.34 sec
MapReduce Total cumulative CPU time: 1 seconds 340 msec
Ended Job = job_1574208142489_0012
Stage-5 is filtered out by condition resolver.
Stage-4 is selected by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is filtered out by condition resolver.
Stage-10 is selected by condition resolver.
Stage-12 is filtered out by condition resolver.
Launching Job 4 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0013, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0013/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job -kill job_1574208142489_0013
Hadoop job information for Stage-4: number of mappers: 1; number of reducers: 0
2019-11-20 17:28:56,493 Stage-4 map = 100%, reduce = 0%, Cumulative CPU 0.59 sec
MapReduce Total cumulative CPU time: 590 msec
Ended Job = job_1574208142489_0013
Launching Job 5 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0014, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0014/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job -kill job_1574208142489_0014
Hadoop job information for Stage-10: number of mappers: 1; number of reducers: 0
2019-11-20 17:29:06,652 Stage-10 map = 0%, reduce = 0%
2019-11-20 17:29:07,724 Stage-10 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
MapReduce Total cumulative CPU time: 880 msec
Ended Job = job_1574208142489_0014
Loading data to table myhive.score_first partition (month=201806)
Loading data to table myhive.score_second partition (month=201806)
Partition myhive.score_first{month=201806} stats: [numFiles=1, numRows=18, totalSize=108, rawDataSize=90]
Partition myhive.score_second{month=201806} stats: [numFiles=1, numRows=18, totalSize=108, rawDataSize=90]
MapReduce Jobs Launched:
Stage-Stage-2: Map: 2 Cumulative CPU: 1.34 sec HDFS Read: 21181 HDFS Write: 587275 SUCCESS
Stage-Stage-4: Map: 1 Cumulative CPU: 0.59 sec HDFS Read: 2414 HDFS Write: 288850 SUCCESS
Stage-Stage-10: Map: 1 Cumulative CPU: 0.88 sec HDFS Read: 2515 HDFS Write: 288859 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 810 msec
OK
c_id s_score
Time taken: 34.837 seconds
hive (myhive)> select * from score_first;
OK
score_first.s_id score_first.c_id score_first.month
01 01 201806
01 02 201806
01 03 201806
02 01 201806
02 02 201806
02 03 201806
03 01 201806
03 02 201806
03 03 201806
04 01 201806
04 02 201806
04 03 201806
05 01 201806
05 02 201806
06 01 201806
06 03 201806
07 02 201806
07 03 201806
Time taken: 0.274 seconds, Fetched: 18 row(s)
hive (myhive)> select * from score_second;
OK
score_second.c_id score_second.s_score score_second.month
01 80 201806
02 90 201806
03 99 201806
01 70 201806
02 60 201806
03 80 201806
01 80 201806
02 80 201806
03 80 201806
01 50 201806
02 30 201806
03 20 201806
01 76 201806
02 87 201806
01 31 201806
03 34 201806
02 89 201806
03 98 201806
Time taken: 0.574 seconds, Fetched: 18 row(s)
hive (myhive)>
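The same from ... insert ... insert pattern can also split a table by rows instead of columns: each insert may carry its own where clause, and the source table is still scanned only once. A sketch under that assumption; score_pass and score_fail are hypothetical tables with the same column layout:
-- hypothetical tables: split score by rows in a single scan
from score
insert overwrite table score_pass partition(month='201806')
select s_id,c_id,s_score where s_score >= 60
insert overwrite table score_fail partition(month='201806')
select s_id,c_id,s_score where s_score < 60;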
Creating a table from a query and loading data (as select)
Saves the result of a query into a new table.
create table score6 as select * from score;
hive (myhive)> create table score6 as select * from score;
Query ID = root_20191120173131_99f1dcc6-ed42-4cfc-a191-95622e9ab877
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0015, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0015/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job -kill job_1574208142489_0015
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2019-11-20 17:32:07,400 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 0.75 sec
2019-11-20 17:32:08,469 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.06 sec
MapReduce Total cumulative CPU time: 1 seconds 60 msec
Ended Job = job_1574208142489_0015
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0016, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0016/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job -kill job_1574208142489_0016
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2019-11-20 17:32:17,466 Stage-3 map = 0%, reduce = 0%
2019-11-20 17:32:18,513 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 0.88 sec
MapReduce Total cumulative CPU time: 880 msec
Ended Job = job_1574208142489_0016
Moving data to: hdfs://hadoop01:8020/user/hive/warehouse/myhive.db/score6
Table myhive.score6 stats: [numFiles=1, numRows=18, totalSize=288, rawDataSize=270]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Cumulative CPU: 1.06 sec HDFS Read: 17311 HDFS Write: 583511 SUCCESS
Stage-Stage-3: Map: 1 Cumulative CPU: 0.88 sec HDFS Read: 2192 HDFS Write: 287167 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 940 msec
OK
score.s_id score.c_id score.s_score score.month
Time taken: 20.513 seconds
hive (myhive)> select * from score6;
OK
score6.s_id score6.c_id score6.s_score score6.month
01 01 80 201806
01 02 90 201806
01 03 99 201806
02 01 70 201806
02 02 60 201806
02 03 80 201806
03 01 80 201806
03 02 80 201806
03 03 80 201806
04 01 50 201806
04 02 30 201806
04 03 20 201806
05 01 76 201806
05 02 87 201806
06 01 31 201806
06 03 34 201806
07 02 89 201806
07 03 98 201806
Time taken: 0.122 seconds, Fetched: 18 row(s)
hive (myhive)>
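CTAS is not limited to select *: the new table takes its schema from whatever the query projects, so columns can be trimmed and rows filtered at creation time. A minimal sketch with a hypothetical table name:
-- hypothetical: keep only scores of 80 and above, drop the partition column
create table score_high as
select s_id,c_id,s_score from score where s_score >= 80;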
Specifying the data path with location at table creation
1) Create the table and specify its location on HDFS
create external table score7 (s_id string,c_id string,s_score int)
row format delimited fields terminated by '\t'
location '/myscore7';
hive (myhive)> create external table score7 (s_id string,c_id string,s_score int)
> row format delimited fields terminated by '\t'
> location '/myscore7';
OK
Time taken: 0.171 seconds
hive (myhive)>
2) Upload the data to HDFS
hdfs dfs -mkdir -p /myscore7
hdfs dfs -put score.csv /myscore7
3) Query the data
select * from score7;
hive (myhive)> select * from score7;
OK
score7.s_id score7.c_id score7.s_score
01 01 80
01 02 90
01 03 99
02 01 70
02 02 60
02 03 80
03 01 80
03 02 80
03 03 80
04 01 50
04 02 30
04 03 20
05 01 76
05 02 87
06 01 31
06 03 34
07 02 89
07 03 98
Time taken: 0.493 seconds, Fetched: 18 row(s)
hive (myhive)>
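To confirm that score7 really reads straight from the directory given in location, you can list the files without leaving the Hive CLI; the dfs command runs HDFS shell commands in-session:
-- inside the Hive CLI: list the files backing score7
dfs -ls /myscore7;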
Exporting and importing Hive table data with export and import (managed tables)
create table teacher2 like teacher;
export table teacher to '/export/teacher';
import table teacher2 from '/export/teacher';
hive (myhive)> create table teacher2 like teacher;
OK
Time taken: 0.292 seconds
hive (myhive)> export table teacher to '/export/teacher';
Copying data from file:/tmp/root/dd7ed4bc-f1f2-4b5f-910a-1404a963c5a5/hive_2019-11-20_19-50-27_410_9054804966162880336-1/-local-10000/_metadata
Copying file: file:/tmp/root/dd7ed4bc-f1f2-4b5f-910a-1404a963c5a5/hive_2019-11-20_19-50-27_410_9054804966162880336-1/-local-10000/_metadata
Copying data from hdfs://hadoop01:8020/user/hive/warehouse/myhive.db/teacher
Copying file: hdfs://hadoop01:8020/user/hive/warehouse/myhive.db/teacher/techer.csv
OK
Time taken: 0.821 seconds
hive (myhive)> import table teacher2 from '/export/teacher';
Copying data from hdfs://hadoop01:8020/export/teacher/data
Copying file: hdfs://hadoop01:8020/export/teacher/data/techer.csv
Loading data to table myhive.teacher2
OK
Time taken: 0.364 seconds
hive (myhive)> select * from teacher2;
OK
teacher2.t_id teacher2.t_name
01 张三
02 李四
03 王五
Time taken: 0.029 seconds, Fetched: 3 row(s)
hive (myhive)>
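export also works at partition granularity, which helps when a whole-table export is too large. A hedged sketch, assuming /export/score is a not-yet-existing HDFS directory:
-- hypothetical target path: export a single partition of score
export table score partition (month='201806') to '/export/score';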