Loading Data into Hive Tables

Loading data into Hive tables

Preparing the data:

Link:
Extraction code: njk8
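
If the link is unavailable, score.csv can be reconstructed from the query output later in this article. Its 18 rows are listed below (the tab delimiter is an assumption, matching the '\t' delimiter used for the other tables in this article):

01	01	80
01	02	90
01	03	99
02	01	70
02	02	60
02	03	80
03	01	80
03	02	80
03	03	80
04	01	50
04	02	30
04	03	20
05	01	76
05	02	87
06	01	31
06	03	34
07	02	89
07	03	98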

Inserting data directly into a partitioned table

create table score3 like score;
hive (myhive)> create table score3 like score;
OK
Time taken: 0.191 seconds

hive (myhive)> desc score3;
OK
col_name        data_type       comment
s_id                    string                                      
c_id                    string                                      
s_score                 int                                         
month                   string                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
month                   string                                      
Time taken: 0.035 seconds, Fetched: 9 row(s)
hive (myhive)> 
insert into table score3 
partition(month='201807') 
values ('001','002','100');
hive (myhive)> insert into table score3
             > partition(month='201807')
             > values('001','002','100');
Query ID = root_20191120170909_5a3320b8-c7c3-4588-9355-61d8e09a9829
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0009, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0009/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job  -kill job_1574208142489_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-11-20 17:09:56,925 Stage-1 map = 0%,  reduce = 0%
2019-11-20 17:09:57,981 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.0 sec
MapReduce Total cumulative CPU time: 1 seconds 0 msec
Ended Job = job_1574208142489_0009
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop01:8020/user/hive/warehouse/myhive.db/score3/month=201807/.hive-staging_hive_2019-11-20_17-09-50_926_2625285990599242385-1/-ext-10000
Loading data to table myhive.score3 partition (month=201807)
Partition myhive.score3{month=201807} stats: [numFiles=1, numRows=1, totalSize=12, rawDataSize=11]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.0 sec   HDFS Read: 4119 HDFS Write: 287664 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 0 msec
OK
_col0   _col1   _col2
Time taken: 8.303 seconds


hive (myhive)> select * from score3;
OK
score3.s_id     score3.c_id     score3.s_score  score3.month
001     002     100     201807
Time taken: 0.033 seconds, Fetched: 1 row(s)
hive (myhive)> 

As you can see, the data was inserted successfully.
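
As an extra check (not shown in the original session), show partitions should now report the partition just created, month=201807:

show partitions score3;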

Inserting data via queries

Loading data with the load statement

load data local inpath '/home/score.csv' 
overwrite into table score 
partition(month='201806');
hive (myhive)> load data local inpath '/home/score.csv' overwrite into table score
             > partition(month='201806');
Loading data to table myhive.score partition (month=201806)
Partition myhive.score{month=201806} stats: [numFiles=1, numRows=0, totalSize=162, rawDataSize=0]
OK
Time taken: 0.198 seconds
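
Note the local keyword: with it, load data copies the file from the local filesystem of the machine running the Hive client; without it, Hive moves the file from an HDFS path into the table's storage directory, removing it from its original location. A hypothetical HDFS-based variant, assuming score.csv had first been uploaded to /data/score.csv on HDFS:

load data inpath '/data/score.csv' 
overwrite into table score 
partition(month='201806');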


hive (myhive)> select * from score;
OK
score.s_id      score.c_id      score.s_score   score.month
01      01      80      201806
01      02      90      201806
01      03      99      201806
02      01      70      201806
02      02      60      201806
02      03      80      201806
03      01      80      201806
03      02      80      201806
03      03      80      201806
04      01      50      201806
04      02      30      201806
04      03      20      201806
05      01      76      201806
05      02      87      201806
06      01      31      201806
06      03      34      201806
07      02      89      201806
07      03      98      201806
Time taken: 0.053 seconds, Fetched: 18 row(s)
hive (myhive)> 

Loading data via a query

create table score4 like score;
insert overwrite table score4 
partition(month='201806') 
select s_id,c_id,s_score from score;
hive (myhive)> create table score4 like score;
OK
Time taken: 0.046 seconds

hive (myhive)> insert overwrite table score4 partition(month='201806')
             > select s_id,c_id,s_score from score;
Query ID = root_20191120171616_de597311-ac29-4903-85a9-31663cb5b092
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0010, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0010/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job  -kill job_1574208142489_0010
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2019-11-20 17:16:25,172 Stage-1 map = 0%,  reduce = 0%
2019-11-20 17:16:26,226 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.23 sec
MapReduce Total cumulative CPU time: 1 seconds 230 msec
Ended Job = job_1574208142489_0010
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0011, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0011/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job  -kill job_1574208142489_0011
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2019-11-20 17:16:37,320 Stage-3 map = 0%,  reduce = 0%
2019-11-20 17:16:38,360 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.06 sec
MapReduce Total cumulative CPU time: 1 seconds 60 msec
Ended Job = job_1574208142489_0011
Loading data to table myhive.score4 partition (month=201806)
Partition myhive.score4{month=201806} stats: [numFiles=1, numRows=18, totalSize=162, rawDataSize=144]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 1.23 sec   HDFS Read: 17932 HDFS Write: 583767 SUCCESS
Stage-Stage-3: Map: 1   Cumulative CPU: 1.06 sec   HDFS Read: 2569 HDFS Write: 287444 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 290 msec
OK
s_id    c_id    s_score
Time taken: 23.912 seconds


hive (myhive)> select * from score4;
OK
score4.s_id     score4.c_id     score4.s_score  score4.month
01      01      80      201806
01      02      90      201806
01      03      99      201806
02      01      70      201806
02      02      60      201806
02      03      80      201806
03      01      80      201806
03      02      80      201806
03      03      80      201806
04      01      50      201806
04      02      30      201806
04      03      20      201806
05      01      76      201806
05      02      87      201806
06      01      31      201806
06      03      34      201806
07      02      89      201806
07      03      98      201806
Time taken: 0.285 seconds, Fetched: 18 row(s)

Note: the overwrite keyword is required in this form; insert ... select must specify either overwrite (replace the target partition's existing data) or into (append to it).
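
A minimal sketch of the appending variant (hypothetical; it was not run in the original session):

insert into table score4 
partition(month='201806') 
select s_id,c_id,s_score from score;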

Multi-insert mode

This is commonly used in production environments to split one table into two or more parts; the general pattern is sketched below, followed by a concrete example.
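
The source table is scanned only once, and each insert clause writes its own projection of the rows. A generic skeleton (all table and column names are placeholders):

from source_table
insert overwrite table target1 partition(month='...') select col_a,col_b
insert overwrite table target2 partition(month='...') select col_b,col_c;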

Load data into the score table:

load data local inpath '/home/score.csv' 
overwrite into table score 
partition(month='201806');
hive (myhive)> load data local inpath '/home/score.csv' 
             > overwrite into table score partition(month='201806');
Loading data to table myhive.score partition (month=201806)
Partition myhive.score{month=201806} stats: [numFiles=1, numRows=0, totalSize=162, rawDataSize=0]
OK
Time taken: 1.833 seconds


hive (myhive)> select * from score;
OK
score.s_id      score.c_id      score.s_score   score.month
01      01      80      201806
01      02      90      201806
01      03      99      201806
02      01      70      201806
02      02      60      201806
02      03      80      201806
03      01      80      201806
03      02      80      201806
03      03      80      201806
04      01      50      201806
04      02      30      201806
04      03      20      201806
05      01      76      201806
05      02      87      201806
06      01      31      201806
06      03      34      201806
07      02      89      201806
07      03      98      201806
Time taken: 0.19 seconds, Fetched: 18 row(s)
hive (myhive)> 

Create the table for the first part:

create table score_first(s_id string,c_id string) 
partitioned by (month string) 
row format delimited fields terminated by '\t' ;
hive (myhive)> create table score_first(s_id string,c_id string)
             > partitioned by (month string) 
             > row format delimited fields terminated by '\t';
OK
Time taken: 0.075 seconds

Create the table for the second part:

create table score_second(c_id string,s_score int) 
partitioned by (month string) 
row format delimited fields terminated by '\t';
hive (myhive)> create table score_second(c_id string,s_score int)
             > partitioned by (month string)
             > row format delimited fields terminated by '\t';
OK
Time taken: 0.031 seconds

Load data into both the first and second tables in a single statement:

from score 
insert overwrite table score_first partition(month='201806') select s_id,c_id 
insert overwrite table score_second partition(month='201806') select c_id,s_score;
hive (myhive)> from score
             > insert overwrite table score_first partition(month='201806') select s_id,c_id
             > insert overwrite table score_second partition(month='201806') select c_id,s_score;
Query ID = root_20191120172828_61923504-26bb-4e0a-a4bf-6e6d8dda0823
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0012, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0012/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job  -kill job_1574208142489_0012
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 0
2019-11-20 17:28:47,905 Stage-2 map = 0%,  reduce = 0%
2019-11-20 17:28:48,993 Stage-2 map = 50%,  reduce = 0%, Cumulative CPU 1.13 sec
2019-11-20 17:28:50,055 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.34 sec
MapReduce Total cumulative CPU time: 1 seconds 340 msec
Ended Job = job_1574208142489_0012
Stage-5 is filtered out by condition resolver.
Stage-4 is selected by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is filtered out by condition resolver.
Stage-10 is selected by condition resolver.
Stage-12 is filtered out by condition resolver.
Launching Job 4 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0013, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0013/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job  -kill job_1574208142489_0013
Hadoop job information for Stage-4: number of mappers: 1; number of reducers: 0
2019-11-20 17:28:56,493 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 0.59 sec
MapReduce Total cumulative CPU time: 590 msec
Ended Job = job_1574208142489_0013
Launching Job 5 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0014, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0014/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job  -kill job_1574208142489_0014
Hadoop job information for Stage-10: number of mappers: 1; number of reducers: 0
2019-11-20 17:29:06,652 Stage-10 map = 0%,  reduce = 0%
2019-11-20 17:29:07,724 Stage-10 map = 100%,  reduce = 0%, Cumulative CPU 0.88 sec
MapReduce Total cumulative CPU time: 880 msec
Ended Job = job_1574208142489_0014
Loading data to table myhive.score_first partition (month=201806)
Loading data to table myhive.score_second partition (month=201806)
Partition myhive.score_first{month=201806} stats: [numFiles=1, numRows=18, totalSize=108, rawDataSize=90]
Partition myhive.score_second{month=201806} stats: [numFiles=1, numRows=18, totalSize=108, rawDataSize=90]
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 2   Cumulative CPU: 1.34 sec   HDFS Read: 21181 HDFS Write: 587275 SUCCESS
Stage-Stage-4: Map: 1   Cumulative CPU: 0.59 sec   HDFS Read: 2414 HDFS Write: 288850 SUCCESS
Stage-Stage-10: Map: 1   Cumulative CPU: 0.88 sec   HDFS Read: 2515 HDFS Write: 288859 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 810 msec
OK
c_id    s_score
Time taken: 34.837 seconds


hive (myhive)> select * from score_first;
OK
score_first.s_id        score_first.c_id        score_first.month
01      01      201806
01      02      201806
01      03      201806
02      01      201806
02      02      201806
02      03      201806
03      01      201806
03      02      201806
03      03      201806
04      01      201806
04      02      201806
04      03      201806
05      01      201806
05      02      201806
06      01      201806
06      03      201806
07      02      201806
07      03      201806
Time taken: 0.274 seconds, Fetched: 18 row(s)


hive (myhive)> select * from score_second;
OK
score_second.c_id       score_second.s_score    score_second.month
01      80      201806
02      90      201806
03      99      201806
01      70      201806
02      60      201806
03      80      201806
01      80      201806
02      80      201806
03      80      201806
01      50      201806
02      30      201806
03      20      201806
01      76      201806
02      87      201806
01      31      201806
03      34      201806
02      89      201806
03      98      201806
Time taken: 0.574 seconds, Fetched: 18 row(s)
hive (myhive)> 

Creating a table and loading data from a query (as select)

This saves the result of a query into a new table.

create table score6 as select * from score;
hive (myhive)> create table score6 as select * from score;
Query ID = root_20191120173131_99f1dcc6-ed42-4cfc-a191-95622e9ab877
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0015, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0015/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job  -kill job_1574208142489_0015
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2019-11-20 17:32:07,400 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 0.75 sec
2019-11-20 17:32:08,469 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.06 sec
MapReduce Total cumulative CPU time: 1 seconds 60 msec
Ended Job = job_1574208142489_0015
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1574208142489_0016, Tracking URL = http://hadoop01:8088/proxy/application_1574208142489_0016/
Kill Command = /export/servers/hadoop-2.6.0-cdh5.14.0/bin/hadoop job  -kill job_1574208142489_0016
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2019-11-20 17:32:17,466 Stage-3 map = 0%,  reduce = 0%
2019-11-20 17:32:18,513 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 0.88 sec
MapReduce Total cumulative CPU time: 880 msec
Ended Job = job_1574208142489_0016
Moving data to: hdfs://hadoop01:8020/user/hive/warehouse/myhive.db/score6
Table myhive.score6 stats: [numFiles=1, numRows=18, totalSize=288, rawDataSize=270]
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 2   Cumulative CPU: 1.06 sec   HDFS Read: 17311 HDFS Write: 583511 SUCCESS
Stage-Stage-3: Map: 1   Cumulative CPU: 0.88 sec   HDFS Read: 2192 HDFS Write: 287167 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 940 msec
OK
score.s_id      score.c_id      score.s_score   score.month
Time taken: 20.513 seconds


hive (myhive)> select * from score6;
OK
score6.s_id     score6.c_id     score6.s_score  score6.month
01      01      80      201806
01      02      90      201806
01      03      99      201806
02      01      70      201806
02      02      60      201806
02      03      80      201806
03      01      80      201806
03      02      80      201806
03      03      80      201806
04      01      50      201806
04      02      30      201806
04      03      20      201806
05      01      76      201806
05      02      87      201806
06      01      31      201806
06      03      34      201806
07      02      89      201806
07      03      98      201806
Time taken: 0.122 seconds, Fetched: 18 row(s)
hive (myhive)> 
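
One detail worth noticing in the output above: month, a partition column in score, has become an ordinary column in score6, because a table created with create table ... as select is never partitioned (Hive's CTAS cannot create partitioned, external, or bucketed tables). The query can also reshape the data on the way in; a hypothetical variant that keeps only scores of 80 and above:

create table score_high as 
select s_id,c_id,s_score from score where s_score >= 80;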

Specifying the data path with location at table creation

1) Create the table and specify its location on HDFS

create external table score7 (s_id string,c_id string,s_score int) 
row format delimited fields terminated by '\t' 
location '/myscore7';
hive (myhive)> create external table score7 (s_id string,c_id string,s_score int)
             > row format delimited fields terminated by '\t'
             > location '/myscore7';
OK
Time taken: 0.171 seconds
hive (myhive)> 

2) Upload the data to HDFS

hdfs dfs -mkdir -p /myscore7
hdfs dfs -put score.csv /myscore7
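
Hive reads whatever files are placed under the table's location directory, so a quick listing (added here for illustration) confirms the file is in place:

hdfs dfs -ls /myscore7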

3) Query the data

select * from score7;
hive (myhive)> select * from score7;
OK
score7.s_id     score7.c_id     score7.s_score
01      01      80
01      02      90
01      03      99
02      01      70
02      02      60
02      03      80
03      01      80
03      02      80
03      03      80
04      01      50
04      02      30
04      03      20
05      01      76
05      02      87
06      01      31
06      03      34
07      02      89
07      03      98
Time taken: 0.493 seconds, Fetched: 18 row(s)
hive (myhive)> 

Exporting and importing Hive table data with export and import (managed tables)

create table teacher2 like teacher;
export table teacher to  '/export/teacher';
import table teacher2 from '/export/teacher';
hive (myhive)> create table teacher2 like teacher;
OK
Time taken: 0.292 seconds


hive (myhive)> export table teacher to '/export/teacher';
Copying data from file:/tmp/root/dd7ed4bc-f1f2-4b5f-910a-1404a963c5a5/hive_2019-11-20_19-50-27_410_9054804966162880336-1/-local-10000/_metadata
Copying file: file:/tmp/root/dd7ed4bc-f1f2-4b5f-910a-1404a963c5a5/hive_2019-11-20_19-50-27_410_9054804966162880336-1/-local-10000/_metadata
Copying data from hdfs://hadoop01:8020/user/hive/warehouse/myhive.db/teacher
Copying file: hdfs://hadoop01:8020/user/hive/warehouse/myhive.db/teacher/techer.csv
OK
Time taken: 0.821 seconds


hive (myhive)> import table teacher2 from '/export/teacher';
Copying data from hdfs://hadoop01:8020/export/teacher/data
Copying file: hdfs://hadoop01:8020/export/teacher/data/techer.csv
Loading data to table myhive.teacher2
OK
Time taken: 0.364 seconds


hive (myhive)> select * from teacher2;
OK
teacher2.t_id   teacher2.t_name
01      张三
02      李四
03      王五
Time taken: 0.029 seconds, Fetched: 3 row(s)
hive (myhive)>
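
As the copy log shows, export writes a _metadata file holding the table definition alongside the table's data files, and import reads that snapshot back into a new table. To inspect the exported snapshot (command added for illustration; the exact listing depends on your cluster):

hdfs dfs -ls -R /export/teacher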