Hive Data Definition and Manipulation, Part 4: Hive Data Types in Detail
Hive's data types fall into numeric, string, date, complex, and miscellaneous categories. Each is introduced below.
1. Hive primitive data types
1.1 Numeric types (7 kinds)
1.2 String types (the most commonly used)
A VARCHAR column takes a length specifier in the range 1~65535; a value longer than the declared length is automatically truncated.
A CHAR column has a fixed length and is padded with spaces when the value is shorter; its length specifier ranges from 1 to 255.
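As a quick sketch of the truncation and padding behavior described above (the table and column names here are made up for illustration):

```sql
-- Illustrative only: t_str_demo is a hypothetical table.
create table t_str_demo(v varchar(5), c char(5));
insert into t_str_demo values ('abcdefgh', 'ab');
-- v stores 'abcde'  (truncated to the declared length of 5)
-- c stores 'ab   '  (padded with spaces to the fixed length of 5)
select v, c from t_str_demo;
```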
2. Hive complex data types
- Primitive types
hive> create table t_test(a string, b int, c bigint, d float, e double, f tinyint, g smallint);
- Date type
1,zhangsan,1986-09-09
2,lisi,1988-02-28
3,wangwu,1993-05-16
hive> create table t_customer(id int, name string, birthday date)
row format delimited fields terminated by ',';
# Load the data
load data local inpath '/root/customer.txt' into table t_customer;
# View the table schema
hive> desc t_customer;
OK
id int
name string
birthday date
Time taken: 0.115 seconds, Fetched: 3 row(s)
# Query the data
hive (mydb)> select * from t_customer;
OK
t_customer.id t_customer.name t_customer.birthday
1 zhangsan 1986-09-09
2 lisi 1988-02-28
3 wangwu 1993-05-16
Time taken: 0.228 seconds, Fetched: 3 row(s)
- Complex type: ARRAY<data_type>
战狼2,吴京:吴刚:龙母,2017-08-16
三生三世十里桃花,刘亦菲:痒痒,2017-08-20
知否知否应是绿肥红瘦,赵丽颖:李易峰,2017-12-20
# Create the table
create table t_movie(name string, actors array<string>, publish_date date)
row format delimited fields terminated by ','
collection items terminated by ':';
# Load the data
hive (mydb)> load data local inpath '/home/hadoop/movie.txt' into table t_movie;
Loading data to table mydb.t_movie
OK
Time taken: 0.385 seconds
# Query the data
hive (mydb)> select * from t_movie;
OK
t_movie.name t_movie.actors t_movie.publish_date
战狼2 ["吴京","吴刚","龙母"] 2017-08-16
三生三世十里桃花 ["刘亦菲","痒痒"] 2017-08-20
知否知否应是绿肥红瘦 ["赵丽颖","李易峰"] 2017-12-20
Time taken: 0.227 seconds, Fetched: 3 row(s)
hive (mydb)> select name, actors[0] as main_actor from t_movie;
OK
name main_actor
战狼2 吴京
三生三世十里桃花 刘亦菲
知否知否应是绿肥红瘦 赵丽颖
Time taken: 0.737 seconds, Fetched: 3 row(s)
hive (mydb)> select name, actors from t_movie where array_contains(actors,'吴刚');
OK
name actors
战狼2 ["吴京","吴刚","龙母"]
Time taken: 0.374 seconds, Fetched: 1 row(s)
hive (mydb)> select name, size(actors) as size from t_movie;
OK
name size
战狼2 3
三生三世十里桃花 2
知否知否应是绿肥红瘦 2
Time taken: 0.27 seconds, Fetched: 3 row(s)
- Complex type: MAP<primitive_type, data_type>
# Suppose we have the following data, saved as sanguo.txt
1,关羽,武力:97#智力:92#忠诚度:88,蜀国,29
2,刘备,武力:77#智力:94#忠诚度:99,蜀国,34
3,张飞,武力:94#智力:81#忠诚度:90,蜀国,28
4,赵云,武力:98#智力:93#忠诚度:94,蜀国,35
5,马超,武力:93#智力:78#忠诚度:85,蜀国,29
6,黄忠,武力:91#智力:83#忠诚度:86,蜀国,60
7,许褚,武力:90#智力:60#忠诚度:91,魏国,33
8,典韦,武力:97#智力:55#忠诚度:94,魏国,40
9,张辽,武力:90#智力:82#忠诚度:92,魏国,28
# Create the table
hive (mydb)> create table t_sanguo(id int, name string, lables map<string, int>, country string, age int)
> row format delimited fields terminated by ','
> collection items terminated by '#'
> map keys terminated by ':';
# Load the data
hive (mydb)> load data local inpath '/home/hadoop/sanguo.txt'
> overwrite into table t_sanguo;
# Query
hive (mydb)> select * from t_sanguo;
OK
t_sanguo.id t_sanguo.name t_sanguo.lables t_sanguo.country t_sanguo.age
1 关羽 {"武力":97,"智力":92,"忠诚度":88} 蜀国 29
2 刘备 {"武力":77,"智力":94,"忠诚度":99} 蜀国 34
3 张飞 {"武力":94,"智力":81,"忠诚度":90} 蜀国 28
4 赵云 {"武力":98,"智力":93,"忠诚度":94} 蜀国 35
5 马超 {"武力":93,"智力":78,"忠诚度":85} 蜀国 29
6 黄忠 {"武力":91,"智力":83,"忠诚度":86} 蜀国 60
7 许褚 {"武力":90,"智力":60,"忠诚度":91} 魏国 33
8 典韦 {"武力":97,"智力":55,"忠诚度":94} 魏国 40
9 张辽 {"武力":90,"智力":82,"忠诚度":92} 魏国 28
Time taken: 0.333 seconds, Fetched: 9 row(s)
hive (mydb)> select id, name, lables["武力"], country, age from t_sanguo;
OK
id name _c2 country age
1 关羽 97 蜀国 29
2 刘备 77 蜀国 34
3 张飞 94 蜀国 28
4 赵云 98 蜀国 35
5 马超 93 蜀国 29
6 黄忠 91 蜀国 60
7 许褚 90 魏国 33
8 典韦 97 魏国 40
9 张辽 90 魏国 28
Time taken: 0.306 seconds, Fetched: 9 row(s)
Note: you can run `show functions;` in the Hive shell to list all built-in functions, including the map-related ones (map_keys, map_values).
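As a sketch, these map functions can be applied to the t_sanguo table above (the queries are illustrative; the `lables` column name follows the table definition):

```sql
-- List the keys and values of each row's map column.
select name, map_keys(lables)   from t_sanguo;   -- e.g. ["武力","智力","忠诚度"]
select name, map_values(lables) from t_sanguo;   -- e.g. [97,92,88]
-- Filter on a single map entry by key.
select name from t_sanguo where lables['武力'] >= 95;
```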
- Complex type: STRUCT<col_name : data_type, ...>
# Suppose we have the following data, saved as actors.txt
1,zhaowei,39:female:bejing:huanzhugege
2,zhourunfa,62:male:hk:yingxiongbense
3,qiushuzheng,54:female:taiwan:ludingji
4,zhouxingchi,58:male:hk:tangbohudianqiuxiang
# Create the table. Each line carries four colon-separated values (age:gender:city:magnum_opus), so the struct needs four fields:
hive (mydb)> create table t_actors(id int, name string, info struct<age:string, gender:string, city:string, magnum_opus:string>)
            > row format delimited fields terminated by ','
            > collection items terminated by ':';
# Load the data
hive (mydb)> load data local inpath '/home/hadoop/actors.txt' overwrite into table t_actors;
Loading data to table mydb.t_actors
OK
Time taken: 0.363 seconds
# View the table schema
hive (mydb)> desc t_actors;
OK
col_name data_type comment
id int
name string
info struct<age:string,gender:string,city:string,magnum_opus:string>
Time taken: 0.073 seconds, Fetched: 3 row(s)
# Query
hive (mydb)> select * from t_actors;
OK
t_actors.id t_actors.name t_actors.info
1 zhaowei {"age":"39","gender":"female","city":"bejing","magnum_opus":"huanzhugege"}
2 zhourunfa {"age":"62","gender":"male","city":"hk","magnum_opus":"yingxiongbense"}
3 qiushuzheng {"age":"54","gender":"female","city":"taiwan","magnum_opus":"ludingji"}
4 zhouxingchi {"age":"58","gender":"male","city":"hk","magnum_opus":"tangbohudianqiuxiang"}
Time taken: 0.208 seconds, Fetched: 4 row(s)
hive (mydb)> select name, info.city, info.magnum_opus from t_actors;
OK
name city magnum_opus
zhaowei bejing huanzhugege
zhourunfa hk yingxiongbense
qiushuzheng taiwan ludingji
zhouxingchi hk tangbohudianqiuxiang
Time taken: 0.221 seconds, Fetched: 4 row(s)
- Date and timestamp types
# Create a file timefile.txt
1,2019-08-21 11:12:13.1,2019-08-21,2019-08-21 11:12:13.1
4,2019-08-21 11:12:13.123456789,2019-08-21,2019-08-21 11:12:13.123456789
5,2019-08-21 11:12:13,2019-08-21,2019-08-21 11:12:13
6,2020-06-09 09:09:09.1234567891,2019-06-09,2019-06-09 09:09:09.1234567891
# Create the table
hive (mydb)> create table tf(id int, ts timestamp, ddate date, sdate string)
            > row format delimited
            > fields terminated by ',';
# Load the data
hive (mydb)> load data local inpath '/home/hadoop/timefile.txt'
> overwrite into table tf;
Loading data to table mydb.tf
OK
Time taken: 0.35 seconds
# View the table schema
hive (mydb)> desc tf;
OK
col_name data_type comment
id int
ts timestamp
ddate date
sdate string
Time taken: 0.078 seconds, Fetched: 4 row(s)
# Query the data
hive (mydb)> select * from tf;
OK
tf.id tf.ts tf.ddate tf.sdate
1 2019-08-21 11:12:13.1 2019-08-21 2019-08-21 11:12:13.1
4 2019-08-21 11:12:13.123456789 2019-08-21 2019-08-21 11:12:13.123456789
5 2019-08-21 11:12:13 2019-08-21 2019-08-21 11:12:13
6 NULL 2019-06-09 2019-06-09 09:09:09.1234567891
Time taken: 0.197 seconds, Fetched: 4 row(s)
Note that the row with id 6 loads as NULL in the ts column: Hive's TIMESTAMP supports at most 9 fractional-second digits (nanosecond precision), so the 10-digit fraction in the source file fails to parse.
- timestamp -> string
hive (mydb)> select cast(ts as string) from tf;
OK
ts
2019-08-21 11:12:13.1
2019-08-21 11:12:13.123456789
2019-08-21 11:12:13
NULL
Time taken: 0.229 seconds, Fetched: 4 row(s)
- timestamp -> date
hive (mydb)> select cast(ts as date) from tf;
OK
ts
2019-08-21
2019-08-21
2019-08-21
NULL
Time taken: 0.189 seconds, Fetched: 4 row(s)
- date -> string
hive (mydb)> select cast(ddate as string) from tf;
OK
ddate
2019-08-21
2019-08-21
2019-08-21
2019-06-09
Time taken: 0.215 seconds, Fetched: 4 row(s)
- date -> timestamp
hive (mydb)> select cast(ddate as timestamp) from tf;
OK
ddate
2019-08-21 00:00:00
2019-08-21 00:00:00
2019-08-21 00:00:00
2019-06-09 00:00:00
Time taken: 0.271 seconds, Fetched: 4 row(s)
- string -> timestamp
hive (mydb)> select cast(sdate as timestamp) from tf;
OK
sdate
2019-08-21 11:12:13.1
2019-08-21 11:12:13.123456789
2019-08-21 11:12:13
2019-06-09 09:09:09.123456789
Time taken: 0.23 seconds, Fetched: 4 row(s)
- string -> date
hive (mydb)> select cast(sdate as date) from tf;
OK
sdate
2019-08-21
2019-08-21
2019-08-21
2019-06-09
Time taken: 0.175 seconds, Fetched: 4 row(s)
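Besides `cast`, Hive ships built-in functions for date/time conversion. The calls below are a sketch; check `desc function <name>` in your Hive version for exact behavior (epoch-based functions depend on the session time zone, so their outputs are not shown):

```sql
select current_date();                           -- today as a DATE
select current_timestamp();                      -- now as a TIMESTAMP
select to_date('2019-08-21 11:12:13');           -- string -> DATE: 2019-08-21
select unix_timestamp('2019-08-21 11:12:13');    -- string -> seconds since the epoch
select from_unixtime(1566357133, 'yyyy-MM-dd');  -- epoch seconds -> formatted string
select date_format('2019-08-21', 'yyyy/MM/dd');  -- reformat a date string
```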