The below article will describe the steps required to read AVRO files using HAWQ. In the below example, we will showcase 2 different approaches as an example to read AVRO files
Case 1: Read raw AVRO files on HDFS filesystem.
gpadmin=# CREATE EXTERNAL TABLE hawq_avro_example (number int, first_name varchar, last_name varchar) LOCATION ('pxf://hdm1:50070/tmp/doctors.avro?Profile=Avro') FORMAT 'custom' (formatter='pxfwritable_import');
[gpadmin@hdw3 ~]$hdfs dfs -copyFromLocal doctors.avro /tmp/doctors.avro
gpadmin=# select * from hawq_avro_example ; number | first_name | last_name
--------+-------------+-----------
6 | Colin | Baker
3 | Jon | Pertwee
4 | Tom | Baker
5 | Peter | Davison
11 | Matt | Smith
1 | William | Hartnell
7 | Sylvester | McCoy
8 | Paul | McGann
2 | Patrick | Troughton
9 | Christopher | Eccleston
10 | David | Tennant
(11 rows)
Case 2: Read from Hive tables based on AVRO schema. In the below example, we have defined an avro schema and had included extra_field column just to showcase that default value can be read in Hive table and used in HAWQ if required.
hive> CREATE TABLE doctors
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
"namespace": "testing.hive.avro.serde",
"name": "doctors",
"type": "record",
"fields": [
{
"name":"number",
"type":"int",
"doc":"Order of playing the role"
},
{
"name":"first_name",
"type":"string",
"doc":"first name of actor playing role"
},
{
"name":"last_name",
"type":"string",
"doc":"last name of actor playing role"
},
{
"name":"extra_field",
"type":"string",
"doc:":"an extra field not in the original file",
"default":"fishfingers and custard"
}
]
}');
OK
Time taken: 5.789 seconds
hive> LOAD DATA LOCAL INPATH '/tmp/doctors.avro' INTO TABLE doctors;
gpadmin=# CREATE EXTERNAL TABLE hive_avro_test (number int, first_name varchar, last_name varchar, extra_field varchar) LOCATION ('pxf://hdm1.phd.local:50070/doctors?PROFILE=Hive') FORMAT 'custom' (formatter='pxfwritable_import');
CREATE EXTERNAL TABLE
gpadmin=# select * from hive_avro_test ;
number | first_name | last_name | extra_field
--------+-------------+-----------+-------------------------
6 | Colin | Baker | fishfingers and custard
3 | Jon | Pertwee | fishfingers and custard
4 | Tom | Baker | fishfingers and custard
5 | Peter | Davison | fishfingers and custard
11 | Matt | Smith | fishfingers and custard
1 | William | Hartnell | fishfingers and custard
7 | Sylvester | McCoy | fishfingers and custard
8 | Paul | McGann | fishfingers and custard
2 | Patrick | Troughton | fishfingers and custard
9 | Christopher | Eccleston | fishfingers and custard
10 | David | Tennant | fishfingers and custard
(11 rows)
No comments:
Post a Comment