I'm using Hive with two tables that look (more or less) like this:
-TABLE1 defined as [(Variables : string),(Value1 : int),(Value2 : int)]
with the field "Variables" looking like "x0,x1,x2,x3,...,xn"
-TABLE2 defined as [(Value1Sum : int),(Value2Sum : int),(X1 : string),(X4 : string),(X17 : string)]
I "convert" table1 to table2 using this query:
INSERT OVERWRITE TABLE table2 SELECT sum(v1), sum(v2), x1, x4, x17 FROM (SELECT Value1 as v1, Value2 as v2, split(Variables, ",")[1] as x1, split(Variables, ",")[4] as x4, split(Variables, ",")[17] as x17 FROM Table1) tmp GROUP BY tmp.x1, tmp.x4, tmp.x17
Does Hive call the split function three times?
Is there a way to make it more elegant?
Is there a way to make it more generic?
Yes, it will call split every time. You could make it slightly more elegant by splitting once in the subquery and indexing the resulting array in the outer query.
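A minimal sketch of writing split only once, assuming the indices 1, 4 and 17 correspond to the target columns X1, X4 and X17 (Hive arrays are zero-indexed, so x0 is element 0). The alias x is a made-up name for illustration. Whether Hive actually evaluates split a single time per row depends on the query planner, so treat this mainly as a readability improvement:

```sql
-- Split once in the subquery, then index the array in the outer query.
-- Indices 1, 4 and 17 are assumed from the target column names.
INSERT OVERWRITE TABLE table2
SELECT sum(tmp.v1), sum(tmp.v2), tmp.x[1], tmp.x[4], tmp.x[17]
FROM (
  SELECT Value1 AS v1, Value2 AS v2, split(Variables, ',') AS x
  FROM Table1
) tmp
GROUP BY tmp.x[1], tmp.x[4], tmp.x[17];
```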
Why don't you define Variables as an array column to begin with? Then you can access the elements directly:
select Variables[1] from table1
I'm assuming you are using an external table, so it can be done like so:
create external table table1(variables array<string>, a int, b int) ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY ',' LOCATION 'hdfs://somewhere'
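With the array-typed table in place, the original aggregation no longer needs split at all. A sketch, assuming the column names variables, a and b from the definition above, and again assuming that indices 1, 4 and 17 are the ones wanted:

```sql
-- Assumes table1 was created with variables as array<string>, as above.
INSERT OVERWRITE TABLE table2
SELECT sum(a), sum(b), variables[1], variables[4], variables[17]
FROM table1
GROUP BY variables[1], variables[4], variables[17];
```

Changing which elements are grouped on then only means changing the indices, which also goes some way toward making the query more generic.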