It is easy for Java to connect to Hive using JDBC. But the computational
ability of Hive is less than that of
SQL in other databases. So to deal with uncommon computations, data should be
retrieved before further operation is performed using Java. Thus the code for
will be complicated.
But if esProc is used to help with the Java
programming, the complicated operation as a result of using Hive in Java will become simpler. The
following example will show how esProc works with Java in detail. orders is a table in Hive containing the detailed data of
sales orders. Now it is required to compute the year-on-year comparison and
link relative ratio. The data is as follows:
ORDERID CLIENT SELLERID AMOUNT ORDERDATE
1 UJRNP 17 392 2008/11/2
15:28
2 SJCH 6 4802 2008/11/9
15:28
3 UJRNP 16 13500 2008/11/5
15:28
4 PWQ 9 26100 2008/11/8
15:28
5 PWQ 11 4410 2008/11/12
15:28
6 HANAR 18 6174 2008/11/7
15:28
7 EGU 2 17800 2008/11/6
15:28
8 VILJX 7 2156 2008/11/9
15:28
9 JAYB 14 17400 2008/11/12
15:28
10 JAXE 19 19200 2008/11/12
15:28
11 SJCH 7 13700 2008/11/10
15:28
12 QUICK 11 21200 2008/11/13
15:28
13 HL 12 21400 2008/11/21
15:28
14 JAYB 1 7644 2008/11/16
15:28
15 MIP 16 3234 2008/11/19
15:28
…
Link relative ratio refers to comparison between the current data
and data of the previous period, using month as the time interval. For example,
divide the sales figure in April by that in March and we get the link relative
ratio of April. Year-on-year comparison is the comparison between the current
data and data of the corresponding period of the previous year, which means,
for example, dividing the sales figure of April 2014 by that of April 2013. Since
Hive provides no window functions,
it cannot complete the computation unless using nested SQL. But Hive supports very poor subquery and
usually the computation should be performed outside of the database. With
esProc, however, the computation can be realized easily. The code is as
follows:
A1: Connect to the
database through JDBC using the data source Hive
defined in advance.
A2: Query the data in
the database by the time period using external parameters begin and end. Such as
begin="2011-01-01 00:00:00", end="2014-07-08 00:00:00"
(i.e. the current date which can be obtained using now() function).
A3: Group orders by the year and the month and sum up to get the sales of
each month.
A4: Add a new field Irr, which is the monthly link relative
ratio. The expression is mAmount/mAmount[-1], in which mAmount represents the sales in the current time period and mAmount[-1] represents the sales in the
previous one. Note that the link relative ratio of the initial month (January
of 2011) is empty.
A5: Sort the data in A4 by the month
and the year before we compute the year-on-year comparison. Complete code
should be =A4.sort(m,y). But since A4
has been sorted by the year, here we just need to sort it by the month, that is
A4.sort(m), which has a better
performance.
A6: Add another new field yoy, which is the year-on-year
comparison of the monthly sales figure. The expression is if(m==m[-1],mAmount/mAmount[-1],null), which means the year-on-year
comparison is valid only between the same months of the two time periods. The
year-on-year comparison of each month in the initial year (the year of 2011) is
empty.
A7:Sort the data in A6
by the year in descending order and by the month in ascending order. Note that
the data is valid up to July of 2014. The result is as follows:
A8:Close Hive database
connection.
A9:Return the result.
This block of code can be called by Java
using esProc JDBC to get the final result (the above esProc program will be
saved as test.dfx). The code for this
is as follows:
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
// call the esProc program (the stored
procedure); test is the file name of dfx
st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call
test(?,?)");
// set the parameters
st.setObject(1,"2011-01-01 00:00:00");//begin
st.setObject(1,"2014-07-08 00:00:00");//end
// execute esProc stored procedure
st.execute();
// get the result set
ResultSet set = st.getResultSet();
It is the same way in which esProc accesses Hive and other ordinary
databases. Just configure their JDBC while detailed process is omitted here.
No comments:
Post a Comment