There is a type of text files that they are
too big to be entirely loaded into the memory, yet as the data have been sorted
by a certain column and if they are imported in groups according to this
column, they can be all put into the memory for computing. These text files
include the call detail record of a telecom company, statistics of visitors on
a website, information of members of a shopping mall, etc.
A great deal of complicated code, which is
difficult to maintain, is required if Java is used to realize the operation. But
with the assistance of esProc in programming, it is easier for Java to deal
with this kind of problems. Let's look at how esProc works through an example.
The text file sOrder.txt has a lot of information of orders, with tab being the separator and the first
row being the column names. The data have been sorted by SellerID. Now it is
required to import a group of data with the same SellerID at a time, and process each group of data in the same way.
Part of the data of the file sOrder.txt is as follows:
The esProc script is as follows:
A1: cursor function opens the file as a cursor. tab is used as the separator by default and function option @t means the first row will be imported
as the column names. If only the first four columns are imported and comma is
used as the separator, the code should be written as cursor@t(OrderID, Client, SellerId, Amount;
",").
A2: for
A1 ;SellerId. The code is to fetch data by loop from
cursor A1. By fetching a group of data with the same SellerID each time, all the data will be stored in the memory. Please
note the for statement here. for cs,n - a way of code writing in esProc – means
fetching n records from cursor cs at a time. for
cs;x means fetching a group of records from cursor cs at a time, with the same x
field in each group and the data having been sorted by x in advance. In this example the data have been sorted, otherwise sortx function can be used to sort them.
Besides being
a field, the x in the statement for cs;x can also be an expression. This means
rows of data would be fetched each time until expression x changes, such as the line of code for
A14 ;floor(SellerId/10), which groups together the data whose SellerID is from 0 to 9 and those whose SellerID is from 10 to 19, and in which floor function means getting the integer
part. If each SellerID hasn't many corresponding records, the above statement
can fetch more data at once. Thus the computational performance will be
increased.
B3-C3: This is the
for statement's loop body which performs
the same operation of data processing on each group. The process of data
processing is not our focus here. We'll design a case like this: Compute the
sales amount of each salesperson (SellerID),
and append the sales record of the salesperson to the file result.txt if the sales amount is greater than 10,000.
One thing worth noting is that the working
scope of for statements can be shown only
by the indentation without being marked by braces or begin/end. Moreover, a
loop variable can be represented by the name of the cell where for is placed. In this example, that means
A2 is the records corresponding to the current SellerID, and A2.sum(Amount) represents
performing summing up on the Amount
field of this group of records. export
function is used to export a group of data to a file and function option @a means the exportation will be
realized by appending.
The above esProc script has done all the
data processing. Then the rest of the work is to integrate the script with Java
program through JDBC. Detailed code in Java is as follows:
// create a connection using eProc JDBC
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
// call esProc script, whose name is test
st
=(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test()");
st.execute();//execute the esProc
stored procedure
Note: This case doesn't require returning
the computed result to Java program. But sometimes we’ll append the computed
result to a cell (say cell B2) and return it to Java program for further
processing. In that case, a line of code should be added to the esProc script,
like entering result B2 in cell A4, which
means outputting the data in B2 to JDBC interface.
Thus the following Java program needs
another line of code too to receive the returned result. The code will be
written as ResultSet set = st.getResultSet(); after
the execute.
No comments:
Post a Comment