It is simple for Java to sort data in small text files. But when big files are involved, you need to import the data segment by segment, sort each segment and write the result to a temporary file, and finally merge these temporary files, which makes the programming rather complicated. Even if the file is small enough to be loaded into memory, you still have to parse the data type of each field in the text file. The job is not difficult, but the code is long.
However, these problems can be avoided by using esProc to help with programming in Java. Let's look at how this works in detail. Suppose you need to sort the employee data in the text file employee.txt by STATE in ascending order and by BIRTHDAY in descending order, and assume that the file is too large to fit in memory.
employee.txt has the following format (fields are separated by tabs):
EID NAME SURNAME GENDER STATE BIRTHDAY HIREDATE DEPT SALARY
1 Rebecca Moore F California 1974-11-20 2005-03-11 R&D 7000
2 Ashley Wilson F New York 1980-07-19 2008-03-16 Finance 11000
3 Rachel Johnson F New Mexico 1970-12-17 2010-12-01 Sales 9000
4 Emily Smith F Texas 1985-03-07 2006-08-15 HR 7000
5 Ashley Smith F Texas 1975-05-13 2004-07-30 R&D 16000
6 Matthew Johnson M California 1984-07-07 2005-07-07 Sales 11000
7 Alexis Smith F Illinois 1972-08-16 2002-08-16 Sales 9000
8 Megan Wilson F California 1979-04-19 1984-04-19 Marketing 11000
9 Victoria Davis F Texas 1983-12-07 2009-12-07 HR 3000
10 Ryan Johnson M Pennsylvania 1976-03-12 2006-03-12 R&D 13000
11 Jacob Moore M Texas 1974-12-16 2004-12-16 Sales 12000
12 Jessica Davis F New York 1980-09-11 2008-09-11 Sales 7000
13 Daniel Davis M Florida 1982-05-14 2010-05-14 Finance 10000
…
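To make the earlier point about parsing concrete, here is a minimal plain-Java sketch (Java 16+ for records) of what turning one line of employee.txt into typed fields involves when esProc is not used. The record Employee and its parse method are hypothetical names, and the sketch assumes tab-separated fields and yyyy-MM-dd dates as in the sample above.

```java
import java.time.LocalDate;

// Hypothetical record type for one data line of employee.txt;
// component names follow the header row of the file.
record Employee(int eid, String name, String surname, String gender, String state,
                LocalDate birthday, LocalDate hireDate, String dept, int salary) {

    // Parse one tab-separated data line; every field is converted to its type by hand.
    static Employee parse(String line) {
        String[] f = line.split("\t", -1);   // -1 keeps trailing empty fields
        if (f.length != 9) {
            throw new IllegalArgumentException("expected 9 fields, got " + f.length);
        }
        return new Employee(
                Integer.parseInt(f[0]),
                f[1], f[2], f[3], f[4],
                LocalDate.parse(f[5]),       // assumes ISO yyyy-MM-dd dates
                LocalDate.parse(f[6]),
                f[7],
                Integer.parseInt(f[8]));
    }
}
```

This covers only one record type and no error recovery; the segment-and-merge logic for large files would come on top of it.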
Implementation approach: call the esProc script from Java, let it import and sort the data, and then return the result to Java as a ResultSet. To sort by STATE in ascending order and by BIRTHDAY in descending order, the esProc script uses an input parameter "sortBy" as the sorting expression, as shown in the following figure:
Here the value of "sortBy" is STATE,-BIRTHDAY. A minus sign before a field means the opposite value of that field is used, so the field is sorted in descending order. This works for string, numeric and date data.
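For comparison, the ordering that STATE,-BIRTHDAY describes corresponds roughly to the Java comparator below: ascending by state, then descending by birthday. This is only a sketch; the record Row and the class StateBirthdayOrder are hypothetical, and in Java a descending key (string or date alike) is expressed with Comparator.reverseOrder() rather than by negating the value.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical row type holding a name plus the two sort keys.
record Row(String name, String state, LocalDate birthday) {}

final class StateBirthdayOrder {
    // Java counterpart of the esProc expression STATE,-BIRTHDAY:
    // ascending by state, then descending by birthday.
    static final Comparator<Row> ORDER =
            Comparator.comparing(Row::state)
                      .thenComparing(Row::birthday, Comparator.reverseOrder());

    // Return a sorted copy, leaving the input list untouched.
    static List<Row> sorted(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(ORDER);
        return copy;
    }

    private StateBirthdayOrder() {}
}
```

With the sample rows, the California employees come first (youngest first), followed by the Texas employees in the same birthday order.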
A1: Define a file cursor object, with the first row as the header (field names) and tab as the default field separator.
A2: Sort according to the expression, using a macro to parse the expression dynamically. Here "sortBy" is an input parameter. During execution, esProc first computes the expression enclosed in ${…}, then replaces ${…} with the computed result as the macro string, and finally interprets and executes the resulting code. In this example, the code actually executed is =A1.sortx(STATE,-BIRTHDAY;1000000).
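The macro step can be pictured as text substitution performed before the code is interpreted. The sketch below is a simplified, hypothetical MacroExpander that only replaces ${name} with the value of a named parameter. esProc itself evaluates an arbitrary expression inside ${…}, so treat this as an illustration of the idea, not of esProc's implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of macro expansion: each ${name} in the code text is replaced
// by the string value of the parameter called name, before the code is interpreted.
// Simplification: only plain parameter names are allowed inside ${...}.
final class MacroExpander {
    private static final Pattern MACRO = Pattern.compile("\\$\\{(\\w+)\\}");

    static String expand(String code, Map<String, String> params) {
        Matcher m = MACRO.matcher(code);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("undefined parameter: " + m.group(1));
            }
            // quoteReplacement stops '$' or '\' in the value being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    private MacroExpander() {}
}
```

With sortBy set to STATE,-BIRTHDAY, expanding =A1.sortx(${sortBy};1000000) yields =A1.sortx(STATE,-BIRTHDAY;1000000).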
A3: Return the result cursor to the external program. As Java traverses the returned data through the ResultSet, esProc automatically fetches the corresponding data from the cursor. If the sorted data are to be written to another file instead, change the code in A3 to =file("D:/employee_result.txt").export@t(A2).
If the sorting fields or order change, you only need to modify the parameter sortBy. For example, to sort the data by NAME in ascending order and by STATE and BIRTHDAY in descending order, set sortBy to NAME,-STATE,-BIRTHDAY.
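Changing the order is this cheap because the sort order is built from a string at run time. A rough Java analogue is sketched below. The class SortExpression is hypothetical, rows are assumed to be maps from field name to raw text, and values are compared as strings, which orders names, states and yyyy-MM-dd dates correctly but would not order numeric fields such as SALARY correctly.

```java
import java.util.Comparator;
import java.util.Map;

// Hypothetical sketch: turn a sort expression such as "NAME,-STATE,-BIRTHDAY"
// into a Comparator at run time; a leading '-' marks a descending field.
// Assumptions: every row contains every field, and values are compared as
// strings (fine for names and yyyy-MM-dd dates, wrong for numbers like SALARY).
final class SortExpression {
    static Comparator<Map<String, String>> parse(String expr) {
        Comparator<Map<String, String>> result = null;
        for (String token : expr.split(",")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue;
            }
            boolean descending = t.startsWith("-");
            String field = descending ? t.substring(1).trim() : t;
            Comparator<Map<String, String>> key =
                    Comparator.comparing((Map<String, String> row) -> row.get(field));
            if (descending) {
                key = key.reversed();
            }
            // later fields only break ties left by earlier ones
            result = (result == null) ? key : result.thenComparing(key);
        }
        if (result == null) {
            throw new IllegalArgumentException("empty sort expression: " + expr);
        }
        return result;
    }

    private SortExpression() {}
}
```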
The sortx function imports the data segment by segment according to the number of buffer rows, sorts each segment and writes the result to a temporary file, then redistributes the memory and merges these temporary files. The parameter 1000000 here is the number of buffer rows. The principle for choosing it is to make the most of the memory so as to create as few temporary files as possible. The number of temporary files therefore depends on both the physical memory and the record size, and the buffer size should be estimated when writing the program. Generally, a value between several hundred thousand and one million rows is recommended.
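The segment, sort and merge process described above is the classic external merge sort. The following is a minimal, hypothetical Java sketch of that technique for the lines of a text file, meant to show how much code doing it by hand takes; it is not esProc's sortx implementation. The class ExternalSort and its parameters are assumptions: order compares whole lines, bufferRows plays the role of the buffer-rows parameter, and the first line is treated as a header that is copied through unsorted.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Minimal external merge sort over the data lines of a text file.
// Illustrates the general technique only; it is NOT esProc's sortx implementation.
final class ExternalSort {

    // An open temporary file together with the line currently at its head.
    private record Run(BufferedReader reader, String head) {}

    static void sort(Path input, Path output, Comparator<String> order, int bufferRows)
            throws IOException {
        if (bufferRows < 1) {
            throw new IllegalArgumentException("bufferRows must be positive");
        }
        List<Path> runFiles = new ArrayList<>();
        PriorityQueue<Run> heap = new PriorityQueue<Run>(
                (a, b) -> order.compare(a.head(), b.head()));
        try {
            String header;
            // Phase 1: read at most bufferRows lines at a time, sort them in memory
            // and write each sorted segment to its own temporary file.
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                header = in.readLine();                 // first row is the title row
                List<String> buffer = new ArrayList<>();
                String line;
                while ((line = in.readLine()) != null) {
                    buffer.add(line);
                    if (buffer.size() == bufferRows) {
                        runFiles.add(writeRun(buffer, order));
                        buffer.clear();
                    }
                }
                if (!buffer.isEmpty()) {
                    runFiles.add(writeRun(buffer, order));
                }
            }
            // Phase 2: k-way merge; the heap always yields the smallest current head line.
            try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
                if (header != null) {
                    out.write(header);
                    out.newLine();
                }
                for (Path file : runFiles) {
                    BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8);
                    String first = r.readLine();
                    if (first != null) {
                        heap.add(new Run(r, first));
                    } else {
                        r.close();
                    }
                }
                while (!heap.isEmpty()) {
                    Run smallest = heap.poll();
                    out.write(smallest.head());
                    out.newLine();
                    String next = smallest.reader().readLine();
                    if (next != null) {
                        heap.add(new Run(smallest.reader(), next));
                    } else {
                        smallest.reader().close();
                    }
                }
            }
        } finally {
            // Close readers still open (only after a failure) and delete the temporary files.
            for (Run open : heap) {
                open.reader().close();
            }
            for (Path file : runFiles) {
                Files.deleteIfExists(file);
            }
        }
    }

    // Sort one in-memory segment and write it to a new temporary file.
    private static Path writeRun(List<String> buffer, Comparator<String> order)
            throws IOException {
        buffer.sort(order);
        Path tmp = Files.createTempFile("sortrun", ".txt");
        Files.write(tmp, buffer, StandardCharsets.UTF_8);
        return tmp;
    }
}
```

In this sketch N data rows produce about N / bufferRows temporary files, which is why a larger buffer, as far as memory allows, leaves fewer files to merge.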
The Java code that calls this script (saved as test.dfx) through esProc JDBC is as follows:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
// load the esProc JDBC driver and open a local connection
Class.forName("com.esproc.jdbc.InternalDriver");
Connection con = DriverManager.getConnection("jdbc:esproc:local://");
// call the esProc program (the stored procedure); test is the name of the dfx file
com.esproc.jdbc.InternalCStatement st =
        (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test(?)");
// set the parameter sortBy
st.setObject(1, "NAME,-STATE,-BIRTHDAY");
// execute the esProc stored procedure
st.execute();
// get the result set, i.e. the sorted employee records
ResultSet set = st.getResultSet();
If the script is simple, the code can be written directly into the Java program that calls esProc JDBC, so there is no need to create a separate esProc script file (test.dfx):
java.sql.Statement st = con.createStatement();
ResultSet set = st.executeQuery("=file(\"D:/employee.txt\").cursor@t().sortx(NAME,-STATE,-BIRTHDAY;1000000)");
This Java code executes a single line of esProc script directly: it reads the data from the text file, sorts them, and returns the result to set, the ResultSet object.
The parameter sortBy can be written either as STATE,-BIRTHDAY or as STATE:1,BIRTHDAY:-1, and in neither case does the calling Java program need to be modified.
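The two notations carry the same information: :1 marks ascending and :-1 marks descending order. The hypothetical helper below only illustrates that correspondence by rewriting the colon form into the minus form. It is not part of esProc, which, as stated above, accepts both forms directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rewrites "STATE:1,BIRTHDAY:-1" into "STATE,-BIRTHDAY".
// Tokens without ':' (already "FIELD" or "-FIELD") are passed through unchanged.
final class SortSpec {
    static String toMinusForm(String spec) {
        List<String> keys = new ArrayList<>();
        for (String token : spec.split(",")) {
            String t = token.trim();
            int colon = t.indexOf(':');
            if (colon < 0) {
                keys.add(t);
                continue;
            }
            String field = t.substring(0, colon).trim();
            int order = Integer.parseInt(t.substring(colon + 1).trim());
            if (order == 1) {
                keys.add(field);               // ascending
            } else if (order == -1) {
                keys.add("-" + field);         // descending
            } else {
                throw new IllegalArgumentException("order must be 1 or -1: " + t);
            }
        }
        return String.join(",", keys);
    }

    private SortSpec() {}
}
```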