5/27/2015

Performance Test of File Retrieval in esProc

This article aims to test esProc performance of retrieving files and compares it with the result got by using Oracle JDBC.

The test data is some order records saved in the orders.b file. Import the file with esProc and its content would be like this:
ORDERID CLIENT     SELLERID AMOUNT ORDERDATE NOTE
1       287  47     5825         2013-05-31       gafcaghafdgie f ci…
2       89     22     8681         2013-05-04       gafcaghafdgie f ci…
3       47     67     7702         2009-11-22       gafcaghafdgie f ci…
4       76     85     8717         2011-12-13       gafcaghafdgie f ci…
5       307  81     8003         2008-06-01       gafcaghafdgie f ci…
6       366  39     6948         2009-09-25       gafcaghafdgie f ci…
7       295  8       1419         2013-11-11       gafcaghafdgie f ci…
8       496  35     6018         2011-02-18       gafcaghafdgie f ci…
9       273  37     9255         2011-05-04       gafcaghafdgie f ci…
10     212  0       2155         2009-03-22       gafcaghafdgie f ci…
note field is the string field for increasing each record’s length and doesn’t have practical meaning. Test process: Write a Java program to call esProc JDBC and Oracle JDBC respectively for retrieving and traversing the same data, and record the time taken. The Java program, esProc script, the data file and Oracle are all held in one machine, so no network is needed to transmit them.

Data volume: 890M
Hardware and software configuration for the test machine: Normal PC
CPU: Core(TM) i5-3450   4 cores, 4 threads
Memory: 16GB
SSD
OSCENT OS
Oracle11g
esProc version: 3.1

esProc script esProcjdbc.dfx for data retrieval (cellset parameter fileName is the file name):

A
1
=file(fileName).import@b()
2
result A1

The test program of calling esProc JDBC by Java:
import java.sql.*;
import java.text.SimpleDateFormat;
import java.util.Date;

public class esproc {
         public static void testDataServer(String input){
                   Connection con = null;
                   try{
                            SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");                                    Date begin=new Date();
                            System.out.println("begin"+df.format(begin));
                            long diff=0;
                            Class.forName("com.esproc.jdbc.InternalDriver");
                            con= DriverManager.getConnection("jdbc:esproc:local://");
                            com.esproc.jdbc.InternalCStatement st=
                                     (com.esproc.jdbc.InternalCStatement)con.prepareCall("call esProcjdbc(?)");
                            st.setObject(1,input);
                            ResultSet set =st.executeQuery();
                            int len=0;
                            while(set.next())
                            {
                                     String orderid=set.getString("ORDERID");
                                     String client=set.getString("CLIENT");
                                     String sellerid=set.getString("SELLERID");
                                     float amount=set.getFloat("AMOUNT");
                                     Timestamp ORDERDATE=set.getTimestamp("ORDERDATE");
                                     String note=set.getString("NOTE");
                                     len++;
                            }
                            System.out.println("len="+len);
                            diff=((new Date()).getTime()-begin.getTime())/1000;
                            System.out.println("time of all="+diff);
                   }
                   catch(Exception e){
                            System.out.println(e);
                   }
                   finally{
                            if (con!=null) {
                                     try {
                                               con.close();
                                     }
                                     catch(Exception e) {
                                               System.out.println(e);
                                     }
                            }
                   }
         }

         public static void main(String[] args) {
                   testDataServer(“/ssd/data/orders.b”);
         }

}

The SQL statement that Oracle implements:
Select * from orders;

The test program of calling Oracle JDBC by Java:
package jdbc;
import java.sql.*;
import java.util.Date;
import java.text.SimpleDateFormat;

public class oracle {
         public static void testDataServer(String input){
                   Connection con = null;
                   try{

                            SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");                                    Date begin=new Date();
                            System.out.println("begin"+df.format(begin));
                            long diff=0;
                            Class.forName("oracle.jdbc.driver.OracleDriver");
                            con= DriverManager.getConnection("jdbc:oracle:thin:@192.168.1.21:1521:orcl","test","test");
                            Statement st=con.createStatement();
                            ResultSet set =st.executeQuery(input);
                            int len=0;
                            while(set.next())
                            {
                                     String orderid=set.getString("ORDERID");
                                     String client=set.getString("CLIENT");
                                     String sellerid=set.getString("SELLERID");
                                     float amount=set.getFloat("AMOUNT");
                                     Timestamp ORDERDATE=set.getTimestamp("ORDERDATE");
                                     String note=set.getString("NOTE");
                                     len++;
                            }
                            System.out.println("len="+len);
                            diff=((new Date()).getTime()-begin.getTime())/1000;
                            System.out.println("time of all="+diff);
}
                   catch(Exception e){
                            System.out.println(e);
                   }
                   finally{
                            if (con!=null) {
                                     try {
                                               con.close();
                                     }
                                     catch(Exception e) {
                                               System.out.println(e);
                                     }
                            }
                   }
         }
         public static void main(String[] args) {
                   testDataServer(“select * from orders”);
         }

}

Test result:

esProc
Oracle
Execution time
13 seconds
43 seconds

Conclusion:
Data retrieval using Oracle JDBC is more than one time slower than that via esProc JDBC. The reason is that data flow needs to be converted to object through Oracle JDBC, which takes time. There is no need for Java-based esProc to do the same thing, so it works much faster.

One point to note is that the time taken by the first execution of calling Oracle JDBC to retrieve data is not used by the test. On the other hand, because the data volume is well within 6G, Oracle’s maximum memory capacity, it can make full use of the cache memory to save data. Therefore most of the execution time is devoted to conversion to object via JDBC.

This shows that esProc approach of retrieving data into the external memory is more efficient than Oracle’s throughput capacity.

Generally data computing within database will achieve higher performance. But in some cases you need to retrieve a great amount of data from the database. For example, data outside the database is involved, or the algorithm is too complex and specialized to be expressed in SQL. In these cases, JDBC for database connectivity will be the key factor to determine the performance. 


Yet better performance can be achieved by using esProc to retrieve data from the database and save it in files for use in later computation. To handle or display a large amount of historical data by report applications, it is more efficient to retrieve data into files before processing than to process it by retrieving data directly from the database. Within the log analysis system, you can save data as files and use esProc to handle the computation if you must retrieve data from the database using a programming language to perform certain operations involving log files. 

No comments:

Post a Comment