5/25/2015

Performance Test of Concurrent File Processing with esProc

This article aims to test esProc performance in processing files concurrently, based on filter, group and aggregate operations and comparison with Oracle.

Test data includes tables of ordering information of different years. With esProc, files (Orders2009.b, Orders2010.b, Orders2011.b and Orders2012.b) of binary format are imported. Below is a selection from the source data:
ORDERID CLIENT     SELLERID AMOUNT ORDERDATE NOTE
1       287  47     5825         2013-05-31       gafcaghafdgie f ci…
2       89     22     8681         2013-05-04       gafcaghafdgie f ci…
3       47     67     7702         2009-11-22       gafcaghafdgie f ci…
4       76     85     8717         2011-12-13       gafcaghafdgie f ci…
5       307  81     8003         2008-06-01       gafcaghafdgie f ci…
6       366  39     6948         2009-09-25       gafcaghafdgie f ci…
7       295  8       1419         2013-11-11       gafcaghafdgie f ci…
8       496  35     6018         2011-02-18       gafcaghafdgie f ci…
9       273  37     9255         2011-05-04       gafcaghafdgie f ci…
10     212  0       2155         2009-03-22       gafcaghafdgie f ci…
In which note is a string field for increasing each record’s length and hasn’t practical meaning.

Oracle imports the four tables Orders2009, Orders2010, Orders2011 and Orders2012. The data is the same as that of those files. The number of parallel threads is 4. esProc and Oracle use single thread, 2 threads and 4 threads respectively to process the computational task. Each subtask has an input parameter year.


esProc script group.dfx for data grouping and aggregation:

Oracle SQL1 for data grouping and aggregation:
"select /*+ Parallel(4) */ client,sum(amount),count(orderid) from orders"+string(year)+" group by client"

esProc script select.dfx for data filtering:

Oracle SQL2 for data filtering:
"select * from orders1g"+string(year)+" where SELLERID=1 and CLIENT=50"

Hardware description:
CPU: Core(TM) i5-3450  4 cores and 4 threads
Memory capacity: 16GB
SSD

Note:
1.       This performance test focuses on cases in which the result set is small and can be entirely loaded into memory.
2.       The input parameters for the four concurrent tasks are four different years. 
3.       The test result adopts the average value, the maximum value and the minimum value of results of four consecutive computations. Both esProc and Oracle might use the disk cache.
4.       Oracle’s available memory space is 6G.

Test result one: Each subtask handles 1G data. The total data is 4G.
Time unit: Second

1.Group and aggregate

2. Query and filter

Test result two: Each subtask handles 4G data. The total data is 16G.
Time unit: Second

1.Group and aggregate

2.Query and filter
Conclusion
1.       Oracle cache can be brought into full play in handling the 4G total data that is within the Oracle memory capacity. In this case Oracle gets a higher performance.

2.       When the total data is 16G, which exceeds Oracle’s available memory, esProc is generally better than Oracle if using the parallel processing. 

3.       On the same conditions, esProc has a steady computational time, while Oracle has a big gap between the maximum time and the minimum time. 

No comments:

Post a Comment