This article aims to test esProc
performance in processing files concurrently, based on filter, group and
aggregate operations and comparison with Oracle.
Test data includes tables of ordering
information of different years. With esProc, files (Orders2009.b, Orders2010.b,
Orders2011.b and Orders2012.b) of binary format are imported. Below is a
selection from the source data:
ORDERID CLIENT SELLERID AMOUNT ORDERDATE NOTE
1 287 47 5825 2013-05-31 gafcaghafdgie f ci…
2 89 22 8681 2013-05-04 gafcaghafdgie f ci…
3 47 67 7702 2009-11-22 gafcaghafdgie f ci…
4 76 85 8717 2011-12-13 gafcaghafdgie f ci…
5 307 81 8003 2008-06-01 gafcaghafdgie f ci…
6 366 39 6948 2009-09-25 gafcaghafdgie f ci…
7 295 8 1419 2013-11-11 gafcaghafdgie f ci…
8 496 35 6018 2011-02-18 gafcaghafdgie f ci…
9 273 37 9255 2011-05-04 gafcaghafdgie f ci…
10 212 0 2155 2009-03-22 gafcaghafdgie f ci…
…
In which note is a string field for
increasing each record’s length and hasn’t practical meaning.
Oracle imports the four tables Orders2009,
Orders2010, Orders2011 and Orders2012. The data is the same as that of those
files. The number of parallel threads is 4. esProc and Oracle use single
thread, 2 threads and 4 threads respectively to process the computational task.
Each subtask has an input parameter year.
esProc script group.dfx for data grouping
and aggregation:
Oracle SQL1 for data grouping and
aggregation:
"select /*+ Parallel(4) */
client,sum(amount),count(orderid) from orders"+string(year)+" group
by client"
Oracle SQL2 for data filtering:
"select * from
orders1g"+string(year)+" where SELLERID=1 and CLIENT=50"
Hardware description:
CPU: Core(TM) i5-3450 4 cores and 4 threads
Memory capacity: 16GB
SSD
Note:
1.
This performance test focuses
on cases in which the result set is small and can be entirely loaded into
memory.
2.
The input parameters for the four
concurrent tasks are four different years.
3.
The test result adopts the
average value, the maximum value and the minimum value of results of four consecutive
computations. Both esProc and Oracle might use the disk cache.
4.
Oracle’s available memory space
is 6G.
Test result one: Each subtask handles 1G
data. The total data is 4G.
Time unit: Second
1.Group and aggregate
2. Query and filter
Test result two: Each subtask handles 4G
data. The total data is 16G.
Time unit: Second
2.Query and filter
Conclusion
1.
Oracle cache can be brought
into full play in handling the 4G total data that is within the Oracle memory
capacity. In this case Oracle gets a higher performance.
2.
When the total data is 16G,
which exceeds Oracle’s available memory, esProc is generally better than Oracle
if using the parallel processing.
3.
On the same conditions, esProc
has a steady computational time, while Oracle has a big gap between the maximum
time and the minimum time.
No comments:
Post a Comment