This article focuses on testing performance
of esProc’s handling of file traversal, using commonly used group and aggregate
operations as well as query and filter operations. The performance test will
include evaluation of the influence of using multithreads, and comparison
between esProc approach and Oracle counterpart for the same operation.
We use ordering information as the test
data. The following is a selection of the Orders.b file in esProc binary
format:
ORDERID CLIENT SELLERID AMOUNT ORDERDATE NOTE
1 287 47 5825 2013-05-31 gafcaghafdgie f ci…
2 89 22 8681 2013-05-04 gafcaghafdgie f ci…
3 47 67 7702 2009-11-22 gafcaghafdgie f ci…
4 76 85 8717 2011-12-13 gafcaghafdgie f ci…
5 307 81 8003 2008-06-01 gafcaghafdgie f ci…
6 366 39 6948 2009-09-25 gafcaghafdgie f ci…
7 295 8 1419 2013-11-11 gafcaghafdgie f ci…
8 496 35 6018 2011-02-18 gafcaghafdgie f ci…
9 273 37 9255 2011-05-04 gafcaghafdgie f ci…
10 212 0 2155 2009-03-22 gafcaghafdgie f ci…
…
NOTE field is added for the purpose of
increasing each record’s length but hasn’t practical meaning.
esProc script for grouping and aggregating
data:
Oracle approach to grouping and aggregating
data using SQL:
select /*+
Parallel(8) */ client,sellerid,count(orderid),sum(amount) from orders group by
client,sellerid。
Oracle approach to querying and filtering
data using SQL:
select /*+
Parallel(4) */ * from orders where sellerid=1 and client=50 and
to_char(orderdate,'yyyy')>= 2013。
Two types of hardware are used
for the test:PC server (disk
array) and regular PC (SSD)
Configuration for a PC server:
CPU: 2 Xeon
E5620 ( 4 cores 8 threads, 16 threads in
total)
Memory: 24GB
Hard drive: 800G
HDD
Configurain for a regular PC:
CPU:1 Core(TM)
i5-3450 (4 cores)
Memory: 16GB
SSD
Note: We will get the test result by
computing the average value of running a script 3 times, during which the machine
won’t be restarted. Both Oracle program and esProc program might store data
temporarily in the disk.
Test
results for the PC server:
Unit: Seconds
Test 1 for data
grouping and aggregation (data volume is 25G):Test 2 for data grouping and aggregation (data volume is 10G):
Test 3 for data grouping and aggregation (data volume is 3.5G):
Test 1 for data query and filtering (data volume is 25G):
Test 2 for data query and filtering (data volume is 10G):
Test 3 for data query and filtering (data volume is 3.5G):
Test result summary on PC server (disk
array):
1.Maximum memory supported by
Oracle database is 12G. When data volume is below this limit, like 10G and
3.5G, Oracle’s performance is better than esProc’s.
2.When data volume is greater
than 12G, like 25G, the use of parallel computing has little effect of
increasing performance. In grouping and aggregating data, esProc’s performance
is significantly higher than Oracle; the two are nearly even in performing data
query and filtering.
3.esProc makes quite a leap in
performance in each addition of the number of parallel tasks from 1 to 4. But
during the increase from 4 to 16, its performance improves not as much as
Oracle’s.
Test
results for the regular PC:
Unit:Seconds
Test 1 for data
grouping and aggregation (data volume is 25G):Test 2 for data grouping and aggregation (data volume is 10G):
Test 3 for data grouping and aggregation (data volume is 3.5G):
Test 1 for data query and filtering (data volume is 25G):
Test 2 for data query and filtering (data volume is 10G):
Test 3 for data query and filtering (data volume is 3.5G):
Test result summary on regular PC:
1.In this case the maximum memory
supported by Oracle database is 6G. When data volume is within the memory
capacity, like 3.5G, Oracle performs better than esProc.
2. When data volume exceeds 12G,
like 25G and 10G, parallel processing enhances performance very little. In
grouping and aggregating data, esProc’s performance is significantly higher
than Oracle; but only slightly better in performing data query and filtering.
Performance
test summary:
According to the test results of both types
of hardware, Oracle can achieve a better performance when its available memory
can hold all data. The reason may be that it stores the data temporarily in the
memory, instead of actually reading the disk. So it can work faster. In comparison,
esProc, with no disk cache, reads the file for each computing action and thus
can only make use of the limited buffer provided by the file system.
When a single task involves data that is
more than the available memory can hold, esProc’s multithreading approach can
achieve higher performance.
On the other hand, although PC server
supports 16-thread computational power, performance rarely increases when the number
of parallel tasks is nearly 16, due to the limited throughput of the disk
array. In view of this, an esProc solution should set a reasonable number of
parallel tasks according to the machine’s hard drive performance.
No comments:
Post a Comment