5/28/2015

Performance Test of esProc In-Memory Computing

In this article, we’ll test the performance of esProc in handling in-memory small data computing, and compare with that of Oracle when performing the same computation.

The test involves two cases: normal simple computing and complicated related computing:

The test data used in normal computing is order information, as shown below:
ORDERID CLIENT     SELLERID AMOUNT          ORDERDATE
1       287  47     5825         2013-05-31      
2       89     22     8681         2013-05-04      
3       47     67     7702         2009-11-22      
4       76     85     8717         2011-12-13      
5       307  81     8003         2008-06-01      
6       366  39     6948         2009-09-25      
7       295  8       1419         2013-11-11      
8       496  35     6018         2011-02-18      
9       273  37     9255         2011-05-04      
10     212  0       2155         2009-03-22      
esProc script for normal computing:

A
B
1
=date("2013-09-01")
=date("2013-11-01")
2
=file("/ssd/data/orders.b").import@b()
3
=A2.select(ORDERDATE>A1 && ORDERDATE<B1)
4
=A3.groups@u(CLIENT,SELLERID;sum(AMOUNT):S,count(ORDERID):C)
5
=A4.select(C==13)

6
result A5

Note about the esProc script: A2 imports data in one go and its result can be used repeatedly. So it will not be counted in the execution time of in-memory computing, which only includes the time spent in executing steps from A3 to A6.
  
SQL script that Oracle uses:
select client,sellerid,sum(amount) s,count(orderid) c
from orders
where orderdate>to_date('2013-09-01','yyyy-MM-dd')
and orderdate<to_date('2013-11-01','yyyy-MM-dd')
group by client,sellerid
having count(orderid)=13

There are 8 million rows of data in orders.
Test result:

esProc
Oracle
Execution time
0.570 seconds
0.623 seconds
Note: Execute Oracle’s SQL script four times and get the average of the last three times. In this way Oracle can make full use of its in-memory cached data. As for esProc script, we’ll execute it three times and get the average time.

During related computing testing, esProc imports the following files - orders.b, orders_detail.b, employee.b, department.b and performance.b. The relationships among them are as follows:

esProc script for data importing:

A
B
1
="/ssd/data/"

2
=file(A1+"orders_detail.b").import@b()
3
=file(A1+"orders.b").import@b().primary(ORDERID).index()
4
=file(A1+"employee.b").import@b().primary(EID).index()
5
=file(A1+"department.b").import@b().primary(DEPT).index()
6
=file(A1+"performance.b").import@b().primary(EID).index()
7
=A2.switch(ORDERID,A3:ORDERID)
=A3.switch(SELLERID,A4:EID)
8
=A4.switch(DEPT,A5:DEPT)
=A5.switch(MANAGER,A4:EID)
9
=A4.switch(EID,A6:EID)
=A2.select(ORDERID.SELLERID.DEPT.MANAGER.EID.BONUS>6000)
10
=B9.groups@u(ORDERID.CLIENT:CLIENT,ORDERID.SELLERID.EID.EID:EID,PRODUCT;sum(PRICE*QUANTITY):S,count(ORDERID):C)
11
=A10.select(S>24500 && S<25000)
result A11
Note about esProc script: The code from A1 to A9 imports files and establishes the relationships (import and switch). This is done at one time and the result can be used repeatedly. This part of execution will not be included in the execution time of in-memory computing. The execution time is the time spent in executing steps from B9 to B11.

Oracle’s SQL
select o.client,o.sellerid,od.product,sum(od.price*od.quantity) s,count(o.orderid) c
from orders_detail od left join orders  o on o.orderid=od.orderid
left join employee e on o.sellerid=e.eid
left join department d on d.dept=e.dept
left join employee e1 on d.manager=e1.eid
left join performance p on p.employeeid=e1.eid
where p.bonus>=6000
group by o.client,o.sellerid,od.product
having sum(od.price*od.quantity) between 24500 and 25000

Test one: Both the orders table and the orders_detail table have 8 million rows. There are less than 1,000 rows in other tables.
Test result:

esProc
Oracle
Execution time
4.9 seconds
9.7 seconds
Test two: Both the orders table and the orders_detail table have 4 million rows. There are less than 1,000 rows in other tables.

Test result:

esProc
Oracle
Execution time
2.3 seconds
5.1 seconds

Similarly, we execute Oracle’s SQL script four times and get the average of the last three times, in order to let Oracle make full use of its in-memory cached data. We execute esProc script three times and get the average.

Conclusion:
For small data in-memory computing, when the computation is simple without file relating, esProc is a little faster than Oracle. This is probably because, though Oracle can cache data, it still needs to convert the cached data (without using the physical disk) before processing it. When handling the complicated related computing, esProc works much faster than Oracle. The reason is that Oracle uses the hash algorithm to join two tables, while esProc uses a pointer to reference foreign key values, and thus does not need to compute hash values and query matching values.

Therefore, when there is enough available memory capacity, esProc can achieve much higher performance than conventional relational database if we import data into memory and arrange it beforehand. esProc is great in achieving high performance for in-memory computing.

Test environment:
CPU: Core(TM) i5-3450   four cores, four threads
Memory capacity: 16GB
OSCENT OS
Oracle11gMaximum memory capacity is 14G
esProc 3.1 version: Maximum memory capacity is 14G


No comments:

Post a Comment