11/23/2014

esProc Helps Process Heterogeneous DataSources in Java - MongoDB

MongoDB doesn't support join directly. The unity JDBC recommended by the official website can perform the join operation after retrieving data out. But the advanced functions, like join, group, functions and expressions, are only provided by the paid version of unity JDBC. Even the paid version doesn't support the complicated SQL operations, such as subquery, window functions, etc. The free JDBC drivers can only support the most basic SQL statements.

Using free esProc working with MongoDB can realize the above-mentioned complicated structured (or semi-structured) computations. We'll take join as an example to illustrate the method in detail.

As shown in the following, the file -orders- in MongoDB contains the sales orders, and employee contains the employee information:

MongoDB shell version: 2.6.4
connecting to: test
>db.orders.find();
{ "_id" : ObjectId("5434f88dd00ab5276493e270"), "ORDERID" : 1, "CLIENT" : "UJRNP
", "SELLERID" : 17, "AMOUNT" : 392, "ORDERDATE" : "2008/11/2 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e271"), "ORDERID" : 2, "CLIENT" : "SJCH"
, "SELLERID" : 6, "AMOUNT" : 4802, "ORDERDATE" : "2008/11/9 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e272"), "ORDERID" : 3, "CLIENT" : "UJRNP
", "SELLERID" : 16, "AMOUNT" : 13500, "ORDERDATE" : "2008/11/5 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e273"), "ORDERID" : 4, "CLIENT" : "PWQ",
 "SELLERID" : 9, "AMOUNT" : 26100, "ORDERDATE" : "2008/11/8 15:28" }
>db.employee.find();
{ "_id" : ObjectId("5437413513bdf2a4048f3480"), "EID" : 1, "NAME" : "Rebecca", "
SURNAME" : "Moore", "GENDER" : "F", "STATE" : "California", "BIRTHDAY" : "1974-1
1-20", "HIREDATE" : "2005-03-11", "DEPT" : "R&D", "SALARY" : 7000 }
{ "_id" : ObjectId("5437413513bdf2a4048f3481"), "EID" : 2, "NAME" : "Ashley", "S
URNAME" : "Wilson", "GENDER" : "F", "STATE" : "New York", "BIRTHDAY" : "1980-07-
19", "HIREDATE" : "2008-03-16", "DEPT" : "Finance", "SALARY" : 11000 }
{ "_id" : ObjectId("5437413513bdf2a4048f3482"), "EID" : 3, "NAME" : "Rachel", "S
URNAME" : "Johnson", "GENDER" : "F", "STATE" : "New Mexico", "BIRTHDAY" : "1970-
12-17", "HIREDATE" : "2010-12-01", "DEPT" : "Sales", "SALARY" : 9000 }


SELLERID in orders corresponds to EID in employee. Query the information of sales orders that satisfies the criterion that the property of STATE of employee is California. The expression of querying condition can be passed to esProc program as a parameter, as shown below: 

As esProc isn't equipped with the java driver of MongoDB, the latter should be put into the [esProc installation directory]\common\jdbc in advance when esProc is used to access MongoDB (esProc requires a driver of 2.12.2 version or above, i.e. mongo-java-driver-2.12.2.jar).
        
The jar can be downloaded from the URL https://github.com/mongodb/mongo-java-driver/releases.

The code written in esProc is as follows:

A1: Connect to the MongoDB database. The IP and port number is localhost:27017, the name of the database is test and both the user name and the password are test. If any other parameters are needed, write them in line with the format mongo://ip:port/db?arg=value&…

A2: Fetch data from the MongoDB using find function to create a cursor. The set is orders. The filtering criterion is null and the specified key _id need not be fetched. It can be seen that this find function is similar to the find function of MongoDB. By fetching and processing data in batches, the esProc's cursor can avoid the memory overflow caused by big data importing. Since the data here are small, fetch function is used to fetch them at once.

A3: Fetch data from employee.

A4: switch function is used to switch the values of field SELLERID in A2 (orders) to the references of the corresponding records in A3 (employee).

A5: Filter the data according to the criterion. Here macro is used to realize dynamically parsing the expression, in which where is the input parameter. esProc will compute the expression surrounded by ${…} first, take the result as a macro string value and replace ${…} with it, and then interpret and execute the code. The code ultimately executed is =A2.select(SELLERID.STATE=="California"). Since the values of SELLERID have been switched to the references of corresponding records in employee, you can just write SELLERID.STATE.

A6Switch back the values of SELLERID in the filtering result to ordinary values.

A7Return the eligible result set to Java program.

If the filtering condition is changed, you just need to modify the parameter where without changing the program. For example, the condition is changed to "select the information of sales orders where STATE is California or that where CLIENT is PWQ". The value of the parameter where can be written as CLIENT=="PWQ"|| SELLERID.STATE=="California".

esProc JDBC is used to call this block of code in Java program and get the result (save the above esProc block of code as test.dfx and put the java driver of MongoDB into the classpath of Java program). The code is as follows:

// create a connection using esProc JDBC
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
// call the esProc program (stored procedure), in which test is the name of file dfx
com.esproc.jdbc.InternalCStatementst =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test(?)");
    // set the parameters
st.setObject(1,"CLIENT==\"PWQ\"||SELLERID.STATE==e\"California\"");        
// execute the esProc stored procedure
ResultSet set =st.executeQuery(); 


No comments:

Post a Comment