MongoDB doesn’t support join. The unity
JDBC recommended by its official website can perform join operation after
retrieving data out, but charges a fee for the service. Other free JDBC drivers
can only support the basic SQL statements, without join included. If you use programming
languages, like Java, to retrieve data first and proceed to perform join
operation, the process will still be complicated.
However, the join operation can be realized
by using esProc, which is free of charge, to help MongoDB with the computation.
Here is an example for illustrating the method in detail.
In MongoDB, there is a document - orders – that
holds the order data, and another file – employee
– for storing employee data, as shown in the following:
MongoDB shell version: 2.6.4
connecting to: test
> db.orders.find();
{ "_id" :
ObjectId("5434f88dd00ab5276493e270"), "ORDERID" : 1,
"CLIENT" : "UJRNP
", "SELLERID" : 17,
"AMOUNT" : 392, "ORDERDATE" : "2008/11/2 15:28" }
{ "_id" :
ObjectId("5434f88dd00ab5276493e271"), "ORDERID" : 2,
"CLIENT" : "SJCH"
, "SELLERID" : 6,
"AMOUNT" : 4802, "ORDERDATE" : "2008/11/9 15:28"
}
{ "_id" : ObjectId("5434f88dd00ab5276493e272"),
"ORDERID" : 3, "CLIENT" : "UJRNP
", "SELLERID" : 16,
"AMOUNT" : 13500, "ORDERDATE" : "2008/11/5 15:28"
}
{ "_id" :
ObjectId("5434f88dd00ab5276493e273"), "ORDERID" : 4,
"CLIENT" : "PWQ",
"SELLERID" : 9, "AMOUNT" :
26100, "ORDERDATE" : "2008/11/8 15:28" }
…
> db.employee.find();
{ "_id" :
ObjectId("5437413513bdf2a4048f3480"), "EID" : 1,
"NAME" : "Rebecca", "
SURNAME" : "Moore",
"GENDER" : "F", "STATE" : "California",
"BIRTHDAY" : "1974-1
1-20", "HIREDATE" :
"2005-03-11", "DEPT" : "R&D",
"SALARY" : 7000 }
{ "_id" :
ObjectId("5437413513bdf2a4048f3481"), "EID" : 2,
"NAME" : "Ashley", "S
URNAME" : "Wilson",
"GENDER" : "F", "STATE" : "New York",
"BIRTHDAY" : "1980-07-
19", "HIREDATE" :
"2008-03-16", "DEPT" : "Finance",
"SALARY" : 11000 }
{ "_id" :
ObjectId("5437413513bdf2a4048f3482"), "EID" : 3,
"NAME" : "Rachel", "S
URNAME" : "Johnson",
"GENDER" : "F", "STATE" : "New Mexico",
"BIRTHDAY" : "1970-
12-17", "HIREDATE" :
"2010-12-01", "DEPT" : "Sales",
"SALARY" : 9000 }
…
The SELLERID in orders corresponds to EID in employee.
Please select from employee all order
information in which the state is California. orders holds big data and thus cannot be loaded into the memory
entirely. But employee has fewer data and so has the filtering result of orders.
The conditional expression for selecting
data can be passed to the esProc program as a parameter, as shown in the
following figure:
esProc code:
A1: Connect to MongoDB. Both IP and the
port number are localhost:27017. The
database name, user name and the password all are test.
A2: find
function is used to fetch data from MongoDB and create a cursor. orders is the collection, with a
filtering condition being null and _id
, the specified key, not being fetched. esProc uses the same parameter format in
find function as that in find statement of MongoDB. esProc's
cursor supports fetching and processing data in batches, thereby avoiding the
memory overflow caused by importing big data at once.
A3: Fetch data from employee. Because the data size to be fetched is not big, you can
use fetch function to get them all at
once.
A4: switch
function is used to convert values of SELLERID field in A2's cursor into the
record references in A3's employee.
A5: Select the desired data according to
the condition. Here a macro is used to dynamically parse an expression, in
which where is the input parameter.
In esProc, the expression surrounded by ${…}
will be first computed, the result be taken as the macro string value to replace
${…} , and then the code be
interpreted and executed. Therefore the actual code executed in this step is =A4.select(SELLERID.STATE=="California").
Since SELLERID has been converted into the references of corresponding records
in employee, you can write the code
in such a way as SELLERID.STATE. As
the data size of the filtering result is not big, you can fetch the data all at
once. But if the data size is still rather big, you can fetch the data in batches,
say, 10,000 rows per batch, which can be expressed as fetch(10000).
A6: Reconvert the values
of SELLERID in the filtering result into ordinary ones.
When the filtering condition is changed,
you need not change the whole program, but modifying the parameter where. For example, the condition is
changed to "orders information in which state is California or CLIENT is PWQ",
then the value of where can be
expressed as CLIENT=="PWQ"||
SELLERID.STATE=="California".
The thing is that esProc isn't equipped
with a Java driver included in MongoDB. So to access MongoDB using esProc, you
must put MongoDB's Java driver (a version of 2.12.2 or above is required for esProc,
e.g. mongo-java-driver-2.12.2.jar) into [esProc installation
directory]\common\jdbc beforehand.
The script written in esProc which is used
to help MongoDB with the computation is easy to be integrated into the Java
program. You just need to add another line of code - A7 – that is, result A6, for outputting a result in
the form of resultset to Java
program. For the detailed code, please refer to esProc Tutorial. In the same way, MongoDB's Java driver must be put
into the classpath of a Java program before the latter accesses MongoDB by
calling an esProc program.
No comments:
Post a Comment