Data Process Engine: esProc Helps with Computation in MongoDB

MongoDB's script has limited computational ability for complicated operations, so some problems are difficult to solve with it alone. In many cases, you can only perform further computation after retrieving the desired data. Implementing this kind of set operation in a high-level programming language such as Java is no easier. In such cases, you can use esProc to help with the computation on MongoDB data. The following example shows how esProc works.

There is a collection – test – in MongoDB, as shown below:

> db.test.find({},{"_id":0})
{ "value" : NumberLong(112937552) }
{ "value" : NumberLong(715634640) }
{ "value" : NumberLong(487229712) }
{ "value" : NumberLong(79198330) }
{ "value" : NumberLong(440998943) }
{ "value" : NumberLong(93148782) }
{ "value" : NumberLong(553008873) }
{ "value" : NumberLong(336369168) }
{ "value" : NumberLong(369669461) }
…

Specifically, test holds multiple values, each of which is a string of digits. The task is to compare each digit string with all the other digit strings and find, for each string, the largest number of same digits and the largest number of different digits over all those comparisons. If the digit 1 appears both in the first row and in the nth row, it counts as one same digit. If a digit appears only in the first row and not in the nth row, it counts as one different digit.
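As a minimal sketch of this counting rule, the Python function below compares one digit string with another. The name count_same_diff is hypothetical, and it assumes that a digit repeated within the first string counts only once, in line with the duplicate removal described for cell C4 below.

```python
from typing import Tuple


def count_same_diff(current: str, other: str) -> Tuple[int, int]:
    """Count the distinct digits of `current` that do / do not occur in `other`."""
    digits = set(current)        # assumption: each distinct digit counts once
    other_digits = set(other)
    same = len(digits & other_digits)   # digits found in both strings
    diff = len(digits - other_digits)   # digits found only in `current`
    return same, diff


# First two sample values from the collection shown above
print(count_same_diff("112937552", "715634640"))  # (4, 2)
```

Here the digits 1, 3, 5 and 7 of the first value also occur in the second, while 2 and 9 do not, which gives 4 same digits and 2 different digits.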

esProc code:

A1: Connect to MongoDB. The host and port are localhost:27017. The database name, user name and password are all test.

A2: The find function fetches data from MongoDB and creates a cursor. test is the collection, the filter condition is null, and the specified key, _id, is not fetched. As you can see, esProc's find function uses the same parameter format as MongoDB's find statement. esProc's cursor supports fetching and processing data in batches, which avoids the memory overflow caused by loading a large amount of data at once. Because the data set here is small, the fetch function retrieves all the records from the cursor in one go.

A3: Add two new columns to A2 to store the largest same count and the largest different count and, at the same time, convert each value into a string.

A4: Loop over the records in A3. The loop body covers cells B4-D10.

B4: Get the value of the current record in the loop.

C4: Use array@s to split the value into a sequence of single characters and remove the duplicates.

B5: Perform an inner loop on the collection in A3. The loop body is C6-D10.

C5: If the inner loop is at the same position as the outer loop, that is, both refer to the same record, skip the current inner iteration and move on to the next.

C6: Get the value of the current record in the inner loop.

C7: Define two variables, same and diff, to store the number of same digits and the number of different digits found in the current comparison. Both are initialized to zero.

C8: The loop function checks, one by one, whether each digit in the sequence split from the outer-loop value appears in the inner-loop value. If it does, same increases by one; otherwise diff increases by one.

C9, C10: Compare same and diff with the values stored in the current record of A4, and assign the larger values back to that record's same and diff columns.
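For readers who want to check the logic outside esProc, the steps from A3 to C10 can be sketched as a plain nested loop in Python. This illustrates the described steps under stated assumptions and is not the esProc script itself: the function name max_same_diff and the dictionary keys are hypothetical, and the values are assumed to have already been fetched into a list.

```python
def max_same_diff(values):
    """For each value, find the largest same and diff counts over all other values."""
    # A3: convert the values to strings and add two result columns
    records = [{"value": str(v), "same": 0, "diff": 0} for v in values]

    for i, rec in enumerate(records):            # A4: outer loop
        digits = set(rec["value"])               # C4: distinct digits of the value
        for j, other in enumerate(records):      # B5: inner loop
            if i == j:                           # C5: do not compare a record with itself
                continue
            other_digits = set(other["value"])   # C6: value of the inner record
            same = diff = 0                      # C7: counters for this comparison
            for d in digits:                     # C8: check every distinct digit
                if d in other_digits:
                    same += 1
                else:
                    diff += 1
            # C9, C10: keep the larger counts in the outer record
            rec["same"] = max(rec["same"], same)
            rec["diff"] = max(rec["diff"], diff)
    return records


if __name__ == "__main__":
    sample = [112937552, 715634640, 487229712]   # first three values shown earlier
    for row in max_same_diff(sample):
        print(row)
```

The sketch compares every pair of records, so its cost grows with the square of the number of records. That is acceptable for a small collection like the one in this example.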

The final result after the code is executed is as follows:

Note: esProc does not ship with MongoDB's Java driver. To access MongoDB from esProc, you must first put MongoDB's Java driver (esProc requires version 2.12.2 or above, e.g. mongo-java-driver-2.12.2.jar) into [esProc installation directory]\common\jdbc.

The esProc script used to help MongoDB with the computation is easy to integrate into a Java program. You only need to add one more line of code in A11, namely result A3, to return the result to the Java program as a result set. For the detailed code, please refer to the esProc Tutorial. Likewise, MongoDB's Java driver must be on the Java program's classpath before the program accesses MongoDB by calling the esProc script.

12/02/2014

esProc Helps with Computation in MongoDB – Digital Comparison
