Data Process Engine: April 2015

4/29/2015

Parse and Update Dynamic JSON into Database with esProc

Problem source: http://bbs.csdn.net/topics/390611005

Below is JSON data (s.json) the system acquires:

{

"SUCCESS": [

{

"MESSAGE": "IMEI Service List",

"LIST": {

"MOVISTAR SPAIN": {

"GROUPNAME": "MOVISTAR SPAIN",

"SERVICES": {

"3": {

"SERVICEID": 32,

"SERVICENAME": "MOVISTAR NOKIA INSTANTE",

"CREDIT": 4,

"TIME": "1-30 Minutes",

"INFO": "…… ",

"Requires.Network": "None",

"Requires.Mobile": "None",

"Requires.Provider": "None",

"Requires.PIN": "None",

"Requires.KBH": "None",

"Requires.MEP": "None",

"Requires.PRD": "None",

"Requires.Type": "None",

"Requires.Locks": "None",

"Requires.Reference": "None"

"8": {

"SERVICEID": 77,

"SERVICENAME": "MOVISTAR NOKIA 20 NCK",

"CREDIT": 12,

"TIME": "1-30 Minutes",

"INFO": "……",

"Requires.Network": "None",

"Requires.Mobile": "None",

"Requires.Provider": "None",

"Requires.PIN": "None",

"Requires.KBH": "None",

"Requires.MEP": "None",

"Requires.PRD": "None",

"Requires.Type": "None",

"Requires.Locks": "None",

"Requires.Reference": "None"

}

"VODAFONE SPAIN": {

"GROUPNAME": "VODAFONE SPAIN",

"SERVICES": {

"5": {

"SERVICEID": 50,

"SERVICENAME": "VODAFONE NOKIA BB5 SL3",

"CREDIT": 5,

"TIME": "1-60 Minutes",

"INFO": "……",

"Requires.Network": "None",

"Requires.Mobile": "None",

"Requires.Provider": "None",

"Requires.PIN": "None",

"Requires.KBH": "None",

"Requires.MEP": "None",

"Requires.PRD": "None",

"Requires.Type": "None",

"Requires.Locks": "None",

"Requires.Reference": "None"

"10": {

"SERVICEID": 95,

"SERVICENAME": "VODAFONE SONY&;SONY ERIC(RAPIDO)",

"CREDIT": 16,

"TIME": "1-24 Hours",

"INFO": "……",

"Requires.Network": "None",

"Requires.Mobile": "None",

"Requires.Provider": "None",

"Requires.PIN": "None",

"Requires.KBH": "None",

"Requires.MEP": "None",

"Requires.PRD": "None",

"Requires.Type": "None",

"Requires.Locks": "None",

"Requires.Reference": "None"

}

"apiversion": "2.0.0"

}

Based on above JSON data, you need to update database tables with property values of corresponding section. Below is the two tables need updating:

Create table [dbo].[Groups]

(

[ID] [int] IDENTITY(1,1) NOT NULL, --id

[Groupname] [nvarchar] (50) not null default(''), --name

[groupid] [int] not null default(0),

CONSTRAINT [PK_Groups_id] PRIMARY KEY CLUSTERED

(

[id] ASC

)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]

) ON [PRIMARY]

CREATE TABLE [dbo].[Services](

[id] [int] IDENTITY(1,1) NOT NULL, --id

[Serviceid] [int] not null default(0),

[Servicename] [nvarchar] (50) not null default(''),

[groupid] [int] not null default(0),

[Credit] [decimal] not null default(0.00),

[Time] [nvarchar] (50) not null default(''),

[INFO] [nvarchar] (3000) not null default(''),

[Network] [nvarchar] (100) not null default('none'),

[Mobile] [nvarchar] (100) not null default('none'),

[Provider] [nvarchar] (100) not null default('none'),

[PIN] [nvarchar] (100) not null default('none'),

[KBH] [nvarchar] (100) not null default('none'),

[MEP] [nvarchar] (100) not null default('none'),

[PRD] [nvarchar] (100) not null default('none'),

[Type] [nvarchar] (100) not null default('none'),

[Locks] [nvarchar] (100) not null default('none'),

[Reference] [nvarchar] (100) not null default('none'),

[isstatus] [nvarchar] (1) not null default('0'),

[remark] [nvarchar] (255) not null default(''),

[Pricingid] [int] not null default(0),

CONSTRAINT [PK_Services_id] PRIMARY KEY CLUSTERED

(

[id] ASC

)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]

) ON [PRIMARY]

The property for SERVICES is groupid, such as 3, 5, 8, 10. The rest of the fields correspond to other properties respectively. That the property names under LIST and SERVICES are not fixed makes the data parsing difficult. Compared with common high-level languages, esProc supports dynamic data structure and set operations and thus can provide easy solution. esProc script is as follows:

A1: Read JSON file into strings and convert them into a cascaded table sequence with rows and columns using import@j().

A2-A3: Create empty table sequences based on the two target tables, in order to store parsing results that will be updated into the database in one go.

A4-B4: Run a loop in A4 and calculate the number of sections under LIST in B4.

B5-C6: Get the content of each section of LIST by loop and calculate the number of sections under SERVICES in C6.

C7-D8: Get property names and values from each SERVICES’s section by loop.

D9-D10: Write parsing results respectively back into the empty table sequences in A2 and A3.

A11: Update A2’s data into groups table through the primary key groupid.

A12: Update A3’s data into services table through the primary key Serviceid.

4/28/2015

Join MongoDB Collections with esProc

Problem source: http://stackoverflow.com/questions/29396985/is-there-a-where-like-relation-function-when-using-pymongo

It is difficult to join MongoDB collections through hardcoding as it doesn’t directly support joins. Yet you can use esProc to perform inner join, left join and full join between collections or join the documents. Here is an example.

Logically, the two collections - categories and rules – have a referenced and referencing relationship through the key field cat. They need to be joined using left join to retrieve title, regex, cat field from categories and path field from rules. Below is a part of the original data:

esProc code:

A1: Connect to MongoDB. The connection string format is mongo://ip:port/db?arg=value&…

A2, B2: Retrieve data from MongoDB using find function, sort it and create a cursor. A2’s cursor contains data from the left table that requires sorting by cat. In esProc find function, which is analogous to the combination of MongoDB find, sort and limit function, the filtering criterion syntax follows the MongoDB rules.

A3: Perform left join through the key field cat. @x means joining the two cursors. @1 means performing left join. The two options can work together.

A4: Retrieve desired field from A3. _1 and _2 respectively represent the two cursors under joining.

A5: Fetch data from the cursor as follows:

One point to note is that if A4 produces too much data to be loaded into memory, you can use export function to write it into a file.

A6: Close MongoDB connection.

4/27/2015

Group MongoDB Collection and Find Top N members in esProc

Problem source：https://groups.google.com/forum/#!topic/mongodb-user/Rjv6lyUCe1s .

Collection last3 has two fields: variable and timestamp. You need to first group documents by variable and find from each group the top 3 ones with the latest timestamp, and then find from the three documents the one with the earliest timestamp.

Below is the selection from last3：

esProc code：

A1: Connect to MongoDB. The connection string format is mongo://ip:port/db?arg=value&…

A2: Use find function to retrieve data from MongoDB, sort it and create a cursor. last3 is the collection name; no filtering criterion is specified; and all fields except _id will be retrieved and sorted by variable. In esProc find function, which is analogous to the combination of MongoDB find, sort and limit function, the filtering criterion syntax follows the MongoDB rules.

A3: Fetch data from the cursor by loop, getting a group of documents with the same variable field each time. A3’s working range is the indented B3 to B4, where A3 can be used to reference the loop variable. A3’s result is in-memory data. The following is one of the results of data fetching:

B3: Find from the current group of documents the three ones with the latest timestamp.

B4: Append each of B3’s loop results to B2. The result of B2 is as follows:

A5：Find the document with the earliest timestamp from B2. It is as follows:

A6：Close MongoDB connection.

4/26/2015

esProc Implements Foreign Key Relationship for MongoDB Collections

Problem source：http://stackoverflow.com/questions/29392169/populating-field-values-for-referred-documents-in-aggregate-call-in-mongoose .

With MongoDB’s built-in API, you implement a foreign key relationship through hardcoding. The hardcode is not intuitive and difficult to write. In this case you can use esProc to handle this. For example:

Collection UserCourseProgres records the relationship between users and courses. Its courseid is the foreign key that points to _id field in Collection Course. You need to find the number of users who study each course. Course names use title field in Course.

esProc code：

A1: Connect to MongoDB. The connection string format is mongo://ip:port/db?arg=value&…

A2: Compute the number of people who study each course. Here aggregate function is used to retrieve data from MongoDB. This function derives from MongoDB. Its first parameter is collection name and the second one is the aggregate expression whose syntax follows MongoDB rules. A2’s result is in-memory data, as shown below:

A3: Find course names from Course. find function is used here to retrieve data from MongoDB. This function derives from MongoDB. Its second parameter is the filtering criterion whose syntax follows MongoDB rules. The function returns a cursor. Since there are only a few courses, fetch function is used to fetch cursor data into memory. The result is as follows:

A4: Switch A3’s foreign key values into corresponding records in A2. The result is:

A5：Access in-memory data using object and create a new two-dimensional table as follows:

A6：Disconnect from MongoDB.

4/24/2015

esProc Exports Unstructured MongoDB Data as CSV Files

Problem source：https://plus.google.com/+VicNgrail/posts/ebS9JUtFopw .

MongoDB allows storing unstructured data in it. But it is somewhat difficult to export the data as standard structured data. esProc, however, makes it an easy job, with MongoDB’s cooperation. Let’s look at the steps for doing this.

Below is some data from Collection test：

/* 0 */

{

"_id" : ObjectId("5518f6f8a82a704fe4216a43"),

"id" : "No1",

"cars" : {

"name" : "Putin",

"car" : ["porche", "bmw"]

}

/* 1 */

{

"_id" : ObjectId("5518f745a82a704fe4216a44"),

"id" : "No2",

"cars" : {

"name" : "jack",

"car" : ["Toyota", "Jetta", "Audi"]

}

You need to export it as a CSV file with the following layout：

esProc code：

A1: Connect to MongoDB. Connection string format is mongo://ip:port/db?arg=value&…

A2: Retrieve data from MongoDB using find function and generate a cursor with the retrieved data. The collection name is test. There are no filtering criteria and all fields except _id are desired. find functions in esProc and MongoDB are alike. The esProc version follows MongoDB for syntax of filtering criteria.

A3: Retrieve desired fields to create a structured two-dimensional table, which is in the form of cursor. In the code, ~ represents every document in A2; conj function concatenates data together.

A4: Export data from A3 as a comma separated text file. @t means exporting with column names. esProc engine manages buffers automatically, fetching a batch of data each time from the cursor into the memory for computation.

A4: Close MongoDB connection.

For users who want independent management of each batch of data, esProc provides the following approach：

A3: Run a loop to fetch data from the cursor into memory, 1,000 rows each time. A3’s working range is the indented B3 and B4, in which A3 is used to reference the loop variable. A3’s data is as follows:

B3：Convert the current batch of data to structured two-dimensional table, as shown below:

B4：Append the result of processing the current batch to the file. @a means data appending.

4/23/2015

Merge MongoDB Document in esProc

Problem source：https://groups.google.com/forum/#!topic/mongodb-user/BpgEaRqrKsA .

Below is a selection of Collection C1：

{

"_id" : ObjectId("55014006e4b0333c9531043e"),

"acls" : {

"append" : {

"users" : [ObjectId("54f5bfb0336a15084785c393") ],

"groups" : [ ]

"edit" : {

"groups" : [ ],

"users" : [

ObjectId("54f5bfb0336a15084785c392")

]

"fullControl" : {

"users" : [ ],

"groups" : [ ]

"read" : {

"users" : [ ObjectId("54f5bfb0336a15084785c392"), ObjectId("54f5bfb0336a15084785c398")],

"groups" : [ ]

}

name: "ABC"

}

{

"_id" : ObjectId("55014006e4b0333c9531043f"),

"acls" : {

"append" : {

"users" : [ObjectId("54f5bfb0336a15084785c365") ],

"groups" : [ ]

"edit" : {

"groups" : [ ],

"users" : [

ObjectId("54f5bfb0336a15084785c392")

]

"fullControl" : {

"users" : [ ],

"groups" : [ ]

"read" : {

"users" : [ ObjectId("54f5bfb0336a15084785c392"), ObjectId("54f5bfb0336a15084785c370")],

"groups" : [ ]

}

name: "ABC"

}

You need to group the collection by name. Each group contains the users field of the document corresponding to a same name and does not allow duplicate members. The expected result may like this:

{

result : [

{

_id: "ABC",

readUsers : [

ObjectId("54f5bfb0336a15084785c393"),

ObjectId("54f5bfb0336a15084785c392"),

ObjectId("54f5bfb0336a15084785c398"),

ObjectId("54f5bfb0336a15084785c365"),

ObjectId("54f5bfb0336a15084785c370")

]

}

]

}

esProc code：

A1: Connect to MongoDB. The connection string format is mongo://ip:port/db?arg=value&…

A2: Use find function to retrieve data from MongoDB, sort it and create a cursor. c1 is the collection name; no filtering criterion is specified; and all fields except _id will be retrieved and sorted by name. In esProc find function, which is analogous to the combination of MongoDB find, sort and limit function, the filtering criterion syntax follows the MongoDB rules.

A3: Fetch data from the cursor by loop, getting a group of documents with the same name field each time. A3’s working range is the indented B3 to B5, where A3 can be used to reference the loop variable.

B3: Retrieve all users fields from the current group of documents, as shown below:

B4: Merge users fields from all documents of the current group and remove duplicate members.

B5: Append each result of B4’s loop to B2. Finally B2 becomes this:

B2 is the final result we want. If the result is too big to be loaded into the memory, you can use export@j function in B5 to convert each of B4’s results to a JSON string and then append them to the text file one by one.

A6: Disconnect from MongoDB.

4/21/2015

Generate a Result Set with Dyamic Columns with esProc

Problem source：http://bbs.csdn.net/topics/391012377

Below is a selection from the original table (tb1):

Prjno Subtask Ddate Num

P9996 P9996-sub002 2015-01-01 123

P9996 P9996-sub002 2015-01-02 134

P9996 P9996-sub002 2015-01-03 345

P9996 P9996-sub002 2015-01-04 55

T0071 T-007-01 2015-01-01 3333

T0071 T-007-01 2015-01-02 356

T0071 T-007-01 2015-01-03 178

According to a specified date, you need to get all projects before this date in the same month. Suppose the input date is 2015-01-03, you’ll get this:

Prjno Subtask 2015-01-01 2015-01-02 2015-01-03

P9996 P9996-sub002 123 134 345

T0071 T-007-01 3333 356 178

esProc code for doing this:

A1: Query data from the beginning of the month to the specified date. d_date is an input date parameter, like 2015-01-03. pdate@m(d_date) calculates the first date of the current month.

A2: Create an empty result table sequence with dynamic columns according to the sequence of dates from the first date of the month to the specified date:

A3: The first part A1.group(Prjno,Subtask) groups A1’s data by Prjno and Subtask (esProc data grouping will keep the detail data of each group), then ~.groups(Ddate;sum(Num):Num), one by by, groups each group of data by the date and aggregate Num valules; finally, A2.record() writes each group name and the aggregate value into A2’s result table sequence. The following is the final result:

In a similar way any database is called, esProc can be called by the reporting tool or a JAVA program. The computed result in the form of ResultSet can be returned to JAVA main program via esProc JDBC. You can see related documents for detailed method.

4/20/2015

Query List Fields in MongoDB Subdocuments in esProc

Problem source： https://groups.google.com/forum/#!msg/mongodb-user/HqzXSh5DZek/ffZG0TQ1w8cJ .

Collection Cbetween contains cascaded subdocuments, in which the List-type dataList field includes a series of strings, each of which has multiple numbers. You need to find strings according to the criterion that the first number is greater than 6154 and less than or equal to 6155. Below is one of Cbetween’s subdocuments:

The eligible strings include "6154.5,37.9,1.529,1.429,1.429","6155,30.4,1.505,1.532,1.543".

esProc code：

A1: Connect to MongoDB. The connection string format is mongo://ip:port/db?arg=value&…

A2: Retrieve data from MongoDB using find function and create a cursor. The name of the collection is Cbettwen. There is no filtering criterion. Retrieve all fields except _id field. Syntax of filtering criterion in esProc find function, which is similar to its MongoDB counterpart, follows MongoDB rules.

A3: Find the eligible strings. conj function concatenates results of filtering each subdocument in A2; ~ represents each member of an upper level of table sequence. new function is used to create a new table sequence and #1 represents the first field of the table sequence. array function can split a string into a sequence with comma being default delimiter; @1 means splitting the string into two members with the first delimiter being the boundary.

A4: Fetch cursor data in batches to get data from the memory. The result is as follows:

A5：Disconnect from MongoDB.

menu

4/29/2015

4/28/2015

4/27/2015

4/26/2015

4/24/2015

4/23/2015

4/21/2015

4/20/2015