Data Process Engine: February 2016

Sometimes you need to reference data sources dynamically through a parameter, merge data sources into one, or dynamically pass data source name to a subreport/Table control. Often reporting tools - especially those with support for single data source, such as BIRT and Jasper – have to use a high-level language to accomplish these requirements, or trade security for reduced complexity.

esProc encapsulates rich functions for handling structured data, as well as supports dynamically parsing expressions and handling multiple/heterogeneous data sources, and realizing dynamic data sources with simple scripts. Therefore, it is suitable for serving as the computing tool for preparing data needed for building reports. The reporting tool executes an esProc script as it executes a database stored procedure, passes parameters to the script and gets the returned esProc result through JDBC. Learn more from How to Use esProc to Assist Reporting Tools.

Now let’s look at the typical problems about dynamic data sources and their esProc solutions:

Switching between data sources dynamically

myDB1 and oraDB are data sources that point to different databases. Each holds a sOrder table with the same structure. The report requires connecting to data sources dynamically via a parameter, querying sOrder for orders whose amounts are greater than 1,000, and displaying them.

Below is a selection from the sOrder table in myDB1:

OrderID	Client	SellerId	Amount	OrderDate
1	WVF	5	440	2009-02-03
2	UFS	13	1863	2009-07-05
4	JFS	27	671	2009-07-08
5	DSG	15	3730	2009-07-09
6	JFE	10	1445	2009-07-10
7	OLF	16	625	2009-07-11
8	PAER	29	2491	2010-07-12
9	DY	20	518	2010-07-15
10	JDR	17	1120	2010-07-16

Here is a selection from the sOrder table in oraDB:

OrderID	Client	SellerId	Amount	OrderDate
101	WAN	22	396	2010-11-07
102	SPL	15	142	2010-11-08
103	LA	23	713	2010-11-11
104	ERN	5	5678	2010-11-11
105	FUR	28	154	2009-11-12
106	BSF	27	10742	2009-11-13
107	RHD	4	569	2009-11-14
108	BDR	12	480	2010-11-15
109	OFS	17	1106	2009-11-18

esProc script:

=${pSource}.query("select * from sOrder where Amount>?",pAmount)

Both pSource and pAmount are report parameters. pSource represents the data source name; ${…} indicates parsing a string or a string variable into an expression. pAmount stands for the order amount.

When pSource=“myDB1”, A1 has the following result:

When pSource=“oraDB”, A1 gets this result:

Performing a multi-data-source pre-join

The mySQL database stores a Sales table holding orders from different sellers per day. Its SellerID field contains seller numbers. In MSSQL database there is an emp table of seller information in which EID field contains seller numbers. Create a report to display order numbers, order dates, order amounts, seller names and their departments, based on the condition that the orders should be within the last N days (say 30 days) or belong to certain important departments (say Marketing and Finance). Below are selections of the original tables:

Database table sales

OrderID	Client	SellerId	Amount	OrderDate
1	WVF Vip	1	440	2014-11-03
2	UFS Com	1	1863	2015-01-01
3	SWFR	2	1813	2014-11-01
4	JFS Pep	2	671	2015-01-01
5	DSG	1	3730	2015-01-01
6	JFE	1	1445	2015-01-01
7	OLF	3	625	2015-01-01
8	PAER	3	2490	2015-01-01

Database table emp

EId	State	Dept	Name	Gender	Salary	Birthday
2	New York	Marketing	Ashley	F	11001	1980-07-19
3	New Mexico	Sales	Rachel	F	9000	1970-12-17
4	Texas	HR	Emily	F	7000	1985-03-07
5	Texas	R&D	Ashley	F	16000	1975-05-13
6	California	Sales	Matthew	M	11000	1984-07-07
7	Illinois	Sales	Alexis	F	9000	1972-08-16
8	California	Marketing	Megan	F	11000	1979-04-19
1	Texas	HR	Victoria	F	3000	1983-12-07

esProc script:

A1, A2: Database queries. myDB1 and myDB2 point to MySQL and MSSQL respectively.

A3: Replace A1’s SellerID field with the corresponding records in A2 according to the key field EID. The result is as follows (the data items in blue contain sub-members):

By default, when there is not a corresponding record in A2 for a SellerID value, switch function retains the A1’s record while the SellerID shows an empty value. The effect is like a left join. Use @i option if you want to perform an inner join. The code is A1.switch@i(SellerId,A2:EId) .

A4: Filter on the result of join. The first filtering criterion is that orders are within the last N days (this corresponds to parameter days), whose expression is OrderDate>=after(date(now()),days*-1). The second one is that orders belong to several important departments (this corresponds to parameter depts), whose expression is depts.array().pos(SellerId.Dept). The operator || denotes the logical OR operation.

after function calculates the relative time duration. array function splits a string into a set using delimiters. pos function locates a member in a set. SellerId.Dept means Dept field in the record corresponding to SellerID field.

Both days and depts are parameters transferred from the reporting tool. Suppose their values are respectively 30 and “Marketing,Finance”, then A4’s result is as follows:

A5: Get the fields of interest from A4. Here is the final result:

Combining result sets with union

Result sets ds1 and ds2 have the same structure, but they come from different data sources – MySQL and a text file respectively. Now concatenate them and display the result in a cross table. Below is the original data:

ds1	ds2
id name time 1 name1 2010-07-22 11:01:02.903 2 name2 2010-07-22 11:01:02.903 3 name3 2010-07-22 11:01:02.903	id name time 1 t2_name1 2010-07-22 11:01:02.903 2 t2_name2 2010-07-22 11:01:02.903 3 t2_name3 2010-07-22 11:01:02.903 1 t3_name1 2010-07-22 11:01:02.920 2 t3_name2 2010-07-22 11:01:02.920 3 t3_name3 2010-07-22 11:01:02.920

esProc script:

A3: Concatenate the two data sets. The reporting tool’s work is just to create a simple cross table based on the resulting one data set.

Handling different data sources for main report and subreport

For a reporting tool that can support only one data source, if the reporting requires different data sources for the main report and the subreport, it needs to pass in the database URL explicitly or use Java classes to combine the different data sources into one. The former approach is vulnerable to security problems and the latter one produces complicated code. esProc, however, is able to cope easily. Here is an example.

Build a report with a subreport to display order information of sellers whose salaries are within a certain range. The main report’s data source is an emp table (in MySQL database) and the subreport’s data comes from a sales table (MSSQL database).

esProc scripts:

empEsProc.dfx（for the main report）

A1: Query the emp table in MySQL database by the salary range.

salesEsProc.dfx（for the subreport）

A1: Rrtrieve orders records from the sales table in MSSQL database according to employee IDs. Suppose eid=1, the result would be:

You can see that the two data sources have been joined into one source with esProc. The reporting tool just needs to call the corresponding esProc script for the main report and the subreport.

Some reporting systems do support multiple data sources. But it is hard to handle reports with different and changeable data sources. In that case, using esProc to generate a single data source can make the handling work easier.

For the same reason, another reporting problem can be solved through esProc’s single-data-source solution. That is the “multiple subreports, multiple data sources” problem, which means there are many subreports (or table controls) within one report and each has its own data source.

Performing dynamic join between main report and its subreports

A main report may use many subreports whose data sources come from multiple databases. The reporting needs to display the result of dynamic join between these data sources and the one the main report uses. esProc implements the task in a simple way. For example:

The main table org is stored in the data source Master. Every record of the org table corresponds a subtable that has a separate data source. For example when org.org_id=“ORG_S”, the record’s subtable is the User able in the data source S_odaURL; when org.org_id=“ORG_T”, the record’s subtable is the User able in the data source T_odaURL. There are more subtables with names all being User. You need to join all subtables dynamically with the main table and display the result data set in a report. Below is the logical relationship between these tables:

esProc script:

A1: Execute the SQL statement to retrieve data from the org table in Master data source. arg1 is a parameter passed from the reporting tool. When arg1=”ORG”, the result would be:

A2: Loop through A1’s records to associate one subtable each time, concatenating each result of join into B2. esProc uses the natural indentation to represent the loop statement’s working range. Here the loop body is B2-B7 where A2 is used to reference a loop variable and #A2 is used to reference the loop number.

B2: Get the data source name for each record according to its org_id field. The result during the first loop is “S_odaURL”.

B3: This is the explicit connection to corresponding data source.

B4: Filter on data in the User table.

B5: Append three columns, which derive from the main table, to B4’s subtable. The result during the first loop is:

B6: Concatenate B5’s result into B1. The operator “|” is equivalent to union function. When the loop is over, B1 will have collected all data the reporting needs, as shown below:

B7: Close data source connection explicitly.

A8: Return B1 to the reporting tool explicitly. The default execution is to return the result of the last cell.

Displaying data by different time units specified by parameter

Here is a reporting task that requires using a line graph to show the change of sales over a recent period. unitType is a report parameter, representing time units. If unitType="hour", show sales amount every five minutes in the past one hour. If unitType="day", show sales amount per hour during the past day. And show sales amount per day over the past one week if unitType="week". The data originates from the orders table in which Amount field contains order amount. t is used to reference the order time.

esProc script:

A1: An empty result set used to store the time sequences generated from B2-B4.

A2-B4: Generate different time sequences according to the report parameter unitType. B2 generates 12 time points, with an interval of 5 minutes between each other. B3 generates time points in one day, and B4 generates time points during one week.

A5: Loop through A1 to calculate the sales amount of each time interval. “~” represents the current member and “~[-1] " represents the previous one. In the case of unitType="day", a one-field result set containing 12 records will be generated. Then you can plot the chart to show the result set.

Data Process Engine

menu

2/23/2016

A Standard Way of Realizing Dynamic Data Sources for Report Creation

Switching between data sources dynamically

Performing a multi-data-source pre-join

Combining result sets with union

Handling different data sources for main report and subreport

Performing dynamic join between main report and its subreports

Displaying data by different time units specified by parameter