2/23/2016

A Standard Way of Realizing Dynamic Data Sources for Report Creation

Sometimes you need to reference data sources dynamically through a parameter, merge data sources into one, or dynamically pass data source name to a subreport/Table control. Often reporting tools - especially those with support for single data source, such as BIRT and Jasper – have to use a high-level language to accomplish these requirements, or trade security for reduced complexity.

esProc encapsulates rich functions for handling structured data, as well as supports dynamically parsing expressions and handling multiple/heterogeneous data sources, and realizing dynamic data sources with simple scripts. Therefore, it is suitable for serving as the computing tool for preparing data needed for building reports. The reporting tool executes an esProc script as it executes a database stored procedure, passes parameters to the script and gets the returned esProc result through JDBC. Learn more from How to Use esProc to Assist Reporting Tools.

Now let’s look at the typical problems about dynamic data sources and their esProc solutions:

Switching between data sources dynamically

myDB1 and oraDB are data sources that point to different databases. Each holds a sOrder table with the same structure. The report requires connecting to data sources dynamically via a parameter, querying sOrder for orders whose amounts are greater than 1,000, and displaying them.

Below is a selection from the sOrder table in myDB1:
OrderID
Client
SellerId
Amount
OrderDate
1
WVF
5
440
2009-02-03
2
UFS
13
1863
2009-07-05
4
JFS
27
671
2009-07-08
5
DSG
15
3730
2009-07-09
6
JFE
10
1445
2009-07-10
7
OLF
16
625
2009-07-11
8
PAER
29
2491
2010-07-12
9
DY
20
518
2010-07-15
10
JDR
17
1120
2010-07-16
Here is a selection from the sOrder table in oraDB:
OrderID
Client
SellerId
Amount
OrderDate
101
WAN
22
396
2010-11-07
102
SPL
15
142
2010-11-08
103
LA
23
713
2010-11-11
104
ERN
5
5678
2010-11-11
105
FUR
28
154
2009-11-12
106
BSF
27
10742
2009-11-13
107
RHD
4
569
2009-11-14
108
BDR
12
480
2010-11-15
109
OFS
17
1106
2009-11-18
esProc script:
=${pSource}.query("select * from sOrder where Amount>?",pAmount)
Both pSource and pAmount are report parameters. pSource represents the data source name; ${…} indicates parsing a string or a string variable into an expression. pAmount stands for the order amount.


When pSource=“myDB1”, A1 has the following result:

When pSource=“oraDB”, A1 gets this result:

Performing a multi-data-source pre-join

The mySQL database stores a Sales table holding orders from different sellers per day. Its SellerID field contains seller numbers. In MSSQL database there is an emp table of seller information in which EID field contains seller numbers. Create a report to display order numbers, order dates, order amounts, seller names and their departments, based on the condition that the orders should be within the last N days (say 30 days) or belong to certain important departments (say Marketing and Finance). Below are selections of the original tables:

Database table sales
OrderID
Client
SellerId
Amount
OrderDate
1
WVF Vip
1
440
2014-11-03
2
UFS Com
1
1863
2015-01-01
3
SWFR
2
1813
2014-11-01
4
JFS Pep
2
671
2015-01-01
5
DSG
1
3730
2015-01-01
6
JFE
1
1445
2015-01-01
7
OLF
3
625
2015-01-01
8
PAER
3
2490
2015-01-01

Database table emp
EId
State
Dept
Name
Gender
Salary
Birthday
2
New York
Marketing
Ashley
F
11001
1980-07-19
3
New Mexico
Sales
Rachel
F
9000
1970-12-17
4
Texas
HR
Emily
F
7000
1985-03-07
5
Texas
R&D
Ashley
F
16000
1975-05-13
6
California
Sales
Matthew
M
11000
1984-07-07
7
Illinois
Sales
Alexis
F
9000
1972-08-16
8
California
Marketing
Megan
F
11000
1979-04-19
1
Texas
HR
Victoria
F
3000
1983-12-07

esProc script:

A1, A2: Database queries. myDB1 and myDB2 point to MySQL and MSSQL respectively.
A3: Replace A1’s SellerID field with the corresponding records in A2 according to the key field EID. The result is as follows (the data items in blue contain sub-members):

By default, when there is not a corresponding record in A2 for a SellerID value, switch function retains the A1’s record while the SellerID shows an empty value. The effect is like a left join. Use @i option if you want to perform an inner join. The code is A1.switch@i(SellerId,A2:EId) .

A4: Filter on the result of join. The first filtering criterion is that orders are within the last N days (this corresponds to parameter days), whose expression is OrderDate>=after(date(now()),days*-1). The second one is that orders belong to several important departments (this corresponds to parameter depts), whose expression is depts.array().pos(SellerId.Dept). The operator || denotes the logical OR operation.

after function calculates the relative time duration. array function splits a string into a set using delimiters. pos function locates a member in a set. SellerId.Dept means Dept field in the record corresponding to SellerID field.

Both days and depts are parameters transferred from the reporting tool. Suppose their values are respectively 30 and “Marketing,Finance”, then A4’s result is as follows:

A5: Get the fields of interest from A4. Here is the final result:

Combining result sets with union

Result sets ds1 and ds2 have the same structure, but they come from different data sources – MySQL and a text file respectively. Now concatenate them and display the result in a cross table. Below is the original data:
ds1
ds2
id    name        time
1    name1        2010-07-22 11:01:02.903
2    name2        2010-07-22 11:01:02.903
3    name3        2010-07-22 11:01:02.903
id    name        time
1    t2_name1    2010-07-22 11:01:02.903
2    t2_name2    2010-07-22 11:01:02.903
3    t2_name3    2010-07-22 11:01:02.903
1    t3_name1    2010-07-22 11:01:02.920
2    t3_name2    2010-07-22 11:01:02.920
3    t3_name3    2010-07-22 11:01:02.920


esProc script:

A3: Concatenate the two data sets. The reporting tool’s work is just to create a simple cross table based on the resulting one data set.

Handling different data sources for main report and subreport

For a reporting tool that can support only one data source, if the reporting requires different data sources for the main report and the subreport, it needs to pass in the database URL explicitly or use Java classes to combine the different data sources into one. The former approach is vulnerable to security problems and the latter one produces complicated code. esProc, however, is able to cope easily. Here is an example.

Build a report with a subreport to display order information of sellers whose salaries are within a certain range. The main report’s data source is an emp table (in MySQL database) and the subreport’s data comes from a sales table (MSSQL database).

esProc scripts:

empEsProc.dfxfor the main report

A1: Query the emp table in MySQL database by the salary range.

salesEsProc.dfxfor the subreport

A1: Rrtrieve orders records from the sales table in MSSQL database according to employee IDs. Suppose eid=1, the result would be:

You can see that the two data sources have been joined into one source with esProc. The reporting tool just needs to call the corresponding esProc script for the main report and the subreport.

Some reporting systems do support multiple data sources. But it is hard to handle reports with different and changeable data sources. In that case, using esProc to generate a single data source can make the handling work easier.

For the same reason, another reporting problem can be solved through esProc’s single-data-source solution. That is the “multiple subreports, multiple data sources” problem, which means there are many subreports (or table controls) within one report and each has its own data source. 

Performing dynamic join between main report and its subreports

A main report may use many subreports whose data sources come from multiple databases. The reporting needs to display the result of dynamic join between these data sources and the one the main report uses. esProc implements the task in a simple way. For example:

The main table org is stored in the data source Master. Every record of the org table corresponds a subtable that has a separate data source. For example when org.org_id=“ORG_S”, the record’s subtable is the User able in the data source S_odaURL; when org.org_id=“ORG_T”, the record’s subtable is the User able in the data source T_odaURL. There are more subtables with names all being User. You need to join all subtables dynamically with the main table and display the result data set in a report. Below is the logical relationship between these tables:

esProc script:

A1: Execute the SQL statement to retrieve data from the org table in Master data source. arg1 is a parameter passed from the reporting tool. When arg1=”ORG”, the result would be:

A2: Loop through A1’s records to associate one subtable each time, concatenating each result of join into B2. esProc uses the natural indentation to represent the loop statement’s working range. Here the loop body is B2-B7 where A2 is used to reference a loop variable and #A2 is used to reference the loop number.

B2: Get the data source name for each record according to its org_id field. The result during the first loop is “S_odaURL”.

B3: This is the explicit connection to corresponding data source.

B4: Filter on data in the User table.

B5: Append three columns, which derive from the main table, to B4’s subtable. The result during the first loop is:

B6: Concatenate B5’s result into B1. The operator “|” is equivalent to union function. When the loop is over, B1 will have collected all data the reporting needs, as shown below:

B7: Close data source connection explicitly.

A8: Return B1 to the reporting tool explicitly. The default execution is to return the result of the last cell.

Displaying data by different time units specified by parameter

Here is a reporting task that requires using a line graph to show the change of sales over a recent period. unitType is a report parameter, representing time units. If unitType="hour", show sales amount every five minutes in the past one hour. If unitType="day", show sales amount per hour during the past day. And show sales amount per day over the past one week if unitType="week". The data originates from the orders table in which Amount field contains order amount. t is used to reference the order time.

esProc script:

A1: An empty result set used to store the time sequences generated from B2-B4.

A2-B4: Generate different time sequences according to the report parameter unitType. B2 generates 12 time points, with an interval of 5 minutes between each other. B3 generates time points in one day, and B4 generates time points during one week.

A5: Loop through A1 to calculate the sales amount of each time interval. “~” represents the current member and “~[-1] " represents the previous one. In the case of unitType="day", a one-field result set containing 12 records will be generated. Then you can plot the chart to show the result set.