Data Process Engine: February 2015

2/27/2015

esProc Simplifies SQL-style Computations – Interval Merging

There are many complicated SQL-style computations in database application development. Interval merging is one of them, such as the summation of non-overlapping time intervals and merging of overlapping time intervals, etc. Due to the lack of orderliness of an SQL set, SQL solves the problems by resorting to recursive method that is difficult to be used with a database that supports not enough recursive functions. Besides, we need to write a nested SQL statement containing multilayered subqueries that is too lengthy for realization and modification.

In contrast, esProc has an easier way for the computation. It merges intervals step by step through intuitive script. Let’s look at an example.

Table Udetail stores user operation records in detail. Some of the original data are as follows:

ID UID ST ET

1 1001 2014-9-1 10:00:00 2014-9-2 11:30:00

2 1001 2014-9-1 10:30:00 2014-9-2 11:00:00

3 1001 2014-9-3 11:00:00 2014-9-4 12:00:00

4 1001 2014-9-4 10:00:00 2014-9-5 13:00:00

5 1001 2014-9-4 15:00:00 2014-9-5 18:00:00

6 1002 2014-9-1 11:00:00 2014-9-2 11:30:00

7 1002 2014-9-1 10:30:00 2014-9-2 11:00:00

In this table, ST and ET represent respectively the starting time and ending time of the operations and each user may have overlapping operational time intervals. Based on a specified user, we are to:

1.Merge the overlapping time intervals so as to create a new record of time intervals;

2.Summarize the total time of non-overlapping intervals.

esProc does it as follows:

An explanation of the script：

A1=db1.query("select * from udetail where UID=?",arg1)

Retrieve data of the specified user from the database. arg1 is an external parameter. Suppose the value of it is 1001, then the query result is as follows:

A2= A1.sort(ST).select(ET>ET[-1]).run(max(ST,ET[-1]):ST)

This line of code first sorts A1’s table by the starting time (.sort(ST)) and selects records by the condition that each ET is greater than the previous one, i.e. to delete the intervals completely covered by others. For every two overlapping records, get the minimum ST value and the maximum ET value to create a new time interval. The result is as follows:

It can be seen that sets are ordered in esProc, so the previous record can be referenced by its number. It is very different from SQL. If the continuous time intervals need to be combined into one interval, A2’s code can be like this：A2=A1.sort(ST).select(ET>ET[-1]).run(if(ST<ET[-1],ST[-1],ST):ST).group(ST;~.m(-1).ET:ET)

And the following result will be got:

A3=A2.sum(interval@s(ST,ET))

Summarize the total overlapping time. The result is as follows:

If the total non-overlapping time is wanted (without details), the code can be

A1.sort(ST).select(ET>ET[-1]).sum(interval@s(max(ST,ET[-1]),ET))

If the whole step of time is relatively small, the code can be modified as

A1.(periods@s(ST,ET)).union().len()-2, which gets the result by counting the number of time points.

Finally, an esProc script can be called by the reporting tool or the Java program much like they call a database. It returns a result in the form of ResultSet through JDBC provided esProc. See related documents for more details.

2/26/2015

esProc Assists Report Development – Dynamically Relate Multiple Data Sources to a Master Report

Unconventional statistical tasks are not uncommon during report creation, but they are difficult if handled solely by a reporting tool like Jasper or BIRT, or SQL. For example it is troublesome to display result of dynamically relating a master report to its corresponding subreports existing in multiple databases. Yet esProc, with its powerful computing engine for processing structured data, can assist the handling of the case. It is also conveniently to be integrated by the reporting tool. We’ll discuss it through an example.

Master report org resides in data source Master. The subreports to which its records correspond reside in multiple data sources. For instance when org.org_id=“ORG_S”, the record’s corresponding subreport is table User of data source S_odaURL; when org.org_id=“ORG_T”, the record’s corresponding subreport is table User of data source T_odaURL. There are more than two subreports but all their names are User. The final report requires that these subreports be related to the master report dynamically. The following figure shows the logical relation between them:

esProc code for performing the operation:

A1=Master.query("select * from org where org_id like '"+arg1+"%' ")

Execute the SQL statement to retrieve data from table org of data source Master. arg1 is a parameter passed from the report, which is for data filtering. Suppose arg1=“ORG”, then A1’s result is as follows:

A2: for A1

Loop through A1’s records one by one, dynamically relate a subreport each time and then write it to B2. Note that esProc uses the indentation to represent a loop statement’s working range, like B2-B7 in this example. In the loop body, A2 is used to reference the loop variable and #A2 can be used to reference the loop number.

B2=right(A2.org_id,1)+"_odaURL"

Compute data source name of the corresponding subreport according to the current record’s org_id field. For the first loop, B2’s result is “S_odaURL”.

B3=connect(B2)

Connect to a data source by its name. Note that data source Master in A1 has been configured to be automatically connected and thus can be used directly. In B3, the data source needs to be connected manually using connect function.

B4=B3.query("select * from user where org=?",A2.org_id)

Retrieve data from table User in B3’s data source according to the specified condition.

B5=B4.derive(A2.org_id:org_id,A2.org_manager:manager,A2.org_title:title)

Append three new fields that come from the master report to B4’s subreport and rename them org_id, manager and title respectively. For the first loop, B5’s result is as follows:

B6=B1=B1|B5

Append B5’s result to B1 (operator “|” is equal to union function). After loops are executed, B1 will get all data needed by the report:

B7=B3.close()

Close the data source connection.

A8: result B1

Return B1’s table to the reporting tool. esProc provides JDBC to integrate with the reporting tool, which will identify it as a database. See related documents for the integration solution.

A seasoned programmer may replace for statement with esProc’s long statement to make the code more concise:

Create a simple grouped report with, for instance, BIRT. The template is as follows:

Define parameter pVar in the report to correspond to its counterpart in the esProc script. The following is the preview of the final report:

2/25/2015

esProc Assists Report Development – Irregular Cross Row Calculation

During data reporting there are many unconventional statistical tasks that are difficult to be handled solely by reporting tools, like Jasper and BIRT, or SQL. One case is to specify a particular record in the original data against which we cross-calculate all other records and derive new field values like percentages and sums. esProc, with its powerful computing engine for processing structured data, can assist the handling of the case. It can also be integrated conveniently by the reporting tool. We’ll look at how esProc handles the case through an example.

Database table majorSum stores the total number of patients of a certain hospital and the number of patients of each key department. We want to report proportions of patient number of each key department to the total patient number of the hospital in alphabetical order by the departments and, at the same time, keep the total number of patients in the end. The original data are as follows:

The expected report is as follows:

department	patient
Accident and emergency (A&E)	7.03%
Anaesthetics	12.37%
Breast screening	12.37%
Discharge lounge	14.61%
Ear nose and throat	15.06%
Haematology	17.43%
Neurology	5.23%
Cardinal Community Hospital	44562

esProc prepares necessary data as follows:

A1=myDB1.query("select department,patient from majorSum order by department")

Execute the SQL statement to retrieve data from table majorSum of data source myDB1. The result is as follows:

A2=A1.maxp(patient)

Get the record that stores the total number of patients in the hospital. maxp function gets the record that has the maximum field value – Cardinal Community Hospital in this example.

A3=A1\A2

Remove A2, the record of total number, from A1’s table. Operator “\” represents getting the difference between sets.

A4=A3.run(string(patient/A2.patient,"#.##%"):patient)

Compute proportions of the patient number of each department to the total number. run function computes by looping through A3’s records. string function converts a number into a string. The result is as follows:

A5=A4|A2

Concatenate the record of the total number of patients and other records containing figures shown by percentage. Operator “|” is equal to union function. esProc supports generic two-dimensional table, so strings and numbers can be stored in the same field. A5’s result is what the report needs:

A6:result A5

Return A5’s table to the reporting tool. esProc provides JDBC to integrate with the reporting tool, which will identify it as a database. See related documents for the integration solution.

The above step-by-step computation is used for observing the intermediate results conveniently. But for a seasoned programmer, these steps can be condensed into two lines of code:

A1=myDB1.query("select department,patient from majorSum order by department")

A2:result(total=A1.maxp(patient),(A1\total).run(string(patient/total.patient,"#.##%"):patient)|total)

Create a simple list report with, for instance, BIRT. The template is as follows:

The following is the preview of the final report:

The reporting tool calls the esProc script in the same way as that in which it calls the stored procedure. Save the esProc script as, say irregulProportion.dfx, to be called by call irregulProportion() in BIRT Stored Procedure Report Designer. Besides, esProc supports report parameters too.

2/17/2015

esProc Integration & Application: Integration with JasperReport

esProc provides standard JDBC interface to be integrated easily with the reporting tool. This article will explain respectively the integration of esProc and JasperReport in development environment and the deployment in WEB server-side.

1. Integration in development environment

Access esProc JDBC to call the esProc script using JasperReport iReport Designer through the following steps:

1.1. Create configuration files

Create the two configuration files – config.xml and dfxConfig.xml – necessary for esProc JDBC.

1.1.1. config.xml

<?xml version="1.0" encoding="UTF-8"?>

</dfxPathList>

</DB></DBList>

<logLevel>DEBUG</logLevel>

</Config>

1.1.2. dfxConfig.xml

<?xml version="1.0" encoding="GBK"?>

</dfxConfig>

1.1.3.JAR Load driver jars

esProc JDBC driver includes five jars - dm.jar, poi-3.7-20101029.jar, log4j_128.jar, icu4j_3_4_5.jar and dom4j-1.6.1.jar. They can be found in esProc IDE’s [installation directory]\esProc\lib directory. If esProc also uses other databases as the datasources, then the driver jars of these databases need to be added too. For instance, mysql-connector-java-5.1.5-bin.jar has been added here. Place all these jars in the classpath’s root path.

Except for these jars, the directory (such as D:\lib\config) of the two configuration files – config.xml and dfxConfig.xml – created in the first step needs to be placed in the classpath’s root path too. Or compress the configuration files into the root directory of any of the jars above, with their names remaining strictly the same.

Configure the classpath on the Tool Option bar, as shown in the following figure:

1.2 Create a datasource

Create a new datasource – esproc – on the datasource configuration window. Select the JDBC driver com.esproc.jdbc.InternalDriver and enter the JDBC URL as jdbc:esproc:local://. Leave the username and password blank.

esProc JDBC is a fully embedded computing engine. All computations are performed in its embedded package, thus the URL is local. On the other hand, since it isn’t a complete database, it has neither the username nor the password.

After filling all necessary information, click on “Test” to see whether the connection to the datasource has been successfully established.

1.3 Deploy esProc script

Copy the esProc script (dfx) to dfxPath configured in config.xml (such as D:/dfx configured in the above). Here we’ll take the query of the closing stock price in a certain month as an example. The esProc script is as follows:

1.4 Call esProc script in IDE

Create a report and set the query

Create a new report – esProcReport, write the query statement and call the esProc script using call dfxName(), in which dfx name is stocks.

Select all the fields and execute the code by clicking on Next Step to the end.

Because the esProc script needs to receive the parameter representing the year and the month, it is necessary to set the parameter for use in the query for the report after it is created. In the following figure, add the parameter representing the year and the month and edit the query for the report.

Then drag the parameter to the specified position of the query statement, thus the query becomes as follows:

Edit the report

Drag all fields under Fields to the report’s detail band and complete the edition, as shown below:

Preview the report

Click on Preview and enter the parameter to see the query result:

This is the integration of esProc script and JasperReport in the development environment. Next we’ll move on to explain how to deploy them in the web server-side.

2. Deploy web server-side

Select a suitable application server. For this purpose, Tomcat6.0 is used here for illustration.

2.1 create a web application

Create a J2EE application, which is named, say, JasperReport, and copy it to Tomcat’s webapps directory.

2.2 Create configuration files

Two configuration files – config.xml and dfxConfig.xml – are needed. The configuration information is the same as that for integration in development environment.

2.3 Deploy necessary jars

JasperReport’s driver jar

Deploy jars necessary for JasperReport into the application’s WEB-INF\lib directory. The JasperReport jar can be found in ireport\modules\ext directory of iReport Designer’s installation directory.

esProc JDBC’s driver jar

Copy the jars - dm.jar, poi-3.7-20101029.jar, log4j_128.jar, icu4j_3_4_5.jar and dom4j-1.6.1.jar - used in the above integration in the development environment and MYSQL’s driver jar - mysql-connector-java-5.1.5-bin.jar – to the application’s WEB-INF\lib directory. Then copy config.xml and dfxConfig.xml to the application’s WEB-INF\classes directory.

2.4 Copy the report

Copy the completed report (whose file extension is .jasper) to the corresponding directory of the application. For example, copy esProcReport.jasper to the application’s reportFiles directory.

2.5 Define and Publish the JSP report

The following code is to define and publish the JSP report, such as showReport.jsp:

<%@ page language="java" contentType="text/html; charset=UTF-8"

pageEncoding="UTF-8"%>

<%@ page

import="java.io.*,

net.sf.jasperreports.engine.*,

net.sf.jasperreports.engine.util.*,

java.util.*,java.sql.*,

net.sf.jasperreports.engine.export.*"%>

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Show JasperReport</title>

</head>

<body>

String report = request.getParameter("report");

File reportFile = new File(application.getRealPath("reportFiles/"

+ report));

//Load report object

JasperReport jasperReport = (JasperReport) JRLoader

.loadObject(reportFile.getPath());

//Pass report parameter through Map

Map parameters = new HashMap();

//Assign value to parameter p1 of the report

parameters.put("p1", 200901);

//Connect to esProc

Class.forName("com.esproc.jdbc.InternalDriver");

Connection conn = DriverManager.getConnection(

"jdbc:esproc:local://", "", "");

//Export the report to the page

JasperPrint jasperPrint = JasperFillManager.fillReport(

jasperReport, parameters, conn);

JRHtmlExporter exporter = new JRHtmlExporter();

exporter.setParameter(JRExporterParameter.JASPER_PRINT, jasperPrint);

exporter.setParameter(JRExporterParameter.OUTPUT_WRITER, out);

exporter.setParameter(

JRHtmlExporterParameter.IS_USING_IMAGES_TO_ALIGN,

Boolean.FALSE);

exporter.exportReport();

out.flush();

conn.close();

</body>

</html>

2.6 Preview the report

Launch Tomcat and access the website http://127.0.0.1:8080/JasperReport/reportJsp/showReport.jsp?report=esProcReport.jasper to preview the report as shown below, which means the deployment is completed.

2/16/2015

esProc Assists Report Development – Inter-row Calculation

Unconventional statistical tasks are not uncommon in creating reports with reporting tools like Jasper and BIRT. One of the cases is to display the result of certain comparisons between the current record and the next/previous record. It is difficult to handle it using only the reporting tool or the SQL. Yet esProc can assist the handling thanks to its powerful computing engine for processing structured data and the feature of being integration-friendly. An example will be cited to explain the esProc method of doing it.

Table salesAll holds orders of per seller, per day. In the report, we need to show sales amount of every month of a certain year according to the ranking order, the differences between neighboring rankings and the growth rate between the current month and the previous one. Some of the source data are as follows:

esProc code for handling it:

A1: Execute the SQL statement to calculate sales amount of each month in a certain year and to sort them by the month. theYear is a parameter passed from the report. The result is as follows:

A2:=A1.derive((amount-amount[-1])/amount[-1]:LRR)

Append a field to A1’s table and compute the link relative ratio for the current month. esProc uses amount[-1] to reference the previous record. The result is as follows:

A3=A2.sort(-amount)

This line of code sorts A2’s table by amount field in descending order.

A4=A3.derive(amount-amount[1]:DIFF)

This line calculates differences between rankings. esProc uses amount[1] to reference the next record. A4’s result is what the report needs:

result A4

This line returns A4’s table to the reporting tool. esProc provides JDBC interface for integrating with the reporting tool that will identify esProc as a database. See related documents for the integration solution.

Then create a simple report with, for instance, JasperReport. The template is as follows

Define a parameter – pthisYear – in the report to correspond to its counterpart in the esProc script. Click on Preview to see the final report:

The reporting tool calls the esProc script in the same way as that in which it calls the stored procedure. Save the esProc script as, say crossrow.dfx, to be called by call crossrow($P{pthisYear}) in JasperReports SQL designer.

2/15/2015

esProc Assists Report Development – Create a Tabular Report by Transposition

Unconventional statistical tasks are difficult to be handled solely by the reporting tool like Jasper and BIRT, or the SQL. For example when the original data are not arranged as required by the tabular report, they need to be transposed for display. However, with the assistance of esProc that has a powerful computing engine for processing structured data and is integration-friendly, we can deal with this case more easily. The following example will teach you how data are transposed.

The database table KPIDetail stores detail data of performance assessment. The KPI Subtotals for different time ranges can be computed using simple SQL statements (group by or union), as shown below:

As the following figure shows, the tabular report will be created based on those source data:

It can be seen that the source data cannot be used directly for tabular report display and thus need to be transposed.

Taking the computation of KPISubtoal as an example, the following esProc code provides a universal method of transposing source data arranged in any format:

A1=myDB1.query("select * from KPISubtoal")

Query the database for the source data. The result is as follows:

A2=A1.fname().to(2,)

This line of code produces a set of A1’s field names except the first one. fname function creates a set of field names, for instance A1.fname()=[“range”,”registrations”,”deposits”,”games”]. to function returns a consecutive subsequence between two given integers of the original sequence, for instance .to(2,4) will return a subsequence consisting of the 2nd, 3^rd and 4th members; if the second parameter is omitted, the subsequence will consist of members starting from the first parameter to the end of the original sequence. A2’s result is as follows:

A3=create(KPI).record(A2)

This line of code creates a two-dimensional table for storing the transposed data. It only has one field – KPI – for the time being whose values come from A2, as shown below:

A4: for A1

This loop statement traverses A1’s records, transposes the row data to column data and stores the transposed data in A3. The loop body consists of cells from B4 to B6, the indentation part; in it A4 is used to reference the loop variable.

B4=columnName=A4.#1

This line of code gets the value of the first field of the current record and assigns it to the variable columnName. We can also use A4.range to get the same result, but the sequence number of the field - #1 – is used here for the sake of universality. During the first loop, the value of columnName is “today”.

B5= A4.array().to(2,)

This line of code gets field values of the current record beginning from its second field. A4.array() gets the field values of the loop variable (i.e. the current record) to form a set. For the first loop, B5’s result is as follows:

B6=A3=eval("A3.derive(B5(#):"+columnName+")")

This line of code adds a new column to A3’s two-dimensional table. The column name is the value of columnName and its values are B5’s result. “#” represents the sequence numbers of A3’s members. eval function parses a string into an expression. For the first loop, B6’s expression is A3=A3.derive(B5(#):today). The result is as follows:

After A4’s loop is executed, A3 will get all the transposed data as shown below:

result A3

This line of code returns A3’s two-dimensional table to the reporting tool. esProc provides JDBC interface for integrating with the reporting tool that will identify esProc as a database. See related documents for the integration solution.

Then create a simple tabular report with, for instance, JasperReport. The template is as follows:

A preview of the final report is as follows:

The reporting tool calls the esProc script in the same way as that in which it calls the stored procedure. Save the esProc script as, say esProcKPI.dfx, to be called by esProcKPI in JasperReport’s SQL designer. If query parameters have been entered to the esProc script, then use esProcKPI $P{parameter} to call the script.

menu