It is difficult to deal with some
unconventional statistical tasks using the reporting tool, like Jasper and
BIRT, alone or SQL. One example is the crosstab in which both the row headers and
column headers are intervals, and whose measurement comes from anther database
table. With powerful structured data computing engine and being
integration-friendly, esProc can conveniently handle the case. I’ll explain the
process of realizing dynamic data source through the following example.
account_no is the primary key of table account_detail,
which has a one-to-many relationship with both table Paysoft_result and NAEDO
through foreign key custno and
foreign key customer_code. Report
design requires that, according to the external parameters, empirica_score field of table
account_detail be divided into segments that are used as the row headers and
that mfin_score field also be divided
into segments that are used as the column headers. The computation of measurement
is to divide the number of records of table Paysoft_result
to which the account_no, which is the
intersection-point where row headers and column headers meet, corresponds by
that of table mfin_score to which it
corresponds.
The following figure shows relations
between the database tables and between certain fields and the report:
esProc will perform the data preparation using the following code:
A1=myDB1.query("select
* from account_detail order by empirica_score,mfin_score")
This line of code retrieves data from table
account_detail. myDB1 is the data
source name that points to the database. query
function executes the SQL query statement. A1’s result is as follows:
A2=myDB1.query("select
* from paysoft_result")
B2=myDB1.query("select
* from NAEDO")
Then retrieve data from paysoft_result and NAEDO respectively in A2 and B2 likewise. Results are shown separately as follows:
A3=rowList.array()
B3=colList.array()
These two lines of code convert the parameters passed from the report into esProc sequences. Parameter rowList represents the row headers, like “560,575,585,595,605,615,625,635,645,654,665”, which includes ten consecutive intervals; parameter colList represents the column headers, like “39, 66, 91, 116, 137, 155”, which includes five consecutive intervals. array function is used to convert a string separated by commas into a sequence. The converting results are as follows:
A4=A1.select(empirica_score>=A3(1)
&& mfin_score>=B3(1))
In this
example certain data in the source table exceed the range of the specified
interval. For instance, customer “No501”’s empirica_sore
is 540, which is smaller than the lower limit of the interval – 560. This line
of code will filter out the data that are smaller than the lower limit in order
to increase performance and simplify the expression.
select function executes data query or data filtering. empirica_score is a field of A1, A3(1) represents the first member of A3, i.e. 560, the lower limit of the interval. The logical operator “&&” means “AND”. A4’s result is as follows:
A5=A4.group(A3.pselect(empirica_score<~[1]):row,
B3.pselect(mfin_score<~[1]):col;
~:accounts,
A2.select(accounts.(account_no).pselect(~==custno)):p,
B2.select(accounts.(account_no).pselect(~==customer_code)):n
)
This
line of code groups table account_detail
in A4 according to the intervals in A3 (rowList) and those in B3(colList) and
find out each group’s corresponding records in A2(paysoft_result) and B2(NAEDO).
group function is used to group data according to multiple fields (or grouping criteria). The syntax is A.group(field1,field2…). It is also used to compute subtotals or perform subsequent computations based on each group of data. The syntax is A.group(field1,field2… ; subtotal1,subtotal2…) . Fields of the grouped data can be renamed with “:new name”. The result of the above grouping operation contains five fields, which are row, col, accounts, p, n respectively. A5’s result is as follows:
To compute the grouping criterion row: Group A4’s empirica_score field according to A3’s intervals using the code A3.pselect(empirica_score<~[1]). pselect
function finds the sequence numbers of the eligible members in A3. “~”
represents the current member of A3, ~[-1] represents
its previous member and ~[1] represents the
next one. The current interval is (empirica_score>=~
&& empirica_score<~[1]). As all values of account_no field are bigger than the lower
limit of the interval, the expression can be simplified as empirica_score<~[1]. According to “560,575,585,595,605,615,625,635,645,654,665”,
A1 can be divided into 10 intervals - 560-574,575-584,585-594,595-604,605-614,615-624,625-634,635-644,645-654,655-664
– whose sequence numbers are from 1 to 10 in order.
To compute the grouping criterion col: Similarly, group A4’s mfin_score field according to B3’s
intervals with the code B3.pselect(mfin_score<~[1]). According to “39,66,91,116,137,155”, A1 can be
divided into 5 intervals - 39-65,66-90,91-115,116-136,137-154 – whose sequence
numbers are from 1 to 5 respectively.
To compute the summary field p: Find out records corresponding to accounts from A2 using the code A2.select(accounts.(account_no).pselect(~==custno)). select function accesses A2’s data and selects the eligible data by the filtering criterion. In the case of “row=1,col=1” and “row=2,col=5”, the records corresponding to column p are shown separately as follows (the relationship between accounts and A2 is one-to-many):
To compute summary field n: Similarly, find out records corresponding to accounts from B2 using the code B2.select(accounts.
(account_no).pselect(~==customer_code)). In the case of “row=1,col=1” and “row=2,col=5”, the records corresponding to column n are shown separately as follows:
A6=A5.derive(p.count():pCount,n.count():nCount)
This line of code appends to A5 the new columns pCount and nCount for computing the number of records of p and n in each group. The result is as follows:
A7=A6.derive(pCount/nCount:rate)
This line of code appends column rate to A6. The arithmetic is dividing pCount by nCount. The result is as follows:
A8=A7.run(string(A3(row))+"-"+string(A3(row+1)-1):row,string(B3(col))+"-"+string(B3(col+1)-1):col)
This line of code converts the sequence numbers in row field and col field into the corresponding intervals. run function performs the same computation on every member of A6 (a member is a row, where, for instance, row=1 and col=1). string function converts a number into a string. The expression “A3()” gets A3’s members by their sequence numbers, A3(1), for instance, is 560. A8’s result is as follows:
The result of A8 contains the three fields
the report requires. Then we only need to combine them into a new
two-dimensional table and return it to the reporting tool through JDBC
interface. This job will be done in A9 with the code result
A8.new(row,col,rate).
Note: esProc provides the operator
parentheses to compute the expressions separated by commas in order and return
the last expression’s value. With the parentheses, the code from A4 to A7 can be encapsulated into a single
line:
A4=A1.select(empirica_score>=A3(1)
&& mfin_score>=B3(1)).group(
A3.pselect(empirica_score<~[1]):row,
B3.pselect(mfin_score<~[1]):col;
(accounts=~,A2.count(accounts.(account_no).pselect(~==custno))
/
B2.count(accounts.(account_no).pselect(~==customer_code))):rate
)
The result is as follows:
A9 is the data set the reporting tool needs. Now let’s design a simple crosstab with JasperReport in the following template:
Three points should be noted: Don’t place the crosstab in the detail band; configure the property of Data Pre Sorted as true; define parameters corresponding to those in the esProc script in the report, such as pRowLlist and pColList. A preview of the report is as follows:
The reporting tool calls the esProc script
in the same way as that in which it calls the stored procedure. Save the esProc
script as, say unregul.dfx, to be
called by unregul $P{pRowList},$P{pColList} in
JasperReport’s SQL designer. See related documents for detailed integration
solution.
No comments:
Post a Comment