There are many
complicated SQL-style computations in database application development.
Interval merging is one of them, such as the summation of non-overlapping time
intervals and merging of overlapping time intervals, etc. Due to the lack of
orderliness of an SQL set, SQL solves the problems by resorting to recursive
method that is difficult to be used with a database that supports not enough
recursive functions. Besides, we need to write a nested SQL statement
containing multilayered subqueries that is too lengthy for realization and
modification.
In contrast, esProc has
an easier way for the computation. It merges intervals step by step through
intuitive script. Let’s look at an example.
Table Udetail stores user operation records in
detail. Some of the original data are as follows:
ID UID ST ET
1 1001 2014-9-1 10:00:00 2014-9-2 11:30:00
2 1001 2014-9-1 10:30:00 2014-9-2 11:00:00
3 1001 2014-9-3 11:00:00 2014-9-4 12:00:00
4 1001 2014-9-4 10:00:00 2014-9-5 13:00:00
5 1001 2014-9-4 15:00:00 2014-9-5 18:00:00
6 1002 2014-9-1 11:00:00 2014-9-2 11:30:00
7 1002 2014-9-1
10:30:00 2014-9-2 11:00:00
In this table, ST and ET represent
respectively the starting time and ending time of the operations and each user
may have overlapping operational time intervals. Based on a specified user, we are
to:
1.Merge
the overlapping time intervals so as to create a new record of time intervals;
2.Summarize
the total time of non-overlapping intervals.
esProc does it as
follows:
An explanation of the script:
A1=db1.query("select * from udetail where UID=?",arg1)
Retrieve data of the specified user from the database. arg1 is an external parameter. Suppose the value of it is 1001, then the query result is as follows:
A2= A1.sort(ST).select(ET>ET[-1]).run(max(ST,ET[-1]):ST)
This line of code first sorts A1’s table by the starting time (.sort(ST)) and selects records by the condition that each ET is greater than the previous one, i.e. to delete the intervals completely covered by others. For every two overlapping records, get the minimum ST value and the maximum ET value to create a new time interval. The result is as follows:
It can be seen that sets are
ordered in esProc, so the previous record can be referenced by its number. It
is very different from SQL. If the continuous time
intervals need to be combined into one interval, A2’s code can be like this:A2=A1.sort(ST).select(ET>ET[-1]).run(if(ST<ET[-1],ST[-1],ST):ST).group(ST;~.m(-1).ET:ET)
And the following result will
be got:
A3=A2.sum(interval@s(ST,ET))
Summarize the total overlapping time. The result is as follows:
If the total non-overlapping
time is wanted (without details), the code can be
A1.sort(ST).select(ET>ET[-1]).sum(interval@s(max(ST,ET[-1]),ET))
If the whole step of time is
relatively small, the code can be modified as
A1.(periods@s(ST,ET)).union().len()-2,
which gets the result by counting the number of time points.
Finally, an esProc script can
be called by the reporting tool or the Java program much like they call a
database. It returns a result in the form of ResultSet through JDBC provided
esProc. See related documents for more details.
No comments:
Post a Comment