Using
esProc, it is quite convenient to group data in memory. There are several main
types of grouping based on how data is grouped. Here we’ll illustrate equal
grouping, alignment grouping and enumeration grouping respectively with an
example.
Equal grouping
That
data is grouped by one or more certain fields (or one or more computed fields
derived from fields) of the current data set is equal grouping. By this method
each group is a subset of the original data set.
Case description: Group
sales orders by the year.
Data description: The
original data is as follow:
The
above data set (table sequence) can be imported from a database or a file. For
example:
A1=file("E:/sales.txt").import@t()
esProc code:
Computed result:
Code explanation:
1.In this example, the grouping criterion
comes from the OrderDate field. The order dates can be converted to the years with
year(OrderDate), then data of the
same year will be grouped together.
2.There may be more than one
field for grouping data. For example, the data could be grouped by both the
year and the sellerID to put records of each seller in each year into a group. Below
is the code for doing it: A1.group(year(OrderDate),SellerId)
3.Often, the grouped data are to
be aggregated, like getting each year’s sales amount based on A2’s data. The
code is:
A2.new(year(OrderDate):y,~.sum(Amount):a)
Computed result is as follows:
Or we can combine grouping and
summarizing into one step:
A1.group(year(OrderDate):y,~.sum(Amount):a)
Alternatively, we can choose the
groups function with better
performance yet less flexibility:
A1.groups(year(OrderDate):y; sum(Amount):a)
Of course, sometimes we have to perform grouping and aggregate
separately in order to reuse the code and improve computational efficiency,
like the scenario in which one of A2’s group needs filtering and another one
requires relational computing. In another scenario, the summarized data of a
certain group is unusual and worth further study, then this group can be used
directly in the subsequent computations without the need of filtering it
again.
4. By default, esProc’s group
function will group data using hash algorithm. But for ordered data, the
comparison of adjacent rows, which is equivalent to merge operation, may have higher performance. This approach can be implemented
by using @o option with group function. For example:
A1.group@o(year(OrderDate),SellerId)
Alignment grouping
Equal grouping groups data by the field(s)
coming from within the dataset. If the grouping criterion is one or more fields
of another data set, a user-defined array, or a parameter list, etc., the grouping
model will be referred as an alignment grouping.
Different from the equal grouping,
alignment grouping may produce empty subsets, which means no members in the
original data can satisfy a certain grouping condition. It may also lead to
incomplete grouping, that is, there may be members that will not appear in any
group. Neither would happen with equal grouping.
Case
description: Group the orders table according to the
list of best 10 sellers selected by KPIs.
The orders table
in the previous example will also be used here. Data is stored in A1.
The best 10 sellers list is stored in B1 as follows:
The list of sellers may come from an
intermediate table, or be generated by a piece of code. It’s not important how it
is produced in this example.
Computed result:
Code explanation:
1. In this example, the
grouping criterion (list of sellers) is outside of the data set being grouped.
After data is grouped, each group contains only the data of one seller, and
groups are arranged according to the order of members in the sellers list.
2. Because sellers in the orders table outnumber the
best sellers, some of the orders will not appear in any group. We can use function
option @n to store those orders in one additional group, as shown below:
A1.align@a@n(B1:empID,SellerId)
This group will be put in the end:
3. Sometimes not all members of the grouping criterion will fall in the data set to be grouped. For instance, the grouping criterion is “a list of newly-employed sellers”. In this case, it’s normal to produce empty groups. If we modify the first record of the list into empID=100, the result will be:
Enumeration grouping
The grouping criterion for enumeration
grouping is even more flexible. It could be any boolean expression. The records
satisfying the value of the expression will be put into the same group.
Similar to alignment grouping, this is also
the incomplete grouping as it probably produces empty subsets or a result in
which some members are not included in any group. Moreover, this type of grouping
may have the result that certain members appear in more than one group.
Case
description: Dividing orders into four groups, they
are: A. order amount is less than 1,000; B. order amount is less than 2,000; C.
order amount is less than 3,000; D. order amount is less than 10,000. Special requirement:
data cannot be grouped repeatedly, that is, if an order has been in group A, it
must not be put into group B, C, or D.
Ungrouped
data set:
The orders table in previous examples will still
be used. Data is stored in A1.
esProc
code:
A2=["?<=1000","?<=2000","?<=3000","?<=10000"]
A3=A1.enum(A2,Amount)
Computed result:
Case explanation:
1. In this example, grouping criteria
are multiple flexible expressions. Each record will be compared with each of
the expressions. Those records that can match the same expression will be put
into the same group. Groups are arranged according to the order of the grouping
criteria as well.
2. By default, enumeration
grouping will not produce duplicate members in the result. Because after group
A’s data is selected out, expression B will be matched with the rest of the
records, as this example has shown so far. But the use of function option @r
allows us to match expression B with all records, which will produce duplicate
members. For example the result of A3=A1.enum@r(A2,Amount)
is as follows:
Likewise, if values of an enumeration expression fall outside of the data to be grouped, it will correspond to an empty group. Besides, if certain records cannot match any expression, function option @n can be used to group these surplus records together.
No comments:
Post a Comment