12/09/2014

esProc Related Computing – Create and Derive New Table Sequences

Table sequences are frequently used in esProc. A table sequence can be retrieved from a database, created based on data in a sequence or another table sequence using new function, or derived by using derive function to add a single or multiple columns to an existing record sequence or table sequence.

1. Create a table sequence based on existing data

A.new() function can be used to create a new table sequence based on the data of an existing sequence or table sequence. For example:

A1 is a simple sequence. A2 is a table sequence of employee information got from a database. A3 is a sequence consisting of records of employees of the state of Texas. A4, A5 and A6 create new table sequences respectively based on the sequence, table sequence and record sequence using new function.

The table sequence in A4 is:

The table sequence in A5 is:

The table sequence in A6 is:

As can be seen from the above results, A.new() function computes the expression by looping each member in A and enter each result into the new table sequence. new function can reference members of A; if A is a table sequence or a record sequence, field values of its records can be used for the computation. By default, the field names of the new table sequence are the expressions used for generating the table sequence, for instance, age(BIRTHDAY). If the fields in a new table sequence are directly from the old one, the old field names will be used in the case that no new names are given to the fields, like EID and GENDER, which are actually expressions for generating the new fields. Data can be selected as needed to create a new table sequence.

It is inconvenient to use expressions as the field names of the new table sequence. Usually new names will be given to the fields of the new table sequence. For example:

Results of A4, A5 and A6 are as follows:
It can be seen that, in order to rename the new fields while using new function, one just add the desired names after the expressions for generating the corresponding fields, with a colon separating the name and the expression. For instance, NAME+" "+SURNAME:FullName represents that employees’ full names are computed according to the NAME and SURNAME fields of the records in the original table sequence and the corresponding field in the new table sequence is named FullName. In this case, new function works on the basis of the old sequence, table sequence or record sequence, so it cannot use the field names of the resulting table sequence, like FullName.

By the way, if it is a sequence consisting of continuous numbers that according to which the new table sequence is created, such as the sequence in A1, the expression can be simply written as n.new(). For example:

A1 is the abbreviated form of the expression in A4 in the previous example, and both have the same results:

A3 executes new function with numbers likewise and generates 100 records. Put the records of the first 100 employees in the employee table into five groups randomly and the resulting table sequence is as follows:

When new function is used to create a new table sequence, the fields of the new table sequence are allowed to be referenced in its expression. For example:

So A2 computes the employees' full names and references the full name of the previous employee in the following field when using new function to create a new table sequence. Its result is as follows:

Sometimes the new field names are allowed to be the same as those in the original table sequence. For instance, NAME is still used in A3 to be the name of the field of employees' full names: 

But notice that a problem thus arises in A4's code, in which NAME[-1]:PREV is used to reference the name of the previous employee and name the new field PREV. Since the original table sequence in A1 also has a NAME field, which takes the priority over the one in the new sequence if the two have the same names, A4 gets the following result:
It can be seen that A4 only gets the employee names in the original table sequence, rather than the newly computed full names.

In this case, one can only get the PREV field using the field in A1, like what A5 does. Result is as follows:

2.Create a sequence based on existing data

The loop function A.(x) can also be used to generate data using an existing sequence, table sequence or record sequence. Let's learn more about it and compare it with A.new(). For example:

Results of A2 and A3 are as follows:

It can be seen that the results are sequences instead of table sequences. Generally one can only compute a single field with the loop function A.(x).

If multiple fields are to be computed, the code should include these multiple fields, like what A4 does. Result of A4 is as follows:

3.Derive a new table sequence based on a sequence or a record sequence

Sometimes it is no need to create a new table sequence. One just needs to add data as required to the existing ones. In this case, derive function can be used to add computed fields to a table sequence or a record sequence and thus derive a new one. For example:

Data in the original table sequence in A1 are as follows:

A2 derives a new table sequence by using derive function to add an age field to the original table sequence. Result is as follows:

It can be seen that an age field which is computed according to BIRTHDAY field has been appended to the original fields. The new field hasn't been renamed and so the expression age(BIRTHDAY) for generating it is its name. Actually derive function can be considered as the simplified writing style of new function, it also generates a new table sequence when executed. But the new table sequence in A2 won't have any impact on the table sequence in A1.

A3 adds the FullName field to the table sequence in A2 and gets the following table sequence:

It is clear that all the fields of the table sequence in A2 have been copied, only with a new field being appended. In the expression in A3, the new field is named FullName. Similarly, the operation in A3 won’t affect the result of A2.

A4 reuses the original table sequence in A1 and adds a new field named FullName to it, but generates no expression. Result is as follows:

It can be seen that, without an expression, the program will generate a null field and return it.

Multiple fields can be added simultaneously through derive function. In this example, result of A2 is as follows:

It is the same as the result got by adding fields step by step.

A3 adds an Age field and a Group field which is computed with Age field and GENDER field to the original table sequence. Result is as follows:

The expression in A3 not only adds two fields to the original sequence simultaneously, but references the data of the new Age field in the second field – Group. With derive function a generated field can be referenced by a field generated later than it. But please note that, in esProc, if a number needs to be concatenated with a string, first it should be converted to a string.
Another thing worthy of note is that, like new function, derive function will also create a new table sequence rather than purely add a column to the existing one. Each record will be recreated in the new table sequence during the execution. Thus both derive function and new function are not sufficiently efficient. But the efficiency can be much increased if one executes derive only once when adding multiple fields. So the coding method in A2 is better than the step-by-step method in the previous example. Even some data cannot be specified at the beginning, null fields are allowed to be generated during the execution of derive function and they can get assigned by T.run() function later. For example:

With this method of executing derive function, it is unnecessary to recreate the records in the new table sequence and only value assignment is needed. So this method is more efficient than the two-step execution.

4.Create a table sequence using new function and derive function consecutively

In some cases, a new table sequence requires adding some computed columns but, at the same time, abandoning all the fields of the original table sequence or adjusting the order of the new fields. The consecutive use of new function and derive function can deal with these operations. For example:

After FullName is computed according to the employees' NAME and SURNAME in the table sequence in A1, NAME and SURNAME are not needed any more. Besides, the fields need to be rearranged. Move FullName forward, for example, as the fields generated by derive function will always come last. To do this A2 uses derive function and new function together. Result of A2 is as follows:

A2 is equal to the effect of two-step execution that includes A3 and A4. When they are used consecutively, new function is actually executed based on the result of executing derive function, like the result A3 gets:

So new function can reference the fields in the original table sequence, like EID, or the new field – FullName generated by derive function.

It is more common to first add fields using derive function and then arrange the result using new function when they are used consecutively. But one can also arrange the desired fields first using new function and then generate the computed fields using derive function. For example:

In this example, Group field needs to be computed using the new Age field, so A2 first arranges the table sequence in A1 using new function and then computes Group field using derive function. Result of A2 is as follows:

It needs to be made clear that this time derive function is executed based on the result of new function, and thus no data of the original table sequence in A1 can be referenced, like NAME, SURNAME, and etc. 

No comments:

Post a Comment