During processingthe text file,sometimes we
need to remove duplicate rows from the grouped data. But the operation becomes
complicated when the file under processing is too big to be entirely loaded
into the memory. esProc’s group operation supports importing a whole group of
data with the file cursor, as well as many options. So it can handle this kind
of operation effortlessly. The following example will show you how it works.
The file EPRom.log has four columns separated by tab and its data have been
grouped by the second column. Duplicate rows need to be removed (only the first
row of each group is wanted). Some of the source data are as follows:
esProc code for doing this:
A1=file("E:\\EPRom.log").import()
This line of code imports the file into the memory. By default, the separator is tab and column names are _1,_2,_3……. If it is a CSV file, the separator needs to be specified using the code import(;”,”). If the file’s first row contains column names, @toption can be used to import them, that is, import@t(). The result of A1 is as follows:
A2=A1.group@1o(_2)
This line of code gets the first row from every group.The group field is _2, the second field. This is the case’s final result, as shown below.
By default, group function will regroup the data. For instance, A1.group(_2)will divide A1 into two groups by the second field, as the following shows:
But the use of@o option won’t make that happen. For instance, result of A1.group@o(_2)is as follows:
With @1
option, the function will get the first row of every group. By using both @1 and @o, we’ve hit the target of this case.
A1=file("E:\\EPRom.log").cursor()
This line of code opens the log file in the
form of cursor. cursor function
returns a cursor object according to the corresponding file object, with tab
being the separator and _1,_2…_n being column names by default. Notice that the
code merely creates the cursor object without importing data. Data importing
will be started by for statement or fetch function.
B1=
file("e:\\result.txt")
This line of code creates a file object to
which the computed results can be exported later on.
A2:for A1;_2
This line of code imports data from cursor
A1 by loop. Each time it imports a group of data with the same second column
(column nameis _2). In this step the data are truly being imported into the
memory.
Here for
statement deserves special attention. In esProc, for
cs,nimportsn records from cursor
cseach time. for
cs;ximports a group of records with the same x field from cursor cseach
time, the condition is that records have been grouped by x field.
The x
in for cs;xstatement can be an expression,
which means multiple rows will be imported until the result of computing expression
x changes. Take for A1 ;floor(_1/5)as an example. It divides _1
field by 5 androunds the result off, put the records with the same results into
the same group, like the first row to the fifth row.
B2=file("e:\\result.txt").export@a([A2(1)])
As the loop body of forstatement in A2, it processes every group of data in the same
way. The method is to get the first row of the current group and append it to
file result.txt. A2 is the loop
variable which represents all records in the current group. A2(1) represents
the first record in 2. exportfunction
is used to write the structured data into a new file, its @a option means appending. Since A2(1) is a single record, it needs
to be converted into array with the operator [].
In esProc, the working rangeof for statement can be represented by indentation instead of the parentheses or identifiers like begin/end. In the following code block, for instance, B2-B5 is A2’s working range.
No comments:
Post a Comment