Sometimes we need to fetch certain data
from multiple files of a multi-level directory during text processing. The
operation is too complicated to be well performed at the command line. Though
it can be realized in high-level languages, the code is difficult to write; and
the involvement of big files will increase the difficulty. esProc, however, can
import big files with cursors and call the script recursively and thus can
process the data fetching in batch. The following example will show its way of
doing it.
A directory - "D:\files" – has
subdirectories of multiple levels. Each subdirectory has many files of text
format. We are asked to fetch a specified line (say the second line) from each
of these files and write them into a new file – result.txt. Part of the structure
of D:\files is as follows:
esProc code for doing this:
First define a parameter, path, and set its initial value as "D:\files" so as to get data from this directory, as shown below:
A1=directory@p(path)
directory function is used to get the file list in the root directory of the parameter,
path. @p option means file names should be presented with full path. The
following shows some of the results:
A2=A1.(file(~).cursor@s()). This line of code opens A1's files respectively in the form of
cursors. A1.(…) means processing A1's members in proper order; "~" represents
the current member; file function is
used to create a file object and cursor
function will return a cursor object according to the file object.
A3=A2.((~.skip(1),~.fetch@x(1))). This line of code fetches
the second row from A2’s each file cursor. A2.(…) means
computing A2's cursors one by one. (~.skip(1),~.fetch@x(1)) means computing the expression
in the parentheses in order and returning the last computed result. ~.skip(1)
means skipping a row. ~.fetch@x(1) means fetching the row at the current position (i.e. the
second row) and closing the cursor. @x means closing the cursor automatically
after the data are fetched. ~.fetch@x(1) represents the result which the parentheses operator
will return.
skip function skips multiple rows. You can determine how many
rows need to be skipped through a parameter. fetch function fetches multiple rows. Fetch two rows starting from
the 10th row, for example, the code is ~.skip(10),fetch@x(2).
A4=A3.union(). This line of code unions the results in A4 together. union function is used to realize the union operation, removing the duplicate data at the same time. For example, the code for computing the union of two sets: [1,2] and [2,3] is [1,2],[2,3]].union() and the result is [1,2,3]. If duplicate data are wanted, conj function (for concatenation) should be used. Some of the results of A4 are as follows:
A5=file("d:\\result.txt").export@a(A4). This line of code
exports the results of A4 to result.txt.
export function is used to write data
to a file. @a option means appending.
At this point, all data have been fetched
as required from the current directory. The rest of the work is to fetch the
subdirectories of the current directory and to call this script recursively.
A7=A6.(call("c:\\readfile.dfx",~)). This
line of code deals with A6's members (the subdirectories). The operation is to
call the esProc script - c:\\readfile.dfx,
and makes the current member (one of the subdirectories) as the input
parameter. Note that readfile.dfx is
the name of this script.
No comments:
Post a Comment