The concept of cursor is very important for the database. With the
cursor, data can be manipulated more flexibly and returned from the data table by
rows. esProc supports many types of cursor, like database cursor, file cursor
and in-memory record sequence cursor, to satisfy various needs in data fetching
and processing. In this article, we’ll introduce what the cursor used in esProc
is and its three types.
1. What is cursor
Using the cursor in the database enables returning the result in
batches, instead of trying to retrieve all of it into the memory at a time. The
cursor is like a pointer, it fetches one row of data each time from the result
set by moving its own position. This mechanism of retrieving data by rows is
usually used to process big data table, which cannot be entirely loaded into
the limited memory.
Compared with directly returning all result, the database cursor is
flexible enough to avoid the memory overflow but its speed of processing is not
very fast.
There are some differences, however, between esProc cursor and
database cursor:
1) esProc cursor is only used to
get data without modifying the result set.
2) Data are traversed only once
from front to back during fetching data from the esProc cursor, which is
similar to the setting of TYPE_FORWARD_ONLY in JDBC.
3) esProc cursor can be created
based on either the database or the data file, or the in-memory record
sequence.
Different types of esProc cursors have similar usages and we can mix
and match them to suit the requirements.
2. Database cursor
With db.cursor(sql) function, data returned from the database
can be converted into the cursor. For example:
A1 creates a database cursor. A2 fetches the first 100 records from it, as shown below:
Data fetched from the cursor with cs.fetch() will be
returned as a table sequence.
As can be seen from EID field, A3 fetches the records from the 101th
employee. In this example, cs.fetch() is used merely to fetch
data from the cursor. For more usages of the esProc cursor, please refer to esProc External Memory Computing: Basic
Usages of the Cursor.
If the data in the cursor haven’t been all fetched out when data
fetching is over, use cs.close() to close the cursor. It is very
important to close the cursor, particularly the database cursor, timely.
Because the database cursor gets data from the database, the existence of it
means the connection to the database will remain, causing unnecessary memory
usage and even the overrun of cursors allowed to be created by the database.
In SQL statements, parameters can be represented by the question mark (?). They are entered in a certain order after it and separated by the comma. A1’s cursor will return data of female employees whose EID is greater than 100. A2 returns all data of A1’s cursor, as shown below:
The number of rows
of data to be returned is not specified for the cs.fetch() function used
in A2, thus all data in the cursor will be returned. When data in a cursor have all been fetched out, it will close automatically
with no need of calling cs.close().
3. External file cursor
In practice, particularly big data processing, data are often generated from the file rather than the database. Given this, esProc provides the function of creating the external file cursor for data fetching based on the structured file. The following text file PersonnelInfo.txt holds the personnel information:In the file, each row holds a record and columns are separated by tabs. The external file cursor can be created according to the file for data fetching. For example:
A2 creates a cursor based on the external file and uses @t option to make the file’s first row the column names of the result set. A3 skips the first 50,000 rows and then A4 fetches 100 rows from A2’s cursor, as shown below:
As the database cursor, data are fetched from the external file
cursor through a single one-way traversal. After A3 skips certain rows using cs.skip(), the data fetching in A4 will begin
from the 50,001th row.
The code in the above cellset is almost the same as that the previous one, except that A2 creates the external file cursor using specified fields.
Note: It’s only after field names are imported using @t option that they can be used to
specify fields. Without the option, fields can only be specified directly by
their positions, like =A1.cursor(#1,
#2, #6, #3).
About the external file cursor, we’ll make a deeper explanation in esProc External Memory Computing: Text Files.
4. In-memory record sequence cursor
In esProc, the result table sequence or record sequence can be converted into the cursor to be used with other types of cursor. For example:The following is data of the states A1 selects:
A2 creates a cursor using the table sequence:
Generally the in-memory record sequence is converted into the cursor
to perform some operations based on the cursor, like merging and joining the
cursor data. For details in this respect, please refer to esProc External Memory Computing: Merge and Join Cursor Data.
The result of the cellset file returned by the result statement can be converted into a cursor, as shown below:
A2 fetches the first 100 rows from the A1’s cursor:
This is only a simple conversion of a result table sequence got in
another cellset file into the in-memory record sequence cursor, which is a kind
of cross-cellset cursor in this case. More usages of the cross-cellset cursor
will be illustrated in the esProc
External Memory Computing: Cross-cellset Cursor.
No comments:
Post a Comment