Data Process Engine: esProc Helps Process Structured Text in Java

While importing the structured text files into the database using Java alone, we need to combine the SQL statements together manually, and to deal with various troublesome situations as well, like if the data in a table has been existed, whether we should update it or insert data into it, if some fields are included in the file, and if the fields in the file are consistent with those in the table.

As esProc participates in Java programming, these problems can be solved without self-programmed code. Let's give an example to explain this clearly.

The text file sales.txt contains data of sales orders, with tab being the column separator. The first lines of data are as follows:

Now it is required to import the data of sales.txt into an empty table of the same structure.

First, write the script in esProc and name it test.dfx.

In the above script, import function is used to import the text file. tab is the column separator by default and the option @t means the first row is set as the column name. update function is used to import the data in A1 into the database in batches. OracleDB is the name of data source. Tsales is the table name.

With the script, all data has been imported into the database. Then we just need to call the script in Java code.

// create a connection using esProc jdbc

Class.forName("com.esproc.jdbc.InternalDriver");

con= DriverManager.getConnection("jdbc:esproc:local://");

// call the esProc script, whose name is test.

st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test(?)");

// execute the esProc script in a similar way as executing the stored procedure

st.execute();

After the above Java code is executed, sales.txt is imported by the esProc engine into table Tsales.

This script written in esProc is simple, so it can be embedded directly in Java code without creating a script file. The code is written as follows:

st.executeQuery("=OracleDB.update(file("E:\\sales.txt").import@t(),Tsales)");

Sometimes the table in the database may not be empty. In this case, we shouldn't import all data into it. We should first compare the data in the file with that in the table according to the primary keys instead. update statements will be generated if the data shares the same primary key and insert statements will be generated if the primary keys in the text file cannot be found in the table. There are several situations:

If the table in the database already has primary keys (it is OrderID in this example), the code remains the same as that in the above. In other words, the esProc engine can automatically compare the primary keys (including composite primary keys) of the database with the fields of the text file and generate corresponding update statements or insert statements.

But if there are no primary keys in the table, we just need to set them manually in the update function, that is, change the script in cell A2 into =OracleDB.update(A1,Tsales;OrderID).

If we don't want to change the original data in the table, we need to add a function option @i to the script to make esProc generate the insert statements only, like =OracleDB.update@i(A1,Tsales). Similarly, option @u means generating update statements only.

In the example above, we assumed that the table in the database and the text file are of the same structure. But, in reality, they may not have the same structure. For example, the table has three more fields than the text file: State, OrderYear and Memo. It is required to fill a default value "done" in the field State of the table, compute the years of OrderYear according to the data of OrderDate, and make Memo remains empty. To solve this problem, we just need to write script as follows:

In the script, derive function can add new fields, or computed columns, to A1.update function will automatically skip field Memo.

If the number of fields in the table is less than that in the text file, only some of the columns should be imported. For example:

=file("E:\\sales.txt").import@t(OrderID,SellerId,Amount,OrderDate)

This line of script code means field Client will be skipped and only the rest of four fields will be imported from the file.

In the example above, the first line of the text file happens to be the names of fields in the table. But sometimes there is no information about column names and we need to set them by ourselves. The code is as follows:

It can be seen that as the first line is not the column names, there is no need to use option @t in import function. According to the code in A1, the related data will get default names for its fields: _1, _2, _3 and so forth. Order_ID:_1 in update function represents importing field "_1" in A1 into field OrderID of the table. And the rest can be done in the same manner.

If the names of fields in the text file are different from those in the table. For example, field Client in the table becomes ClientID in the text file. We can use the same method to match them. The code is as follows:

The column separator in the example above is tab, but sometimes it is comma or other signs. In such cases, just modify import function into =file("E:\\sales.txt").import(; ",").

Data Process Engine

menu

11/02/2014

esProc Helps Process Structured Text in Java - Import data into the database

No comments:

Post a Comment