Friday, July 11, 2014

SQL*Loader and External Tables

This post gives a brief introduction to Oracle's two technologies for loading external data into database tables – SQL*Loader (SQLLDR) and external tables – and later provides some guidelines on when to choose which.

SQL*Loader (SQLLDR)

SQL*Loader loads data from external files into tables in the Oracle database. SQL*Loader uses two primary files: the datafile, which contains the information to be loaded, and the control file, which contains information on the format of the data, the records and fields within the file, the order in which they are to be loaded, and even, when needed, the names of the multiple files that will be used for data.
 When executed, SQL*Loader will automatically create a log file and a “bad” file. The log file records the status of the load, such as the number of rows processed and the number of rows committed. The “bad” file will contain all the rows that were rejected during the load due to data errors, such as nonunique values in primary key columns. Within the control file, you can specify additional commands to govern the load criteria. If these criteria are not met by a row, the row will be written to a “discard” file. The log, bad, and discard files will by default have the extensions .log, .bad, and .dsc, respectively.
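As a minimal sketch (the control file bookshelf.ctl, the datafile bookshelf.dat, and the target table BOOKSHELF are hypothetical names; the columns mirror the external table example later in this post), a control file could look like this:

-- bookshelf.ctl: load '~'-delimited records into table BOOKSHELF
LOAD DATA
INFILE 'bookshelf.dat'
APPEND
INTO TABLE BOOKSHELF
FIELDS TERMINATED BY '~'
(Title, Publisher, CategoryName, Rating)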
It has two modes of operation:
• Conventional path: SQLLDR will employ SQL inserts on our behalf to load data.
• Direct path: SQLLDR does not use SQL in this mode; it formats database blocks directly.
The direct path load allows you to read data from a flat file and write it directly to formatted database blocks, bypassing the entire SQL engine, undo generation and, optionally, redo generation at the same time. Parallel direct path load is among the fastest ways to go from having no data to a fully loaded database.
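Assuming the sketched control file above, a conventional path load would be invoked from the command line as (user name, password, and file names are placeholders):

sqlldr userid=practice/password control=bookshelf.ctl log=bookshelf.log

while adding direct=true switches to a direct path load:

sqlldr userid=practice/password control=bookshelf.ctl log=bookshelf.log direct=true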

External Tables

External tables were first introduced in Oracle9i Release 1. Put simply, they allow us to treat an operating system file as if it were a read-only database table. External tables have limits:
1. No DML operations are allowed.
2. Indexes cannot be created. The lack of indexes on external tables does not have to be a negative factor in application performance. Queries of external tables complete very quickly, even though a full table scan is required with each access. There is I/O involved, but modern I/O systems use caching and RAID techniques to significantly reduce the performance penalty associated with repeated full scans of the same file.
3. Virtual columns cannot be added.
4. Constraints cannot be added.
Access Driver
When you create an external table, you specify its type. Each type of external table has its own access driver that provides access parameters unique to that type of external table. The access driver ensures that data from the data source is processed so that it matches the definition of the external table.
The default type for external tables is ORACLE_LOADER, which lets you read table data from an external table and load it into a database. Oracle Database also provides the ORACLE_DATAPUMP type, which lets you unload data (that is, read data from a table in the database and insert it into an external table) and then reload it into an Oracle database.
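As a minimal sketch of the unload direction (assuming the BOOK_DIR directory object created below and a hypothetical regular table BOOKSHELF), an ORACLE_DATAPUMP external table can be populated at creation time with CREATE TABLE ... AS SELECT:

create table BOOKSHELF_UNLOAD
organization external
  (type ORACLE_DATAPUMP
   default directory BOOK_DIR
   location ('bookshelf_unload.dmp'))
as select * from BOOKSHELF;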
To access external files from within Oracle, you must first use the create directory command to define a directory object pointing to the external files’ location. After this, execute the create table command with the organization external clause.
create directory BOOK_DIR as 'e:\oracle\external';
grant read on directory BOOK_DIR to practice;
grant write on directory BOOK_DIR to practice;
create table BOOKSHELF_EXT
  (Title        VARCHAR2(100),
   Publisher    VARCHAR2(20),
   CategoryName VARCHAR2(20),
   Rating       VARCHAR2(2))
organization external
  (type ORACLE_LOADER
   default directory BOOK_DIR
   access parameters (records delimited by newline
     fields terminated by "~"
     (Title        CHAR(100),
      Publisher    CHAR(20),
      CategoryName CHAR(20),
      Rating       CHAR(2)))
   location ('bookshelf_dump.lst'));

External Tables vs SQL*Loader

Some of the key functionality features that external tables have over SQLLDR in my experience are as follows:
• The ability to use complex WHERE conditions to selectively load data. SQLLDR has a WHEN clause to select rows to load, but you are limited to using only AND expressions and expressions using equality—no ranges (greater than, less than), no OR expressions, no IS NULL, and so on.
• The ability to MERGE data. You can take an operating system file full of data and update existing database records from it (see the sketch after this list).
• The ability to perform efficient code lookups. You can join an external table to other database tables as part of your load process.
• Easier multitable inserts using INSERT. Starting in Oracle9i, an INSERT statement can insert into one or more tables using complex WHEN conditions. While SQLLDR can load into multiple tables, it can be quite complex to formulate the syntax.
• A shallower learning curve for new developers. SQLLDR is “yet another tool” to learn, in addition to the programming language, the development tools, the SQL language, and so on. As long as a developer knows SQL, he can immediately apply that knowledge to bulk data loading, without having to learn a new tool (SQLLDR).
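For instance, building on the BOOKSHELF_EXT table defined above and a hypothetical regular table BOOKSHELF with the same columns, a single MERGE statement could update existing records and insert new ones:

merge into BOOKSHELF b
using BOOKSHELF_EXT e
on (b.Title = e.Title)
when matched then update
  set b.Publisher    = e.Publisher,
      b.CategoryName = e.CategoryName,
      b.Rating       = e.Rating
when not matched then insert
  (Title, Publisher, CategoryName, Rating)
  values (e.Title, e.Publisher, e.CategoryName, e.Rating);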
SQLLDR should be chosen over external tables in the following three situations:
• You have to load data over a network—in other words, when the input file is not on the database server itself. Because the access driver is part of the Oracle software, only files accessible to the database server can be accessed as external tables.
• Multiple users must concurrently work with the same external table processing different input files.
• You have to work with LOB types. External tables do not support LOBs.



Oracle external tables

Oracle external tables can help you import external flat-file data such as comma-separated values (CSV format).
We briefly discuss the concept of Oracle external tables and then define a sample external table of type ORACLE_LOADER connected to a flat CSV file.

Concept of External Tables

Since Oracle9i, external tables provide a convenient means of directly accessing external source data with SQL, as if it were in a table of your database. With external tables you can load data into your database, e.g. as part of an ETL process.
External tables are an alternative to SQL*Loader, a CLI-based utility provided by Oracle, which is more powerful but also more complicated. Typically you will use SQL*Loader to migrate a full-scale database into your Oracle database server. External tables, by contrast, are the means to plug specific external data channels into your data model. Often this is done periodically as part of a staging process.
See the first part of this post for a comparative list of differences between external tables and SQL*Loader.
The concept of external tables lets you place an external file in the filesystem of the database server, define the flat file's row structure, and then access this data directly with SQL like any other table. Compared to querying a relational database table, such access is slow: every access to an external table performs a 'full table scan' through the entire file. Thus you should normally access an external table only once, to insert the external data into a relational table inside your database tablespace.
The external table is defined with a specific DDL command of the form CREATE TABLE … ORGANIZATION EXTERNAL. In the remainder of this article we go through the standard case of importing a flat file with comma-separated values, a so-called CSV file.
External table access is bound to a file driver. There are two driver types. The ORACLE_LOADER driver only supports loading flat-file data into the relational database. If you want to write table data out to an external file, you need the ORACLE_DATAPUMP driver. Note that even though external tables and SQL*Loader are different instruments to work with, both may use the same driver technology underneath.

Creating an External Table

This DDL creates an external table ext_csv_emp that accesses the flat-file data with driver type ORACLE_LOADER.
CREATE TABLE ext_csv_emp
(
  employee_number      NUMBER(8,0),
  employee_job         VARCHAR2(64 BYTE),
  employee_last_name   VARCHAR2(40),
  employee_first_name  VARCHAR2(40),
  employee_middle_name VARCHAR2(40),
  employee_hire_date   DATE
)
ORGANIZATION EXTERNAL
(
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY "TMP_LPNO_DIR"
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    badfile TMP_LPNO_DIR: 'ext_csv_emp.bad'
    logfile TMP_LPNO_DIR: 'ext_csv_emp.log'
    skip 1
    FIELDS TERMINATED BY ';' MISSING FIELD VALUES ARE NULL
    (
      employee_number      CHAR(8),
      employee_job         CHAR(20),
      employee_last_name   CHAR(18),
      employee_first_name  CHAR(11),
      employee_middle_name CHAR(11),
      employee_hire_date   CHAR(10) date_format DATE mask "mm/dd/yyyy"
    )
  )
  LOCATION ('ext_csv_emp.csv')
);
Such an external table is visible in SQL Developer like this:
Picture: External table in SQL Developer
Syntactically speaking, an external table is a relational table with specific physical properties defined by the keywords ORGANIZATION EXTERNAL followed by an external table clause.
We choose the file access driver with the type parameter ORACLE_LOADER, then we define the external data properties. First we specify the server path by referring to an Oracle directory name defined beforehand. Next we define the access parameters. CSV data rows are presented line by line, thus records are delimited by the NEWLINE character.
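The directory object itself must be created beforehand by a privileged user; a minimal sketch, using the server path that appears in the log file further below (the grantee name is a placeholder):

create or replace directory TMP_LPNO_DIR
  as 'C:\oraclexe\app\oracle\product\11.2.0\server\TMP_LPNO_DIR';
grant read, write on directory TMP_LPNO_DIR to scott;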
The badfile contains all data rows that do not conform to the interface specification given in these access parameters. Bad rows are appended to the badfile each time the external table is accessed with a SQL SELECT. Watch the size of this file and your free disk space on the database server's filesystem when working on large CSV data sources during the setup of a new data feed process.
The logfile logs each access to the external table, i.e. every SELECT on the database. Watch the size of this file, too. As the file may be locked by the database, you may be obliged to delete it while the database is shut down.
In Switzerland the default field separator is a semicolon. Enter a comma or whatever separator you want to use; often it is wise to use the local country defaults. In any case, watch out for the escape character problem: the field separator should not be part of the field data. If this sometimes cannot be avoided, we can specify that fields may optionally be enclosed in double quotes, thus allowing the field separator to be part of the field data.
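A sketch of such a fields clause (only the enclosure part differs from the DDL above):

FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' MISSING FIELD VALUES ARE NULL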
The last part of the access parameters is the field list. These type definitions are opaque to SQL; they are not evaluated when SQL processes this DDL. Type checking of the FIELDS list is deferred to runtime, when it is done by the driver specified in the file access type. As we have chosen the file access driver type ORACLE_LOADER, we have to use SQL*Loader field types here. Watch out: SQL*Loader field types have similar names to SQL field types, but they are not the same. Don't confuse and mix these in your external table definitions.

Sample Input File

The sample input file ext_csv_emp.csv in path TMP_LPNO_DIR might look like this:
EMPLOYEE_NUMBER;EMPLOYEE_JOB;LAST_NAME;FIRST_NAME;MIDDLE_NAME;HIRE_DATE
7412;Boss;Bedretti;June;Peter;01.09.2001
7413;Clerk;Moser;Mike;;01.09.1992
7415;DB Administrator;Miller;John;Jon;03.09.2008
The first line acts as a header line holding the column names. This is useful when checking the file with text editors, Excel, and the like. As we defined it, this first line is skipped. Of course it is your responsibility to assert that this header line is present; otherwise the first data line might get skipped.
The external table can now be queried like this:
SELECT EMPLOYEE_NUMBER, EMPLOYEE_JOB,
       EMPLOYEE_LAST_NAME, EMPLOYEE_FIRST_NAME,
       EMPLOYEE_MIDDLE_NAME, EMPLOYEE_HIRE_DATE
FROM EXT_CSV_EMP;
resulting in
 Picture: Result of SQL Query on External Table
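As noted above, the typical pattern is to read the external table only once and copy its rows into a regular table. A minimal sketch, assuming a hypothetical target table emp_staging with matching columns:

insert into emp_staging
select employee_number, employee_job, employee_last_name,
       employee_first_name, employee_middle_name, employee_hire_date
from ext_csv_emp;
commit;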

Choosing Types in Access Parameter Definitions

If you use types other than CHAR in the FIELDS clause, such as DATE or INTEGER, you may spend a lot of time with strange error messages in the log file that are difficult to interpret and nearly impossible to debug or trace. Therefore I prefer to use CHAR in the access parameter FIELDS list. Define each field wide enough to hold any possible row value; try to expect the unexpected. In the sample above I rely on SQL's implicit type conversion.
If an erroneous DDL of an external table is accepted by SQL, you might get confused. Because access field type checking is deferred from compile time to runtime, you might get unexpected, and maybe difficult to interpret, messages when trying to access the external table.
If your input is more unstable, or if you want to make your data pipeline more bullet-proof, define the external table like this and defer any type checking and conversion to a later step controlled by your own procedures:
CREATE TABLE ext_csv_emp2
    (
     employee_number      VARCHAR2 (8),
     employee_job         VARCHAR2(64 BYTE),
     employee_last_name   VARCHAR2 (40),
     employee_first_name  VARCHAR2 (40),
     employee_middle_name VARCHAR2 (40),
     employee_hire_date   VARCHAR2 (12))
  ORGANIZATION EXTERNAL
    (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY "TMP_LPNO_DIR"
    ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    badfile TMP_LPNO_DIR: 'ext_csv_emp.bad'
    logfile TMP_LPNO_DIR: 'ext_csv_emp.log'
    skip 1
    FIELDS TERMINATED BY ';' MISSING FIELD VALUES ARE NULL
        ( employee_number      CHAR(8),
          employee_job         CHAR(64),
          employee_last_name   CHAR(40),
          employee_first_name  CHAR(40),
          employee_middle_name CHAR(40),
          employee_hire_date   CHAR(12)
          )
       )
     LOCATION ('ext_csv_emp.csv')
 );
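The type checking and conversion then happen in your own load step. A minimal sketch, again assuming a hypothetical target table emp_staging, and assuming the date format 'dd.mm.yyyy' based on the sample file above:

insert into emp_staging
select to_number(employee_number),
       employee_job,
       employee_last_name,
       employee_first_name,
       employee_middle_name,
       to_date(employee_hire_date, 'dd.mm.yyyy')
from ext_csv_emp2;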

Possible Errors

When setting up an external table, the possibilities for errors are manifold; most of them appear at runtime and thus are painful.

Runtime Errors While the Driver Parses the Access Parameter Clause

As said, the access parameter clause is only parsed at runtime; no checks are done at compile time.
When you get something like this
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-00554: error encountered while parsing access parameters
KUP-01005: syntax error: found "optionally": expecting one of: "column, (, reject"
KUP-01007: at line 6 column 8
29913. 00000 - "error in executing %s callout"
*Cause: The execution of the specified callout caused an error.
*Action: Examine the error messages take appropriate action.
recheck your DDL. This type of error message comes from parsing a syntactically wrong access parameter clause. Note that the line number in the message oddly counts from the beginning of the access parameter clause, not from the first line of your DDL statement in the editor.
Keeping these implicit rules in mind may help:
• Order and naming of the SQL table columns and the access parameter fields must match.
• The syntax of all access parameters passes unchecked at compile time. Study the syntax diagrams carefully.
• Comment lines are accepted at compile time but rejected by the driver at runtime.

Reject Limit Reached

 Picture: ORA-30653: reject limit reached
When you get this error, your DDL and access parameters seem to be well defined. Parsing of the CSV input data has started, but rows that do not match the field definitions are rejected. After the reject limit is reached, the external data access is aborted.
With ORA-30653 it is time to check your log file. You will find further information like
LOG file opened at 09/03/12 13:53:50
Field Definitions for table EXT_CSV_EMP
Record format DELIMITED BY NEWLINE
Data in file has same endianness as the platform
Rows with all null fields are accepted
Fields in Data Source:
EMPLOYEE_NUMBER CHAR (2)
Terminated by ";"
Trim whitespace same as SQL Loader
EMPLOYEE_JOB CHAR (20)
Terminated by ";"
Trim whitespace same as SQL Loader
EMPLOYEE_LAST_NAME CHAR (18)
Terminated by ";"
Trim whitespace same as SQL Loader
EMPLOYEE_FIRST_NAME CHAR (11)
Terminated by ";"
Trim whitespace same as SQL Loader
EMPLOYEE_MIDDLE_NAME CHAR (11)
Terminated by ";"
Trim whitespace same as SQL Loader
EMPLOYEE_HIRE_DATE CHAR (12)
Terminated by ";"
Trim whitespace same as SQL Loader
KUP-04021: field formatting error for field EMPLOYEE_NUMBER
KUP-04026: field too long for datatype
KUP-04101: record 1 rejected in file C:\oraclexe\app\oracle\product\11.2.0\server\TMP_LPNO_DIR\ext_csv_emp.csv
KUP-04021: field formatting error for field EMPLOYEE_NUMBER
KUP-04026: field too long for datatype
KUP-04101: record 2 rejected in file C:\oraclexe\app\oracle\product\11.2.0\server\TMP_LPNO_DIR\ext_csv_emp.csv
KUP-04021: field formatting error for field EMPLOYEE_NUMBER
KUP-04026: field too long for datatype
KUP-04101: record 3 rejected in file C:\oraclexe\app\oracle\product\11.2.0\server\TMP_LPNO_DIR\ext_csv_emp.csv

Increase your fields' character widths or make sure the input columns in the CSV are truncated to their maximum width before being delivered.
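Alternatively, if occasional bad rows are expected and acceptable, the reject limit itself can be raised. A sketch (the default limit is 0):

alter table ext_csv_emp reject limit unlimited;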

Prerequisite Directory Missing

The referenced directory must be defined and point to a server path accessible to the database server. Otherwise you get:
SQL-ERROR: ORA-06564: object TMP_LPNO_DIR2 does not exist
