# Data Warehousing Concepts & Installations

Minnu's Blog on Informatica & Data warehouse concepts

## Monday, December 15, 2008

### Unix Function to Subtract 'N' days from a date

```ksh
# Function to subtract N days from a YYYYMMDD date (Korn shell).
# Example: to subtract one day from a date, use
#   new_date=$(DATE_MINUS '20000101' 1)
#
DATE_MINUS()
{
in_date=$1                                # store input
in_year=${in_date%[0-9][0-9][0-9][0-9]}   # extract YYYY
t=${in_date#[0-9][0-9][0-9][0-9]}         # extract MMDD
in_mon=${t%[0-9][0-9]}                    # extract MM
in_day=${t#[0-9][0-9]}                    # extract DD

# add 0 to the month (trick to strip any leading zero)
in_mon=`expr $in_mon + 0`

# subtract the required number of days from the current day
in_day=`expr $in_day - $2`

# step back one month at a time until the day is positive
while [ $in_day -le 0 ]
do
    in_mon=`expr $in_mon - 1`

    # if the month reaches 0, it is December of the previous year
    if [ $in_mon -eq 0 ]; then
        in_year=`expr $in_year - 1`
        in_mon=12
    fi

    # add the number of days in the month we stepped back into
    case $in_mon in
    1|3|5|7|8|10|12) in_day=`expr $in_day + 31`;;
    4|6|9|11)        in_day=`expr $in_day + 30`;;
    2)
        if [ `expr $in_year % 4` -eq 0 ]; then
            if [ `expr $in_year % 400` -eq 0 ]; then
                in_day=`expr $in_day + 29`
            elif [ `expr $in_year % 100` -eq 0 ]; then
                in_day=`expr $in_day + 28`
            else
                in_day=`expr $in_day + 29`
            fi
        else
            in_day=`expr $in_day + 28`
        fi
        ;;
    esac
done

# the output is the formatted new value
printf "%.4d%.2d%.2d" $in_year $in_mon $in_day
}
```
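
A quick sanity check, assuming the function has been sourced into the current shell:

```ksh
# Crossing a month boundary in a leap year (2008-03-01 minus 1 day):
DATE_MINUS 20080301 1    # prints 20080229

# Crossing a year boundary (2000-01-01 minus 1 day):
DATE_MINUS 20000101 1    # prints 19991231
```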

# Native Dynamic SQL

This post shows you how to use native dynamic SQL (dynamic SQL for short), a PL/SQL interface that makes your applications more flexible and versatile. You learn simple ways to write programs that can build and process SQL statements "on the fly" at run time.

Within PL/SQL, you can execute any kind of SQL statement (even data definition and data control statements) without resorting to cumbersome programmatic approaches. Dynamic SQL blends seamlessly into your programs, making them more efficient, readable, and concise.

This post discusses the following topics:

• What Is Dynamic SQL?
• The Need for Dynamic SQL
• Using the EXECUTE IMMEDIATE Statement
• Using the OPEN-FOR, FETCH, and CLOSE Statements
• Using Bulk Dynamic SQL
• Tips and Traps for Dynamic SQL

## What Is Dynamic SQL?

Most PL/SQL programs do a specific, predictable job. For example, a stored procedure might accept an employee number and salary increase, then update the `sal` column in the `emp` table. In this case, the full text of the `UPDATE` statement is known at compile time. Such statements do not change from execution to execution. So, they are called static SQL statements.

However, some programs must build and process a variety of SQL statements at run time. For example, a general-purpose report writer must build different `SELECT` statements for the various reports it generates. In this case, the full text of the statement is unknown until run time. Such statements can, and probably will, change from execution to execution. So, they are called dynamic SQL statements.

Dynamic SQL statements are stored in character strings built by your program at run time. Such strings must contain the text of a valid SQL statement or PL/SQL block. They can also contain placeholders for bind arguments. A placeholder is an undeclared identifier prefixed with a colon, so its name does not matter. For example, PL/SQL makes no distinction between the following two strings:

```sql
'DELETE FROM emp WHERE sal > :my_sal AND comm < :my_comm'
'DELETE FROM emp WHERE sal > :s AND comm < :c'
```

To process most dynamic SQL statements, you use the `EXECUTE` `IMMEDIATE` statement. However, to process a multi-row query (`SELECT` statement), you must use the `OPEN-FOR`, `FETCH`, and `CLOSE` statements.

## The Need for Dynamic SQL

You need dynamic SQL in the following situations:

• You want to execute a SQL data definition statement (such as `CREATE`), a data control statement (such as `GRANT`), or a session control statement (such as `ALTER` `SESSION`). In PL/SQL, such statements cannot be executed statically.
• You want more flexibility. For example, you might want to defer your choice of schema objects until run time. Or, you might want your program to build different search conditions for the `WHERE` clause of a `SELECT` statement. A more complex program might choose from various SQL operations, clauses, etc.
• You use package `DBMS_SQL` to execute SQL statements dynamically, but you want better performance, something easier to use, or functionality that `DBMS_SQL` lacks such as support for objects and collections. (For a comparison with `DBMS_SQL`, see Oracle9i Application Developer's Guide - Fundamentals.)

## Using the EXECUTE IMMEDIATE Statement

The `EXECUTE` `IMMEDIATE` statement prepares (parses) and immediately executes a dynamic SQL statement or an anonymous PL/SQL block. The syntax is

```sql
EXECUTE IMMEDIATE dynamic_string
   [INTO {define_variable[, define_variable]... | record}]
   [USING [IN | OUT | IN OUT] bind_argument
      [, [IN | OUT | IN OUT] bind_argument]...]
   [{RETURNING | RETURN} INTO bind_argument[, bind_argument]...];
```

where `dynamic_string` is a string expression that represents a SQL statement or PL/SQL block, `define_variable` is a variable that stores a selected column value, and `record` is a user-defined or `%ROWTYPE` record that stores a selected row. An input `bind_argument` is an expression whose value is passed to the dynamic SQL statement or PL/SQL block. An output `bind_argument` is a variable that stores a value returned by the dynamic SQL statement or PL/SQL block.

Except for multi-row queries, the dynamic string can contain any SQL statement (without the terminator) or any PL/SQL block (with the terminator). The string can also contain placeholders for bind arguments. However, you cannot use bind arguments to pass the names of schema objects to a dynamic SQL statement. For the right way, see "Making Procedures Work on Arbitrarily Named Schema Objects".

Used only for single-row queries, the `INTO` clause specifies the variables or record into which column values are retrieved. For each value retrieved by the query, there must be a corresponding, type-compatible variable or field in the `INTO` clause.

Used only for DML statements that have a `RETURNING` clause (without a `BULK` `COLLECT` clause), the `RETURNING` `INTO` clause specifies the variables into which column values are returned. For each value returned by the DML statement, there must be a corresponding, type-compatible variable in the `RETURNING` `INTO` clause.

You can place all bind arguments in the `USING` clause. The default parameter mode is `IN`. For DML statements that have a `RETURNING` clause, you can place `OUT` arguments in the `RETURNING` `INTO` clause without specifying the parameter mode, which, by definition, is `OUT`. If you use both the `USING` clause and the `RETURNING` `INTO` clause, the `USING` clause can contain only `IN` arguments.

At run time, bind arguments replace corresponding placeholders in the dynamic string. So, every placeholder must be associated with a bind argument in the `USING` clause and/or `RETURNING` `INTO` clause. You can use numeric, character, and string literals as bind arguments, but you cannot use Boolean literals (`TRUE`, `FALSE`, and `NULL`). To pass nulls to the dynamic string, you must use a workaround. See "Passing Nulls".

Dynamic SQL supports all the SQL datatypes. So, for example, define variables and bind arguments can be collections, `LOB`s, instances of an object type, and refs. As a rule, dynamic SQL does not support PL/SQL-specific types. So, for example, define variables and bind arguments cannot be Booleans or index-by tables. The only exception is that a PL/SQL record can appear in the `INTO` clause.

You can execute a dynamic SQL statement repeatedly using new values for the bind arguments. However, you incur some overhead because `EXECUTE` `IMMEDIATE` re-prepares the dynamic string before every execution.
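A minimal sketch of such repeated execution, assuming a table `bonus(id NUMBER, amt NUMBER)` exists (it is created in the example below):

```sql
-- Re-execute the same dynamic INSERT with new bind values each time;
-- EXECUTE IMMEDIATE re-prepares the string on every iteration.
BEGIN
   FOR i IN 1..3 LOOP
      EXECUTE IMMEDIATE 'INSERT INTO bonus (id, amt) VALUES (:1, :2)'
         USING i, i * 100;
   END LOOP;
END;
```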

### Some Examples of Dynamic SQL

The following PL/SQL block contains several examples of dynamic SQL:

```sql
DECLARE
   sql_stmt    VARCHAR2(200);
   plsql_block VARCHAR2(500);
   emp_id      NUMBER(4) := 7566;
   salary      NUMBER(7,2);
   dept_id     NUMBER(2) := 50;
   dept_name   VARCHAR2(14) := 'PERSONNEL';
   location    VARCHAR2(13) := 'DALLAS';
   emp_rec     emp%ROWTYPE;
BEGIN
   EXECUTE IMMEDIATE 'CREATE TABLE bonus (id NUMBER, amt NUMBER)';

   sql_stmt := 'INSERT INTO dept VALUES (:1, :2, :3)';
   EXECUTE IMMEDIATE sql_stmt USING dept_id, dept_name, location;

   sql_stmt := 'SELECT * FROM emp WHERE empno = :id';
   EXECUTE IMMEDIATE sql_stmt INTO emp_rec USING emp_id;

   plsql_block := 'BEGIN emp_pkg.raise_salary(:id, :amt); END;';
   EXECUTE IMMEDIATE plsql_block USING 7788, 500;

   sql_stmt := 'UPDATE emp SET sal = 2000 WHERE empno = :1
      RETURNING sal INTO :2';
   EXECUTE IMMEDIATE sql_stmt USING emp_id RETURNING INTO salary;

   EXECUTE IMMEDIATE 'DELETE FROM dept WHERE deptno = :num'
      USING dept_id;

   EXECUTE IMMEDIATE 'ALTER SESSION SET SQL_TRACE TRUE';
END;
```

In the example below, a standalone procedure accepts the name of a database table (such as `'emp'`) and an optional `WHERE`-clause condition (such as `'sal > 2000'`). If you omit the condition, the procedure deletes all rows from the table. Otherwise, the procedure deletes only those rows that meet the condition.

```sql
CREATE PROCEDURE delete_rows (
   table_name IN VARCHAR2,
   condition IN VARCHAR2 DEFAULT NULL) AS
   where_clause VARCHAR2(100) := ' WHERE ' || condition;
BEGIN
   IF condition IS NULL THEN where_clause := NULL; END IF;
   EXECUTE IMMEDIATE 'DELETE FROM ' || table_name || where_clause;
EXCEPTION
   ...
END;
```

### Backward Compatibility of the USING Clause

When a dynamic `INSERT`, `UPDATE`, or `DELETE` statement has a `RETURNING` clause, output bind arguments can go in the `RETURNING` `INTO` clause or the `USING` clause. In new applications, use the `RETURNING` `INTO` clause. In old applications, you can continue to use the `USING` clause. For example, both of the following `EXECUTE` `IMMEDIATE` statements are allowed:

```sql
DECLARE
   sql_stmt VARCHAR2(200);
   my_empno NUMBER(4) := 7902;
   my_ename VARCHAR2(10);
   my_job   VARCHAR2(9);
   my_sal   NUMBER(7,2) := 3250.00;
BEGIN
   sql_stmt := 'UPDATE emp SET sal = :1 WHERE empno = :2
      RETURNING ename, job INTO :3, :4';

   /* Bind returned values through USING clause. */
   EXECUTE IMMEDIATE sql_stmt
      USING my_sal, my_empno, OUT my_ename, OUT my_job;

   /* Bind returned values through RETURNING INTO clause. */
   EXECUTE IMMEDIATE sql_stmt
      USING my_sal, my_empno RETURNING INTO my_ename, my_job;
   ...
END;
```

### Specifying Parameter Modes

With the `USING` clause, you need not specify a parameter mode for input bind arguments because the mode defaults to `IN`. With the `RETURNING` `INTO` clause, you cannot specify a parameter mode for output bind arguments because, by definition, the mode is `OUT`. An example follows:

```sql
DECLARE
   sql_stmt VARCHAR2(200);
   dept_id  NUMBER(2) := 30;
   old_loc  VARCHAR2(13);
BEGIN
   sql_stmt :=
      'DELETE FROM dept WHERE deptno = :1 RETURNING loc INTO :2';
   EXECUTE IMMEDIATE sql_stmt USING dept_id RETURNING INTO old_loc;
   ...
END;
```

When appropriate, you must specify the `OUT` or `IN` `OUT` mode for bind arguments passed as parameters. For example, suppose you want to call the following standalone procedure:

```sql
CREATE PROCEDURE create_dept (
   deptno IN OUT NUMBER,
   dname  IN VARCHAR2,
   loc    IN VARCHAR2) AS
BEGIN
   SELECT deptno_seq.NEXTVAL INTO deptno FROM dual;
   INSERT INTO dept VALUES (deptno, dname, loc);
END;
```

To call the procedure from a dynamic PL/SQL block, you must specify the `IN` `OUT` mode for the bind argument associated with formal parameter `deptno`, as follows:

```sql
DECLARE
   plsql_block VARCHAR2(500);
   new_deptno NUMBER(2);
   new_dname  VARCHAR2(14) := 'ADVERTISING';
   new_loc    VARCHAR2(13) := 'NEW YORK';
BEGIN
   plsql_block := 'BEGIN create_dept(:a, :b, :c); END;';
   EXECUTE IMMEDIATE plsql_block
      USING IN OUT new_deptno, new_dname, new_loc;
   IF new_deptno > 90 THEN ...
END;
```

## Using the OPEN-FOR, FETCH, and CLOSE Statements

You use three statements to process a dynamic multi-row query: `OPEN-FOR`, `FETCH`, and `CLOSE`. First, you `OPEN` a cursor variable `FOR` a multi-row query. Then, you `FETCH` rows from the result set one at a time. When all the rows are processed, you `CLOSE` the cursor variable. (For more information about cursor variables, see "Using Cursor Variables".)

### Opening the Cursor Variable

The `OPEN-FOR` statement associates a cursor variable with a multi-row query, executes the query, identifies the result set, positions the cursor on the first row in the result set, then zeroes the rows-processed count kept by `%ROWCOUNT`.

Unlike the static form of `OPEN-FOR`, the dynamic form has an optional `USING` clause. At run time, bind arguments in the `USING` clause replace corresponding placeholders in the dynamic `SELECT` statement. The syntax is

```sql
OPEN {cursor_variable | :host_cursor_variable} FOR dynamic_string
   [USING bind_argument[, bind_argument]...];
```

where `cursor_variable` is a weakly typed cursor variable (one without a return type), `host_cursor_variable` is a cursor variable declared in a PL/SQL host environment such as an OCI program, and `dynamic_string` is a string expression that represents a multi-row query.

In the following example, you declare a cursor variable, then associate it with a dynamic `SELECT` statement that returns rows from the `emp` table:

```sql
DECLARE
   TYPE EmpCurTyp IS REF CURSOR;  -- define weak REF CURSOR type
   emp_cv   EmpCurTyp;  -- declare cursor variable
   my_ename VARCHAR2(15);
   my_sal   NUMBER := 1000;
BEGIN
   OPEN emp_cv FOR  -- open cursor variable
      'SELECT ename, sal FROM emp WHERE sal > :s' USING my_sal;
   ...
END;
```

Any bind arguments in the query are evaluated only when the cursor variable is opened. So, to fetch from the cursor using different bind values, you must reopen the cursor variable with the bind arguments set to their new values.
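Continuing the example, a sketch of rebinding by reopening:

```sql
DECLARE
   TYPE EmpCurTyp IS REF CURSOR;
   emp_cv   EmpCurTyp;
   my_ename VARCHAR2(15);
   my_sal   NUMBER := 1000;
BEGIN
   OPEN emp_cv FOR
      'SELECT ename, sal FROM emp WHERE sal > :s' USING my_sal;
   -- ... fetch and process rows ...
   CLOSE emp_cv;

   my_sal := 3000;   -- a new bind value takes effect only on the next OPEN
   OPEN emp_cv FOR
      'SELECT ename, sal FROM emp WHERE sal > :s' USING my_sal;
   -- ... fetch and process rows ...
   CLOSE emp_cv;
END;
```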

### Fetching from the Cursor Variable

The `FETCH` statement returns a row from the result set of a multi-row query, assigns the values of select-list items to corresponding variables or fields in the `INTO` clause, increments the count kept by `%ROWCOUNT`, and advances the cursor to the next row. The syntax follows:

```sql
FETCH {cursor_variable | :host_cursor_variable}
   INTO {define_variable[, define_variable]... | record};
```

Continuing the example, you fetch rows from cursor variable `emp_cv` into define variables `my_ename` and `my_sal`:

```sql
LOOP
   FETCH emp_cv INTO my_ename, my_sal;  -- fetch next row
   EXIT WHEN emp_cv%NOTFOUND;  -- exit loop when last row is fetched
   -- process row
END LOOP;
```

For each column value returned by the query associated with the cursor variable, there must be a corresponding, type-compatible variable or field in the `INTO` clause. You can use a different `INTO` clause on separate fetches with the same cursor variable. Each fetch retrieves another row from the same result set.
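A sketch of alternating `INTO` lists (with hypothetical variables `other_ename` and `other_sal` of matching types):

```sql
-- Consecutive fetches from one result set into different targets.
FETCH emp_cv INTO my_ename, my_sal;         -- fetches the first row
FETCH emp_cv INTO other_ename, other_sal;   -- fetches the second row
```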

If you try to fetch from a closed or never-opened cursor variable, PL/SQL raises the predefined exception `INVALID_CURSOR`.

### Closing the Cursor Variable

The `CLOSE` statement disables a cursor variable. After that, the associated result set is undefined. The syntax follows:

`CLOSE {cursor_variable | :host_cursor_variable};`

In this example, when the last row is processed, you close cursor variable `emp_cv`:

```sql
LOOP
   FETCH emp_cv INTO my_ename, my_sal;
   EXIT WHEN emp_cv%NOTFOUND;
   -- process row
END LOOP;
CLOSE emp_cv;  -- close cursor variable
```

If you try to close an already-closed or never-opened cursor variable, PL/SQL raises `INVALID_CURSOR`.

### Examples of Dynamic SQL for Records, Objects, and Collections

As the following example shows, you can fetch rows from the result set of a dynamic multi-row query into a record:

```sql
DECLARE
   TYPE EmpCurTyp IS REF CURSOR;
   emp_cv   EmpCurTyp;
   emp_rec  emp%ROWTYPE;
   sql_stmt VARCHAR2(200);
   my_job   VARCHAR2(15) := 'CLERK';
BEGIN
   sql_stmt := 'SELECT * FROM emp WHERE job = :j';
   OPEN emp_cv FOR sql_stmt USING my_job;
   LOOP
      FETCH emp_cv INTO emp_rec;
      EXIT WHEN emp_cv%NOTFOUND;
      -- process record
   END LOOP;
   CLOSE emp_cv;
END;
```

The next example illustrates the use of objects and collections. Suppose you define object type `Person` and `VARRAY` type `Hobbies`, as follows:

```sql
CREATE TYPE Person AS OBJECT (name VARCHAR2(25), age NUMBER);
CREATE TYPE Hobbies IS VARRAY(10) OF VARCHAR2(25);
```

Now, using dynamic SQL, you can write a package of procedures that uses these types, as follows:

```sql
CREATE PACKAGE teams AS
   PROCEDURE create_table (tab_name VARCHAR2);
   PROCEDURE insert_row (tab_name VARCHAR2, p Person, h Hobbies);
   PROCEDURE print_table (tab_name VARCHAR2);
END;

CREATE PACKAGE BODY teams AS
   PROCEDURE create_table (tab_name VARCHAR2) IS
   BEGIN
      EXECUTE IMMEDIATE 'CREATE TABLE ' || tab_name ||
         ' (pers Person, hobbs Hobbies)';
   END;

   PROCEDURE insert_row (
      tab_name VARCHAR2,
      p Person,
      h Hobbies) IS
   BEGIN
      EXECUTE IMMEDIATE 'INSERT INTO ' || tab_name ||
         ' VALUES (:1, :2)' USING p, h;
   END;

   PROCEDURE print_table (tab_name VARCHAR2) IS
      TYPE RefCurTyp IS REF CURSOR;
      cv RefCurTyp;
      p  Person;
      h  Hobbies;
   BEGIN
      OPEN cv FOR 'SELECT pers, hobbs FROM ' || tab_name;
      LOOP
         FETCH cv INTO p, h;
         EXIT WHEN cv%NOTFOUND;
         -- print attributes of 'p' and elements of 'h'
      END LOOP;
      CLOSE cv;
   END;
END;
```

From an anonymous PL/SQL block, you might call the procedures in package `teams`, as follows:

```sql
DECLARE
   team_name VARCHAR2(15);
   ...
BEGIN
   ...
   team_name := 'Notables';
   teams.create_table(team_name);
   teams.insert_row(team_name, Person('John', 31),
      Hobbies('skiing', 'coin collecting', 'tennis'));
   teams.insert_row(team_name, Person('Mary', 28),
      Hobbies('golf', 'quilting', 'rock climbing'));
   teams.print_table(team_name);
END;
```

## Using Bulk Dynamic SQL

In this section, you learn how to add the power of bulk binding to dynamic SQL. Bulk binding improves performance by minimizing the number of context switches between the PL/SQL and SQL engines. With bulk binding, entire collections, not just individual elements, are passed back and forth.

Using the following commands, clauses, and cursor attribute, your applications can construct bulk SQL statements, then execute them dynamically at run time:

• `BULK` `FETCH` statement
• `BULK` `EXECUTE` `IMMEDIATE` statement
• `FORALL` statement
• `COLLECT` `INTO` clause
• `RETURNING` `INTO` clause
• `%BULK_ROWCOUNT` cursor attribute

The static versions of these statements, clauses, and cursor attribute are discussed in "Reducing Loop Overhead for Collections with Bulk Binds". Refer to that section for background information.

### Syntax for Dynamic Bulk Binds

Bulk binding lets Oracle bind a variable in a SQL statement to a collection of values. The collection type can be any PL/SQL collection type (index-by table, nested table, or varray). However, the collection elements must have a SQL datatype such as `CHAR`, `DATE`, or `NUMBER`. Three statements support dynamic bulk binds: `EXECUTE` `IMMEDIATE`, `FETCH`, and `FORALL`.

#### Bulk EXECUTE IMMEDIATE

This statement lets you bulk-bind define variables or `OUT` bind arguments passed as parameters to a dynamic SQL statement. The syntax follows:

```sql
EXECUTE IMMEDIATE dynamic_string
   [[BULK COLLECT] INTO define_variable[, define_variable ...]]
   [USING bind_argument[, bind_argument ...]]
   [{RETURNING | RETURN}
      BULK COLLECT INTO bind_argument[, bind_argument ...]];
```

With a dynamic multi-row query, you can use the `BULK` `COLLECT` `INTO` clause to bind define variables. The values in each column are stored in a collection.

With a dynamic `INSERT`, `UPDATE`, or `DELETE` statement that returns multiple rows, you can use the `RETURNING` `BULK` `COLLECT` `INTO` clause to bulk-bind output variables. The returned rows of values are stored in a set of collections.

#### Bulk FETCH

This statement lets you fetch from a dynamic cursor the same way you fetch from a static cursor. The syntax follows:

```sql
FETCH dynamic_cursor
   BULK COLLECT INTO define_variable[, define_variable ...];
```

If the number of define variables in the `BULK` `COLLECT` `INTO` list does not match the number of columns in the query select-list, Oracle generates an error.

#### Bulk FORALL

This statement lets you bulk-bind input variables in a dynamic SQL statement. In addition, you can use the `EXECUTE` `IMMEDIATE` statement inside a `FORALL` loop. The syntax follows:

```sql
FORALL index IN lower_bound..upper_bound
   EXECUTE IMMEDIATE dynamic_string
      USING bind_argument | bind_argument(index)
         [, bind_argument | bind_argument(index)] ...
      [{RETURNING | RETURN} BULK COLLECT
         INTO bind_argument[, bind_argument ...]];
```

The dynamic string must represent an `INSERT`, `UPDATE`, or `DELETE` statement (not a `SELECT` statement).

### Examples of Dynamic Bulk Binds

You can bind define variables in a dynamic query using the `BULK` `COLLECT` `INTO` clause. As the following example shows, you can use that clause in a bulk `FETCH` or bulk `EXECUTE` `IMMEDIATE` statement:

```sql
DECLARE
   TYPE EmpCurTyp IS REF CURSOR;
   TYPE NumList IS TABLE OF NUMBER;
   TYPE NameList IS TABLE OF VARCHAR2(15);
   emp_cv EmpCurTyp;
   empnos NumList;
   enames NameList;
   sals   NumList;
BEGIN
   OPEN emp_cv FOR 'SELECT empno, ename FROM emp';
   FETCH emp_cv BULK COLLECT INTO empnos, enames;
   CLOSE emp_cv;
   EXECUTE IMMEDIATE 'SELECT sal FROM emp'
      BULK COLLECT INTO sals;
END;
```

Only the `INSERT`, `UPDATE`, and `DELETE` statements can have output bind variables. To bulk-bind them, you use the `RETURNING` `BULK` `COLLECT` `INTO` clause, which can appear only in an `EXECUTE` `IMMEDIATE` statement. An example follows:

```sql
DECLARE
   TYPE NameList IS TABLE OF VARCHAR2(15);
   enames    NameList;
   bonus_amt NUMBER := 500;
   sql_stmt  VARCHAR(200);
BEGIN
   sql_stmt := 'UPDATE emp SET bonus = :1 RETURNING ename INTO :2';
   EXECUTE IMMEDIATE sql_stmt
      USING bonus_amt RETURNING BULK COLLECT INTO enames;
END;
```

To bind the input variables in a SQL statement, you can use the `FORALL` statement and `USING` clause, as shown below. However, the SQL statement cannot be a query.

```sql
DECLARE
   TYPE NumList IS TABLE OF NUMBER;
   TYPE NameList IS TABLE OF VARCHAR2(15);
   empnos NumList;
   enames NameList;
BEGIN
   empnos := NumList(1,2,3,4,5);
   FORALL i IN 1..5
      EXECUTE IMMEDIATE
         'UPDATE emp SET sal = sal * 1.1 WHERE empno = :1
          RETURNING ename INTO :2'
         USING empnos(i) RETURNING BULK COLLECT INTO enames;
   ...
END;
```

## Tips and Traps for Dynamic SQL

This section shows you how to take full advantage of dynamic SQL and how to avoid some common pitfalls.

### Improving Performance

In the example below, Oracle opens a different cursor for each distinct value of `emp_id`. This can lead to resource contention and poor performance.

```sql
CREATE PROCEDURE fire_employee (emp_id NUMBER) AS
BEGIN
   EXECUTE IMMEDIATE
      'DELETE FROM emp WHERE empno = ' || TO_CHAR(emp_id);
END;
```

You can improve performance by using a bind variable, as shown below. This allows Oracle to reuse the same cursor for different values of `emp_id`.

```sql
CREATE PROCEDURE fire_employee (emp_id NUMBER) AS
BEGIN
   EXECUTE IMMEDIATE
      'DELETE FROM emp WHERE empno = :num' USING emp_id;
END;
```

### Making Procedures Work on Arbitrarily Named Schema Objects

Suppose you need a procedure that accepts the name of any database table, then drops that table from your schema. Using dynamic SQL, you might write the following standalone procedure:

```sql
CREATE PROCEDURE drop_table (table_name IN VARCHAR2) AS
BEGIN
   EXECUTE IMMEDIATE 'DROP TABLE :tab' USING table_name;
END;
```

However, at run time, this procedure fails with an invalid table name error. That is because you cannot use bind arguments to pass the names of schema objects to a dynamic SQL statement. Instead, you must embed parameters in the dynamic string, then pass the names of schema objects to those parameters.

To fix the last example, you must revise the `EXECUTE` `IMMEDIATE` statement. Instead of using a placeholder and bind argument, you embed parameter `table_name` in the dynamic string, as follows:

```sql
CREATE PROCEDURE drop_table (table_name IN VARCHAR2) AS
BEGIN
   EXECUTE IMMEDIATE 'DROP TABLE ' || table_name;
END;
```

Now, you can pass the name of any database table to the dynamic SQL statement.

### Using Duplicate Placeholders

Placeholders in a dynamic SQL statement are associated with bind arguments in the `USING` clause by position, not by name. So, if the same placeholder appears two or more times in the SQL statement, each appearance must correspond to a bind argument in the `USING` clause. For example, given the dynamic string

`sql_stmt := 'INSERT INTO payroll VALUES (:x, :x, :y, :x)';`

you might code the corresponding `USING` clause as follows:

`EXECUTE IMMEDIATE sql_stmt USING a, a, b, a;`

However, only the unique placeholders in a dynamic PL/SQL block are associated with bind arguments in the `USING` clause by position. So, if the same placeholder appears two or more times in a PL/SQL block, all appearances correspond to one bind argument in the `USING` clause. In the example below, the first unique placeholder (`x`) is associated with the first bind argument (`a`). Likewise, the second unique placeholder (`y`) is associated with the second bind argument (`b`).

```sql
DECLARE
   plsql_block VARCHAR2(100);
   a NUMBER := 4;
   b NUMBER := 7;
BEGIN
   plsql_block := 'BEGIN calc_stats(:x, :x, :y, :x); END;';
   EXECUTE IMMEDIATE plsql_block USING a, b;
   ...
END;
```

### Using Cursor Attributes

Every explicit cursor has four attributes: `%FOUND`, `%ISOPEN`, `%NOTFOUND`, and `%ROWCOUNT`. When appended to the cursor name, they return useful information about the execution of static and dynamic SQL statements.

To process SQL data manipulation statements, Oracle opens an implicit cursor named `SQL`. Its attributes return information about the most recently executed `INSERT`, `UPDATE`, `DELETE`, or single-row `SELECT` statement. For example, the following standalone function uses `%ROWCOUNT` to return the number of rows deleted from a database table:

```sql
CREATE FUNCTION rows_deleted (
   table_name IN VARCHAR2,
   condition IN VARCHAR2) RETURN INTEGER AS
BEGIN
   EXECUTE IMMEDIATE
      'DELETE FROM ' || table_name || ' WHERE ' || condition;
   RETURN SQL%ROWCOUNT;  -- return number of rows deleted
END;
```

Likewise, when appended to a cursor variable name, the cursor attributes return information about the execution of a multi-row query. For more information about cursor attributes, see "Using Cursor Attributes".
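For example, a sketch that reports the total rows fetched through a cursor variable (assuming `DBMS_OUTPUT` is enabled in the session):

```sql
DECLARE
   TYPE EmpCurTyp IS REF CURSOR;
   emp_cv   EmpCurTyp;
   my_ename VARCHAR2(15);
BEGIN
   OPEN emp_cv FOR 'SELECT ename FROM emp';
   LOOP
      FETCH emp_cv INTO my_ename;
      EXIT WHEN emp_cv%NOTFOUND;
   END LOOP;
   DBMS_OUTPUT.PUT_LINE('Rows fetched: ' || emp_cv%ROWCOUNT);
   CLOSE emp_cv;
END;
```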

### Passing Nulls

Suppose you want to pass nulls to a dynamic SQL statement. For example, you might write the following `EXECUTE` `IMMEDIATE` statement:

`EXECUTE IMMEDIATE 'UPDATE emp SET comm = :x' USING NULL;`

However, this statement fails with a bad expression error because the literal `NULL` is not allowed in the `USING` clause. To work around this restriction, simply replace the keyword `NULL` with an uninitialized variable, as follows:

```sql
DECLARE
   a_null CHAR(1);  -- set to NULL automatically at run time
BEGIN
   EXECUTE IMMEDIATE 'UPDATE emp SET comm = :x' USING a_null;
END;
```

### Doing Remote Operations

As the following example shows, PL/SQL subprograms can execute dynamic SQL statements that refer to objects on a remote database:

```sql
PROCEDURE delete_dept (db_link VARCHAR2, dept_id INTEGER) IS
BEGIN
   EXECUTE IMMEDIATE 'DELETE FROM dept@' || db_link ||
      ' WHERE deptno = :num' USING dept_id;
END;
```

Also, the targets of remote procedure calls (RPCs) can contain dynamic SQL statements. For example, suppose the following standalone function, which returns the number of rows in a table, resides on the Chicago database:

```sql
CREATE FUNCTION row_count (tab_name VARCHAR2) RETURN INTEGER AS
   rows INTEGER;
BEGIN
   EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM ' || tab_name INTO rows;
   RETURN rows;
END;
```

From an anonymous block, you might call the function remotely, as follows:

```sql
DECLARE
   emp_count INTEGER;
BEGIN
   emp_count := row_count@chicago('emp');
   ...
END;
```

### Using Invoker Rights

By default, a stored procedure executes with the privileges of its definer, not its invoker. Such procedures are bound to the schema in which they reside. For example, assume that the following standalone procedure, which can drop any kind of database object, resides in schema `scott`:

```sql
CREATE PROCEDURE drop_it (kind IN VARCHAR2, name IN VARCHAR2) AS
BEGIN
   EXECUTE IMMEDIATE 'DROP ' || kind || ' ' || name;
END;
```

Also assume that user `jones` has been granted the `EXECUTE` privilege on this procedure. When user `jones` calls `drop_it`, as follows, the dynamic `DROP` statement executes with the privileges of user `scott`:

`SQL> CALL drop_it('TABLE', 'dept');`

Also, the unqualified reference to table `dept` is resolved in schema `scott`. So, the procedure drops the table from schema `scott`, not from schema `jones`.

However, the `AUTHID` clause enables a stored procedure to execute with the privileges of its invoker (current user). Such procedures are not bound to a particular schema. For example, the following version of `drop_it` executes with the privileges of its invoker:

```sql
CREATE PROCEDURE drop_it (kind IN VARCHAR2, name IN VARCHAR2)
   AUTHID CURRENT_USER AS
BEGIN
   EXECUTE IMMEDIATE 'DROP ' || kind || ' ' || name;
END;
```

Also, the unqualified reference to the database object is resolved in the schema of the invoker. For details, see "Invoker Rights Versus Definer Rights".

### Using Pragma RESTRICT_REFERENCES

A function called from SQL statements must obey certain rules meant to control side effects. (See "Controlling Side Effects of PL/SQL Subprograms".) To check for violations of the rules, you can use the pragma `RESTRICT_REFERENCES`. The pragma asserts that a function does not read and/or write database tables and/or package variables. (For more information, See Oracle9i Application Developer's Guide - Fundamentals.)
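A minimal sketch of the pragma in a package specification, asserting that a hypothetical function `avg_sal` writes no database state (`WNDS`):

```sql
CREATE PACKAGE emp_stats AS
   FUNCTION avg_sal RETURN NUMBER;
   -- Assert at compile time that avg_sal writes no database state.
   PRAGMA RESTRICT_REFERENCES (avg_sal, WNDS);
END emp_stats;
```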

However, if the function body contains a dynamic `INSERT`, `UPDATE`, or `DELETE` statement, the function always violates the rules "write no database state" (`WNDS`) and "read no database state" (`RNDS`). That is because dynamic SQL statements are checked at run time, not at compile time. In an `EXECUTE` `IMMEDIATE` statement, only the `INTO` clause can be checked at compile time for violations of `RNDS`.

### Avoiding Deadlocks

In a few situations, executing a SQL data definition statement results in a deadlock. For example, the procedure below causes a deadlock because it attempts to drop itself. To avoid deadlocks, never try to `ALTER` or `DROP` a subprogram or package while you are still using it.

```sql
CREATE PROCEDURE calc_bonus (emp_id NUMBER) AS
BEGIN
   ...
   EXECUTE IMMEDIATE 'DROP PROCEDURE calc_bonus';  -- deadlock!
END;
```

# Supporting Multilingual Databases with Unicode

This chapter illustrates how to use Unicode in an Oracle database environment.

## Overview of Unicode

Dealing with many different languages in the same application or database has been complicated and difficult for a long time. To overcome the limitations of existing character encodings, several organizations began working on the creation of a global character set in the late 1980s. The need for this became even greater with the development of the World Wide Web in the mid-1990s. The Internet has changed how companies do business, with an emphasis on the global market that has made a universal character set a major requirement. A global character set needs to fulfill the following conditions:

• Contain all major living scripts

• Support legacy data and implementations

• Be simple enough that a single implementation of an application is sufficient for worldwide use

A global character set should also have the following capabilities:

• Support multilingual users and organizations

• Conform to international standards

• Enable worldwide interchange of data

This global character set exists, is in wide use, and is called Unicode.

## What is Unicode?

Unicode is a universal encoded character set that enables information from any language to be stored using a single character set. Unicode provides a unique code value for every character, regardless of the platform, program, or language.

The Unicode standard has been adopted by many software and hardware vendors. Many operating systems and browsers now support Unicode. Unicode is required by standards such as XML, Java, JavaScript, LDAP, and WML. It is also synchronized with the ISO/IEC 10646 standard.

Oracle Corporation started supporting Unicode as a database character set in version 7. In Oracle Database 10g, Unicode support has been expanded. Oracle Database 10g supports Unicode 3.2.

 See Also: `http://www.unicode.org` for more information about the Unicode standard


### Supplementary Characters

The first version of Unicode was a 16-bit, fixed-width encoding that used two bytes to encode each character. This allowed 65,536 characters to be represented. However, more characters need to be supported, especially additional CJK ideographs that are important for the Chinese, Japanese, and Korean markets.

Unicode 3.2 defines supplementary characters to meet this need. It uses two 16-bit code points (known as a surrogate pair) to represent a single supplementary character. This enables an additional 1,048,576 characters to be defined. The Unicode 3.2 standard defines 45,960 supplementary characters.
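For example, the musical symbol G clef, U+1D11E, is encoded as the surrogate pair D834 DD1E: subtracting 10000 (hex) leaves D11E; the high 10 bits (34 hex) are added to D800 to give D834, and the low 10 bits (11E hex) are added to DC00 to give DD1E.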

Adding supplementary characters increases the complexity of Unicode, but it is less complex than managing several different encodings in the same configuration.

### Unicode Encodings

Unicode 3.2 encodes characters in different ways: UTF-8, UCS-2, and UTF-16. Conversion between different Unicode encodings is a simple bit-wise operation that is defined in the Unicode standard.


#### UTF-8 Encoding

UTF-8 is the 8-bit encoding of Unicode. It is a variable-width encoding and a strict superset of ASCII. This means that each and every character in the ASCII character set is available in UTF-8 with the same code point values. One Unicode character can be 1 byte, 2 bytes, 3 bytes, or 4 bytes in UTF-8 encoding. Characters from the European scripts are represented in either 1 or 2 bytes. Characters from most Asian scripts are represented in 3 bytes. Supplementary characters are represented in 4 bytes.
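A quick way to see these widths from SQL (a sketch, assuming an AL32UTF8 database character set; `UNISTR` takes UTF-16 code units):

```sql
-- Byte widths of sample characters in an AL32UTF8 (UTF-8) database.
SELECT LENGTHB('A')                           AS ascii_bytes,  -- 1 byte
       LENGTHB(TO_CHAR(UNISTR('\00E9')))      AS latin_bytes,  -- 2 bytes (e acute)
       LENGTHB(TO_CHAR(UNISTR('\4E9C')))      AS cjk_bytes,    -- 3 bytes (CJK ideograph)
       LENGTHB(TO_CHAR(UNISTR('\D834\DD1E'))) AS suppl_bytes   -- 4 bytes (supplementary)
  FROM dual;
```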

UTF-8 is the Unicode encoding supported on UNIX platforms and used for HTML and most Internet browsers. Other environments such as Windows and Java use UCS-2 encoding.

The benefits of UTF-8 are as follows:

• Compact storage requirement for European scripts because it is a strict superset of ASCII

• Ease of migration between ASCII-based character sets and UTF-8

#### UCS-2 Encoding

UCS-2 is a fixed-width, 16-bit encoding. Each character is 2 bytes. UCS-2 is the Unicode encoding used by Java and Microsoft Windows NT 4.0. UCS-2 supports characters defined for Unicode 3.0, so there is no support for supplementary characters.

The benefits of UCS-2 over UTF-8 are as follows:

• More compact storage for Asian scripts because all characters are two bytes

• Faster string processing because characters are fixed-width

• Better compatibility with Java and Microsoft clients

#### UTF-16 Encoding

UTF-16 encoding is the 16-bit encoding of Unicode. UTF-16 is an extension of UCS-2 because it supports the supplementary characters that are defined in Unicode 3.2 by using two UCS-2 code points for each supplementary character. UTF-16 is a strict superset of UCS-2.

One character can be either 2 bytes or 4 bytes in UTF-16. Characters from European and most Asian scripts are represented in 2 bytes. Supplementary characters are represented in 4 bytes. UTF-16 is the main Unicode encoding used by Microsoft Windows 2000.

The benefits of UTF-16 over UTF-8 are as follows:

• More compact storage for Asian scripts because most of the commonly used Asian characters are represented in two bytes.

• Better compatibility with Java and Microsoft clients

#### Examples: UTF-16, UTF-8, and UCS-2 Encoding

Figure 6-1 shows some characters and their character codes in UTF-16, UTF-8, and UCS-2 encoding. The last character is a treble clef (a music symbol), a supplementary character that has been added to the Unicode 3.2 standard.

### Oracle's Support for Unicode

Oracle Corporation started supporting Unicode as a database character set in version 7. Table 6-1 summarizes the Unicode character sets supported by the Oracle database server.

Table 6-1 Unicode Character Sets Supported by the Oracle Database Server

| Character Set | Supported in RDBMS Release | Unicode Encoding | Unicode Version | Database Character Set | National Character Set |
|---|---|---|---|---|---|
| AL24UTFFSS | 7.2 - 8i | UTF-8 | 1.1 | Yes | No |
| UTF8 | 8.0 - 10g | UTF-8 | 2.1 (Oracle release 8.0 through Oracle8i release 8.1.6); 3.0 (Oracle8i release 8.1.7 and later) | Yes | Yes (Oracle9i and Oracle Database 10g only) |
| UTFE | 8.0 - 10g | UTF-EBCDIC | 2.1 (Oracle8i releases 8.0 through 8.1.6); 3.0 (Oracle8i release 8.1.7 and later) | Yes | No |
| AL32UTF8 | 9i - 10g | UTF-8 | 3.0 (Oracle9i Release 1); 3.1 (Oracle9i Release 2); 3.2 (Oracle Database 10g Release 1) | Yes | No |
| AL16UTF16 | 9i - 10g | UTF-16 | 3.0 (Oracle9i Release 1); 3.1 (Oracle9i Release 2); 3.2 (Oracle Database 10g Release 1) | No | Yes |

## Implementing a Unicode Solution in the Database

You can store Unicode characters in an Oracle database in two ways.

You can create a Unicode database that enables you to store UTF-8 encoded characters as SQL `CHAR` datatypes (`CHAR`, `VARCHAR2`, `CLOB`, and `LONG`).

If you prefer to implement Unicode support incrementally or if you need to support multilingual data only in certain columns, then you can store Unicode data in either the UTF-16 or UTF-8 encoding form in SQL `NCHAR` datatypes (`NCHAR`, `NVARCHAR2`, and `NCLOB`). The SQL `NCHAR` datatypes are called Unicode datatypes because they are used only for storing Unicode data.

 Note: You can combine a Unicode database solution with a Unicode datatype solution.

The following sections explain how to use the two Unicode solutions and how to choose between them:

### Enabling Multilingual Support with Unicode Databases

The database character set specifies the encoding to be used in the SQL `CHAR` datatypes as well as the metadata such as table names, column names, and SQL statements. A Unicode database is a database with a UTF-8 character set as the database character set. There are three Oracle character sets that implement the UTF-8 encoding. The first two are designed for ASCII-based platforms while the third one should be used on EBCDIC platforms.

• AL32UTF8

The AL32UTF8 character set supports the latest version of the Unicode standard. It encodes characters in one, two, or three bytes. Supplementary characters require four bytes. It is for ASCII-based platforms.

• UTF8

The UTF8 character set encodes characters in one, two, or three bytes. It is for ASCII-based platforms.

The UTF8 character set has supported Unicode 3.0 since Oracle8i release 8.1.7 and will continue to support Unicode 3.0 in future releases of the Oracle database server. Although specific supplementary characters were not assigned code points in Unicode until version 3.1, the code point range was allocated for supplementary characters in Unicode 3.0. If supplementary characters are inserted into a UTF8 database, then they do not corrupt the data in the database. The supplementary characters are treated as two separate, user-defined characters that occupy 6 bytes. Oracle Corporation recommends that you switch to AL32UTF8 for full support of supplementary characters in the database character set.

• UTFE

The UTFE character set is for EBCDIC platforms. It is similar to UTF8 on ASCII platforms, but it encodes characters in one, two, three, and four bytes. Supplementary characters are converted as two 4-byte characters.

Example 6-1 Creating a Database with a Unicode Character Set

To create a database with the AL32UTF8 character set, use the `CREATE` `DATABASE` statement and include the `CHARACTER SET AL32UTF8` clause. For example:

```sql
CREATE DATABASE sample
   CONTROLFILE REUSE
   LOGFILE
      GROUP 1 ('diskx:log1.log', 'disky:log1.log') SIZE 50K,
      GROUP 2 ('diskx:log2.log', 'disky:log2.log') SIZE 50K
   MAXLOGFILES 5
   MAXLOGHISTORY 100
   MAXDATAFILES 10
   MAXINSTANCES 2
   ARCHIVELOG
   CHARACTER SET AL32UTF8
   NATIONAL CHARACTER SET AL16UTF16
   DATAFILE
      'disk1:df1.dbf' AUTOEXTEND ON,
      'disk2:df2.dbf' AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED
   DEFAULT TEMPORARY TABLESPACE temp_ts
   UNDO TABLESPACE undo_ts
   SET TIME_ZONE = '+02:00';
```

 Note: Specify the database character set when you create the database.

### Enabling Multilingual Support with Unicode Datatypes

An alternative to storing Unicode data in the database is to use the SQL `NCHAR` datatypes (`NCHAR`, `NVARCHAR2`, `NCLOB`). You can store Unicode characters in columns of these datatypes regardless of how the database character set has been defined. The `NCHAR` datatype is a Unicode datatype exclusively. In other words, it stores data encoded as Unicode.

In releases before Oracle9i, the `NCHAR` datatype supported fixed-width Asian character sets that were designed to provide higher performance. Examples of fixed-width character sets are JA16SJISFIXED and ZHT32EUCFIXED. No Unicode character set was supported as the national character set before Oracle9i.

You can create a table using the `NVARCHAR2` and `NCHAR` datatypes. The column length specified for the `NCHAR` and `NVARCHAR2` columns is always the number of characters instead of the number of bytes:

```sql
CREATE TABLE product_information
   ( product_id          NUMBER(6)
   , product_name        NVARCHAR2(100)
   , product_description VARCHAR2(1000));
```
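A sketch of what the character-based length semantics mean in practice, assuming AL16UTF16 as the national character set:

```sql
-- Hypothetical usage: NCHAR/NVARCHAR2 lengths count characters, not bytes.
INSERT INTO product_information
   VALUES (1, N'Ärmelband', 'sample description');

SELECT LENGTH(product_name)  AS chars,   -- 9 characters
       LENGTHB(product_name) AS bytes    -- 18 bytes with AL16UTF16 (2 per character)
  FROM product_information
 WHERE product_id = 1;
```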

The encoding used in the SQL `NCHAR` datatypes is the national character set specified for the database. You can specify one of the following Oracle character sets as the national character set:

• AL16UTF16

This is the default character set for SQL `NCHAR` datatypes. The character set encodes Unicode data in the UTF-16 encoding. It supports supplementary characters, which are stored as four bytes.

• UTF8

When UTF8 is specified for SQL `NCHAR` datatypes, the data stored in the SQL datatypes is in UTF-8 encoding.

You can specify the national character set for the SQL `NCHAR` datatypes when you create a database using the `CREATE` `DATABASE` statement with the `NATIONAL CHARACTER SET` clause. The following statement creates a database with WE8ISO8859P1 as the database character set and AL16UTF16 as the national character set.

Example 6-2 Creating a Database with a National Character Set

```sql
CREATE DATABASE sample
   CONTROLFILE REUSE
   LOGFILE
      GROUP 1 ('diskx:log1.log', 'disky:log1.log') SIZE 50K,
      GROUP 2 ('diskx:log2.log', 'disky:log2.log') SIZE 50K
   MAXLOGFILES 5
   MAXLOGHISTORY 100
   MAXDATAFILES 10
   MAXINSTANCES 2
   ARCHIVELOG
   CHARACTER SET WE8ISO8859P1
   NATIONAL CHARACTER SET AL16UTF16
   DATAFILE
      'disk1:df1.dbf' AUTOEXTEND ON,
      'disk2:df2.dbf' AUTOEXTEND ON NEXT 10M MAXSIZE UNLIMITED
   DEFAULT TEMPORARY TABLESPACE temp_ts
   UNDO TABLESPACE undo_ts
   SET TIME_ZONE = '+02:00';
```

### How to Choose Between a Unicode Database and a Unicode Datatype Solution

To choose the right Unicode solution for your database, consider the following questions:

• Programming environment: What are the main programming languages used in your applications? How do they support Unicode?

• Ease of migration: How easily can your data and applications be migrated to take advantage of the Unicode solution?

• Performance: How much performance overhead are you willing to accept in order to use Unicode in the database?

• Type of data: Is your data mostly Asian or European? Do you need to store multilingual documents into `LOB` columns?

• Type of applications: What type of applications are you implementing: a packaged application or a customized end-user application?

This section describes some general guidelines for choosing a Unicode database or a Unicode datatype solution. The final decision largely depends on your exact environment and requirements.

#### When Should You Use a Unicode Database?

Use a Unicode database in the situations described in Table 6-2.

Table 6-2 Using a Unicode Database

| Situation | Explanation |
|---|---|
| You need easy code migration for Java or PL/SQL. | If your existing application is mainly written in Java and PL/SQL and your main concern is to minimize the code changes required to support multiple languages, then you may want to use a Unicode database solution. If the datatypes used to store data remain SQL `CHAR` datatypes, then the Java and PL/SQL code that accesses these columns does not need to change. |
| You have evenly distributed multilingual data. | If the multilingual data is evenly distributed in existing schema tables and you are not sure which tables contain multilingual data, then you should use a Unicode database because it does not require you to identify the kind of data that is stored in each column. |
| Your SQL statements and PL/SQL code contain Unicode data. | You must use a Unicode database. SQL statements and PL/SQL code are converted into the database character set before being processed. If the SQL statements and PL/SQL code contain characters that cannot be converted to the database character set, then those characters are lost. A common place to use Unicode data in a SQL statement is in a string literal. |
| You want to store multilingual documents in `BLOB` format and use Oracle Text for content searching. | You must use a Unicode database. The `BLOB` data is converted to the database character set before being indexed by Oracle Text. If your database character set is not UTF8, then data is lost when the documents contain characters that cannot be converted to the database character set. |

#### When Should You Use Unicode Datatypes?

Use Unicode datatypes in the situations described in Table 6-3.

Table 6-3 Using Unicode Datatypes

| Situation | Explanation |
|---|---|
| You want to add multilingual support incrementally. | If you want to add Unicode support to the existing database without migrating the character set, then consider using Unicode datatypes to store Unicode data. You can add columns of the SQL `NCHAR` datatypes to existing tables or new tables to support multiple languages incrementally. |
| You want to build a packaged application. | If you are building a packaged application to sell to customers, then you may want to build the application using SQL `NCHAR` datatypes. The SQL `NCHAR` datatype is a reliable Unicode datatype in which the data is always stored in Unicode, and the length of the data is always specified in UTF-16 code units. As a result, you need to test the application only once. The application will run on customer databases with any database character set. |
| You want better performance with single-byte database character sets. | If performance is your main concern, then consider using a single-byte database character set and storing Unicode data in the SQL `NCHAR` datatypes. Databases that use a multibyte database character set such as UTF8 have a performance overhead. |
| You require UTF-16 support in Windows clients. | If your applications are written in Visual C/C++ or Visual Basic running on Windows, then you may want to use the SQL `NCHAR` datatypes. You can store UTF-16 data in SQL `NCHAR` datatypes in the same way that you store it in the `wchar_t` buffer in Visual C/C++ and `string` buffer in Visual Basic. You can avoid buffer overflow in client applications because the length of the `wchar_t` and `string` datatypes match the length of the SQL `NCHAR` datatypes in the database. |

 Note: You can use a Unicode database with Unicode datatypes.

### Comparing Unicode Character Sets for Database and Datatype Solutions

Oracle provides two solutions to store Unicode characters in the database: a Unicode database solution and a Unicode datatype solution. After you select the Unicode database solution, the Unicode datatype solution, or a combination of both, determine the character set to be used in the Unicode database or the Unicode datatype.

Table 6-4 contains advantages and disadvantages of different character sets for a Unicode database solution. The Oracle character sets that can be Unicode database character sets are AL32UTF8, UTF8, and UTFE.

Table 6-4 Character Set Advantages and Disadvantages for a Unicode Database Solution

AL32UTF8

Advantages:

• Supplementary characters are stored in 4 bytes, so there is no data conversion when supplementary characters are retrieved and inserted if the client setting is UTF-8.

• The storage for supplementary characters requires less disk space in AL32UTF8 than in UTF8.

Disadvantages:

• You cannot specify the length of SQL `CHAR` types in number of UCS-2 code points for supplementary characters. Supplementary characters are treated as one code point rather than the standard two code points.

• The binary order for SQL `CHAR` columns is different from the binary order of SQL `NCHAR` columns when the data consists of supplementary characters. As a result, `CHAR` columns and `NCHAR` columns do not always have the same sort for identical strings.

UTF8

Advantages:

• You can specify the length of SQL `CHAR` types in number of UCS-2 code points.

• The binary order of the SQL `CHAR` columns is always the same as the binary order of the SQL `NCHAR` columns when the data consists of the same supplementary characters. As a result, `CHAR` columns and `NCHAR` columns have the same sort for identical strings.

Disadvantages:

• Supplementary characters are stored as 6 bytes instead of the 4 bytes defined by Unicode 3.2. As a result, Oracle has to convert data for supplementary characters if the client setting is UTF-8.

UTFE

Advantages:

• This is the only Unicode character set for the EBCDIC platform.

• You can specify the length of SQL `CHAR` types in number of UCS-2 code points.

• The binary order of the SQL `CHAR` columns is always the same as the binary order of the SQL `NCHAR` columns when the data consists of the same supplementary characters. As a result, `CHAR` columns and `NCHAR` columns have the same sort for identical strings.

Disadvantages:

• Supplementary characters are stored as 8 bytes (two 4-byte sequences) instead of the 5 bytes defined by the Unicode standard. As a result, Oracle has to convert data for those supplementary characters.

• UTFE is not a standard encoding in the Unicode standard. As a result, clients requiring standard UTF-8 encoding must convert data from UTFE to the standard encoding when data is retrieved and inserted.

Table 6-5 contains advantages and disadvantages of different character sets for a Unicode datatype solution. The Oracle character sets that can be national character sets are AL16UTF16 and UTF8. The default is AL16UTF16.

Table 6-5 Character Set Advantages and Disadvantages for a Unicode Datatype Solution

AL16UTF16

Advantages:

• Asian data in AL16UTF16 is usually more compact than in UTF8. As a result, you save disk space and have less disk I/O when most of the multilingual data stored in the database is Asian data.

• It is usually faster to process strings encoded in the AL16UTF16 character set than strings encoded in UTF8 because Oracle processes most characters in an AL16UTF16 encoded string as fixed-width characters.

• The maximum length limits for the `NCHAR` and `NVARCHAR2` columns are 1000 and 2000 characters, respectively. Because the data is fixed-width, the lengths are guaranteed.

Disadvantages:

• European ASCII data requires more disk space to store in AL16UTF16 than in UTF8. If most of your data is European data, then it uses more disk space than if it were UTF8 data.

• The maximum lengths for `NCHAR` and `NVARCHAR2` are 1000 and 2000 characters, which is less than the lengths for `NCHAR` (2000) and `NVARCHAR2` (4000) in UTF8.

UTF8

Advantages:

• European data in UTF8 is usually more compact than in AL16UTF16. As a result, you save disk space and have better response time when most of the multilingual data stored in the database is European data.

• The maximum lengths for the `NCHAR` and `NVARCHAR2` columns are 2000 and 4000 characters respectively, which is more than those for `NCHAR` (1000) and `NVARCHAR2` (2000) in AL16UTF16. Although the maximum lengths of the `NCHAR` and `NVARCHAR2` columns are larger in UTF8, the actual storage size is still bound by the byte limits of 2000 and 4000 bytes, respectively. For example, you can store 4000 UTF8 characters in an `NVARCHAR2` column if all the characters are single byte, but only 4000/3 characters if all the characters are three bytes.

Disadvantages:

• Asian data requires more disk space to store in UTF8 than in AL16UTF16. If most of your data is Asian data, then disk space usage is less efficient than when the character set is AL16UTF16.

• Although you can specify larger length limits for `NCHAR` and `NVARCHAR2`, you are not guaranteed to be able to insert the number of characters specified by these limits. This is because UTF8 allows variable-width characters.

• It is usually slower to process strings encoded in UTF8 than strings encoded in AL16UTF16 because UTF8 encoded strings consist of variable-width characters.

## Unicode Case Studies

This section describes typical scenarios for storing Unicode characters in an Oracle database:

Example 6-3 Unicode Solution with a Unicode Database

An American company running a Java application would like to add German and French support in the next release of the application. They would like to add Japanese support at a later time. The company currently has the following system configuration:

• The existing database has a database character set of US7ASCII.

• All character data in the existing database is composed of ASCII characters.

• PL/SQL stored procedures are used in the database.

• The database is around 300 GB.

• There is a nightly downtime of 4 hours.

In this case, a typical solution is to choose UTF8 for the database character set because of the following reasons:

• The database is very large and the scheduled downtime is short. Fast migration of the database to Unicode is vital. Because the database is in US7ASCII, the easiest and fastest way of enabling the database to support Unicode is to switch the database character set to UTF8 by issuing the `ALTER` `DATABASE` statement (see the sketch after this list). No data conversion is required because US7ASCII is a subset of UTF8.

• Because most of the code is written in Java and PL/SQL, changing the database character set to UTF8 is unlikely to break existing code. Unicode support is automatically enabled in the application.

• Because the application supports French, German, and Japanese, there are few supplementary characters. Both AL32UTF8 and UTF8 are suitable.
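A sketch of that character set switch from SQL*Plus (assuming a full backup has been taken and no other users are connected; the statement succeeds only because US7ASCII is a strict subset of UTF8):

```sql
-- Switch a US7ASCII database to UTF8 during the nightly downtime.
SHUTDOWN IMMEDIATE;
STARTUP RESTRICT;
ALTER DATABASE CHARACTER SET UTF8;
SHUTDOWN IMMEDIATE;
STARTUP;
```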

Example 6-4 Unicode Solution with Unicode Datatypes

A European company that runs its applications mainly on Windows platforms wants to add new Windows applications written in Visual C/C++. The new applications will use the existing database to support Japanese and Chinese customer names. The company currently has the following system configuration:

• The existing database has a database character set of WE8ISO8859P1.

• All character data in the existing database is composed of Western European characters.

• The database is around 50 GB.

A typical solution is to take the following actions:

• Use `NCHAR` and `NVARCHAR2` datatypes to store Unicode characters

• Keep WE8ISO8859P1 as the database character set

• Use AL16UTF16 as the national character set

The reasons for this solution are:

• Migrating the existing database to a Unicode database would require data conversion because the database character set is WE8ISO8859P1 (a Latin-1 character set), which is not a subset of UTF8. As a result, there would be some overhead in converting the data to UTF8.

• The additional languages are supported in new applications only. They do not depend on the existing applications or schemas. It is simpler to use the Unicode datatype in the new schema and keep the existing schemas unchanged.

• Only customer name columns require Unicode support. Using a single `NCHAR` column meets the customer's requirements without migrating the entire database.

• Because the languages to be supported are mostly Asian languages, AL16UTF16 should be used as the national character set so that disk space is used more efficiently.

• The lengths of the SQL `NCHAR` datatypes are defined as number of characters. This is the same as the way they are treated when using `wchar_t` strings in Windows C/C++ programs. This reduces programming complexity.

• Existing applications using the existing schemas are unaffected.

Example 6-5 Unicode Solution with a Unicode Database and Unicode Datatypes

A Japanese company wants to develop a new Java application. The company expects that the application will support as many languages as possible in the long run.

• In order to store documents as is, the company decided to use the `BLOB` datatype to store documents of multiple languages.

• The company may also want to generate UTF-8 XML documents from the relational data for business-to-business data exchange.

• The back-end has Windows applications written in C/C++ using ODBC to access the Oracle database.

In this case, the typical solution is to create a Unicode database using AL32UTF8 as the database character set and use the SQL `NCHAR` datatypes to store multilingual data. The national character set should be set to AL16UTF16. The reasons for this solution are as follows:

• When documents of different languages are stored in `BLOB` format, Oracle Text requires the database character set to be one of the UTF-8 character sets. Because the applications may retrieve relational data in UTF-8 XML format (where supplementary characters are stored as four bytes), AL32UTF8 should be used as the database character set to avoid data conversion when UTF-8 data is retrieved or inserted.

• Because applications are new and written in both Java and Windows C/C++, the company should use the SQL `NCHAR` datatype for its relational data. Both Java and Windows support the UTF-16 character datatype, and the length of a character string is always measured in the number of characters.

• If most of the data is for Asian languages, then AL16UTF16 should be used with the SQL `NCHAR` datatypes because AL16UTF16 offers better performance and storage efficiency.

## Designing Database Schemas to Support Multiple Languages

In addition to choosing a Unicode solution, the following issues should be taken into consideration when the database schema is designed to support multiple languages:

### Specifying Column Lengths for Multilingual Data

When you use `NCHAR` and `NVARCHAR2` datatypes for storing multilingual data, the column size specified for a column is defined in number of characters. (The number of characters means the number of Unicode code units.) Table 6-6 shows the maximum size of the `NCHAR` and `NVARCHAR2` datatypes for the AL16UTF16 and UTF8 national character sets.

Table 6-6 Maximum Datatype Size

| National Character Set | Maximum Column Size of `NCHAR` Datatype | Maximum Column Size of `NVARCHAR2` Datatype |
| --- | --- | --- |
| AL16UTF16 | 1000 characters | 2000 characters |
| UTF8 | 2000 bytes | 4000 bytes |

When you use `CHAR` and `VARCHAR2` datatypes for storing multilingual data, the maximum length specified for each column is, by default, in number of bytes. If the database needs to support Thai, Arabic, or multibyte languages such as Chinese and Japanese, then the maximum lengths of the `CHAR`, `VARCHAR`, and `VARCHAR2` columns may need to be extended. This is because the number of bytes required to encode these languages in UTF8 or AL32UTF8 may be significantly larger than the number of bytes for encoding English and Western European languages. For example, one Thai character in the Thai character set requires 3 bytes in UTF8 or AL32UTF8. In addition, the maximum column lengths for `CHAR`, `VARCHAR`, and `VARCHAR2` datatypes are 2000 bytes, 4000 bytes, and 4000 bytes respectively. If applications need to store more than 4000 bytes, then they should use the `CLOB` datatype.
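For illustration, here is a minimal sketch (table and column names are hypothetical) of how length semantics can be declared explicitly when columns are defined:

```
CREATE TABLE translated_description (
  id        NUMBER PRIMARY KEY,
  -- 20 bytes: may hold fewer than 20 characters in UTF8 or AL32UTF8
  short_txt VARCHAR2(20 BYTE),
  -- 20 characters, regardless of how many bytes each character requires
  long_txt  VARCHAR2(20 CHAR),
  -- NVARCHAR2 lengths are always specified in characters
  nls_txt   NVARCHAR2(20)
);
```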

### Storing Data in Multiple Languages

The Unicode character set includes characters of most written languages around the world, but it does not contain information about the language to which a given character belongs. In other words, a character such as `ä` does not indicate whether it is French or German. In order to present information in the language a user desires, data stored in a Unicode database should be accompanied by information identifying the language to which it belongs.

There are many ways for a database schema to relate data to a language. The following sections provide different approaches:

#### Store Language Information with the Data

For data such as product descriptions or product names, you can add a language column (`language_id`) of `CHAR` or `VARCHAR2` datatype to the product table to identify the language of the corresponding product information. This enables applications to retrieve the information in the desired language. The possible values for this language column are the 3-letter abbreviations of the valid `NLS_LANGUAGE` values of the database.

 See Also: Appendix A, "Locale Data" for a list of `NLS_LANGUAGE` values and their abbreviations

You can also create a view to select the data of the current language. For example:

```
ALTER TABLE scott.product_information ADD (language_id VARCHAR2(50));

CREATE OR REPLACE VIEW product AS
  SELECT product_id, product_name
  FROM   product_information
  WHERE  language_id = SYS_CONTEXT('USERENV', 'LANG');
```

#### Select Translated Data Using Fine-Grained Access Control

Fine-grained access control enables you to limit the degree to which a user can view information in a table or view. Typically, this is done by appending a `WHERE` clause. When you add a `WHERE` clause as a fine-grained access policy to a table or view, Oracle automatically appends it to any SQL statement on the table at run time, so that only rows satisfying the `WHERE` clause can be accessed.

You can use this feature to avoid specifying the desired language of a user in the `WHERE` clause of every `SELECT` statement in your applications. The following `WHERE` clause limits the view of a table to the rows corresponding to the desired language of a user:

`WHERE language_id = sys_context('userenv', 'LANG')`

Specify this `WHERE` clause as a fine-grained access policy for `product_information` as follows:

```
CREATE OR REPLACE FUNCTION func1 (sch VARCHAR2, obj VARCHAR2)
RETURN VARCHAR2
IS
BEGIN
  RETURN 'language_id = sys_context(''userenv'', ''LANG'')';
END;
/

BEGIN
  DBMS_RLS.ADD_POLICY('scott', 'product_information', 'lang_policy',
                      'scott', 'func1', 'select');
END;
/
```

Oracle then automatically appends the `WHERE` clause to any `SELECT` statement on the `product_information` table.

### Storing Documents in Multiple Languages in LOB Datatypes

You can store documents in multiple languages in `CLOB`, `NCLOB`, or `BLOB` datatypes and set up Oracle Text to enable content search for the documents.

Data in `CLOB` columns is stored in a format that is compatible with UCS-2 when the database character set is multibyte, such as UTF8 or AL32UTF8. This means that the storage space required for an English document doubles when the data is converted. Storage for an Asian language document in a `CLOB` column requires less storage space than the same document in a `LONG` column using UTF8, typically around 30% less, depending on the contents of the document.

Documents in `NCLOB` format are also stored in a proprietary format that is compatible with UCS-2 regardless of the database character set or national character set. The storage space requirement is the same as for `CLOB` data. Document contents are converted to UTF-16 when they are inserted into a `NCLOB` column. If you want to store multilingual documents in a non-Unicode database, then choose `NCLOB`. However, content search on `NCLOB` is not yet supported.

Documents in `BLOB` format are stored as they are. No data conversion occurs during insertion and retrieval. However, SQL string manipulation functions (such as `LENGTH` or `SUBSTR`) and collation functions (such as `NLS_SORT` and `ORDER BY`) cannot be applied to the `BLOB` datatype.

Table 6-7 lists the advantages and disadvantages of the `CLOB`, `NCLOB`, and `BLOB` datatypes when storing documents:

Table 6-7 Comparison of LOB Datatypes for Document Storage

| Datatype | Advantages | Disadvantages |
| --- | --- | --- |
| `CLOB` | Content search support; string manipulation support | Depends on database character set; data conversion is necessary for insertion; cannot store binary documents |
| `NCLOB` | Independent of database character set; string manipulation support | No content search support; data conversion is necessary for insertion; cannot store binary documents |
| `BLOB` | Independent of database character set; content search support; no data conversion, data stored as is; can store binary documents such as Microsoft Word or Microsoft Excel | No string manipulation support |

### Creating Indexes for Searching Multilingual Document Contents

Oracle Text enables you to build indexes for content search on multilingual documents stored in `CLOB` format and `BLOB` format. It uses a language-specific lexer to parse the `CLOB` or `BLOB` data and produces a list of searchable keywords.

Create a multilexer to search multilingual documents. The multilexer chooses a language-specific lexer for each row, based on a language column. This section describes the high-level steps to create indexes for documents in multiple languages.

#### Creating Multilexers

The first step in creating the multilexer is the creation of language-specific lexer preferences for each language supported. The following example creates English, German, and Japanese lexers with PL/SQL procedures:

```
ctx_ddl.create_preference('english_lexer', 'basic_lexer');
ctx_ddl.set_attribute('english_lexer', 'index_themes', 'yes');

ctx_ddl.create_preference('german_lexer', 'basic_lexer');
ctx_ddl.set_attribute('german_lexer', 'composite', 'german');
ctx_ddl.set_attribute('german_lexer', 'alternate_spelling', 'german');
ctx_ddl.set_attribute('german_lexer', 'mixed_case', 'yes');

ctx_ddl.create_preference('japanese_lexer', 'JAPANESE_VGRAM_LEXER');
```

After the language-specific lexer preferences are created, they need to be gathered together under a single multilexer preference. First, create the multilexer preference, using the `MULTI_LEXER` object:

`ctx_ddl.create_preference('global_lexer','multi_lexer');`

Now add the language-specific lexers to the multilexer preference using the `add_sub_lexer` call:

```
ctx_ddl.add_sub_lexer('global_lexer', 'german', 'german_lexer');
ctx_ddl.add_sub_lexer('global_lexer', 'japanese', 'japanese_lexer');
ctx_ddl.add_sub_lexer('global_lexer', 'default', 'english_lexer');
```

This nominates the `german_lexer` preference to handle German documents, the `japanese_lexer` preference to handle Japanese documents, and the `english_lexer` preference to handle everything else, using `DEFAULT` as the language.

#### Creating Indexes for Documents Stored in the CLOB Datatype

The multilexer decides which lexer to use for each row based on a language column in the table: a character column that stores the language of the document. Use the Oracle language name to identify the language of a document in this column. For example, if you use the `CLOB` datatype to store your documents, then add the language column to the table where the documents are stored:

```
CREATE TABLE globaldoc (
  doc_id   NUMBER PRIMARY KEY,
  language VARCHAR2(30),
  text     CLOB
);
```

To create an index for this table, use the multilexer preference and specify the name of the language column:

```
CREATE INDEX globalx ON globaldoc(text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('lexer global_lexer language column language');
```

#### Creating Indexes for Documents Stored in the BLOB Datatype

In addition to the language column, the character set and format columns must be added in the table where the documents are stored. The character set column stores the character set of the documents using the Oracle character set names. The format column specifies whether a document is a text or binary document. For example, the `CREATE TABLE` statement can specify columns called `characterset` and `format`:

```
CREATE TABLE globaldoc (
  doc_id       NUMBER PRIMARY KEY,
  language     VARCHAR2(30),
  characterset VARCHAR2(30),
  format       VARCHAR2(10),
  text         BLOB
);
```

You can put word-processing or spreadsheet documents into the table and specify `binary` in the `format` column. Documents in HTML, XML, or plain text format can be put into the table with `text` specified in the `format` column.

Because there is a column in which to specify the character set, you can store text documents in different character sets.

When you create the index, specify the names of the format and character set columns:

```
CREATE INDEX globalx ON globaldoc(text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('filter inso_filter
               lexer global_lexer
               language column language
               format column format
               charset column characterset');
```

You can use the `charset_filter` if all documents are in text format. The `charset_filter` converts data from the character set specified in the `charset` column to the database character set.

### ODBC Driver Version

• Unicode support is only available when using v8.1.5.5 of the Oracle ODBC Driver or later
• Unicode support is not available when using the Microsoft ODBC Driver for Oracle

### Data Types

If your database uses a Unicode character set, then the `CHAR`, `VARCHAR2`, and `CLOB` datatypes will hold Unicode values.

Irrespective of your database character set, the `NCHAR`, `NVARCHAR2`, and `NCLOB` datatypes will hold Unicode values.
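As a minimal sketch (table and column names are hypothetical), a column that must hold Unicode data regardless of the database character set can be declared with a national character datatype:

```
CREATE TABLE customer_names (
  id   NUMBER PRIMARY KEY,
  -- NVARCHAR2 holds Unicode in any database; length is in characters
  name NVARCHAR2(100)
);
```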

### Unicode Character Set Database

Full support of Unicode in your Oracle database is available only if the database character set has been set to a Unicode character set, such as UTF8. The database character set is specified when your database is defined.

You can find out the database character set with:

`SELECT value FROM SYS.NLS_DATABASE_PARAMETERS WHERE parameter = 'NLS_CHARACTERSET';`
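Similarly, you can find out the national character set with:

`SELECT value FROM SYS.NLS_DATABASE_PARAMETERS WHERE parameter = 'NLS_NCHAR_CHARACTERSET';`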

### ASCII Character Set Database

If your Oracle database has been defined to use an ASCII character set (as opposed to a Unicode character set), then you can still store Unicode data in it. However, there are some limitations on what you can do:

1. You can display Unicode data correctly.
2. You cannot run an SQL statement containing a Unicode literal.
3. You cannot supply a Unicode value to a parameter in a parameterized query.

The reason for these limitations is that, when you pass SQL or a parameter value to the database, it converts the values to the character set of the database. If the database uses an ASCII character set, the Unicode values will be converted to ASCII characters.

AQT has a workaround for point 2; this is discussed below.

Because the Data Loader uses parameterized queries by default, it cannot load Unicode values unless you use the Load using Insert Statements option.

### Unistr Function

Oracle has a function, `UNISTR`, which allows you to specify Unicode character values. This lets you insert or update Unicode values in your database even if your database uses an ASCII character set.

The UNISTR function is not available in Oracle v8.
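As a minimal sketch (using the HR.TEST1 table from the AQT example below), `UNISTR` can write a Unicode value even when the literal itself cannot be transmitted:

```
-- '\03B1\03BB\03B4' encodes the Greek letters alpha, lambda, delta
INSERT INTO HR.TEST1 (COL1)
VALUES (UNISTR('\03B1\03BB\03B4'));
```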

### Running SQL Statements with Unicode literals

If your database uses a Unicode character set, you can run SQL statements that include Unicode literals.

If your database uses an ASCII character set, you cannot run SQL statements that include Unicode literals. To circumvent this, AQT has an option Unicode Options > For Oracle, use UNISTR for Unicode strings. When this option is checked, AQT will scan all SQL being processed. If it finds any Unicode string values, it will replace them with a `UNISTR` function:

Example: if you run the following:

`Select * from HR.TEST1 WHERE COL1 = 'αλδ'`

AQT will actually run:

`Select * from HR.TEST1 WHERE COL1 = unistr('\03B1\03BB\03B4')`

This makes it very easy to use Unicode data in an Oracle database that uses an ASCII character set.

# Character Set Migration & Issues

This chapter discusses character set conversion and character set migration. It includes the following topics:

## Overview of Character Set Migration

Choosing the appropriate character set for your database is an important decision. When you choose the database character set, consider the following factors:

• The type of data you need to store
• The languages that the database needs to accommodate now and in the future
• The different size requirements of each character set and the corresponding performance implications

A related topic is choosing a new character set for an existing database. Changing the database character set for an existing database is called character set migration. Migrating from one database character set to another involves additional considerations beyond choosing a character set for a new database. Plan character set migration to minimize data loss from:

### Data Truncation

When the database is created using byte semantics, the sizes of the `CHAR` and `VARCHAR2` datatypes are specified in bytes, not characters. For example, the specification `CHAR(20)` in a table definition allows 20 bytes for storing character data. This is acceptable when the database character set uses a single-byte character encoding scheme because the number of characters is equivalent to the number of bytes. If the database character set uses a multibyte character encoding scheme, then the number of bytes no longer equals the number of characters because a character can consist of one or more bytes.

During migration to a new character set, it is important to verify the column widths of existing `CHAR` and `VARCHAR2` columns because they might need to be extended to support an encoding that requires multibyte storage. Truncation of data can occur if conversion causes expansion of data.
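As a rough illustrative check (table, column, and width are hypothetical), the byte length of existing data after conversion to the target character set can be compared against the current column width:

```
-- Rows whose data would exceed a 20-byte column once converted to UTF8
SELECT product_id
FROM   product_information
WHERE  LENGTHB(CONVERT(product_name, 'UTF8')) > 20;
```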

Figure 10-1 shows an example of data expansion when single-byte characters become multibyte. For example, `ä` (`a` with an umlaut) is a single-byte character in WE8MSWIN1252, but it becomes a two-byte character in UTF8. Also, the Euro symbol expands from one byte to three bytes.

#### Figure 10-1 Single-Byte and Multibyte Encoding


The maximum number of bytes for `CHAR` and `VARCHAR2` datatypes is 2000 and 4000, respectively. If the data in the new character set requires columns that are wider than 2000 and 4000 bytes for `CHAR` and `VARCHAR2` datatypes, then you need to change your schema.

#### Additional Problems Caused by Data Truncation

Data truncation can cause the following problems:

• In the database data dictionary, schema object names cannot exceed 30 bytes in length. Schema objects are tables, clusters, views, indexes, synonyms, tablespaces, and usernames. You must rename schema objects if their names exceed 30 bytes in the new database character set. For example, one Thai character in the Thai national character set requires 1 byte. In UTF8, it requires 3 bytes. If you have defined a table whose name is 11 Thai characters, then the table name must be shortened to 10 or fewer Thai characters when you change the database character set to UTF8.
• If existing Oracle usernames or passwords are based on characters that change in size in the new character set, users will experience authentication failures after the migration. This is because the encrypted usernames and passwords stored in the data dictionary are not updated during migration to a new character set. For example, if the current database character set is WE8MSWIN1252 and the new database character set is UTF8, then the length of the username `scött` (`o` with an umlaut) changes from 5 bytes to 6 bytes, so `scött` will no longer be able to log in. Oracle Corporation recommends that usernames and passwords be based on ASCII characters. If they are not, you must reset the affected usernames and passwords after migrating to a new character set.
• When `CHAR` data contains characters that will be expanded after migration to a new character set, space padding will not be removed during database export by default. This means that these rows will be rejected upon import into the database with the new character set. The workaround is to set the `BLANK_TRIMMING` initialization parameter to `TRUE` before importing the `CHAR` data.

 See Also: Oracle9i Database Reference for more information about the `BLANK_TRIMMING` initialization parameter
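As a sketch, assuming a text initialization parameter file, the parameter can be set before running Import (the instance must be restarted for it to take effect):

```
# init.ora entry
blank_trimming = true
```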

### Character Set Conversion Issues

This section includes the following topics:

#### Replacement Characters that Result from Using the Export and Import Utilities

The Export and Import utilities can convert character sets from the original database character set to the new database character set. However, character set conversions can sometimes cause data loss or data corruption. For example, if you are migrating from character set A to character set B, the destination character set B should be a superset of character set A. Character set B is a superset if it contains all the characters defined in character set A. Characters that are not available in character set B are converted to replacement characters, which are often specified as `?` or `¿` or a character that is related to the unavailable character. For example, `ä` (`a` with an umlaut) can be replaced by `a`. Replacement characters are defined by the target character set.

Figure 10-2 shows an example of a character set conversion in which the copyright and Euro symbols are converted to `?` and `ä` is converted to `a`.

#### Figure 10-2 Replacement Characters in Character Set Conversion


To reduce the risk of losing data, choose a destination character set with a similar character repertoire. Migrating to Unicode can be an attractive option because UTF8 contains characters from most legacy character sets.

#### Invalid Data That Results from Setting the Client's NLS_LANG Parameter Incorrectly

Another character set migration scenario that can cause the loss of data is migrating a database that contains invalid data. Invalid data usually occurs in a database because the `NLS_LANG` parameter is not set properly on the client. The `NLS_LANG` value should reflect the client operating system code page. For example, in an English Windows environment, the code page is WE8MSWIN1252. When the `NLS_LANG` parameter is set properly, the database can automatically convert incoming data from the client operating system. When the `NLS_LANG` parameter is not set properly, then the data coming into the database is not converted properly. For example, suppose that the database character set is UTF8, the client is an English Windows operating system, and the `NLS_LANG` setting on the client is UTF8. Data coming into the database is encoded in WE8MSWIN1252 and is not converted to UTF8 data because the `NLS_LANG` setting on the client matches the database character set. Thus Oracle assumes that no conversion is necessary, and invalid data is entered into the database.
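For reference, the correct client setting for the English Windows scenario above would look like the following (shown as an environment variable; on Windows it is usually set in the registry, and the language and territory values are conventional choices):

```
set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
```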

This can lead to two possible data inconsistency problems. One problem occurs when a database contains data from a character set that is different from the database character set but the same code points exist in both character sets. For example, if the database character set is WE8ISO8859P1 and the `NLS_LANG` setting of the Chinese Windows NT client is `SIMPLIFIED` `CHINESE_CHINA.WE8ISO8859P1`, then all multibyte Chinese data (from the ZHS16GBK character set) is stored as multiples of single-byte WE8ISO8859P1 data. This means that Oracle will treat these characters as single-byte WE8ISO8859P1 characters. Hence all SQL string manipulation functions such as `SUBSTR` or `LENGTH` will be based on bytes rather than characters. All bytes constituting ZHS16GBK data are legal WE8ISO8859P1 codes. If such a database is migrated to another character set, for example, UTF8, character codes will be converted as if they were in WE8ISO8859P1. This way, each of the two bytes of a ZHS16GBK character will be converted separately, yielding meaningless values in UTF8. Figure 10-3 shows an example of this incorrect character set replacement.

#### Figure 10-3 Incorrect Character Set Replacement


The second possible problem is having data from mixed character sets inside the database. For example, if the database character set is WE8MSWIN1252, and two separate Windows clients using German and Greek both have the `NLS_LANG` character set set to WE8MSWIN1252, then the database will contain a mixture of German and Greek characters. Figure 10-4 shows how different clients can use different character sets in the same database.

#### Figure 10-4 Mixed Character Sets


For database character set migration to be successful, both of these cases require manual intervention because Oracle cannot determine the character sets of the data being stored.

## Changing the Database Character Set of an Existing Database

Database character set migration has two stages: data scanning and data conversion. Before you change the database character set, you need to identify possible database character set conversion problems and truncation of data. This step is called data scanning.

Data scanning identifies the amount of effort required to migrate data into the new character encoding scheme before changing the database character set. Some examples of what may be found during a data scan are the number of schema objects where the column widths need to be expanded and the extent of the data that does not exist in the target character repertoire. This information helps to determine the best approach for converting the database character set.
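Oracle provides the Character Set Scanner utility (`csscan`) for this step. A sketch of an invocation, with credentials and character set names as assumptions:

```
csscan system/manager FULL=Y FROMCHAR=WE8ISO8859P1 TOCHAR=UTF8 LOG=scan CAPTURE=Y
```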

There are three approaches to converting data from one database character set to another if the database does not contain any of the inconsistencies described in "Character Set Conversion Issues". A description of methods for migrating databases with such inconsistencies is beyond the scope of this documentation; contact Oracle Consulting Services for assistance.

The approaches are:

### Migrating Character Data Using a Full Export and Import

In most cases, a full export and import is recommended to properly convert all data to a new character set. It is important to be aware of data truncation issues, because columns with character datatypes may need to be extended before the import to handle an increase in size. Existing PL/SQL code should be reviewed to ensure that all byte-based SQL functions such as `LENGTHB`, `SUBSTRB`, and `INSTRB`, and PL/SQL `CHAR` and `VARCHAR2` declarations are still valid.
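A sketch of the overall flow with the Export and Import utilities (credentials and file names are assumptions):

```
# 1. Export the entire source database
exp system/password FULL=Y FILE=full.dmp LOG=exp.log

# 2. Re-create the database with the new character set,
#    widening CHAR and VARCHAR2 columns where the scan showed expansion

# 3. Import into the new database
imp system/password FULL=Y FILE=full.dmp LOG=imp.log
```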

### Migrating Character Data Using the ALTER DATABASE CHARACTER SET Statement

The `ALTER DATABASE CHARACTER SET` statement is the fastest way to migrate a character set, but it can be used only under special circumstances. The `ALTER DATABASE CHARACTER SET` statement does not perform any data conversion, so it can be used if and only if the new character set is a strict superset of the current character set.

The new character set is a strict superset of the current character set if:

• Each and every character in the current character set is available in the new character set.
• Each and every character in the current character set has the same code point value in the new character set. For example, US7ASCII is a strict subset of many character sets.

Another restriction of the `ALTER` `DATABASE` `CHARACTER` `SET` statement is that it can be used only when the character set migration is between two single-byte character sets or between two multibyte character sets. If the planned character set migration is from a single-byte character set to a multibyte character set, then use the Export and Import utilities.

This restriction on using the `ALTER DATABASE CHARACTER SET` statement arises because of `CLOB` data. In Oracle9i, some internal fields in the data dictionary are stored in `CLOB` columns. Customers may also store data in `CLOB` fields. When the database character set is multibyte, `CLOB` data in Oracle9i is stored as UCS-2 data (two-byte, fixed-width Unicode). When the database character set is single-byte, `CLOB` data is stored using the database character set. Because the `ALTER DATABASE CHARACTER SET` statement does not convert data, `CLOB` columns remain in the original database character set encoding when the database character set is migrated from single-byte to multibyte. This introduces data inconsistency in the `CLOB` columns.

The syntax of the `ALTER DATABASE CHARACTER SET` statement is as follows:

`ALTER DATABASE [db_name] CHARACTER SET new_character_set;`

`db_name` is optional. The character set name should be specified without quotes. For example:

`ALTER DATABASE CHARACTER SET AL32UTF8;`

To change the database character set, perform the following steps:

1. Shut down the database, using either a `SHUTDOWN IMMEDIATE` or a `SHUTDOWN NORMAL` statement.
2. Do a full backup of the database because the `ALTER DATABASE CHARACTER SET` statement cannot be rolled back.
3. Complete the following statements:
```
STARTUP MOUNT;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
ALTER SYSTEM SET AQ_TM_PROCESSES=0;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET new_character_set;
SHUTDOWN IMMEDIATE;   -- or SHUTDOWN NORMAL;
STARTUP;
```
 See Also: Oracle9i SQL Reference for more information about the `ALTER DATABASE CHARACTER SET` statement; Appendix A, "Locale Data" for a list of all superset character sets

#### Using the ALTER DATABASE CHARACTER SET Statement in an Oracle9i Real Application Clusters Environment

In an Oracle9i Real Application Clusters environment, ensure that no other Oracle background processes are running, with the exception of the background processes associated with the instance through which a user is connected, before attempting to issue the `ALTER` `DATABASE` `CHARACTER` `SET` statement. Use the following SQL statement to verify the environment:

`SELECT SID, SERIAL#, PROGRAM FROM V$SESSION;`

Set the `CLUSTER_DATABASE` initialization parameter to `FALSE` to allow the character set change to be completed. This is required in an Oracle9i Real Application Cluster environment; an exclusive startup is not sufficient.
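A minimal sketch, assuming the instance uses a server parameter file (a restart is required afterward):

```
ALTER SYSTEM SET CLUSTER_DATABASE=FALSE SCOPE=SPFILE;
```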

### Migrating Character Data Using the ALTER DATABASE CHARACTER SET Statement and Selective Imports

Another approach to migrating character data is to perform an `ALTER` `DATABASE` `CHARACTER` `SET` statement followed by selective imports. This method is best suited for a known distribution of convertible data that is stored within a small number of tables, where a full export and import would be too expensive. For example, suppose you have a 100 GB database with more than 300 tables, but only 3 tables require character set conversion. The rest of the data is already in the same encoding as the destination character set. The 3 tables can be exported and then imported back after issuing the `ALTER DATABASE CHARACTER SET` statement.
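A sketch of the selective approach (table names, credentials, and file names are assumptions):

```
# Export only the tables whose data needs conversion
exp system/password TABLES=(scott.t1,scott.t2,scott.t3) FILE=conv.dmp

# Issue ALTER DATABASE CHARACTER SET on the database, then re-import
imp system/password FILE=conv.dmp IGNORE=Y
```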

Incorrect data conversion can lead to data corruption, so perform a full backup of the database before attempting to migrate the data to a new character set.

## Migrating to the Oracle9i NCHAR Datatypes

In Oracle9i, data that is stored in columns of the `NCHAR` datatypes is stored exclusively in a Unicode encoding regardless of the database character set. This allows users to store Unicode in a database that does not use Unicode as the database character set.

This section includes the following topics:

### Migrating Oracle8 NCHAR Columns to Oracle9i

In release 8.0, the Oracle Server introduced a national character datatype (`NCHAR`) that allows a second, alternate character set in addition to the database character set. The `NCHAR` datatypes support several fixed-width Asian character sets that were introduced to provide better performance when processing Asian character data.

In Oracle9i, the SQL `NCHAR` datatypes are limited to Unicode character set encoding (UTF8 and AL16UTF16). Any other Oracle8 Server character sets that were available for the `NCHAR` datatype, including Asian character sets such as JA16SJISFIXED, are no longer supported.

The steps for migrating existing `NCHAR`, `NVARCHAR2`, and `NCLOB` columns to Oracle9i `NCHAR` datatypes are as follows:

1. Export all `NCHAR` columns from the Oracle8 or Oracle8i database.
2. Drop the `NCHAR` columns.
3. Upgrade the database to Oracle9i.
4. Import the `NCHAR` columns into Oracle9i.

The Oracle9i migration utility can also convert Oracle8 and Oracle8i `NCHAR` columns to Oracle9i `NCHAR` columns. A SQL `NCHAR` upgrade script called `utlchar.sql` is supplied with the migration utility. Run it at the end of the database migration to convert Oracle8 and Oracle8i `NCHAR` columns to Oracle9i `NCHAR` columns. After the script has been executed, the data cannot be downgraded. The only way to move back to Oracle8 or Oracle8i is to drop all `NCHAR` columns, downgrade the database, and import the old `NCHAR` data from a previous Oracle8 or Oracle8i export file. Ensure that you have a backup (export file) of the Oracle8 or Oracle8i `NCHAR` data in case you need to downgrade your database in the future.

 See Also: Oracle9i Database Utilities for a description of export and import procedures; Oracle9i Database Migration for `NCHAR` migration information

### Changing the National Character Set

To change the national character set, use the `ALTER` `DATABASE` `NATIONAL` `CHARACTER` `SET` statement. The syntax of the statement is as follows:

`ALTER DATABASE [db_name] NATIONAL CHARACTER SET new_NCHAR_character_set;`

`db_name` is optional. The character set name should be specified without quotes.

You can issue the `ALTER DATABASE CHARACTER SET` and `ALTER DATABASE NATIONAL CHARACTER SET` statements together if desired.
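For example (character set names chosen for illustration):

```
ALTER DATABASE CHARACTER SET UTF8;
ALTER DATABASE NATIONAL CHARACTER SET AL16UTF16;
```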

 See Also: Oracle9i SQL Reference for the syntax of the `ALTER` `DATABASE` `NATIONAL` `CHARACTER` `SET` statement

### Migrating CHAR Columns to NCHAR Columns in an Oracle9i Database

You can change a column's datatype definition using the following methods:

• The `ALTER TABLE MODIFY` statement
• Online table redefinition

The `ALTER TABLE MODIFY` statement has the following advantages over online table redefinition:

• Easier to use
• Fewer restrictions

Online table redefinition has the following advantages over the `ALTER TABLE MODIFY` statement:

• Faster for columns with a large amount of data
• Can migrate several columns at one time
• Table is available for DML during most of the migration process
• Avoids table fragmentation, which saves space and allows faster access to data.
• Can be used for migration from the `CLOB` datatype to the `NCLOB` datatype

This section contains the following topics:

#### Using the ALTER TABLE MODIFY Statement to Change CHAR Columns to NCHAR Columns

The `ALTER TABLE MODIFY` statement can be used to change table column definitions from the `CHAR` datatypes to `NCHAR` datatypes. It also converts all of the data in the column from the database character set to the `NCHAR` character set. The syntax of the `ALTER TABLE MODIFY` statement is as follows:

`ALTER TABLE table_name MODIFY (column_name datatype);`
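A minimal sketch (table and column names are hypothetical):

```
-- Converts both the column definition and its data
-- from the database character set to the NCHAR character set
ALTER TABLE scott.customers MODIFY (cust_name NVARCHAR2(100));
```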

If indexes have been built on the migrating column, then dropping the indexes can improve the performance of the `ALTER TABLE MODIFY` statement because indexes are updated when each row is updated.

The maximum column lengths for `NCHAR` and `NVARCHAR2` columns are 2000 and 4000 bytes. When the `NCHAR` character set is AL16UTF16, the maximum column lengths for `NCHAR` and `NVARCHAR2` columns are 1000 and 2000 characters, which are 2000 and 4000 bytes. If this size limit is violated during migration, consider changing the column to the `NCLOB` datatype instead.

 Note: `CLOB` columns cannot be migrated to `NCLOB` columns using the `ALTER TABLE MODIFY` statement. Use online table redefinition to change a column from the `CLOB` datatype to the `NCLOB` datatype.

#### Using Online Table Redefinition to Migrate a Large Table to Unicode

It takes significant time to migrate a large table with a large number of rows to Unicode datatypes. During the migration, the column data is unavailable for both reading and updating. Online table redefinition can significantly reduce migration time. Using online table redefinition also allows the table to be accessible to DML during most of the migration time.

Perform the following tasks to migrate a table to Unicode datatypes using online table redefinition:

1. Use the `DBMS_REDEFINITION.CAN_REDEF_TABLE` PL/SQL procedure to verify that the table can be redefined online. For example, to migrate the scott.emp table, enter the following command:
`DBMS_REDEFINITION.CAN_REDEF_TABLE('scott','emp');`
2. Create an empty interim table in the same schema as the table that is to be redefined. Create it with NCHAR datatypes as the attributes. For example, enter a statement similar to the following:
```
CREATE TABLE int_emp (
  empno    NUMBER(4),
  ename    NVARCHAR2(10),
  job      NVARCHAR2(9),
  mgr      NUMBER(4),
  hiredate DATE,
  sal      NUMBER(7,2),
  deptno   NUMBER(2),
  org      NVARCHAR2(10)
);
```
3. Start the online table redefinition. Enter a command similar to the following:
```
DBMS_REDEFINITION.START_REDEF_TABLE('scott', 'emp', 'int_emp',
  'empno empno,
   to_nchar(ename) ename,
   to_nchar(job) job,
   mgr mgr,
   hiredate hiredate,
   sal sal,
   deptno deptno,
   to_nchar(org) org');
```

If you are migrating `CLOB` columns to `NCLOB` columns, then use the `TO_NCLOB` SQL conversion function instead of the `TO_NCHAR` SQL function.

4. Create triggers, indexes, grants, and constraints on the interim table. Referential constraints that apply to the interim table (the interim table is a parent or child table of the referential constraint) must be created in `DISABLED` mode. Triggers that are defined on the interim table are not executed until the online table redefinition process has been completed.
5. You can synchronize the interim table with the original table. If many DML operations have been applied to the original table since the online redefinition began, then execute the `DBMS_REDEFINITION.SYNC_INTERIM_TABLE` procedure. This reduces the time required for the `DBMS_REDEFINITION.FINISH_REDEF_TABLE` procedure. Enter a command similar to the following:
`DBMS_REDEFINITION.SYNC_INTERIM_TABLE('scott', 'emp', 'int_emp');`
6. Execute the `DBMS_REDEFINITION.FINISH_REDEF_TABLE` procedure. Enter a command similar to the following:
`DBMS_REDEFINITION.FINISH_REDEF_TABLE('scott', 'emp', 'int_emp');`

When this procedure has been completed, the following conditions are true:

• The original table is redefined so that it has all the attributes, indexes, constraints, grants, and triggers of the interim table.
• The referential constraints that apply to the interim table apply to the redefined original table.
7. Drop the interim table. Enter a statement similar to the following:
`DROP TABLE int_emp;`

The results of the online table redefinition tasks are as follows:

• The original table is migrated to Unicode columns.
• The triggers, grants, indexes, and constraints defined on the interim table after the `START_REDEF_TABLE` subprogram and before the `FINISH_REDEF_TABLE` subprogram are defined for the redefined original table. Referential constraints that apply to the interim table now apply to the redefined original table and are enabled.
• The triggers, grants, indexes, and constraints defined on the original table before redefinition are transferred to the interim table and are dropped when you drop the interim table. Referential constraints that applied to the original table before redefinition were applied to the interim table and are now disabled.
• PL/SQL procedures and cursors that were defined on the original table before redefinition are invalidated. They are automatically revalidated the next time they are used. Revalidation may fail because the table definition has changed.

## Tasks to Recover Database Schema After Character Set Migration

You may need to perform additional tasks to recover a migrated database schema to its original state. Consider the issues described in Table 10-1.

##### Table 10-1 Issues During Recovery of a Migrated Database Schema

| Issue | Description |
| --- | --- |
| Indexes | When table columns are changed from `CHAR` datatypes to `NCHAR` datatypes by the `ALTER TABLE MODIFY` statement, indexes that are built on the columns are changed automatically by the database. This slows down the `ALTER TABLE MODIFY` statement. If you drop indexes before issuing the statement, then re-create them after migration. |
| Constraints | If you disable constraints before migration, then re-enable them after migration. |
| Triggers | If you disable triggers before migration, then re-enable them after migration. |
| Replication | If the columns that are migrated to Unicode datatypes are replicated across several sites, then the changes should be executed at the master definition site so that they are propagated to the other sites. |
| Binary order | Migration from `CHAR` datatypes to `NCHAR` datatypes involves character set conversion if the database and `NCHAR` data have different character sets. The binary order of the same data can differ between encodings, which affects applications that rely on binary order. |