Tuesday, 24 March 2015

Unicode data extracts using Oracle SQL*Plus in Unix


The normal data spooling method using Oracle SQL*Plus uses the locale setting to determine the encoding of the output file. Typically we are only interested in data extracts containing just ASCII characters, but if you need (for example) to use a Unicode character as a data separator, things tend to go wrong.

Consider the following SQL statement, which uses the double dagger Unicode character (U+2021) as a separator:-

SELECT 'FIRST_NAME'||unistr('\2021')||'SURNAME' from dual;

If we run this in a SQL*Plus session we get the following..

(nb. I'm going to assume that you are using Putty as your terminal client)

Setting your Locale in Putty

The data returned by the select looks like it's become mangled, but it's not. Your Putty session needs its Window/Translation setting to be changed to UTF-8 before the data can be displayed correctly.

Select UTF-8 translation in Putty

Now if we run this SQL again we get..


Similarly we can run the following SQL script using SQL*Plus to produce the file output.csv:-

set feedback off
set heading off
set term off
set echo off
spool output.csv
SELECT 'FIRST_NAME'||unistr('\2021')||'SURNAME' from dual;
spool off

The contents of output.csv will also display correctly using 'cat', but only because your locale settings in Putty are UTF-8.

NB. You can also copy the file back to your PC and open it using Notepad (which handles UTF-8 files correctly).
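Because what 'cat' shows depends on the terminal's translation setting, it can help to look at the raw bytes instead. The following is only a sketch: it writes a sample line with printf rather than reading the real output.csv, and the variable names (sample, hex, result) are made up for illustration. In UTF-8 the double dagger (U+2021) is the three bytes E2 80 A1, so that is what a correctly encoded extract should contain.

```shell
# Write a sample line equivalent to what the spool script is expected to produce.
# \342\200\241 is the octal form of the UTF-8 bytes E2 80 A1 (U+2021, double dagger).
sample=$(mktemp)
printf 'FIRST_NAME\342\200\241SURNAME\n' > "$sample"

# Dump the file as hex bytes without the offset column, then strip spaces and newlines.
hex=$(od -An -tx1 "$sample" | tr -d ' \n')
echo "$hex"

# A UTF-8 extract contains e280a1 between the two fields;
# an extract where the separator was replaced by '?' would contain 3f instead.
case "$hex" in
  *e280a1*) result="utf8-separator-present" ;;
  *)        result="separator-missing" ;;
esac
echo "$result"

rm -f "$sample"
```

To check a real extract, point od at output.csv instead of the temporary sample file.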

Scheduling with Cron

The spool script works OK when run interactively, but now let's schedule it from cron. The script will run with cron's environment rather than your login environment (the machine's locale is probably ISO8859, and NLS_LANG is typically not set), so SQL*Plus will replace any Unicode characters it cannot represent in the client character set with question marks.


But we can fix this by adding the following line near the start of our shell script:-

export NLS_LANG=.AL32UTF8
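As a rough sketch of where that line goes, a wrapper script for cron might look something like the one below. The script name extract.sql, the DB_CONNECT variable and the check for sqlplus on the PATH are assumptions made for illustration and are not part of the original script; ORACLE_HOME and PATH would also need to be set for your own installation.

```shell
#!/bin/sh
# Hypothetical wrapper script to be scheduled from cron.

# cron does not read your login profile, so set the client character set here.
# The leading dot leaves language and territory at their defaults.
export NLS_LANG=.AL32UTF8

# Assumption: DB_CONNECT holds a user/password@database connect string,
# and extract.sql is the spool script shown earlier.
run_extract() {
    # -S suppresses the banner, -L stops SQL*Plus re-prompting if the logon fails,
    # and reading stdin from /dev/null makes it exit once the script has finished.
    sqlplus -S -L "$DB_CONNECT" @extract.sql </dev/null
}

# Only attempt the extract when sqlplus can actually be found.
if command -v sqlplus >/dev/null 2>&1; then
    run_extract
else
    echo "sqlplus not found on PATH" >&2
fi
```

Setting NLS_LANG inside the script rather than in the crontab keeps the setting with the extract itself, so the script behaves the same whether it is run by hand or by cron.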
