QUESTION: Oracle has rich
multi-media content with LOB's, and I wonder how to store and search
MS-Word documents within my Oracle database.
ANSWER: The Oracle BLOB datatype, and
BFILE datatypes can be used with the dbms_lob procedure allows for storing of MS-Word docs, and several
people have published instructions for storing MS Word documents
inside Oracle.
The trick when pasting-in an MS-Word document in
a application is to store the contents of the paste as-is in a BLOB
datatype inside Oracle table, and then work on translating the binary
representation at document fetch time.
Translating an MS-Word document in Oracle is not
trivial, but some applications use a BLOB datatype with Oracle*Text
indexes. Other applications store the MS-Word documents as a BFILE and
pass them to MS-Word within the application.
Part of your problem will be the "smart quotes"
and other funky symbols (curly brackets heavy arrows) that MS-Word
inserts automatically.
Most folks use UTF-8 encoding, but there is also
the Windows-1252 character set (WE8MSWIN1252) and the ISO-8851-1
character set.
You can insert any special characters into your Oracle DB with any
character set:
(1) Using RAWTOHEX() function to find out the
Hex code of the string:
bfile_loc
:= BFILENAME('DOC_DIR', in_doc);
INSERT INTO my_docs (doc_id, bfile_loc, doc_title) VALUES (1,
bfile_loc, in_doc);
SELECT doc_blob INTO temp_blob FROM my_docs WHERE doc_id = in_id
FOR UPDATE;
DBMS_LOB.OPEN(bfile_loc, DBMS_LOB.LOB_READONLY);
Open the external blob object with the MS Word document for
reading.
DBMS_LOB.OPEN(temp_blob, DBMS_LOB.LOB_READWRITE); Open the
temporary blob object for reading and writing.
DBMS_LOB.LOADFROMFILE (temp_blob, bfile_loc, Bytes_to_load);