Thursday, September 15, 2011

Using the Java Class component in OWB

I have been meaning to write about this for a long time.

I promised David Allan to write something about the Java Activity, and then my personal life caught up with me, so I had to put this on the back burner.
But here it is: the beginning of my blog about what I do for a living, dabbling around in Oracle stuff.
In the meantime David posted a hello world example here that covered the basics, so I'll focus on the example I built instead of explaining all of it.
Using a Java component isn't that difficult, as you saw there, but what can you do with it?
I have used it to process a very, very large (8GB) XML file that we needed as a source for a data warehouse. I have also used it to convert a simple XML extract from the OWB repository to WordML.

I'll explain the XML processing in another post and cover just the Java side here.


Overview

We were faced with importing a huge XML file into a warehouse built with OWB.
After lots of experimenting I ended up with an easy and clean solution: using Saxon, a Java-based XSLT processor, to process the XML file and convert it to 4 CSV files.
Importing the CSV files into your warehouse is easy after that.
Saxon is a Java library that can do XML transformations for you.

It takes an XML file and an XSLT file and spits out another file.
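
To make the "XML in, XSLT in, another file out" idea concrete, here is a minimal sketch. It is not the setup from this post: it assumes the XSLT 1.0 processor that ships with the JDK (through javax.xml.transform) instead of Saxon, and the element names (rows, row, id, name), the stylesheet and the class name are made up for illustration.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical example class, not part of OWB or Saxon.
class XmlToCsvSketch {

    // Made-up stylesheet: writes one "id,name" line per <row> element.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/rows\">"
      + "<xsl:for-each select=\"row\">"
      + "<xsl:value-of select=\"id\"/><xsl:text>,</xsl:text>"
      + "<xsl:value-of select=\"name\"/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML text and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rows><row><id>1</id><name>foo</name></row>"
                   + "<row><id>2</id><name>bar</name></row></rows>";
        System.out.print(transform(xml, XSLT));
    }
}
```

For a real file you would pass file-based StreamSource and StreamResult objects instead of strings; the shape of the call stays the same.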

For this to work we need the following bits:
  1. Saxon, the Java library that does the work
  2. an XSLT document, which tells the library what to do
  3. an input file, the actual source
  4. a Java VM, which OWB provides for us
  5. some glue in OWB to tie it together
  6. a place to store all of this
Saxon is made by Saxonica, full details here. It is not free, but it is one of the best around, with great support for a decent price.

Calling an XSLT transformation from the Windows command line goes like this:
--calling a xslt transformation from the commandline
java -cp saxon9ee.jar net.sf.saxon.Transform -xsl:export.xsl -o:csv_export.txt -it:main filename=file:///c:\data\yourfile.xml

--end
Breakdown of the parameters involved:

-> java : the Java executable; OWB will provide this for us
-> -cp : the classpath option, followed by the OS path of the jar that contains the library (saxon9ee.jar here)
-> net.sf.saxon.Transform : name of the main class inside the library that runs the transformation
-> -xsl: : option of Transform that tells where on the OS the XSLT stylesheet resides
-> -o: : option of Transform that specifies where to put the output
-> -it:main : name of the initial template, the template in the XSLT that is processed first
-> filename= : stylesheet parameter that is used inside the XSLT specified above
All of the Transform parameters can be found here.
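
As a rough sketch of how the same call could be put together from Java, the snippet below builds the argument list piece by piece, mirroring the breakdown above. The class and method names are hypothetical, and it only assembles the arguments; actually running them (for example by handing the list to java.lang.ProcessBuilder) and wiring this into OWB is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OWB or Saxon.
class SaxonCommandSketch {

    // Builds the command line from the example above as a list of arguments.
    static List<String> buildCommand(String jar, String xsl, String output,
                                     String initialTemplate,
                                     String paramName, String paramValue) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");                      // the Java executable
        cmd.add("-cp");                       // classpath option...
        cmd.add(jar);                         // ...followed by the Saxon jar
        cmd.add("net.sf.saxon.Transform");    // main class that runs the transformation
        cmd.add("-xsl:" + xsl);               // location of the stylesheet
        cmd.add("-o:" + output);              // where to put the output
        cmd.add("-it:" + initialTemplate);    // template that is processed first
        cmd.add(paramName + "=" + paramValue); // stylesheet parameter
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("saxon9ee.jar", "export.xsl", "csv_export.txt",
                                        "main", "filename",
                                        "file:///c:\\data\\yourfile.xml");
        // Prints the same command line as the example above.
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping every argument as its own list element avoids quoting problems when a path contains spaces.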

Next is getting it into OWB. David explained this nicely and clearly, so I'll touch on it only briefly and then zoom in on the details.
Finished process flow