Thursday, September 15, 2011

Using the Java Class component in OWB

I have been meaning to write about this for a long time.

I promised David Allan to write something about the Java Activity and then my personal life caught up with me so i had to put this on the backburner.
But here it is: the beginning of my blog about what i do for a living, dabbling around in Oracle stuff.
In the meantime David posted a hello world example here that covered the basics, so i'll focus more on the example i built instead of explaining all of it.
Using a java component isn't that difficult as you saw there but what can you do with it  ?
I have used it to process a very, very large (8GB) xml file that we needed as a source for a datawarehouse. And i have used to convert a simple xml extract from the owb repository to WordML.

I'll explain the xml processing in another topic and just explain the java stuff here.


Overview

We were faced with importing a huge xmlfile into a warehouse built with owb.
After lots of experimenting i ended up with a easy and clean solution, using a java based sax parser to process the xml file and convert it to 4 csv files.
Importing the csv files into your warehouse is easy then.
A sax parser is a java library that can do xml transformations for you.

It takes a xml file and a xslt file and spits out another file.

So we need for this to work the following bits:
  1. a sax parser, this is a java library that does the work
  2. an xslt document, this tells the library what to do
  3. an input file, the actual source
  4. a java vm, owb provides this for us
  5. some glue in owb to tie it together.
  6. a place to store this all
The sax parser is made by Saxonica, full details here. It is not for free but one of the best around with great support for a decent price.

Calling a XML Transformation goes like this under Windows:
--calling a xslt transformation from the commandline
java -cp saxon9ee.jar net.sf.saxon.Transform  -xsl:export.xsl -o:csv_export.txt -it:main filename=file:///c:\data\yourfile.xml

--end
Breakdown of the parameters involved:

-> java : this is the java executable , owb will provide this for us
-> -cp  : os name of the library that does the parsing
-> net.sf.saxon.Transform : name of function inside the library
-> -xsl : parameter of the Transform function that tells where on the os the xml stylesheet resides 
-> -o : parameter of the Transform function that specifies where to put the output 
-> it:main name of the template in the xslt that will be processed first
-> filename=  : parameter used  in the xslt specified above.
All of the Transform parameters can be found here.

Next is getting it in owb. David explained this nice and easy so'll touch this just briefly and zoom in on the details.
Finished process flow

Here is a screenshot of my finished processflow.

Lets look at the Java Activity parameters :
Java Activity options


These are the minimum no.of parameters you need to use the Java Activity.
In my processflow i have used variables to specify these, i use a pl/sql function to get the values from a table.
For brevity i'll just map the ones from the picture above to ones used in the above example.


CLASSPATH        saxon9ee.jar
CLASS_NAME       net.sf.saxon.Transform
JAVA_OPTIONS     not used here
PARAMETER_LIST    -xsl:export.xsl -o:to_be_processed/csv_Export.txt -it:main filename=file:////u99/data/to_be_processed/file.xml
RESULT_CODE      bound to a variable n the flow
RUN_DIRECTORY    dir where java starts: root of above xsl and jar files


Note the space in the PARAMETER_LIST's beginning. This is something that is necessary although i do not know exactly why.
This also doesn't use the option of specifying multiple parameters on the Java Activity as David did in his sample.

You can use the RESULT_CODE but i did not use that; the jar file will cause the flow to halt anyway when it detects an error.

Debugging and Monitoring your Java Activity
You can use this script to look in the repository what the results are.
This is easier than opening the process flow.
But essentially the same info is shown as would be on the <i> symbol.
Note that this script can also be used to see how the Shell Acitivity works. 


--get runinfo from shell or java job together with statuscode
select 
  e.top_level_audit_execution_id,
  e.audit_execution_id, 
  e.task_name, 
  f.file_type, 
  f.file_text,
  p.parameter_name, p.value --use this to see parameters of mappings, also see below for join
  from owbsys.wb_rtv_audit_executions e 
  left join owbsys.all_rt_audit_exec_files f    on e.audit_execution_id = f.execution_audit_id
  left join owbsys.wb_rtv_audit_parameters p ON  e.audit_execution_id = p.audit_execution_id
  join (
      select  n.top_level_audit_execution_id from 
        (  
         select   e.top_level_audit_execution_id , row_number() over   (partition by e.top_level_audit_execution_id order by e.top_level_audit_execution_id desc) row_numb 
          FROM  owbsys.wb_rtv_audit_executions e 
          where e.task_object_store_name like 'NVIS_%'                                                    -- filter out only jobs from your project by process flow location
          order by e.top_level_audit_execution_id desc 
        )n where row_numb =1 and rownum <2 order by n.top_level_audit_execution_id desc                   -- rownum <4 says last 3 runs
      ) latest
  on latest.top_level_audit_execution_id = e.top_level_audit_execution_id
  where  
    f.file_type is not null 
    and p.parameter_name in ('RESULT_CODE', 'PARAMETER_LIST')
  order by e.top_level_audit_execution_id desc, e.audit_execution_id desc
;



This will give you for example this:
--sql output
PF03_PRC_JAVA_PARSE_XML:JAVA_PARSE_XML_XPRT JavaErrorStream  PARAMETER_LIST  -xsl:java/xslt/export.xsl -o:to_be_processed/csv_xport.txt -it:main filename=file:////u99/data/to_be_processed/export.xml
PF03_PRC_JAVA_PARSE_XML:JAVA_PARSE_XML_XPRT_WGR JavaErrorStream  RESULT_CODE 0
--end


And looking for an error might give you this (real-world) example:
--sql output
266221 266552 PF03_PRC_JAVA_PARSE_XML:JAVA_PARSE_XML_XPRT_BZH JavaErrorStream "Error on line 45438159 column 4 of export.xml:
  SXXP0003: Error reported by XML parser: An invalid XML character (Unicode: 0x0) was found
  in the element content of the document.
Transformation failed: Run-time errors were reported
" RESULT_CODE 2
266221 266552 PF03_PRC_JAVA_PARSE_XML:JAVA_PARSE_XML_XPRT_BZH JavaOutputStream  PARAMETER_LIST  -xsl:java/xslt/export.xsl -o:to_be_processed/csv_export.txt -it:main filename=file:////u99/data/to_be_processed/export.xml
--end
And as you can see we have an ERROR :-)


So there it is: a real-life example of the Java Activity in OWB.
I'll post some more info later on in another post about the actual xslt-techniques used to process the xml.

No comments:

Post a Comment