Thursday, August 23, 2012

Using MD5 encryption in OWB

Just a quickie to demonstrate how easy it is to use MD5 hashes instead of the normal hash function in your owb mappings:.


case
 when INGRP1.TO_BE_HASHED is null then null
else
 RAWTOHEX(DBMS_OBFUSCATION_TOOLKIT.MD5(input_string=> INGRP1.TO_BE_HASHED ))
end



I used this make personal information in a table non-reversable unreadable. As you might know straightforward hashing is not unique enough in large datasets.
Thats the reason why i used an MD5 hash.

Later on in the project i was faced with the question of reverting the process. The powers that be have a way of changing their needs...

So i thought of  taking the source, with plain data, md5 hash it and compare that to the stored value.
That works but when you have a large dataset you'll need an index, and 
this : 

 RAWTOHEX(DBMS_OBFUSCATION_TOOLKIT.MD5(input_string=> INGRP1.TO_BE_HASHED ))

doesn't index.
You'll need an function-based index for this.
However using a function-based index with this throws a non-deterministic error. This means that the md5 function is compiled with the deterministic flag off, and as such can not be used in fbi's.

So..
I created a wrapper function that just makes the MD5 function deterministic.
See here:


CREATE OR REPLACE FUNCTION md5er (a_hash IN varchar2) RETURN varchar2
 DETERMINISTIC
IS

BEGIN
 RETURN((DBMS_OBFUSCATION_TOOLKIT.MD5(input_string=> a_hash)));
END;

Then you can create the fbi on the stage table that holds the source:
See here:


create index 

STG_EXPOR_FBIDX1 on STG_EXPORT(RAWTOHEX(md5er(a_hash=> nvl(,'empty') )));

Fbi's need their index updated with this:

exec dbms_stats.gather_table_stats(ownname=>null, tabname=> 'stg_export', estimate_percent=>null, cascade=>true, method_opt=> 'FOR ALL HIDDEN COLUMNS SIZE 1');


And now you can use the deterministic MD5 function called md5er in your query and have good performance.

update DWH.dwh_person_h ph

set ph.person_number = nvl((
 select xt.plain_data
  from (
         select xt1.source_field source_field 
         from stg.stg_export xt1
         group by xt1.source_field
        ) xt
  where
    RAWTOHEX(md5er(a_hash=> nvl(xt.source_field,'empty') ))= ph.person_number

  ), ph.person_number)--for persons not in the source anymore
;


And all of a sudden another page has managed to fill itself.. 
BTW: i promised myself to write more often, once a year is not enough..
I have more in the works, and the XML post needs another follow-up.

No comments:

Post a Comment