Sunday, May 24, 2009

GUI or command line? Where to run an IDQ plan.

Recently on a data quality project I stumbled across an anomoly that I thought I share with the data quality / Informatica community.  It involves the use of Informatica Data Quality (IDQ) and the use of certain types of queries.  I discovered this anomoly when I recieved the following error message:


[caption id="attachment_124" align="aligncenter" width="414" caption="Image 1: Failure to initialise plan"]Failure to initialise[/caption]

One of the root causes of this error is a "behind-the-scenes" process that involves the import of data into an IDQ plan.  When importing data into an IDQ plan, via SQL inside the source component, IDQ wraps a select count () around the SQL in the original query.  At times this causes IDQ to fail while initializing and returns the error message above.  Most commonly this error occurs when using a query that is beyond the standard select statement.  I was using a union in my query when I got this error.



If you are using Oracle you have the option of editing your SQL or running the plan from the command line.  However if you are using Microsoft SQL Server, you'll need to get the to command line.

The SQL edit in Oracle is simple and straight-forward so let's address that first.  If you get an error where the IDQ plan fails to initialise try wrapping the SQL as follows:


  • Select * from (<your query here>)



As for Microsoft SQL Server the  command line is the only option but it is very straight-forward.  You could even argue that the command line is more enterprise robust since the jobs could then be scheduled and thus integrated into existing data services more easily. 


Regardless of the perspective here is how you deploy the plan in realtime using the command line.  An IDQ plan can be executed from one of two locations; the IDQ repository or the file location of the XML version of the plan.  On Windows, the executable file for implementing runtime functionality is Athanor-RT.exe, located in the bin folder of the Data Quality Server installation.  It is also a requirement to specify the location of the plan in the command line utility. 


To deploy a plan from an XML file the -f switch is used.  So the command to deploy an XML version of a plan, call it myplan, in the C:/idqplans directory would be as follows:





  • athanor-rt -f c:/idqplans/myplan.xml



To deploy a plan from the repository the -p switch is used.  Consequently the command to deploy the myplan dq plan from the repository would be as follows:





  • athanor-rt -p c:/idqplans/myplan



There are other useful switches to utilize while deploying your IDQ plans at the command line. 


For instance to run an IDQ plan along with a parameter file, which allows you to override plan variables with those in the parameter file, use the -c switch.  The command for executing a plan along with a parameter file would be as follows:





  • athanor-rt -c <parameter file name> c:/idqplans/myplan



Another helpful switchs allows you to receive feedback as to the status of the plan execution.  This switch is -i. 



The -i switch will provide feedback in the command line window.   The feedback occurs at the specified interval in the command line.  For instance, the command -i 200 will provide feedback every 200 records for each field in the process.  The following is an example of the feedback provided in the command line with the -i switch:

cmdlineexample

With these basic switches you can deploy any IDQ plan regardless of the query required to source the data.  I hope this post helps someone avoid hours of debugging!

Check back later for more details about scheduling IDQ plans using the command line.

5 comments:

  1. William

    As the person who was Customer #1 of the IDQ tool back when they were Similarity Systems and it was called Athanor (hence athanor -rt), this is a great post.

    I like that you are writing some content about how to use these tools effectively, particularly the batch mode concepts.

    ReplyDelete
  2. dqchronicle authorAugust 1, 2009 at 6:16 AM

    Daragh,
    Thanks for the comment. It is good to hear about the pre-INFA IDQ. Please feel free to post more. Also, thanks for validating the concept behind the "chronicle" concept. I read a lot of theory based posts & I wanted to be more hands-on and build a body of tactical knowledge.
    Again, thanks for the feedback!

    ReplyDelete
  3. Hi,
    I liked the information provided above. But can you please tell me how to mention the DB details in parameter file, if we are using DB Source component as an input to the plan?

    Thanks.
    Laxmikant.

    ReplyDelete
  4. Laxmikant,
    Thanks for stopping by and commenting on the post. It's been some time since I worked with 8.6 and source definition has changed in v9. Let me try and chase down some answers and get back to you asap.
    Thanks again
    William

    ReplyDelete
  5. Thanks for reply William,
    I thought to share some more information here. We are using "IDQ workbench 8.6.2.8". Also along with the DB Component information, can you help me to know how can we write shell script to schedule the execution of plans.
    AS per my understanding we will have to give the path of athanor.exe in the .sh file and then follow the athanor-rt commands. If possible please guide me in this also. :)

    Thanks again,
    Laxmikant.

    ReplyDelete

What data quality is (and what it is not)

Like the radar system pictured above, data quality is a sentinel; a detection system put in place to warn of threats to valuable assets. ...