Saturday, January 8, 2011

Get ready 'cause here I come: A Tale of Data Quality Preparedness

Normally I use this blog to tell stories of a very technical nature.  However, a recent experience has led me in a very different direction.  I want to talk informally about what it takes to prepare for a data quality project. 

First and foremost, there needs to be a catalyst for starting a data quality project.  This usually takes the form of anecdotal tales of woe from data consumers trying to use enterprise data.  From these anecdotes, a series of problem areas that need to be addressed can be extracted.  For instance, if a marketing director complains that direct marketing campaign costs are rising because multiple mailers are being sent to one address, that identifies a need to perform household analysis and consolidation.  In a large enterprise, interviews need to be conducted to gather these stories and reduce them to a set of data quality operations.
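
To make the household example a bit more concrete, here is a minimal sketch in Python of what a first pass at household analysis might look like.  It is only an illustration under stated assumptions: the field names (`name`, `street`, `postal_code`), the sample records, and the tiny abbreviation table are all hypothetical, and a real project would lean on a proper address standardization tool rather than a few regular expressions.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; illustrative only, far from exhaustive.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}

def household_key(street, postal_code):
    """Build a crude matching key from an address."""
    street = re.sub(r"[^\w\s]", "", street.lower())  # drop punctuation
    words = [ABBREVIATIONS.get(w, w) for w in street.split()]
    return (" ".join(words), postal_code.strip())

def group_households(customers):
    """Group customer names by normalized address."""
    households = defaultdict(list)
    for customer in customers:
        key = household_key(customer["street"], customer["postal_code"])
        households[key].append(customer["name"])
    return dict(households)

# Made-up sample records, not real data.
customers = [
    {"name": "Ann Lee", "street": "12 Main St.", "postal_code": "02139"},
    {"name": "Bob Lee", "street": "12  main street", "postal_code": "02139 "},
    {"name": "Cy Ng", "street": "40 Oak Ave", "postal_code": "02140"},
]

households = group_households(customers)
# Addresses that would receive more than one mailer today.
duplicates = {k: v for k, v in households.items() if len(v) > 1}
```

In this sketch, `duplicates` holds the addresses with more than one customer, which is roughly the number the marketing director in the anecdote would care about.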



This is in contrast to kicking off a data quality project without capturing these requirements.  Data quality is many things to many organizations and requires focus to be most effective.  Failure to log the details on the various anecdotes is a failure in being prepared to start a data quality project.

Another essential aspect of preparing for a data quality project is defining what data is in scope.  This requirement is tightly coupled with the reasons for undertaking the effort.  In other words, if the marketing department is the business unit questioning the usability of the data, then it is logical to focus on the data that deals with customers and customer contact details.

This is in contrast to defining a source system as the in-scope parameter of the data quality project.  I've rarely come across an enterprise source system that consisted of fewer than a thousand tables with countless data elements.  Failure to detail the data elements in question is a failure in being prepared to start a data quality project.

A final essential requirement of any data quality project is a data quality environment, which consists of an application server with the required software and data connections.  I can't say how many times this requirement is overlooked.  Due to typical enterprise requisition processes, this step needs to be initiated before engaging a team to conduct the data quality effort.  I've spent weeks waiting for hardware on which to do the analysis, and even longer waiting for the data.  In some cases, this hardware can reside outside of the organizational structure.  If this is the case, it is advisable to obtain the required permission to have the data exist outside the corporate firewall.

In short, before you decide to start a data quality project, know why, what, and where you are going to do the analysis.  I know this may sound obvious but, believe me, it is not.  I've spent so much time tracking down these three criteria that I have started development on a data quality project checklist that will be included in various project initiation documents such as proposals and statements of work.
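
For readers who like to see such things written down, the why, what, and where can be captured in a very small structure.  The sketch below is purely hypothetical and is not the checklist mentioned above; the class name, field names, and messages are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ReadinessChecklist:
    """Hypothetical readiness check covering why, what, and where."""
    anecdotes: list = field(default_factory=list)      # why: logged tales of woe
    data_elements: list = field(default_factory=list)  # what: in-scope data elements
    environment_ready: bool = False                    # where: server, software, data access

    def gaps(self):
        """Return the criteria that are still unmet."""
        missing = []
        if not self.anecdotes:
            missing.append("why: no anecdotes logged")
        if not self.data_elements:
            missing.append("what: no data elements in scope")
        if not self.environment_ready:
            missing.append("where: no data quality environment")
        return missing

# Example: the catalyst is known, but scope and environment are not.
checklist = ReadinessChecklist(
    anecdotes=["Multiple mailers sent to one address"]
)
open_items = checklist.gaps()
```

An empty list from `gaps()` would mean all three criteria have at least been addressed on paper.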

Let this post be a warning signal to corporate sponsors and data quality practitioners alike: when you decide to start a data quality project, get ready 'cause here I come!

Comments:

  1. Bill,

    A good, thought-provoking post: how to encourage businesses to be proactive before their data quality issues become critical.

    This is a little like the situation in the UK where many burglar alarms are fitted to people's homes shortly after they have been burgled. Very often it was something that they had always intended to do, but never quite got round to.

    In the corporate world I have encountered numerous situations where data quality improvement only starts to be taken seriously after a major failure, a critical external audit, etc. Again, data quality was something that had never quite got up the organisational priority list. This gets back to the age-old dilemma of prioritising important but non-urgent activities (e.g. data quality) alongside urgent and possibly less important issues (e.g. a peak in staff sickness levels, a power failure, etc.).

    Our latest blog post introduces another character to the Data Zoo - the PoD which is an essential requirement for any organisation to start on any large change programme (which can include data quality).

  2. This seems to imply that you should begin to build the data quality program based on anecdotes. In another article there is a suggestion that building a data quality program on anecdotes is limiting. “Ten Mistakes to Avoid When Building a Data Quality Program”.
    http://www.bi-bestpractices.com/view-articles/582...

    I suggest that these anecdotes perhaps represent perceptions about the quality of data and these perceptions must be acknowledged, researched, quantified and addressed in a data quality program. Anecdotes typically have a very short half-life and responding to them may result in nothing more than a reactive data quality program.

  3. Richard,
    I agree that there are times when anecdotes can be misleading. However, they are a start at framing the effort, and the act of engaging the business users who tell these tales builds a relationship. It also puts to rest the impact of the unfounded anecdotes, which is a key piece of this investigation.
    One thing I can attest to is that sending out questionnaires is almost pointless in the investigation process.
    Thanks for stopping by and commenting on the blog post!
    William

