Data quality is about finding rows that are bad for business
Super. So how does one go about doing that?
Prior to this week I probably would have taken a different approach to explaining this. However, I learned something this week that I want to pass along. Essentially ...
Teach someone how to spot a counterfeit by making them an expert on the real thing
- Brad Melton (@BradEMelton)
Following that logic, the best way to find the "bad for business" rows is to know what the "good for business" rows look like. Translated into the dimensions of data quality, these are the most complete, consistent, conformed, accurate and integral rows in the enterprise, and the least duplicated.
A good example that I often use comes from a very successful data quality project in which I participated. [ref] The project was directly related to marketing, and its rules supported a mailing campaign.[/ref] As long as you have a postal code, house number and street name, you can mail something to someone. So a complete address for a mail-based marketing campaign is one with a house number, street name and postal code.
Therefore, a row in an address table with at least these columns populated is a complete address row and is "good for business". Extending this example, validating the accuracy of the address information (@Loqate maybe?) is another way of measuring how good that row is for business, and probably a stronger one.
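To make the completeness rule concrete, here is a minimal sketch in Python. The column names (`house_number`, `street_name`, `postal_code`), the sample rows and both helper functions are assumptions made up for illustration; they are not taken from the project described above, and the check covers completeness only, not accuracy.

```python
from typing import Iterable, Mapping, Optional

# Hypothetical column names; substitute those of your own address table.
REQUIRED_COLUMNS = ("house_number", "street_name", "postal_code")


def is_complete_address(row: Mapping[str, Optional[str]]) -> bool:
    """Return True when every required column holds a non-blank value."""
    for column in REQUIRED_COLUMNS:
        value = row.get(column)
        # Missing keys, None and whitespace-only strings all count as absent.
        if value is None or not str(value).strip():
            return False
    return True


def completeness_rate(rows: Iterable[Mapping[str, Optional[str]]]) -> float:
    """Share of rows that pass the completeness rule (0.0 for no rows)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(is_complete_address(r) for r in rows) / len(rows)


# Illustrative sample data, invented for this sketch.
addresses = [
    {"house_number": "12", "street_name": "Main St", "postal_code": "06103"},
    {"house_number": "", "street_name": "Elm Ave", "postal_code": "06105"},
    {"house_number": "7", "street_name": "Oak Rd", "postal_code": None},
]

good_rows = [a for a in addresses if is_complete_address(a)]
```

Defining the rule as "what a good row looks like" means the bad rows fall out as everything that fails it, and `completeness_rate` gives a single number to track over time.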
Moral of the story, you ask?
Before you set out to fix the broken information in your enterprise, become well versed in every aspect of the "unbroken" information.
I agree, Kathy. I am familiar with Experian's offering s