
Sunday, October 12, 2014

No budget, No problem! - Data Quality on the cheap

While most organizations have data quality issues, not every organization has a budget for software to monitor, report on, and remedy data quality. However, a lack of funding doesn't have to mean a lack of quality.

With some SQL skills and creativity, a data quality program can thrive. It can even have slick dashboards that measure quality trends.
What you really need
In order to build any data quality program, only one thing is absolutely necessary: a business rule that differentiates good quality data from bad.  Preferably, and inevitably, there would be many rules, but the point is still the same.
The main components of a data quality program are business rules

Walking the walk
So now that we've talked the talk, let's walk the walk.  As I mentioned, all you really need is a rule.
Rule: Every bill must be associated with a customer
To understand the rest of this article we need to quickly go over the framework of a data quality scorecard.
  1. Data Quality Dimension: Integrity is the dimension we will use in our example
  2. Metric: That's the name we give to the rule
  3. Rule: Our essential component
  4. Total Records: In our example this would be the total number of bill records
  5. Total Records Violating the rule: In our example this would be the total number of bill records not associated to a customer
  6. % Violating: Violations / Total Records
  7. Target: Number of violating records we are willing to accept
  8. Target Description: Explanation of the target derivation
  9. Count Trend: Is the number of violating records increasing or decreasing?
  10. % Trend: Is the percentage of violating records increasing or decreasing?
Step one was getting the rule from the business.  Step two is writing the SQL code that gives you the numbers to plug into the scorecard framework above.

Total Records SQL

-------------------------------------------------------- Total BILL COUNT
SELECT COUNT(*) AS TOTAL_BILL_COUNT FROM dbo.TBL_Bill_Detail AS C

Total Records Violating Rule SQL
-------------------------------------------------------- TOTAL Bill Orphans
SELECT COUNT(*) AS TOTAL_BILL_Orphans FROM dbo.TBL_Bill_Detail AS C WHERE C.Customer_Key IS NULL
% Violating SQL
-------------------------------------------------------- ORPHANED BILL % VIOLATING
DECLARE @TOTAL_BILLING AS decimal (38,2)
SELECT @TOTAL_BILLING = COUNT(*) FROM dbo.TBL_Bill_Detail AS C
DECLARE @TOTAL_BILLING_VIOLATING AS decimal (38,2)
SELECT @TOTAL_BILLING_VIOLATING = COUNT(*) FROM dbo.TBL_Bill_Detail AS C WHERE C.Customer_Key IS NULL 
SELECT @TOTAL_BILLING_VIOLATING / @TOTAL_BILLING * 100 AS ORPHAN_BILLING_PERCENTAGE_VIOLATING
Target SQL
-------------------------------------------------------- ORPHANED BILL TARGET (HARD CODED AT 10% FOR TESTING PURPOSES)
DECLARE @TARGET_PERCENT AS NUMERIC(2,2)
SELECT @TARGET_PERCENT = 0.10
SELECT COUNT(*) * @TARGET_PERCENT as TOTAL_BILL_TARGET FROM dbo.TBL_Bill_Detail AS C
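If you would rather produce the whole scorecard row in one pass, the four measures above can be folded into a single query. The sketch below is just one way to package it, reusing the same table, rule, and 10% target; adapt it to your own schema.
-------------------------------------------------------- ORPHANED BILL SCORECARD ROW (SINGLE QUERY - ILLUSTRATIVE)
DECLARE @TARGET_PERCENT AS NUMERIC(2,2)
SELECT @TARGET_PERCENT = 0.10
SELECT 'Every bill must be associated with a customer' AS METRIC_RULE,
       COUNT(*) AS TOTAL_BILL_COUNT,
       SUM(CASE WHEN C.Customer_Key IS NULL THEN 1 ELSE 0 END) AS TOTAL_BILL_Orphans,
       SUM(CASE WHEN C.Customer_Key IS NULL THEN 1 ELSE 0 END) * 100.0
           / NULLIF(COUNT(*), 0) AS ORPHAN_BILLING_PERCENTAGE_VIOLATING,  -- NULLIF guards against an empty table
       COUNT(*) * @TARGET_PERCENT AS TOTAL_BILL_TARGET
FROM dbo.TBL_Bill_Detail AS C
The columns map directly onto the Total Records, Violating, % Violating, and Target entries of the scorecard framework.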
Now that we have our SQL coded, let's take a look at the Excel formulas for determining the trends.  One thing to keep in mind is that trends are a comparison of the current total and percent violating against a previous total and percent violating measurement.  In this case I have chosen a week-over-week measurement, or this week's execution against last week's execution.
Here is what it looks like in Excel to the user:
Here are the Excel formulas
=SUM(E15-'DQ SCORECARD 8292014'!E15)
In this example, E15 is the current week count violating the rule and 'DQ SCORECARD 8292014'!E15 is the previous week count violating the rule.
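The % trend uses the same subtraction. Assuming the % violating value sits in column F (an assumption, since the workbook layout isn't shown), the formula would be:
=SUM(F15-'DQ SCORECARD 8292014'!F15)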
I use conditional formatting to create the traffic light icon using a rule like this:

Once I have the SQL and Excel formulas setup, I simply take the query results and plug them into the Excel spreadsheet.  
The end result looks like this ...


Interpreting the results
  1. Only 0.13% of bills are not associated with a customer.  This provides a measure of the severity of the issue relative to the number of bills.
  2. The count trend is heading in the wrong direction, as indicated by the red traffic light icon: 59 bills processed this week are not associated with a customer.
  3. The % trend is headed in the right direction, in that more bills are being processed each week and the overall percentage of bills not associated with a customer is not growing.
  4. In the end, there is an issue.  It is a small one, and it is not growing relative to the growth in bills processed.

With this structure you can quickly add metrics by writing new SQL and then leveraging the Excel formulas to expand your data quality program and scorecards.
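For example, a completeness rule such as "every bill must have an amount" only needs two more counts. The Bill_Amount column below is hypothetical and stands in for whatever field your rule targets:
-------------------------------------------------------- MISSING BILL AMOUNT (HYPOTHETICAL COLUMN)
SELECT COUNT(*) AS TOTAL_BILL_COUNT FROM dbo.TBL_Bill_Detail AS C
SELECT COUNT(*) AS TOTAL_MISSING_AMOUNT FROM dbo.TBL_Bill_Detail AS C WHERE C.Bill_Amount IS NULL
The % violating, target, and trend formulas carry over unchanged.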
That's how you do data quality on the cheap!

Saturday, November 19, 2011

Too much push, not enough pull

As you might have noticed from my long absence in the blogosphere, I have been very busy managing and developing EMC Consulting's Data Quality Practice.  One thing has become very clear to me over the past few months ... data quality is an ambiguous term that means different things to different people.

I have also become aware of the fact that blogging lends itself to publishing a perspective, rather than collecting perspectives.

In an attempt to put some structure around the term data quality by collecting perspectives, I have decided to post some polls and then write about them.

I have chosen three rather basic polls in an attempt to build a baseline.  If you are reading this post, you are obviously interested in data quality (or an insomniac looking for a cure!).  Please participate in the polls and comment on this post if you feel I have missed a baseline measure that will help add clarity to data quality.

When is data quality useful?



When using data quality tools, what do you include?



Data Quality: underutilized or over-hyped?



Of course I have my own opinions, but I want this post, and the series of posts that follow, to be more about what others think than my own thoughts.

In an attempt to gather as many perspectives as possible, please pass this post around your network and feel free to add comments if you feel I have neglected to add something important.

Friday, August 26, 2011

ABC and DQ: Codependent Initiatives?

Activity Based Costing and Data Quality: Codependent Initiatives?


Summary


Activity Based Costing, or ABC, is an exercise where costs are assigned to business activities required to support critical business operations. While it is often used in support of a business process redesign (BPR) effort, it can also serve an important role in data quality (DQ) initiatives.

In order to conduct a data quality initiative, a significant investment is required.  There are costs for hardware, software, support, and implementation resources.  Because DQ efforts are usually associated with "fixing" a previous investment, these costs are not generally accepted as capital investments.  They are usually viewed as a negative consequence of a failure to implement a system comprehensively.

 


One way to mitigate this perception is to demonstrate, in monetary terms, the return likely to be realized.  Demonstrating this return is most effective when it is linked to the cost avoidance associated with improved business operations.  In order to do this, you need the costs associated with those operations.

As a result, DQ becomes dependent on ABC in order to justify the expense.

The proof is in the pudding.  And by pudding, I mean operations


So let's talk about ABC efforts and how they can be used to help justify why an organization needs to invest in data quality.  Rarely does an organization know how long it takes someone to perform everyday business operations.  After all, this time is often viewed as a necessary expense of doing business, and any time invested is simply required, so why analyze it?  Given this approach, it is not surprising that very little is known about how long it takes to perform a business operation when there are exceptions to the norm.

In other words, when there are data-related discrepancies, very little is tracked in terms of impact and cost.  Yet this is a legitimate, everyday reality.  Often, reports that should agree do not, and someone spends countless hours tracking down the root cause of the discrepancy.  This time is money.  It is also lost opportunity, which results in further cost to the organization.  Boiling this type of event down to a cost of resolution can form the basis for justifying investments in prevention.

 


For example, if one report provides the status of product inventory and another provides a summary of sales, the two should move in direct proportion: inventory should decrement at the same rate that sales increment.  The chart below illustrates this relationship.

Inventory vs. Sales Report Relationship with High Quality Data
However, when one of these two systems suffers from poor data quality, you end up with something like the chart below.

Inventory vs. Sales Report Relationship with Low Quality Data

It doesn't take long looking at this before you realize there is a problem, especially for someone who is responsible for generating the report and is used to far more normal-looking analysis.  Logically, this individual will start down the path of determining the root cause.  At this point, the meter starts running.  In fact, not one meter but two are racing toward an unexpected cost.  One of the meters tracks the time, and hence money, spent fixing the issue.  The other tracks the time and money not spent performing duties that would have been performed had the issue not been there in the first place.


If you boil this individual's compensation down to dollars per minute and track the time taken to resolve the issue and the time not spent producing in other areas, you can start calculating the cost of poor quality data.  In all likelihood this individual does not resolve the issue alone, so you can start adding additional dollars per minute for supporting team members.  Not to mention, sales people don't have accurate inventory numbers, and this impacts their ability to maximize their activities!  Before long, you get a clear picture that poor quality data is costing you, on a recurring basis, a lot more than the cost of fixing it.
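To put a rough structure around that math, here is a sketch assuming a hypothetical issue log that records who worked each data problem and for how long. Every table and column name is illustrative, not from any particular system:
-------------------------------------------------------- COST OF POOR DATA QUALITY (ILLUSTRATIVE)
-- Dollars per minute = annual compensation / roughly 120,000 working minutes per year
SELECT I.Issue_ID,
       SUM(I.Minutes_Spent * (R.Annual_Compensation / 120000.0)) AS RESOLUTION_COST,
       -- the second meter: time not spent producing elsewhere costs at least as much again
       SUM(I.Minutes_Spent * (R.Annual_Compensation / 120000.0)) * 2 AS COST_WITH_LOST_OPPORTUNITY
FROM dbo.TBL_DQ_Issue_Log AS I
JOIN dbo.TBL_Resource AS R ON R.Resource_Key = I.Resource_Key
GROUP BY I.Issue_ID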


Although disturbing, this is the key to ending the vicious cycle of waste and becoming a more streamlined organization. 



Conclusion


It is a difficult decision to spend money fixing broken operations and systems; however, it is an easy decision to spend money to end waste and increase productivity.  It is just a matter of perspective.  An experienced data quality professional will help you see this effort in the right light and even help you put real numbers behind it.


If all this sounds familiar, maybe it is time to find that data quality professional and start saving some money!

Friday, May 20, 2011

The Seven Habits of Highly Effective Data Quality

7 Habits of Highly Effective Data Quality


I've been reading Stephen Covey's The 7 Habits of Highly Effective People and I couldn't help but notice the parallels between effective people and effective data management.  In the book Covey discloses that there are principles, centered on self-discipline, that lead to success and fulfillment.  Sounds great, right?

The seven habits include some ear-cringing buzz words, but let's take a look at them and their data quality doppelgänger.

Be Proactive


For years data quality has been a discipline striving to transform itself from reactive to proactive.  In fact, the ROI of data quality programs centers on being proactive: avoiding regulatory issues and costs and improving decision making.  It's an understatement to say that data quality programs need to focus on taking the initiative and becoming proactive programs of change.

Proactive data quality means identifying and remediating data quality issues before they become proliferated throughout the enterprise.  Simply put, proactive data quality is about having identification and remediation processes at data entry points and addressing issues at the source.
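As a small illustration of pushing the check to the source, an orphaned-record rule can be enforced at insert time instead of counted after the fact. The table, column type, and constraint names below are assumptions, not a prescription:
-------------------------------------------------------- ENFORCE THE RULE AT THE ENTRY POINT (ILLUSTRATIVE)
-- Reject bills with a missing or invalid customer reference when they are written,
-- rather than reporting on orphans downstream.
-- (Existing NULLs must be remediated before the NOT NULL change will succeed.)
ALTER TABLE dbo.TBL_Bill_Detail ALTER COLUMN Customer_Key INT NOT NULL
ALTER TABLE dbo.TBL_Bill_Detail ADD CONSTRAINT FK_Bill_Customer
    FOREIGN KEY (Customer_Key) REFERENCES dbo.TBL_Customer (Customer_Key)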

Begin with the end in mind


Beginning with the end in mind brings a smile to my face.  This was practically the title of one of my first posts for this blog.  Without knowing where you need to end, your route to that end will almost inevitably be scattered and twisted.  For it is only by setting a clear destination that a clear path can be developed.  Often, in the world of data quality, setting a destination focuses on developing metrics and targets that will bring about positive change in the organization.

Put first things first


Putting first things first is about setting priorities and building a course of action that will address the prioritized list of objectives.  In other words, don't focus on everything all at once, but rather break down large tasks into smaller, more achievable parts.  This is especially useful when developing and implementing data quality programs, because there are so many moving parts that need to be put in place simultaneously.

Think Win-Win


Win-wins in the data management / data quality arena are all about implementing rules that help multiple business units improve their data and its use.  There are some easy domains where one data quality service equates to a win-win.

Address validation is a prime example of the win-win scenario.  Every business unit benefits from more accurate customer addresses.  Address validation can be orchestrated so that a single process accepts addresses from different sources and applies the same validation routines to all of them.  Not only is this a win-win, it is also cost-effective and generates a high rate of return on investment.
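One common shape for this, sketched below with hypothetical staging tables and deliberately simple checks, is to funnel every source through a single routine rather than validating each feed separately:
-------------------------------------------------------- ONE VALIDATION ROUTINE, MANY SOURCES (ILLUSTRATIVE)
-- Stage every source into a common shape, then apply the same checks once.
;WITH All_Addresses AS (
    SELECT 'CRM' AS Source_System, Address_Line1, City, State_Code, Postal_Code
    FROM dbo.STG_CRM_Address
    UNION ALL
    SELECT 'BILLING' AS Source_System, Address_Line1, City, State_Code, Postal_Code
    FROM dbo.STG_Billing_Address
)
SELECT Source_System, Address_Line1, City, State_Code, Postal_Code
FROM All_Addresses
WHERE Postal_Code IS NULL
   OR LEN(Postal_Code) NOT IN (5, 10)                           -- crude US ZIP / ZIP+4 length check
   OR State_Code NOT IN (SELECT State_Code FROM dbo.REF_State)  -- valid state reference list
In practice the crude checks would be replaced by the validation tool's own call; the point is the single funnel.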

Seek First to Understand, Then to Be Understood


This one is pretty straightforward.  Data quality / data management is all about solving problems and building effective change.  You can't be effective at solving a problem without first knowing what it is.  A more subtle point I'd like to make here is that all too often there is a tendency in the technology field to explain the intricacies of the solution.  Frankly, business people don't care how you solve the issue, just that you solve it accurately.  Only by understanding the issue can you ensure that you do.

Synergize


Cringe!  Worst buzzword ever?  Maybe.  In essence, synergy means bringing together a whole that is greater than the sum of its parts.  As described in the win-win section, synergies in data management / data quality are largely derived from building a solution that works for multiple business units in such a way that it produces a benefit greater than if the solution had been built for only one unit.

That said, building a solution that “chains” several beneficial processes together like address validation and duplicate reduction can also be thought of as a way of bringing together a whole greater than the sum of its parts.

Sharpen the Saw


My personal favorite!  Sharpening the saw has to do with the continuous process of developing skills.  In part due to the wide range of data quality modules, there is always a need to sharpen the saw.  For example, I am currently working on expanding my ability to produce more accurate matching techniques so I can be sure that I identify true duplicates while producing the fewest possible false positives.  In addition, I am always searching for more knowledge on address validation techniques.
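As a toy sketch of what sharper matching can look like (hypothetical customer table, using only the SOUNDEX-based DIFFERENCE function that ships with SQL Server), blocking on a hard key while fuzzily comparing names helps keep false positives down:
-------------------------------------------------------- CANDIDATE DUPLICATE PAIRS (ILLUSTRATIVE)
-- Block on postal code so only plausible pairs are compared,
-- then require names that sound alike (DIFFERENCE returns 0-4; 4 is the strongest match).
SELECT A.Customer_Key AS Customer_Key_A,
       B.Customer_Key AS Customer_Key_B,
       A.Last_Name AS Last_Name_A,
       B.Last_Name AS Last_Name_B
FROM dbo.TBL_Customer AS A
JOIN dbo.TBL_Customer AS B
  ON A.Postal_Code = B.Postal_Code
 AND A.Customer_Key < B.Customer_Key   -- avoid self-pairs and mirrored pairs
WHERE DIFFERENCE(A.Last_Name, B.Last_Name) = 4
  AND DIFFERENCE(A.First_Name, B.First_Name) >= 3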

Sharpening the saw with regard to data quality processes is a way to revisit the existing solution and make it better.  This is an essential practice due to the growing number and varied nature of data sources continuously added to the enterprise landscape.

Conclusion


Effective people and effective projects and strategies can learn a lot from Covey’s 7 habits research.  I encourage those of you reading this post to try and implement these habits not only in yourself but also in your projects!

What data quality is (and what it is not)

Like the radar system pictured above, data quality is a sentinel; a detection system put in place to warn of threats to valuable assets. ...