Sunday, July 3, 2011
Data Migration Best Practice: Orphan Analysis
What's an Orphan?
An orphan transaction is a transaction in a "child" table without an associated transaction in a "parent" table. For instance, an address record in the Address table without a link to a customer record in the Customer table. Orphaned records lead to various issues in business operations like marketing and business analytics.
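To make the example concrete, here is a minimal sketch of the two tables in generic SQL (the column names beyond CustID are invented for illustration). Nothing constrains Address.CustID to match an existing customer, which is exactly how orphans arise:

create table Customer (
    CustID int primary key,
    Name   varchar(100)
);

create table Address (
    AddressID    int primary key,
    CustID       int,          -- no foreign key constraint, so orphans are possible
    Street       varchar(200),
    CreateDate   date,
    ModifiedDate date
);

-- An orphan: this address references CustID 999, which has no row in Customer.
insert into Customer (CustID, Name) values (1, 'Jane Doe');
insert into Address (AddressID, CustID, Street, CreateDate, ModifiedDate)
values (10, 999, '123 Main St', '2011-01-15', '2011-02-01');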
Challenges orphans pose to the business
An address without an associated customer record will pose a challenge to marketing initiatives when trying to engage the customer: how do you address the marketing mailer, determine previous purchasing history, or perform gender-based segmentation?
To confront this issue, it is crucial to understand whether you have applications that are creating orphans. This analysis can be conducted manually with SQL or in an automated fashion using data quality tools.
Orphan Analysis Methods
To investigate orphans manually, a series of SQL queries needs to be created. Since these queries need to be executed periodically, it is best to set them up as stored procedures. For the purposes of this post, let's concentrate on the substance of the queries.
Orphans are all about linkages, so the basic task is to find which unique identifiers are present in the child table but not in the parent table. Using the customer/address example, this translates into finding which customer IDs are in the Address table but not in the Customer table.
Consider a straightforward relationship where addresses are linked to customers by the CustID field. An orphaned address is then a record whose CustID exists in Address but not in the Customer table. In SQL, you can determine how many of these transactions exist with a query like this:
Select Address.AddressID, Address.CustID, Address.CreateDate, Address.ModifiedDate
from Address
where Address.CustID not in (select CustID from Customer)
Technically you don't need fields other than the unique identifiers, but it helps to have the create and modified dates to build an audit trail of why the orphan exists. In fact, you may want to pull all the attributes of an address as part of the audit exercise.
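As a sketch of what the periodic check might look like (assuming SQL Server-style T-SQL; the procedure name is invented), you can pull every Address column via a left join and wrap the query as a stored procedure. The left join form also sidesteps the surprises NOT IN can produce when Customer.CustID contains NULLs:

create procedure usp_FindOrphanAddresses
as
begin
    -- All address attributes for every address with no matching customer.
    select Address.*
    from Address
    left join Customer
        on Address.CustID = Customer.CustID
    where Customer.CustID is null;
end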
Doing this type of analysis with a data quality tool is a bit more graphical and includes some nice reports. I use Informatica v9, and I'll explain the steps required. Let's assume you already have the Customer and Address tables defined. Simply follow the steps below to generate the analysis, report, and result set (a subtle value-add of the tool is that you get the report and results with less coding and full reusability):
- Right click on the Profiles folder
- Select New Profile
- Give your profile a name
- Add data sources (Customer and Address)
- Select Join Analysis
- Click Next
- Select the fields you want to include from each source (be sure to include the unique identifiers for each source so you can audit the orphans)
- Define the relationship required to join the two tables (Address.CustID and Customer.CustID in the case of our example)
- Click Finish to execute the analysis
Informatica then produces a report summarizing the join analysis results.
From this analysis you can draw the following conclusions:
- If you are attempting a customer consolidation effort, name-and-address matching alone will not satisfy your needs 100%, so you'll need another strategy, like name and phone number, to augment the consolidation (you may even be able to use the name/phone analysis to remedy the broken address linkages)
- If you are performing a data migration effort, this analysis will explain testing anomalies like record count discrepancies, and possibly why some migration jobs fail to load
- If you are looking at data management strategy, you may look into the current status of constraints on the Address table (see the sketch after this list)
- If you are looking into a Data Governance program, you may look at this and determine a need for a governance process to deal with Customer and Address management
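On the constraint point in the third bullet above, here is a minimal sketch in standard SQL. Note that the constraint can only be applied after existing orphans have been remediated:

-- Once in place, the database rejects any new address whose CustID
-- has no matching row in Customer, preventing future orphans.
alter table Address
    add constraint FK_Address_Customer
    foreign key (CustID) references Customer (CustID);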
Summary
Whether you use a tool or manual SQL, looking into orphaned transactions is a best practice for many different types of data management initiatives. From customer consolidation to data governance, it is important to know the state of transactions in your operational systems. In my opinion, a data quality tool like Informatica v9 offers a rich set of reports that helps translate the results for business users in a digestible manner.
I'll examine some remediation methods in the next post covering orphan analysis.
Friday, May 20, 2011
The Seven Habits of Highly Effective Data Quality
I've been reading Stephen Covey's The 7 Habits of Highly Effective People, and I couldn't help but notice the parallels between effective people and effective data management. In the book, Covey lays out principles, centered on self-discipline, that lead to success and fulfillment. Sounds great, right?
The seven habits include some cringe-inducing buzzwords, but let's take a look at them and their data quality doppelgängers.
Be Proactive
For years, data quality has been a discipline striving to transform itself from reactive to proactive. In fact, the ROI of data quality programs centers on being more proactive: avoiding regulatory issues and costs, and improving decision making. It's an understatement to say that data quality programs need to focus on taking the initiative and becoming proactive programs of change.
Proactive data quality means identifying and remediating data quality issues before they proliferate throughout the enterprise. Simply put, proactive data quality is about having identification and remediation processes at data entry points and addressing issues at the source.
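One modest, concrete form this can take is pushing a rule into the entry point itself. A minimal sketch in standard SQL, with an invented rule for illustration:

-- Reject blank customer names at insert time instead of cleansing them downstream.
alter table Customer
    add constraint CK_Customer_Name check (Name <> '');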
Begin with the end in mind
Beginning with the end in mind brings a smile to my face; it was practically the title of one of my first posts for this blog. Without knowing where you need to end up, your route will almost inevitably be scattered and twisted, for it is only by setting a clear destination that a clear path can be developed. Often, in the world of data quality, setting a destination means developing metrics and targets that will bring about positive change in the organization.
Put first things first
Putting first things first is about setting priorities and building a course of action that addresses the prioritized list of objectives. In other words, don't focus on everything all at once; rather, break down large tasks into smaller, more achievable parts. This is especially useful when developing and implementing data quality programs, because there are so many moving parts that would otherwise need to be put in place simultaneously.
Think Win-Win
Win-wins in the data management / data quality arena are all about implementing rules that help multiple business units improve their data and its use. There are some easy domains where one data quality service equates to a win-win.
Address validation is a prime example of the win-win scenario. Every business unit benefits from more accurate customer addresses, and the validation process can be orchestrated so that it accepts different address sources and applies the same validation routines to each. Not only is this a win-win, it is also cost-effective and generates a high return on investment.
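As an illustrative sketch of "one routine, many sources" (the table and column names are invented, and real address validation would typically call a specialized service), the same rules can be applied uniformly across sources:

-- One shared rule set applied to every source shaped as (AddressID, Street, Zip).
-- The LIKE character-class syntax assumes SQL Server.
create view InvalidAddresses as
select 'CRM' as Source, AddressID, Street, Zip
from CrmAddress
where Street is null or Zip not like '[0-9][0-9][0-9][0-9][0-9]'
union all
select 'Billing' as Source, AddressID, Street, Zip
from BillingAddress
where Street is null or Zip not like '[0-9][0-9][0-9][0-9][0-9]';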
Seek First to Understand, Then to Be Understood
This one is pretty straightforward. Data quality and data management are all about solving problems and building effective change, and you can't be effective at solving a problem without first knowing what it is. A more subtle point: all too often there is a tendency in the technology field to explain the intricacies of the solution. Frankly, business people don't care how you solve the issue, just that you solve it accurately. Only by understanding the issue first can you ensure that you do.
Synergize
Cringe! Worst buzzword ever? Maybe. In essence, synergy means bringing together a whole that is greater than the sum of its parts. As described in the win-win section, synergies in data management and data quality are largely derived from building a solution that works for multiple business units in such a way that it produces a benefit greater than if the solution had been built for only one unit.
That said, building a solution that “chains” several beneficial processes together like address validation and duplicate reduction can also be thought of as a way of bringing together a whole greater than the sum of its parts.
Sharpen the Saw
My personal favorite! Sharpening the saw is the continuous process of developing skills, and in part due to the wide range of data quality modules, there is always a need for it. For example, I am currently working on expanding my ability to produce more accurate matching techniques so I can be sure that I identify true duplicates and produce the minimum number of false positives. In addition, I am always searching for more knowledge on address validation techniques.
Sharpening the saw with regard to data quality processes is a way to revisit the existing solution and make it better. This is an essential practice due to the growing number and varied nature of data sources continuously added to the enterprise landscape.
Conclusion
Effective projects and strategies, like effective people, can learn a lot from Covey's 7 Habits. I encourage those of you reading this post to try to implement these habits not only in yourself but also in your projects!