I'm interested to hear the thoughts of my fellow data quality practitioners about the role of data quality, more specifically data profiling, in the data migration process.
Vote, leave a comment, whatever ... I'm looking for some consensus around the approach.
[polldaddy poll="5044260"]
In the past few years, I've been involved exclusively with DW implementations, and profiling in some manner has been part of every data migration I've worked on. I think most folks these days understand that data quality is important and that data profiling is an excellent method for helping validate DQ.
But with all the focus (hype and marketing) around data profiling lately, I see almost no discipline around the actual practice of data profiling beyond a few line items in a project plan. Shouldn't it be possible on any project to say: these are the 20 or so standard 'potential' anomalies you always look for, here is how they affect a data migration, and here is how you discover them (this will obviously differ based on the toolset)? Additionally, here is the standard way to report the results... A real methodology, maybe? :)
Tom,
Agreed, profiling for profiling's sake is a waste of time and resources. I would think that the methodology would differ from project to project, depending on the industry. For instance, pharma data would have different metrics than finance data. However, there are standard metadata metrics that can be included in every project. Data types and lengths are prime examples.
A methodology that can be applied to each project is to measure quality on the 6/7 dimensions: Conformity, Consistency, Completeness, Integrity, Accuracy, Duplication, and Timeliness.
In fact, I'm headed into a major data quality implementation this coming week, and this is the template for my workstream. I've designated around 20 line items for each dimension. While I can see some needing fewer, some will need more.
Perhaps this could be good material for my next post!?!?
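As a rough illustration of the standard metadata metrics mentioned above (data types and lengths) plus two of the listed dimensions (Completeness and Duplication), here is a minimal sketch in plain Python. The function names `profile_column` and `profile_table`, the sample rows, and the rule that blank strings count as missing are all assumptions made for illustration; a real project would adapt them to its own toolset and source systems.

```python
from collections import Counter


def profile_column(values):
    """Return basic profiling metrics for one column of raw values."""
    total = len(values)
    # Assumption: None and blank strings both count as missing.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    # Number of rows that repeat a value already seen in this column.
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    return {
        "rows": total,
        "completeness": len(present) / total if total else 0.0,
        "duplicate_values": duplicates,
        "inferred_types": sorted({type(v).__name__ for v in present}),
        "max_length": max((len(str(v)) for v in present), default=0),
    }


def profile_table(rows):
    """Profile every column found in a list of dict-shaped records."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample data, not taken from any real migration.
sample_rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},
    {"customer_id": 2, "email": "b@example.com"},
    {"customer_id": 4, "email": None},
]

report = profile_table(sample_rows)
for column, metrics in report.items():
    print(column, metrics)
```

The same per-column dictionary could be extended with checks for the other dimensions (Conformity, Consistency, and so on), which would give each project a repeatable list of line items to report against.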