Data models: Data Management 101
In my years as a consultant implementing data management solutions, my first question to a client would be …
Can I see the data model?
I have long felt that gaining a better understanding of an organization's data landscape involves two primary artifacts: a data model and a data profile. In many cases, these two artifacts represent the two most fundamental states of an organizational data landscape:
what the data should look like (the model) and what it actually looks like (the profile)
With knowledge of these two states, I felt armed to quickly and easily identify areas of conformity and areas of anomaly, the two perspectives at the root of most of the IT questions I was brought in to solve. In this way, I looked at a data model as the starter kit for a data management strategy, or a 101 crash course in an organization's data management state.
If there were a lot of anomalies, I knew the organization would need a substantial data quality strategy and remediation effort, as well as a robust data governance initiative. If there was a lot of conformity, I knew the organization was mature enough to handle new data management initiatives like Master Data Management or a Big Data implementation.
The sad reality is that most organizations either did not have data models for critical applications or felt that the data model was so out of date that it would not be much help in my quest for understanding.
Lack of Data Models: Data Management 100
Without a viable data model, I was unable to reach these valuable conclusions quickly and was forced to, in a sense, reverse-engineer the model from profiling results, which was time-consuming and rested on some brash assumptions.
Here are some activities that helped me mitigate the risks of this educated guessing:
- Perform orphan analysis
- Analyzing orphans can help you determine the validity of a data model, how users are adding or deleting data, and whether referential integrity constraints are even in place in production (their absence is more common than most will admit during interviews). See the first sketch after this list.
- Analyze documented versus actual data types
- Again, this addresses the validity of the design and how users are entering data: very often data is entered in formatted form, entry fields are used for purposes other than their original intention, and developers build architectures without really understanding the scenarios the application is required to support. See the second sketch below.
- Analyze the most and least commonly occurring values
- This can help you profile how often standards are conformed to, surface word-of-mouth work-arounds, and identify areas that conform and do not need attention (as valuable as identifying the areas that do). See the third sketch below.
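To make the orphan check concrete, here is a minimal sketch using Python and pandas (my choice of tooling for illustration, not a requirement); the customers/orders tables and the customer_id key are hypothetical stand-ins for whatever parent/child pair you are profiling.

```python
import pandas as pd

# Hypothetical parent/child tables pulled from a source system.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 2, 99, None],  # 99 and None have no parent row
})

# Orphans: child rows whose foreign key has no matching parent key.
orphans = orders[~orders["customer_id"].isin(customers["customer_id"])]

print(f"{len(orphans)} of {len(orders)} order rows are orphaned:")
print(orphans)
```

A high orphan count on a key the documentation swears is constrained is usually the first sign that referential integrity lives only on paper.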
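The type check follows the same pattern: test how many values in a column actually parse as the documented type. The column contents and the documented DATE format here are invented for illustration.

```python
import pandas as pd

# A column documented as DATE but stored as free text (a common finding).
raw = pd.Series(["2012-04-01", "04/02/2012", "N/A", "see notes", ""])

# How many values actually parse as the documented type/format?
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
conforming = parsed.notna().sum()

print(f"{conforming} of {len(raw)} values conform to the documented DATE format")
# Everything that fails to parse is evidence the field is being
# entered pre-formatted or repurposed for something else entirely.
```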
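And the frequency check is nearly a one-liner; the state column is a hypothetical field with an expected standard (two-letter codes), which makes the work-arounds easy to spot at the tail of the distribution.

```python
import pandas as pd

state = pd.Series(["NY", "NY", "CA", "California", "ca", "NY", "TX", ""])

freq = state.value_counts(dropna=False)
print("Most common values:\n", freq.head(3))   # the de facto standard
print("Least common values:\n", freq.tail(3))  # work-arounds and anomalies
```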
Data Modeling Profiles: Data Management 102?
Having been through this many times and knowing how much time and effort it requires (often not accounted for in project plans), I feel strongly that someone should build a tool that can turn data profiles into a data model. Most of the functionality is there already; someone just needs to make a case for it (just call me somebody).
Such a solution could take profile results, which include actual data types and inferred relationships, and create a data model that supports data management best practices like data governance and data quality.
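To show what I mean, here is a rough sketch under my own assumptions, not a description of any existing product or API: take profile results such as actual column types and inferred key relationships, and emit skeleton DDL from them. The subset test for inferring foreign keys is deliberately naive.

```python
import pandas as pd

def infer_sql_type(series: pd.Series) -> str:
    # Map the *actual* pandas dtype to a SQL type (a crude mapping for illustration).
    if pd.api.types.is_integer_dtype(series):
        return "INTEGER"
    if pd.api.types.is_float_dtype(series):
        return "FLOAT"
    if pd.api.types.is_datetime64_any_dtype(series):
        return "TIMESTAMP"
    return f"VARCHAR({int(series.astype(str).str.len().max())})"

def profile_to_ddl(tables: dict) -> str:
    ddl = []
    for name, df in tables.items():
        cols = [f"  {col} {infer_sql_type(df[col])}" for col in df.columns]
        # Inferred relationship: a column whose values are a subset of a
        # unique-valued column in another table is a candidate foreign key.
        for other_name, other in tables.items():
            if other_name == name:
                continue
            for col in df.columns:
                for ocol in other.columns:
                    if other[ocol].is_unique and df[col].dropna().isin(other[ocol]).all():
                        cols.append(f"  FOREIGN KEY ({col}) REFERENCES {other_name}({ocol})")
        ddl.append(f"CREATE TABLE {name} (\n" + ",\n".join(cols) + "\n);")
    return "\n\n".join(ddl)

# Hypothetical profiled tables standing in for real source extracts.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})
orders = pd.DataFrame({"order_id": [10, 11], "customer_id": [1, 3]})
print(profile_to_ddl({"customers": customers, "orders": orders}))
```

A real implementation would want match-rate thresholds and cardinality checks before declaring a relationship, but the point stands: inferred types and relationships fall naturally out of profile results.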
In addition, a profiling-to-model function could go a long way toward reducing the time and error involved in building an MDM hub. After all, profiling all the contributing sources is one of the best practices in defining an MDM hub, so why not take the next step and bake that in?
I completely agree that there will be cases where design decisions were driven by performance considerations, and that a profile is not always the most accurate source for a model's design. But there are many cases where a profile-to-model function would increase accuracy and performance while decreasing the error and time required to model data landscapes.
What do you think?