It is an interesting time to be in data management. There are more sources of data, in more varied formats, than ever before. New tools are evolving at light speed. There is the promise of opportunity and, with it, enormous challenges.
With regard to the opportunities, one of the most interesting things I see developing is increased access to customers. From traditional to mobile platforms, there are more avenues to interact with customers, giving product and service providers new ways to measure their effectiveness. I have started researching topics like sentiment analysis, which is an example of how access to customers and the data explosion provide insight into product and service perception.
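To make that concrete, here is a minimal sketch of lexicon-based sentiment scoring; the word lists and sample comments are hypothetical stand-ins, and real analysis would use far richer models than a word count.

```python
# Minimal lexicon-based sentiment scoring sketch; the word lists and
# sample feedback below are hypothetical, not drawn from real data.
POSITIVE = {"great", "love", "fast", "helpful", "reliable"}
NEGATIVE = {"slow", "broken", "awful", "confusing", "unreliable"}

def sentiment_score(text: str) -> int:
    """Positive-minus-negative word count for one comment."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

feedback = [
    "Love the new mobile app, fast and reliable",
    "Checkout is slow and the help pages are confusing",
]
for comment in feedback:
    print(sentiment_score(comment), comment)
```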
With regard to the challenges, performing analysis on this data requires tools, methodologies, and resources that are unconventional and, for most organizations, unfamiliar. It will take time to align the resources needed to perform meaningful analysis, and that is not even taking into account the budget that needs to be set aside for the activity.
While the technology industry is thrilled with its new story, filled with magical elephants and all the promise of a new reality, the boots on the ground in data management must feel like a deer caught in the headlights of an oncoming 18-wheeler doing 90 mph. To some, the data explosion must feel like fireworks against a warm summer sky; to others, it must feel like the pounding of cannon fire against the office wall.
What I think is very important to realize is that this data explosion is really both at the same time. We need to be realistic and remember that while there is a lot of data out there, and with it the promise of new insights, it presents significant challenges to organizations in just how they are going to roll it into the mix of things they already need to do.
I intend to keep my eye on which organizations come out winners and, maybe even more so, which come out losers in this new data frontier. One of the things I intend to pay particular attention to is ROI: what it costs to do this well and what it produces.
Until I see what that looks like, I am going to hold off getting giddy about big data / NoSQL … what about you? Are you “all in” or waiting to see how this goes?
Thursday, March 24, 2011
Data Quality Polls: Troubled domains and what to fix
[caption id="attachment_1047" align="aligncenter" width="630" caption="With which data domain do you have the most quality issues?"]
[/caption]
As expected, customer data quality remains at the top of the list with regard to having the most issues. Ironically, this domain has been at the forefront of the data quality industry since its inception.
One reason for the proliferation of concerns about customer data quality could be its direct link to revenue generation.
Whatever the reason, this poll seems to indicate that services built around the improvement of customer data quality will be well founded.
[caption id="attachment_1049" align="aligncenter" width="630" caption="What would you improve about your data?"]
[/caption]
Once again there are no surprises when looking at what data improvements are desired. Data owners seem to be interested in a centralized, synchronized, single view of their data, most notably customer.
The good news that can be gathered from these polls is that, as an industry, data quality is focused on the right data and the right functionality. Most data quality solutions are built around the various aspects of customer data quality and ways to improve it so there is a master-managed, single version of the customer. The bad news is that we've had that focus for quite some time and data owners are still concerned.
In my opinion, this is due to the nature of customer data. Customer data is at the core of every business. It is constantly changing both in definition and scope, it is continuously used in new and complex ways, and it is the most valuable asset that an organization manages.
One thing not openly reflected in these polls is that the same issues and concerns present in the customer domain are likely also present in the employee and contact domains. However, they tend not to "bubble up" to the top of the list because they lack a direct link to revenue and profit.
I'd encourage comments and feedback on this post. If we all weigh in on topics like this, we can all learn something valuable. Please let me know your thoughts on the poll results, my interpretation of the results and opinions.
Saturday, October 17, 2009
Removing duplicates in Microsoft Dynamics CRM
In last month's edition of the DQC I reviewed some data quality features built into Microsoft's CRM package, namely duplicate detection on create or update, duplicate detection rules, and duplicate detection jobs. I left off with a promise to dive deeper into how you remove the duplicates once you've detected them.
Before I get into the details, I want to emphasize that without customization, removing duplicates is not a batch process. In other words, you remove duplicates one at a time. Don't kill the messenger; learn from the message. If there is one area within the data quality space where Microsoft needs to improve, it's this one.
Duplicate consolidation, in my experience, is rarely so exceptional that it can be done in such a tedious manner. Not to mention that the organizations most afflicted with duplicates generally have a large customer base. When you have a customer base in the millions, duplication ratios can run 10% or more. Consolidating 100,000 duplicates one at a time is almost pointless: by the time you catch up, you've created more duplicates.
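To put numbers behind that claim, here is a back-of-envelope sketch; the per-merge time and the daily duplicate creation rate are assumptions for illustration, not measured figures.

```python
# Back-of-envelope workload math; the per-merge time and the daily
# duplicate creation rate are assumed values, not measurements.
customers = 1_000_000
duplicate_ratio = 0.10                         # 10% duplication
duplicates = int(customers * duplicate_ratio)  # 100,000 records

seconds_per_manual_merge = 60                  # assume one minute per merge
person_days = duplicates * seconds_per_manual_merge / 3600 / 8

cleared_per_day = 8 * 3600 // seconds_per_manual_merge  # 480 merges/day
new_duplicates_per_day = 500                   # assumed creation rate

print(f"{duplicates:,} duplicates ~ {person_days:,.0f} person-days of manual work")
print(f"net daily progress: {cleared_per_day - new_duplicates_per_day} records")
# At these assumed rates the backlog grows faster than one person can clear it.
```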

That said, let's move on: you've detected duplicates and now you want to eliminate them from your data.
If you remember from last month's post (read up here if you don't), a duplicate detection job returns potential duplicates and allows you to browse each one along with its potential match. Consult the screenshot below for a view of what that looks like.
[caption id="attachment_306" align="aligncenter" width="500" caption="Duplicate Detection Job results"]
[/caption]
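As a rough illustration of what "potential duplicates" means, here is a toy sketch that pairs records by name similarity using Python's standard library. It is purely illustrative; it is not how CRM's duplicate detection rules work internally, and the records and threshold are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Toy illustration of flagging potential duplicate pairs by name
# similarity -- not how Dynamics CRM's detection rules work internally.
contacts = [
    {"id": 1, "fullname": "Robert Smith"},
    {"id": 2, "fullname": "Rob Smith"},
    {"id": 3, "fullname": "Jane Doe"},
]

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

THRESHOLD = 0.75  # hypothetical cutoff
for left, right in combinations(contacts, 2):
    score = similarity(left["fullname"], right["fullname"])
    if score >= THRESHOLD:
        print(f"potential duplicates: {left['id']} vs {right['id']} ({score:.2f})")
```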
In the lower pane of the screenshot above, the third toolbar option from the left is an icon that merges the two highlighted records. This is where the consolidation effort begins.
One of the best features of the merge functionality is its flexibility to build a composite, or best-of-available-information, master record. Briefly, the master record is the record that is retained as the active record. The merge also allows the end user to select one record over the other wholesale. Examples of both approaches are outlined in the screenshots below. First, let's look at the option where every element of the master record is taken from one record.
[caption id="attachment_312" align="alignleft" width="500" caption="An example of an all inclusive master record selection"]
[/caption]
Here's a look at the composite option:
[caption id="attachment_315" align="aligncenter" width="500" caption="An example of a composite master record selection "]
[/caption]
Notice that in the all-inclusive example the entire left-hand column is highlighted in blue, whereas in the composite option only those elements selected via the radio buttons are highlighted in blue. This is a visual indication of which data elements will be retained in the master record.
This is one of my favorite pieces of the merge functionality. End users often vary in the data they provide, and it is always better for an organization to retain as much information about its customers as possible.
I specifically chose the composite screenshot presented because it illustrates an important aspect of customer data quality. Notice that the element selected from the right-hand side was a middle initial. This data element is invaluable when performing data matching, and having it can make an important distinction between two different customers down the road.
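Here is a minimal sketch of the idea behind composite selection; the field names and records are hypothetical, and this illustrates the concept rather than the CRM SDK.

```python
# Sketch of composite master record selection: each field comes from
# whichever record was chosen, mirroring the merge screen's radio
# buttons. Field names and values are hypothetical, not CRM's schema.
master    = {"first": "Robert", "middle": "",  "last": "Smith", "phone": "555-0100"}
duplicate = {"first": "Rob",    "middle": "J", "last": "Smith", "phone": ""}

# One choice per field, as set via the radio buttons.
selections = {"first": "master", "middle": "duplicate",
              "last": "master",  "phone": "master"}

composite = {
    field: (master if choice == "master" else duplicate)[field]
    for field, choice in selections.items()
}
print(composite)  # the middle initial "J" survives from the duplicate
```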
Once you've defined what your master record looks like, either via the all-inclusive or the composite method, it is time to commit that selection. The screenshot below illustrates how this is performed.
[caption id="attachment_316" align="aligncenter" width="500" caption="How to commit your master record selection"]
[/caption]
An important option in the commit process is the checkbox visible in the screenshot above. Not every field in a record is exposed via the merge utility. The checkbox allows you to, as its label indicates, select every field with data from the chosen master record even if there is a different value in the other record. Simply put, it is an overwrite function that retains all the data from the selected master record, including fields not visible in the merge screen.
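A small sketch of what that checkbox effectively does, assuming the dicts below stand in for real CRM records; this is my reading of the behavior, not Microsoft's implementation.

```python
# Sketch of the overwrite checkbox's effect; the dict-based records
# are hypothetical stand-ins, not CRM's actual data structures.
def commit_merge(screen_selections: dict, master: dict,
                 overwrite_with_master: bool) -> dict:
    merged = dict(screen_selections)  # fields chosen via the radio buttons
    if overwrite_with_master:
        # Checkbox on: every populated master field wins, including
        # fields that were never shown in the merge screen.
        for field, value in master.items():
            if value not in ("", None):
                merged[field] = value
    return merged
```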
Once you've reviewed and are confident in your selection, simply click the OK button. Provided there are no commit locks on the records (a lock indicates that another user has one of the two records open and is actively working on it), you will receive the following dialog box confirming your consolidation success.
[caption id="attachment_317" align="aligncenter" width="441" caption="Duplicate elimination success!"]
[/caption]
It is critical to note that the subordinate, or non-master, record is NOT deleted from the system. It is simply deactivated; that is, a flag (statecode) is set to inactive. One important note about the statecode field is that, unlike conventional notation, a value of '1' is not active in Microsoft Dynamics CRM. Instead, Microsoft chose '0' as active and '1' as inactive. Consequently, all non-master records in CRM have a statecode value of '1'. This little fact can save hours of data analysis and preserve the sanity of your DBAs, so it is worth noting.
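For anyone analyzing exported CRM data, a filter like the sketch below keeps that convention straight; the CSV export name and layout are hypothetical.

```python
# Filtering exported records by Dynamics CRM's statecode convention:
# 0 = active, 1 = inactive -- the reverse of the usual boolean reading.
# "accounts_export.csv" is a hypothetical export file name.
import csv

with open("accounts_export.csv", newline="") as f:
    rows = list(csv.DictReader(f))

active   = [r for r in rows if r["statecode"] == "0"]  # surviving masters
inactive = [r for r in rows if r["statecode"] == "1"]  # merged-away subordinates
print(f"{len(active)} active, {len(inactive)} deactivated records")
```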
I hope this information was beneficial to you Microsoft Dynamics CRM users and administrators. As usual, I welcome all comments, questions, and suggestions, so please feel free to comment on this post and I'll try to reply in a timely manner.