Saturday, October 17, 2009

Removing duplicates in Microsoft Dynamics CRM

In last month's edition of the DQC I reviewed some data quality features built into Microsoft's CRM package, namely detect a duplicate upon create or update, duplicate detection rules and duplicate detection jobs.  I left off with a promise to dive deeper into how you remove the duplicates once you've detected them. 

Before I get into the details, I want to emphasize that without customization, removing duplicates is not a batch process.  In other words, you remove duplicates one at a time.  Don't kill the messager; learn from the message.  If there is one area within the data quality space that Microsoft needs to improve on, it's this one. 

home_1_01Duplicate consolidation, in my experience, is rarely so exception based that it can be done in such a tedious manner.  Not to mention that those organizations that are most afflicted with duplicates generally have  a large customer base.  When you have a customer base in the millions, duplication ratios can be as high as 10% or more.  Consolidating 100,000 duplicates one at a time is almost pointless.  By the time you catch up, you've created more duplicates.

soapbox

  That said, let's move on.  So you've detected duplicates and now you want to eliminate them from your data. 

If you remember from last month's post, read up here if you don't, a duplicate detection job returns potential duplicates and allows you to browse each one along with it's potential match.  Consult the screenshot below for a view of what that looks like.

[caption id="attachment_306" align="aligncenter" width="500" caption="Duplicate Detection Job results"]Duplicate Detection Job results[/caption]

In the lower pane of the screenshot above there is a toolbar option (3rd from the left) is a icon to merge the two highlighted records.  This is where the consolidation effort begins.

One of the best features of the merge functionality is that it has the flexibility to build a composite, or best of available information, master record.  Briefly, the master record is the record which is retained as the active record.  It also allows the end user to select one record over another.  An example of these features is outlined in the screenshot below.  First let's look at the option where each element of the master record are selected.

[caption id="attachment_312" align="alignleft" width="500" caption="An example of an all inclusive master record selection"]An example of an all inclusive master record selection[/caption]

Here's a look at the composite option:

[caption id="attachment_315" align="aligncenter" width="500" caption="An example of a composite master record selection "]An example of a composite master record selection [/caption]

Notice in the all inclusive example the entire left hand column is highlighted in blue, whereas in the composite option only those elements selected via the radio button are highlighted in blue.  This is a visual indication of what data elements will be retained in the master record.

This is one of my favorite pieces of functionality with regard to the merge option.  Often end users vary in the data they provide and it is always better for an organization to retain as much information about their customers as is possible.

I specifically chose the composite screenshot presented because it illustrates one of those important aspects in customer data quality.  Noticed that the element selected from the right hand side was a middle initial.  This data element is invaluable when performing data matching and having that element can make an important distinction between two different customers later on down the road.

Once you've defined what your master record looks like, either via the all inclusive or composite method, it is time to commit that selection.  The screenshot below illustrates how this is performed.

[caption id="attachment_316" align="aligncenter" width="500" caption="How to commit your master record selection"]How to commit your master record selection[/caption]

An important option in the commit process is one enabled through the checkbox provided visible in the screenshot above.  Not every field in a record is in all cases exposed via the merge utility.  As a result the option made available through the checkbox allows you to make sure that, as the label indicates, select every field with data from the chosen master record even if there is a different value in the other record.  Simply put, it is an overwrite function that retains all the data from the selected master record beyond what is visible in the merge screen.

Once you've reviewed and are confident in your selection, you simply click on the OK button.  Provided there are no commit locks on the record, which indicates that another user has one of the two records open and is actively working on it, you will receive the following dialog box confirming your consolidation success.

[caption id="attachment_317" align="aligncenter" width="441" caption="Duplicate elimination success!"]Duplicate elimination success![/caption]

It is critical to note that the subordinate, or non-master record, is NOT deleted from the system.  It is simply deactivated.  This is to say that a flag (statecode) is changed to inactivate.  One important note about the statecode field is that unlike conventional notation a value of '1' is not active in Microsoft Dyanmics CRM.  Instead Microsoft chose the value '0' as active and '1' as inactive.  Consequently all non-master records in CRM have a statecode value of '1'.  This little fact can save hours of data analysis and perserve the samity of your DBAs, so it is worth noting.

I hope this information was beneficial to you Microsoft Dyanmics CRM users and administrators.  As usual I welcome all comments, questions, and suggestions.  So please feel free to comment on this post and I'll try and replay in a timely manner.

9 comments:

  1. I'm complete amateur in Microsoft CRM and I've just started reading/watching presentation about it. Thanks very much for your article, I found it very useful. However, I couldn't find information what happen with data related to non-master records (EG: opportunity, other activities, tasks etc...) - are these data relinked to Master record?
    Or rather connection between data is lost?

    ReplyDelete
  2. Joanna,
    It is great to know the posting was helpful! That's exactly what I set out to do with the blog ...
    Now for your question. No the data is not lost. It is relinked to the newly nominated master record via the MasterId field. There is also a few other fields that are updated such as Merged ('1' for yes) and StateCode ('1' for inactive).
    Via this mechanism CRM is able to retain and consolidate all the data prior to the merge with the new master record.
    Hope this helps! Keep checking in on the blog. I will be posting more about duplication consolidation and CRM.

    ReplyDelete
  3. Hi,
    I have been using this feature for the last couple of months and now we are about to import another 20,000 records into our database.
    *****Guess What*****
    That hidden field in CRM that is the Merged Bit is not searchable! We have just logged a update request with Microsoft because when the duplicate detection process runs it looks at Merged inactive records as well as inactive and active records. There is no way currently to remove the Merged duplicates in bulk so they don't get detected the next time you run the duplicate detection.
    This is a nightmare for us and we are still trying to find a workaround.

    ReplyDelete
  4. [...] Removing duplicates in Microsoft Dynamics CRM October 2009 3 comments 5 [...]

    ReplyDelete
  5. Hi Wesharp,

    I need help in creating a Match plan to identify duplicates.


    can you please reply back with your contact details
    Thanks very much,
    VJ

    ReplyDelete
  6. Hi. Great article. We are evaluating Microsoft CRM as a single customer view platform to serve as a master customer repository. It ticks many of the boxes (SOA enabled, customisable data model, 360 degree view etc), but perhaps falls short on the DQ capability needed (2 million customer records captured and updated via many different systems)
    Any advice on tools / platforms / add-ins that we could use to cleanse the data in batch? e.g. dedup, merge, manage decay.

    thanks

    ReplyDelete
  7. For batch mode consolidation I recommend Informatica Data Quality. This would enable you to combine the customer records from each of the various sources, match them based on a combinative approach using name and address, and then consolidate those records based on some kind of survivorship rule logic. It's a tricky process due to the master record id and active/inactive switches in MS CRM, but it can be done.
    I can follow up with you offline about some approaches if you'd like?
    Thanks for stopping by and commenting on the post!

    ReplyDelete
  8. For deduping records in real time or in batch, consider the newly announced integration between Microsoft Dynamics CRM and the Trillium Software System. Instant data quality from within the Dynamics CRM environment. http://tinyurl.com/4yjv24l

    ReplyDelete
  9. Thanks for the link! I haven't worked in CRM Dynamics in a few years so there is bound to be some new features I am unaware of.

    ReplyDelete

What data quality is (and what it is not)

Like the radar system pictured above, data quality is a sentinel; a detection system put in place to warn of threats to valuable assets. ...