Data matching is a complex task that relies on three key factors: rules, conditions, and the right data-matching algorithm. Making the wrong selection in any of these areas can derail your data-matching efforts and lead to inaccurate results.
Rules are a set of conditions that must be met for a match to occur. Conditions are specific requirements that are placed within rules.
When performing a data match, it is important to carefully select the right conditions and place them under the right rules. This will ensure that the matching process is smooth and accurate.
For example, you might create a rule that requires the values in two columns to be identical. This works well when you are matching two lists of customer names. You might then add a second condition to that rule, specifying that the values in two other columns must fall within a certain range of each other. This works well when the lists also contain customer ages that may be recorded slightly differently.
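To make the distinction concrete, here is a minimal Python sketch of a rule built from two conditions. It is an illustration only and not how WinPure works internally: the `Rule` class, the `exact` and `within` helpers, the field names, and the one-year tolerance are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# A condition compares one field of two records and returns True or False.
Condition = Callable[[dict, dict], bool]


def exact(field: str) -> Condition:
    """Condition: the field is identical in both records (ignoring case and outer spaces)."""
    def check(a: dict, b: dict) -> bool:
        return str(a[field]).strip().lower() == str(b[field]).strip().lower()
    return check


def within(field: str, tolerance: float) -> Condition:
    """Condition: the numeric field differs by no more than `tolerance`."""
    def check(a: dict, b: dict) -> bool:
        return abs(a[field] - b[field]) <= tolerance
    return check


@dataclass
class Rule:
    """A rule is a named set of conditions; every condition must hold for a match."""
    name: str
    conditions: list[Condition]

    def matches(self, a: dict, b: dict) -> bool:
        return all(condition(a, b) for condition in self.conditions)


# Hypothetical rule: same name AND ages within one year of each other.
rule = Rule("same customer", [exact("name"), within("age", 1)])

rec_a = {"name": "Jane Doe", "age": 34}
rec_b = {"name": "jane doe ", "age": 35}
rec_c = {"name": "Jane Doe", "age": 52}

print(rule.matches(rec_a, rec_b))  # True: name matches, ages are 1 apart
print(rule.matches(rec_a, rec_c))  # False: name matches, ages are 18 apart
```

Keeping conditions as small, reusable checks means the same condition can be placed under several rules, which mirrors the rule and condition split described above.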
By carefully selecting the right rules and conditions, you can improve the accuracy and efficiency of your data-matching processes.
So how do you decide what rules and conditions to select? Here are some tips from our team.
How to Set The Right Rules and Conditions
The purpose of rules and conditions is to define the criteria that must be met in order for a match to be considered valid. Essentially, you need to define what data attributes to connect to get the results you want.
To make this decision, you first need to address three factors:
➡️ The quality of the data: If the data is of poor quality, you may need to use more lenient rules and conditions, or clean the data before you match it. For example, if the data contains a lot of typos or missing values, clean it first and only then apply rules and conditions. This is also one reason why the clean module in WinPure comes before the match module. Accurate match results need clean data.
➡️ The purpose of the data matching: What are you trying to achieve by matching the data? Are you trying to identify duplicate records? Are you trying to create master records? The purpose of the data matching will affect the type of rules and conditions that you need to select.
➡️ The business rules: Are there any specific business rules that need to be applied to the data matching? For example, you may need to ensure that the data-matching process complies with privacy regulations.
Once you have a clear picture of these factors, you can select the conditions, that is, the columns you want to match on. You can then select the type of match algorithm that best suits each condition.
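The point about cleaning data before matching can be illustrated with a short Python sketch. The `clean` function below is a hypothetical stand-in for a cleaning step, not WinPure's clean module, and the specific normalisation choices (dropping punctuation, collapsing spaces, ignoring case) are assumptions made for the example.

```python
import re
import unicodedata


def clean(value) -> str:
    """Normalise a text value before matching; missing values become an empty string."""
    if value is None:
        return ""
    text = unicodedata.normalize("NFKC", str(value))  # unify equivalent Unicode forms
    text = re.sub(r"[^\w\s]", " ", text)              # replace punctuation with spaces
    text = re.sub(r"\s+", " ", text).strip()          # collapse repeated whitespace
    return text.casefold()                            # compare without regard to case


raw_a = "  ACME  Corp. "
raw_b = "Acme Corp"

print(raw_a == raw_b)                # False: the raw values differ
print(clean(raw_a) == clean(raw_b))  # True: both clean to "acme corp"
```

Without the cleaning step, even an exact-match condition on these two values would fail, which is why cleaning usually comes before any rule is applied.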
So, keeping these factors in mind, let’s look at some examples:
Example 1: Match Names to Identify Duplicates Across Two Files
Rule: Match company name and contact name to identify duplicates from two files.
Condition: For this match, we select the Company Name and Contact Name columns from file 1 and file 2.
Fuzzy Match: Since names are text values that are prone to typos and spelling variations, we use fuzzy match as the algorithm, with a similarity threshold of 90%.
Intended result: Company and contact names that are at least 90% similar are treated as a match and can be grouped as duplicates.
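For readers who want to see the idea in code, the rule above can be sketched in Python. This is only an approximation: it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, which is an assumption and not necessarily the algorithm WinPure uses, and the sample records and field names are invented for illustration.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.90  # the 90% similarity threshold from the example


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_duplicate(rec1: dict, rec2: dict, fields=("company", "contact")) -> bool:
    """Rule: every listed field (condition) must be at least THRESHOLD similar."""
    return all(similarity(rec1[f], rec2[f]) >= THRESHOLD for f in fields)


# Hypothetical sample data standing in for file 1 and file 2.
file1 = [
    {"id": 1, "company": "Acme Trading Ltd", "contact": "John Smith"},
    {"id": 2, "company": "Globex Corp", "contact": "Mary Jones"},
]
file2 = [
    {"id": "A", "company": "Acme Tradng Ltd", "contact": "Jon Smith"},
    {"id": "B", "company": "Initech LLC", "contact": "Peter Gibbons"},
]

# Compare every record in file 1 with every record in file 2.
matches = [
    (r1["id"], r2["id"])
    for r1 in file1
    for r2 in file2
    if is_duplicate(r1, r2)
]
print(matches)  # [(1, 'A')]
```

Comparing every pair is fine for a small sample like this one. On large files the number of comparisons grows quickly, so some way of narrowing down candidate pairs is usually needed first.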
This result indicates there are 6 IDs sharing the same name and address found in two different files. You can use the same logic to identify similar or duplicate addresses, phone numbers, and other data points. All you need to do is decide on the columns you want to set as conditions within the first rule and let the software do the match. You can also add further rules and conditions for a more refined match result.
Example 2: Identify Contacts and Company Names with the Same Address
In this example, we use the same sample file as above, adding a new rule to the match configuration. In this setup, we want to identify company names and contacts that have the same addresses.
To do this, we simply add a rule to the existing configuration setup as shown below.
In the results, you can see there are contacts sharing the same address but having different company and contact names!
With this result, you now know you’ve got duplicate contacts in two different file sources, and most of them also share the same address, even if they have completely different phone numbers, zip codes, or roles. This is not necessarily bad data, but if you intend to use it for analysis, it is best practice to examine these differences and confirm that the context of the data supports the result. In this example, it might be one company that has traded under different names and owners at different times. Or it could be one location that has housed different shops and owners over time. You can refine this search further and choose different conditions to weed out hidden duplicates that are not easily visible.
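The address rule from this example can be sketched in Python as a simple grouping step. The sketch assumes an exact match on a lightly normalised address, which may differ from the configuration used in the software, and the records and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical combined records from both files.
records = [
    {"id": 1, "company": "Acme Trading Ltd", "contact": "John Smith", "address": "12 High Street"},
    {"id": 2, "company": "Blue Door Cafe", "contact": "Mary Jones", "address": "12 high street "},
    {"id": 3, "company": "Globex Corp", "contact": "Ann Lee", "address": "7 Mill Lane"},
]


def address_key(address: str) -> str:
    """Normalise an address so case and extra spaces do not block a match."""
    return " ".join(address.lower().split())


# Group records by normalised address.
groups = defaultdict(list)
for rec in records:
    groups[address_key(rec["address"])].append(rec)

# Keep only addresses shared by records with different company/contact names.
shared = {
    addr: [r["id"] for r in recs]
    for addr, recs in groups.items()
    if len({(r["company"], r["contact"]) for r in recs}) > 1
}
print(shared)  # {'12 high street': [1, 2]}
```

Each group in `shared` is a candidate for manual review, since the same address with different names may be a genuine change of ownership and not a duplicate.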
In Conclusion, WinPure Helps With Setting Easy Rules & Conditions Using a True No-Code Approach
Our software makes it easy for anyone to create and manage rules and conditions, regardless of their technical skills. This can save you hours of programming effort, allowing you to spend time where it’s most needed: the strategic and contextual development of your data goals.
We offer a free demo so you can try the software before you buy. The demo includes all of the features of the full software, so you can see for yourself how easy it is to set rules and conditions and how effective the software is at matching data.
We encourage you to try the WinPure demo today and see for yourself how we can help you set easy rules and conditions using a true no-code approach. You can also choose to book a demo call or download a recording to watch the software’s data match and cleaning abilities.