Define column types for Entity Resolution


For each column in the data source that will be used for entity resolution, you need to tell WinPure what the data in that column represents.

For example, the column that contains a person's last name may be named lname in your data source. The corresponding term that WinPure uses to recognise a last name is NAME_LAST. Mapping is the process of selecting the correct term for the data in each column.
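As a way to picture the mapping, here is a minimal Python sketch. It is illustrative only: the mapping itself is done in the WinPure interface, and the `column_mapping` dictionary, the `map_record` helper and the sample row are hypothetical names that do not come from WinPure. Only lname and NAME_LAST are taken from the text above.

```python
# Hypothetical picture of a column mapping: source column name -> column type.
# Only the lname -> NAME_LAST pair comes from the text; the rest is assumed.
column_mapping = {
    "lname": "NAME_LAST",
}


def map_record(record, mapping):
    """Return the record keyed by column type; unmapped columns are left out."""
    return {mapping[col]: value for col, value in record.items() if col in mapping}


row = {"lname": "Smith", "internal_note": "not mapped"}
mapped = map_record(row, column_mapping)
print(mapped)
```

The point of the sketch is that the name of the source column does not matter; what matters is the term it is mapped to.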


When you select a column type, WinPure automatically shows the full list of available types, together with an example of each.





You can also widen the window to expand the drop-down and see an example description for each column type.



WinPure highlights any incorrect column types in orange and displays a message prompting you to fix them.


Labels group the attributes that belong to one feature when there is more than one feature of the same type. For example, a data source may contain two addresses: a primary residence and a secondary residence such as a holiday home. To keep the attributes of each address together, specify a label as part of the mapping. Assume each address is mapped to ADDR_LINE1, ADDR_CITY, ADDR_STATE and ADDR_POSTAL_CODE. Giving each column of the primary address a label such as PRIMARY tells WinPure that those columns belong together. Do the same for the secondary address with a label such as SECONDARY. You can choose any label, but descriptive labels make the features easier to tell apart when reporting.


When to use labels? If entities in your data sources have multiple names, addresses, phone numbers, or any other feature that must be kept distinct despite being the same type, use a label.
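The effect of labels can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not how WinPure stores mappings internally: the source column names (home_street, holiday_street and so on), the `mappings` list and the `group_by_label` helper are all hypothetical. The column types and the PRIMARY and SECONDARY labels are the ones used in the example above.

```python
from collections import defaultdict

# (source column, column type, label). Source column names are hypothetical.
mappings = [
    ("home_street", "ADDR_LINE1", "PRIMARY"),
    ("home_city", "ADDR_CITY", "PRIMARY"),
    ("home_state", "ADDR_STATE", "PRIMARY"),
    ("home_zip", "ADDR_POSTAL_CODE", "PRIMARY"),
    ("holiday_street", "ADDR_LINE1", "SECONDARY"),
    ("holiday_city", "ADDR_CITY", "SECONDARY"),
    ("holiday_state", "ADDR_STATE", "SECONDARY"),
    ("holiday_zip", "ADDR_POSTAL_CODE", "SECONDARY"),
]


def group_by_label(record, mappings):
    """Collect the mapped values of one record into one group per label."""
    groups = defaultdict(dict)
    for column, column_type, label in mappings:
        if column in record:
            groups[label][column_type] = record[column]
    return dict(groups)


record = {
    "home_street": "1 High St", "home_city": "Leeds",
    "home_state": "WY", "home_zip": "LS1 1AA",
    "holiday_street": "9 Beach Rd", "holiday_city": "Whitby",
    "holiday_state": "NY", "holiday_zip": "YO21 1AA",
}
addresses = group_by_label(record, mappings)
print(addresses["PRIMARY"]["ADDR_CITY"])
print(addresses["SECONDARY"]["ADDR_CITY"])
```

Without the labels, both street columns would map to the same ADDR_LINE1 type and there would be nothing to say which city or postal code belongs with which street.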


Two additional options are also available:


  • Include in result only - tick this to include the selected column in the final result without using it in entity resolution processing.
  • Ignore column - tick this to exclude the column from both entity resolution processing and the final results.
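The two options can be summarised with a small Python sketch. It assumes that a column with neither option ticked is both processed and included in the final result, which the descriptions above imply but do not state outright. The `ColumnOption` enum, the `split_columns` helper and the sample column names are hypothetical and are not part of WinPure.

```python
from enum import Enum


class ColumnOption(Enum):
    RESOLVE = "neither option ticked"       # assumed default behaviour
    RESULT_ONLY = "Include in result only"
    IGNORE = "Ignore column"


def split_columns(options):
    """Return (columns used for entity resolution, columns in the final result)."""
    resolved = [c for c, o in options.items() if o is ColumnOption.RESOLVE]
    in_result = [c for c, o in options.items() if o is not ColumnOption.IGNORE]
    return resolved, in_result


# Hypothetical column names for illustration.
options = {
    "lname": ColumnOption.RESOLVE,
    "customer_id": ColumnOption.RESULT_ONLY,
    "internal_note": ColumnOption.IGNORE,
}
resolved, in_result = split_columns(options)
print(resolved)
print(in_result)
```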