WinPure API Documentation

Version 2.0.1 | .NET 8.0

Overview

The WinPure API is a component designed to add advanced data cleansing and state-of-the-art fuzzy duplicate search capabilities to custom applications and websites. The same engine powers the WinPure Clean & Match Software Suite.

Key Capabilities

  • Data Cleansing - Standardize and clean data with built-in operations
  • Data Matching - Identify duplicates using fuzzy matching algorithms
  • Fuzzy Search - Find similar records based on search criteria
  • Address Verification - Validate US and UK addresses offline

Technical Requirements

  • .NET 8.0 or later
  • Windows-based systems (primary)
  • Linux (via containerization)

Getting Started

Prerequisites

  • NuGet Package: Available through Visual Studio package manager
  • Demo License: Built-in demo license for testing (no purchase required for evaluation)
  • Development Environment: Visual Studio (any recent version)
  • Sample Code: Available at https://www.winpure.com/Demo/WinPure.ConsoleSampleCore.zip

Installation

  1. Create a new .NET project in Visual Studio
  2. Add WinPure API NuGet package via Package Manager
  3. Reference the WinPure libraries in your code
  4. Use built-in demo license for testing

Quick Start Example

using WinPure.API;

// Initialize API
var api = new WinPureApi();

// Check license state
var licenseState = api.CheckLicenseState();
Console.WriteLine($"License Status: {licenseState}");

// Your data processing code here...

Licensing

API Methods

GetRegistrationCode()

Returns a unique registration code for the current machine. This code is used for license activation.

string registrationCode = api.GetRegistrationCode();
Console.WriteLine($"Registration Code: {registrationCode}");

Register(string licenseFile)

Registers the API using a license file provided by WinPure.

var licenseState = api.Register("path/to/license.lic");

CheckLicenseState()

Checks the current license status.

var state = api.CheckLicenseState();
// Returns: Valid, Invalid, Demo, Expired, etc.

GetFullLicenseInfo()

Returns detailed license information as key-value pairs.

var licenseInfo = api.GetFullLicenseInfo();
foreach (var kvp in licenseInfo)
{
    Console.WriteLine($"{kvp.Key}: {kvp.Value}");
}

Data Cleansing

Overview

Data cleansing operations standardize and clean your data using a comprehensive set of built-in transformations.

API Methods

CalculateStatistic()

Generates statistical analysis of your dataset to identify data quality issues.

var statisticsTable = api.CalculateStatistic(
    dataTable,
    dataFields,
    cancellationToken
);

// Analyze results
foreach (DataRow row in statisticsTable.Rows)
{
    Console.WriteLine($"Field: {row["FieldName"]}");
    Console.WriteLine($"  Total Records: {row["TotalRecords"]}");
    Console.WriteLine($"  Empty Count: {row["EmptyCount"]}");
    Console.WriteLine($"  Unique Count: {row["UniqueCount"]}");
}

CleanTable()

Applies cleansing transformations to your data table.

api.CleanTable(
    dataTable,
    cleanSettings,
    cancellationToken
);

Configuration Classes

WinPureCleanSettings

Container for all cleansing configuration settings.

Properties:

  • TextCleanerSettings - List of text cleaning rules
  • CaseConverterSettings - List of case conversion rules
  • WordManagerSettings - List of word find-and-replace rules
  • ColumnMergeSettings - List of column merge operations
  • ColumnSplitSettings - List of column split operations

TextCleanerSetting

Configures text cleaning operations including character removal, space normalization, and regex replacements.

Key Properties:

  • ColumnName - Target column name
  • RemoveNonPrintableCharacters - Remove invisible characters
  • RemoveLeadingSpace - Trim leading spaces
  • RemoveTrailingSpace - Trim trailing spaces
  • RemoveMultipleSpaces - Collapse multiple spaces into one
  • RegexExpression - Custom regex pattern
  • RegexReplace - Replacement text for regex matches

CaseConverterSetting

Converts text case (upper, lower, proper case).

Properties:

  • ColumnName - Target column
  • ToUpperCase - Convert to UPPERCASE
  • ToLowerCase - Convert to lowercase
  • ToProperCase - Convert To Title Case

WordManagerSetting

Find and replace words or phrases.

Properties:

  • ColumnName - Target column
  • SearchValue - Word/phrase to find
  • ReplaceValue - Replacement text
  • ReplaceType - Match type (WholeWord, AnyPart, etc.)
  • ToDelete - If true, removes the word entirely

ColumnMergeSetting

Merges multiple columns into one.

Properties:

  • ColumnName - Column to include in merge
  • CharToInsertBetweenColumn - Separator character
  • Order - Position in merge sequence

ColumnSplitSetting

Splits column data into multiple columns.

Properties:

  • ColumnName - Column to split
  • SplitIntoWords - Split on delimiter
  • SplitEmailAddressIntoAccountDomainAndZone - Email parsing
  • SplitDatetime - Separate date and time

Data Matching

Overview

Data matching identifies duplicate and similar records using fuzzy matching algorithms.

API Methods

MatchData()

Identifies duplicate records within a dataset or between datasets.

var matchResult = api.MatchData(
    tableParameters,
    matchParameter,
    fieldMappings,
    matchFlowType,
    cancellationToken
);

Parameters:

  • tableParameters - List of input tables
  • matchParameter - Matching configuration
  • fieldMappings - Field mapping configuration
  • matchFlowType - Type of matching flow
  • cancellationToken - Cancellation support

Configuration Classes

MatchParameter

Defines the complete matching configuration.

Properties:

  • FuzzyAlgorithm - Matching algorithm to use
  • CheckInternal - true = match within table, false = cross-table
  • MainTable - Primary table name
  • SearchDeep - Search depth parameter (1-100)
  • Groups - List of match groups

MatchField

Describes a field for matching operations.

Properties:

  • TableName - Source table name
  • ColumnName - Column name
  • ColumnDataType - Data type

MatchCondition

Defines matching conditions.

Properties:

  • Fields - List of fields to compare
  • MatchingType - Fuzzy or DirectCompare
  • Level - Confidence threshold (0.0 to 1.0)
  • Weight - Importance weight
  • IncludeEmpty - Include empty values
  • IncludeNullValues - Include null values

Address Verification

Overview

Offline address verification for US and UK addresses.

API Methods

VerifyUsAddresses()

Validates US addresses against an offline database.

var verifiedAddresses = api.VerifyUsAddresses(
    addressData,
    cancellationToken
);

Note: Requires a separate address verification data package.

API Reference

Core Classes

WinPureApi

Main API class providing all functionality.

Constructor:

public WinPureApi()

Events:

public event Action<string, int> OnProgress

Fired during long-running operations to report progress.

Enums

LicenseState

  • Valid
  • Invalid
  • Demo
  • Expired
  • NotRegistered

MatchFlowType

  • MixedFlow
  • (other flow types)

Sample Application

TestApiConsole Overview

The sample console application demonstrates core capabilities:

  1. TestCleansingTable1 - Basic data cleansing
  2. TestCleansingTable2 - Advanced cleansing operations
  3. TestOfflineAddressVerification - US address validation
  4. TestMatchingCompanies - Duplicate detection within single table
  5. TestSearchCompanies - Fuzzy search functionality
  6. TestMatchingCompaniesBetweenTwoTables - Cross-table matching

Sample Data

The application includes a SQLite database with test data, which can be explored with any SQLite browser tool. The database contains:

  • Sample company records
  • Address data
  • Various data quality scenarios

Running the Sample

  1. Download sample code from WinPure website
  2. Open solution in Visual Studio
  3. Build the project
  4. Run TestApiConsole application
  5. Review console output for results

Best Practices

Data Cleansing

  • Always profile data first with CalculateStatistic() before cleaning
  • Apply cleansing rules in order: text cleaning → case conversion → word management → column operations
  • Test regex patterns thoroughly before production use

Data Matching

  • Start with higher confidence levels (0.85+) for better precision
  • Use SearchDeep value of 10 for most scenarios
  • Profile data quality before matching to improve results
  • Consider cross-table matching for validating new data against trusted sources

Performance

  • Process data in batches for large datasets
  • Use cancellation tokens for long-running operations
  • Monitor progress events to provide user feedback
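
The batching and cancellation advice above can be sketched in plain .NET. This is a minimal illustration, not WinPure code: the integer array stands in for a large dataset, the 30-second timeout and batch size are arbitrary, and the loop body is a placeholder where a cleansing or matching call would go.

```csharp
using System;
using System.Linq;
using System.Threading;

// Cancel automatically if the work runs longer than 30 seconds (illustrative timeout)
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
CancellationToken token = cts.Token;

// Stand-in for a large dataset
int[] records = Enumerable.Range(1, 10).ToArray();
const int batchSize = 4;
int processedBatches = 0;

try
{
    // Enumerable.Chunk (.NET 6+) splits the sequence into arrays of at most batchSize items
    foreach (int[] batch in records.Chunk(batchSize))
    {
        // Stop between batches if cancellation was requested
        token.ThrowIfCancellationRequested();

        // Placeholder for the real per-batch work
        processedBatches++;
        Console.WriteLine($"Batch {processedBatches}: {batch.Length} records");
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("Processing was cancelled");
}
```

The same token can be passed to any API method that accepts a cancellationToken argument, so one timeout or user action stops the whole run.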

Troubleshooting

Common Issues

License Issues

  • Ensure registration code is generated on deployment machine
  • Contact WinPure support for license file regeneration if hardware changes
  • Demo license is sufficient for testing and development

Matching Performance

  • Reduce SearchDeep parameter if matching is too slow
  • Consider pre-cleansing data to improve matching speed
  • Use DirectCompare for exact matches instead of fuzzy matching

Data Quality

  • Use CalculateStatistic() to identify problem fields
  • Apply appropriate cleansing rules based on data patterns
  • Test on sample data before processing full datasets

Support

WinPure Support:

Overview

The WinPure DataMatching API is an enterprise-grade .NET library for data quality management, designed for organizations that need to integrate data cleansing, deduplication, and standardization capabilities into their existing business applications and workflows.

Key Capabilities

  • Data Cleansing - Standardize and clean data with 30+ built-in operations including text normalization, case conversion, column splitting/merging, and pattern-based transformations
  • Data Matching - Identify duplicates and similar records using advanced fuzzy matching algorithms (Jaro, Jaro-Winkler, WinPureFuzzy, and more)
  • Data Profiling - Analyze data quality with statistical profiling to identify anomalies and inconsistencies
  • Master Record Definition - Automatically determine the most accurate record from duplicate groups
  • Entity Resolution - Merge, update, and normalize duplicate records into unified master records

Use Cases

The WinPure API is ideal for organizations that:

  • Consolidate data from multiple disparate sources into a central database
  • Need to deduplicate records before data warehouse loading
  • Maintain trusted master data repositories
  • Require on-premises data processing for compliance and security
  • Have custom business applications requiring data quality capabilities

Typical Workflow

1. Data Integration → Receive data from multiple sources
2. Data Profiling   → (Optional) Analyze data quality statistics
3. Data Cleansing   → Standardize and clean data
4. Data Matching    → Identify duplicates (self-matching or against trusted source)
5. Post-Processing  → Define master records, merge duplicates
6. Data Export      → Save results or integrate into downstream systems

System Requirements

  • .NET 8.0 or later
  • Operating System: Windows, Linux, macOS
  • Database: SQLite (embedded, included)
  • License: Required for production use (demo mode available for evaluation)

Quick Start Guide

Installation

Install the NuGet package:

dotnet add package WinPure.DataMatching.API.Core

Or via Package Manager Console:

Install-Package WinPure.DataMatching.API.Core

Basic Initialization

using WinPure.API.Core;
using System.Data;

// Initialize the API
var api = new WinPureApi();

// Check license status
var licenseState = api.CheckLicenseState();
Console.WriteLine($"License Status: {licenseState}");

// Get registration code (needed for licensing)
var registrationCode = api.GetRegistrationCode();
Console.WriteLine($"Registration Code: {registrationCode}");

First Example: Clean and Match Data

This example demonstrates cleaning data and finding duplicates:

using WinPure.API.Core;
using WinPure.Cleansing.Models;
using WinPure.Matching.Models;
using WinPure.Matching.Enums;
using System.Collections.Generic;
using System.Data;
using System.Threading;

// Initialize API
var api = new WinPureApi();
var cancellationToken = CancellationToken.None;  // no cancellation requested

// Prepare your data in a DataTable
var dataTable = new DataTable("Customers");
dataTable.Columns.Add("Company Name", typeof(string));
dataTable.Columns.Add("Address", typeof(string));
dataTable.Columns.Add("City", typeof(string));

// Add sample data
dataTable.Rows.Add("Acme Corp", "123 Main St", "New York");
dataTable.Rows.Add("ACME Corporation", "123 Main Street", "New York");
dataTable.Rows.Add("Beta Inc", "456 Oak Ave", "Boston");

// Step 1: Clean the data
var cleanSettings = new WinPureCleanSettings();
cleanSettings.TextCleanerSettings.Add(new TextCleanerSetting
{
    ColumnName = "Company Name",
    RemoveMultipleSpaces = true,
    RemoveNonPrintableCharacters = true
});

api.CleanTable(dataTable, cleanSettings, cancellationToken);

// Step 2: Match duplicates
var matchParameter = new MatchParameter
{
    FuzzyAlgorithm = MatchAlgorithm.WinPureFuzzy,
    CheckInternal = true,  // Match within the same table
    SearchDeep = 10
};

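// ... (configure matching conditions)
// Note: tableParameter and fieldMappings are assumed to be built elsewhere; their setup is not shown here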

var matchResult = api.MatchData(
    new List<TableParameter> { tableParameter },
    matchParameter,
    fieldMappings,
    MatchFlowType.MixedFlow,
    cancellationToken
);

Console.WriteLine($"Match Results: {matchResult.Rows.Count} rows with duplicate information");

Progress Monitoring

Monitor long-running operations using the OnProgress event:

var api = new WinPureApi();

api.OnProgress += (description, progressPercent) =>
{
    Console.WriteLine($"[{progressPercent}%] {description}");
};

// Now execute operations - progress will be reported automatically

License Management

The WinPure API requires a valid license for production use. Demo mode is available for evaluation with record limitations.

Check License Status

var api = new WinPureApi();
var licenseState = api.CheckLicenseState();

switch (licenseState)
{
    case LicenseState.Valid:
        Console.WriteLine("License is valid");
        break;
    case LicenseState.Demo:
        Console.WriteLine("Running in demo mode (limited records)");
        break;
    case LicenseState.LicenseExpire:
        Console.WriteLine("License has expired");
        break;
    case LicenseState.Invalid:
        Console.WriteLine("Invalid license");
        break;
    case LicenseState.Free:
        Console.WriteLine("Free license activated");
        break;
}

Get Registration Code

Obtain your unique registration code to send to WinPure for license generation:

var registrationCode = api.GetRegistrationCode();
Console.WriteLine($"Registration Code: {registrationCode}");
// Send this code to WinPure to receive your license file

⚠️ IMPORTANT - Hardware Binding:

The registration code is generated based on your computer's hardware configuration (CPU, motherboard, MAC address, etc.). This means:

  • The license file you receive will only work on the computer that generated this registration code
  • If you change your hardware (new computer, significant hardware upgrades, or virtual machine migration), you will need to:
    1. Generate a new registration code on the new hardware
    2. Contact WinPure support to request a new license file
    3. Register with the new license file

This hardware binding ensures license security and prevents unauthorized use.

Register License

Once you receive your .license file from WinPure:

var licenseFilePath = @"C:\Path\To\Your\WinPure.license";
var licenseState = api.Register(licenseFilePath);

if (licenseState == LicenseState.Valid)
{
    Console.WriteLine("License registered successfully!");
}
else
{
    Console.WriteLine($"License registration failed: {licenseState}");
}

Demo Mode Limitations

When running in Demo mode (before registration), the API has the following limitations:

  • Matching Operations: Limited to a fixed number of records per operation
  • Address Verification: Limited records (if available)
  • All other operations function normally

Demo mode is ideal for:

  • Evaluating the API capabilities
  • Development and testing
  • Proof-of-concept implementations

Data Profiling

Data profiling provides statistical analysis of your dataset, helping identify data quality issues before cleansing or matching operations.

Calculate Statistics

The CalculateStatistic method analyzes your data and returns comprehensive statistics for each field.

var api = new WinPureApi();
var cancellationToken = CancellationToken.None;  // no cancellation requested

// Prepare your data
var dataTable = new DataTable("Customers");
// ... (populate dataTable)

// Define field metadata
var dataFields = new List<DataField>();
foreach (DataColumn column in dataTable.Columns)
{
    dataFields.Add(new DataField
    {
        Id = column.Ordinal,
        DatabaseName = column.ColumnName,
        DisplayName = column.Caption,
        FieldType = column.DataType.ToString(),
        Pattern = ""  // Optional: specify pattern for validation
    });
}

// Calculate statistics
var statisticsTable = api.CalculateStatistic(
    dataTable,
    dataFields,
    cancellationToken
);

// Statistics table contains analysis results
foreach (DataRow row in statisticsTable.Rows)
{
    Console.WriteLine($"Field: {row["FieldName"]}");
    Console.WriteLine($"  Total Records: {row["TotalRecords"]}");
    Console.WriteLine($"  Empty Values: {row["EmptyCount"]}");
    Console.WriteLine($"  Unique Values: {row["UniqueCount"]}");
}

DataField Properties

Property | Type | Description
Id | int | Unique identifier (typically column ordinal)
DatabaseName | string | Column name in the data source
DisplayName | string | User-friendly display name
FieldType | string | Data type (System.String, System.Int32, etc.)
Pattern | string | Optional validation pattern

Use Cases for Profiling

  1. Pre-Cleansing Analysis - Identify which fields need cleaning
  2. Data Quality Assessment - Measure completeness and consistency
  3. Pattern Detection - Discover data format variations
  4. Outlier Identification - Find anomalous values
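
To make the profiling measurements concrete, the sketch below computes total, empty, and unique counts for each column by hand using only System.Data and LINQ. It is an illustration of what such statistics mean, not the implementation behind CalculateStatistic; the ProfileColumn helper and the sample rows are hypothetical.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample table
var table = new DataTable("Customers");
table.Columns.Add("Company Name", typeof(string));
table.Columns.Add("City", typeof(string));
table.Rows.Add("Acme Corp", "New York");
table.Rows.Add("Acme Corp", "");
table.Rows.Add("Beta Inc", DBNull.Value);

// Returns (total rows, empty-or-null values, distinct non-empty values) for one column
static (int Total, int Empty, int Unique) ProfileColumn(DataTable source, string columnName)
{
    var values = source.Rows.Cast<DataRow>()
        .Select(row => row[columnName] as string)   // DBNull becomes null
        .ToList();
    int empty = values.Count(v => string.IsNullOrWhiteSpace(v));
    int unique = values.Where(v => !string.IsNullOrWhiteSpace(v)).Distinct().Count();
    return (values.Count, empty, unique);
}

foreach (DataColumn column in table.Columns)
{
    var stats = ProfileColumn(table, column.ColumnName);
    Console.WriteLine($"{column.ColumnName}: total={stats.Total}, empty={stats.Empty}, unique={stats.Unique}");
}
```

A column with a high empty count or an unexpectedly low unique count is usually the first candidate for cleansing rules.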

Data Cleansing

Data cleansing standardizes and corrects data using a comprehensive set of built-in operations. All cleansing is performed in-place on the DataTable.

Overview

The CleanTable method applies cleansing rules to your data:

var api = new WinPureApi();
var cleanSettings = new WinPureCleanSettings();

// Add cleansing rules (see sections below)

api.CleanTable(dataTable, cleanSettings, cancellationToken);

Text Cleaner Operations

Remove, replace, or normalize characters and text patterns:

cleanSettings.TextCleanerSettings.Add(new TextCleanerSetting
{
    ColumnName = "Company Name",

    // Character removal
    RemoveNonPrintableCharacters = true,
    RemoveCommas = true,
    RemoveMultipleSpaces = true,

    // Space normalization
    RemoveLeadingSpace = true,
    RemoveTrailingSpace = true,

    // Character conversion
    ConvertOnesToLs = true,  // 1 → L

    // Empty value handling
    ConvertEmptyToDefaultValue = "Unknown",

    // Regex-based replacement
    RegexExpression = @"\d{3}-\d{3}-\d{4}",
    RegexReplace = "PHONE"
});

Case Converter

Standardize text casing across fields:

cleanSettings.CaseConverterSettings.Add(new CaseConverterSetting
{
    ColumnName = "State",
    ToUpperCase = true
});

cleanSettings.CaseConverterSettings.Add(new CaseConverterSetting
{
    ColumnName = "Title",
    ToProperCase = true
});

Word Manager

Find and replace words or phrases with standardized terms:

cleanSettings.WordManagerSettings.Add(new WordManagerSetting
{
    ColumnName = "Company Name",
    SearchValue = "Corp",
    ReplaceValue = "Corporation",
    ReplaceType = WordManagerReplaceType.WholeWord
});

Column Operations

Column Splitting:

// Split on delimiter
cleanSettings.ColumnSplitSettings.Add(new ColumnSplitSetting
{
    ColumnName = "ZIP",
    SplitIntoWords = new SplitIntoWords { SplitSeparator = "-" }
});

// Split email components
cleanSettings.ColumnSplitSettings.Add(new ColumnSplitSetting
{
    ColumnName = "Email",
    SplitEmailAddressIntoAccountDomainAndZone = true
});

Column Merging:

cleanSettings.ColumnMergeSettings.Add(new ColumnMergeSetting
{
    ColumnName = "FirstName",
    CharToInsertBetweenColumn = " ",
    Order = 1
});

cleanSettings.ColumnMergeSettings.Add(new ColumnMergeSetting
{
    ColumnName = "LastName",
    CharToInsertBetweenColumn = " ",
    Order = 2
});

Models

This subsection describes the model classes used for Data Cleansing operations.

TextCleanerSetting

Namespace: WinPure.Cleansing.Models

Description: Comprehensive text cleaning configuration with 20+ options for removing, replacing, and normalizing characters. The most versatile and commonly-used cleansing class.

Property | Type | Required | Default | Description
ColumnName | string | Yes | - | Name of the column to clean.

Character Removal
RemoveNonPrintableCharacters | bool | No | false | Remove invisible/control characters (tabs, newlines, special ASCII). Recommended: true
RemoveAllDigits | bool | No | false | Remove all numeric digits (0-9).
RemoveAllLetters | bool | No | false | Remove all alphabetic characters (a-z, A-Z).
RemoveAllSpaces | bool | No | false | Remove all spaces (use with caution - creates single word).
RemoveDots | bool | No | false | Remove periods (.). Useful for abbreviations.
RemoveCommas | bool | No | false | Remove commas (,).
RemoveHyphens | bool | No | false | Remove hyphens (-). Useful for phone numbers.
RemoveApostrophes | bool | No | false | Remove apostrophes (').
RemoveTabs | bool | No | false | Remove tab characters.
RemoveNewLine | bool | No | false | Remove line breaks (\n, \r\n).
RemovePunctuation | bool | No | false | Remove all punctuation marks (!?.,;:).

Space Normalization
RemoveLeadingSpace | bool | No | false | Trim spaces from start of string. Commonly used.
RemoveTrailingSpace | bool | No | false | Trim spaces from end of string. Commonly used.
RemoveMultipleSpaces | bool | No | false | Replace multiple consecutive spaces with single space. Recommended: true

Character Conversion
ConvertOsToNaughts | bool | No | false | Convert letter O to digit 0.
ConvertLsToOnes | bool | No | false | Convert letter L to digit 1.
ConvertNaughtsToOs | bool | No | false | Convert digit 0 to letter O.
ConvertOnesToLs | bool | No | false | Convert digit 1 to letter L.

Default Values
ConvertEmptyToDefaultValue | string | No | null | Replace empty/null values with specified default. Example: "Unknown"

Custom Patterns
RemoveCharacters | string | No | null | String of custom characters to remove. Example: "#@$" removes all #, @, $ characters.
RegexExpression | string | No | null | Regular expression pattern to find. Use with RegexReplace.
RegexReplace | string | No | null | Replacement text for matches found by RegexExpression.

Usage Notes:

  • Space normalization (Leading/Trailing/Multiple) is recommended for almost all text fields
  • RemoveNonPrintableCharacters should almost always be true
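  • Regex operations are powerful but require testing - write patterns as C# verbatim strings (@"pattern") so backslashes do not need escaping
  • Character conversions are useful for correcting OCR errors (for example, a digit 0 misread as the letter O)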
  • Multiple TextCleanerSettings can target the same column (applied sequentially)

Example - Basic Cleaning:

new TextCleanerSetting
{
    ColumnName = "Company Name",
    RemoveNonPrintableCharacters = true,
    RemoveLeadingSpace = true,
    RemoveTrailingSpace = true,
    RemoveMultipleSpaces = true,
    RemoveCommas = true
}

Example - Phone Number Cleaning:

new TextCleanerSetting
{
    ColumnName = "Phone",
    RemoveDots = true,
    RemoveHyphens = true,
    RemoveMultipleSpaces = true,
    RemoveAllSpaces = false  // Keep spaces between area code and number
}

Example - Regex Pattern Replacement:

new TextCleanerSetting
{
    ColumnName = "Contact",
    RegexExpression = @"\d{3}-\d{3}-\d{4}",  // Find phone pattern
    RegexReplace = "PHONE"  // Replace with placeholder
}
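
Before putting a pattern into RegexExpression, it can be tried out with .NET's own System.Text.RegularExpressions.Regex class. The sketch below assumes the cleansing step behaves like Regex.Replace, which this documentation does not confirm; the sample values are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as the TextCleanerSetting example (verbatim string, so backslashes need no escaping)
string pattern = @"\d{3}-\d{3}-\d{4}";
string replacement = "PHONE";

// Hypothetical sample values for a "Contact" column
string[] samples =
{
    "Call 555-123-4567 after 5pm",
    "No phone on file",
    "Office 555-000-1111, cell 555-222-3333"
};

string[] cleaned = new string[samples.Length];
for (int i = 0; i < samples.Length; i++)
{
    // Regex.Replace substitutes every non-overlapping match
    cleaned[i] = Regex.Replace(samples[i], pattern, replacement);
    Console.WriteLine($"{samples[i]}  ->  {cleaned[i]}");
}
```

Running a handful of representative values through the pattern this way shows both missed matches and unintended ones before the rule touches real data.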

CaseConverterSetting

Namespace: WinPure.Cleansing.Models

Description: Converts text case (upper, lower, proper/title case). Simple but essential for standardizing text fields.

Property | Type | Required | Default | Description
ColumnName | string | Yes | - | Name of the column to convert.
ToUpperCase | bool | No | false | Convert to UPPERCASE. Mutually exclusive with other options.
ToLowerCase | bool | No | false | Convert to lowercase. Mutually exclusive with other options.
ToProperCase | bool | No | false | Convert To Title Case. Mutually exclusive with other options.
ProperCaseSettings | ProperCaseSettings | No | null | Advanced proper case configuration (rarely needed).

Usage Notes:

  • Only ONE of ToUpperCase/ToLowerCase/ToProperCase should be true
  • ToUpperCase is ideal for state codes, country codes (e.g., "NY", "USA")
  • ToProperCase is best for names and titles but may incorrectly capitalize "McDonald" → "Mcdonald"
  • ProperCaseSettings allows fine-tuning of capitalization rules
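
The proper-case limitation is not specific to WinPure. As a comparison only, .NET's built-in TextInfo.ToTitleCase shows the same kind of behavior; the sketch below uses that method and says nothing about how ToProperCase is implemented internally.

```csharp
using System;
using System.Globalization;

TextInfo textInfo = CultureInfo.InvariantCulture.TextInfo;

// ToTitleCase capitalizes the first letter of each word and lowercases the rest,
// so a name like "McDonald" cannot be recovered from lowercase input.
string mixed = textInfo.ToTitleCase("ronald mcdonald");
Console.WriteLine(mixed);

// Words that are entirely uppercase are treated as acronyms and left alone,
// so lowercase the input first when that is not wanted.
string untouched = textInfo.ToTitleCase("ACME LIMITED");
string lowered = textInfo.ToTitleCase("ACME LIMITED".ToLowerInvariant());
Console.WriteLine(untouched);
Console.WriteLine(lowered);
```

Names with internal capitals generally need an exception list or a follow-up find-and-replace rule after case conversion.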

Example - State Code Uppercase:

new CaseConverterSetting
{
    ColumnName = "State",
    ToUpperCase = true
}

Example - Proper Case Names:

new CaseConverterSetting
{
    ColumnName = "Company Name",
    ToProperCase = true
}

WordManagerSetting

Namespace: WinPure.Cleansing.Models

Description: Find and replace words or phrases to standardize terminology. Essential for business name standardization (Corp → Corporation, Inc → Incorporated).

Property | Type | Required | Description
ColumnName | string | Yes | Name of the column to process.
SearchValue | string | Yes | The word or phrase to find. Whether matching is case-sensitive depends on ReplaceType.
ReplaceValue | string | No | The replacement text. Leave empty if ToDelete=true.
ToDelete | bool | No | If true, removes SearchValue entirely (ReplaceValue ignored). Default: false
ReplaceType | WordManagerReplaceType | Yes | How to match SearchValue: WholeWord, AnyPart, WholeValue, or AnyPartEntire. Recommended: WholeWord

Usage Notes:

  • WholeWord is safest - won't accidentally match "Corp" inside "Corporate"
  • ToDelete=true removes words entirely (useful for removing profanity, unnecessary words)
  • Multiple WordManagerSettings are applied sequentially
  • Build standardization dictionaries for common abbreviations
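
The difference between whole-word and any-part matching can be seen with plain .NET string operations. This sketch assumes WholeWord corresponds to regex word boundaries (\b) and AnyPart to substring replacement; that mapping is an assumption for illustration, and the helper functions and dictionary are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical standardization dictionary: abbreviation -> full term
var dictionary = new Dictionary<string, string>
{
    ["Corp"] = "Corporation",
    ["Inc"] = "Incorporated"
};

// Whole-word replacement: \b anchors keep "Corp" from matching inside "Corporate"
static string ReplaceWholeWord(string input, string search, string replace) =>
    Regex.Replace(input, $@"\b{Regex.Escape(search)}\b", replace);

// Any-part replacement: plain substring replacement
static string ReplaceAnyPart(string input, string search, string replace) =>
    input.Replace(search, replace);

string name = "Corporate Travel Corp";
string wholeWord = ReplaceWholeWord(name, "Corp", "Corporation");
string anyPart = ReplaceAnyPart(name, "Corp", "Corporation");

Console.WriteLine(wholeWord); // Corporate Travel Corporation
Console.WriteLine(anyPart);   // Corporationorate Travel Corporation

// Apply the whole dictionary sequentially
string standardized = "Acme Corp Inc";
foreach (var entry in dictionary)
{
    standardized = ReplaceWholeWord(standardized, entry.Key, entry.Value);
}
Console.WriteLine(standardized); // Acme Corporation Incorporated
```

The any-part result shows why substring matching is risky for short abbreviations.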

Example - Standardize Company Suffixes:

new WordManagerSetting
{
    ColumnName = "Company Name",
    SearchValue = "Corp",
    ReplaceValue = "Corporation",
    ReplaceType = WordManagerReplaceType.WholeWord,
    ToDelete = false
}

new WordManagerSetting
{
    ColumnName = "Company Name",
    SearchValue = "inc",
    ReplaceValue = "Incorporated",
    ReplaceType = WordManagerReplaceType.WholeWord
}

Example - Delete Unwanted Words:

new WordManagerSetting
{
    ColumnName = "Description",
    SearchValue = "CONFIDENTIAL",
    ToDelete = true,
    ReplaceType = WordManagerReplaceType.WholeWord
}

ColumnSplitSetting

Namespace: WinPure.Cleansing.Models

Description: Splits column data into multiple columns based on delimiters or patterns. Creates new columns with standardized naming.

Property | Type | Required | Description
ColumnName | string | Yes | Name of the column to split.
SplitIntoWords | SplitIntoWords | No | Splits on a delimiter. Set SplitSeparator property (e.g., "-" for ZIP codes).
SplitTelephoneIntoInternationalCodeAndPhoneNumber | bool | No | Splits phone into country code and number parts.
SplitDatetime | bool | No | Splits DateTime column into separate Date and Time columns.
SplitNameAndEmailAddress | bool | No | Extracts name and email from combined field ("John Doe <john@example.com>").
SplitEmailAddressIntoAccountDomainAndZone | bool | No | Splits email into user, domain, and TLD (user@domain.com).
RegexCopy | string | No | Regex pattern to extract - creates new column with matched text.

Usage Notes:

  • Split operations create new columns with suffix numbers (_1, _2, _3, etc.)
  • Original column is preserved
  • SplitIntoWords is most common - use for any delimiter-based splitting
  • RegexCopy extracts without splitting - useful for pulling out specific patterns
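
To show what the split operations produce, the sketch below performs a delimiter split and an email split with plain string operations. It mirrors the suffix naming described above for the ZIP case; the variable names for the email parts are illustrative, since the generated column names for email splitting are not stated here.

```csharp
using System;

// Delimiter split: "12345-6789" -> two parts, named with numeric suffixes
string zip = "12345-6789";
string[] zipParts = zip.Split('-');
for (int i = 0; i < zipParts.Length; i++)
{
    Console.WriteLine($"ZIP_{i + 1} = {zipParts[i]}");
}

// Email split into account, domain, and zone (top-level domain)
string email = "user@domain.com";
int at = email.IndexOf('@');
int lastDot = email.LastIndexOf('.');
string account = email[..at];
string domain = email[(at + 1)..lastDot];
string zone = email[(lastDot + 1)..];
Console.WriteLine($"{account} | {domain} | {zone}");
```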

Example - Split ZIP+4:

new ColumnSplitSetting
{
    ColumnName = "ZIP",
    SplitIntoWords = new SplitIntoWords { SplitSeparator = "-" }
    // Creates ZIP_1 (12345) and ZIP_2 (6789) from "12345-6789"
}

Example - Extract Pattern:

new ColumnSplitSetting
{
    ColumnName = "ReferenceCode",
    RegexCopy = @"[A-Z]{3}-\d{4}"  // Extracts "ABC-1234" into new column
}

ColumnMergeSetting

Namespace: WinPure.Cleansing.Models

Description: Merges multiple columns into one by concatenating values in specified order with a separator.

Property | Type | Required | Description
ColumnName | string | Yes | Name of the column to include in merge.
CharToInsertBetweenColumn | string | Yes | Separator inserted between column values. Common: " " (space), ", " (comma-space), "-"
Order | int | Yes | Position in merge order (1, 2, 3, etc.). Columns merged in ascending order.

Usage Notes:

  • Multiple ColumnMergeSettings with different Orders define the merge sequence
  • Result is written to the first column (Order=1) in the merge
  • Other columns in the merge are preserved unchanged
  • Empty values are skipped (separator not added)
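
The merge rules above (ascending order, empty values skipped along with their separator) can be mimicked in a few lines of plain C#. This is a sketch of one plausible reading of those rules, assuming each column's separator is placed after that column's value; the MergeInOrder helper and sample row are hypothetical and are not part of the WinPure API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Each entry: (value, separator to insert after it, merge order). Hypothetical sample row.
var parts = new List<(string Value, string Separator, int Order)>
{
    ("New York", ", ", 2),
    ("123 Main St", ", ", 1),
    ("", " ", 3),          // empty State is skipped
    ("10001", "", 4)
};

static string MergeInOrder(IEnumerable<(string Value, string Separator, int Order)> items)
{
    var result = new StringBuilder();
    string pendingSeparator = "";
    foreach (var item in items.OrderBy(p => p.Order))
    {
        if (string.IsNullOrEmpty(item.Value)) continue;  // skip empties and their separators
        if (result.Length > 0) result.Append(pendingSeparator);
        result.Append(item.Value);
        pendingSeparator = item.Separator;
    }
    return result.ToString();
}

string merged = MergeInOrder(parts);
Console.WriteLine(merged);
```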

Example - Merge Name Fields:

// Merge FirstName + LastName into full name
cleanSettings.ColumnMergeSettings.Add(new ColumnMergeSetting
{
    ColumnName = "FirstName",
    CharToInsertBetweenColumn = " ",
    Order = 1  // First in merge
});

cleanSettings.ColumnMergeSettings.Add(new ColumnMergeSetting
{
    ColumnName = "LastName",
    CharToInsertBetweenColumn = " ",
    Order = 2  // Second in merge
});
// Result: FirstName column becomes "John Doe"

Example - Create Full Address:

new ColumnMergeSetting { ColumnName = "Street", CharToInsertBetweenColumn = ", ", Order = 1 },
new ColumnMergeSetting { ColumnName = "City", CharToInsertBetweenColumn = ", ", Order = 2 },
new ColumnMergeSetting { ColumnName = "State", CharToInsertBetweenColumn = " ", Order = 3 },
new ColumnMergeSetting { ColumnName = "ZIP", CharToInsertBetweenColumn = "", Order = 4 }
// Result: "123 Main St, New York, NY 10001"

WinPureCleanSettings

Namespace: WinPure.Cleansing.Models

Description: Container for all cleansing configuration settings. This is the root configuration object passed to CleanTable() method. Each property holds a collection of specific cleansing rules.

Property | Type | Required | Description
TextCleanerSettings | List<TextCleanerSetting> | No | Character and text cleaning rules (remove spaces, punctuation, regex replacement, etc.). Most commonly used.
CaseConverterSettings | List<CaseConverterSetting> | No | Case conversion rules (UPPER, lower, Proper Case).
WordManagerSettings | List<WordManagerSetting> | No | Word find-and-replace rules for standardizing terminology (Corp → Corporation).
ColumnSplitSettings | List<ColumnSplitSetting> | No | Column splitting rules (split ZIP codes, emails, phone numbers).
ColumnMergeSettings | List<ColumnMergeSetting> | No | Column merging rules (combine First Name + Last Name).
ColumnShiftSettings | List<ColumnShiftSetting> | No | Column position shifting rules (move columns left/right).
ColumnCheckSettings | List<ColumnCheckSettings> | No | Data validation rules (check email format, etc.).
AddressAndGenderSplitSettings | AddressAndGenderSplitSettings | No | Address parsing and gender detection (requires Loqate API).

Usage Notes:

  • All properties are optional - only include the cleansing types you need
  • Multiple settings of the same type can be applied to different columns
  • Settings are applied in the order: Text Cleaning → Case Conversion → Word Manager → Column Operations
  • Empty lists are allowed (no operation performed)

Example:

var cleanSettings = new WinPureCleanSettings();

// Add text cleaning
cleanSettings.TextCleanerSettings.Add(new TextCleanerSetting
{
    ColumnName = "Company Name",
    RemoveMultipleSpaces = true,
    RemoveNonPrintableCharacters = true
});

// Add case conversion
cleanSettings.CaseConverterSettings.Add(new CaseConverterSetting
{
    ColumnName = "State",
    ToUpperCase = true
});

// Apply all cleaning
api.CleanTable(dataTable, cleanSettings, cancellationToken);

Data Matching

Data matching identifies duplicate and similar records using advanced fuzzy matching algorithms. The API supports multiple matching scenarios:

  1. Single-Table Matching (Self-Matching) - Find duplicates within one dataset
  2. Cross-Table Matching - Match records between two or more tables
  3. Match Against Trusted Source - Validate new records against a master database
  4. Fuzzy Search - Find records similar to a specific value

Matching Algorithms

| Algorithm | Description | Best For |
|---|---|---|
| WinPureFuzzy | Recommended. Pays strong attention to the beginning of strings | Company names, person names where first part is most important |
| Jaro | Original Jaro distance algorithm | General string similarity |
| JaroWinkler | Jaro with prefix bonus | Strings with similar beginnings |
| ChapmanLengthDeviation | Focuses on string length similarity | Matching records of similar size |
| SmithWatermanGotoh | Local sequence alignment algorithm | DNA sequences, detailed pattern matching |

Single-Table Matching (Find Duplicates)

Find duplicate records within a single table:

var matchParameter = new MatchParameter
{
    FuzzyAlgorithm = MatchAlgorithm.WinPureFuzzy,
    CheckInternal = true,  // IMPORTANT: true = match within table
    MainTable = "Customers",
    SearchDeep = 10  // Recommended: 10
};

var matchGroup = new MatchGroup
{
    GroupId = 1,
    GroupLevel = 0.85  // Group confidence threshold (85%)
};

var matchCondition = new MatchCondition
{
    MatchingType = MatchType.Fuzzy,
    Level = 0.85,  // Match confidence threshold (85%)
    Weight = 1.0,
    IncludeEmpty = false,
    IncludeNullValues = false
};

matchCondition.Fields.Add(new MatchField
{
    TableName = "Customers",
    ColumnName = "Company Name",
    ColumnDataType = typeof(string).ToString()
});

matchGroup.Conditions.Add(matchCondition);
matchParameter.Groups.Add(matchGroup);

// Execute matching
var matchResult = api.MatchData(
    new List<TableParameter> { tableParameter },
    matchParameter,
    fieldMappings,
    MatchFlowType.MixedFlow,
    cancellationToken
);

Match Configuration Guidelines

Confidence Levels:

  • 0.95 - 0.99 - Recommended for most applications (high precision)
  • 0.85 - 0.94 - Moderate precision (use with caution, may produce false positives)
  • < 0.85 - Not recommended (produces very poor results with WinPureFuzzy, JaroWinkler, and Jaro)
  • 1.0 - Not recommended for fuzzy matching (use MatchType.DirectCompare instead for exact matches)

SearchDeep Parameter:

  • Recommended value: 10
  • Range: 1 - 100
  • Higher values = more thorough matching but slower performance
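
The confidence bands above can be turned into a small pre-flight check that flags risky thresholds before a long matching run. The sketch below is illustrative only: DescribeLevel is a hypothetical helper, not part of the WinPure API, and the band boundaries are taken directly from the list above.

```csharp
using System;

// Hypothetical helper (not part of the WinPure API): maps a fuzzy-match
// Level to the guidance bands listed in this section.
static string DescribeLevel(double level)
{
    if (level < 0.0 || level > 1.0) return "invalid (must be between 0.0 and 1.0)";
    if (level >= 1.0) return "exact: use MatchType.DirectCompare instead";
    if (level >= 0.95) return "recommended (high precision)";
    if (level >= 0.85) return "moderate precision (may produce false positives)";
    return "not recommended";
}

// Check a few candidate thresholds before configuring MatchCondition.Level.
foreach (var level in new[] { 0.97, 0.85, 0.70, 1.0 })
{
    Console.WriteLine($"{level:0.00}: {DescribeLevel(level)}");
}
```

A check like this is cheap to run on every MatchCondition.Level and MatchGroup.GroupLevel value when configurations are built from user input.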

Understanding Match Results

The match result DataTable contains:

  • All original columns from your source data
  • MatchGroupID - Records with the same ID are duplicates
  • MatchScore - Similarity score for the match
  • TableSource - Source table name (for cross-table matching)
  • IsMaster - Indicates master record (after DefineMasterRecord is called)
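
To make the result layout concrete, the following sketch groups rows by MatchGroupID with LINQ. It builds a small stand-in DataTable rather than calling MatchData(). The column types (int for MatchGroupID, double for MatchScore) and the use of DBNull for unmatched rows are assumptions made for illustration, so check them against an actual result table.

```csharp
using System;
using System.Data;
using System.Linq;

// Stand-in for a match result; the column types here are assumptions.
var result = new DataTable();
result.Columns.Add("Company Name", typeof(string));
result.Columns.Add("MatchGroupID", typeof(int));
result.Columns.Add("MatchScore", typeof(double));
result.Rows.Add("Acme Corp", 1, 1.0);
result.Rows.Add("Acme Corporation", 1, 0.93);
result.Rows.Add("Globex", DBNull.Value, DBNull.Value);  // no duplicate found

// Rows that share a MatchGroupID form one duplicate group.
var groups = result.AsEnumerable()
    .Where(r => !r.IsNull("MatchGroupID"))
    .GroupBy(r => r.Field<int>("MatchGroupID"))
    .ToDictionary(
        g => g.Key,
        g => g.Select(r => r.Field<string>("Company Name")).ToList());

foreach (var (id, names) in groups)
{
    Console.WriteLine($"Group {id}: {string.Join(", ", names)}");
}
```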

Models

This subsection describes the model classes used for Data Matching operations.

TableParameter

Namespace: WinPure.Matching.Models

Description: Describes an input data table for matching operations. Used to pass DataTable instances to matching and search operations.

| Property | Type | Required | Description |
|---|---|---|---|
| TableName | string | Yes | Unique identifier for the table. Used to reference the table in match conditions and field mappings. Must match the table name used in MatchField definitions. |
| TableData | DataTable | Yes | The actual data to be processed. Must be a populated ADO.NET DataTable with at least one column. |

Usage Notes:

  • The TableName is case-sensitive and must match exactly in all related configurations
  • TableData is not modified by matching operations (read-only)
  • For cross-table matching, provide multiple TableParameter instances with different TableNames

Example:

var table = new TableParameter
{
    TableName = "Customers",
    TableData = customerDataTable  // Your populated DataTable
};

MatchParameter

Namespace: WinPure.Matching.Models

Description: Defines the complete matching configuration including algorithm, search behavior, and matching groups. This is the primary configuration object for all matching operations.

| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| FuzzyAlgorithm | MatchAlgorithm | Yes | - | The fuzzy matching algorithm to use. Recommended: WinPureFuzzy for most business data applications. |
| CheckInternal | bool | Yes | - | CRITICAL: true = match within same table (find duplicates), false = match between different tables (cross-table matching). |
| MainTable | string | Yes | - | Name of the primary table. Must match one of the TableParameter.TableName values provided to MatchData(). |
| SearchDeep | int | Yes | - | Search depth parameter controlling thoroughness of matching. Recommended value: 10. Range: 1-100. Higher values = more thorough but slower. |
| Groups | List<MatchGroup> | Yes | - | Collection of matching groups. Each group represents a set of matching conditions that work together. Multiple groups provide OR logic (match if ANY group satisfies). |

Usage Notes:

  • CheckInternal is the key differentiator:
    • Set true for deduplication within a single dataset
    • Set false for matching new records against an existing database
  • SearchDeep impact on performance: 10 is optimal for most scenarios
  • Groups can contain multiple MatchGroup instances for complex matching logic
  • Algorithm choice affects performance: WinPureFuzzy is optimized for speed and accuracy on business data

Example - Single Table Matching:

var matchParameter = new MatchParameter
{
    FuzzyAlgorithm = MatchAlgorithm.WinPureFuzzy,
    CheckInternal = true,      // Find duplicates within table
    MainTable = "Customers",
    SearchDeep = 10,
    Groups = new List<MatchGroup> { /* ... */ }
};

Example - Cross-Table Matching:

var matchParameter = new MatchParameter
{
    FuzzyAlgorithm = MatchAlgorithm.WinPureFuzzy,
    CheckInternal = false,     // Match between tables
    MainTable = "NewCustomers",
    SearchDeep = 10,
    Groups = new List<MatchGroup> { /* ... */ }
};

MatchGroup

Namespace: WinPure.Matching.Models

Description: Defines a group of matching conditions that are evaluated together. Multiple conditions within a group are combined with AND logic (all must match). Multiple groups within MatchParameter are combined with OR logic (any group can match).

| Property | Type | Required | Description |
|---|---|---|---|
| GroupId | string | Yes | Unique identifier for this matching group. Used for organizing and tracking match results. Typically set to sequential values: "1", "2", "3", etc. |
| GroupLevel | double | Yes | Overall confidence threshold for this group (0.0 to 1.0). Records must meet this threshold to be considered matches. Typical range: 0.75-0.95. Recommended: 0.85 |
| Conditions | List<MatchCondition> | Yes | Collection of matching conditions. All conditions in the group must be satisfied (AND logic). Typically contains 1-3 conditions. |

Usage Notes:

  • GroupLevel acts as a minimum threshold - even if individual conditions score higher, the overall match must meet this level
  • Use multiple groups for flexible matching strategies (e.g., "Match on Name OR Address")
  • Conditions within a group are ANDed together (all must match)
  • Groups within MatchParameter are ORed together (any can match)
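
The AND/OR rules above can be pictured with a small model. This is a sketch under stated assumptions, not WinPure's implementation: tuples stand in for MatchCondition, GroupMatches is a hypothetical function, and the group score is assumed to be the weight-averaged condition score (the documentation does not specify the exact formula).

```csharp
using System;
using System.Linq;

// Illustrative model only. Assumption: group score = weighted average of
// the condition scores; every condition must also meet its own Level.
static bool GroupMatches((double Score, double Level, double Weight)[] conditions, double groupLevel)
{
    if (conditions.Length == 0) return false;
    bool allPass = conditions.All(c => c.Score >= c.Level);   // AND within a group
    double totalWeight = conditions.Sum(c => c.Weight);
    if (totalWeight <= 0) return false;
    double groupScore = conditions.Sum(c => c.Score * c.Weight) / totalWeight;
    return allPass && groupScore >= groupLevel;
}

// Group 1: company name AND city. The city score misses its Level.
var nameAndCity = new[]
{
    (Score: 0.92, Level: 0.85, Weight: 1.0),
    (Score: 0.70, Level: 0.85, Weight: 0.5)
};

// Group 2: exact email match.
var emailOnly = new[] { (Score: 1.0, Level: 1.0, Weight: 1.0) };

// OR across groups: the pair counts as a match if any group is satisfied.
bool isMatch = GroupMatches(nameAndCity, 0.80) || GroupMatches(emailOnly, 0.95);
Console.WriteLine(isMatch);
```

Here the first group fails because one of its conditions is below its Level, but the pair still matches through the second group.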

Example - Single Condition Group:

var group = new MatchGroup
{
    GroupId = 1,
    GroupLevel = 0.85,  // 85% confidence required
    Conditions = new List<MatchCondition>
    {
        new MatchCondition { /* Match on Company Name */ }
    }
};

Example - Multiple Condition Group (AND logic):

var group = new MatchGroup
{
    GroupId = 1,
    GroupLevel = 0.80,  // Lower threshold since multiple conditions
    Conditions = new List<MatchCondition>
    {
        new MatchCondition { /* Match on Company Name */ },
        new MatchCondition { /* AND Match on City */ }
    }
};

MatchCondition

Namespace: WinPure.Matching.Models

Description: Defines an individual matching condition that specifies which fields to compare and how to compare them. This is the core matching rule definition.

| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| Fields | List<MatchField> | Yes | - | Fields to compare. For single-table matching: 1 field. For cross-table matching: 2+ fields (one from each table). |
| MatchingType | MatchType | Yes | - | Comparison type: DirectCompare for exact match, Fuzzy for similarity-based matching. Recommended: Fuzzy |
| Level | double | Yes | - | Confidence threshold for this specific condition (0.0 to 1.0). Records scoring below this are not considered matches. Recommended: 0.85 |
| Weight | double | Yes | - | Importance weight of this condition relative to others in the same group (0.0 to 1.0). Higher weight = more important. Typical: 1.0 |
| DictionaryType | string | No | null | Optional: Custom dictionary for specialized matching logic (rarely used). |
| IncludeNullValues | bool | Yes | - | Whether to include records where this field is NULL in matching. Set false to skip NULL values. |
| IncludeEmpty | bool | Yes | - | Whether to include records where this field is empty string in matching. Set false to skip empty values. |

Usage Notes:

  • Level vs GroupLevel: Condition Level is checked first, then Group Level
  • Weight matters when multiple conditions exist in a group - higher weights contribute more to overall score
  • IncludeNullValues/IncludeEmpty: Usually set to false to avoid matching records with missing data
  • For cross-table matching, Fields list must contain one field from each table being matched

Example - Fuzzy Condition:

var condition = new MatchCondition
{
    MatchingType = MatchType.Fuzzy,
    Level = 0.85,
    Weight = 1.0,
    IncludeEmpty = false,
    IncludeNullValues = false,
    Fields = new List<MatchField>
    {
        new MatchField { TableName = "Customers", ColumnName = "Company Name", ColumnDataType = "System.String" }
    }
};

Example - Exact Match Condition:

var condition = new MatchCondition
{
    MatchingType = MatchType.DirectCompare,  // Exact match
    Level = 1.0,  // Must be exact
    Weight = 1.0,
    IncludeEmpty = false,
    IncludeNullValues = false,
    Fields = new List<MatchField>
    {
        new MatchField { TableName = "Orders", ColumnName = "OrderID", ColumnDataType = "System.Int32" }
    }
};

MatchField

Namespace: WinPure.Matching.Models

Description: Describes a specific field (column) in a table for matching operations. Used to identify which columns to compare.

| Property | Type | Required | Description |
|---|---|---|---|
| TableName | string | Yes | Name of the source table. Must match the TableParameter.TableName exactly (case-sensitive). |
| ColumnName | string | Yes | Name of the column in the DataTable. Must match the DataColumn.ColumnName exactly (case-sensitive). |
| ColumnDataType | string | Yes | Fully qualified .NET type name of the column. Examples: "System.String", "System.Int32", "System.DateTime". Use typeof(string).ToString() to get correct format. |

Usage Notes:

  • All three properties are case-sensitive and must match exactly
  • ColumnDataType must be the full type name, not abbreviated (e.g., "System.String" not "string")
  • For cross-table matching, create one MatchField for each table's corresponding column
  • The type helps the matching engine apply appropriate comparison logic
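
Because ColumnDataType must be the full type name, it is usually safer to derive it from the DataTable than to type it by hand. The sketch below uses only System.Data and shows the strings produced by Type.ToString(). The table and column names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Example table; names are illustrative only.
var table = new DataTable("Customers");
table.Columns.Add("Company Name", typeof(string));
table.Columns.Add("Founded", typeof(DateTime));
table.Columns.Add("Employees", typeof(int));

// Map each column name to the full .NET type name expected by ColumnDataType.
var columnTypes = new Dictionary<string, string>();
foreach (DataColumn col in table.Columns)
{
    columnTypes[col.ColumnName] = col.DataType.ToString();
}

foreach (var (name, type) in columnTypes)
{
    Console.WriteLine($"{name}: {type}");
}
```

The resulting values ("System.String", "System.DateTime", "System.Int32") can be assigned to MatchField.ColumnDataType directly.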

Example:

var field = new MatchField
{
    TableName = "Customers",
    ColumnName = "Company Name",
    ColumnDataType = typeof(string).ToString()  // "System.String"
};

Master Record & Merge Operations

After identifying duplicates, you typically need to:

  1. Define a Master Record - Choose the "best" record from each duplicate group
  2. Merge Duplicates - Combine information from duplicate records
  3. Update Fields - Apply business logic to standardize values
  4. Delete Records - Remove non-master or unwanted records

Define Master Record

Automatically select the master record from each duplicate group based on rules:

var masterRecordSettings = new MasterRecordSettings
{
    RecordType = MasterRecordType.MostRelevant,
    IsAllRules = true,  // true = AND logic, false = OR logic
    Rules = new List<MasterRecordRule>()
};

// Add rules
masterRecordSettings.Rules.Add(new MasterRecordRule
{
    FieldName = "Company Name",
    FieldType = typeof(string).ToString(),
    RuleType = MasterRecordRuleType.IsLongest
});

masterRecordSettings.Rules.Add(new MasterRecordRule
{
    FieldName = "Phone",
    RuleType = MasterRecordRuleType.IsEmpty,
    Negate = true  // Prefer non-empty
});

bool success = api.DefineMasterRecord(
    matchResult,
    matchParameter,
    masterRecordSettings
);

Master Record Rules

| RuleType | Description | Example Use Case |
|---|---|---|
| IsEmpty | Field is empty | Prefer non-empty fields (use Negate = true) |
| IsEqual | Field equals specific value | Prefer records where Status = "Active" |
| IsLongest | Longest string | Prefer most complete descriptions |
| IsMaximum | Largest numeric value | Prefer highest credit limit |
| Common | Most common value in group | Statistical mode |

Merge Match Results

Merge duplicate records into master records:

var mergeSettings = new List<MergeMatchResultSetting>();

mergeSettings.Add(new MergeMatchResultSetting
{
    FieldName = "Phone",
    OnlyEmpty = true,  // Only update if master's phone is empty
    UpdateField = true,
    KeepAllValues = false
});

mergeSettings.Add(new MergeMatchResultSetting
{
    FieldName = "Email",
    OnlyEmpty = false,
    UpdateField = true,
    KeepAllValues = true  // Combine all unique emails
});

string valueSeparator = "; ";

var mergedResult = api.MergeMatchResult(
    matchResult,
    mergeSettings,
    valueSeparator
);

Delete Operations

Remove unwanted records after merging:

var deleteSetting = new DeleteFromMatchResultSetting
{
    DeleteSetting = DeleteMatchResultSetting.NonMaster
};

var cleanedResult = api.DeleteMergeMatchResult(
    mergedResult,  // output of MergeMatchResult
    deleteSetting
);

Models

This subsection describes the model classes used for Master Record Operations.

MasterRecordSettings

Namespace: WinPure.Matching.Models

Description: Defines criteria for automatically selecting master records from duplicate groups. After matching identifies duplicates, this determines which record in each group is the "best" or master record.

| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| RecordType | MasterRecordType | Yes | - | Strategy for selecting master records. Recommended: MostRelevant (rule-based selection). |
| PreferredTable | string | No | empty | Optional: Prefer records from a specific table when multiple tables are matched. Leave empty if no preference. |
| ApplyOptionsIfRuleGiveNothing | bool | Yes | - | Fallback behavior: if rules fail to select a master, use automatic selection (most populated record). Recommended: true |
| IsAllRules | bool | Yes | - | Rule combination logic: true = ALL rules must match (AND), false = ANY rule can match (OR). Recommended: true for strict selection. |
| OnlyThisTable | bool | Yes | - | In cross-table matching: if true, only select masters from the main table. If false, any table's records can be masters. |
| Rules | List<MasterRecordRule> | Yes | - | Collection of rules to evaluate. Applied in order. Empty list is valid (defaults to most populated). |

Usage Notes:

  • RecordType determines primary strategy: MostPopulatedByTable (automatic) vs MostRelevant (rule-based)
  • Rules are powerful: Can specify "prefer longest company name" or "prefer non-empty email"
  • IsAllRules affects rule evaluation: true = stricter (all rules) vs false = lenient (any rule)
  • If no rules defined and RecordType=MostRelevant, falls back to most populated record
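
As a rough picture of what "most populated record" means, the sketch below counts non-blank fields per row and picks the row with the highest count. This is an assumption about the concept, not WinPure's actual selection logic. PopulatedCount is a hypothetical helper and the sample data is invented.

```csharp
using System;
using System.Data;
using System.Linq;

// Counts fields that hold a non-blank value (DBNull and null count as empty).
static int PopulatedCount(DataRow row) =>
    row.ItemArray.Count(v => !string.IsNullOrWhiteSpace(Convert.ToString(v)));

// Stand-in for one duplicate group from a match result.
var duplicates = new DataTable();
duplicates.Columns.Add("Company Name", typeof(string));
duplicates.Columns.Add("Phone", typeof(string));
duplicates.Columns.Add("Email", typeof(string));
duplicates.Rows.Add("Acme Corp", DBNull.Value, "");
duplicates.Rows.Add("Acme Corporation", "555-0100", "info@acme.test");

// The row with the most filled-in fields is the fallback master candidate.
DataRow mostPopulated = duplicates.AsEnumerable()
    .OrderByDescending(r => PopulatedCount(r))
    .First();

Console.WriteLine(mostPopulated["Company Name"]);
```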

Example - Rule-Based Selection:

var masterSettings = new MasterRecordSettings
{
    RecordType = MasterRecordType.MostRelevant,
    ApplyOptionsIfRuleGiveNothing = true,
    IsAllRules = true,  // ALL rules must be satisfied
    OnlyThisTable = false,
    PreferredTable = "",
    Rules = new List<MasterRecordRule>
    {
        new MasterRecordRule
        {
            FieldName = "Email",
            RuleType = MasterRecordRuleType.IsEmpty,
            Negate = true  // Prefer non-empty email
        },
        new MasterRecordRule
        {
            FieldName = "Company Name",
            RuleType = MasterRecordRuleType.IsLongest  // Prefer longest name
        }
    }
};

MasterRecordRule

Namespace: WinPure.Matching.Models

Description: Defines a single rule for evaluating which record should be the master. Rules compare field values across duplicate records.

| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| FieldName | string | Yes | - | Name of the field to evaluate. Must match a column name in the match result. |
| FieldType | string | No | null | .NET type of the field (e.g., "System.String", "System.Int32"). Required for numeric comparisons (IsMaximum, IsMinimum, GreaterThan). |
| Negate | bool | No | false | Invert the rule logic. Example: IsEmpty with Negate=true means "is NOT empty". |
| Value | string | No | null | Comparison value for rules like IsEqual, IsContains, GreaterThan. Not used for IsEmpty, IsLongest, etc. |
| RuleType | MasterRecordRuleType | Yes | - | Type of comparison to perform (IsEmpty, IsLongest, IsMaximum, Common, etc.). |

Usage Notes:

  • Negate is powerful for inverting logic: "IsEmpty + Negate=true" = "prefer non-empty"
  • FieldType is required for numeric operations (IsMaximum, IsMinimum, GreaterThan)
  • Value type depends on RuleType: string for IsEqual/IsContains, numeric string for GreaterThan
  • Rules are evaluated in order within the Rules collection

Example - Prefer Non-Empty Fields:

new MasterRecordRule
{
    FieldName = "Phone",
    RuleType = MasterRecordRuleType.IsEmpty,
    Negate = true  // NOT empty = has value
}

Example - Prefer Highest Value:

new MasterRecordRule
{
    FieldName = "CreditScore",
    FieldType = typeof(int).ToString(),
    RuleType = MasterRecordRuleType.IsMaximum
}

Example - Prefer Specific Value:

new MasterRecordRule
{
    FieldName = "Status",
    RuleType = MasterRecordRuleType.IsEqual,
    Value = "Active"
}

MergeMatchResultSetting

Namespace: WinPure.Matching.Models

Description: Defines how to merge field values from duplicate records into master records. Controls which fields to update and whether to combine multiple values.

| Property | Type | Required | Description |
|---|---|---|---|
| FieldName | string | Yes | Name of the field to merge. Must exist in the match result DataTable. |
| OnlyEmpty | bool | Yes | Update behavior: true = only update if master's field is empty, false = always update/combine. |
| UpdateField | bool | Yes | Whether to update this field at all. Set false to skip this field entirely. |
| KeepAllValues | bool | Yes | Value combination: true = combine all unique values (separated by valueSeparator), false = keep only master's value or first non-empty. |

Usage Notes:

  • KeepAllValues=true is useful for contact fields (emails, phones) to preserve all values
  • OnlyEmpty=true prevents overwriting good data in master record
  • UpdateField=false is useful to exclude certain fields from merging entirely
  • When KeepAllValues=true, duplicate values are automatically removed
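
The effect of KeepAllValues=true with a valueSeparator can be approximated as "join the distinct non-empty values". The sketch below mimics that documented behavior with plain LINQ. It is not WinPure's code, CombineValues is a hypothetical helper, and the case-insensitive de-duplication is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: joins distinct, non-blank values with the separator.
// Assumption: values differing only by case are treated as duplicates.
static string CombineValues(IEnumerable<string> values, string separator) =>
    string.Join(separator, values
        .Where(v => !string.IsNullOrWhiteSpace(v))
        .Select(v => v.Trim())
        .Distinct(StringComparer.OrdinalIgnoreCase));

// Email values from the master record and its duplicates.
var emails = new[] { "a@test.com", "", "b@test.com", "A@test.com" };

string combined = CombineValues(emails, "; ");
Console.WriteLine(combined);
```

Choosing a separator that cannot occur inside the values (for example "; " for emails) keeps the combined field easy to split again later.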

Example - Preserve Master Value:

new MergeMatchResultSetting
{
    FieldName = "Company Name",
    OnlyEmpty = false,
    UpdateField = true,
    KeepAllValues = false  // Keep only master's value
}

Example - Combine All Emails:

new MergeMatchResultSetting
{
    FieldName = "Email",
    OnlyEmpty = false,
    UpdateField = true,
    KeepAllValues = true  // Combine: "email1@test.com; email2@test.com"
}

Example - Fill Empty Fields Only:

new MergeMatchResultSetting
{
    FieldName = "Phone",
    OnlyEmpty = true,   // Only if master's phone is empty
    UpdateField = true,
    KeepAllValues = false
}

API Reference

Complete reference of all classes, methods, enums, and properties in the WinPure DataMatching API.

Main Class: WinPureApi

Namespace: WinPure.API.Core

Constructor

public WinPureApi()

Initializes the WinPure API. Automatically initializes licensing, configuration, and database components.

Events

public event Action<string, int> OnProgress

Fired during long-running operations to report progress.

Parameters:

  • string - Description of current operation
  • int - Progress percentage (0-100)

License Methods

string GetRegistrationCode()

Returns the unique registration code for this machine.

LicenseState Register(string licenseFile)

Registers the API with a license file.

LicenseState CheckLicenseState()

Returns the current license status.

Dictionary<string, string> GetFullLicenseInfo()

Returns detailed license information as key-value pairs.

Data Cleansing Methods

void CleanTable(
    DataTable data,
    WinPureCleanSettings settings,
    CancellationToken cancellationToken)

Applies cleansing rules to the data (modifies DataTable in-place).

Data Matching Methods

DataTable MatchData(
    List<TableParameter> tables,
    MatchParameter parameter,
    List<FieldMapping> fieldMap,
    MatchFlowType flowType,
    CancellationToken cancellationToken)

Identifies duplicate/similar records.

DataTable SearchData(
    TableParameter table,
    SearchParameter parameter,
    CancellationToken cancellationToken)

Searches for records similar to specific values.

Master Record Methods

bool DefineMasterRecord(
    DataTable matchResult,
    MatchParameter lastMatchingParameters,
    MasterRecordSettings settings)

Defines master record for each duplicate group.

DataTable MergeMatchResult(
    DataTable matchResult,
    List<MergeMatchResultSetting> mergeSettings,
    string valueSeparator)

Merges duplicate records.

Key Enums

LicenseState

public enum LicenseState
{
    Valid = 0,          // License is valid and active
    Demo = 1,           // Running in demo mode
    LicenseExpire = 2,  // License has expired
    DemoExpire = 3,     // Demo period has expired
    Invalid = 4,        // License is invalid
    Free = 5            // Free license activated
}

MatchAlgorithm

public enum MatchAlgorithm
{
    Jaro = 0,
    WinPureFuzzy = 1,            // Recommended
    JaroWinkler = 2,
    ChapmanLengthDeviation = 3,
    SmithWatermanGotoh = 4
}

MatchType

public enum MatchType
{
    DirectCompare = 0,  // Exact match
    Fuzzy = 1           // Fuzzy/similarity match
}

Code Examples

Example 1: Complete Deduplication Workflow

Clean data, find duplicates, define masters, merge, and export clean results:

var api = new WinPureApi();
using var cts = new CancellationTokenSource();  // call cts.Cancel() to abort long operations
var cancellationToken = cts.Token;

// 1. Load and clean data
var dataTable = LoadDataFromDatabase();
var cleanSettings = new WinPureCleanSettings();
// ... configure cleaning rules
api.CleanTable(dataTable, cleanSettings, cancellationToken);

// 2. Match duplicates
var matchResult = api.MatchData(/* ... */);

// 3. Define master records
var masterSettings = new MasterRecordSettings { /* ... */ };
api.DefineMasterRecord(matchResult, matchParameter, masterSettings);

// 4. Merge duplicates
var mergedResult = api.MergeMatchResult(matchResult, mergeSettings, "; ");

// 5. Keep only master records
var deleteSetting = new DeleteFromMatchResultSetting
{
    DeleteSetting = DeleteMatchResultSetting.NonMaster
};
var finalResult = api.DeleteMergeMatchResult(mergedResult, deleteSetting);

// 6. Export clean data
SaveToDatabase(finalResult);

Example 2: Match Against Trusted Source

Validate new customer records against existing master database:

var newRecords = LoadNewCustomers();
var trustedRecords = LoadMasterDatabase();

var matchParameter = new MatchParameter
{
    FuzzyAlgorithm = MatchAlgorithm.WinPureFuzzy,
    CheckInternal = false,  // Match BETWEEN tables
    MainTable = "NewCustomers",
    SearchDeep = 10
};

// Configure matching on Company Name and Address
// ... (add match groups and conditions)

var tableParams = new List<TableParameter>
{
    new TableParameter { TableName = "NewCustomers", TableData = newRecords },
    new TableParameter { TableName = "TrustedSource", TableData = trustedRecords }
};

var matchResult = api.MatchData(
    tableParams,
    matchParameter,
    fieldMappings,
    MatchFlowType.MixedFlow,
    cancellationToken
);

// Analyze results to find new vs existing customers
var unmatchedNew = matchResult.AsEnumerable()
    .Where(row => row["TableSource"].ToString() == "NewCustomers" &&
                  row["MatchGroupID"] == DBNull.Value);

SaveToMasterDatabase(unmatchedNew);

Example 3: Progress Monitoring

Track progress of lengthy operations:

var api = new WinPureApi();

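// Keep a reference to the handler so it can be unsubscribed later
Action<string, int> progressReporter = (description, percent) =>
{
    int filled = Math.Clamp(percent, 0, 100) / 2;  // 50-character bar
    Console.SetCursorPosition(0, Console.CursorTop);
    Console.Write($"[{new string('=', filled)}{new string(' ', 50 - filled)}] {percent}% - {description}");
};

api.OnProgress += progressReporter;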

try
{
    var matchResult = api.MatchData(/* ... */);
    Console.WriteLine("\nMatching completed!");
}
finally
{
    api.OnProgress -= progressReporter;
}

Best Practices & Performance Tips

Data Preparation

  1. Clean Before Matching - Always clean data before matching for better accuracy
  2. Use Appropriate Data Types - Ensure ColumnDataType matches actual data type
  3. Handle Null Values - Set IncludeNullValues and IncludeEmpty explicitly

Matching Configuration

  1. Choose the Right Algorithm
    • WinPureFuzzy: Best for names, addresses
    • JaroWinkler: Good for typos and variations
    • Jaro: Balanced general-purpose matching
  2. Optimize Confidence Levels - Start with 0.85 and adjust based on results
  3. Use SearchDeep Wisely - Recommended: 10 for most cases

Performance Optimization

  1. Use MixedFlow - Enables parallel processing for better performance
  2. Optimize DataTable Size - Process data in batches for very large datasets
  3. Minimize Field Mappings - Only include necessary columns in output
  4. Use CancellationToken - Always provide cancellation token for long operations
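
For the cancellation tip above, one possible setup is a CancellationTokenSource with a time budget that also reacts to Ctrl+C. The sketch uses only standard .NET types. The 10-minute budget is an arbitrary example value, and how the WinPure API reports a cancelled operation is not covered here.

```csharp
using System;
using System.Threading;

// Cancel automatically once the time budget is used up.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(10));

// Let the user abort a long run with Ctrl+C without killing the process.
Console.CancelKeyPress += (_, e) =>
{
    e.Cancel = true;   // keep the process alive so cleanup can run
    cts.Cancel();
};

CancellationToken cancellationToken = cts.Token;

// Pass cancellationToken to CleanTable(), MatchData(), or SearchData().
Console.WriteLine($"Cancellation supported: {cancellationToken.CanBeCanceled}");
```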

Sample Performance Benchmarks

Based on typical hardware (modern server with SSD):

| Records | Operation | Approximate Time |
|---|---|---|
| 10,000 | Clean | 2-5 seconds |
| 10,000 | Match (single table) | 5-10 seconds |
| 10,000 vs 10,000 | Cross-table match | 10-20 seconds |
| 100,000 | Clean | 20-40 seconds |
| 100,000 | Match (single table) | 60-120 seconds |

Migration Guide

This section is reserved for future migration guidance when upgrading from previous API versions.

If you are currently using an older version of the WinPure API and need migration assistance, please contact WinPure support.

Troubleshooting

Common API Exceptions

WinPureAPIWrongParametersException

Symptom: Exception thrown during MatchData() call with message about wrong parameters.

Common Causes:

  • CheckInternal is true but multiple tables provided
  • CheckInternal is false but only one table provided
  • Missing required MainTable property
  • MatchGroup or MatchCondition configuration mismatch

Solution:

// For single-table matching (find duplicates within one table)
var matchParameter = new MatchParameter
{
    CheckInternal = true,  // Must be true for single table
    MainTable = "Customers",
    SearchDeep = 10
};

// Provide only ONE table
var result = api.MatchData(
    new List<TableParameter> { singleTable },
    matchParameter,
    fieldMappings,
    MatchFlowType.MixedFlow,
    cancellationToken
);

// For cross-table matching (match between two tables)
var crossMatchParameter = new MatchParameter
{
    CheckInternal = false,  // Must be false for cross-table
    MainTable = "NewCustomers",
    SearchDeep = 10
};

// Provide MULTIPLE tables
var crossResult = api.MatchData(
    new List<TableParameter> { table1, table2 },
    crossMatchParameter,
    fieldMappings,
    MatchFlowType.MixedFlow,
    cancellationToken
);
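
The common causes listed for this exception can also be checked before calling MatchData(). The following is a minimal sketch of such a pre-flight check. CheckMatchSetup is a hypothetical helper that works on plain values (the CheckInternal flag, the MainTable name, and the list of table names) and does not call the WinPure API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pre-flight check mirroring the causes listed for this exception.
static List<string> CheckMatchSetup(bool checkInternal, string mainTable, IReadOnlyList<string> tableNames)
{
    var problems = new List<string>();
    if (checkInternal && tableNames.Count != 1)
        problems.Add("CheckInternal is true, so exactly one table is expected.");
    if (!checkInternal && tableNames.Count < 2)
        problems.Add("CheckInternal is false, so at least two tables are expected.");
    if (string.IsNullOrEmpty(mainTable))
        problems.Add("MainTable is not set.");
    else if (!tableNames.Contains(mainTable, StringComparer.Ordinal))
        problems.Add($"MainTable '{mainTable}' does not match any table name (names are case-sensitive).");
    return problems;
}

// Example: cross-table flag with a single table and a wrongly cased MainTable.
var problemsFound = CheckMatchSetup(false, "customers", new[] { "Customers" });
foreach (var problem in problemsFound)
{
    Console.WriteLine(problem);
}
```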

WinPureAPINoTableException

Symptom: Exception indicating that a required table is missing or cannot be found.

Common Causes:

  • TableParameter.TableData is null
  • TableParameter.TableName doesn't match the name referenced in MatchField objects
  • Empty TableParameter list passed to MatchData()

Solution:

// Ensure table is properly initialized
var table = new TableParameter
{
    TableName = "Customers",  // This name must match MatchField references
    TableData = dataTable     // Must not be null
};

// Verify DataTable is populated
if (table.TableData == null || table.TableData.Rows.Count == 0)
{
    throw new InvalidOperationException("Cannot match empty table");
}

// Ensure MatchField references correct table name
var matchField = new MatchField
{
    TableName = "Customers",  // Must match TableParameter.TableName
    ColumnName = "Company Name",
    ColumnDataType = typeof(string).ToString()
};

WinPureAPINoFieldException

Symptom: Exception indicating a referenced field/column doesn't exist in the table.

Common Causes:

  • Column name in MatchField.ColumnName doesn't exist in the DataTable
  • Typo in column name (case-sensitive in some scenarios)
  • Column was removed or renamed during cleansing but MatchField still references old name

Solution:

// Always verify column exists before creating MatchField
var dataTable = GetYourData();

string columnName = "Company Name";
if (!dataTable.Columns.Contains(columnName))
{
    throw new InvalidOperationException($"Column '{columnName}' not found in table");
}

// Create MatchField only after verification
var matchField = new MatchField
{
    TableName = "Customers",
    ColumnName = columnName,  // Verified to exist
    ColumnDataType = dataTable.Columns[columnName].DataType.ToString()
};

// Debugging: Print all available column names
Console.WriteLine("Available columns:");
foreach (DataColumn col in dataTable.Columns)
{
    Console.WriteLine($"  - {col.ColumnName} ({col.DataType})");
}
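
One detail worth knowing when checking column names: DataColumnCollection.Contains performs a case-insensitive lookup, so it will not catch a name that differs only by case. If the matching engine treats column names as case-sensitive, an exact check is safer. The sketch below uses only System.Data, and HasColumnExact is a hypothetical helper.

```csharp
using System;
using System.Data;
using System.Linq;

var table = new DataTable("Customers");
table.Columns.Add("Company Name", typeof(string));

// Exact, case-sensitive check against the actual column names.
static bool HasColumnExact(DataTable t, string name) =>
    t.Columns.Cast<DataColumn>()
        .Any(c => string.Equals(c.ColumnName, name, StringComparison.Ordinal));

string candidate = "company name";  // wrong casing on purpose

bool looseMatch = table.Columns.Contains(candidate);   // case-insensitive lookup
bool exactMatch = HasColumnExact(table, candidate);    // case-sensitive comparison

Console.WriteLine($"Contains: {looseMatch}, exact: {exactMatch}");
```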

WinPureAPIWrongConditionException

Symptom: Exception thrown due to invalid match condition configuration.

Common Causes:

  • MatchCondition.Level is outside valid range (0.0 - 1.0)
  • MatchCondition.Weight is outside valid range (0.0 - 1.0)
  • MatchCondition.Fields list is empty
  • MatchGroup.Conditions list is empty

Solution:

// Ensure all condition values are in valid ranges
var matchCondition = new MatchCondition
{
    MatchingType = MatchType.Fuzzy,
    Level = 0.85,    // Must be between 0.0 and 1.0
    Weight = 1.0,    // Must be between 0.0 and 1.0
    IncludeEmpty = false,
    IncludeNullValues = false
};

// Must add at least one field
matchCondition.Fields.Add(new MatchField
{
    TableName = "Customers",
    ColumnName = "Company Name",
    ColumnDataType = typeof(string).ToString()
});

// Validate before using
if (matchCondition.Level < 0 || matchCondition.Level > 1)
{
    throw new ArgumentException("Match level must be between 0 and 1");
}

if (matchCondition.Fields.Count == 0)
{
    throw new InvalidOperationException("MatchCondition must have at least one field");
}

License Issues

"License File Not Found" or Registration Fails

Problem: api.Register(licenseFilePath) returns LicenseState.Invalid or throws exception.

Solutions:

  • Verify file path: Use absolute path, not relative path
  • Check file exists: Use File.Exists() to verify
  • Verify file extension: Must be .license
  • Check file permissions: Ensure application has read access

string licenseFilePath = @"C:\Path\To\WinPure.license";

// Verify file exists
if (!File.Exists(licenseFilePath))
{
    Console.WriteLine($"ERROR: License file not found at: {licenseFilePath}");
    Console.WriteLine($"Current directory: {Directory.GetCurrentDirectory()}");
    return;
}

// Verify file is readable
try
{
    using (var fs = File.OpenRead(licenseFilePath))
    {
        Console.WriteLine($"License file is readable, size: {fs.Length} bytes");
    }
}
catch (Exception ex)
{
    Console.WriteLine($"ERROR: Cannot read license file: {ex.Message}");
    return;
}

// Attempt registration
var licenseState = api.Register(licenseFilePath);
Console.WriteLine($"License state after registration: {licenseState}");

"License Expired" or LicenseState.LicenseExpire

Problem: CheckLicenseState() returns LicenseState.LicenseExpire.

Solutions:

  • Check license details: Use GetFullLicenseInfo() to see expiration date
  • Contact WinPure: Request license renewal
  • Update license file: Register new license file when received

var api = new WinPureApi();
var licenseState = api.CheckLicenseState();

if (licenseState == LicenseState.LicenseExpire)
{
    Console.WriteLine("Your license has expired.");

    // Get detailed license information
    var licenseInfo = api.GetFullLicenseInfo();
    foreach (var kvp in licenseInfo)
    {
        Console.WriteLine($"{kvp.Key}: {kvp.Value}");
    }

    Console.WriteLine("\nPlease contact WinPure to renew your license.");
    Console.WriteLine("Your registration code: " + api.GetRegistrationCode());
}

Demo Mode Limitations - Record Count Exceeded

Problem: The operation completes but results are truncated, or the operation stops unexpectedly.

Explanation: Demo mode has record limitations on matching operations.

Solutions:

  • Test with smaller datasets: Reduce data for evaluation purposes
  • Register license: Full license removes all limitations
  • Contact WinPure: Request evaluation license with higher limits

var api = new WinPureApi();
var licenseState = api.CheckLicenseState();

if (licenseState == LicenseState.Demo)
{
    Console.WriteLine("WARNING: Running in Demo mode with record limitations");
    Console.WriteLine("For full functionality, register a license");

    // Limit your test data accordingly
    int maxDemoRecords = 1000;  // Assumed conservative figure; confirm the actual demo limit with WinPure
    if (dataTable.Rows.Count > maxDemoRecords)
    {
        Console.WriteLine($"Trimming dataset from {dataTable.Rows.Count} to {maxDemoRecords} rows for demo");

        // Keep only first N rows
        for (int i = dataTable.Rows.Count - 1; i >= maxDemoRecords; i--)
        {
            dataTable.Rows.RemoveAt(i);
        }
    }
}

Data Issues

No Matches Found (Expected Duplicates)

Problem: MatchData() returns a table with no MatchGroupID values, but you know duplicates exist.

Common Causes:

  • Match confidence level (Level) is too high (too strict)
  • Group level is too high
  • Data hasn't been cleaned (extra spaces, special characters interfere)
  • Wrong algorithm selected for data type
  • IncludeEmpty or IncludeNullValues filtering out records

Solutions:

// 1. Lower confidence thresholds
var matchCondition = new MatchCondition
{
    Level = 0.75,  // Try lowering from 0.9 to 0.75
    MatchingType = MatchType.Fuzzy
};

var matchGroup = new MatchGroup
{
    GroupLevel = 0.75  // Lower this as well
};

// 2. Clean data before matching
var cleanSettings = new WinPureCleanSettings();
cleanSettings.TextCleanerSettings.Add(new TextCleanerSetting
{
    ColumnName = "Company Name",
    RemoveMultipleSpaces = true,
    RemoveNonPrintableCharacters = true,
    RemoveLeadingSpace = true,
    RemoveTrailingSpace = true
});

api.CleanTable(dataTable, cleanSettings, cancellationToken);

// 3. Try different algorithm
var matchParameter = new MatchParameter
{
    FuzzyAlgorithm = MatchAlgorithm.WinPureFuzzy  // Try WinPureFuzzy instead of Jaro
};

// 4. Include empty/null values if needed
matchCondition.IncludeEmpty = true;
matchCondition.IncludeNullValues = true;
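
To see why step 2 matters, the sketch below shows how stray whitespace and control characters make two visually identical values compare as different. NormalizeForMatching is a hypothetical helper written only for this illustration; it is not a WinPure API call and merely approximates what the TextCleanerSetting options above describe.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the WinPure API): removes non-whitespace
// control characters, collapses whitespace runs to one space, trims the ends.
static string NormalizeForMatching(string value)
{
    if (string.IsNullOrEmpty(value)) return string.Empty;

    // Drop control characters such as BEL, but keep tabs/newlines for now
    // so they can be collapsed together with ordinary spaces.
    string printable = new string(
        value.Where(c => !char.IsControl(c) || char.IsWhiteSpace(c)).ToArray());

    // Collapse any run of whitespace (spaces, tabs, newlines) to one space.
    return Regex.Replace(printable, @"\s+", " ").Trim();
}

string raw = "  Acme\u0007   Corp \t";
string cleaned = NormalizeForMatching(raw);

// The raw value differs from the clean one even though both display alike.
Console.WriteLine(raw == "Acme Corp");      // False
Console.WriteLine(cleaned == "Acme Corp");  // True
```

In practice, prefer CleanTable() with the settings shown above; this helper only makes the effect visible on a single value.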

Too Many False Positives

Problem: Records that aren't duplicates are being matched together.

Common Causes:

  • Match confidence level is too low (too lenient)
  • Matching on insufficient fields
  • Data has many similar but distinct values

Solutions:

// 1. Increase confidence thresholds
var matchCondition = new MatchCondition
{
    Level = 0.92,  // Increase from 0.85 to 0.92 for stricter matching
    MatchingType = MatchType.Fuzzy
};

// 2. Add additional match conditions (AND logic)
var matchGroup = new MatchGroup
{
    GroupLevel = 0.9
};

// Match on Company Name
var condition1 = new MatchCondition
{
    Level = 0.9,
    Weight = 1.0,
    MatchingType = MatchType.Fuzzy
};
condition1.Fields.Add(new MatchField
{
    ColumnName = "Company Name",
    TableName = "Customers",
    ColumnDataType = typeof(string).ToString()
});

// ALSO match on City (increases precision)
var condition2 = new MatchCondition
{
    Level = 0.85,
    Weight = 1.0,
    MatchingType = MatchType.Fuzzy
};
condition2.Fields.Add(new MatchField
{
    ColumnName = "City",
    TableName = "Customers",
    ColumnDataType = typeof(string).ToString()
});

matchGroup.Conditions.Add(condition1);
matchGroup.Conditions.Add(condition2);

// 3. Use DirectCompare for critical exact-match fields
var exactMatchCondition = new MatchCondition
{
    MatchingType = MatchType.DirectCompare,  // Exact match
    Level = 1.0,
    Weight = 1.0
};
exactMatchCondition.Fields.Add(new MatchField
{
    ColumnName = "AccountNumber",
    TableName = "Customers",
    ColumnDataType = typeof(string).ToString()
});

Performance Issues with Large Datasets

Problem: Matching operation is very slow or appears to hang.

Common Causes:

  • SearchDeep is too high (> 20)
  • Dataset is very large (> 500,000 records)
  • Using DirectFlow instead of MixedFlow
  • Matching on too many conditions simultaneously

Solutions:

// 1. Optimize SearchDeep parameter
var matchParameter = new MatchParameter
{
    SearchDeep = 10,  // Recommended value - don't go above 20
    FuzzyAlgorithm = MatchAlgorithm.WinPureFuzzy
};

// 2. Use MixedFlow for parallel processing
var result = api.MatchData(
    tables,
    matchParameter,
    fieldMappings,
    MatchFlowType.MixedFlow,  // Enables parallel processing
    cancellationToken
);

// 3. Process data in batches for very large datasets
// Note: each batch is matched independently, so duplicates that fall into
// different batches are not detected by this approach.
int batchSize = 50000;
var allResults = new List<DataTable>();

for (int i = 0; i < fullDataTable.Rows.Count; i += batchSize)
{
    // Clone() copies the schema only; rows are imported below
    var batchTable = fullDataTable.Clone();

    int rowsToTake = Math.Min(batchSize, fullDataTable.Rows.Count - i);
    for (int j = 0; j < rowsToTake; j++)
    {
        batchTable.ImportRow(fullDataTable.Rows[i + j]);
    }

    Console.WriteLine($"Processing batch {i / batchSize + 1}, rows {i} to {i + rowsToTake - 1}");

    var batchResult = api.MatchData(/* ... */);
    allResults.Add(batchResult);
}

// 4. Monitor progress
api.OnProgress += (description, percent) =>
{
    Console.WriteLine($"[{percent}%] {description}");
};

// 5. Reduce FieldMapping output columns
// Only include columns you actually need in results
var fields = new List<FieldMapping>
{
    new FieldMapping { FieldName = "Company Name" },
    new FieldMapping { FieldName = "City" }
    // Don't include unnecessary columns
};

Configuration Issues

FieldMapping and MatchField Confusion

Problem: It is unclear how FieldMapping differs from MatchField.

Explanation:

  • MatchField: Defines which fields to COMPARE during matching
  • FieldMapping: Defines which fields to INCLUDE in the output results

Best Practice:

// MatchField - used in MatchCondition to define comparison logic
var matchCondition = new MatchCondition
{
    Level = 0.85,
    MatchingType = MatchType.Fuzzy
};

// This MatchField says: "Compare Company Name column during matching"
matchCondition.Fields.Add(new MatchField
{
    TableName = "Customers",
    ColumnName = "Company Name",
    ColumnDataType = typeof(string).ToString()
});

// FieldMapping - defines output columns in result table
var fieldMappings = new List<FieldMapping>();

// This FieldMapping says: "Include Company Name in output results"
var mapping1 = new FieldMapping { FieldName = "Company Name" };
mapping1.FieldMap.Add(new MatchField
{
    TableName = "Customers",
    ColumnName = "Company Name",
    ColumnDataType = typeof(string).ToString()
});
fieldMappings.Add(mapping1);

// Include additional output columns
var mapping2 = new FieldMapping { FieldName = "Address" };
mapping2.FieldMap.Add(new MatchField
{
    TableName = "Customers",
    ColumnName = "Address",
    ColumnDataType = typeof(string).ToString()
});
fieldMappings.Add(mapping2);

// Result: Matching compares Company Name, but output includes both Company Name and Address

Cross-Table Matching Configuration

Problem: It is unclear how to set up matching between two different tables.

Solution - Complete Example:

// Two different tables
var newCustomers = LoadNewCustomerData();      // DataTable 1
var existingCustomers = LoadExistingData();    // DataTable 2

var table1 = new TableParameter
{
    TableName = "NewCustomers",
    TableData = newCustomers
};

var table2 = new TableParameter
{
    TableName = "ExistingCustomers",
    TableData = existingCustomers
};

// Match parameter - CheckInternal MUST be false
var matchParameter = new MatchParameter
{
    CheckInternal = false,  // FALSE = match BETWEEN tables
    MainTable = "NewCustomers",  // Which table is primary
    SearchDeep = 10,
    FuzzyAlgorithm = MatchAlgorithm.WinPureFuzzy
};

// Match condition - reference BOTH tables
var matchCondition = new MatchCondition
{
    Level = 0.85,
    Weight = 1.0,
    MatchingType = MatchType.Fuzzy
};

// Add field from FIRST table
matchCondition.Fields.Add(new MatchField
{
    TableName = "NewCustomers",
    ColumnName = "Company Name",
    ColumnDataType = typeof(string).ToString()
});

// Add corresponding field from SECOND table
matchCondition.Fields.Add(new MatchField
{
    TableName = "ExistingCustomers",
    ColumnName = "Company Name",
    ColumnDataType = typeof(string).ToString()
});

var matchGroup = new MatchGroup { GroupLevel = 0.85 };
matchGroup.Conditions.Add(matchCondition);
matchParameter.Groups.Add(matchGroup);

// Field mappings - map corresponding columns from both tables
var fieldMappings = new List<FieldMapping>();

var companyMapping = new FieldMapping { FieldName = "Company Name" };
companyMapping.FieldMap.Add(new MatchField
{
    TableName = "NewCustomers",
    ColumnName = "Company Name",
    ColumnDataType = typeof(string).ToString()
});
companyMapping.FieldMap.Add(new MatchField
{
    TableName = "ExistingCustomers",
    ColumnName = "Company Name",
    ColumnDataType = typeof(string).ToString()
});
fieldMappings.Add(companyMapping);

// Execute match
var result = api.MatchData(
    new List<TableParameter> { table1, table2 },  // Both tables
    matchParameter,
    fieldMappings,
    MatchFlowType.MixedFlow,
    cancellationToken
);

// Result contains records from BOTH tables
// Use "TableSource" column to identify which table each row came from
foreach (DataRow row in result.Rows)
{
    Console.WriteLine($"Source: {row["TableSource"]}, MatchGroupID: {row["MatchGroupID"]}");
}

Debugging Template

Use this template to diagnose matching issues:

using System;
using System.Data;
using System.Linq;  // Required for Count(), Where(), and Take() below
using WinPure.API.Core;
using WinPure.Matching.Models;

var api = new WinPureApi();

// 1. Check license
Console.WriteLine($"License State: {api.CheckLicenseState()}");

// 2. Verify data load
var dataTable = LoadYourData();
Console.WriteLine($"Data loaded: {dataTable.Rows.Count} rows, {dataTable.Columns.Count} columns");
Console.WriteLine("Columns:");
foreach (DataColumn col in dataTable.Columns)
{
    Console.WriteLine($"  - {col.ColumnName} ({col.DataType.Name})");
}

// 3. Check for null/empty values in key columns
var keyColumn = "Company Name";
int nullCount = dataTable.AsEnumerable().Count(row => row.IsNull(keyColumn));
int emptyCount = dataTable.AsEnumerable().Count(row =>
    !row.IsNull(keyColumn) && string.IsNullOrWhiteSpace(row[keyColumn].ToString()));
Console.WriteLine($"{keyColumn}: {nullCount} nulls, {emptyCount} empty");

// 4. Monitor progress
api.OnProgress += (desc, pct) => Console.WriteLine($"[{pct}%] {desc}");

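// 5. Try matching with verbose error handling
// (tables, matchParameter, fieldMappings, and cancellationToken are assumed
// to be configured as shown in the earlier examples)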
try
{
    Console.WriteLine("Starting match operation...");

    var result = api.MatchData(
        tables,
        matchParameter,
        fieldMappings,
        MatchFlowType.MixedFlow,
        cancellationToken
    );

    Console.WriteLine($"Match complete. Result: {result.Rows.Count} rows");

    // Check for matches
    int matchedCount = result.AsEnumerable()
        .Count(row => !row.IsNull("MatchGroupID"));
    Console.WriteLine($"Records with matches: {matchedCount}");

    // Show sample matches
    var sampleMatches = result.AsEnumerable()
        .Where(row => !row.IsNull("MatchGroupID"))
        .Take(5);

    Console.WriteLine("\nSample matches:");
    foreach (var row in sampleMatches)
    {
        Console.WriteLine($"  GroupID: {row["MatchGroupID"]}, " +
                         $"Score: {row["MatchScore"]}, " +
                         $"Name: {row["Company Name"]}");
    }
}
catch (Exception ex)
{
    Console.WriteLine($"ERROR: {ex.GetType().Name}");
    Console.WriteLine($"Message: {ex.Message}");
    if (ex.InnerException != null)
    {
        Console.WriteLine($"Inner: {ex.InnerException.Message}");
    }
    Console.WriteLine($"Stack: {ex.StackTrace}");
}
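
Once the template reports matches, it often helps to look at whole groups rather than individual rows. The following is a minimal sketch that summarizes a result table by MatchGroupID. The sample table stands in for a real MatchData() result; the column names follow this guide, while the int type of MatchGroupID and the sample values are assumptions made for illustration.

```csharp
using System;
using System.Data;
using System.Linq;

// Sample data standing in for a MatchData() result (assumed shape).
var result = new DataTable("Result");
result.Columns.Add("MatchGroupID", typeof(int));
result.Columns.Add("Company Name", typeof(string));
result.Rows.Add(1, "Acme Corp");
result.Rows.Add(1, "ACME Corporation");
result.Rows.Add(2, "Globex");
result.Rows.Add(DBNull.Value, "Initech");   // record without a match group

// Group matched rows by MatchGroupID and list the largest groups first.
var groups = result.AsEnumerable()
    .Where(row => !row.IsNull("MatchGroupID"))
    .GroupBy(row => row.Field<int>("MatchGroupID"))
    .Select(g => new
    {
        GroupId = g.Key,
        Size = g.Count(),
        Names = g.Select(r => r.Field<string>("Company Name")).ToList()
    })
    .OrderByDescending(g => g.Size)
    .ToList();

foreach (var g in groups)
{
    Console.WriteLine($"Group {g.GroupId}: {g.Size} record(s): {string.Join(" | ", g.Names)}");
}
```

Unexpectedly large groups usually point to thresholds that are too lenient, while many single-record groups suggest the opposite.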

Frequently Asked Questions (FAQ)

Licensing & Pricing

Q: How much does the WinPure API license cost?

A: Pricing information is available directly from WinPure. Contact WinPure sales with your registration code to receive a customized quote based on your requirements, usage volume, and deployment scenario.

Q: Can I use the API without a license?

A: Yes, the API operates in Demo mode without a license, which allows you to evaluate all features with record limitations on matching operations. Demo mode is suitable for development, testing, and proof-of-concept work.

Q: Is the license tied to my hardware?

A: Yes, licenses are hardware-bound based on your machine's unique registration code (derived from CPU, motherboard, and network adapter identifiers). If you need to move the API to different hardware, contact WinPure to transfer your license.

Q: What happens when my license expires?

A: When a license expires, CheckLicenseState() will return LicenseState.LicenseExpire. You'll need to renew your license with WinPure and register the new license file to continue using the API.

Q: Can I use one license on multiple servers?

A: No, each license is tied to specific hardware. For multi-server deployments, you'll need separate licenses for each server. Contact WinPure sales for volume licensing options.

Algorithm Selection

Q: Which matching algorithm should I use?

A: For most business applications, use MatchAlgorithm.WinPureFuzzy. It's specifically optimized for:

  • Company names (handles "Acme Corp" vs "ACME Corporation")
  • Person names (emphasizes first/last name matching)
  • Street addresses (prioritizes street number and name)

For general-purpose matching where the beginning of strings isn't critical, use MatchAlgorithm.Jaro. Use MatchAlgorithm.JaroWinkler when a shared prefix should raise the score.

Q: What's the difference between Jaro and JaroWinkler?

A:

  • Jaro: Balanced similarity metric that treats all parts of the string equally
  • JaroWinkler: Extends Jaro with a prefix bonus - strings that match at the beginning get higher scores

Use JaroWinkler when the beginning of strings is more important (e.g., surnames, company names).
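
To make the prefix bonus concrete, here is a minimal sketch of the textbook Jaro and Jaro-Winkler definitions. It is written for illustration only and is not WinPure's internal implementation, which may differ in normalization, scaling, or tie-breaking. The prefix scale of 0.1 and the four-character prefix cap are the conventional defaults from the published algorithm.

```csharp
using System;

// Textbook Jaro similarity: counts characters that match within a sliding
// window and penalizes matched characters that appear out of order.
static double Jaro(string a, string b)
{
    if (a.Length == 0 && b.Length == 0) return 1.0;
    if (a.Length == 0 || b.Length == 0) return 0.0;

    int window = Math.Max(0, Math.Max(a.Length, b.Length) / 2 - 1);
    var aMatched = new bool[a.Length];
    var bMatched = new bool[b.Length];
    int matches = 0;

    for (int i = 0; i < a.Length; i++)
    {
        int start = Math.Max(0, i - window);
        int end = Math.Min(b.Length - 1, i + window);
        for (int j = start; j <= end; j++)
        {
            if (bMatched[j] || a[i] != b[j]) continue;
            aMatched[i] = true;
            bMatched[j] = true;
            matches++;
            break;
        }
    }
    if (matches == 0) return 0.0;

    // Walk both strings' matched characters in order; mismatches are
    // half-transpositions.
    int outOfOrder = 0;
    int k = 0;
    for (int i = 0; i < a.Length; i++)
    {
        if (!aMatched[i]) continue;
        while (!bMatched[k]) k++;
        if (a[i] != b[k]) outOfOrder++;
        k++;
    }

    double m = matches;
    double t = outOfOrder / 2.0;
    return (m / a.Length + m / b.Length + (m - t) / m) / 3.0;
}

// Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
static double JaroWinkler(string a, string b, double prefixScale = 0.1)
{
    double baseScore = Jaro(a, b);
    int maxPrefix = Math.Min(4, Math.Min(a.Length, b.Length));
    int prefix = 0;
    while (prefix < maxPrefix && a[prefix] == b[prefix]) prefix++;
    return baseScore + prefix * prefixScale * (1.0 - baseScore);
}

double jaroScore = Jaro("MARTHA", "MARHTA");
double jwScore = JaroWinkler("MARTHA", "MARHTA");

// Jaro is about 0.944; the shared "MAR" prefix lifts Jaro-Winkler to about 0.961.
Console.WriteLine($"Jaro: {jaroScore:F3}, Jaro-Winkler: {jwScore:F3}");
```

Because the bonus only ever increases the score, a Level threshold tuned for Jaro may admit more pairs when the algorithm is switched to JaroWinkler.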

Q: When should I use DirectCompare instead of Fuzzy matching?

A: Use MatchType.DirectCompare (exact matching) for:

  • Unique identifiers (Account Numbers, Order IDs)
  • Standardized codes (State codes "CA", "NY")
  • Dates and timestamps
  • Any field where even small differences indicate different entities

Combine DirectCompare with Fuzzy matching in multiple conditions for best results.

Performance

Q: How many records can the API handle?

A: The API can process millions of records. Practical limits depend on available memory and time constraints:

  • Up to 100,000 records: Process in single operation
  • 100,000 - 1,000,000 records: Consider batching in 50,000-100,000 record chunks
  • Over 1,000,000 records: Use batching and consider database-resident processing
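
The batching guidance above can be sketched as a small reusable helper. SplitIntoBatches is a hypothetical name introduced for this example; it relies only on standard DataTable members (Clone and ImportRow) and does not call the WinPure API. Keep in mind that records placed in different batches are never compared with each other, so batching trades some recall for lower memory use.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: yields consecutive slices of a DataTable, each with
// the same schema as the source and at most batchSize rows.
static IEnumerable<DataTable> SplitIntoBatches(DataTable source, int batchSize)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

    for (int start = 0; start < source.Rows.Count; start += batchSize)
    {
        DataTable batch = source.Clone();   // schema only, no rows
        int end = Math.Min(start + batchSize, source.Rows.Count);
        for (int i = start; i < end; i++)
        {
            batch.ImportRow(source.Rows[i]);
        }
        yield return batch;
    }
}

// Small demonstration table; real batches would use sizes such as 50,000.
var demo = new DataTable("Customers");
demo.Columns.Add("Company Name", typeof(string));
for (int n = 1; n <= 5; n++)
{
    demo.Rows.Add($"Company {n}");
}

var batches = new List<DataTable>(SplitIntoBatches(demo, 2));
Console.WriteLine($"{batches.Count} batches; last batch has {batches[batches.Count - 1].Rows.Count} row(s)");
```

Each yielded table can then be wrapped in a TableParameter and passed to MatchData() as in the earlier examples.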

Q: What's the difference between MixedFlow and DirectFlow?

A:

  • MixedFlow (recommended): Parallel processing, faster performance, suitable for most scenarios
  • DirectFlow: Sequential processing, slower but more deterministic, useful for debugging

Always use MatchFlowType.MixedFlow in production for better performance.

Q: How can I speed up matching operations?

A: Best practices for performance:

  1. Use MatchFlowType.MixedFlow
  2. Keep SearchDeep at 10 (don't exceed 20)
  3. Clean data before matching (reduces comparison complexity)
  4. Minimize the number of FieldMapping entries (only include needed columns)
  5. Process data in batches for very large datasets
  6. Use IncludeEmpty = false and IncludeNullValues = false to skip empty records

Q: Why is SearchDeep set to 10? What does it do?

A: SearchDeep controls matching thoroughness:

  • Lower values (1-5): Faster but may miss some duplicates
  • 10 (recommended): Optimal balance between speed and accuracy
  • Higher values (15-100): More thorough but significantly slower

In most real-world datasets, values above 10 provide diminishing returns.

Integration

Q: Can I use the API with Entity Framework?

A: Yes, convert your Entity Framework query results to DataTable:

// Query with Entity Framework
var customers = dbContext.Customers.Where(c => c.IsActive).ToList();

// Convert to DataTable
var dataTable = new DataTable("Customers");
dataTable.Columns.Add("Company Name", typeof(string));
dataTable.Columns.Add("City", typeof(string));

foreach (var customer in customers)
{
    dataTable.Rows.Add(customer.CompanyName, customer.City);
}

// Use with WinPure API
var result = api.MatchData(/* ... */);
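
When an entity has many properties, writing the column list by hand becomes tedious. The sketch below shows one possible generic conversion based on reflection. ToDataTable is a hypothetical helper, not part of the WinPure API or Entity Framework, and it is best suited to flat projections of scalar values. Note that column names are taken from property names (for example CompanyName), so the ColumnName values in MatchField and FieldMapping must use the same names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

// Hypothetical helper: builds a DataTable with one column per public
// instance property of T and one row per item.
static DataTable ToDataTable<T>(IEnumerable<T> items, string tableName)
{
    var table = new DataTable(tableName);
    PropertyInfo[] props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);

    foreach (PropertyInfo p in props)
    {
        // DataTable columns cannot be Nullable<T>; use the underlying type.
        Type columnType = Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType;
        table.Columns.Add(p.Name, columnType);
    }

    foreach (T item in items)
    {
        var values = new object[props.Length];
        for (int i = 0; i < props.Length; i++)
        {
            values[i] = props[i].GetValue(item) ?? DBNull.Value;
        }
        table.Rows.Add(values);
    }
    return table;
}

// Stand-in for an Entity Framework projection such as
// dbContext.Customers.Select(c => new { c.CompanyName, c.City }).ToList()
var customers = new[]
{
    new { CompanyName = "Acme Corp", City = "Springfield" },
    new { CompanyName = "Globex", City = "Shelbyville" }
};

DataTable dataTable = ToDataTable(customers, "Customers");
Console.WriteLine($"{dataTable.Rows.Count} rows, {dataTable.Columns.Count} columns");
```

Projecting to only the needed scalar properties before converting keeps navigation properties out of the table and reduces memory use.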

Q: Can I use the API in a web application (ASP.NET)?

A: Yes, the API works in ASP.NET applications. Recommendations:

  • Initialize WinPureApi as a singleton service
  • Use background tasks or queues for long-running operations
  • Implement proper cancellation token handling
  • Consider response timeout limits (matching can take minutes for large datasets)

// Register in Startup.cs; with minimal hosting in Program.cs, call builder.Services.AddSingleton<WinPureApi>() instead
public void ConfigureServices(IServiceCollection services)
{
    services.AddSingleton<WinPureApi>();
}
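
The cancellation and timeout recommendations above can be sketched with standard .NET types. In this example SimulatedLongOperation is a stand-in for a long-running call such as MatchData(); it only sleeps and checks the token, and the 200 ms timeout is an arbitrary value chosen so that the demonstration finishes quickly.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for a long-running API call: simulates about one second of work
// and observes the cancellation token between steps.
static int SimulatedLongOperation(CancellationToken token)
{
    int processed = 0;
    for (int i = 0; i < 50; i++)
    {
        token.ThrowIfCancellationRequested();
        Thread.Sleep(20);
        processed++;
    }
    return processed;
}

// Cancel automatically once the time budget is used up.
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));

string outcome;
try
{
    // Run on a worker thread so the calling (request) thread is not blocked.
    int done = await Task.Run(() => SimulatedLongOperation(cts.Token), cts.Token);
    outcome = $"completed ({done} steps)";
}
catch (OperationCanceledException)
{
    outcome = "cancelled";
}

Console.WriteLine(outcome);
```

In a web application the same token can be linked to the request's abort token, or the work can be moved to a background queue so that the HTTP response does not wait for the match to finish.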

Q: Can I load data from SQL Server or other databases?

A: Yes, use ADO.NET to load data into a DataTable:

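using System.Data;
using System.Data.SqlClient;  // On .NET Core/.NET 5+ this comes from a NuGet package; Microsoft.Data.SqlClient is the maintained successor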

var connectionString = "YourConnectionString";
var dataTable = new DataTable();

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    var command = new SqlCommand("SELECT * FROM Customers", connection);
    using (var adapter = new SqlDataAdapter(command))
    {
        adapter.Fill(dataTable);
    }
}

// Now use dataTable with WinPure API

Q: Can I use the API with CSV files?

A: Yes, read CSV into DataTable first:

using CsvHelper;
using System.Data;
using System.Globalization;
using System.IO;

var dataTable = new DataTable();
using (var reader = new StreamReader("customers.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    using (var dr = new CsvDataReader(csv))
    {
        dataTable.Load(dr);
    }
}

// Now use dataTable with WinPure API

Data Format

Q: Do I need to clean data before matching?

A: While not required, cleaning data before matching significantly improves accuracy:

  • Removes extra spaces, special characters, and inconsistent formatting
  • Standardizes text casing
  • Replaces abbreviations with full forms

Best practice: Always run CleanTable() before MatchData().

Q: What data types are supported for matching?

A: The API primarily matches string (text) data. Numeric and date fields can be compared using MatchType.DirectCompare for exact matching. For fuzzy matching, data should be strings or convertible to strings.

Q: How do I handle missing or null values?

A: Use the IncludeEmpty and IncludeNullValues properties:

var matchCondition = new MatchCondition
{
    IncludeEmpty = false,        // Skip empty strings
    IncludeNullValues = false,   // Skip NULL values
    // ...
};

// Or include them if they're meaningful
matchCondition.IncludeEmpty = true;
matchCondition.IncludeNullValues = true;

Q: Can I match on multiple fields at once?

A: Yes, create multiple MatchCondition objects within a MatchGroup:

var matchGroup = new MatchGroup
{
    GroupLevel = 0.85
};

// Condition 1: Match on Company Name
var condition1 = new MatchCondition { Level = 0.9 };
condition1.Fields.Add(new MatchField { ColumnName = "Company Name" });

// Condition 2: Match on City
var condition2 = new MatchCondition { Level = 0.85 };
condition2.Fields.Add(new MatchField { ColumnName = "City" });

matchGroup.Conditions.Add(condition1);
matchGroup.Conditions.Add(condition2);

// Both conditions must pass for records to match (AND logic)

Comparison

Q: How does WinPure compare to other deduplication tools?

A: Key differentiators:

  • On-premises: All processing happens locally (no cloud dependency, ideal for compliance)
  • Embeddable: Integrate into your existing applications via API
  • No per-record pricing: License-based, not consumption-based
  • Multiple algorithms: Choose the best matching algorithm for your data
  • .NET native: First-class .NET integration, no external dependencies

Q: When should I use the WinPure API vs the desktop application?

A:

  • Use API: Automated workflows, custom applications, integration with existing systems, scheduled processing
  • Use Desktop: Ad-hoc analysis, one-time cleansing, visual data exploration, learning the tool

Many users start with desktop to learn the concepts, then move to API for production automation.

Support & Updates

Q: How do I get support?

A:

  • Documentation: This API documentation and included sample project
  • Technical Support: Contact WinPure support (available to licensed users)
  • Website: Visit https://winpure.com for resources

Q: How do I update to a newer version?

A: Update via NuGet Package Manager:

dotnet add package WinPure.DataMatching.API.Core --version [new-version]

Or in Visual Studio: Right-click project → Manage NuGet Packages → Update WinPure.DataMatching.API.Core

Always check the Migration Guide for any breaking changes.

Q: Are there code examples available?

A: Yes:

  • Sample Project: Included with the NuGet package (WinPure.ApiSample)
  • Documentation: See the Code Examples section
  • This FAQ: Contains many code snippets

Glossary

A

ADO.NET DataTable
A core .NET class representing an in-memory table of data. The WinPure API processes DataTable objects, allowing integration with any data source that supports ADO.NET.

Algorithm
A computational procedure for comparing strings or records. The WinPure API provides multiple fuzzy matching algorithms including Jaro, JaroWinkler, WinPureFuzzy, ChapmanLengthDeviation, and SmithWatermanGotoh.

C

CancellationToken
A .NET mechanism for cooperative cancellation of long-running operations. Pass a CancellationToken to API methods to enable cancellation of matching or cleansing operations.

Case Converter
A cleansing operation that standardizes text casing (upper, lower, proper case). See CaseConverterSetting in API Reference.

ChapmanLengthDeviation
A fuzzy matching algorithm that emphasizes string length similarity. Useful when matching records of similar size is important. One of the MatchAlgorithm values.

CheckInternal
A critical MatchParameter property that determines matching behavior:
  • true = Find duplicates within a single table (self-matching)
  • false = Match records between different tables (cross-table matching)

Cleansing
The process of standardizing and correcting data using operations like text cleaning, case conversion, column splitting/merging, and word replacement.

Column Merge
A cleansing operation that combines multiple columns into one field. Example: merging FirstName and LastName into FullName.

Column Split
A cleansing operation that divides a single column into multiple fields. Example: splitting "New York, NY" into City and State.

Confidence Level
A numeric threshold (0.0 to 1.0) that determines how similar two records must be to be considered a match. Higher values require stricter similarity. See MatchCondition.Level.

Cross-Table Matching
Matching records between two or more different tables. Requires CheckInternal = false and multiple TableParameter objects.

D

DataField
Metadata describing a column in your dataset, used for profiling operations. Contains properties like DatabaseName, DisplayName, FieldType, and Pattern.

Deduplication
The process of identifying and consolidating duplicate records in a dataset. A primary use case for the WinPure API.

Demo Mode
Unlicensed operation mode with record limitations on matching operations. Suitable for evaluation and testing. Check with CheckLicenseState().

DirectCompare
Exact matching mode (MatchType.DirectCompare) that requires perfect equality. Used for IDs, codes, and other fields where even small differences are significant.

DirectFlow
Sequential processing mode (MatchFlowType.DirectFlow). Slower than MixedFlow but more deterministic. Useful for debugging.

E

Entity Resolution
The process of determining when multiple records refer to the same real-world entity, then consolidating them into a single master record.

F

FieldMapping
Defines which columns to include in the match result output and how to map corresponding fields from different tables.

Fuzzy Matching
Approximate string matching that identifies similar (not necessarily identical) records. Controlled by MatchType.Fuzzy and confidence thresholds.

G

GroupLevel
A threshold in MatchGroup that determines the minimum similarity score for records to be grouped together as duplicates.

H

Hardware-Bound License
A license tied to specific hardware characteristics (CPU, motherboard, MAC address). Prevents unauthorized license sharing across different machines.

I

IncludeEmpty
A MatchCondition property that determines whether empty strings should be considered during matching.

IncludeNullValues
A MatchCondition property that determines whether NULL database values should be considered during matching.

IsMaster
A boolean column added to match results after calling DefineMasterRecord(). Indicates which record in each duplicate group is the master.

J

Jaro
A fuzzy matching algorithm that provides balanced similarity measurement treating all parts of a string equally. One of the MatchAlgorithm values.

JaroWinkler
An extension of Jaro that gives additional weight to strings that match at the beginning. Good for names and addresses. One of the MatchAlgorithm values.

L

Level
The similarity threshold (0.0 to 1.0) in a MatchCondition. Records must meet or exceed this value to be considered a match.

License State
The current status of your API license. Values include Valid, Demo, LicenseExpire, DemoExpire, Invalid, and Free. Check with CheckLicenseState().

M

MainTable
A MatchParameter property that specifies the primary table name for matching operations.

Master Record
The "best" or most authoritative record selected from a group of duplicates. Defined using DefineMasterRecord() with business rules.

MasterRecordRule
A rule for selecting the master record from duplicates. Types include IsLongest, IsMaximum, IsEmpty, IsEqual, and Common.

Match Algorithm
The computational method used for fuzzy string comparison. Options: Jaro, JaroWinkler, WinPureFuzzy, ChapmanLengthDeviation, SmithWatermanGotoh.

Match Condition
A single comparison rule within a MatchGroup. Defines which fields to compare, the matching type (fuzzy or exact), and the confidence threshold.

Match Group
A collection of MatchCondition objects that work together to identify duplicates. Multiple conditions in a group use AND logic.

MatchGroupID
A column added to match results that groups duplicate records together. All records with the same MatchGroupID are considered duplicates.

Match Parameter
The main configuration object for matching operations. Contains algorithm selection, CheckInternal setting, SearchDeep, and MatchGroup collections.

MatchScore
A numeric value (0.0 to 1.0) in match results indicating how similar two records are. Higher scores mean greater similarity.

MatchType
Determines comparison method: DirectCompare (exact) or Fuzzy (approximate). Set in MatchCondition.MatchingType.

Merge
The process of combining information from duplicate records into master records. Controlled by MergeMatchResult() method.

MixedFlow
Parallel processing mode (MatchFlowType.MixedFlow) that offers better performance. Recommended for production use.

O

OnProgress
An event in WinPureApi that reports operation progress. Subscribe to receive periodic updates during long-running operations.

P

Profiling
Statistical analysis of data quality. The CalculateStatistic() method generates metrics like empty count, unique values, and data patterns.

Proper Case
Text casing where the first letter of each word is capitalized (Title Case). Example: "john smith" becomes "John Smith".

R

Registration Code
A hardware-specific code generated by GetRegistrationCode(). Provide this to WinPure to receive your license file.

S

SearchDeep
A MatchParameter property (1-100) controlling matching thoroughness. Recommended value: 10. Higher values are more thorough but slower.

Self-Matching
Finding duplicates within a single table. Requires CheckInternal = true and one TableParameter.

SmithWatermanGotoh
A local sequence alignment algorithm useful for detailed pattern matching. One of the MatchAlgorithm values. More commonly used in bioinformatics.

T

TableParameter
A wrapper object containing a DataTable and associated TableName, used as input to matching operations.

TableSource
A column added to match results in cross-table matching that identifies which source table each record came from.

Text Cleaner
A cleansing operation that removes, replaces, or normalizes characters and text patterns. See TextCleanerSetting.

W

Weight
A value in MatchCondition that determines the relative importance of a condition when multiple conditions exist. Currently used for internal scoring.

WinPureApi
The main API class providing all data cleansing, matching, and profiling functionality. Initialize with new WinPureApi().

WinPureCleanSettings
The main configuration object for cleansing operations. Contains collections for TextCleanerSettings, CaseConverterSettings, ColumnSplitSettings, ColumnMergeSettings, and WordManagerSettings.

WinPureFuzzy
WinPure's proprietary fuzzy matching algorithm optimized for business data. Emphasizes matching at the beginning of strings. Recommended for company names, person names, and addresses. One of the MatchAlgorithm values.

Word Manager
A cleansing operation that finds and replaces specific words or phrases with standardized terms. Example: replacing "Corp" with "Corporation".


Support & Resources

Getting Help

  • Documentation: This document
  • Sample Project: Included with the API package (WinPure.ApiSample)
  • Technical Support: Contact WinPure support team
  • Website: https://winpure.com

Licensing & Sales

To obtain a license or request a demo:

  1. Initialize the API and get your registration code
  2. Contact WinPure with your registration code
  3. Receive your .license file
  4. Register using api.Register("path/to/file.license")

Sample Project

The WinPure.ApiSample project (included) demonstrates:

  • Data cleansing workflow
  • Single-table matching
  • Cross-table matching
  • Fuzzy search
  • Configuration loading from JSON
  • Best practices