Pattern Manager
Pattern Manager is a smart feature within our software that allows users to identify, validate, and manage data formats across any dataset using prebuilt or custom patterns. It makes pattern recognition, data profiling, and correction faster and more accurate, and it scales to keep formats consistent across millions of records.
Key Features:
- 20+ Inbuilt Pattern Types
  Instantly map data columns to common data patterns such as:
  - US ZIP Code
  - Email Address
  - UK Postcode
  - Phone Number (various formats)
  - Credit Card Numbers
  - National IDs
  - ISO Date/Time formats
  …and many more.
- Custom Patterns Using Regular Expressions
  Create your own patterns using regular expressions (regex) for company-specific or industry-specific formats, then save and reuse them across projects (see the sketch after this list).
- Automatic Validation
  Pattern Manager automatically scans a mapped column and reports:
  - How many values match the selected pattern (valid)
  - How many values don't match (invalid)
  - A breakdown of the pattern match percentage
- Statistics Module Integration
  Mapped patterns are visually highlighted in the data profiling/statistics module. You'll get:
  - A visual cue showing which columns have been mapped
  - A summary of valid vs. invalid values
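Under the hood, a custom pattern is essentially a named regular expression, and validation amounts to testing each column value against it. The Python sketch below illustrates that idea only; the "Order ID" pattern, the validate_column helper, and the sample values are hypothetical, not Pattern Manager's actual API.

```python
import re

# Hypothetical company-specific format: order IDs like "ORD-2024-00153".
CUSTOM_PATTERNS = {
    "Order ID": r"^ORD-\d{4}-\d{5}$",
}

def validate_column(values, regex_text):
    """Return valid count, invalid count, and match percentage for a column."""
    regex = re.compile(regex_text)
    valid = sum(1 for v in values if regex.fullmatch(v.strip()))
    total = len(values)
    pct = round(100 * valid / total, 1) if total else 0.0
    return valid, total - valid, pct

values = ["ORD-2024-00153", "ORD-2024-7", "not an id"]
valid, invalid, pct = validate_column(values, CUSTOM_PATTERNS["Order ID"])
print(f"valid={valid}, invalid={invalid}, match={pct}%")
# valid=1, invalid=2, match=33.3%
```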
Example Use Case: Validating US ZIP Codes
Step 1: Map a Pattern
You have a dataset with a column called PostalCode. You select the "US ZIP Code" pattern from the Pattern Manager's library.
Step 2: Automatic Validation
Pattern Manager scans the column and finds:
- ✅ 1,250 valid ZIP Codes (e.g., 90210, 10001)
- ❌ 150 invalid entries (e.g., XYZ123, 1234, blank cells)
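To see why each entry passes or fails, here is a minimal Python check using a common US ZIP regex (five digits, with an optional four-digit extension); this is a standard approximation, not necessarily the exact rule Pattern Manager applies:

```python
import re

# A common approximation of a US ZIP Code: 5 digits, optional -NNNN extension.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

for value in ["90210", "10001", "XYZ123", "1234", ""]:
    status = "valid" if ZIP_RE.fullmatch(value) else "invalid"
    print(f"{value!r}: {status}")
# '90210': valid
# '10001': valid
# 'XYZ123': invalid  (non-numeric characters)
# '1234': invalid    (only four digits)
# '': invalid        (blank cell)
```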
Step 3: View Statistics
In the statistics module, the PostalCode column is now marked as "Pattern Mapped: US ZIP Code". You see a summary:
- 1,250 valid values
- 150 invalid values
Step 4: Filter & Fix
You can click to view only the invalid values, then:
- Export them for correction (a sketch of this step follows the list)
- Use Word Manager to bulk-change the values
- Manually clean the data via the DATA module
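Outside the tool, the same filter-and-export step can be reproduced in a few lines. A minimal sketch, assuming the dataset is a CSV file with a PostalCode column; the file names are illustrative:

```python
import csv
import re

# Same illustrative ZIP regex as above.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")

# Read the source file and keep only rows whose PostalCode fails the pattern.
with open("customers.csv", newline="") as src:
    invalid_rows = [row for row in csv.DictReader(src)
                    if not ZIP_RE.fullmatch((row.get("PostalCode") or "").strip())]

# Write the invalid rows to a separate file for correction.
if invalid_rows:
    with open("invalid_postal_codes.csv", "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=invalid_rows[0].keys())
        writer.writeheader()
        writer.writerows(invalid_rows)
print(f"Exported {len(invalid_rows)} invalid rows")
```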
Why It Matters
Data integrity starts with structure. By ensuring that data values conform to expected formats, you reduce the risk of downstream errors, improve data quality scores, and build more reliable datasets.