If you’re thinking about how to test data quality in your organization, run your critical data inputs through our seven-step checklist. Doing so could save you money: poor data quality costs the U.S. economy $3.1 trillion per year. To avoid those costs, you need to know how to test data quality and fix critical issues.

Getting started with data quality testing does not need to be overwhelming, even when you have “too much inconsistent data” and limited resources. This post breaks the work down, helps you identify data quality issues, and gives you a handy checklist.

Read further to identify:

  • What is Data Quality?
  • How Do You Identify Data Quality Issues?
  • How to Test Data Quality: A Checklist

WHAT IS DATA QUALITY?

Data quality describes how usable information is, judged by tangible metrics and subjective feedback. A company with good data quality has met the minimum threshold needed to deliver business insights and streamline operations.

Organizations that know how to test data quality evaluate it through six data quality metrics: data completeness, data accuracy, timeliness, uniqueness, consistency, and validity. Combining these attributes with business goals and context tells you which types of data quality checks you need to identify issues.
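To make the metrics concrete, here is a minimal Python sketch that scores four of the six (completeness, uniqueness, validity, and timeliness) on a tiny sample. The records, field names, email pattern, and 30-day freshness window are all assumptions made up for illustration. Accuracy and consistency are left out because they need a trusted reference source or a second data set to compare against.

```python
import re
from datetime import date

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "email": "ann@example.com", "updated": date(2021, 5, 1)},
    {"id": 2, "email": "", "updated": date(2021, 4, 30)},
    {"id": 2, "email": "bob@example", "updated": date(2019, 1, 15)},
    {"id": 4, "email": "cy@example.com", "updated": date(2021, 5, 3)},
]

# A deliberately simple pattern (an assumption, not a full email validator).
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, field):
    """Share of distinct values for the field across all rows."""
    return len({r[field] for r in rows}) / len(rows)


def validity(rows, field, pattern):
    """Share of non-empty values that fully match the pattern."""
    values = [r[field] for r in rows if r.get(field)]
    if not values:
        return 1.0  # nothing to validate
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for r in rows if (as_of - r[field]).days <= max_age_days)
    return fresh / len(rows)


scores = {
    "completeness(email)": completeness(records, "email"),
    "uniqueness(id)": uniqueness(records, "id"),
    "validity(email)": validity(records, "email", EMAIL_PATTERN),
    "timeliness(updated)": timeliness(records, "updated", date(2021, 5, 6), 30),
}
```

Each score is a ratio between 0 and 1, which makes it easy to compare against a threshold that the business agrees on.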

HOW DO YOU IDENTIFY DATA QUALITY ISSUES?

You identify data quality issues by doing data quality testing on the most critical data and ensuring that you can repeat the problem.

To identify the more central data issues, your data quality testing needs coverage and reproducibility. Remember that an I.T. or other technology department mainly handles just the infrastructure.

First of all, you must know why your business needs the level of data quality control it does to function. To find out, you need a planned data management strategy, and you especially need to listen to business concerns about data quality.

Coverage

You will need to be selective about your tests and which data inputs you cover. Given the number of data sources, the volumes of data, and the speed of data capture, you will simply not have enough time to test all possible data sets and paths. Trying to do so will leave you overwhelmed.

Instead, identify the critical data inputs for data quality testing: the information that is vital for business operations and decision-making. Testing these inputs protects the data your business needs to survive and thrive.

Reproducibility

Finding a data quality mistake can be alarming. Before trying to fix the issue, you need to understand whether you or anyone else can reproduce the problem.

Sometimes a data quality issue happens only once and cannot be reproduced with the same steps. For example, a database holding customer contact information may lose its network connection for a moment and then come back a couple of minutes later with no further problem. In that case, you do not have a data quality issue with the contact information.
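One way to separate a one-off glitch from a repeatable problem is to rerun the same check several times. The sketch below assumes a hypothetical `check` callable that returns True when the data passes. It treats an issue as reproducible only if the check fails on every attempt. The attempt count of three is an arbitrary choice.

```python
def is_reproducible(check, attempts=3):
    """Return True only if the check fails on every one of the attempts.

    `check` is a hypothetical callable that returns True when the data passes.
    """
    return all(not check() for _ in range(attempts))


# Simulated transient failure: fails once, then passes (like a brief network drop).
transient_outcomes = iter([False, True, True])
transient_issue = is_reproducible(lambda: next(transient_outcomes))

# Simulated persistent failure: fails every time it runs.
persistent_issue = is_reproducible(lambda: False)
```

Here `transient_issue` is False and `persistent_issue` is True, so only the second case would go on your list of data quality issues to fix.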



HOW TO TEST DATA QUALITY: A CHECKLIST

Now that you know how to identify data quality issues, use this checklist to plan and execute your data quality checking.

#1 List your Data Quality Requirements and Business Cases

The first step is to combine your data strategy and your business discussions into specific business use cases and requirements. You get there by centering these examples on data quality dimensions and their key performance indicators (KPIs). Make your requirements SMART (specific, measurable, achievable, relevant, and time-bound).
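A requirement becomes easier to test once you write it down as data. The Python sketch below shows one possible shape for a SMART requirement. The class name, fields, the 98% target, and the due date are hypothetical examples, not values from a real project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QualityRequirement:
    """One measurable, time-bound data quality requirement (illustrative shape)."""

    name: str          # specific: what the requirement covers
    dimension: str     # which data quality dimension it belongs to
    target: float      # measurable: minimum acceptable KPI value, 0.0 to 1.0
    due: date          # time-bound: when the target must be reached
    business_case: str  # relevant: why the business needs it

    def is_met(self, measured: float) -> bool:
        """Compare a measured KPI value against the target."""
        return measured >= self.target


# Hypothetical example requirement.
email_completeness = QualityRequirement(
    name="Customer email is filled in",
    dimension="completeness",
    target=0.98,
    due=date(2021, 9, 30),
    business_case="Order confirmations are sent by email",
)
```

Calling `email_completeness.is_met(0.95)` returns False, which tells you the measured value still falls short of the agreed target.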

#2 Prioritize your Data Quality Requirements

Once you have a list of requirements and business cases, make sure to prioritize them. Your focus will depend on your business goals and strategy.

For example, Wadhwani AI identified concerning coughs from mobile phone data and reminded these people to set up a COVID-19 consult. In this case, matching the cough sounds with the right patient would probably take top data quality precedence over whether that patient tested positive for COVID-19 at the clinic.


#3 Write and Run a Test Case/s

Write test cases to check that your critical data inputs meet your highest-priority data quality requirements. At this point, you run the types of data quality checks that verify data quality attributes and expected KPIs.

At Wadhwani AI, a tester might set up a list of sample patient names and generate coughs automatically (using a software program). Then the tester writes the test steps to cause the coughs and check the database, verifying the cough data matches the patient who generated it.
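A test like this could be sketched with Python's built-in unittest module. Everything in the example is a stand-in: `CoughStore` is a hypothetical in-memory substitute for the real database, and `synthetic_cough` fakes the automatically generated coughs. It does not describe how Wadhwani AI built its system.

```python
import unittest


class CoughStore:
    """Hypothetical in-memory stand-in for the cough database."""

    def __init__(self):
        self._rows = []

    def record(self, patient_id, audio):
        self._rows.append({"patient_id": patient_id, "audio": audio})

    def coughs_for(self, patient_id):
        return [r["audio"] for r in self._rows if r["patient_id"] == patient_id]


def synthetic_cough(patient_id, n):
    """Generate a fake audio payload that can be traced back to its patient."""
    return f"cough-{patient_id}-{n}".encode()


class TestCoughMatchesPatient(unittest.TestCase):
    def test_each_cough_is_stored_under_its_patient(self):
        store = CoughStore()
        patients = ["p-001", "p-002", "p-003"]  # sample patient IDs

        # Test steps: generate two coughs per patient and record them.
        for pid in patients:
            for n in range(2):
                store.record(pid, synthetic_cough(pid, n))

        # Verification: each patient has exactly their own coughs, in order.
        for pid in patients:
            expected = [synthetic_cough(pid, n) for n in range(2)]
            self.assertEqual(store.coughs_for(pid), expected)
```

Save the file and run it with `python -m unittest <filename>`. In a real project you would swap the stand-in store for a connection to a test copy of the database.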

#4 Think of Data Boundaries

When testing any of the six core dimensions, consider the endpoints of the data sets that you test. These boundary conditions tend to have a lot of data quality issues.

For example, the Wadhwani AI data set has new patients with no cough assessments and patients with a large number of tests, say 731 tests from at least one test per day for two years. You want to run data sets meeting both circumstances in your tests.

Even patient or customer contact information, such as names, has outliers. For example, you want to ensure accurate data when names begin with O’, like O’Brien, or when a last name has two words, like Von Trapp.
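The boundary cases above translate into small, direct checks. This Python sketch is illustrative only: the surname pattern is an assumption about which names you want to accept, and `summarize` is a hypothetical helper that reports how many assessments a patient has and when the latest one took place.

```python
import re
from datetime import date, timedelta

# Assumed rule: letter groups joined by a space, hyphen, or apostrophe.
SURNAME = re.compile(r"[A-Za-zÀ-ÖØ-öø-ÿ]+(?:[ '’-][A-Za-zÀ-ÖØ-öø-ÿ]+)*")


def is_plausible_surname(name):
    """Accept surnames with apostrophes or several words; reject empty input."""
    return SURNAME.fullmatch(name) is not None


def summarize(assessments):
    """Return (count, latest date); latest is None when there are no assessments."""
    if not assessments:
        return 0, None
    return len(assessments), max(assessments)


def test_name_boundaries():
    assert is_plausible_surname("O'Brien")    # straight apostrophe
    assert is_plausible_surname("O’Brien")    # curly apostrophe
    assert is_plausible_surname("Von Trapp")  # two-word surname
    assert not is_plausible_surname("")       # empty input
    assert not is_plausible_surname("O'")     # dangling apostrophe


def test_assessment_count_boundaries():
    # Lower boundary: a new patient with no assessments.
    assert summarize([]) == (0, None)

    # Upper boundary: one test per day for two years (731 days, one leap year).
    start = date(2019, 1, 1)
    daily = [start + timedelta(days=i) for i in range(731)]
    assert summarize(daily) == (731, date(2020, 12, 31))
```

Both test functions use plain assert statements, so you can call them directly or let a test runner such as pytest collect them.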

#5 Consider Negative Testing

Negative testing covers the most relevant situations where you get opposite or unexpected behaviors. The “#3 Write and Run a Test Case/s” example explored matching cough sounds with the right patient.

In an example of a negative test case, another person, like the owner’s child, uses the mobile phone. The Wadhwani AI application captures this cough. Would the application finish recording that cough data or put up a prompt asking for the mobile phone owner?
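A negative test pins down which answer you expect. The Python sketch below assumes, purely for illustration, that the desired behavior is to reject a cough from anyone other than the phone owner. The `record_cough` function, the `UnverifiedSpeakerError` exception, and the ID values are hypothetical and say nothing about how the real application behaves.

```python
class UnverifiedSpeakerError(Exception):
    """Raised when the person coughing is not the registered phone owner."""


def record_cough(store, owner_id, speaker_id, audio):
    """Store the cough under the owner only if the owner produced it (assumed rule)."""
    if speaker_id != owner_id:
        raise UnverifiedSpeakerError(f"speaker {speaker_id!r} is not the owner")
    store.setdefault(owner_id, []).append(audio)


def test_non_owner_cough_is_not_stored():
    store = {}
    try:
        record_cough(store, owner_id="owner-1", speaker_id="child-1", audio=b"cough")
    except UnverifiedSpeakerError:
        pass  # expected: the application refuses the recording
    else:
        raise AssertionError("a non-owner cough should have been rejected")
    # The owner's record must stay untouched.
    assert store == {}


def test_owner_cough_is_stored():
    store = {}
    record_cough(store, owner_id="owner-1", speaker_id="owner-1", audio=b"cough")
    assert store == {"owner-1": [b"cough"]}
```

If your business decides the application should prompt for the owner instead, the same test structure applies, with the assertion changed to check for the prompt.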

#6 Profile your Data Routinely

You also want to profile your data on a regular basis, monitoring and cleaning it systematically. This type of data quality checking reveals unexpected data quality issues through exploration.

For example, you run an automated data profiler against several data sources. You see that several customers who live in Manchester have lots of blank and missing information in the state or borough field. You fix this information for your U.K. data sets.

But then you go back and profile a new set of data from your USA spreadsheets to see if contacts from Manchester-by-the-Sea or Manchester also have missing borough or state information. From that information, you figure out which other data sets and attributes to profile and fix.
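A profiler can be as simple as counting blanks per group. The following Python sketch computes the share of contacts with a missing state or borough for each city. The contact records and field names are invented for the example, and a dedicated profiling tool would report far more than this one measure.

```python
from collections import defaultdict

# Hypothetical contacts pulled from U.K. and U.S. sources.
contacts = [
    {"city": "Manchester", "state_or_borough": ""},
    {"city": "Manchester", "state_or_borough": None},
    {"city": "Manchester", "state_or_borough": "Trafford"},
    {"city": "London", "state_or_borough": "Camden"},
    {"city": "Manchester-by-the-Sea", "state_or_borough": "MA"},
]


def missing_rate_by(rows, group_field, check_field):
    """Return {group value: share of rows whose check_field is blank or missing}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        key = row.get(group_field) or "(unknown)"
        totals[key] += 1
        value = row.get(check_field)
        if value is None or not str(value).strip():
            missing[key] += 1
    return {key: missing[key] / totals[key] for key in totals}


rates = missing_rate_by(contacts, "city", "state_or_borough")

# List the cities with the most missing values first.
worst_first = sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

In this sample, Manchester comes out on top with two of its three contacts missing the field, which is the kind of signal that tells you where to profile next.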

#7 Develop a Data Quality Improvement Plan from the Results

Once you get results from your planned testing and profiling activities, you will better understand your data quality issues and their impacts. From there, you develop a plan to fix critical data quality issues or work around less severe data problems, like sending an email to those missing a street address.

For example, the Wadhwani Institute collected 3,500 cough sounds, including both COVID-positive and COVID-negative subjects, using good data quality best practices. After testing data quality, Wadhwani noted data gaps that slowed down matching the patient with the cough correctly. It turned to WinPure’s data cleansing software to remedy this data quality issue.



FINAL WORDS

To summarize, you have learned how important your data is to the business and that you need to know how to test data quality to get the best results. You might have limited time and resources, so you need to plan which data inputs you will cover.

With a data strategy in hand, critical data inputs identified, and clear steps to get reproducible results, you define requirements and business cases. You know what level of quality the business needs for each data quality dimension and KPI for the data to be usable, and why.

Now you know how to test data quality using the checklist above. You can also profile data to understand gaps in your data quality and identify critical data quality issues to fix, helping your business succeed.


By Michelle Knight | May 6th, 2021 | Posted in Data Quality

About Michelle Knight

Michelle Knight has a background in software testing, a Master's in Library and Information Science from Simmons College, and an Association for Information Science and Technology (ASIST) award. At WinPure, she works as our Product Marketing Specialist and has a knack for explaining complicated data management topics to business people.
