It’s no surprise that data quality remains the most ignored challenge in modern data management. Despite striving to be data-driven, most businesses struggle with data quality problems such as dirty data, disparate and isolated data sets, poor data formats, and empty fields. Most businesses think of these as mundane issues that should concern only IT users. Data quality becomes a matter of organizational importance only when the business suffers the consequences of poor data: GDPR or sanctions compliance penalties, furious customer emails, and loss of reputation and business, to name a few.
We are very passionate about data quality. Having worked with hundreds of businesses to resolve their data quality challenges, we know for a fact that awareness of data quality among decision-makers and executive leaders remains low. At the mid-level, data professionals spend an average of 80% of their time cleaning data!
To bridge this gap and emphasize the need for treating, cleaning, and making data fit for use, we invited a leader to a webinar to talk about her experience with data quality. This blog post summarizes the key points of that webinar.
If you’re a leader reading this, the time to act is now. This webinar will help you know where and how to start with a data quality strategy.
If you’re a professional reading this, send the webinar and the article to your leaders!
Introducing Sara Hanks, Director of Continuous Improvement at Wabtec Corporation
Sara launched her career 22 years ago as a mechanical engineer but shifted quickly into continuous improvement when she became a Lean Six Sigma Black Belt.
She realized very early on the value that could be extracted from data and the importance of digitally collecting data. After several years in manufacturing, she transitioned to leading programs and teams through Digital Transformation.
Eventually, Sara was promoted to Senior Director of Data Analytics at GE Transportation, where she led a team of data scientists and gained firsthand experience with machine learning and artificial intelligence.
Currently, she is the Director of Continuous Improvement at Wabtec Corporation, a supplier in the Rail and Mining Equipment Industries.
In 2020, Sara launched Leverage4Data, a platform that digitizes supplier management workflows and combines them with an intuitive user interface and accessible data.
Below are some of the questions we and our audience asked Sara about data quality and where to start.
How Do You Define Data Quality?
Before jumping into defining what data quality is, we need to describe what quality means in the industry.
Quality actually has two meanings.
One is around compliance: making sure your data meets compliance regulations (GDPR, sanctions, etc.).
The second is around usefulness: if you build a product but it doesn’t have utility, you’re not meeting the quality intent either, even though you made it correctly.
To be more specific, data quality can be defined in terms of:
Accuracy: The data that’s captured has to be irrefutably correct. You don’t want data that gets corrupted over time, and you need to make sure it’s accurate in the first place. For example, if you have email addresses tied to a customer, are they the right email addresses? Anytime you’re doing some kind of marketing or outreach to your customers, are you getting in touch with the right people?
Scenario: A customer has four email addresses on file. Every time you send the same email to all four addresses, you irritate that one person. To resolve this problem, you need to be able to match and consolidate each customer’s email addresses and reach out only on the most current one.
Completeness: If your company has deemed a set of data important, those fields need to be filled in so you get the data you need to operate.
Scenario: You know you need accurate postal codes to gather geographical data, so instead of allowing the field to be left empty, you implement form controls to ensure you get the data you need. Empty web form fields, blank CRM fields, and opportunities for the user to enter data incorrectly are persistent data challenges that team members can spend hours fixing and verifying!
Consistency: Over time, companies introduce different systems into their IT ecosystem, which causes inconsistencies in how data is recorded. For example, the customer is an important attribute, but it gets defined differently in each system, so a data quality issue exists because you can’t seamlessly marry the data from one system with another. Defining data consistently across systems, or introducing some sort of master data management solution to bridge that gap, is something that needs to be considered from a data quality perspective.
Scenario: Your company uses different platforms for leads, sales, customer service, and customer support. As the same customer moves through your systems, their information may be recorded differently under each system’s structure. Take something as simple as first name and last name positioning. Does the last name column come first, or the first name? A small inconsistency in how you record names can have grave consequences (like marketing teams addressing people by their last names instead of their first), resulting in lowered credibility.
Validity: Data needs to be current, in a valid format, and free of duplicates. Sometimes when you merge two data sets on a key that is not unique, rows get multiplied and you end up with something close to a Cartesian product. If you’re not careful, you could end up with skewed insights that overstate reality, which could get you (the individual or the company) in a lot of trouble.
Scenario: You’re required to pull insights from your marketing data to determine the number of leads marketing has generated in the last 3 months. You believe you have 100K email addresses but the reality is you have 60K people. The rest are duplicate, incomplete, or wasted leads.
Categorization: Data categorization is necessary, especially now that companies are acquiring data from dynamic sources such as social media, where customers may post comments and complaints repeatedly. The company must be able to transfer this data into a knowledge base where it can be matched and labeled as part of an FAQ section, a most commonly asked section, and so on.
Scenario: Your customers are repeatedly asking you the same technical question on support, on chat, and on social media (for example, how to make an account), but you’re unable to gather that data and transfer it into a knowledge base for your team to access, match, and use when needed. This is a data quality problem.
At a high level then, data quality can be defined and summarized as data that is trustworthy.
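To make a few of these dimensions concrete, here is a minimal Python sketch of how a team might profile a small customer list. It is only an illustration under stated assumptions: the records, the field names (`customer_id`, `email`, `postal_code`, `updated`), and the rule that emails match after trimming and lowercasing are all hypothetical, and a real matching process would likely need fuzzier logic.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records; the field names are assumptions for illustration only.
RECORDS = [
    {"customer_id": 1, "email": "Ann@Example.com ", "postal_code": "14201", "updated": date(2021, 3, 1)},
    {"customer_id": 1, "email": "ann@example.com", "postal_code": "", "updated": date(2022, 6, 15)},
    {"customer_id": 1, "email": "ann.old@example.com", "postal_code": "14201", "updated": date(2019, 1, 10)},
    {"customer_id": 2, "email": "bob@example.com", "postal_code": "", "updated": date(2022, 2, 2)},
]


def normalize_email(email):
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


def profile(records):
    """Count rows, distinct emails, distinct customers, and blank postal codes."""
    rows_by_customer = defaultdict(list)
    for row in records:
        rows_by_customer[row["customer_id"]].append(row)

    # Consolidation: keep the most recently updated email for each customer.
    current_email = {
        customer_id: normalize_email(max(rows, key=lambda r: r["updated"])["email"])
        for customer_id, rows in rows_by_customer.items()
    }
    return {
        "rows": len(records),
        "distinct_emails": len({normalize_email(r["email"]) for r in records}),
        "distinct_customers": len(rows_by_customer),
        "blank_postal_codes": sum(1 for r in records if not r["postal_code"].strip()),
        "current_email": current_email,
    }


report = profile(RECORDS)
print(report)
```

The gap between `rows`, `distinct_emails`, and `distinct_customers` is the same gap as believing you have 100K leads when you really have 60K people, just on a tiny list.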
Who is Responsible for Data Quality?
The short answer is that everybody is responsible for data quality. Each of the data quality elements discussed above has a specific owner. For example, the teams capturing the data own its accuracy and completeness, and they are responsible for making sure it’s in a valid format.
Transitioning data from system to system is an IT-owned function. IT needs to monitor and respond when a hindrance, a migration error, or a technical issue threatens the accuracy and validity of the data. If you’re on the analytics team and you base your insights on incorrect data, you’re jeopardizing your business.
Essentially, it’s a partnership between team members. At the end of the day, everyone is responsible and accountable for their role which in turn enables an organizational approach to data quality.
How Do You Start With a Data Quality Strategy?
While some companies are aware of data quality challenges, they find it difficult to initiate a strategy to fix data quality challenges. Sara recommends:
It’s important to first define a scope of data and understand what the purpose or use of the data is. Everything starts with data quality, but you can’t tackle all of it at once. You can’t fix 1 million rows of data in one go.
What you can do is start by prioritizing a scope. Look at what foundational data is needed to run your business or a certain part of your business (for example, a marketing campaign) and ensure that specific set of data is accurate.
Once you have the scope of the data, the next thing to decide on is the operational KPIs within your business. Compute your KPI and tie it to the scope. For example, is your marketing team spending 12 hours cleaning a list of just 1,000 emails for a campaign? That could be hurting your operational efficiency and your ability to meet deadlines.
With this KPI and operational metric, you can decide which data segment to fix first. You can start with an audit of your data. Looking at every single data point is unnecessary; you can take a sample instead.
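As a rough illustration of the KPI and sampling ideas above, the Python sketch below turns the cleaning-time example into a per-record figure and estimates the share of bad records from a random sample. The function names, the fixed random seed, and the "missing @" check are our own assumptions for the example, not a prescribed audit method.

```python
import random


def seconds_per_record(hours_spent, record_count):
    """Convert total cleaning time into seconds spent per record."""
    return hours_spent * 3600 / record_count


def audit_sample(rows, is_bad, sample_size, seed=0):
    """Estimate the share of bad rows from a random sample of the data."""
    rng = random.Random(seed)  # fixed seed so the audit can be repeated
    sample = rng.sample(rows, min(sample_size, len(rows)))
    if not sample:
        return 0.0
    bad = sum(1 for row in sample if is_bad(row))
    return bad / len(sample)


# 12 hours spent on a list of 1,000 emails, as in the example above.
cleaning_cost = seconds_per_record(12, 1000)

# Hypothetical email list; an entry without "@" counts as bad here.
emails = ["ann@example.com", "bob-at-example.com", "cy@example.com", "dee"]
bad_share = audit_sample(emails, lambda e: "@" not in e, sample_size=10)
print(cleaning_cost, bad_share)
```

When the requested sample is larger than the list, as it is here, the function simply checks every row, so the estimate equals the exact rate.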
Do Companies Need a Big Budget for a Data Quality Strategy?
Not necessarily. It helps to have a budget for sure, but you don’t need to approve a million-dollar budget to fix your data.
You also don’t need a custom-built solution or a huge team of data experts to rid your data of duplicates or human errors.
Once you identify the scope, you can then decide whether you need to introduce new tools or new processes. For example, if you’re constantly getting bad data from web forms, your first priority would be to fix the form structure instead of cleaning the data. If manual data entry is causing more errors than expected, then your first priority would be to introduce automation.
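Fixing the form structure can be as simple as rejecting bad input before it is stored. The sketch below is one hedged way to do that in Python. The field names, the deliberately loose email pattern, and the US-style ZIP code rule are assumptions for illustration; other countries and stricter requirements would need different rules.

```python
import re

# Assumption: US-style ZIP codes (5 digits, optional +4 suffix).
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")
# Deliberately loose check: one "@", no whitespace, and a dot in the domain.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_form(form):
    """Return field -> error message; an empty dict means the form is acceptable."""
    errors = {}
    email = form.get("email", "").strip()
    postal_code = form.get("postal_code", "").strip()

    if not email:
        errors["email"] = "required"
    elif not EMAIL_PATTERN.fullmatch(email):
        errors["email"] = "invalid format"

    if not postal_code:
        errors["postal_code"] = "required"
    elif not ZIP_PATTERN.fullmatch(postal_code):
        errors["postal_code"] = "invalid format"

    return errors


good = validate_form({"email": "ann@example.com", "postal_code": "14201"})
bad = validate_form({"email": "ann", "postal_code": ""})
print(good, bad)
```

Catching the blank postal code at entry costs one rule; finding and fixing it later costs someone’s afternoon.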
If your processes are in order but you have bad data streaming in, then it’s a simple matter of using a tool like WinPure to fix data quality issues and ensure the data meets the definition of ‘quality.’
According to Sara, ‘do something small, get a win showing the ROI, and then use the ROI to justify the next set of investments.’
What Kind of Training Would Employees Need to be Conscious of Data Quality?
Other Important Takeaways
- If we had considered data quality upfront a few years ago, we’d have saved ourselves a lot of pain. A lot of use cases where companies failed to make an impact even though they were data-rich were because they didn’t start with the basics.
- Companies don’t necessarily see data as a cost. However, it is a cost when people unsubscribe from irrelevant emails or file complaints about data privacy, or when your analysis and insights are way off the mark. Employees spending hours fixing data is also a direct cost to operational efficiency, which in turn affects ROI.
- A good amount of ignorance still exists because people have the preconceived notion that if they have data, it has quality. It takes education, exposure, and statistical analysis to help people break that paradigm and understand that data isn’t data unless it is also quality data.
- Companies are still afraid of the big cost of cleaning data. However, if they follow a ‘start small, scale when benefits are clear’ approach, they can manage this cost more effectively.
To Conclude: Down-Scope the Problem, Fix, Repeat
Summarizing the webinar with Sara’s words of wisdom:
“Treat data like you do your products and services. Data is an asset and must be treated like your products or services, for which you have processes to evaluate whether they are good or not. Similarly, data must be assessed for its quality because it directly impacts your business, your customer satisfaction, and your revenue streams.
You have to start small and try to down-scope a problem to what can be an actionable project. You need to take the data as it exists, measure what’s wrong with the data, and then create an action plan to fix that specific problem on that specific data. You can iterate on that kind of approach over time, and within a period of time you’ll be able to look back and really see that transformation happen, one step at a time.”