Explore this glossary of the top data management terms to know and build a better understanding of database quality.
As data continues to grow exponentially, data management can be a real challenge. Companies must handle data management efficiently to improve database quality and make business decisions that satisfy business needs.
When investing in a reliable data management system, companies benefit from equipping their people with the right knowledge and tools to achieve business goals. This data management terminology guide provides a first step toward that data literacy.
Below is a data management dictionary with the top data management terms to know to create high-quality, reliable data. These key terms will give you a foundation for making the most of your data cleansing activities.
Related Guide: Complete Master Data Management Guide
Data Management Dictionary
To create high-quality and reliable data, a thorough understanding not only of the system but also of the terms and vocabulary used in data handling is of prime importance. Here is a list of common database terms:
Analytical Database
A data source that collects and reports historical information to identify customer, product, or other trends. Analytical databases funnel existing data to be searched and generally do not allow inputs. From this historical data, businesses gain current and future insights into designing products, releasing products, and providing services.
Append
To add missing data subsets from one or more tables to another database using SQL. For example, geocoding and census information goes through periodic updates, adding more data. Businesses can automatically add this newer information by running an append command.
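An append like the one described can be sketched with SQL's INSERT ... SELECT statement. This is a minimal illustration using Python's built-in sqlite3 module; the table and column names are hypothetical.

```python
import sqlite3

# In-memory database with a main table and a staging table of newer data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE contacts (name TEXT, zip TEXT)")
cur.execute("CREATE TABLE new_contacts (name TEXT, zip TEXT)")
cur.executemany("INSERT INTO new_contacts VALUES (?, ?)",
                [("John Smith", "44095"), ("Ashley Long", "33101")])

# The append: add every row from the staging table to the main table.
cur.execute("INSERT INTO contacts SELECT * FROM new_contacts")
conn.commit()

row_count = cur.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
```

The staging table remains untouched, so the same newer data can be appended to other databases as well.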
Attribute
Describes the value found in each field in a table. Every field or column in a database table represents a single attribute of that table. (An attribute is what the data in that field represents, while the value is the actual data that a specific field contains.)
Browse
When a user scans multiple summary records retrieved from a table or set of related tables in a tabular spreadsheet format (aka Browse Table). A Browse Table provides a quick database summary to identify data elements requiring further attention or detail.
Casing
To designate which characters in an alpha string will be uppercase and which will be lowercase. Common casing methods include:
• uppercasing all characters;
• lowercasing all characters;
• lowercasing the string, then uppercasing only its first character;
• lowercasing the string, then uppercasing the first character of each “word” (space-separated substring), also called “Proper” case.
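The casing methods above map directly onto built-in string operations in most languages. A minimal Python sketch (note that Python's `title()` treats any non-letter as a word boundary, a slightly broader rule than splitting on spaces):

```python
s = "hello world"

upper_all = s.upper()        # uppercase all characters
lower_all = s.lower()        # lowercase all characters
sentence  = s.capitalize()   # lowercase, then uppercase the first character
proper    = s.title()        # uppercase the first character of each word
```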
Case Sensitivity
Computer operations interpret and return values based on the case of the character values inputted. In a case-sensitive context, “SPUD,” “Spud,” and “spud” would all be considered different entities, so the case sensitivity of a function or query influences the values it returns.
Column
A placeholder for an attribute; synonymous with field. Each column holds a distinct attribute. For example, a contact spreadsheet may have a column labeled email. Each value in that column describes the email address for a unique record.
Concatenation
Linking a consecutive series of field values, strings, or both, through computer code, to build a data item or field value. For example, a software program concatenates City + comma + space + State + space + ZIP Code to form the last line of an address on a mailing label. The values remain intact for other uses, such as generating a list of contacts for a specific city.
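The mailing-label example above can be sketched in a couple of lines of Python; the city, state, and ZIP values are illustrative.

```python
city, state, zip_code = "Eastlake", "OH", "44095"

# City + comma + space + State + space + ZIP Code
last_line = city + ", " + state + " " + zip_code

# The source values remain intact for other uses, e.g. filtering by city.
```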
Consolidation
Integrating and merging various data sets into a single master or “golden” record with an accepted table structure, data types, fields, and values. Typically, companies find duplicate records with the same contact data elements across many different systems (marketing, sales, customer support, etc.). Consolidation keeps only the relevant contact information in a single customer view.
CRM (Customer Relationship Management) System
Customer relationship management (CRM) organizes, automates, and synchronizes a company’s interactions with customers, clients, and sales prospects through a technological solution. Principally, CRM supports sales activities, but also marketing, customer service, and technical support. Businesses use a CRM to find, attract, and win new clients; nurture and retain those the company already has; entice former clients back into the fold; and reduce marketing and client service costs.
Data Cleansing
Standardizes data already inputted by a person or computer by correcting errors ranging from duplications, omissions, and incorrect information to misspellings. When done well, data cleansing results in actionable information that makes sense to another data system or person.
Data Governance (DG)
Formalized processes and activities across an organization to support a data strategy and guide technology, people, and data procedures. Good data governance maximizes business capabilities by balancing regulatory compliance with data privacy laws while encouraging access needed for innovation. Data governance provides a necessary structure to ensure data quality and use data cleansing tools well.
Data Matching
Comparing two or more sets of collected data (Excel spreadsheets, for example) to remove duplicated information and consolidate the records used by a business. Data matching scrubs data on mailing lists, sales lists, etc., to keep duplicate records from choking your business.
Data Migration
Moving or copying data from one data system or source to another. Migration occurs when an organization upgrades to a new data system, creates backups in case of data loss, or integrates data across several different systems in a standardized format. Employees, business departments, organizations, or multiple different companies may undertake data migration.
Data migration goals are achieved through a combination of data governance and computer automation. Data governance processes determine what data will be extracted, how it will be formatted, and how it will be loaded onto a different system, minimizing migration risk and maximizing success.
Software programs assist people with mapping attributes and their values from one system to another. Then these algorithms extract, transform, and load (ETL) the data according to this mapping scheme.
Data Profiling
The monitoring and cleansing of data using a systematic, consistent, repeatable, and metrics-based process. It is usually the first step in gaining control over your data. It aims to ascertain the condition of the data stored in various locations and forms throughout your company.
Data Quality
Refers to the overall state of qualitative or quantitative pieces of information or data. Data quality is the condition of the information available based on tangible metrics. Generally, high-quality data delivers the insights needed to make better business decisions.
Data Strategy
A high-level plan and map to leverage an organization’s data to support and expand business goals. Companies that form a data strategy recognize their data’s value and importance to the business strategy’s success. Good data strategies guide short- and long-term data creation, transformation, storage, use, and disposal while remaining elastic as the market environment changes.
Data Type
A constraint on the type of value that a field will accept. Common data types include:
• Strings: stores and uses the exact combination of characters specified (“Hello world2!”).
• Characters: stores and uses a fixed length of one or more letters, numbers, or symbols (“X”).
• Integers: stores and uses whole numbers (“2021”).
• Float: stores and uses values with decimal points (“3.14,” “784.9999999,” and “7.0”).
• Boolean: stores “1” as true and “0” as false. Programmers use Boolean values to set up conditions for a program action.
Setting the correct data type for an attribute optimizes database storage and performance.
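The listed data types have direct counterparts in most programming languages. A minimal Python sketch (Python has no separate single-character type, so a character is simply a string of length one):

```python
text_value  = "Hello world2!"   # string: exact combination of characters
char_value  = "X"               # character: a length-one string in Python
int_value   = 2021              # integer: whole number
float_value = 3.14              # float: value with a decimal point
bool_value  = True              # Boolean: True stores as 1, False as 0
```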
Database
One of the following:
• A systematically organized collection of all the data needed by a person or organization to perform their required business functions.
• A collection of business-related files or tables.
• Any collection of data organized to answer queries.
• A structured or unstructured database management system designed for data storage and retrieval.
Databases usually consist of both data and metadata. When a database contains a description of its structure, it is said to be self-describing.
Database Administrator [DBA]
The person ultimately responsible for the functionality, integrity, and safety of an organization’s data and database systems. DBAs make sure data continues to be trusted and accessible to the users who need it for work. This role also uses automation for routine database maintenance and to improve data quality and accuracy.
Database Management System [DBMS]
Also called a database manager. An integrated collection of programs designed to enable people to store, define, enter, transform, access, retrieve and maintain data.
Think of a DBMS as a toolshed with different types of software. A database manager performs automated tasks that enhance database performance, minimize storage space requirements, preserve integrity through data cleansing, logically relate data, back up data, and standardize routine reporting and views. DBAs use SQL to communicate with and instruct the DBMS.
Database Manager
• The person with primary responsibility for the design, construction, and maintenance of a database. Database managers lead data quality, profiling, and cleansing efforts to support business needs.
• Synonymous with a database management system (DBMS).
Deduplication
Removing redundant data in a collection by removing duplicate records. Data duplication can be intentional, say to create a backup in case of a network failure. It can also be unintentional, due to human error or a system’s failure to constrain non-unique entries.
Data deduplication ensures only one data copy exists, reducing multiple instances of the same information in a file. Think of data deduplication as a type of file compression to keep only needed data.
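A simple form of deduplication keeps the first occurrence of each record and drops later copies. A minimal sketch with illustrative contact rows:

```python
rows = [
    ("John Smith", "john@example.com"),
    ("Ashley Long", "ashley@example.com"),
    ("John Smith", "john@example.com"),   # unintentional duplicate entry
]

seen = set()
deduped = []
for row in rows:
    if row not in seen:   # keep only the first copy of each record
        seen.add(row)
        deduped.append(row)
```

Real deduplication tools also catch near-duplicates (extra spaces, casing differences), which is where match codes and fuzzy matching come in.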
Domain
• A collection or range of all the possible values a field can contain. Although a field’s domain is typically finite, it may be infinite as well.
• A human-readable website address that translates to the numerical Internet Protocol (IP) address computers use.
• A group of people and resources accessible to those subsets of users within an organization. Domains secure sensitive information while giving departments, like accounting, software services to complete business transactions.
Duplicate Data
Identical values shared by different rows indicating the same entity (customer, product, or location). Outside of backups created intentionally in case of a database outage, duplicate data happens in error.
A combination of poor database architecture and human error leads to information spread across many rows. The same entities may also be inputted into different departmental systems, leading to duplicated data when integrating it into one record.
Duplicated data results in a lack of trust in the database’s information and frustration in business operations. A single standardized, correct view of the data leads to better quality when dealing with a customer or product.
Entity
A unique thing described by a specific data set. For example, customers or products represent entities because they have a group of attribute values that cannot wholly apply to another entity. The John Smith that lives in Liverpool, England, is an entirely different customer than the John Smith that lives in Eastlake, Ohio.
Export
An operation that extracts data from a data source into a file formatted to be read and used by another software program. Users export data, such as a contact list, as a first step to importing it into a program to perform different actions. For example, marketers export data from their CRM to create mailing labels in a different application.
Extract, Transform, and Load (ETL)
The process of connecting data from different data sources, often using SQL, and mapping relationships between them. ETL maps the raw, unorganized data to a predetermined structure or concept, including attributes and formats.
ETL makes it easier to search and operate on data in good data system architectures that meet the business needs. ETL results in data integrity that supports trust in data values and a better understanding of how to query the resulting database.
Field
Synonymous with column. A component of a relation or table that holds a single attribute of that relation or table.
Visually, fields typically appear as a rectangular box in a database’s form. Users input entries into this box or select a value from a pull-down menu to correspond with an entity. In a search form, people type in values for which they wish to retrieve results.
File
• The separately named unit of or object in storage for all data, programs, and indexes on most computer systems. For example, a user may store a spreadsheet workbook in one file, a relational database in another file, and algorithms to automate database tasks in other files.
• A term describing a group of related records in some [usually smaller or older] database managers, like dBase, FoxPro, Alpha Four/Five, etc. Programmers tend to use primary keys to group this similar information.
Filtering
The act of choosing a particular set of records based on a pattern or criteria while excluding others from view. Filtering is also known as selection: comparing data sets and matching records that share a characteristic.
Users filter data to get more specific and relevant results to do business tasks. For example, a filter would bring up customers in the United Kingdom while excluding ones from a different country.
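The United Kingdom example above can be sketched as a simple list comprehension; the customer records are illustrative.

```python
customers = [
    {"name": "John Smith",   "country": "United Kingdom"},
    {"name": "Ashley Long",  "country": "United States"},
    {"name": "Trevor Harris","country": "United Kingdom"},
]

# Keep only records matching the criterion; all others are excluded from view.
uk_customers = [c for c in customers if c["country"] == "United Kingdom"]
```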
Fuzzy Matching
A type of data matching algorithm that calculates probabilities and weights to determine similarities and differences between business entities like customers. This technique differs from deterministic data matching, which compares unique reference data such as name and birthday.
Instead, fuzzy matching techniques, or probabilistic data matching, apply chosen parameters to score data patterns mathematically. Fuzzy matching techniques compare sets of characters, numbers, strings, or other data types for similarities. When presented with the likelihood that customer entities match your fuzzy matching search, the software suggests records or data combinations that can standardize an entry as master data.
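A very simple probabilistic score can be computed with the standard library's difflib, which rates string similarity between 0.0 and 1.0. This is a minimal sketch, not a production matcher; the 0.85 threshold is an assumed, tunable parameter.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Score two strings between 0.0 (no overlap) and 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# "Jon" is likely a typo or variant of "John": high score, probable match.
score = similarity("Jon Smith", "John Smith")
is_probable_match = score >= 0.85   # threshold chosen for illustration
```

Commercial fuzzy matching engines combine several such measures (phonetic codes, edit distance, token overlap) and weight them per field.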
Import
An operation that brings in data exported from another data source, adding the new entries into a different application. Data to be imported has embedded programming and control characters removed from the import file.
Importing converts the new data into a native file format for use and human readability. As part of this process, users typically map the fields from the import file to the arrangement of the receiving program. For example, contacts imported from one file into a preferred CRM show up just as if someone had entered them as a new data set in that program.
Index
• A method used to logically reorder displayed or output records or rows using keywords and list items. Programmers use an index to display a series of values by a particular attribute, like a date.
• An index, in this case, looks the same as a sort to the end-user. But indexing retrieves information quicker than using the sort command. Behind the scenes, the computer works more efficiently with an algorithm that uses a pointer data structure to provide rapid, random access to rows in the table.
For example, a computer program that deduplicates data stores a hash of each object. Each chunk and its hash combine to form an index containing a table of objects and hashes for more straightforward data retrieval.
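Creating a database index is a single SQL statement. A minimal sqlite3 sketch (table and index names are illustrative) that also asks the query planner to confirm the index is used instead of a full table scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT)")
cur.executemany("INSERT INTO orders (order_date) VALUES (?)",
                [("2021-01-05",), ("2021-02-10",), ("2021-01-05",)])

# The index lets the engine seek directly to matching dates.
cur.execute("CREATE INDEX idx_orders_date ON orders (order_date)")

plan = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM orders WHERE order_date = '2021-01-05'"
).fetchall()
uses_index = any("idx_orders_date" in str(row) for row in plan)
```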
Related: 10 Master Data Management Benefits
In-Memory
A type of flexible technology that stores and processes data quicker than storing it on a hard drive or another disk. In-memory software applications use the computer’s main memory or RAM (Random Access Memory) to compress, access, manipulate, and temporarily store data.
In-Memory functions like a person’s short-term memory, acting on information on-hand. In-memory needs to move data to a disk for long-term storage for future recall. Also, without power to the computer, in-memory loses the data.
Integration
Retrieving and combining different data sets from other data sources into an incorporated form and structure. Integration takes different data values, formats, and relationships and presents them as a single view so people, organizations, and businesses can more easily find and use information.
Software developers use various techniques to integrate data. Data integration eliminates the need to manually log into multiple systems, check multiple files, or put different data elements and rows together.
Integrity
A data quality dimension measuring a data set’s accuracy and consistency. High data integrity correlates with high business trust in that data. For example, suppose a salesperson and a marketer retrieve and use the same customer and contact information (John Smith, 123 King Street, London, United Kingdom, (020) 2656788). In that case, regardless of what system they use or when they search, John Smith’s data has integrity.
Key
In the context of data quality, a key is a field, or combination of fields, that uniquely identifies a record in a table, either as a primary or secondary key. Software developers use a key to relate one row in one table to another row in a different table. Data architectures intentionally use keys to prevent duplication and unwieldy maintenance of the same values.
For example, say John Smith has two emails, corporate (John.Smith@winpure.com) and personal (Jsmith7456@google.com), and two addresses (a winter one in Florida and a summer one in New York). By assigning a primary key to John Smith, his name must be entered in only one place. The two emails and addresses connect to John Smith by his primary key.
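The John Smith example can be sketched in SQL via sqlite3: the customer's name lives in one row, and each email row points back to it through the key. Table names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE emails (
                   customer_id INTEGER REFERENCES customers(id),
                   email TEXT)""")

# The name is entered in only one place...
cur.execute("INSERT INTO customers (id, name) VALUES (1, 'John Smith')")
# ...and both emails connect to it through the key.
cur.executemany("INSERT INTO emails VALUES (?, ?)",
                [(1, "John.Smith@winpure.com"), (1, "Jsmith7456@google.com")])

rows = cur.execute("""SELECT c.name, e.email
                      FROM customers c
                      JOIN emails e ON e.customer_id = c.id""").fetchall()
```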
Match Code
Compares unique reference data to find duplicate rows or records across one or more tables or lists. For example, a match code may combine First Name, Last Name, Company, Address, and ZIP Code data elements from the same row. These concatenated values or composite keys represent the unique identities of each whole record and its corresponding entity.
Creating and using match codes helps databases deduplicate records or combine and standardize data elements belonging to the same customer, product, or other entity. Match codes also resolve data record relationships where the architecture did not account for a primary or secondary key.
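A match code is just a normalized concatenation of identifying fields. A minimal sketch (field names and the pipe separator are illustrative choices):

```python
def match_code(record: dict) -> str:
    """Concatenate normalized identifying fields into one composite key."""
    fields = ("first", "last", "company", "zip")
    return "|".join(record.get(f, "").strip().upper() for f in fields)

a = {"first": "John",  "last": "Smith",  "company": "Acme", "zip": "44095"}
b = {"first": " john", "last": "SMITH ", "company": "Acme", "zip": "44095"}

# Despite spacing and casing differences, both records produce one code,
# flagging them as the same entity.
duplicates = match_code(a) == match_code(b)
```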
Master Data Management (MDM)
A type of enterprise data management architecture, governed by a collection of formal data quality practices and processes, designed to leverage digital technologies for the entire business. MDM gives a “360-degree view” of the different entities a company’s master data describes (typically customers or products). MDM excels at handling lots of disjointed data across an organization.
Master Data
Describes core data assets distributed across many different databases, including in the cloud. This associated information belongs to specific entities necessary for a business to operate and grow. Typically, master data breaks down into four domains.
Merge
The process of combining multiple records across one or more data sources to form a single entity. Merging affects existing data elements differently from importing. Merging transforms existing data elements and any imported data into one unified row or record.
Merge/Purge
The process of combining multiple records in a database or lists while simultaneously locating duplicates and removing the undesired records. The essence of merge/purge software is to create a list containing unique names and addresses without any duplicates.
Metadata
Data that describes the database’s data. Metadata can represent the business, technical, or operational data in an organization.
Metadata helps identify and create reference data in an MDM. For applications to work together across an organization, they need to know what data exists, where it lives, and its characteristics.
Multi-Threading
The ability of a computer’s main memory to run several algorithms concurrently, multi-tasking. A software program using in-memory technology could, for example, compress and transform data sets simultaneously.
For example, an algorithm cleans up extra spaces in data source A. Then, as the same algorithm cleans up spaces in data source B, the computer profiles contact data in data source A and compresses data from data source C.
Null; Null Value
An undefined entry for an attribute and a placeholder representing an unknown value.
Null values result in messy data from at least two cases:
• Null character values, represented by spaces, may confuse an algorithm into considering the same entity as two different ones. For example, in one row, a name field may have the value “John Smith.” In another row, John Smith’s data differs only by an extra space character in the name (e.g., “John  Smith”). The computer then presents “John  Smith” and “John Smith” as two different customers because of the extra character.
• A value for an attribute may be null, leaving data incomplete for an entity. For example, two records have “6 East Bridge Drive” as an Address1 value for Bob Johnson. A third record has an undefined value in Address1 for Bob Johnson.
The data for the third Bob Johnson lacks completeness. But, by using the other two Bob Johnson rows, a specially designed fuzzy matching algorithm can identify the null Address1 value as likely “6 East Bridge Drive.”
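The extra-space problem in the first bullet is usually fixed by normalizing whitespace before comparing names. A minimal sketch:

```python
import re

def normalize(name: str) -> str:
    """Collapse runs of whitespace and trim, so 'John  Smith' equals 'John Smith'."""
    return re.sub(r"\s+", " ", name).strip()

records = ["John Smith", "John  Smith", " John Smith "]

# After normalization, all three variants collapse to one customer.
unique = {normalize(r) for r in records}
```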
Object
• Something in the real world (computer, person, or tomato).
• An abstract term meaning a set of defined data elements as constructed by a programmer. Objects can represent entities or whatever building blocks a computer coder creates. For example, “Ashley” and “Kov-2007,” from the table in this blog post, could make one object.
Parsing
Intelligently separating a field value or string into its parts (e.g., parsing a full name into its five distinct components: Prefix, First Name, Middle Name [or Initial], Last Name, and Suffix).
A parsing engine standardizes and validates partial and full global address inputs. The opposite action is called concatenation.
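A deliberately naive sketch of name parsing into the five components mentioned above. Real parsing engines handle far more cases (compound surnames, multiple middle names, international formats); the prefix and suffix lists here are illustrative.

```python
PREFIXES = {"mr", "mrs", "ms", "dr"}
SUFFIXES = {"jr", "sr", "ii", "iii"}

def parse_full_name(full_name: str) -> dict:
    """Split a full name into prefix/first/middle/last/suffix (naive)."""
    parts = full_name.replace(".", "").split()
    result = {"prefix": "", "first": "", "middle": "", "last": "", "suffix": ""}
    if parts and parts[0].lower() in PREFIXES:
        result["prefix"] = parts.pop(0)
    if parts and parts[-1].lower() in SUFFIXES:
        result["suffix"] = parts.pop()
    if parts:
        result["first"] = parts.pop(0)
    if parts:
        result["last"] = parts.pop()
    result["middle"] = " ".join(parts)   # whatever remains in the middle
    return result

parsed = parse_full_name("Dr. John Q. Smith Jr.")
```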
Post
A command used by computer programmers to add, subtract, or replace values in one table using values in another. The World Wide Web supports posting data from one system to another.
For example, a person fills out a contact form on the web with his or her address, phone, or email and then presses the “Submit” button. After submission, the information is inserted into a database in the background, enabling customer purchases and communications.
Primary Key
A primary key uniquely identifies each record in a table and cannot be undefined for a row. No duplicate primary key values exist within a table.
Database managers use primary keys to set up and maintain the integrity of an entity. This key may consist of one or more fields.
Primary Keys come in handy setting up relationships between an entity in one table and attribute values in another. For example, a customer (John Smith) can have more than one street address and email address. A primary key makes it easier to relate John Smith to these many different values in the same “address” or “email address” field.
Purge
Also called deduping by the mailing industry, purging removes duplicate records from within a single file, table (or mailing list), or those shared among several files, tables, or lists. For example, a “Help to Buy” program to encourage home purchases had redundant records for the same member. By purging these duplicate records, the program could communicate more easily with UK home buyers.
Query
• A question you ask about data in the database in the form of a command written by a coder. When constructing queries, the programmer can define sort order and selection at the same time. Queries locate and retrieve information, generate an ad hoc list of records, or produce a sub-table of results.
• The output subset of data produced in response to a query. For example, when querying a web search engine with the term “data quality,” a list of websites describing data quality appears in the results.
Record
Synonymous with row and tuple. An instance of data in a table: a record is the collection of attribute values combined into one row of data, sometimes referred to as a tuple in some, particularly older, database management systems.
Ideally, each record refers to a single object or person. A row can represent, or relate to records in other tables describing, a single physical or conceptual entity. Data quality and data cleansing remove duplicate rows that clutter master data with extraneous details.
Scrubbing
Also known as data cleansing. Removing or improving data in a list or database that is incorrect, incomplete, improperly formatted, or duplicated. Data scrubbing tools like Clean & Match also provide various features to help clean, correct, and deduplicate lists and databases.
Secondary Key
A candidate or foreign key not chosen to be a primary key, but which helps establish relationships between records. Secondary keys do not have to have a value and can be null. Also, secondary keys can repeat in the same table.
Programmers use secondary keys to find and retrieve data. This process helps in connecting different entities.
For example, secondary keys can relate a customer (John Smith) to many product purchases (e.g., an LG dryer, a garden shovel, and some pipe fittings). A secondary key can also relate the same product purchase (an LG dryer) to multiple customers (e.g., John Smith, Ashley Long, and Trevor Harris).
Select
An SQL command that finds and displays a subset of data from one or more data sources. Users select data as part of data cleansing and quality efforts to see if duplicate records exist or a data entry has an error.
The select command also provides the data manager a way to restrict or filter output based on a value or range of values. Selection occurs upon running a query.
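A common data quality use of SELECT is surfacing records entered more than once. A minimal sqlite3 sketch using GROUP BY/HAVING (table and sample rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE contacts (name TEXT, email TEXT)")
cur.executemany("INSERT INTO contacts VALUES (?, ?)",
                [("John Smith",  "j@x.com"),
                 ("John Smith",  "j@x.com"),    # duplicate entry
                 ("Ashley Long", "a@x.com")])

# SELECT restricted with GROUP BY/HAVING shows only duplicated records.
dupes = cur.execute("""SELECT name, email, COUNT(*) AS n
                       FROM contacts
                       GROUP BY name, email
                       HAVING n > 1""").fetchall()
```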
Server
The part of a client/server system that holds the database and the DBMS. The server stores all the database’s data, tables, and transformations (the back end).
Database users typically work on client machines that send requests to the database over a network and then retrieve search results, displaying them to the client. Using a database server helps protect data while allowing access to data only a worker needs. Also, database servers help database processes run more effectively, minimizing resources on the client needed for memory and numerous administrative tasks.
Single Customer View (SCV)
A Single Customer View (SCV) is where you store all the data about every one of your customers for your business context. A single view holds only the core data assets, or master data, associated with an object, such as a customer or a product.
Good data governance, data quality, and data cleansing processes result in each entity’s single view. Master data management, done well, provides a single comprehensive view of a customer or product to an organization.
SQL
Pronounced “sequel”; stands for Structured Query Language, the standard format for the commands a database system uses to create, query, administer, and manipulate data and to relate different information.
SQL functions best when a database is ACID compliant:
Atomic: Each database transaction succeeds, or none do.
Consistent: Data persists and will be constant.
Isolated: Transactions that do not complete cannot be affected by another transaction.
Durable: A history of a transaction, once it is committed, remains, and gets stored permanently.
Database systems need to have high data quality and undergo proper management to be ACID compliant and optimal for SQL.
There are different dialects, since every database program handles types of data differently, but the core commands are always the same. ODBC uses SQL as the “lingua franca” to transfer information between databases. ANSI maintains the SQL standard.
String
A data type representing a finite sequence of alphanumeric characters. Strings have a fixed length and remain constant. Programmers create and operate on strings as the literal group of characters.
In general, databases do not associate any intrinsic meaning to strings and rely on programming to specify what to do with these groups of characters. Name, address, email, and phone number values tend to have a data type as a string.
Structure
The essential data architecture of a database system. The structure includes tables and the number of fields in each table, their names, sequence, data types, and sizes. Data managers need to know the structure to migrate, merge, or integrate data.
Table
A collection of data organized into records and fields (aka rows and columns), with fields being descriptions of the kinds of information contained in each record (attributes), and records being specific instances usually referring to objects or persons (entities).
Databases use tables to relate different records or rows together and to match a fixed schema. For example, some CRMs separate customer and company information into different tables. Using SQL, a programmer can connect people with companies and vice versa while maintaining data integrity.
Transaction
• The fundamental unit of change in many (transaction-oriented) databases. A single transaction may involve changes in several tables, all of which must be made simultaneously for the database to be internally consistent and correct.
• A real-life event modeled by the changes to a database.
• The sequence of SQL statements that must complete all of its execution before being affected by another transaction.
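The all-or-nothing behavior described above can be demonstrated with sqlite3, whose connection context manager commits a transaction on success and rolls it back on failure. The accounts table and the simulated crash are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
conn.commit()

try:
    with conn:  # one atomic transaction: both changes happen, or neither does
        conn.execute(
            "UPDATE accounts SET balance = balance - 50 WHERE name = 'A'")
        # Simulated failure before the matching credit is written:
        raise RuntimeError("crash mid-transaction")
except RuntimeError:
    pass  # the context manager rolled the debit back

balance_a = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'A'").fetchone()[0]
```

Because the transaction never completed, account A keeps its original balance, so the database stays internally consistent.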
Transactional Database
A DBMS that handles business operations and has ACID properties:
Atomic: Each database transaction succeeds, or none do. Write operations on the database can be rolled back if they’re not completed properly by restoring the data state before the transaction started.
Consistent: Data persists and will be constant.
Isolated: Transactions that have not been completed cannot be affected by another transaction.
Durable: A history of a transaction, once it is committed, remains, and gets stored permanently.
Transactional databases can be updated and appended to include new information. They handle the most current business, unlike the historical analytical database.
Validation
Verification that an entity meets data quality standards and matches authoritative sources to succeed in a business activity. For example, when people validate a database address with the national post office, they ensure the mail will arrive at that location. WinPure validates street addresses, emails, and phone numbers.
Value
The computer representation of a fact or an attribute about an entity. Values must conform to a data type and are retrievable and usable for filtering or sorting data. Examples of data values include “J. Smith,” “123 King St,” “London,” “UK,” and “0202656788.”
WinPure is the data management software you require. Our products are fast, incredibly simple to use and critically effective. Our goal is to provide you with exceptional performance with all of your data management and data integration needs.