Concepts

Introduction

The Data Quality Framework contains 8 conceptual modules:

Data Quality Engine Data Conceptual Model


Business Domains

Databases, Tables, Columns and Controls can be classified by Business Domains (HR, Marketing, Supply Chain, IT, …)

Data Conceptual Model - Role


The Users, Roles and Responsibilities

Responsibilities for resources (database, table, column or Data Quality Control) can be assigned to specific users.

For example:

  • A user will have the role of ‘Data Owner’ on a database.

  • Another user will have the role of ‘Data Quality Analyst’ on a specific control.

Data Conceptual Model - Users, Roles and responsibilities

Users interact with the Data Quality Engine.

For example, certain users are responsible for the quality of certain source data.

→ ‘usr_user’ table in Logical model

Data Conceptual Model - User

Interactions with the Data Quality Engine are categorized by Roles.

→ ‘usr_role’ table in Logical model

Data Conceptual Model - Role

A Responsibility is the link between a User, a Role and a resource (database, table, column or data quality control).

For example, ‘Bénédicte Blanc’ is Data Owner of the ‘geo’ database, and ‘Gaston Giraud’ is Data Quality Analyst of control #28.

→ ‘usr_responsibility’ table in Logical model

Data Conceptual Model - Role
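The User, Role and Responsibility link described above can be sketched in a few lines of Python. This is only an illustration, not the engine's code: the class names, the `Resource` fields and the `roles_on` helper are assumptions made for this example, and only the two people and their roles come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class User:
    name: str


@dataclass(frozen=True)
class Role:
    name: str


@dataclass(frozen=True)
class Resource:
    kind: str        # "database", "table", "column" or "control"
    identifier: str  # hypothetical identifier format


@dataclass(frozen=True)
class Responsibility:
    # A responsibility ties one user to one role on one resource.
    user: User
    role: Role
    resource: Resource


responsibilities = [
    Responsibility(User("Bénédicte Blanc"), Role("Data Owner"), Resource("database", "geo")),
    Responsibility(User("Gaston Giraud"), Role("Data Quality Analyst"), Resource("control", "28")),
]


def roles_on(resource: Resource) -> List[Tuple[str, str]]:
    """Return (user name, role name) pairs assigned to a resource."""
    return [(r.user.name, r.role.name) for r in responsibilities if r.resource == resource]


print(roles_on(Resource("database", "geo")))
```

Keeping the responsibility as a separate record, rather than a field on the user, lets the same user hold different roles on different resources.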


The Datasets elements

This module describes all source datasets: relational databases, tables and columns.

A database contains many tables.

A table contains many columns.

The data sources to be qualified are available in read-only databases.

These source databases, also known as ‘Schema’, are described as “Database” within the Data Quality Engine.

→ ‘in_database’ table in Logical model

Data Conceptual Model - Database description

Each database includes Tables and Views.

These source tables and views, also known as datasets, are described as “Table” within the Data Quality Engine.

→ ‘in_table’ table in Logical model

Data Conceptual Model - Table description

Tables are structured by Columns, also known as fields.

Each column has a type (‘INT’, ‘VARCHAR’, ‘DATETIME’), can be nullable, can be part of the primary key, …

→ ‘in_column’ table in Logical model

Data Conceptual Model - Column description
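As a rough illustration of the database, table and column hierarchy, the sketch below models the three levels as Python dataclasses. The class and attribute names are assumptions chosen for this example and do not reflect the actual layout of the ‘in_database’, ‘in_table’ and ‘in_column’ tables. The ‘crm’ database and the ‘id’ column are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    data_type: str             # e.g. "INT", "VARCHAR", "DATETIME"
    nullable: bool = True
    primary_key: bool = False


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)

    def primary_key(self) -> List[str]:
        """Names of the columns that form the primary key."""
        return [c.name for c in self.columns if c.primary_key]


@dataclass
class Database:
    name: str
    tables: List[Table] = field(default_factory=list)


clients = Table("Clients", [
    Column("id", "INT", nullable=False, primary_key=True),
    Column("dateOfBirth", "DATETIME"),
])
crm = Database("crm", [clients])  # hypothetical source database

print(clients.primary_key())
```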


The Profiling elements

Examining the content of the data, both cognitively and visually, makes it easier to understand the data and to highlight gaps or errors.

Data Conceptual Model - Profiling

  • YData Profiling library sample:

Data Conceptual Model - Profiling
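To give an idea of what a profile contains, here is a minimal hand-rolled sketch that summarises a single column using only the Python standard library. It is not the YData Profiling API, and the `profile_column` function and the sample values are assumptions made for illustration.

```python
from collections import Counter


def profile_column(values):
    """Summarise one column: row count, missing values, distinct values, most common value."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical sample: one missing value and one inconsistent spelling ("fr").
countries = ["FR", "FR", "BE", None, "fr"]
print(profile_column(countries))
```

Even this small summary surfaces two typical findings: a gap (one missing value) and a likely error (the lowercase variant counted as a separate distinct value).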


The DQ Controls elements

This module includes all the elements relating to the quality controls to be carried out.

Data Conceptual Model - DQ Controls

A “Data Quality (DQ) Control” is the heart of the Data Quality Engine. It describes the test that will be executed to check a quality rule.

Note

The quality rule: “The value of the date of birth of a client must be in the past”

The quality control: “The value of ‘dateOfBirth’ in the table ‘Clients’ must satisfy the formula ‘value < NOW()’”

→ ‘dq_control’ table in Logical model

Data Conceptual Model - Data Quality Control
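The date-of-birth control from the note can be tried out against an in-memory SQLite table. This is a sketch under assumptions: the table layout and sample rows are invented for the example, and SQLite has no `NOW()` function, so `date('now')` stands in for it. The engine may well express controls differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (id INTEGER PRIMARY KEY, dateOfBirth TEXT)")
conn.executemany(
    "INSERT INTO Clients (id, dateOfBirth) VALUES (?, ?)",
    [(1, "1980-05-17"), (2, "2999-01-01"), (3, "1995-11-02")],  # hypothetical rows
)

# Select the rows that violate the control "dateOfBirth < NOW()".
# Dates are stored as ISO-8601 text, so string comparison follows date order.
violations = conn.execute(
    "SELECT id, dateOfBirth FROM Clients WHERE NOT (dateOfBirth < date('now'))"
).fetchall()

print(violations)
```

Selecting the violating rows, rather than only counting them, keeps the primary key and the offending value available for later reporting.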

A “DQ Control” is classified by “Data Control Type”:

  • NULL value

  • Invalid format value

  • Invalid text length value

  • Invalid category value

  • Invalid number value range…

→ ‘dq_dimension’ table in Logical model

Data Conceptual Model - Data Quality Control Type

A “Data Control Type” is classified by “Data Quality Dimension”: Uniqueness, Completeness, Validity, Consistency, Accuracy, Timeliness…

Summaries can be proposed based on these dimensions.

→ ‘dq_dimension’ table in Logical model

Data Conceptual Model - Data Quality Dimensions

A “Data Quality Controls Set” is a collection of “Data Quality Controls” that are executed, either occasionally or periodically.

→ ‘dq_controlset’ table in Logical model

Data Conceptual Model - Data Quality Control Set


The DQ Control Results elements

When a set of controls is executed, a set of results is collected. Errors are specifically caught.

Data Conceptual Model - DQ Control Results

The execution of a set of controls starts on a given date and ends on another date.

→ ‘res_resultset’ table in Logical model

Data Conceptual Model - Results Set

For each control executed, we get the number of rows tested and the number of errors found.

→ ‘res_result’ table in Logical model

Data Conceptual Model - DQ Control Result

Errors are individually stored with the primary key of the row concerned, the value specifically in error and the full contents of the row.

→ ‘res_error’ table in Logical model

Data Conceptual Model - DQ Control Error
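The three levels of results (result set, result per control, individual errors) can be sketched as nested Python structures. Everything here is an assumption made for illustration: the dictionary keys, the `run_control_set` function, the sample rows and the single "email not null" control are not taken from the engine.

```python
from datetime import datetime, timezone

# Hypothetical source rows.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
]

# A control set: each control names a column and a predicate its value must satisfy.
control_set = [
    {"control": "email_not_null", "column": "email", "check": lambda v: v is not None},
]


def run_control_set(controls, rows, key="id"):
    """Run every control and collect a result set with per-control results and errors."""
    result_set = {"started_at": datetime.now(timezone.utc), "results": []}
    for control in controls:
        column = control["column"]
        # One error record per failing row: primary key, offending value, full row.
        errors = [
            {"primary_key": row[key], "value": row[column], "row": dict(row)}
            for row in rows
            if not control["check"](row[column])
        ]
        result_set["results"].append({
            "control": control["control"],
            "rows_tested": len(rows),
            "errors_found": len(errors),
            "errors": errors,
        })
    result_set["ended_at"] = datetime.now(timezone.utc)
    return result_set


result_set = run_control_set(control_set, rows)
result = result_set["results"][0]
print(result["rows_tested"], result["errors_found"], result["errors"][0]["primary_key"])
```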


The Reports elements

Several types of reports are made available to Data Quality Analysts:

  • Reports on the last result set

  • Reports over time

  • Reports with detailed errors

  • Profiling Report

A scoring system can be used for visualization purposes.

Data Conceptual Model - Reports

Data Conceptual Model - Scores

Data Conceptual Model - Reports
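The text does not specify how scores are computed. As one possible reading, the sketch below derives a score from the number of rows tested and the number of errors found for a control. The `quality_score` function and its formula are assumptions, not the engine's scoring system.

```python
def quality_score(rows_tested, errors_found):
    """Percentage of tested rows that passed; None when nothing was tested."""
    if rows_tested == 0:
        return None
    return round(100 * (rows_tested - errors_found) / rows_tested, 1)


print(quality_score(200, 5))
```

Returning `None` for an empty test run avoids presenting an untested control as either perfect or failing.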


The Remediation elements

When an error is detected, depending on its criticality level, a remediation workflow can be initiated.

Data Conceptual Model - Remediations

A “DQ Control” has a “Criticity level”. This level determines the remediation workflow and the delay allowed to remediate any error found.

→ ‘rm_criticy’ table in Logical model

Data Conceptual Model - Criticity level
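How a criticity level could drive the remediation delay is sketched below. The level names and the delays in days are purely hypothetical values chosen for the example, and `remediation_due` is an illustrative helper, not part of the engine.

```python
from datetime import date, timedelta

# Hypothetical levels and delays, for illustration only.
REMEDIATION_DELAY_DAYS = {"low": 30, "medium": 7, "high": 1}


def remediation_due(detected_on, criticity):
    """Date by which an error detected on `detected_on` should be remediated."""
    return detected_on + timedelta(days=REMEDIATION_DELAY_DAYS[criticity])


print(remediation_due(date(2024, 1, 10), "medium"))
```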