###################################################
Concepts
###################################################

.. meta::
   :description: Concepts
   :keywords: conceptual data model

Introduction
-------------------------------------------

The Data Quality Framework contains 8 conceptual modules:

* :ref:`1. The Business Domains <concept_business_domains>`
* :ref:`2. The Users, Roles and Responsibilities <concept_users>`
* :ref:`3. The Datasets elements <concept_datasets>`
* :ref:`4. The Profiling elements <concept_profiling>`
* :ref:`5. The DQ Controls elements <concept-controls>`
* :ref:`6. The DQ Control Results elements <concept_results>`
* :ref:`7. The Reports elements <concept-reports>`
* :ref:`8. The Remediations <concept_remediations>`

.. image:: /_static/img/dqe-concept.png
   :alt: Data Quality Engine Data Conceptual Model
   :align: center
   :width: 800

.. raw:: html

-----------------------------------------------------------------------------

.. _concept_business_domains:

Business Domains
---------------------------------------------

Databases, Tables, Columns and Controls can be classified by Business Domain (HR, Marketing, Supply Chain, IT, ...).

.. image:: /_static/img/dqe-concept-section-businessdomains.png
   :alt: Data Conceptual Model - Business Domains
   :align: center

.. raw:: html

--------------------------------------------------------------------------------------------------------

.. _concept_users:

The Users, Roles and Responsibilities
---------------------------------------------

Responsibilities for resources (database, table, column or Data Quality Control) can be assigned to specific users.

For example:

* A user will have the role of 'Data Owner' on a database.
* Another user will have the role of 'Data Quality Analyst' on a specific control.

.. image:: /_static/img/dqengine_conceptual_model_users.png
   :alt: Data Conceptual Model - Users, Roles and Responsibilities
   :align: center

.. raw:: html

.. ---------------------------------
.. Users > User Set section
.. ---------------------------------

.. _concept-users-user:

Users interact with the Data Quality Engine. For example, certain users are responsible for the quality of certain source data.

:ref:`→ 'usr_user' table in Logical model`

.. image:: /_static/img/dqe-concept-user.png
   :alt: Data Conceptual Model - User
   :align: center

.. raw:: html

.. ---------------------------------
.. Users > Role section
.. ---------------------------------

.. _concept-users-role:

Interactions with the Data Quality Engine are categorized by **Roles**.

:ref:`→ 'usr_role' table in Logical model`

.. image:: /_static/img/dqe-concept-role.png
   :alt: Data Conceptual Model - Role
   :align: center

.. raw:: html

.. ---------------------------------
.. Users > Responsibility section
.. ---------------------------------

.. _concept-users-responsibility:

A **Responsibility** is the link between a user, a role and a resource (database, table, column or data quality control).

For example, 'Bénédicte Blanc' is Data Owner of the 'geo' database, and 'Gaston Giraud' is Data Quality Analyst of control #28.

:ref:`→ 'usr_responsibility' table in Logical model`

.. image:: /_static/img/dqe-concept-responsibility.png
   :alt: Data Conceptual Model - Responsibility
   :align: center

.. raw:: html
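The user, role and resource link described above can be sketched in a few lines of Python. This is only an illustration of the concept, not the engine's actual schema: the class names, the ``kind`` and ``identifier`` fields and the ``roles_of`` helper are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str


@dataclass(frozen=True)
class Role:
    name: str


@dataclass(frozen=True)
class Resource:
    kind: str        # assumed values: 'database', 'table', 'column' or 'control'
    identifier: str  # e.g. a database name or a control number


@dataclass(frozen=True)
class Responsibility:
    """Link between one user, one role and one resource."""
    user: User
    role: Role
    resource: Resource


def roles_of(user, responsibilities):
    """Return (role name, resource kind, resource identifier) for one user."""
    return [
        (r.role.name, r.resource.kind, r.resource.identifier)
        for r in responsibilities
        if r.user == user
    ]


benedicte = User("Bénédicte Blanc")
gaston = User("Gaston Giraud")
responsibilities = [
    Responsibility(benedicte, Role("Data Owner"), Resource("database", "geo")),
    Responsibility(gaston, Role("Data Quality Analyst"), Resource("control", "28")),
]
print(roles_of(benedicte, responsibilities))
```

Keeping the responsibility as a separate link object lets the same user hold different roles on different resources.
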

----------------------------------------------------------------------------------

.. _concept_datasets:

The Datasets elements
-------------------------------

This module describes all source datasets: relational databases, tables and columns. A database contains many tables. A table contains many columns.

.. ------------------------------
.. Datasets > Database section
.. ------------------------------

.. _concept-datasets-database:

The data sources to be qualified are available in read-only databases. These source databases, also known as 'Schema', are described as **"Database"** within the Data Quality Engine.

:ref:`→ 'in_database' table in Logical model`

.. image:: /_static/img/dqe-concept-database.png
   :alt: Data Conceptual Model - Database description
   :align: center

.. raw:: html

.. ------------------------------
.. Datasets > Table section
.. ------------------------------

.. _concept-datasets-table:

Each database includes **Tables** and **Views**. These source tables and views, also known as datasets, are described as **"Table"** within the Data Quality Engine.

:ref:`→ 'in_table' table in Logical model`

.. image:: /_static/img/dqe-concept-table.png
   :alt: Data Conceptual Model - Table description
   :align: center

.. raw:: html

.. ------------------------------
.. Datasets > Column section
.. ------------------------------

.. _concept-datasets-column:

Tables are structured by **Columns**, also known as fields. Each column has a type (for example 'INT', 'VARCHAR' or 'DATETIME') and other properties, such as whether it is nullable or part of the primary key.

:ref:`→ 'in_column' table in Logical model`

.. image:: /_static/img/dqe-concept-column.png
   :alt: Data Conceptual Model - Column description
   :align: center

.. raw:: html
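The database, table and column hierarchy described in this module can be pictured as nested records. The sketch below is a minimal illustration under stated assumptions: the class and attribute names are invented for the example and do not reproduce the ``in_database``, ``in_table`` or ``in_column`` tables, and the ``crm`` database and its ``id`` and ``name`` columns are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    data_type: str             # e.g. 'INT', 'VARCHAR', 'DATETIME'
    nullable: bool = True
    primary_key: bool = False


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)

    def primary_key(self):
        """Names of the columns that form the primary key."""
        return [c.name for c in self.columns if c.primary_key]


@dataclass
class Database:
    name: str
    tables: List[Table] = field(default_factory=list)


# Hypothetical source database with a single table.
clients = Table("Clients", [
    Column("id", "INT", nullable=False, primary_key=True),
    Column("name", "VARCHAR"),
    Column("dateOfBirth", "DATETIME"),
])
crm = Database("crm", [clients])

print(crm.name, [t.name for t in crm.tables], clients.primary_key())
```
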

---------------------------------------------------------------------------------------------

.. _concept_profiling:

The Profiling elements
-------------------------------

By examining the content, both cognitively and visually, it is possible to better understand the data and highlight gaps or errors.

.. image:: /_static/img/dqengine_conceptual_model_profiling.png
   :alt: Data Conceptual Model - Profiling
   :align: center

.. raw:: html

YData Profiling library sample:

.. image:: /_static/img/dqe-screenshot-dataprofiling.png
   :alt: Data Conceptual Model - Profiling
   :align: center

.. raw:: html

-------------------------------------------------------------------------------------

.. _concept-controls:

The DQ Controls elements
-------------------------------

This module includes all the elements relating to the quality controls to be carried out.

.. image:: /_static/img/dqe-concept-section-controls.png
   :alt: Data Conceptual Model - DQ Controls
   :align: center

.. raw:: html

A **"Data Quality (DQ) Control"** is the heart of the Data Quality Engine. It describes the test that will be executed to check a quality rule.

.. admonition:: Note

   **The quality rule**: "The value of the date of birth of a client must be in the past"

   **The quality control**: "The value of 'dateOfBirth' in the table 'Clients' must satisfy the formula 'value < NOW()'"

.. ------------------------------
.. Controls > Control section
.. ------------------------------

.. _concept-controls-control:

:ref:`→ 'dq_control' table in Logical model`

.. image:: /_static/img/dqe-concept-control.png
   :alt: Data Conceptual Model - Data Quality Control
   :align: center

.. raw:: html
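To make the difference between a rule and a control concrete, the date-of-birth control from the note above can be sketched as a small Python check. This is an illustration only: the function name, the row layout and the sample data are assumptions, and the sketch assumes the column contains no NULL values.

```python
from datetime import datetime


def check_date_in_past(rows, column, now=None):
    """Return the rows whose `column` value does not satisfy value < now.

    Assumes every row holds a non-null datetime in `column`.
    """
    now = now or datetime.now()
    return [row for row in rows if not (row[column] < now)]


# Hypothetical content of the 'Clients' table.
clients = [
    {"id": 1, "dateOfBirth": datetime(1985, 4, 12)},
    {"id": 2, "dateOfBirth": datetime(2999, 1, 1)},  # deliberately invalid
]

# A fixed reference date keeps the example reproducible.
failing = check_date_in_past(clients, "dateOfBirth", now=datetime(2024, 1, 1))
print([row["id"] for row in failing])
```

The rule stays a business statement, while the control binds it to one table, one column and one executable formula.
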

.. --------------------------------
.. Controls > Control Type section
.. --------------------------------

.. _concept-controls-controltype:

A **"DQ Control"** is classified by **"Data Control Type"**:

* NULL value,
* Invalid format value,
* Invalid text length value,
* Invalid category value,
* Invalid number value range…

:ref:`→ 'dq_dimension' table in Logical model`

.. image:: /_static/img/dqe-concept-controltype.png
   :alt: Data Conceptual Model - Data Quality Control Type
   :align: center

.. raw:: html

.. ------------------------------
.. Controls > Dimension section
.. ------------------------------

.. _concept-controls-dimension:

A "Data Control Type" is classified by **"Data Quality Dimension"**: Uniqueness, Completeness, Validity, Consistency, Accuracy, Timeliness… Summaries can be proposed based on these dimensions.

:ref:`→ 'dq_dimension' table in Logical model`

.. image:: /_static/img/dqe-concept-dimension.png
   :alt: Data Conceptual Model - Data Quality Dimensions
   :align: center

.. raw:: html
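One possible form of a summary by dimension is an error rate aggregated over all controls that share a dimension. The sketch below only illustrates that idea: the input tuples, the function name and the choice of error rate as the metric are assumptions, not the engine's reporting logic.

```python
from collections import defaultdict


def error_rate_by_dimension(results):
    """Aggregate (dimension, rows_tested, rows_in_error) tuples per dimension."""
    tested = defaultdict(int)
    errors = defaultdict(int)
    for dimension, rows_tested, rows_in_error in results:
        tested[dimension] += rows_tested
        errors[dimension] += rows_in_error
    # Skip dimensions with no tested rows to avoid dividing by zero.
    return {d: errors[d] / tested[d] for d in tested if tested[d] > 0}


# Hypothetical control results, one tuple per executed control.
results = [
    ("Completeness", 100, 5),
    ("Completeness", 300, 15),
    ("Validity", 200, 10),
]
summary = error_rate_by_dimension(results)
print(summary)
```
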

.. ---------------------------------
.. Controls > Controls Set section
.. ---------------------------------

.. _concept-controls-controlset:

A **"Data Quality Controls Set"** is a collection of "Data Quality Controls" that are executed, either occasionally or periodically.

:ref:`→ 'dq_controlset' table in Logical model`

.. image:: /_static/img/dqe-concept-controlset.png
   :alt: Data Conceptual Model - Data Quality Control Set
   :align: center

.. raw:: html

--------------------------------------------------------------------------------------

.. _concept_results:

The DQ Control Results elements
-------------------------------

When a set of controls is executed, a set of results is collected. Errors are captured individually.

.. image:: /_static/img/dqengine_conceptual_model_results.png
   :alt: Data Conceptual Model - DQ Control Results
   :align: center

.. raw:: html

.. ------------------------------
.. Results > Resultset section
.. ------------------------------

.. _concept-results-resultset:

The execution of a set of controls starts on a given date and ends on another date.

:ref:`→ 'res_resultset' table in Logical model`

.. image:: /_static/img/dqe-concept-resultset.png
   :alt: Data Conceptual Model - Results Set
   :align: center

.. raw:: html

.. ------------------------------
.. Results > Result section
.. ------------------------------

.. _concept-results-result:

For each control executed, we get the number of rows tested and the number of errors found.

:ref:`→ 'res_result' table in Logical model`

.. image:: /_static/img/dqe-concept-result.png
   :alt: Data Conceptual Model - DQ Control Result
   :align: center

.. raw:: html
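The two figures kept for each executed control can be illustrated with a short Python sketch. The ``Result`` class, the ``run_control`` helper and the sample rows are hypothetical and only show how the counts relate to the tested rows; they do not mirror the ``res_result`` table.

```python
from dataclasses import dataclass


@dataclass
class Result:
    control_id: int
    rows_tested: int
    errors_found: int


def run_control(control_id, rows, is_valid):
    """Apply `is_valid` to every row and count the rows that fail."""
    errors = sum(1 for row in rows if not is_valid(row))
    return Result(control_id, len(rows), errors)


# Hypothetical rows and validity test for an illustrative control.
rows = [{"age": 34}, {"age": -2}, {"age": 51}]
result = run_control(28, rows, lambda row: row["age"] >= 0)
print(result)
```
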

.. ------------------------------
.. Results > Error section
.. ------------------------------

.. _concept-results-error:

Each error is stored individually with the primary key of the row concerned, the specific value in error and the full contents of the row.

:ref:`→ 'res_error' table in Logical model`

.. image:: /_static/img/dqe-concept-error.png
   :alt: Data Conceptual Model - DQ Control Error
   :align: center

.. raw:: html
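As a rough illustration of what an individual error record carries, the sketch below keeps the primary key, the offending value and the full row for every failing row. The ``ErrorRecord`` class, the JSON serialization of the row and the sample data are assumptions made for the example, not the layout of the ``res_error`` table.

```python
import json
from dataclasses import dataclass


@dataclass
class ErrorRecord:
    primary_key: str   # primary key of the row in error
    value: str         # the specific value that failed the control
    row_content: str   # full row, serialized here as JSON (an assumption)


def collect_errors(rows, key_column, checked_column, is_valid):
    """Build one ErrorRecord per row whose checked value is invalid."""
    errors = []
    for row in rows:
        if not is_valid(row[checked_column]):
            errors.append(ErrorRecord(
                primary_key=str(row[key_column]),
                value=str(row[checked_column]),
                row_content=json.dumps(row, sort_keys=True),
            ))
    return errors


# Hypothetical rows checked by a naive format control.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "not-an-email"},
]
errors = collect_errors(rows, "id", "email", lambda v: "@" in v)
print(errors)
```

Storing the full row next to the value lets an analyst investigate an error without querying the source database again.
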

----------------------------------------------------------------------------------------

.. _concept-reports:

The Reports elements
-------------------------------

Several types of reports are made available to Data Quality Analysts:

* Reports on the last result set
* Reports over time
* Reports with detailed errors
* Profiling Report

A scoring system can be used for visualization purposes.

.. image:: /_static/img/dqengine_conceptual_model_reports.png
   :alt: Data Conceptual Model - Reports
   :align: center

.. image:: /_static/img/dqengine_conceptual_model_scoring.png
   :alt: Data Conceptual Model - Scores
   :align: center

.. image:: /_static/img/dqe-screenshot-reportovertime.png
   :alt: Data Conceptual Model - Reports
   :align: center

.. raw:: html

----------------------------------------------------------------------------------

.. _concept_remediations:

The remediation elements
---------------------------------------------

When an error is detected, depending on its criticality level, a remediation workflow can be initiated.

.. image:: /_static/img/dqengine_conceptual_model_remediations.png
   :alt: Data Conceptual Model - Remediations
   :align: center

.. raw:: html

.. ---------------------------------------
.. Remediation > Criticity level section
.. ---------------------------------------

.. _concept-remediation-criticity:

A "DQ Control" has a **"Criticality level"**. This level determines the remediation workflow and the time allowed to remediate any error found.

:ref:`→ 'rm_criticy' table in Logical model`

.. image:: /_static/img/dqe-concept-criticity.png
   :alt: Data Conceptual Model - Criticality level
   :align: center

.. raw:: html