Data Quality
Data Quality (DQ) is the measure of how well a dataset satisfies the requirements of its intended business use. In financial markets, high-quality data is defined as being accurate, complete, timely, and traceable, ensuring that automated trading decisions and risk calculations are based on a "single version of the truth" rather than corrupted or ambiguous artefacts.
The Standard Dimensions - The "DQ 6"
The industry recognises six standard dimensions, codified in ISO 8000 and the DAMA-DMBOK:
- Accuracy: The degree to which data values match the "Golden Source" (e.g., a Bloomberg price vs. your sample).
- Completeness: The presence of all mandatory attributes (e.g., no missing ISIN or Strike Price).
- Consistency: The absence of contradictions across different tables or systems (e.g., USD/EUR rate is the same in your "Sample" and "Derived" tables).
- Timeliness: The readiness of data for its specific window (e.g., "T+0" real-time vs. "T+1" end-of-day).
- Validity: Adherence to business rules and formats (e.g., Currency codes must follow ISO 4217).
- Uniqueness: Ensuring no duplicate records exist for the same entity and timestamp.
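As a sketch, the Validity and Uniqueness dimensions above can be expressed as concrete checks. The function names and the (entity, timestamp) key shape below are illustrative, not part of any ORE Studio API; the currency check verifies ISO 4217 *format* only, not membership in the official code list:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Validity: a currency code is well-formed if it is exactly three
// upper-case letters (the ISO 4217 format).
bool is_valid_currency_format(const std::string& code) {
    return code.size() == 3 &&
           std::all_of(code.begin(), code.end(),
                       [](unsigned char c) { return std::isupper(c); });
}

// Uniqueness: no two records may share the same (entity, timestamp) key.
// Returns true when every key occurs exactly once.
bool is_unique(const std::vector<std::pair<std::string, std::string>>& keys) {
    std::set<std::pair<std::string, std::string>> seen;
    for (const auto& k : keys)
        if (!seen.insert(k).second) return false;  // duplicate key found
    return true;
}
```

In practice each dimension would be backed by a rule stored with the dataset's quality metrics, but the shape of the checks is the same.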
Relevant industry bodies and standards
DAMA-DMBOK (Data Management Body of Knowledge)
- Focus: Best practices and principles for managing data as a strategic asset (including Data Governance, Data Quality, Data Modeling, etc.).
- Role: Provides detailed practices for Data Architecture, encompassing data modeling and integration, and defines how these elements align with specific business requirements.
TOGAF (The Open Group Architecture Framework)
- Focus: Design, planning, implementation, and governance of Enterprise Architecture, ensuring IT alignment with strategic business goals.
- Role: Features a dedicated Data Architecture phase (Phase C), which leverages DAMA principles to build a scalable data ecosystem that fits within the broader organizational structure.
Key domain concepts
- Dataset: A logical collection of sample records (e.g., a set of trades, portfolios, or market data snapshots).
- Record: An individual row or entity within a dataset (optional granularity; lineage may be tracked at dataset or record level).
- Provenance: The origin of the data—external vendor, internal system, or synthetic generator.
- Classification: Categorizes the nature of the data for compliance and usage control.
- Lineage: Directed acyclic graph (DAG) linking derived data to its upstream sources.
- Temporal Context: Captures both business time (`as_of_date`) and system time (`ingestion_timestamp`), supporting bi-temporal reasoning.
- Data Passport: A self-contained metadata manifest (relational or document-based) that answers the 5 Ws of data governance.
- Quality metrics: Define and store validation rules and quality measurements.
- Change Management: Maintain history of data changes (who did what, when and for what reason).
- Data profiling: Store statistical metadata about data characteristics.
- Coding Schemes: Controlled code lists (e.g., ISO 4217 currency codes) that standardize attribute values across datasets.
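The lineage concept above can be sketched as a traversal of a small DAG of dataset manifests. The `DataPassport` struct below is a hypothetical illustration of the passport idea; its field names and string-typed timestamps are assumptions, not the actual schema:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical "Data Passport": a self-contained metadata manifest
// for one dataset. Field names are illustrative only.
struct DataPassport {
    std::string dataset_id;             // what
    std::string provenance;             // where: vendor, internal, synthetic
    std::string classification;         // compliance / usage category
    std::string as_of_date;             // business time
    std::string ingestion_timestamp;    // system time
    std::vector<std::string> upstream;  // lineage edges (parent dataset ids)
};

// Walk the lineage DAG and collect every transitive upstream source.
std::set<std::string> all_upstream(
    const std::map<std::string, DataPassport>& catalog,
    const std::string& id) {
    std::set<std::string> result;
    std::vector<std::string> stack{id};
    while (!stack.empty()) {
        auto cur = stack.back();
        stack.pop_back();
        auto it = catalog.find(cur);
        if (it == catalog.end()) continue;
        for (const auto& parent : it->second.upstream)
            if (result.insert(parent).second) stack.push_back(parent);
    }
    return result;
}
```

Because lineage is acyclic, the traversal terminates; the visited-set also makes it safe against diamond-shaped dependencies.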
Record Provenance Fields
Every reference data entity in ORE Studio carries six standard provenance fields that together answer the "who, when, and why" of the last change. These fields are set by a combination of the application layer and the database trigger.
| C++ Field | UI Label | Set By | Meaning |
|---|---|---|---|
| `version` | Version | DB trigger | Monotonically increasing integer; starts at 1, incremented on each update |
| `modified_by` | Modified By | Application | Username of the account that submitted the change request |
| `performed_by` | Performed By | Application | Username on whose behalf the change was performed (may differ from `modified_by` in delegated workflows) |
| `recorded_at` | Recorded At | DB trigger | Wall-clock timestamp (UTC) when the database row was last written |
| `change_reason_code` | Change Reason | Application | Optional code referencing a `change_reason` record explaining the business justification |
| `change_commentary` | Commentary | Application | Optional free-text note accompanying the change (displayed in a multi-line area in the UI) |
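A minimal C++ sketch of these six fields might look as follows. The types and the `apply_update` helper are illustrative assumptions; in ORE Studio the version bump and timestamp are performed by the database trigger, not application code:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Sketch of the six provenance fields as a struct. Names mirror the
// table above; the concrete types are assumptions.
struct Provenance {
    std::int64_t version = 1;    // set by DB trigger; starts at 1
    std::string modified_by;     // set by application
    std::string performed_by;    // set by application
    std::string recorded_at;     // UTC timestamp, set by DB trigger
    std::optional<std::string> change_reason_code;  // optional
    std::optional<std::string> change_commentary;   // optional
};

// Mimics what the database trigger does on each update: bump the
// version and stamp the wall-clock write time.
Provenance apply_update(Provenance p, const std::string& now_utc) {
    ++p.version;
    p.recorded_at = now_utc;
    return p;
}
```

Taking and returning the struct by value keeps the previous version untouched, which mirrors how the trigger leaves earlier versions intact in the history table.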
Relation to SQL temporal columns
These provenance fields live in the current ("live") table alongside the entity
data. The database trigger also writes a copy of every version into a companion
`_history` table, where each row carries additional `valid_from` / `valid_to`
columns that bound its lifespan in the system (the system-time axis of the
bi-temporal model).
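An "as of" lookup against such a history table can be sketched as below. `HistoryRow` and `as_of` are hypothetical names, and for simplicity ISO-8601 UTC timestamp strings are compared lexicographically (which orders them correctly):

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// One row of a hypothetical _history table: the payload plus the
// system-time interval [valid_from, valid_to) during which it was live.
struct HistoryRow {
    int version;
    std::string payload;
    std::string valid_from;
    std::string valid_to;  // open rows use a far-future sentinel
};

// "As of" lookup: return the version that was live at system time `at`.
std::optional<HistoryRow> as_of(const std::vector<HistoryRow>& history,
                                const std::string& at) {
    for (const auto& row : history)
        if (row.valid_from <= at && at < row.valid_to) return row;
    return std::nullopt;
}
```

The half-open interval convention ensures that the moment one version's `valid_to` closes is exactly the moment its successor's `valid_from` opens, so every instant maps to at most one row.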
Qt UI presentation
In the Qt UI all six fields are displayed in a ProvenanceWidget (a promoted
QWidget containing a read-only form layout). The widget lives in a dedicated
Provenance tab of each detail dialog. The tab is disabled (greyed out, not
hidden) in create mode because provenance data does not yet exist for a record
that has not been saved.