ORE Studio SQL Schema

Database schema and management for ORE Studio.

Schema Mental Model

The ORES database is organized into two schemas that reflect a clear separation of concerns between data definition/staging and operational consumption.

The Two Schemas

┌─────────────────────────────────────────────────────────────────┐
│  metadata schema                                                │
│  ─────────────────                                              │
│  Purpose: Define, classify, and stage data                      │
│                                                                 │
│  • Governance: dimensions, domains, change_reasons              │
│  • Registry: datasets, catalogs, bundles, methodologies         │
│  • Staging: *_artefact_tbl (source for publication)             │
│  • Audit: publications, bundle_publications                     │
│                                                                 │
│  Template: ships with standard definitions + reference data     │
│  Users: can extend with custom datasets, methodologies, etc.    │
└─────────────────────────────────────────────────────────────────┘
                              │
                              │ publish (controlled gate for data entry)
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  production schema                                              │
│  ─────────────────                                              │
│  Purpose: Operational data for application consumption          │
│                                                                 │
│  • Reference data: countries, currencies, coding schemes, etc.  │
│  • Market data: rates, prices, stress scenarios (future)        │
│  • IAM: accounts, roles, permissions                            │
│  • Assets: images, tags                                         │
│                                                                 │
│  Template: empty (or minimal seed data)                         │
│  Population: via publish from metadata, plus operational writes │
└─────────────────────────────────────────────────────────────────┘

Key Principles

Unidirectional Data Flow for Reference Data

Reference data flows in one direction: from metadata to production via the publication mechanism. This ensures:

  • Clear provenance: all reference data originates from catalogued datasets
  • Audit trail: publications are tracked with timestamps and record counts
  • Reproducibility: production state can be reconstructed from metadata

metadata.dq_*_artefact_tbl ──[publish]──► production.refdata_*_tbl
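
As a rough mental model, a publish copies staged rows into the corresponding production table and records the operation in the audit tables. The sketch below is illustrative only: the real mechanism is driven by the application, and the column names and the change reason code are assumptions.

-- Illustrative publish sketch (NOT the actual implementation; column names
-- and the change reason code are assumptions).
INSERT INTO production.refdata_countries_tbl (code, name, change_reason_code)
SELECT a.code, a.name, 'system.publication'
  FROM metadata.dq_countries_artefact_tbl a;

-- Audit trail: record the publication with a timestamp and record count.
INSERT INTO metadata.dq_publications_tbl (dataset_code, published_at, record_count)
SELECT 'countries', now(), count(*)
  FROM metadata.dq_countries_artefact_tbl;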

Metadata is Extensible

The metadata schema is not read-only. Users can extend it with:

  • Custom change reasons for domain-specific audit trails
  • New methodologies for data derivation/transformation logic
  • Custom datasets (e.g., stress testing scenarios, derived market data)
  • New artefact tables for additional data types

This allows the same infrastructure to support both shipped reference data and user-generated derived data (e.g., stress test scenarios).
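
For example, registering a custom dataset could look like the following. This is a hedged sketch: the actual columns of dq_datasets_tbl may differ, and the values are purely illustrative.

-- Hypothetical example: registering a user-defined dataset.
-- Column names are illustrative; consult dq_datasets_tbl for the real layout.
INSERT INTO metadata.dq_datasets_tbl (code, name, description)
VALUES ('stress_scenarios', 'Stress Testing Scenarios',
        'User-generated scenarios derived from published market data');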

Production Has Multiple Write Sources

While reference data enters production exclusively via publication, the schema also receives writes from normal operational processes:

  • Trading activity writing to transaction tables
  • Market data feeds updating prices/rates
  • IAM operations (user registration, role assignments)
  • Session tracking and audit logs

Clear Responsibility Boundaries

Question                                      Answer
────────────────────────────────────────────  ──────────────────────────────────
Where do I define a new dataset?              metadata.dq_datasets_tbl
Where do I stage data before publishing?      metadata.dq_*_artefact_tbl
Where does my application read from?          production.*
How does reference data get into production?  Only via publication
Can I extend the governance model?            Yes, add to metadata.dq_*
Can I create custom derived datasets?         Yes, same workflow as shipped data

Cross-Schema References

Production tables maintain foreign key references to metadata tables where appropriate. For example:

  • production.refdata_countries_tbl.coding_scheme_code references metadata.dq_coding_schemes_tbl.code
  • production.refdata_*.change_reason_code references metadata.dq_change_reasons_tbl.code

This enforces that production data references valid governance metadata.
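
In DDL terms, such a constraint would look roughly as follows (a sketch; the constraint name is an assumption):

-- Sketch of a cross-schema foreign key (constraint name is illustrative).
ALTER TABLE production.refdata_countries_tbl
    ADD CONSTRAINT refdata_countries_coding_scheme_fk
    FOREIGN KEY (coding_scheme_code)
    REFERENCES metadata.dq_coding_schemes_tbl (code);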

Schema Contents Summary

metadata Schema (dq_* prefix)

Category        Tables                                Extensible
──────────────  ────────────────────────────────────  ──────────
Change Control  dq_change_reason_categories_tbl       Yes
                dq_change_reasons_tbl                 Yes
Data Domains    dq_data_domains_tbl                   Rarely
                dq_subject_areas_tbl                  Yes
Dimensions      dq_origin_dimensions_tbl              Rarely
                dq_nature_dimensions_tbl              Rarely
                dq_treatment_dimensions_tbl           Rarely
Coding Schemes  dq_coding_scheme_authority_types_tbl  Rarely
                dq_coding_schemes_tbl                 Yes
Methodologies   dq_methodologies_tbl                  Yes
Artefact Types  dq_artefact_types_tbl                 Yes
Catalogs        dq_catalogs_tbl                       Yes
Datasets        dq_datasets_tbl                       Yes
                dq_dataset_bundles_tbl                Yes
                dq_dataset_bundle_members_tbl         Yes
                dq_dataset_dependencies_tbl           Yes
Staging         dq_*_artefact_tbl (20+ tables)        Yes
Audit           dq_publications_tbl                   System
                dq_bundle_publications_tbl            System

production Schema

Category        Tables                         Write Source
──────────────  ─────────────────────────────  ─────────────────────────
Reference Data  refdata_countries_tbl          Publication
                refdata_currencies_tbl         Publication
                refdata_*_tbl (FPML, etc.)     Publication
IAM             iam_accounts_tbl               Operational + Publication
                iam_roles_tbl                  Publication
                iam_permissions_tbl            Publication
                iam_account_roles_tbl          Operational
                iam_role_permissions_tbl       Operational
                iam_login_info_tbl             Operational
Assets          assets_images_tbl              Publication
                assets_tags_tbl                Publication
                assets_image_tags_tbl          Publication
Geo             geo_ip2country_tbl             Publication
Variability     variability_feature_flags_tbl  Operational

Schema Architecture

The ORES database is organized into four functional domains:

IAM (Identity & Access Management)

  • accounts (temporal): user accounts with authentication credentials
  • login_info: security tracking for login attempts and lock status
  • sessions (temporal): user session tracking with geolocation

RBAC (Role-Based Access Control)

  • roles (temporal): named roles for grouping permissions
  • permissions (temporal): fine-grained permission codes
  • account_roles (junction): many-to-many mapping of accounts to roles
  • role_permissions (junction): many-to-many mapping of roles to permissions
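
Resolving an account's effective permissions joins across both junction tables. The query below is a sketch: the join keys (account_id, role_id, permission_id) and the literal account id are assumptions.

-- Sketch: resolve the current permissions for one account.
-- Table names follow the production schema; column names are assumptions.
SELECT p.code
  FROM production.iam_account_roles_tbl ar
  JOIN production.iam_role_permissions_tbl rp ON rp.role_id = ar.role_id
  JOIN production.iam_permissions_tbl p       ON p.id = rp.permission_id
 WHERE ar.account_id = 42  -- illustrative account id
   -- restrict to current junction rows; production queries would likely
   -- apply the same filter to roles and permissions
   AND ar.valid_to = '9999-12-31 23:59:59'::timestamptz;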

Assets (Images & Currencies)

  • images (temporal): SVG image storage with version tracking
  • tags (temporal): image categories (flag, currency, commodity)
  • image_tags (junction): many-to-many image-to-tag mapping
  • currencies (temporal): ISO 4217 currency definitions with formatting and optional image_id for flags

Variability (Feature Flags)

  • feature_flags (temporal): feature toggles for runtime configuration

Schema Design Patterns

All temporal tables use valid_from and valid_to timestamptz fields for temporal data management. Key patterns include:

  • Junction tables enable soft deletes via validity period closure
  • EXCLUDE constraints prevent overlapping validity periods using gist indexes
  • Unique indexes filter on current records where valid_to = '9999-12-31 23:59:59'::timestamptz
  • Version fields provide optimistic locking (version 0 is "force overwrite" sentinel)
  • Triggers manage version auto-increment and temporal validity period transitions
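
A minimal sketch combining several of these patterns (illustrative table and column names only; the real tables carry additional fields and triggers):

-- Requires btree_gist for the EXCLUDE constraint below.
CREATE TABLE production.example_tbl (
    code       text        NOT NULL,
    name       text        NOT NULL,
    version    integer     NOT NULL DEFAULT 1,  -- optimistic locking; 0 = force overwrite
    valid_from timestamptz NOT NULL DEFAULT now(),
    valid_to   timestamptz NOT NULL DEFAULT '9999-12-31 23:59:59'::timestamptz,
    -- No two rows for the same code may have overlapping validity periods.
    EXCLUDE USING gist (code WITH =, tstzrange(valid_from, valid_to) WITH &&)
);

-- At most one "current" record per code.
CREATE UNIQUE INDEX example_tbl_current_uidx
    ON production.example_tbl (code)
    WHERE valid_to = '9999-12-31 23:59:59'::timestamptz;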

For detailed implementation patterns and step-by-step instructions on creating new tables, see the SQL Schema Creator skill.

PostgreSQL Extensions

ORES uses the following PostgreSQL extensions:

btree_gist (Required)

Provides GiST index support for B-tree data types. Required for temporal exclusion constraints that prevent overlapping validity periods.

CREATE EXTENSION IF NOT EXISTS btree_gist;

TimescaleDB (Optional)

Time-series database extension for efficient session analytics. If installed, the sessions table becomes a hypertable with automatic time-based partitioning.
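
Conversion uses TimescaleDB's create_hypertable function. The sketch below assumes the sessions table stores its start time in a started_at column (both names are assumptions):

-- Sketch: make the sessions table a hypertable with 7-day chunks.
-- Table and column names are assumptions.
SELECT create_hypertable('production.iam_sessions_tbl', 'started_at',
                         chunk_time_interval => INTERVAL '7 days');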

Installation

TimescaleDB must be installed at the system level:

# Debian/Ubuntu (official TimescaleDB repo recommended for full features)
sudo apt install timescaledb-2-postgresql-16

# macOS
brew install timescaledb

Add to postgresql.conf:

shared_preload_libraries = 'timescaledb'

Restart PostgreSQL after configuration changes.

License Editions

TimescaleDB has two license editions:

Edition    License     Features
─────────  ──────────  ───────────────────────────────────────────────────────────────
Apache     Apache 2.0  Hypertables, time-based partitioning, chunk pruning
Community  Timescale   Above + compression, retention policies, continuous aggregates

Debian/Ubuntu packages (+dfsg versions) are Apache-only. For full features, use the official TimescaleDB repository or build from source.

To check and enable the community license:

-- Check current license
SHOW timescaledb.license;

-- Enable community features (if installed)
ALTER SYSTEM SET timescaledb.license = 'timescale';
SELECT pg_reload_conf();

Feature Detection

The ORES schema scripts automatically detect TimescaleDB availability and license at runtime:

Condition          Sessions Table Behaviour
─────────────────  ──────────────────────────────────────────────
No TimescaleDB     Regular PostgreSQL table with standard indexes
Apache license     Hypertable with 7-day chunks (no compression)
Timescale license  Hypertable + compression + retention policies
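
One plausible shape for such a runtime check is shown below (the shipped scripts may use a different approach); current_setting with the missing_ok flag avoids an error when the setting is absent:

-- Plausible detection sketch; the actual ORES scripts may differ.
SELECT EXISTS (
    SELECT 1 FROM pg_extension WHERE extname = 'timescaledb'
) AS has_timescaledb;

-- Returns 'apache' or 'timescale' when set, NULL otherwise.
SELECT current_setting('timescaledb.license', true) AS timescaledb_license;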

Continuous aggregates (session_stats_daily, session_stats_hourly) are only created with the Timescale license.

Hypertable Benefits

Even with the Apache-only edition, hypertables provide:

  • Automatic time-based partitioning (7-day chunks)
  • Efficient time-range query pruning
  • Parallel chunk processing
  • Simplified data lifecycle management
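
For example, a query restricted to a recent time window touches only the chunks that overlap it (table and column names below are assumptions):

-- Chunk pruning: only chunks overlapping the last 7 days are scanned.
SELECT count(*)
  FROM production.iam_sessions_tbl
 WHERE started_at >= now() - INTERVAL '7 days';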

Data Population Layers

The ORES database uses a layered approach to data population, aligned with the two-schema model. Each layer builds upon the previous:

┌─────────────────────────────────────────────────────────────────────────────┐
│                           metadata schema                                   │
│  Foundation → Data Governance → Data Catalogues                             │
│  (prerequisites)  (rules/dimensions)   (datasets + artefacts)               │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ publish
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           production schema                                 │
│  Published Reference Data + Operational Data                                │
└─────────────────────────────────────────────────────────────────────────────┘

All population scripts write to the metadata schema. The production schema is populated via the publication mechanism (and operational processes).

Foundation Layer (metadata schema)

Essential lookup and configuration data required for schema integrity. This data is populated in the template database and must be present before other metadata can be inserted.

Category                   Tables                                          Purpose
─────────────────────────  ──────────────────────────────────────────────  ───────────────────────────────────
Change Control             metadata.dq_change_reason_categories_tbl       Audit trail categorization
                           metadata.dq_change_reasons_tbl                 Audit trail reasons
Data Governance Framework  metadata.dq_data_domains_tbl                   Data domain classification
                           metadata.dq_subject_areas_tbl                  Subject area classification
                           metadata.dq_coding_scheme_authority_types_tbl  Coding scheme authority types
                           metadata.dq_coding_schemes_tbl                 Party/entity identification schemes

Foundation layer is populated via populate/foundation/populate_foundation.sql.

Data Governance Layer (metadata schema)

Metadata that defines the rules and classifications for how data is organized. This layer establishes the framework within which data quality is managed.

Category        Tables                                Purpose
──────────────  ────────────────────────────────────  ────────────────────────────────
Dimensions      metadata.dq_origin_dimensions_tbl     Data origin classification
                metadata.dq_nature_dimensions_tbl     Data nature classification
                metadata.dq_treatment_dimensions_tbl  Data treatment classification
Methodologies   metadata.dq_methodologies_tbl         Data sourcing/processing methods
Artefact Types  metadata.dq_artefact_types_tbl        Types of artefacts and mappings

Data governance layer is populated via populate/governance/populate_governance.sql.

Data Catalogues Layer (metadata schema)

Catalogued reference data organized according to governance rules. This layer contains the actual data awaiting publication to production tables.

Category       Tables                                  Purpose
─────────────  ──────────────────────────────────────  ─────────────────────────────────
Catalogs       metadata.dq_catalogs_tbl                Groupings of datasets by source
Datasets       metadata.dq_datasets_tbl                Data collection definitions
Bundles        metadata.dq_dataset_bundles_tbl         Named collections for publication
               metadata.dq_dataset_bundle_members_tbl  Bundle membership with ordering
Dependencies   metadata.dq_dataset_dependencies_tbl    Relationships between datasets
Artefact Data  metadata.dq_*_artefact_tbl              Staged reference data

Data sources include:

  • Flag Icons: Visual assets for countries
  • ISO Standards: Countries, currencies (ISO 3166, ISO 4217)
  • Solvaris: Extended currency metadata
  • IP to Country: IP geolocation mappings
  • FPML: Financial products reference data
  • Cryptocurrency: Digital currency data

Data catalogues layer is populated via populate/catalogues/populate_catalogues.sql.

Production Layer (production schema)

Live reference data in production tables, published from the data catalogues layer via the "Publish Datasets" feature in the Data Librarian window.

Category    Tables                             Source Artefact Table
──────────  ─────────────────────────────────  ───────────────────────────────────
Countries   production.refdata_countries_tbl   metadata.dq_countries_artefact_tbl
Currencies  production.refdata_currencies_tbl  metadata.dq_currencies_artefact_tbl
Images      production.assets_images_tbl       metadata.dq_images_artefact_tbl
FPML Data   production.refdata_*_tbl           metadata.dq_*_artefact_tbl

Additional production tables populated via operational processes (not publication):

Category     Tables                                     Write Source
───────────  ─────────────────────────────────────────  ─────────────────────
IAM          production.iam_accounts_tbl                User registration
             production.iam_login_info_tbl              Login tracking
Variability  production.variability_feature_flags_tbl   Runtime configuration

Database Creation

The database uses a template-based approach for fast instance creation.

Prerequisites

Run initial setup scripts as postgres superuser in order:

# 1. Install extensions (btree_gist required, timescaledb optional)
psql -U postgres -f setup_extensions.sql

# 2. Create application user
psql -U postgres -v ores_password='YOUR_SECRET' -f setup_user.sql

# 3. Create admin database (whimsical name generation)
psql -U postgres -f admin/setup_admin.sql

Create Template Database (One-Time)

Create the ores_template database containing the schema and foundation layer:

psql -U postgres -f setup_template.sql

The template includes:

  • Complete database schema (tables, triggers, functions)
  • Foundation layer data (change control, rounding types, data governance framework, IAM, feature flags)

Create Instance with Whimsical Name

Create a new database instance from the template with an auto-generated whimsical name (fast, uses PostgreSQL filesystem copy):

psql -U postgres -f create_instance.sql

Example output:

Generating whimsical database name...
Creating database: dancing_lemur_42
==========================================
Database created successfully!
==========================================
Database name: dancing_lemur_42

Connect with:
  psql -U ores -d dancing_lemur_42
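
The speed comes from PostgreSQL's native template mechanism: CREATE DATABASE copies the template's files on disk rather than replaying DDL. Conceptually, the script boils down to:

-- Conceptual core of create_instance.sql (name generation omitted).
CREATE DATABASE dancing_lemur_42 TEMPLATE ores_template;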

Create Instance with Specific Name

To create an instance with a specific name:

psql -U postgres -v db_name='my_database' -f create_instance.sql

Direct Creation (Without Template)

For environments without template support, create a standalone database:

psql -U postgres -v db_name='my_database' -f create_database_direct.sql

Project Structure

For detailed directory structure, file organization, and naming conventions, see the SQL Schema Creator skill documentation.

SQL Recipes

For executable SQL query examples, see the SQL Recipes notebook. It contains ready-to-run queries for:

  • Account management and temporal history
  • RBAC queries (roles, permissions, account assignments)
  • Currency and image lookups
  • Feature flag inspection
  • Session and login tracking
  • Administrative queries (database size, table sizes)