ORE Studio SQL Schema
Database schema and management for ORE Studio.
Schema Mental Model
The ORES database is organized into two schemas that reflect a clear separation of concerns between data definition/staging and operational consumption.
The Two Schemas
┌─────────────────────────────────────────────────────────────────┐
│ metadata schema │
│ ───────────────── │
│ Purpose: Define, classify, and stage data │
│ │
│ • Governance: dimensions, domains, change_reasons │
│ • Registry: datasets, catalogs, bundles, methodologies │
│ • Staging: *_artefact_tbl (source for publication) │
│ • Audit: publications, bundle_publications │
│ │
│ Template: ships with standard definitions + reference data │
│ Users: can extend with custom datasets, methodologies, etc. │
└─────────────────────────────────────────────────────────────────┘
│
│ publish (controlled gate for data entry)
▼
┌─────────────────────────────────────────────────────────────────┐
│ production schema │
│ ───────────────── │
│ Purpose: Operational data for application consumption │
│ │
│ • Reference data: countries, currencies, coding schemes, etc. │
│ • Market data: rates, prices, stress scenarios (future) │
│ • IAM: accounts, roles, permissions │
│ • Assets: images, tags │
│ │
│ Template: empty (or minimal seed data) │
│ Population: via publish from metadata, plus operational writes │
└─────────────────────────────────────────────────────────────────┘
Key Principles
Unidirectional Data Flow for Reference Data
Reference data flows in one direction: from metadata to production via the
publication mechanism. This ensures:
- Clear provenance: all reference data originates from catalogued datasets
- Audit trail: publications are tracked with timestamps and record counts
- Reproducibility: production state can be reconstructed from metadata
metadata.dq_*_artefact_tbl ──[publish]──► production.refdata_*_tbl
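Conceptually, a publish is an insert from the staging artefact table into the corresponding production table, plus an audit record. The sketch below illustrates the idea only; the column names and dataset code are assumptions, and the actual mechanism is driven by the application:

```sql
-- Illustrative only: column names and the dataset code are assumptions;
-- the real publish step is performed by the application, not by hand.
BEGIN;

-- Copy staged artefacts into the production reference table.
INSERT INTO production.refdata_countries_tbl (code, name, change_reason_code)
SELECT code, name, change_reason_code
FROM metadata.dq_countries_artefact_tbl;

-- Record the publication with a timestamp and record count for the audit trail.
INSERT INTO metadata.dq_publications_tbl (dataset_code, published_at, record_count)
SELECT 'iso_countries', now(), count(*)
FROM metadata.dq_countries_artefact_tbl;

COMMIT;
```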
Metadata is Extensible
The metadata schema is not read-only. Users can extend it with:
- Custom change reasons for domain-specific audit trails
- New methodologies for data derivation/transformation logic
- Custom datasets (e.g., stress testing scenarios, derived market data)
- New artefact tables for additional data types
This allows the same infrastructure to support both shipped reference data and user-generated derived data (e.g., stress test scenarios).
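As an illustration, registering a custom dataset amounts to inserting a row into the dataset registry. The statement below is a sketch; its column names and codes are assumptions rather than the actual table definition:

```sql
-- Sketch only: column names and codes are assumptions, not the actual
-- definition of metadata.dq_datasets_tbl.
INSERT INTO metadata.dq_datasets_tbl (code, name, catalog_code, methodology_code)
VALUES ('stress_scenarios_2025',
        'Stress Testing Scenarios 2025',
        'internal',
        'derived');
```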
Production Has Multiple Write Sources
While reference data enters production exclusively via publication, the schema
also receives writes from normal operational processes:
- Trading activity writing to transaction tables
- Market data feeds updating prices/rates
- IAM operations (user registration, role assignments)
- Session tracking and audit logs
Clear Responsibility Boundaries
| Question | Answer |
|---|---|
| Where do I define a new dataset? | metadata.dq_datasets_tbl |
| Where do I stage data before publishing? | metadata.dq_*_artefact_tbl |
| Where does my application read from? | production.* |
| How does reference data get into production? | Only via publication |
| Can I extend the governance model? | Yes, add to metadata.dq_* |
| Can I create custom derived datasets? | Yes, same workflow as shipped data |
Cross-Schema References
Production tables maintain foreign key references to metadata tables where appropriate. For example:
- production.refdata_countries_tbl.coding_scheme_code references metadata.dq_coding_schemes_tbl.code
- production.refdata_*.change_reason_code references metadata.dq_change_reasons_tbl.code
This enforces that production data references valid governance metadata.
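In DDL terms, such a reference looks roughly as follows (the constraint name is illustrative):

```sql
-- Sketch of a cross-schema foreign key; the constraint name is illustrative.
ALTER TABLE production.refdata_countries_tbl
    ADD CONSTRAINT refdata_countries_coding_scheme_fk
    FOREIGN KEY (coding_scheme_code)
    REFERENCES metadata.dq_coding_schemes_tbl (code);
```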
Schema Contents Summary
metadata Schema (dq_* prefix)
| Category | Tables | Extensible |
|---|---|---|
| Change Control | dq_change_reason_categories_tbl | Yes |
| | dq_change_reasons_tbl | Yes |
| Data Domains | dq_data_domains_tbl | Rarely |
| | dq_subject_areas_tbl | Yes |
| Dimensions | dq_origin_dimensions_tbl | Rarely |
| | dq_nature_dimensions_tbl | Rarely |
| | dq_treatment_dimensions_tbl | Rarely |
| Coding Schemes | dq_coding_scheme_authority_types_tbl | Rarely |
| | dq_coding_schemes_tbl | Yes |
| Methodologies | dq_methodologies_tbl | Yes |
| Artefact Types | dq_artefact_types_tbl | Yes |
| Catalogs | dq_catalogs_tbl | Yes |
| Datasets | dq_datasets_tbl | Yes |
| | dq_dataset_bundles_tbl | Yes |
| | dq_dataset_bundle_members_tbl | Yes |
| | dq_dataset_dependencies_tbl | Yes |
| Staging | dq_*_artefact_tbl (20+ tables) | Yes |
| Audit | dq_publications_tbl | System |
| | dq_bundle_publications_tbl | System |
production Schema
| Category | Tables | Write Source |
|---|---|---|
| Reference Data | refdata_countries_tbl | Publication |
| | refdata_currencies_tbl | Publication |
| | refdata_*_tbl (FPML, etc.) | Publication |
| IAM | iam_accounts_tbl | Operational + Publication |
| | iam_roles_tbl | Publication |
| | iam_permissions_tbl | Publication |
| | iam_account_roles_tbl | Operational |
| | iam_role_permissions_tbl | Operational |
| | iam_login_info_tbl | Operational |
| Assets | assets_images_tbl | Publication |
| | assets_tags_tbl | Publication |
| | assets_image_tags_tbl | Publication |
| Geo | geo_ip2country_tbl | Publication |
| Variability | variability_feature_flags_tbl | Operational |
Schema Architecture
The production schema is organized into four domains:
IAM (Identity & Access Management)
- accounts (temporal): user accounts with authentication credentials
- login_info: security tracking for login attempts and lock status
- sessions (temporal): user session tracking with geolocation
RBAC (Role-Based Access Control)
- roles (temporal): named roles for grouping permissions
- permissions (temporal): fine-grained permission codes
- account_roles (junction): many-to-many mapping of accounts to roles
- role_permissions (junction): many-to-many mapping of roles to permissions
Assets (Images & Currencies)
- images (temporal): SVG image storage with version tracking
- tags (temporal): image categories (flag, currency, commodity)
- image_tags (junction): many-to-many image-to-tag mapping
- currencies (temporal): ISO 4217 currency definitions with formatting and optional image_id for flags
Variability (Feature Flags)
- feature_flags (temporal): feature toggles for runtime configuration
Schema Design Patterns
All temporal tables use valid_from and valid_to timestamptz fields for
temporal data management. Key patterns include:
- Junction tables enable soft deletes via validity period closure
- EXCLUDE constraints prevent overlapping validity periods using GiST indexes
- Unique indexes filter on current records where valid_to = '9999-12-31 23:59:59'::timestamptz
- Version fields provide optimistic locking (version 0 is a "force overwrite" sentinel)
- Triggers manage version auto-increment and temporal validity period transitions
For detailed implementation patterns and step-by-step instructions on creating new tables, see the SQL Schema Creator skill.
PostgreSQL Extensions
ORES uses the following PostgreSQL extensions:
btree_gist (Required)
Provides GiST index support for B-tree data types. Required for temporal exclusion constraints that prevent overlapping validity periods.
```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;
```
TimescaleDB (Optional)
Time-series database extension for efficient session analytics. If installed,
the sessions table becomes a hypertable with automatic time-based partitioning.
Installation
TimescaleDB must be installed at the system level:
```sh
# Debian/Ubuntu (official TimescaleDB repo recommended for full features)
sudo apt install timescaledb-2-postgresql-16

# macOS
brew install timescaledb
```
Add to postgresql.conf:
```
shared_preload_libraries = 'timescaledb'
```
Restart PostgreSQL after configuration changes.
License Editions
TimescaleDB has two license editions:
| Edition | License | Features |
|---|---|---|
| Apache | Apache 2.0 | Hypertables, time-based partitioning, chunk pruning |
| Community | Timescale | Above + compression, retention policies, continuous aggregates |
Debian/Ubuntu packages (+dfsg versions) are Apache-only. For full features,
use the official TimescaleDB repository or build from source.
To check and enable the community license:
```sql
-- Check current license
SHOW timescaledb.license;

-- Enable community features (if installed)
ALTER SYSTEM SET timescaledb.license = 'timescale';
SELECT pg_reload_conf();
```
Feature Detection
The ORES schema scripts automatically detect TimescaleDB availability and license at runtime:
| Condition | Sessions Table Behaviour |
|---|---|
| No TimescaleDB | Regular PostgreSQL table with standard indexes |
| Apache license | Hypertable with 7-day chunks (no compression) |
| Timescale license | Hypertable + compression + retention policies |
Continuous aggregates (session_stats_daily, session_stats_hourly) are only
created with the Timescale license.
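A detection check of this kind can be expressed as a DO block that inspects pg_extension and the license setting; the sketch below illustrates the approach, not the actual ORES script:

```sql
-- Sketch of runtime detection; the actual ORES scripts may differ.
DO $$
DECLARE
    has_timescale boolean;
    ts_license    text;
BEGIN
    SELECT EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'timescaledb')
    INTO has_timescale;

    IF has_timescale THEN
        -- The missing-ok flag returns NULL instead of erroring when unset.
        ts_license := current_setting('timescaledb.license', true);
        RAISE NOTICE 'TimescaleDB present, license: %',
            coalesce(ts_license, 'unknown');
    ELSE
        RAISE NOTICE 'TimescaleDB absent; sessions stays a regular table';
    END IF;
END
$$;
```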
Hypertable Benefits
Even with Apache-only, hypertables provide:
- Automatic time-based partitioning (7-day chunks)
- Efficient time-range query pruning
- Parallel chunk processing
- Simplified data lifecycle management
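For reference, converting a table into a hypertable with 7-day chunks uses TimescaleDB's create_hypertable; the table and time-column names below are assumptions:

```sql
-- Sketch: the table and time-column names are assumptions.
SELECT create_hypertable(
    'production.iam_sessions_tbl',
    'started_at',
    chunk_time_interval => INTERVAL '7 days'
);
```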
Data Population Layers
The ORES database uses a layered approach to data population, aligned with the two-schema model. Each layer builds upon the previous:
┌─────────────────────────────────────────────────────────────────────────────┐
│ metadata schema │
│ Foundation → Data Governance → Data Catalogues │
│ (prerequisites) (rules/dimensions) (datasets + artefacts) │
└─────────────────────────────────────────────────────────────────────────────┘
│
│ publish
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ production schema │
│ Published Reference Data + Operational Data │
└─────────────────────────────────────────────────────────────────────────────┘
All population scripts write to the metadata schema. The production schema
is populated via the publication mechanism (and operational processes).
Foundation Layer (metadata schema)
Essential lookup and configuration data required for schema integrity. This data is populated in the template database and must be present before other metadata can be inserted.
| Category | Tables | Purpose |
|---|---|---|
| Change Control | metadata.dq_change_reason_categories_tbl | Audit trail categorization |
| | metadata.dq_change_reasons_tbl | Audit trail reasons |
| Data Governance Framework | metadata.dq_data_domains_tbl | Data domain classification |
| | metadata.dq_subject_areas_tbl | Subject area classification |
| | metadata.dq_coding_scheme_authority_types_tbl | Coding scheme authority types |
| | metadata.dq_coding_schemes_tbl | Party/entity identification schemes |
The foundation layer is populated via populate/foundation/populate_foundation.sql.
Data Governance Layer (metadata schema)
Metadata that defines the rules and classifications for how data is organized. This layer establishes the framework for data quality management.
| Category | Tables | Purpose |
|---|---|---|
| Dimensions | metadata.dq_origin_dimensions_tbl | Data origin classification |
| | metadata.dq_nature_dimensions_tbl | Data nature classification |
| | metadata.dq_treatment_dimensions_tbl | Data treatment classification |
| Methodologies | metadata.dq_methodologies_tbl | Data sourcing/processing methods |
| Artefact Types | metadata.dq_artefact_types_tbl | Types of artefacts and mappings |
The data governance layer is populated via populate/governance/populate_governance.sql.
Data Catalogues Layer (metadata schema)
Catalogued reference data organized according to governance rules. This layer contains the actual data awaiting publication to production tables.
| Category | Tables | Purpose |
|---|---|---|
| Catalogs | metadata.dq_catalogs_tbl | Groupings of datasets by source |
| Datasets | metadata.dq_datasets_tbl | Data collection definitions |
| Bundles | metadata.dq_dataset_bundles_tbl | Named collections for publication |
| | metadata.dq_dataset_bundle_members_tbl | Bundle membership with ordering |
| Dependencies | metadata.dq_dataset_dependencies_tbl | Relationships between datasets |
| Artefact Data | metadata.dq_*_artefact_tbl | Staged reference data |
Data sources include:
- Flag Icons: Visual assets for countries
- ISO Standards: Countries, currencies (ISO 3166, ISO 4217)
- Solvaris: Extended currency metadata
- IP to Country: IP geolocation mappings
- FPML: Financial products reference data
- Cryptocurrency: Digital currency data
The data catalogues layer is populated via populate/catalogues/populate_catalogues.sql.
Production Layer (production schema)
Live reference data in production tables, published from the data catalogues layer via the "Publish Datasets" feature in the Data Librarian window.
| Category | Tables | Source Artefact Table |
|---|---|---|
| Countries | production.refdata_countries_tbl | metadata.dq_countries_artefact_tbl |
| Currencies | production.refdata_currencies_tbl | metadata.dq_currencies_artefact_tbl |
| Images | production.assets_images_tbl | metadata.dq_images_artefact_tbl |
| FPML Data | production.refdata_*_tbl | metadata.dq_*_artefact_tbl |
Additional production tables populated via operational processes (not publication):
| Category | Tables | Write Source |
|---|---|---|
| IAM | production.iam_accounts_tbl | User registration |
| | production.iam_login_info_tbl | Login tracking |
| Variability | production.variability_feature_flags_tbl | Runtime configuration |
Database Creation
The database uses a template-based approach for fast instance creation.
Prerequisites
Run initial setup scripts as postgres superuser in order:
```sh
# 1. Install extensions (btree_gist required, timescaledb optional)
psql -U postgres -f setup_extensions.sql

# 2. Create application user
psql -U postgres -v ores_password='YOUR_SECRET' -f setup_user.sql

# 3. Create admin database (whimsical name generation)
psql -U postgres -f admin/setup_admin.sql
```
Create Template Database (One-Time)
Create the ores_template database containing the schema and foundation layer:
```sh
psql -U postgres -f setup_template.sql
```
The template includes:
- Complete database schema (tables, triggers, functions)
- Foundation layer data (change control, rounding types, data governance framework, IAM, feature flags)
Create Instance with Whimsical Name
Create a new database instance from the template with an auto-generated whimsical name (fast, uses PostgreSQL filesystem copy):
```sh
psql -U postgres -f create_instance.sql
```
Example output:
```
Generating whimsical database name...
Creating database: dancing_lemur_42
==========================================
Database created successfully!
==========================================
Database name: dancing_lemur_42
Connect with:  psql -U ores -d dancing_lemur_42
```
Create Instance with Specific Name
To create an instance with a specific name:
```sh
psql -U postgres -v db_name='my_database' -f create_instance.sql
```
Direct Creation (Without Template)
For environments without template support, create a standalone database:
```sh
psql -U postgres -v db_name='my_database' -f create_database_direct.sql
```
Project Structure
For detailed directory structure, file organization, and naming conventions, see the SQL Schema Creator skill documentation.
SQL Recipes
For executable SQL query examples, see the SQL Recipes notebook. It contains ready-to-run queries for:
- Account management and temporal history
- RBAC queries (roles, permissions, account assignments)
- Currency and image lookups
- Feature flag inspection
- Session and login tracking
- Administrative queries (database size, table sizes)
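To give a flavour of the recipes, a current-roles lookup might look like the sketch below; the join columns are assumptions based on the schema described above:

```sql
-- Sketch of a recipe-style query: list current roles per account.
-- Join columns (id, account_id, role_id) are assumptions.
SELECT a.username, r.name AS role_name
FROM production.iam_accounts_tbl AS a
JOIN production.iam_account_roles_tbl AS ar ON ar.account_id = a.id
JOIN production.iam_roles_tbl AS r ON r.id = ar.role_id
WHERE a.valid_to  = '9999-12-31 23:59:59'::timestamptz
  AND ar.valid_to = '9999-12-31 23:59:59'::timestamptz
  AND r.valid_to  = '9999-12-31 23:59:59'::timestamptz;
```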
| Top: Documentation | Previous: System Model | Recipes: SQL Recipes |