Story: Refactor ores.codegen SQL generation
Table of Contents
This page documents a story in Sprint 19. It captures the goal, current status, acceptance criteria, and the tasks that compose it.
Goal
The ores.codegen SQL generation system has accumulated two parallel pipelines that do the same job with different templates, different model formats, and different invocation conventions:
sql_schema_table_create.mustache(283 lines) — for_entity.jsonmodels; generates the production-grade SQL with full trigger logic and a per-entityvalidate_ENTITY_fn().sql_schema_domain_entity_create.mustache(377 lines) — for_domain_entity.jsonmodels; used by--profile sqland--profile all; generates SQL for domain entities (party, book, etc.) but does not produce a validation function.
On top of this, 14 bash scripts each hard-code which glob patterns to pass and which
profiles to run. There is no safe "regenerate everything" command. The --profile all
footgun silently overwrites production SQL with the wrong template when run against an
entity that has both model files.
This story unifies the SQL half of ores.codegen. Scope is SQL only; C++ templates and
the _domain_entity.json C++ generation path are unchanged.
What we want
1. A single Python CLI: codegen.py
Replaces run_generator.sh and all 14 generate_*.sh bash scripts. Structured as an
idiomatic Python application: proper argparse subcommands, entry point declaration,
logging, and a clean internal package layout. The exact subcommand names (generate vs.
regen, run vs. regenerate, etc.) are an open design decision to be resolved in the first
task; they must be unambiguous and follow Python CLI conventions (see argparse subcommands).
Required capabilities:
- Generate SQL from a single model file.
- Regenerate all SQL for a named component (
refdata,trade,dq,iam, …). - Regenerate SQL for all components in one command.
- Dry-run mode: print the paths that would be written without writing them.
- Diff mode: show what would change in the output files without writing.
- List mode: enumerate models, components, or profiles.
2. A unified SQL model format
One root key — table — that subsumes both entity and the SQL-relevant fields of
domain_entity. A single model file is the source of truth for the SQL schema of one
table. The model encodes all variability via flags; the template reads them without
needing to know the origin of the model.
Variability axes identified from the existing SQL (both templates and all generated output files):
| Flag | Values | Description |
|---|---|---|
tenant |
required / system / nullable / none |
How tenant_id behaves in the table and trigger |
coding_scheme |
none / required / nullable |
Coding scheme FK column |
image_id |
true / false | Optional image FK column |
workspace_id |
true / false | Workspace FK column (soft FK + index) |
natural_keys |
list | Columns that get their own unique index |
validation_fn |
present / absent | Whether to emit ores_PRODUCT_COMPONENT_validate_ENTITY_fn() |
validation_fn.default |
value / absent | Return value when input is null/empty |
validation_fn.tenant_scope |
system / tenant / both |
Which tenants to search in the validation fn |
validation_fn.order_by |
column name | Column to order the valid-values list in the error message |
insert_trigger.validations |
list | Call existing validation fns per column |
insert_trigger.soft_fk_validations |
list | UUID-FK existence checks |
insert_trigger.fk_copy_validations |
list | FK checks that also copy denormalised columns |
insert_trigger.text_code_validations |
list | Text-code FK checks (no UUID) |
security_definer |
true / false | SECURITY DEFINER on the trigger function |
The validation_fn.tenant_scope flag captures a key variation found in the existing
SQL that the current template does not express correctly: auxiliary types
(rounding_type, monetary_nature, currency_market_tier) validate against
tenant_id in (p_tenant_id, system_tenant), while the current template only supports
system or tenant (never both). The currency validation function has yet another
variant — it also validates against both. All three variants must be expressible.
3. A single SQL template: sql_schema_create.mustache
Replaces both existing templates. All variability is controlled by the flags above. The
template must reproduce, byte-for-byte (ignoring cosmetic whitespace), the SQL that the
current templates produce for every entity in projects/ores.sql/create/. Logical
fidelity is required; whitespace normalisation is acceptable.
What is NOT in scope
- C++ templates (domain, repository, service, protocol, Qt) — unchanged.
- The
_domain_entity.json→sql_schema_domain_entity_create.mustachepath for domain entities (party, book, counterparty, conventions, etc.) — unchanged; those SQL files are owned by that template. - Merging
_entity.jsonand_domain_entity.jsoninto a single file — this touches the C++ generation path and is deferred. - DQ, trade, workspace, and other component schemas — refdata is the pilot; roll out to other components once refdata parity is confirmed.
- Reference data seed SQL (
generate_crypto_refdata.sh,generate_fpml_refdata.sh, etc.) — those are data generation scripts, not schema generation; out of scope.
Status
| Field | Value |
|---|---|
| State | DONE |
| Parent sprint | Sprint 19 |
| Now | Nothing. |
| Waiting on | Nothing. |
| Next | Nothing. |
| Last touched | 2026-05-30 |
Acceptance
codegen.pyexists as a proper Python CLI; all subcommand names are unambiguous and documented.- A single
sql_schema_create.mustachetemplate replaces both existing SQL templates. - All refdata
_entity.jsonmodels have been migrated to the newtableformat. - Running
codegen.pyfor all refdata entities produces zero logical diff against the current SQL files inprojects/ores.sql/create/refdata/. generate_refdata_schema.shhas been retired (deleted or archived).- The
--dry-runflag prints output paths without writing;--diffshows file diffs. - All new code passes the project's linting and formatting checks.
- Site builds cleanly.
Tasks
| Task | State | Start | End | Description |
|---|---|---|---|---|
| Survey SQL variation and define unified model format | DONE | 2026-05-29 | 2026-05-29 | Audit all generated SQL files; enumerate every variation; define the table model schema and CLI interface. |
Implement codegen.py CLI scaffold |
DONE | 2026-05-29 | 2026-05-29 | Argparse structure, entry point, package layout, logging; subcommand stubs; venv wiring. |
| Implement unified SQL template | DONE | 2026-05-29 | 2026-05-29 | Write sql_schema_create.mustache; add table model-type support in generator; verify parity for all refdata entities. |
| Migrate refdata entity models to new format | DONE | 2026-05-29 | 2026-05-30 | Convert all _entity.json models to _table.json; retire refdata-table component alias; run parity check. |
Retire generate_refdata_schema.sh |
DONE | 2026-05-30 | 2026-05-30 | Delete script; update any CI or Makefile references; document replacement in architecture.org. |
Notes
Why two templates exist
sql_schema_table_create.mustache was created for the simpler _entity.json models
(auxiliary types, lookup tables). sql_schema_domain_entity_create.mustache was created
later to handle the richer _domain_entity.json models. Neither was refactored to absorb
the other because they serve different entry points. The result is that --profile all
became dangerous for entities that have both model files.
Template discrepancy discovered during analysis
The current sql_schema_table_create.mustache validation function only supports two
tenant modes: system (validate against ores_utility_system_tenant_id_fn()) or tenant
(validate against p_tenant_id). But the actual generated SQL for rounding_type,
monetary_nature, and currency_market_tier uses tenant_id in (p_tenant_id, system_tenant)
— a third mode that the template cannot currently reproduce cleanly. The source SQL files
are the ground truth; the template must be fixed to match. The unified model's
validation_fn.tenant_scope = "both" flag covers this case.
Component naming mismatch discovered during analysis
Running all-cpp on the auxiliary type domain_entity models (rounding_type,
monetary_nature, currency_market_tier) wrote C++ files to a new
projects/ores.refdata/ directory rather than to projects/ores.refdata.api/ and
projects/ores.refdata.core/ where the actual production code lives. The models have
component: "refdata" but the project structure uses a two-project split (.api for
headers, .core for implementations). The generated files were untracked (and therefore
reported as "zero git diff" — misleading). The spurious directory was deleted as cleanup.
This means the reported parity for the three auxiliaries was incorrect: the C++ parity
verification task must be redone once the codegen story resolves the component naming
issue (component_include / component_core fields in the models).
The --profile all footgun
The all profile runs sql + all-cpp. For entities that have both _entity.json (production
SQL) and _domain_entity.json (C++), running --profile all on the domain entity model
silently overwrites the production SQL with the simpler domain_entity SQL template. This
was triggered during the currency commissioning story. codegen.py must make this
impossible, either by refusing to run SQL generation when a corresponding _entity.json
exists, or by deprecating --profile all entirely.