Story: Refactor ores.codegen SQL generation

Table of Contents

This page documents a story in Sprint 19. It captures the goal, current status, acceptance criteria, and the tasks that compose it.

Goal

The ores.codegen SQL generation system has accumulated two parallel pipelines that do the same job with different templates, different model formats, and different invocation conventions:

  • sql_schema_table_create.mustache (283 lines) — for _entity.json models; generates the production-grade SQL with full trigger logic and a per-entity validate_ENTITY_fn().
  • sql_schema_domain_entity_create.mustache (377 lines) — for _domain_entity.json models; used by --profile sql and --profile all; generates SQL for domain entities (party, book, etc.) but does not produce a validation function.

On top of this, 14 bash scripts each hard-code which glob patterns to pass and which profiles to run. There is no safe "regenerate everything" command. The --profile all footgun silently overwrites production SQL with the wrong template when run against an entity that has both model files.

This story unifies the SQL half of ores.codegen. Scope is SQL only; C++ templates and the _domain_entity.json C++ generation path are unchanged.

What we want

1. A single Python CLI: codegen.py

Replaces run_generator.sh and all 14 generate_*.sh bash scripts. Structured as an idiomatic Python application: proper argparse subcommands, entry point declaration, logging, and a clean internal package layout. The exact subcommand names (generate vs. regen, run vs. regenerate, etc.) are an open design decision to be resolved in the first task; they must be unambiguous and follow Python CLI conventions (see argparse subcommands).

Required capabilities:

  • Generate SQL from a single model file.
  • Regenerate all SQL for a named component (refdata, trade, dq, iam, …).
  • Regenerate SQL for all components in one command.
  • Dry-run mode: print the paths that would be written without writing them.
  • Diff mode: show what would change in the output files without writing.
  • List mode: enumerate models, components, or profiles.

2. A unified SQL model format

One root key — table — that subsumes both entity and the SQL-relevant fields of domain_entity. A single model file is the source of truth for the SQL schema of one table. The model encodes all variability via flags; the template reads them without needing to know the origin of the model.

Variability axes identified from the existing SQL (both templates and all generated output files):

Flag Values Description
tenant required / system / nullable / none How tenant_id behaves in the table and trigger
coding_scheme none / required / nullable Coding scheme FK column
image_id true / false Optional image FK column
workspace_id true / false Workspace FK column (soft FK + index)
natural_keys list Columns that get their own unique index
validation_fn present / absent Whether to emit ores_PRODUCT_COMPONENT_validate_ENTITY_fn()
validation_fn.default value / absent Return value when input is null/empty
validation_fn.tenant_scope system / tenant / both Which tenants to search in the validation fn
validation_fn.order_by column name Column to order the valid-values list in the error message
insert_trigger.validations list Call existing validation fns per column
insert_trigger.soft_fk_validations list UUID-FK existence checks
insert_trigger.fk_copy_validations list FK checks that also copy denormalised columns
insert_trigger.text_code_validations list Text-code FK checks (no UUID)
security_definer true / false SECURITY DEFINER on the trigger function

The validation_fn.tenant_scope flag captures a key variation found in the existing SQL that the current template does not express correctly: auxiliary types (rounding_type, monetary_nature, currency_market_tier) validate against tenant_id in (p_tenant_id, system_tenant), while the current template only supports system or tenant (never both). The currency validation function has yet another variant — it also validates against both. All three variants must be expressible.

3. A single SQL template: sql_schema_create.mustache

Replaces both existing templates. All variability is controlled by the flags above. The template must reproduce, byte-for-byte (ignoring cosmetic whitespace), the SQL that the current templates produce for every entity in projects/ores.sql/create/. Logical fidelity is required; whitespace normalisation is acceptable.

What is NOT in scope

  • C++ templates (domain, repository, service, protocol, Qt) — unchanged.
  • The _domain_entity.jsonsql_schema_domain_entity_create.mustache path for domain entities (party, book, counterparty, conventions, etc.) — unchanged; those SQL files are owned by that template.
  • Merging _entity.json and _domain_entity.json into a single file — this touches the C++ generation path and is deferred.
  • DQ, trade, workspace, and other component schemas — refdata is the pilot; roll out to other components once refdata parity is confirmed.
  • Reference data seed SQL (generate_crypto_refdata.sh, generate_fpml_refdata.sh, etc.) — those are data generation scripts, not schema generation; out of scope.

Status

Field Value
State DONE
Parent sprint Sprint 19
Now Nothing.
Waiting on Nothing.
Next Nothing.
Last touched 2026-05-30

Acceptance

  • codegen.py exists as a proper Python CLI; all subcommand names are unambiguous and documented.
  • A single sql_schema_create.mustache template replaces both existing SQL templates.
  • All refdata _entity.json models have been migrated to the new table format.
  • Running codegen.py for all refdata entities produces zero logical diff against the current SQL files in projects/ores.sql/create/refdata/.
  • generate_refdata_schema.sh has been retired (deleted or archived).
  • The --dry-run flag prints output paths without writing; --diff shows file diffs.
  • All new code passes the project's linting and formatting checks.
  • Site builds cleanly.

Tasks

Task State Start End Description
Survey SQL variation and define unified model format DONE 2026-05-29 2026-05-29 Audit all generated SQL files; enumerate every variation; define the table model schema and CLI interface.
Implement codegen.py CLI scaffold DONE 2026-05-29 2026-05-29 Argparse structure, entry point, package layout, logging; subcommand stubs; venv wiring.
Implement unified SQL template DONE 2026-05-29 2026-05-29 Write sql_schema_create.mustache; add table model-type support in generator; verify parity for all refdata entities.
Migrate refdata entity models to new format DONE 2026-05-29 2026-05-30 Convert all _entity.json models to _table.json; retire refdata-table component alias; run parity check.
Retire generate_refdata_schema.sh DONE 2026-05-30 2026-05-30 Delete script; update any CI or Makefile references; document replacement in architecture.org.

Notes

Why two templates exist

sql_schema_table_create.mustache was created for the simpler _entity.json models (auxiliary types, lookup tables). sql_schema_domain_entity_create.mustache was created later to handle the richer _domain_entity.json models. Neither was refactored to absorb the other because they serve different entry points. The result is that --profile all became dangerous for entities that have both model files.

Template discrepancy discovered during analysis

The current sql_schema_table_create.mustache validation function only supports two tenant modes: system (validate against ores_utility_system_tenant_id_fn()) or tenant (validate against p_tenant_id). But the actual generated SQL for rounding_type, monetary_nature, and currency_market_tier uses tenant_id in (p_tenant_id, system_tenant) — a third mode that the template cannot currently reproduce cleanly. The source SQL files are the ground truth; the template must be fixed to match. The unified model's validation_fn.tenant_scope = "both" flag covers this case.

Component naming mismatch discovered during analysis

Running all-cpp on the auxiliary type domain_entity models (rounding_type, monetary_nature, currency_market_tier) wrote C++ files to a new projects/ores.refdata/ directory rather than to projects/ores.refdata.api/ and projects/ores.refdata.core/ where the actual production code lives. The models have component: "refdata" but the project structure uses a two-project split (.api for headers, .core for implementations). The generated files were untracked (and therefore reported as "zero git diff" — misleading). The spurious directory was deleted as cleanup.

This means the reported parity for the three auxiliaries was incorrect: the C++ parity verification task must be redone once the codegen story resolves the component naming issue (component_include / component_core fields in the models).

The --profile all footgun

The all profile runs sql + all-cpp. For entities that have both _entity.json (production SQL) and _domain_entity.json (C++), running --profile all on the domain entity model silently overwrites the production SQL with the simpler domain_entity SQL template. This was triggered during the currency commissioning story. codegen.py must make this impossible, either by refusing to run SQL generation when a corresponding _entity.json exists, or by deprecating --profile all entirely.

Emacs 29.1 (Org mode 9.6.6)