Story: Codegen unified model — org-mode migration


This page documents a story in Sprint 20. It captures the goal, current status, acceptance criteria, and the tasks that compose it.

Goal

Replace the current per-entity JSON model files (*_domain_entity.json and *_table.json) with a single literate org-mode file per entity. The org file is the unified source of truth for SQL, C++, Qt, NATS protocol, and any future facets.

Three goals at once:

  1. One file per entity. No more drift between separate SQL and C++ models.
  2. Literate documentation. Each section, each column, each custom method carries prose describing intent — beyond what a JSON description field can hold. Org-mode's #+begin_src blocks accommodate generator expressions, custom SQL fragments, and custom repository methods natively.
  3. Co-location with the component. Models live under projects/ores.<component>/modeling/ alongside the C++ they describe, with filenames matching the dotted entity name (projects/ores.refdata/modeling/ores.refdata.party.org). This makes the model navigable from the component's directory.

Relationship to other stories

Why now

The refdata-cpp drift pilot exposed that the "restore from HEAD" workaround for custom methods is fragile, and the JSON model can't represent enough context (custom methods, why-they-cannot-be-templated prose, generator expressions as code blocks). The org POC fixed all that for one entity (party). Doing it once now is far cheaper than continuing the drift work against an inadequate model format and migrating later.

Status

| Field        | Value      |
|--------------+------------|
| State        | DONE       |
| Parent sprint | Sprint 20 |
| Now          | Nothing.   |
| Waiting on   | Nothing.   |
| Next         | Nothing.   |
| Last touched | 2026-06-11 |

Acceptance

  • Every entity has a single projects/ores.<component>/modeling/ores.<component>.<entity>.org file.
  • projects/ores.codegen/models/ directory is empty (or removed) once the migration completes; codegen no longer reads JSON entity models.
  • compass add entity-org scaffolds a new entity org file with the correct frontmatter, sections, and a pointer to the meta-model.
  • Codegen discovers org models from projects/ores.<component>/modeling/.
  • For every entity, the regenerated output (C++ and SQL) matches what the JSON-driven path produced before migration. Differences are explicit, reviewed, and approved per-entity.
  • cmake --build --preset linux-clang-debug-make passes after each per-component migration.
  • ctest -R "ores\.<component>" passes after each per-component migration.
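The last two acceptance checks can be scripted per component. The following is a minimal sketch, not the project's tooling: the function name verification_commands is hypothetical, and only the cmake and ctest invocations themselves come from the list above.

```python
import re


def verification_commands(
    component: str, preset: str = "linux-clang-debug-make"
) -> list[list[str]]:
    """Return the build and test commands from the acceptance list.

    Hypothetical helper: only the two command lines are taken from the story.
    """
    # re.escape keeps any regex metacharacter in the component name literal.
    test_filter = r"ores\." + re.escape(component)
    return [
        ["cmake", "--build", "--preset", preset],
        ["ctest", "-R", test_filter],
    ]
```

Each command is an argument list, so it can be passed to subprocess.run(cmd, check=True) without shell quoting around the regular expression.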

Tasks

| Task | State | Start | End | Description |
|------+-------+-------+-----+-------------|
| Compass scaffold for entity org-models | DONE | | 2026-06-05 | New compass doc-type (ores.codegen.entity) that scaffolds the standard frontmatter, sections, and meta-model pointer. |
| Co-locate pilot + component-dir discovery | DONE | | 2026-06-05 | Move pilot to projects/ores.refdata/modeling/; codegen reads frontmatter to discover entity models. |
| Migrate refdata entity models to org | DONE | | 2026-06-05 | ~25 entities. First component-wide migration after the pilot is co-located. |
| Migrate trading entity models to org | DONE | | 2026-06-05 | ~35 entities. Largest component; exercises service_pagination / service_batch_get. |
| Migrate iam entity models to org | DONE | | 2026-06-05 | Small. |
| Migrate dq entity models to org | DONE | | 2026-06-05 | Small. |
| Migrate analytics entity models to org | DONE | | 2026-06-05 | Small. |
| Migrate reporting entity models to org | DONE | | 2026-06-05 | Small. |
| Migrate scheduler entity models to org | DONE | | 2026-06-05 | Small. |
| Migrate workflow entity models to org | DONE | | 2026-06-05 | Small. |
| Migrate controller entity models to org | DONE | | 2026-06-05 | Small. |
| Migrate database entity models to org | DONE | | 2026-06-05 | Small. |
| Migrate workspace entity models to org | DONE | | 2026-06-05 | Includes hand-crafted protocol exclusions (now sourced from org). |
| Migrate compute entity models to org | DONE | | 2026-06-05 | Heavy exclusion catalogue (now sourced from org). |
| Extend converter + org_loader for SQL array-structured fields | DONE | | 2026-06-06 | indexes / extra_checks / text_code_validations / extra_delete_sets — needed for --profile sql round-trip. Surfaced by PR #976 review. |
| Externalise codegen component manifest to a config file | DONE | | 2026-06-06 | Move COMPONENTS from a Python dict in manifest.py to a declarative config (JSON/TOML/YAML) loaded at startup. Surfaced during workflow migration. |
| Inventory remaining JSON models + plan their migration to org-mode | DONE | 2026-06-01 | 2026-06-05 | Inventory complete; 8 per-kind follow-on tasks scaffolded below. |
| Migrate enum JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | All 3 enum JSONs turned out to be orphan (no consumers). Closed via retirement; enum kind exits the inventory. |
| Migrate trading field-group JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | 5 files. New #+type: ores.codegen.field_group shape + loader + dispatch; byte-identical regen for 4 of 5 (one pre-existing hand-tune in the .hpp). |
| Migrate junction JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | 7 files across refdata / dq / iam / compute. New #+type: ores.codegen.junction shape + loader + dispatch; byte-identical SQL regen for all 7. |
| Migrate refdata table JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | 10 files. New #+type: ores.codegen.table shape + loader + dispatch; loader returns byte-identical dict vs JSON for all 10, SQL byte-identical end-to-end. |
| Migrate lookup-entity JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | 12 files. New #+type: ores.codegen.lookup_entity shape + loader + dispatch; loader returns byte-identical dict vs JSON for all 12, SQL byte-identical end-to-end (org-path == JSON-path). |
| Migrate the services service_registry JSON to org-mode | DONE | 2026-06-02 | 2026-06-02 | 1 file → ores.services.service_registry.org. New #+type: ores.codegen.service_registry shape + loader + dispatch; loader returns byte-identical dict vs JSON, all 5 service-registry profile outputs byte-identical end-to-end. |
| Migrate component JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | 31 files. New #+type: ores.codegen.component shape + loader + dispatch; loader returns byte-identical dict vs JSON for all 31. Standalone _component.org form (option B); overview-merge (option A) deferred to a follow-up task. |
| Merge component models into component_overview.org | DONE | 2026-06-02 | 2026-06-02 | 26 PAIR overviews gained #+name:, #+full_name:, #+brief:; 5 group-level ORPHAN overviews created (compute, controller, reporting, trading, workflow); 31 standalone *_component.org deleted. Loader reads scalars from frontmatter. Regen drift on scaffolds left to the post-regroup validation task. |
| Align new-component flow with the overview-merge model | DONE | 2026-06-02 | 2026-06-05 | Reconcile doc_component template, codegen-add-component + doc-add-component-model skills, runbook, recipe with the merged-overview pattern; prove end-to-end with flat + composite sample components. |
| Migrate slovaris reference-data JSONs to org-mode | ABANDONED | | 2026-06-02 | Superseded by Introduce ores.seeder component for database test-data generation. Brainstorm concluded bulk data doesn't benefit from literate org; the right move is a dedicated component, not a shape change. |
| Retire plantuml_er_model.json from models/ | DONE | 2026-06-02 | 2026-06-02 | Moved to build/output/codegen/ (already gitignored); plantuml_er_generate.sh updated. |
| Validate codegen after subcomponent regroup | DONE | | 2026-06-05 | profiles.json still carries 12 stale projects/ores.qt.{component}/... outputs (and probably cli/http/nats too). Re-verify merged migrations end-to-end against the new layout. |
| Cross-link entity / component overview / schema org docs | DONE | 2026-06-10 | 2026-06-10 | Wire the four load-bearing edges: group overview → sub-components, group overview → entities, entity → group, entity → schema (lookup tables). Includes creating 11 missing top-level group overviews. |
| Update instrument entity models for the nested-struct decomposition | DONE | | 2026-06-06 | The 9 rates instrument org models still describe the flat pre-C1202 shapes; regeneration would clobber PRs 1047/1071/1075/1083/1085. Add instrument_identity (trading.api) and audit_record (dq.api) field-group models and re-express the instrument entities as field-group compositions, mirroring how trade composes its five groups. |
| Teach entity templates the field-group contract; zero-diff regen | DONE | 2026-06-10 | 2026-06-10 | Phase C of the instrument-models pass (#1103): org_loader and the entity templates consume :domain_identity_group:/:domain_audit_group:/:group: annotations so domain class and mapper regeneration is zero-diff against the decomposed instruments; then delete the dead generate_trading_instruments.sh and lift the do-not-regenerate warnings from the 9 rates models. |
| Restore component regeneration over org models; revalidate currency | DONE | 2026-06-06 | 2026-06-06 | codegen.py regenerate/list still reads the deleted projects/ores.codegen/models JSON tree and reports zero models for every component — the org migration taught single-model generate to discover org models but left the component registry behind, so no zero-diff validation can run anywhere. Point the registry at projects/<component>/modeling/*.org, make regenerate work over org models, then rerun the refdata SQL regeneration and confirm currency (and the other nine refdata tables) are still zero-diff against HEAD. Unblocks Commission: country's verify-codegen task and is a precondition for the CI zero-diff invariant story. |
| Reconcile refdata org models with production SQL | DONE | 2026-06-10 | 2026-06-10 | Restoring component regeneration revealed 29 drifted refdata SQL files (+161/-487). Three drift classes: (1) notify-channel renames — templates emit ores_refdata_<entity> while production SQL and C++ subscribers (e.g. application.cpp ores_currency_market_tiers) use unprefixed names: pick the convention, fix models or C++, never both silently; (2) lossy migration — org models for books, parties, counterparties and others lack the soft-FK validation blocks production DDL has, so regeneration deletes real logic; (3) net-new outputs — models generate triggers production never had (rounding_types notify). Reconcile per entity until refdata regeneration is zero-diff; this is the precondition for the CI zero-diff invariant story. |
| Reconcile profiles.json + components.json and migrate to org-mode | DONE | 2026-06-10 | 2026-06-10 | profiles.json and components.json both live in library/ and both drive codegen behaviour, but they model different concerns and their relationship (and the right shape for each) has never been articulated. Audit what each file models, whether the data is in the right location, whether the two should be merged or kept separate, then represent the result as org-mode literate source (consistent with the rest of the library). |
| Migrate library/data/modeline.json to MASD-style org model | DONE | | 2026-06-10 | modeline.json holds three editor modeline strings (sql, c++, cmake) consumed by generator.py to inject mode-lines into generated file headers. Migrate to a MASD-style org hierarchy (module > modeline_group > modeline > content attribute), following the Dogen masd.org pattern; update generator.py to read from the org source; delete modeline.json. |
| Migrate profiles.json to a literate org-mode source | ABANDONED | | 2026-06-10 | Superseded by the broader reconcile task above (90ED9696); analysis folded into its Notes. Was: create a literate org source that tangles to profiles.json. |
| Improve meta-model for codegen input org files | DONE | 2026-06-11 | 2026-06-12 | The codegen library uses three org files as direct inputs to the generator (facet_catalogue.org, component_catalogue.org, modeline.org), all typed as codegen_config. This is too generic: each file has a distinct structure and role that should be reflected in a specific type with a documented schema. Additionally, modeline.org already uses MASD stereotypes (module, modeline_group, modeline, attribute) — review whether these stereotypes, and others from the MASD concept library, can clarify the structure of all codegen input org files. |

Notes

Naming convention

Each org file is named after the fully-qualified entity:

| Component | Example filename                         |
|-----------+------------------------------------------|
| refdata   | ores.refdata.party.org                   |
| trading   | ores.trading.vanilla_swap_instrument.org |
| iam       | ores.iam.tenant.org                      |

The file lives in projects/ores.<component>/modeling/.
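The convention is mechanical enough to express as a function. A minimal sketch, assuming short component and entity names as inputs; entity_model_path is a hypothetical name, not part of codegen.

```python
from pathlib import PurePosixPath


def entity_model_path(component: str, entity: str) -> PurePosixPath:
    """Build the repository-relative path of an entity's org model.

    Hypothetical helper illustrating the naming convention only.
    """
    # The filename is the fully-qualified, dotted entity name plus ".org".
    qualified = f"ores.{component}.{entity}"
    return (
        PurePosixPath("projects")
        / f"ores.{component}"
        / "modeling"
        / f"{qualified}.org"
    )
```

For example, entity_model_path("refdata", "party") yields projects/ores.refdata/modeling/ores.refdata.party.org, the pilot path given in the Goal section.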

Component-side discovery

Codegen's manifest changes so each Component entry points at the component's modeling/ directory instead of (or in addition to) projects/ores.codegen/models/<short>/. The exact mechanism is settled in the first co-location task; until then, the pilot stays in projects/ores.codegen/models/refdata/.
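One way frontmatter-driven discovery could work is sketched below. This is an illustration under assumptions, not the shipped loader: read_frontmatter and discover_models are hypothetical names, and the sketch assumes the model kind is carried by a #+type: keyword before the first heading, as the #+type: values in the task table suggest.

```python
import re
from pathlib import Path

# An org keyword line such as "#+type: ores.codegen.entity".
KEYWORD = re.compile(r"^#\+(\w+):\s*(.*)$")
# An org heading: one or more stars followed by a space.
HEADING = re.compile(r"^\*+ ")


def read_frontmatter(text: str) -> dict[str, str]:
    """Collect '#+key: value' lines that appear before the first heading."""
    keywords: dict[str, str] = {}
    for line in text.splitlines():
        if HEADING.match(line):
            break  # frontmatter ends where the document body starts
        match = KEYWORD.match(line.strip())
        if match:
            keywords[match.group(1).lower()] = match.group(2).strip()
    return keywords


def discover_models(component_dir: Path, wanted_type: str) -> list[Path]:
    """List org files under <component_dir>/modeling/ whose #+type matches."""
    modeling = component_dir / "modeling"
    return sorted(
        path
        for path in modeling.glob("*.org")
        if read_frontmatter(path.read_text(encoding="utf-8")).get("type")
        == wanted_type
    )
```

Filtering on the keyword rather than on the filename lets entity, junction, and table models share one modeling/ directory.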

Migration order

  1. refdata first, because the pilot file lives there and it's the most heterogeneous component.
  2. trading next, because it has the largest model set and exercises the service_pagination / service_batch_get flags.
  3. iam, dq, analytics, reporting: small, low-risk.
  4. scheduler, workflow: small.
  5. controller, database, workspace, compute: small to medium.

Per-component tasks

Each component task does the following:

  1. Author each entity's ores.<component>.<entity>.org file by hand or by a one-shot conversion script.
  2. Delete the corresponding *_domain_entity.json (and *_table.json if present).
  3. Verify regenerated output matches HEAD byte-for-byte.
  4. Build and run tests.
  5. Commit; raise PR; merge.
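Step 3 can be checked mechanically. A minimal sketch, assuming the HEAD output has been exported to one directory and the regenerated output written to another; the helper names are hypothetical, and in practice a clean git diff over the regenerated working tree gives the same answer.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(
            path.read_bytes()
        ).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_trees(baseline: Path, regenerated: Path) -> dict[str, list[str]]:
    """Report files that are missing, added, or byte-different after regen."""
    old = tree_digests(baseline)
    new = tree_digests(regenerated)
    return {
        "missing": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }
```

A migration passes the byte-for-byte check when all three lists are empty; any entry is a difference to review and approve per entity, as the acceptance criteria require.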

Open questions

  • Do junction tables stay as JSON, or migrate to org too? The POC only covers domain entities; junction tables are a small minority. Suggestion: defer this to a follow-up task at the end of this story.
  • Do we tangle the org files, or have codegen read them directly? The POC reads org directly (no tangle step). Keep that until a real reason emerges.
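Reading org directly means the loader extracts the #+begin_src blocks itself instead of relying on a tangle step. A minimal sketch of that extraction, assuming blocks are not nested; source_blocks is a hypothetical name and the real org_loader may work differently.

```python
import re

# Matches "#+begin_src <lang> ..." through the next "#+end_src" line.
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+begin_src[ \t]+(\S+)[^\n]*\n(.*?)^[ \t]*#\+end_src[ \t]*$",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def source_blocks(org_text: str) -> list[tuple[str, str]]:
    """Return (language, body) pairs for each source block, in file order."""
    return [
        (match.group(1), match.group(2))
        for match in SRC_BLOCK.finditer(org_text)
    ]
```

The surrounding prose stays in the org file as documentation; only the block bodies feed the generator, so no tangled intermediate files need to be kept in sync.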
