Story: Codegen unified model — org-mode migration
This page documents a story in Sprint 20. It captures the goal, current status, acceptance criteria, and the tasks that compose it.
Goal
Replace the current per-entity JSON model files (*_domain_entity.json and
*_table.json) with a single literate org-mode file per entity. The org
file is the unified source of truth for SQL, C++, Qt, NATS protocol, and
any future facets.
Three goals at once:
- One file per entity. No more drift between separate SQL and C++ models.
- Literate documentation. Each section, each column, each custom method carries prose describing intent, beyond what a JSON description field can hold. Org-mode's #+begin_src blocks accommodate generator expressions, custom SQL fragments, and custom repository methods natively.
- Co-location with the component. Models live under projects/ores.<component>/modeling/ alongside the C++ they describe, with filenames matching the dotted entity name (projects/ores.refdata/modeling/ores.refdata.party.org). This makes the model navigable from the component's directory.
Relationship to other stories
- Supersedes Codegen unified model — Phase 2: single model file per entity. Phase 2 wanted to merge JSON files into one unified JSON; this story goes further by moving the unified format to org-mode.
- Blocks Refactor ores.codegen C++ generation and all its sub-tasks. The refdata-cpp pilot (and the eleven follow-on per-component drift tasks) resume once the migration lands.
- Supported by the org-mode codegen POC (task), which proved the mechanism works end-to-end on party.
Why now
The refdata-cpp drift pilot exposed that the "restore from HEAD" workaround
for custom methods is fragile, and the JSON model can't represent enough
context (custom methods, why-they-cannot-be-templated prose, generator
expressions as code blocks). The org POC fixed all that for one entity
(party). Doing it once now is far cheaper than continuing the drift work
against an inadequate model format and migrating later.
Status
| Field | Value |
|---|---|
| State | DONE |
| Parent sprint | Sprint 20 |
| Now | Nothing. |
| Waiting on | Nothing. |
| Next | Nothing. |
| Last touched | 2026-06-11 |
Acceptance
- Every entity has a single projects/ores.<component>/modeling/ores.<component>.<entity>.org file.
- projects/ores.codegen/models/ directory is empty (or removed) once the migration completes; codegen no longer reads JSON entity models.
- compass add entity-org scaffolds a new entity org file with the correct frontmatter, sections, and a pointer to the meta-model.
- Codegen discovers org models from projects/ores.<component>/modeling/.
- For every entity, the regenerated output (C++ and SQL) matches what the JSON-driven path produced before migration. Differences are explicit, reviewed, and approved per-entity.
- cmake --build --preset linux-clang-debug-make passes after each per-component migration.
- ctest -R "ores\.<component>" passes after each per-component migration.
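The last two criteria lend themselves to scripting per component. A minimal sketch, assuming the commands run from the repository root; the helper name verification_commands is hypothetical and not part of compass or codegen, and only the two command lines themselves come from the criteria above:

```python
import re


def verification_commands(component: str) -> list[list[str]]:
    """Return the build and test commands for one migrated component.

    Hypothetical helper for illustration only.
    """
    # ctest -R takes a regular expression, so the dot in the name is escaped.
    pattern = re.escape(f"ores.{component}")
    return [
        ["cmake", "--build", "--preset", "linux-clang-debug-make"],
        ["ctest", "-R", pattern],
    ]


for command in verification_commands("refdata"):
    print(" ".join(command))
```

Passing each list to subprocess.run(command, check=True) would stop at the first failing step.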
Tasks
| Task | State | Start | End | Description |
|---|---|---|---|---|
| Compass scaffold for entity org-models | DONE | 2026-06-05 | | New compass doc-type (ores.codegen.entity) that scaffolds the standard frontmatter, sections, and meta-model pointer. |
| Co-locate pilot + component-dir discovery | DONE | 2026-06-05 | | Move pilot to projects/ores.refdata/modeling/; codegen reads frontmatter to discover entity models. |
| Migrate refdata entity models to org | DONE | 2026-06-05 | | ~25 entities. First component-wide migration after the pilot is co-located. |
| Migrate trading entity models to org | DONE | 2026-06-05 | | ~35 entities. Largest component; exercises service_pagination / service_batch_get. |
| Migrate iam entity models to org | DONE | 2026-06-05 | | Small. |
| Migrate dq entity models to org | DONE | 2026-06-05 | | Small. |
| Migrate analytics entity models to org | DONE | 2026-06-05 | | Small. |
| Migrate reporting entity models to org | DONE | 2026-06-05 | | Small. |
| Migrate scheduler entity models to org | DONE | 2026-06-05 | | Small. |
| Migrate workflow entity models to org | DONE | 2026-06-05 | | Small. |
| Migrate controller entity models to org | DONE | 2026-06-05 | | Small. |
| Migrate database entity models to org | DONE | 2026-06-05 | | Small. |
| Migrate workspace entity models to org | DONE | 2026-06-05 | | Includes hand-crafted protocol exclusions (now sourced from org). |
| Migrate compute entity models to org | DONE | 2026-06-05 | | Heavy exclusion catalogue (now sourced from org). |
| Extend converter + org_loader for SQL array-structured fields | DONE | 2026-06-06 | | indexes / extra_checks / text_code_validations / extra_delete_sets; needed for --profile sql round-trip. Surfaced by PR #976 review. |
| Externalise codegen component manifest to a config file | DONE | 2026-06-06 | | Move COMPONENTS from a Python dict in manifest.py to a declarative config (JSON/TOML/YAML) loaded at startup. Surfaced during workflow migration. |
| Inventory remaining JSON models + plan their migration to org-mode | DONE | 2026-06-01 | 2026-06-05 | Inventory complete; 8 per-kind follow-on tasks scaffolded below. |
| Migrate enum JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | All 3 enum JSONs turned out to be orphan (no consumers). Closed via retirement; enum kind exits the inventory. |
| Migrate trading field-group JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | 5 files. New #+type: ores.codegen.field_group shape + loader + dispatch; byte-identical regen for 4 of 5 (one pre-existing hand-tune in the .hpp). |
| Migrate junction JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | 7 files across refdata / dq / iam / compute. New #+type: ores.codegen.junction shape + loader + dispatch; byte-identical SQL regen for all 7. |
| Migrate refdata table JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | 10 files. New #+type: ores.codegen.table shape + loader + dispatch; loader returns byte-identical dict vs JSON for all 10, SQL byte-identical end-to-end. |
| Migrate lookup-entity JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | 12 files. New #+type: ores.codegen.lookup_entity shape + loader + dispatch; loader returns byte-identical dict vs JSON for all 12, SQL byte-identical end-to-end (org-path == JSON-path). |
| Migrate the services service_registry JSON to org-mode | DONE | 2026-06-02 | 2026-06-02 | 1 file → ores.services.service_registry.org. New #+type: ores.codegen.service_registry shape + loader + dispatch; loader returns byte-identical dict vs JSON, all 5 service-registry profile outputs byte-identical end-to-end. |
| Migrate component JSON models to org-mode | DONE | 2026-06-02 | 2026-06-02 | 31 files. New #+type: ores.codegen.component shape + loader + dispatch; loader returns byte-identical dict vs JSON for all 31. Standalone _component.org form (option B); overview-merge (option A) deferred to a follow-up task. |
| Merge component models into component_overview.org | DONE | 2026-06-02 | 2026-06-02 | 26 PAIR overviews gained #+name:, #+full_name:, #+brief:; 5 group-level ORPHAN overviews created (compute, controller, reporting, trading, workflow); 31 standalone *_component.org deleted. Loader reads scalars from frontmatter. Regen drift on scaffolds left to the post-regroup validation task. |
| Align new-component flow with the overview-merge model | DONE | 2026-06-02 | 2026-06-05 | Reconcile doc_component template, codegen-add-component + doc-add-component-model skills, runbook, recipe with the merged-overview pattern; prove end-to-end with flat + composite sample components. |
| Migrate slovaris reference-data JSONs to org-mode | ABANDONED | 2026-06-02 | | Superseded by Introduce ores.seeder component for database test-data generation. Brainstorm concluded bulk data doesn't benefit from literate org; the right move is a dedicated component, not a shape change. |
| Retire plantuml_er_model.json from models/ | DONE | 2026-06-02 | 2026-06-02 | Moved to build/output/codegen/ (already gitignored); plantuml_er_generate.sh updated. |
| Validate codegen after subcomponent regroup | DONE | 2026-06-05 | | profiles.json still carries 12 stale projects/ores.qt.{component}/... outputs (and probably cli/http/nats too). Re-verify merged migrations end-to-end against the new layout. |
| Cross-link entity / component overview / schema org docs | DONE | 2026-06-10 | 2026-06-10 | Wire the four load-bearing edges: group overview → sub-components, group overview → entities, entity → group, entity → schema (lookup tables). Includes creating 11 missing top-level group overviews. |
| Update instrument entity models for the nested-struct decomposition | DONE | 2026-06-06 | | The 9 rates instrument org models still describe the flat pre-C1202 shapes; regeneration would clobber PRs 1047/1071/1075/1083/1085. Add instrument_identity (trading.api) and audit_record (dq.api) field-group models and re-express the instrument entities as field-group compositions, mirroring how trade composes its five groups. |
| Teach entity templates the field-group contract; zero-diff regen | DONE | 2026-06-10 | 2026-06-10 | Phase C of the instrument-models pass (#1103): org_loader and the entity templates consume :domain_identity_group:/:domain_audit_group:/:group: annotations so domain class and mapper regeneration is zero-diff against the decomposed instruments; then delete the dead generate_trading_instruments.sh and lift the do-not-regenerate warnings from the 9 rates models. |
| Restore component regeneration over org models; revalidate currency | DONE | 2026-06-06 | 2026-06-06 | codegen.py regenerate/list still reads the deleted projects/ores.codegen/models JSON tree and reports zero models for every component — the org migration taught single-model generate to discover org models but left the component registry behind, so no zero-diff validation can run anywhere. Point the registry at projects/<component>/modeling/*.org, make regenerate work over org models, then rerun the refdata SQL regeneration and confirm currency (and the other nine refdata tables) are still zero-diff against HEAD. Unblocks Commission: country's verify-codegen task and is a precondition for the CI zero-diff invariant story. |
| Reconcile refdata org models with production SQL | DONE | 2026-06-10 | 2026-06-10 | Restoring component regeneration revealed 29 drifted refdata SQL files ( |
| Reconcile profiles.json + components.json and migrate to org-mode | DONE | 2026-06-10 | 2026-06-10 | profiles.json and components.json both live in library/ and both drive codegen behaviour, but they model different concerns and their relationship (and the right shape for each) has never been articulated. Audit what each file models, whether the data is in the right location, whether the two should be merged or kept separate, then represent the result as org-mode literate source (consistent with the rest of the library). |
| Migrate library/data/modeline.json to MASD-style org model | DONE | 2026-06-10 | | modeline.json holds three editor modeline strings (sql, c++, cmake) consumed by generator.py to inject mode-lines into generated file headers. Migrate to a MASD-style org hierarchy (module > modeline_group > modeline > content attribute), following the Dogen masd.org pattern; update generator.py to read from the org source; delete modeline.json. |
| Migrate profiles.json to a literate org-mode source | ABANDONED | 2026-06-10 | | Superseded by the broader reconcile task above (90ED9696); analysis folded into its Notes. Was: create a literate org source that tangles to profiles.json. |
| Improve meta-model for codegen input org files | DONE | 2026-06-11 | 2026-06-12 | The codegen library uses three org files as direct inputs to the generator (facet_catalogue.org, component_catalogue.org, modeline.org), all typed as codegen_config. This is too generic: each file has a distinct structure and role that should be reflected in a specific type with a documented schema. Additionally, modeline.org already uses MASD stereotypes (module, modeline_group, modeline, attribute) — review whether these stereotypes, and others from the MASD concept library, can clarify the structure of all codegen input org files. |
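
Several tasks in the table have the loader read scalars such as #+type:, #+name:, #+full_name:, and #+brief: from org frontmatter. A minimal sketch of that kind of read, assuming the frontmatter is the run of #+key: value lines before the first heading. The function name read_frontmatter and the sample values are hypothetical; the real org_loader may work differently:

```python
import re

# An org heading is one or more stars followed by whitespace.
HEADING = re.compile(r"\*+\s")


def read_frontmatter(org_text: str) -> dict[str, str]:
    """Collect `#+key: value` keywords that precede the first heading.

    Hypothetical helper for illustration; not the org_loader API.
    """
    scalars: dict[str, str] = {}
    for line in org_text.splitlines():
        if HEADING.match(line):
            break  # frontmatter ends at the first heading
        stripped = line.strip()
        if not stripped.startswith("#+"):
            continue
        key, sep, value = stripped[2:].partition(":")
        if sep:
            # Org keywords are case-insensitive, so normalise the key.
            scalars[key.strip().lower()] = value.strip()
    return scalars


# Sample values are made up for the example.
sample = (
    "#+type: ores.codegen.component\n"
    "#+name: refdata\n"
    "#+brief: Illustrative text only.\n"
    "\n"
    "* Overview\n"
)
print(read_frontmatter(sample))
```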
Notes
Naming convention
Each org file is named after the fully-qualified entity:
| Component | Example filename |
|---|---|
| refdata | ores.refdata.party.org |
| trading | ores.trading.vanilla_swap_instrument.org |
| iam | ores.iam.tenant.org |
The file lives in projects/ores.<component>/modeling/.
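
The convention is mechanical enough to express as a function. A small sketch; model_path is a hypothetical name, and the path layout is taken from the convention stated above:

```python
from pathlib import PurePosixPath


def model_path(component: str, entity: str) -> PurePosixPath:
    """Build the repo-relative path of an entity's org model."""
    filename = f"ores.{component}.{entity}.org"
    return PurePosixPath("projects") / f"ores.{component}" / "modeling" / filename


print(model_path("refdata", "party"))
# projects/ores.refdata/modeling/ores.refdata.party.org
```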
Component-side discovery
Codegen's manifest changes so each Component entry points at the
component's modeling/ directory instead of (or in addition to)
projects/ores.codegen/models/<short>/. The exact mechanism is settled in
the first co-location task; until then, the pilot stays in
projects/ores.codegen/models/refdata/.
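
One way such discovery could look, as a sketch only. It assumes entity models carry a #+type: ores.codegen.entity keyword (the doc-type named in the scaffold task) and that discovery is a flat glob over the modeling/ directory; the function names are hypothetical and the mechanism codegen actually settled on may differ:

```python
from pathlib import Path
from typing import Optional

# Assumption: entity models are tagged with this #+type: value.
ENTITY_TYPE = "ores.codegen.entity"


def org_type(path: Path) -> Optional[str]:
    """Return the value of the first `#+type:` keyword in the file, if any."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.lower().startswith("#+type:"):
            return line.split(":", 1)[1].strip()
    return None


def discover_entity_models(repo_root: Path, component: str) -> list[Path]:
    """List entity org models under the component's modeling/ directory."""
    modeling = repo_root / "projects" / f"ores.{component}" / "modeling"
    return sorted(
        path for path in modeling.glob("*.org") if org_type(path) == ENTITY_TYPE
    )
```

Filtering on the keyword rather than the filename keeps other org documents in the same directory, such as component overviews, out of the result.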
Migration order
- refdata first, because the pilot file lives there and it's the most heterogeneous component.
- trading next, because it has the largest model set and exercises the service_pagination / service_batch_get flags.
- iam, dq, analytics, reporting: small, low-risk.
- scheduler, workflow: small.
- controller, database, workspace, compute: small to medium.
Per-component tasks
Each component task does the following:
- Author each entity's ores.<component>.<entity>.org file by hand or by a one-shot conversion script.
- Delete the corresponding *_domain_entity.json (and *_table.json if present).
- Verify regenerated output matches HEAD byte-for-byte.
- Build and run tests.
- Commit; raise PR; merge.
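
The byte-for-byte check in the third step can be sketched as a comparison of two directory trees. This assumes the HEAD output has been exported to one directory and the regenerated output written to another; drifted_files is a hypothetical helper, not an existing codegen command:

```python
from pathlib import Path


def drifted_files(baseline: Path, regenerated: Path) -> list[str]:
    """Return relative paths whose bytes differ between the two trees.

    A file present in only one tree also counts as drift.
    """

    def files(root: Path) -> set[str]:
        return {
            p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
        }

    drift = []
    for rel in sorted(files(baseline) | files(regenerated)):
        old, new = baseline / rel, regenerated / rel
        if not (old.is_file() and new.is_file()):
            drift.append(rel)  # added or removed by regeneration
        elif old.read_bytes() != new.read_bytes():
            drift.append(rel)  # same path, different bytes
    return drift
```

An empty result means the regenerated tree is identical to the baseline; anything else is a difference to review per entity.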
Open questions
- Do junction tables stay as JSON, or migrate to org too? The POC only covers domain entities; junction tables are a small minority. Suggest deferring this to a follow-up task at the end of this story.
- Do we tangle the org files, or have codegen read them directly? The POC reads org directly (no tangle step). Keep that until a real reason emerges.
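
To illustrate what reading org directly, without a tangle step, can involve, here is a minimal sketch that pulls #+begin_src blocks straight out of the org text. The regular expression and the name source_blocks are assumptions for the example; the POC's actual reader is not shown in this page:

```python
import re

# Matches "#+begin_src <lang> [args]" through the matching "#+end_src".
SRC_BLOCK = re.compile(
    r"^[ \t]*#\+begin_src[ \t]+(?P<lang>\S+)[^\n]*\n"
    r"(?P<body>.*?)"
    r"^[ \t]*#\+end_src",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def source_blocks(org_text: str) -> list[tuple[str, str]]:
    """Return (language, body) pairs for each src block, in document order."""
    return [(m["lang"], m["body"]) for m in SRC_BLOCK.finditer(org_text)]


# Sample content is made up for the example.
sample = (
    "* Custom SQL\n"
    "Prose describing intent.\n"
    "#+begin_src sql\n"
    "select 1;\n"
    "#+end_src\n"
)
print(source_blocks(sample))
# [('sql', 'select 1;\n')]
```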