ores.codegen architecture

Summary
Detail
See also

Summary

This document records how ores.codegen is built and how its parts fit together: where its code and data live, how the generator picks templates for models, how it handles modelines and licence headers, and the conventions it enforces.

For what the component is and how to operate it, see ores.codegen (the component model) and the recipes it links to.

Detail

Directory structure

projects/ores.codegen/
├── library/
│   ├── component_catalogue.org  # Component → models-dir/glob/modeling-dir (read by manifest.py)
│   ├── data/                # Static data (licences, modelines)
│   │   ├── licence-GPL-v3.txt
│   │   └── modeline.org     # Editor modelines — MASD-style org, read directly by generator.py
│   ├── facet_catalogue.org  # Profile → model-type → template mapping (read by generator.py)
│   └── templates/           # Mustache templates
│       ├── sql_*.mustache
│       └── doc_*.org.mustache
├── models/                  # JSON model files driving generation
│   └── <component>/         # e.g. refdata/, trade/, dq/, iam/
├── output/                  # Default destination for generated files
├── scripts/                 # Shell scripts for common operations
├── src/                     # Python source
│   ├── codegen/             # CLI package (codegen.py entry point)
│   │   ├── __init__.py
│   │   ├── generate.py      # generate / regenerate subcommands
│   │   ├── manifest.py      # named component registry
│   │   └── diff.py          # diff subcommand
│   ├── generator.py         # Core generator: JSON models + templates → output
│   ├── doc_generate.py      # v2 information-architecture doc generator
│   ├── fpml_parser.py       # FPML Genericode XML → JSON models
│   ├── images_generate_sql.py
│   ├── lei_extract_subset.py
│   └── iso_generate_metadata_sql.py
├── modeling/
│   └── ores.codegen.org     # Component model (the entry point)
├── docs/
│   ├── architecture.org     # This document
│   ├── cpp_generation_analysis.md
│   └── doc_generator.md
├── requirements.txt
├── codegen.sh               # Wrapper: activates venv and calls codegen.py
├── run_generator.sh         # Legacy wrapper (kept for non-CLI uses)
├── generate_doc.sh          # Wrapper for doc_generate.py
└── generate_*_schema.sh     # Legacy per-component schema scripts (being retired)

Internal modules

File	Purpose
`src/generator.py`	Core generator: JSON models + Mustache templates → SQL / C++ / etc.
`src/codegen/generate.py`	`generate` and `regenerate` subcommand implementations
`src/codegen/manifest.py`	Named component registry (maps component name → models dir + glob)
`src/codegen/diff.py`	`diff` subcommand implementation
`src/doc_generate.py`	v2 information-architecture document generator (task/story/sprint/…)
`src/fpml_parser.py`	FPML Genericode XML → JSON models
`src/iso_generate_metadata_sql.py`	ISO standards → SQL
`src/images_generate_sql.py`	Image artefacts (flags, crypto icons) → SQL
`src/lei_extract_subset.py`	LEI dataset subset extractor
`library/facet_catalogue.org`	Declares which templates each profile runs per model type; read directly by `generator.py`
`library/component_catalogue.org`	Maps component names to discovery roots; read directly by `manifest.py`
`library/data/modeline.org`	Editor modeline strings per language; read directly by `generator.py`
`library/data/`	Static data files (licences, modelines)
`library/templates/`	Mustache templates
`models/`	JSON model files
`output/`	Default destination for generated files

Main generator functions

In src/generator.py:

is_table_model(filename): returns True for *_table.json filenames.
get_model_type(filename): maps a filename suffix to a model type string (table, schema, domain_entity, junction, …).
load_profiles(base_dir): parses library/facet_catalogue.org directly via _load_profiles_from_org().
resolve_profile_templates(profile, profiles, model_type): returns the template list for a given profile and model type.
resolve_output_path(pattern, model_data, model_type): expands an output path pattern using model fields.
load_data(data_dir): loads JSON and text files from a data directory.
render_template(template_path, data): renders a Mustache template with the given data.
generate_from_model(): orchestrates generation for one model file.
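The following is a minimal sketch of load_data, the simplest of these helpers. It rests on several assumptions that do not come from generator.py. Files are keyed by their stem. *.json files are parsed, and *.txt files are read verbatim. Everything else is skipped, including modeline.org, which has its own loader.

```python
import json
from pathlib import Path
from typing import Any


def load_data(data_dir: str) -> dict[str, Any]:
    """Sketch: load JSON and text files from data_dir, keyed by file stem.

    Assumptions: *.json is parsed, *.txt is kept as a raw string, and any
    other file type is ignored. The real generator.py may behave differently.
    """
    data: dict[str, Any] = {}
    for path in sorted(Path(data_dir).iterdir()):
        if path.suffix == ".json":
            data[path.stem] = json.loads(path.read_text(encoding="utf-8"))
        elif path.suffix == ".txt":
            data[path.stem] = path.read_text(encoding="utf-8")
    return data
```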

Template system

Templates use Mustache syntax and are rendered with the pystache library.
Templates live in library/templates/. The .mustache files are generated artefacts, tangled from the literate facet docs in the same directory — see the Codegen template library overview for the hierarchy, tangle workflow, and drift checks.
Output is SQL, C++, or org-mode depending on the template family.

Data files

library/data/licence-GPL-v3.txt — full GPL v3 licence text used in generated headers.
library/data/modeline.org — per-language editor modeline strings (MASD-style org hierarchy; read directly by generator.py via _load_modelines_from_org()).

Model types and file naming

The generator detects a model's type from its filename suffix. Each type has a characteristic root JSON key, usually but not always the type name (see the table), and is associated with one or more templates through library/facet_catalogue.org.

Filename suffix	Model type	Root JSON key	Primary template
`*_table.json`	`table`	`table`	`sql_schema_create.mustache` (profile: sql)
`*_entity.json`	`schema`	`entity`	`sql_schema_table_create.mustache` (legacy)
`*_domain_entity.json`	`domain_entity`	`domain_entity`	C++ templates + `sql_schema_domain_entity_create.mustache`
`*_junction.json`	`junction`	(varies)	`sql_schema_junction_create.mustache`

The _table.json format is the current standard for SQL-only entity tables (refdata and similar). It replaces the legacy _entity.json → sql_schema_table_create.mustache path.
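Here is a minimal sketch of suffix-based detection. It covers only the four suffixes in the table above; the real get_model_type may recognise more. The main subtlety is ordering. Every *_domain_entity.json name also ends in _entity.json, so the longer suffix has to be tested first. Returning None for unrecognised names is an assumption.

```python
from typing import Optional

# (suffix, model type) pairs from the table above. Order matters: any
# "*_domain_entity.json" name also ends in "_entity.json", so the longer
# suffix must be checked first.
_SUFFIX_TO_TYPE = (
    ("_domain_entity.json", "domain_entity"),
    ("_junction.json", "junction"),
    ("_entity.json", "schema"),
    ("_table.json", "table"),
)


def is_table_model(filename: str) -> bool:
    """True for *_table.json filenames."""
    return filename.endswith("_table.json")


def get_model_type(filename: str) -> Optional[str]:
    """Sketch: map a model filename to its type; None if no suffix matches."""
    for suffix, model_type in _SUFFIX_TO_TYPE:
        if filename.endswith(suffix):
            return model_type
    return None
```

With this ordering, get_model_type("party_domain_entity.json") returns "domain_entity". If the two entity suffixes were swapped, it would wrongly return "schema".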

Model-template mapping

The active mapping is declared in library/facet_catalogue.org, not hard-coded in Python. Each profile entry lists which templates apply to which model types and the output path pattern.

Table schema mappings (`*_table.json` files)

These files drive unified SQL generation via sql_schema_create.mustache. Run with codegen.sh regenerate --component <name> --profile sql.

Template	Output file pattern
`sql_schema_create.mustache`	`projects/ores.sql/create/{component}/{component}_{entity_plural}_create.sql`
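The sketch below shows one way resolve_output_path could expand such a pattern. It makes two assumptions. First, the placeholders are filled from the object under the model's root key (table here). Second, patterns use Python str.format syntax. A missing field raises KeyError rather than silently producing a malformed path.

```python
from typing import Any


def resolve_output_path(pattern: str, model_data: dict[str, Any],
                        model_type: str) -> str:
    """Sketch: fill {placeholders} in pattern from the model's root object.

    Assumes the fields sit under a root key named after the model type
    (true for table models) and that patterns use str.format syntax.
    """
    # Fall back to the top level if there is no such root key.
    fields = model_data.get(model_type, model_data)
    # format_map raises KeyError for a placeholder with no matching field.
    return pattern.format_map(fields)
```

Take a model whose table object has component refdata and entity_plural currencies (illustrative values). For that model, the pattern above yields projects/ores.sql/create/refdata/refdata_currencies_create.sql.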

The root key of a *_table.json model is table, which has the following fields:

Field	Type	Description
`schema`, `product`, `component`	string	Namespace identifiers
`entity_singular`, `entity_plural`	string	Used in table name and output filename
`description`	string	Appears in the SQL header comment
`has_tenant_id`	boolean	Whether the table has a `tenant_id` column
`primary_key.column`	string	Primary key column name
`primary_key.type`	string	SQL type (e.g. `text`, `uuid`)
`primary_key.is_text`	boolean	Controls empty-string check constraint and default quoting
`columns[]`	list	Non-PK columns (name, type, nullable, default)
`coding_scheme`	`none` / `required` / `nullable`	Whether to add a `coding_scheme_code` FK column
`image_id`	boolean	Whether to add an `image_id` UUID FK column
`validation_fn.tenant_scope`	`system` / `both` / `tenant`	Which tenants the validation function queries
`validation_fn.default`	string / absent	Return value when input is null or empty
`validation_fn.order_by`	string / absent	Column for `ORDER BY` in error messages (defaults to PK)
`insert_trigger.validations[]`	list	Per-column validation function calls in the insert trigger
`check_constraints[]`	list	Additional SQL check constraints (expression strings)
`indexes[]`	list	Extra indexes beyond the standard ones

The generator preprocesses boolean flags (has_coding_scheme, has_any_coding_scheme, scope_system, scope_both, scope_tenant, etc.) and pre-renders sql_check_constraints as a single string before passing data to Mustache, to avoid pystache whitespace issues with adjacent section tags.
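The preprocessing step can be sketched as follows. The flag names come from the paragraph above. Several details are assumptions: how each flag is derived, the default tenant scope of system, the function name preprocess_table, and the SQL text produced for each check constraint. has_coding_scheme is left out because its meaning relative to has_any_coding_scheme is not documented here.

```python
from typing import Any


def preprocess_table(table: dict[str, Any]) -> dict[str, Any]:
    """Sketch (hypothetical name): derive Mustache-friendly values from a
    *_table.json root object. Derivations and SQL text are assumptions."""
    ctx = dict(table)  # shallow copy; the model itself is left untouched

    # One boolean per tenant scope so templates can test {{#scope_both}}.
    scope = table.get("validation_fn", {}).get("tenant_scope", "system")
    for name in ("system", "both", "tenant"):
        ctx[f"scope_{name}"] = scope == name

    ctx["has_any_coding_scheme"] = table.get("coding_scheme", "none") != "none"

    # Pre-render all constraints as one string, so a template can emit a
    # single unescaped {{{sql_check_constraints}}} tag instead of adjacent
    # section tags whose whitespace pystache handles awkwardly.
    ctx["sql_check_constraints"] = ",\n".join(
        f"    check ({expr})" for expr in table.get("check_constraints", [])
    )
    return ctx
```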

Domain entity schema mappings (`*_domain_entity.json` files)

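These files drive both C++ and SQL generation for domain entities (party, book, counterparty, etc.). The SQL and C++ portions are generated separately: the sql profile produces the domain entity create script, while the C++ templates run under --profile all-cpp.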

Template	Output file
`sql_schema_domain_entity_create.mustache`	`{component}_{entity}_create.sql`
`sql_schema_notify_trigger.mustache`	`{component}_{entity}_notify_trigger.sql`
`sql_schema_artefact_create.mustache`	`dq_{entity}_artefact_create.sql`
C++ templates (via `--profile all-cpp`)	`.hpp` / `.cpp` files

Standard data mappings

Model file	Template(s)
`model.json`	`sql_batch_execute.mustache`
`catalogs.json`	`sql_catalog_populate.mustache`
`country_currency.json`	`sql_flag_populate.mustache`, `sql_currency_populate.mustache`, `sql_country_populate.mustache`
`datasets.json`	`sql_dataset_populate.mustache`, `sql_dataset_dependency_populate.mustache`
`methodologies.json`	`sql_methodology_populate.mustache`
`tags.json`	`sql_tag_populate.mustache`

Entity populate mappings (`*_data.json` files)

Template	Output file
`sql_populate_refdata.mustache`	`{component}_{entity}_populate.sql`

Profile system

Profiles are declared in library/facet_catalogue.org. Each profile maps to a list of template entries; each entry specifies the template name, the output path pattern, and the model types it applies to.
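Here is a sketch of how resolve_profile_templates might filter a profile's entries by model type. It assumes the shape of the data that load_profiles returns: a dict from profile name to a list of entry dicts. Each entry has template and output keys plus an optional model_types list. An entry without model_types is assumed to apply to every model type.

```python
from typing import Any

Entry = dict[str, Any]


def resolve_profile_templates(profile: str, profiles: dict[str, list[Entry]],
                              model_type: str) -> list[Entry]:
    """Sketch: return the entries of `profile` that apply to `model_type`."""
    if profile not in profiles:
        raise KeyError(f"unknown profile: {profile!r}")
    return [
        entry
        for entry in profiles[profile]
        # No model_types key means the entry applies to every model type.
        if "model_types" not in entry or model_type in entry["model_types"]
    ]
```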

Built-in profiles:

Profile	Description
`sql`	SQL DDL only (table create / domain entity create)
`all-cpp`	C++ headers, implementations, JSON and table I/O
`all`	`sql` + `all-cpp` — not allowed for `_table.json` or `_entity.json` models (see below)

The --profile all guard: running all on a SQL model (table or schema type) is refused by the CLI because a matching _domain_entity.json may exist for the same entity, and running all would silently overwrite the production SQL with the wrong template. Use --profile sql or --profile all-cpp explicitly.
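The guard amounts to a check like the one below. This is a sketch: check_profile_allowed and ProfileError are hypothetical names, and the real CLI may report the error differently.

```python
# SQL models (*_table.json and *_entity.json). Running "all" on them could
# overwrite SQL owned by a matching *_domain_entity.json model.
SQL_ONLY_MODEL_TYPES = frozenset({"table", "schema"})


class ProfileError(ValueError):
    """Hypothetical error for a profile/model-type combination the CLI refuses."""


def check_profile_allowed(profile: str, model_type: str) -> None:
    """Sketch of the --profile all guard described above."""
    if profile == "all" and model_type in SQL_ONLY_MODEL_TYPES:
        raise ProfileError(
            f"--profile all is refused for {model_type} models; "
            "use --profile sql or --profile all-cpp explicitly"
        )
```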

Component registry

Named components are declared in library/component_catalogue.org, read directly by src/codegen/manifest.py at startup. See component_catalogue.org for the full 16-component table (name, models_dir, entity_glob, exclude_suffix, modeling_dir).
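The sketch below shows how a catalogue row could be turned into a list of model files. Component and discover_models are hypothetical names. The sketch assumes models_dir is relative to the repository root. It also assumes exclude_suffix removes files whose names end with it; _data.json is used below purely for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Component:
    """Hypothetical in-memory form of one component_catalogue.org row."""
    name: str
    models_dir: str
    entity_glob: str
    exclude_suffix: Optional[str] = None
    modeling_dir: Optional[str] = None


def discover_models(component: Component, root: Path) -> list[Path]:
    """Sketch: files matched by entity_glob, minus excluded suffixes, sorted."""
    candidates = (root / component.models_dir).glob(component.entity_glob)
    return sorted(
        path
        for path in candidates
        if not (component.exclude_suffix
                and path.name.endswith(component.exclude_suffix))
    )
```

For example, Component("refdata", "models/refdata", "*.json", "_data.json") would pick up the *_table.json files in that directory and skip any *_data.json files.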

Retired scripts

The following shell scripts have been deleted and replaced by codegen.sh:

Deleted script	Replacement command
`generate_refdata_schema.sh`	`codegen.sh regenerate --component refdata --profile sql`

Modeline configurations

From library/data/modeline.org:

Language	Modeline
SQL	`sql-product: postgres; indent-tabs-mode: nil`
C++	`mode: c++; indent-tabs-mode: nil; c-basic-offset: 4`

Features

Licence generation. Generated files carry a licence header with editor modelines, a copyright with the current year, and the appropriate per-language comment formatting.
Multi-language comment support. SQL and C++ use /* ... */ blocks with * as the line prefix; Python uses triple quotes ("""); JavaScript uses /** */.
Flexible output. The default output directory is output/; it can be overridden with the second positional argument and is created automatically if absent.
Overall models. A model.json can orchestrate generation of multiple artefacts in dependency order.
Dynamic prefixing. A model_name property on an overall model prefixes every output file name (for example slovaris_).
Automatic sibling loading. JSON models in the same directory are loaded together so a template can cross-reference them.
Enhanced data context. Subject-area datasets (such as currencies_dataset, countries_dataset) are surfaced as named variables to templates for direct access.
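Here is a minimal sketch of licence-header generation for the SQL/C++ comment style, using the two modelines listed earlier. Several details are assumptions rather than the generator's actual output: the wrapping of the modeline in Emacs -*- ... -*- markers, the copyright wording, the holder argument and the function name licence_header.

```python
import datetime
from typing import Optional

# Modeline strings as listed in the modeline table of this document
# (sourced from library/data/modeline.org).
MODELINES = {
    "sql": "sql-product: postgres; indent-tabs-mode: nil",
    "cpp": "mode: c++; indent-tabs-mode: nil; c-basic-offset: 4",
}


def licence_header(language: str, holder: str, licence_text: str,
                   year: Optional[int] = None) -> str:
    """Sketch: build a /* ... */ header with modeline, copyright and licence.

    Only the SQL/C++ comment style is covered. The -*- ... -*- wrapping,
    the copyright wording and the holder argument are assumptions.
    """
    if year is None:
        year = datetime.date.today().year  # "current year" per the feature list
    body = [f"-*- {MODELINES[language]} -*-", "",
            f"Copyright (C) {year} {holder}", ""]
    body.extend(licence_text.splitlines())
    # The first line opens the comment; later lines get the " * " prefix
    # (trailing spaces stripped on blank lines); a final line closes it.
    lines = ["/* " + body[0]]
    lines.extend((" * " + line).rstrip() for line in body[1:])
    lines.append(" */")
    return "\n".join(lines)
```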

Example model structure

From models/slovaris/catalogs.json:

[
    {
        "name": "Slovaris",
        "description": "Imaginary world to test all system functions.",
        "owner": "Testing Team"
    }
]

The sql_catalog_populate.mustache template generates SQL that:

Includes the enhanced licence header.
Sets the schema to ores.
Generates SQL calls to metadata.upsert_dq_catalogs().
Includes summary queries.

Extending

To add a new _table.json model for an existing component:

Create the *_table.json file in the component's models/ directory.
Run codegen.sh regenerate --component <name> --profile sql to generate the SQL.

To add a new profile or template:

Add the Mustache template to library/templates/.
Under the relevant profile heading in library/facet_catalogue.org, add a row to the templates table giving the template, the output pattern, and optionally model_types.
If adding a new model type, update get_model_type() in src/generator.py and add any preprocessing logic in the generate_from_model() dispatch block.

To add a new named component:

Add a row to library/component_catalogue.org with the component's models_dir, entity_glob, exclude_suffix, and modeling_dir.
The component is then available as codegen.sh regenerate --component <name>.