ores.codegen architecture

Summary

This document records how ores.codegen is built and how its parts fit together: where its code and data live, how the generator picks templates for models, how it handles modelines and licence headers, and the conventions it enforces.

For what the component is and how to operate it, see ores.codegen (the component model) and the recipes it links to.

Detail

Directory structure

projects/ores.codegen/
├── library/
│   ├── component_catalogue.org  # Component → models-dir/glob/modeling-dir (read by manifest.py)
│   ├── data/                # Static data (licences, modelines)
│   │   ├── licence-GPL-v3.txt
│   │   └── modeline.org     # Editor modelines — MASD-style org, read directly by generator.py
│   ├── facet_catalogue.org  # Profile → model-type → template mapping (read by generator.py)
│   └── templates/           # Mustache templates
│       ├── sql_*.mustache
│       └── doc_*.org.mustache
├── models/                  # JSON model files driving generation
│   └── <component>/         # e.g. refdata/, trade/, dq/, iam/
├── output/                  # Default destination for generated files
├── scripts/                 # Shell scripts for common operations
├── src/                     # Python source
│   ├── codegen/             # CLI package (codegen.py entry point)
│   │   ├── __init__.py
│   │   ├── generate.py      # generate / regenerate subcommands
│   │   ├── manifest.py      # named component registry
│   │   └── diff.py          # diff subcommand
│   ├── generator.py         # Core generator: JSON models + templates → output
│   ├── doc_generate.py      # v2 information-architecture doc generator
│   ├── fpml_parser.py       # FPML Genericode XML → JSON models
│   ├── images_generate_sql.py
│   ├── lei_extract_subset.py
│   └── iso_generate_metadata_sql.py
├── modeling/
│   └── ores.codegen.org     # Component model (the entry point)
├── docs/
│   ├── architecture.org     # This document
│   ├── cpp_generation_analysis.md
│   └── doc_generator.md
├── requirements.txt
├── codegen.sh               # Wrapper: activates venv and calls codegen.py
├── run_generator.sh         # Legacy wrapper (kept for non-CLI uses)
├── generate_doc.sh          # Wrapper for doc_generate.py
└── generate_*_schema.sh     # Legacy per-component schema scripts (being retired)

Internal modules

File                              Purpose
src/generator.py                  Core generator: JSON models + Mustache templates → SQL / C++ / etc.
src/codegen/generate.py           generate and regenerate subcommand implementations
src/codegen/manifest.py           Named component registry (maps component name → models dir + glob)
src/codegen/diff.py               diff subcommand implementation
src/doc_generate.py               v2 information-architecture document generator (task/story/sprint/…)
src/fpml_parser.py                FPML Genericode XML → JSON models
src/iso_generate_metadata_sql.py  ISO standards → SQL
src/images_generate_sql.py        Image artefacts (flags, crypto icons) → SQL
src/lei_extract_subset.py         LEI dataset subset extractor
library/facet_catalogue.org       Declares which templates each profile runs per model type; read directly by generator.py
library/component_catalogue.org   Maps component names to discovery roots; read directly by manifest.py
library/data/modeline.org         Editor modeline strings per language; read directly by generator.py
library/data/                     Static data files (licences, modelines)
library/templates/                Mustache templates
models/                           JSON model files
output/                           Default destination for generated files

Main generator functions

In src/generator.py:

  • is_table_model(filename) — returns True for *_table.json filenames.
  • get_model_type(filename) — maps filename suffix to model type string (table, schema, domain_entity, junction, …).
  • load_profiles(base_dir) — parse library/facet_catalogue.org directly via _load_profiles_from_org().
  • resolve_profile_templates(profile, profiles, model_type) — return the template list for a given profile and model type.
  • resolve_output_path(pattern, model_data, model_type) — expand an output path pattern using model fields.
  • load_data(data_dir) — load JSON and text files from a data directory.
  • render_template(template_path, data) — render a Mustache template with the given data.
  • generate_from_model() — orchestrate generation for one model file.
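
As a rough illustration of what load_data(data_dir) is described as doing, the sketch below reads every .json and .txt file in a directory into one dictionary. The function name load_data_sketch, the key scheme (file stem), and the restriction to those two extensions are assumptions made for the example; they are not taken from src/generator.py.

```python
import json
from pathlib import Path


def load_data_sketch(data_dir):
    """Load *.json (parsed) and *.txt (raw text) files, keyed by file stem.

    Hypothetical stand-in for load_data(); the real key names may differ.
    """
    data = {}
    for path in sorted(Path(data_dir).iterdir()):
        if path.suffix == ".json":
            data[path.stem] = json.loads(path.read_text(encoding="utf-8"))
        elif path.suffix == ".txt":
            data[path.stem] = path.read_text(encoding="utf-8")
        # Any other file type is ignored in this sketch.
    return data
```

Under these assumptions, a directory holding licence-GPL-v3.txt would expose the licence text under the key licence-GPL-v3.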

Template system

  • Mustache via the pystache library.
  • Templates live in library/templates/. The .mustache files are generated artefacts, tangled from the literate facet docs in the same directory — see the Codegen template library overview for the hierarchy, tangle workflow, and drift checks.
  • Output is SQL, C++, or org-mode depending on the template family.

Data files

  • library/data/licence-GPL-v3.txt — full GPL v3 licence text used in generated headers.
  • library/data/modeline.org — per-language editor modeline strings (MASD-style org hierarchy; read directly by generator.py via _load_modelines_from_org()).

Model types and file naming

The generator detects a model's type from its filename suffix. Most types use a root JSON key of the same name (the table below lists the exceptions), and each type is associated with one or more templates through library/facet_catalogue.org.

Filename suffix       Model type     Root JSON key  Primary template
*_table.json          table          table          sql_schema_create.mustache (profile: sql)
*_entity.json         schema         entity         sql_schema_table_create.mustache (legacy)
*_domain_entity.json  domain_entity  domain_entity  C++ templates + sql_schema_domain_entity_create.mustache
*_junction.json       junction       (varies)       sql_schema_junction_create.mustache

The _table.json format is the current standard for SQL-only entity tables (refdata and similar). It replaces the legacy _entity.json → sql_schema_table_create.mustache path.
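
Suffix-based detection has one subtlety worth showing: a *_domain_entity.json filename also ends with _entity.json, so the longer suffix has to be tested first. The following is a minimal sketch of that idea, not the actual get_model_type() code. The _sketch function names, the lookup list, and the None result for unknown files are assumptions, and the real function recognises more types than the four listed here.

```python
# Ordered longest-suffix-first: "_domain_entity.json" also ends with
# "_entity.json", so it must be checked before the legacy suffix.
SUFFIX_TO_MODEL_TYPE = [
    ("_domain_entity.json", "domain_entity"),
    ("_table.json", "table"),
    ("_junction.json", "junction"),
    ("_entity.json", "schema"),
]


def get_model_type_sketch(filename):
    """Return the model type for a filename, or None if no suffix matches."""
    for suffix, model_type in SUFFIX_TO_MODEL_TYPE:
        if filename.endswith(suffix):
            return model_type
    return None  # assumption: the real generator may handle this differently


def is_table_model_sketch(filename):
    """True only for *_table.json filenames."""
    return get_model_type_sketch(filename) == "table"
```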

Model-template mapping

The active mapping is declared in library/facet_catalogue.org, not hard-coded in Python. Each profile entry lists which templates apply to which model types and the output path pattern.
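
To make the shape of that mapping concrete, here is a small sketch of how a parsed catalogue might be filtered by model type. The PROFILES dictionary is an illustrative in-memory stand-in for what load_profiles() returns (its real structure is not shown in this document), and treating an entry without model_types as applying to every model type is an assumption.

```python
# Illustrative stand-in for the parsed facet catalogue; the field names
# "template", "output" and "model_types" follow the prose, the layout is assumed.
PROFILES = {
    "sql": [
        {
            "template": "sql_schema_create.mustache",
            "output": "projects/ores.sql/create/{component}/"
                      "{component}_{entity_plural}_create.sql",
            "model_types": ["table"],
        },
        {
            "template": "sql_schema_domain_entity_create.mustache",
            "output": "{component}_{entity}_create.sql",
            "model_types": ["domain_entity"],
        },
    ],
}


def resolve_profile_templates_sketch(profile, profiles, model_type):
    """Return the entries of `profile` that apply to `model_type`."""
    if profile not in profiles:
        raise KeyError(f"unknown profile: {profile}")
    return [
        entry
        for entry in profiles[profile]
        # Assumption: an entry with no model_types applies to every type.
        if "model_types" not in entry or model_type in entry["model_types"]
    ]
```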

Table schema mappings (*_table.json files)

These files drive unified SQL generation via sql_schema_create.mustache. Run with codegen.sh regenerate --component <name> --profile sql.

Template                    Output file pattern
sql_schema_create.mustache  projects/ores.sql/create/{component}/{component}_{entity_plural}_create.sql
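
The braces in the output pattern are placeholders filled from model fields. A minimal sketch of that expansion, assuming plain str.format semantics, is shown below. The real resolve_output_path() also takes a model_type argument and may use a different substitution mechanism, and the component and entity values are made up for the example.

```python
def resolve_output_path_sketch(pattern, model_fields):
    """Expand {placeholders} in an output path pattern from model fields.

    Raises KeyError if the pattern names a field the model does not have.
    """
    return pattern.format(**model_fields)


PATTERN = ("projects/ores.sql/create/{component}/"
           "{component}_{entity_plural}_create.sql")

# Hypothetical model fields, for illustration only.
path = resolve_output_path_sketch(
    PATTERN, {"component": "refdata", "entity_plural": "currencies"}
)
```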

The *_table.json model root key is table with the following fields:

Field                           Type                        Description
schema, product, component      string                      Namespace identifiers
entity_singular, entity_plural  string                      Used in table name and output filename
description                     string                      Appears in the SQL header comment
has_tenant_id                   boolean                     Whether the table has a tenant_id column
primary_key.column              string                      Primary key column name
primary_key.type                string                      SQL type (e.g. text, uuid)
primary_key.is_text             boolean                     Controls empty-string check constraint and default quoting
columns[]                       list                        Non-PK columns (name, type, nullable, default)
coding_scheme                   none / required / nullable  Whether to add a coding_scheme_code FK column
image_id                        boolean                     Whether to add an image_id UUID FK column
validation_fn.tenant_scope      system / both / tenant      Which tenants the validation function queries
validation_fn.default           string / absent             Return value when input is null or empty
validation_fn.order_by          string / absent             Column for ORDER BY in error messages (defaults to PK)
insert_trigger.validations[]    list                        Per-column validation function calls in the insert trigger
check_constraints[]             list                        Additional SQL check constraints (expression strings)
indexes[]                       list                        Extra indexes beyond the standard ones

The generator preprocesses boolean flags (has_coding_scheme, has_any_coding_scheme, scope_system, scope_both, scope_tenant, etc.) and pre-renders sql_check_constraints as a single string before passing data to Mustache, to avoid pystache whitespace issues with adjacent section tags.
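
Mustache has no comparison operators, which is why enumerated fields are turned into booleans before rendering. The sketch below illustrates the idea for a few of the flags named above. The derivation rules and the exact text emitted for sql_check_constraints are assumptions made for the example, and has_any_coding_scheme is left out because its meaning is not described here.

```python
def preprocess_table_model_sketch(table):
    """Return a copy of the table model with derived template flags added."""
    ctx = dict(table)

    # Enumerated value -> boolean flag usable in a Mustache section.
    ctx["has_coding_scheme"] = table.get("coding_scheme", "none") != "none"

    # One boolean per tenant scope; at most one of them is True.
    scope = table.get("validation_fn", {}).get("tenant_scope")
    for name in ("system", "both", "tenant"):
        ctx[f"scope_{name}"] = scope == name

    # Pre-render the constraint list as one string so the template only
    # needs a single variable tag (the SQL layout here is an assumption).
    ctx["sql_check_constraints"] = ",\n".join(
        f"    check ({expr})" for expr in table.get("check_constraints", [])
    )
    return ctx
```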

Domain entity schema mappings (*_domain_entity.json files)

These files drive both C++ generation and SQL for domain entities (party, book, counterparty, etc.). The SQL portion is handled separately from the C++ portion.

Template                                  Output file
sql_schema_domain_entity_create.mustache  {component}_{entity}_create.sql
sql_schema_notify_trigger.mustache        {component}_{entity}_notify_trigger.sql
sql_schema_artefact_create.mustache       dq_{entity}_artefact_create.sql
C++ templates (via --profile all-cpp)     .hpp / .cpp files

Standard data mappings

Model file             Template(s)
model.json             sql_batch_execute.mustache
catalogs.json          sql_catalog_populate.mustache
country_currency.json  sql_flag_populate.mustache, sql_currency_populate.mustache, sql_country_populate.mustache
datasets.json          sql_dataset_populate.mustache, sql_dataset_dependency_populate.mustache
methodologies.json     sql_methodology_populate.mustache
tags.json              sql_tag_populate.mustache

Entity populate mappings (*_data.json files)

Template                       Output file
sql_populate_refdata.mustache  {component}_{entity}_populate.sql

Profile system

Profiles are declared in library/facet_catalogue.org. Each profile maps to a list of template entries; each entry specifies the template name, the output path pattern, and the model types it applies to.

Built-in profiles:

Profile  Description
sql      SQL DDL only (table create / domain entity create)
all-cpp  C++ headers, implementations, JSON and table I/O
all      sql + all-cpp; not allowed for _table.json or _entity.json models (see below)

The --profile all guard: running all on a SQL model (table or schema type) is refused by the CLI because a matching _domain_entity.json may exist for the same entity, and running all would silently overwrite the production SQL with the wrong template. Use --profile sql or --profile all-cpp explicitly.
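
The guard amounts to a check on the (profile, model type) pair before any template runs. A minimal sketch follows. The function name, the exception type, and the message are assumptions, since the CLI's actual error handling is not described here.

```python
# Model types produced by _table.json and _entity.json files respectively.
SQL_MODEL_TYPES = {"table", "schema"}


def check_profile_allowed(profile, model_type):
    """Refuse --profile all for SQL-only model types."""
    if profile == "all" and model_type in SQL_MODEL_TYPES:
        raise ValueError(
            f"profile 'all' is not allowed for {model_type} models; "
            "use --profile sql or --profile all-cpp explicitly"
        )
```

Failing early, rather than skipping the SQL templates silently, keeps the overwrite risk visible to whoever ran the command.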

Component registry

Named components are declared in library/component_catalogue.org, read directly by src/codegen/manifest.py at startup. See component_catalogue.org for the full 16-component table (name, models_dir, entity_glob, exclude_suffix, modeling_dir).
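
One way to picture a catalogue row is as a small record plus a filter over the files in models_dir. The sketch below uses the five column names listed above. The Component class, the select_models helper, the example values, and the reading of exclude_suffix as "skip files ending with this suffix" are all assumptions rather than a description of manifest.py.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Component:
    """One row of the component catalogue (field names from the table)."""
    name: str
    models_dir: str
    entity_glob: str
    exclude_suffix: str = ""
    modeling_dir: str = ""


def select_models(component, filenames):
    """Return the filenames that match the glob and are not excluded."""
    return sorted(
        f
        for f in filenames
        if fnmatchcase(f, component.entity_glob)
        and not (component.exclude_suffix and f.endswith(component.exclude_suffix))
    )


# Hypothetical row and directory listing, for illustration only.
refdata = Component("refdata", "models/refdata", "*.json", "_data.json")
selected = select_models(
    refdata, ["currency_table.json", "currency_data.json", "README.md"]
)
```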

Retired scripts

The following shell scripts have been deleted and replaced by codegen.sh:

Deleted script              Replacement command
generate_refdata_schema.sh  codegen.sh regenerate --component refdata --profile sql

Modeline configurations

From library/data/modeline.org:

Language  Modeline
SQL       sql-product: postgres; indent-tabs-mode: nil
C++       mode: c++; indent-tabs-mode: nil; c-basic-offset: 4

Features

  • Licence generation. Generated files carry a licence header with editor modelines, a copyright with the current year, and the appropriate per-language comment formatting.
  • Multi-language comment support. SQL and C++ use /* ... */ with a " * " line prefix; Python uses """; JavaScript uses /** */.
  • Flexible output. Default output directory is output/; overridable via the second positional argument; created automatically if absent.
  • Overall models. A model.json can orchestrate generation of multiple artefacts in dependency order.
  • Dynamic prefixing. A model_name property on an overall model prefixes every output file (for example slovaris_).
  • Automatic sibling loading. JSON models in the same directory are loaded together so a template can cross-reference them.
  • Enhanced data context. Subject-area datasets (such as currencies_dataset, countries_dataset) are surfaced as named variables to templates for direct access.
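
The licence and comment-formatting features above can be sketched for the SQL / C++ case as follows. The header layout (modeline first, wrapped in Emacs -*- markers, then the copyright line, then the licence text) and the function and parameter names are assumptions for illustration. Only the /* ... */ delimiters, the " * " line prefix, and the use of the current year come from the feature list.

```python
from datetime import date


def render_header_sketch(modeline, holder, licence_text, year=None):
    """Wrap a modeline, copyright line and licence text in a /* ... */ block."""
    year = year or date.today().year  # default to the current year
    lines = [f"-*- {modeline} -*-", "", f"Copyright (C) {year} {holder}", ""]
    lines += licence_text.splitlines()
    # Prefix every line with " * "; rstrip avoids trailing spaces on blank lines.
    body = "\n".join((" * " + line).rstrip() for line in lines)
    return "/*\n" + body + "\n */\n"


header = render_header_sketch(
    "mode: c++; indent-tabs-mode: nil; c-basic-offset: 4",
    "Example Holder",            # hypothetical copyright holder
    "Line one\n\nLine two",      # stand-in for the GPL text
    year=2024,
)
```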

Example model structure

From models/slovaris/catalogs.json:

[
    {
        "name": "Slovaris",
        "description": "Imaginary world to test all system functions.",
        "owner": "Testing Team"
    }
]

The sql_catalog_populate.mustache template generates SQL that:

  1. Includes the enhanced licence header.
  2. Sets the schema to ores.
  3. Generates SQL calls to metadata.upsert_dq_catalogs().
  4. Includes summary queries.

Extending

To add a new _table.json model for an existing component:

  1. Create the *_table.json file in the component's models/ directory.
  2. Run codegen.sh regenerate --component <name> --profile sql to generate the SQL.

To add a new profile or template:

  1. Add the Mustache template to library/templates/.
  2. Add an entry to library/facet_catalogue.org under the relevant profile heading, adding a row to the templates table with template, output pattern, and optionally model_types.
  3. If adding a new model type, update get_model_type() in src/generator.py and add any preprocessing logic in the generate_from_model() dispatch block.

To add a new named component:

  1. Add a row to library/component_catalogue.org with the component's models_dir, entity_glob, exclude_suffix, and modeling_dir.
  2. The component is then available as codegen.sh regenerate --component <name>.
