# OreStudio Code Generator (ores.codegen)
## Overview
The OreStudio Code Generator is a Python-based tool that applies Mustache templates to structured JSON models to produce output files.

The project generates SQL files for the ORE Studio data quality system, complete with license headers and editor modeline support.
## Setup

Install dependencies:

```sh
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Dependencies:

- `pystache>=0.6.0` - Mustache templating for Python
## Usage

### Direct Usage

Run the code generator using the provided script:

```sh
./run_generator.sh <model_path> [output_dir]
```

Examples:

```sh
# Using default output directory (output/)
./run_generator.sh models/slovaris/catalogs.json

# Using custom output directory
./run_generator.sh models/slovaris/catalogs.json custom_output/
```

Or run directly with Python:

```sh
python src/generator.py <model_path> [output_dir]
```
### Batch Execution (Overall Models)

You can define an overall `model.json` that references multiple files. The
generator will automatically process all dependent models first.

```sh
./run_generator.sh models/slovaris/model.json
```
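The precise schema of an overall model is defined by the generator itself; as a hedged illustration, such a file might look like the sketch below (`model_name` is documented under Dynamic Prefixing, while the `models` list key is an assumption for illustration only):

```json
{
    "model_name": "slovaris",
    "models": [
        "catalogs.json",
        "country_currency.json",
        "datasets.json"
    ]
}
```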
### Using Profiles

Profiles allow you to generate all templates for a specific facet in one command.
Profiles are defined in `library/profiles.json`.

```sh
# List available profiles
./run_generator.sh --list-profiles

# Generate all Qt UI components for a domain entity
./run_generator.sh models/dq/dataset_bundle_domain_entity.json output/ --profile qt

# Generate all C++ domain facets (domain type, repository, service, protocol)
./run_generator.sh models/dq/some_schema.json output/ --profile all-cpp

# Generate SQL schema files
./run_generator.sh models/dq/dataset_bundle_domain_entity.json output/ --profile sql
```
Available profiles:

| Profile | Description | Model Types |
|---|---|---|
| `sql` | SQL schema creation scripts | domainentity, junction |
| `qt` | Qt UI components (model, window, dialogs, controller) | domainentity |
| `protocol` | Messaging protocol (request/response) | domainentity, schema |
| `domain` | Domain types (class, JSON I/O, table) | schema |
| `generator` | Fake data generators for testing | schema |
| `repository` | Repository layer (entity, mapper, CRUD) | schema |
| `service` | Service layer | schema |
| `all-cpp` | All C++ facets combined | schema |
| `plantuml` | PlantUML ER diagrams | schema, data |
### Slovaris Generation

A dedicated script is provided to generate all Slovaris artefacts and place them
in the correct location in the ores.sql project:

```sh
./generate_slovaris.sh
```
### FPML Reference Data Generation

Generate SQL schema and populate scripts from FPML Genericode XML files:

```sh
# Generate all FPML reference data (parses XML + generates SQL)
./generate_fpml_refdata.sh

# Generate only specific entities
./generate_fpml_refdata.sh --entities 'party-roles person-roles'

# Skip parsing, just regenerate SQL from existing models
./generate_fpml_refdata.sh --skip-parse

# Show help
./generate_fpml_refdata.sh --help
```

This script:

- Parses FPML XML files from `projects/ores.sql/populate/data/`
- Generates JSON entity models to `output/models/`
- Generates SQL schema files to `projects/ores.sql/create/`
- Generates SQL populate files to `projects/ores.sql/populate/`
Output files per entity (e.g., `party_roles`):

| File | Location |
|---|---|
| `refdata_party_roles_create.sql` | `projects/ores.sql/create/` |
| `refdata_party_roles_notify_trigger.sql` | `projects/ores.sql/create/` |
| `dq_party_roles_artefact_create.sql` | `projects/ores.sql/create/` |
| `refdata_party_roles_populate.sql` | `projects/ores.sql/populate/` |

Plus the shared coding schemes file: `fpml_coding_schemes_artefact_populate.sql`.
### LEI Data Subset Extraction

Extract diverse subsets from the GLEIF LEI dataset for testing and development:

```sh
# Generate both small and large subsets
./scripts/generate_lei_subsets.sh

# Generate specific size only
./scripts/generate_lei_subsets.sh --small
./scripts/generate_lei_subsets.sh --large

# Download latest GLEIF data first
./scripts/generate_lei_subsets.sh --download
```

Or use the Python script directly:

```sh
# Small subset (~10K entities)
python3 src/lei_extract_subset.py --size small

# Large subset (~50K entities)
python3 src/lei_extract_subset.py --size large

# Download latest data
python3 src/lei_extract_subset.py --download --size small
```
Input files (in `external/lei/`):

- `*-gleif-goldencopy-lei2-golden-copy.csv` - LEI entity data (~4.3 GB)
- `*-gleif-goldencopy-rr-golden-copy.csv` - Relationship records (~221 MB)

Output files (in `external/lei/`):

- `*-subset-small.csv` - Small subset (~7 MB)
- `*-subset-large.csv` - Large subset (~35 MB)
The subsets are sampled across multiple diversity dimensions:
- Geographic (all ~235 countries)
- Entity category (GENERAL, FUND, SOLEPROPRIETOR, etc.)
- Sector (BANK, INSURANCE, TECHNOLOGY, etc.)
- Fund type (ETF, BOND, EQUITY, etc.)
- Relationship depth (0 to 5+ children)
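The actual sampling logic lives in `src/lei_extract_subset.py`; as an illustration of the idea, a stratified sampler that guarantees every stratum is represented might look like this (the function and field names here are hypothetical, not the tool's API):

```python
import random
from collections import defaultdict

def stratified_sample(rows, key_fields, per_stratum, seed=42):
    """Group rows by the combination of key_fields and draw up to
    per_stratum rows from each group, so every stratum (country,
    category, sector, ...) contributes to the subset."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row.get(f, "") for f in key_fields)].append(row)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

# Toy data: three distinct (country, category) strata.
rows = [
    {"country": "GB", "category": "FUND"},
    {"country": "GB", "category": "GENERAL"},
    {"country": "DE", "category": "GENERAL"},
    {"country": "DE", "category": "GENERAL"},
]
subset = stratified_sample(rows, ["country", "category"], per_stratum=1)
# One row per stratum -> three rows in the subset.
```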
### Image Artefact Generation (Flags and Crypto Icons)

Generate SQL populate scripts for image artefacts (flags, crypto icons, etc.):

```sh
# Generate flag images
python3 src/images_generate_sql.py --config flags

# Generate cryptocurrency icons
python3 src/images_generate_sql.py --config crypto

# Custom configuration
python3 src/images_generate_sql.py \
    --dataset-name "My Icons" \
    --subject-area "Icons" \
    --domain "Reference Data" \
    --source-dir "./icons" \
    --output-file "icons.sql"
```
Predefined configurations:

| Config | Source | Output |
|---|---|---|
| `flags` | `populate/data/flags/` | `dq_flags_images_artefact_populate.sql` |
| `crypto` | `external/crypto/cryptocurrency-icons/` | `populate/crypto/crypto_images_artefact_populate.sql` |
### Cryptocurrency Reference Data

Cryptocurrency reference data files are located in `projects/ores.sql/populate/crypto/`:

| File | Description |
|---|---|
| `crypto_dataset_populate.sql` | Dataset definitions (icons, large, small) |
| `crypto_images_artefact_populate.sql` | Icon images (generated by `images_generate_sql.py`) |
| `crypto_currencies_large_artefact_populate.sql` | All ~12K coins |
| `crypto_currencies_small_artefact_populate.sql` | Top 100 coins by market cap |
| `populate_crypto.sql` | Master include file |
Source data (in `external/crypto/`):

- `cryptocurrencies/cryptocurrencies.json` - Symbol-to-name mapping (~12K coins)
- `cryptocurrency-icons/*.svg` - Icon images (483 icons)

See `external/crypto/methodology.txt` for data sourcing details.
## Directory Structure

```
projects/ores.codegen/
├── library/
│   ├── data/                            # Static data files (licenses, modelines, etc.)
│   │   ├── licence-GPL-v3.txt
│   │   └── modeline.json
│   └── templates/                       # Mustache template files
│       └── sql_catalog_populate.mustache
├── models/                              # JSON model files that provide data for generation
│   └── slovaris/
│       ├── catalogs.json
│       ├── country_currency.json
│       └── datasets.json
├── output/                              # Where generated files are placed
├── scripts/                             # Shell scripts for common operations
├── src/                                 # Python source code
│   ├── __init__.py
│   ├── generator.py                     # Main code generator
│   ├── fpml_parser.py                   # FPML XML parser
│   ├── images_generate_sql.py           # Image artefact generator
│   ├── lei_extract_subset.py            # LEI subset extractor
│   └── iso_generate_metadata_sql.py     # ISO standards generator
├── modeling/
│   └── ores.codegen.org                 # This documentation
├── venv/                                # Python virtual environment
├── requirements.txt                     # Python dependencies
└── run_generator.sh                     # Execution script
```
## Architecture

### Core Components

| Component | Description |
|---|---|
| `src/generator.py` | Main code generator (JSON models + Mustache templates → SQL) |
| `src/fpml_parser.py` | FPML Genericode XML parser (XML → JSON models) |
| `src/iso_generate_metadata_sql.py` | ISO standards SQL generator |
| `src/images_generate_sql.py` | Image artefact SQL generator (flags, crypto icons) |
| `src/lei_extract_subset.py` | LEI dataset subset extractor |
| `library/data/` | Static data files (licenses, modelines, etc.) |
| `library/templates/` | Mustache templates |
| `models/` | JSON model files |
| `output/` | Default directory for generated files |
### Main Generator Functions

The main `src/generator.py` file contains several key functions:

- `load_data(data_dir)` - Loads JSON and text files from the data directory
- `format_comment_block(text, lang)` - Formats text as language-specific comment blocks
- `generate_license_with_header(license_text, modeline_info, lang)` - Creates license headers with modelines and copyright
- `render_template(template_path, data)` - Renders Mustache templates with provided data
- `get_template_mappings()` - Defines the mapping between model filenames and templates
- `generate_from_model()` - Main function that orchestrates the generation process
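To make the flow concrete, here is a minimal sketch of how these functions fit together. It is illustrative only: the real `render_template` uses pystache, whereas this stand-in substitutes simple `{{key}}` placeholders.

```python
import json
import re
from pathlib import Path

def load_data(data_dir):
    """Load .json files (parsed) and .txt files (raw) from data_dir,
    keyed by file stem -- mirrors how library/data/ is consumed."""
    data = {}
    for path in Path(data_dir).iterdir():
        if path.suffix == ".json":
            data[path.stem] = json.loads(path.read_text())
        elif path.suffix == ".txt":
            data[path.stem] = path.read_text()
    return data

def render_template(template_text, context):
    """Stand-in for the pystache call: replace {{key}} placeholders
    with values from the context dict."""
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: str(context.get(m.group(1), "")),
                  template_text)

def generate_from_model(model, template_text):
    """Orchestrate generation: merge the model into the template."""
    return render_template(template_text, model)

sql = generate_from_model(
    {"name": "Slovaris"},
    "INSERT INTO catalogs (name) VALUES ('{{name}}');")
# sql == "INSERT INTO catalogs (name) VALUES ('Slovaris');"
```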
### Template System

- Uses the Mustache templating engine via the `pystache` library
- Templates are stored in the `library/templates/` directory
- Templates generate SQL code with proper license headers and modelines
### Data Files

- `library/data/licence-GPL-v3.txt` - Contains the GPL v3 license text
- `library/data/modeline.json` - Contains editor modeline configurations for different languages
## Model-Template Mapping

The generator maps model files to templates based on their filenames.

### Standard Mappings
| Model File | Template(s) |
|---|---|
| `model.json` | `sql_batch_execute.mustache` |
| `catalogs.json` | `sql_catalog_populate.mustache` |
| `country_currency.json` | `sql_flag_populate.mustache`, `sql_currency_populate.mustache`, `sql_country_populate.mustache` |
| `datasets.json` | `sql_dataset_populate.mustache`, `sql_dataset_dependency_populate.mustache` |
| `methodologies.json` | `sql_methodology_populate.mustache` |
| `tags.json` | `sql_tag_populate.mustache` |
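The table above suggests what `get_template_mappings()` returns; a sketch of that mapping as a plain dict (the real function lives in `src/generator.py` and may differ in shape):

```python
def get_template_mappings():
    """Filename -> templates mapping, as listed in the Standard
    Mappings table; illustrative, not the actual source."""
    return {
        "model.json": ["sql_batch_execute.mustache"],
        "catalogs.json": ["sql_catalog_populate.mustache"],
        "country_currency.json": [
            "sql_flag_populate.mustache",
            "sql_currency_populate.mustache",
            "sql_country_populate.mustache",
        ],
        "datasets.json": [
            "sql_dataset_populate.mustache",
            "sql_dataset_dependency_populate.mustache",
        ],
        "methodologies.json": ["sql_methodology_populate.mustache"],
        "tags.json": ["sql_tag_populate.mustache"],
    }

# One model file can fan out to several templates:
templates = get_template_mappings()["country_currency.json"]
```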
### Entity Schema Mappings (`*_entity.json` files)

| Template | Output File |
|---|---|
| `sql_schema_table_create.mustache` | `{component}_{entity}_create.sql` |
| `sql_schema_notify_trigger.mustache` | `{component}_{entity}_notify_trigger.sql` |
| `sql_schema_artefact_create.mustache` | `dq_{entity}_artefact_create.sql` |
### Entity Populate Mappings (`*_data.json` files)

| Template | Output File |
|---|---|
| `sql_populate_refdata.mustache` | `{component}_{entity}_populate.sql` |
### Modeline Configurations

From `library/data/modeline.json`:

| Language | Modeline |
|---|---|
| SQL | `sql-product: postgres; tab-width: 4; indent-tabs-mode: nil` |
| C++ | `mode: c++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4` |
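These variable strings are embedded using the standard Emacs `-*- ... -*-` file-variable syntax; a one-line sketch (the function name here is hypothetical):

```python
def make_modeline(variables):
    """Wrap a settings string in the Emacs file-variable delimiters,
    ready to embed in a generated file's header comment."""
    return f"-*- {variables} -*-"

line = make_modeline("sql-product: postgres; tab-width: 4; indent-tabs-mode: nil")
# line == "-*- sql-product: postgres; tab-width: 4; indent-tabs-mode: nil -*-"
```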
## Features

### License Generation

Automatically generates license headers with:

- Editor modelines
- Copyright information with the current year
- Proper comment formatting for different languages
### Multi-language Comment Support

Supports different comment formats:

- SQL: `/* ... */` with ` * ` prefix for lines
- C++: `/* ... */` with ` * ` prefix for lines
- Python: `""" ... """` with `#` prefix for lines
- JavaScript: `/** ... */` with ` * ` prefix for lines
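A hedged sketch of such comment wrapping (the signature matches `format_comment_block(text, lang)` from the function list; the body is illustrative, not the actual source):

```python
def format_comment_block(text, lang):
    """Wrap text in a language-specific comment block: an opening
    delimiter, a per-line prefix, and a closing delimiter."""
    styles = {
        "sql":        ("/*",  " *", " */"),
        "cpp":        ("/*",  " *", " */"),
        "python":     ('"""', "#",  '"""'),
        "javascript": ("/**", " *", " */"),
    }
    open_, prefix, close = styles[lang]
    lines = [open_]
    lines += [f"{prefix} {line}" for line in text.splitlines()]
    lines.append(close)
    return "\n".join(lines)

block = format_comment_block("GPL v3 licence text", "sql")
# block == "/*\n * GPL v3 licence text\n */"
```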
### Flexible Output

- Default output directory: `output/`
- Custom output directory support via a command-line parameter
- Automatic creation of the output directory if it doesn't exist
### Overall Models

Support for `model.json` files that orchestrate the generation of multiple artefacts.
### Dynamic Prefixing

Use the `model_name` property in an overall model to prefix all output files (e.g., `slovaris_`).
### Automatic Sibling Loading

Sibling JSON models in the same directory are automatically loaded and made available for cross-referencing in templates.
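A minimal sketch of how sibling loading might work (the function name `load_siblings` is hypothetical):

```python
import json
import tempfile
from pathlib import Path

def load_siblings(model_path):
    """Load every other .json model in the same directory, keyed by
    file stem, so templates can cross-reference them."""
    model_path = Path(model_path)
    siblings = {}
    for path in model_path.parent.glob("*.json"):
        if path != model_path:
            siblings[path.stem] = json.loads(path.read_text())
    return siblings

# Demonstrate with a throwaway directory:
with tempfile.TemporaryDirectory() as d:
    d = Path(d)
    (d / "catalogs.json").write_text('[{"name": "Slovaris"}]')
    (d / "datasets.json").write_text('[]')
    siblings = load_siblings(d / "catalogs.json")
# siblings == {"datasets": []} -- the model itself is excluded.
```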
### Enhanced Data Context

Identifies specific datasets by subject area (e.g., `currencies_dataset`,
`countries_dataset`) for easy template access.
### Example Model Structure

From `models/slovaris/catalogs.json`:

```json
[
    {
        "name": "Slovaris",
        "description": "Imaginary world to test all system functions.",
        "owner": "Testing Team"
    }
]
```
The `sql_catalog_populate.mustache` template generates SQL code that:

- Includes the enhanced license header
- Sets the schema to `ores`
- Generates SQL calls to the `metadata.upsert_dq_catalogs()` function
- Includes summary queries
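As a rough sketch of the rendered output, the template might expand each catalog entry into an upsert call along these lines (the argument order of `metadata.upsert_dq_catalogs()` is an assumption; the real output comes from the Mustache template):

```python
# Hypothetical rendering of the catalogs.json example above.
catalogs = [
    {
        "name": "Slovaris",
        "description": "Imaginary world to test all system functions.",
        "owner": "Testing Team",
    }
]

statements = [
    f"SELECT metadata.upsert_dq_catalogs('{c['name']}', "
    f"'{c['description']}', '{c['owner']}');"
    for c in catalogs
]
```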
## Extending

To add a new model-template pair:

1. Add your JSON model to the `models/` directory
2. Add your Mustache template to `library/templates/`
3. Update the mapping in the `get_template_mappings()` function in `src/generator.py`