Automate new service registration

This page captures an item in the next bucket of the product backlog: a pre-sprint idea that has not yet been pulled into a sprint as a story.

Background

Adding ores.marketdata.service to the running system required the manual steps listed below. They were spread across multiple files, with no single checklist and no tooling to enforce completeness. Several gaps surfaced only at runtime (missing NATS cert, missing DB user, missing IAM role, missing IAM permissions), and each one forced another recreate_database run.

Manual steps identified (in discovery order)

  1. build/scripts/generate_nats_certs.sh: add the service name to the SERVICES array so that an mTLS client certificate is generated. Without this, the service crashes immediately on NATS connect with Connection Closed.
  2. projects/ores.codegen/models/services/ores_services_service_registry.json: add the service entry (name, psql_var, env_key, iam_role, dml_prefixes, select_tables), then re-run the service-registry code-gen profile to regenerate the five files it produces.
  3. projects/ores.sql/teardown_all.sql: add a drop role if exists <env>_<service>_service; entry (not covered by code-gen).
  4. projects/ores.sql/populate/iam/iam_permissions_populate.sql: register all domain permissions (<service>::<resource>:<action> and <service>::*). These must exist before any role can reference them.
  5. projects/ores.sql/populate/iam/iam_roles_populate.sql: create the IAM role and assign its permissions. This fails at runtime if the permissions from step 4 are absent.
  6. .env: add the ORES_<SERVICE>_SERVICE_DB_USER and ORES_<SERVICE>_SERVICE_DB_PASSWORD variables (currently done by init-environment.sh, but only if the service is known to that script).
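
The naming patterns in the steps above can be derived mechanically from the service name. The following is a minimal sketch only: the function names are hypothetical and do not exist in the repository, and it assumes service names are plain lowercase words such as marketdata.

```shell
# Hypothetical helpers (not part of the repository): derive the identifiers
# that the manual steps above require for a given service.

# Database role dropped in teardown_all.sql: <env>_<service>_service
service_role_name() {
    local env="$1" service="$2"
    printf '%s_%s_service\n' "$env" "$service"
}

# .env variable names: ORES_<SERVICE>_SERVICE_DB_USER and ..._DB_PASSWORD
service_env_vars() {
    local service="$1" upper
    # Upper-case the service name for use in the variable name.
    upper=$(printf '%s' "$service" | tr '[:lower:]' '[:upper:]')
    printf 'ORES_%s_SERVICE_DB_USER\n' "$upper"
    printf 'ORES_%s_SERVICE_DB_PASSWORD\n' "$upper"
}

# Wildcard IAM permission code: <service>::*
service_wildcard_permission() {
    printf '%s::*\n' "$1"
}
```

For example, service_role_name local marketdata prints local_marketdata_service, where "local" stands in for whatever environment prefix is in use.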

Proposed improvements

Tasks

  • [ ] Drive generate_nats_certs.sh from SERVICE_NAMES in service_vars.sh instead of a hard-coded array
  • [ ] Add service-registry code-gen template for IAM permissions seed SQL
  • [ ] Add service-registry code-gen template for IAM role seed SQL
  • [ ] Add service-registry code-gen template (or include fragment) for teardown_all.sql service role drops
  • [ ] Add cross-check to recreate_database.sh / validate_schemas.sh: every service has NATS cert + IAM role + ≥1 permission
  • [ ] Document the remaining manual steps in doc/how-to/add-a-new-service.md as an interim checklist
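
The cross-check task above could look roughly like the sketch below. Everything about the layout is an assumption made for illustration: the function name, the <service>.crt certificate file name, and the idea of detecting roles and permissions by searching the seed SQL files for the service name. The real check might query the database instead.

```shell
# Hypothetical cross-check (illustrative only, not the repository's script).
# Usage: check_service_registration CERT_DIR ROLES_SQL PERMS_SQL SERVICE...
# Assumes: certs are stored as CERT_DIR/<service>.crt, the roles seed file
# mentions the service name, and permissions are written as <service>::...
check_service_registration() {
    local cert_dir="$1" roles_sql="$2" perms_sql="$3"
    shift 3
    local service failures=0
    for service in "$@"; do
        if [ ! -f "$cert_dir/$service.crt" ]; then
            echo "MISSING NATS cert: $service" >&2
            failures=$((failures + 1))
        fi
        if ! grep -qF -- "$service" "$roles_sql"; then
            echo "MISSING IAM role: $service" >&2
            failures=$((failures + 1))
        fi
        if ! grep -qF -- "${service}::" "$perms_sql"; then
            echo "MISSING IAM permission: $service" >&2
            failures=$((failures + 1))
        fi
    done
    # Succeed only if every service passed every check.
    [ "$failures" -eq 0 ]
}
```

If service_vars.sh exposes SERVICE_NAMES as a bash array, as the first task proposes, the check could be invoked with "${SERVICE_NAMES[@]}" as the trailing arguments, so that it fails before the database is recreated rather than at runtime.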
