NATS Multi-Tenancy and Multi-Party Design
Overview
The goal is to make every NATS domain service (refdata, dq, assets, etc.) honour tenant and party isolation per request, using RS256 JWT tokens issued by the IAM service.
The system has two layers of isolation:
- Tenant: separates organisations from each other (enforced by PostgreSQL RLS on app.current_tenant_id).
- Party: separates business units within a tenant (enforced by RLS on app.visible_party_ids).
Currently domain services ignore the Bearer token and run everything under the system tenant. The fix requires three things working together:
- Two-phase login: phase 1 authenticates the user and returns available parties; phase 2 accepts the party selection, computes visible_party_ids via recursive CTE, stores the session in the DB, and issues a full RS256 JWT containing tenant_id, party_id, and session_id.
- JWKS distribution: IAM derives its public key at startup and serves it via ores.iam.v1.auth.jwks; domain services fetch and cache it once at startup for local JWT validation.
- Per-request context extraction: domain services validate the JWT, extract tenant_id / party_id / session_id, load visible_party_ids from the IAM sessions table (cached by session_id), and construct the proper DB context before executing the request.
Success Criteria
- A domain service request with a valid JWT runs under the correct tenant and party context.
- A request with an invalid, expired, or missing token is rejected before touching the database.
- Domain service handlers require no knowledge of auth — context extraction is handled by a shared helper.
- Party switching requires re-login (log out and log back in).
Architecture
The design touches four layers:
ores.security
Gains RS256 JWT signing and verification (already planned in
2026-03-05-jwt-migration-rs256.org). jwt_claims adds tenant_id,
party_id, and session_id fields. All services link against this.
ores.iam.service
Three changes:
- Login handler split into two phases: auth.login returns a party list plus an interim token (tenant-scoped, short-lived, valid only for auth.select_party); auth.select_party computes visible_party_ids, persists the session to the DB, and issues the full JWT.
- New auth.jwks handler returns the RS256 public key on request.
- auth_session_service updated to include visible_party_ids per session.
ores.nats
A new request_context helper shared by all domain service registrars. Given a
NATS message, it:
- Extracts the Bearer token from headers.
- Validates the JWT signature using the cached public key.
- Extracts tenant_id, party_id, session_id from claims.
- Loads visible_party_ids from ores_iam_sessions_tbl (in-memory cache keyed by session_id, TTL matches token expiry).
- Returns a fully constructed database::context.
Domain service registrars
Each registrar replaces the current system-tenant context with a
request_context call. One change per handler.
The visible_party_ids cache in each domain service is process-local. On first
request for a session_id, it queries ores_iam_sessions_tbl directly (same
DB, shared schema). Subsequent requests for the same session hit the cache.
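The cache behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the class name visible_party_cache, the loader callback (standing in for the SELECT against ores_iam_sessions_tbl), and the choice to pass the token expiry in explicitly are all assumptions made for the example.

```cpp
#include <chrono>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical process-local cache; all names here are illustrative only.
class visible_party_cache {
public:
    using clock = std::chrono::steady_clock;
    // The loader stands in for the query against ores_iam_sessions_tbl and
    // returns std::nullopt when the session does not exist.
    using loader = std::function<
        std::optional<std::vector<std::string>>(const std::string&)>;

    explicit visible_party_cache(loader load) : load_(std::move(load)) {}

    // Returns the visible party ids for a session. The loader is called only
    // on a miss or once the cached entry has reached its expiry.
    std::optional<std::vector<std::string>> get(
        const std::string& session_id, clock::time_point now,
        clock::time_point token_expiry) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(session_id);
        if (it != entries_.end()) {
            if (now < it->second.expires_at) return it->second.party_ids;
            entries_.erase(it); // expired: drop and reload below
        }
        auto loaded = load_(session_id);
        if (!loaded) return std::nullopt; // unknown session: nothing cached
        entries_[session_id] = entry{*loaded, token_expiry};
        return loaded;
    }

private:
    struct entry {
        std::vector<std::string> party_ids;
        clock::time_point expires_at;
    };
    loader load_;
    std::mutex mutex_;
    std::unordered_map<std::string, entry> entries_;
};
```

In this sketch a cached entry remains usable until its expiry even if the session row is deleted in the meantime, so a real implementation would need to decide how logout interacts with the cache.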
Data Flow
Login phase 1 — auth.login
- Client sends {username, password} (username in user@hostname format).
- IAM resolves tenant from hostname, authenticates credentials.
- Looks up account_parties for the user.
- If exactly one party: proceeds directly to phase 2 internally.
- If multiple: returns {interim_token, parties: [{id, name}, ...]} to client.
- Client shows party picker.
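The branching at the end of phase 1 could look something like the sketch below. The phase1_outcome type and the decide_phase1 function are hypothetical names introduced only for illustration; party_summary mirrors the message type defined later in this document. The steps above do not say what happens when a user has no parties, so the sketch simply returns an empty choice list in that case.

```cpp
#include <string>
#include <vector>

struct party_summary {
    std::string id;
    std::string name;
};

// Hypothetical outcome type for phase 1; not one of the documented messages.
struct phase1_outcome {
    bool auto_selected = false;         // true when phase 2 runs internally
    std::string selected_party_id;      // set only when auto_selected
    std::vector<party_summary> choices; // set only when the client must pick
};

// Decides what phase 1 does given the user's account_parties rows.
phase1_outcome decide_phase1(const std::vector<party_summary>& parties) {
    phase1_outcome out;
    if (parties.size() == 1) {
        // Exactly one party: no picker is needed.
        out.auto_selected = true;
        out.selected_party_id = parties.front().id;
    } else {
        // Several parties (or none): hand the list back to the client.
        out.choices = parties;
    }
    return out;
}
```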
Login phase 2 — auth.select_party
- Client sends {interim_token, party_id}.
- IAM validates interim token, checks user is member of requested party.
- Runs recursive CTE to compute visible_party_ids.
- Persists session record to ores_iam_sessions_tbl with tenant_id, party_id, visible_party_ids.
- Mints full RS256 JWT: claims = sub=account_id, tenant_id, party_id, session_id, roles, exp.
- Returns {token, username, tenant_name, party_name}.
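To make the visible_party_ids step more concrete, here is an in-memory sketch of the kind of traversal a recursive CTE typically performs. It rests on an assumption this document does not state: that parties form a parent/child hierarchy and that the visible set is the selected party plus all of its descendants. The party_row type and compute_visible_party_ids function are hypothetical, and the real computation runs in SQL rather than in C++.

```cpp
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// ASSUMPTION: each party has at most one parent, and the visible set is the
// selected party plus every descendant. The design only states that the set
// is computed by a recursive CTE.
struct party_row {
    std::string id;
    std::string parent_id; // empty for a root party
};

std::vector<std::string> compute_visible_party_ids(
    const std::vector<party_row>& rows, const std::string& selected) {
    // Index children by parent so each level can be expanded quickly.
    std::unordered_map<std::string, std::vector<std::string>> children;
    for (const auto& r : rows)
        if (!r.parent_id.empty()) children[r.parent_id].push_back(r.id);

    std::vector<std::string> visible;
    std::unordered_set<std::string> seen{selected};
    std::queue<std::string> pending;
    pending.push(selected);
    // Breadth-first walk: the same fixed point a recursive CTE iterates to.
    while (!pending.empty()) {
        std::string current = pending.front();
        pending.pop();
        visible.push_back(current);
        auto it = children.find(current);
        if (it == children.end()) continue;
        for (const auto& child : it->second)
            if (seen.insert(child).second) pending.push(child);
    }
    return visible;
}
```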
Authenticated domain request
- Client sends NATS message with Authorization: Bearer <jwt> header.
- Domain service registrar calls request_context(msg, base_ctx, verifier).
- request_context validates JWT → extracts tenant_id, party_id, session_id.
- Cache miss on session_id → SELECT visible_party_ids from ores_iam_sessions_tbl.
- Constructs database::context with tenant + party + visible set.
- Handler executes query: PostgreSQL RLS filters automatically.
- Response returned to client.
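The first step of this flow, pulling the token out of the Authorization header value, might be sketched as below. The function name extract_bearer_token is hypothetical, and the sketch assumes the scheme is matched case-sensitively as "Bearer " followed by a single token with no whitespace; how the header is read from the NATS message is left out.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Returns the token part of a "Bearer <jwt>" header value, or std::nullopt
// if the scheme is missing, the token is empty, or it contains whitespace.
std::optional<std::string> extract_bearer_token(std::string_view header_value) {
    constexpr std::string_view scheme = "Bearer ";
    if (header_value.size() <= scheme.size()) return std::nullopt;
    if (header_value.substr(0, scheme.size()) != scheme) return std::nullopt;
    std::string_view token = header_value.substr(scheme.size());
    // A compact JWT never contains whitespace, so treat it as malformed.
    if (token.find_first_of(" \t\r\n") != std::string_view::npos)
        return std::nullopt;
    return std::string(token);
}
```

A caller would map a std::nullopt result to the unauthenticated error before any database access takes place.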
JWKS fetch at domain service startup
- Service sends request to ores.iam.v1.auth.jwks.
- IAM returns {keys: [{kid, kty, alg, n, e}]}.
- Service caches public key, uses it for all subsequent JWT validation.
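In the standard JWK representation of an RSA key, the n and e members are base64url-encoded big-endian integers without padding. A service that builds its verifier from the JWKS response therefore needs a decoder along the lines of the sketch below. The function name base64url_decode is illustrative, and JSON parsing and key construction are deliberately omitted.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Decodes unpadded base64url (RFC 4648 section 5), the encoding used for the
// "n" and "e" members of an RSA JWK. Returns std::nullopt on invalid input.
std::optional<std::vector<std::uint8_t>> base64url_decode(std::string_view in) {
    auto value_of = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '-') return 62;
        if (c == '_') return 63;
        return -1; // includes '+', '/', and '=' from plain base64
    };
    if (in.size() % 4 == 1) return std::nullopt; // no valid input has this length
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;
    int bits = 0;
    for (char c : in) {
        int v = value_of(c);
        if (v < 0) return std::nullopt;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```

For example, the common public exponent value "AQAB" decodes to the bytes 0x01 0x00 0x01, which is 65537.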
API and Protocol
NATS Subjects
| Subject | Direction | Description |
|---|---|---|
| ores.iam.v1.auth.login | client → IAM | Modified: returns party list + interim token |
| ores.iam.v1.auth.select_party | client → IAM | New: accepts interim token + party_id, returns JWT |
| ores.iam.v1.auth.jwks | service → IAM | New: returns public key in JWKS format |
Message Types
    #include <string>
    #include <vector>

    // Declared first because login_response holds a vector of it.
    struct party_summary {
        std::string id;
        std::string name;
    };

    // Modified: phase 1 response
    struct login_response {
        bool success;
        std::string message;
        std::string interim_token;          // short-lived, select_party only
        std::vector<party_summary> parties; // empty if auto-selected
    };

    // New: phase 2
    struct select_party_request {
        std::string interim_token;
        std::string party_id;
    };

    struct select_party_response {
        bool success;
        std::string message;
        std::string token; // full RS256 JWT
        std::string username;
        std::string tenant_name;
        std::string party_name;
    };

    // New: JWKS
    struct jwks_request {};

    struct jwks_response {
        std::string json; // raw JWKS JSON
    };
jwt_claims additions (ores.security)
    struct jwt_claims {
        // existing fields ...
        std::optional<std::string> tenant_id;  // UUID string
        std::optional<std::string> party_id;   // UUID string
        std::optional<std::string> session_id; // UUID string
        // NOTE: visible_party_ids NOT in JWT; loaded from DB by session_id
    };
request_context helper (ores.nats)
    namespace ores::nats::service {

    // Extracts tenant/party/visible_party_ids from a NATS message JWT and
    // returns a database::context scoped to the authenticated user.
    // Throws on invalid/expired token or unknown session.
    database::context request_context(
        const nats::domain::message& msg,
        const database::context& base_ctx,
        const security::jwt::jwt_verifier& verifier);

    }
Key management
| Key | Location | Used by |
|---|---|---|
| Private | ORES_IAM_SERVICE_JWT_PRIVATE_KEY env var | IAM only |
| Public | Derived at startup, served via auth.jwks | All services |
Error Handling
- Missing/malformed Bearer token → unauthenticated error, no DB access.
- Invalid signature → unauthenticated error.
- Expired token → token_expired error; client must re-login.
- Unknown kid → refetch JWKS once, retry; if still unknown → unauthenticated.
- Session not found (e.g. after logout) → session_invalid; client must re-login.
- Interim token sent to domain service → rejected via audience claim mismatch → unauthenticated.
- JWKS fetch failure at startup → service retries with backoff; fails to start after N attempts. A service that cannot validate tokens must not accept requests.
- Party not in visible set → handled transparently by PostgreSQL RLS; query returns empty results, no application-level check needed.
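The checks that follow a successful signature verification can be pictured as a small classification function, sketched below. Everything named here is an assumption for illustration: the auth_error enum, the checked_claims struct, the audience value "domain", and the order in which the checks run are not specified by this document.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical error codes mirroring the names used in the list above.
enum class auth_error { none, unauthenticated, token_expired, session_invalid };

// Minimal stand-in for already-verified claims. The audience field and the
// value "domain" are assumptions made for this sketch.
struct checked_claims {
    std::int64_t exp = 0;                  // expiry, seconds since epoch
    std::string audience;                  // e.g. "domain" for full tokens
    std::optional<std::string> session_id; // absent on malformed tokens
};

// Applies the post-signature checks in order: audience, expiry, session.
auth_error classify(const checked_claims& c, std::int64_t now,
                    bool session_exists) {
    // An interim token carries a different audience and is rejected here.
    if (c.audience != "domain") return auth_error::unauthenticated;
    if (now >= c.exp) return auth_error::token_expired;
    if (!c.session_id) return auth_error::unauthenticated;
    if (!session_exists) return auth_error::session_invalid;
    return auth_error::none;
}
```

Only a result of auth_error::none would allow the request to proceed to the database; every other value is returned to the client without touching it.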
Testing Strategy
Unit tests (ores.security)
- RS256 sign/verify round-trip passes.
- Tampered payload fails verification.
- Expired token rejected.
- Missing claims handled gracefully.
- Unknown kid triggers JWKS refetch.
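The refetch-once behaviour exercised by the last test could be structured roughly as in the sketch below. The key_store class is hypothetical, keys are represented as opaque strings, and the refetch callback stands in for a request to ores.iam.v1.auth.jwks; none of these names come from the codebase.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical key store mapping kid to an opaque public key string.
class key_store {
public:
    using key_map = std::map<std::string, std::string>;

    // Performs the initial fetch on construction (the startup fetch).
    explicit key_store(std::function<key_map()> refetch)
        : refetch_(std::move(refetch)), keys_(refetch_()) {}

    // Looks up a key by kid. On a miss, refetches the key set exactly once
    // and retries; returns std::nullopt if the kid is still unknown.
    std::optional<std::string> find(const std::string& kid) {
        if (auto it = keys_.find(kid); it != keys_.end()) return it->second;
        keys_ = refetch_();
        if (auto it = keys_.find(kid); it != keys_.end()) return it->second;
        return std::nullopt;
    }

private:
    std::function<key_map()> refetch_;
    key_map keys_;
};
```

Injecting the fetch as a callback keeps the unit test free of any NATS dependency: the test can count how many times the callback ran and assert that an unknown kid causes exactly one extra fetch.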
Unit tests (ores.nats)
- request_context with valid JWT → correct database::context constructed.
- request_context with expired token → throws token_expired.
- request_context with missing Bearer header → throws unauthenticated.
- visible_party_ids cache hit vs miss behaviour.
Integration tests (ores.iam.service)
- Phase 1 login with single party → auto-selects, no picker.
- Phase 1 login with multiple parties → party list returned.
- Phase 2 select_party with valid interim token + valid party → full JWT issued.
- Phase 2 select_party with invalid party (user not a member) → rejected.
- JWKS endpoint returns valid public key matching private key.
End-to-end (manual, local)
- Login as admin@system → auto-selects system party → domain requests succeed.
- Login as multi-party user → party picker shown → select party → domain requests scoped correctly.
- Logout → session invalidated → subsequent requests with old token rejected.
Open Questions
None at this time.