Story: Fix rfl complexity failure (Windows + macOS CI)

Table of Contents

This page documents a story in Sprint 19. It captures the goal, current status, acceptance criteria, and the tasks that compose it.

Goal

Restore Windows and macOS CI by permanently eliminating rfl::AddTagsToVariants from the Qt client's instrument parse path. The Sprint 18 triple-isolation workaround (TU split + two-phase parse + -fbracket-depth=1024) moved the 502-field rfl::Literal into a dedicated TU but did not remove it; MSVC C1202 and Clang's fold-expression nesting limit both still fire. The correct fix is to replace variant-based rfl parsing with a type-dispatched read that uses trade.classification.product_type and trade.classification.trade_type — the same discriminators the server already uses — to read each concrete leaf instrument struct directly, with no variant and no AddTagsToVariants.

The story also corrects naming (get_trade_detailget_trade_instrument) and removes the accumulated hacks from Sprint 18. A follow-on task refactors IInstrumentForm to remove the variant from the Qt UI boundary as well, since the variant serves no purpose there.

Status

Field Value
State DONE
Parent sprint Sprint 19
Now Nothing.
Waiting on PR #1041 merge.
Next Nothing.
Last touched 2026-06-05

Acceptance

  • Windows CI: no C1202 recursive type error anywhere in ores.qt.api.
  • macOS CI: no fold-expression nesting limit error in rfl/internal/StringLiteral.hpp.
  • Linux CI: unaffected (still green).
  • ClientManagerTradeInstrument.cpp contains no rfl::AddTagsToVariants include or use.
  • get_trade_detail_response_base and get_trade_detail_instrument_wrapper are deleted.
  • Round-trip tests pass for FX, rates/swap, equity, bond, credit, commodity, composite, scripted.
  • IInstrumentForm::populate_from_instrument(const trade_instrument&) is replaced by type-specific populate overloads on each concrete form.
  • Logging: every parse branch logs at debug on entry and on success; deserialise errors log at error with the raw JSON fragment and the error message.

Tasks

Task State Start End Description
Scaffold: document architecture and create task stubs DONE 2026-05-29 2026-05-29 Update story decisions, create task stubs; PR for Gemini architecture review.
Rename get_trade_detail → get_trade_instrument and rewrite parse path DONE 2026-05-29 2026-05-30 Rename protocol types, delete workarounds, rewrite client TU with dispatch.
Write round-trip tests for instrument parse dispatch DONE 2026-05-30 2026-05-31 One test per instrument family; serialize server-side, verify client dispatch.
Refactor IInstrumentForm to type-specific populate methods DONE 2026-05-30 2026-05-31 Remove trade_instrument variant from the Qt UI boundary; typed populate per form.
Delete leftover get_trade_detail code and fix remaining AddTagsToVariants use DONE 2026-05-31 2026-05-31 Delete ClientManagerTradeDetail.cpp; remove getTradeDetail API; restore macOS/Windows CI.
Fix missing Windows DLL export macros on service parser classes DONE 2026-06-01 2026-06-05 Add ORES_*_SERVICE_EXPORT to 16 service parser.hpp files; restore Windows CI linker.
Fix ClientManager AddTagsToVariants and archiver DLL export DONE 2026-06-01 2026-06-03 Remove AddTagsToVariants from ClientManager reads; add ORES_COMPUTE_WRAPPER_EXPORT to archiver.
Investigate and fix ClientManager template C1202 for complex response types DONE 2026-06-02 2026-06-03 process_authenticated_request<T> injects rfl in caller's TU; fix by concrete method for export_portfolio.
Fix C1202 in ClientManagerExportPortfolio.cpp (swap_leg 19-field Literal) DONE 2026-06-03 2026-06-05 swap_leg's 19-field rfl::Literal triggers no_duplicate_field_names C1202; rfl::Flatten applied but proven ineffective — rfl still sees 19 fields.
Replace rfl::Flatten with plain nested structs in swap_leg to fix C1202 DONE 2026-06-04 2026-06-05 rfl::Flatten does not reduce Literal size; use plain nested sub-structs so each is reflected independently (≤9 fields). Wire format changes to nested JSON.
Extract shared instrument_identity and instrument_audit sub-structs BACKLOG     Prerequisite for all instrument decomposition tasks. Gates tasks below.
Decompose rates instrument types into nested structs (9 types, 15–20 fields) DONE 2026-06-04 2026-06-05 fra, vanilla_swap, cap_floor, swaption, bgs, callable, knock_out, inflation, rpa. Blocks CI.
Decompose FX instrument types into nested structs (7 types, 18–23 fields) BACKLOG     fx_forward, fx_vanilla_option, fx_barrier_option, fx_digital_option, fx_asian_forward, fx_accumulator, fx_variance_swap.
Decompose equity instrument types into nested structs (9 types, 17–25 fields) DONE   2026-06-05 equity_option, equity_forward, equity_digital_option, equity_asian_option, equity_barrier_option, equity_accumulator, equity_variance_swap, equity_swap, equity_position.
Decompose bond, credit, and commodity instrument types (31–38 fields) BACKLOG     Very large structs needing internal sub-grouping beyond identity/audit split.

Decisions

Root cause

rfl::AddTagsToVariants on trade_instrument (a std::variant of 9 alternatives, two of which are themselves variants — fx_instrument_variant (7 types), equity_instrument_variant (9 types) — and one of which wraps rates_instrument_variant (9 types)) collects the union of all field names across ~30 leaf structs into a single rfl::Literal<f1, …, f502>. rfl::internal::no_duplicate_field_names then performs an O(N²) constexpr string-equality check over all 502 names:

  • MSVC: C1202 — recursive type or function dependency too complex (hard limit, no flag).
  • Clang: fold-expression nesting exceeds -fbracket-depth (set to 1024; the Literal itself saturates it with 502 constructor args).
  • Linux GCC: unaffected; more permissive depth limits.

Discriminator already present

trade.classification.product_type (family: fx, swap, bond, …) and trade.classification.trade_type (leaf: "FxForward", "VanillaSwap", "EquityOption", …) are already read in Phase 1. The server uses the same (product_type, trade_type) pair in populate_instruments_for_trades to bucket instrument IDs. After Phase 1 the client knows the exact leaf type; AddTagsToVariants is not needed.

Protocol rename

get_trade_detail_* renamed to get_trade_instrument_*. Rationale: the existing get_trades_response already carries full trade metadata. This request adds the instrument — not "more trade detail". The old name implied a richer trade view; the new name states the actual purpose. NATS subject: trading.v1.trades.detailtrading.v1.trades.instrument. No backwards-compatibility concern (internal protocol).

Workaround types deleted

get_trade_detail_response_base and get_trade_detail_instrument_wrapper were Sprint 18 parse-implementation artefacts bolted onto the protocol header. They are not protocol concepts and are deleted. Phase 1 uses a file-local struct inside ClientManagerTradeInstrument.cpp.

Variant types retained

trade_instrument, fx_instrument_variant, equity_instrument_variant, and rates_instrument_variant are NOT deleted. They are used throughout ores.qt.trading as C++ polymorphic holders (IInstrumentForm virtual interface, all form subclasses, ImportTradeDialog, OreImporter) and in ores.ore.core mappers and ores.ore.service importer. Those uses are correct C++ std::variant use with no rfl involvement, and the types remain as domain types.

Variant use: import/export vs. display

trade_instrument and the sub-variants are correct for import and export: bulk processing of heterogeneous instruments where the type is genuinely unknown at the processing site. They remain for that use.

For the display path, the type is known from Phase 1 (product_type + trade_type) before the instrument is even read. Packaging the result into a trade_instrument variant just to immediately unpack it is unnecessary indirection. The display path is redesigned to eliminate the variant entirely:

  • getTradeInstrument takes an IInstrumentFormPopulator& and returns std::optional<domain::trade>. It never constructs a trade_instrument value.
  • IInstrumentFormPopulator is a pure-abstract interface with one populate overload per concrete leaf instrument type (~28 overloads; default no-ops).
  • Each form implements its relevant overload(s). Forms never receive a variant.
  • TradeController creates the right form via the factory, then passes it as the populator to getTradeInstrument. The call site never dispatches on the instrument type.

Tracked in tasks 2 and 4.

Logging requirement

All parse branches must log at the correct level:

  • debug on entry to each phase and on successful parse, including the resolved (product_type, trade_type) pair.
  • warn for unknown (product_type, trade_type) combinations that fall through the dispatch (monostate result).
  • error on any deserialise failure, including the raw JSON fragment and the full rfl error message.

The template instantiation leak pattern (root cause of repeated C1202)

This is the underlying design issue that drove the recurring C1202 cascade. Understanding it is essential for preventing future occurrences.

What the pattern is

ClientManager.hpp defines process_authenticated_request<T> as a function template:

template <nats_request RequestType>
auto process_authenticated_request(RequestType request, ...)
    -> std::expected<typename RequestType::response_type, std::string> {
    using ResponseType = typename RequestType::response_type;
    // ...
    auto result = rfl::json::read<ResponseType>(raw);  // <-- instantiated in CALLER'S TU
    // ...
}

Because the body is in the header, every .cpp that calls this template compiles rfl::json::read<ResponseType> into its own translation unit. MSVC tracks a "template dependency graph" per TU: all rfl::StringLiteral<N> field-name types from every rfl::json::read<X> instantiation accumulate in that graph.

When it triggers C1202

C1202 fires when two conditions coincide:

  1. The caller TU already has a saturated dependency graph. A file compiled late in the build (e.g. BookMdiWindow.cpp at step 4651/4936) includes many headers. Each header's templates — not just rfl — add nodes to MSVC's dependency graph. By the time compilation reaches the process_authenticated_request call, the graph is already large.
  2. The ResponseType contains trade_instrument (or another type whose rfl reflection chain reaches swap_leg or another struct with 15+ fields). Adding even 19 new StringLiteral<N> nodes (one per swap_leg field) is enough to push the graph over MSVC's internal limit.

Neither condition alone is sufficient. The pattern is invisible on Linux (GCC has more permissive limits) and only surfaces on MSVC for callers that are large enough.

Proactive audit: how to find at-risk call sites

The rule: a call site is at risk if RequestType::response_type contains trade_instrument (directly or via trade_export_item) and the calling file is large enough to saturate MSVC's dependency graph.

Step 1 — find all response types containing trade_instrument or trade_export_item:

grep -rn "trade_instrument\|trade_export_item" \
    projects --include="*.hpp" | \
    grep "struct\|vector\|instrument " | \
    grep -v "build\|vcpkg\|//\|domain/"

Step 2 — from those types, identify which are response_type aliases of a request type that is passed to process_authenticated_request:

grep -rn "process_authenticated_request\|process_request" \
    projects --include="*.cpp" | grep -v "build\|vcpkg\|test"

Then cross-reference the inferred request type at each call site with the request type's response_type.

Current audit result (2026-06-02):

Call site Request type response_type Contains trade_instrument? Risk
BookMdiWindow.cpp:476 export_portfolio_request export_portfolio_responsetrade_export_itemtrade_instrument Yes HIGH — fix in task 8
All other call sites various (save, get, delete, list) simple domain structs No Safe

Only export_portfolio_request is at risk. get_trade_instrument_request also carries trade_instrument in its response_type, but it is handled by parse_trade_instrument() directly and is never passed through process_authenticated_request.

The fix rule

For any process_authenticated_request<T> call site where T::response_type contains trade_instrument: replace it with a concrete (non-template) method on ClientManager, implemented in a dedicated .cpp file. This isolates the rfl::json::read<ResponseType> instantiation to a fresh TU where MSVC's dependency graph starts clean.

Pattern used in this story: parse_swap_instruments.cpp, ClientManagerExportPortfolio.cpp.

Architectural analysis: two distinct failure modes

Two problems being conflated

Through sprint 17, 18, and 19 these errors have been treated as one escalating problem. Error 7 makes clear they are two separate failure modes with different fixes.

Problem A — struct field count exceeds MSVC's threshold.

rfl::internal::no_duplicate_field_names builds an rfl::Literal<f1,…,fN> for every struct it reflects and performs an O(N²) constexpr string-equality check. With N ≥ ~15 this check exceeds MSVC's internal template dependency graph budget regardless of TU size or context. Error 7 proves it: ClientManagerExportPortfolio.cpp fires at step 1145/4938 with exactly two includes. TU isolation does not help when the struct itself is above the threshold.

Problem B — rfl in header templates leaks to caller TUs.

process_authenticated_request<T> is a function template defined in the header. Every caller TU instantiates rfl::json::read<ResponseType> in its own translation unit. In large, late-compiled files (e.g. BookMdiWindow.cpp at step 4651/4936) the already- saturated template dependency graph cannot absorb the additional rfl::StringLiteral<N> nodes from swap_leg's 19 fields. The "concrete method in dedicated TU" fix addresses this.

Why fixes keep exposing the next layer

The seven-incident cascade is not seven separate bugs. It is one structural problem (struct fields exceed MSVC's threshold) repeatedly re-encountered after each partial fix removes the immediately visible symptom:

Fix applied Problem cleared Problem exposed
Remove AddTagsToVariants 502-field Literal Individual structs ≥15 fields
TU split / fresh TU Caller TU saturation Struct threshold in isolation
Concrete method in dedicated TU Template leak (B) Problem A in isolation

The domain model's structural risk

Every persisted domain struct carries five audit fields: modified_by, performed_by, change_reason_code, change_commentary, recorded_at. These are invisible in domain terms but consume five slots in the rfl Literal. A struct that appears to have 10 domain fields is 15 to rfl. Any domain struct across ores.refdata, ores.assets, ores.trading, ores.iam, ores.dq, etc. that has ≥11 non-audit domain fields is silently above the MSVC threshold. There is no compile-time guard or lint rule to catch this; discovery is always reactive (CI failure).

The proactive audit in the "template instantiation leak" section above was a Problem B audit — it checked which process_authenticated_request response types contain trade_instrument. A Problem A audit would check which domain structs have ≥~15 fields, irrespective of how they are called. That audit has not yet been done for the full domain model.

The systematic fix

Extract the five audit fields into a shared sub-struct and compose it into every domain struct via rfl::Flatten:

struct audit_info {
    std::string modified_by;
    std::string performed_by;
    std::string change_reason_code;
    std::string change_commentary;
    std::chrono::system_clock::time_point recorded_at;
};

struct swap_leg {
    // up to ~13 domain fields
    rfl::Flatten<audit_info> audit;   // 5 fields; checked independently by rfl
};

rfl::Flatten projects sub-struct fields directly into the parent in JSON (no nesting), so the on-wire format is unchanged. no_duplicate_field_names checks each sub-struct independently: O(5²) = 25 comparisons for the audit group instead of O(N²) for the full struct. The safe domain field budget per struct is then ~13 non-audit fields, which comfortably covers all current domain types.

This systematic extraction is not yet scheduled as a story; task 9 applies it locally to swap_leg. A cross-cutting story would audit all persisted domain structs and apply the same pattern project-wide, permanently closing the structural risk.

Linear trace of C1202 errors and fixes

This story produced a sequence of progressively deeper C1202 fixes, each exposing the next layer. The full sequence:

# Error location Fields / cause Fix PR
1 ClientManagerTradeDetail.cpp trade_instrument variant 502-field rfl::Literal via AddTagsToVariants Delete file; rewrite as two-phase dispatch (tasks 2–5) multiple
2 ClientManager.hpp (process_authenticated_request) export_portfolio_response → 502-field AddTagsToVariants Literal on Clang/MSVC Remove AddTagsToVariants from all 4 json::read call sites PR #978
3 ores.compute.wrapper/filesystem/archiver.hpp Missing ORES_COMPUTE_WRAPPER_EXPORT on Windows Clang Add export macro PR #978
4 parse_trade_instrument.cpp Combined swaption_instrument (18) + swap_leg (19) = 29-field Literal: two types in one rfl::json::read Split each leg-based read into two passes (instrument + legs separately) PR #982
5 parse_trade_instrument.cpp Accumulated TU context: flat/FX/equity types fill MSVC's graph before swap types start; swaption_instr_outer (19 fields) pushes it over Move all 9 swap types to parse_swap_instruments.cpp (fresh TU) PR #985
6 BookMdiWindow.cpp process_authenticated_request<export_portfolio_request> is a template → instantiates rfl::json::read<export_portfolio_response> in BookMdiWindow's already-saturated context; swap_leg (19 fields) triggers C1202 Concrete ClientManager::exportPortfolio() in a dedicated TU task 8
7 ClientManagerExportPortfolio.cpp swap_leg has 19 fields → rfl::Literal<f1,…,f19> → 171 constexpr string comparisons in no_duplicate_field_names.hpp exceed MSVC's template graph budget even in a clean TU (step 1145/4938) Decompose swap_leg into three sub-structs (≤9 fields each) via rfl::Flatten task 9

See also

Emacs 29.1 (Org mode 9.6.6)