Story: Compute grid observability

Goal
Status
Acceptance
Tasks
Decisions
Out of scope
See also

This page documents a story in Sprint 15. It captures the goal, current status, acceptance criteria, and the tasks that compose it.

Goal

Make the compute grid observable: a TimescaleDB telemetry pipeline feeding a unified dashboard with RAG (red/amber/green) service health.

Status

Field	Value
State	DONE
Parent sprint	Sprint 15
Now	Completed 2026-03-21.
Waiting on	None.
Next	None.
Last touched	2026-03-21

Continued in: Service lifecycle controller (Sprint 16). Sprint 15 added the RAG service dashboard; Sprint 16 adds the controller component that manages service definitions, instances, and lifecycle events.

Acceptance

Two TimescaleDB hypertables (grid samples and node samples) with 30-day retention.
Server-side poller and node-side reporter, with node samples published over NATS.
Unified get_grid_stats NATS endpoint for the dashboard.
Service heartbeat publisher and RAG dashboard live across all domain services.
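As an illustration of the unified get_grid_stats criterion, the sketch below bundles the latest node samples and a few derived totals into one reply object. It is a minimal sketch under stated assumptions: node_sample, grid_stats_reply, build_grid_stats, and the cpu_load and online fields are hypothetical names, since this page does not list the real sample columns or the reply schema.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sample shape; the real hypertable columns are not listed on this page.
struct node_sample {
    std::string node_id;
    double cpu_load;  // assumed metric in the range 0.0 to 1.0
    bool online;
};

// Hypothetical reply: everything the dashboard needs in one message.
struct grid_stats_reply {
    std::size_t nodes_total = 0;
    std::size_t nodes_online = 0;
    double mean_cpu_load = 0.0;  // averaged over online nodes only
    std::vector<node_sample> nodes;
};

// Build one reply from the latest sample of each node, so the dashboard
// issues a single request instead of several separate queries.
inline grid_stats_reply build_grid_stats(std::vector<node_sample> latest) {
    grid_stats_reply reply;
    reply.nodes_total = latest.size();
    double load_sum = 0.0;
    for (const auto& n : latest) {
        assert(n.cpu_load >= 0.0);  // precondition of this sketch
        if (n.online) {
            ++reply.nodes_online;
            load_sum += n.cpu_load;
        }
    }
    if (reply.nodes_online > 0) {
        reply.mean_cpu_load = load_sum / static_cast<double>(reply.nodes_online);
    }
    reply.nodes = std::move(latest);
    return reply;
}
```

In the real service the reply would presumably be serialized and returned over NATS request/reply; the transport is left out here so the aggregation step stands on its own.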

Tasks

Task	State	Start	End	Description
Add compute grid telemetry pipeline	DONE	2026-05-20	2026-03-21	Two new TimescaleDB hypertables (ores_compute_grid_samples_tbl and ores_compute_node_samples_tbl, 30-day retention); server-side compute_grid_poller (async Boost.Asio coroutine, polling every 30 s); node-side node_stats_reporter publishing node_sample_message to NATS; unified get_grid_stats NATS request/reply for the dashboard.
Add service dashboard with live RAG status	DONE	2026-05-20	2026-03-21	TimescaleDB hypertable for service heartbeat samples; reusable header-only heartbeat publisher coroutine over NATS; Qt UI dashboard with RAG (red/amber/green) status indicators; integrated into all domain services.
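One plausible way to derive the RAG indicator described in the dashboard task is to classify each service by the age of its most recent heartbeat. The sketch below assumes that approach; rag_status, rag_thresholds, classify_heartbeat, and the 60 s and 120 s thresholds are illustrative choices and are not taken from the codebase.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical status type for the dashboard indicator.
enum class rag_status { green, amber, red };

// Assumed thresholds: amber after 60 s without a heartbeat, red after 120 s.
struct rag_thresholds {
    std::chrono::seconds amber_after{60};
    std::chrono::seconds red_after{120};
};

// Classify a service by how long ago its last heartbeat arrived.
inline rag_status classify_heartbeat(std::chrono::seconds age,
                                     rag_thresholds t = rag_thresholds{}) {
    assert(t.amber_after <= t.red_after);  // thresholds must be ordered
    if (age >= t.red_after) return rag_status::red;
    if (age >= t.amber_after) return rag_status::amber;
    return rag_status::green;
}
```

Keeping the thresholds in a small struct lets a slow-reporting service be given looser limits without changing the classification logic.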

Decisions

Single unified get_grid_stats endpoint: replaces N ad-hoc dashboard queries with one NATS request/reply call.
Header-only heartbeat publisher coroutine: reusable across every service.
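To show why a header-only publisher is easy to reuse, the sketch below templates the publisher on the transport call, so a service only has to include the header and pass in its own publish function. This is a simplified stand-in, not the actual coroutine: heartbeat_message, heartbeat_publisher, make_heartbeat_publisher, tick, and the "heartbeat." subject prefix are all hypothetical, and the timer loop and NATS client are omitted.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <utility>

// Hypothetical payload; the real heartbeat message is not shown on this page.
struct heartbeat_message {
    std::string service;
    std::chrono::system_clock::time_point sent_at;
};

// Templating on the publish callable keeps the class header-only and
// transport-agnostic: a NATS publish call in a service, a lambda in a test.
template <typename Publish>
class heartbeat_publisher {
public:
    heartbeat_publisher(std::string service, Publish publish)
        : service_(std::move(service)), publish_(std::move(publish)) {
        assert(!service_.empty());  // every heartbeat needs a service name
    }

    // Emit one heartbeat. A real publisher would call this from a timer loop.
    void tick(std::chrono::system_clock::time_point now) {
        publish_("heartbeat." + service_, heartbeat_message{service_, now});
    }

private:
    std::string service_;
    Publish publish_;
};

// Helper so callers do not have to spell out the callable's type.
template <typename Publish>
heartbeat_publisher<Publish> make_heartbeat_publisher(std::string service,
                                                      Publish publish) {
    return heartbeat_publisher<Publish>(std::move(service), std::move(publish));
}
```

A service would construct one publisher with its own name and transport, then drive tick from whatever scheduling mechanism it already uses.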

Out of scope

Alerting and paging: deferred to future work.