Story: Compute grid observability

This page documents a story in Sprint 15. It captures the goal, current status, acceptance criteria, and the tasks that compose it.

Goal

Make the compute grid observable: a TimescaleDB telemetry pipeline feeding a unified dashboard that shows RAG (red/amber/green) service health.

Status

| Field         | Value                 |
|---------------+-----------------------|
| State         | DONE                  |
| Parent sprint | Sprint 15             |
| Now           | Completed 2026-03-21. |
| Waiting on    | None.                 |
| Next          | None.                 |
| Last touched  | 2026-03-21            |

Continued in: Service lifecycle controller (Sprint 16). Sprint 15 added the RAG service dashboard; Sprint 16 adds the controller component that manages service definitions, instances, and lifecycle events.

Acceptance

  • Two TimescaleDB hypertables (grid samples + node samples), 30-day retention.
  • Server-side poller + node-side reporter via NATS.
  • get_grid_stats unified NATS endpoint for the dashboard.
  • Service heartbeat publisher + RAG dashboard live across all domain services.
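The RAG status in the last criterion can be derived from the age of a service's most recent heartbeat. A minimal sketch, assuming hypothetical names (rag_status, rag_thresholds, classify) and illustrative thresholds; the story does not state the actual heartbeat interval or cut-offs:

```cpp
#include <chrono>

// Hypothetical status type for the dashboard indicators.
enum class rag_status { green, amber, red };

// Assumed thresholds, chosen for illustration only; the real
// values are not given in this story.
struct rag_thresholds {
    std::chrono::seconds amber_after{60};
    std::chrono::seconds red_after{120};
};

// Maps the age of the most recent heartbeat to a RAG status.
// An age at or past a threshold takes the more severe status.
inline rag_status classify(std::chrono::seconds heartbeat_age,
                           rag_thresholds thresholds = {}) {
    if (heartbeat_age >= thresholds.red_after) {
        return rag_status::red;
    }
    if (heartbeat_age >= thresholds.amber_after) {
        return rag_status::amber;
    }
    return rag_status::green;
}
```

Keeping the thresholds in a struct lets each service override them without changing the classification logic.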

Tasks

| Task | State | Start | End | Description |
|------+-------+-------+-----+-------------|
| Add compute grid telemetry pipeline | DONE | 2026-05-20 | 2026-03-21 | Two new TimescaleDB hypertables (ores_compute_grid_samples_tbl + ores_compute_node_samples_tbl, 30-day retention); server-side compute_grid_poller (async Boost.ASIO coroutine, 30s); node-side node_stats_reporter publishing node_sample_message to NATS; unified get_grid_stats NATS request/reply for dashboard. |
| Add service dashboard with live RAG status | DONE | 2026-05-20 | 2026-03-21 | TimescaleDB hypertable for service heartbeat samples; reusable header-only heartbeat publisher coroutine over NATS; Qt UI dashboard with RAG (red/amber/green) status indicators; integrated into all domain services. |
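To illustrate what a unified get_grid_stats reply might carry, the sketch below folds the latest per-node samples into one summary. The struct names and fields (node_sample, grid_stats, cpu_percent, running_tasks) and the summarize function are assumptions made for this example; the actual node_sample_message layout is not described in this story.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical shape of one node-side sample; field names are assumed.
struct node_sample {
    std::string node_id;
    double cpu_percent;
    std::size_t running_tasks;
};

// Hypothetical summary returned to the dashboard in a single reply.
struct grid_stats {
    std::size_t node_count = 0;
    std::size_t running_tasks = 0;
    double mean_cpu_percent = 0.0;
};

// Aggregates the most recent sample of each node into one summary,
// so the dashboard needs one request instead of one per metric.
inline grid_stats summarize(const std::vector<node_sample>& latest) {
    grid_stats out;
    out.node_count = latest.size();
    double cpu_sum = 0.0;
    for (const auto& sample : latest) {
        out.running_tasks += sample.running_tasks;
        cpu_sum += sample.cpu_percent;
    }
    // Guard against division by zero when no node has reported yet.
    if (!latest.empty()) {
        out.mean_cpu_percent = cpu_sum / static_cast<double>(latest.size());
    }
    return out;
}
```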

Decisions

  • Single unified get_grid_stats endpoint: replaces N ad-hoc queries with one efficient call.
  • Header-only heartbeat publisher coroutine: reusable across every service.
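The reuse argument for a header-only publisher can be sketched as follows. This is not the project's implementation: the real publisher is a coroutine over NATS, whereas this stdlib-only sketch swaps in an injected callable for the transport and leaves the timer loop out. The class name, subject prefix, and payload format are all assumptions.

```cpp
#include <string>
#include <utility>

// Header-only sketch: the transport is injected as a callable, so a
// service only includes this header and supplies its own publish step.
// Subject prefix and payload format are illustrative assumptions.
template <typename Publish>
class heartbeat_publisher {
public:
    heartbeat_publisher(std::string service_name, Publish publish)
        : service_name_(std::move(service_name)),
          publish_(std::move(publish)) {}

    // Emits one heartbeat. A coroutine-based version would call this
    // from a timer loop instead of being driven by the caller.
    void beat() {
        ++sequence_;
        publish_("heartbeat." + service_name_,
                 "seq=" + std::to_string(sequence_));
    }

    unsigned long sequence() const { return sequence_; }

private:
    std::string service_name_;
    Publish publish_;
    unsigned long sequence_ = 0;
};

// Deduces the callable type so call sites stay short.
template <typename Publish>
heartbeat_publisher<Publish> make_heartbeat_publisher(std::string service_name,
                                                      Publish publish) {
    return heartbeat_publisher<Publish>(std::move(service_name),
                                        std::move(publish));
}
```

A service would construct the publisher with its own name and a callable that forwards the subject and payload to its messaging client.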

Out of scope

  • Alerting and paging: deferred to future work.
