Story: Compute grid observability
This page documents a story in Sprint 15. It captures the goal, current status, acceptance criteria, and the tasks that compose it.
Goal
Make the compute grid observable: a TimescaleDB telemetry pipeline feeding a unified dashboard that shows RAG (red/amber/green) service health.
Status
| Field | Value |
|---|---|
| State | DONE |
| Parent sprint | Sprint 15 |
| Now | Completed 2026-03-21. |
| Waiting on | None. |
| Next | None. |
| Last touched | 2026-03-21 |
Continued in: Service lifecycle controller (Sprint 16). Sprint 15 added the RAG service dashboard; Sprint 16 adds the controller component that manages service definitions, instances, and lifecycle events.
Acceptance
- Two TimescaleDB hypertables (grid samples + node samples), 30-day retention.
- Server-side poller + node-side reporter via NATS.
- get_grid_stats unified NATS endpoint for the dashboard.
- Service heartbeat publisher + RAG dashboard live across all domain services.
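As an illustration of the unified endpoint in the list above, the following is a minimal sketch of how a single reply could bundle the latest grid sample with the per-node samples. All type names, field names, and the `build_grid_stats` helper are hypothetical; the story does not list the actual hypertable columns or the reply layout, and the NATS transport is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sample shapes; the real hypertable columns are not
// given in this story.
struct grid_sample {
    int running_jobs;
    int queued_jobs;
};

struct node_sample {
    std::string node_id;
    double cpu_load;  // assumed to be a fraction in [0, 1]
};

// One reply carrying everything the dashboard needs, so the client
// issues a single request instead of one query per panel.
struct grid_stats_reply {
    grid_sample grid;
    std::vector<node_sample> nodes;
    double mean_cpu_load;
};

// Assemble the reply from the most recent samples.
inline grid_stats_reply build_grid_stats(const grid_sample& grid,
                                         const std::vector<node_sample>& nodes) {
    grid_stats_reply reply{grid, nodes, 0.0};
    double total = 0.0;
    for (const auto& n : nodes) {
        assert(n.cpu_load >= 0.0 && n.cpu_load <= 1.0);
        total += n.cpu_load;
    }
    if (!nodes.empty()) {
        reply.mean_cpu_load = total / static_cast<double>(nodes.size());
    }
    return reply;
}
```

In a real service the handler for the request/reply subject would serialise a structure like this and send it back; the serialisation format is not specified here.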
Tasks
| Task | State | Start | End | Description |
|---|---|---|---|---|
| Add compute grid telemetry pipeline | DONE | 2026-05-20 | 2026-03-21 | Two new TimescaleDB hypertables (ores_compute_grid_samples_tbl + ores_compute_node_samples_tbl, 30-day retention); server-side compute_grid_poller (async Boost.ASIO coroutine, 30s); node-side node_stats_reporter publishing node_sample_message to NATS; unified get_grid_stats NATS request/reply for dashboard. |
| Add service dashboard with live RAG status | DONE | 2026-05-20 | 2026-03-21 | TimescaleDB hypertable for service heartbeat samples; reusable header-only heartbeat publisher coroutine over NATS; Qt UI dashboard with RAG (red/amber/green) status indicators; integrated into all domain services. |
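One plausible way to derive the RAG indicator from heartbeat samples is to classify each service by the age of its most recent heartbeat. The sketch below assumes that approach; the `rag_status` enum, the `rag_thresholds` struct, and the 60 s / 120 s defaults are illustrative assumptions, not values taken from the implementation.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical status levels for the dashboard indicator.
enum class rag_status { green, amber, red };

// Hypothetical thresholds; the real values are not stated in this story.
struct rag_thresholds {
    std::chrono::seconds amber_after{60};
    std::chrono::seconds red_after{120};
};

// Classify a service by the age of its most recent heartbeat:
// fresh heartbeats are green, stale ones amber, very stale ones red.
inline rag_status classify_heartbeat(std::chrono::seconds age,
                                     const rag_thresholds& t = {}) {
    assert(t.amber_after <= t.red_after);
    if (age >= t.red_after) {
        return rag_status::red;
    }
    if (age >= t.amber_after) {
        return rag_status::amber;
    }
    return rag_status::green;
}
```

Keeping the thresholds in a small struct lets each service, or the dashboard itself, tune them without changing the classification logic.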
Decisions
- Single unified get_grid_stats endpoint: replaces N ad-hoc queries with one efficient call.
- Header-only heartbeat publisher coroutine: reusable across every service.
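To show why a header-only design makes the publisher easy to reuse, here is a simplified sketch. It is not the actual publisher: the coroutine and timer loop are omitted, the NATS client is replaced by an injected callable, and the class name, the `make_heartbeat_publisher` helper, and the `services.heartbeat.` subject prefix are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Everything lives in the header as a template, so a service only has
// to include it and supply its own publish function.
template <typename Publish>
class heartbeat_publisher {
public:
    heartbeat_publisher(std::string service_name, Publish publish)
        : subject_("services.heartbeat." + service_name),  // hypothetical subject
          publish_(std::move(publish)) {
        assert(!service_name.empty());
    }

    // Publish one heartbeat. The real publisher would invoke this from
    // a timer-driven coroutine; here the caller triggers it manually.
    void beat() {
        ++sequence_;
        publish_(subject_, std::to_string(sequence_));
    }

    std::uint64_t sequence() const { return sequence_; }

private:
    std::string subject_;
    Publish publish_;
    std::uint64_t sequence_ = 0;
};

// Helper so callers can pass a lambda without naming its type.
template <typename Publish>
heartbeat_publisher<Publish> make_heartbeat_publisher(std::string service_name,
                                                      Publish publish) {
    return heartbeat_publisher<Publish>(std::move(service_name),
                                        std::move(publish));
}
```

Injecting the publish callable keeps the sketch free of any messaging dependency, which also makes it straightforward to exercise in a unit test with a lambda that records what was sent.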
Out of scope
- Alerting and paging: deferred to future work.
See also
- Compute grid implementation — the system being observed.