Story: Document deduplication recipe
This page documents a story in Sprint 18. It captures the goal, current status, acceptance criteria, and the tasks that compose it.
Goal
As the doc graph grows, duplicate documents appear — two files that cover the same ground under slightly different names. Leaving both alive creates confusion about which is authoritative and breaks the principle that every id-link resolves to exactly one canonical source.
This story delivers a recipe that generalises the deduplication procedure. Given two documents: nominate the survivor (the richer, more complete, better-linked one) and the deletion candidate; merge any unique content from the deletion candidate into the survivor; delete the deletion candidate; and update every id-link that pointed to the deleted UUID.
The worked example that drives the recipe is a pair of near-identical sprint health review runbooks discovered in doc/llm/runbooks/.
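The last step of the procedure, updating id-links, is the most mechanical one, so it may help to see it sketched. The Python below is an illustration only, not the recipe's actual tooling: the function names retarget_id_links and retarget_tree are hypothetical, and it assumes links use the standard Org form [[id:UUID]] or [[id:UUID][description]] and that all affected files end in .org.

```python
import re
from pathlib import Path


def retarget_id_links(text: str, deleted_id: str, survivor_id: str) -> tuple[str, int]:
    """Rewrite id-links that target deleted_id so they target survivor_id.

    Returns the updated text and the number of links rewritten.
    """
    # The lookahead requires the ID to end right here, so a longer ID that
    # merely starts with deleted_id is left untouched.
    pattern = re.compile(
        r"\[\[id:" + re.escape(deleted_id) + r"(?=\]|::)", re.IGNORECASE
    )
    return pattern.subn(lambda m: "[[id:" + survivor_id, text)


def retarget_tree(root: Path, deleted_id: str, survivor_id: str) -> dict[str, int]:
    """Apply retarget_id_links to every .org file under root.

    Returns a mapping of relative path to number of links rewritten,
    listing only the files that actually changed.
    """
    changed = {}
    for path in sorted(root.rglob("*.org")):
        original = path.read_text(encoding="utf-8")
        updated, count = retarget_id_links(original, deleted_id, survivor_id)
        if count:
            path.write_text(updated, encoding="utf-8")
            changed[path.relative_to(root).as_posix()] = count
    return changed
```

Returning the per-file counts gives a record of which documents were touched, which can be reviewed before committing the change.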
Status
| Field | Value |
|---|---|
| State | DONE |
| Parent sprint | Sprint 18 |
| Now | Nothing. |
| Waiting on | — |
| Next | — |
| Last touched | 2026-05-29 |
Acceptance
- [ ] doc/recipes/documentation/how_do_i_deduplicate_two_documents.org exists, #+version: 2, scaffolded via generate_v2_doc.sh.
- [ ] Recipe covers all four steps: nominate survivor/deletion, merge unique content, delete the deletion candidate, update all id-links.
- [ ] Recipe is wired into the Documentation recipes index.
- [ ] The duplicate run_a_sprint_health_review runbook is deleted; its unique chart-regeneration content is merged into run_sprint_health_review.
- [ ] doc/llm/runbooks/runbooks.org catalogue row updated to point to the survivor UUID (30FE3C0F).
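The id-link criteria above can be spot-checked by scanning for links that still point at the deleted document. The following is a minimal sketch under stated assumptions, not part of the recipe itself: find_leftover_links is a hypothetical helper, it assumes Org-style [[id:UUID]] links in .org files, and the caller must supply the full UUID of the deleted document.

```python
import re
from pathlib import Path


def find_leftover_links(root: Path, deleted_id: str) -> list[tuple[str, int]]:
    """Return (relative path, line number) for each line under root that
    still contains an id-link to deleted_id."""
    # Match the whole ID only: it must be followed by "]" or "::".
    pattern = re.compile(
        r"\[\[id:" + re.escape(deleted_id) + r"(?=\]|::)", re.IGNORECASE
    )
    hits = []
    for path in sorted(root.rglob("*.org")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for line_no, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits.append((path.relative_to(root).as_posix(), line_no))
    return hits
```

An empty result suggests that no id-link still resolves to the deleted UUID; any hit names the file and line that still needs updating.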
Tasks
| Task | State | Start | End | Description |
Decisions
Out of scope
- Automated duplicate detection — finding duplicates is manual; this recipe only governs what to do once two have been identified.
- Deduplication of non-org files (source code, scripts).