Story: Document deduplication recipe

This page documents a story in Sprint 18. It captures the goal, current status, acceptance criteria, and the tasks that compose it.

Goal

As the doc graph grows, duplicate documents appear — two files that cover the same ground under slightly different names. Leaving both alive creates confusion about which is authoritative and breaks the principle that every id-link resolves to exactly one canonical source.
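The invariant in the last sentence can be checked mechanically. The sketch below is illustrative only and is not part of the story: it assumes each document declares its identifier on an `:ID:` property line and that links use Org's `[[id:UUID]]` or `[[id:UUID][description]]` form. The function name `unresolved_links` and the in-memory `docs` mapping are hypothetical.

```python
import re
from collections import Counter

# Assumed shape of an Org property line that declares a document's UUID.
ID_DEF = re.compile(r"^[ \t]*:ID:[ \t]+(\S+)[ \t]*$", re.MULTILINE)
# Org id-links: [[id:UUID]] or [[id:UUID][description]].
ID_LINK = re.compile(r"\[\[id:([^\]]+)\]")


def unresolved_links(docs: dict[str, str]) -> dict[str, int]:
    """Map each linked UUID to its definition count when that count is not 1.

    `docs` maps a file name to its text. A count of 0 is a dangling link;
    a count above 1 means two documents claim the same UUID.
    """
    defs = Counter(u for text in docs.values() for u in ID_DEF.findall(text))
    linked = {u for text in docs.values() for u in ID_LINK.findall(text)}
    return {u: defs[u] for u in sorted(linked) if defs[u] != 1}
```

An empty result means every id-link in the given documents resolves to exactly one source.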

This story delivers a recipe that generalises the deduplication procedure. Given two documents:

  • Identify the candidate for survival (richer, more complete, better linked) and the candidate for deletion.
  • Merge any unique content from the deletion candidate into the survivor.
  • Delete the deletion candidate.
  • Update all id-links that pointed to the deleted UUID.
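The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the recipe itself: the `richness` score (link count, then word count) is a hypothetical stand-in for "richer, more complete, better linked", paragraphs are assumed to be separated by blank lines, and every function name here is invented for the example. Merging remains an editorial decision, so the sketch only lists the paragraphs the survivor lacks.

```python
import re
from pathlib import Path

# Org id-links: [[id:UUID]] or [[id:UUID][description]].
ID_LINK = re.compile(r"\[\[id:([^\]]+)\]")


def richness(text: str) -> tuple[int, int]:
    """Hypothetical score: outgoing id-links first, then word count."""
    return (len(ID_LINK.findall(text)), len(text.split()))


def pick_survivor(a: str, b: str) -> tuple[str, str]:
    """Return (survivor, deletion_candidate); a tie keeps the first argument."""
    return (a, b) if richness(a) >= richness(b) else (b, a)


def unique_paragraphs(survivor: str, doomed: str) -> list[str]:
    """Paragraphs of the deletion candidate that the survivor lacks."""
    def norm(paragraph: str) -> str:
        # Compare paragraphs with whitespace collapsed.
        return " ".join(paragraph.split())

    seen = {norm(p) for p in survivor.split("\n\n")}
    return [p.strip() for p in doomed.split("\n\n")
            if norm(p) and norm(p) not in seen]


def retarget_links(text: str, old_uuid: str, new_uuid: str) -> str:
    """Point every id-link at old_uuid to new_uuid, keeping descriptions."""
    # Matching up to the first "]" avoids touching UUIDs that merely
    # start with old_uuid.
    return text.replace(f"[[id:{old_uuid}]", f"[[id:{new_uuid}]")


def retarget_tree(root: Path, old_uuid: str, new_uuid: str) -> list[Path]:
    """Rewrite links in every .org file under root; return the files changed."""
    changed = []
    for path in sorted(root.rglob("*.org")):
        text = path.read_text(encoding="utf-8")
        updated = retarget_links(text, old_uuid, new_uuid)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

In this sketch the deletion step is left to the operator (for example `Path.unlink()` on the deletion candidate once its unique paragraphs have been merged by hand), and `retarget_tree` is run last with the deleted document's UUID as `old_uuid` and the survivor's as `new_uuid`.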

The worked example that drives the recipe is a pair of near-identical sprint health review runbooks discovered in doc/llm/runbooks/.

Status

Field          Value
State          DONE
Parent sprint  Sprint 18
Now            Nothing.
Waiting on
Next
Last touched   2026-05-29

Acceptance

Tasks

Task State Start End Description

Decisions

Out of scope

  • Automated duplicate detection — finding duplicates is manual; this recipe only governs what to do once two have been identified.
  • Deduplication of non-org files (source code, scripts).
