Story: Accumulate a tagged face dataset for profile pictures

This page documents a story in Sprint 21. It captures the goal, current status, acceptance criteria, and the tasks that compose it.

Goal

Move the 60 already-downloaded AI-generated face images from build/tmp/faces/ into a properly structured external/facestudio/ dataset directory that follows the external/flags/ pattern, and document the FaceStudio licence (permissive for internal use). Fix the accumulation script so that it filters out ages outside 20–70, handles HTTP 429 gracefully, supports multiple images per combination, and writes to the canonical external/facestudio/faces/ location.

Status

| Field         | Value                            |
|---------------+----------------------------------|
| State         | STARTED                          |
| Parent sprint | Sprint 21                        |
| Now           | Accumulation runs (June 11–13).  |
| Waiting on    | Nothing.                         |
| Next          | Close story after June 13 run.   |
| Last touched  | 2026-06-10                       |

Acceptance

  • external/facestudio/faces/ holds the downloaded images.
  • manifest.json and methodology.txt follow the external/flags/ pattern.
  • methodology.txt documents the 60-requests-per-day limit and the licence.
  • The script filters out ages below 20 and above 70.
  • The script generates N images per combination with indexed filenames.
  • The script handles HTTP 429 by stopping cleanly.
  • The script skips already-downloaded files on restart.
  • The script's output path is external/facestudio/faces/.
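
The script-related criteria above (age filter, indexed filenames, skip on restart) can be illustrated with a minimal sketch. It is not the actual accumulate_faces.py: the (gender, age) shape of a "combination", the filename pattern, and the function names are all assumptions made for illustration only.

```python
from pathlib import Path

# Inclusive age range taken from the acceptance criteria.
MIN_AGE, MAX_AGE = 20, 70


def in_age_range(age: int) -> bool:
    """Return True when the age falls inside the accepted 20-70 range."""
    return MIN_AGE <= age <= MAX_AGE


def indexed_name(gender: str, age: int, index: int) -> str:
    """Build an indexed filename (hypothetical naming scheme)."""
    return f"{gender}_{age:02d}_{index:02d}.jpg"


def pending_files(combinations, n_per_combination, out_dir):
    """List the target paths that still need to be downloaded.

    Out-of-range ages are dropped, each remaining combination gets
    n_per_combination indexed files, and files already on disk are skipped
    so that a restarted run only fetches what is missing.
    """
    out_dir = Path(out_dir)
    pending = []
    for gender, age in combinations:
        if not in_age_range(age):
            continue
        for index in range(1, n_per_combination + 1):
            target = out_dir / indexed_name(gender, age, index)
            if not target.exists():
                pending.append(target)
    return pending
```

Calling pending_files(combos, 3, "external/facestudio/faces") before each run would give the list of files still to fetch, which also makes the restart behaviour easy to check by hand.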

Tasks

| Task | State | Start | End | Description |
|------+-------+-------+-----+-------------|
| Scaffold story: Accumulate a tagged face dataset for profile pictures | DONE | 2026-06-10 | 2026-06-10 | Story scaffolding rides this task: documents, sprint wiring, and the scaffold PR. Close it before merging that PR. |
| Create external/facestudio/ dataset structure and move assets | DONE | 2026-06-10 | 2026-06-10 | Create the folder, move 60 images from build/tmp/faces/, add accumulation script, write manifest.json and methodology.txt with licence notes. |
| Fix accumulate_faces.py: age range, 429 handling, multiple images per combination | DONE | 2026-06-10 | 2026-06-10 | Fix three bugs in the face accumulation script (filter out ages outside 20–70, handle HTTP 429 by stopping cleanly, support N images per combination with indexed filenames) and update the output path to external/facestudio/faces/. |
| Rename existing face files to indexed spec and remove out-of-range images | DONE | 2026-06-10 | 2026-06-10 | Rename the 60 existing files in external/facestudio/faces/ to the indexed naming scheme and delete any files with age below 20 or above 70. |
| Run face accumulation script each remaining sprint day (June 11, 12, 13) | BACKLOG | | | Run accumulate_faces.py on each of the 3 remaining sprint days to download up to 60 faces per day, building toward the 462-image target. |
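
The "handle HTTP 429 by stopping cleanly" fix listed in the tasks above could look roughly like the sketch below. It assumes the script uses urllib from the standard library, which this page does not state. The fetch and save callables and the run_downloads name are hypothetical stand-ins, injected so the loop can be exercised without touching the network.

```python
from urllib.error import HTTPError


def run_downloads(pending, fetch, save):
    """Download each pending target until done or until HTTP 429 is returned.

    fetch(target) is expected to return the image bytes or raise HTTPError;
    save(target, data) persists them. Both are hypothetical callables.
    Returns the number of files saved in this run.
    """
    done = 0
    for target in pending:
        try:
            data = fetch(target)
        except HTTPError as err:
            if err.code == 429:
                # Daily quota exhausted: stop without a traceback so the
                # next run can resume from the files already on disk.
                print(f"Rate limit reached after {done} downloads; stopping.")
                break
            raise  # any other HTTP error is still treated as a failure
        save(target, data)
        done += 1
    return done
```

Because the loop exits with break instead of raising, files saved before the limit was hit stay in place, and a later run only needs to fetch the remainder.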

Decisions

  • Use N=3 images per combination as the default — gives variety without multiplying the pending count 3× (238 → 714 total at 60/day ≈ 12 days).
  • Age range 20–70 inclusive: under-20s are not plausible account holders in a trading system; over-70s add diminishing value.
  • FaceStudio licence (checked 2026-06-10): permissive for internal use. No attribution required. Competitive reuse prohibited (Clause 8) — irrelevant here.
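
The "≈ 12 days" figure in the first decision follows from dividing the image total by the 60-per-day limit. The sketch below reproduces that arithmetic. The days_needed helper is hypothetical, and the totals passed to it are simply the numbers quoted on this page (714 and the 462-image target), with no claim about how the two relate.

```python
import math

DAILY_LIMIT = 60  # requests per day, as documented in methodology.txt


def days_needed(total_images, daily_limit=DAILY_LIMIT):
    """Number of daily runs needed to download total_images at the limit."""
    return math.ceil(max(total_images, 0) / daily_limit)


print(days_needed(714))  # 714 / 60 = 11.9, rounded up to 12
print(days_needed(462))  # 462 / 60 = 7.7, rounded up to 8
```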

Out of scope

  • Wiring the catalog into the seeder (separate story).
  • Automated regeneration / nightly accumulation job.