Story: Accumulate a tagged face dataset for profile pictures
This page documents a story in Sprint 21. It captures the goal, current status, acceptance criteria, and the tasks that compose it.
Goal
Move the 60 already-downloaded AI-generated face images from build/tmp/faces/ into a properly structured external/facestudio/ dataset directory that follows the external/flags/ pattern, and document the FaceStudio licence (permissive for internal use). Fix the accumulation script so that it filters out ages outside 20–70, handles HTTP 429 gracefully, supports multiple images per combination, and writes to the canonical external/facestudio/faces/ location.
Status
| Field | Value |
|---|---|
| State | STARTED |
| Parent sprint | Sprint 21 |
| Now | Accumulation runs (June 11–13). |
| Waiting on | Nothing. |
| Next | Close story after June 13 run. |
| Last touched | 2026-06-10 |
Acceptance
- external/facestudio/faces/ holds the downloaded images.
- manifest.json and methodology.txt follow the external/flags/ pattern.
- Methodology documents the 60-req/day limit and licence.
- Script filters out ages less than 20 and greater than 70.
- Script generates N images per combination with indexed filenames.
- Script handles HTTP 429 by stopping cleanly.
- Script skips already-downloaded files on restart.
- Script output path is external/facestudio/faces/.
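The script-related criteria above (age filter, indexed filenames, skip on restart) can be illustrated with a minimal sketch. This is not the actual accumulate_faces.py: the filename scheme `label_age_index.jpg`, the `(label, age)` shape of a combination, and the function names are all assumptions made for illustration.

```python
from pathlib import Path

# Inclusive age range and output path taken from the acceptance criteria.
MIN_AGE, MAX_AGE = 20, 70
OUTPUT_DIR = Path("external/facestudio/faces")


def in_age_range(age: int) -> bool:
    """True when the age falls inside the inclusive 20-70 range."""
    return MIN_AGE <= age <= MAX_AGE


def indexed_name(label: str, age: int, index: int) -> str:
    """Build an indexed filename (hypothetical scheme, not the real one)."""
    return f"{label}_{age}_{index:02d}.jpg"


def pending_files(combos, n_per_combo, output_dir=OUTPUT_DIR):
    """List target paths that still need downloading.

    combos is an iterable of (label, age) pairs (assumed shape).
    Out-of-range ages are dropped; files already on disk are skipped,
    so a restarted run resumes where the previous one stopped.
    """
    pending = []
    for label, age in combos:
        if not in_age_range(age):
            continue
        for index in range(1, n_per_combo + 1):
            target = Path(output_dir) / indexed_name(label, age, index)
            if not target.exists():
                pending.append(target)
    return pending
```

Computing the pending list from the filesystem, rather than from a separate progress file, keeps restarts simple: whatever is already in external/facestudio/faces/ is treated as done.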
Tasks
| Task | State | Start | End | Description |
|---|---|---|---|---|
| Scaffold story: Accumulate a tagged face dataset for profile pictures | DONE | 2026-06-10 | 2026-06-10 | Story scaffolding rides this task: documents, sprint wiring, and the scaffold PR. Close it before merging that PR. |
| Create external/facestudio/ dataset structure and move assets | DONE | 2026-06-10 | 2026-06-10 | Create the folder, move 60 images from build/tmp/faces/, add accumulation script, write manifest.json and methodology.txt with licence notes. |
| Fix accumulate_faces.py: age range, 429 handling, multiple images per combination | DONE | 2026-06-10 | 2026-06-10 | Fix three bugs in the face accumulation script (filter out ages outside 20–70, handle HTTP 429 by stopping cleanly, support N images per combination with indexed filenames) and update the output path to external/facestudio/faces/. |
| Rename existing face files to indexed spec and remove out-of-range images | DONE | 2026-06-10 | 2026-06-10 | Rename the 60 existing files in external/facestudio/faces/ to the indexed naming scheme and delete any files with age below 20 or above 70. |
| Run face accumulation script each remaining sprint day (June 11, 12, 13) | BACKLOG | | | Run accumulate_faces.py on each of the 3 remaining sprint days to download up to 60 faces per day, building toward the 462-image target. |
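A daily run that stops cleanly on HTTP 429 and resumes on the next day might look like the sketch below. It uses only the standard library (`urllib`); the job shape `(url, target_path)`, the `RateLimited` exception, and the function names are hypothetical and do not come from the real script.

```python
import urllib.error
import urllib.request
from pathlib import Path


class RateLimited(Exception):
    """Raised when the server answers HTTP 429 (Too Many Requests)."""


def fetch_image(url: str) -> bytes:
    """Download one image, translating HTTP 429 into RateLimited."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 429:
            raise RateLimited(url) from err
        raise  # any other HTTP error is left for the caller to see


def run_daily_batch(jobs, fetch=fetch_image, daily_limit=60):
    """Process (url, target_path) jobs until done, limited, or throttled.

    Returns the number of files written during this run.
    """
    saved = 0
    for url, target in jobs:
        target = Path(target)
        if target.exists():
            continue  # already downloaded on an earlier run
        if saved >= daily_limit:
            break  # stay within the 60-request daily allowance
        try:
            data = fetch(url)
        except RateLimited:
            print(f"HTTP 429 after {saved} downloads; stopping for today.")
            break
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        saved += 1
    return saved
```

Passing `fetch` as a parameter is a design choice for the sketch: it lets the stop-and-resume behaviour be exercised with a fake downloader, without touching the network or the daily quota.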
See also
- Accumulate a tagged face dataset for profile pictures — source capture in the inbox.
- Users should be able to add picture to profile — UI consumer of this catalog.
Decisions
- Use N=3 images per combination as the default: it gives variety without multiplying the pending count 3× (238 → 714 total, which at 60/day is ≈ 12 days).
- Age range 20–70 inclusive: under-20s are not plausible account holders in a trading system; over-70s add diminishing value.
- FaceStudio licence (checked 2026-06-10): permissive for internal use. No attribution required. Competitive reuse prohibited (Clause 8) — irrelevant here.
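The "≈ 12 days" figure in the N=3 decision can be checked with a short calculation. The sketch assumes that 238 is the pending image count at one image per combination and that the 60-per-day limit is used in full on every run; both are readings of the decision text rather than confirmed facts.

```python
import math


def days_to_complete(pending_images: int, per_day: int = 60) -> int:
    """Whole days needed to download pending_images at per_day each day."""
    return math.ceil(pending_images / per_day)


pending_at_n1 = 238          # pending count quoted in the decision (assumed N=1)
images_per_combination = 3   # the N=3 default
total_images = pending_at_n1 * images_per_combination

print(total_images, days_to_complete(total_images))  # 714 12
```

Rounding up matters here: 714 / 60 is 11.9, so the final day is a partial run rather than a full 60-image batch.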
Out of scope
- Wiring the catalog into the seeder (separate story).
- Automated regeneration / nightly accumulation job.