Story: GLEIF data integration

Goal
Status
Acceptance
Tasks
Decisions
Out of scope
See also

This page documents a story in Sprint 12. It captures the goal, current status, acceptance criteria, and the tasks that compose it.

Goal

Turn the GLEIF CSV subsets into proper datasets; expand anchor coverage with central banks; add LEI-to-BIC mapping.

Status

Field	Value
State	DONE
Parent sprint	Sprint 12
Now	Completed 2026-02-13.
Waiting on	None.
Next	None.
Last touched	2026-02-13

Continued from: Party schemes and FPML reference data (Sprint 10). That story shipped the GLEIF download and split script; this one turns the resulting CSVs into registered datasets with populate scripts.

Acceptance

GLEIF artefact tables + 4 datasets registered via codegen.
Python + shell pipeline automates SQL populate generation.
Central-bank LEIs included in the subset and sampled at 3× priority.
LEI-to-BIC mapping dataset published.
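
As a rough illustration of the populate-generation step, the sketch below turns LEI-to-BIC rows into a re-runnable PL/pgSQL block. It is not the story's actual script: the table name lei_bic_mapping, the LEI and BIC column headers, and the unique constraint on (lei, bic) are all assumptions made for the example, and the sample identifiers are made up.

```python
import csv
import io


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def generate_populate_sql(csv_text: str, table: str = "lei_bic_mapping") -> str:
    """Build a re-runnable PL/pgSQL populate block from LEI,BIC rows.

    Assumes a header row with LEI and BIC columns and a unique
    constraint on (lei, bic) in the target table (both hypothetical).
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    lines = ["DO $$", "BEGIN"]
    for row in rows:
        lei, bic = row["LEI"].strip(), row["BIC"].strip()
        # An LEI has 20 characters (ISO 17442); a BIC has 8 or 11 (ISO 9362).
        if len(lei) != 20 or len(bic) not in (8, 11):
            continue  # skip malformed rows rather than fail the whole run
        lines.append(
            f"    INSERT INTO {table} (lei, bic) "
            f"VALUES ({sql_literal(lei)}, {sql_literal(bic)}) "
            "ON CONFLICT (lei, bic) DO NOTHING;"
        )
    lines += ["END;", "$$;"]
    return "\n".join(lines)


# Made-up sample rows; the third one is malformed on purpose.
SAMPLE = (
    "LEI,BIC\n"
    "00000000000000000001,AAAABBCC\n"
    "00000000000000000002,DDDDEEFFXXX\n"
    "BADROW,ZZ\n"
)

if __name__ == "__main__":
    print(generate_populate_sql(SAMPLE))
```

Because every statement uses ON CONFLICT ... DO NOTHING, running the generated block twice leaves the table unchanged, which is one way to regenerate without truncating.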

Tasks

Task	State	Start	End	Description
Add GLEIF data to datasets	DONE	2026-05-19	2026-02-09	Codegen for lei_entities + lei_relationships artefact tables; Python lei_generate_metadata_sql.py + shell wrapper; GLEIF catalog + methodology + 4 datasets; methodology.txt updated.
Add central-bank-related LEIs	DONE	2026-05-19	2026-02-11	~93 GLEIF-verified LEIs (33 sovereign issuers, 51 central banks, 9 supranationals); CENTRAL_BANK sector keyword detection with multilingual support; 3x financial-priority sampling.
Add LEI-to-BIC dataset	DONE	2026-05-19	2026-02-13	Mapping dataset for BIC settlements; CSV in external/, dataset + populate scripts + publisher; download from GLEIF mapping site.
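
The 3x financial-priority sampling can be pictured as weighted sampling without replacement. The sketch below is one possible reading, assuming "3x" means a 3:1 selection weight; the priority flag, the function name, and the choice of the Efraimidis-Spirakis method are illustrative and not taken from the story's scripts.

```python
import random


def weighted_sample(entities, k, priority_weight=3.0, seed=0):
    """Pick k distinct LEIs, giving priority entities a 3:1 weight.

    Uses Efraimidis-Spirakis keys: each item gets u ** (1 / weight) with
    u uniform in [0, 1), and the k largest keys are selected.
    """
    rng = random.Random(seed)  # seeded so that a run is reproducible
    keyed = []
    for entity in entities:
        weight = priority_weight if entity["priority"] else 1.0
        keyed.append((rng.random() ** (1.0 / weight), entity["lei"]))
    keyed.sort(reverse=True)
    return [lei for _, lei in keyed[:k]]


if __name__ == "__main__":
    # Made-up pool: 10 priority entities and 10 ordinary ones.
    pool = [{"lei": f"P{i:02d}", "priority": True} for i in range(10)]
    pool += [{"lei": f"N{i:02d}", "priority": False} for i in range(10)]
    print(weighted_sample(pool, 5, seed=1))
```

A fixed seed keeps the generated subset stable between runs, which matters if the populate scripts are regenerated from the same snapshot.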

Decisions

Idempotent PL/pgSQL populate: regenerate without truncate.
Multilingual sector keyword detection: CENTRAL_BANK must out-rank generic BANK across languages.
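
The ranking rule in the second decision can be sketched as an ordered keyword table in which the more specific sector is checked first. The keyword lists, the OTHER fallback, and the function names below are hypothetical; the real detector may use different terms and languages.

```python
import unicodedata

# Hypothetical keyword lists. Order matters: CENTRAL_BANK is checked before
# BANK so that "Banco Central ..." is never classified as a generic bank.
SECTOR_KEYWORDS = [
    ("CENTRAL_BANK", [
        "central bank",     # English
        "banque centrale",  # French
        "banco central",    # Spanish, Portuguese
        "banca centrale",   # Italian
        "zentralbank",      # German
    ]),
    ("BANK", ["bank", "banque", "banco", "banca"]),
]


def normalize(name: str) -> str:
    """Case-fold and strip diacritics so matching is accent-insensitive."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def detect_sector(legal_name: str) -> str:
    """Return the first sector whose keywords appear in the legal name."""
    text = normalize(legal_name)
    for sector, keywords in SECTOR_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return sector
    return "OTHER"


if __name__ == "__main__":
    for name in ("Banco Central do Brasil", "Deutsche Bank AG", "Acme Widgets Ltd"):
        print(name, "->", detect_sector(name))
```

Checking sectors in priority order keeps the rule simple: a name that contains both a central-bank phrase and a generic bank word resolves to CENTRAL_BANK without any scoring.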

Out of scope

Real-time GLEIF sync: a daily snapshot is sufficient.