Story: GLEIF data integration

This page documents a story in Sprint 12. It captures the goal, current status, acceptance criteria, and the tasks that compose it.

Goal

Turn the GLEIF CSV subsets into proper datasets; expand anchor coverage with central banks; add LEI-to-BIC mapping.

Status

| Field         | Value                 |
|---------------+-----------------------|
| State         | DONE                  |
| Parent sprint | Sprint 12             |
| Now           | Completed 2026-02-13. |
| Waiting on    | None.                 |
| Next          | None.                 |
| Last touched  | 2026-02-13            |

Continued from: Party schemes and FPML reference data (sprint 10) — that story shipped the GLEIF download + split script; this one turns the resulting CSVs into proper datasets + populate scripts.

Acceptance

  • GLEIF artefact tables + 4 datasets registered via codegen.
  • Python + shell pipeline automates SQL populate generation.
  • Central-bank LEIs included in the subset, with 3× sampling priority.
  • LEI-to-BIC mapping dataset published.
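
The populate-generation criterion above could be sketched roughly as follows. This is an illustration only, not the actual lei_generate_metadata_sql.py: the function names, the table name `lei_entities_artefact`, the column names, and the sample row are all hypothetical. The story describes the real populate scripts as PL/pgSQL; this sketch swaps in plain `INSERT ... ON CONFLICT` upserts, which is one common way to make a PostgreSQL populate safe to re-run without truncating.

```python
import csv
import io
from typing import List


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def generate_upserts(csv_text: str, table: str, key: str) -> List[str]:
    """Turn CSV rows into INSERT ... ON CONFLICT statements keyed on `key`.

    Table and column names are assumed to be trusted identifiers;
    only the values are escaped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    statements = []
    for row in reader:
        columns = list(row.keys())
        values = ", ".join(sql_literal(row[c]) for c in columns)
        updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
        # With no non-key columns there is nothing to update on conflict.
        action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
        statements.append(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values}) "
            f"ON CONFLICT ({key}) {action};"
        )
    return statements


# Hypothetical sample row; the LEI is a placeholder, not a real identifier.
SAMPLE_CSV = "lei,legal_name\nTESTLEI0000000000001,Banque d'Exemple\n"

for stmt in generate_upserts(SAMPLE_CSV, "lei_entities_artefact", "lei"):
    print(stmt)
```

Running the generated statements twice leaves the table in the same state as running them once, which is the property the idempotent-populate decision asks for.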

Tasks

| Task                          | State | Start      | End        | Description |
|-------------------------------+-------+------------+------------+-------------|
| Add GLEIF data to datasets    | DONE  | 2026-05-19 | 2026-02-09 | Codegen for lei_entities + lei_relationships artefact tables; Python lei_generate_metadata_sql.py + shell wrapper; GLEIF catalog + methodology + 4 datasets; methodology.txt updated. |
| Add central-bank-related LEIs | DONE  | 2026-05-19 | 2026-02-11 | ~93 GLEIF-verified LEIs (33 sovereign issuers, 51 central banks, 9 supranationals); CENTRAL_BANK sector keyword detection with multilingual support; 3× financial-priority sampling. |
| Add LEI-to-BIC dataset        | DONE  | 2026-05-19 | 2026-02-13 | Mapping dataset for BIC settlements; CSV in external/, dataset + populate scripts + publisher; download from GLEIF mapping site. |
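
The "3× financial-priority sampling" in the second task could work along these lines. This is a sketch under stated assumptions, not the project's implementation: it reads "3×" as "priority rows carry three times the sampling weight of other rows", and it uses Efraimidis-Spirakis weighted sampling without replacement, which is a choice made here for illustration. The names `weighted_sample` and `PRIORITY_WEIGHT`, the `sector` field, and the placeholder rows are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Assumed meaning of "3x": priority rows get three times the weight of others.
PRIORITY_WEIGHT = 3.0


def weighted_sample(
    rows: Sequence[T],
    k: int,
    is_priority: Callable[[T], bool],
    seed: Optional[int] = None,
) -> List[T]:
    """Draw up to k rows without replacement, favouring priority rows."""
    rng = random.Random(seed)
    keyed = []
    for row in rows:
        weight = PRIORITY_WEIGHT if is_priority(row) else 1.0
        # Efraimidis-Spirakis key: u ** (1 / w); larger weights push the key toward 1.
        keyed.append((rng.random() ** (1.0 / weight), row))
    # Keep the k rows with the largest keys.
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in keyed[:k]]


# Placeholder population: one row in ten is tagged as a central bank.
rows = [
    {"lei": f"PLACEHOLDER{i:04d}", "sector": "CENTRAL_BANK" if i % 10 == 0 else "OTHER"}
    for i in range(1000)
]
sample = weighted_sample(
    rows, k=100, is_priority=lambda r: r["sector"] == "CENTRAL_BANK", seed=12
)
print(len(sample), sum(r["sector"] == "CENTRAL_BANK" for r in sample))
```

Passing a fixed `seed` keeps the subset reproducible between runs, which matters if the generated populate scripts are checked in.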

Decisions

Idempotent PL/pgSQL populate
  Populate scripts can be re-run to regenerate the data without truncating the tables first.
Multilingual sector keyword detection
  CENTRAL_BANK must out-rank the generic BANK keyword in every supported language.
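
One way to get the ranking described in the second decision is to check the more specific sector first. The sketch below is a minimal illustration: the keyword lists, the `SECTOR_KEYWORDS` table, and `detect_sector` are hypothetical and do not reflect the project's actual keywords or supported languages.

```python
from typing import Optional

# Hypothetical keyword table, ordered from most specific to least specific.
# Because CENTRAL_BANK is checked first, it wins over the generic BANK match.
SECTOR_KEYWORDS = [
    ("CENTRAL_BANK", ("central bank", "banque centrale", "banco central",
                      "zentralbank", "banca centrale")),
    ("BANK", ("bank", "banque", "banco", "banca")),
]


def detect_sector(legal_name: str) -> Optional[str]:
    """Return the first sector whose keyword occurs in the legal name."""
    name = legal_name.casefold()  # case-insensitive comparison
    for sector, keywords in SECTOR_KEYWORDS:
        # Plain substring matching: crude, but enough to show the ordering.
        if any(keyword in name for keyword in keywords):
            return sector
    return None


print(detect_sector("Banque Centrale du Luxembourg"))  # CENTRAL_BANK
print(detect_sector("Banco Central do Brasil"))        # CENTRAL_BANK
print(detect_sector("Example Bank plc"))               # BANK
print(detect_sector("Acme Logistics"))                 # None
```

Substring matching will also fire on words that merely contain a keyword, so a real implementation would likely need word-boundary handling and a reviewed keyword list per language.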

Out of scope

  • Real-time GLEIF sync — daily snapshot is sufficient.
