External data issues
This page is a capture in the next bucket of the product backlog — a pre-sprint idea, not yet pulled into a sprint as a story.
Problems observed:
- missing downloaded_at for a lot of data.
- spurious manifest.txt, we should only have manifest.json.
- duplication of data in catalog and main manifest. The manifest is the catalog. Remove duplication.
- for GitHub downloads, add the git commit.
- not clear who "owner" is. It has to map to an account or group in the system.
- datasets have a catalog, but they should be forced to use the catalog of the manifest:
"catalog": "Open Source Risk Engine",
- need a domain for data such as XML Schemas.
- we should ensure the methodology is one of the defined methodologies in the file.
- since datasets refer to data in subdirectories, we should add the directory to the manifest. Not needed in DB.
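
Several of the problems above could be caught by a small check run over manifest.json. The following is only a rough sketch under assumptions: apart from downloaded_at and catalog, every key name here (owner, methodologies, datasets, name, methodology, directory, source, git_commit) is hypothetical and would need to be aligned with the real manifest layout, and check_manifest is an illustrative helper, not existing code.

```python
from typing import Any


def check_manifest(manifest: dict[str, Any], known_owners: set[str]) -> list[str]:
    """Return human-readable problems found in a manifest dict.

    All key names other than "downloaded_at" and "catalog" are assumptions.
    """
    problems: list[str] = []
    catalog = manifest.get("catalog")
    # Assumed: methodologies are declared at the top level, each with a "name".
    methodologies = {m.get("name") for m in manifest.get("methodologies", [])}

    # The owner has to map to an account or group in the system.
    owner = manifest.get("owner")
    if owner not in known_owners:
        problems.append(f"owner {owner!r} is not a known account or group")

    for ds in manifest.get("datasets", []):
        name = ds.get("name", "<unnamed>")
        if not ds.get("downloaded_at"):
            problems.append(f"{name}: missing downloaded_at")
        # A dataset may not carry a catalog that differs from the manifest's.
        if "catalog" in ds and ds["catalog"] != catalog:
            problems.append(f"{name}: catalog differs from manifest catalog")
        if ds.get("methodology") not in methodologies:
            problems.append(f"{name}: methodology not defined in the file")
        if not ds.get("directory"):
            problems.append(f"{name}: missing directory")
        # Assumed: a "source" field marks GitHub downloads.
        if ds.get("source") == "github" and not ds.get("git_commit"):
            problems.append(f"{name}: GitHub download without git_commit")
    return problems


# Made-up manifest used only to exercise the checks.
manifest = {
    "catalog": "Open Source Risk Engine",
    "owner": "data-team",
    "methodologies": [{"name": "manual download"}],
    "datasets": [
        {
            "name": "example",
            "catalog": "Other",
            "methodology": "manual download",
            "directory": "example",
            "source": "github",
        },
    ],
}
problems = check_manifest(manifest, known_owners={"data-team"})
for problem in problems:
    print(problem)
```

In practice the dict would come from json.load on manifest.json, and the set of known owners from the accounts and groups defined in the system.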