add metadata and coco to asset store generation #1707
Merged
BryonLewis added a commit that referenced this pull request Jun 22, 2026
* Round-trip datasetInfo through KWCOCO export and import

  Carry the dataset's datasetInfo metadata through the KWCOCO/coco_json path on both the Girder server and the Electron desktop client, mirroring the existing VIAME CSV passthrough. datasetInfo travels under a single `datasetInfo` key in the COCO `info` block, advertised in `info.dive_extensions`, and is omitted entirely when empty so exports stay byte-unchanged for datasets without it.

  Export: export_dive_as_coco / coco.ts serializeFile write info.datasetInfo when non-empty; the server export endpoint reads it from folder metadata and the desktop export already loads it onto JsonMeta.

  Import: load_coco_as_tracks_and_attributes returns info.datasetInfo (4th tuple element, like the VIAME loader's fps) and process_items merges it per-key into the folder's datasetInfo (imported values win, existing-only keys preserved, other metadata untouched); coco.ts parseFile surfaces it so the desktop import plumbing merges it onto the dataset JsonMeta. Values are treated as opaque strings.

  Tests: extend the server kwcoco tests and desktop coco.spec.ts for export-writes, empty-omits (byte-unchanged), and import-restores on both platforms.

* Extract and test datasetInfo import merge; type the export param

* Unify single-dataset and multicam COCO export paths

  Route the single-dataset coco_json export through _coco_json_export_text instead of duplicating the track-filtering, image_filenames, and export_dive_as_coco logic inline. This fixes the multicam export dropping datasetInfo: the merge now lives in one place feeding both paths.

* Round-trip datasetInfo through VIAME CSV; snake_case the serialized keys

  Serialized keys: rename the per-dataset station metadata key to snake_case to match each export format's conventions -- `dive_dataset_info` in the COCO `info` block (alongside `dive_notes`, `dive_detection_attributes`) and `dataset_info` on the VIAME CSV `# metadata` line. Internal DIVE meta/model/client keys stay camelCase (`datasetInfo`); the serializers translate at the boundary.

  VIAME CSV import: add a symmetric `# metadata` parser (mirror of writeHeader) that restores datasetInfo on both the server and desktop paths, so VIAME CSV round-trips it like COCO. This also fixes fps import -- read `fps:` case-insensitively; the importer only matched a capitalized `Fps:`, though DIVE and native VIAME both write lowercase `fps:`. load_csv_as_tracks_and_attributes now returns datasetInfo as a 5th tuple element.

  Import merge semantics: datasetInfo now follows the import "Overwrite" checkbox on both COCO and VIAME, mirroring annotations -- Overwrite (default) replaces the block, additive merges per-key (imported values win). A file that carries no datasetInfo never touches existing metadata, in either mode.

  Tests and docs updated to match; flip the prior "ignored on parse" VIAME test to assert restore and add an fps case-insensitivity regression.

* Fix desktop Overwrite datasetInfo replace and a broken fps test assertion

  Desktop import persists meta via lodash `merge`, which deep-merges datasetInfo and keeps stale keys, so the "Overwrite" path never actually replaced the block the way the server (and the docs) describe. Assign the imported block wholesale in dataFileImport when not additive; the additive path keeps its per-key pre-merge. Correct the now-inaccurate comment that credited saveMetadata.

  Drop a stray `assert warnings == []` that the new fps case-insensitivity test carried over from a sibling test -- `warnings` is unbound there, so the test raised NameError instead of running.

* Simplify datasetInfo import: centralize merge, drop redundant guards

  - common.ts: assign datasetInfo explicitly in dataFileImport for both the Overwrite and additive cases, removing the second merge block (and its redundant metadata re-read) from _ingestFilePath. Mirrors the server.
  - coco.ts: drop the redundant truthiness clause; the typeof+isEmpty guard already covers it.
  - crud_rpc.py: return the meta dict directly instead of collapsing an empty dict back to None (the sole caller treats both as falsy).

* Centralize datasetInfo import resolution; share serializer type aliases

  - Replace merge_imported_dataset_info with resolve_imported_dataset_info, which owns the full overwrite/additive/absent decision as a pure function so the import call site drops to two lines.
  - Add Attributes, Warnings, and DatasetInfo aliases to dive_utils.types and point the KWCOCO and VIAME serializers, the CocoMetadata/JsonMeta models, and the resolver at them.
  - Expand the resolver tests to cover the overwrite-replaces and absent-block branches the helper now owns.

* Harden VIAME dataset_info parsing and tidy serializer comments

  Reject JSON arrays/null (not just non-objects) when reading the dataset_info comment field, report the actual kind in the warning, and cover the malformed/number/array/null cases with tests. Trim the now over-explained comments and docstrings in the VIAME/COCO serializers.

* Cover dataset info config imports

* add metadata and coco to asset store generation (#1707)

---------

Co-authored-by: Bryon Lewis <61746913+BryonLewis@users.noreply.github.com>
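The import merge semantics described in the commit message above (Overwrite replaces the block, additive merges per-key with imported values winning, and a file without datasetInfo leaves existing metadata alone) can be sketched as a small pure function. This is only an illustration of those three branches: the function name `resolve_dataset_info_sketch` and its signature are hypothetical and are not taken from the DIVE codebase, so the real `resolve_imported_dataset_info` may differ.

```python
from typing import Dict, Optional

# Assumed shape: values are treated as opaque strings, per the commit message.
DatasetInfo = Dict[str, str]


def resolve_dataset_info_sketch(
    existing: Optional[DatasetInfo],
    imported: Optional[DatasetInfo],
    additive: bool,
) -> Optional[DatasetInfo]:
    """Hypothetical stand-in for the resolver; not DIVE's actual code."""
    # Absent (or empty) imported block: never touch existing metadata.
    if not imported:
        return existing
    if additive:
        # Additive: merge per-key; imported values win, existing-only keys stay.
        merged = dict(existing or {})
        merged.update(imported)
        return merged
    # Overwrite (the default in the UI): replace the whole block.
    return dict(imported)


existing = {"station": "A1", "depth": "40"}
print(resolve_dataset_info_sketch(existing, {"depth": "55"}, additive=True))
print(resolve_dataset_info_sketch(existing, {"depth": "55"}, additive=False))
print(resolve_dataset_info_sketch(existing, None, additive=False))
```

Keeping the decision in one pure function is what lets both the server and desktop call sites stay short and makes each branch easy to unit test.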
Updates the generateSampleData.py script so that it includes COCO JSON and metadata information when creating data for assetstore import.
This script is used to create a folder structure with random video and image-sequence datasets. After running this script you can run `uv run --script minIOConfig.py` to create a MinIO bucket with some auth credentials for importing into Girder. This can be used to simulate Google Cloud buckets or AWS S3 buckets and importing data using this format. You can then check for proper conversion and importing of the relevant data.
The system can now create VIAME CSV, DIVE JSON, or COCO JSON annotation files. I've defaulted to VIAME CSV and COCO JSON files with embedded random dataset metadata.
I've run this and confirmed that ingestion of the data also includes the metadata properly.
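As a rough illustration of what a generated COCO JSON file with embedded random dataset metadata might look like, here is a minimal sketch. It is not the code in generateSampleData.py: the helper names and the metadata field names (`station`, `depth_m`) are invented for the example. The only detail taken from the commit history is that the metadata sits under `dive_dataset_info` in the COCO `info` block, holds string values, and is omitted when empty.

```python
import json
import random
from typing import Dict, List, Optional


def random_dataset_info(rng: random.Random) -> Dict[str, str]:
    # Field names are hypothetical; values are kept as opaque strings.
    return {
        "station": f"station-{rng.randint(1, 99):02d}",
        "depth_m": str(rng.randint(5, 500)),
    }


def build_coco(
    image_names: List[str],
    dataset_info: Optional[Dict[str, str]] = None,
) -> dict:
    coco = {
        "info": {},
        "categories": [],
        "images": [
            {"id": i, "file_name": name}
            for i, name in enumerate(image_names, start=1)
        ],
        "annotations": [],
    }
    # Omit the key entirely when there is no metadata to carry.
    if dataset_info:
        coco["info"]["dive_dataset_info"] = dict(dataset_info)
    return coco


rng = random.Random(0)  # seeded so repeated runs produce the same sample
sample = build_coco(["frame_0001.png", "frame_0002.png"], random_dataset_info(rng))
print(json.dumps(sample, indent=2))
```

After ingestion, comparing the dataset's metadata against the generated `dive_dataset_info` block is one way to confirm the round trip.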