add metadata and coco to asset store generation #1707

Merged: BryonLewis merged 1 commit into coco-dataset-info from assetstore-metadata-import on Jun 22, 2026

@BryonLewis

Collaborator

Updates the generateSampleData.py script to include COCO JSON and metadata information when generating data for assetstore import.

This script creates a folder structure with random video and image-sequence datasets. After running it, you can run `uv run --script minIOConfig.py` to create a MinIO bucket with some auth credentials for importing into Girder.

This can be used to simulate Google Cloud Storage or AWS S3 buckets and to import data laid out in this format. You can then check that the relevant data is converted and imported properly.

The script can now create viame-csv, DIVE JSON, or COCO JSON annotation files. It defaults to viame-csv and COCO JSON files with embedded random dataset metadata.

I've run this and confirmed that ingestion of the data also includes the metadata properly.

@BryonLewis BryonLewis merged commit e1c49e6 into coco-dataset-info Jun 22, 2026
3 checks passed
@BryonLewis BryonLewis deleted the assetstore-metadata-import branch June 22, 2026 12:24
BryonLewis added a commit that referenced this pull request Jun 22, 2026
* Round-trip datasetInfo through KWCOCO export and import

Carry the dataset's datasetInfo metadata through the KWCOCO/coco_json path on
both the Girder server and the Electron desktop client, mirroring the existing
VIAME CSV passthrough. datasetInfo travels under a single `datasetInfo` key in
the COCO `info` block, advertised in `info.dive_extensions`, and is omitted
entirely when empty so exports stay byte-unchanged for datasets without it.
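
The "omitted entirely when empty" rule can be sketched as follows. This is a minimal illustration, not the code from this commit: `build_coco_info` is a hypothetical helper, the key names follow this commit's description, and the shape of `dive_extensions` (a list of key names) is an assumption.

```python
import json
from typing import Any, Dict, Optional


def build_coco_info(dataset_info: Optional[Dict[str, str]]) -> Dict[str, Any]:
    """Hypothetical helper: build the extra keys for a COCO `info` block."""
    info: Dict[str, Any] = {}
    # Only write the block (and advertise it) when there is something to write,
    # so datasets without datasetInfo serialize exactly as before.
    if dataset_info:
        info["datasetInfo"] = dict(dataset_info)
        # Assumption: dive_extensions is a list naming the extra keys present.
        info["dive_extensions"] = ["datasetInfo"]
    return info


with_info = build_coco_info({"station": "A1"})
without_info = build_coco_info({})

print(json.dumps(with_info, sort_keys=True))
print(json.dumps(without_info))
```

Keeping the empty case key-free is what lets an export of a dataset without datasetInfo stay byte-identical to the pre-change output.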

Export: export_dive_as_coco / coco.ts serializeFile write info.datasetInfo when
non-empty; the server export endpoint reads it from folder metadata and the
desktop export already loads it onto JsonMeta.

Import: load_coco_as_tracks_and_attributes returns info.datasetInfo (4th tuple
element, like the VIAME loader's fps) and process_items merges it per-key into
the folder's datasetInfo (imported values win, existing-only keys preserved,
other metadata untouched); coco.ts parseFile surfaces it so the desktop import
plumbing merges it onto the dataset JsonMeta. Values are treated as opaque
strings.

Tests: extend the server kwcoco tests and desktop coco.spec.ts for
export-writes, empty-omits (byte-unchanged), and import-restores on both
platforms.

* Extract and test datasetInfo import merge; type the export param

* Unify single-dataset and multicam COCO export paths

Route the single-dataset coco_json export through _coco_json_export_text
instead of duplicating the track-filtering, image_filenames, and
export_dive_as_coco logic inline. This fixes the multicam export dropping
datasetInfo: the merge now lives in one place feeding both paths.

* Round-trip datasetInfo through VIAME CSV; snake_case the serialized keys

Serialized keys: rename the per-dataset station metadata key to snake_case
to match each export format's conventions -- `dive_dataset_info` in the COCO
`info` block (alongside `dive_notes`, `dive_detection_attributes`) and
`dataset_info` on the VIAME CSV `# metadata` line. Internal DIVE meta/model/
client keys stay camelCase (`datasetInfo`); the serializers translate at the
boundary.

VIAME CSV import: add a symmetric `# metadata` parser (mirror of writeHeader)
that restores datasetInfo on both the server and desktop paths, so VIAME CSV
round-trips it like COCO. This also fixes fps import -- read `fps:` case-
insensitively; the importer only matched a capitalized `Fps:`, though DIVE and
native VIAME both write lowercase `fps:`. load_csv_as_tracks_and_attributes now
returns datasetInfo as a 5th tuple element.
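
The case-insensitive `fps:` read might look like the sketch below. `parse_fps` is a hypothetical helper, and the comma-separated `# metadata,fps: 30,...` layout is an assumption about the header line rather than something this commit message spells out.

```python
import re
from typing import Optional

# Match a single "fps: <number>" field regardless of letter case.
_FPS_FIELD = re.compile(r"^\s*fps\s*:\s*([0-9]+(?:\.[0-9]+)?)\s*$", re.IGNORECASE)


def parse_fps(metadata_line: str) -> Optional[float]:
    """Hypothetical helper: pull fps out of a `# metadata` comment line."""
    if not metadata_line.lower().startswith("# metadata"):
        return None
    # Assumption: fields after the "# metadata" marker are comma-separated.
    for field in metadata_line.split(",")[1:]:
        match = _FPS_FIELD.match(field)
        if match:
            return float(match.group(1))
    return None


print(parse_fps('# metadata,fps: 30,exported_by: "dive:python"'))
print(parse_fps("# metadata,Fps: 29.97"))
```

Both the lowercase and capitalized spellings resolve to a value here, which is the regression the new test guards against.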

Import merge semantics: datasetInfo now follows the import "Overwrite" checkbox
on both COCO and VIAME, mirroring annotations -- Overwrite (default) replaces
the block, additive merges per-key (imported values win). A file that carries
no datasetInfo never touches existing metadata, in either mode.
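
The three branches described above (overwrite, additive, absent) fit in a small pure function. The sketch below is an illustration under assumptions: `resolve_dataset_info` and its signature are hypothetical, and an empty imported block is treated the same as a missing one.

```python
from typing import Dict, Optional

DatasetInfo = Dict[str, str]


def resolve_dataset_info(
    existing: Optional[DatasetInfo],
    imported: Optional[DatasetInfo],
    additive: bool,
) -> Optional[DatasetInfo]:
    """Hypothetical sketch of the overwrite/additive/absent decision."""
    # Absent: a file without datasetInfo never touches what is stored.
    if not imported:
        return existing
    # Additive: merge per key, imported values win, existing-only keys stay.
    if additive:
        merged = dict(existing or {})
        merged.update(imported)
        return merged
    # Overwrite (the default): replace the whole block.
    return dict(imported)


stored = {"station": "A1", "depth": "40m"}
incoming = {"station": "B2"}

print(resolve_dataset_info(stored, incoming, additive=False))
print(resolve_dataset_info(stored, incoming, additive=True))
print(resolve_dataset_info(stored, None, additive=False))
```

Because the function has no side effects, each branch can be unit-tested without a Girder folder or a desktop dataset on disk.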

Tests and docs updated to match; flip the prior "ignored on parse" VIAME test
to assert restore and add an fps case-insensitivity regression.

* Fix desktop Overwrite datasetInfo replace and a broken fps test assertion

Desktop import persists meta via lodash `merge`, which deep-merges datasetInfo
and keeps stale keys, so the "Overwrite" path never actually replaced the block
the way the server (and the docs) describe. Assign the imported block wholesale
in dataFileImport when not additive; the additive path keeps its per-key
pre-merge. Correct the now-inaccurate comment that credited saveMetadata.
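
The stale-key problem is easy to reproduce outside the desktop client. The sketch below uses a small recursive `deep_merge` in Python as a stand-in for lodash `merge` (it is not the desktop code, and all names are hypothetical) and contrasts it with assigning the block wholesale.

```python
import copy
from typing import Any, Dict


def deep_merge(target: Dict[str, Any], source: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge source into target, roughly like lodash `merge`."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            target[key] = value
    return target


stored = {"fps": 30, "datasetInfo": {"station": "A1", "depth": "40m"}}
imported = {"datasetInfo": {"station": "B2"}}

# Deep merge keeps the stale "depth" key, so this is not a true overwrite.
merged = deep_merge(copy.deepcopy(stored), imported)

# Assigning the block wholesale drops keys the imported file does not carry.
replaced = copy.deepcopy(stored)
replaced["datasetInfo"] = dict(imported["datasetInfo"])

print(merged["datasetInfo"])
print(replaced["datasetInfo"])
```

Other metadata such as `fps` survives in both variants; only the treatment of the nested block differs.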

Drop a stray `assert warnings == []` that the new fps case-insensitivity test
carried over from a sibling test -- `warnings` is unbound there, so the test
raised NameError instead of running.

* Simplify datasetInfo import: centralize merge, drop redundant guards

- common.ts: assign datasetInfo explicitly in dataFileImport for both the
  Overwrite and additive cases, removing the second merge block (and its
  redundant metadata re-read) from _ingestFilePath. Mirrors the server.
- coco.ts: drop the redundant truthiness clause; the typeof+isEmpty guard
  already covers it.
- crud_rpc.py: return the meta dict directly instead of collapsing an empty
  dict back to None (the sole caller treats both as falsy).

* Centralize datasetInfo import resolution; share serializer type aliases

- Replace merge_imported_dataset_info with resolve_imported_dataset_info,
  which owns the full overwrite/additive/absent decision as a pure function
  so the import call site drops to two lines.
- Add Attributes, Warnings, and DatasetInfo aliases to dive_utils.types and
  point the KWCOCO and VIAME serializers, the CocoMetadata/JsonMeta models,
  and the resolver at them.
- Expand the resolver tests to cover the overwrite-replaces and
  absent-block branches the helper now owns.

* Harden VIAME dataset_info parsing and tidy serializer comments

Reject JSON arrays/null (not just non-objects) when reading the
dataset_info comment field, report the actual kind in the warning, and
cover the malformed/number/array/null cases with tests. Trim the now
over-explained comments and docstrings in the VIAME/COCO serializers.
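
A hardened read of the `dataset_info` field could be sketched as below. `parse_dataset_info` and `json_kind` are hypothetical names, and the warning wording is illustrative, not the text the serializer emits.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def json_kind(value: Any) -> str:
    """Name the JSON type of a decoded value for use in warnings."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def parse_dataset_info(raw: str) -> Tuple[Optional[Dict[str, Any]], List[str]]:
    """Hypothetical sketch: accept only a JSON object, warn on anything else."""
    warnings: List[str] = []
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        warnings.append("dataset_info is not valid JSON; ignoring it")
        return None, warnings
    if not isinstance(value, dict):
        kind = json_kind(value)
        warnings.append(f"dataset_info must be a JSON object, got {kind}; ignoring it")
        return None, warnings
    return value, warnings


for sample in ['{"station": "A1"}', "[1, 2]", "null", "42", "{bad"]:
    print(sample, "->", parse_dataset_info(sample))
```

Reporting the actual kind makes a warning about an array or `null` distinguishable from one about malformed JSON.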

* Cover dataset info config imports

* add metadata and coco to asset store generation (#1707)

---------

Co-authored-by: Bryon Lewis <61746913+BryonLewis@users.noreply.github.com>