Add opt-in GCS resumable upload scheme (server-controlled integrity) #5975

Description

@rtibbles

Overview

Studio uploads each file in a single signed PUT — an interruption restarts from byte 0, and very large objects are unsupported. Add an opt-in GCS resumable scheme: when a client passes resumable to upload_url, the server initiates a resumable session and returns a session URI for chunked uploads. Single-PUT stays unchanged for existing clients. This unblocks the ricecooker large-file client; the web frontend migration is deferred to a follow-up.

Complexity: Medium
Target branch: hotfixes

Context

get_presigned_upload_url / _get_gcs_presigned_put_url sign a single PUT (content_md5, content_type); the web frontend POSTs /api/file/upload_url then PUTs the whole file. A GCS resumable upload is a POST initiation (x-goog-resumable: start) returning a session URI that is the credential for chunked PUTs.
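As a rough illustration of the chunked PUTs a client would send to the session URI, the sketch below computes the `Content-Range` header for each chunk. The function name `chunk_headers` and the default chunk size are hypothetical; the 256 KiB multiple for non-final chunks is the GCS requirement for resumable uploads. It performs no network I/O.

```python
CHUNK_MULTIPLE = 256 * 1024  # GCS: non-final chunks must be multiples of 256 KiB


def chunk_headers(total_size, chunk_size=8 * CHUNK_MULTIPLE):
    """Yield (start, end, headers) for each chunked PUT to a session URI.

    Hypothetical helper: `end` is inclusive, as in the Content-Range header.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if chunk_size <= 0 or chunk_size % CHUNK_MULTIPLE != 0:
        raise ValueError("chunk_size must be a positive multiple of 256 KiB")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        headers = {
            "Content-Length": str(end - start + 1),
            "Content-Range": "bytes {}-{}/{}".format(start, end, total_size),
        }
        yield start, end, headers
        start = end + 1
```

After an interruption, a client can resume from the last byte GCS acknowledged rather than from byte 0, which is the behaviour the single signed PUT cannot offer.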

Security model to preserve. Files are content-addressed (path = MD5 checksum); the server stays source of truth. A client must not:

  • store content not hashing to the checksum;
  • dedup-match an object whose md5Hash differs from the checksum;
  • bypass quota by under-declaring size.

Spike outcome (resolved).

  • A signed x-goog-hash / md5Hash at the initiation is not GCS-enforced for signed-URL resumable uploads.
  • Adopted: the server initiates the resumable session itself (bytes-free JSON-API POST pinning md5Hash to the expected checksum and recording the declared size as object metadata) and returns only the session URI; GCS rejects non-matching bytes at finalize (400).
  • Tradeoff: resumable sessions are pinned to the initiating region, so distant clients upload cross-region.
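A minimal sketch of what the server-side initiation request could look like, assuming the checksum is a hex MD5 and the object name is already known. `build_resumable_initiation` is a hypothetical helper, not existing Studio code; it only assembles the request and does not send it. GCS expects `md5Hash` as the base64 encoding of the raw digest, so the hex checksum has to be converted.

```python
import base64
import binascii
import json
from urllib.parse import quote


def build_resumable_initiation(bucket, object_name, checksum_hex,
                               declared_size, content_type):
    """Assemble (but do not send) a JSON-API resumable initiation request.

    Hypothetical helper. The caller must add an Authorization header with
    server credentials; the session URI comes back in the Location header
    of the response.
    """
    digest = binascii.unhexlify(checksum_hex)
    if len(digest) != 16:
        raise ValueError("checksum must be a 32-character hex MD5")
    body = {
        "name": object_name,
        # Pin the expected hash; GCS wants base64 of the raw MD5 digest.
        "md5Hash": base64.b64encode(digest).decode("ascii"),
        # Custom metadata values are strings.
        "metadata": {"declared-size": str(declared_size)},
    }
    url = (
        "https://storage.googleapis.com/upload/storage/v1/b/{}/o"
        "?uploadType=resumable".format(quote(bucket, safe=""))
    )
    return {
        "method": "POST",
        "url": url,
        "headers": {
            "Content-Type": "application/json; charset=UTF-8",
            "X-Upload-Content-Type": content_type,
        },
        "body": json.dumps(body),
    }
```

Because the request carries no file bytes, the app server never proxies the upload; it only hands the returned session URI to the client.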

Constraint. Proxying uploads through the app server is not an option — it previously caused severe app-server performance problems. Integrity must hold within the direct-upload model.

The Change

  • Backend (upload_url): accept a resumable flag.
    • Absent / false, declared size ≤ 500 MB: return the existing signed single-PUT, unchanged.
    • Absent / false, declared size > 500 MB: error; large files must use resumable.
    • True, object already stored (GCS md5Hash equals the checksum): return a definitive skip — no session, no client HEAD.
    • True, otherwise: initiate the resumable session server-side (JSON-API POST), pinning md5Hash to the checksum and recording the declared size as custom metadata (declared-size); return the session URI.
    • True on the S3/dev backend: return single-PUT for now (S3 multipart is added in #5990, "Add S3 multipart upload support for the dev/minio backend").
  • Server-controlled integrity (resumable path):
    • Upload-time (GCS-enforced): the pinned md5Hash makes GCS reject non-matching bytes at finalize.
    • Infrastructure controls (prerequisite, Choose the SQL instance name based on project ID #908): finalize validation (md5Hash vs path checksum, actual vs recorded size) and lifecycle cleanup.
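The branching described for upload_url can be sketched as a pure decision function. Everything here is illustrative: `choose_upload_scheme`, the string return values, and the `backend` argument are hypothetical names, and reading "500 MB" as 500 × 1024 × 1024 bytes is an assumption.

```python
# Assumption: "500 MB" is taken as 500 MiB; the issue does not specify.
MAX_SINGLE_PUT_BYTES = 500 * 1024 * 1024


def choose_upload_scheme(resumable, declared_size, backend,
                         stored_md5_hex, checksum_hex):
    """Return which response upload_url should produce (hypothetical sketch).

    stored_md5_hex is the MD5 of the object already in storage, as hex,
    or None when no object exists at the content-addressed path.
    """
    if not resumable:
        if declared_size > MAX_SINGLE_PUT_BYTES:
            return "error"  # large files must use resumable
        return "single_put"  # existing behaviour, unchanged
    if backend != "gcs":
        return "single_put"  # S3/dev backend: single-PUT for now
    if stored_md5_hex is not None and stored_md5_hex == checksum_hex:
        return "skip"  # definitive skip: no session, no client HEAD
    return "resumable_session"  # server initiates and returns the session URI
```

Keeping the decision separate from the storage calls would make each acceptance criterion checkable without touching GCS.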

Out of Scope

  • Web frontend chunked/resumable upload — deferred to a follow-up issue.
  • Removing the client HEAD-skip — deferred to the follow-up.
  • Removing single-PUT / client cutover.
  • ricecooker's resumable client.
  • The File.file_size change.
  • Parallel / XML-multipart uploads (composite objects expose only crc32c, not md5Hash).
  • Proxying uploads through the app server.
  • Object-finalize validation and lifecycle cleanup (infrastructure-side, Choose the SQL instance name based on project ID #908).

Acceptance Criteria

  • upload_url accepts a resumable flag; absent or false returns the existing signed single-PUT, unchanged, for declared sizes up to 500 MB.
  • upload_url errors on a non-resumable request whose declared size exceeds 500 MB.
  • On the resumable path, upload_url returns a definitive skip when GCS's md5Hash equals the checksum, and a session URI otherwise — no client HEAD.
  • The session is initiated server-side with md5Hash pinned to the checksum and declared size recorded as custom metadata (declared-size).
  • Existing single-PUT clients are unaffected.

References

AI usage

I used Claude (Opus 4.8) to verify the GCS resumable and checksum mechanics against the docs, run the integrity spike, and draft this issue. I drove the security analysis and the design decisions: server-initiated sessions over signed-URL initiation, and an opt-in resumable flag so existing single-PUT clients are untouched. I reviewed every claim against the cited GCS documentation.
