feat(odin): Implement BoundedReader for safe current-day file streaming#543
feat(odin): Implement BoundedReader for safe current-day file streaming#543vikramlc-cognite wants to merge 6 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces BoundedReader, a wrapper for binary file handles designed to limit reads to a snapshot byte count, preventing issues when uploading active log files that are still being appended to. The review feedback highlights that BoundedReader needs to implement tell(), closed, and close() to fully satisfy the file-like interface required by multipart uploads (e.g., when wrapped in ChunkedStream for files larger than 5 GiB) and recommends adding corresponding unit tests.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #543 +/- ##
==========================================
+ Coverage 83.08% 83.48% +0.40%
==========================================
Files 45 46 +1
Lines 4522 4548 +26
==========================================
+ Hits 3757 3797 +40
+ Misses 765 751 -14
🚀 New features to boost your workflow:
|
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces BoundedReader, a utility class designed to wrap a binary file handle and limit reads to a specified snapshot size, preventing upload errors when log files are actively being appended to. The feedback suggests handling cases where the underlying stream's read method returns None to avoid a potential TypeError when calling len(data).
Yaseen-A-Khan
left a comment
There was a problem hiding this comment.
The changes look good, just a nit comment.
nithinb
left a comment
There was a problem hiding this comment.
-
read(n) multiple calls until exhaustion with non-divisible chunks - e.g., snapshot=7, read(3) × 3. Third call should return 1 byte, not 3. Tests the min(size, self._remaining) clamp on the final chunk. IOByteStream uses CHUNK_SIZE=65536 so this only matters for small files, but it's the exact code path in production.
-
read(-1) after partial consumption - read(3) on snapshot=5, then read(). Should return b"lo" (2 bytes). Not tested.
-
Underlying stream shorter than to_read - _stream.read(to_read) can return fewer bytes than requested if the file is shorter than to_read. _remaining -= len(data) handles this correctly, but no test covers it. Already partially covered by test_midnight_rotation_race_file_shorter_than_snapshot (zero bytes case), but not the partial-short case.
Summary
Adds BoundedReader, a lightweight BinaryIO wrapper that gates reads at a byte count captured at snapshot time, enabling safe streaming of the active log file (file.log) to CDF Files without risking an HTTP Content-Length mismatch as the logger appends new lines.
Type of change
What changed
Why it changed
The active log file (file.log) is written by TimedRotatingFileHandler while the action runs. Passing a plain open(log_path, "rb") to IOFileUploadQueue is unsafe:
BoundedReader eliminates this by declaring the snapshot size via len and hard-stopping reads at that boundary. No buffering, no copies - the only overhead is one integer decrement per read() call.
Related issue
EDGE-183 / EDGE-607
Related docs
BULK_LOG_UPLOAD_ACTION_PLAN - "Current Day File Handling" section
What to focus on during review
BoundedReader only needs to satisfy the upload path's interface: len, read(), and the context manager. Adding inheritance would pull in abstract methods we don't need.
Test evidence
$ python -m pytest tests/test_unstable/test_bounded_reader.py -v
tests/test_unstable/test_bounded_reader.py::test_len_returns_snapshot_size_and_is_stable_after_reads PASSED
tests/test_unstable/test_bounded_reader.py::test_read_respects_snapshot_bound[read_all] PASSED
tests/test_unstable/test_bounded_reader.py::test_read_respects_snapshot_bound[read_partial] PASSED
tests/test_unstable/test_bounded_reader.py::test_read_respects_snapshot_bound[read_exceeds_snapshot] PASSED
tests/test_unstable/test_bounded_reader.py::test_read_returns_empty_after_exhaustion PASSED
tests/test_unstable/test_bounded_reader.py::test_zero_snapshot_returns_empty_immediately PASSED
tests/test_unstable/test_bounded_reader.py::test_context_manager_closes_underlying_file PASSED
tests/test_unstable/test_bounded_reader.py::test_super_len_reads_len_not_fstat PASSED
tests/test_unstable/test_bounded_reader.py::test_midnight_rotation_race_file_shorter_than_snapshot PASSED
9 passed in 0.03s
Risks and unknowns
Rollout and rollback
N/A - internal implementation class, not yet wired into any action. No config changes, no migrations.
Checklist