feat(odin): Build candidate log file list in Extractor #542
Code Review
This pull request introduces a built-in fetch_logs action to stream rotated log files to CDF Files, along with a structured ActionError exception for reporting failures. The framework now automatically registers this action and handles structured errors during dispatching. The feedback suggests improving robustness by catching OSError instead of only FileNotFoundError when checking log files, and adding validation to reject future date ranges to prevent silent failures.
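A minimal sketch of the future-date check the review suggests. It assumes a simplified ActionError with an error_type attribute. The helper validate_date_range and the error types invalid_date_range and future_date_range are hypothetical names, not part of this PR. It treats any end_date after today as a future range, which is only one possible reading of the suggestion.

```python
from datetime import date


class ActionError(Exception):
    """Simplified stand-in: a machine-readable error_type plus a human-readable message."""

    def __init__(self, error_type: str, message: str) -> None:
        super().__init__(message)
        self.error_type = error_type
        self.message = message


def validate_date_range(start_date: date, end_date: date, today: date) -> None:
    """Hypothetical validator: reject ranges that could only ever match no files."""
    if start_date > end_date:
        raise ActionError(
            "invalid_date_range",
            f"start_date {start_date} is after end_date {end_date}",
        )
    if end_date > today:
        # No log file can exist for a future day, so fail loudly
        # instead of "succeeding" with an empty upload set.
        raise ActionError(
            "future_date_range",
            f"end_date {end_date} is after today ({today})",
        )
```

On the first suggestion: FileNotFoundError and PermissionError are both subclasses of OSError, so catching OSError around stat() would also route other I/O failures to skipped instead of aborting the action.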
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #542 +/- ##
==========================================
+ Coverage 82.90% 83.08% +0.17%
==========================================
Files 45 45
Lines 4475 4522 +47
==========================================
+ Hits 3710 3757 +47
Misses 765 765
nithinb left a comment
Thanks for resolving the rest of the issues.
Can you take another look at the conversation related to private variables?
@nithinb Agreed, my concern was misplaced. I have gone ahead and added the application_config to the action context.
Summary
Builds the candidate log file list for the fetch_logs built-in action. After validating the date range, the action now resolves the extractor's log file path from the config, enumerates every date in [start_date, end_date], and partitions the corresponding files into candidates (existing and non-empty) and skipped (missing or empty). This is the second step of the bulk log upload feature: it sets up the file set that Tasks 3–4 will stream to CDF Files.
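The enumerate-and-partition step described above can be sketched as follows. This is a minimal illustration, not the PR's code: build_candidate_files only mirrors the described signature, and it assumes rotated files are named <base_path>.YYYY-MM-DD, which is the naming Python's TimedRotatingFileHandler produces for daily rotation.

```python
from __future__ import annotations

from datetime import date, timedelta
from pathlib import Path


def build_candidate_files(
    base_path: Path, start_date: date, end_date: date, today: date
) -> tuple[list[Path], list[Path]]:
    """Split the daily log files for [start_date, end_date] into (candidates, skipped)."""
    candidates: list[Path] = []
    skipped: list[Path] = []
    day = start_date
    while day <= end_date:
        if day == today:
            # Today's entries are still in the live, unrotated file.
            path = base_path
        else:
            # Assumed rotation naming: "<base_path>.YYYY-MM-DD".
            path = base_path.with_name(f"{base_path.name}.{day.isoformat()}")
        try:
            # One stat() call instead of exists() followed by stat():
            # a file rotated away in between cannot make the second call fail.
            size = path.stat().st_size
        except FileNotFoundError:
            skipped.append(path)
        else:
            # Zero-byte files carry nothing worth uploading.
            (candidates if size > 0 else skipped).append(path)
        day += timedelta(days=1)
    return candidates, skipped
```

Because the function only touches the filesystem through stat(), it can be tested against a temporary directory without any CDF client, which matches the PR's goal of testing discovery separately from upload.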
Type of change
What changed
_log_upload_action.py:
- _resolve_log_file_path(config: ExtractorConfig) -> Path | None: returns the base log path from the first LogFileHandlerConfig in config.log_handlers, or None if no file handler is configured.
- _build_candidate_files(base_path, start_date, end_date, today): enumerates the date range. Historical dates resolve to the rotated file base_path.YYYY-MM-DD; today resolves to the live base_path. Wraps a single stat() call in try/except FileNotFoundError (rather than calling exists() and then stat()) to eliminate the TOCTOU race during log rotation. Files that don't exist or are 0 bytes go to skipped.
- fetch_logs_action: after validating the date range, resolves the log path with _resolve_log_file_path (raising ActionError(error_type="no_file_handler_configured") if absent), then calls _build_candidate_files to produce the candidates and skipped lists for Tasks 3–5.

Why it changed
Part of the bulk log upload design (see the Related docs section). Before uploading anything, the action needs to know which files exist and are non-empty for the requested date range. Separating file discovery from upload also makes each stage independently testable.
Related docs / discussion:
Design Doc - Bulk Log Upload Action
What to focus on during review
Test evidence
tests/test_unstable/test_log_upload_action.py (17 test functions, 26 parametrized cases in total):
- _resolve_log_file_path: console-only → None; single file handler; multiple file handlers → first
- _build_candidate_files: all files present; missing/empty files → skipped; today uses live file; mixed range; boundary (single day, exact max)
- fetch_logs_action integration: no file handler → no_file_handler_configured; file handler + files present → succeeded; file handler + all files missing → succeeded (empty upload set)

Risks and unknowns
Rollout and rollback
No config or API changes are required. fetch_logs_action now raises no_file_handler_configured for extractors without a file log handler. Previously it would have silently succeeded with no uploads, which is what every extractor observes today anyway, since uploads aren't implemented yet.
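A sketch of the new failure path, assuming simplified stand-ins for ExtractorConfig, LogFileHandlerConfig and ActionError. The handler field names (path, level), ConsoleHandlerConfig and require_log_file_path are hypothetical. It shows why a console-only config now fails with no_file_handler_configured instead of succeeding with nothing to upload.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ConsoleHandlerConfig:  # hypothetical stand-in
    level: str = "INFO"


@dataclass
class LogFileHandlerConfig:  # hypothetical stand-in; field names assumed
    path: Path
    level: str = "INFO"


@dataclass
class ExtractorConfig:  # hypothetical stand-in
    log_handlers: list = field(default_factory=list)


class ActionError(Exception):
    def __init__(self, error_type: str, message: str) -> None:
        super().__init__(message)
        self.error_type = error_type


def resolve_log_file_path(config: ExtractorConfig) -> Path | None:
    """Return the path of the first file handler, or None for console-only configs."""
    for handler in config.log_handlers:
        if isinstance(handler, LogFileHandlerConfig):
            return handler.path
    return None


def require_log_file_path(config: ExtractorConfig) -> Path:
    """Fail with a structured error rather than silently uploading nothing."""
    path = resolve_log_file_path(config)
    if path is None:
        raise ActionError("no_file_handler_configured", "No file log handler is configured")
    return path
```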
Rolling back means reverting this commit; no CDF state is written at this stage.
Checklist