fix(pdf): create output directory in convert_pdf_to_images#1290
Open
Osamaali313 wants to merge 1 commit into
Open
fix(pdf): create output directory in convert_pdf_to_images#1290Osamaali313 wants to merge 1 commit into
Osamaali313 wants to merge 1 commit into
Conversation
convert() saved page_N.png into output_dir without ensuring the directory existed, so passing a non-existent output directory crashed with FileNotFoundError (anthropics#1025). Create the output directory with os.makedirs(..., exist_ok=True) before writing the pages. Add a unittest that converts with a mocked image into a missing directory and asserts the directory and page files are created.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a safeguard to ensure convert_pdf_to_images.convert() always has a valid output directory, and introduces a unit test to verify directory creation and file output without requiring pdf2image/Poppler to be installed.
Changes:
- Ensure
output_diris created before saving generated images. - Add a
unittestvalidating that missing nested output directories are created and PNGs are written. - Stub
pdf2imagein tests to avoid external runtime dependencies.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| skills/pdf/scripts/convert_pdf_to_images.py | Creates the output directory up-front to prevent failures when saving images. |
| skills/pdf/scripts/test_convert_pdf_to_images.py | Adds an isolated unittest that stubs pdf2image and checks directory + output files creation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+12
to
+17
| if "pdf2image" not in sys.modules: | ||
| _stub = types.ModuleType("pdf2image") | ||
| _stub.convert_from_path = lambda *args, **kwargs: [] | ||
| sys.modules["pdf2image"] = _stub | ||
|
|
||
| sys.path.insert(0, str(Path(__file__).resolve().parent)) |
Comment on lines
9
to
12
| def convert(pdf_path, output_dir, max_dim=1000): | ||
| os.makedirs(output_dir, exist_ok=True) | ||
|
|
||
| images = convert_from_path(pdf_path, dpi=200) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #1025.
skills/pdf/scripts/convert_pdf_to_images.pywrites each rendered page toos.path.join(output_dir, f"page_{i+1}.png")but never createsoutput_dir.If the directory does not already exist,
image.save(...)raisesFileNotFoundErrorand no pages are written.Fix
Create the output directory at the start of
convert():exist_ok=Truekeeps the existing-directory case a no-op, so behaviour isunchanged when the directory is already present.
Test
Adds
skills/pdf/scripts/test_convert_pdf_to_images.py, a dependency-freeunittest(stubspdf2image, so neither pdf2image nor poppler is required):it converts into a not-yet-existing nested directory using a fake image whose
save()writes a real file, and asserts the directory andpage_1.png/page_2.pngare created. The test fails before the fix (the originalFileNotFoundError) and passes after.Run: