Add DeepSeek model integration + Fix Linux/Wayland screenshots #270
Open
papadie23 wants to merge 2 commits into
- Add a DeepSeek model mode using a text-only OCR approach (the DeepSeek API doesn't support vision, so screen text is extracted via Tesseract/EasyOCR and sent as structured text)
- Add DeepSeek config with an OpenAI-compatible client
- Add prompt guidance for text-only models
- Show DeepSeek reasoning tokens in the terminal for transparency

Fixes:
- Replace the broken X11 screenshot with flameshot (works on Wayland), with fallbacks to gnome-screenshot, mss, then ImageGrab
- Add fuzzy text matching in OCR (an OCR misread such as 'Gooale' can now match 'Google')
- Return None instead of raising on text-not-found to avoid crashes
- Cache the EasyOCR reader globally to avoid re-downloading models each loop
- Strip premature 'done' operations (the model must verify before claiming success)
- Smarter delays: 4s after enter/navigation, 2s base
- Update requirements.txt pins to >= for Python 3.13 compatibility
- Fix numpy pin: 1.26.1 (yanked) -> 1.26.2
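The fuzzy-matching and return-None items in the commit message can be sketched roughly as follows. This is a minimal illustration built on Python's standard difflib, not the PR's actual code: the function name find_text_index, the 0.8 threshold, and the plain list-of-strings input are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def find_text_index(
    ocr_texts: Sequence[str], target: str, threshold: float = 0.8
) -> Optional[int]:
    """Return the index of the OCR string most similar to target, or None.

    Hypothetical helper: the name, signature, and threshold are illustrative.
    """
    wanted = target.strip().lower()
    best_index: Optional[int] = None
    best_score = 0.0
    for index, text in enumerate(ocr_texts):
        # ratio() is 1.0 for identical strings and drops as they diverge.
        score = SequenceMatcher(None, text.strip().lower(), wanted).ratio()
        if score > best_score:
            best_index, best_score = index, score
    # Signal "not found" with None so the caller can retry instead of crashing.
    return best_index if best_score >= threshold else None
```

With these assumptions, a misread such as 'Gooale' still resolves to the 'Google' element, while a screen with no similar text yields None and leaves the decision to the caller.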
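One plausible reading of "strip premature 'done' operations" is that a 'done' returned in the same batch as real actions is dropped, so the model has to observe the result before declaring success. The sketch below assumes each operation is a dict with an "operation" key; that shape and the function name strip_premature_done are assumptions for illustration, not taken from the PR.

```python
from typing import Any, Dict, List


def strip_premature_done(operations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop 'done' entries when the same batch still contains real actions.

    Hypothetical helper: the dict layout is assumed, not confirmed by the PR.
    """
    has_action = any(op.get("operation") != "done" for op in operations)
    if not has_action:
        # A batch containing only 'done' is a legitimate completion signal.
        return list(operations)
    return [op for op in operations if op.get("operation") != "done"]
```

A batch holding only 'done' passes through unchanged, so the loop can still terminate once the model has verified the outcome on a later turn.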
Author
Tested on:
Author
Testing performed
Known limitations / Future improvements
I plan to keep iterating on these in follow-up PRs.
Author
Note: DeepSeek currently offers file/image upload in their web chat interface (chat.deepseek.com). While their API does not yet expose multimodal/vision endpoints, this suggests vision support may be added to the API in the future. When that happens, the
Author
The README could be updated to list DeepSeek under supported models, similar to how Claude, Qwen, and LLaVA are listed. Happy to include that in this PR if maintainers want.
Summary
Adds DeepSeek as a new model provider and fixes several cross-platform issues.
DeepSeek integration (operate -m deepseek-with-ocr)
- OpenAI-compatible client (https://api.deepseek.com)
- Uses deepseek-v4-pro by default (configurable via DEEPSEEK_MODEL_NAME)

Screenshot fix for Linux/Wayland
- ImageGrab is broken on Wayland; replaced with multiple fallbacks:
  - flameshot (primary, works on both X11 and Wayland)
  - gnome-screenshot, mss, ImageGrab as fallbacks

OCR improvements
- get_text_element() now returns None instead of crashing on a miss

Behavior fixes

Dependencies
- requirements.txt pins changed from == to >= for Python 3.13 compatibility
- numpy==1.26.1 (yanked) → >=1.26.2

Files changed (7 files, no new files)
- operate/config.py
- operate/models/apis.py
- operate/models/prompts.py
- operate/operate.py
- operate/utils/ocr.py
- operate/utils/screenshot.py
- requirements.txt
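The screenshot fallback order described in the summary (flameshot, then gnome-screenshot, mss, and ImageGrab) could look roughly like the sketch below. It is an illustration under assumptions, not the contents of operate/utils/screenshot.py: the function names, the 15-second timeout, and the backends parameter are hypothetical, and the CLI flags follow each tool's documented options.

```python
import shutil
import subprocess
from typing import Callable, List, Optional, Tuple

Backend = Tuple[str, Callable[[str], None]]


def _run_cli(command: List[str]) -> None:
    """Run a screenshot CLI tool; raise if it is missing or exits non-zero."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} is not installed")
    subprocess.run(command, check=True, timeout=15)


def _flameshot(path: str) -> None:
    _run_cli(["flameshot", "full", "--path", path])


def _gnome_screenshot(path: str) -> None:
    _run_cli(["gnome-screenshot", "-f", path])


def _mss(path: str) -> None:
    import mss  # imported lazily so a missing package only skips this backend

    with mss.mss() as sct:
        sct.shot(output=path)


def _image_grab(path: str) -> None:
    from PIL import ImageGrab  # lazy import, same reason as above

    ImageGrab.grab().save(path)


# Order mirrors the PR description: flameshot first, ImageGrab last.
DEFAULT_BACKENDS: List[Backend] = [
    ("flameshot", _flameshot),
    ("gnome-screenshot", _gnome_screenshot),
    ("mss", _mss),
    ("ImageGrab", _image_grab),
]


def capture_screen(path: str, backends: Optional[List[Backend]] = None) -> Optional[str]:
    """Try each backend in order; return the name that worked, or None."""
    for name, grab in DEFAULT_BACKENDS if backends is None else backends:
        try:
            grab(path)
            return name
        except Exception:
            continue  # fall through to the next backend
    return None
```

Returning the backend name rather than raising keeps the behavior in line with the PR's general preference for None over crashes, and the optional backends argument makes the chain easy to exercise without a display.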