feat: add get_task_documents to retrieve a task's documents#1252
feat: add get_task_documents to retrieve a task's documents#1252tcferreira wants to merge 1 commit into
Conversation
Meilisearch v1.13 added `GET /tasks/{uid}/documents` to fetch the documents
associated with a task. Add `Client.get_task_documents` (and the underlying
`TaskHandler` method), backed by a streaming GET (`HttpRequests.get_stream`) and
a parser that normalizes the JSON array / NDJSON / concatenated-JSON payload the
endpoint can return.
Adds unit tests for the parser, a request-shape test for the method, and a
`get_task_documents_1` code sample.
Closes meilisearch#1221
📝 WalkthroughWalkthroughAdds experimental support for the Changesget_task_documents feature
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Client
participant TaskHandler
participant HttpRequests
participant parse_task_documents
Caller->>Client: get_task_documents(uid)
Client->>TaskHandler: get_task_documents(uid)
TaskHandler->>HttpRequests: get_stream("tasks/{uid}/documents")
HttpRequests->>HttpRequests: requests.get(url, stream=True)
HttpRequests-->>TaskHandler: Response object
TaskHandler->>parse_task_documents: response.text
parse_task_documents-->>TaskHandler: List[Dict[str, Any]]
TaskHandler-->>Client: List[Dict[str, Any]]
Client-->>Caller: List[Dict[str, Any]]
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@meilisearch/_httprequests.py`:
- Around line 242-248: Add an exception handler for
`requests.exceptions.InvalidSchema` to the `get_stream` method's exception
handling block (after the ConnectionError handler) to match the pattern used in
`send_request` and `post_stream`. The handler should catch
`requests.exceptions.InvalidSchema` and raise `MeilisearchCommunicationError`
wrapping the error message, maintaining consistency across all HTTP request
methods in the SDK.
In `@meilisearch/_utils.py`:
- Around line 65-67: The splitlines() loop that processes the payload and splits
on _CONCATENATED_JSON regex pattern incorrectly handles valid JSON objects
containing "}{" within string values. Replace the naive regex-based splitting
logic (the for loops that iterate through payload.splitlines() and
_CONCATENATED_JSON.split(line)) with a JSON-aware parser that properly
understands JSON structure and correctly identifies object boundaries by
tracking quote context and brace nesting, rather than using simple string
pattern matching.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8e377971-df13-4605-b8a2-a5528e9b29db
📒 Files selected for processing (7)
.code-samples.meilisearch.yamlmeilisearch/_httprequests.pymeilisearch/_utils.pymeilisearch/client.pymeilisearch/task.pytests/client/test_client_task_meilisearch.pytests/test_utils.py
| except requests.exceptions.Timeout as err: | ||
| raise MeilisearchTimeoutError(str(err)) from err | ||
| except requests.exceptions.ConnectionError as err: | ||
| raise MeilisearchCommunicationError(str(err)) from err | ||
| except requests.exceptions.HTTPError as err: | ||
| raise MeilisearchApiError(str(err), response) from err | ||
|
|
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
cd /tmp && find . -name "_httprequests.py" -o -name "*http*requests*" 2>/dev/null | head -20Repository: meilisearch/meilisearch-python
Length of output: 168
🏁 Script executed:
cd /tmp && git ls-files | grep -i httpRepository: meilisearch/meilisearch-python
Length of output: 237
🏁 Script executed:
cd /tmp && git ls-files | head -30Repository: meilisearch/meilisearch-python
Length of output: 237
🏁 Script executed:
find . -type f -name "_httprequests.py" 2>/dev/nullRepository: meilisearch/meilisearch-python
Length of output: 199
🏁 Script executed:
find . -type f -name "*.py" | grep -i http | head -20Repository: meilisearch/meilisearch-python
Length of output: 236
🏁 Script executed:
ls -laRepository: meilisearch/meilisearch-python
Length of output: 1413
🏁 Script executed:
git ls-files | grep -E "\.(py|js|ts)$" | head -30Repository: meilisearch/meilisearch-python
Length of output: 1078
🏁 Script executed:
wc -l ./meilisearch/_httprequests.pyRepository: meilisearch/meilisearch-python
Length of output: 203
🏁 Script executed:
ast-grep outline ./meilisearch/_httprequests.pyRepository: meilisearch/meilisearch-python
Length of output: 425
🏁 Script executed:
sed -n '1,270p' ./meilisearch/_httprequests.py | cat -nRepository: meilisearch/meilisearch-python
Length of output: 11223
Add InvalidSchema exception handler to get_stream for consistency.
The get_stream method at lines 242-247 lacks an InvalidSchema handler that exists in both send_request and post_stream. Malformed base URLs currently raise raw requests.exceptions.InvalidSchema instead of wrapping it in MeilisearchCommunicationError, breaking SDK-level exception behavior consistency.
Suggested fix
except requests.exceptions.Timeout as err:
raise MeilisearchTimeoutError(str(err)) from err
except requests.exceptions.ConnectionError as err:
raise MeilisearchCommunicationError(str(err)) from err
except requests.exceptions.HTTPError as err:
raise MeilisearchApiError(str(err), response) from err
+ except requests.exceptions.InvalidSchema as err:
+ if "://" not in self.config.url:
+ raise MeilisearchCommunicationError(
+ f"""
+ Invalid URL {self.config.url}, no scheme/protocol supplied.
+ Did you mean https://{self.config.url}?
+ """
+ ) from err
+
+ raise MeilisearchCommunicationError(str(err)) from err🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@meilisearch/_httprequests.py` around lines 242 - 248, Add an exception
handler for `requests.exceptions.InvalidSchema` to the `get_stream` method's
exception handling block (after the ConnectionError handler) to match the
pattern used in `send_request` and `post_stream`. The handler should catch
`requests.exceptions.InvalidSchema` and raise `MeilisearchCommunicationError`
wrapping the error message, maintaining consistency across all HTTP request
methods in the SDK.
There was a problem hiding this comment.
@Strift Great, glad we're aligned. Once you add the InvalidSchema handler to get_stream, exception handling will be consistent across send_request, post_stream, and get_stream. Let me know if you'd like a hand with the diff.
ദ്ദി(˵ •̀ ᴗ - ˵ ) ✧
| for line in payload.splitlines(): | ||
| for chunk in _CONCATENATED_JSON.split(line): | ||
| stripped = chunk.strip() |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use a JSON-aware concatenation parser instead of regex boundary splitting.
Lines 65-67 can mis-split valid payloads when a document string contains "}{" (e.g., {"text":"a}{b"}{"id":2}), causing false decode failures.
Suggested fix
except json.JSONDecodeError:
- documents: List[Dict[str, Any]] = []
- for line in payload.splitlines():
- for chunk in _CONCATENATED_JSON.split(line):
- stripped = chunk.strip()
- if stripped:
- documents.append(json.loads(stripped))
+ decoder = json.JSONDecoder()
+ documents: List[Dict[str, Any]] = []
+ idx = 0
+ while idx < len(payload):
+ while idx < len(payload) and payload[idx].isspace():
+ idx += 1
+ if idx >= len(payload):
+ break
+ document, idx = decoder.raw_decode(payload, idx)
+ documents.append(document)
return documents🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@meilisearch/_utils.py` around lines 65 - 67, The splitlines() loop that
processes the payload and splits on _CONCATENATED_JSON regex pattern incorrectly
handles valid JSON objects containing "}{" within string values. Replace the
naive regex-based splitting logic (the for loops that iterate through
payload.splitlines() and _CONCATENATED_JSON.split(line)) with a JSON-aware
parser that properly understands JSON structure and correctly identifies object
boundaries by tracking quote context and brace nesting, rather than using simple
string pattern matching.
Strift
left a comment
There was a problem hiding this comment.
Hello @tcferreira and thanks for this PR 🙌
Can you take a look at CodeRabbit feedback and address it or resolve it if you consider the feedback invalid?
Also, you will need to fix conflicts
Summary
Meilisearch v1.13 introduced
GET /tasks/{uid}/documentsto retrieve the documentsassociated with a task. This adds
Client.get_task_documents(uid)(and the underlyingTaskHandlermethod). Closes #1221.Changes
HttpRequests.get_stream— a streaming GET (mirrors the existingpost_stream),used to read the raw payload.
_utils.parse_task_documents— normalizes the payload into a list of documents.The endpoint can return a JSON array, a single JSON object, NDJSON, or several JSON
objects concatenated without a separator, so the parser handles all of those.
TaskHandler.get_task_documents/Client.get_task_documents— call the endpoint andreturn the parsed documents.
empty) and a request-shape test asserting the method hits
tasks/{uid}/documentsandparses the response.
get_task_documents_1code sample.Notes
getTaskDocumentsRoute), noted in thedocstrings.
meilisearch-jsSDK for the sameendpoint, for cross-SDK consistency.
Summary by CodeRabbit
New Features
Tests