feat: add get_task_documents to retrieve a task's documents by tcferreira · Pull Request #1252 · meilisearch/meilisearch-python

tcferreira · 2026-06-23T12:01:15Z

Summary

Meilisearch v1.13 introduced GET /tasks/{uid}/documents to retrieve the documents
associated with a task. This adds Client.get_task_documents(uid) (and the underlying
TaskHandler method). Closes #1221.

Changes

- `HttpRequests.get_stream`: a streaming GET (mirrors the existing `post_stream`), used to read the raw payload.
- `_utils.parse_task_documents`: normalizes the payload into a list of documents. The endpoint can return a JSON array, a single JSON object, NDJSON, or several JSON objects concatenated without a separator, so the parser handles all of those.
- `TaskHandler.get_task_documents` / `Client.get_task_documents`: call the endpoint and return the parsed documents.
- Tests: parametrized unit tests for the parser (array / object / NDJSON / concatenated / empty) and a request-shape test asserting that the method hits `tasks/{uid}/documents` and parses the response.
- A `get_task_documents_1` code sample.
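The normalization described in the list above can be sketched with the standard library's `json.JSONDecoder.raw_decode`, which decodes one JSON value starting at a given index and returns the index where it stopped. This is an illustrative sketch under that assumption, not the PR's code (the PR splits lines and then splits on a `_CONCATENATED_JSON` regex, as the review thread shows); only the function name mirrors `_utils.parse_task_documents`.

```python
import json
from typing import Any, Dict, List


def parse_task_documents(payload: str) -> List[Dict[str, Any]]:
    """Normalize a task-documents payload into a list of documents.

    Sketch only: handles a JSON array, a single JSON object, NDJSON, and
    objects concatenated with no separator; blank input yields [].
    """
    if not payload.strip():
        return []

    decoder = json.JSONDecoder()
    documents: List[Dict[str, Any]] = []
    idx, end = 0, len(payload)
    while idx < end:
        # raw_decode() does not skip leading whitespace, so skip it here;
        # this also consumes the newlines that separate NDJSON records.
        while idx < end and payload[idx].isspace():
            idx += 1
        if idx >= end:
            break
        value, idx = decoder.raw_decode(payload, idx)
        # A top-level array contributes its elements; an object is one document.
        if isinstance(value, list):
            documents.extend(value)
        else:
            documents.append(value)
    return documents


print(parse_task_documents('{"id": 1}{"id": 2}'))  # [{'id': 1}, {'id': 2}]
```

Because values are decoded one at a time, a `}{` inside a string value is never mistaken for an object boundary, and a single loop covers all five cases the PR's parametrized tests list.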

Notes

- This is an experimental Meilisearch feature (`getTaskDocumentsRoute`), noted in the docstrings.
- The parser mirrors the behavior of the official meilisearch-js SDK for the same endpoint, for cross-SDK consistency.

Summary by CodeRabbit

New Features
- Added ability to retrieve documents associated with a specific task (experimental feature).
- Documents can be accessed via the task ID to view what was added or updated.
Tests
- Added test coverage for task document retrieval and payload parsing.

Meilisearch v1.13 added `GET /tasks/{uid}/documents` to fetch the documents associated with a task. Add `Client.get_task_documents` (and the underlying `TaskHandler` method), backed by a streaming GET (`HttpRequests.get_stream`) and a parser that normalizes the JSON array, single-object, NDJSON, or concatenated-JSON payloads the endpoint can return. Adds unit tests for the parser, a request-shape test for the method, and a `get_task_documents_1` code sample. Closes meilisearch#1221

coderabbitai · 2026-06-23T12:01:35Z

📝 Walkthrough

Adds experimental support for the GET /tasks/{uid}/documents Meilisearch endpoint. A streaming HTTP method get_stream is added to HttpRequests. A new parse_task_documents utility normalizes multiple JSON response shapes. TaskHandler and Client expose get_task_documents(uid), backed by the streaming transport and parser. Tests and a YAML code sample are included.

Changes

get_task_documents feature

Layer / File(s)	Summary
Streaming transport and document parsing utility (`meilisearch/_httprequests.py`, `meilisearch/_utils.py`)	`HttpRequests.get_stream` issues a `stream=True` GET and maps `Timeout`, `ConnectionError`, and `HTTPError` to library-specific errors. `parse_task_documents` normalizes task-document payloads across JSON array, single-object, NDJSON, and concatenated-object formats, returning an empty list for blank input.
TaskHandler and Client public methods (`meilisearch/task.py`, `meilisearch/client.py`, `.code-samples.meilisearch.yaml`)	`TaskHandler.get_task_documents` calls `http.get_stream("tasks/{uid}/documents")` and feeds `response.text` through `parse_task_documents`. `Client.get_task_documents` delegates to `TaskHandler` with an experimental-feature docstring. The `get_task_documents_1` YAML code sample is inserted after `get_task_1`.
Unit tests (`tests/test_utils.py`, `tests/client/test_client_task_meilisearch.py`)	Parametrized tests cover all `parse_task_documents` input shapes (array, single object, NDJSON, concatenated, empty). A mocked client test asserts the correct endpoint path and parsed output from a streamed response.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Client
    participant TaskHandler
    participant HttpRequests
    participant parse_task_documents

    Caller->>Client: get_task_documents(uid)
    Client->>TaskHandler: get_task_documents(uid)
    TaskHandler->>HttpRequests: get_stream("tasks/{uid}/documents")
    HttpRequests->>HttpRequests: requests.get(url, stream=True)
    HttpRequests-->>TaskHandler: Response object
    TaskHandler->>parse_task_documents: response.text
    parse_task_documents-->>TaskHandler: List[Dict[str, Any]]
    TaskHandler-->>Client: List[Dict[str, Any]]
    Client-->>Caller: List[Dict[str, Any]]
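The call flow in the diagram can be reproduced with two hypothetical stand-in classes, `ClientSketch` and `TaskHandlerSketch`, and a `unittest.mock.Mock` in place of `HttpRequests`. This is roughly the shape of the mocked request-shape test the PR describes, not the SDK's actual code; the demo parser (`parse_ndjson`, also hypothetical) handles NDJSON only.

```python
import json
from typing import Any, Callable, Dict, List
from unittest.mock import Mock

Documents = List[Dict[str, Any]]


class TaskHandlerSketch:
    """Hypothetical stand-in for TaskHandler, reduced to get_task_documents."""

    def __init__(self, http: Any, parse: Callable[[str], Documents]) -> None:
        self.http = http    # anything exposing get_stream(path) -> response
        self.parse = parse  # e.g. a parse_task_documents-style function

    def get_task_documents(self, uid: int) -> Documents:
        # Path is relative to the base URL, as in the sequence diagram.
        response = self.http.get_stream(f"tasks/{uid}/documents")
        return self.parse(response.text)


class ClientSketch:
    """Hypothetical stand-in for Client: it only delegates."""

    def __init__(self, task_handler: TaskHandlerSketch) -> None:
        self.task_handler = task_handler

    def get_task_documents(self, uid: int) -> Documents:
        return self.task_handler.get_task_documents(uid)


def parse_ndjson(text: str) -> Documents:
    """Simplified demo parser: one JSON object per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Request-shape check with a mocked transport (no server needed).
http = Mock()
http.get_stream.return_value.text = '{"id": 1}\n{"id": 2}\n'
client = ClientSketch(TaskHandlerSketch(http, parse_ndjson))
documents = client.get_task_documents(42)

http.get_stream.assert_called_once_with("tasks/42/documents")
print(documents)  # [{'id': 1}, {'id': 2}]
```

Injecting the transport and the parser keeps the handler testable without a running Meilisearch instance.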

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐇 Hop hop, what's this delight?
A stream of docs, JSON bright!
We parse each shape — array or lone,
NDJSON lines, concatenated stone.
get_task_documents now in sight,
The rabbit ships new endpoints right! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33%, which is below the required threshold of 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely summarizes the main feature added: a get_task_documents method to retrieve a task's documents.
Linked Issues check	✅ Passed	All requirements from issue `#1221` are met: API method added to retrieve task documents [`#1221`], test cases included for the new method [`#1221`], and code sample added under get_task_documents_1 key [`#1221`].
Out of Scope Changes check	✅ Passed	All changes directly support the implementation of get_task_documents functionality as specified in issue `#1221`; no unrelated or out-of-scope changes detected.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@meilisearch/_httprequests.py`:
- Around lines 242-248: Add an exception handler for
`requests.exceptions.InvalidSchema` to the `get_stream` method's exception
handling block (after the ConnectionError handler) to match the pattern used in
`send_request` and `post_stream`. The handler should catch
`requests.exceptions.InvalidSchema` and raise `MeilisearchCommunicationError`
wrapping the error message, maintaining consistency across all HTTP request
methods in the SDK.

In `@meilisearch/_utils.py`:
- Around line 65-67: The splitlines() loop that processes the payload and splits
on _CONCATENATED_JSON regex pattern incorrectly handles valid JSON objects
containing "}{" within string values. Replace the naive regex-based splitting
logic (the for loops that iterate through payload.splitlines() and
_CONCATENATED_JSON.split(line)) with a JSON-aware parser that properly
understands JSON structure and correctly identifies object boundaries by
tracking quote context and brace nesting, rather than using simple string
pattern matching.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8e377971-df13-4605-b8a2-a5528e9b29db

📥 Commits

Reviewing files that changed from the base of the PR and between 1464823 and 4287bce.

📒 Files selected for processing (7)

.code-samples.meilisearch.yaml
meilisearch/_httprequests.py
meilisearch/_utils.py
meilisearch/client.py
meilisearch/task.py
tests/client/test_client_task_meilisearch.py
tests/test_utils.py

coderabbitai · 2026-06-23T12:06:43Z

+        except requests.exceptions.Timeout as err:
+            raise MeilisearchTimeoutError(str(err)) from err
+        except requests.exceptions.ConnectionError as err:
+            raise MeilisearchCommunicationError(str(err)) from err
+        except requests.exceptions.HTTPError as err:
+            raise MeilisearchApiError(str(err), response) from err
+


🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "_httprequests.py" -o -name "*http*requests*" 2>/dev/null | head -20

Repository: meilisearch/meilisearch-python

Length of output: 168

🏁 Script executed:

cd /tmp && git ls-files | grep -i http

Repository: meilisearch/meilisearch-python

Length of output: 237

🏁 Script executed:

cd /tmp && git ls-files | head -30

Repository: meilisearch/meilisearch-python

Length of output: 237

🏁 Script executed:

find . -type f -name "_httprequests.py" 2>/dev/null

Repository: meilisearch/meilisearch-python

Length of output: 199

🏁 Script executed:

find . -type f -name "*.py" | grep -i http | head -20

Repository: meilisearch/meilisearch-python

Length of output: 236

🏁 Script executed:

ls -la

Repository: meilisearch/meilisearch-python

Length of output: 1413

🏁 Script executed:

git ls-files | grep -E "\.(py|js|ts)$" | head -30

Repository: meilisearch/meilisearch-python

Length of output: 1078

🏁 Script executed:

wc -l ./meilisearch/_httprequests.py

Repository: meilisearch/meilisearch-python

Length of output: 203

🏁 Script executed:

ast-grep outline ./meilisearch/_httprequests.py

Repository: meilisearch/meilisearch-python

Length of output: 425

🏁 Script executed:

sed -n '1,270p' ./meilisearch/_httprequests.py | cat -n

Repository: meilisearch/meilisearch-python

Length of output: 11223

Add InvalidSchema exception handler to get_stream for consistency.

The get_stream method at lines 242-247 lacks an InvalidSchema handler that exists in both send_request and post_stream. Malformed base URLs currently raise raw requests.exceptions.InvalidSchema instead of wrapping it in MeilisearchCommunicationError, breaking SDK-level exception behavior consistency.

Suggested fix

         except requests.exceptions.Timeout as err:
             raise MeilisearchTimeoutError(str(err)) from err
         except requests.exceptions.ConnectionError as err:
             raise MeilisearchCommunicationError(str(err)) from err
         except requests.exceptions.HTTPError as err:
             raise MeilisearchApiError(str(err), response) from err
+        except requests.exceptions.InvalidSchema as err:
+            if "://" not in self.config.url:
+                raise MeilisearchCommunicationError(
+                    f"""
+                    Invalid URL {self.config.url}, no scheme/protocol supplied.
+                    Did you mean https://{self.config.url}?
+                    """
+                ) from err
+
+            raise MeilisearchCommunicationError(str(err)) from err
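The suggested handler's URL check could be shared by `send_request`, `post_stream`, and `get_stream` through a small helper. The sketch below assumes a hypothetical `invalid_schema_error` function and defines a stand-in `MeilisearchCommunicationError` so it runs on its own; it also builds a single-line message, since the triple-quoted f-string above embeds newlines and indentation in the error text.

```python
class MeilisearchCommunicationError(Exception):
    """Stand-in for the SDK exception of the same name (sketch only)."""


def invalid_schema_error(base_url: str, detail: str) -> MeilisearchCommunicationError:
    """Build the error the suggested InvalidSchema handler raises.

    Hypothetical helper, so that all three HTTP methods share one code path.
    """
    if "://" not in base_url:
        # No scheme at all: suggest the https:// form, as the suggested fix does.
        return MeilisearchCommunicationError(
            f"Invalid URL {base_url}, no scheme/protocol supplied. "
            f"Did you mean https://{base_url}?"
        )
    # A scheme is present but unsupported: pass the original message through.
    return MeilisearchCommunicationError(detail)


# Intended use inside get_stream (requires requests, so shown as a comment):
#     except requests.exceptions.InvalidSchema as err:
#         raise invalid_schema_error(self.config.url, str(err)) from err

print(invalid_schema_error("localhost:7700", "No connection adapters were found"))
# Invalid URL localhost:7700, no scheme/protocol supplied. Did you mean https://localhost:7700?
```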

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@meilisearch/_httprequests.py` around lines 242 - 248, Add an exception handler for `requests.exceptions.InvalidSchema` to the `get_stream` method's exception handling block (after the ConnectionError handler) to match the pattern used in `send_request` and `post_stream`. The handler should catch `requests.exceptions.InvalidSchema` and raise `MeilisearchCommunicationError` wrapping the error message, maintaining consistency across all HTTP request methods in the SDK.

@Strift Great, glad we're aligned. Once you add the InvalidSchema handler to get_stream, exception handling will be consistent across send_request, post_stream, and get_stream. Let me know if you'd like a hand with the diff.

ദ്ദി(˵ •̀ ᴗ - ˵ ) ✧

coderabbitai · 2026-06-23T12:06:43Z

+        for line in payload.splitlines():
+            for chunk in _CONCATENATED_JSON.split(line):
+                stripped = chunk.strip()


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Use a JSON-aware concatenation parser instead of regex boundary splitting.

Lines 65-67 can mis-split valid payloads when a document string contains "}{" (e.g., {"text":"a}{b"}{"id":2}), causing false decode failures.

Suggested fix

     except json.JSONDecodeError:
-        documents: List[Dict[str, Any]] = []
-        for line in payload.splitlines():
-            for chunk in _CONCATENATED_JSON.split(line):
-                stripped = chunk.strip()
-                if stripped:
-                    documents.append(json.loads(stripped))
+        decoder = json.JSONDecoder()
+        documents: List[Dict[str, Any]] = []
+        idx = 0
+        while idx < len(payload):
+            while idx < len(payload) and payload[idx].isspace():
+                idx += 1
+            if idx >= len(payload):
+                break
+            document, idx = decoder.raw_decode(payload, idx)
+            documents.append(document)
         return documents
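The failure mode can be reproduced with the standard library alone. The PR's `_CONCATENATED_JSON` pattern is not shown in this thread, so the sketch assumes a hypothetical pattern that splits wherever `}` is followed (after optional whitespace) by `{`; the payload is the example from the comment above.

```python
import json
import re

# Hypothetical stand-in for _CONCATENATED_JSON (the real pattern is not shown):
# a split point between "}" and a following "{".
CONCATENATED_JSON = re.compile(r"(?<=\})\s*(?=\{)")

payload = '{"text":"a}{b"}{"id":2}'

# 1) Regex splitting cuts through the string value "a}{b".
chunks = CONCATENATED_JSON.split(payload)
print(chunks)  # ['{"text":"a}', '{b"}', '{"id":2}']
try:
    json.loads(chunks[0])
except json.JSONDecodeError as err:
    print("regex split produced invalid JSON:", err.msg)

# 2) raw_decode() follows JSON syntax, so it stops only at real value boundaries.
#    This payload has no whitespace between objects, so no skipping is needed.
decoder = json.JSONDecoder()
documents, idx = [], 0
while idx < len(payload):
    document, idx = decoder.raw_decode(payload, idx)
    documents.append(document)
print(documents)  # [{'text': 'a}{b'}, {'id': 2}]
```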

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@meilisearch/_utils.py` around lines 65 - 67, The splitlines() loop that processes the payload and splits on _CONCATENATED_JSON regex pattern incorrectly handles valid JSON objects containing "}{" within string values. Replace the naive regex-based splitting logic (the for loops that iterate through payload.splitlines() and _CONCATENATED_JSON.split(line)) with a JSON-aware parser that properly understands JSON structure and correctly identifies object boundaries by tracking quote context and brace nesting, rather than using simple string pattern matching.

Strift

Hello @tcferreira and thanks for this PR 🙌

Can you take a look at the CodeRabbit feedback and either address it or resolve it if you consider it invalid?

Also, you will need to fix the merge conflicts.

coderabbitai Bot reviewed Jun 23, 2026

Strift added the enhancement New feature or request label Jul 1, 2026

Strift requested changes Jul 1, 2026

tcferreira wants to merge 1 commit into meilisearch:main from tcferreira:feat/get-task-documents
