fix: add early returns for md/json/xml/yaml in guess_file_type #3090

Open
totto wants to merge 1 commit into topoteretes:main from totto:fix/guess-file-type-missing-extensions

@totto totto commented Jun 18, 2026

Bug

guess_file_type returns text/plain for .md, .json, .xml, .yaml, and .yml files instead of their correct MIME types.

Root cause

filetype.guess() uses magic-byte signatures and returns None for text-based formats that have no binary header. The function already handled .txt, .text, and .csv with extension-based early returns, but left the rest to fall through:

file_type = filetype.guess(file)  # returns None for .json, .md, .yaml, .xml

if file_type is None:
    file_type = Type("text/plain", "txt")  # wrong for all of them

This means a JSON file ingested by cognee is tagged as text/plain with extension txt, which breaks downstream pipeline steps that branch on MIME type.
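
To illustrate why signature-based detection cannot recognize these formats, here is a minimal sketch of a magic-byte sniffer. It is not the filetype library's implementation: `MAGIC_SIGNATURES` and `sniff_magic` are hypothetical names, and the signature table is deliberately tiny. The point is only that JSON and Markdown content starts with arbitrary text, so there is no fixed byte prefix to match.

```python
from typing import Optional

# Hypothetical, heavily abbreviated signature table for illustration only.
# A real detector knows many more formats; this is not the filetype library's table.
MAGIC_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",
}


def sniff_magic(data: bytes) -> Optional[str]:
    """Return a MIME type if the data starts with a known signature, else None."""
    for signature, mime in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return None


png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
json_body = b'{"name": "cognee"}'
markdown_body = b"# Title\n\nSome text.\n"

print(sniff_magic(png_header))     # image/png
print(sniff_magic(json_body))      # None
print(sniff_magic(markdown_body))  # None
```

Any detector built this way has to fall back on something other than content, such as the file extension, for text-based formats.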

A secondary dead-code issue: once the fallback above has assigned text/plain, file_type can no longer be None, so the following check can never be True and was removed:

if file_type is None:  # unreachable
    raise FileTypeException(...)

Fix

Add extension-based early returns for the missing text formats, consistent with the existing .txt/.csv pattern:

| Extension | MIME type |
| --- | --- |
| .md, .markdown | text/markdown |
| .json | application/json |
| .xml | application/xml |
| .yaml, .yml | application/yaml |

Remove the unreachable raise FileTypeException branch.
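
The extension-based early return described above could be sketched roughly as follows. This is not the code in the PR: `TEXT_MIME_BY_EXTENSION`, `mime_from_extension`, and `guess_mime` are hypothetical names, the `sniffed` parameter stands in for whatever the magic-byte step returned, and case-insensitive matching is an assumption. Only the extension-to-MIME mapping is taken from the table.

```python
from pathlib import PurePath
from typing import Optional

# Mapping taken from the table in this PR description.
TEXT_MIME_BY_EXTENSION = {
    ".md": "text/markdown",
    ".markdown": "text/markdown",
    ".json": "application/json",
    ".xml": "application/xml",
    ".yaml": "application/yaml",
    ".yml": "application/yaml",
}


def mime_from_extension(filename: str) -> Optional[str]:
    """Look up a text format by file extension; None if the extension is unknown."""
    # Lowercasing the suffix is an assumption, so "NOTES.MD" matches ".md".
    suffix = PurePath(filename).suffix.lower()
    return TEXT_MIME_BY_EXTENSION.get(suffix)


def guess_mime(filename: str, sniffed: Optional[str] = None) -> str:
    """Prefer the extension for known text formats, then the sniffed type, then text/plain."""
    early = mime_from_extension(filename)
    if early is not None:
        return early  # early return: skip the text/plain fallback entirely
    return sniffed if sniffed is not None else "text/plain"


print(guess_mime("data.json"))                       # application/json
print(guess_mime("NOTES.MD"))                        # text/markdown
print(guess_mime("README"))                          # text/plain
print(guess_mime("photo.png", sniffed="image/png"))  # image/png
```

Because the final return always yields a string, no later None check is needed, which mirrors the reason the raise branch was unreachable.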

🤖 Generated with Claude Code

…type

filetype.guess() relies on magic bytes and returns None for text-based
formats (.md, .json, .xml, .yaml/.yml). Without extension-based early
returns these all fell through to the None fallback and were labeled
text/plain, discarding the actual format.

Also remove unreachable dead code: the second `if file_type is None`
check after an unconditional assignment can never be True.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@totto totto requested a review from Vasilije1990 as a code owner June 18, 2026 05:07