Windows: skill-creator scripts use cp1252 file I/O, crash (UnicodeDecodeError) on UTF-8 SKILL.md #1271

@artificialintelligently

Summary

On Windows, several skill-creator scripts read and write files using Python's default text encoding (typically cp1252 on Windows) instead of UTF-8. Any skill whose SKILL.md (or any eval/report/JSON file the scripts touch) contains a character outside the cp1252 set (arrows, many Unicode dashes/symbols, emoji, or other non-Latin-1 punctuation) can crash the script.

Repro

On Windows (Python 3.13), with a SKILL.md whose body contains a character outside cp1252 (in the repro file, byte 0x9d at position 4389):

python -m scripts.quick_validate path\to\skill
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4389: character maps to <undefined>

The crash originates here in scripts/quick_validate.py:

content = skill_md.read_text()   # no encoding -> cp1252 on Windows
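The failure mode can be reproduced on any OS by decoding UTF-8 bytes with cp1252 explicitly. This is a minimal sketch, not the repro file from this report: the sample strings and the helper name `decode_as_cp1252` are made up for illustration. It assumes the file on disk is UTF-8 and the locale encoding is cp1252.

```python
# Simulate an unqualified read_text() on a cp1252 locale by decoding
# UTF-8 bytes with the cp1252 codec explicitly (works on any platform).

def decode_as_cp1252(data: bytes) -> str:
    """Hypothetical helper: decode bytes the way a cp1252 default would."""
    return data.decode("cp1252")

# U+201D (right double quote) is E2 80 9D in UTF-8; cp1252 has no mapping for 0x9d.
utf8_bytes = "He said \u201chello\u201d".encode("utf-8")

try:
    decode_as_cp1252(utf8_bytes)
    crashed = False
except UnicodeDecodeError as exc:
    crashed = True
    print(exc)  # 'charmap' codec can't decode byte 0x9d ...

# Some characters do not raise on read; they decode into mojibake instead.
# An em dash (E2 80 94 in UTF-8) is one such case.
mojibake = decode_as_cp1252("a \u2014 b".encode("utf-8"))
print(repr(mojibake))
```

In this simulation only bytes that cp1252 leaves undefined (such as 0x9d) raise on read; other multi-byte sequences are silently misread, which is a separate reason to pass the encoding explicitly.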

Scope

This is not isolated to quick_validate. The same unencoded read_text() / write_text() / open() pattern appears at ~30 sites across scripts/ and eval-viewer/, several of which break core flows:

  • scripts/utils.py: parse_skill_md does read_text(); this underlies the whole eval/optimize loop, so run_eval / run_loop crash on any skill whose SKILL.md contains such characters.
  • scripts/run_eval.py: command_file.write_text(command_content) crashes when the skill description contains a non-ASCII character (em dashes are common in descriptions).
  • scripts/run_loop.py, scripts/generate_report.py, scripts/aggregate_benchmark.py, scripts/improve_description.py, eval-viewer/generate_review.py — HTML/JSON readers and writers with the same issue.
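One way to enumerate sites like the ones listed above is a small AST scan for `read_text()`, `write_text()`, and built-in `open()` calls that pass no `encoding` keyword. This is a rough sketch, not the method used to count the ~30 sites in this report; the function names are hypothetical, and it deliberately ignores positional encoding arguments and `Path.open()` calls.

```python
import ast
from pathlib import Path

TEXT_IO_METHODS = {"read_text", "write_text"}

def _is_binary_open(call: ast.Call) -> bool:
    """True when open() receives a literal mode string containing 'b'."""
    mode = call.args[1] if len(call.args) >= 2 else None
    for kw in call.keywords:
        if kw.arg == "mode":
            mode = kw.value
    return (
        isinstance(mode, ast.Constant)
        and isinstance(mode.value, str)
        and "b" in mode.value
    )

def find_unencoded_io(source: str) -> list[tuple[int, str]]:
    """Return (line, call name) for text I/O calls lacking encoding=..."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute) and func.attr in TEXT_IO_METHODS:
            name = func.attr
        elif isinstance(func, ast.Name) and func.id == "open":
            if _is_binary_open(node):
                continue  # binary mode takes no encoding
            name = "open"
        else:
            continue
        if any(kw.arg == "encoding" for kw in node.keywords):
            continue
        hits.append((node.lineno, name))
    return sorted(hits)

def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root; the files are read as UTF-8."""
    results = {}
    for path in Path(root).rglob("*.py"):
        hits = find_unencoded_io(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_tree("scripts")` from the repository root would list candidate lines per file. The output is only a starting point for review, since the scan can miss or over-report calls.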

Fix

Add encoding="utf-8" to every text-mode read/write:

content = skill_md.read_text(encoding="utf-8")
path.write_text(html, encoding="utf-8")
with open(metadata_path, encoding="utf-8") as f: ...

I applied exactly this to a local copy — all ~30 sites; py_compile is clean and quick_validate then passes on Windows with no environment workaround. (Interim workaround for other users: set PYTHONUTF8=1.)
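To check what an unqualified `open()` would use on a given machine, and to confirm that explicit UTF-8 round-trips non-cp1252 text, something like the following can help. It is a sketch under assumptions: the helper names and the sample string are invented here, and the temporary file merely stands in for a real SKILL.md.

```python
import locale
import sys
import tempfile
from pathlib import Path

def describe_default_encoding() -> dict:
    """Report whether UTF-8 mode is on and which encoding the locale prefers."""
    return {
        "utf8_mode": bool(sys.flags.utf8_mode),  # on when PYTHONUTF8=1 or -X utf8
        "preferred_encoding": locale.getpreferredencoding(False),
    }

def round_trip(text: str) -> str:
    """Write and re-read text with an explicit UTF-8 encoding."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "SKILL.md"
        path.write_text(text, encoding="utf-8")
        return path.read_text(encoding="utf-8")

# Arrow, em dash, curly quotes, and an emoji, written as escapes so the
# source file itself stays ASCII.
sample = "ask \u2192 run \u2014 \u201cquoted\u201d \U0001F600"
restored = round_trip(sample)
print(restored == sample)
print(describe_default_encoding())
```

With explicit `encoding="utf-8"` the round trip does not depend on the reported default, which is why the fix needs no environment workaround.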

Happy to open a PR with the change if that's useful.

Environment

  • Windows 10, Python 3.13.2
