claude-arabic-docs — make MS Word actually render Arabic (and Hebrew/Persian/Urdu) RTL documents correctly #1165
muhmoosa started this conversation in Show and tell
Hi all 👋
Repo: https://github.com/muhmoosa/claude-arabic-docs
Release / .skill download: https://github.com/muhmoosa/claude-arabic-docs/releases/tag/v1.0.0
License: MIT
The problem this skill exists to fix
You ask Claude to produce an Arabic `.docx`. It builds the file with `docx-js` / `python-docx` / `openpyxl`, marks every paragraph RTL, every run with `<w:rtl/>`, every table with `<w:bidiVisual/>`. LibreOffice and the PDF preview look perfect. You ship it to the client. They open it in Microsoft Word, and every heading and body paragraph is left-aligned. Tables are fine; only the body is broken.
This class of bug traps every team I've watched try to generate Arabic Word documents with Claude, and it traps Hebrew, Persian, and Urdu users for the same reason. After about two days of bisecting the OOXML, using a real user as a diff oracle, I found that MS Word needs six XML layers set. The most critical one (`<w:themeFontLang w:bidi="ar-SA"/>` in `settings.xml`) is never written by any docx generation library: it only gets populated when a live Word session saves the file, sourced from the OS keyboard layout. So every programmatically generated Arabic docx starts off broken in Word.
What the skill does
It teaches Claude to:
- Use logical (`start`/`end`) alignment, not physical (`left`/`right`), because Word for Mac reinterprets `right` as logical end (= visually left) in RTL paragraphs. This alone fixes the last-mile rendering once the other layers are in place.
- Run the `scripts/harden_rtl.py` post-processor, which injects all six layers:
  - `settings.xml` → `<w:themeFontLang w:bidi="ar-SA"/>` (the master switch)
  - `styles.xml` docDefaults → `<w:lang w:bidi="ar-SA"/>` + `<w:pPr><w:bidi/><w:jc w:val="start"/></w:pPr>`
  - `document.xml` → `<w:bidi/>` in `<w:sectPr>` (schema-correct position)
  - every `<w:tbl>` → `<w:bidiVisual/>`
  - paragraphs and runs → `<w:bidi/>` and `<w:rtl/>`
- Use `scripts/arabic_numerals.py` to convert Western digits and Latin punctuation to Arabic conventions (٠–٩, ٬, ٪, ،, ؛, ؟), while protecting IBANs, emails, URLs, and license codes via a regex heuristic so they stay Latin.

The hardening script is idempotent: running it twice is a no-op on the second pass.
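To make the "master switch" concrete, here is a minimal Python sketch of the `settings.xml` layer alone. It is not the repo's `harden_rtl.py`: the names `set_theme_font_lang` and `harden_settings` are hypothetical, it edits the XML as raw text so namespace prefixes are never re-serialized, it assumes the package already contains `word/settings.xml`, and the list of elements that must follow `w:themeFontLang` is an assumption based on the ECMA-376 `CT_Settings` sequence, so verify it against the spec. Like the hardening script described above, it is idempotent.

```python
import re
import zipfile

DEFAULT_BIDI_LOCALE = "ar-SA"

# Assumed from the ECMA-376 CT_Settings sequence: elements that must come
# AFTER w:themeFontLang. A new element is inserted before the first of these.
_SUCCESSORS = (
    "clrSchemeMapping", "doNotIncludeSubdocsInStats", "doNotAutoCompressPictures",
    "forceUpgrade", "captions", "readModeInkLockDown", "smartTagType",
    "shapeDefaults", "doNotEmbedSmartTags", "decimalSymbol", "listSeparator",
)
_BIDI_ATTR = re.compile(r'\bw:bidi="[^"]*"')


def set_theme_font_lang(settings_xml: str, locale: str = DEFAULT_BIDI_LOCALE) -> str:
    """Return settings.xml text whose w:themeFontLang carries w:bidi=locale.

    Works on the raw text so namespace prefixes are left untouched.
    Idempotent: calling it again with the same locale changes nothing.
    """
    tag = re.search(r"<w:themeFontLang\b[^>]*?/?>", settings_xml)
    if tag:
        element = tag.group(0)
        if _BIDI_ATTR.search(element):
            # Overwrite an existing (possibly different) bidi locale.
            new_element = _BIDI_ATTR.sub(f'w:bidi="{locale}"', element)
        else:
            # Append the attribute just before the closing "/>" or ">".
            new_element = re.sub(r"\s*(/?>)$", f' w:bidi="{locale}"\\1', element)
        return settings_xml[: tag.start()] + new_element + settings_xml[tag.end():]

    # No themeFontLang yet: insert one at its (assumed) schema position.
    new_element = f'<w:themeFontLang w:bidi="{locale}"/>'
    successor = re.search(r"<w:(?:%s)\b" % "|".join(_SUCCESSORS), settings_xml)
    pos = successor.start() if successor else settings_xml.rindex("</w:settings>")
    return settings_xml[:pos] + new_element + settings_xml[pos:]


def harden_settings(src_path: str, dst_path: str, locale: str = DEFAULT_BIDI_LOCALE) -> None:
    """Copy a .docx to dst_path, rewriting only word/settings.xml.

    Assumes the package already contains word/settings.xml.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(
        dst_path, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/settings.xml":
                text = set_theme_font_lang(data.decode("utf-8"), locale)
                data = text.encode("utf-8")
            dst.writestr(item, data)
```

For example, `harden_settings("report.docx", "report_rtl.docx")` would write a copy whose `settings.xml` carries `w:bidi="ar-SA"`. The other five layers would need similar passes over `styles.xml` and `document.xml`.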
Supported locales
Default is `ar-SA`, but the `--locale` flag accepts `ar-EG`, `ar-AE`, `ar-MA`, `he-IL`, `fa-IR`, `ur-PK`. Other RTL scripts (Syriac, Thaana, NKo, Mandaic, Samaritan) get paragraph/run flags via Unicode-range detection.
What I learned along the way
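A rough sketch of what Unicode-range detection can look like, in Python. The block table below is an assumption built from the standard Unicode block ranges for these scripts, not copied from the repo, and `rtl_script` / `needs_rtl_flags` are hypothetical names. It flags any text containing at least one character from these blocks; the skill's actual rule may differ.

```python
from typing import Optional

# Hypothetical table of RTL Unicode blocks: (first code point, last, script).
RTL_RANGES = (
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0700, 0x074F, "Syriac"),
    (0x0750, 0x077F, "Arabic"),     # Arabic Supplement
    (0x0780, 0x07BF, "Thaana"),
    (0x07C0, 0x07FF, "NKo"),
    (0x0800, 0x083F, "Samaritan"),
    (0x0840, 0x085F, "Mandaic"),
    (0x08A0, 0x08FF, "Arabic"),     # Arabic Extended-A
    (0xFB1D, 0xFB4F, "Hebrew"),     # Hebrew presentation forms
    (0xFB50, 0xFDFF, "Arabic"),     # Arabic Presentation Forms-A
    (0xFE70, 0xFEFC, "Arabic"),     # Presentation Forms-B, stopping before U+FEFF (BOM)
)


def rtl_script(text: str) -> Optional[str]:
    """Return the script of the first character in a known RTL block, else None."""
    for ch in text:
        cp = ord(ch)
        for start, end, name in RTL_RANGES:
            if start <= cp <= end:
                return name
    return None


def needs_rtl_flags(text: str) -> bool:
    """True if a paragraph/run holding `text` should get <w:bidi/> / <w:rtl/>."""
    return rtl_script(text) is not None
```

Persian and Urdu text lands in the Arabic blocks, so detection alone cannot choose between `ar-*`, `fa-IR`, and `ur-PK` for the `w:bidi` locale; that still has to come from something like the `--locale` flag.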
The repo's `CHANGELOG.md` documents the six iterations: each layer was discovered by a real user manually fixing a docx in Word and me diffing their saved file against my programmatically generated one. If anyone is fighting the same bugs, the discovery history is more useful than the rule list.
The full rule catalog with rationale is in `SKILL.md`. The README has full install instructions in English and Arabic.
Happy to chat about edge cases, accept PRs for additional locales, or discuss why MS Word does any of this.

Arabic summary
A Claude skill that makes Microsoft Word display Arabic (and Hebrew, Persian, and Urdu) documents with correct right alignment, even when they are generated programmatically. It fixes a class of RTL rendering bugs that no .docx generation library handles by default.
MS Word needs six layers of XML settings inside the file. The most important is `<w:themeFontLang w:bidi="ar-SA"/>` in `settings.xml`, which no generation library writes. The skill injects all six automatically. Open source (MIT).
Repository: https://github.com/muhmoosa/claude-arabic-docs