replace invalid xml chars in element content in OptimizedForSpeedSaver #74

Open
aizu-m wants to merge 1 commit into apache:trunk from aizu-m:speed-saver-content-badchar

aizu-m commented Jul 3, 2026

Round-tripping an element whose text holds an invalid XML character through the speed saver fails on reparse:

<root>a?b</root>   (the ? is a raw U+0001)
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1) was found in the element content
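
The reparse failure can be seen without XMLBeans at all. The following is a minimal sketch that uses only the JDK's built-in parser; the class name `BadCharReparse` and the helper `parses` are hypothetical names for illustration, not part of this patch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not part of XMLBeans or of this PR.
public class BadCharReparse {

    /** Returns true if the JDK parser accepts the text as well-formed XML. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // Raised for a raw U+0001 in element content, among other errors.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O trouble is unrelated to the input.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // What the speed saver currently emits: a raw U+0001 in the content.
        System.out.println(parses("<root>a\u0001b</root>"));
        // What the default saver emits: the bad char replaced by '?'.
        System.out.println(parses("<root>a?b</root>"));
    }
}
```

The JDK parser may also print its own fatal-error line to stderr for the first input.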

Traced it to OptimizedForSpeedSaver.entitizeAndWriteText. It special-cases '<', '&' and the '>' that closes ']]>', but never calls isBadChar, so a C0 control other than tab, LF or CR (or U+FFFE/U+FFFF) in element content is written out untouched and the result does not reparse. The default TextSaver.entitizeContent substitutes '?' for those characters, and this saver's own comment and pi emitters already do the same; the content path was the only emitter that skipped the check.
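
For readers who do not have the saver source open, here is a rough standalone sketch of the behavior described above. It is not the XMLBeans code: the class `ContentEscapeSketch` is hypothetical, its `isBadChar` is an assumption written from the XML 1.0 `Char` production, and `entitizeContent` builds a String rather than writing to the saver's buffer.

```java
// Hypothetical standalone sketch; the real saver writes to a Writer.
public class ContentEscapeSketch {

    /**
     * Assumed check, based on the XML 1.0 Char production and applied per
     * UTF-16 code unit. Surrogates are let through so that supplementary
     * characters survive.
     */
    static boolean isBadChar(char ch) {
        return !(ch == 0x9 || ch == 0xA || ch == 0xD
                || (ch >= 0x20 && ch <= 0xD7FF)
                || (ch >= 0xE000 && ch <= 0xFFFD)
                || Character.isSurrogate(ch));
    }

    /** Escapes element content, replacing invalid chars with '?'. */
    static String entitizeContent(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (isBadChar(ch)) {
                ch = '?'; // swap before the switch, as the pi emitter does
            }
            switch (ch) {
                case '<':
                    out.append("&lt;");
                    break;
                case '&':
                    out.append("&amp;");
                    break;
                case '>':
                    // Only the '>' that closes "]]>" has to be escaped.
                    if (i >= 2 && text.charAt(i - 1) == ']'
                            && text.charAt(i - 2) == ']') {
                        out.append("&gt;");
                    } else {
                        out.append('>');
                    }
                    break;
                default:
                    out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(entitizeContent("a\u0001b"));
    }
}
```

Doing the substitution before the switch keeps the escaping cases untouched, since '?' never needs escaping itself.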

The path is reachable from save(Writer, XmlOptions) and xmlText with setSaveOptimizeForSpeed(true) when the content was set via XmlCursor.insertChars. The fix follows the sibling pi emitter: a bad char is swapped for '?' before the switch. Test added.
