fix: decode U+10FFFF instead of replacing it with U+FFFD by greymoth-jp · Pull Request #103 · mdevils/html-entities

greymoth-jp · 2026-06-30T10:30:00Z

decode replaces the numeric reference &#1114111; (and &#x10FFFF;) with U+FFFD, but U+10FFFF is a valid Unicode scalar value and encode emits it just fine:

```ts
import { encode, decode } from 'html-entities';

const s = String.fromCodePoint(0x10ffff);
encode(s, { mode: 'nonAscii' }); // "&#1114111;"
decode('&#1114111;');            // "\uFFFD"  (expected "\u{10FFFF}")
```

So a string containing the highest code point does not survive a round trip through encode/decode, even though String.fromCodePoint(0x10ffff) is valid in JS (only 0x110000 and above throw).
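The JavaScript side of that boundary can be checked without html-entities at all. This is a small sketch; fromCodePointThrows is a hypothetical helper written for illustration, not part of the library. It shows that U+10FFFF is accepted (stored as a surrogate pair) and that 0x110000 is rejected with a RangeError.

```ts
// Hypothetical helper: true when String.fromCodePoint rejects the value with a RangeError.
function fromCodePointThrows(cp: number): boolean {
  try {
    String.fromCodePoint(cp);
    return false;
  } catch (e) {
    return e instanceof RangeError;
  }
}

// U+10FFFF is the highest Unicode code point; JS stores it as a surrogate pair.
const maxChar = String.fromCodePoint(0x10ffff);

console.log(maxChar.length);                // 2
console.log(maxChar === "\u{10FFFF}");      // true
console.log(fromCodePointThrows(0x10ffff)); // false
console.log(fromCodePointThrows(0x110000)); // true
```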

The cause is an off-by-one in the bounds check inside getDecodedEntity:

```ts
decodeCode >= 0x10ffff ? outOfBoundsChar : ...
```

The WHATWG numeric character reference rules only substitute U+FFFD when the referenced value is greater than 0x10FFFF, so the comparison should be >. U+10FFFE and every code point below it already decode correctly; only the maximum was being caught by the >=.
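To make the off-by-one concrete, here is a sketch of the two comparisons applied at the boundary. decodeCodeBefore and decodeCodeAfter are hypothetical stand-ins for the ternary quoted above, not the actual getDecodedEntity code, and they deliberately ignore the other WHATWG substitutions (U+0000, surrogates, and the C1 remapping table) so that only the bounds check is visible.

```ts
const REPLACEMENT_CHAR = "\uFFFD";

// Hypothetical stand-in for the check before the fix:
// ">=" also sends U+10FFFF itself to U+FFFD.
function decodeCodeBefore(code: number): string {
  return code >= 0x10ffff ? REPLACEMENT_CHAR : String.fromCodePoint(code);
}

// Hypothetical stand-in for the check after the fix:
// only values past the Unicode range are replaced.
function decodeCodeAfter(code: number): string {
  return code > 0x10ffff ? REPLACEMENT_CHAR : String.fromCodePoint(code);
}

// Print: code (hex), replaced before the fix?, replaced after the fix?
for (const code of [0x10fffe, 0x10ffff, 0x110000]) {
  console.log(
    code.toString(16),
    decodeCodeBefore(code) === REPLACEMENT_CHAR,
    decodeCodeAfter(code) === REPLACEMENT_CHAR,
  );
}
// 10fffe false false
// 10ffff true false
// 110000 true true
```

Only the 0x10ffff row changes between the two versions, which matches the claim that every lower code point was already unaffected.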

This changes the check to > and adds a test covering &#1114111;, &#x10FFFF;, the encode/decode round trip, and that &#1114112; (past the Unicode range) still maps to U+FFFD.
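The test cases described in the previous paragraph can be mirrored against a self-contained stand-in. This is only a sketch, not html-entities' encode/decode or the PR's actual test: encodeNumeric and decodeNumeric are hypothetical, handle numeric references only (no named entities), and apply just the > 0x10FFFF rule. The encoder also escapes "&" so that the round trip stays faithful for arbitrary input.

```ts
// Hypothetical minimal encoder: escapes "&" and every non-ASCII code point
// as a decimal numeric reference; other ASCII passes through unchanged.
function encodeNumeric(text: string): string {
  let out = "";
  for (const ch of text) { // iterates by code point, not UTF-16 code unit
    const cp = ch.codePointAt(0)!;
    out += cp > 0x7f || ch === "&" ? `&#${cp};` : ch;
  }
  return out;
}

// Hypothetical minimal decoder: handles decimal (&#N;) and hex (&#xH;)
// references, replacing values above U+10FFFF with U+FFFD.
function decodeNumeric(text: string): string {
  return text.replace(
    /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g,
    (_match: string, hex: string | undefined, dec: string | undefined) => {
      const code = hex !== undefined ? parseInt(hex, 16) : parseInt(dec!, 10);
      return code > 0x10ffff ? "\uFFFD" : String.fromCodePoint(code);
    },
  );
}

const max = String.fromCodePoint(0x10ffff);
console.log(encodeNumeric(max));                        // &#1114111;
console.log(decodeNumeric("&#1114111;") === max);       // true
console.log(decodeNumeric("&#x10FFFF;") === max);       // true
console.log(decodeNumeric(encodeNumeric(max)) === max); // true
console.log(decodeNumeric("&#1114112;") === "\uFFFD");  // true
```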

fix: decode U+10FFFF instead of replacing it with U+FFFD

23ffaa0

greymoth-jp wants to merge 1 commit into mdevils:main from greymoth-jp:fix-decode-max-codepoint

greymoth-jp commented Jun 30, 2026
