Why do String.fromCharCode(0xd800) through String.fromCharCode(0xdfff) return the replacement character?

Why does this happen:

> String.fromCharCode(0xd7FF)
'퟿'
> String.fromCharCode(0xd800)
'�'
> String.fromCharCode(0xdffe) // (and everything in between)
'�'
> String.fromCharCode(0xdfff)
'�'
> String.fromCharCode(0xe000)
''

D800₁₆ is 55296₁₀ and DFFF₁₆ is 57343₁₀. I get the same results with String.fromCodePoint().


1 Answer

Code points U+D800 to U+DFFF are reserved for surrogates, which exist only to serve the UTF-16 encoding. Effectively, these are values that are never valid individually: they always come in surrogate pairs, a high surrogate followed by a low surrogate. (Confusingly, the "high surrogate" range, U+D800 to U+DBFF, has the numerically lower values, and the "low surrogate" range, U+DC00 to U+DFFF, has the higher ones.)

This pair of characters is combined in UTF-16 to represent a single character outside the Basic Multilingual Plane.
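
As a rough illustration of how a pair combines, here is a minimal sketch. The helper name combineSurrogates is hypothetical (it is not a built-in); the arithmetic is the standard UTF-16 decoding formula, and U+1F600 is just an arbitrary example of a character outside the Basic Multilingual Plane.

```javascript
// Hypothetical helper (not a built-in): decode one surrogate pair
// into the code point it represents.
function combineSurrogates(high, low) {
  // Each surrogate carries 10 bits; the result is offset by 0x10000.
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

// One code point outside the BMP is stored as two UTF-16 code units.
const smiley = String.fromCodePoint(0x1F600);
const high = smiley.charCodeAt(0); // 0xD83D, a high surrogate
const low = smiley.charCodeAt(1);  // 0xDE00, a low surrogate
const combined = combineSurrogates(high, low);

console.log(smiley.length);         // 2
console.log(combined.toString(16)); // 1f600
```

Going the other way, String.fromCharCode(high, low) should give back the same two-unit string, because the two halves are only meaningful together.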

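Outside this special meaning in UTF-16, these aren't valid characters. The string returned by String.fromCharCode still contains the lone surrogate code unit, but on its own that isn't valid string data, so it's reasonable for whatever displays or encodes it to basically say "you haven't provided valid string data" and show the Unicode replacement character (U+FFFD) instead.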
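
A quick way to see where the replacement character comes from is to inspect the string directly rather than its printed form. This is only a sketch, and it assumes an environment with a global TextEncoder (modern browsers and recent Node.js versions); the variable names are illustrative.

```javascript
// Build a string from a lone high surrogate.
const lone = String.fromCharCode(0xD800);

console.log(lone.length);                     // 1
console.log(lone.charCodeAt(0).toString(16)); // d800 (the surrogate is still stored)
console.log(lone === "\uFFFD");               // false (not literally U+FFFD)

// UTF-8 cannot represent a lone surrogate, so TextEncoder
// substitutes U+FFFD, whose UTF-8 bytes are EF BF BD.
const bytes = Array.from(new TextEncoder().encode(lone));
console.log(bytes.map((b) => b.toString(16)).join(" ")); // ef bf bd
```

In other words, the substitution appears to happen at the point where the string is rendered or encoded, not when it is constructed.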
