Mapping of Unicode character planes

Wikipedia, the free encyclopedia

Unicode characters can be categorized in many different ways. Unicode code points can be logically divided into 17 planes, each with 65,536 (= 2^16) code points, although currently only a few planes are used:

  • Plane 0 (0000–FFFF): Basic Multilingual Plane (BMP). This is the plane containing most of the character assignments so far. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing systems in current use.
  • Plane 1 (10000–1FFFF): Supplementary Multilingual Plane (SMP)
  • Plane 2 (20000–2FFFF): Supplementary Ideographic Plane (SIP)
  • Planes 3 to 13 (30000–DFFFF): unassigned
  • Plane 14 (E0000–EFFFF): Supplementary Special-purpose Plane (SSP)
  • Plane 15 (F0000–FFFFF): reserved for the Private Use Area (PUA)
  • Plane 16 (100000–10FFFF): reserved for the Private Use Area (PUA)
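
Because each plane spans exactly 2^16 code points, the plane number of a code point is its value shifted right by 16 bits. The following Python sketch illustrates that arithmetic. The names `plane_of`, `plane_name`, and `PLANE_NAMES` are illustrative assumptions, not part of any Unicode library, and the table simply mirrors the list above.

```python
# Illustrative helpers (hypothetical names): map a code point to its plane.
# The table mirrors the plane list above; planes 3 to 13 are treated as unassigned.
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
    15: "Private Use Area (PUA)",
    16: "Private Use Area (PUA)",
}

MAX_CODE_POINT = 0x10FFFF  # last code point of plane 16


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) containing the given code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane holds 2**16 code points, so the plane is the value above bit 16.
    return code_point >> 16


def plane_name(code_point: int) -> str:
    """Return the name of the plane, or 'unassigned' for planes 3 to 13."""
    return PLANE_NAMES.get(plane_of(code_point), "unassigned")
```

For example, `plane_of(ord("A"))` gives 0, and `plane_of(0x20000)` gives 2.
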

Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively blocked out for every current and ancient writing system (script) the Unicode Consortium has been able to identify. While Unicode may eventually need to use another of the 11 spare planes for ideographic characters, other planes would remain available if previously unknown scripts with tens of thousands of characters were discovered. The limit of 17 planes is therefore unlikely to be reached in the near future.
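
The size of the code space follows directly from the plane layout. As a quick check of the arithmetic (variable names here are illustrative), 17 planes of 65,536 code points give 1,114,112 code points. The highest of them, 10FFFF, needs 21 bits, while the 16 supplementary planes on their own span exactly 2^20 code points.

```python
# Quick arithmetic check of the code space size (illustrative variable names).
PLANES = 17
PLANE_SIZE = 2 ** 16  # 65,536 code points per plane

total_code_points = PLANES * PLANE_SIZE       # 1,114,112
highest_code_point = total_code_points - 1    # 0x10FFFF
bits_needed = highest_code_point.bit_length() # 21

# The 16 planes above the BMP together cover exactly 2**20 code points.
supplementary_code_points = (PLANES - 1) * PLANE_SIZE
```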

Basic Multilingual Plane

The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.

The graphic on the right is a visual roadmap to the Basic Multilingual Plane. The colours in use are:

  •  Black  = Latin scripts and symbols
  •  Light Blue  = Linguistic scripts
  •  Blue  = Other European scripts
  •  Orange  = Middle Eastern and SW Asian scripts
  •  Light Orange  = African scripts
  •  Green  = South Asian scripts
  •  Purple  = Southeast Asian scripts
  •  Red  = East Asian scripts
  •  Light Red  = Unified CJK Han
  •  Yellow  = Canadian Aboriginal scripts
  •  Magenta  = Symbols
  •  Dark Grey  = Diacritics
  •  Light Grey  = UTF-16 surrogates and private use
  •  Cyan  = Miscellaneous characters
  •  White  = Unused


As of Unicode 5.0, the BMP includes the following scripts:

  • Khmer (1780–17FF)
  • Mongolian (1800–18AF)
  • Limbu (1900–194F)
  • Tai Le (1950–197F)
  • New Tai Lue (1980–19DF)
  • Khmer Symbols (19E0–19FF)
  • Buginese (1A00–1A1F)
  • Balinese (1B00–1B7F)
  • Lepcha (Rong) (1C00–1C4F)
  • Phonetic Extensions (1D00–1D7F)
  • Phonetic Extensions Supplement (1D80–1DBF)
  • Combining Diacritical Marks Supplement (1DC0–1DFF)
  • Latin Extended Additional (1E00–1EFF)
  • Greek Extended (1F00–1FFF)
  • Symbols:
  • Glagolitic (2C00–2C5F)
  • Latin Extended-C (2C60–2C7F)
  • Coptic (2C80–2CFF)
  • Georgian Supplement (2D00–2D2F)
  • Tifinagh (2D30–2D7F)
  • Ethiopic Extended (2D80–2DDF)
  • Supplemental Punctuation (2E00–2E7F)
  • CJK Radicals Supplement (2E80–2EFF)
  • Kangxi Radicals (2F00–2FDF)
  • Ideographic Description Characters (2FF0–2FFF)
  • CJK Symbols and Punctuation (3000–303F)
  • Hiragana (3040–309F)
  • Katakana (30A0–30FF)
  • Bopomofo (3100–312F)
  • Hangul Compatibility Jamo (3130–318F)
  • Kanbun (3190–319F)
  • Bopomofo Extended (31A0–31BF)
  • CJK Strokes (31C0–31EF)
  • Katakana Phonetic Extensions (31F0–31FF)
  • Enclosed CJK Letters and Months (3200–32FF)
  • CJK Compatibility (3300–33FF)
  • CJK Unified Ideographs Extension A (3400–4DBF)
  • Yijing Hexagram Symbols (4DC0–4DFF)
  • CJK Unified Ideographs (4E00–9FFF)
  • Yi Syllables (A000–A48F)
  • Yi Radicals (A490–A4CF)
  • Modifier Tone Letters (A700–A71F)
  • Latin Extended-D (A720–A7FF)
  • Syloti Nagri (A800–A82F)
  • Phags-pa (A840–A87F)
  • Hangul Syllables (AC00–D7AF)
  • High Surrogates (D800–DB7F)
  • High Private Use Surrogates (DB80–DBFF)
  • Low Surrogates (DC00–DFFF)
  • Private Use Area (E000–F8FF)
  • CJK Compatibility Ideographs (F900–FAFF)
  • Alphabetic Presentation Forms (FB00–FB4F)
  • Arabic Presentation Forms-A (FB50–FDFF)
  • Variation Selectors (FE00–FE0F)
  • Vertical Forms (FE10–FE1F)
  • Combining Half Marks (FE20–FE2F)
  • CJK Compatibility Forms (FE30–FE4F)
  • Small Form Variants (FE50–FE6F)
  • Arabic Presentation Forms-B (FE70–FEFF)
  • Halfwidth and Fullwidth Forms (FF00–FFEF)
  • Specials (FFF0–FFFF)
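
Because the blocks listed above are non-overlapping ranges, finding the block of a code point reduces to a search over sorted range starts. The sketch below uses a small excerpt of the list and Python's standard `bisect` module. The names `BLOCKS`, `STARTS`, and `block_of` are illustrative assumptions, and a real lookup would load the complete block list rather than this handful of entries.

```python
from bisect import bisect_right

# A small excerpt of the block list above: (first, last, name), sorted by first.
BLOCKS = [
    (0x1780, 0x17FF, "Khmer"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xE000, 0xF8FF, "Private Use Area"),
]
STARTS = [first for first, _, _ in BLOCKS]


def block_of(code_point: int):
    """Return the block name for a code point, or None if it is not in the excerpt."""
    # Find the last block whose first code point is <= code_point.
    i = bisect_right(STARTS, code_point) - 1
    if i >= 0:
        first, last, name = BLOCKS[i]
        if code_point <= last:
            return name
    return None
```

With this excerpt, `block_of(0x3042)` returns "Hiragana", and a code point that falls between the listed ranges returns `None`.
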

Future additions

Several scripts are expected to be included in the BMP in the next revision of Unicode. These scripts, and their proposed code point ranges, are the following:

Several other scripts are proposed for inclusion in the BMP, including:

Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.
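
Code points in Plane 1 lie above FFFF, so UTF-16 represents each of them with a pair of code units drawn from the High Surrogates and Low Surrogates blocks of the BMP. The sketch below shows the standard surrogate arithmetic for one SMP character, the musical symbol G clef (U+1D11E), and cross-checks it against Python's built-in UTF-16 encoder. The function name `to_surrogate_pair` is an illustrative assumption.

```python
def to_surrogate_pair(code_point: int):
    """Split a supplementary-plane code point into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("surrogate pairs only encode code points above the BMP")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


# U+1D11E MUSICAL SYMBOL G CLEF lives in Plane 1.
high, low = to_surrogate_pair(0x1D11E)

# Cross-check against the built-in encoder (big-endian, no byte order mark).
encoded = "\U0001D11E".encode("utf-16-be")
matches_builtin = encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```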

As of Unicode 5.0, Plane 1 includes the following scripts:

Many other scripts are proposed for inclusion in Plane 1, including:

Supplementary Ideographic Plane

Plane 2, the Supplementary Ideographic Plane (SIP), is used for about 40,000 Unified Han Ideographs that are seldom used in daily written communication.

Unused planes

Unicode has not yet assigned any characters to Planes 3 through 13. It is not anticipated that these planes will be needed, given the total sizes of the known writing systems left to be encoded. However, the number of possible symbol characters that could arise outside the context of writing systems is potentially limitless. The UCS and Unicode take requests for symbols on a case-by-case basis.

Supplementary Special-purpose Plane

Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains non-graphical characters in two blocks of 128 and 240 characters. The first block is for language tag characters for use when language cannot be indicated through other protocols (such as the