Mapping of Unicode character planes
Unicode characters can be categorized in many different ways. Unicode code points can be logically divided into 17 planes, each with 65,536 (= 2^16) code points, although currently only a few planes are used:
- Plane 0 (0000–FFFF): Basic Multilingual Plane (BMP). This is the plane containing most of the character assignments so far. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing systems in current use.
- Plane 1 (10000–1FFFF): Supplementary Multilingual Plane (SMP)
- Plane 2 (20000–2FFFF): Supplementary Ideographic Plane (SIP)
- Planes 3 to 13 (30000–DFFFF): unassigned
- Plane 14 (E0000–EFFFF): Supplementary Special-purpose Plane (SSP)
- Plane 15 (F0000–FFFFF): reserved for the Private Use Area (PUA)
- Plane 16 (100000–10FFFF): reserved for the Private Use Area (PUA)
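Because every plane holds exactly 65,536 code points, the plane of a code point is simply its value divided by 0x10000 (equivalently, shifted right by 16 bits). The following is a minimal sketch of that mapping; the names `plane_of`, `plane_name` and `PLANE_NAMES` are illustrative and not part of any standard API, and the name table only mirrors the list above.

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF   # last code point of plane 16

# Plane names as given in the list above; planes 3 to 13 are
# treated as unassigned here (an assumption taken from that list).
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
    15: "Private Use Area (PUA)",
    16: "Private Use Area (PUA)",
}


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) containing code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16  # same as code_point // PLANE_SIZE


def plane_name(code_point: int) -> str:
    """Return the plane name from the list above, or 'unassigned'."""
    return PLANE_NAMES.get(plane_of(code_point), "unassigned")
```

For example, `plane_of(0x0041)` gives 0 (the BMP) and `plane_of(0x10FFFF)` gives 16.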
Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively blocked out for every current and ancient writing system (script) the Unicode Consortium has been able to identify. While Unicode may eventually need another of the 11 spare planes for ideographic characters, the remaining planes would still be available if previously unknown scripts with tens of thousands of characters were discovered. The limit of 17 planes is therefore unlikely to be reached in the near future.
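The size of the code space follows directly from the plane layout. The short calculation below is only a worked check of the figures used in this section; the variable names are illustrative.

```python
PLANES = 17
PLANE_SIZE = 2 ** 16  # 65,536 code points per plane

# Total code space: 17 planes of 65,536 code points each.
total_code_points = PLANES * PLANE_SIZE

# The 16 planes above the BMP together span exactly 2**20 code points.
supplementary_code_points = (PLANES - 1) * PLANE_SIZE

# The highest code point is 10FFFF, which needs 21 bits to store.
max_code_point = total_code_points - 1
bits_for_max = max_code_point.bit_length()

# Planes 3 to 13 inclusive are the spare (unassigned) planes.
spare_planes = len(range(3, 14))

print(total_code_points, supplementary_code_points,
      hex(max_code_point), bits_for_max, spare_planes)
```

The 16 supplementary planes account for 1,048,576 (= 2^20) code points, while addressing the full range up to 10FFFF takes 21 bits.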
Basic Multilingual Plane
The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.
The graphic on the right is a visual roadmap to the Basic Multilingual Plane. The colours in use are:
- Black = Latin scripts and symbols
- Light Blue = Linguistic scripts
- Blue = Other European scripts
- Orange = Middle Eastern and SW Asian scripts
- Light Orange = African scripts
- Green = South Asian scripts
- Purple = Southeast Asian scripts
- Red = East Asian scripts
- Light Red = Unified CJK Han
- Yellow = Canadian Aboriginal scripts
- Magenta = Symbols
- Dark Grey = Diacritics
- Light Grey = UTF-16 surrogates and private use
- Cyan = Miscellaneous characters
- White = Unused
As of Unicode 5.0, the BMP includes the following scripts:
[table of scripts not preserved]
- Cham (18B0–18FF)
- Lanna (Old Tai Lue) (1A80–1AEF)
- Santali (Ol Cemet' / Ol Chiki) (2DE0–2DFF)
- Vai (A500–A61F)
- Saurashtra (AB00–AB5F)
Several other scripts are proposed for inclusion in the BMP, including:
- Avestan (0800–083F)
- Pahlavi (0840–087F)
- Batak (1A20–1A5F)
- Meitei Mayek / Meitei (1C80–1CDF)
- Varang Kshiti (AA00–AA3F)
- Sorang Sompeng (AA40–AA6F)
Supplementary Multilingual Plane
Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.
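As a small illustration, the sketch below looks up three sample code points from Plane 1 using Python's standard `unicodedata` module. The sample values (one from the Linear B range, one musical symbol, one mathematical symbol) and the helper name `describe` are chosen here for illustration only. It also counts UTF-16 code units, since characters outside the BMP are represented in UTF-16 by a pair of surrogates.

```python
import unicodedata

# Sample code points assumed for illustration: start of the Linear B
# range, a musical symbol, and a mathematical alphanumeric symbol.
SAMPLES = [0x10000, 0x1D11E, 0x1D400]


def describe(code_point: int):
    """Return (character name, plane number, UTF-16 code unit count)."""
    ch = chr(code_point)
    name = unicodedata.name(ch, "<unnamed>")
    plane = code_point >> 16
    # Each UTF-16 code unit is two bytes; the BE variant adds no BOM.
    utf16_units = len(ch.encode("utf-16-be")) // 2
    return name, plane, utf16_units


for cp in SAMPLES:
    name, plane, units = describe(cp)
    print(f"U+{cp:04X}: plane {plane}, {units} UTF-16 code units, {name}")
```

Each sample reports plane 1 and two UTF-16 code units, whereas a BMP character such as U+0041 reports plane 0 and a single code unit.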
As of Unicode 5.0, Plane One includes the following scripts:
Many other scripts are proposed for inclusion in Plane One, including: