Unicode, set of codes used to represent letters, numbers, control characters, and the like, designed for use internationally in computers. It has been expanded to include such items as scientific, mathematical, and technical symbols, and even musical notation. The Unicode standard defines codes for linguistic symbols used in every major language written today. It includes the Latin alphabet used for English, the Cyrillic alphabet used for Russian, the Greek, Hebrew, and Arabic alphabets, and other alphabets and alphabetlike writing systems used in countries across Europe, Africa, the Indian subcontinent, and Asia, such as Japanese kana, Korean hangeul, and Chinese bopomofo. A large part of the Unicode standard is devoted to thousands of unified character codes for Chinese, Japanese, and Korean ideographs. Adopted as an international standard in 1992, Unicode was originally a "double-byte," or 16-bit, binary code (see numeration) that could represent up to 65,536 items. No longer limited to 16 bits, it can now represent more than one million code positions (1,114,112, numbered U+0000 through U+10FFFF) using three encoding forms called Unicode Transformation Formats (UTF). UTF-8, which consists of one-, two-, three-, and four-byte codes, is used extensively in World Wide Web applications; UTF-16, which consists of two- and four-byte codes, is used primarily for data storage and text processing; and UTF-32, which consists of four-byte codes, is used where character handling must be as efficient as possible. See also ASCII.
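The byte counts given above for the three encoding forms can be checked with a short Python sketch. It is only an illustration: the helper name encoded_lengths and the four sample characters are chosen here and are not part of the Unicode standard. The big-endian codec names ("utf-16-be", "utf-32-be") are used so that Python does not prepend a byte-order mark, which would otherwise inflate the counts.

```python
# Illustrative sample characters (an assumption of this sketch), one for each
# possible UTF-8 length.
SAMPLES = [
    "A",           # U+0041 LATIN CAPITAL LETTER A
    "\u00e9",      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    "\u4e2d",      # U+4E2D, a CJK unified ideograph
    "\U0001D11E",  # U+1D11E MUSICAL SYMBOL G CLEF (beyond the 16-bit range)
]


def encoded_lengths(ch):
    """Return the number of bytes one character occupies in each encoding form."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        # The "-be" variants emit no byte-order mark, so only the character counts.
        "UTF-16": len(ch.encode("utf-16-be")),
        "UTF-32": len(ch.encode("utf-32-be")),
    }


# Total number of code positions: U+0000 through U+10FFFF inclusive.
CODE_POSITIONS = 0x10FFFF + 1

if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", encoded_lengths(ch))
    print("code positions:", CODE_POSITIONS)
```

For these samples, UTF-8 takes 1, 2, 3, and 4 bytes respectively, UTF-16 takes 2, 2, 2, and 4 bytes, and UTF-32 always takes 4. The musical symbol lies outside the original 65,536-item range, which is why UTF-16 needs a four-byte code for it.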
The Columbia Electronic Encyclopedia Copyright © 2004.
Licensed from Columbia University Press