The Kangxi dictionary contains 47,035 Chinese characters, although a large number of these are rarely used variants accumulated throughout history. Studies carried out in China have shown that full literacy in the Chinese language requires a knowledge of only between three and four thousand characters.
In the Chinese writing system, each character corresponds to a single spoken syllable. A majority of words in all modern varieties of Chinese are poly-syllabic and thus require two or more characters to write. Cognates in the various Chinese languages/dialects which have the same or similar meaning but different pronunciations can be written with the same character. In addition, many Chinese characters were adopted according to their meaning by the Japanese and Korean languages to represent native words, disregarding pronunciation altogether. Chinese characters are also considered to be the world's longest continuously used writing system.
Chinese characters are also known as sinographs, and the Chinese writing system as sinography. Non-Chinese languages which have adopted sinography—and, with the orthography, a large number of loanwords from the Chinese language—are known as Sinoxenic languages, whether or not they still use the characters. The term does not imply any genetic affiliation with Chinese. The major Sinoxenic languages are Japanese, Korean, and Vietnamese.
In the last 50 or so years, inscriptions have been found on Neolithic pottery in a variety of locations in China such as Bànpō near Xī'ān, as well as on bone and bone artifacts at Hualouzi, Chang'an County near Xī'ān. These simple, often geometric marks have been frequently compared to some of the earliest known Chinese characters, on the oracle bones, and some have taken them to mean that the history of Chinese writing extends back over six millennia. However, because these marks occur singly, without any context to imply usage as writing, and because they are generally extremely crude and simple, Qiú Xīguī (2000, p. 31) concluded that "we do not have any basis for stating that these constituted writing, nor is there reason to conclude that they were ancestral to Shang dynasty Chinese characters." Isolated graphs and pictures continue to be found periodically, frequently accompanied by media reports pushing back the purported beginnings of Chinese writing a few thousand years. For example, at Damaidi in Ningxia, 3,172 pictorial cliff carvings dating to 6000–5000 BC have been discovered, leading to headlines such as "Chinese writing '8,000 years old'". Similarly, archaeologists report finding a few inscribed symbols on tortoise shells at the Neolithic site of Jiahu in Henan, dated to around 6,600–6,200 BCE, leading to headlines such as "'Earliest writing' found in China". Each time, however, scholars urge caution and skepticism. Professor David Keightley, a renowned expert on Shang script, urged caution in the latter instance, noting "There is a gap of about 5,000 years. It seems astonishing that they would be connected," adding "we can't call it writing until we have more evidence."
An additional problem with many such claims of connections to later Chinese writing is the lack of any direct cultural connection to Shāng culture, combined with gaps between them of many millennia. One group of sites without such problems is the Dàwènkǒu culture sites (2800–2500 BCE, only one millennium earlier than the early Shāng culture sites, and positioned so as to be plausibly albeit indirectly ancestral to the Shāng). There, a few inscribed pottery and jade pieces have been found, one of which combines pictorial elements (resembling, according to some, a sun, moon or clouds, and fire or a mountain) in a stack which brings to mind the compounding of elements in Chinese characters. Major scholars are divided in their interpretation of such inscribed symbols. Some, such as Yú Xǐngwú, Táng Lán and Lǐ Xuéqín, have identified these with specific Chinese characters. Others, such as Wang Ningsheng, interpret them as pictorial symbols such as clan insignia, rather than writing. As Wang Ningsheng points out, "True writing begins when it represents sounds and consists of symbols that are able to record language. The few isolated figures found on pottery still cannot substantiate this point."
The oldest Chinese inscriptions that are indisputably writing are the Oracle bone script. These were identified by scholars in 1899 on pieces of bone and turtle shell being sold as medicine, and by 1928, the source of the oracle bones had been traced back to modern Xiǎotún (小屯) village at Ānyáng in Hénán Province, where official archaeological excavations in 1928–1937 discovered 20,000 oracle bone pieces, about 1/5 of the total discovered. The inscriptions were records of the divinations performed for or by the royal Shāng household. The oracle bone script is a well-developed writing system, attested from the late Shang Dynasty (1200–1050 BC). Only about 1,400 of the 2,500 known oracle bone script logographs can be identified with later Chinese characters and thus deciphered by paleographers.
The Chinese script spread to Korea together with Buddhism from the 7th century (Hanja). The Japanese Kanji were adopted for recording the Japanese language from the 8th century AD. Adaptation for Vietnamese (Chữ Nôm) emerged in the 13th century.
There are numerous styles, or scripts, in which Chinese characters can be written, deriving from various calligraphic and historical models. Most of these originated in China and are now common, with minor variations, in all countries where Chinese characters are used. These characters were used over 3,000 years ago.
Because the Shang dynasty Oracle Bone script and the Zhou dynasty scripts found on Chinese bronze inscriptions are no longer used, the oldest script that is still in use today is the Seal Script. It evolved organically out of the Spring and Autumn period Zhou script, and was adopted in a standardized form under the first Emperor of China, Qin Shi Huang. The seal script, as the name suggests, is now only used in artistic seals. Few people are still able to read it effortlessly today, although the art of carving a traditional seal in the script remains alive; some calligraphers also work in this style.
Scripts that are still used regularly are the "Clerical Script" of the Qin Dynasty to the Han Dynasty, the Weibei, the "Regular Script" used for most printing, and the "Semi-cursive Script" used for most handwriting.
The Cursive Script is not in general use, and is a purely artistic calligraphic style. The basic character shapes are suggested, rather than explicitly realized, and the abbreviations are extreme. Despite being cursive to the point where individual strokes are no longer differentiable and the characters often illegible to the untrained eye, this script (also known as draft) is highly revered for the beauty and freedom that it embodies. Some of the Simplified Chinese characters adopted by the People's Republic of China, and some of the simplified characters used in Japan, are derived from the Cursive Script. The Japanese hiragana script is also derived from this script.
There also exist scripts created outside China, such as the Japanese Edomoji styles; these have tended to remain restricted to their countries of origin, rather than spreading to other countries like the standard scripts described above.
Table: sample characters as written in the Oracle Bone, Seal, Clerical, Semi-Cursive, Cursive, and Regular (Traditional and Simplified) scripts. The glyph images are not reproduced here; the first six characters (rì through mù) have no separate simplified form.
|Pinyin||Meaning|
|rì||Sun|
|yuè||Moon|
|shān||Mountain|
|shuǐ||Water|
|yǔ||Rain|
|mù||Wood|
|mǎ||Horse|
|niǎo||Bird|
|guī||Tortoise|
|lóng||Chinese Dragon|
|fèng||Chinese Phoenix|
The earliest known Chinese texts, in the Oracle bone script, display a fully developed writing system, little different functionally from modern characters. It can only be assumed that the early stages of the development of characters were dominated by pictograms, which depicted objects directly, and ideograms, in which meaning was expressed iconically. The demands of writing full language, including words which had no easy pictographic or iconic representation, forced an expansion of this system, presumably through use of rebus.
The presumed methods of forming characters were first classified c. 100 AD by the Chinese linguist Xu Shen, whose etymological dictionary Shuowen Jiezi (說文解字/说文解字) divides the script into six categories, the liùshū (六書/六书). While the categories and classification are occasionally problematic and arguably fail to reflect the complete nature of the Chinese writing system, this account has been perpetuated by its long history and pervasive use.
Roughly four percent of Chinese characters are derived directly from individual pictograms, though in most cases the resemblance to an object is no longer clear. Others are ideograms; compound ideograms, in which two ideograms are combined to give a third reading; or rebuses. Most characters, however, are phono-semantic compounds, with one element indicating the general category of meaning and the other suggesting the pronunciation. Again, in many cases the suggested sound is no longer accurate.
Contrary to popular belief, pictograms make up only a small portion of Chinese characters. While characters in this class derive from pictures, they have been standardized, simplified, and stylized to make them easier to write, and their derivation is therefore not always obvious. Examples include 日 (rì) for "sun", 月 (yuè) for "moon", and 木 (mù) for "tree".
There is no concrete number for the proportion of modern characters that are pictographic in nature; however, Xu Shen (c. 100 AD) estimated that 4% of characters fell into this category.
Ideograms, also called simple indicatives or simple ideographs, either modify existing pictographs iconically, or are direct iconic illustrations. For instance, by modifying 刀 dāo, a pictogram for "knife", by marking the blade, an ideogram 刃 rèn for "blade" is obtained. Direct examples include 上 shàng "up" and 下 xià "down". This category is small.
Ideogrammic compounds, also translated as logical aggregates or associative compounds, symbolically combine pictograms or ideograms to create a third character. For instance, doubling the pictogram 木 mù "tree" produces 林 lín "forest", while combining 日 rì "sun" and 月 yuè "moon", the two natural sources of light, makes 明 míng "bright".
Xu Shen estimated that 13% of characters fall into this category.
Some scholars flatly reject the existence of this category, opining that failure of modern attempts to identify a phonetic in a compound is due simply to our not looking at ancient "secondary readings", which were lost over time. For example, the character 安 ān "peace", a combination of "roof" 宀 and "woman" 女, is commonly cited as an ideogrammic compound, purportedly motivated by a meaning such as "all is peaceful with the woman at home". However, there is evidence that 女 was once a polyphone with a secondary reading of *an, as may be gleaned from the set 妟 yàn "tranquil", 奻 nuán "to quarrel", and 姦 jiān "licentious".
Adding weight to this argument is the fact that characters claimed to belong to this group are almost invariably interpreted from modern forms rather than the archaic forms, which as a rule are quite different and often far more graphically complex. However, interpretations differ greatly between sources.
By far the most numerous category is that of the phono-semantic compounds, also called semantic-phonetic compounds or pictophonetic compounds. These characters are composed of two parts: one of a limited set of pictographs, often graphically simplified, which suggests the general meaning of the character, and an existing character pronounced approximately as the new target word.
Examples are 河 (hé) river, 湖 (hú) lake, 流 (liú) stream, 沖 (chōng) riptide (or flush), 滑 (huá) slippery. All these characters have on the left a radical of three dots, which is a simplified pictograph for a water drop, indicating that the character has a semantic connection with water; the right-hand side in each case is a phonetic indicator. For example, in the case of 沖 (chōng), the phonetic indicator is 中 (zhōng), which by itself means middle. In this case it can be seen that the pronunciation of the character has diverged from that of its phonetic indicator; this process means that the composition of such characters can sometimes seem arbitrary today. Further, the choice of radicals may also seem arbitrary in some cases; for example, the radical of 貓 (māo) cat is 豸 (zhì), originally a pictograph for worms, but in characters of this sort indicating an animal of any sort.
Xu Shen (c. 100 AD) placed approximately 82% of characters into this category, while in the Kangxi Dictionary (1716 AD) the number is closer to 90%, due to the extremely productive use of this technique to extend the Chinese vocabulary.
This method is still sometimes used to form new characters; for example, 钚 ("bù", meaning "plutonium") is the metal radical 金 plus the phonetic component 不 ("bù"), described in Chinese as "不 gives sound, 金 gives meaning".
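The structure of a phono-semantic compound can be modelled as a small record. The following is a minimal sketch in Python, not a standard representation: the `Compound` class and its field names are hypothetical, the toneless readings are entered by hand, and only the two compounds whose parts are named in the text are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """A hand-labelled phono-semantic compound (illustrative only)."""
    char: str
    semantic: str          # component hinting at the meaning
    phonetic: str          # component hinting at the sound
    reading: str           # toneless pinyin of the whole character
    phonetic_reading: str  # toneless pinyin of the phonetic component alone

    def hint_is_exact(self) -> bool:
        # True when the phonetic component still gives the same syllable.
        return self.reading == self.phonetic_reading


COMPOUNDS = [
    # Water radical plus the phonetic 中; the readings have diverged.
    Compound("沖", "氵", "中", "chong", "zhong"),
    # Metal radical (given as 金 in the text) plus the phonetic 不.
    Compound("钚", "金", "不", "bu", "bu"),
]

for c in COMPOUNDS:
    status = "exact" if c.hint_is_exact() else "diverged"
    print(f"{c.char}: meaning from {c.semantic}, sound from {c.phonetic} ({status})")
```

The comparison is deliberately crude: it treats any difference in the syllable as divergence and ignores tone, which is enough to show why the phonetic hint is only approximate for older characters.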
Characters in this category originally represented the same meaning but have bifurcated through orthographic and often semantic drift. For instance, 考 (kǎo) to verify and 老 (lǎo) old were once the same character, meaning "elderly person", but detached into two separate words. Characters of this category are rare, so in modern systems this group is often omitted or combined with others.
Also called borrowings or phonetic loan characters, this category covers cases where an existing character is used to represent an unrelated word with similar pronunciation; sometimes the old meaning is then lost completely, as with characters such as 自 (zì), which has lost its original meaning of nose completely and exclusively means oneself, or 萬 (wàn), which originally meant scorpion but is now used only in the sense of ten thousand.
This technique has become uncommon, since there is considerable resistance to changing the meaning of existing characters. However, it has been used in the development of written forms of dialects, notably Cantonese and Taiwanese in Hong Kong and Taiwan, due to the amount of dialectal vocabulary which historically has had no written form and thus lacks characters of its own.
Just as Roman letters have a characteristic shape (lower-case letters occupying a roundish area, with ascenders or descenders on some letters), Chinese characters occupy a more or less square area. Characters made up of multiple parts squash these parts together in order to maintain a uniform size and shape; this is the case especially with characters written in the Sòngtǐ style. Because of this, beginners often practise on squared graph paper, and the Chinese sometimes use the term "Square-Block Characters".
The actual shape of many Chinese characters varies in different cultures. Mainland China adopted simplified characters in 1956, but Traditional Chinese characters are still used in Hong Kong, Macau and Taiwan. Singapore has also adopted simplified Chinese characters. Postwar Japan has used its own less drastically simplified characters since 1946, while South Korea has limited its use of Chinese characters, and Vietnam and North Korea have completely abolished their use in favour of romanized Vietnamese and hangul, respectively.
Usually, each Chinese character takes up the same amount of space, due to its block-like square nature. Beginners therefore typically practise writing with a grid as a guide. In addition to strictness in the amount of space a character takes up, Chinese characters are written with very precise rules. The three most important rules are the strokes employed, stroke placement, and the order in which they are written (stroke order). Most characters can be written with just one stroke order, though some characters also have variant stroke orders, which may occasionally result in different stroke counts; certain characters are also written with different stroke orders in different languages.
There are two common typefaces based on the regular script for Chinese characters akin to serif and sans-serif fonts in the West. The most popular for body text is a family of fonts called the Song typeface (宋体), also known as Minchō (明朝) in Japan, and Ming typeface (明體) in Taiwan and Hong Kong. The names of these fonts come from the Song and Ming dynasties, when block printing flourished in China. Because the wood grain on printing blocks ran horizontally, it was fairly easy to carve horizontal lines with the grain. However, carving vertical or slanted patterns was difficult because those patterns intersect with the grain and break easily. This resulted in a typeface that has thin horizontal strokes and thick vertical strokes. To prevent wear and tear, the endings of horizontal strokes are also thickened. These design forces resulted in the current Song typeface characterized by thick vertical strokes contrasted with thin horizontal strokes; triangular ornaments at the end of single horizontal strokes; and overall geometrical regularity. This typeface is similar to Western serif fonts such as Times New Roman in both appearance and function.
The other common group of fonts is called the black typeface (黑体/體) in Chinese and Gothic typeface (ゴシック体) in Japanese. This group is characterized by straight lines of even thickness for each stroke, akin to sans-serif styles such as Arial and Helvetica in Western typography. This group of fonts, first introduced on newspaper headlines, is commonly used on headings, websites, signs and billboards.
Although most often associated with the PRC, character simplification predates the 1949 communist victory. Caoshu, cursive written text, almost always includes character simplification, and simplified forms have always existed in print, albeit not for the most formal works. In the 1930s and 1940s, discussions on character simplification took place within the Kuomintang government, and a large number of Chinese intellectuals and writers have long maintained that character simplification would help boost literacy in China. Indeed, this desire by the Kuomintang to simplify the Chinese writing system (inherited and implemented by the CCP) also nursed aspirations of some for the adoption of a phonetic script, in imitation of the Roman alphabet, and spawned such inventions as the Gwoyeu Romatzyh.
The PRC issued its first round of official character simplifications in two documents, the first in 1956 and the second in 1964. A second round of character simplifications (known as erjian, or "second round simplified characters") was promulgated in 1977. It was poorly received, and in 1986 the authorities rescinded the second round completely, while making six revisions to the 1964 list, including the restoration of three traditional characters that had been simplified: 叠 dié, 覆 fù, 像 xiàng.
Many of the simplifications adopted had been in use in informal contexts for a long time, as more convenient alternatives to their more complex standard forms. For example, the traditional character 來 lái (come) was written with the structure 来 in the clerical script (隸書 lìshū) of the Han dynasty. This clerical form uses one fewer stroke, and was thus adopted as a simplified form. The character 雲 yún (cloud) was written with the structure 云 in the oracle bone script of the Shāng dynasty, and had remained in use later as a phonetic loan in the meaning of to say. The simplified form reverted to this original structure.
In the years after World War II, the Japanese government also instituted a series of orthographic reforms. Some characters were given simplified forms called Shinjitai 新字体 (lit. "new character forms"; the older forms were then labelled the Kyūjitai 旧字体, lit. "old character forms"). The number of characters in common use was restricted, and formal lists of characters to be learned during each grade of school were established, first the 1850-character Tōyō kanji 当用漢字 list in 1946, and later the 1945-character Jōyō kanji 常用漢字 list in 1981. Many variant forms of characters and obscure alternatives for common characters were officially discouraged. This was done with the goal of facilitating learning for children and simplifying kanji use in literature and periodicals. These are simply guidelines, hence many characters outside these standards are still widely known and commonly used, especially those used for personal and place names (for the former, see Jinmeiyō kanji).
Malaysia promulgated a set of simplified characters in 1981 that was identical to the Mainland China simplifications. These were not at first widely adopted, as the Chinese educational system fell outside the purview of the federal government. With the advent of the PRC as an economic powerhouse, however, simplified characters are now taught at school and are more commonly, if not almost universally, used, although a large majority of the older literate Chinese generation still use the traditional characters. Chinese newspapers are published in either set of characters, typically with the headlines in Traditional Chinese while the body is in Simplified Chinese.
|Category||Traditional||Chinese simp.||Japanese simp.||Meaning|
|Simplified in Chinese, not Japanese||電||电||電||electricity|
|Simplified in Chinese, not Japanese||紅||红||紅||red (crimson in Japanese)|
|"Simplified" in Japanese, not Chinese (in some cases this represents the adoption of different variant forms as standard)||拜||拜||拝||kowtow, pray to, worship|
|Simplified in both, but differently||龍||龙||竜||dragon|
|Simplified in both in the same way||學||学||学||learn|
Note: this table is merely a brief sample, not a complete listing.
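The categories in the table reduce to simple comparisons between the three forms of a character. The following Python sketch illustrates this; the function name `classify` and the label strings are chosen for this example, and the sample rows are the ones given in the table.

```python
def classify(traditional: str, chinese: str, japanese: str) -> str:
    """Label how a character was simplified, following the table's categories."""
    cn_changed = chinese != traditional
    jp_changed = japanese != traditional
    if cn_changed and not jp_changed:
        return "simplified in Chinese, not Japanese"
    if jp_changed and not cn_changed:
        return "simplified in Japanese, not Chinese"
    if cn_changed and jp_changed:
        if chinese == japanese:
            return "simplified in both in the same way"
        return "simplified in both, but differently"
    return "not simplified in either"


# (traditional, Chinese simplified, Japanese simplified), copied from the table.
ROWS = [
    ("電", "电", "電"),
    ("紅", "红", "紅"),
    ("拜", "拜", "拝"),
    ("龍", "龙", "竜"),
    ("學", "学", "学"),
]

for trad, cn, jp in ROWS:
    print(trad, cn, jp, "->", classify(trad, cn, jp))
```

The comparison works because each distinct form is a distinct Unicode code point. It says nothing about why a form differs, so it cannot tell a true simplification from the adoption of a different variant as standard.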
Chinese character dictionaries often allow users to locate entries in several different ways. Many Chinese, Japanese, and Korean dictionaries of Chinese characters list characters in radical order: characters are grouped together by radical, and radicals containing fewer strokes come before radicals containing more strokes. Under each radical, characters are listed by their total number of strokes. It is often also possible to search for characters by sound, using pinyin (in Chinese dictionaries), zhuyin (in Taiwanese dictionaries), kana (in Japanese dictionaries) or hangul (in Korean dictionaries). Most dictionaries also allow searches by total number of strokes, and individual dictionaries often allow other search methods as well.
For instance, to look up a character whose sound is not known, e.g., 松 (pine tree), the user first determines which part of the character is the radical (here 木), then counts the number of strokes in the radical (four), and turns to the radical index (usually located on the inside front or back cover of the dictionary). Under the number "4" for radical stroke count, the user locates 木, then turns to the page number listed, which is the start of the listing of all the characters containing this radical. This page will have a sub-index giving remainder stroke numbers (for the non-radical portions of characters) and page numbers. The right half of the character also contains four strokes, so the user locates the number 4, and turns to the page number given. From there, the user must scan the entries to locate the character he or she is seeking. Some dictionaries have a sub-index which lists every character containing each radical, and if the user knows the number of strokes in the non-radical portion of the character, he or she can locate the correct page directly.
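The radical-and-stroke lookup described above can be sketched as a small index keyed by radical stroke count, radical, and remaining stroke count. This is a toy Python illustration, not how any particular dictionary stores its data: the names `build_index` and `candidates` are hypothetical, and apart from 松 the stroke counts are supplied for this example.

```python
# (character, radical, strokes in the radical, strokes outside the radical)
ENTRIES = [
    ("松", "木", 4, 4),  # the example from the text
    ("林", "木", 4, 4),
    ("明", "日", 4, 4),
    ("安", "宀", 3, 3),
]


def build_index(entries):
    """Group characters by (radical strokes, radical, remaining strokes)."""
    index = {}
    for char, radical, radical_strokes, remainder in entries:
        key = (radical_strokes, radical, remainder)
        index.setdefault(key, []).append(char)
    return index


def candidates(index, radical, radical_strokes, remainder):
    """Return the characters the user would still have to scan by eye."""
    return index.get((radical_strokes, radical, remainder), [])


INDEX = build_index(ENTRIES)

# Looking up 松: radical 木 (4 strokes), 4 strokes in the rest of the character.
print(candidates(INDEX, "木", 4, 4))
```

The lookup returns a list, not a single character, which mirrors the final step in the text: several characters can share the same radical and remaining stroke count, so the user still scans the entries to find the one wanted.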
Another dictionary system is the four corner method, where characters are classified according to the "shape" of each of the four corners.
Most modern Chinese dictionaries and Chinese dictionaries sold to English speakers use the traditional radical-based character index in a section at the front, while the main body of the dictionary arranges the main character entries alphabetically according to their pinyin spelling. To find a character with unknown sound using one of these dictionaries, the reader finds the radical and stroke number of the character, as before, and locates the character in the radical index. The character's entry will have the character's pronunciation in pinyin written down; the reader then turns to the main dictionary section and looks up the pinyin spelling alphabetically.
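The two-stage arrangement can be sketched as a radical index that yields a pinyin spelling and a main body sorted by that spelling. The Python sketch below makes some simplifying assumptions: the helper names `toneless` and `look_up` are hypothetical, the entries are a handful of characters quoted elsewhere in this article, and tone marks are ignored when sorting, whereas real dictionaries also order entries by tone.

```python
import unicodedata


def toneless(pinyin: str) -> str:
    """Strip tone marks so that, for example, 'shuǐ' sorts as 'shui'.

    Note: this removes every combining mark, so 'ü' would also become 'u'.
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# (pinyin, character, gloss)
ENTRIES = [
    ("rì", "日", "sun"),
    ("yuè", "月", "moon"),
    ("mù", "木", "tree"),
    ("shàng", "上", "up"),
    ("xià", "下", "down"),
    ("míng", "明", "bright"),
]

# Stage 1: the radical index, reduced here to a character-to-pinyin mapping.
RADICAL_INDEX = {char: pinyin for pinyin, char, _ in ENTRIES}

# Stage 2: the main body, ordered alphabetically by toneless pinyin.
MAIN_BODY = sorted(ENTRIES, key=lambda entry: toneless(entry[0]))


def look_up(char: str):
    """Find the pinyin via the index, then the full entry in the main body."""
    pinyin = RADICAL_INDEX[char]
    for entry in MAIN_BODY:
        if entry[0] == pinyin and entry[1] == char:
            return entry
    return None


print([toneless(entry[0]) for entry in MAIN_BODY])
print(look_up("明"))
```

A real main body would be searched by binary search or by page headers instead of a linear scan; the scan is kept here only for brevity.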
In addition, the Yi script is similar to Han, but is not known to be directly related to it.
|Year||Name of dictionary||Number of characters|
|1916||Zhonghua Da Zidian||48,000|
|1989||Hanyu Da Zidian||54,678|
Comparing the Shuowen Jiezi and Hanyu Da Zidian reveals that the overall number of characters recorded in dictionaries has increased 577 percent over 1,900 years. Depending upon how one counts variants, 50,000+ is a good approximation for the current total number. This correlates with the most comprehensive Japanese and Korean dictionaries of Chinese characters; the Dai Kan-Wa jiten has some 50,000 entries, and the Han-Han Dae Sajeon has over 57,000. The latest behemoth, the Zhonghua Zihai, records a staggering 85,568 single characters, although even this fails to list all characters known, ignoring the roughly 1,500 Japanese-made kokuji given in the Kokuji no Jiten as well as the Chữ Nôm inventory formerly used only in Vietnam.
Modified radicals and obsolete variants are two common reasons for the ever-increasing number of characters. There are about 300 radicals, of which 100 are in common use. Creating a new character by modifying the radical is an easy way to disambiguate homographs among xíngshēngzì pictophonetic compounds. This practice began long before the standardization of Chinese script by Qin Shi Huang and continues to the present day. The traditional 3rd-person pronoun tā (他 "he; she; it"), which is written with the "person radical", illustrates modifying significs to form new characters. In modern usage, there is a graphic distinction between tā (她 "she") with the "woman radical", tā (牠 "it") with the "animal radical", tā (它 "it") with the "roof radical", and tā (祂 "He") with the "deity radical". One consequence of modifying radicals is the fossilization of rare and obscure variant logographs, some of which are not even used in Classical Chinese. For instance, hé 和 "harmony; peace", which combines the "grain radical" with the "mouth radical", has infrequent variants 咊 with the radicals reversed and 龢 with the "flute radical".
In the People's Republic of China, which uses Simplified Chinese characters, the Xiàndài Hànyǔ Chángyòng Zìbiǎo (现代汉语常用字表; Chart of Common Characters of Modern Chinese) lists 2,500 common characters and 1,000 less-than-common characters, while the Xiàndài Hànyǔ Tōngyòng Zìbiǎo (现代汉语通用字表; Chart of Generally Utilized Characters of Modern Chinese) lists 7,000 characters, including the 3,500 characters already listed above. GB2312, an early version of the national encoding standard used in the People's Republic of China, has 6,763 code points. GB18030, the modern, mandatory standard, has a much higher number. The Hànyǔ Shuǐpíng Kǎoshì proficiency test covers approximately 5,000 characters.
In the ROC, which uses Traditional Chinese characters, the Ministry of Education's Chángyòng Guózì Biāozhǔn Zìtǐ Biǎo (常用國字標準字體表; Chart of Standard Forms of Common National Characters) lists 4,808 characters; the Cì Chángyòng Guózì Biāozhǔn Zìtǐ Biǎo (次常用國字標準字體表; Chart of Standard Forms of Less-Than-Common National Characters) lists another 6,341 characters. The Chinese Standard Interchange Code (CNS11643)—the official national encoding standard—supports 48,027 characters, while the most widely-used encoding scheme, BIG-5, supports only 13,053.
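The practical effect of these differing repertoires can be checked by attempting to encode a character. The sketch below uses the `gb2312`, `gb18030` and `big5` codecs bundled with CPython, which may differ in detail from the published standards; the helper name `encodable` is hypothetical, and the CNS11643 standard is left out because no codec for it is assumed here.

```python
def encodable(char: str, encoding: str) -> bool:
    """Return True if the codec can represent the character."""
    try:
        char.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Simplified and traditional forms of "electricity".
for ch in ("电", "電"):
    for enc in ("gb2312", "gb18030", "big5"):
        print(ch, enc, encodable(ch, enc))
```

With these codecs the simplified 电 encodes under `gb2312` while the traditional 電 does not; both encode under `gb18030`, and 電 encodes under `big5`.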
In Hong Kong, which uses Traditional Chinese characters, the Education and Manpower Bureau's Soengjung Zi Zijing Biu (常用字字形表), intended for use in elementary and junior secondary education, lists a total of 4,759 characters.
In addition, there is a large corpus of dialect characters, which are not used in formal written Chinese but represent colloquial terms in non-Mandarin Chinese spoken forms. One such variety is Written Cantonese, in widespread use in Hong Kong even for certain formal documents, due to the former British colonial administration's recognition of Cantonese for use for official purposes. In Taiwan, there is also an informal body of characters used to represent the spoken Hokkien (Min Nan) dialect.
The one area where character usage is officially restricted is in names, which may contain only government-approved characters. Since the Jōyō kanji list excludes many characters which have been used in personal and place names for generations, an additional list, referred to as the Jinmeiyō kanji (人名用漢字 lit. "kanji for use in personal names"), is published. It currently contains 983 characters, bringing the total number of government-endorsed characters to 2,928. (See also the Names section of the kanji article.)
Today, a well-educated Japanese person may know upwards of 3,500 kanji. The kanji kentei (日本漢字能力検定試験 Nihon Kanji Nōryoku Kentei Shiken or Test of Japanese Kanji Aptitude) tests a speaker's ability to read and write kanji. The highest level of the kanji kentei covers 6,000 kanji, though in practice few people attain (or need to attain) this level.
Written Japanese also includes a pair of syllabic scripts known as kana, which are used in combination with kanji. Not all words in modern Japanese can be expressed with kanji alone, so kana are required in written communication.
In Korea, Literary Chinese was the only form of written communication until the 15th century, when hangul, the Korean alphabet, was created. Much of the vocabulary, especially in the realms of science and sociology, comes directly from Chinese. However, due to the lack of tones in Korean, many dissimilar characters took on identical sounds as the words were imported from Chinese, and subsequently identical spellings in hangul. Chinese characters are sometimes used to this day, either for practical clarification or to give a distinguished appearance, as knowledge of Chinese characters is considered a high-class attribute and an indispensable part of a classical education.
In Korea, 한자 hanja have become a politically contentious issue, with some Koreans urging a "purification" of the national language and culture by totally abandoning their use. These individuals encourage the exclusive use of the native hangul alphabet throughout Korean society and an end to character education in public schools.
In South Korea, educational policy on characters has swung back and forth, often swayed by education ministers' personal opinions. At times, middle and high school students have been formally exposed to 1,800 to 2,000 basic characters, albeit with the principal focus on recognition, with the aim of achieving newspaper-literacy. Since there is little need to use hanja in everyday life, young adult Koreans are often unable to read more than a few hundred characters.
There is a clear trend toward the exclusive use of hangul in day-to-day South Korean society. Hanja are still used to some extent, particularly in newspapers, weddings, place names and calligraphy. Hanja are also extensively used in situations where ambiguity must be avoided, such as academic papers, high-level corporate reports, and government documents; this is due to the large number of homonyms that have resulted from extended borrowing of Chinese words.
The issue of ambiguity is the main hurdle in any effort to "cleanse" the Korean language of Chinese characters. Characters convey meaning visually, while alphabets convey guidance to pronunciation, which in turn hints at meaning. As an example, in Korean dictionaries, the phonetic entry for 기사 gisa yields more than 30 different entries. In the past, this ambiguity had been efficiently resolved by parenthetically displaying the associated hanja.
In the modern Korean writing system based on hangul, Chinese characters are no longer used to represent native morphemes.
In North Korea, the government, wielding much tighter control than its sister government to the south, has banned Chinese characters from virtually all public displays and media, and mandated the use of hangul in their place.
Often a character not commonly used (a "rare" or "variant" character) will appear in a personal or place name in Chinese, Japanese, Korean, and Vietnamese (see Chinese name, Japanese name, Korean name, and Vietnamese name, respectively). This has caused problems, as many computer encoding systems include only the most common characters and exclude the less frequently used ones. This is especially a problem for personal names, which often contain rare, classical, or antiquated characters.
People who have run into this problem include Taiwanese politician Yu Shyi-kun (游錫堃, pinyin Yóu Xíkūn) and Taiwanese singer David Tao (陶喆 Táo Zhé), because the last character in each name is very rare. Newspapers have dealt with this problem in varying ways: using software to combine two existing, similar characters; printing a picture of the personality; or, especially in the case of Yu Shyi-kun, simply substituting a homophone for the rare character in the hope that the reader will be able to make the correct inference. Taiwanese political posters, movie posters, etc. will often add the bopomofo phonetic symbols next to such a character. Japanese newspapers may render such names and words in katakana instead of kanji, and it is accepted practice for people to write names for which they are unsure of the correct kanji in katakana instead.
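The encoding gap described above can be illustrated with a minimal Python sketch. The helper name `can_encode` is hypothetical, and the codec names are those shipped with Python's standard library rather than anything named in the sources cited here. The sample characters are 一 (very common) and 𪚥 (U+2A6A5, a rare character outside the Basic Multilingual Plane).

```python
def can_encode(char: str, codec: str) -> bool:
    """Return True if `char` can be represented in the named codec."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# One very common character and one rare character (U+2A6A5).
samples = {"common": "\u4e00", "rare": "\U0002A6A5"}

for label, ch in samples.items():
    support = {
        codec: can_encode(ch, codec)
        for codec in ("gb2312", "big5", "gb18030")
    }
    print(label, f"U+{ord(ch):04X}", support)
```

Under these assumptions, the common character is accepted by all three codecs, while the rare one is rejected by the older GB2312 and Big5 codecs and accepted only by GB18030.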
There are also some extremely complex characters which have understandably become rather rare. According to Bellassen (1989), the most complex Chinese character is 𪚥 (U+2A6A5) zhé (pictured below, left), meaning "verbose" and boasting sixty-four strokes; this character fell from use around the 5th century. It might be argued, however, that while boasting the most strokes, it is not necessarily the most complex character (in terms of difficulty), as it simply requires writing the same sixteen-stroke character 龍 lóng (lit. "dragon") four times in the space for one.
One of the most complex characters found in modern Chinese dictionaries is 齉 (U+9F49) nàng (pictured below, second from left), meaning "snuffle" (that is, a pronunciation marred by a blocked nose), with "just" thirty-six strokes. However, this is not in common use. The most complex character that can be input using the Microsoft New Phonetic IME 2002a for Traditional Chinese is 龘 dá "the appearance of a dragon in flight"; it is composed of the dragon radical represented three times, for a total of 16 × 3 = 48 strokes. Among the most complex characters in modern dictionaries and also in frequent modern use are 籲 yù "to implore", with 32 strokes; 鬱 yù "luxuriant, lush; gloomy", with 29 strokes, as in 憂鬱 yōuyù "depressed"; 豔 yàn "colorful", with 28 strokes; and 釁 xìn "quarrel", with 25 strokes, as in 挑釁 tiǎoxìn "to pick a fight". Also in occasional modern use is 鱻 xiān "fresh" (variant of 鮮 xiān) with 33 strokes.
In Japanese, an 84-stroke kokuji exists; it is composed of three "cloud" (雲) characters on top of the above-mentioned triple "dragon" character (龘). Also meaning "the appearance of a dragon in flight", it has been pronounced おとど otodo, たいと taito, and だいと daito.
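The stroke totals quoted for these repeated-component characters can be checked with a short sketch. The `STROKES` table and the `total_strokes` helper are illustrative names only. The count of 16 for 龍 is given in the text, while 12 for 雲 and 11 for 魚 are the standard counts, assumed here because the text does not state them.

```python
# Stroke counts of the repeated components (龍 from the text; 雲 and 魚 assumed).
STROKES = {"龍": 16, "雲": 12, "魚": 11}


def total_strokes(components: str) -> int:
    """Sum the stroke counts of the components that make up a character."""
    return sum(STROKES[c] for c in components)


print(total_strokes("龍" * 3))             # 龘: 48
print(total_strokes("龍" * 4))             # 𪚥: 64
print(total_strokes("魚" * 3))             # 鱻: 33
print(total_strokes("雲" * 3 + "龍" * 3))  # 84-stroke kokuji: 84
```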
The most complex Chinese character still in use may be biáng (pictured right, bottom), with 57 strokes, which refers to Biang biang noodles, a type of noodle from China's Shaanxi province. This character, along with the syllable biang, cannot be found in dictionaries. The fact that it represents a syllable that does not exist in any Standard Mandarin word means that it could be classified as a dialectal character.
In contrast, the simplest character is 一 yī ("one") with just one horizontal stroke. The most common character in Chinese is 的 de, a grammatical particle functioning as an adjectival marker and as a genitive clitic analogous to the English ’s, with eight strokes. The average number of strokes in a character has been calculated as 9.8; it is unclear, however, whether this average is weighted, or whether it includes traditional characters.
Another very simple Chinese character is 〇 (líng), the numeral zero in a positional system. For instance, the year 2000 would be 二〇〇〇年. It is not a typical character, but was taken from the mathematical system of rod numerals. (The traditional character for líng is 零.) The form 〇 is attested from 1247 AD, in the Southern Song mathematical text 數書九章 (Shùshū Jiǔzhāng, "Mathematical Treatise in Nine Sections"), presumably under the influence of the Indian "0". Being round, the character does not contain any traditional strokes.
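The positional use of 〇 in a year such as 二〇〇〇年 amounts to a digit-by-digit substitution, which the following minimal sketch shows. The function name `year_in_chinese` is hypothetical, and the sketch assumes a non-negative integer year.

```python
# Chinese digits indexed by their numeric value, with 〇 standing for zero.
DIGITS = "〇一二三四五六七八九"


def year_in_chinese(year: int) -> str:
    """Write a non-negative year digit by digit, followed by 年 ("year")."""
    return "".join(DIGITS[int(d)] for d in str(year)) + "年"


print(year_in_chinese(2000))  # 二〇〇〇年
```

This is the positional style only; it does not produce the traditional form with 零 and place-value words.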
The art of writing Chinese characters is called Chinese calligraphy. It is usually done with ink brushes. In ancient China, Chinese calligraphy was one of the Four Arts of the Chinese Scholars. There is a minimalist set of rules of Chinese calligraphy. Every character from the Chinese scripts is built into a uniform shape by means of assigning it a geometric area in which the character must occur. Each character has a set number of brushstrokes; none may be added to or taken away from the character to enhance it visually, lest the meaning be lost. Finally, strict regularity is not required, meaning the strokes may be accentuated for the dramatic effect of individual style. Calligraphy was the means by which scholars could mark their thoughts and teachings for immortality, and as such, its works are among the more precious treasures that can be found from ancient China.