GB2312 Character Set
GB: An abbreviation of Guojia Biaozhun, or Guo Biao, meaning "national standard" in Chinese.
GB2312: A coded character set established by the government of the People's Republic of China in 1980.
Main features of GB2312:
- It contains 7445 characters, including 6763 Hanzi and 682 non-Hanzi characters.
- It covers simplified Chinese characters only. Traditional Chinese characters are covered by the Big5 character set.
- It is used mainly in mainland China and Singapore.
GB2312 arranges characters into a matrix of 94 rows and 94 columns based on the following rules:
Rows    # of Chars  Characters
01      94          Special symbols
02      72          Paragraph numbers
03      94          Latin characters
04      83          Hiragana characters
05      86          Katakana characters
06      48          Greek characters
07      66          Cyrillic characters
08      63          Pinyin accented vowels and zhuyin symbols
09      76          Box and table drawing symbols
16-55   3755        Hanzi level 1, ordered by pinyin
56-87   3008        Hanzi level 2, ordered by radical, then stroke
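The row layout can be captured as a small lookup table. The Python sketch below (`GB2312_ROWS` and `row_block` are hypothetical names, not part of any library) transcribes the rows listed above, treats rows that are not listed (10-15 and 88-94) as unassigned, and checks that the per-row counts add up to the 682 non-Hanzi and 6763 Hanzi characters stated earlier.

```python
# Row layout of the GB2312 94 x 94 matrix, transcribed from the table.
# Each entry: (first_row, last_row, number_of_characters, description)
GB2312_ROWS = [
    (1, 1, 94, "Special symbols"),
    (2, 2, 72, "Paragraph numbers"),
    (3, 3, 94, "Latin characters"),
    (4, 4, 83, "Hiragana characters"),
    (5, 5, 86, "Katakana characters"),
    (6, 6, 48, "Greek characters"),
    (7, 7, 66, "Cyrillic characters"),
    (8, 8, 63, "Pinyin accented vowels and zhuyin symbols"),
    (9, 9, 76, "Box and table drawing symbols"),
    (16, 55, 3755, "Hanzi level 1, ordered by pinyin"),
    (56, 87, 3008, "Hanzi level 2, ordered by radical, then stroke"),
]


def row_block(row: int) -> str:
    """Return the description of the block that contains `row`."""
    if not 1 <= row <= 94:
        raise ValueError(f"row must be in 1..94, got {row}")
    for first, last, _count, description in GB2312_ROWS:
        if first <= row <= last:
            return description
    # Rows missing from the table (10-15, 88-94) hold no characters.
    return "Unassigned"


# Rows below 16 hold the non-Hanzi symbols; rows 16 and up hold Hanzi.
non_hanzi = sum(count for first, _, count, _ in GB2312_ROWS if first < 16)
hanzi = sum(count for first, _, count, _ in GB2312_ROWS if first >= 16)
print(non_hanzi, hanzi, non_hanzi + hanzi)  # 682 6763 7445
```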
This book provides a list of all characters in GB2312 with their row and column numbers.
Row 01: Regular Symbols
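(R.C. = row and column numbers; GB = GB2312 code; Uni. = Unicode code point; UTF-8 = UTF-8 bytes; all codes in hex. Entry 0101 is the ideographic space U+3000, which has no visible glyph.)
R.C.  Char  GB    Uni.  UTF-8     R.C.  Char  GB    Uni.  UTF-8
0101        A1A1  3000  E38080    0102  、    A1A2  3001  E38081
0103  。    A1A3  3002  E38082    0104  ·     A1A4  00B7  C2B7
0105  ˉ     A1A5  02C9  CB89      0106  ˇ     A1A6  02C7  CB87
0107  ¨     A1A7  00A8  C2A8      0108  〃    A1A8  3003  E38083
0109  々    A1A9  3005  E38085    0110  —     A1AA  2014  E28094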
[Source: http://www.herongyang.com/gb2312/symbol.html]
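The GB, Uni. and UTF-8 columns of the table above can be reproduced programmatically. This is a minimal sketch assuming Python's built-in `gb2312` codec as the GB-to-Unicode mapping; `gb2312_row_entry` is a hypothetical helper. GB2312-to-Unicode mapping tables are known to disagree on a few punctuation marks, so a codec may not match every symbol in the table exactly.

```python
from typing import Tuple


def gb2312_row_entry(gb_code: int) -> Tuple[str, str, str]:
    """Return (character, Unicode hex, UTF-8 hex) for a 2-byte GB2312 code."""
    raw = gb_code.to_bytes(2, "big")                # e.g. 0xA1A2 -> bytes A1 A2
    char = raw.decode("gb2312")                     # Python's built-in GB2312 codec
    unicode_hex = f"{ord(char):04X}"                # the "Uni." column
    utf8_hex = char.encode("utf-8").hex().upper()   # the "UTF-8" column
    return char, unicode_hex, utf8_hex


for gb in (0xA1A1, 0xA1A2, 0xA1A3):
    _char, uni, utf8 = gb2312_row_entry(gb)
    print(f"{gb:04X} {uni} {utf8}")
# A1A1 3000 E38080
# A1A2 3001 E38081
# A1A3 3002 E38082
```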
GB2312 Codes
GB2312 assigns a 2-byte native code to each character.
The first byte is called the high byte, containing the row number plus 32;
the second byte is called the low byte, containing the column number plus 32.
For example, if a character is located at row 16 and column 1, its high byte will be 16 + 32 = 48 (0x30) and its low byte will be 1 + 32 = 33 (0x21). Put together, its native code is 0x3021.
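The native-code rule is simple byte arithmetic. A minimal Python sketch (`native_code` is a hypothetical name) that reproduces the worked example:

```python
def native_code(row: int, col: int) -> int:
    """Native GB2312 code: high byte = row + 32, low byte = column + 32."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must both be in 1..94")
    high = row + 32   # 0x21..0x7E
    low = col + 32    # 0x21..0x7E
    return (high << 8) | low


print(hex(native_code(16, 1)))  # 0x3021
```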
I guess the reason for adding 32 to both the row and column numbers is to
keep the byte values (33 to 126, or 0x21 to 0x7E) out of the low range 0x00
to 0x1F, which many computer systems reserve for control characters.
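The byte range behind this guess can be checked directly. This Python sketch computes every possible native byte value and confirms that none falls in the ASCII control range 0x00 to 0x1F or on DEL (0x7F).

```python
# Row and column numbers are 1..94, so every native byte is 33..126.
native_bytes = {n + 32 for n in range(1, 95)}
print(hex(min(native_bytes)), hex(max(native_bytes)))  # 0x21 0x7e

# ASCII control characters occupy 0x00..0x1F, plus DEL at 0x7F.
control_codes = set(range(0x00, 0x20)) | {0x7F}
print(native_bytes.isdisjoint(control_codes))  # True
```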
However, the bytes of GB2312 native codes still collide with printable ASCII codes. To resolve this problem, 128 is added to both bytes of the native code.
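To see the collision concretely: the native bytes of the row 16, column 1 character are indistinguishable from two ordinary ASCII characters. A short Python illustration:

```python
native = 0x3021                    # row 16, column 1, before adding 128
raw = native.to_bytes(2, "big")    # the two native bytes 0x30 0x21
as_ascii = raw.decode("ascii")     # decodes without error: both bytes are < 0x80
print(as_ascii)                    # 0!

# Every possible native byte (0x21..0x7E) is a printable ASCII character.
print(all(chr(b).isprintable() for b in range(0x21, 0x7F)))  # True
```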
For example, if a character is located at row 16 and column 1, its
native code is 0x3021 and its modified code is 0xB0A1 (128 = 0x80, so 0x3021 + 0x8080 = 0xB0A1).
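Combining both offsets gives the standard code directly from a row and column (this byte layout is the form commonly known as EUC-CN). In the Python sketch below, `gb2312_code` is a hypothetical name, and the result is checked by decoding it with Python's built-in `gb2312` codec.

```python
def gb2312_code(row: int, col: int) -> int:
    """Standard GB2312 code: native code (row + 32, col + 32) plus 0x8080."""
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must both be in 1..94")
    native = ((row + 32) << 8) | (col + 32)
    # Each native byte is at most 0x7E, so adding 0x80 to each byte never carries.
    return native + 0x8080


code = gb2312_code(16, 1)
print(hex(code))                                 # 0xb0a1
print(code.to_bytes(2, "big").decode("gb2312"))  # 啊
```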
These modified codes were adopted as the standard GB2312 codes. Since every byte of a modified code lies between 0xA1 and 0xFE, above the ASCII range 0x00 to 0x7F, these codes can be safely mixed with ASCII codes.
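Because ASCII bytes stay below 0x80 and GB2312 bytes start at 0xA1, a reader can tell them apart one byte at a time. A minimal Python sketch; `split_mixed` is a hypothetical helper that assumes well-formed input and does no error checking.

```python
from typing import List


def split_mixed(data: bytes) -> List[bytes]:
    """Split GB2312/ASCII mixed bytes into characters.

    Bytes below 0x80 are single ASCII characters; higher bytes start a 2-byte
    GB2312 character. Assumes the input is well formed.
    """
    chunks, i = [], 0
    while i < len(data):
        width = 1 if data[i] < 0x80 else 2
        chunks.append(data[i:i + width])
        i += width
    return chunks


encoded = "A啊1".encode("gb2312")
print(encoded)               # b'A\xb0\xa11'
print(split_mixed(encoded))  # [b'A', b'\xb0\xa1', b'1']
```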
This book provides a list of all GB2312 characters and their codes.
:: Note: this is how to convert a row/column position into a GB2312 code, not how to convert ASCII into GB2312!
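The conversion also runs in reverse: subtracting 0xA0 (32 + 128) from each byte of a standard GB2312 code recovers the row and column. A Python sketch with a hypothetical `row_col` helper:

```python
from typing import Tuple


def row_col(gb_code: int) -> Tuple[int, int]:
    """Recover (row, column) from a standard GB2312 code by subtracting 0xA0 per byte."""
    high, low = gb_code >> 8, gb_code & 0xFF
    if not (0xA1 <= high <= 0xFE and 0xA1 <= low <= 0xFE):
        raise ValueError(f"{gb_code:#06x} is outside the GB2312 range 0xA1A1..0xFEFE")
    return high - 0xA0, low - 0xA0


print(row_col(0xB0A1))  # (16, 1)
print(row_col(0xA1A2))  # (1, 2)
```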
GB2312 vs. Unicode
The GB2312 character set is a subset of the Unicode character set. This means
that every character defined in GB2312 is also defined in Unicode.
However, GB2312 codes and Unicode codes are completely unrelated. For
example, the GB2312 character with code 0xB0A1 (啊) has the Unicode code
point 0x554A. There is no mathematical formula to convert a GB2312 code
to the Unicode code of the same character. This book provides a
complete map of all GB2312 codes and their corresponding Unicode codes.
The corresponding UTF-8 (Unicode Transformation Format, 8-bit) byte
sequences are also listed in the map.
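Since no formula links the two code spaces, conversion needs a lookup table. As a rough sketch, assuming Python's built-in `gb2312` codec as the source of that table, the hypothetical `build_gb2312_map` below decodes every position of the 94 x 94 matrix and keeps the assigned ones. Adjacent GB2312 codes land far apart in Unicode, which shows why a table is required.

```python
from typing import Dict


def build_gb2312_map() -> Dict[int, int]:
    """Map each assigned GB2312 code (0xA1A1..0xFEFE) to its Unicode code point."""
    mapping = {}
    for high in range(0xA1, 0xFF):        # 94 high bytes
        for low in range(0xA1, 0xFF):     # 94 low bytes
            try:
                char = bytes([high, low]).decode("gb2312")
            except UnicodeDecodeError:
                continue                  # unassigned position in the matrix
            mapping[(high << 8) | low] = ord(char)
    return mapping


gb_to_unicode = build_gb2312_map()
print(hex(gb_to_unicode[0xB0A1]))  # 0x554a
print(hex(gb_to_unicode[0xB0A2]))  # 0x963f
print(len(gb_to_unicode))          # number of codes the codec can decode
```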
[Source: http://www.herongyang.com/gb2312/overview.html]
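To illustrate how the UTF-8 column follows from the Unicode column, here is a minimal UTF-8 encoder for a single code point. It is written in Python, and `utf8_encode` is a hypothetical name; in practice `str.encode("utf-8")` already does this. It follows the standard 1-to-4-byte UTF-8 bit layout and, for brevity, does not reject surrogate code points.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point with the UTF-8 bit layout.

    Illustrative only: surrogate code points (0xD800..0xDFFF) are not rejected.
    """
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    if code_point < 0x80:                        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:                       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),     # 4 bytes: 11110xxx + three 10xxxxxx
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


print(utf8_encode(0x554A).hex().upper())  # E5958A
print(utf8_encode(0x3000).hex().upper())  # E38080
print(utf8_encode(0x00B7).hex().upper())  # C2B7
```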