Ideographic Description Characters
Ideographic Description Characters | |
---|---|
Range | U+2FF0..U+2FFF (16 code points) |
Plane | Basic Multilingual Plane (BMP) |
Scripts | Common |
Assigned | 16 code points |
Unused | 0 reserved code points |
Source standards | GBK |
Unicode version history | |
3.0 | 12 (+12) |
15.1 | 16 (+4) |
Note: [1][2] |
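Ideographic Description Characters (IDC) is a Unicode block containing symbols that describe the structure of ideographs such as Han characters. Some description symbols are encoded in other blocks.
Most current encodings, Unicode included, handle Han characters by first collecting them and then assigning each one a numeric code. The number of Han characters is enormous, however, so character sets are often incomplete. Han characters are also an open compositional system, and users may well coin new ones, so no character set can collect them all. These characters are therefore used to describe how a given "character" is assembled from simpler components.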
Code chart
Ideographic Description Characters [1][2] (official Unicode Consortium code chart, PDF)
Code point | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+2FFx | ⿰ | ⿱ | ⿲ | ⿳ | ⿴ | ⿵ | ⿶ | ⿷ | ⿸ | ⿹ | ⿺ | ⿻ | ⿼ | ⿽ | ⿾ | ⿿ |
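Ideographic Description Sequences
An Ideographic Description Sequence (IDS) is the syntax defined by the Unicode Standard for describing the structure of Han characters. A description sequence consists of a description character followed by its operands, which are specific characters (mainly Han characters) or nested sequences: two or three for most operators, and one for the reflection and rotation operators. The sequence represents the abstract structure of a Han character.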
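Unicode defines 16 description characters in this block:
Code point | Character | Meaning | Example | Sequence | Example | Sequence |
---|---|---|---|---|---|---|
U+2FF0 | ⿰ | Two components arranged left to right | 相 | ⿰木目 | 𠁢 | ⿰丨㇍ |
U+2FF1 | ⿱ | Two components arranged top to bottom | 杏 | ⿱木口 | 𠚤 | ⿱𠂊丶 |
U+2FF2 | ⿲ | Three components arranged left to right | 衍 | ⿲彳氵亍 | 𠂗 | ⿲丿夕乚 |
U+2FF3 | ⿳ | Three components arranged top to bottom | 京 | ⿳亠口小 | 𠋑 | ⿳亼目口 |
U+2FF4 | ⿴ | Two components, the outer enclosing the inner | 回 | ⿴囗口 | 𠀬 | ⿴㐁人 |
U+2FF5 | ⿵ | Enclosed on three sides, open at the bottom | 凰 | ⿵几皇 | 𧓉 | ⿵齊虫 |
U+2FF6 | ⿶ | Enclosed on three sides, open at the top | 凶 | ⿶凵㐅 | 义 | ⿶乂丶 |
U+2FF7 | ⿷ | Enclosed on three sides, open on the right | 匠 | ⿷匚斤 | 𧆬 | ⿷虎九 |
U+2FF8 | ⿸ | Enclosed on two sides, components arranged from upper left to lower right | 病 | ⿸疒丙 | 𤆯 | ⿸耂火 |
U+2FF9 | ⿹ | Enclosed on two sides, components arranged from upper right to lower left | 戒 | ⿹戈廾 | 𢧌 | ⿹或壬 |
U+2FFA | ⿺ | Enclosed on two sides, components arranged from lower left to upper right | 超 | ⿺走召 | 𥘶 | ⿺礼分 |
U+2FFB | ⿻ | Two components overlapping | 巫 | ⿻工从 | 𣏃 | ⿻木⿻コ一 |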
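U+2FFC | ⿼ | Enclosed on three sides, open on the left | 㕚 | ⿼叉丶 | 𬺹 | ⿼コ二 |
U+2FFD | ⿽ | Enclosed on two sides, components arranged from lower right to upper left | 氷 | ⿽水丶 | 斗 | ⿽⺀十 |
U+2FFE | ⿾ | Horizontal reflection | 卐 | ⿾卍 | 𣥄 | ⿾正 |
U+2FFF | ⿿ | Rotation | 𠕄 | ⿿凹 | 𠄔 | ⿿予 |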
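Two further description symbols lie outside this block:
Code point | Character | Block | Meaning | Example | Sequence | Example | Sequence |
---|---|---|---|---|---|---|---|
U+303E | 〾 | CJK Symbols and Punctuation | Similar in shape but not identical | 㬵 (U+3B35) | 〾胶 (U+80F6)[3] | 𫜵 | 〾爫[4] |
U+31EF | ㇯ | CJK Strokes | Subtraction of strokes | 乒 | ㇯兵丶 | 𧰨 | ㇯豕一 |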
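There is also the character "⬚" (U+2B1A). Although its name is simply DOTTED SQUARE, it is often used together with ideographic description characters to stand for a whole character that cannot be decomposed.
An IDS is written in prefix notation: the operator comes first, followed by the corresponding number of operands. This expresses the order of operations unambiguously without parentheses or other auxiliary characters.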
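Prefix notation can be illustrated with a small recursive parser. The sketch below is only an illustration, not part of the Unicode Standard: the name `parse_ids` and the `ARITY` table are assumptions, with arities read off the table above (three operands for U+2FF2 and U+2FF3, one for U+2FFE and U+2FFF, two for the rest). Description symbols outside this block, such as U+31EF, are treated as ordinary operands here.

```python
# Assumed arity table, derived from the meanings listed in the table above.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFE)}  # U+2FF0..U+2FFD: two operands
ARITY["\u2ff2"] = ARITY["\u2ff3"] = 3                 # three-component operators
ARITY["\u2ffe"] = ARITY["\u2fff"] = 1                 # reflection and rotation


def parse_ids(ids: str):
    """Parse a prefix-notation IDS into a nested tuple (operator, operand, ...)."""

    def parse(pos: int):
        if pos >= len(ids):
            raise ValueError("sequence ends before all operands are supplied")
        ch = ids[pos]
        pos += 1
        if ch not in ARITY:
            return ch, pos  # a plain component is a leaf
        children = []
        for _ in range(ARITY[ch]):
            child, pos = parse(pos)  # each operand may itself be a sequence
            children.append(child)
        return (ch, *children), pos

    tree, end = parse(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete sequence")
    return tree


print(parse_ids("⿻木⿻コ一"))  # ('⿻', '木', ('⿻', 'コ', '一'))
```

Because every operator has a fixed number of operands, the parser never needs a lookahead or a closing delimiter: the nested sequence ⿻コ一 is recognised as the second operand of the outer ⿻ simply by position.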
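Version 6.0 of the Unicode Standard, which predates the four description characters added in 15.1, defines an ideographic description sequence as follows:[5]
IDS := Ideographic | Radical | CJK_Stroke | Private Use | U+FF1F | IDS_BinaryOperator IDS IDS | IDS_TrinaryOperator IDS IDS IDS
IDS_BinaryOperator := U+2FF0 | U+2FF1 | U+2FF4 | U+2FF5 | U+2FF6 | U+2FF7 | U+2FF8 | U+2FF9 | U+2FFA | U+2FFB
IDS_TrinaryOperator := U+2FF2 | U+2FF3
Under this definition, a well-formed IDS must consist of Han ideographs, CJK radical characters, CJK stroke characters (U+31C0-U+31EF), private-use characters, or the fullwidth question mark (U+FF1F), joined by IDCs.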
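The grammar above can be checked without building a tree, by counting how many operands are still owed. The following is a minimal sketch under stated assumptions: `is_valid_ids`, `is_operand` and `OPERAND_RANGES` are hypothetical names, and the code-point ranges are a coarse approximation of the operand classes (the real classes are defined by Unicode character properties, not by contiguous ranges). Only the twelve operators named in the grammar are accepted.

```python
BINARY = set("\u2ff0\u2ff1\u2ff4\u2ff5\u2ff6\u2ff7\u2ff8\u2ff9\u2ffa\u2ffb")
TERNARY = set("\u2ff2\u2ff3")

# Assumption: approximate ranges standing in for the operand classes.
OPERAND_RANGES = [
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x20000, 0x3FFFF),  # supplementary ideographic planes (coarse)
    (0x2E80, 0x2FDF),    # CJK Radicals Supplement and Kangxi Radicals
    (0x31C0, 0x31EF),    # CJK Strokes
    (0xE000, 0xF8FF),    # Private Use Area
    (0xFF1F, 0xFF1F),    # fullwidth question mark
]


def is_operand(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in OPERAND_RANGES)


def is_valid_ids(ids: str) -> bool:
    """Return True if ids matches the grammar quoted above (structure only)."""
    pending = 1  # number of IDS slots still waiting to be filled
    for ch in ids:
        if pending == 0:
            return False  # characters left over after a complete IDS
        pending -= 1      # this character fills one slot
        if ch in BINARY:
            pending += 2  # and opens two new ones
        elif ch in TERNARY:
            pending += 3
        elif not is_operand(ch):
            return False
    return pending == 0


print(is_valid_ids("⿻工⿰人人"))  # True
print(is_valid_ids("⿰木"))        # False
```

The counter starts at one because the whole string must be a single IDS; it must reach zero exactly at the last character.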
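Limitations
- Unicode does not define a unique representation for a Han character. Under the current proposal, one character may be expressed by several IDSs; for example, 巫 can be written as ⿻工从 or as ⿻工⿰人人.
- The main purpose of an IDS is to express the abstract structure of a Han character, not to compose glyphs dynamically the way combining characters do. Drawing a real composite glyph involves many complicating factors, and an IDS alone is not enough to render a composed character of generally acceptable quality. The top-to-bottom or left-to-right proportions of a composite character, for instance, are often not 1:1 but are adjusted to the actual shapes of the two components, and the proportions for upper-left/lower-right and three-sided enclosures are more complicated still. Overlapping components also require general knowledge of Han characters to interpret correctly: ⿻工从 places the two 人 in the left and right openings of 工 rather than simply stacking 工 and 从 on top of each other.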
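The first limitation means that two different strings can describe the same character. One way to compare them is to expand every component that has a known decomposition until only undecomposed components remain. The sketch below assumes a hypothetical lookup table `COMPONENTS` (not Unicode data) and a hypothetical helper `expand`; it relies on the fact that, in prefix notation, replacing an operand by a complete IDS always yields another complete IDS.

```python
# Hypothetical decomposition table, for illustration only; assumed to be acyclic.
COMPONENTS = {
    "从": "⿰人人",
}


def expand(ids: str, table: dict) -> str:
    """Recursively replace each component by its description, where one is known."""
    out = []
    for ch in ids:
        if ch in table:
            out.append(expand(table[ch], table))
        else:
            out.append(ch)
    return "".join(out)


a = expand("⿻工从", COMPONENTS)
b = expand("⿻工⿰人人", COMPONENTS)
print(a == b)  # True
```

This only reconciles descriptions that differ in how far components are broken down. Two descriptions that analyse the structure differently (for example, choosing a different operator) would still compare as unequal.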
History
The following documents record the proposal and finalization of the characters in this block.
Unicode version | Final code points | Count | UTC ID | L2 ID | WG2 ID | IRG ID | Document |
---|---|---|---|---|---|---|---|
3.0 | U+2FF0..2FFB | 12 | | X3L2/95-111 | N1284 | | Ideographic Structure Symbol (additional request), 1995-11-07 |
3.0 | U+2FF0..2FFB | 12 | | | N1303 | | Umamaheswaran, V. S.; Ksar, Mike, 8.13 Ideographic structure symbols, Minutes of Meeting 29, Tokyo, 1996-01-26 |
3.0 | U+2FF0..2FFB | 12 | | | N1348 | | Ideographic Components and Composition Scheme, 1996-02-05 |
3.0 | U+2FF0..2FFB | 12 | | | N1357 | | Revised Ideographic Structure Symbols, 1996-04-12 |
3.0 | U+2FF0..2FFB | 12 | | | N1353 | | Umamaheswaran, V. S.; Ksar, Mike, 9, Draft minutes of WG2 Copenhagen Meeting # 30, 1996-06-25 |
3.0 | U+2FF0..2FFB | 12 | | L2/97-026 | N1494 | | IRG proposal: Ideographic structure character, 1996-06-27 |
3.0 | U+2FF0..2FFB | 12 | | | N1430 | N365 | Proposal Summary Form: Ideographic Structure Character, 1996-08-01 |
3.0 | U+2FF0..2FFB | 12 | | | N1453 | | Ksar, Mike; Umamaheswaran, V. S., 9.6 Ideographic Structure Characters, WG 2 Minutes - Quebec Meeting 31, 1996-12-06 |
3.0 | U+2FF0..2FFB | 12 | | L2/97-023 | N1486 | N437 | IRG #8 Resolutions, 1997-01-16 |
3.0 | U+2FF0..2FFB | 12 | | | N1489 | | Supplement to Ideographic Components and Composition Schemes, 1997-01-16 |
3.0 | U+2FF0..2FFB | 12 | | | N1490 | N436 | Response to WG2 question on Ideographic Structure Characters, 1997-01-16 |
3.0 | U+2FF0..2FFB | 12 | | L2/97-030 | N1503 | | Umamaheswaran, V. S.; Ksar, Mike, 9.6, Unconfirmed Minutes of WG 2 Meeting #32, Singapore; 1997-01-20--24, 1997-04-01 |
3.0 | U+2FF0..2FFB | 12 | | L2/97-114 | N1544 | N453 | Sato, T. K., Questions on the "Han structure method" described in WG2 N1490 (IRG N436), 1997-04-08 |
3.0 | U+2FF0..2FFB | 12 | | L2/97-255R | | | Aliprand, Joan, 4.B.2 Ideographic Structure Characters, Approved Minutes - UTC #73 & L2 #170 joint meeting, Palo Alto, CA - August 4-5, 1997, 1997-12-03 |
3.0 | U+2FF0..2FFB | 12 | | | N1680 | | Project Sub-Division Proposal on Scheme of Ideograph Description Sequence, 1997-12-18 |
3.0 | U+2FF0..2FFB | 12 | | | N1782 | | Clause X Ideographic Description Sequence (IDS) – IRG N575, 1998-05-06 |
3.0 | U+2FF0..2FFB | 12 | | L2/98-158 | | | Aliprand, Joan; Winkler, Arnold, SC2 SC2 Action re Ideographic Description Sequences, Draft Minutes - UTC #76 & NCITS Subgroup L2 #173 joint meeting, Tredyffrin, Pennsylvania, April 20-22, 1998, 1998-05-26 |
3.0 | U+2FF0..2FFB | 12 | | | N1842 | | Proposed text for a Draft for amendment 28 - Ideographic Description Sequences, 1998-06-03 |
3.0 | U+2FF0..2FFB | 12 | | L2/98-286 | N1703 | | Umamaheswaran, V. S.; Ksar, Mike, 9.5, Unconfirmed Meeting Minutes, WG 2 Meeting #34, Redmond, WA, USA; 1998-03-16--20, 1998-07-02, The original proposal was to use character composition. It has changed from being composition to description over its three year development. |
3.0 | U+2FF0..2FFB | 12 | | L2/98-317 | N1892 | | Combined CD registration and consideration ballot on WD for 10646-1/Amd. 28, AMENDMENT 28: Ideographic description characters, 1998-10-22 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-010 | N1903 | | Umamaheswaran, V. S., 10.3, Minutes of WG 2 meeting 35, London, U.K.; 1998-09-21--25, 1998-12-30 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-072.1 | N1971 | | Irish Comments on SC 2 N 3186, 1999-01-19 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-072 | N1970 | | Summary of Voting on SC 2 N 3186, PDAM ballot on WD for 10646-1/Amd. 28: Ideographic description characters, 1999-02-05 |
3.0 | U+2FF0..2FFB | 12 | | | N2023 | | Paterson, Bruce, FPDAM 28 Text - Ideographic Description Characters, 1999-04-06 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-120 | | | Text for FPDAM ballot of ISO/IEC 10646, Amd. 28 - Ideographic description characters, 1999-04-07 |
3.0 | U+2FF0..2FFB | 12 | UTC/1999-014 | | | | Jenkins, John, Recursion depth limit for IDC's, 1999-06-01 |
3.0 | U+2FF0..2FFB | 12 | UTC/1999-015 | | | | Whistler, Ken, Re: Brief note on length of ideograph descriptions, 1999-06-01 |
3.0 | U+2FF0..2FFB | 12 | UTC/1999-020 | | | | Jenkins, John, Diagram and language [for Ideograph Description Sequences], 1999-06-04 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-176R | | | Moore, Lisa, Recursion Limit for Ideographic Description Characters, Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999, 1999-11-04 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-232 | N2003 | | Umamaheswaran, V. S., 6.1.2 PDAM28 - Ideographic Description Characters, Minutes of WG 2 meeting 36, Fukuoka, Japan, 1999-03-09--15, 1999-08-03 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-253 | N2067 | | Summary of Voting on SC 2 N 3312, ISO 10646-1/FPDAM 28 - Ideographic description characters, 1999-08-19 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-301 | N2123 | | Disposition of Comments Report on SC 2 N 3312, ISO/IEC 10646-1/FPDAM 28, AMENDMENT 28: Ideographic description characters, 1999-09-20 |
3.0 | U+2FF0..2FFB | 12 | | L2/99-302 | N2124 | | Paterson, Bruce, Revised Text for FDAM ballot of ISO/IEC 10646-1/FDAM 28, AMENDMENT 28: Ideographic description characters, 1999-09-24 |
3.0 | U+2FF0..2FFB | 12 | | L2/00-010 | N2103 | | Umamaheswaran, V. S., 6.4.3, Minutes of WG 2 meeting 37, Copenhagen, Denmark: 1999-09-13--16, 2000-01-05 |
3.0 | U+2FF0..2FFB | 12 | | L2/00-045 | | | Summary of FDAM voting: ISO 10646 Amd. 28: Ideographic description characters, 2000-01-31 |
3.0 | U+2FF0..2FFB | 12 | | L2/02-221 | N2480 | | Cook, Richard, Proposal to add Ideographic Description Characters (IDC) to the UCS, 2002-05-18 |
3.0 | U+2FF0..2FFB | 12 | | L2/02-436 | N2534 | N955 | IRG Radical Classification, 2002-11-21 |
3.0 | U+2FF0..2FFB | 12 | | L2/12-087 | | | Proposed Changes to ISO/IEC 10646 Annex I, Ideographic Description Characters, 2012-02-09 |
3.0 | U+2FF0..2FFB | 12 | | L2/12-007 | | | Moore, Lisa, Consensus 130-C13, UTC #130 / L2 #227 Minutes, 2012-02-14, Submit L2/12-087 on extensions to ideographic description sequences to WG2. |
3.0 | U+2FF0..2FFB | 12 | | L2/15-065 | | | Jenkins, John, Proposal to Add IDS Links to Online Unihan Database, 2015-02-02 |
3.0 | U+2FF0..2FFB | 12 | | L2/15-070 | | | Davis, Mark, IDS in Unihan, 2015-02-03 |
3.0 | U+2FF0..2FFB | 12 | | L2/15-313 | | | Lunde, Ken, Request for IDS Data, 2015-11-03 |
15.1 | U+2FFC..2FFF | 4 | | L2/17-386 | | N2273 | Yang, Tao; Chan, Eiso; Wang, Yifan, Submission of 3 IDCes, 2017-10-13 |
15.1 | U+2FFC..2FFF | 4 | | L2/17-379 | | | Lunde, Ken, Proposed Ideographic Description Characters (IDCs), IRG #49 Liaison Report, 2017-10-20 |
15.1 | U+2FFC..2FFF | 4 | | L2/18-012 | | | Yang, Tao; Chan, Eiso; Wang, Yifan, Proposal of Four IDCs, 2018-01-05 |
15.1 | U+2FFC..2FFF | 4 | | L2/18-168 | | | Anderson, Deborah; Whistler, Ken; Pournader, Roozbeh; Moore, Lisa; Liang, Hai; Chapman, Chris; Cook, Richard, 22. IDCs, Recommendations to UTC #155 April-May 2018 on Script Proposals, 2018-04-28 |
15.1 | U+2FFC..2FFF | 4 | | L2/21-118R | | N2492 | Lunde, Ken; Jenkins, John H., Preliminary proposal to add a new provisional kIDS property (Unihan), 2021-08-11 |
15.1 | U+2FFC..2FFF | 4 | | L2/22-136 | | | West, Andrew, Feedback on Proposals to Encode New Ideographic Description Characters, 2022-07-08 |
15.1 | U+2FFC..2FFF | 4 | | L2/22-191 | | N2572 | Lunde, Ken; Jenkins, John; West, Andrew, Proposal to encode five new Ideographic Description Characters, 2022-08-24 |
15.1 | U+2FFC..2FFF | 4 | | L2/22-227 | | | SAT Feedback to "Preliminary proposal to add a new provisional kIDS property (Unihan)" (IRGN2492) and "Proposal to encode five new Ideographic Description Characters" (IRGN2572), 2022-08-29 |
15.1 | U+2FFC..2FFF | 4 | | L2/22-228 | | | Fan, Ming, Feedback on IRGN2572 "Proposal to encode 5 new ideograph description characters", 2022-09-02 |
15.1 | U+2FFC..2FFF | 4 | | L2/22-247 | | | Lunde, Ken, 29, CJK & Unihan Group Recommendations for UTC #173 Meeting, 2022-11-01 |
15.1 | U+2FFC..2FFF | 4 | | L2/22-241 | | | Constable, Peter, E.1 29, Approved Minutes of UTC Meeting 173, 2022-11-09 |
References
- [1] Unicode character database. The Unicode Standard. [2016-07-09]. (Archived from the original on 2017-09-25).
- [2] Enumerated Versions of The Unicode Standard. The Unicode Standard. [2016-07-09]. (Archived from the original on 2016-06-29).
- [3] 「㬵(U+3B35)」和「胶(U+80F6)」为什么在《康熙字典》收录了两次? [Why are 㬵 (U+3B35) and 胶 (U+80F6) both included in the Kangxi Dictionary?] - Zhihu. www.zhihu.com. [2023-09-21].
- [4] 基本集扩充字考(五・完结)附扩充块新增字考. Zhihu Column. [2023-09-21] (in Chinese).
- [5] The Unicode Standard, Version 6.0 – Core Specification (PDF). [2020-02-10]. (Archived from the original (PDF) on 2019-11-22).
External links
See the Wiktionary entry 「附錄:Unicode/表意文字描述字符」 (Appendix: Unicode/Ideographic Description Characters).
- Ideographic description characters in Unicode: list of code points
- East Asian text processing in Unicode
- http://unicode.org/iuc/iuc18/papers/b16.ppt