HighGo’s Patch to Bring GB18030-2022 to PostgreSQL

PostgreSQL and GB18030-2022 Support

PostgreSQL supports GB18030 as a client-side encoding. A client-side encoding means you can set the encoding from a client application such as psql with:

psql=# SET client_encoding TO 'GB18030';

This tells the PostgreSQL backend that the client will send SQL statements encoded in GB18030. When the backend receives a statement, it converts the GB18030 byte stream into the server-side encoding (typically UTF-8) and then passes it to exec_simple_query() for execution.
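The transcoding step can be illustrated outside the server. The following is a minimal Python sketch, assuming Python's built-in gb18030 codec as a stand-in for PostgreSQL's own conversion routines; the function name gb18030_to_utf8 is hypothetical and is not part of PostgreSQL.

```python
def gb18030_to_utf8(payload: bytes) -> bytes:
    """Decode GB18030 bytes to text, then re-encode the text as UTF-8.

    Assumption: Python's codec stands in for the backend's conversion;
    it is not the code PostgreSQL actually runs.
    """
    text = payload.decode("gb18030")
    return text.encode("utf-8")


# The string "中文" as a client would send it when client_encoding is GB18030.
client_bytes = "中文".encode("gb18030")

# The same string as a UTF-8 server would see it after conversion.
server_bytes = gb18030_to_utf8(client_bytes)

print(client_bytes.hex(), "->", server_bytes.hex())
```

The two byte strings differ, but both decode to the same text. Only the wire representation changes.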

GB18030 Versions

GB18030 has multiple versions. Since August 1, 2023, China has mandated the GB18030-2022 standard. However, as of the time of writing, PostgreSQL still supports only GB18030-2000.

GB18030-2022 is not fully backward-compatible with GB18030-2000 — some glyphs map to different Unicode code points.

For example, consider the glyph whose GB18030 code is 0xA6D9. If we convert it to UTF-8 using psql:

mydb=# SELECT encode(convert_from(decode('a6d9', 'hex'), 'GB18030')::bytea, 'hex');
encode
--------
ee9e8d
(1 row)

PostgreSQL converts it to the UTF-8 byte sequence 0xEE9E8D, which encodes the code point U+E78D.

However, if we use a web tool like qqxiuzi.cn to look up the glyph in multiple encodings, we find that the corresponding UTF-8 byte sequence should be 0xEFB890, which encodes U+FE10. The two results do not match.

This difference occurs because the web tool follows GB18030-2022, while PostgreSQL still uses GB18030-2000.
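The two byte sequences can be checked against their code points. The sketch below assumes the code points U+E78D (GB18030-2000) and U+FE10 (GB18030-2022) from the mapping table later in this article; the helper name utf8_hex is hypothetical.

```python
# Code points for GB18030 code 0xA6D9, taken from this article's mapping table.
CP_2000 = 0xE78D  # Private Use Area code point used by GB18030-2000
CP_2022 = 0xFE10  # code point assigned by GB18030-2022


def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of a code point as a hex string."""
    return chr(code_point).encode("utf-8").hex()


print(utf8_hex(CP_2000))  # ee9e8d, matching the psql output
print(utf8_hex(CP_2022))  # efb890, matching the web tool
```

Both results are valid UTF-8. The mismatch comes entirely from which Unicode code point the GB18030 code is mapped to.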

Moving Toward GB18030-2022

To make PostgreSQL compliant with GB18030-2022, HighGo has submitted a patch: https://commitfest.postgresql.org/patch/5954/. Once this patch is merged, PostgreSQL will support the GB18030-2022 standard.

EDIT: The patch was merged into the PostgreSQL master branch on Sep. 24th, 2025, and will be included in release 19.

Changes Introduced by GB18030-2022

In summary, GB18030-2022 introduces three key changes compared to the previous 2000 and 2005 standards:

  • Adds 66 new ideographs
  • Removes 9 ideographs that are no longer required; applications can choose whether to retain them
  • Alters the Unicode mapping for 18 ideographs, which causes backward compatibility issues

Below is the list of the 18 ideographs affected:

GB18030 code    2000 mapped Unicode    2022 mapped Unicode
0xA6D9          U+E78D                 U+FE10
0xA6DA          U+E78E                 U+FE12
0xA6DB          U+E78F                 U+FE11
0xA6DC          U+E790                 U+FE13
0xA6DD          U+E791                 U+FE14
0xA6DE          U+E792                 U+FE15
0xA6DF          U+E793                 U+FE16
0xA6EC          U+E794                 U+FE17
0xA6ED          U+E795                 U+FE18
0xA6F3          U+E796                 U+FE19
0xFE59          U+E81E                 U+9FB4
0xFE61          U+E826                 U+9FB5
0xFE66          U+E82B                 U+9FB6
0xFE67          U+E82C                 U+9FB7
0xFE6D          U+E832                 U+9FB8
0xFE7E          U+E843                 U+9FB9
0xFE90          U+E854                 U+9FBA
0xFEA0          U+E864                 U+9FBB

These characters are rarely used, so the changes are unlikely to have significant impact on most databases.
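To gauge whether existing text is affected, one option is to scan it for the 18 code points that GB18030-2000 used. The following is a minimal Python sketch under that assumption. The names REMAPPED, find_affected, and remap_to_2022 are hypothetical, and the first row's GB18030 code (0xA6D9) is taken from the psql example earlier in the article. Whether rewriting stored data is appropriate depends on the application.

```python
# (GB18030 code, GB18030-2000 code point, GB18030-2022 code point)
REMAPPED = [
    (0xA6D9, 0xE78D, 0xFE10), (0xA6DA, 0xE78E, 0xFE12),
    (0xA6DB, 0xE78F, 0xFE11), (0xA6DC, 0xE790, 0xFE13),
    (0xA6DD, 0xE791, 0xFE14), (0xA6DE, 0xE792, 0xFE15),
    (0xA6DF, 0xE793, 0xFE16), (0xA6EC, 0xE794, 0xFE17),
    (0xA6ED, 0xE795, 0xFE18), (0xA6F3, 0xE796, 0xFE19),
    (0xFE59, 0xE81E, 0x9FB4), (0xFE61, 0xE826, 0x9FB5),
    (0xFE66, 0xE82B, 0x9FB6), (0xFE67, 0xE82C, 0x9FB7),
    (0xFE6D, 0xE832, 0x9FB8), (0xFE7E, 0xE843, 0x9FB9),
    (0xFE90, 0xE854, 0x9FBA), (0xFEA0, 0xE864, 0x9FBB),
]

# Map each old (2000) character to its new (2022) character.
OLD_TO_NEW = {chr(old): chr(new) for _, old, new in REMAPPED}


def find_affected(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every character whose mapping changed."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in OLD_TO_NEW
    ]


def remap_to_2022(text: str) -> str:
    """Replace 2000-era code points with their 2022 equivalents."""
    return text.translate(str.maketrans(OLD_TO_NEW))


sample = "a\ue78db"  # contains the 2000 mapping of GB18030 code 0xA6D9
print(find_affected(sample))
print(remap_to_2022(sample) == "a\ufe10b")
```

Text that contains none of the 18 code points passes through remap_to_2022 unchanged, so the scan doubles as a quick check that a dataset is unaffected.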

For more detailed information, see this excellent reference: The GB 18030-2022 Standard.