While looking into JSON string encoding recently, I came across an interesting trick for telling the Unicode encodings apart.
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8. Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
00 00 00 xx UTF-32BE
00 xx 00 xx UTF-16BE
xx 00 00 00 UTF-32LE
xx 00 xx 00 UTF-16LE
xx xx xx xx UTF-8
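The table above can be sketched in Python roughly as follows. This is only a minimal illustration, not code from the RFC: the function name `detect_json_encoding` is made up for this example, the fallback to UTF-8 for inputs shorter than four octets is my own assumption, and byte order marks are not handled.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON octet stream from the pattern of
    nulls in its first four octets (hypothetical helper)."""
    if len(data) < 4:
        # Assumption: too short to apply the table, so fall back to the default.
        return "utf-8"

    # True where the octet is a null (00), False where it is non-null (xx).
    nulls = tuple(octet == 0 for octet in data[:4])

    if nulls == (True, True, True, False):    # 00 00 00 xx
        return "utf-32-be"
    if nulls == (True, False, True, False):   # 00 xx 00 xx
        return "utf-16-be"
    if nulls == (False, True, True, True):    # xx 00 00 00
        return "utf-32-le"
    if nulls == (False, True, False, True):   # xx 00 xx 00
        return "utf-16-le"
    # Any other pattern, including xx xx xx xx, is treated as UTF-8.
    return "utf-8"


if __name__ == "__main__":
    sample = '{"a": 1}'
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        print(codec, "->", detect_json_encoding(sample.encode(codec)))
```

The returned strings are Python codec names, so the result can be passed straight to `bytes.decode`. The trick relies on the first two characters being ASCII, so it should not be expected to work on arbitrary non-JSON text.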
Read more:
http://www.faqs.org/rfcs/rfc4627.html#ixzz0V5v8SI97