Recently I had a requirement to decode URL-encoded strings in T-SQL. I looked for solutions on the internet, but every one I found decodes byte by byte. That works for single-byte characters (ASCII, with values below 128): for example, “%2f” is decoded as “/”. For characters that take 2, 3 or 4 bytes in UTF-8, however, decoding byte by byte does not work. For example, “%c2%ae” should be decoded as “®” instead of “Â®”, “%E2%84%A2” should be decoded as “™” instead of “â„¢”, and “%e6%9d%a8” should be decoded as “杨” instead of “æ¨”. I set out to write a function that solves these issues.

Recall that URL encoding converts each non-ASCII character to its UTF-8 byte sequence, writes each byte as a pair of hexadecimal digits prefixed with “%”, and replaces the character with the resulting percent-encoded sequence. SQL Server's Unicode string types (NCHAR, NVARCHAR) store text as UTF-16 rather than UTF-8, so a transformation from UTF-8 is necessary. Since the built-in function “NCHAR” returns the Unicode character for a given decimal value, we just need to convert each UTF-8 byte sequence to its decimal code point. The length of a sequence can be read from its first byte: the initial byte of a 2-, 3- or 4-byte UTF-8 sequence starts with 2, 3 or 4 one bits, followed by a zero bit.
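The decoding steps described above can be sketched outside the database. The following is a minimal illustration in Python rather than the T-SQL function itself; the names `sequence_length`, `percent_decode` and `decode_byte_by_byte` are hypothetical and chosen only for this example. It reads the sequence length from the leading one bits of the first byte, combines the payload bits into a single code point, and contrasts the result with naive byte-by-byte decoding. In T-SQL, the resulting code point is the decimal value that would be passed to “NCHAR”.

```python
import re


def sequence_length(lead: int) -> int:
    """Return the number of bytes in the UTF-8 sequence starting with `lead`."""
    if lead < 0x80:             # 0xxxxxxx: single-byte (ASCII)
        return 1
    if lead >> 5 == 0b110:      # 110xxxxx: two one bits, then a zero
        return 2
    if lead >> 4 == 0b1110:     # 1110xxxx: three one bits, then a zero
        return 3
    if lead >> 3 == 0b11110:    # 11110xxx: four one bits, then a zero
        return 4
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#04x}")


def percent_decode(text: str) -> str:
    """Decode %XX escapes, treating consecutive escapes as UTF-8 sequences."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "%":
            out.append(text[i])          # literal character, copy as-is
            i += 1
            continue
        lead = int(text[i + 1:i + 3], 16)
        n = sequence_length(lead)
        # Keep only the payload bits of the lead byte (7, 5, 4 or 3 bits).
        code_point = lead if n == 1 else lead & (0xFF >> (n + 1))
        for k in range(1, n):
            start = i + 3 * k
            if text[start:start + 1] != "%":
                raise ValueError("truncated UTF-8 sequence")
            cont = int(text[start + 1:start + 3], 16)
            if cont >> 6 != 0b10:        # continuation bytes are 10xxxxxx
                raise ValueError("invalid UTF-8 continuation byte")
            code_point = (code_point << 6) | (cont & 0x3F)
        out.append(chr(code_point))      # the role NCHAR plays in T-SQL
        i += 3 * n
    return "".join(out)


def decode_byte_by_byte(text: str) -> str:
    """Naive approach: turn every %XX into one character, ignoring UTF-8."""
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    for sample in ("%2f", "%c2%ae", "%E2%84%A2", "%e6%9d%a8"):
        print(sample, repr(decode_byte_by_byte(sample)),
              repr(percent_decode(sample)))
```

For “%c2%ae”, the lead byte 0xC2 contributes the bits 00010 and the continuation byte 0xAE contributes 101110, giving code point 174 (U+00AE, “®”). This sketch does not reject overlong encodings or surrogate code points, which a production decoder would likely need to handle.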