Decode sqlserver

12/30/2023

Recently I needed to decode a URL-encoded string in T-SQL. I looked for solutions on the internet, but the ones I found all decode byte by byte. That works for 1-byte UTF-8 sequences, whose values are less than 128. For example, “%2f” is decoded as “/”.

For 2-byte, 3-byte and 4-byte UTF-8 sequences, however, decoding byte by byte does not work. For example, “%c2%ae” should be decoded as “®” instead of “Â®”, “%E2%84%A2” should be decoded as “™” instead of “â„¢”, and “%e6%9d%a8” should be decoded as “杨” instead of “æ¨”.

To see why, recall how URL encoding works: it converts each non-ASCII character to its byte sequence in UTF-8, writes every byte as a pair of hexadecimal digits, and adds a “%” in front of each pair. A decoder therefore has to reassemble those bytes into whole characters. SQL Server stores Unicode text in nvarchar as UTF-16 and had no native UTF-8 support before SQL Server 2019, so a transformation from UTF-8 into UTF-16 is necessary. Since the built-in function “NCHAR” returns the Unicode character for a given code and accepts decimal values, we just need to convert each UTF-8 sequence to its decimal code point. The initial byte tells us how long the sequence is: the initial byte of a 2-, 3- or 4-byte sequence starts with 2, 3 or 4 one bits, followed by a zero bit, and every following byte starts with the bits 10. I set out to write a function which solves these issues.
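The arithmetic behind decoding multi-byte sequences can be sketched as follows. This is a minimal illustration in Python rather than T-SQL, and the helper names (`sequence_length`, `code_point`, `utf16_units`, `url_decode`) are made up for this example. It assumes well-formed input and does not handle “+” as a space. The integer returned by `code_point` is the kind of decimal value that would be passed to “NCHAR”. `utf16_units` shows the surrogate pair that UTF-16 uses for code points above 65535, which matters when “NCHAR” is limited to the 0 to 65535 range.

```python
def sequence_length(lead: int) -> int:
    """Return the UTF-8 sequence length implied by the lead byte's high bits."""
    if lead < 0x80:              # 0xxxxxxx: plain ASCII, one byte
        return 1
    if lead >> 5 == 0b110:       # 110xxxxx: two-byte sequence
        return 2
    if lead >> 4 == 0b1110:      # 1110xxxx: three-byte sequence
        return 3
    if lead >> 3 == 0b11110:     # 11110xxx: four-byte sequence
        return 4
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#04x}")


def code_point(seq: list[int]) -> int:
    """Combine the bytes of one UTF-8 sequence into a Unicode code point."""
    n = len(seq)
    if n == 1:
        return seq[0]
    # Keep only the payload bits of the lead byte (5, 4 or 3 bits).
    cp = seq[0] & (0xFF >> (n + 1))
    for b in seq[1:]:
        if b >> 6 != 0b10:       # continuation bytes must look like 10xxxxxx
            raise ValueError(f"invalid continuation byte: {b:#04x}")
        cp = (cp << 6) | (b & 0x3F)
    return cp


def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code unit(s) for a code point."""
    if cp < 0x10000:
        return [cp]
    cp -= 0x10000
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]


def url_decode(s: str) -> str:
    """Decode %XX escapes, treating consecutive escapes as one UTF-8 sequence."""
    out = []
    i = 0
    while i < len(s):
        if s[i] != "%":
            out.append(s[i])
            i += 1
            continue
        n = sequence_length(int(s[i + 1:i + 3], 16))
        seq = []
        for k in range(n):
            start = i + 3 * k
            if s[start:start + 1] != "%":
                raise ValueError("truncated UTF-8 sequence")
            seq.append(int(s[start + 1:start + 3], 16))
        out.append(chr(code_point(seq)))
        i += 3 * n
    return "".join(out)


print(url_decode("%c2%ae"), url_decode("%E2%84%A2"), url_decode("%e6%9d%a8"))
```

With the examples from this post, `url_decode("%c2%ae")` gives “®”, `url_decode("%E2%84%A2")` gives “™” and `url_decode("%e6%9d%a8")` gives “杨”, while `url_decode("%2f")` still gives “/”.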
