What is Punycode in Node.js?
Punycode is an encoding syntax used to convert a string of Unicode characters into ASCII. ASCII is the restricted character set that hostnames are allowed to use: letters, digits, and hyphens.
Why is this conversion needed? Hostnames can contain only ASCII characters. Internationalized Domain Names (IDNs) use Punycode to encode a domain name typed in the browser into ASCII, and to decode it back to Unicode for display.
For example, if you type mañana.com in the browser, the browser's built-in IDNA support uses Punycode to convert it to xn--maana-pta.com.
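Node.js ships the same WHATWG URL parser that browsers implement, so you can observe this conversion without installing anything. The following is a minimal sketch that assumes a Node.js version with the global URL class (Node.js 10 or later). The parser applies IDNA processing to the host, and that processing uses Punycode for non-ASCII labels.

```javascript
// The WHATWG URL parser converts a non-ASCII host to its ASCII (xn--) form,
// just as the browser does when you type the address.
const parsed = new URL('https://mañana.com/path');

console.log(parsed.hostname); // xn--maana-pta.com
console.log(parsed.href);     // https://xn--maana-pta.com/path
```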
Now let's see how to use Punycode in Node.js.
Punycode in Node.js: Punycode has been bundled with Node.js since v0.6.2, but the built-in punycode module is deprecated. To use Punycode, install the standalone punycode package from npm instead.
npm installation:
npm install punycode --save
Include the punycode module:
// The trailing slash loads the npm package instead of the deprecated core module
const punycode = require('punycode/');
punycode.decode(string): Converts a Punycode string of ASCII symbols (a single label, without the xn-- prefix) to a string of Unicode symbols.
Example:
javascript
// Include the punycode module (npm package, see the installation step above)
const punycode = require('punycode/');

// Decode Punycode ASCII strings to Unicode symbols
console.log(punycode.decode('maana-pta'));
console.log(punycode.decode('--dqo34k'));
Output:
mañana
☃-⌘
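To make punycode.decode less of a black box, here is a minimal sketch of the decoding procedure from RFC 3492 (the Bootstring algorithm that Punycode is built on). The name punycodeDecode is hypothetical, and this is not the punycode package's source. The sketch also omits the overflow checks and some of the input validation that a real implementation needs, so use the package in actual code.

```javascript
// Minimal sketch of RFC 3492 Punycode decoding (hypothetical helper, not the
// punycode module's source). Overflow checks are omitted for brevity.
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Bias adaptation function from RFC 3492, section 6.1.
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : delta >> 1;
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > ((BASE - T_MIN) * T_MAX) >> 1) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

// Map 'a'-'z' / 'A'-'Z' to 0-25 and '0'-'9' to 26-35; anything else is invalid.
function digitValue(code) {
  if (code >= 0x30 && code <= 0x39) return code - 0x30 + 26;
  if (code >= 0x41 && code <= 0x5a) return code - 0x41;
  if (code >= 0x61 && code <= 0x7a) return code - 0x61;
  return BASE;
}

function punycodeDecode(input) {
  // Everything before the last '-' is copied through as basic code points.
  const last = input.lastIndexOf('-');
  const output = [];
  for (let j = 0; j < Math.max(last, 0); j++) output.push(input.charCodeAt(j));

  let n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
  for (let pos = last > 0 ? last + 1 : 0; pos < input.length; ) {
    // Read one variable-length integer and add it to i.
    const oldI = i;
    for (let w = 1, k = BASE; ; k += BASE) {
      if (pos >= input.length) throw new RangeError('Truncated input');
      const digit = digitValue(input.charCodeAt(pos++));
      if (digit >= BASE) throw new RangeError('Invalid digit');
      i += digit * w;
      const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
      if (digit < t) break;
      w *= BASE - t;
    }
    const length = output.length + 1;
    bias = adapt(i - oldI, length, oldI === 0);
    n += Math.floor(i / length); // the code point to insert
    i %= length;                 // the position to insert it at
    output.splice(i, 0, n);
    i++;
  }
  return String.fromCodePoint(...output);
}

console.log(punycodeDecode('maana-pta')); // mañana
console.log(punycodeDecode('--dqo34k'));  // ☃-⌘
```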
punycode.encode(string): Converts a string of Unicode symbols (a single label) to a Punycode string of ASCII symbols, without the xn-- prefix.
Example:
javascript
// Include the punycode module (npm package, see the installation step above)
const punycode = require('punycode/');

// Encode Unicode symbols to a Punycode ASCII string
console.log(punycode.encode('mañana'));
console.log(punycode.encode('☃-⌘'));
Output:
maana-pta
--dqo34k
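Encoding reverses the decoding process. The ASCII characters are copied first, followed by a '-' delimiter. Each remaining code point is then emitted as a delta, written in base 36 with a threshold that adapts as the encoding proceeds. The following is a minimal sketch of that RFC 3492 encoding procedure. The name punycodeEncode is hypothetical, overflow checks are omitted, and the code illustrates the algorithm rather than reproducing the punycode package's source.

```javascript
// Minimal sketch of RFC 3492 Punycode encoding (hypothetical helper, not the
// punycode module's source). Overflow checks are omitted for brevity.
const BASE = 36, T_MIN = 1, T_MAX = 26, SKEW = 38, DAMP = 700;
const INITIAL_BIAS = 72, INITIAL_N = 128;

// Bias adaptation function from RFC 3492, section 6.1.
function adapt(delta, numPoints, firstTime) {
  delta = firstTime ? Math.floor(delta / DAMP) : delta >> 1;
  delta += Math.floor(delta / numPoints);
  let k = 0;
  while (delta > ((BASE - T_MIN) * T_MAX) >> 1) {
    delta = Math.floor(delta / (BASE - T_MIN));
    k += BASE;
  }
  return k + Math.floor(((BASE - T_MIN + 1) * delta) / (delta + SKEW));
}

// Map digit values 0-25 to 'a'-'z' and 26-35 to '0'-'9'.
function digitChar(d) {
  return String.fromCharCode(d < 26 ? 0x61 + d : 0x30 + d - 26);
}

function punycodeEncode(input) {
  const codePoints = Array.from(input, (ch) => ch.codePointAt(0));
  // Basic (ASCII) code points come first, followed by a '-' delimiter.
  let output = codePoints
    .filter((cp) => cp < 0x80)
    .map((cp) => String.fromCharCode(cp))
    .join('');
  const basicCount = output.length;
  if (basicCount > 0) output += '-';

  let n = INITIAL_N, delta = 0, bias = INITIAL_BIAS, handled = basicCount;
  while (handled < codePoints.length) {
    // The next code point to handle is the smallest one that is >= n.
    const m = Math.min(...codePoints.filter((cp) => cp >= n));
    delta += (m - n) * (handled + 1);
    n = m;
    for (const cp of codePoints) {
      if (cp < n) delta++;
      if (cp === n) {
        // Emit delta as a variable-length base-36 integer.
        let q = delta;
        for (let k = BASE; ; k += BASE) {
          const t = k <= bias ? T_MIN : k >= bias + T_MAX ? T_MAX : k - bias;
          if (q < t) break;
          output += digitChar(t + ((q - t) % (BASE - t)));
          q = Math.floor((q - t) / (BASE - t));
        }
        output += digitChar(q);
        bias = adapt(delta, handled + 1, handled === basicCount);
        delta = 0;
        handled++;
      }
    }
    delta++;
    n++;
  }
  return output;
}

console.log(punycodeEncode('mañana')); // maana-pta
console.log(punycodeEncode('☃-⌘'));    // --dqo34k
```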
punycode.toUnicode(input): Converts a Punycode string representing a domain name or an email address to Unicode. Only the Punycoded parts of the input are converted, so it is safe to call it on a string that has already been converted to Unicode.
Example:
javascript
// Include the punycode module (npm package, see the installation step above)
const punycode = require('punycode/');

// Convert Punycode domain names to Unicode
console.log(punycode.toUnicode('xn--maana-pta.com'));
console.log(punycode.toUnicode('xn----dqo34k.com'));
Output:
mañana.com
☃-⌘.com
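If you only need to convert whole domain names, the built-in url module also provides url.domainToUnicode(). This function applies the full IDNA processing rules, not just Punycode decoding. The following is a minimal sketch. According to the Node.js documentation, the function returns an empty string when the domain is invalid.

```javascript
const url = require('url');

// Built-in IDNA conversion of a whole domain name (no npm package needed).
console.log(url.domainToUnicode('xn--maana-pta.com')); // mañana.com
console.log(url.domainToUnicode('example.com'));       // example.com
```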
punycode.toASCII(input): Converts a lowercased Unicode string representing a domain name or an email address to Punycode. Only the non-ASCII parts of the domain name are converted, so it is safe to call it with a domain that is already in ASCII.
Example:
javascript
// Include the punycode module (npm package, see the installation step above)
const punycode = require('punycode/');

// Convert Unicode domain names to Punycode
console.log(punycode.toASCII('mañana.com'));
console.log(punycode.toASCII('☃-⌘.com'));
Output:
xn--maana-pta.com
xn----dqo34k.com
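The requirement that toASCII receive lowercased input matters because punycode.toASCII does not lowercase its input. By contrast, url.domainToASCII() in the built-in url module applies the IDNA mapping step first, so mixed-case input is normalized before encoding. The following is a minimal sketch of that behavior.

```javascript
const url = require('url');

// domainToASCII maps the domain (including lowercasing) before applying Punycode.
console.log(url.domainToASCII('mañana.com'));  // xn--maana-pta.com
console.log(url.domainToASCII('MAÑANA.com'));  // xn--maana-pta.com
console.log(url.domainToASCII('example.com')); // example.com
```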
punycode.ucs2.decode(string): Creates an array containing the numeric code point of each Unicode symbol in the string. JavaScript stores strings as sequences of 16-bit code units, so a symbol outside the Basic Multilingual Plane occupies two code units (a surrogate pair). ucs2.decode combines each such pair into a single code point.
Example:
javascript
// Include the punycode module (npm package, see the installation step above)
const punycode = require('punycode/');

// Decode strings into arrays of code points
console.log(punycode.ucs2.decode('abc'));
console.log(punycode.ucs2.decode('\uD834\uDF06'));
Output:
[ 97, 98, 99 ]
[ 119558 ]
UCS-2: UCS-2 is a fixed-length character encoding that uses one 16-bit code unit per character. It can therefore represent only code points from 0 to 0xFFFF, the Basic Multilingual Plane (BMP).
Surrogate pairs: Characters outside the BMP, such as U+1D306 TETRAGRAM FOR CENTRE (𝌆), can only be encoded in UTF-16 with two 16-bit code units. These two code units are called a surrogate pair, and together they represent a single character.
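The surrogate-pair rule can be checked by hand. The code point equals 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00). For '\uD834\uDF06', this gives 0x10000 + (0x34 << 10) + 0x306 = 0x1D306, which is 119558. The following is a minimal sketch of that combination step. It uses a hypothetical ucs2DecodeSketch helper, not the punycode module's code. Like punycode.ucs2.decode, the sketch keeps unpaired surrogates unchanged.

```javascript
// Hypothetical helper (not part of the punycode module): turn a string into
// an array of code points, combining surrogate pairs by hand.
function ucs2DecodeSketch(str) {
  const codePoints = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    // A high surrogate (0xD800-0xDBFF) followed by a low surrogate
    // (0xDC00-0xDFFF) encodes one code point above 0xFFFF.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        codePoints.push(0x10000 + ((unit - 0xd800) << 10) + (next - 0xdc00));
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    // BMP character or unpaired surrogate: keep the code unit as is.
    codePoints.push(unit);
  }
  return codePoints;
}

console.log(ucs2DecodeSketch('abc'));          // [ 97, 98, 99 ]
console.log(ucs2DecodeSketch('\uD834\uDF06')); // [ 119558 ]

// The built-in string iterator also walks a string by code point.
console.log(Array.from('\uD834\uDF06', (ch) => ch.codePointAt(0))); // [ 119558 ]
```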
punycode.ucs2.encode(codePoints): It is used to create a string based on the array of numeric code point values.
Example:
javascript
// Include the punycode module (npm package, see the installation step above)
const punycode = require('punycode/');

// Build strings from arrays of code points
console.log(punycode.ucs2.encode([0x61, 0x62, 0x63]));
console.log(punycode.ucs2.encode([0x1D306]));
Output:
abc
𝌆
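Encoding applies the surrogate-pair arithmetic in reverse. Subtract 0x10000 from the code point, then split the remaining 20 bits into a high half and a low half. The following is a minimal sketch using a hypothetical ucs2EncodeSketch helper. In modern JavaScript, the built-in String.fromCodePoint() produces the same result.

```javascript
// Hypothetical helper (not part of the punycode module): build a string from
// code points, splitting values above 0xFFFF into a surrogate pair.
function ucs2EncodeSketch(codePoints) {
  let out = '';
  for (const codePoint of codePoints) {
    if (codePoint > 0xffff) {
      const offset = codePoint - 0x10000;    // 20-bit value
      const high = 0xd800 + (offset >> 10);  // top 10 bits
      const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
      out += String.fromCharCode(high, low);
    } else {
      out += String.fromCharCode(codePoint); // BMP: a single code unit
    }
  }
  return out;
}

const tetragram = ucs2EncodeSketch([0x1d306]);
console.log(ucs2EncodeSketch([0x61, 0x62, 0x63]));        // abc
console.log(tetragram.length);                            // 2 (code units)
console.log(tetragram === '\uD834\uDF06');                // true
console.log(tetragram === String.fromCodePoint(0x1d306)); // true
```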
You can use a Punycode converter to see the results live.