
Last modified: 2023-12-03 14:38:46.122000    Author: Mango

Unicode in JavaScript

Unicode is a standard that assigns a unique number, called a code point, to every character across the world's writing systems. JavaScript strings are stored as sequences of UTF-16 code units, and the language provides methods and properties for working with both code units and code points. In this article, we'll explore how to handle Unicode in JavaScript.

Unicode Escapes

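Unicode escapes represent a character by its code point written in hexadecimal. Since ES2015, JavaScript accepts the escape sequence \u{codePoint}, which takes one to six hex digits and can express any code point. The older \uXXXX form takes exactly four hex digits, so it can only express characters in the Basic Multilingual Plane (BMP) directly.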

const smiley = '\u{1F60A}';
console.log(smiley); // Output: 😊
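As a quick check on the escape syntax, the sketch below compares the braced \u{...} form with the four-digit \uXXXX form; the variable names are illustrative. A BMP character comes out the same either way, while U+1F60A can only be spelled with four-digit escapes as a UTF-16 surrogate pair, which is also why its length is 2.

```javascript
// U+00E9 (é) is in the BMP, so both escape forms give the same one-unit string.
const eAcuteBraced = '\u{E9}';
const eAcuteFourDigit = '\u00E9';
console.log(eAcuteBraced === eAcuteFourDigit); // true

// U+1F60A (😊) lies outside the BMP. With four-digit escapes it must be
// written as a surrogate pair: high surrogate D83D, low surrogate DE0A.
const smileBraced = '\u{1F60A}';
const smilePair = '\uD83D\uDE0A';
console.log(smileBraced === smilePair); // true
console.log(smileBraced.length);        // 2 (two UTF-16 code units)
```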

String Manipulation

JavaScript provides several string manipulation methods that work with Unicode characters. Some examples include:

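  • String.prototype.length: The length property returns the number of UTF-16 code units in a string, not the number of characters. Characters outside the Basic Multilingual Plane (BMP), such as most emoji, count as 2.

  • String.prototype.charAt(): The charAt() method returns the UTF-16 code unit at a specified index as a one-unit string. For characters outside the BMP, which are stored as two code units (a surrogate pair), it returns only half of the pair.

  • String.prototype.codePointAt(): The codePointAt() method returns the Unicode code point at a given UTF-16 index. If the index points at the first half of a surrogate pair, it returns the full code point of that character.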

  • String.fromCodePoint(): The fromCodePoint() method creates a string from a sequence of Unicode code points.

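const astronaut = '\u{1F469}\u200D\u{1F680}'; // 👩‍🚀 = U+1F469 (👩) + U+200D (ZWJ) + U+1F680 (🚀)
console.log(astronaut.length); // Output: 5 (UTF-16 code units)
console.log(astronaut.charAt(0)); // Output: a lone high surrogate (U+D83D), not 👩
console.log(astronaut.codePointAt(0)); // Output: 128105 (U+1F469, 👩)
console.log(astronaut.codePointAt(2)); // Output: 8205 (Zero Width Joiner)
console.log(String.fromCodePoint(0x1F680)); // Output: 🚀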
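The gap between code units and code points is easier to see with the surrogate arithmetic spelled out. This is a sketch: decodeSurrogatePair and countCodePoints are hypothetical helpers written for illustration, not built-ins. The first applies the standard UTF-16 decoding formula, which should agree with codePointAt() when the index points at a high surrogate; the second relies on the fact that spreading a string iterates it by code point.

```javascript
// Hypothetical helper: decode a UTF-16 surrogate pair by hand.
// codePoint = (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000
function decodeSurrogatePair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const woman = '\u{1F469}'; // 👩, stored as two UTF-16 code units
const high = woman.charCodeAt(0); // high surrogate
const low = woman.charCodeAt(1);  // low surrogate

console.log(high.toString(16), low.toString(16)); // d83d dc69
console.log(decodeSurrogatePair(high, low) === woman.codePointAt(0)); // true

// Hypothetical helper: count code points instead of UTF-16 code units.
function countCodePoints(str) {
  return [...str].length; // spread iterates by code point
}

const astronautEmoji = '\u{1F469}\u200D\u{1F680}'; // 👩‍🚀
console.log(astronautEmoji.length);           // 5 code units
console.log(countCodePoints(astronautEmoji)); // 3 code points: 👩, ZWJ, 🚀
```

Note that even the code point count (3) is not what a reader perceives as one astronaut emoji; grouping code points into visible characters is a separate problem.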

Regular Expressions

Regular expressions in JavaScript can match Unicode characters. The u flag enables Unicode mode: the pattern is interpreted as a sequence of code points, so characters outside the BMP are matched as single units, and Unicode property escapes such as \p{Script=Han} become available.

const text = 'Hello 世界';
const regex = /\p{Script=Han}/u; // \p{...} property escapes require the u flag
console.log(text.match(regex)[0]); // Output: 世 (the first match)
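A small sketch of what the u flag changes, using the . metacharacter on an emoji outside the BMP. It also adds the g flag to the Han example so that match() returns every matching character rather than only the first. The names face and hanChars are illustrative.

```javascript
const face = '\u{1F60A}'; // 😊, stored as two UTF-16 code units

// Without the u flag, '.' matches one UTF-16 code unit,
// so a single emoji needs two dots to match.
console.log(/^.$/.test(face));  // false
console.log(/^..$/.test(face)); // true

// With the u flag, '.' matches one whole code point.
console.log(/^.$/u.test(face)); // true

// Adding the g flag makes match() return every match, not just the first.
const hanChars = 'Hello 世界'.match(/\p{Script=Han}/gu);
console.log(hanChars.length);    // 2
console.log(hanChars.join(' ')); // 世 界
```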
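
String Iteration

JavaScript's for...of statement iterates over a string by code point rather than by UTF-16 code unit, so it keeps surrogate pairs together. It does not, however, group code points into user-perceived characters: each flag emoji below is a pair of regional indicator symbols, and for...of yields each symbol separately.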

const flags = '🇺🇳🇩🇪'; // two flags, each made of two regional indicator symbols
for (const symbol of flags) {
  console.log(symbol);
}
// Output:
// 🇺
// 🇳
// 🇩
// 🇪
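If the goal is to iterate over user-perceived characters (grapheme clusters) rather than code points, one option is Intl.Segmenter with grapheme granularity. This sketch assumes a runtime that implements Intl.Segmenter (for example Node.js 16 or later, or a current browser); the variable names are illustrative.

```javascript
// 🇺🇳🇩🇪 written with escapes: regional indicators U, N, D, E.
const flagPair = '\u{1F1FA}\u{1F1F3}\u{1F1E9}\u{1F1EA}';

// for...of and spread split the string by code point: 4 symbols.
console.log([...flagPair].length); // 4

// A grapheme segmenter groups code points into user-perceived characters.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = Array.from(segmenter.segment(flagPair), (s) => s.segment);
console.log(graphemes.length); // 2 (one flag per grapheme)

// The same applies to ZWJ sequences such as 👩‍🚀 (woman + ZWJ + rocket).
const astronautGraphemes = Array.from(
  segmenter.segment('\u{1F469}\u200D\u{1F680}'),
);
console.log(astronautGraphemes.length); // 1
```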
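
String Normalization

Unicode defines normalization forms so that equivalent sequences of characters can be compared reliably. For example, Ä can be stored as one precomposed code point (U+00C4) or as A followed by a combining diaeresis (U+0308). JavaScript provides the following method: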

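  • String.prototype.normalize(): The normalize() method converts a string to one of the four Unicode normalization forms: NFC, NFD, NFKC, or NFKD. Called with no argument, it uses NFC.

const nfd = '\u0041\u0308'; // 'A' + U+0308 COMBINING DIAERESIS (decomposed, NFD)
const nfc = nfd.normalize('NFC'); // U+00C4 'Ä' (composed, NFC)
console.log(nfc === '\u00C4'); // Output: true
console.log(nfd.length, nfc.length); // Output: 2 1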
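A common use of normalization is comparing text that may arrive in different forms. The sketch below uses a hypothetical helper, canonicallyEqual, that compares two strings after NFC normalization, and it contrasts NFC with NFKC on the U+FB01 "ﬁ" ligature, which only the compatibility forms fold into plain "fi".

```javascript
// Hypothetical helper: treat two strings as equal when they are
// canonically equivalent, i.e. identical after NFC normalization.
function canonicallyEqual(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

const composed = '\u00C4';         // Ä as one code point
const decomposed = '\u0041\u0308'; // A + COMBINING DIAERESIS

console.log(composed === decomposed);                // false
console.log(canonicallyEqual(composed, decomposed)); // true

// Compatibility forms (NFKC/NFKD) also fold characters that are only
// "compatible", such as the U+FB01 ligature; NFC leaves it unchanged.
const ligature = '\uFB01';
console.log(ligature.normalize('NFC') === 'fi');  // false
console.log(ligature.normalize('NFKC') === 'fi'); // true
```

Because NFKC can change how text looks (ligatures, superscripts, full-width forms), it tends to be used for matching tasks such as search rather than for storing text.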

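In conclusion, JavaScript has solid Unicode support, but because strings are sequences of UTF-16 code units, length and index-based methods such as charAt() count code units rather than characters. Knowing when to use code-point-aware tools (codePointAt(), fromCodePoint(), for...of, the u regex flag) and normalize() is essential for handling multilingual text and emoji correctly.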