Natural Language Processing using Polyglot - Introduction

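This article introduces Polyglot, a Python NLP package that supports multilingual applications and offers a broad range of analyses with wide language coverage. It was developed by Rami Al-Rfou. Its features include:

Language detection (196 languages)
Tokenization (165 languages)
Named entity recognition (40 languages)
Part-of-speech tagging (16 languages)
Sentiment analysis (136 languages), and more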

First, let's install the required packages.
Running the commands in Google Colab is the easiest way to get a working installation.

pip install polyglot

# PyICU: Unicode text segmentation, used for tokenization
pip install pyicu

# Morfessor: morphological analysis
pip install Morfessor

# pycld2: language detection
pip install pycld2
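Before moving on, it can help to confirm that the packages are importable. The following is a minimal sketch, not part of Polyglot itself: the helper name missing_modules is hypothetical, and the import names (icu for PyICU, morfessor for Morfessor) are assumptions about how these packages are usually imported.

```python
import importlib.util

# Assumed import names: PyICU is imported as "icu", Morfessor as "morfessor".
REQUIRED_MODULES = ["polyglot", "icu", "morfessor", "pycld2"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Any name it reports can then be installed with the matching pip command above.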

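Next, download the models that the examples need.
In Google Colab, each download runs in its own %%bash cell. The NER and POS models also rely on word embeddings, which the Polyglot documentation downloads alongside them as embeddings2.en.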

%%bash
polyglot download ner2.en    # downloading model ner

%%bash
polyglot download pos2.en    # downloading model pos

%%bash
polyglot download sentiment2.en  # downloading model sentiment

Code: Language detection

python3

from polyglot.detect import Detector
spanish_text = u"""¡Hola ! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"""
detector = Detector(spanish_text)
print(detector.language)

Output:

The text is detected as Spanish with a confidence of 98.
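The detected language can also be read field by field instead of printed as a whole. The sketch below assumes the object returned by detector.language exposes name, code, and confidence attributes; the helper summarize_language is hypothetical, and a stand-in object is used so the example runs without Polyglot installed.

```python
from types import SimpleNamespace


def summarize_language(language):
    """Format the name, code, and confidence of a detected language."""
    return f"{language.name} ({language.code}), confidence {language.confidence:.0f}"


# Stand-in for detector.language, mirroring the result described above.
example = SimpleNamespace(name="Spanish", code="es", confidence=98.0)
print(summarize_language(example))

# With Polyglot installed, the same helper could be applied to real output:
#   from polyglot.detect import Detector
#   print(summarize_language(Detector(spanish_text).language))
```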
Code: Tokenization
Tokenization is the process of splitting sentences into words, or paragraphs into sentences.

python3

# importing Text from polyglot library
from polyglot.text import Text
sentences = u"""Suggest a platform for placement preparation?. GFG is a very good platform for placement
preparation."""
# passing sentences through imported Text
text = Text(sentences)
# dividing sentences into words
print(text.words)
print('\n')
# separating sentences
print(text.sentences)

Output:

It splits the text into words and also separates the two sentences.
Code: Named entity recognition
Polyglot recognizes three categories of entities:

Locations
Organizations
Persons

python3

from polyglot.text import Text
sentence = """Google is an American multinational technology company and Sundar Pichai is the CEO of Google"""

text = Text(sentence, hint_language_code='en')
print(text.entities)

Output:

I-ORG refers to an organization
I-LOC refers to a location
I-PER refers to a person
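The tags above can be turned into readable labels with a small lookup table. This is a minimal sketch: describe_entities is a hypothetical helper, and the sample list is hand-written for illustration rather than copied from Polyglot's output. It assumes each entity can be reduced to a (tag, words) pair.

```python
# Readable labels for the three entity tags listed above.
ENTITY_LABELS = {
    "I-ORG": "organization",
    "I-LOC": "location",
    "I-PER": "person",
}


def describe_entities(entities):
    """Turn (tag, words) pairs into (label, phrase) pairs."""
    return [(ENTITY_LABELS.get(tag, tag), " ".join(words)) for tag, words in entities]


# Hand-written sample in the shape of (tag, list of words).
sample_entities = [("I-ORG", ["Google"]), ("I-PER", ["Sundar", "Pichai"])]
for label, phrase in describe_entities(sample_entities):
    print(f"{phrase}: {label}")

# With Polyglot installed, the pairs could be built from the real result,
# assuming each entity has a .tag attribute and iterates over its words:
#   pairs = [(entity.tag, list(entity)) for entity in text.entities]
```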
Code: Part-of-speech tagging

python3

from polyglot.text import Text
sentence = """GeeksforGeeks is the best place for learning things in simple manner."""
text = Text(sentence)
print(text.pos_tags)

Output:

Here ADP stands for adposition, ADJ for adjective, and DET for determiner.
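Since the tagger returns (word, tag) pairs, the abbreviations can be expanded and counted with a lookup table. The sketch below is illustrative only: tag_counts is a hypothetical helper, the sample pairs are hand-written rather than Polyglot's actual output, and only ADP, ADJ, and DET come from the text above. The other names are assumed from the universal part-of-speech tag set.

```python
from collections import Counter

# Tag names; entries other than ADP, ADJ, and DET are assumptions
# based on the universal part-of-speech tag set.
POS_NAMES = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "DET": "determiner",
    "NOUN": "noun",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "VERB": "verb",
}


def tag_counts(pos_tags):
    """Count how often each tag name occurs in a list of (word, tag) pairs."""
    return Counter(POS_NAMES.get(tag, tag) for _, tag in pos_tags)


# Hand-written sample in the shape of text.pos_tags.
sample_tags = [
    ("the", "DET"), ("best", "ADJ"), ("place", "NOUN"),
    ("for", "ADP"), ("learning", "VERB"), ("things", "NOUN"),
]
counts = tag_counts(sample_tags)
print(counts.most_common())
```

Unknown tags fall through unchanged, so the helper does not fail on tags missing from the table.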
Code: Sentiment analysis

python3

from polyglot.text import Text
sentence1 = """ABC is one of the best university in the world."""
sentence2 = """ABC is one of the worst university in the world."""
text1 = Text(sentence1)
text2 = Text(sentence2)
print(text1.polarity)
print(text2.polarity)

Output:

A polarity of 1 means the sentence is positive.
A polarity of -1 means the sentence is negative.
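The numeric polarity can be mapped to a label for reporting. This is a minimal sketch with a hypothetical helper, label_polarity. It assumes the score lies between -1 and 1 and treats exactly 0 as neutral, which is a design choice of this example rather than something the library prescribes.

```python
def label_polarity(score):
    """Map a polarity score to a label; 0 is treated as neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# The two scores discussed above, plus a neutral case.
for score in (1.0, -1.0, 0.0):
    print(score, label_polarity(score))

# With Polyglot installed, the helper could be applied directly:
#   print(label_polarity(text1.polarity))
```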