📜  Natural Language Processing Using Polyglot - Introduction

📅  Last modified: 2022-05-13 01:55:04.487000             🧑  Author: Mango


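This article introduces Polyglot, a Python NLP package developed by Rami Al-Rfou. It supports a wide range of multilingual applications and combines broad analysis features with extensive language coverage. Its features include:

  1. Language detection (196 languages)
  2. Tokenization (165 languages)
  3. Named entity recognition (40 languages)
  4. Part-of-speech tagging (16 languages)
  5. Sentiment analysis (136 languages)

It also offers word embeddings, morphological analysis and transliteration.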

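First, install the required packages. Google Colab makes the installation quick and smooth; in a Colab or Jupyter cell, prefix each shell command with "!" (for example, !pip install polyglot).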

pip install polyglot

# dependency packages
pip install pyicu        # Python bindings for ICU, used for text segmentation
pip install Morfessor    # morphological segmentation
pip install pycld2       # language detection (CLD2)
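
Before downloading models, it can help to confirm that the packages actually installed. The small sketch below uses only the standard library (importlib.util.find_spec) to check whether each module can be found without importing it. It assumes the usual import names: "icu" for PyICU and "morfessor" for Morfessor; the helper name check_modules is hypothetical.

```python
import importlib.util

# Import names of the packages installed above (PyICU is imported as "icu",
# Morfessor as "morfessor").
REQUIRED_MODULES = ["polyglot", "icu", "morfessor", "pycld2"]


def check_modules(names):
    """Map each top-level module name to True if Python can find it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:10s} {'OK' if found else 'MISSING'}")
```

Any module reported as MISSING needs to be reinstalled before the examples that follow will run.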

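Next, download the required models. In Colab, put all the download commands in a single cell that begins with the %%bash cell magic (a cell magic must be the first line of its cell):

%%bash
polyglot download embeddings2.en   # word embeddings, also required by the NER and POS models
polyglot download ner2.en          # named entity recognition model
polyglot download pos2.en          # part-of-speech tagging model
polyglot download sentiment2.en    # sentiment polarity lexicon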

Code: Language Detection

python3
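from polyglot.detect import Detector

spanish_text = u"""¡Hola ! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"""
# Detector identifies the language of the text using CLD2 (via pycld2)
detector = Detector(spanish_text)
# the best guess, including the language name, code and confidence
print(detector.language)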

Output:

The text is detected as Spanish with a confidence of 98.
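
Polyglot's Detector delegates the actual work to Google's CLD2 library through pycld2, whose statistical model is far more sophisticated than anything shown here. Purely to illustrate what a "confidence" for a language guess can mean, the toy sketch below scores each candidate language by how many of its common words occur in the text and reports the winner's share of all matches as a percentage. The word lists and the function name guess_language are hypothetical.

```python
import re

# Hypothetical, tiny lists of very common words, for illustration only.
STOPWORDS = {
    "es": {"el", "la", "de", "en", "y", "es", "mi", "tengo", "años", "hola"},
    "en": {"the", "a", "of", "in", "and", "is", "my", "i", "am", "hello"},
    "fr": {"le", "la", "de", "en", "et", "est", "mon", "je", "suis", "bonjour"},
}


def guess_language(text):
    """Return (language_code, confidence_percent), or (None, 0.0) if nothing matches."""
    words = re.findall(r"\w+", text.lower())
    # Count how many words of the text appear in each language's list.
    scores = {code: sum(w in stop for w in words) for code, stop in STOPWORDS.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    # Confidence = the winning language's share of all matches, as a percentage.
    return best, 100.0 * scores[best] / total


if __name__ == "__main__":
    text = "¡Hola ! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"
    print(guess_language(text))
```

On the Spanish sample this picks "es". Words shared between languages (here "en", which is also French) lower the confidence, which hints at why real detectors look at much more than whole words.
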
Code: Tokenization
Tokenization is the process of splitting a sentence into words, or a paragraph into sentences.

python3

# import Text from the polyglot library
from polyglot.text import Text

sentences = u"""Suggest a platform for placement preparation?. GFG is a very good platform for placement
preparation."""
# wrap the raw string in a polyglot Text object
text = Text(sentences)
# split the text into words
print(text.words)
print('\n')
# split the text into sentences
print(text.sentences)

Output:

The text is split into words, and the two sentences are separated from each other.
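
To make the two tokenization steps concrete (text into sentences, sentences into words), here is a deliberately naive, regex-based sketch. It is not how polyglot works internally: polyglot relies on ICU's segmentation through PyICU, which handles abbreviations, non-Latin scripts and many other cases this regex ignores. The function names are hypothetical.

```python
import re

# A deliberately naive splitter, for illustration only.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")  # whitespace after . ! or ?
TOKEN = re.compile(r"\w+|[^\w\s]")  # a run of word characters, or one punctuation mark


def split_sentences(text):
    """Split text into sentences at whitespace that follows '.', '!' or '?'."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def split_words(text):
    """Split text into word tokens, keeping each punctuation mark as its own token."""
    return TOKEN.findall(text)


if __name__ == "__main__":
    sample = ("Suggest a platform for placement preparation?. "
              "GFG is a very good platform for placement preparation.")
    for sentence in split_sentences(sample):
        print(split_words(sentence))
```

A sentence such as "Dr. Smith arrived." would be wrongly split after "Dr." by this sketch, which is exactly the kind of case a proper segmenter like ICU's is built to handle.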
Code: Named Entity Recognition
Polyglot recognizes three categories of entities:

  • Locations
  • Organizations
  • Persons

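python3

from polyglot.text import Text

sentence = """Google is an American multinational technology company and Sundar Pichai is the CEO of Google"""
# hint_language_code='en' tells polyglot to treat the text as English instead of detecting it
text = Text(sentence, hint_language_code='en')
# each entity is a chunk of words labelled I-PER, I-ORG or I-LOC
print(text.entities)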

Output:

I-ORG denotes an organization
I-LOC denotes a location
I-PER denotes a person
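
Because the tags use only the "I-" prefix, a multi-word name such as "Sundar Pichai" shows up as consecutive words carrying the same tag, and text.entities returns it as one chunk. The sketch below shows one simple way to do that grouping from per-word tags; it is an illustration, not polyglot's own code, and the function name group_entities and the hand-written tags in the example are hypothetical.

```python
from itertools import groupby


def group_entities(tagged_tokens):
    """Merge runs of consecutive tokens sharing the same entity tag into chunks.

    tagged_tokens: list of (word, tag) pairs, where tag is "O" (not an entity)
    or one of "I-PER", "I-ORG", "I-LOC".
    Returns a list of (tag, [words]) pairs, one per entity.
    """
    chunks = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != "O":
            chunks.append((tag, [word for word, _ in run]))
    return chunks


if __name__ == "__main__":
    # Hand-written tags for illustration; these are NOT polyglot's actual output.
    tokens = [("Google", "I-ORG"), ("is", "O"), ("the", "O"), ("employer", "O"),
              ("of", "O"), ("Sundar", "I-PER"), ("Pichai", "I-PER")]
    for tag, words in group_entities(tokens):
        print(tag, words)
```

One consequence of I-only tagging is that two different entities of the same type written back to back cannot be told apart by this grouping; they merge into a single chunk.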
Code: Part-of-Speech Tagging

python3

from polyglot.text import Text

sentence = """GeeksforGeeks is the best place for learning things in a simple manner."""
text = Text(sentence)
# pos_tags returns a list of (word, tag) pairs
print(text.pos_tags)

Output:

Here ADP denotes an adposition (a preposition such as "for" or "in"), ADJ an adjective and DET a determiner.
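
text.pos_tags uses the Universal POS tag set. The sketch below is a small lookup table for reading that output. The tag list reflects the polyglot documentation as we understand it (it uses CONJ where newer Universal Dependencies releases use CCONJ), and describe is a hypothetical helper, not part of polyglot.

```python
# Universal part-of-speech tags and their meanings (assumed to match the tag
# list in the polyglot documentation).
UNIVERSAL_TAGS = {
    "ADJ": "adjective",
    "ADP": "adposition (preposition or postposition)",
    "ADV": "adverb",
    "AUX": "auxiliary verb",
    "CONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def describe(pos_tags):
    """Turn (word, tag) pairs, shaped like text.pos_tags output, into readable strings."""
    return [f"{word}: {UNIVERSAL_TAGS.get(tag, 'unknown tag')}" for word, tag in pos_tags]


if __name__ == "__main__":
    # Hand-written pairs shaped like text.pos_tags output, for illustration.
    for line in describe([("best", "ADJ"), ("place", "NOUN"), ("for", "ADP")]):
        print(line)
```

Unknown tags fall back to "unknown tag" rather than raising, so the helper keeps working if a model emits a tag outside this table.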
Code: Sentiment Analysis

python3

from polyglot.text import Text

sentence1 = """ABC is one of the best universities in the world."""
sentence2 = """ABC is one of the worst universities in the world."""
text1 = Text(sentence1)
text2 = Text(sentence2)
# polarity is a float between -1.0 (negative) and 1.0 (positive)
print(text1.polarity)
print(text2.polarity)

Output:

1 indicates that the sentence expresses a positive sentiment
-1 indicates that the sentence expresses a negative sentiment
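
Polyglot assigns each word a polarity of -1, 0 or +1 from its lexicon, and, as far as we can tell from its source, Text.polarity is the average over the words whose polarity is non-zero. That would explain why each sample sentence, which contains a single opinion word ("best" or "worst"), scores exactly 1 or -1. The sketch below reproduces that averaging with a tiny hypothetical lexicon; the lexicon contents and the function name are assumptions, not polyglot's data or API.

```python
# Hypothetical toy lexicon. polyglot's real per-language lexicons come from the
# sentiment2.<lang> packages and score each word as -1, 0 or +1.
LEXICON = {"best": 1, "good": 1, "great": 1, "worst": -1, "bad": -1, "awful": -1}


def polarity(sentence):
    """Mean polarity of the non-neutral words; 0.0 if there are none.

    The result always lies in [-1.0, 1.0]. Words missing from LEXICON count as neutral.
    """
    words = [w.strip(".,!?;:").lower() for w in sentence.split()]
    scores = [LEXICON[w] for w in words if LEXICON.get(w, 0) != 0]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(polarity("ABC is one of the best universities in the world."))   # 1.0
    print(polarity("ABC is one of the worst universities in the world."))  # -1.0
```

Because neutral words are excluded from the average, a long sentence with one positive word still scores 1.0, while mixed sentences such as "good food, awful service" land in between (here exactly 0.0).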