Python | PoS Tagging and Lemmatization using spaCy

Last modified: 2022-05-13 01:55:33 | Author: Mango


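spaCy is one of the most widely used libraries for text analysis. It is built for large-scale information extraction and is among the faster NLP libraries available. It is also a common choice for preparing text for deep learning. spaCy is generally faster than the NLTK tagger and TextBlob, and its statistical models are often more accurate, although results vary with the model and the text.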

How to install?

pip install spacy
python -m spacy download en_core_web_sm
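
After running the two commands above, it can help to confirm that both packages are visible to the Python interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name is_installed is illustrative and not part of spaCy. It relies on the fact that the model is installed as a regular Python package named en_core_web_sm.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the library and of the English model package
for name in ("spacy", "en_core_web_sm"):
    status = "installed" if is_installed(name) else "missing"
    print(name, status)
```

If either line reports "missing", rerun the matching install command in the same environment.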


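Main features of spaCy:
1. Non-destructive tokenization
2. Named entity recognition
3. Support for 49+ languages
4. 16 statistical models for 9 languages
5. Pre-trained word vectors
6. Part-of-speech tagging
7. Labelled dependency parsing
8. Syntax-driven sentence segmentation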
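
The first item in the list, non-destructive tokenization, means the original text can always be rebuilt from the tokens. The sketch below illustrates this with a blank English pipeline (spacy.blank), which needs spaCy but no downloaded model. The sample sentence and the variable name rebuilt are chosen for illustration only.

```python
import spacy

# A blank pipeline contains only the tokenizer, so no model download is needed
nlp = spacy.blank("en")

# The double space is deliberate: whitespace must survive tokenization
text = "My name is Shaurya Uppal.  I enjoy writing articles."
doc = nlp(text)

# Each token keeps its trailing whitespace in text_with_ws
rebuilt = "".join(token.text_with_ws for token in doc)

print(rebuilt == text)
```

Because no characters are discarded, character offsets into the original string stay valid after processing.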


Importing and loading the library:

import spacy
  
# python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")


POS tagging for reviews:

Part-of-speech (POS) tagging labels each word as a noun, verb, adjective, adverb, and so on.

import spacy

# Load the small English pipeline: tokenizer, tagger,
# parser and NER (the small model has no static word vectors)
nlp = spacy.load("en_core_web_sm")

# Process the whole document. Adjacent string literals are joined,
# so no newline tokens appear in the output.
text = ("My name is Shaurya Uppal. "
        "I enjoy writing articles on GeeksforGeeks checkout "
        "my other article by going to my profile section.")

doc = nlp(text)

# Print each token with its coarse-grained POS tag
for token in doc:
    print(token, token.pos_)

# Collect the tokens tagged as verbs
print("Verbs:", [token.text for token in doc if token.pos_ == "VERB"])

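Output (produced with spaCy 2.x; tags can differ between versions, for example spaCy 3 models typically tag "is" as AUX):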

My DET
name NOUN
is VERB
Shaurya PROPN
Uppal PROPN
. PUNCT
I PRON
enjoy VERB
writing VERB
articles NOUN
on ADP
GeeksforGeeks PROPN
checkout VERB
my DET
other ADJ
article NOUN
by ADP
going VERB
to ADP
my DET
profile NOUN
section NOUN
. PUNCT

# Verb based Tagged Reviews:-
Verbs: ['is', 'enjoy', 'writing', 'checkout', 'going']
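
Once every token carries a tag, a quick frequency summary is often useful, for instance to see how verb-heavy a review is. The sketch below tallies the tags from the output above with collections.Counter. The list named tagged is copied by hand from that output so the example runs without spaCy; with a real Doc, the equivalent tally would be Counter(token.pos_ for token in doc).

```python
from collections import Counter

# (token, tag) pairs copied from the output above, not produced by spaCy here
tagged = [
    ("My", "DET"), ("name", "NOUN"), ("is", "VERB"), ("Shaurya", "PROPN"),
    ("Uppal", "PROPN"), (".", "PUNCT"), ("I", "PRON"), ("enjoy", "VERB"),
    ("writing", "VERB"), ("articles", "NOUN"), ("on", "ADP"),
    ("GeeksforGeeks", "PROPN"), ("checkout", "VERB"), ("my", "DET"),
    ("other", "ADJ"), ("article", "NOUN"), ("by", "ADP"), ("going", "VERB"),
    ("to", "ADP"), ("my", "DET"), ("profile", "NOUN"), ("section", "NOUN"),
    (".", "PUNCT"),
]

# Count how often each tag occurs
tag_counts = Counter(tag for _, tag in tagged)

# Print tags from most to least frequent
for tag, count in tag_counts.most_common():
    print(tag, count)
```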


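Lemmatization:

Lemmatization groups the inflected forms of a word together so that they can be analyzed as a single item, identified by the word's lemma, or dictionary form. For example, "writing" and "wrote" both reduce to the lemma "write".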

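import spacy

# Load the small English pipeline: tokenizer, tagger,
# parser and NER (the small model has no static word vectors)
nlp = spacy.load("en_core_web_sm")

# Process the whole document. Adjacent string literals are joined,
# so the indentation does not turn into whitespace tokens.
text = ("My name is Shaurya Uppal. I enjoy writing "
        "articles on GeeksforGeeks checkout my other "
        "article by going to my profile section.")

doc = nlp(text)

# Print each token next to its lemma
for token in doc:
    print(token, token.lemma_)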

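Output (produced with spaCy 2.x; spaCy 3 dropped the -PRON- placeholder and returns an ordinary lemma for pronouns):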

My -PRON-
name name
is be
Shaurya Shaurya
Uppal Uppal
. .
I -PRON-
enjoy enjoy
writing write
articles article
on on
GeeksforGeeks GeeksforGeeks
checkout checkout
my -PRON-
other other
article article
by by
going go
to to
my -PRON-
profile profile
section section
. .
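
To see the "grouping" part of lemmatization in action, the token and lemma pairs can be collected into a mapping from each lemma to the surface forms it covers. The sketch below does this for the output above. The list named pairs is copied by hand from that output so the example runs without spaCy; with a real Doc, the pairs would come from (token.text, token.lemma_).

```python
from collections import defaultdict

# (token, lemma) pairs copied from the output above, not produced by spaCy here
pairs = [
    ("My", "-PRON-"), ("name", "name"), ("is", "be"), ("Shaurya", "Shaurya"),
    ("Uppal", "Uppal"), (".", "."), ("I", "-PRON-"), ("enjoy", "enjoy"),
    ("writing", "write"), ("articles", "article"), ("on", "on"),
    ("GeeksforGeeks", "GeeksforGeeks"), ("checkout", "checkout"),
    ("my", "-PRON-"), ("other", "other"), ("article", "article"),
    ("by", "by"), ("going", "go"), ("to", "to"), ("my", "-PRON-"),
    ("profile", "profile"), ("section", "section"), (".", "."),
]

# Map each lemma to the set of distinct surface forms seen for it
forms_by_lemma = defaultdict(set)
for form, lemma in pairs:
    forms_by_lemma[lemma].add(form)

# Show only lemmas that cover more than one surface form
for lemma, forms in forms_by_lemma.items():
    if len(forms) > 1:
        print(lemma, sorted(forms))
```

In this short text only two lemmas cover several forms: "article" (for "article" and "articles") and the pronoun placeholder. On a larger corpus the same mapping shows how many distinct word forms collapse into each dictionary entry.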