NLP - Expanding Contractions in Text Processing

Last modified: 2022-05-13 01:55:13.320000    Author: Mango


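Text preprocessing is a crucial step in NLP. Cleaning text data to convert it into a form that can be analyzed and used for prediction is called text preprocessing. In this article, we discuss contractions and how to handle them in text.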

What are contractions?

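A contraction is a word or combination of words that has been shortened by dropping letters and replacing them with an apostrophe.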

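Nowadays, with so much of life moving online, we communicate with others more through text messages and posts on social media such as Facebook, Instagram, WhatsApp, Twitter, and LinkedIn. With so many people to talk to, we rely on abbreviations and shortened word forms when we text.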

For example: I'll be there within 5 min. Are u not gng there? Am I mssng smthng? I'd like to see u near d park.
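Chat-style abbreviations such as "smthng" and "d" are not contractions (they have no apostrophe), but they can be normalized in a similar way. Below is a minimal sketch assuming a tiny, hypothetical SLANG table and a hypothetical expand_slang helper; a real system would need a much larger, curated list. Tokens are matched together with any apostrophes so that "I'd" is never split into "I" and "d".

```python
import re

# Hypothetical slang table for illustration only; far from complete.
SLANG = {
    "u": "you",
    "gng": "going",
    "mssng": "missing",
    "smthng": "something",
    "d": "the",
}


def expand_slang(text):
    """Replace whole-token slang words, leaving punctuation and unknown words as they are."""
    def replace(match):
        word = match.group(0)
        # Look up the lowercase form; unknown tokens (including "I'd") are returned unchanged.
        return SLANG.get(word.lower(), word)

    # Letters plus apostrophes form one token, so "I'd" is looked up as a whole.
    return re.sub(r"[A-Za-z']+", replace, text)


print(expand_slang("Am I mssng smthng? I'd like to see u near d park."))
# prints: Am I missing something? I'd like to see you near the park.
```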

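English contractions are often formed by dropping letters, frequently vowels, from words (for example, "do not" becomes "don't"). Expanding contractions helps normalize text and is useful when working with Twitter data and product reviews, because these words play an important role in sentiment analysis.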
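To illustrate the sentiment-analysis point, here is a minimal sketch using a hypothetical four-entry NEGATIVE_CONTRACTIONS table and hypothetical expand and count_not helpers: a naive rule that looks for the standalone token "not" misses the negations hidden inside "don't" and "isn't" until they are expanded.

```python
import re

# Hypothetical mini-table covering only a few negative contractions.
NEGATIVE_CONTRACTIONS = {
    "don't": "do not",
    "isn't": "is not",
    "wasn't": "was not",
    "didn't": "did not",
}


def expand(text):
    """Replace each listed contraction (case-insensitive, whole word) with its expansion."""
    for short, full in NEGATIVE_CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text, flags=re.IGNORECASE)
    return text


def count_not(text):
    """Count standalone 'not' tokens, the cue a simple negation rule might use."""
    return len(re.findall(r"\bnot\b", text.lower()))


review = "I don't like it and it isn't cheap, but it's not bad."
print(count_not(review))          # 1: only the explicit "not"
print(count_not(expand(review)))  # 3: "don't" and "isn't" now contribute "not"
```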

How to expand contractions?

1. Using the contractions library

First, install the library. You can try it out on Google Colab, where installing it goes very smoothly.

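Using pip (the leading ! runs the command from a notebook cell, as in Google Colab; omit it in a terminal):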

!pip install contractions

In a Jupyter notebook:

import sys  
!{sys.executable} -m pip install contractions

Code 1: Expanding contractions word by word with the contractions library

Python3
# import library
import contractions
# contracted text
text = '''I'll be there within 5 min. Shouldn't you be there too? 
          I'd love to see u there my dear. It's awesome to meet new friends.
          We've been waiting for this day for so long.'''
  
# creating an empty list
expanded_words = []    
for word in text.split():
  # using contractions.fix to expand the shortened words
  expanded_words.append(contractions.fix(word))   
    
expanded_text = ' '.join(expanded_words)
print('Original text: ' + text)
print('Expanded_text: ' + expanded_text)


Output:

Original text: I'll be there within 5 min. Shouldn't you be there too? 
          I'd love to see u there my dear. It's awesome to meet new friends.
          We've been waiting for this day for so long.
Expanded_text: I will be there within 5 min. should not you be there too? 
          I would love to see you there my dear. it is awesome to meet new friends. 
          we have been waiting for this day for so long.

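Expanding contractions before building word vectors helps reduce dimensionality, because a contracted form and its expansion (for example, "it's" and "it is") no longer count as separate tokens.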
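A minimal sketch of that effect, assuming a hypothetical three-entry CONTRACTION_MAP and a simple regex tokenizer (both for illustration only): on a toy corpus where each sentence appears in both contracted and expanded form, expansion folds "don't", "it's", and "i'm" into tokens that already exist, so the vocabulary shrinks.

```python
import re

# Hypothetical mini-mapping covering only the contractions in the toy corpus.
CONTRACTION_MAP = {
    "don't": "do not",
    "it's": "it is",
    "i'm": "i am",
}


def tokenize(text):
    """Lowercase and keep apostrophes inside words, so "don't" stays one token."""
    return re.findall(r"[a-z']+", text.lower())


def expand_tokens(tokens):
    """Replace each known contraction with the words of its expansion."""
    out = []
    for tok in tokens:
        out.extend(CONTRACTION_MAP.get(tok, tok).split())
    return out


corpus = ["I don't know", "I do not know", "It's late", "It is late", "I'm here", "I am here"]
before = {tok for doc in corpus for tok in tokenize(doc)}
after = {tok for doc in corpus for tok in expand_tokens(tokenize(doc))}
print(len(before), len(after))  # prints: 12 9
```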

Code 2: Simply calling contractions.fix on the whole text.

Python3

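import contractions

text = '''She'd like to know how I'd done that! 
          She's going to the park and I don't think I'll be home for dinner.
          Theyre going to the zoo and she'll be home for dinner.'''

# fix() expands the whole string at once; a notebook displays the returned string
contractions.fix(text)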

Output:

'she would like to know how I would done that! 
 she is going to the park and I do not think I will be home for dinner.
 they are going to the zoo and she will be home for dinner.'
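In the output above, "I'd" was expanded to "I would" even in "how I'd done that", where "I had" is the correct reading: "'d" is ambiguous, and the result shown does not take the following word into account. One crude post-processing heuristic is sketched below; fix_would_had and its hand-picked PAST_PARTICIPLES list are hypothetical and not part of the contractions library. It rewrites "would" as "had" only when the next word is a listed past participle.

```python
import re

# Hypothetical, incomplete list of past participles that signal "had" rather than "would".
PAST_PARTICIPLES = {"done", "been", "seen", "gone", "had", "made", "taken", "given"}


def fix_would_had(text):
    """Rewrite 'would <past participle>' as 'had <past participle>'."""
    def replace(match):
        aux, verb = match.group(1), match.group(2)
        if verb.lower() in PAST_PARTICIPLES:
            # Keep the capitalization of the auxiliary verb.
            return ("Had" if aux[0].isupper() else "had") + " " + verb
        return match.group(0)  # e.g. "would like" stays unchanged

    return re.sub(r"\b(would)\s+(\w+)", replace, text, flags=re.IGNORECASE)


print(fix_would_had("she would like to know how I would done that!"))
# prints: she would like to know how I had done that!
```

Because the word list is fixed, this misses many cases (for example "I'd better"); it only shows that context is needed to pick the right expansion.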

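Contractions can also be handled with other techniques, such as dictionary mapping, or with the pycontractions library. See the pycontractions documentation for more information: https://pypi.org/project/pycontractions/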
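A minimal sketch of the dictionary-mapping approach, assuming a small hand-written CONTRACTION_MAP (hypothetical and far from complete) and straight apostrophes only (curly ’ is not handled): all keys are combined into one case-insensitive regular expression, and each match is replaced by its expansion while keeping the capitalization of the first letter.

```python
import re

# Hypothetical, deliberately small mapping; a real one would list many more entries.
CONTRACTION_MAP = {
    "i'll": "i will",
    "shouldn't": "should not",
    "it's": "it is",
    "we've": "we have",
    "don't": "do not",
    "she's": "she is",  # ambiguous: could also mean "she has"
}

# One alternation of all keys, longest first, matched case-insensitively at word boundaries.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CONTRACTION_MAP, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text):
    """Replace every known contraction in text with its expansion."""
    def replace(match):
        found = match.group(0)
        expanded = CONTRACTION_MAP[found.lower()]
        # Keep the original capitalization of the first letter.
        if found[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(replace, text)


print(expand_contractions("I'll be there. Shouldn't you be there too? It's awesome."))
# prints: I will be there. Should not you be there too? It is awesome.
```

Like any fixed lookup table, this gives ambiguous forms such as "she's" a single expansion regardless of context.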