📜  Extracting Locations from Text Using Python

📅  Last modified: 2022-05-13 01:54:52.952000    🧑  Author: Mango


In this article, we will see how to extract locations from text using Python.

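When working with text, a common requirement is to detect the cities, regions, states and countries it mentions, along with the relationships between them. This is very useful for geographic research. In this article, we will use the locationtagger library.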

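Text mining that needs a combination of grammar-based rules and statistical modelling is usually done with NER (named entity recognition) algorithms. The entities that NER extracts can be the names of people, places, organizations or products. The locationtagger library builds on this step: it further tags the extracted entities and filters out the places from all the other entities present.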
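The filtering idea can be sketched in a few lines. This is not locationtagger's implementation; it assumes an NER step has already produced (text, label) pairs using spaCy-style labels, where GPE marks a geopolitical entity and LOC another kind of location. The entity list and the filter_places helper are hypothetical.

```python
# Hypothetical NER output as (entity text, label) pairs. The spaCy-style
# labels are an assumption: GPE = geopolitical entity, LOC = other location.
LOCATION_LABELS = {"GPE", "LOC"}


def filter_places(entities):
    """Keep only entities labelled as places, in first-seen order, without duplicates."""
    seen = set()
    places = []
    for text, label in entities:
        if label in LOCATION_LABELS and text not in seen:
            seen.add(text)
            places.append(text)
    return places


ner_output = [
    ("India", "GPE"),
    ("Kerala", "GPE"),
    ("Gandhi", "PERSON"),
    ("Infosys", "ORG"),
    ("Himalayas", "LOC"),
    ("India", "GPE"),
]
print(filter_places(ner_output))  # ['India', 'Kerala', 'Himalayas']
```

People and organizations are dropped, and the repeated "India" is reported once.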

Installation:

To install this module, type the following command in the terminal:

pip install locationtagger

After installation, a few NLTK models and data packages need to be downloaded with the following code:



Python3
import nltk

# download the NLTK entity-chunking models, tokenizers and taggers
# that locationtagger relies on
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('treebank')
nltk.download('maxent_treebank_pos_tagger')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')


Also download spaCy's small English model from the command line:

python -m spacy download en_core_web_sm
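Before running the examples, it can help to confirm that the installs above succeeded. The sketch below uses only the standard library (importlib.util.find_spec) and assumes the spaCy model is importable as a package named en_core_web_sm, which is how the download command installs it. The missing_packages helper is hypothetical, and it only checks Python packages, not the NLTK data files.

```python
import importlib.util

# Packages this article relies on; en_core_web_sm is installed as a Python
# package by `python -m spacy download en_core_web_sm`.
REQUIRED = ["locationtagger", "nltk", "spacy", "en_core_web_sm"]


def missing_packages(names):
    """Return the top-level package names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```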

Example 1: Printing the countries, cities and regions in a text.

Several attributes of the returned entity give the cities, countries, regions, etc. found in the text.

Functions and attributes used:

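  • locationtagger.find_locations(text=...) : returns an entity holding the location information. The "text" parameter takes the input text.
  • entity.countries : extracts all countries in the text.
  • entity.regions : extracts all states (regions) in the text.
  • entity.cities : extracts all cities in the text.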
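To make the three attributes concrete, here is a toy illustration of the kind of grouping they report. It swaps in a plain dictionary lookup (a "gazetteer") instead of NER, so it is not how locationtagger works internally. The GAZETTEER entries, the single-capitalised-word regex (which misses multi-word names such as "New Delhi") and the classify_places helper are all hypothetical.

```python
import re

# Hypothetical mini-gazetteer: place name -> kind. Illustrative entries only.
GAZETTEER = {
    "India": "country",
    "Pakistan": "country",
    "Kerala": "region",
    "Maharashtra": "region",
    "Mumbai": "city",
    "London": "city",
}

BUCKETS = {"country": "countries", "region": "regions", "city": "cities"}


def classify_places(text):
    """Sort single capitalised words that appear in the gazetteer into
    countries, regions and cities (first-seen order, no duplicates)."""
    result = {"countries": [], "regions": [], "cities": []}
    for word in re.findall(r"\b[A-Z][a-z]+\b", text):
        kind = GAZETTEER.get(word)
        if kind is None:
            continue  # not a known place
        bucket = result[BUCKETS[kind]]
        if word not in bucket:
            bucket.append(word)
    return result


sample = "India stretches from Kerala to Maharashtra; Mumbai and London are far apart."
print(classify_places(sample))
# {'countries': ['India'], 'regions': ['Kerala', 'Maharashtra'], 'cities': ['Mumbai', 'London']}
```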

Code:

Python3

import locationtagger

# sample text (adjacent string literals inside parentheses are joined
# automatically, without stray indentation or double spaces)
sample_text = (
    "India has very rich and vivid culture widely spread from Kerala to "
    "Nagaland to Haryana to Maharashtra. Delhi being capital with Mumbai "
    "financial capital. Can be said better than some western cities such as "
    "Munich, London etc. Pakistan and Bangladesh share its borders"
)

# extract the location entities from the text
place_entity = locationtagger.find_locations(text=sample_text)

# all countries
print("The countries in text : ")
print(place_entity.countries)

# all states (regions)
print("The states in text : ")
print(place_entity.regions)

# all cities
print("The cities in text : ")
print(place_entity.cities)

Output:



Example 2: Extracting relationships between locations

This example covers the attributes that give the relationships between countries, regions (states) and cities.

Attributes used:

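  • entity.country_regions : extracts the countries whose regions are found in the text, together with those regions.
  • entity.country_cities : extracts the countries whose cities are found in the text, together with those cities.
  • entity.other_countries : extracts the list of all countries whose regions or cities appear in the text.
  • entity.region_cities : extracts the regions whose cities are found in the text, together with those cities.
  • entity.other_regions : extracts the list of all regions whose cities appear in the text.
  • entity.other : all entities that were not recognized as place names are collected here.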
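The relationship attributes amount to grouping each place under its parent country or region. The sketch below shows that grouping with a small hand-written HIERARCHY table; the table, the relate helper and its output keys are hypothetical and only mirror the attribute names above. It also assumes that "other" means a parent inferred from a child place but never named in the text itself; check the library's real output before relying on that reading.

```python
from collections import defaultdict

# Hypothetical hierarchy: place -> (kind, parent region or None, country).
# Illustrative entries only; a real tool would use a full geographic database.
HIERARCHY = {
    "India": ("country", None, "India"),
    "Kerala": ("region", None, "India"),
    "Maharashtra": ("region", None, "India"),
    "Mumbai": ("city", "Maharashtra", "India"),
    "Lahore": ("city", "Punjab", "Pakistan"),
    "Canberra": ("city", "Australian Capital Territory", "Australia"),
}


def relate(places):
    """Group mentioned places under their parent countries and regions."""
    names = list(dict.fromkeys(places))  # drop duplicates, keep order
    mentioned = set(names)
    country_regions = defaultdict(list)
    country_cities = defaultdict(list)
    region_cities = defaultdict(list)
    for name in names:
        if name not in HIERARCHY:
            continue  # unknown place: skip it
        kind, region, country = HIERARCHY[name]
        if kind == "region":
            country_regions[country].append(name)
        elif kind == "city":
            country_cities[country].append(name)
            region_cities[region].append(name)
    # Assumption: "other" = a parent inferred from a child but never named itself.
    parent_countries = dict.fromkeys([*country_regions, *country_cities])
    return {
        "country_regions": dict(country_regions),
        "country_cities": dict(country_cities),
        "other_countries": [c for c in parent_countries if c not in mentioned],
        "region_cities": dict(region_cities),
        "other_regions": [r for r in region_cities if r not in mentioned],
    }


result = relate(["India", "Kerala", "Maharashtra", "Mumbai", "Lahore", "Canberra"])
for key, value in result.items():
    print(key, "->", value)
```

Here "India" is named directly, so only Pakistan and Australia (inferred from Lahore and Canberra) end up in other_countries.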

Python3

import locationtagger

# sample text (adjacent string literals inside parentheses are joined
# automatically, without stray indentation or double spaces)
sample_text = (
    "India has very rich and vivid culture widely spread from Kerala to "
    "Nagaland to Haryana to Maharashtra. Mumbai being financial capital can "
    "be said better than some western cities such as Lahore, Canberra etc. "
    "Pakistan and Nepal share its borders"
)

# extract the location entities from the text
place_entity = locationtagger.find_locations(text=sample_text)

# countries together with the regions found for them
print("The country regions in text : ")
print(place_entity.country_regions)

# countries together with the cities found for them
print("The country cities in text : ")
print(place_entity.country_cities)

# all other countries
print("All other countries in text : ")
print(place_entity.other_countries)

# regions together with the cities found for them
print("The region cities in text : ")
print(place_entity.region_cities)

# all other regions
print("All other regions in text : ")
print(place_entity.other_regions)

# entities not recognized as place names
print("All other entities in text : ")
print(place_entity.other)

Output: