📜  Scraping Google reviews and ratings using Python

📅  Last modified: 2022-05-13 01:54:30.426000             🧑  Author: Mango

In this article, we will see how to scrape Google reviews and ratings using Python.

Required modules:

  • Beautiful Soup: the scraping mechanism here is parsing the DOM, that is, extracting data from HTML and XML files.
# Installing with pip
pip install beautifulsoup4

# Installing with conda
conda install -c anaconda beautifulsoup4
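  • Scrapy: an open-source package designed for scraping larger datasets; being open source, it is widely used.
  • Selenium: usually used for automated testing. It also works for scraping, because browser automation can interact with JavaScript-driven pages: clicking, scrolling, moving data between multiple frames, and so on.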
# Installing with pip
pip install selenium

# Installing with conda
conda install -c conda-forge selenium 
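
To make the DOM-parsing idea concrete, here is a minimal sketch of Beautiful Soup extracting names and ratings from a small HTML string. The markup, the class names (`place`, `name`, `rating`), and the helper `extract_places` are hypothetical, made up for illustration; they are not Google's real page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: these class names are invented for illustration
# and do not reflect the structure of any real Google page.
SAMPLE_HTML = """
<div class="place">
  <span class="name">Sample Book Shop</span>
  <span class="rating">4.5</span>
</div>
<div class="place">
  <span class="name">Corner Cafe</span>
  <span class="rating">3.9</span>
</div>
"""


def extract_places(html):
    """Return a list of (name, rating) tuples found in the markup."""
    soup = BeautifulSoup(html, "html.parser")
    places = []
    for block in soup.find_all("div", class_="place"):
        name = block.find("span", class_="name").get_text(strip=True)
        rating = float(block.find("span", class_="rating").get_text(strip=True))
        places.append((name, rating))
    return places


if __name__ == "__main__":
    print(extract_places(SAMPLE_HTML))
```

The same approach could be applied to `driver.page_source` once Selenium has loaded a page, provided the selectors are adapted to the markup actually served.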

Chrome driver manager:

# Chrome changes with every release, so install
# webdriver-manager to fetch a matching ChromeDriver.
# (The webdriver module itself ships with selenium.)
pip install webdriver-manager

Web driver initialization:

Python3
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# The installed Chrome version is not known in advance, so let
# webdriver-manager download a matching ChromeDriver. Selenium 4
# takes the driver path through a Service object.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))




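Let us try to locate "Rashtrapathi Bavan" and then process the page further. The first time you do this, the page may ask for permission to proceed; if you see such a prompt, accept it and move on.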

Python3

url = 'https://www.google.com/maps/place/Rashtrapathi Bavan'
driver.get(url)
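
The place name in the URL above contains a space. A minimal sketch of percent-encoding the name before handing the URL to the driver, using the standard library, could look like this; the helper name `maps_place_url` is hypothetical.

```python
from urllib.parse import quote

MAPS_PLACE_PREFIX = "https://www.google.com/maps/place/"


def maps_place_url(place_name):
    """Build a Google Maps place URL with the name percent-encoded."""
    return MAPS_PLACE_PREFIX + quote(place_name)


url = maps_place_url("Rashtrapathi Bavan")
print(url)  # https://www.google.com/maps/place/Rashtrapathi%20Bavan
```

The resulting string can be passed to `driver.get(url)` exactly as before.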


Scraping Google reviews and ratings

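Here we will try to fetch three kinds of places from Google Maps, namely book shops, food, and temples. For each one we build a specific search term and combine it with the location.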
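
The step of combining a search term with the location can be sketched on its own as a small pure function. The names `CATEGORIES` and `build_search_url` are hypothetical, and URL-encoding the query with `quote_plus` is an addition that the original script does not perform.

```python
from urllib.parse import quote_plus

# Menu choices from the article mapped to their search terms.
CATEGORIES = {"1": "book shops", "2": "food", "3": "temples"}


def build_search_url(choice, location):
    """Return the Google search URL for a menu choice, or None if invalid."""
    category = CATEGORIES.get(choice)
    if category is None:
        return None
    query = f"{category} near {location}"
    # quote_plus turns spaces into '+' so the query survives in a URL
    return "https://www.google.com/search?q=" + quote_plus(query)


print(build_search_url("1", "600028"))
# https://www.google.com/search?q=book+shops+near+600028
```

Returning `None` for an unknown choice lets the caller decide whether to prompt again or exit.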

Python3

from urllib.parse import quote_plus

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.implicitly_wait(30)

# Either we can hard code or can get via input.
# The given input should be a valid one
location = "600028"
print("Search By ")
print("1.Book shops")
print("2.Food")
print("3.Temples")
print("4.Exit")
ch = "Y"

while ch.upper() == 'Y':
    choice = input("Enter choice(1/2/3/4):")

    if choice == '1':
        query = "book shops near " + location
    elif choice == '2':
        query = "food near " + location
    elif choice == '3':
        query = "temples near " + location
    elif choice == '4':
        break
    else:
        # Without this branch, query would be undefined below
        print("Invalid choice")
        continue

    # URL-encode the query so the spaces are sent correctly
    driver.get("https://www.google.com/search?q=" + quote_plus(query))
    wait = WebDriverWait(driver, 10)

    # Wait for the link whose href contains '/search?tbs',
    # hover over it, then click it
    more_link = wait.until(EC.element_to_be_clickable(
        (By.XPATH, "//a[contains(@href, '/search?tbs')]")))
    ActionChains(driver).move_to_element(more_link).perform()
    more_link.click()

    # Collect the text of every div marked aria-level='3'
    names = []
    for name in driver.find_elements(By.XPATH, "//div[@aria-level='3']"):
        names.append(name.text)
    print(names)

    ch = input("Do you want to continue (Y/N): ")

# Close the browser when the loop ends
driver.quit()
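
The script above collects only the text of each result. To pull a rating and a review count out of such text, one possible sketch is a regular expression. It assumes the text contains a rating like "4.5" followed by a parenthesised review count like "(1,234)"; that format is an assumption, not something the page guarantees, and `parse_rating` is a hypothetical helper.

```python
import re

# Assumed pattern: a one-digit rating with an optional decimal, followed
# by a review count in parentheses, e.g. "4.5(1,234)" or "4 (12)".
RATING_RE = re.compile(r"(\d(?:\.\d)?)\s*\(([\d,]+)\)")


def parse_rating(text):
    """Return (rating, review_count) from result text, or None if absent."""
    match = RATING_RE.search(text)
    if match is None:
        return None
    rating = float(match.group(1))
    review_count = int(match.group(2).replace(",", ""))
    return rating, review_count


print(parse_rating("Sample Book Shop\n4.5(1,234) · Book store"))  # (4.5, 1234)
print(parse_rating("No rating yet"))  # None
```

If the real text uses a different layout, only the pattern needs to change; the rest of the helper stays the same.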
