Convert HTML Source Code to a JSON Object Using Python

📅  Last modified: 2022-05-13 01:55:13.601000             🧑  Author: Mango

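In this article, we will see how to convert HTML source code into a JSON object. JSON objects are easy to transfer, and most modern programming languages support them. JavaScript can read JSON and parse it directly into JavaScript objects, which it can then use to build the HTML for your web page.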

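We will use the xmltojson module in this article. Its parse function takes HTML as input and returns the parsed result as a JSON string. Because xmltojson parses its input as XML, the HTML should be well-formed (for example, every tag closed).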

Environment setup:

Install the required modules:

pip install xmltojson
pip install requests

Steps:

  • Import the libraries.
Python3
import xmltojson
import json
import requests



  • Fetch the HTML code and save it to a file.

Python3

# Sample URL to fetch the HTML page
url = "https://geeksforgeeks-example.surge.sh"

# Headers to mimic a browser.
# Adjacent string literals are joined, so no stray whitespace ends up in the value.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

# Get the page through the get() method
html_response = requests.get(url=url, headers=headers)

# Save the page content as sample.html
with open("sample.html", "w", encoding="utf-8") as html_file:
    html_file.write(html_response.text)
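  • Convert the HTML to JSON with the parse function: open the HTML file and pass its contents to the parse function of the xmltojson module.

Python3

# Read the saved HTML and convert it to a JSON string
with open("sample.html", "r", encoding="utf-8") as html_file:
    html = html_file.read()
    json_ = xmltojson.parse(html)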
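  • The json_ variable holds a JSON string, which we can print or write to a file.

Python3

# json_ is already a JSON string, so decode it before calling json.dump;
# dumping the string itself would encode it a second time.
with open("data.json", "w", encoding="utf-8") as file:
    json.dump(json.loads(json_), file)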
  • Print the output.

Python3

print(json_)
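Since json_ is already a JSON-formatted string, handing it straight to json.dump or json.dumps encodes it a second time, producing a quoted string with escaped quotes rather than an object. A minimal sketch of the difference follows. It uses a hard-coded stand-in for the value returned by xmltojson.parse; the sample string is an assumption, not real output from the page.

```python
import json

# Stand-in for the string returned by xmltojson.parse (assumed shape).
json_ = '{"html": {"body": {"h1": "Hello"}}}'

# json.dumps on a str wraps it in quotes and escapes the inner quotes.
double_encoded = json.dumps(json_)

# Decoding first gives a dict, which can be re-serialised with indentation.
data = json.loads(json_)
pretty = json.dumps(data, indent=2, ensure_ascii=False)

print(double_encoded)
print(pretty)
```

Writing json_ to the file unchanged, or decoding it first as in the second half of the sketch, both leave data.json holding a JSON object that other programs can read directly.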

Complete code:

Python3

import xmltojson
import json
import requests


# Sample URL to fetch the HTML page
url = "https://geeksforgeeks-example.surge.sh"

# Headers to mimic a browser.
# Adjacent string literals are joined, so no stray whitespace ends up in the value.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}

# Get the page through the get() method
html_response = requests.get(url=url, headers=headers)

# Save the page content as sample.html
with open("sample.html", "w", encoding="utf-8") as html_file:
    html_file.write(html_response.text)

# Read the saved HTML and convert it to a JSON string
with open("sample.html", "r", encoding="utf-8") as html_file:
    html = html_file.read()
    json_ = xmltojson.parse(html)

# json_ is already a JSON string, so decode it before calling json.dump;
# dumping the string itself would encode it a second time.
with open("data.json", "w", encoding="utf-8") as file:
    json.dump(json.loads(json_), file)

print(json_)
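Pages found in the wild are often not well-formed XML (unclosed `<br>` or `<meta>` tags, attributes without values), and an XML-based parser is likely to reject them. As a hedged alternative, the sketch below uses Python's standard html.parser module to build a nested dictionary and serialise it with json.dumps. This is a different technique from xmltojson, and its output layout (the "tag", "attrs" and "children" keys, plus the TreeBuilder and html_to_json names) is an assumption chosen for illustration, not the format xmltojson produces.

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Collect tags into nested dicts: {"tag", "attrs", "children"}."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "attrs": {}, "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        # Void elements have no content, so they are never pushed.
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Keep non-empty text as plain strings among the children.
        text = data.strip()
        if text:
            self.stack[-1]["children"].append(text)


def html_to_json(html):
    """Return a JSON string describing the element tree of `html`."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return json.dumps(builder.root, ensure_ascii=False)


# The unclosed <br> would not be valid XML, but is handled here.
sample = "<html><body><p class='intro'>Hello<br>world</p></body></html>"
print(html_to_json(sample))
```

The sketch drops whitespace-only text and does not try to reproduce browser error recovery, so it is best treated as a starting point for pages that the XML route cannot handle.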

Output: