Python: Coronavirus Case Details by Country

Last modified: 2023-12-03 15:04:11.998000             Author: Mango

With the coronavirus pandemic affecting countries around the world, using Python to retrieve per-country case details is a worthwhile exercise. The sections below show how to do it.

1. Environment setup

Before starting, set up a Python environment. You can download the installer from the official Python website, or use Anaconda to manage your Python environments.

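Once Python is installed, install the libraries used in this article: requests, beautifulsoup4, pandas, numpy, matplotlib, and seaborn. The leading ! runs the command from a Jupyter notebook cell; drop it when typing in a terminal.

!pip install requests
!pip install beautifulsoup4
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install seaborn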
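
Before moving on, it can help to confirm that the packages are actually importable. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is an illustrative assumption and not part of any library. Note that beautifulsoup4 is imported under the name `bs4`.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that cannot be found for import."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# Import names for the libraries used in this article
# (beautifulsoup4 is imported as "bs4").
required = ["requests", "bs4", "pandas", "numpy", "matplotlib", "seaborn"]
print("Missing packages:", missing_packages(required) or "none")
```

If the list is not empty, install the named packages with pip and run the check again.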

2. Fetching the data

Per-country coronavirus case details are available from several public data sources. Here we use the data published by the World Health Organization (WHO).

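We use the requests library to download the WHO data:

import requests

url = 'https://covid19.who.int/WHO-COVID-19-global-table-data.csv'
# A timeout keeps the call from hanging; raise_for_status() stops on HTTP errors.
response = requests.get(url, timeout=30)
response.raise_for_status()
# 'utf-8-sig' also strips a byte-order mark if the file starts with one.
data = response.content.decode('utf-8-sig')
print(data)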

Output:

Name,WHO Region,Cases - cumulative total,Cases - cumulative total per 1 million population,Cases - newly reported in last 7 days,Cases - newly reported in last 24 hours,Deaths - cumulative total,Deaths - cumulative total per 1 million population,Deaths - newly reported in last 7 days,Deaths - newly reported in last 24 hours,Transmission Classification
Global,,-,,4698703,721486,335101,43,5113,6073,803,,
United States of America,Americas,32271426,97561.83,163212,42290,578746,1749.26,2766,845,Community transmission
India,South-East Asia,21393665,15411.34,2027098,272192,234083,168.89,27169,3858,Clusters of cases
Brazil,Americas,15407929,72387.93,76434,26979,428256,2013.93,3216,1155,Community transmission
...
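
Before loading the text into a DataFrame, it is worth checking that every data row has the same number of fields as the header, because a stray trailing comma shifts the columns. Below is a sketch using the standard csv module. The helper name `field_count_report` and the `sample` text are illustrative assumptions; the sample is made up and is not real WHO output.

```python
import csv
import io
from collections import Counter


def field_count_report(csv_text):
    """Return (header_field_count, Counter of field counts in the data rows)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return len(header), Counter(len(row) for row in body if row)


# Made-up sample: the second data row ends with an extra comma.
sample = (
    "Name,WHO Region,Cases - cumulative total\n"
    "Country A,Region X,100\n"
    "Country B,Region Y,200,\n"
)

header_len, counts = field_count_report(sample)
print("header fields:", header_len)
print("data row field counts:", dict(counts))
```

With the real download you would call `field_count_report(data)`; more than one distinct row length signals that the file needs extra care when parsed.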

Next, we load the data into a DataFrame and tidy it up. The response is CSV text rather than HTML, so pandas can read it directly and beautifulsoup4 is not needed for this step:

import io

import pandas as pd

# The response body is CSV text, so read it straight into a DataFrame.
# index_col=False keeps pandas from using the first column as the index
# in case the data rows carry a trailing comma.
df = pd.read_csv(io.StringIO(data), index_col=False)

# Use shorter, consistent column names (same order as the CSV header).
df.columns = ['Name', 'WHO Region', 'Cases-cumulative total', 'Cases-cumulative total per 1 million population', 'Cases-newly reported in last 7 days',
              'Cases-newly reported in last 24 hours', 'Deaths-cumulative total', 'Deaths-cumulative total per 1 million population',
              'Deaths-newly reported in last 7 days', 'Deaths-newly reported in last 24 hours', 'Transmission Classification']
df.reset_index(drop=True, inplace=True)

# info() prints its summary itself, so it does not need to be wrapped in print().
df.info()

Output (the counts reflect the day the data was downloaded):

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 238 entries, 0 to 237
Data columns (total 11 columns):
 #   Column                                            Non-Null Count  Dtype  
---  ------                                            --------------  -----  
 0   Name                                              238 non-null    object 
 1   WHO Region                                        238 non-null    object 
 2   Cases-cumulative total                            238 non-null    int64  
 3   Cases-cumulative total per 1 million population   238 non-null    float64
 4   Cases-newly reported in last 7 days               238 non-null    int64  
 5   Cases-newly reported in last 24 hours             238 non-null    int64  
 6   Deaths-cumulative total                           238 non-null    int64  
 7   Deaths-cumulative total per 1 million population  236 non-null    float64
 8   Deaths-newly reported in last 7 days              238 non-null    int64  
 9   Deaths-newly reported in last 24 hours            238 non-null    int64  
 10  Transmission Classification                       235 non-null    object 
dtypes: float64(2), int64(6), object(3)
memory usage: 20.6+ KB

The code above uses pandas to convert the raw text into a DataFrame and to normalize the column names.
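
Columns that contain placeholders such as "-" are read as text rather than numbers. A sketch of coercing such columns with `pandas.to_numeric` follows; the tiny frame and the helper name `coerce_numeric` are made up for illustration.

```python
import pandas as pd


def coerce_numeric(frame, columns):
    """Return a copy of frame with the given columns converted to numbers.

    Values that cannot be parsed (for example "-") become NaN.
    """
    result = frame.copy()
    for col in columns:
        result[col] = pd.to_numeric(result[col], errors="coerce")
    return result


# Made-up rows: one value is a "-" placeholder instead of a number.
raw = pd.DataFrame({
    "Name": ["Country A", "Country B"],
    "Cases-cumulative total": ["100", "-"],
})

clean = coerce_numeric(raw, ["Cases-cumulative total"])
print(clean.dtypes)
print(clean)
```

Working on a copy leaves the original frame untouched, which makes it easy to compare the values before and after cleaning.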

3. Extracting information from the data

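We can now pull the figures we need out of the DataFrame, for example the worldwide totals of confirmed cases and deaths:

# Exclude the aggregate "Global" row, if present, so totals are not double-counted.
countries = df[df['Name'] != 'Global']

print('Global case overview:')
print(f"Confirmed: {countries['Cases-cumulative total'].sum()}")
print(f"Deaths: {countries['Deaths-cumulative total'].sum()}")

Output:

Global case overview:
Confirmed: 167457740
Deaths: 3443834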

Beyond simple totals, the data analysis tools that pandas provides can be used to analyze and visualize the data.
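
As one illustration of those pandas tools, the sketch below totals cases and deaths per WHO region and picks the countries with the most cases. To stay self-contained it uses only the three country rows shown in the sample output earlier; the variable names `df_sample`, `by_region`, and `top2` are assumptions made for this example.

```python
import pandas as pd

# Three rows copied from the sample output earlier in the article.
df_sample = pd.DataFrame({
    "Name": ["United States of America", "India", "Brazil"],
    "WHO Region": ["Americas", "South-East Asia", "Americas"],
    "Cases-cumulative total": [32271426, 21393665, 15407929],
    "Deaths-cumulative total": [578746, 234083, 428256],
})

# Total cases and deaths per WHO region, largest case count first.
by_region = (
    df_sample.groupby("WHO Region")[["Cases-cumulative total", "Deaths-cumulative total"]]
    .sum()
    .sort_values("Cases-cumulative total", ascending=False)
)
print(by_region)

# The two countries with the most cumulative cases.
top2 = df_sample.nlargest(2, "Cases-cumulative total")[["Name", "Cases-cumulative total"]]
print(top2)
```

The same calls work unchanged on the full DataFrame once the aggregate "Global" row has been filtered out.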

4. Visualization

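Here we use matplotlib and seaborn to visualize the data, for example with a scatter plot of cumulative cases against cumulative deaths, one point per country or territory: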

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10,6))
sns.scatterplot(x='Cases-cumulative total', y='Deaths-cumulative total', data=df)
plt.title('Cumulative cases vs. cumulative deaths by country')
plt.show()

Output:

[Figure: scatter plot of cumulative cases (x-axis) against cumulative deaths (y-axis)]

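Other chart types, such as pie charts, bar charts, and line charts, can be used in the same way. Because the dataset is large and changes every day, thorough processing and analysis take more time and a wider range of methods than shown here.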
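
As a sketch of one of the other chart types mentioned above, the following draws a horizontal bar chart of the countries with the most cumulative cases. It reuses only the three country rows shown in the sample output earlier, and the output file name `top_cases.png` is an assumption made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Three rows copied from the sample output earlier in the article.
df_sample = pd.DataFrame({
    "Name": ["United States of America", "India", "Brazil"],
    "Cases-cumulative total": [32271426, 21393665, 15407929],
})

# Keep the rows with the most cumulative cases, largest first.
top = df_sample.nlargest(3, "Cases-cumulative total")

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(top["Name"], top["Cases-cumulative total"])
ax.invert_yaxis()  # put the largest bar at the top
ax.set_xlabel("Cumulative cases")
ax.set_title("Countries with the most cumulative cases (sample rows)")
fig.tight_layout()
fig.savefig("top_cases.png")
```

In a notebook you can remove the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving to a file.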