Exploring with Hexagonal Binning and Contour Plots

Last modified: 2022-05-13 01:54:22.186000             Author: Mango


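Hexagonal binning is a plot of two numeric variables in which the records are binned into hexagons. The code below builds a hexagonal binning plot of the relationship between total living area (in square feet) and the tax-assessed value of homes in King County, Washington. Instead of plotting individual points, the records are grouped into hexagonal bins, and the color of each bin indicates how many records it contains. The examples read the data from the CSV file kc_tax.csv.

Loading libraries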

Python3
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


Loading data

Python3

# Load the King County tax-assessment data
data = pd.read_csv("kc_tax.csv")

# Show the first five rows
print(data.head())

Output:

   TaxAssessedValue  SqFtTotLiving  ZipCode
0               NaN           1730  98117.0
1          206000.0           1870  98002.0
2          303000.0           1530  98166.0
3          361000.0           2000  98108.0
4          459000.0           3150  98108.0

Inspecting the data

Python3

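print(data.shape)
# DataFrame.info() prints its summary itself and returns None,
# so call it directly instead of wrapping it in print()
data.info()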

Output:

(498249, 3)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 498249 entries, 0 to 498248
Data columns (total 3 columns):
TaxAssessedValue    497511 non-null float64
SqFtTotLiving       498249 non-null int64
ZipCode             467900 non-null float64
dtypes: float64(2), int64(1)
memory usage: 11.4 MB
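The non-null counts above imply how many values are missing: 498,249 - 497,511 = 738 rows lack a TaxAssessedValue, and 498,249 - 467,900 = 30,349 rows lack a ZipCode. Rather than subtracting by hand, pandas can count missing values per column with `DataFrame.isna().sum()`. The small frame below is hypothetical toy data standing in for kc_tax.csv, used only to show the call.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with the same columns as kc_tax.csv
toy = pd.DataFrame({
    "TaxAssessedValue": [np.nan, 206000.0, 303000.0, np.nan],
    "SqFtTotLiving": [1730, 1870, 1530, 2000],
    "ZipCode": [98117.0, np.nan, 98166.0, 98108.0],
})

# isna() marks each missing cell as True; sum() counts them per column
missing = toy.isna().sum()
print(missing)

# Same numbers as "row count minus non-null count" from info()
print((missing == len(toy) - toy.count()).all())
```

Run on the full, unfiltered dataset, `data.isna().sum()` should report 738 for TaxAssessedValue and 30,349 for ZipCode, matching the info() output.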

Selecting data

Python3

# Take a subset of the King County, Washington tax data:
# assessed value for tax purposes < $600,000 and
# total living area between 100 and 2,000 sq. ft. (exclusive)
data = data.loc[(data['TaxAssessedValue'] < 600000) &
                (data['SqFtTotLiving'] > 100) &
                (data['SqFtTotLiving'] < 2000)]

Checking for null values

Python3

# info() showed that TaxAssessedValue has missing entries. Any comparison
# with NaN evaluates to False, so the filter above has already dropped them.
print(data['TaxAssessedValue'].isnull().values.any())

Output:

False
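The check prints False even though info() reported missing assessed values, because a comparison such as `NaN < 600000` evaluates to False, so the rows with a missing value never pass the `< 600000` condition. A minimal sketch on hypothetical toy values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy values; the first entry is missing
values = pd.Series([np.nan, 206000.0, 750000.0, 459000.0])

# NaN < 600000 evaluates to False, so the missing entry is not selected
mask = values < 600000
print(mask.tolist())  # [False, True, False, True]

subset = values[mask]
print(subset.isnull().values.any())  # False
```

To drop missing values explicitly rather than relying on this side effect of the comparison, `data.dropna(subset=['TaxAssessedValue'])` could be applied before filtering.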

Code #1: Hexagonal binning

Python3

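x = data['SqFtTotLiving']
y = data['TaxAssessedValue']

# Hexagonal-bin joint plot; seaborn 0.12+ requires x and y as keyword
# arguments, and the color must be a hex string without spaces
g = sns.jointplot(x=x, y=y, kind="hex", color="#4CB391")

# Leave room at the top of the figure for the title
g.figure.subplots_adjust(top=0.85)

g.set_axis_labels('Total Sq.Ft of Living Space',
                  'Assessed Value for Tax Purposes')

g.figure.suptitle('Tax Assessed vs. Total Living Space', size=18)
plt.show()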
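Conceptually, hexagonal binning tiles the plane with regular hexagons, assigns each record to the hexagon whose centre is nearest, and counts the records per hexagon; the plot then colors each hexagon by that count. The sketch below shows this in plain NumPy with a hypothetical helper, `hexbin_counts` (not part of seaborn or matplotlib). It treats the hexagon centres as two offset rectangular lattices and keeps whichever of the two candidate centres is closer. It assumes `width` is the horizontal spacing between neighbouring centres and that x and y are already on comparable scales; for the tax data, each axis would first need rescaling (for example to [0, 1]).

```python
import numpy as np


def hexbin_counts(x, y, width):
    """Hypothetical helper: count points per regular hexagon.

    Returns (centres, counts): an (m, 2) array of hexagon centres and
    the number of points whose nearest centre is each of them.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = float(width)
    h = w * np.sqrt(3.0)  # row spacing that makes the hexagons regular

    # Lattice A: centres at (i*w, j*h); nearest one by rounding
    ia = np.round(x / w)
    ja = np.round(y / h)
    dist_a = (x - ia * w) ** 2 + (y - ja * h) ** 2

    # Lattice B: centres at ((i + 0.5)*w, (j + 0.5)*h); nearest one by flooring
    ib = np.floor(x / w)
    jb = np.floor(y / h)
    dist_b = (x - (ib + 0.5) * w) ** 2 + (y - (jb + 0.5) * h) ** 2

    # Integer keys in half-step units: even keys = lattice A, odd = lattice B
    use_a = dist_a <= dist_b
    kx = np.where(use_a, 2 * ia, 2 * ib + 1).astype(np.int64)
    ky = np.where(use_a, 2 * ja, 2 * jb + 1).astype(np.int64)

    keys, counts = np.unique(np.column_stack([kx, ky]), axis=0,
                             return_counts=True)
    centres = keys * np.array([w / 2, h / 2])
    return centres, counts


# Usage on synthetic data
rng = np.random.default_rng(0)
xs = rng.normal(size=1_000)
ys = rng.normal(size=1_000)
centres, counts = hexbin_counts(xs, ys, width=0.5)
print(counts.sum())  # 1000: every point lands in exactly one hexagon
```

Choosing a row spacing of sqrt(3) times the column spacing puts all six neighbours of a centre at the same distance, which is what makes the region closest to each centre a regular hexagon.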

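Contour plots: a contour line of a function of two variables, f(x, y), is a curve along which f has a constant value. It is the cross-section of the three-dimensional surface z = f(x, y) cut by a plane parallel to the x-y plane, so, like the lines on a topographic map, it connects points of equal height (elevation) above a given level. A contour plot draws several such lines, and its contour interval is the difference in height between consecutive lines. In the code below, f is a kernel density estimate of the joint distribution of living area and assessed value, so the contours enclose regions of increasing record density.

Code #2: Contour plot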

Python3

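# Bivariate kernel density estimate drawn as contour lines;
# x and y are passed as keywords (required in seaborn 0.12+)
ax = sns.kdeplot(x=x, y=y)

plt.xlabel('Total Sq.Ft of Space')
plt.ylabel('Assessed Value for Taxes')

ax.figure.suptitle('Tax Assessed vs. Total Living', size=16)
plt.show()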
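The contour plot needs a surface f(x, y) to slice, and `sns.kdeplot` uses a kernel density estimate of the joint distribution. The sketch below makes that step explicit with NumPy and matplotlib, using two hypothetical helpers on synthetic data. `kde_grid` sums a Gaussian kernel centred on each record over a grid, with a bandwidth from Scott's rule applied to each axis separately. This is a simplifying assumption: seaborn's estimator uses the full covariance of the data. The left panel draws contours at evenly spaced density values, so the contour interval is constant. Seaborn's `kdeplot` instead interprets `levels` as iso-proportions of the probability mass (0.2 means 20% of the mass lies below that contour). The right panel imitates this with `mass_to_levels`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt


def kde_grid(x, y, gridsize=100):
    """Hypothetical helper: 2-D Gaussian KDE evaluated on a regular grid."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Scott's rule for d = 2 dimensions: bandwidth = std * n**(-1/6)
    bx = x.std(ddof=1) * n ** (-1 / 6)
    by = y.std(ddof=1) * n ** (-1 / 6)
    gx = np.linspace(x.min() - 3 * bx, x.max() + 3 * bx, gridsize)
    gy = np.linspace(y.min() - 3 * by, y.max() + 3 * by, gridsize)
    # The product kernel factorises into an x part and a y part, so the
    # sum over records is a matrix product of the per-axis kernel matrices
    kx = np.exp(-0.5 * ((gx[:, None] - x) / bx) ** 2)  # (gridsize, n)
    ky = np.exp(-0.5 * ((gy[:, None] - y) / by) ** 2)  # (gridsize, n)
    Z = ky @ kx.T / (2 * np.pi * bx * by * n)          # rows: y, columns: x
    X, Y = np.meshgrid(gx, gy)
    return X, Y, Z


def mass_to_levels(Z, proportions):
    """Hypothetical helper: density thresholds with the given fraction
    of the probability mass lying below each one."""
    values = np.sort(Z.ravel())[::-1]            # highest density first
    cum_mass = np.cumsum(values) / values.sum()  # mass at or above each value
    idx = np.searchsorted(cum_mass, 1 - np.asarray(proportions))
    return np.take(values, idx, mode="clip")     # increases with proportion


# Synthetic, correlated data standing in for (square feet, assessed value)
rng = np.random.default_rng(1)
xs = rng.normal(1500, 250, size=300)
ys = 200 * xs + rng.normal(0, 40000, size=300)

X, Y, Z = kde_grid(xs, ys)

levels = np.linspace(0, Z.max(), 7)[1:-1]  # 5 evenly spaced density values
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.contour(X, Y, Z, levels=levels)
ax1.set_title("Constant contour interval")
ax2.contour(X, Y, Z, levels=mass_to_levels(Z, [0.2, 0.4, 0.6, 0.8]))
ax2.set_title("Iso-proportion levels")
plt.close(fig)
```

With iso-proportion levels, the lines crowd together where the density changes quickly, which is why the spacing between seaborn's default contours is not constant.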
