Estimation of Variability | Set 2

Prerequisite: Estimation of Variability | Set 1
Terms related to variability metrics:

-> Deviation 
-> Variance
-> Standard Deviation
-> Mean Absolute Deviation
-> Median Absolute Deviation
-> Order Statistics
-> Range
-> Percentile 
-> Inter-quartile Range

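Median Absolute Deviation: Mean absolute deviation, variance, and standard deviation (discussed in the previous set) are not robust to extreme values and outliers. The median absolute deviation is more robust: take the absolute deviation of each value from the median, then take the median of those absolute deviations.

Example: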

Sequence : [2, 4, 6, 8] 
Mean     = 5
Deviation around mean = [-3, -1, 1, 3]

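Mean Absolute Deviation = (3 + 1 + 1 + 3) / 4 = 2

Median   = 5
Absolute deviation around median = [3, 1, 1, 3]
Median Absolute Deviation = median([1, 1, 3, 3]) = 2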

Python3

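# Median Absolute Deviation

import numpy as np

def mad(data):
    # Convert to an array so the subtraction is element-wise
    data = np.asarray(data)
    # Median of the absolute deviations from the median
    return np.median(np.absolute(data - np.median(data)))

Sequence = [2, 4, 10, 6, 8, 11]

print("Median Absolute Deviation : ", mad(Sequence))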

Output:

Median Absolute Deviation :  3.0
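
To illustrate the robustness claim, the following sketch compares the standard deviation, the mean absolute deviation, and the median absolute deviation on the sequence above and on a copy in which the largest value is replaced by an extreme one. The helper names `mean_abs_dev` and `median_abs_dev` and the value 1000 are assumptions made for this illustration only.

```python
import numpy as np

def mean_abs_dev(data):
    """Mean of the absolute deviations from the mean."""
    data = np.asarray(data, dtype=float)
    return np.mean(np.abs(data - np.mean(data)))

def median_abs_dev(data):
    """Median of the absolute deviations from the median."""
    data = np.asarray(data, dtype=float)
    return np.median(np.abs(data - np.median(data)))

clean = [2, 4, 6, 8, 10, 11]
with_outlier = [2, 4, 6, 8, 10, 1000]  # hypothetical extreme value replaces 11

for name, seq in (("clean", clean), ("with outlier", with_outlier)):
    print(name,
          "| std:", round(float(np.std(seq)), 2),
          "| mean abs dev:", round(float(mean_abs_dev(seq)), 2),
          "| median abs dev:", float(median_abs_dev(seq)))
```

The median absolute deviation is 3.0 for both sequences, while the standard deviation and the mean absolute deviation increase sharply once the extreme value is present.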

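Order Statistics: This approach to measuring variability is based on the spread of ranked (sorted) data.
Range: The most basic measure among order statistics. It is the difference between the largest and the smallest value in the data set. It gives a quick sense of the spread of the data, but it is very sensitive to outliers. It can be made more robust by dropping the extreme values before computing it.
Example: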

Sequence : [2, 30, 50, 46, 37, 91]
Here, 2 and 91 are outliers

Range = 91 - 2 = 89
Range without outliers = 50 - 30 = 20
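
The two calculations above can be sketched in a few lines of plain Python. The helper names `value_range` and `trimmed_range`, and the choice of dropping `k` values from each end, are assumptions made for this illustration and not a standard API.

```python
def value_range(data):
    """Difference between the largest and the smallest value."""
    return max(data) - min(data)

def trimmed_range(data, k=1):
    """Range after dropping the k smallest and k largest values."""
    ordered = sorted(data)
    if len(ordered) <= 2 * k:
        raise ValueError("not enough values left after trimming")
    trimmed = ordered[k:len(ordered) - k]
    return trimmed[-1] - trimmed[0]

sequence = [2, 30, 50, 46, 37, 91]

print("Range :", value_range(sequence))                     # 91 - 2 = 89
print("Range without extremes :", trimmed_range(sequence))  # 50 - 30 = 20
```

Dropping a fixed number of values from each end is only one simple way to remove extremes; it treats the smallest and largest values as outliers without testing whether they really are.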

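Percentile: A good measure of variability that is far less sensitive to outliers. The P-th percentile of the data is a value such that at least P% of the values are less than or equal to it and at least (100 - P)% of the values are greater than or equal to it.
The median is the 50th percentile of the data.
Example: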

Sequence : [2, 30, 50, 46, 37, 91] 
Sorted   : [2, 30, 37, 46, 50, 91]

50th percentile = (37 + 46) / 2 = 41.5

Code:

Python3

# Percentile

import numpy as np

Sequence = [2, 30, 50, 46, 37, 91]

print("50th Percentile : ", np.percentile(Sequence, 50))

print("60th Percentile : ", np.percentile(Sequence, 60))

Output:

50th Percentile :  41.5
60th Percentile :  46.0
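
The 60th percentile lands exactly on a data point here because `np.percentile` uses linear interpolation between neighbouring sorted values by default. A minimal sketch of that calculation, written without NumPy, might look like the following; the function name `percentile_linear` is an assumption for this illustration.

```python
import math

def percentile_linear(data, p):
    """P-th percentile using linear interpolation between sorted values."""
    ordered = sorted(data)
    # Fractional position of the percentile in the sorted data (0-based)
    pos = p * (len(ordered) - 1) / 100
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    # Interpolate between the two neighbouring values
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

sequence = [2, 30, 50, 46, 37, 91]

print("50th Percentile : ", percentile_linear(sequence, 50))  # 41.5
print("60th Percentile : ", percentile_linear(sequence, 60))  # 46.0
```

For P = 50 the position is 2.5, halfway between 37 and 46, which gives 41.5. For P = 60 the position is exactly 3, so the result is the fourth sorted value, 46.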

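Inter-Quartile Range (IQR): It works on ranked (sorted) data. Three quartiles divide the data: Q1 (25th percentile), Q2 (50th percentile), and Q3 (75th percentile). The inter-quartile range is the difference between Q3 and Q1.
Example: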

Sequence : [2, 30, 50, 46, 37, 91] 
Q1 (25th percentile) : 31.75
Q2 (50th percentile) : 41.5
Q3 (75th percentile) : 49

IQR = Q3 - Q1 = 17.25
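
The individual quartiles in this example can be reproduced with a single `np.percentile` call. This is a small sketch that assumes NumPy's default linear interpolation; the variable names are chosen for illustration.

```python
import numpy as np

sequence = [2, 30, 50, 46, 37, 91]

# Passing a list of percentiles returns all three quartiles at once
q1, q2, q3 = np.percentile(sequence, [25, 50, 75])

print("Q1 :", q1)
print("Q2 :", q2)
print("Q3 :", q3)
print("IQR :", q3 - q1)
```

Other interpolation methods can give slightly different quartiles for small data sets, so the values may not match results from other tools exactly.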

Code 1:

Python3

# Inter-Quartile Range

from scipy.stats import iqr

Sequence = [2, 30, 50, 46, 37, 91]

print("IQR : ", iqr(Sequence))

Output:

IQR :  17.25

Code 2:

Python3

import numpy as np

Sequence = [2, 30, 50, 46, 37, 91]

# Inter-Quartile Range: Q3 (75th percentile) minus Q1 (25th percentile)
iqr_value = np.subtract(*np.percentile(Sequence, [75, 25]))

print("IQR : ", iqr_value)

Output:

IQR :  17.25