📜  Statistics with Python

📅  Last modified: 2022-05-13 01:54:18.361000             🧑  Author: Mango

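Broadly speaking, statistics is the discipline of collecting, tabulating, and interpreting numerical data. It is a branch of applied mathematics concerned with the collection, analysis, interpretation, and presentation of data. With statistics we can see how data can be used to solve complex problems.

In this tutorial we will learn how to solve statistical problems with Python and look at the concepts behind them. Let's start with a few concepts that are used throughout the article.

Note: descriptive statistics are covered here with the help of the statistics module from the Python standard library.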

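Understanding descriptive statistics

In plain terms, descriptive statistics means describing data with representative tools such as charts, tables, and Excel files, in a way that conveys meaningful information and can also help identify trends. Describing and summarizing a single variable is called univariate analysis. Describing the statistical relationship between two variables is called bivariate analysis. Describing the statistical relationships among several variables is called multivariate analysis.

There are two types of descriptive statistics:

  • Measures of central tendency
  • Measures of variability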

[Figure: types of descriptive statistics]

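Measures of central tendency

A measure of central tendency is a single value that attempts to describe the whole data set. The three main measures of central tendency are:

  • Mean
  • Median
    • Median low
    • Median high
  • Mode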

[Figure: measures of central tendency]

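Mean

The mean is the sum of the observations divided by the number of observations. It is also known as the average.

\overline{x} = \frac{\sum{x}}{n}

where n is the number of observations.

The mean() function returns the average of the data passed to it. It raises StatisticsError if the data passed is empty.

Example: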

Python3
# Python code to demonstrate the working of
# mean()
 
# importing statistics to handle statistical
# operations
import statistics
 
# initializing list
li = [1, 2, 3, 3, 2, 2, 2, 1]
 
# using mean() to calculate the average of the
# list elements
print("The average of list values is : ", end="")
print(statistics.mean(li))


Output:



The average of list values is : 2
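
To connect the definition to the function, here is a minimal sketch that computes the mean by hand and compares it with statistics.mean(). The list `scores` and the variable names are made up for illustration; the last part shows the StatisticsError raised for empty input.

```python
import statistics

# Hypothetical sample data, chosen only for illustration
scores = [4, 8, 6, 5, 7]

# Mean by definition: sum of the observations / number of observations
manual_mean = sum(scores) / len(scores)

# The same quantity computed by the statistics module
library_mean = statistics.mean(scores)

print("Manual mean :", manual_mean)
print("Library mean:", library_mean)

# An empty sequence has no mean, so mean() raises StatisticsError
try:
    statistics.mean([])
    empty_raised = False
except statistics.StatisticsError:
    empty_raised = True

print("Empty input raised StatisticsError:", empty_raised)
```

Catching statistics.StatisticsError explicitly is one way to handle data that may turn out to be empty at run time.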

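Median

The median is the middle value of the sorted data set; it splits the data into two halves. If the number of elements n is odd, the median is the central element. If n is even, the median is the average of the two central elements.

For odd n, the median is the term at position:

\frac{n+1}{2}

For even n, the median is the average of the terms at positions:

\frac{n}{2}, \frac{n}{2}+1

The median() function computes the median, that is, the middle element of the data. It raises StatisticsError if the data passed is empty.

Example:

Python3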

# Python code to demonstrate the
# working of median() on various
# range of data-sets
 
# importing the statistics module
from statistics import median
 
# Importing fractions module as fr
from fractions import Fraction as fr
 
# tuple of positive integer numbers
data1 = (2, 3, 4, 5, 7, 9, 11)
 
# tuple of floating point values
data2 = (2.4, 5.1, 6.7, 8.9)
 
# tuple of fractional numbers
data3 = (fr(1, 2), fr(44, 12),
        fr(10, 3), fr(2, 3))
 
# tuple of a set of negative integers
data4 = (-5, -1, -12, -19, -3)
 
# tuple of set of positive
# and negative integers
data5 = (-1, -2, -3, -4, 4, 3, 2, 1)
 
# Printing the median of above datasets
print("Median of data-set 1 is % s" % (median(data1)))
print("Median of data-set 2 is % s" % (median(data2)))
print("Median of data-set 3 is % s" % (median(data3)))
print("Median of data-set 4 is % s" % (median(data4)))
print("Median of data-set 5 is % s" % (median(data5)))

Output:

Median of data-set 1 is 5
Median of data-set 2 is 5.9
Median of data-set 3 is 2
Median of data-set 4 is -5
Median of data-set 5 is 0.0
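
The position rules for odd and even n can be sketched as a small function. `manual_median` is a hypothetical helper written for this illustration, not part of the statistics module, and the two data sets are made up; the result is compared with statistics.median().

```python
from statistics import median


def manual_median(values):
    """Median computed from sorted positions (illustrative helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no median for empty data")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the term at 1-based position (n + 1) / 2
        return ordered[middle]
    # Even n: average of the terms at 1-based positions n/2 and n/2 + 1
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_data = [7, 1, 5]       # sorted: 1, 5, 7
even_data = [8, 2, 4, 6]   # sorted: 2, 4, 6, 8

print(manual_median(odd_data), median(odd_data))
print(manual_median(even_data), median(even_data))
```

Note that the data must be sorted first; the positions in the formulas refer to the ordered values, not to the order in which they were entered.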

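Median low

The median_low() function returns the median of the data when the number of elements is odd. When the number of elements is even, it returns the lower of the two middle elements. It raises StatisticsError if the data passed is empty.

Example:

Python3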

# Python code to demonstrate the
# working of median_low()
 
# importing the statistics module
import statistics
 
# simple list of a set of integers
set1 = [1, 3, 3, 4, 5, 7]
 
# Print median of the data-set
 
# Median value may or may not
# lie within the data-set
print("Median of the set is % s"
    % (statistics.median(set1)))
 
# Print low median of the data-set
print("Low Median of the set is % s "
    % (statistics.median_low(set1)))

Output:

Median of the set is 3.5
Low Median of the set is 3 

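Median high

The median_high() function returns the median of the data when the number of elements is odd. When the number of elements is even, it returns the higher of the two middle elements. It raises StatisticsError if the data passed is empty.

Example:

Python3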

# Working of median_high() and median() to
# demonstrate the difference between them.
 
# importing the statistics module
import statistics
 
# simple list of a set of integers
set1 = [1, 3, 3, 4, 5, 7]
 
# Print median of the data-set
 
# Median value may or may not
# lie within the data-set
print("Median of the set is %s"
    % (statistics.median(set1)))
 
# Print high median of the data-set
print("High Median of the set is %s "
    % (statistics.median_high(set1)))

Output:

Median of the set is 3.5
High Median of the set is 4 
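
As a side-by-side comparison, the sketch below runs median_low(), median(), and median_high() on an odd-length and an even-length list. The data is invented for illustration. The three functions agree when n is odd and differ only when n is even.

```python
import statistics

# Hypothetical data sets, one with odd and one with even length
odd_data = [10, 20, 30]
even_data = [10, 20, 30, 40]

for label, data in (("odd", odd_data), ("even", even_data)):
    print(label,
          "low =", statistics.median_low(data),
          "median =", statistics.median(data),
          "high =", statistics.median_high(data))
```

median_low() and median_high() always return a value that actually occurs in the data, which is useful for discrete data where an averaged value such as 25.0 would not be a valid observation.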

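Mode

The mode is the value that occurs most frequently in the data set. If every data point occurs with the same frequency, the data set may have no mode. If two or more values share the highest frequency, the data set has more than one mode.

The mode() function returns the most common value in the data. It raises StatisticsError if the data passed is empty. Since Python 3.8, when several values tie for the highest frequency, mode() returns the first of them encountered in the data.

Example:

Python3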

# Python code to demonstrate the
# working of mode() function
# on a various range of data types
 
# Importing the statistics module
from statistics import mode
 
# Importing Fraction as fr so that mode() can
# also be shown on a tuple of fractional numbers
from fractions import Fraction as fr
 
# tuple of positive integer numbers
data1 = (2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 7)
 
# tuple of a set of floating point values
data2 = (2.4, 1.3, 1.3, 1.3, 2.4, 4.6)
 
# tuple of a set of fractional numbers
data3 = (fr(1, 2), fr(1, 2), fr(10, 3), fr(2, 3))
 
# tuple of a set of negative integers
data4 = (-1, -2, -2, -2, -7, -7, -9)
 
# tuple of strings
data5 = ("red", "blue", "black", "blue", "black", "black", "brown")
 
 
# Printing out the mode of the above data-sets
print("Mode of data set 1 is % s" % (mode(data1)))
print("Mode of data set 2 is % s" % (mode(data2)))
print("Mode of data set 3 is % s" % (mode(data3)))
print("Mode of data set 4 is % s" % (mode(data4)))
print("Mode of data set 5 is % s" % (mode(data5)))

Output:

Mode of data set 1 is 5
Mode of data set 2 is 1.3
Mode of data set 3 is 1/2
Mode of data set 4 is -2
Mode of data set 5 is black
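
When a data set has more than one mode, a single return value hides the tie. The sketch below assumes Python 3.8 or later, where the statistics module also provides multimode(); the `colors` and `no_repeats` lists are made up for illustration.

```python
from statistics import mode, multimode

# Hypothetical data in which "red" and "blue" tie for the highest frequency
colors = ["red", "blue", "red", "blue", "green"]

# mode() reports only the first of the tied values (Python 3.8+)
print("mode     :", mode(colors))

# multimode() returns every value that shares the highest frequency,
# in the order in which they first appear
print("multimode:", multimode(colors))

# When no value repeats, every value is returned
no_repeats = [1, 2, 3]
print("multimode:", multimode(no_repeats))
```

On Python versions before 3.8, mode() raised StatisticsError for tied data and multimode() was not available, so this sketch does not apply there.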

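See the following article for more detail on averages and measures of central tendency.

  • Statistical Functions in Python | Set 1 (Averages and Measure of Central Location)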

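Measures of variability

So far we have looked at measures of central tendency, but they alone are not enough to describe the data. We also need measures of variability, which describe the spread of the data, that is, how widely it is distributed. The most common measures of variability are:

  • Range
  • Variance
  • Standard deviation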

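Range

The difference between the largest and smallest data points in the data set is called the range. The range grows with the spread of the data: the larger the range, the more spread out the data, and vice versa.

We can find the maximum and minimum with the built-in max() and min() functions, respectively.

Example:

Python3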

# Sample Data
arr = [1, 2, 3, 4, 5]
 
#Finding Max
Maximum = max(arr)
# Finding Min
Minimum = min(arr)
 
# Difference Of Max and Min
Range = Maximum-Minimum    
print("Maximum = {}, Minimum = {} and Range = {}".format(
    Maximum, Minimum, Range))

Output:

Maximum = 5, Minimum = 1 and Range = 4
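
Because the range depends only on the two extreme values, a single unusual observation can change it drastically. The following sketch illustrates this with invented data; `data_range` is a hypothetical helper, not a library function.

```python
def data_range(values):
    """Range = largest value minus smallest value (illustrative helper)."""
    return max(values) - min(values)


# Hypothetical measurements that lie close together
typical = [12, 15, 14, 13, 16]

# The same measurements plus one extreme value
with_outlier = typical + [90]

print("Range without outlier:", data_range(typical))
print("Range with outlier   :", data_range(with_outlier))
```

This sensitivity to extremes is one reason the range is usually reported together with the variance or the standard deviation.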

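Variance

The variance is defined as the average squared deviation from the mean. To compute it, find the difference between each data point and the mean, square each difference, add the squares together, and divide by the number of data points in the data set.

\sigma^2=\frac{\sum(x-\mu)^2}{N}

where N = number of terms

\mu = mean

The statistics module provides the variance() function, which does all the math behind the scenes. Note that variance() computes the sample variance, which divides by N - 1 rather than N; the population variance given by the formula above is available as pvariance(). variance() raises StatisticsError if fewer than two values are passed.

Example:

Python3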

# Python code to demonstrate variance()
# function on varying range of data-types
 
# importing statistics module
from statistics import variance
 
# importing fractions as parameter values
from fractions import Fraction as fr
 
# tuple of a set of positive integers
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
 
# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
 
# tuple of a set of positive and negative numbers
# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
 
# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4),
           fr(5, 6), fr(7, 8))
 
# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
 
# Print the variance of each samples
print("Variance of Sample1 is % s " % (variance(sample1)))
print("Variance of Sample2 is % s " % (variance(sample2)))
print("Variance of Sample3 is % s " % (variance(sample3)))
print("Variance of Sample4 is % s " % (variance(sample4)))
print("Variance of Sample5 is % s " % (variance(sample5)))

Output:

Variance of Sample1 is 15.80952380952381 
Variance of Sample2 is 3.5 
Variance of Sample3 is 61.125 
Variance of Sample4 is 1/45 
Variance of Sample5 is 0.17613000000000006 
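
To make the difference between dividing by N and by N - 1 concrete, here is a minimal sketch that computes both variances by hand and compares them with pvariance() and variance(). The data set and variable names are chosen only for illustration.

```python
from statistics import mean, pvariance, variance

# Hypothetical data set, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(data)

# Sum of squared deviations from the mean
squared_deviations = sum((x - mu) ** 2 for x in data)

# Population variance: divide by N
manual_population = squared_deviations / len(data)

# Sample variance: divide by N - 1
manual_sample = squared_deviations / (len(data) - 1)

print("Population variance:", manual_population, pvariance(data))
print("Sample variance    :", manual_sample, variance(data))
```

As a rule of thumb, pvariance() fits when the data covers the entire population of interest, and variance() fits when the data is a sample drawn from a larger population.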

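Standard deviation

The standard deviation is defined as the square root of the variance. To compute it, find the mean, subtract the mean from each value and square the result, add all the squares together, divide by the number of terms, and take the square root.

\sigma=\sqrt{\frac{\sum(x-\mu)^2}{N}}

where N = number of terms

\mu = mean

The stdev() function of the statistics module returns the sample standard deviation of the data, that is, the square root of the sample variance (which divides by N - 1). The population standard deviation given by the formula above is available as pstdev(). stdev() raises StatisticsError if fewer than two values are passed.

Example:

Python3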

# Python code to demonstrate stdev()
# function on various range of datasets
 
# importing the statistics module
from statistics import stdev
 
# importing fractions as parameter values
from fractions import Fraction as fr
 
# creating a varying range of sample sets
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
 
# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
 
# tuple of a set of positive and negative numbers
# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
 
# tuple of a set of floating point values
sample4 = (1.23, 1.45, 2.1, 2.2, 1.9)
 
# Print the standard deviation of
# following sample sets of observations
print("The Standard Deviation of Sample1 is % s"
      % (stdev(sample1)))
 
print("The Standard Deviation of Sample2 is % s"
      % (stdev(sample2)))
 
print("The Standard Deviation of Sample3 is % s"
      % (stdev(sample3)))
 
 
print("The Standard Deviation of Sample4 is % s"
      % (stdev(sample4)))

Output:

The Standard Deviation of Sample1 is 3.9761191895520196
The Standard Deviation of Sample2 is 1.8708286933869707
The Standard Deviation of Sample3 is 7.8182478855559445
The Standard Deviation of Sample4 is 0.41967844833872525
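
The relationship between standard deviation and variance can be checked directly. The sketch below uses an invented data set and compares stdev() and pstdev() with the square roots of variance() and pvariance(); values are compared with math.isclose() because floating-point results may differ in the last digit.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data set, chosen only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = stdev(data)        # based on the sample variance (N - 1)
population_sd = pstdev(data)   # based on the population variance (N)

print("Sample standard deviation    :", sample_sd)
print("Square root of variance()    :", math.sqrt(variance(data)))
print("Population standard deviation:", population_sd)
print("Square root of pvariance()   :", math.sqrt(pvariance(data)))

# Each standard deviation matches the root of its variance
print(math.isclose(sample_sd, math.sqrt(variance(data))))
print(math.isclose(population_sd, math.sqrt(pvariance(data))))
```

Unlike the variance, the standard deviation is expressed in the same units as the data, which makes it easier to interpret alongside the mean.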

See the following article for more detail on measures of variability.

  • Statistical Functions in Python | Set 2 (Measure of Spread)