I. What is the z-test
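When the population follows a normal distribution $N(\mu, \delta^2)$, the statistic $z = \frac{\overline{X} - \mu}{\delta/\sqrt{n}}$ follows the standard normal distribution, where $\mu$ is the mean assumed under the null hypothesis. The statistic can therefore be used to test a hypothesis about the population mean based on the sample mean $\overline{X}$. This method is called the z-test.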
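Conditions for use:
- The population is normally distributed and its variance is known. The statistic is then $z = \frac{\overline{X} - \mu}{\delta/\sqrt{n}}$.
- The population variance is unknown but the sample is large enough; as a rule of thumb, the sample size $n$ should exceed 30. The statistic is then $z = \frac{\overline{X} - \mu}{S/\sqrt{n}}$, which is approximately standard normal.
Here $\delta$ is the population standard deviation, $S$ is the sample standard deviation, $\mu$ is the population mean, and $\overline{X}$ is the sample mean.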
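The two cases above differ only in which standard deviation goes into the denominator. A minimal sketch of that choice follows; the function name `z_statistic` and its parameters `mu0` and `pop_std` are illustrative assumptions, and the large sample is synthetic data generated only for demonstration.

```python
import math
import numpy as np


def z_statistic(sample, mu0, pop_std=None):
    """Return the z statistic for H0: population mean == mu0.

    If pop_std is given, it is treated as the known population standard
    deviation (case 1). Otherwise the sample standard deviation S
    (ddof=1) is substituted, which is intended for large samples (case 2).
    """
    x = np.asarray(sample, dtype=float)
    scale = pop_std if pop_std is not None else x.std(ddof=1)
    return (x.mean() - mu0) / (scale / math.sqrt(x.size))


# Synthetic large sample (n = 100 > 30), generated only for illustration.
rng = np.random.default_rng(0)
large_sample = rng.normal(loc=10.0, scale=2.0, size=100)

# Case 1: population standard deviation assumed known (here 2.0).
print("known std:  ", round(z_statistic(large_sample, mu0=10.0, pop_std=2.0), 2))
# Case 2: variance unknown, so the sample standard deviation S is used.
print("sample std: ", round(z_statistic(large_sample, mu0=10.0), 2))
```

With a small sample and unknown variance, neither case applies, and a t-test is the usual alternative.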
II. Implementing common z-tests
1. One-sample two-sided test
Suppose a workshop uses a packing machine to pack glucose. The net weight of each bag is a random variable that follows a normal distribution with a standard deviation of 0.015 kg. On a given day, 9 bags packed by the machine are sampled at random, and their net weights (kg) are:
$0.497, 0.506, 0.518, 0.524, 0.498, 0.511, 0.520, 0.515, 0.512$
Is the mean net weight $\mu$ of a bag equal to 0.5 kg?
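Null hypothesis: $H_0: \mu = 0.5\,\text{kg}$
Alternative hypothesis: $H_1: \mu \neq 0.5\,\text{kg}$
The form of the alternative hypothesis shows that the true population mean may be either greater than or less than 0.5 kg, so this is a two-sided test. The significance level is $\alpha = 0.05$.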
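As a hand check of the quantities involved, the statistic for the nine weights above and the stated standard deviation of 0.015 kg works out as follows (values rounded; $\Phi$ denotes the standard normal CDF, a symbol introduced here for the check):

$$
\overline{X} = \frac{4.601}{9} \approx 0.5112, \qquad
z = \frac{\overline{X} - 0.5}{0.015/\sqrt{9}} \approx \frac{0.0112}{0.005} \approx 2.24
$$

$$
|z| \approx 2.24 > z_{1-\alpha/2} \approx 1.96, \qquad
p = 2\,\bigl(1 - \Phi(|z|)\bigr) \approx 0.025
$$

Since $|z|$ exceeds the critical value (equivalently, $p < \alpha$), $H_0$ is rejected at the 0.05 level.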
Code:
import math
import numpy as np
from scipy.stats import norm

if __name__ == '__main__':
    # Known population standard deviation (kg)
    std = 0.015
    # Significance level
    alpha = 0.05
    sample = [0.497, 0.506, 0.518, 0.524, 0.498, 0.511, 0.520, 0.515, 0.512]
    # Sample mean
    sample_mean = np.mean(sample)
    # Test statistic under H0: mu = 0.5
    z_statistics = (sample_mean - 0.5) / (std / math.sqrt(len(sample)))
    # Two-sided critical values of the standard normal distribution
    z_left_value = norm.ppf(alpha / 2)
    z_right_value = norm.ppf(1 - alpha / 2)
    print("z_statistics: ", round(z_statistics, 2))
    print("z_left_value: ", round(z_left_value, 2), ", z_right_value: ", round(z_right_value, 2))
    # Two-sided p-value: probability mass in both tails beyond |z|
    pval = norm.sf(abs(z_statistics)) * 2
    if z_statistics > z_right_value or z_statistics < z_left_value:
        print("reject null hypothesis, p value is: ", round(pval, 2))
    else:
        print("not reject null hypothesis, p value is: ", round(pval, 2))
Output:
z_statistics: 2.24
z_left_value: -1.96 , z_right_value: 1.96
reject null hypothesis, p value is: 0.02
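The same two-sided decision can also be read off a confidence interval: $H_0$ is rejected at level $\alpha$ exactly when 0.5 falls outside the $1 - \alpha$ interval for $\mu$. A minimal sketch, reusing the sample and known standard deviation from this example; the variable names `mu0`, `std_error`, `ci_low`, and `ci_high` are illustrative choices, not part of the original code.

```python
import math
import numpy as np
from scipy.stats import norm

std = 0.015    # known population standard deviation (kg)
alpha = 0.05   # significance level
mu0 = 0.5      # mean under the null hypothesis (kg)
sample = [0.497, 0.506, 0.518, 0.524, 0.498, 0.511, 0.520, 0.515, 0.512]

sample_mean = np.mean(sample)
std_error = std / math.sqrt(len(sample))   # standard error of the mean
z_crit = norm.ppf(1 - alpha / 2)           # two-sided critical value

# (1 - alpha) confidence interval for the population mean
ci_low = sample_mean - z_crit * std_error
ci_high = sample_mean + z_crit * std_error

# Reject H0 exactly when mu0 lies outside the interval
reject = not (ci_low <= mu0 <= ci_high)
print("confidence interval: ", round(ci_low, 4), round(ci_high, 4))
print("reject null hypothesis: ", reject)
```

The interval lies entirely above 0.5 kg, which agrees with the rejection reported by the test.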
2. One-sample one-sided test
Using the same glucose net-weight example, keep the null hypothesis $H_0: \mu = 0.5\,\text{kg}$ unchanged and change the alternative hypothesis to $H_1: \mu > 0.5\,\text{kg}$. The test then becomes a right-tailed test. The significance level is $\alpha = 0.05$.
Code:
import math
import numpy as np
from scipy.stats import norm

if __name__ == '__main__':
    # Known population standard deviation (kg)
    std = 0.015
    # Significance level
    alpha = 0.05
    sample = [0.497, 0.506, 0.518, 0.524, 0.498, 0.511, 0.520, 0.515, 0.512]
    # Sample mean
    sample_mean = np.mean(sample)
    # Test statistic under H0: mu = 0.5
    z_statistics = (sample_mean - 0.5) / (std / math.sqrt(len(sample)))
    # Right-tailed critical value: all of alpha sits in the upper tail
    z_right_value = norm.ppf(1 - alpha)
    print("z_statistics: ", round(z_statistics, 2))
    print("z_right_value: ", round(z_right_value, 2))
    # Right-tailed p-value: use z itself, not abs(z), so that a negative z
    # correctly gives a p-value above 0.5
    pval = norm.sf(z_statistics)
    if z_statistics > z_right_value:
        print("reject null hypothesis, p value is: ", round(pval, 2))
    else:
        print("not reject null hypothesis, p value is: ", round(pval, 2))
Output:
z_statistics: 2.24
z_right_value: 1.64
reject null hypothesis, p value is: 0.01
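The two-sided and one-sided tests differ only in how the p-value is taken from the standard normal distribution, so they can be folded into one helper. The sketch below is one possible arrangement, not the article's own code; the function name `one_sample_z_test` and the `alternative` labels ("two-sided", "greater", "less") are assumptions made for illustration.

```python
import math
import numpy as np
from scipy.stats import norm


def one_sample_z_test(sample, mu0, pop_std, alternative="two-sided"):
    """Return (z, p_value) for a one-sample z-test with known pop_std."""
    x = np.asarray(sample, dtype=float)
    z = (x.mean() - mu0) / (pop_std / math.sqrt(x.size))
    if alternative == "two-sided":
        p_value = 2 * norm.sf(abs(z))   # both tails beyond |z|
    elif alternative == "greater":
        p_value = norm.sf(z)            # upper tail: H1 is mu > mu0
    elif alternative == "less":
        p_value = norm.cdf(z)           # lower tail: H1 is mu < mu0
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value


sample = [0.497, 0.506, 0.518, 0.524, 0.498, 0.511, 0.520, 0.515, 0.512]
alpha = 0.05
for alt in ("two-sided", "greater", "less"):
    z, p = one_sample_z_test(sample, mu0=0.5, pop_std=0.015, alternative=alt)
    decision = "reject" if p < alpha else "not reject"
    print(alt, "z =", round(z, 2), "p =", round(p, 4), decision)
```

Comparing the p-value with `alpha` is equivalent to comparing the statistic with the critical value, so the "two-sided" and "greater" cases reproduce the two decisions shown above. The "less" case gives a p-value close to 1, as expected when the sample mean lies above 0.5 kg.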