Statistics with Python

05 - Comparing the Means

Comparing the means

Read the discussion. Let's talk less and do more.

Null hypothesis H0: the mean glucose level in blood is the same as in urine (tested at significance level α = 0.05)

This hypothesis is not about comparing individual observations directly, but about comparing their means.

Note: α is the probability of rejecting the null hypothesis when the null hypothesis is true (a Type I error).
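To make the meaning of α concrete, here is a minimal simulation sketch. It assumes a simple two-sample z-test with a known common standard deviation, which is an illustrative choice and not necessarily the test this chapter ends up using. The group size, the mean of 5.0, and `SIGMA` are made-up values. Both groups are drawn from the same distribution, so every rejection is a false one, and the share of rejections should land close to α.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
SIGMA = 1.0   # assumed known standard deviation (illustrative value)
N = 10        # observations per group (illustrative value)


def z_test_p_value(xs, ys, sigma):
    """Two-sided p-value for H0 'equal means', given a known common sigma."""
    se = sigma * (1 / len(xs) + 1 / len(ys)) ** 0.5
    z = (fmean(xs) - fmean(ys)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_rejection_rate(trials=5000, seed=0):
    """Share of simulated experiments in which a true H0 gets rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both groups come from the same distribution, so H0 is true.
        xs = [rng.gauss(5.0, SIGMA) for _ in range(N)]
        ys = [rng.gauss(5.0, SIGMA) for _ in range(N)]
        if z_test_p_value(xs, ys, SIGMA) < ALPHA:
            rejections += 1
    return rejections / trials


rate = false_rejection_rate()
print(f"share of true null hypotheses rejected: {rate:.3f}")
```

The printed share fluctuates with the seed and the number of trials, but it stays near 0.05 because that is how the rejection threshold was chosen.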

Let's take the data:

import pandas
from fxy.io.baserow import BaserowIO

# Baserow endpoint and access token
API = 'https://data.mindey.com/'
API_TOKEN = 'C9gZFb15B8KAvWGihyr8YOGm62SSDlTD'
io = BaserowIO(API, token=API_TOKEN)

# Fetch table 232 as a DataFrame
df = io.get_table(232)

# Parse the columns into proper datetime and numeric types
df['date'] = pandas.to_datetime(df['date'])
df['value'] = pandas.to_numeric(df['value'])

Let's assume that we start with very little data: just the two observations from April 2022 (2022-04):

In [1]: df.set_index('date').loc['2022-04', ['Variable', 'Unit', 'value']]
Out[1]:
                                Variable    Unit  value
date
2022-04-04 11:00:00+00:00  blood:glucose  mmol/l   5.34
2022-04-04 11:00:00+00:00  urine:glucose  mmol/l   4.12

Obviously, 5.34 ≠ 4.12, but is it true that the means are statistically significantly different?

Let's talk probability distributions!

from fxy.s import *
from sympy.stats import

(plot: a chi-square distribution with mean 4.12, and a Lévy distribution placed around 5.34)

(plot: a Gaussian with mean 4.12 and low variance, and a Gaussian with mean 5.34 and low variance)

(Question: which one is the case?) (Answer: we don't know. The best we can do is (1) get more data, (2) bring in prior knowledge, and (3) start making assumptions.)
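The Gaussian case can be sketched numerically without plotting. The example below uses `statistics.NormalDist.overlap` from the Python standard library to measure how much two Gaussians centered at 4.12 and 5.34 overlap. The two standard deviations (0.1 and 1.0) are hypothetical values chosen only to contrast a "low variance" picture with a "high variance" one; they are not estimated from the data.

```python
from statistics import NormalDist


def overlap(mean_a, mean_b, sigma):
    """Overlapping area (0..1) of two Gaussians sharing the same sigma."""
    return NormalDist(mean_a, sigma).overlap(NormalDist(mean_b, sigma))


# Hypothetical spreads: the data so far tells us nothing about the variance.
narrow = overlap(4.12, 5.34, sigma=0.1)
wide = overlap(4.12, 5.34, sigma=1.0)

print(f"overlap with sigma=0.1: {narrow:.2e}")
print(f"overlap with sigma=1.0: {wide:.2f}")
```

With a small spread the two curves barely touch, and with a large spread they share roughly half of their area. The same pair of numbers is therefore compatible with both "clearly different" and "hard to tell apart", which is why two observations alone cannot settle the question.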

(A) Road: With no data and no knowledge about the phenomenon, we start by assuming a normal distribution. Sums of many independent random variables provably tend toward the normal distribution under broad conditions (the central limit theorem), and many natural phenomena behave like such sums. It is therefore reasonable to fit a normal distribution as a first approximation and adjust later.
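The convergence claim in road (A) can be checked with a small simulation. This sketch sums 12 uniform random numbers many times (the number of terms, the number of sums, and the seed are arbitrary choices), standardizes the sums, and compares the share falling within one and two standard deviations against the values expected for a normal distribution (about 0.683 and 0.954).

```python
import random
from statistics import fmean, pstdev


def standardized_sums(n_terms=12, n_sums=20000, seed=0):
    """Sums of uniform(0, 1) draws, rescaled to mean 0 and sd 1."""
    rng = random.Random(seed)
    sums = [sum(rng.random() for _ in range(n_terms)) for _ in range(n_sums)]
    mu, sd = fmean(sums), pstdev(sums)
    return [(s - mu) / sd for s in sums]


scores = standardized_sums()
within_one_sd = sum(abs(v) < 1 for v in scores) / len(scores)
within_two_sd = sum(abs(v) < 2 for v in scores) / len(scores)

print(f"within 1 sd: {within_one_sd:.3f} (normal: about 0.683)")
print(f"within 2 sd: {within_two_sd:.3f} (normal: about 0.954)")
```

Each individual term is uniform, which looks nothing like a bell curve, yet the sums already track the normal proportions closely.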

(B) Road: With prior knowledge, we can do better. For example, if we know that glucose in general, across all animals, is distributed as a Gaussian around a particular value with known variance, we can use that directly. If instead we know the most commonly seen value, the minimum, and the maximum, we could start with other distributions, such as BetaPERT, and reason in a Bayesian way. The normal distribution is always symmetric around its mean (which is also its median), so it is not always the best model for phenomena with asymmetric risks.
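As a rough illustration of road (B), the sketch below samples from a BetaPERT distribution built from a minimum, a most common value (mode), and a maximum, using the standard parameterization as a rescaled beta distribution with shape weight 4. The three numbers (3.0, 5.0, 11.0) are hypothetical placeholders, not physiological reference values, and the helper names `pert_sample` and `pert_mean` are introduced here for illustration only.

```python
import random
from statistics import fmean


def pert_sample(rng, minimum, mode, maximum, lam=4.0):
    """Draw one value from a BetaPERT(minimum, mode, maximum) distribution."""
    span = maximum - minimum
    alpha = 1 + lam * (mode - minimum) / span
    beta = 1 + lam * (maximum - mode) / span
    return minimum + span * rng.betavariate(alpha, beta)


def pert_mean(minimum, mode, maximum, lam=4.0):
    """Closed-form mean of the BetaPERT distribution."""
    return (minimum + lam * mode + maximum) / (lam + 2)


# Hypothetical prior knowledge: min, most common value, max (illustrative only).
MINIMUM, MODE, MAXIMUM = 3.0, 5.0, 11.0

rng = random.Random(0)
samples = [pert_sample(rng, MINIMUM, MODE, MAXIMUM) for _ in range(20000)]
sample_mean = fmean(samples)

print(f"mode: {MODE}, theoretical mean: {pert_mean(MINIMUM, MODE, MAXIMUM):.3f}")
print(f"sample mean: {sample_mean:.3f}")
```

Because the maximum lies further from the mode than the minimum does, the mean sits above the mode. A normal distribution cannot express that kind of asymmetry, since its mean, median, and mode coincide.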

The means are 5.34 and 4.12, the standard deviations are () (estimates), and we assume that we are dealing with Gaussians.
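Under the Gaussian assumption, the comparison of the two means can be sketched as a z-test on their difference. The standard deviations are not given in the text, so the value used below (`HYPOTHETICAL_SD = 0.5`) is purely a placeholder to show the mechanics, and `compare_means` is a helper name introduced for this sketch. With one observation per group, a standard deviation cannot be estimated from the sample itself, so it has to come from more data or from prior knowledge.

```python
from math import sqrt
from statistics import NormalDist


def compare_means(mean_a, mean_b, sd_a, sd_b, n_a=1, n_b=1):
    """Return (z, two-sided p-value) for H0 'the two means are equal'.

    Assumes Gaussian data with known standard deviations sd_a and sd_b.
    """
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


ALPHA = 0.05
HYPOTHETICAL_SD = 0.5  # placeholder: the real value is not known yet

z, p_value = compare_means(5.34, 4.12, HYPOTHETICAL_SD, HYPOTHETICAL_SD)
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {p_value < ALPHA}")
```

The conclusion depends entirely on the assumed spread: a smaller placeholder value would lead to rejecting H0, and a larger one would not, which is the reason the standard deviations need to be pinned down before drawing any conclusion.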


Last updated 2 years ago