3.1.4. Modules for analysis

3.1.4.1. Module hypotest

3.1.4.1.1. Description

Quantitative statistics on multi-grouped data. Building a proper hypothesis test and quantitative analysis requires some basic knowledge of mathematical statistics.

The hypothesis test module in informatics lives mainly in the namespace info.toolbox.libs.hypotest. For convenience, importing from the main entry (for example, from info.me import hypotest) is also supported.

The prefix hypoi denotes tests that require independent data populations, so the population sizes need not be identical. The prefix hypoj denotes tests on joint pairs, which generally require the two samples to be intrinsically paired and of the same size. The prefix hypos denotes simulation methods that use random sampling.
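As a rough illustration of this naming convention, the sketch below uses plain Python rather than the library itself. It lists the group pairs a pairwise test would visit and checks whether a dataset could serve as paired input. The helper names pairwise_keys and is_pairable are hypothetical and are not part of hypotest.

```python
from itertools import combinations

def pairwise_keys(data):
    """Return every unordered pair of group names, in insertion order."""
    return list(combinations(data, 2))

def is_pairable(data):
    """True when all groups share one size, as joint (paired) tests generally need."""
    return len({len(values) for values in data.values()}) == 1

# independent groups may differ in size
independent = {"group1": [0.1, 0.4, 0.7], "group2": [0.2, 0.5], "group3": [0.3, 0.6, 0.9, 1.2]}
# paired groups hold matched observations, so their sizes agree
paired = {"before": [1.0, 2.0, 3.0], "after": [1.5, 1.8, 3.4]}

print(pairwise_keys(independent))
print(is_pairable(independent), is_pairable(paired))
```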

hypoi_f

perform one-way ANOVA test among multi-grouped data.

hypoi_t

perform pair-wise independent T test among multi-grouped data.

hypoi_sw

perform Shapiro-Wilk test on each group among multi-grouped data.

hypoi_normality

perform Omnibus Normality test on each group among multi-grouped data.

hypoi_ks

perform Kolmogorov-Smirnov test among multi-grouped data.

hypoi_cvm

perform Cramér-von Mises test among multi-grouped data.

hypoi_ag

perform Alexander Govern test among multi-grouped data.

hypoi_thsd

perform Tukey's range test among multi-grouped data.

hypoi_kw

perform Kruskal-Wallis H-test among multi-grouped data.

hypoi_mood

perform Mood's median and scale test among multi-grouped data.

hypoi_bartlett

perform Bartlett's test among multi-grouped data.

hypoi_levene

perform Levene test among multi-grouped data.

hypoi_fk

perform Fligner-Killeen test among multi-grouped data.

hypoi_ad

perform Anderson-Darling test among multi-grouped data.

hypoi_rank

perform rank sum test among multi-grouped data.

hypoi_es

perform Epps-Singleton test on each possible pair among multi-grouped data.

hypoi_u

perform Mann–Whitney U test on each possible pair among multi-grouped data.

hypoi_bm

perform Brunner-Munzel test on each possible pair among multi-grouped data.

hypoi_ab

perform Ansari-Bradley test on each possible pair among multi-grouped data.

hypoi_skew

perform skew test on each group among multi-grouped data.

hypoi_kurtosis

perform kurtosis test on each group among multi-grouped data.

hypoi_jb

perform Jarque-Bera test on each group among multi-grouped data.

hypoi_pd

perform Cressie-Read power divergence test on each group among multi-grouped data.

hypoi_chi2

perform Chi-Squared test on each group among multi-grouped data.

hypoj_pearson

compute Pearson correlation coefficient on each possible pair among multi-grouped data.

hypoj_spearman

compute Spearman correlation coefficient on each possible pair among multi-grouped data.

hypoj_kendall

compute Kendall's tau correlation coefficient on each possible pair among multi-grouped data.

hypoj_t

perform pair-wise related T test among multi-grouped data.

hypoj_rank

perform Wilcoxon signed-rank test among multi-grouped data.

hypoj_friedman

perform Friedman test among multi-grouped data.

hypoj_mgc

perform Multiscale Graph Correlation test on each possible pair among multi-grouped high-dimensional data.

hypos_mc

perform Monte Carlo hypothesis test on each group among multi-grouped data.

hypos_permu

perform Permutation test on each possible permutation of groups among multi-grouped data.

3.1.4.1.2. Docstrings

hypoi_f

perform one-way ANOVA test among multi-grouped data.

Arguments:
Parameters:

data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

Returns:

F statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.83 one-way ANOVA on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_f(data=data)
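
For readers who want to see what the F statistic measures, the following is a minimal sketch in plain Python, not the library's implementation. It divides the between-group mean square by the within-group mean square. The function name one_way_f and the toy data are illustrative assumptions.

```python
from statistics import fmean

def one_way_f(data):
    """F statistic of one-way ANOVA: between-group over within-group mean square."""
    groups = [list(values) for values in data.values()]
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # variation of observations around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

toy = {"group1": [1.0, 2.0, 3.0], "group2": [2.0, 3.0, 4.0], "group3": [3.0, 4.0, 5.0]}
f_stat = one_way_f(toy)
print(f_stat)
```

The group sizes may differ, which is consistent with the hypoi prefix.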
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_t

perform pair-wise independent T test among multi-grouped data. The statistic uses Equation 4.6.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • equal_var (bool) – trigger to determine whether the groups under comparison are assumed to have identical variance; False as default

  • trim (float) – fraction to trim data from two-tails (Yuen’s T test); valid value ranges from 0.0 to 0.5; 0.0 as default

  • permutations (Optional[int]) – \(\mathbb{N}\), number of permutations used for calculating numerical solution for \(p\)-value; 0 or None for analytical solution using t distribution without permutations; None as default

  • random_state (Optional[int]) – random state in Monte Carlo; effective when permutations is activated; None as default

  • nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for handling null values in data; 'propagate' returns nan; 'raise' throws an exception; 'omit' ignores null values; 'propagate' as default

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

Returns:

t statistics and \(p\)-values on pairwise groups

Return type:

dict

Examples:
Code 3.84 independent t test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_t(data=data)
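
Since equal_var is False by default, the unequal-variance (Welch) form of the statistic is the natural one to picture. The sketch below computes it for every pair of groups in plain Python. It is an illustration only: whether it coincides term by term with Equation 4.6 is not confirmed here, and the names welch_t and pairwise_welch are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    return (fmean(a) - fmean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def pairwise_welch(data):
    """Map each unordered pair of group names to its Welch t statistic."""
    return {(k1, k2): welch_t(data[k1], data[k2]) for k1, k2 in combinations(data, 2)}

# the two groups deliberately differ in size
toy = {"group1": [1.0, 2.0, 3.0], "group2": [2.0, 4.0, 6.0, 8.0]}
t_stats = pairwise_welch(toy)
print(t_stats)
```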
See also:
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_sw

perform Shapiro-Wilk test on each group among multi-grouped data.

Arguments:
Parameters:

data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

Returns:

shapiro statistic and \(p\)-value via Monte Carlo simulation

Return type:

dict

Examples:
Code 3.85 Shapiro-Wilk test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_sw(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_normality

perform Omnibus Normality test on each group among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for handling null values in data; 'propagate' returns nan; 'raise' throws an exception; 'omit' ignores null values; 'omit' as default
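
The three nan_policy options can be pictured with a small plain-Python sketch. The helper apply_nan_policy is hypothetical and only mimics the behaviour described above. It is not how the library handles null values internally.

```python
import math
from statistics import fmean

def apply_nan_policy(values, nan_policy="propagate"):
    """Return the values a statistic would see under the given policy."""
    if nan_policy not in ("propagate", "raise", "omit"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    has_nan = any(math.isnan(v) for v in values)
    if nan_policy == "raise" and has_nan:
        raise ValueError("input contains nan")
    if nan_policy == "omit":
        return [v for v in values if not math.isnan(v)]
    return list(values)  # 'propagate': nan stays and contaminates the result

sample = [0.2, float("nan"), 0.8]
propagated = fmean(apply_nan_policy(sample, "propagate"))  # nan
omitted = fmean(apply_nan_policy(sample, "omit"))          # mean of the two valid values
print(propagated, omitted)
```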

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.86 omnibus test for normality on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_normality(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_ks

perform Kolmogorov-Smirnov test among multi-grouped data. The test is calculated on each group, as well as on pairwise groups.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • dist (Union[dist, list[dist]]) – distribution pre-defined as criterion (or criteria); rv_frozen object, or list of those objects in scipy; the standard uni-variate gaussian scipy.stats.norm(loc=0, scale=1) as default

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

  • method (Literal['exact', 'asymp', 'auto']) – the method to calculate \(p\)-value; 'exact' uses the exact distribution of the test statistic; 'asymp' uses its asymptotic distribution; 'auto' selects one of the above options; 'auto' as default

  • n_sample (int) – number of samples generated from pre-defined distribution(s); 20 as default

Returns:

statistic and \(p\)-value on each group, and on pairwise groups

Return type:

dict

Examples:
Code 3.87 Kolmogorov-Smirnov test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_ks(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_cvm

perform Cramér-von Mises test among multi-grouped data. The test is calculated on each group, as well as on pairwise groups.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • dist (Union[dist, list[dist]]) – distribution pre-defined as criterion (or criteria); rv_frozen object, or list of those objects in scipy; the standard uni-variate gaussian scipy.stats.norm(loc=0, scale=1) as default

  • method (Literal['exact', 'asymp', 'auto']) – the method to calculate \(p\)-value; 'exact' uses the exact distribution of the test statistic; 'asymp' uses its asymptotic distribution; 'auto' selects one of the above options; 'auto' as default

Returns:

statistic and \(p\)-value on each group, and on pairwise groups

Return type:

dict

Examples:
Code 3.88 Cramér-von Mises test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_cvm(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_ag

perform Alexander Govern test among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for handling null values in data; 'propagate' returns nan; 'raise' throws an exception; 'omit' ignores null values; 'propagate' as default

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.89 Alexander Govern test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_ag(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_thsd

perform Tukey’s range test among multi-grouped data.

Arguments:
Parameters:

data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

Variables:

~full_return (bool) – if True, low and high of confidence interval will be returned as extra information as well; False as default

Returns:

statistic and \(p\)-value on pairwise groups

Return type:

dict

Examples:
Code 3.90 Tukey’s range test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_thsd(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_kw

perform Kruskal-Wallis H-test among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for handling null values in data; 'propagate' returns nan; 'raise' throws an exception; 'omit' ignores null values; 'propagate' as default

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.91 Kruskal-Wallis H-test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_kw(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_mood

perform Mood’s median and scale test among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • ties (Literal['below', 'above', 'ignore']) – determines how values equal to the grand median are classified; 'below' and 'above' count them as below and above respectively; 'ignore' does not count them; 'below' as default

  • power_lambda (float) – number used for power divergence; 1.0 as default for Pearson’s chi-squared statistic

  • nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for handling null values in data; 'propagate' returns nan; 'raise' throws an exception; 'omit' ignores null values; 'propagate' as default

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default
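
To make the ties option concrete, the sketch below builds the above/below contingency table of a median test in plain Python and shows how values equal to the grand median move between rows. The function median_table is a hypothetical illustration under the description given above, not the library's code.

```python
from statistics import median

def median_table(data, ties="below"):
    """Count, per group, the values above and below the grand median."""
    grand = median([x for values in data.values() for x in values])
    above, below = [], []
    for values in data.values():
        n_above = sum(x > grand for x in values)
        n_below = sum(x < grand for x in values)
        n_equal = len(values) - n_above - n_below
        if ties == "below":
            n_below += n_equal
        elif ties == "above":
            n_above += n_equal
        elif ties != "ignore":
            raise ValueError(f"unknown ties option: {ties!r}")
        above.append(n_above)
        below.append(n_below)
    return grand, [above, below]

# both groups contain one value equal to the grand median (3)
toy = {"group1": [1, 2, 3], "group2": [3, 4, 5]}
for option in ("below", "above", "ignore"):
    print(option, median_table(toy, ties=option))
```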

Variables:

~full_return (bool) – if True, median and contingency table will be returned as extra information from median test as well; False as default

Returns:

statistics and \(p\)-values for median and scale tests

Return type:

dict

Examples:
Code 3.92 Mood’s median and scale test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_mood(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_bartlett

perform Bartlett’s test among multi-grouped data.

Arguments:
Parameters:

data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.93 Bartlett’s test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_bartlett(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_levene

perform Levene test among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • center (Literal['mean', 'median', 'trimmed']) – the referenced center to determine absolute distance for each observation; 'mean' uses mean; 'median' uses median; 'trimmed' uses the mean calculated from trimmed data; 'median' as default

  • proportiontocut (float) – fraction from leftmost and rightmost to be trimmed; effective when center is 'trimmed'; valid value ranges from 0.0 to 0.5; 0.05 as default
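
The role of center and proportiontocut can be sketched in plain Python as follows. The absolute deviations from the chosen center are the quantities a Levene-type test then compares across groups. The helpers trimmed_mean and abs_deviations are hypothetical. In particular, this sketch keeps every observation and only changes the reference center, and whether the library also drops the trimmed observations is not stated in this section.

```python
from statistics import fmean, median

def trimmed_mean(values, proportiontocut=0.05):
    """Mean after cutting the given fraction from both the low and the high end."""
    ordered = sorted(values)
    cut = int(proportiontocut * len(ordered))
    return fmean(ordered[cut:len(ordered) - cut])

def abs_deviations(values, center="median", proportiontocut=0.05):
    """Absolute distance of each observation from the chosen center."""
    if center == "mean":
        ref = fmean(values)
    elif center == "median":
        ref = median(values)
    elif center == "trimmed":
        ref = trimmed_mean(values, proportiontocut)
    else:
        raise ValueError(f"unknown center: {center!r}")
    return [abs(x - ref) for x in values]

values = [1.0, 2.0, 3.0, 100.0]  # one outlier pulls the plain mean upward
print(abs_deviations(values, center="mean"))
print(abs_deviations(values, center="median"))
print(abs_deviations(values, center="trimmed", proportiontocut=0.25))
```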

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.94 Levene test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_levene(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_fk

perform Fligner-Killeen test among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • center (Literal['mean', 'median', 'trimmed']) – the referenced center to determine absolute distance for each observation; 'mean' uses mean; 'median' uses median; 'trimmed' uses the mean calculated from trimmed data; 'median' as default

  • proportiontocut (float) – fraction from leftmost and rightmost to be trimmed; effective when center is 'trimmed'; valid value ranges from 0.0 to 0.5; 0.05 as default

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.95 Fligner-Killeen test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_fk(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_ad

perform Anderson-Darling test among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • midrank (bool) – type of Anderson-Darling test; True for the midrank test applicable to continuous and discrete distributions; False for the right side empirical distribution; True as default

Variables:

~full_return (bool) – if True, critical values in different significance levels will be returned as extra information; False as default

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.96 Anderson-Darling test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_ad(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_rank

perform rank sum test among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.97 Rank sum test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_rank(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_es

perform Epps-Singleton test on each possible pair among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • es_t (tuple[float, float]) – points at which the characteristic function is evaluated; (0.4, 0.8) as default

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.98 Epps-Singleton on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_es(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_u

perform Mann–Whitney U test on each possible pair among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • method (Literal['asymptotic', 'exact', 'auto']) – the method to calculate \(p\)-value; 'exact' uses the exact distribution of the test statistic; 'asymptotic' uses an approximate distribution; 'auto' selects one of the above options; 'auto' as default, which chooses 'exact' when one of the samples has no more than 8 observations and there are no ties, otherwise 'asymptotic'

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

  • u_continuity (bool) – whether to apply continuity correction; effective when method is 'asymptotic'; True as default
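
The U statistic and the 'auto' selection rule described above can be sketched in plain Python. The functions mann_whitney_u and choose_method are hypothetical illustrations. They follow the rule as documented in this entry and are not taken from the library's source.

```python
def mann_whitney_u(x, y):
    """U statistic of x versus y: each pair with x > y counts 1, each tie 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def choose_method(x, y):
    """Documented 'auto' rule: exact for a small sample without ties."""
    pooled = list(x) + list(y)
    has_ties = len(set(pooled)) < len(pooled)
    return "exact" if min(len(x), len(y)) <= 8 and not has_ties else "asymptotic"

x, y = [1, 3, 5], [2, 4]
u_x, u_y = mann_whitney_u(x, y), mann_whitney_u(y, x)
print(u_x, u_y)                         # the two always sum to len(x) * len(y)
print(choose_method(x, y))              # small samples, no ties
print(choose_method([1, 2], [2, 3]))    # the tie at 2 rules out the exact method
```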

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.99 Mann–Whitney U test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_u(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_bm

perform Brunner-Munzel test on each possible pair among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for handling null values in data; 'propagate' returns nan; 'raise' throws an exception; 'omit' ignores null values; 'propagate' as default

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

  • bm_dis (Literal['t', 'normal']) – determine \(p\)-value calculated from t or normal distribution; "t" as default

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.100 Brunner-Munzel test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_bm(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_ab

perform Ansari-Bradley test on each possible pair among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.101 Ansari-Bradley test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_ab(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_skew

perform skew test on each group among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for handling null values in data; 'propagate' returns nan; 'raise' throws an exception; 'omit' ignores null values; 'propagate' as default

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.102 skew test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_skew(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_kurtosis

perform kurtosis test on each group among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for handling null values in data; 'propagate' returns nan; 'raise' throws an exception; 'omit' ignores null values; 'propagate' as default

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.103 kurtosis test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_kurtosis(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_jb

perform Jarque-Bera test on each group among multi-grouped data.

Arguments:
Parameters:

data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.104 Jarque-Bera test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_jb(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_pd

perform Cressie-Read power divergence test on each group among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • f_exp (Iterable[int]) – expected frequencies of all categories; None as default, meaning all categories are assumed equally likely

  • ddf (int) – number to be subtracted from the degrees of freedom; 0 as default, which uses \(k-1\) degrees of freedom, where \(k\) is the number of observed frequencies (categories)

  • pd_lambda (Numeric) – real-value to determine the power of statistic; 1 as default for Pearson version
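
The effect of pd_lambda can be seen from the Cressie-Read statistic itself, \(\frac{2}{\lambda(\lambda+1)}\sum_i O_i\left[(O_i/E_i)^{\lambda}-1\right]\), which reduces to Pearson's chi-squared statistic at \(\lambda = 1\). The plain-Python sketch below evaluates it. The function name power_divergence and the toy frequencies are illustrative, and this is not the library's implementation.

```python
from math import log

def power_divergence(f_obs, f_exp=None, pd_lambda=1.0):
    """Cressie-Read power divergence statistic for observed vs expected counts."""
    if f_exp is None:
        # default: every category is equally likely
        f_exp = [sum(f_obs) / len(f_obs)] * len(f_obs)
    pairs = list(zip(f_obs, f_exp))
    if pd_lambda == 0:    # limiting case: log-likelihood ratio statistic
        return 2.0 * sum(o * log(o / e) for o, e in pairs if o > 0)
    if pd_lambda == -1:   # limiting case: modified log-likelihood ratio statistic
        return 2.0 * sum(e * log(e / o) for o, e in pairs)
    scale = 2.0 / (pd_lambda * (pd_lambda + 1.0))
    return scale * sum(o * ((o / e) ** pd_lambda - 1.0) for o, e in pairs)

f_obs = [16, 18, 16, 14, 12, 12]
pearson = power_divergence(f_obs, pd_lambda=1.0)
g_stat = power_divergence(f_obs, pd_lambda=0)
print(pearson, g_stat)
```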

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.105 Power divergence on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_pd(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoi_chi2

perform Chi-Squared test on each group among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • f_exp (Iterable[int]) – expected frequencies of all categories; None as default, meaning all categories are assumed equally likely

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.106 Chi-Squared test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoi_chi2(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoj_pearson

compute Pearson correlation coefficient on each possible pair among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.107 Pearson correlation coefficient on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoj_pearson(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoj_spearman

compute Spearman correlation coefficient on each possible pair among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for handling null values in data; 'propagate' returns nan; 'raise' throws an exception; 'omit' ignores null values; 'propagate' as default

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.108 Spearman correlation coefficient on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoj_spearman(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoj_kendall

compute Kendall’s tau correlation coefficient on each possible pair among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for handling null values in data; 'propagate' returns nan; 'raise' throws an exception; 'omit' ignores null values; 'propagate' as default

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

  • method (Literal['asymptotic', 'exact', 'auto']) – the method to calculate \(p\)-value; 'exact' uses the exact distribution of the test statistic; 'approx' uses double the single-tailed probability to approximate the two-tailed one; 'asymp' uses the asymptotic distribution; 'auto' selects one of the above options; 'auto' as default

  • kendall_tau (Literal['b', 'c', 'w']) – determines the type of \(\tau\) to be calculated; 'b' uses Kendall’s \(\tau\); 'c' uses Stuart’s \(\tau\); 'w' activates the weighted \(\tau\)

  • rank (bool) – whether to use decreasing lexicographical rank; if False, the index of each element is used as its rank; effective when weighted \(\tau\) is activated; True as default

  • weigher (Optional[Callable]) – function used to weight the rank \(r\); an acceptable mapping must convert a positive integer into a weight (e.g. \(f(r) = (1+r)^{-1}\)); None as default to use no weight

  • additive (bool) – determines how weights enter the statistic; if True, weights are added; otherwise they are multiplied; effective when weighted \(\tau\) is activated; True as default
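
A weigher is simply a callable from an integer rank to a weight. The sketch below writes out the example mapping \(f(r) = (1+r)^{-1}\) and shows, in plain Python, how additive changes the way two rank weights are combined. The names hyperbolic_weigher and pair_weight are hypothetical, and the snippet does not compute the weighted \(\tau\) itself.

```python
def hyperbolic_weigher(rank):
    """Map an integer rank r to the weight 1 / (1 + r)."""
    return 1.0 / (1.0 + rank)

def pair_weight(rank_i, rank_j, weigher=hyperbolic_weigher, additive=True):
    """Combine the weights of two ranks by sum (additive) or by product."""
    w_i, w_j = weigher(rank_i), weigher(rank_j)
    return w_i + w_j if additive else w_i * w_j

print([hyperbolic_weigher(r) for r in range(4)])   # weights shrink as rank grows
print(pair_weight(0, 1))                           # additive combination
print(pair_weight(0, 1, additive=False))           # multiplicative combination
```

A function of this shape is what the weigher argument is described as accepting.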

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.109 Kendall’s tau correlation coefficient on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoj_kendall(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoj_t

perform pair-wise related T test among multi-grouped data. The statistic uses Equation 4.9.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • nan_policy (Literal['propagate', 'raise', 'omit']) – strategy for handling null values in data; 'propagate' returns nan; 'raise' throws an exception; 'omit' ignores null values; 'propagate' as default

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

Variables:

~full_return (bool) – if True, degree of freedom will be returned as extra information as well; False as default

Returns:

t statistics and \(p\)-values on pairwise groups

Return type:

dict

Examples:
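As an illustration of the quantity involved, the plain-Python sketch below computes a paired t statistic from the element-wise differences of two equally sized samples. It is not the library's implementation. Whether it coincides term by term with Equation 4.9 is not confirmed here, and the name paired_t and the toy data are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

def paired_t(a, b):
    """Paired t statistic: mean difference over its standard error."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same size")
    diffs = [x - y for x, y in zip(a, b)]
    return fmean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

before = [1.0, 2.0, 3.0, 4.0]
after = [2.0, 2.0, 5.0, 5.0]
t_stat = paired_t(before, after)
dof = len(before) - 1   # degrees of freedom of the paired test
print(t_stat, dof)
```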
See also:
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoj_rank

perform Wilcoxon signed-rank test among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • zero_method (Literal['wilcox', 'pratt', 'zsplit']) – method for handling pairs with equal values; 'wilcox' ignores those cases; 'pratt' includes those cases in the ranking process only; 'zsplit' includes those cases in the ranking process and splits them half-half between positive and negative counts; 'wilcox' as default

  • correction (bool) – whether to apply continuity correction to adjust the rank statistic when the normal approximation is used; False as default

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

  • method (Literal['exact', 'approx', 'auto']) – the method to calculate \(p\)-value; 'exact' uses exact distribution of distribution(s); 'approx' uses approximate distribution(s); 'auto' uses one of the above options; 'auto' as default

Variables:

~full_return (bool) – if True, the \(Z\) statistic will be returned; False as default

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.111 Wilcoxon signed-rank test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoj_rank(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoj_friedman

perform Friedman test among multi-grouped data.

Arguments:
Parameters:

data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.112 Friedman test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypoj_friedman(data=data)
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypoj_mgc

perform Multiscale Graph Correlation test on each possible pair among multi-grouped high-dimensional data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • distance_criteria (Callable) – criterion to measure the distance between two elements when calculating the distance matrix; lambda x, y: np.linalg.norm(x-y, ord=2, axis=0) as default to calculate the Euclidean distance

  • n_resamples (int) – number of resampled permutations to calculate \(p\)-value; 1000 as default

  • random_state (Optional[int]) – random state to control random sample generation; None as default
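
The distance_criteria argument is a callable taking two elements and returning their distance. The sketch below shows, in plain Python, how such a callable produces a distance matrix and how a different metric could be swapped in. The helpers distance_matrix and manhattan are hypothetical, and the exact calling convention the library uses internally is an assumption.

```python
from math import dist  # Euclidean distance between two points

def distance_matrix(points, distance_criteria=dist):
    """Square matrix whose (i, j) entry is the distance between points i and j."""
    return [[distance_criteria(p, q) for q in points] for p in points]

def manhattan(p, q):
    """City-block distance, as an example of an alternative criterion."""
    return sum(abs(a - b) for a, b in zip(p, q))

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
euclid = distance_matrix(points)
city = distance_matrix(points, manhattan)
print(euclid)
print(city)
```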

Variables:

~full_return (bool) – if True, scale map, optimal scales, and random points for null distribution will be returned as extra information as well; False as default

Returns:

statistic and \(p\)-value

Return type:

dict

Examples:
Code 3.113 Multiscale Graph Correlation test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random((10, 5)) for _ in range(3)}

res = ht.hypoj_mgc(data=data)
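
The role of distance_criteria can be sketched as follows. The helper distance_matrix is hypothetical and only illustrates how a pairwise distance matrix could be built from a callable with the documented (x, y) signature. It assumes that each row of a group is one observation and that the criterion is symmetric.

```python
import numpy as np

# Default criterion as given in the parameter description above.
distance_criteria = lambda x, y: np.linalg.norm(x - y, ord=2, axis=0)

def distance_matrix(samples, criterion):
    """Pairwise distances between the rows of `samples` (hypothetical helper)."""
    n = len(samples)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # A symmetric criterion lets us fill both triangles at once.
            dist[i, j] = dist[j, i] = criterion(samples[i], samples[j])
    return dist

rng = np.random.default_rng(0)
group = rng.random((10, 5))          # 10 observations with 5 features each
dist = distance_matrix(group, distance_criteria)

# A custom criterion only needs the same (x, y) -> scalar signature,
# for example the Manhattan distance:
manhattan = lambda x, y: np.abs(x - y).sum()
dist_l1 = distance_matrix(group, manhattan)
```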
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypos_mc

perform Monte Carlo hypothesis test on each group among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • dist (Union[dist, list[dist]]) – predefined distribution(s); standard univariate Gaussian norm(loc=0, scale=1) as default

  • n_resamples (int) – number of resampled datapoints generated from predefined distribution(s); 9999 as default

  • agg_statistics (dict[str, Callable]) – dict composed of name and aggregation function mapping to calculate statistic; {'mean': lambda x: numpy.mean(x)} as default

  • batch (Optional[int]) – number of samples processed in each call of the functions in agg_statistics; None as default, which equals n_resamples

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

Variables:

~full_return (bool) – if True, random points for null distribution will be returned as extra information as well; False as default

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.114 Monte Carlo hypothesis test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypos_mc(data=data)
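
The idea behind a Monte Carlo test can be sketched for a single group with numpy only. The \(p\)-value convention used here (a +1 adjustment, and twice the smaller tail for the two-sided case) is a common choice and an assumption of this sketch; hypos_mc may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.random(20)              # one group, as in the example above
n_resamples = 9999
statistic = np.mean                  # mirrors the documented default 'mean'

observed = statistic(sample)

# Null distribution: the statistic evaluated on n_resamples samples of the
# same size drawn from the predefined distribution, here norm(loc=0, scale=1).
null = np.array([statistic(rng.normal(0.0, 1.0, sample.size))
                 for _ in range(n_resamples)])

# One-sided p-values with the usual +1 adjustment so that p is never 0.
p_less = (np.sum(null <= observed) + 1) / (n_resamples + 1)
p_greater = (np.sum(null >= observed) + 1) / (n_resamples + 1)

# Two-sided p-value: twice the smaller tail, capped at 1.
p_two_sided = min(1.0, 2 * min(p_less, p_greater))
```

A small \(p\)-value indicates that the observed statistic is unlikely under the predefined distribution, which is the hypothesis being tested for each group.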
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

hypos_permu

perform Permutation test on each possible permutation of groups among multi-grouped data.

Arguments:
Parameters:
  • data (dict[str, ndarray]) – dict composed of group names as keywords and corresponding values

  • permu_type (Literal['independent', 'samples', 'pairings']) – permutation type; 'samples' and 'pairings' require all compared data to have the same size; 'independent' assumes all input data are independent; 'independent' as default

  • n_resamples (int) – number of random permutations used to build the null distribution; 9999 as default

  • binding_groups (int) – number of groups for each call of test; must be an integer greater than or equal to 2; 2 as default

  • agg_statistics (dict[str, Callable]) – dict composed of name and aggregation mapping to calculate statistic; {'std_of_mean': lambda *x: np.std([np.mean(_) for _ in x])} as default

  • batch (Optional[int]) – number of samples processed in each call of the functions in agg_statistics; None as default, which equals n_resamples

  • alternative (Literal['two-sided', 'less', 'greater']) – type of alternative hypothesis \(H_1\); 'two-sided' as default

Variables:

~full_return (bool) – if True, random points for null distribution will be returned as extra information as well; False as default

Returns:

statistic and \(p\)-value on each group

Return type:

dict

Examples:
Code 3.115 Permutation test on multi groups
from info.me import hypotest as ht
import numpy as np
data = {f"group{_+1}": np.random.random(20) for _ in range(3)}

res = ht.hypos_permu(data=data)
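
A permutation test of the 'independent' type can be sketched for one pair of groups with numpy only. This illustrates the resampling idea under the documented default statistic; it is not the implementation of hypos_permu, and only the upper-tail \(p\)-value is computed.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.random(20), rng.random(20)     # one pair of groups
n_resamples = 9999

# Mirrors the documented default statistic 'std_of_mean'.
statistic = lambda *x: np.std([np.mean(_) for _ in x])
observed = statistic(a, b)

# 'independent' permutations: pool all observations, shuffle, and split
# back into groups of the original sizes.
pooled = np.concatenate([a, b])
null = np.empty(n_resamples)
for i in range(n_resamples):
    shuffled = rng.permutation(pooled)
    null[i] = statistic(shuffled[:a.size], shuffled[a.size:])

# The statistic is non-negative and grows with the separation of group means,
# so the upper tail ('greater') is the informative one for this choice.
p_greater = (np.sum(null >= observed) + 1) / (n_resamples + 1)
```

Unlike the Monte Carlo test, no reference distribution is needed here: the null distribution comes from reshuffling the observed data themselves.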
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06

3.1.4.2. Module factors

3.1.4.2.1. Description

The module factors supports scientific experiment design, data exploration, and related tasks. It allows researchers to extract meaningful patterns and relationships from complex datasets. Refer to the supplementary material for its scientific background.

Similarly, import through the main entry info.me is also available.

priori_scoring

priori scoring implementation for multi-factor analysis.

3.1.4.2.2. Docstrings

priori_scoring

priori scoring implementation for multi-factor analysis.

Arguments:
Parameters:
  • data (DataFrame) – table indexed by multi-factor labels, whose columns are unranked

  • constructor (dict[str, list[str]]) – constructor to parse the factors and corresponding levels in the index of data; a dict with factor names as keywords and lists of level names as the corresponding values

  • response_dimensions (list[str]) – list of factors expected to affect the final numeric values; the selection should follow common sense or expertise in the field

  • inertia_dimensions (Optional[list[str]]) – list of factors not expected to affect the final numeric values; None as default, which automatically uses the factors in constructor that are not selected in response_dimensions

  • measure (Optional[Callable]) – the callback aggregation function to map the rearranged pseudo-tensor to a scalar; None as default, which uses normality combined with ANOVA to measure the extent to which the data depart from the priori hypothesis

  • empty_value (Optional[Any]) – value to fill non-existent factor combinations; a customized measure function must be able to handle this value; numpy.nan as default

  • score_output (Optional[bool]) – whether to export the final scores for all column names; False as default

Returns:

a dict composed of importance level, and column names (and corresponding scores) in that level

Return type:

dict[str, ndarray]

Examples:
Code 3.116 factor analysis automation using priori scoring algorithm
from info.me import priori_scoring
from itertools import product
import numpy as np
import pandas as pd

cons = {
    'A': ['a1', 'a2'],
    'B': ['b1', 'b2', 'b3'],
    'C': ['c1', 'c2']
}

index = np.repeat(['-'.join(_) for _ in product(*[v for k, v in cons.items()])], 10)
where_c1 = np.array(['c1' in _ for _ in index])
columns = np.array([f"group_{_+1}" for _ in range(20)])
_values = np.random.random((len(index), len(columns)))
values = np.array([vec * 10.8 if c1 else vec * 0.3 for c1, vec in zip(where_c1, _values)])

df = pd.DataFrame(values, index=index, columns=columns)
#            group_1   group_2   group_3  ...  group_18  group_19  group_20
# a1-b1-c1  8.330263  0.224121  6.843401  ...  3.152262  9.911961  7.717418
# a1-b1-c1  5.859479  1.535437  4.032080  ...  8.949758  0.506480  6.763901
# ...            ...       ...       ...  ...       ...       ...       ...
# a2-b3-c2  0.205181  0.175918  0.293796  ...  0.020738  0.017385  0.094473
# a2-b3-c2  0.162649  0.077234  0.133392  ...  0.122661  0.200381  0.172522

res = priori_scoring(data=df, constructor=cons, response_dimensions=['C'], score_output=True)
# {'importance_level_0': array([['group_11', 10.583273152581747]]),  # most discriminative
#  'importance_level_1': array([['group_1', 5.543840398683746],
#                               ['group_2', 6.006970191046672],
#                               ['group_3', 4.691317734172809],
#                               ...}
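
How the constructor relates index labels to factor levels can be sketched with numpy only. The parsing by '-' and the per-level means below are assumptions made for illustration; they are not the scoring algorithm of priori_scoring.

```python
from itertools import product
import numpy as np

cons = {'A': ['a1', 'a2'], 'B': ['b1', 'b2', 'b3'], 'C': ['c1', 'c2']}
factors = list(cons)

# Same index layout as in the example above: 'a1-b1-c1', each repeated 10 times.
index = np.repeat(['-'.join(_) for _ in product(*cons.values())], 10)

# Split every label into one level per factor: shape (n_rows, n_factors).
levels = np.array([label.split('-') for label in index])

# One synthetic column whose scale depends only on factor 'C'.
rng = np.random.default_rng(0)
column = rng.random(len(index))
c_pos = factors.index('C')
column = np.where(levels[:, c_pos] == 'c1', column * 10.8, column * 0.3)

# Mean response per level of each factor; only factor 'C' should separate.
level_means = {
    f: {lv: float(column[levels[:, i] == lv].mean()) for lv in cons[f]}
    for i, f in enumerate(factors)
}
```

Levels of 'A' and 'B' give similar means because each of them mixes 'c1' and 'c2' rows evenly, which is the pattern expected when 'C' is the only response dimension.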
Logs:

Added in version 0.0.3.

– Created by Chen Zhang; Last updated on 01:34, 2025-09-06


Authors:

Chen Zhang

Version:

0.0.5

Created on:

Jun 30, 2023